Request Hedging¶
Request hedging is a latency-reduction technique where a client sends a duplicate request to the same service after a short delay, and uses whichever response arrives first. The slower duplicate is cancelled as soon as the first response is received. This is sometimes called a "speculative retry" or "tail-cut" strategy.
Hedging is especially effective against tail latency - the high-percentile slow requests caused by GC pauses, kernel scheduling jitter, hot spots in downstream caches, or transient network congestion. Rather than waiting the full p99 time on every request, hedging lets you pay the cost of occasional extra network traffic to keep your observed latency close to the p50.
WithHedging¶
WithHedging enables request hedging. After after duration has elapsed and the first attempt has not yet returned a response, a duplicate request is issued in parallel. The first response to arrive (from either the original or the hedge) is returned to the caller. The other in-flight request is cancelled immediately.
package main
import (
"context"
"fmt"
"log"
"time"
"github.com/jhonsferg/relay"
)
func main() {
// Issue a hedge if the original request has not returned within 50ms.
// At p50 = 20ms, most requests finish before the hedge is needed.
// At p99 = 200ms, the hedge fires and the slow outlier is replaced.
client, err := relay.New(
relay.WithBaseURL("https://search.internal"),
relay.WithHedging(50*time.Millisecond),
)
if err != nil {
log.Fatal(err)
}
resp, err := client.Get(context.Background(), "/search?q=relay", nil)
if err != nil {
log.Fatal(err)
}
defer resp.Body.Close()
fmt.Println("status:", resp.StatusCode)
}
note The
afterduration should be set close to your p95 latency - not your p50 or p99. Setting it too low wastes bandwidth by hedging almost every request. Setting it too high means it only fires for extreme outliers and the latency improvement is minimal.
HedgeMaxAttempts¶
// HedgeMaxAttempts controls the maximum number of parallel attempts,
// including the original. Default is 2 (one original + one hedge).
type HedgeMaxAttempts int
By default, hedging issues at most two attempts: the original and one hedge. You can raise this limit to allow more parallel speculative requests, which reduces tail latency further at the cost of increased load on the downstream service.
package main
import (
"context"
"fmt"
"log"
"time"
"github.com/jhonsferg/relay"
)
func main() {
// Allow up to 3 parallel attempts: original + 2 hedges.
// Second hedge fires at 2*after if neither original nor first hedge has returned.
client, err := relay.New(
relay.WithBaseURL("https://catalog.internal"),
relay.WithHedging(30*time.Millisecond),
relay.WithHedgeMaxAttempts(3),
)
if err != nil {
log.Fatal(err)
}
resp, err := client.Get(context.Background(), "/items/42", nil)
if err != nil {
log.Fatal(err)
}
defer resp.Body.Close()
fmt.Println("status:", resp.StatusCode)
}
warning Raising
HedgeMaxAttemptsabove 2 multiplies the load you place on the downstream service in the tail. AtHedgeMaxAttempts(4)you may be sending 4x the traffic in worst-case scenarios. Always confirm the downstream service has capacity before increasing this value.
When to Use Hedging: Idempotent Reads Only¶
Hedging is safe only for idempotent requests where sending the same request twice produces no side effects. The canonical safe HTTP methods are:
| Method | Safe for hedging | Reason |
|---|---|---|
| GET | Yes | Reads only, no side effects |
| HEAD | Yes | Reads only, no side effects |
| OPTIONS | Yes | Metadata only, no side effects |
| PUT | Conditional | Safe only if the payload is identical and the operation is truly idempotent |
| POST | No | Creates resources, charges payments, triggers emails - sends twice = two actions |
| DELETE | No | Deleting twice is not always idempotent (e.g., second returns 404) |
| PATCH | No | Partial updates are rarely idempotent |
package main
import (
"context"
"encoding/json"
"fmt"
"log"
"time"
"github.com/jhonsferg/relay"
)
// safeSearchClient is configured for hedging - reads only.
func newSearchClient() (*relay.Client, error) {
return relay.New(
relay.WithBaseURL("https://search.internal"),
relay.WithHedging(40*time.Millisecond),
relay.WithHedgeMaxAttempts(2),
)
}
// writesClient has no hedging - mutations only.
func newOrdersClient() (*relay.Client, error) {
return relay.New(
relay.WithBaseURL("https://orders.internal"),
// No WithHedging here - POST /orders would create duplicate orders
relay.WithTimeout(5*time.Second),
)
}
func main() {
search, err := newSearchClient()
if err != nil {
log.Fatal(err)
}
orders, err := newOrdersClient()
if err != nil {
log.Fatal(err)
}
// Safe: GET with hedging
resp, err := search.Get(context.Background(), "/search?q=shoes", nil)
if err != nil {
log.Fatal(err)
}
defer resp.Body.Close()
var results map[string]interface{}
json.NewDecoder(resp.Body).Decode(&results)
fmt.Println("search results:", results)
// Unsafe for hedging: POST creates an order - do NOT hedge this
_, err = orders.Post(context.Background(), "/orders", map[string]interface{}{
"product_id": "prod-99",
"quantity": 1,
})
if err != nil {
log.Fatal(err)
}
}
Why NOT to Use Hedging on Writes¶
Suppose you hedge a POST /orders request. The sequence of events could be:
- Original request is sent at T=0ms.
- Hedge fires at T=50ms (no response yet).
- Original request succeeds at T=55ms, order is created in the database.
- The hedge arrives at T=60ms, a second order is created.
- Caller receives the first response (T=55ms) - they believe one order was placed.
- Customer is now charged twice and two orders are shipped.
Even if the server has its own idempotency layer, relying on server-side idempotency for hedging requires explicit coordination (idempotency keys). Without that, hedging writes is dangerous.
package main
import (
"context"
"log"
"github.com/jhonsferg/relay"
)
// badExample demonstrates what NOT to do.
// nolint:unused
func badExample() {
// DO NOT DO THIS - hedging POST requests is dangerous
client, err := relay.New(
relay.WithBaseURL("https://payments.internal"),
relay.WithHedging(50e6), // 50ms in nanoseconds
// ^ This would send duplicate payment requests!
)
if err != nil {
log.Fatal(err)
}
// This POST could be executed twice, charging the customer twice.
_, err = client.Post(context.Background(), "/charge", map[string]interface{}{
"amount": 9999,
"currency": "USD",
"card": "tok_visa",
})
if err != nil {
log.Println(err)
}
}
func main() {
log.Println("see badExample() for what NOT to do with hedging")
}
Latency Percentile Improvement¶
Hedging's effect on latency depends on the variance of your downstream service. The more variable the latency (high p99/p50 ratio), the more hedging helps.
Consider a service with the following latency distribution: - p50: 15ms - p95: 40ms - p99: 180ms - p999: 900ms
With hedging at after=40ms: - 95% of requests return within 40ms (no hedge fires) - 5% of requests trigger a hedge - The hedge effectively replaces the 5% slow tail with a second "fresh" attempt - New observed p99 drops from 180ms to approximately 80ms (the p99 of the hedged window)
package main
import (
"context"
"fmt"
"log"
"sort"
"sync"
"time"
"github.com/jhonsferg/relay"
)
// measureLatency runs n parallel requests and reports percentiles.
func measureLatency(client *relay.Client, path string, n int) {
latencies := make([]time.Duration, 0, n)
var mu sync.Mutex
var wg sync.WaitGroup
for i := 0; i < n; i++ {
wg.Add(1)
go func() {
defer wg.Done()
start := time.Now()
resp, err := client.Get(context.Background(), path, nil)
elapsed := time.Since(start)
if err == nil {
resp.Body.Close()
}
mu.Lock()
latencies = append(latencies, elapsed)
mu.Unlock()
}()
}
wg.Wait()
sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })
p := func(pct float64) time.Duration {
idx := int(float64(len(latencies)) * pct)
if idx >= len(latencies) {
idx = len(latencies) - 1
}
return latencies[idx]
}
fmt.Printf("p50=%-8s p95=%-8s p99=%-8s p999=%s\n",
p(0.50), p(0.95), p(0.99), p(0.999))
}
func main() {
// Without hedging
plain, err := relay.New(relay.WithBaseURL("https://search.internal"))
if err != nil {
log.Fatal(err)
}
fmt.Print("without hedging: ")
measureLatency(plain, "/search?q=test", 1000)
// With hedging at p95 threshold
hedged, err := relay.New(
relay.WithBaseURL("https://search.internal"),
relay.WithHedging(40*time.Millisecond),
)
if err != nil {
log.Fatal(err)
}
fmt.Print("with hedging: ")
measureLatency(hedged, "/search?q=test", 1000)
}
Cancellation of Losing Requests¶
When the first response arrives, relay immediately cancels the context of all other in-flight hedge requests. This is important for resource cleanup - cancelled requests do not continue consuming connections in the connection pool.
The cancellation happens automatically. The downstream service will receive a TCP RST or the connection will be closed without reading the response body. Well-behaved servers detect context cancellation and abort processing.
package main
import (
"context"
"fmt"
"log"
"time"
"github.com/jhonsferg/relay"
)
func main() {
client, err := relay.New(
relay.WithBaseURL("https://api.internal"),
relay.WithHedging(25*time.Millisecond),
relay.WithHedgeMaxAttempts(2),
relay.OnResponse(func(ctx context.Context, resp *relay.Response) {
// The hedge attempt index is available in the context
fmt.Printf("response from attempt, status=%d\n", resp.StatusCode)
}),
relay.OnError(func(ctx context.Context, err error) {
if err == context.Canceled {
// This fires for the losing hedge when it is cancelled.
// This is expected - do not treat it as an application error.
fmt.Println("hedge attempt cancelled (expected)")
return
}
log.Println("unexpected error:", err)
}),
)
if err != nil {
log.Fatal(err)
}
resp, err := client.Get(context.Background(), "/data/report", nil)
if err != nil {
log.Fatal(err)
}
defer resp.Body.Close()
fmt.Println("final status:", resp.StatusCode)
}
note Your
OnErrorhandler will be called withcontext.Canceledfor the losing hedge request. This is expected behavior and should not be counted as an error in your metrics. Filter it out explicitly as shown above.
Combined with Circuit Breaker¶
Hedging and circuit breakers complement each other well. Hedging reduces latency from occasional slow requests. The circuit breaker prevents requests from being sent when the service is consistently failing.
package main
import (
"context"
"fmt"
"log"
"time"
"github.com/jhonsferg/relay"
)
func main() {
client, err := relay.New(
relay.WithBaseURL("https://catalog.internal"),
// Circuit breaker: open after 5 consecutive failures,
// probe again after 10 seconds.
relay.WithCircuitBreaker(relay.CircuitBreakerConfig{
FailureThreshold: 5,
RecoveryTimeout: 10 * time.Second,
HalfOpenMaxRequests: 2,
}),
// Hedging: fire a speculative retry after 35ms.
// Only fires when the circuit is closed (service is healthy).
relay.WithHedging(35*time.Millisecond),
// Overall timeout including all hedge attempts.
relay.WithTimeout(500*time.Millisecond),
)
if err != nil {
log.Fatal(err)
}
// The circuit breaker prevents requests from going to a failing service.
// When the service is healthy, hedging reduces tail latency.
for i := 0; i < 5; i++ {
resp, err := client.Get(context.Background(), fmt.Sprintf("/items/%d", i), nil)
if err != nil {
log.Printf("request %d failed: %v", i, err)
continue
}
resp.Body.Close()
fmt.Printf("request %d: status=%d\n", i, resp.StatusCode)
}
}
The interaction between hedging and the circuit breaker:
- Circuit closed (healthy): hedging is active, tail latency is reduced.
- Circuit opening (failures accumulating): hedges fire but also fail, contributing to the failure count.
- Circuit open (failing fast): hedging does not fire because requests are rejected before touching the network.
- Circuit half-open (probing): a single probe request is sent. If the hedge also fires and succeeds, the circuit may close faster.
tip Set your circuit breaker's failure threshold to account for hedge cancellations. Since cancelled hedges may appear as errors in some configurations, ensure
context.Cancelederrors do not count toward the failure threshold.
Summary¶
| Concern | Recommendation |
|---|---|
| Enable hedging | WithHedging(after) where after is near p95 latency |
| Limit parallel attempts | WithHedgeMaxAttempts(2) is usually enough |
| Safe methods only | GET, HEAD, OPTIONS - never POST, DELETE, PATCH |
| Handle cancellations | Filter context.Canceled in OnError |
| Monitor extra traffic | Expect up to ~5% extra requests when after=p95 |
| Combine with circuit breaker | Reduces both tail latency and cascading failures |