DNS Caching
In a typical Go HTTP client, DNS resolution happens on every new TCP connection. The OS DNS resolver (or Go's own pure-Go resolver) queries the configured nameservers, which involves at least one round-trip before the actual HTTP connection can begin. In environments with many short-lived connections or high-frequency requests to many distinct hostnames, DNS resolution overhead adds measurable latency.
relay's built-in DNS cache stores resolved IP addresses in memory for a configurable TTL. Subsequent connections to the same hostname skip DNS resolution entirely and dial the cached IP directly, reducing connection establishment latency.
WithDNSCache
WithDNSCache enables in-process DNS caching for the client. Resolved IP addresses are stored in a concurrent-safe map. Entries expire after ttl and are refreshed on the next request to that hostname after expiry.
```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/jhonsferg/relay"
)

func main() {
	// Cache DNS results for 30 seconds.
	// Requests to the same hostname within 30s reuse the cached IP.
	client, err := relay.New(
		relay.WithBaseURL("https://api.example.com"),
		relay.WithDNSCache(30*time.Second),
	)
	if err != nil {
		log.Fatal(err)
	}

	// First request: DNS resolution happens, result is cached
	resp, err := client.Get(context.Background(), "/users", nil)
	if err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()
	fmt.Println("first request (DNS resolved):", resp.StatusCode)

	// Second request: uses cached IP, no DNS query
	resp, err = client.Get(context.Background(), "/users/42", nil)
	if err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()
	fmt.Println("second request (cached DNS):", resp.StatusCode)
}
```
Note: DNS caching is applied at the DialContext level, before the TCP connection is established. It is separate from HTTP connection keep-alive. Even when HTTP connections are reused (keep-alive), the DNS cache provides a safety net for when those connections are dropped and re-established.
How DNS Caching Works in relay
When WithDNSCache(ttl) is configured, relay installs a caching resolver that wraps the system's default net.Resolver. The resolution flow is:
- A new TCP connection is needed for hostname api.example.com.
- relay's caching resolver checks its in-memory map for api.example.com.
- Cache hit: the cached IP(s) are returned immediately. No DNS query is made.
- Cache miss or expired entry: net.Resolver.LookupIPAddr is called to resolve the hostname. The result is stored in the cache with the configured TTL.
- The TCP connection is established to the resolved IP.
```go
package main

import (
	"context"
	"fmt"
	"log"
	"sync"
	"time"

	"github.com/jhonsferg/relay"
)

func main() {
	// With a 60-second TTL, DNS is resolved at most once per minute per hostname.
	client, err := relay.New(
		relay.WithBaseURL("https://cdn.example.com"),
		relay.WithDNSCache(60*time.Second),
	)
	if err != nil {
		log.Fatal(err)
	}

	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			resp, err := client.Get(context.Background(), fmt.Sprintf("/assets/image-%d.png", n), nil)
			if err != nil {
				log.Printf("request %d failed: %v", n, err)
				return
			}
			resp.Body.Close()
		}(i)
	}
	wg.Wait()

	// Despite 50 concurrent requests, DNS was resolved only once (for the first request).
	// All subsequent requests reused the cached IP.
	fmt.Println("50 requests completed with at most 1 DNS lookup")
}
```
Default TTL and Override
relay does not have a built-in default TTL for DNS caching - you must explicitly specify the TTL when calling WithDNSCache. This design choice is intentional: DNS TTL requirements vary significantly between use cases, and silently choosing a default could cause subtle correctness issues.
Common TTL values and their trade-offs:
| TTL | Use case | Trade-off |
|---|---|---|
| 5s | Kubernetes services that scale quickly | Frequent DNS queries, fast failover |
| 30s | Standard microservice-to-microservice | Good balance of performance and freshness |
| 60s | Stable external APIs (Stripe, AWS) | Fewer queries, slightly slower failover |
| 300s | CDN edge endpoints (rarely change) | Very few queries, slow to detect IP changes |
```go
package main

import (
	"log"
	"time"

	"github.com/jhonsferg/relay"
)

func main() {
	// Short TTL for a Kubernetes service that uses pod IP endpoints
	k8sClient, err := relay.New(
		relay.WithBaseURL("https://my-service.default.svc.cluster.local"),
		relay.WithDNSCache(5*time.Second),
	)
	if err != nil {
		log.Fatal(err)
	}

	// Long TTL for a stable external API
	stripeClient, err := relay.New(
		relay.WithBaseURL("https://api.stripe.com"),
		relay.WithDNSCache(5*time.Minute),
	)
	if err != nil {
		log.Fatal(err)
	}

	log.Println("k8s client:", k8sClient)
	log.Println("stripe client:", stripeClient)
}
```
Cache Invalidation
Cache entries expire automatically when their TTL elapses. The entry is removed lazily - it is not actively evicted when it expires, but it is treated as a miss the next time a connection to that hostname is requested.
You can also manually flush the entire DNS cache if you need to force fresh resolution immediately - for example, after receiving a webhook notification that a service's IP has changed:
```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/jhonsferg/relay"
)

func main() {
	client, err := relay.New(
		relay.WithBaseURL("https://partner.example.com"),
		relay.WithDNSCache(60*time.Second),
	)
	if err != nil {
		log.Fatal(err)
	}

	// Normal request - DNS is cached for 60 seconds
	resp, err := client.Get(context.Background(), "/status", nil)
	if err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()
	fmt.Println("initial request ok")

	// Suppose we received a notification that the partner service has
	// migrated to new IPs. Flush the cache to force re-resolution.
	client.FlushDNSCache()
	fmt.Println("DNS cache flushed")

	// This request will re-resolve DNS regardless of the TTL
	resp, err = client.Get(context.Background(), "/status", nil)
	if err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()
	fmt.Println("post-flush request ok (fresh DNS):", resp.StatusCode)

	// You can also invalidate a single hostname
	client.InvalidateDNSEntry("partner.example.com")
	fmt.Println("specific entry invalidated")
}
```
Note: FlushDNSCache() is safe to call from any goroutine. It is a synchronous, blocking call that clears the cache atomically. Requests in flight at the time of the flush are not affected - they already have a resolved IP from the dialing phase.
Why DNS Caching Matters in Kubernetes
Kubernetes presents a particularly important case for DNS caching due to the interaction between the DNS TTL, the DNS server load, and connection keep-alive.
In a Kubernetes cluster, CoreDNS handles DNS resolution. The default TTL for in-cluster DNS records (ClusterIP services) is 5 seconds. If you have 100 pods each making 50 requests/second to a downstream service, and every request triggers a DNS lookup, you get 100 × 50 = 5,000 DNS queries per second.
This can saturate CoreDNS and cause intermittent SERVFAIL responses, which manifest as mysterious connection timeouts.
With DNS caching at a 30-second TTL, each pod resolves the hostname about once every 30 seconds, so the load drops to roughly 100 / 30 ≈ 3.3 DNS queries per second.
That is a 1,500x reduction in DNS load.
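The arithmetic behind the 1,500x figure can be checked with a few lines of Go. The helper names `uncachedQPS` and `cachedQPS` are made up for this example, and the model is deliberately simple: it assumes one lookup per request without caching and one lookup per pod per TTL window with caching.

```go
package main

import "fmt"

// uncachedQPS models the worst case: every request triggers one DNS lookup.
func uncachedQPS(pods, requestsPerSecond int) float64 {
	return float64(pods * requestsPerSecond)
}

// cachedQPS models one lookup per pod per TTL window.
func cachedQPS(pods int, ttlSeconds float64) float64 {
	return float64(pods) / ttlSeconds
}

func main() {
	before := uncachedQPS(100, 50) // 100 pods x 50 req/s
	after := cachedQPS(100, 30)    // 100 pods, 30-second TTL

	fmt.Printf("without cache: %.0f queries/s\n", before) // 5000
	fmt.Printf("with 30s TTL:  %.1f queries/s\n", after)  // 3.3
	fmt.Printf("reduction:     %.0fx\n", before/after)    // 1500x
}
```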
```go
package main

import (
	"context"
	"log"
	"sync"
	"time"

	"github.com/jhonsferg/relay"
)

func newKubernetesClient(serviceURL string) (*relay.Client, error) {
	return relay.New(
		relay.WithBaseURL(serviceURL),
		// 30-second TTL balances DNS freshness with CoreDNS load.
		// After a rolling deployment, new pod IPs are picked up within 30s.
		relay.WithDNSCache(30*time.Second),
		relay.WithTimeout(5*time.Second),
		relay.WithRetry(relay.RetryConfig{
			MaxAttempts: 3,
			Backoff:     relay.ExponentialBackoff(100*time.Millisecond, 2.0),
		}),
	)
}

func main() {
	client, err := newKubernetesClient("https://inventory.default.svc.cluster.local")
	if err != nil {
		log.Fatal(err)
	}

	// Simulate 100 concurrent requests
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := client.Get(context.Background(), "/health", nil)
			if err != nil {
				log.Printf("health check failed: %v", err)
				return
			}
			resp.Body.Close()
		}()
	}
	wg.Wait()
	log.Println("100 health checks completed with minimal DNS load")
}
```
Tip: In Kubernetes, also consider the effect of the ndots:5 setting. With ndots:5 in the pod's /etc/resolv.conf, the resolver tries any name with fewer than five dots against each search domain suffix before trying the name as-is. Using the full FQDN (ending with .) bypasses search domain expansion and reduces DNS queries further: inventory.default.svc.cluster.local. instead of inventory.
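To see why the trailing dot matters, the sketch below lists the names a resolv.conf-style resolver would try for a given search list and ndots value. The `candidates` function is a simplified model written for this example (it ignores name-length limits), and the search list shown is a typical one for a pod in the `default` namespace, stated here as an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// candidates lists, in order, the fully qualified names a resolv.conf-style
// resolver would try for name, given a search list and an ndots threshold.
func candidates(name string, search []string, ndots int) []string {
	// A trailing dot marks the name as fully qualified: no expansion at all.
	if strings.HasSuffix(name, ".") {
		return []string{name}
	}
	var expanded []string
	for _, suffix := range search {
		expanded = append(expanded, name+"."+suffix+".")
	}
	asIs := name + "."
	if strings.Count(name, ".") >= ndots {
		// Enough dots: try the name as-is first, then the search list.
		return append([]string{asIs}, expanded...)
	}
	// Too few dots: walk the search list first, then try the name as-is.
	return append(expanded, asIs)
}

func main() {
	// Assumed search list for a pod in the "default" namespace.
	search := []string{"default.svc.cluster.local", "svc.cluster.local", "cluster.local"}

	names := []string{
		"inventory",
		"inventory.default.svc.cluster.local",  // 4 dots: still below ndots:5
		"inventory.default.svc.cluster.local.", // trailing dot: fully qualified
	}
	for _, name := range names {
		fmt.Printf("%s\n", name)
		for _, c := range candidates(name, search, 5) {
			fmt.Printf("  try %s\n", c)
		}
	}
}
```

In this model the four-dot service name still produces four candidates, while the trailing-dot form produces exactly one. Each candidate can cost more than one query, since resolvers commonly ask for both A and AAAA records.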
Performance Impact
The performance improvement from DNS caching is most visible in two scenarios:
Scenario 1: High connection churn. Services that establish many short-lived connections (HTTP/1.0 without keep-alive, microbursts of traffic) see the largest benefit because DNS is resolved for every new TCP connection.
Scenario 2: Large fan-out. A service that makes requests to many different hostnames (e.g., a gateway that proxies to 50 different microservices) benefits because DNS caching amortizes the resolver overhead across all calls to each hostname.
```go
package main

import (
	"context"
	"fmt"
	"log"
	"sync"
	"time"

	"github.com/jhonsferg/relay"
)

func benchmark(label string, client *relay.Client, requests int) {
	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < requests; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := client.Get(context.Background(), "/ping", nil)
			if err == nil {
				resp.Body.Close()
			}
		}()
	}
	wg.Wait()
	elapsed := time.Since(start)
	fmt.Printf("%-25s %d req in %s (%.0f req/s)\n",
		label, requests, elapsed, float64(requests)/elapsed.Seconds())
}

func main() {
	withCache, err := relay.New(
		relay.WithBaseURL("https://api.example.com"),
		relay.WithDNSCache(30*time.Second),
	)
	if err != nil {
		log.Fatal(err)
	}

	withoutCache, err := relay.New(
		relay.WithBaseURL("https://api.example.com"),
		// No DNS caching
	)
	if err != nil {
		log.Fatal(err)
	}

	benchmark("without DNS cache:", withoutCache, 500)
	benchmark("with DNS cache:   ", withCache, 500)
}
```
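In a typical environment, DNS caching saves roughly 1ms to 5ms on each new connection when the DNS resolver is local (same node), and 5ms to 50ms when DNS queries cross a network boundary. Requests that reuse a keep-alive connection perform no lookup in either case, so the saving applies only to requests that open a new connection.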
Thread Safety
The DNS cache is fully thread-safe. All cache operations (read, write, expire, flush) are protected by an internal sync.RWMutex. Multiple goroutines can read from the cache concurrently without contention. Writes (on cache miss or TTL expiry) acquire an exclusive lock for the duration of the update.
The lock granularity is per-cache, not per-entry. For caches with many concurrent miss-and-refresh events (e.g., cold start of a large fan-out service), there is brief contention. In practice this is negligible because DNS resolutions for distinct hostnames complete in milliseconds.
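The read/write locking pattern described above can be sketched as follows. This is one possible design shown under assumptions, not relay's source: `safeCache`, `get`, and `concurrentLookups` are hypothetical names, TTL handling is left out for brevity, and the sketch holds the exclusive lock while the lookup runs so that concurrent misses for the same hostname collapse into a single lookup.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// safeCache is a hypothetical per-cache-locked map from hostname to IP.
type safeCache struct {
	mu      sync.RWMutex
	entries map[string]string
	lookup  func(host string) string
}

func (c *safeCache) get(host string) string {
	// Fast path: shared lock, so many readers can proceed at once.
	c.mu.RLock()
	ip, ok := c.entries[host]
	c.mu.RUnlock()
	if ok {
		return ip
	}

	// Slow path: exclusive lock. Re-check first, because another goroutine
	// may have filled the entry while this one was waiting for the lock.
	c.mu.Lock()
	defer c.mu.Unlock()
	if ip, ok := c.entries[host]; ok {
		return ip
	}
	ip = c.lookup(host)
	c.entries[host] = ip
	return ip
}

// concurrentLookups starts n goroutines against a cold cache and returns
// how many times the underlying lookup function ran.
func concurrentLookups(n int) int64 {
	var calls atomic.Int64
	c := &safeCache{
		entries: make(map[string]string),
		lookup: func(host string) string {
			calls.Add(1)
			return "192.0.2.10"
		},
	}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.get("api.example.com")
		}()
	}
	wg.Wait()
	return calls.Load()
}

func main() {
	fmt.Println("lookups for 200 goroutines:", concurrentLookups(200)) // prints 1
}
```

The trade-off of this design is that a miss for one hostname briefly blocks misses for every other hostname, which is the per-cache contention mentioned above.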
```go
package main

import (
	"context"
	"fmt"
	"log"
	"sync"
	"sync/atomic"
	"time"

	"github.com/jhonsferg/relay"
)

func main() {
	client, err := relay.New(
		relay.WithBaseURL("https://api.example.com"),
		relay.WithDNSCache(30*time.Second),
	)
	if err != nil {
		log.Fatal(err)
	}

	// Safe: all goroutines share the same client and its DNS cache.
	// Concurrent reads from the cache are non-blocking.
	// Writes (on cache miss) are serialized but brief.
	const goroutines = 200
	var wg sync.WaitGroup
	var successCount atomic.Int64 // counts successful requests across goroutines
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			resp, err := client.Get(context.Background(), fmt.Sprintf("/items/%d", n), nil)
			if err != nil {
				return
			}
			resp.Body.Close()
			successCount.Add(1)
		}(i)
	}
	wg.Wait()
	log.Printf("%d/%d requests succeeded with concurrent DNS cache access", successCount.Load(), goroutines)

	// Flush is also thread-safe - can be called from any goroutine.
	// The done channel makes main wait for the flush instead of relying on a sleep.
	done := make(chan struct{})
	go func() {
		defer close(done)
		time.Sleep(100 * time.Millisecond)
		client.FlushDNSCache()
		log.Println("cache flushed from background goroutine - safe")
	}()
	<-done
	fmt.Println("all operations completed safely")
}
```
Note: A single relay.Client instance should be shared across goroutines. Creating a new relay.Client per request defeats DNS caching (the cache is per-client) and also defeats HTTP connection keep-alive. Always create one client per logical downstream service and reuse it throughout the lifetime of your application.
Summary
| Feature | API |
|---|---|
| Enable DNS caching | WithDNSCache(ttl) |
| Flush entire cache | client.FlushDNSCache() |
| Invalidate one entry | client.InvalidateDNSEntry(hostname) |
| Recommended TTL (Kubernetes) | 5s to 30s |
| Recommended TTL (external APIs) | 60s to 5m |
| Thread safety | Full - uses sync.RWMutex internally |
| Performance benefit | 1ms to 50ms per new connection (DNS RTT eliminated) |
| Kubernetes benefit | Reduces CoreDNS query load by 100x to 1000x or more (1,500x in the example above) |