This plugin enables federation of DNS responses by querying multiple upstream resolvers and merging their answers.
Perfect for distributed systems like rqlite, etcd, or other Raft-based applications that need to discover all replicas across multiple Kubernetes clusters using a single DNS query.
- Clone this plugin into your CoreDNS build directory:
cd coredns
mkdir -p plugin/merge
# Copy merge.go and setup.go to plugin/merge/
- Add to plugin.cfg (before the forward plugin):
merge:merge
- Build CoreDNS:
make

merge ZONE... {
    to UPSTREAM_ADDR [remove_zone|REWRITE_ZONE] [PROTOCOL]
}
- ZONE: The DNS zone(s) to intercept (e.g., cluster.all)
- UPSTREAM_ADDR: Address of upstream DNS server (e.g., 10.96.0.10:53)
- remove_zone: Optional keyword that strips the matched zone from the query before forwarding
- REWRITE_ZONE: Optional zone to rewrite queries to before forwarding (e.g., svc.cluster.local)
- PROTOCOL: Transport protocol, one of udp, tcp, or udp_tcp (default: udp_tcp)
  - udp: Use UDP only
  - tcp: Use TCP only
  - udp_tcp: Try UDP first, automatically fall back to TCP on failure or truncation
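As a rough illustration of the syntax above, the following Go sketch parses the arguments of a single `to` line. It is not the plugin's actual setup.go: the `Upstream` type and the `ParseTo` function are hypothetical names. It also assumes that a token equal to `udp`, `tcp`, or `udp_tcp` is always treated as the protocol rather than as a rewrite zone.

```go
package main

import (
	"fmt"
	"strings"
)

// Upstream is a hypothetical in-memory form of one `to` line.
type Upstream struct {
	Addr        string // e.g. "10.96.0.10:53"
	RemoveZone  bool   // true when the remove_zone keyword is present
	RewriteZone string // empty means no rewrite zone was given
	Protocol    string // "udp", "tcp", or "udp_tcp"
}

// isProtocol reports whether tok is one of the documented protocol names.
func isProtocol(tok string) bool {
	return tok == "udp" || tok == "tcp" || tok == "udp_tcp"
}

// ParseTo parses the arguments that follow the `to` keyword.
// Assumption: comments have already been stripped from the line.
func ParseTo(line string) (Upstream, error) {
	args := strings.Fields(line)
	if len(args) == 0 {
		return Upstream{}, fmt.Errorf("missing upstream address")
	}
	u := Upstream{Addr: args[0], Protocol: "udp_tcp"} // documented default
	protocolSeen := false
	for _, tok := range args[1:] {
		switch {
		case protocolSeen:
			return Upstream{}, fmt.Errorf("unexpected argument %q after protocol", tok)
		case isProtocol(tok):
			u.Protocol = tok
			protocolSeen = true
		case tok == "remove_zone":
			if u.RewriteZone != "" {
				return Upstream{}, fmt.Errorf("cannot specify both remove_zone and rewrite_zone for upstream %s", u.Addr)
			}
			u.RemoveZone = true
		default:
			if u.RemoveZone {
				return Upstream{}, fmt.Errorf("cannot specify both remove_zone and rewrite_zone for upstream %s", u.Addr)
			}
			if u.RewriteZone != "" {
				return Upstream{}, fmt.Errorf("more than one rewrite zone for upstream %s", u.Addr)
			}
			u.RewriteZone = tok
		}
	}
	return u, nil
}

func main() {
	for _, line := range []string{
		"10.96.0.10:53 svc.cluster.local tcp",
		"8.8.8.8:53 remove_zone",
		"8.8.8.8:53 remove_zone svc.cluster.local", // rejected: both modes given
	} {
		u, err := ParseTo(line)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", u)
	}
}
```

The third input is rejected because it combines `remove_zone` with a rewrite zone, which the plugin reports as a configuration error.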
The plugin supports three modes for handling DNS queries:
- Zone Removal (remove_zone): Strips the matched zone from the query
  - Query: foo.cluster.all → Upstream receives: foo
  - Use case: Forward to public DNS servers or simplified zone hierarchies
- Zone Rewriting (specify a zone): Replaces the matched zone with another zone
  - Query: foo.cluster.all → Upstream receives: foo.svc.cluster.local
  - Use case: Multi-cluster federation with different internal zones
- Passthrough (no zone parameter): Keeps the query unchanged
  - Query: foo.cluster.all → Upstream receives: foo.cluster.all
  - Use case: Upstreams that understand the same zone
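The three modes amount to a small string transformation on the query name. The Go sketch below shows one way to express it. `TransformName` is a hypothetical helper, not a function taken from the plugin. For simplicity it works on names without a trailing dot and compares them case-sensitively, whereas real DNS names are case-insensitive.

```go
package main

import (
	"fmt"
	"strings"
)

// TransformName returns the name sent to one upstream.
// zone is the matched merge zone (e.g. "cluster.all").
// removeZone and rewriteZone mirror the options of a `to` line;
// when neither is set the query passes through unchanged.
func TransformName(qname, zone string, removeZone bool, rewriteZone string) string {
	qname = strings.TrimSuffix(qname, ".")
	zone = strings.TrimSuffix(zone, ".")
	if qname != zone && !strings.HasSuffix(qname, "."+zone) {
		return qname // the name is outside the zone, so leave it alone
	}
	// Everything in front of the zone, without the separating dot.
	prefix := strings.TrimSuffix(strings.TrimSuffix(qname, zone), ".")
	switch {
	case removeZone:
		return prefix
	case rewriteZone != "":
		target := strings.TrimSuffix(rewriteZone, ".")
		if prefix == "" {
			return target
		}
		return prefix + "." + target
	default:
		return qname
	}
}

func main() {
	fmt.Println(TransformName("foo.cluster.all", "cluster.all", true, ""))
	fmt.Println(TransformName("foo.cluster.all", "cluster.all", false, "svc.cluster.local"))
	fmt.Println(TransformName("foo.cluster.all", "cluster.all", false, ""))
}
```

Run as written, the three calls correspond to zone removal, zone rewriting, and passthrough for the query foo.cluster.all.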
.:53 {
    errors
    health
    ready
    # Merge responses for *.cluster.all from two Kubernetes clusters
    # Uses udp_tcp by default (UDP with TCP fallback)
    merge cluster.all {
        to 10.96.0.10:53 svc.cluster.local # Cluster A CoreDNS
        to 10.97.0.10:53 svc.cluster.local # Cluster B CoreDNS
    }
    # Handle regular cluster-local queries
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}
.:53 {
    errors
    health
    # Force TCP for environments where UDP is blocked
    merge cluster.all {
        to 10.96.0.10:53 svc.cluster.local tcp
        to 10.97.0.10:53 svc.cluster.local tcp
    }
    forward . /etc/resolv.conf
    cache 30
}
.:53 {
    errors
    log
    # Production clusters - TCP only for strict firewall
    merge prod.global {
        to 10.10.1.10:53 svc.cluster.local tcp
        to 10.10.2.10:53 svc.cluster.local tcp
        to 10.10.3.10:53 svc.cluster.local tcp
    }
    # Staging clusters - UDP with TCP fallback
    merge staging.global {
        to 10.20.1.10:53 svc.cluster.local udp_tcp
        to 10.20.2.10:53 svc.cluster.local udp_tcp
    }
    # Dev clusters - UDP only (fast, local network)
    merge dev.global {
        to 10.30.1.10:53 svc.cluster.local udp
    }
    forward . 8.8.8.8 8.8.4.4
    cache 30
}
.:53 {
    errors
    log
    # Strip .external zone and query public DNS
    # Query: google.com.external -> Forwards: google.com
    merge external {
        to 8.8.8.8:53 remove_zone udp_tcp
        to 1.1.1.1:53 remove_zone udp_tcp
    }
    # Internal services with zone rewriting
    merge internal {
        to 10.96.0.10:53 svc.cluster.local
    }
    forward . /etc/resolv.conf
    cache 30
}
.:53 {
    errors
    log
    # Mode 1: Zone removal - for public DNS queries
    # Query: example.com.public -> Forwards: example.com
    merge public {
        to 8.8.8.8:53 remove_zone tcp
        to 1.1.1.1:53 remove_zone tcp
    }
    # Mode 2: Zone rewriting - for multi-cluster federation
    # Query: myapp.federated -> Forwards: myapp.svc.cluster.local
    merge federated {
        to 10.96.0.10:53 svc.cluster.local
        to 10.97.0.10:53 svc.cluster.local
    }
    # Mode 3: Passthrough - upstreams understand the zone
    # Query: service.shared.zone -> Forwards: service.shared.zone
    merge shared.zone {
        to 192.168.1.10:53
        to 192.168.2.10:53
    }
    forward . 8.8.8.8
    cache 30
}
- Query Interception: When a query for foo.bar.cluster.all arrives, the plugin intercepts it
- Zone Transformation: The query is transformed based on configuration:
  - remove_zone: Query is rewritten to foo.bar. (zone stripped)
  - Rewrite zone: Query is rewritten to foo.bar.svc.cluster.local (zone replaced)
  - Passthrough: Query stays as foo.bar.cluster.all (unchanged)
- Parallel Resolution: All upstreams are queried simultaneously using the configured protocol
- Protocol Handling:
  - UDP mode: Sends UDP queries only
  - TCP mode: Sends TCP queries only
  - UDP+TCP mode (default): Tries UDP first, automatically falls back to TCP if:
    - UDP query fails (timeout, network error)
    - Response is truncated (TC bit set)
- Response Rewriting: DNS responses are rewritten back to match the original query name
- Merging: DNS answers (A/AAAA records) are combined, duplicates removed
- Response: A single response with all unique IPs is returned
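The parallel resolution and merging steps above can be sketched in plain Go as follows. This is an illustration under stated assumptions, not the plugin's code. `Resolver` and `MergeResolve` are hypothetical names, and each upstream is modelled as a function returning IP strings instead of a real DNS exchange. The sketch also assumes that a failing upstream simply contributes no answers, sorts the result only to make the output deterministic, and leaves out the per-upstream timeout.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// errUpstream simulates an unreachable upstream in this sketch.
var errUpstream = errors.New("upstream unreachable")

// Resolver is a hypothetical stand-in for one upstream:
// it returns the IP addresses found for a name.
type Resolver func(name string) ([]string, error)

// MergeResolve queries every upstream concurrently and returns
// the unique IP addresses from all successful answers, sorted.
func MergeResolve(name string, upstreams []Resolver) []string {
	results := make([][]string, len(upstreams))
	var wg sync.WaitGroup
	for i, r := range upstreams {
		wg.Add(1)
		go func(i int, r Resolver) {
			defer wg.Done()
			ips, err := r(name)
			if err != nil {
				return // assumption: a failed upstream contributes nothing
			}
			results[i] = ips // each goroutine writes only its own slot
		}(i, r)
	}
	wg.Wait()

	seen := make(map[string]bool)
	merged := []string{}
	for _, ips := range results {
		for _, ip := range ips {
			if !seen[ip] {
				seen[ip] = true
				merged = append(merged, ip)
			}
		}
	}
	sort.Strings(merged)
	return merged
}

func main() {
	clusterA := func(string) ([]string, error) { return []string{"10.0.0.1", "10.0.0.2"}, nil }
	clusterB := func(string) ([]string, error) { return []string{"10.0.0.2", "10.1.0.1"}, nil }
	down := func(string) ([]string, error) { return nil, errUpstream }

	fmt.Println(MergeResolve("foo.bar.cluster.all", []Resolver{clusterA, clusterB, down}))
	// prints [10.0.0.1 10.0.0.2 10.1.0.1]
}
```

Writing each result into its own slice slot avoids a mutex; the deduplication runs only after all goroutines have finished.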
- ✅ Parallel upstream queries for low latency
- ✅ Automatic deduplication of IP addresses
- ✅ Three zone handling modes: removal, rewriting, and passthrough
- ✅ Query rewriting per upstream
- ✅ Support for A and AAAA records
- ✅ TCP and UDP protocol support
- ✅ Automatic UDP to TCP fallback on truncation
- ✅ Per-upstream protocol configuration
- ✅ Per-upstream timeout (fixed at 5 seconds, adjustable in code)
- ✅ Multiple zone support
- Currently supports only A and AAAA record types
- Does not merge NS, MX, or other record types
- Fixed 5-second timeout per upstream (configurable in code)
Run the provided end-to-end test script, which builds the plugin and the Docker image and then runs the full test:
./e2e_test.sh

UDP Blocked by Firewall
Error: UDP query to 10.96.0.10:53 failed: i/o timeout
Solution: Use tcp or udp_tcp protocol
Response Truncation
UDP response truncated for 10.96.0.10:53, retrying with TCP
This is normal - plugin automatically retries with TCP
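The retry behaviour behind these two messages can be sketched as follows. This is a minimal Go illustration, not the plugin's implementation. `Response`, `Exchange`, and `Query` are hypothetical names, and the transport is abstracted as a function so the logic can be read without a DNS library. The sketch assumes that in udp-only mode a truncated response is returned as received.

```go
package main

import (
	"errors"
	"fmt"
)

// errTimeout simulates a UDP query that never gets an answer.
var errTimeout = errors.New("i/o timeout")

// Response is a hypothetical, minimal view of a DNS reply.
type Response struct {
	Truncated bool     // mirrors the TC bit
	IPs       []string // addresses carried in the answer
}

// Exchange is a hypothetical transport function that sends
// one query over the given protocol ("udp" or "tcp").
type Exchange func(proto, name string) (Response, error)

// Query applies the documented protocol modes to one upstream.
func Query(ex Exchange, protocol, name string) (Response, error) {
	switch protocol {
	case "udp", "tcp":
		return ex(protocol, name)
	case "udp_tcp":
		resp, err := ex("udp", name)
		if err == nil && !resp.Truncated {
			return resp, nil
		}
		// UDP failed or the answer was truncated: retry over TCP.
		return ex("tcp", name)
	default:
		return Response{}, fmt.Errorf("unknown protocol %q", protocol)
	}
}

func main() {
	// Fake upstream: UDP times out, TCP answers.
	blockedUDP := func(proto, name string) (Response, error) {
		if proto == "udp" {
			return Response{}, errTimeout
		}
		return Response{IPs: []string{"10.0.0.1"}}, nil
	}

	_, err := Query(blockedUDP, "udp", "test.svc.cluster.local")
	fmt.Println("udp only:", err)

	resp, err := Query(blockedUDP, "udp_tcp", "test.svc.cluster.local")
	fmt.Println("udp_tcp:", resp.IPs, err)
}
```

With UDP blocked, the udp-only call surfaces the timeout while the udp_tcp call succeeds through the TCP retry, which matches the advice above.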
No Responses
- Verify upstream DNS servers are reachable
- Check firewall rules for both UDP/53 and TCP/53
- Test with dig @upstream-ip -p 53 test.svc.cluster.local
Configuration Error: Both remove_zone and rewrite_zone
Error: cannot specify both remove_zone and rewrite_zone for upstream 8.8.8.8:53
Solution: Choose only one zone handling mode per upstream - either remove_zone OR a rewrite zone, not both
Query Not Matching Zone
Enable debug logging to see query transformations:
.:53 {
    log
    merge cluster.all {
        to 8.8.8.8:53 remove_zone
    }
}
Look for log messages like:
Removing zone: foo.cluster.all -> foo for upstream 8.8.8.8:53
Rewriting foo.cluster.all to foo.svc.cluster.local for upstream 8.8.8.8:53
This is a basic implementation. Potential enhancements:
- Support for more record types (SRV, TXT, etc.)
- Weighted merging based on upstream health
- Circuit breaker for failing upstreams
- Prometheus metrics