CoreDNS plugin that merges DNS answers from multiple upstreams. Works well alongside Cilium Cluster Mesh for multi-cluster services and load balancing, and with Raft-based systems.

mattiaforc/coredns-merge

CoreDNS Multi-Resolver Merge Plugin

This plugin enables federation of DNS responses by querying multiple upstream resolvers and merging their answers.

Use Case

Well suited to distributed systems such as rqlite, etcd, and other Raft-based applications that need to discover all replicas across multiple Kubernetes clusters with a single DNS query.

Installation

  1. Clone this plugin into your CoreDNS build directory:
cd coredns
mkdir -p plugin/merge
# Copy merge.go and setup.go to plugin/merge/
  2. Add to plugin.cfg (before the forward plugin):
merge:merge
  3. Build CoreDNS:
make

Configuration

Syntax

merge ZONE... {
    to UPSTREAM_ADDR [remove_zone|REWRITE_ZONE] [PROTOCOL]
}
  • ZONE: The DNS zone(s) to intercept (e.g., cluster.all)
  • UPSTREAM_ADDR: Address of upstream DNS server (e.g., 10.96.0.10:53)
  • remove_zone: Optional keyword - strips the matched zone from the query before forwarding
  • REWRITE_ZONE: Optional zone to rewrite queries to before forwarding (e.g., svc.cluster.local)
  • PROTOCOL: Transport protocol - udp, tcp, or udp_tcp (default: udp_tcp)
    • udp: Use UDP only
    • tcp: Use TCP only
    • udp_tcp: Try UDP first, then automatically fall back to TCP on failure or truncation
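
As an illustration of the grammar above, here is a minimal Go sketch of how the arguments of a to line could be parsed. It is not the plugin's actual setup.go: the upstream struct and the parseTo function are hypothetical names, and the sketch assumes that udp, tcp, and udp_tcp are reserved words, so a trailing token with one of those values is always read as the protocol and never as a rewrite zone.

```go
package main

import (
	"errors"
	"strings"
)

// upstream is a hypothetical record for one parsed `to` line.
type upstream struct {
	Addr        string
	RemoveZone  bool   // true when the remove_zone keyword was given
	RewriteZone string // fully qualified; empty means no rewrite
	Protocol    string // "udp", "tcp", or "udp_tcp"
}

// isProtocol reports whether s is one of the documented protocol keywords.
func isProtocol(s string) bool {
	return s == "udp" || s == "tcp" || s == "udp_tcp"
}

// parseTo parses the tokens that follow the `to` keyword:
//   UPSTREAM_ADDR [remove_zone|REWRITE_ZONE] [PROTOCOL]
func parseTo(args []string) (upstream, error) {
	if len(args) == 0 {
		return upstream{}, errors.New("to: missing upstream address")
	}
	u := upstream{Addr: args[0], Protocol: "udp_tcp"} // documented default
	rest := args[1:]

	// Assumption: a trailing protocol keyword is always the protocol.
	if n := len(rest); n > 0 && isProtocol(rest[n-1]) {
		u.Protocol = rest[n-1]
		rest = rest[:n-1]
	}

	switch len(rest) {
	case 0:
		// Passthrough: the query name is forwarded unchanged.
	case 1:
		if rest[0] == "remove_zone" {
			u.RemoveZone = true
		} else {
			// Normalize the rewrite zone to a lowercase FQDN.
			u.RewriteZone = strings.TrimSuffix(strings.ToLower(rest[0]), ".") + "."
		}
	default:
		// Covers the case where both remove_zone and a rewrite zone are given.
		return upstream{}, errors.New("to: only one zone handling mode is allowed for upstream " + u.Addr)
	}
	return u, nil
}
```

With this sketch, the tokens 10.96.0.10:53 svc.cluster.local tcp yield a rewrite zone of svc.cluster.local. over TCP, while 192.168.1.10:53 alone yields passthrough with the udp_tcp default.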

Zone Handling Modes

The plugin supports three modes for handling DNS queries:

  1. Zone Removal (remove_zone): Strips the matched zone from the query

    • Query: foo.cluster.all → Upstream receives: foo
    • Use case: Forward to public DNS servers or simplified zone hierarchies
  2. Zone Rewriting (specify a zone): Replaces the matched zone with another zone

    • Query: foo.cluster.all → Upstream receives: foo.svc.cluster.local
    • Use case: Multi-cluster federation with different internal zones
  3. Passthrough (no zone parameter): Keeps the query unchanged

    • Query: foo.cluster.all → Upstream receives: foo.cluster.all
    • Use case: Upstreams that understand the same zone
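
The three modes can be sketched as one small name-rewriting function. This is an illustrative Go sketch under stated assumptions, not the plugin's code: zoneMode, fqdn, and transform are hypothetical names, names are handled as lowercase fully qualified strings with a trailing dot, and a query outside the zone is returned unchanged.

```go
package main

import "strings"

// zoneMode selects how the matched zone is handled (hypothetical type).
type zoneMode int

const (
	passthrough zoneMode = iota // keep the query name as is
	removeZone                  // strip the matched zone
	rewriteZone                 // replace the matched zone with another zone
)

// fqdn lowercases a name and makes sure it ends with a dot.
func fqdn(name string) string {
	name = strings.ToLower(name)
	if !strings.HasSuffix(name, ".") {
		name += "."
	}
	return name
}

// transform returns the name to send upstream for a query that matched zone.
// target is only used in rewriteZone mode.
func transform(qname, zone string, mode zoneMode, target string) string {
	qname, zone = fqdn(qname), fqdn(zone)
	if mode == passthrough {
		return qname
	}
	// Match on a label boundary so "foocluster.all." is not treated
	// as part of "cluster.all.".
	if qname != zone && !strings.HasSuffix(qname, "."+zone) {
		return qname
	}
	prefix := strings.TrimSuffix(qname, zone) // e.g. "foo." or ""
	switch mode {
	case removeZone:
		if prefix == "" {
			return "." // the query was for the zone apex itself
		}
		return prefix
	case rewriteZone:
		return prefix + fqdn(target)
	}
	return qname
}
```

For the query foo.cluster.all and zone cluster.all, this returns foo. in removal mode, foo.svc.cluster.local. when rewriting to svc.cluster.local, and foo.cluster.all. in passthrough mode.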

Example Corefile

Simple Multi-Cluster Setup

.:53 {
    errors
    health
    ready
    
    # Merge responses for *.cluster.all from two Kubernetes clusters
    # Uses udp_tcp by default (UDP with TCP fallback)
    merge cluster.all {
        to 10.96.0.10:53 svc.cluster.local    # Cluster A CoreDNS
        to 10.97.0.10:53 svc.cluster.local    # Cluster B CoreDNS
    }
    
    # Handle regular cluster-local queries
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}

TCP-Only Configuration (Firewall Restrictions)

.:53 {
    errors
    health
    
    # Force TCP for environments where UDP is blocked
    merge cluster.all {
        to 10.96.0.10:53 svc.cluster.local tcp
        to 10.97.0.10:53 svc.cluster.local tcp
    }
    
    forward . /etc/resolv.conf
    cache 30
}

Advanced Setup with Multiple Zones

.:53 {
    errors
    log

    # Production clusters - TCP only for strict firewall
    merge prod.global {
        to 10.10.1.10:53 svc.cluster.local tcp
        to 10.10.2.10:53 svc.cluster.local tcp
        to 10.10.3.10:53 svc.cluster.local tcp
    }

    # Staging clusters - UDP with TCP fallback
    merge staging.global {
        to 10.20.1.10:53 svc.cluster.local udp_tcp
        to 10.20.2.10:53 svc.cluster.local udp_tcp
    }

    # Dev clusters - UDP only (fast, local network)
    merge dev.global {
        to 10.30.1.10:53 svc.cluster.local udp
    }

    forward . 8.8.8.8 8.8.4.4
    cache 30
}

Zone Removal for Public DNS Integration

.:53 {
    errors
    log

    # Strip .external zone and query public DNS
    # Query: google.com.external -> Forwards: google.com
    merge external {
        to 8.8.8.8:53 remove_zone udp_tcp
        to 1.1.1.1:53 remove_zone udp_tcp
    }

    # Internal services with zone rewriting
    merge internal {
        to 10.96.0.10:53 svc.cluster.local
    }

    forward . /etc/resolv.conf
    cache 30
}

Mixed Configuration (All Three Modes)

.:53 {
    errors
    log

    # Mode 1: Zone removal - for public DNS queries
    # Query: example.com.public -> Forwards: example.com
    merge public {
        to 8.8.8.8:53 remove_zone tcp
        to 1.1.1.1:53 remove_zone tcp
    }

    # Mode 2: Zone rewriting - for multi-cluster federation
    # Query: myapp.federated -> Forwards: myapp.svc.cluster.local
    merge federated {
        to 10.96.0.10:53 svc.cluster.local
        to 10.97.0.10:53 svc.cluster.local
    }

    # Mode 3: Passthrough - upstreams understand the zone
    # Query: service.shared.zone -> Forwards: service.shared.zone
    merge shared.zone {
        to 192.168.1.10:53
        to 192.168.2.10:53
    }

    forward . 8.8.8.8
    cache 30
}

How It Works

  1. Query Interception: When a query for foo.bar.cluster.all arrives, the plugin intercepts it

  2. Zone Transformation: The query is transformed based on configuration:

    • remove_zone: Query is rewritten to foo.bar. (zone stripped)
    • Rewrite zone: Query is rewritten to foo.bar.svc.cluster.local (zone replaced)
    • Passthrough: Query stays as foo.bar.cluster.all (unchanged)
  3. Parallel Resolution: All upstreams are queried simultaneously, each using its configured protocol

  4. Protocol Handling:

    • UDP mode: Sends UDP queries only
    • TCP mode: Sends TCP queries only
    • UDP+TCP mode (default): Tries UDP first, automatically falls back to TCP if:
      • UDP query fails (timeout, network error)
      • Response is truncated (TC bit set)
  5. Response Rewriting: DNS responses are rewritten back to match the original query name

  6. Merging: DNS answers (A/AAAA records) are combined, duplicates removed

  7. Response: A single response with all unique IPs is returned
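
Steps 3, 6, and 7 can be sketched in Go as a fan-out over goroutines followed by a set-based merge. This is a minimal sketch and not the plugin's merge.go: lookupFunc, errUpstream, and mergeAnswers are hypothetical names, answers are modelled as plain IP strings instead of DNS resource records, a failing upstream is assumed to contribute nothing to the result, and the result is sorted only to make the sketch deterministic.

```go
package main

import (
	"errors"
	"sort"
	"time"
)

// upstreamTimeout mirrors the 5-second per-upstream timeout mentioned under Limitations.
const upstreamTimeout = 5 * time.Second

// errUpstream is a placeholder error that a lookupFunc can return.
var errUpstream = errors.New("upstream failed")

// lookupFunc is a hypothetical stand-in for one DNS exchange with an
// upstream; it returns the A/AAAA addresses as strings.
type lookupFunc func(upstream, qname string) ([]string, error)

type result struct {
	ips []string
	err error
}

// mergeAnswers queries every upstream concurrently and returns the
// deduplicated union of all addresses that arrived before the timeout.
func mergeAnswers(upstreams []string, qname string, lookup lookupFunc) []string {
	// Buffered so that goroutines finishing after the timeout do not block.
	results := make(chan result, len(upstreams))
	for _, up := range upstreams {
		go func(up string) {
			ips, err := lookup(up, qname)
			results <- result{ips, err}
		}(up)
	}

	seen := make(map[string]struct{})
	deadline := time.After(upstreamTimeout)
collect:
	for range upstreams {
		select {
		case r := <-results:
			if r.err != nil {
				continue // assumption: failed upstreams are skipped
			}
			for _, ip := range r.ips {
				seen[ip] = struct{}{}
			}
		case <-deadline:
			break collect // give up on upstreams that have not answered
		}
	}

	merged := make([]string, 0, len(seen))
	for ip := range seen {
		merged = append(merged, ip)
	}
	sort.Strings(merged)
	return merged
}
```

Because all queries start at the same moment, a single deadline is equivalent to a per-upstream timeout here. Using a map as a set keeps deduplication independent of the order in which upstreams answer.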

Features

  • ✅ Parallel upstream queries for low latency
  • ✅ Automatic deduplication of IP addresses
  • ✅ Three zone handling modes: removal, rewriting, and passthrough
  • ✅ Query rewriting per upstream
  • ✅ Support for A and AAAA records
  • ✅ TCP and UDP protocol support
  • ✅ Automatic UDP to TCP fallback on failure or truncation
  • ✅ Per-upstream protocol configuration
  • ✅ Timeout configurable in code (5 seconds per upstream by default)
  • ✅ Multiple zone support

Limitations

  • Currently supports only A and AAAA record types
  • Does not merge NS, MX, or other record types
  • Fixed 5-second timeout per upstream (can only be changed by editing the code)

Testing

Run the provided end-to-end test script, which builds the plugin and the Docker image and then runs the full test:

./e2e_test.sh

Troubleshooting

Common Issues

UDP Blocked by Firewall

Error: UDP query to 10.96.0.10:53 failed: i/o timeout

Solution: Set the protocol for that upstream to tcp or udp_tcp

Response Truncation

UDP response truncated for 10.96.0.10:53, retrying with TCP

This is normal: in udp_tcp mode the plugin automatically retries over TCP
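
The retry behaviour can be sketched in Go as follows. This is an illustration under assumptions, not the plugin's implementation: reply, exchangeFunc, and exchangeWithFallback are hypothetical names, and the actual network exchange is abstracted behind a function so that only the decision logic is shown.

```go
package main

import "errors"

// reply is a hypothetical, minimal view of a DNS response.
type reply struct {
	Truncated bool     // TC bit set by the upstream
	IPs       []string // addresses from the answer section
}

// exchangeFunc is a stand-in for sending one query over the given
// network ("udp" or "tcp") and reading the response.
type exchangeFunc func(network, upstream, qname string) (reply, error)

// exchangeWithFallback applies the documented protocol modes:
// udp and tcp use a single transport, udp_tcp tries UDP first and
// retries over TCP when the UDP query fails or the reply is truncated.
func exchangeWithFallback(protocol, upstream, qname string, exchange exchangeFunc) (reply, error) {
	switch protocol {
	case "udp", "tcp":
		return exchange(protocol, upstream, qname)
	case "udp_tcp":
		r, err := exchange("udp", upstream, qname)
		if err == nil && !r.Truncated {
			return r, nil
		}
		// UDP failed or the TC bit was set: retry the same query over TCP.
		return exchange("tcp", upstream, qname)
	}
	return reply{}, errors.New("unknown protocol: " + protocol)
}
```

In udp mode a truncated reply is returned as is, which is why udp_tcp is the safer choice when answers may be large.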

No Responses

  • Verify upstream DNS servers are reachable
  • Check firewall rules for both UDP/53 and TCP/53
  • Test with dig @upstream-ip -p 53 test.svc.cluster.local

Configuration Error: Both remove_zone and rewrite_zone

Error: cannot specify both remove_zone and rewrite_zone for upstream 8.8.8.8:53

Solution: Choose only one zone handling mode per upstream - either remove_zone OR a rewrite zone, not both

Query Not Matching Zone

Enable debug logging to see query transformations:

.:53 {
    log
    merge cluster.all {
        to 8.8.8.8:53 remove_zone
    }
}

Look for log messages like:

  • Removing zone: foo.cluster.all -> foo for upstream 8.8.8.8:53
  • Rewriting foo.cluster.all to foo.svc.cluster.local for upstream 8.8.8.8:53

Contributing

This is a basic implementation. Potential enhancements:

  • Support for more record types (SRV, TXT, etc.)
  • Weighted merging based on upstream health
  • Circuit breaker for failing upstreams
  • Prometheus metrics
