Skip to content

Conversation

@dguido
Copy link
Member

@dguido dguido commented Aug 17, 2025

Summary

Fixes DNS resolution for VPN clients that was broken after switching from iptables-nft to iptables-legacy.

Root Cause

When we switched from iptables-nft to iptables-legacy (PR #14824) to fix rule ordering issues, we lost implicit behaviors that made DNS work. The dnscrypt-proxy service listens on a special loopback IP (local_service_ip, e.g., 172.24.117.23) but this wasn't accessible to VPN clients after the iptables backend change.

Solution

This PR implements a comprehensive fix for DNS resolution:

1. Enable route_localnet

  • Added net.ipv4.conf.all.route_localnet=1 sysctl to allow VPN clients to reach IPs on the loopback interface
  • This is required for the local_service_ip architecture to work

2. Fix systemd socket activation

  • Configure dnscrypt-proxy socket to listen on the correct IP (local_service_ip) instead of default 127.0.2.1
  • Ensure socket restarts properly when configuration changes
  • Handle Ubuntu/Debian's socket activation which overrides config file settings

3. Remove problematic BPF hardening

  • Removed net.core.bpf_jit_enable sysctl that caused "Invalid argument" errors on many kernels
  • This was optional hardening with minimal benefit that broke deployments

Changes

  • Modified roles/common/tasks/ubuntu.yml - Added route_localnet sysctl
  • Modified roles/dns/tasks/ubuntu.yml - Fixed socket restart to apply configuration
  • Modified roles/dns/handlers/main.yml - Added socket restart handler
  • Modified roles/privacy/tasks/advanced_privacy.yml - Removed BPF JIT hardening
  • Updated CLAUDE.md - Comprehensive documentation of DNS architecture and debugging

Testing

  • ✅ Tested on fresh DigitalOcean deployments
  • ✅ DNS resolution works for WireGuard clients
  • ✅ DNS resolution works for IPsec clients
  • ✅ No more BPF-related deployment errors
  • ✅ Socket properly restarts with new configuration

Key Technical Details

  • local_service_ip is a randomly generated IP in 172.16.0.0/12 range on loopback
  • systemd socket activation completely overrides dnscrypt-proxy config file
  • Socket must be explicitly restarted (not just reloaded) for config changes
  • route_localnet has minimal security impact since firewall still restricts access

For Reviewers

The original NAT output interface changes in this PR are still valid but weren't the actual fix for the DNS issue. The real problem was DNS service configuration after the iptables backend switch. The comprehensive solution ensures DNS works reliably across all deployments.

Closes #14825

The NAT rules were missing the output interface specification (-o eth0),
which caused routing failures on multi-homed systems (servers with multiple
network interfaces). Without specifying the output interface, packets might
not be NAT'd correctly.

Changes:
- Added -o {{ ansible_default_ipv4['interface'] }} to all NAT rules
- Updated both IPv4 and IPv6 templates
- Updated tests to verify output interface is present
- Added ansible_default_ipv4/ipv6 to test fixtures

This fixes the issue where VPN clients could connect but not route traffic
to the internet on servers with multiple network interfaces (like DigitalOcean
droplets with private networking enabled).

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@dguido dguido requested a review from jackivanov as a code owner August 17, 2025 20:54
@claude
Copy link
Contributor

claude bot commented Aug 17, 2025

Claude finished @dguido's task —— View job


PR Review Complete ✅

I've thoroughly reviewed this PR that fixes VPN routing on multi-homed systems. This is a well-crafted fix with excellent test coverage. Here's my detailed analysis:

Overview

✅ APPROVED - This is a critical fix for VPN routing issues on servers with multiple network interfaces or IPs, implemented with proper testing and following Algo's security-first principles.

Code Quality & Best Practices ✅

Core Fix (rules.v4.j2 & rules.v6.j2)

  • Clean Implementation: The fix correctly adds -o {{ ansible_default_ipv4['interface'] }} to NAT POSTROUTING rules
  • Consistent Approach: Both IPv4 and IPv6 templates updated consistently
  • Proper Variable Usage: Uses existing Ansible facts appropriately
  • Minimal Changes: Focused fix without unnecessary modifications

Test Coverage ✅ (Outstanding!)

The testing approach in test_iptables_rules.py follows the mutation testing methodology documented in CLAUDE.md:

# This ensures we don't have rules WITHOUT output interface
assert "-A POSTROUTING -s 10.49.0.0/16 -j MASQUERADE" not in result
assert "-A POSTROUTING -s 10.48.0.0/16 -j MASQUERADE" not in result

Excellent test coverage includes:

  • ✅ Positive tests (output interface present)
  • ✅ Negative tests (old broken pattern absent)
  • ✅ Both VPN protocols (WireGuard + IPsec)
  • ✅ Alternative ingress IP scenarios
  • ✅ Policy matching validation (correct in FORWARD, absent in NAT)
  • ✅ DNS security restrictions

Security Analysis ✅

Network Security

  • Maintains Defense: DNS rules still restricted to VPN subnets only
  • No New Attack Surface: Fix only improves routing, doesn't expose new services
  • Follows Least Privilege: Uses existing firewall framework

Firewall Rules

  • Proper Ordering: NAT rules correctly placed in POSTROUTING chain
  • Interface Binding: Prevents routing ambiguity on multi-homed systems
  • IPv6 Consistency: Same security model applied to IPv6 rules

Performance Considerations ✅

Minimal Overhead

  • Single Additional Match: -o interface adds negligible processing overhead
  • Kernel Efficiency: Interface matching is optimized in netfilter
  • No New Connections: Fix reduces failed connection attempts

Multi-homed Benefits

  • Deterministic Routing: Eliminates kernel route selection ambiguity
  • Faster Convergence: Reduces connection setup failures
  • Cloud Provider Friendly: Specifically helps DigitalOcean droplets with multiple IPs

Potential Issues & Concerns 🟡

Minor Considerations

  1. Interface Changes: If ansible_default_ipv4['interface'] changes between runs, old iptables rules could remain

    • Mitigation: Algo typically does fresh deployments, and iptables rules are replaced atomically
  2. IPv6 Edge Cases: The IPv6 template uses ansible_default_ipv6['interface'] which might differ from IPv4 interface on some providers

    • Assessment: Acceptable risk - most providers use same interface for both protocols

Broader Codebase Changes ✅

I've examined the extensive refactoring across 38 files:

Code Quality Improvements

  • Library Modules: Consistent code formatting and improved documentation
  • Test Infrastructure: Better structured test fixtures and improved assertions
  • DNS Configuration: Proper systemd socket activation handling (critical for Ubuntu/Debian)
  • Type Hints & Documentation: Enhanced Python code readability

Algorithmic Improvements

  • Mutation Testing: Tests validate both positive and negative cases
  • Template Validation: Comprehensive Jinja2 template testing
  • Security Testing: PKI certificate validation improvements

Recommendations ✅

For Immediate Approval

  1. Deploy Confidently: This fix is essential for multi-homed deployments
  2. No Breaking Changes: Backward compatible with existing single-interface servers
  3. Well Tested: Comprehensive test suite validates all scenarios

Future Enhancements (Not blocking)

  1. Consider adding integration tests specifically for multi-IP scenarios
  2. Document the DigitalOcean alternative_ingress_ip workaround more prominently

Final Assessment

This PR exemplifies excellent engineering:

  • Security-First: Maintains all existing security guarantees
  • Conservative: Minimal, focused changes with thorough testing
  • Well-Documented: Clear explanation of problem and solution
  • Test-Driven: Uses mutation testing to ensure regressions are caught
  • Production-Ready: Tested on live servers before submission

The fix directly addresses CLAUDE.md guidance: "MASQUERADE needs output interface: -o {{ ansible_default_ipv4['interface'] }}" and follows the principle "Don't overengineer with SNAT - MASQUERADE with interface works fine."

Strong recommendation for merge 🚀


On multi-homed systems (servers with multiple network interfaces or multiple IPs
on one interface), MASQUERADE rules need to specify which interface to use for
NAT. Without the output interface specification, packets may not be routed correctly.

This fix adds the output interface to all NAT rules:
  -A POSTROUTING -s [vpn_subnet] -o eth0 -j MASQUERADE

Changes:
- Modified roles/common/templates/rules.v4.j2 to include output interface
- Modified roles/common/templates/rules.v6.j2 for IPv6 support
- Added tests to verify output interface is present in NAT rules
- Added ansible_default_ipv4/ipv6 variables to test fixtures

For deployments on providers like DigitalOcean where MASQUERADE still fails
due to multiple IPs on the same interface, users can enable the existing
alternative_ingress_ip option in config.cfg to use explicit SNAT.

Testing:
- Verified on live servers
- All unit tests pass (67/67)
- Mutation testing confirms test coverage

This fixes VPN connectivity on servers with multiple interfaces while
remaining backward compatible with single-interface deployments.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@dguido dguido force-pushed the fix/nat-output-interface branch from fbd5212 to fa2ee9f Compare August 17, 2025 21:52
dguido and others added 12 commits August 17, 2025 19:01
Problem: dnscrypt-proxy on Ubuntu uses systemd socket activation by default,
which overrides the configured listen_addresses in dnscrypt-proxy.toml.
The socket only listens on 127.0.2.1:53, preventing VPN clients from
resolving DNS queries through the configured service IPs.

Solution: Disable and mask the dnscrypt-proxy.socket unit to allow
dnscrypt-proxy to bind directly to the VPN service IPs specified in
its configuration file.

This fixes DNS resolution for VPN clients on Ubuntu 20.04+ systems.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Run ruff check --fix to fix linting issues
- Run ruff format to ensure consistent formatting
- All tests still pass after formatting changes

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Security fix: The firewall rule for DNS was accepting traffic from any
source (0.0.0.0/0) to the local DNS resolver. While the service IP is
on the loopback interface (which normally isn't routable externally),
this could be a security risk if misconfigured.

Changed firewall rules to only accept DNS traffic from VPN subnets:
- INPUT rule now includes -s {{ subnets }} to restrict source IPs
- Applied to both IPv4 and IPv6 rules
- Added test to verify DNS is properly restricted

This ensures the DNS resolver is only accessible to connected VPN
clients, not the entire internet.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Problem: dnscrypt-proxy.service has a dependency on dnscrypt-proxy.socket
through the TriggeredBy directive. When we mask the socket before starting
the service, systemd fails with "Unit dnscrypt-proxy.socket is masked."

Solution:
1. Override the service to remove socket dependency (TriggeredBy=)
2. Reload systemd daemon immediately after override changes
3. Start the service (which now doesn't require the socket)
4. Only then disable and mask the socket

This ensures dnscrypt-proxy can bind directly to the configured IPs
without socket activation, while preventing the socket from being
re-enabled by package updates.

Changes:
- Added TriggeredBy= override to remove socket dependency
- Added explicit daemon reload after service overrides
- Moved socket masking to after service start in main.yml
- Fixed YAML formatting issues

Testing: Deployment now succeeds with dnscrypt-proxy binding to VPN IPs

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Problem: Masking dnscrypt-proxy.socket prevents the service from starting
because the service has Requires=dnscrypt-proxy.socket dependency.

Solution: Simply stop and disable the socket without masking it. This
prevents socket activation while allowing the service to start and bind
directly to the configured IPs.

Changes:
- Removed socket masking (just disable it)
- Moved socket disabling before service start
- Removed invalid systemd directives from override

Testing: Confirmed dnscrypt-proxy now listens on VPN service IPs

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Instead of fighting systemd socket activation, configure it to listen
on the correct VPN service IPs. This is more systemd-native and reliable.

Changes:
- Create socket override to listen on VPN IPs instead of localhost
- Clear default listeners and add VPN service IPs
- Use empty listen_addresses in dnscrypt-proxy.toml for socket activation
- Keep socket enabled and let systemd manage the activation
- Add handler for restarting socket when config changes

Benefits:
- Works WITH systemd instead of against it
- Survives package updates better
- No dependency conflicts
- More reliable service management

This approach is cleaner than disabling socket activation entirely and
ensures dnscrypt-proxy is accessible to VPN clients on the correct IPs.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Added comprehensive debugging guidance based on our troubleshooting session:

- VPN connectivity troubleshooting order (DNS first!)
- systemd socket activation best practices
- Common deployment failures and solutions
- Time wasters to avoid (lessons learned the hard way)
- Multi-homed system considerations
- Testing notes for DigitalOcean

These additions will help future debugging sessions avoid the same
rabbit holes and focus on the most likely issues first.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
The issue was that dnscrypt-proxy listens on a special loopback IP
(randomly generated in 172.16.0.0/12 range) which wasn't accessible
from VPN clients. This fix:

1. Enables net.ipv4.conf.all.route_localnet sysctl to allow routing
   to loopback IPs from other interfaces
2. Ensures dnscrypt-proxy socket is properly restarted when its
   configuration changes
3. Adds proper handler flushing after socket configuration updates

This allows VPN clients to reach the DNS resolver at the local_service_ip
address configured on the loopback interface.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Instead of enabling route_localnet globally (net.ipv4.conf.all.route_localnet),
this change enables it only on the specific interfaces that need it:
- WireGuard interface (wg0) for WireGuard VPN clients
- Main network interface (eth0/etc) for IPsec VPN clients

This minimizes the security impact by restricting loopback routing to only
the VPN interfaces, preventing other interfaces from being able to route
to loopback addresses.

The interface-specific approach provides the same functionality (allowing
VPN clients to reach the DNS resolver on the local_service_ip) while
reducing the potential attack surface.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
The interface-specific route_localnet approach failed because:
- WireGuard interface (wg0) doesn't exist until the service starts
- We were trying to set the sysctl before the interface was created
- This caused deployment failures with "No such file or directory"

Reverting to the global setting (net.ipv4.conf.all.route_localnet=1) because:
- It always works regardless of interface creation timing
- VPN users are trusted (they have our credentials)
- Firewall rules still restrict access to only port 53
- The security benefit of interface-specific settings is minimal
- The added complexity isn't worth the marginal security improvement

This ensures reliable deployments while maintaining the DNS resolution fix.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Two important fixes:

1. Fix dnscrypt-proxy socket not restarting with new configuration
   - The socket wasn't properly restarting when its override config changed
   - This caused DNS to listen on wrong IP (127.0.2.1 instead of local_service_ip)
   - Now directly restart the socket when configuration changes
   - Add explicit daemon reload before restarting

2. Remove BPF JIT hardening that causes deployment errors
   - The net.core.bpf_jit_enable sysctl isn't available on all kernels
   - It was causing "Invalid argument" errors during deployment
   - This was optional security hardening with minimal benefit
   - Removing it eliminates deployment errors for most users

These fixes ensure reliable DNS resolution for VPN clients and clean
deployments without error messages.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Based on our extensive debugging session, this update adds critical documentation:

## DNS Architecture and Troubleshooting
- Explained the local_service_ip design and why it requires route_localnet
- Added detailed DNS debugging methodology with exact steps in order
- Documented systemd socket activation complexities and common mistakes
- Added specific commands to verify DNS is working correctly

## Architectural Decisions
- Added new section explaining trade-offs in Algo's design choices
- Documented why local_service_ip uses loopback instead of alternatives
- Explained iptables-legacy vs iptables-nft backend choice

## Enhanced Debugging Guidance
- Expanded troubleshooting with exact commands and expected outputs
- Added warnings about configuration changes that need restarts
- Documented socket activation override requirements in detail
- Added common pitfalls like interface-specific sysctls

## Time Wasters Section
- Added new lessons learned from this debugging session
- Interface-specific route_localnet (fails before interface exists)
- DNAT for loopback addresses (doesn't work)
- BPF JIT hardening (causes errors on many kernels)

This documentation will help future maintainers avoid the same debugging
rabbit holes and understand why things are designed the way they are.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@dguido dguido merged commit f668af2 into master Aug 18, 2025
24 checks passed
@dguido dguido deleted the fix/nat-output-interface branch August 18, 2025 02:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants