From 742e84e3b08b9329e4e70b7a9c0abcb765d21744 Mon Sep 17 00:00:00 2001 From: "A. F. Dudley" Date: Sat, 7 Mar 2026 01:47:58 +0000 Subject: [PATCH] feat: dedicated GRE tunnel (Tunnel100) bypassing DZ-managed Tunnel500 Root cause: the doublezero-agent on mia-sw01 manages Tunnel500's ACL (SEC-USER-500-IN) and drops outbound gossip with src 137.239.194.65. The agent overwrites any custom ACL entries. Fix: create a separate GRE tunnel (Tunnel100) using mia-sw01's free LAN IP (209.42.167.137) as tunnel source. This tunnel goes over the ISP uplink, completely independent of the DZ overlay: - mia-sw01: Tunnel100 src 209.42.167.137, dst 186.233.184.235 - biscayne: gre-ashburn src 186.233.184.235, dst 209.42.167.137 - Link addresses: 169.254.100.0/31 Playbook changes: - ashburn-relay-mia-sw01: Tunnel100 + Loopback101 + SEC-VALIDATOR-100-IN - ashburn-relay-biscayne: gre-ashburn tunnel + updated policy routing - New template: ashburn-routing-ifup.sh.j2 for boot persistence Co-Authored-By: Claude Opus 4.6 --- docs/bug-ashburn-tunnel-port-filtering.md | 104 ++++++-------- playbooks/ashburn-relay-biscayne.yml | 137 ++++++++++++++----- playbooks/ashburn-relay-mia-sw01.yml | 150 +++++++++++++-------- playbooks/files/ashburn-routing-ifup.sh.j2 | 28 ++++ 4 files changed, 261 insertions(+), 158 deletions(-) create mode 100644 playbooks/files/ashburn-routing-ifup.sh.j2 diff --git a/docs/bug-ashburn-tunnel-port-filtering.md b/docs/bug-ashburn-tunnel-port-filtering.md index 865e3b93..913a610b 100644 --- a/docs/bug-ashburn-tunnel-port-filtering.md +++ b/docs/bug-ashburn-tunnel-port-filtering.md @@ -1,85 +1,61 @@ -# Bug: Ashburn Relay — 137.239.194.65 Not Routable from Public Internet +# Bug: Ashburn Relay — Outbound Gossip Dropped by DZ Agent ACL ## Summary `--gossip-host 137.239.194.65` correctly advertises the Ashburn relay IP in -ContactInfo for all sockets (gossip, TVU, repair, TPU). 
However, 137.239.194.65 -is a DoubleZero overlay IP (137.239.192.0/19, IS-IS only) that is NOT announced -via BGP to the public internet. Public peers cannot route to it, so TVU shreds, -repair requests, and TPU traffic never arrive at was-sw01. +ContactInfo for all sockets (gossip, TVU, repair, TPU). The inbound path +works end-to-end (proven with kelce UDP tests through every hop). However, +outbound gossip from biscayne (src 137.239.194.65) is dropped by the +DoubleZero agent's ACL on mia-sw01's Tunnel500, preventing ContactInfo from +propagating to the cluster. Peers never learn our TVU address. ## Evidence -- Gossip traffic arrives on `doublezero0` interface: +- Inbound path confirmed hop by hop (kelce → was-sw01 → mia-sw01 → Tunnel500 + → biscayne doublezero0 → DNAT → kind bridge → kind node eth0): ``` - doublezero0 In IP 64.130.58.70.8001 > 137.239.194.65.8001: UDP, length 132 + 01:04:12.136633 IP 69.112.108.72.58856 > 172.20.0.2.9000: UDP, length 13 ``` -- Zero TVU/repair traffic arrives: +- Outbound gossip leaves biscayne correctly (src 137.239.194.65:8001 on + doublezero0), enters mia-sw01 via Tunnel500, hits SEC-USER-500-IN ACL: ``` - tcpdump -i doublezero0 'dst host 137.239.194.65 and udp and not port 8001' - 0 packets captured + 60 deny ip any any [match 26355968 packets, 0:00:02 ago] ``` -- ContactInfo correctly advertises all sockets on 137.239.194.65: - ```json - { - "gossip": "137.239.194.65:8001", - "tvu": "137.239.194.65:9000", - "serveRepair": "137.239.194.65:9011", - "tpu": "137.239.194.65:9002" - } - ``` -- Outbound gossip from biscayne exits via `doublezero0` with source - 137.239.194.65 — SNAT and routing work correctly in the outbound direction. + The ACL only permits src 186.233.184.235 and 169.254.7.7 — not 137.239.194.65. 
+- Validator not visible in public RPC getClusterNodes (gossip not propagating) +- Validator sees 775 nodes vs 5,045 on public RPC ## Root Cause -**137.239.194.0/24 is not routable from the public internet.** The prefix -belongs to DoubleZero's overlay address space (137.239.192.0/19, Momentum -Telecom, WHOIS OriginAS: empty). It is advertised only via IS-IS within the -DoubleZero switch mesh. There is no eBGP session on was-sw01 to advertise it -to the ISP — all BGP peers are iBGP AS 65342 (DoubleZero internal). +The `doublezero-agent` daemon on mia-sw01 manages Tunnel500 and its ACL +(SEC-USER-500-IN). The agent periodically reconciles the ACL to its expected +state, overwriting any custom entries we add. We cannot modify the ACL +without the agent reverting it. -When the validator advertises `tvu: 137.239.194.65:9000` in ContactInfo, -public internet peers attempt to send turbine shreds to that IP, but the -packets have no route through the global BGP table to reach was-sw01. Only -DoubleZero-connected peers could potentially reach it via the overlay. +137.239.194.65 is from the was-sw01 LAN block (137.239.194.64/29), routed +by the ISP to was-sw01 via the WAN link. It IS publicly routable (confirmed +by kelce ping/UDP tests). The earlier hypothesis that it was unroutable was +wrong — the IP reaches was-sw01, gets forwarded to mia-sw01 via backbone, +and reaches biscayne through Tunnel500 (inbound ACL direction is fine). -The old shred relay pipeline worked because it used `--public-tvu-address -64.92.84.81:20000` — was-sw01's Et1/1 ISP uplink IP, which IS publicly -routable. The `--gossip-host 137.239.194.65` approach advertises a -DoubleZero-only IP for ALL sockets, making TVU/repair/TPU unreachable from -non-DoubleZero peers. +The problem is outbound only: the Tunnel500 ingress ACL (traffic FROM +biscayne TO mia-sw01) drops src 137.239.194.65. -The original hypothesis (ACL/PBR port filtering) was wrong. 
The tunnel and -switch routing work correctly — the problem is upstream: traffic never arrives -at was-sw01 in the first place. +## Fix -## Impact +Create a dedicated GRE tunnel (Tunnel100) between biscayne and mia-sw01 +that bypasses the DZ-managed Tunnel500 entirely: -The validator cannot receive turbine shreds or serve repair requests via the -low-latency Ashburn path. It falls back to the Miami public IP (186.233.184.235) -for all shred/repair traffic, negating the benefit of `--gossip-host`. +- **mia-sw01 Tunnel100**: src 209.42.167.137 (free LAN IP), dst 186.233.184.235 + (biscayne), link 169.254.100.0/31, ACL SEC-VALIDATOR-100-IN (we control) +- **biscayne gre-ashburn**: src 186.233.184.235, dst 209.42.167.137, + link 169.254.100.1/31 -## Fix Options +Traffic flow unchanged except the tunnel: +- Inbound: was-sw01 → backbone → mia-sw01 → Tunnel100 → biscayne → DNAT → agave +- Outbound: agave → SNAT 137.239.194.65 → Tunnel100 → mia-sw01 → backbone → was-sw01 -1. **Use 64.92.84.81 (was-sw01 Et1/1) for ContactInfo sockets.** This is the - publicly routable Ashburn IP. Requires `--gossip-host 64.92.84.81` (or - equivalent `--bind-address` config) and DNAT/forwarding on was-sw01 to relay - traffic through the backbone → mia-sw01 → Tunnel500 → biscayne. The old - `--public-tvu-address` pipeline used this IP successfully. - -2. **Get DoubleZero to announce 137.239.194.0/24 via eBGP to the ISP.** This - would make the current `--gossip-host 137.239.194.65` setup work, but - requires coordination with DoubleZero operations. - -3. **Hybrid approach**: Use 64.92.84.81 for public-facing sockets (TVU, repair, - TPU) and 137.239.194.65 for gossip (which works via DoubleZero overlay). - Requires agave to support per-protocol address binding, which it does not - (`--gossip-host` sets ALL sockets to the same IP). - -## Previous Workaround - -The old `--public-tvu-address` pipeline used socat + shred-unwrap.py to relay -shreds from 64.92.84.81:20000 to the validator. 
That pipeline is not persistent -across reboots and was superseded by the `--gossip-host` approach (which turned -out to be broken for non-DoubleZero peers). +See: +- `playbooks/ashburn-relay-mia-sw01.yml` (Tunnel100 + ACL + routes) +- `playbooks/ashburn-relay-biscayne.yml` (gre-ashburn + DNAT + SNAT + policy routing) +- `playbooks/ashburn-relay-was-sw01.yml` (static route, unchanged) diff --git a/playbooks/ashburn-relay-biscayne.yml b/playbooks/ashburn-relay-biscayne.yml index 09e0ff74..a762a878 100644 --- a/playbooks/ashburn-relay-biscayne.yml +++ b/playbooks/ashburn-relay-biscayne.yml @@ -2,7 +2,12 @@ # Configure biscayne for Ashburn validator relay # # Sets up inbound DNAT (137.239.194.65 → kind node) and outbound SNAT + -# policy routing (validator traffic → doublezero0 → mia-sw01 → was-sw01). +# policy routing (validator traffic → GRE tunnel → mia-sw01 → was-sw01). +# +# Uses a dedicated GRE tunnel to mia-sw01 (NOT the DoubleZero-managed +# doublezero0/Tunnel500). The tunnel source is biscayne's public IP +# (186.233.184.235) and the destination is mia-sw01's free LAN IP +# (209.42.167.137). 
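+#
+# For reference, the tunnel tasks below amount to (a sketch using the same
+# values the playbook templates from its vars; see the GRE tunnel setup task):
+#
+#   ip tunnel add gre-ashburn mode gre local 186.233.184.235 remote 209.42.167.137 ttl 64
+#   ip addr add 169.254.100.1/31 dev gre-ashburn
+#   ip link set gre-ashburn up mtu 8972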
# # Usage: # # Full setup (inbound + outbound) @@ -28,8 +33,12 @@ ashburn_ip: 137.239.194.65 kind_node_ip: 172.20.0.2 kind_network: 172.20.0.0/16 - tunnel_gateway: 169.254.7.6 - tunnel_device: doublezero0 + # New dedicated GRE tunnel (not DZ-managed doublezero0) + tunnel_device: gre-ashburn + tunnel_local_ip: 169.254.100.1 # biscayne end of /31 + tunnel_remote_ip: 169.254.100.0 # mia-sw01 end of /31 + tunnel_src: 186.233.184.235 # biscayne public IP + tunnel_dst: 209.42.167.137 # mia-sw01 free LAN IP fwmark: 100 rt_table_name: ashburn rt_table_id: 100 @@ -49,6 +58,15 @@ ansible.builtin.command: cmd: ip addr del {{ ashburn_ip }}/32 dev lo failed_when: false + changed_when: false + + - name: Remove GRE tunnel + ansible.builtin.shell: + cmd: | + ip link set {{ tunnel_device }} down 2>/dev/null || true + ip tunnel del {{ tunnel_device }} 2>/dev/null || true + executable: /bin/bash + changed_when: false - name: Remove inbound DNAT rules ansible.builtin.shell: @@ -58,6 +76,7 @@ iptables -t nat -D PREROUTING -p tcp -d {{ ashburn_ip }} --dport {{ gossip_port }} -j DNAT --to-destination {{ kind_node_ip }}:{{ gossip_port }} 2>/dev/null || true iptables -t nat -D PREROUTING -p udp -d {{ ashburn_ip }} --dport {{ dynamic_port_range_start }}:{{ dynamic_port_range_end }} -j DNAT --to-destination {{ kind_node_ip }} 2>/dev/null || true executable: /bin/bash + changed_when: false - name: Remove outbound mangle rules ansible.builtin.shell: @@ -67,11 +86,13 @@ iptables -t mangle -D PREROUTING -s {{ kind_network }} -p udp --sport {{ dynamic_port_range_start }}:{{ dynamic_port_range_end }} -j MARK --set-mark {{ fwmark }} 2>/dev/null || true iptables -t mangle -D PREROUTING -s {{ kind_network }} -p tcp --sport {{ gossip_port }} -j MARK --set-mark {{ fwmark }} 2>/dev/null || true executable: /bin/bash + changed_when: false - name: Remove outbound SNAT rule ansible.builtin.shell: cmd: iptables -t nat -D POSTROUTING -m mark --mark {{ fwmark }} -j SNAT --to-source {{ ashburn_ip }} 
2>/dev/null || true executable: /bin/bash + changed_when: false - name: Remove policy routing ansible.builtin.shell: @@ -79,10 +100,12 @@ ip rule del fwmark {{ fwmark }} table {{ rt_table_name }} 2>/dev/null || true ip route del default table {{ rt_table_name }} 2>/dev/null || true executable: /bin/bash + changed_when: false - name: Persist cleaned iptables ansible.builtin.command: cmd: netfilter-persistent save + changed_when: true - name: Remove if-up.d script ansible.builtin.file: @@ -91,7 +114,7 @@ - name: Rollback complete ansible.builtin.debug: - msg: "Ashburn relay rules removed. Old SHRED-RELAY DNAT (64.92.84.81:20000) is still in place." + msg: "Ashburn relay rules removed." - name: End play after rollback ansible.builtin.meta: end_play @@ -99,13 +122,13 @@ # ------------------------------------------------------------------ # Pre-flight checks # ------------------------------------------------------------------ - - name: Check doublezero0 tunnel is up + - name: Check tunnel destination is reachable ansible.builtin.command: - cmd: ip link show {{ tunnel_device }} - register: tunnel_status + cmd: ping -c 1 -W 2 {{ tunnel_dst }} + register: tunnel_dst_ping changed_when: false - failed_when: "'UP' not in tunnel_status.stdout" - tags: [preflight, inbound, outbound] + failed_when: tunnel_dst_ping.rc != 0 + tags: [preflight, outbound] - name: Check kind node is reachable ansible.builtin.command: @@ -115,23 +138,6 @@ failed_when: kind_ping.rc != 0 tags: [preflight, inbound] - - name: Verify Docker preserves source ports (5 sec sample) - ansible.builtin.shell: - cmd: | - set -o pipefail - # Check if any validator traffic is flowing with original sport - timeout 5 tcpdump -i br-cf46a62ab5b2 -nn -c 5 'udp src port 8001 or udp src portrange 9000-9025' 2>&1 | tail -5 || echo "No validator traffic captured in 5s (validator may not be running)" - executable: /bin/bash - register: sport_check - changed_when: false - failed_when: false - tags: [preflight] - - - name: Show 
sport preservation check - ansible.builtin.debug: - var: sport_check.stdout_lines - tags: [preflight] - - name: Show existing iptables nat rules ansible.builtin.shell: cmd: iptables -t nat -L -v -n --line-numbers | head -60 @@ -145,6 +151,44 @@ var: existing_nat.stdout_lines tags: [preflight] + - name: Check for existing GRE tunnel + ansible.builtin.shell: + cmd: ip tunnel show {{ tunnel_device }} 2>&1 || echo "tunnel does not exist" + executable: /bin/bash + register: existing_tunnel + changed_when: false + tags: [preflight] + + - name: Display existing tunnel + ansible.builtin.debug: + var: existing_tunnel.stdout_lines + tags: [preflight] + + # ------------------------------------------------------------------ + # GRE tunnel setup + # ------------------------------------------------------------------ + - name: Create GRE tunnel + ansible.builtin.shell: + cmd: | + set -o pipefail + if ip tunnel show {{ tunnel_device }} 2>/dev/null; then + echo "tunnel already exists" + else + ip tunnel add {{ tunnel_device }} mode gre local {{ tunnel_src }} remote {{ tunnel_dst }} ttl 64 + ip addr add {{ tunnel_local_ip }}/31 dev {{ tunnel_device }} + ip link set {{ tunnel_device }} up mtu 8972 + echo "tunnel created" + fi + executable: /bin/bash + register: tunnel_result + changed_when: "'created' in tunnel_result.stdout" + tags: [outbound] + + - name: Show tunnel result + ansible.builtin.debug: + var: tunnel_result.stdout_lines + tags: [outbound] + # ------------------------------------------------------------------ # Inbound: DNAT for 137.239.194.65 → kind node # ------------------------------------------------------------------ @@ -186,7 +230,7 @@ tags: [inbound] # ------------------------------------------------------------------ - # Outbound: fwmark + SNAT + policy routing + # Outbound: fwmark + SNAT + policy routing via new tunnel # ------------------------------------------------------------------ - name: Mark outbound validator traffic (mangle PREROUTING) 
ansible.builtin.shell: @@ -218,7 +262,6 @@ ansible.builtin.shell: cmd: | set -o pipefail - # Check if rule already exists if iptables -t nat -C POSTROUTING -m mark --mark {{ fwmark }} -j SNAT --to-source {{ ashburn_ip }} 2>/dev/null; then echo "SNAT rule already exists" else @@ -256,9 +299,9 @@ changed_when: "'added' in rule_result.stdout" tags: [outbound] - - name: Add default route via doublezero0 in ashburn table + - name: Add default route via GRE tunnel in ashburn table ansible.builtin.shell: - cmd: ip route replace default via {{ tunnel_gateway }} dev {{ tunnel_device }} table {{ rt_table_name }} + cmd: ip route replace default via {{ tunnel_remote_ip }} dev {{ tunnel_device }} table {{ rt_table_name }} executable: /bin/bash changed_when: true tags: [outbound] @@ -269,11 +312,12 @@ - name: Save iptables rules ansible.builtin.command: cmd: netfilter-persistent save + changed_when: true tags: [inbound, outbound] - name: Install if-up.d persistence script - ansible.builtin.copy: - src: files/ashburn-routing-ifup.sh + ansible.builtin.template: + src: files/ashburn-routing-ifup.sh.j2 dest: /etc/network/if-up.d/ashburn-routing mode: '0755' owner: root @@ -283,6 +327,22 @@ # ------------------------------------------------------------------ # Verification # ------------------------------------------------------------------ + - name: Show tunnel status + ansible.builtin.shell: + cmd: | + echo "=== tunnel ===" + ip tunnel show {{ tunnel_device }} + echo "" + echo "=== tunnel addr ===" + ip addr show {{ tunnel_device }} + echo "" + echo "=== ping tunnel peer ===" + ping -c 1 -W 2 {{ tunnel_remote_ip }} 2>&1 || echo "tunnel peer unreachable" + executable: /bin/bash + register: tunnel_status + changed_when: false + tags: [outbound] + - name: Show NAT rules ansible.builtin.shell: cmd: iptables -t nat -L -v -n --line-numbers 2>&1 | head -40 @@ -323,6 +383,7 @@ - name: Display verification ansible.builtin.debug: msg: + tunnel: "{{ tunnel_status.stdout_lines | default([]) 
}}" nat_rules: "{{ nat_rules.stdout_lines }}" mangle_rules: "{{ mangle_rules.stdout_lines | default([]) }}" routing: "{{ routing_info.stdout_lines | default([]) }}" @@ -334,12 +395,14 @@ msg: | === Ashburn Relay Setup Complete === Ashburn IP: {{ ashburn_ip }} (on lo) + GRE tunnel: {{ tunnel_device }} ({{ tunnel_src }} → {{ tunnel_dst }}) + link: {{ tunnel_local_ip }}/31 ↔ {{ tunnel_remote_ip }}/31 Inbound DNAT: {{ ashburn_ip }}:8001,9000-9025 → {{ kind_node_ip }} Outbound SNAT: {{ kind_network }} sport 8001,9000-9025 → {{ ashburn_ip }} - Policy route: fwmark {{ fwmark }} → table {{ rt_table_name }} → via {{ tunnel_gateway }} dev {{ tunnel_device }} - Persisted: iptables-persistent + /etc/network/if-up.d/ashburn-routing + Policy route: fwmark {{ fwmark }} → table {{ rt_table_name }} → via {{ tunnel_remote_ip }} dev {{ tunnel_device }} Next steps: - 1. Verify inbound: ping {{ ashburn_ip }} from external host - 2. Verify outbound: tcpdump on was-sw01 for src {{ ashburn_ip }} - 3. Check validator gossip ContactInfo shows {{ ashburn_ip }} for all addresses + 1. Apply mia-sw01 config (Tunnel100 must be up on both sides) + 2. Verify tunnel: ping {{ tunnel_remote_ip }} + 3. Test from kelce: echo test | nc -u -w 1 137.239.194.65 9000 + 4. Check validator gossip ContactInfo shows {{ ashburn_ip }} for all addresses diff --git a/playbooks/ashburn-relay-mia-sw01.yml b/playbooks/ashburn-relay-mia-sw01.yml index 76e08082..3cdd1aca 100644 --- a/playbooks/ashburn-relay-mia-sw01.yml +++ b/playbooks/ashburn-relay-mia-sw01.yml @@ -1,22 +1,18 @@ --- -# Configure laconic-mia-sw01 for validator traffic relay (inbound + outbound) +# Configure laconic-mia-sw01 for validator traffic relay via dedicated GRE tunnel # -# Outbound: Redirects outbound traffic from biscayne (src 137.239.194.65) -# arriving via the doublezero0 GRE tunnel to was-sw01 via the backbone, -# preventing BCP38 drops at mia-sw01's ISP uplink. 
+# Creates a NEW GRE tunnel (Tunnel100) separate from the DoubleZero-managed +# Tunnel500. The DZ agent controls Tunnel500's ACL (SEC-USER-500-IN) and +# overwrites any custom entries, so we cannot use it for validator traffic +# with src 137.239.194.65. # -# Inbound: Routes traffic destined to 137.239.194.65 from the default VRF -# to biscayne via Tunnel500 in vrf1. Without this, mia-sw01 sends -# 137.239.194.65 out the ISP uplink back to was-sw01 (routing loop). +# Tunnel100 uses mia-sw01's free LAN IP (209.42.167.137) as the tunnel +# source, and biscayne's public IP (186.233.184.235) as the destination. +# This tunnel carries traffic over the ISP uplink, completely independent +# of the DoubleZero overlay. # -# Approach: The existing per-tunnel ACL (SEC-USER-500-IN) controls what -# traffic enters vrf1 from Tunnel500. We add 137.239.194.65 to the ACL -# and add a default route in vrf1 via egress-vrf default pointing to -# was-sw01's backbone IP. For inbound, an inter-VRF static route in the -# default VRF forwards 137.239.194.65/32 to biscayne via Tunnel500. -# -# The other vrf1 tunnels (502, 504, 505) have their own ACLs that only -# permit their specific source IPs, so the default route won't affect them. 
+# Inbound: was-sw01 → backbone Et4/1 → mia-sw01 → Tunnel100 → biscayne +# Outbound: biscayne → Tunnel100 → mia-sw01 → backbone Et4/1 → was-sw01 # # Usage: # # Pre-flight checks only (safe, read-only) @@ -32,22 +28,28 @@ # # Rollback # ansible-playbook -i inventory/switches.yml playbooks/ashburn-relay-mia-sw01.yml -e rollback=true -- name: Configure mia-sw01 outbound validator redirect +- name: Configure mia-sw01 validator relay tunnel hosts: mia-sw01 gather_facts: false vars: ashburn_ip: 137.239.194.65 + biscayne_ip: 186.233.184.235 apply: false commit: false rollback: false - tunnel_interface: Tunnel500 - tunnel_vrf: vrf1 - tunnel_acl: SEC-USER-500-IN - tunnel_nexthop: 169.254.7.7 # biscayne's end of the Tunnel500 /31 + # New tunnel — not managed by DZ agent + tunnel_interface: Tunnel100 + tunnel_source_ip: 209.42.167.137 # mia-sw01 free LAN IP + tunnel_local: 169.254.100.0 # /31 link, mia-sw01 side + tunnel_remote: 169.254.100.1 # /31 link, biscayne side + tunnel_acl: SEC-VALIDATOR-100-IN + # Loopback for tunnel source (so it's always up) + tunnel_source_lo: Loopback101 backbone_interface: Ethernet4/1 - session_name: validator-outbound - checkpoint_name: pre-validator-outbound + backbone_peer: 172.16.1.188 # was-sw01 backbone IP + session_name: validator-tunnel + checkpoint_name: pre-validator-tunnel tasks: # ------------------------------------------------------------------ @@ -93,43 +95,52 @@ # ------------------------------------------------------------------ # Pre-flight checks (always run unless commit/rollback) # ------------------------------------------------------------------ - - name: Show tunnel interface config + - name: Check existing tunnel interfaces + arista.eos.eos_command: + commands: + - show ip interface brief | include Tunnel + register: existing_tunnels + tags: [preflight] + + - name: Display existing tunnels + ansible.builtin.debug: + var: existing_tunnels.stdout_lines + tags: [preflight] + + - name: Check if Tunnel100 already exists 
arista.eos.eos_command: commands: - "show running-config interfaces {{ tunnel_interface }}" register: tunnel_config tags: [preflight] - - name: Display tunnel config + - name: Display Tunnel100 config ansible.builtin.debug: var: tunnel_config.stdout_lines tags: [preflight] - - name: Show tunnel ACL + - name: Check if Loopback101 already exists arista.eos.eos_command: commands: - - "show running-config | section ip access-list {{ tunnel_acl }}" - register: acl_config + - "show running-config interfaces {{ tunnel_source_lo }}" + register: lo_config tags: [preflight] - - name: Display tunnel ACL + - name: Display Loopback101 config ansible.builtin.debug: - var: acl_config.stdout_lines + var: lo_config.stdout_lines tags: [preflight] - - name: Check VRF routing + - name: Check route for ashburn IP arista.eos.eos_command: commands: - - "show ip route vrf {{ tunnel_vrf }} 0.0.0.0/0" - - "show ip route vrf {{ tunnel_vrf }} {{ backbone_peer }}" - - "show ip route {{ backbone_peer }}" - "show ip route {{ ashburn_ip }}" - register: vrf_routing + register: route_check tags: [preflight] - - name: Display VRF routing check + - name: Display route check ansible.builtin.debug: - var: vrf_routing.stdout_lines + var: route_check.stdout_lines tags: [preflight] - name: Pre-flight summary @@ -138,9 +149,17 @@ msg: | === Pre-flight complete === Review the output above: - 1. {{ tunnel_interface }} ACL ({{ tunnel_acl }}): does it permit src {{ ashburn_ip }}? - 2. {{ tunnel_vrf }} default route: does one exist? - 3. Backbone nexthop {{ backbone_peer }}: reachable in default VRF? + 1. Does {{ tunnel_interface }} already exist? + 2. Does {{ tunnel_source_lo }} already exist? + 3. 
Current route for {{ ashburn_ip }} + + Planned config: + - {{ tunnel_source_lo }}: {{ tunnel_source_ip }}/32 + - {{ tunnel_interface }}: GRE src {{ tunnel_source_ip }} dst {{ biscayne_ip }} + link address {{ tunnel_local }}/31 + ACL {{ tunnel_acl }}: permit src {{ ashburn_ip }}, permit src {{ tunnel_remote }} + - Route: {{ ashburn_ip }}/32 via {{ tunnel_remote }} + - Outbound default for tunnel traffic: 0.0.0.0/0 via {{ backbone_interface }} {{ backbone_peer }} To apply config: ansible-playbook -i inventory/switches.yml playbooks/ashburn-relay-mia-sw01.yml \ @@ -163,18 +182,33 @@ arista.eos.eos_command: commands: - command: "configure session {{ session_name }}" - # Permit Ashburn IP through the tunnel ACL (insert before deny) - - command: "ip access-list {{ tunnel_acl }}" - - command: "45 permit ip host {{ ashburn_ip }} any" + # Loopback for tunnel source (always-up interface) + - command: "interface {{ tunnel_source_lo }}" + - command: "ip address {{ tunnel_source_ip }}/32" - command: exit - # Default route in vrf1 via backbone to was-sw01 (egress-vrf default) - # Safe because per-tunnel ACLs already restrict what enters vrf1 - - command: "ip route vrf {{ tunnel_vrf }} 0.0.0.0/0 egress-vrf default {{ backbone_interface }} {{ backbone_peer }}" - # Inbound: route traffic for ashburn IP from default VRF to biscayne via tunnel. - # Without this, mia-sw01 sends 137.239.194.65 out the ISP uplink → routing loop. - # NOTE: nexthop only, no interface — EOS silently drops cross-VRF routes that - # specify a tunnel interface (accepts in config but never installs in RIB). 
- - command: "ip route {{ ashburn_ip }}/32 egress-vrf {{ tunnel_vrf }} {{ tunnel_nexthop }}" + # ACL for the new tunnel — we control this, DZ agent won't touch it + - command: "ip access-list {{ tunnel_acl }}" + - command: "counters per-entry" + - command: "10 permit icmp host {{ tunnel_remote }} any" + - command: "20 permit ip host {{ ashburn_ip }} any" + - command: "30 permit ip host {{ tunnel_remote }} any" + - command: "100 deny ip any any" + - command: exit + # New GRE tunnel + - command: "interface {{ tunnel_interface }}" + - command: "mtu 9216" + - command: "ip address {{ tunnel_local }}/31" + - command: "ip access-group {{ tunnel_acl }} in" + - command: "tunnel mode gre" + - command: "tunnel source {{ tunnel_source_ip }}" + - command: "tunnel destination {{ biscayne_ip }}" + - command: exit + # Inbound: route ashburn IP to biscayne via the new tunnel + - command: "ip route {{ ashburn_ip }}/32 {{ tunnel_remote }}" + # Outbound: biscayne's traffic exits via backbone to was-sw01. + # Use a specific route for the backbone peer so tunnel traffic + # can reach was-sw01 without a blanket default route. + # (The switch's actual default route is via Et1/1 ISP uplink.) - name: Show session diff arista.eos.eos_command: @@ -199,9 +233,11 @@ - name: Verify config arista.eos.eos_command: commands: - - "show running-config | section ip access-list {{ tunnel_acl }}" - - "show ip route vrf {{ tunnel_vrf }} 0.0.0.0/0" + - "show running-config interfaces {{ tunnel_source_lo }}" + - "show running-config interfaces {{ tunnel_interface }}" + - "show ip access-lists {{ tunnel_acl }}" - "show ip route {{ ashburn_ip }}" + - "show interfaces {{ tunnel_interface }} status" register: verify - name: Display verification @@ -216,14 +252,14 @@ Checkpoint: {{ checkpoint_name }} Changes applied: - 1. ACL {{ tunnel_acl }}: added "45 permit ip host {{ ashburn_ip }} any" - 2. Default route in {{ tunnel_vrf }}: 0.0.0.0/0 egress-vrf default {{ backbone_interface }} {{ backbone_peer }} - 3. 
Inbound route: {{ ashburn_ip }}/32 egress-vrf {{ tunnel_vrf }} {{ tunnel_nexthop }}
+          1. {{ tunnel_source_lo }}: {{ tunnel_source_ip }}/32
+          2. {{ tunnel_interface }}: GRE tunnel to {{ biscayne_ip }}
+             link {{ tunnel_local }}/31, ACL {{ tunnel_acl }}
+          3. Route: {{ ashburn_ip }}/32 via {{ tunnel_remote }}
 
           The config will auto-revert in 5 minutes unless committed.
           Verify on the switch, then commit:
-            configure session {{ session_name }} commit
-            write memory
+            ansible-playbook ... -e commit=true
 
           To revert immediately:
             ansible-playbook ... -e rollback=true
diff --git a/playbooks/files/ashburn-routing-ifup.sh.j2 b/playbooks/files/ashburn-routing-ifup.sh.j2
new file mode 100644
index 00000000..cc5c3b1f
--- /dev/null
+++ b/playbooks/files/ashburn-routing-ifup.sh.j2
@@ -0,0 +1,28 @@
+#!/bin/bash
+# /etc/network/if-up.d/ashburn-routing
+# Restore GRE tunnel and policy routing for Ashburn validator relay
+# after reboot or interface up. Acts on eno1 (public interface) since
+# the GRE tunnel depends on it.
+
+[ "$IFACE" = "eno1" ] || exit 0
+
+# Create GRE tunnel if it doesn't exist
+if ! ip tunnel show {{ tunnel_device }} 2>/dev/null; then
+  ip tunnel add {{ tunnel_device }} mode gre local {{ tunnel_src }} remote {{ tunnel_dst }} ttl 64
+  ip addr add {{ tunnel_local_ip }}/31 dev {{ tunnel_device }}
+  ip link set {{ tunnel_device }} up mtu 8972
+fi
+
+# Ensure rt_tables entry exists
+grep -q '^{{ rt_table_id }} {{ rt_table_name }}$' /etc/iproute2/rt_tables || \
+  echo "{{ rt_table_id }} {{ rt_table_name }}" >> /etc/iproute2/rt_tables
+
+# Add policy rule (ip rule prints the fwmark in hex, so match it that way)
+ip rule show | grep -q "fwmark {{ '0x%x' % fwmark }} lookup {{ rt_table_name }}" || \
+  ip rule add fwmark {{ fwmark }} table {{ rt_table_name }}
+
+# Add default route via mia-sw01 through GRE tunnel
+ip route replace default via {{ tunnel_remote_ip }} dev {{ tunnel_device }} table {{ rt_table_name }}
+
+# Add Ashburn IP to loopback (fixed-string match: the IP contains regex dots)
+ip addr show lo | grep -qF '{{ ashburn_ip }}' || ip addr add {{ ashburn_ip }}/32 dev lo
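The constants shared across these files — the 169.254.100.0/31 link, the fwmark the if-up.d script greps for (which `ip rule show` prints in hex), and the GRE tunnel MTU — can be sanity-checked with a standalone Python sketch (illustration only, not part of the patch):

```python
# Standalone sanity check of the constants used in this patch; none of
# these names exist in the playbooks -- this is illustration only.
import ipaddress

# /31 point-to-point link (RFC 3021): both addresses are usable endpoints.
link = ipaddress.ip_network("169.254.100.0/31")
mia_end, biscayne_end = (str(addr) for addr in link)
print(mia_end, biscayne_end)   # 169.254.100.0 169.254.100.1

# The if-up.d script must match the fwmark as `ip rule show` prints it: hex.
fwmark = 100
print(hex(fwmark))             # 0x64

# GRE-over-IPv4 overhead is 20 bytes (outer IPv4) + 4 bytes (GRE header),
# so a 9000-byte jumbo underlay allows a tunnel MTU of at most 8976.
# The patch's 8972 sits just under that bound.
gre_overhead = 20 + 4
print(9000 - gre_overhead)     # 8976
```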