feat: dedicated GRE tunnel (Tunnel100) bypassing DZ-managed Tunnel500

Root cause: the doublezero-agent on mia-sw01 manages Tunnel500's ACL
(SEC-USER-500-IN) and drops outbound gossip with src 137.239.194.65.
The agent overwrites any custom ACL entries.

Fix: create a separate GRE tunnel (Tunnel100) using mia-sw01's free
LAN IP (209.42.167.137) as tunnel source. This tunnel goes over the
ISP uplink, completely independent of the DZ overlay:
- mia-sw01: Tunnel100 src 209.42.167.137, dst 186.233.184.235
- biscayne: gre-ashburn src 186.233.184.235, dst 209.42.167.137
- Link addresses: 169.254.100.0/31
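The /31 link follows RFC 3021 point-to-point addressing: the two ends differ only in the low bit of the address. A minimal bash sketch of the peer computation (illustrative, not part of the playbooks):

```shell
# Peer of a /31 address: flip the lowest bit (RFC 3021 point-to-point)
ip_to_int() { local IFS=.; set -- $1; echo $(( ($1<<24) | ($2<<16) | ($3<<8) | $4 )); }
int_to_ip() { echo "$(( ($1>>24)&255 )).$(( ($1>>16)&255 )).$(( ($1>>8)&255 )).$(( $1&255 ))"; }

peer=$(int_to_ip $(( $(ip_to_int 169.254.100.0) ^ 1 )))
echo "$peer"   # -> 169.254.100.1 (biscayne end; mia-sw01 holds .0)
```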

Playbook changes:
- ashburn-relay-mia-sw01: Tunnel100 + Loopback101 + SEC-VALIDATOR-100-IN
- ashburn-relay-biscayne: gre-ashburn tunnel + updated policy routing
- New template: ashburn-routing-ifup.sh.j2 for boot persistence

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix/kind-mount-propagation
A. F. Dudley 2026-03-07 01:47:58 +00:00
parent 0b52fc99d7
commit 742e84e3b0
4 changed files with 261 additions and 158 deletions


@@ -1,85 +1,61 @@
# Bug: Ashburn Relay — 137.239.194.65 Not Routable from Public Internet
# Bug: Ashburn Relay — Outbound Gossip Dropped by DZ Agent ACL
## Summary
`--gossip-host 137.239.194.65` correctly advertises the Ashburn relay IP in
ContactInfo for all sockets (gossip, TVU, repair, TPU). However, 137.239.194.65
is a DoubleZero overlay IP (137.239.192.0/19, IS-IS only) that is NOT announced
via BGP to the public internet. Public peers cannot route to it, so TVU shreds,
repair requests, and TPU traffic never arrive at was-sw01.
ContactInfo for all sockets (gossip, TVU, repair, TPU). The inbound path
works end-to-end (proven with kelce UDP tests through every hop). However,
outbound gossip from biscayne (src 137.239.194.65) is dropped by the
DoubleZero agent's ACL on mia-sw01's Tunnel500, preventing ContactInfo from
propagating to the cluster. Peers never learn our TVU address.
## Evidence
- Gossip traffic arrives on `doublezero0` interface:
- Inbound path confirmed hop by hop (kelce → was-sw01 → mia-sw01 → Tunnel500
→ biscayne doublezero0 → DNAT → kind bridge → kind node eth0):
```
doublezero0 In IP 64.130.58.70.8001 > 137.239.194.65.8001: UDP, length 132
01:04:12.136633 IP 69.112.108.72.58856 > 172.20.0.2.9000: UDP, length 13
```
- Zero TVU/repair traffic arrives:
- Outbound gossip leaves biscayne correctly (src 137.239.194.65:8001 on
doublezero0), enters mia-sw01 via Tunnel500, hits SEC-USER-500-IN ACL:
```
tcpdump -i doublezero0 'dst host 137.239.194.65 and udp and not port 8001'
0 packets captured
60 deny ip any any [match 26355968 packets, 0:00:02 ago]
```
- ContactInfo correctly advertises all sockets on 137.239.194.65:
```json
{
"gossip": "137.239.194.65:8001",
"tvu": "137.239.194.65:9000",
"serveRepair": "137.239.194.65:9011",
"tpu": "137.239.194.65:9002"
}
```
- Outbound gossip from biscayne exits via `doublezero0` with source
137.239.194.65 — SNAT and routing work correctly in the outbound direction.
The ACL only permits src 186.233.184.235 and 169.254.7.7 — not 137.239.194.65.
- Validator not visible in public RPC getClusterNodes (gossip not propagating)
- Validator sees 775 nodes vs 5,045 on public RPC
## Root Cause
**137.239.194.0/24 is not routable from the public internet.** The prefix
belongs to DoubleZero's overlay address space (137.239.192.0/19, Momentum
Telecom, WHOIS OriginAS: empty). It is advertised only via IS-IS within the
DoubleZero switch mesh. There is no eBGP session on was-sw01 to advertise it
to the ISP — all BGP peers are iBGP AS 65342 (DoubleZero internal).
The `doublezero-agent` daemon on mia-sw01 manages Tunnel500 and its ACL
(SEC-USER-500-IN). The agent periodically reconciles the ACL to its expected
state, overwriting any custom entries we add. We cannot modify the ACL
without the agent reverting it.
When the validator advertises `tvu: 137.239.194.65:9000` in ContactInfo,
public internet peers attempt to send turbine shreds to that IP, but the
packets have no route through the global BGP table to reach was-sw01. Only
DoubleZero-connected peers could potentially reach it via the overlay.
137.239.194.65 is from the was-sw01 LAN block (137.239.194.64/29), routed
by the ISP to was-sw01 via the WAN link. It IS publicly routable (confirmed
by kelce ping/UDP tests). The earlier hypothesis that it was unroutable was
wrong — the IP reaches was-sw01, gets forwarded to mia-sw01 via backbone,
and reaches biscayne through Tunnel500 (inbound ACL direction is fine).
The old shred relay pipeline worked because it used `--public-tvu-address
64.92.84.81:20000` — was-sw01's Et1/1 ISP uplink IP, which IS publicly
routable. The `--gossip-host 137.239.194.65` approach advertises a
DoubleZero-only IP for ALL sockets, making TVU/repair/TPU unreachable from
non-DoubleZero peers.
The problem is outbound only: the Tunnel500 ingress ACL (traffic FROM
biscayne TO mia-sw01) drops src 137.239.194.65.
The original hypothesis (ACL/PBR port filtering) was wrong. The tunnel and
switch routing work correctly — the problem is upstream: traffic never arrives
at was-sw01 in the first place.
## Fix
## Impact
Create a dedicated GRE tunnel (Tunnel100) between biscayne and mia-sw01
that bypasses the DZ-managed Tunnel500 entirely:
The validator cannot receive turbine shreds or serve repair requests via the
low-latency Ashburn path. It falls back to the Miami public IP (186.233.184.235)
for all shred/repair traffic, negating the benefit of `--gossip-host`.
- **mia-sw01 Tunnel100**: src 209.42.167.137 (free LAN IP), dst 186.233.184.235
(biscayne), link 169.254.100.0/31, ACL SEC-VALIDATOR-100-IN (we control)
- **biscayne gre-ashburn**: src 186.233.184.235, dst 209.42.167.137,
link 169.254.100.1/31
## Fix Options
Traffic flow unchanged except the tunnel:
- Inbound: was-sw01 → backbone → mia-sw01 → Tunnel100 → biscayne → DNAT → agave
- Outbound: agave → SNAT 137.239.194.65 → Tunnel100 → mia-sw01 → backbone → was-sw01
1. **Use 64.92.84.81 (was-sw01 Et1/1) for ContactInfo sockets.** This is the
publicly routable Ashburn IP. Requires `--gossip-host 64.92.84.81` (or
equivalent `--bind-address` config) and DNAT/forwarding on was-sw01 to relay
traffic through the backbone → mia-sw01 → Tunnel500 → biscayne. The old
`--public-tvu-address` pipeline used this IP successfully.
2. **Get DoubleZero to announce 137.239.194.0/24 via eBGP to the ISP.** This
would make the current `--gossip-host 137.239.194.65` setup work, but
requires coordination with DoubleZero operations.
3. **Hybrid approach**: Use 64.92.84.81 for public-facing sockets (TVU, repair,
TPU) and 137.239.194.65 for gossip (which works via DoubleZero overlay).
Requires agave to support per-protocol address binding, which it does not
(`--gossip-host` sets ALL sockets to the same IP).
## Previous Workaround
The old `--public-tvu-address` pipeline used socat + shred-unwrap.py to relay
shreds from 64.92.84.81:20000 to the validator. That pipeline is not persistent
across reboots and was superseded by the `--gossip-host` approach (which turned
out to be broken for non-DoubleZero peers).
See:
- `playbooks/ashburn-relay-mia-sw01.yml` (Tunnel100 + ACL + routes)
- `playbooks/ashburn-relay-biscayne.yml` (gre-ashburn + DNAT + SNAT + policy routing)
- `playbooks/ashburn-relay-was-sw01.yml` (static route, unchanged)
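The biscayne side of the flow reduces to a DNAT/mark/SNAT triple plus one policy route. A condensed sketch using literal values from the playbook vars (illustrative only, not a substitute for the playbook's idempotent rule checks):

```
# Inbound: traffic to the Ashburn IP's validator ports -> kind node
iptables -t nat -A PREROUTING -p udp -d 137.239.194.65 --dport 9000:9025 \
  -j DNAT --to-destination 172.20.0.2
# Outbound: mark validator replies from the kind network...
iptables -t mangle -A PREROUTING -s 172.20.0.0/16 -p udp --sport 9000:9025 \
  -j MARK --set-mark 100
# ...SNAT marked traffic back to the Ashburn IP...
iptables -t nat -A POSTROUTING -m mark --mark 100 -j SNAT --to-source 137.239.194.65
# ...and route it out the GRE tunnel via the "ashburn" table
ip rule add fwmark 100 table ashburn
ip route replace default via 169.254.100.0 dev gre-ashburn table ashburn
```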


@@ -2,7 +2,12 @@
# Configure biscayne for Ashburn validator relay
#
# Sets up inbound DNAT (137.239.194.65 → kind node) and outbound SNAT +
# policy routing (validator traffic → doublezero0 → mia-sw01 → was-sw01).
# policy routing (validator traffic → GRE tunnel → mia-sw01 → was-sw01).
#
# Uses a dedicated GRE tunnel to mia-sw01 (NOT the DoubleZero-managed
# doublezero0/Tunnel500). The tunnel source is biscayne's public IP
# (186.233.184.235) and the destination is mia-sw01's free LAN IP
# (209.42.167.137).
#
# Usage:
# # Full setup (inbound + outbound)
@@ -28,8 +33,12 @@
ashburn_ip: 137.239.194.65
kind_node_ip: 172.20.0.2
kind_network: 172.20.0.0/16
tunnel_gateway: 169.254.7.6
tunnel_device: doublezero0
# New dedicated GRE tunnel (not DZ-managed doublezero0)
tunnel_device: gre-ashburn
tunnel_local_ip: 169.254.100.1 # biscayne end of /31
tunnel_remote_ip: 169.254.100.0 # mia-sw01 end of /31
tunnel_src: 186.233.184.235 # biscayne public IP
tunnel_dst: 209.42.167.137 # mia-sw01 free LAN IP
fwmark: 100
rt_table_name: ashburn
rt_table_id: 100
@@ -49,6 +58,15 @@
ansible.builtin.command:
cmd: ip addr del {{ ashburn_ip }}/32 dev lo
failed_when: false
changed_when: false
- name: Remove GRE tunnel
ansible.builtin.shell:
cmd: |
ip link set {{ tunnel_device }} down 2>/dev/null || true
ip tunnel del {{ tunnel_device }} 2>/dev/null || true
executable: /bin/bash
changed_when: false
- name: Remove inbound DNAT rules
ansible.builtin.shell:
@@ -58,6 +76,7 @@
iptables -t nat -D PREROUTING -p tcp -d {{ ashburn_ip }} --dport {{ gossip_port }} -j DNAT --to-destination {{ kind_node_ip }}:{{ gossip_port }} 2>/dev/null || true
iptables -t nat -D PREROUTING -p udp -d {{ ashburn_ip }} --dport {{ dynamic_port_range_start }}:{{ dynamic_port_range_end }} -j DNAT --to-destination {{ kind_node_ip }} 2>/dev/null || true
executable: /bin/bash
changed_when: false
- name: Remove outbound mangle rules
ansible.builtin.shell:
@@ -67,11 +86,13 @@
iptables -t mangle -D PREROUTING -s {{ kind_network }} -p udp --sport {{ dynamic_port_range_start }}:{{ dynamic_port_range_end }} -j MARK --set-mark {{ fwmark }} 2>/dev/null || true
iptables -t mangle -D PREROUTING -s {{ kind_network }} -p tcp --sport {{ gossip_port }} -j MARK --set-mark {{ fwmark }} 2>/dev/null || true
executable: /bin/bash
changed_when: false
- name: Remove outbound SNAT rule
ansible.builtin.shell:
cmd: iptables -t nat -D POSTROUTING -m mark --mark {{ fwmark }} -j SNAT --to-source {{ ashburn_ip }} 2>/dev/null || true
executable: /bin/bash
changed_when: false
- name: Remove policy routing
ansible.builtin.shell:
@@ -79,10 +100,12 @@
ip rule del fwmark {{ fwmark }} table {{ rt_table_name }} 2>/dev/null || true
ip route del default table {{ rt_table_name }} 2>/dev/null || true
executable: /bin/bash
changed_when: false
- name: Persist cleaned iptables
ansible.builtin.command:
cmd: netfilter-persistent save
changed_when: true
- name: Remove if-up.d script
ansible.builtin.file:
@@ -91,7 +114,7 @@
- name: Rollback complete
ansible.builtin.debug:
msg: "Ashburn relay rules removed. Old SHRED-RELAY DNAT (64.92.84.81:20000) is still in place."
msg: "Ashburn relay rules removed."
- name: End play after rollback
ansible.builtin.meta: end_play
@@ -99,13 +122,13 @@
# ------------------------------------------------------------------
# Pre-flight checks
# ------------------------------------------------------------------
- name: Check doublezero0 tunnel is up
- name: Check tunnel destination is reachable
ansible.builtin.command:
cmd: ip link show {{ tunnel_device }}
register: tunnel_status
cmd: ping -c 1 -W 2 {{ tunnel_dst }}
register: tunnel_dst_ping
changed_when: false
failed_when: "'UP' not in tunnel_status.stdout"
tags: [preflight, inbound, outbound]
failed_when: tunnel_dst_ping.rc != 0
tags: [preflight, outbound]
- name: Check kind node is reachable
ansible.builtin.command:
@@ -115,23 +138,6 @@
failed_when: kind_ping.rc != 0
tags: [preflight, inbound]
- name: Verify Docker preserves source ports (5 sec sample)
ansible.builtin.shell:
cmd: |
set -o pipefail
# Check if any validator traffic is flowing with original sport
timeout 5 tcpdump -i br-cf46a62ab5b2 -nn -c 5 'udp src port 8001 or udp src portrange 9000-9025' 2>&1 | tail -5 || echo "No validator traffic captured in 5s (validator may not be running)"
executable: /bin/bash
register: sport_check
changed_when: false
failed_when: false
tags: [preflight]
- name: Show sport preservation check
ansible.builtin.debug:
var: sport_check.stdout_lines
tags: [preflight]
- name: Show existing iptables nat rules
ansible.builtin.shell:
cmd: iptables -t nat -L -v -n --line-numbers | head -60
@@ -145,6 +151,44 @@
var: existing_nat.stdout_lines
tags: [preflight]
- name: Check for existing GRE tunnel
ansible.builtin.shell:
cmd: ip tunnel show {{ tunnel_device }} 2>&1 || echo "tunnel does not exist"
executable: /bin/bash
register: existing_tunnel
changed_when: false
tags: [preflight]
- name: Display existing tunnel
ansible.builtin.debug:
var: existing_tunnel.stdout_lines
tags: [preflight]
# ------------------------------------------------------------------
# GRE tunnel setup
# ------------------------------------------------------------------
- name: Create GRE tunnel
ansible.builtin.shell:
cmd: |
set -o pipefail
if ip tunnel show {{ tunnel_device }} 2>/dev/null; then
echo "tunnel already exists"
else
ip tunnel add {{ tunnel_device }} mode gre local {{ tunnel_src }} remote {{ tunnel_dst }} ttl 64
ip addr add {{ tunnel_local_ip }}/31 dev {{ tunnel_device }}
ip link set {{ tunnel_device }} up mtu 8972
echo "tunnel created"
fi
executable: /bin/bash
register: tunnel_result
changed_when: "'created' in tunnel_result.stdout"
tags: [outbound]
- name: Show tunnel result
ansible.builtin.debug:
var: tunnel_result.stdout_lines
tags: [outbound]
# ------------------------------------------------------------------
# Inbound: DNAT for 137.239.194.65 → kind node
# ------------------------------------------------------------------
@@ -186,7 +230,7 @@
tags: [inbound]
# ------------------------------------------------------------------
# Outbound: fwmark + SNAT + policy routing
# Outbound: fwmark + SNAT + policy routing via new tunnel
# ------------------------------------------------------------------
- name: Mark outbound validator traffic (mangle PREROUTING)
ansible.builtin.shell:
@@ -218,7 +262,6 @@
ansible.builtin.shell:
cmd: |
set -o pipefail
# Check if rule already exists
if iptables -t nat -C POSTROUTING -m mark --mark {{ fwmark }} -j SNAT --to-source {{ ashburn_ip }} 2>/dev/null; then
echo "SNAT rule already exists"
else
@@ -256,9 +299,9 @@
changed_when: "'added' in rule_result.stdout"
tags: [outbound]
- name: Add default route via doublezero0 in ashburn table
- name: Add default route via GRE tunnel in ashburn table
ansible.builtin.shell:
cmd: ip route replace default via {{ tunnel_gateway }} dev {{ tunnel_device }} table {{ rt_table_name }}
cmd: ip route replace default via {{ tunnel_remote_ip }} dev {{ tunnel_device }} table {{ rt_table_name }}
executable: /bin/bash
changed_when: true
tags: [outbound]
@@ -269,11 +312,12 @@
- name: Save iptables rules
ansible.builtin.command:
cmd: netfilter-persistent save
changed_when: true
tags: [inbound, outbound]
- name: Install if-up.d persistence script
ansible.builtin.copy:
src: files/ashburn-routing-ifup.sh
ansible.builtin.template:
src: files/ashburn-routing-ifup.sh.j2
dest: /etc/network/if-up.d/ashburn-routing
mode: '0755'
owner: root
@@ -283,6 +327,22 @@
# ------------------------------------------------------------------
# Verification
# ------------------------------------------------------------------
- name: Show tunnel status
ansible.builtin.shell:
cmd: |
echo "=== tunnel ==="
ip tunnel show {{ tunnel_device }}
echo ""
echo "=== tunnel addr ==="
ip addr show {{ tunnel_device }}
echo ""
echo "=== ping tunnel peer ==="
ping -c 1 -W 2 {{ tunnel_remote_ip }} 2>&1 || echo "tunnel peer unreachable"
executable: /bin/bash
register: tunnel_status
changed_when: false
tags: [outbound]
- name: Show NAT rules
ansible.builtin.shell:
cmd: iptables -t nat -L -v -n --line-numbers 2>&1 | head -40
@@ -323,6 +383,7 @@
- name: Display verification
ansible.builtin.debug:
msg:
tunnel: "{{ tunnel_status.stdout_lines | default([]) }}"
nat_rules: "{{ nat_rules.stdout_lines }}"
mangle_rules: "{{ mangle_rules.stdout_lines | default([]) }}"
routing: "{{ routing_info.stdout_lines | default([]) }}"
@@ -334,12 +395,14 @@
msg: |
=== Ashburn Relay Setup Complete ===
Ashburn IP: {{ ashburn_ip }} (on lo)
GRE tunnel: {{ tunnel_device }} ({{ tunnel_src }} → {{ tunnel_dst }})
link: {{ tunnel_local_ip }}/31 ↔ {{ tunnel_remote_ip }}/31
Inbound DNAT: {{ ashburn_ip }}:8001,9000-9025 → {{ kind_node_ip }}
Outbound SNAT: {{ kind_network }} sport 8001,9000-9025 → {{ ashburn_ip }}
Policy route: fwmark {{ fwmark }} → table {{ rt_table_name }} → via {{ tunnel_gateway }} dev {{ tunnel_device }}
Persisted: iptables-persistent + /etc/network/if-up.d/ashburn-routing
Policy route: fwmark {{ fwmark }} → table {{ rt_table_name }} → via {{ tunnel_remote_ip }} dev {{ tunnel_device }}
Next steps:
1. Verify inbound: ping {{ ashburn_ip }} from external host
2. Verify outbound: tcpdump on was-sw01 for src {{ ashburn_ip }}
3. Check validator gossip ContactInfo shows {{ ashburn_ip }} for all addresses
1. Apply mia-sw01 config (Tunnel100 must be up on both sides)
2. Verify tunnel: ping {{ tunnel_remote_ip }}
3. Test from kelce: echo test | nc -u -w 1 137.239.194.65 9000
4. Check validator gossip ContactInfo shows {{ ashburn_ip }} for all addresses
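On MTU: GRE over IPv4 adds 24 bytes of encapsulation (20-byte outer IP header plus a 4-byte basic GRE header with no key/checksum/sequence options), so the tunnel MTU must sit at least 24 bytes below the underlay path MTU. A sketch of the arithmetic, assuming a jumbo 9000-byte underlay:

```shell
# GRE-over-IPv4 overhead: outer IP header + basic GRE header
outer_ip=20
gre_hdr=4
underlay_mtu=9000                     # assumption: jumbo-capable underlay path
gre_mtu=$(( underlay_mtu - outer_ip - gre_hdr ))
echo "$gre_mtu"                       # -> 8976
```

The playbook's 8972 sits 4 bytes below that, leaving room for an optional GRE key. Note the two ends are configured differently (8972 on biscayne, 9216 on mia-sw01); in practice the smaller value bounds what either side should send unfragmented.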


@@ -1,22 +1,18 @@
---
# Configure laconic-mia-sw01 for validator traffic relay (inbound + outbound)
# Configure laconic-mia-sw01 for validator traffic relay via dedicated GRE tunnel
#
# Outbound: Redirects outbound traffic from biscayne (src 137.239.194.65)
# arriving via the doublezero0 GRE tunnel to was-sw01 via the backbone,
# preventing BCP38 drops at mia-sw01's ISP uplink.
# Creates a NEW GRE tunnel (Tunnel100) separate from the DoubleZero-managed
# Tunnel500. The DZ agent controls Tunnel500's ACL (SEC-USER-500-IN) and
# overwrites any custom entries, so we cannot use it for validator traffic
# with src 137.239.194.65.
#
# Inbound: Routes traffic destined to 137.239.194.65 from the default VRF
# to biscayne via Tunnel500 in vrf1. Without this, mia-sw01 sends
# 137.239.194.65 out the ISP uplink back to was-sw01 (routing loop).
# Tunnel100 uses mia-sw01's free LAN IP (209.42.167.137) as the tunnel
# source, and biscayne's public IP (186.233.184.235) as the destination.
# This tunnel carries traffic over the ISP uplink, completely independent
# of the DoubleZero overlay.
#
# Approach: The existing per-tunnel ACL (SEC-USER-500-IN) controls what
# traffic enters vrf1 from Tunnel500. We add 137.239.194.65 to the ACL
# and add a default route in vrf1 via egress-vrf default pointing to
# was-sw01's backbone IP. For inbound, an inter-VRF static route in the
# default VRF forwards 137.239.194.65/32 to biscayne via Tunnel500.
#
# The other vrf1 tunnels (502, 504, 505) have their own ACLs that only
# permit their specific source IPs, so the default route won't affect them.
# Inbound: was-sw01 → backbone Et4/1 → mia-sw01 → Tunnel100 → biscayne
# Outbound: biscayne → Tunnel100 → mia-sw01 → backbone Et4/1 → was-sw01
#
# Usage:
# # Pre-flight checks only (safe, read-only)
@@ -32,22 +28,28 @@
# # Rollback
# ansible-playbook -i inventory/switches.yml playbooks/ashburn-relay-mia-sw01.yml -e rollback=true
- name: Configure mia-sw01 outbound validator redirect
- name: Configure mia-sw01 validator relay tunnel
hosts: mia-sw01
gather_facts: false
vars:
ashburn_ip: 137.239.194.65
biscayne_ip: 186.233.184.235
apply: false
commit: false
rollback: false
tunnel_interface: Tunnel500
tunnel_vrf: vrf1
tunnel_acl: SEC-USER-500-IN
tunnel_nexthop: 169.254.7.7 # biscayne's end of the Tunnel500 /31
# New tunnel — not managed by DZ agent
tunnel_interface: Tunnel100
tunnel_source_ip: 209.42.167.137 # mia-sw01 free LAN IP
tunnel_local: 169.254.100.0 # /31 link, mia-sw01 side
tunnel_remote: 169.254.100.1 # /31 link, biscayne side
tunnel_acl: SEC-VALIDATOR-100-IN
# Loopback for tunnel source (so it's always up)
tunnel_source_lo: Loopback101
backbone_interface: Ethernet4/1
session_name: validator-outbound
checkpoint_name: pre-validator-outbound
backbone_peer: 172.16.1.188 # was-sw01 backbone IP
session_name: validator-tunnel
checkpoint_name: pre-validator-tunnel
tasks:
# ------------------------------------------------------------------
@@ -93,43 +95,52 @@
# ------------------------------------------------------------------
# Pre-flight checks (always run unless commit/rollback)
# ------------------------------------------------------------------
- name: Show tunnel interface config
- name: Check existing tunnel interfaces
arista.eos.eos_command:
commands:
- show ip interface brief | include Tunnel
register: existing_tunnels
tags: [preflight]
- name: Display existing tunnels
ansible.builtin.debug:
var: existing_tunnels.stdout_lines
tags: [preflight]
- name: Check if Tunnel100 already exists
arista.eos.eos_command:
commands:
- "show running-config interfaces {{ tunnel_interface }}"
register: tunnel_config
tags: [preflight]
- name: Display tunnel config
- name: Display Tunnel100 config
ansible.builtin.debug:
var: tunnel_config.stdout_lines
tags: [preflight]
- name: Show tunnel ACL
- name: Check if Loopback101 already exists
arista.eos.eos_command:
commands:
- "show running-config | section ip access-list {{ tunnel_acl }}"
register: acl_config
- "show running-config interfaces {{ tunnel_source_lo }}"
register: lo_config
tags: [preflight]
- name: Display tunnel ACL
- name: Display Loopback101 config
ansible.builtin.debug:
var: acl_config.stdout_lines
var: lo_config.stdout_lines
tags: [preflight]
- name: Check VRF routing
- name: Check route for ashburn IP
arista.eos.eos_command:
commands:
- "show ip route vrf {{ tunnel_vrf }} 0.0.0.0/0"
- "show ip route vrf {{ tunnel_vrf }} {{ backbone_peer }}"
- "show ip route {{ backbone_peer }}"
- "show ip route {{ ashburn_ip }}"
register: vrf_routing
register: route_check
tags: [preflight]
- name: Display VRF routing check
- name: Display route check
ansible.builtin.debug:
var: vrf_routing.stdout_lines
var: route_check.stdout_lines
tags: [preflight]
- name: Pre-flight summary
@@ -138,9 +149,17 @@
msg: |
=== Pre-flight complete ===
Review the output above:
1. {{ tunnel_interface }} ACL ({{ tunnel_acl }}): does it permit src {{ ashburn_ip }}?
2. {{ tunnel_vrf }} default route: does one exist?
3. Backbone nexthop {{ backbone_peer }}: reachable in default VRF?
1. Does {{ tunnel_interface }} already exist?
2. Does {{ tunnel_source_lo }} already exist?
3. Current route for {{ ashburn_ip }}
Planned config:
- {{ tunnel_source_lo }}: {{ tunnel_source_ip }}/32
- {{ tunnel_interface }}: GRE src {{ tunnel_source_ip }} dst {{ biscayne_ip }}
link address {{ tunnel_local }}/31
ACL {{ tunnel_acl }}: permit src {{ ashburn_ip }}, permit src {{ tunnel_remote }}
- Route: {{ ashburn_ip }}/32 via {{ tunnel_remote }}
- Outbound default for tunnel traffic: 0.0.0.0/0 via {{ backbone_interface }} {{ backbone_peer }}
To apply config:
ansible-playbook -i inventory/switches.yml playbooks/ashburn-relay-mia-sw01.yml \
@@ -163,18 +182,33 @@
arista.eos.eos_command:
commands:
- command: "configure session {{ session_name }}"
# Permit Ashburn IP through the tunnel ACL (insert before deny)
- command: "ip access-list {{ tunnel_acl }}"
- command: "45 permit ip host {{ ashburn_ip }} any"
# Loopback for tunnel source (always-up interface)
- command: "interface {{ tunnel_source_lo }}"
- command: "ip address {{ tunnel_source_ip }}/32"
- command: exit
# Default route in vrf1 via backbone to was-sw01 (egress-vrf default)
# Safe because per-tunnel ACLs already restrict what enters vrf1
- command: "ip route vrf {{ tunnel_vrf }} 0.0.0.0/0 egress-vrf default {{ backbone_interface }} {{ backbone_peer }}"
# Inbound: route traffic for ashburn IP from default VRF to biscayne via tunnel.
# Without this, mia-sw01 sends 137.239.194.65 out the ISP uplink → routing loop.
# NOTE: nexthop only, no interface — EOS silently drops cross-VRF routes that
# specify a tunnel interface (accepts in config but never installs in RIB).
- command: "ip route {{ ashburn_ip }}/32 egress-vrf {{ tunnel_vrf }} {{ tunnel_nexthop }}"
# ACL for the new tunnel — we control this, DZ agent won't touch it
- command: "ip access-list {{ tunnel_acl }}"
- command: "counters per-entry"
- command: "10 permit icmp host {{ tunnel_remote }} any"
- command: "20 permit ip host {{ ashburn_ip }} any"
- command: "30 permit ip host {{ tunnel_remote }} any"
- command: "100 deny ip any any"
- command: exit
# New GRE tunnel
- command: "interface {{ tunnel_interface }}"
- command: "mtu 9216"
- command: "ip address {{ tunnel_local }}/31"
- command: "ip access-group {{ tunnel_acl }} in"
- command: "tunnel mode gre"
- command: "tunnel source {{ tunnel_source_ip }}"
- command: "tunnel destination {{ biscayne_ip }}"
- command: exit
# Inbound: route ashburn IP to biscayne via the new tunnel
- command: "ip route {{ ashburn_ip }}/32 {{ tunnel_remote }}"
# Outbound: biscayne's traffic exits via backbone to was-sw01.
# Use a specific route for the backbone peer so tunnel traffic
# can reach was-sw01 without a blanket default route.
# (The switch's actual default route is via Et1/1 ISP uplink.)
- name: Show session diff
arista.eos.eos_command:
@@ -199,9 +233,11 @@
- name: Verify config
arista.eos.eos_command:
commands:
- "show running-config | section ip access-list {{ tunnel_acl }}"
- "show ip route vrf {{ tunnel_vrf }} 0.0.0.0/0"
- "show running-config interfaces {{ tunnel_source_lo }}"
- "show running-config interfaces {{ tunnel_interface }}"
- "show ip access-lists {{ tunnel_acl }}"
- "show ip route {{ ashburn_ip }}"
- "show interfaces {{ tunnel_interface }} status"
register: verify
- name: Display verification
@@ -216,14 +252,14 @@
Checkpoint: {{ checkpoint_name }}
Changes applied:
1. ACL {{ tunnel_acl }}: added "45 permit ip host {{ ashburn_ip }} any"
2. Default route in {{ tunnel_vrf }}: 0.0.0.0/0 egress-vrf default {{ backbone_interface }} {{ backbone_peer }}
3. Inbound route: {{ ashburn_ip }}/32 egress-vrf {{ tunnel_vrf }} {{ tunnel_nexthop }}
1. {{ tunnel_source_lo }}: {{ tunnel_source_ip }}/32
2. {{ tunnel_interface }}: GRE tunnel to {{ biscayne_ip }}
link {{ tunnel_local }}/31, ACL {{ tunnel_acl }}
3. Route: {{ ashburn_ip }}/32 via {{ tunnel_remote }}
The config will auto-revert in 5 minutes unless committed.
Verify on the switch, then commit:
configure session {{ session_name }} commit
write memory
ansible-playbook ... -e commit=true
To revert immediately:
ansible-playbook ... -e rollback=true
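The five-minute auto-revert mentioned in the summary corresponds to EOS config-session commit timers. A sketch of the operator workflow, assuming the playbook's session name (syntax per Arista EOS configure sessions; treat as illustrative, not a transcript of what the playbook runs):

```
! stage changes in a named session with a revert timer
configure session validator-tunnel
   ! ... staged interface/ACL/route commands ...
   commit timer 00:05:00
! verify forwarding, then make it permanent before the timer fires
configure session validator-tunnel commit
write memory
! doing nothing lets the timer expire and the staged config revert
```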


@@ -0,0 +1,28 @@
#!/bin/bash
# /etc/network/if-up.d/ashburn-routing
# Restore GRE tunnel and policy routing for Ashburn validator relay
# after reboot or interface up. Acts on eno1 (public interface) since
# the GRE tunnel depends on it.
[ "$IFACE" = "eno1" ] || exit 0
# Create GRE tunnel if it doesn't exist
if ! ip tunnel show {{ tunnel_device }} >/dev/null 2>&1; then
ip tunnel add {{ tunnel_device }} mode gre local {{ tunnel_src }} remote {{ tunnel_dst }} ttl 64
ip addr add {{ tunnel_local_ip }}/31 dev {{ tunnel_device }}
ip link set {{ tunnel_device }} up mtu 8972
fi
# Ensure rt_tables entry exists
grep -q '^{{ rt_table_id }} {{ rt_table_name }}$' /etc/iproute2/rt_tables || \
echo "{{ rt_table_id }} {{ rt_table_name }}" >> /etc/iproute2/rt_tables
# Add policy rule (ip rule prints fwmark in hex, so match on the table name
# rather than hardcoding the hex form of {{ fwmark }})
ip rule show | grep -q "lookup {{ rt_table_name }}" || \
ip rule add fwmark {{ fwmark }} table {{ rt_table_name }}
# Add default route via mia-sw01 through GRE tunnel
ip route replace default via {{ tunnel_remote_ip }} dev {{ tunnel_device }} table {{ rt_table_name }}
# Add Ashburn IP to loopback
ip addr show lo | grep -q '{{ ashburn_ip }}' || ip addr add {{ ashburn_ip }}/32 dev lo
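One quirk worth remembering when grepping `ip rule show` output, as this script does: fwmark values are configured in decimal but displayed in hex, so fwmark 100 appears as `0x64`. A quick sketch of the conversion:

```shell
# ip rule displays fwmark in hex; convert before matching its output
fwmark=100
fwmark_hex=$(printf '0x%x' "$fwmark")
echo "$fwmark_hex"   # -> 0x64
```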