feat: dedicated GRE tunnel (Tunnel100) bypassing DZ-managed Tunnel500

Root cause: the doublezero-agent on mia-sw01 manages Tunnel500's ACL
(SEC-USER-500-IN) and drops outbound gossip with src 137.239.194.65.
The agent overwrites any custom ACL entries.

Fix: create a separate GRE tunnel (Tunnel100) using mia-sw01's free LAN IP
(209.42.167.137) as tunnel source. This tunnel goes over the ISP uplink,
completely independent of the DZ overlay:
- mia-sw01: Tunnel100 src 209.42.167.137, dst 186.233.184.235
- biscayne: gre-ashburn src 186.233.184.235, dst 209.42.167.137
- Link addresses: 169.254.100.0/31

Playbook changes:
- ashburn-relay-mia-sw01: Tunnel100 + Loopback101 + SEC-VALIDATOR-100-IN
- ashburn-relay-biscayne: gre-ashburn tunnel + updated policy routing
- New template: ashburn-routing-ifup.sh.j2 for boot persistence

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent 0b52fc99d7
commit 742e84e3b0
@ -1,85 +1,61 @@
# Bug: Ashburn Relay — 137.239.194.65 Not Routable from Public Internet
# Bug: Ashburn Relay — Outbound Gossip Dropped by DZ Agent ACL

## Summary

`--gossip-host 137.239.194.65` correctly advertises the Ashburn relay IP in
ContactInfo for all sockets (gossip, TVU, repair, TPU). However, 137.239.194.65
is a DoubleZero overlay IP (137.239.192.0/19, IS-IS only) that is NOT announced
via BGP to the public internet. Public peers cannot route to it, so TVU shreds,
repair requests, and TPU traffic never arrive at was-sw01.
ContactInfo for all sockets (gossip, TVU, repair, TPU). The inbound path
works end-to-end (proven with kelce UDP tests through every hop). However,
outbound gossip from biscayne (src 137.239.194.65) is dropped by the
DoubleZero agent's ACL on mia-sw01's Tunnel500, preventing ContactInfo from
propagating to the cluster. Peers never learn our TVU address.

## Evidence

- Gossip traffic arrives on `doublezero0` interface:
- Inbound path confirmed hop by hop (kelce → was-sw01 → mia-sw01 → Tunnel500
  → biscayne doublezero0 → DNAT → kind bridge → kind node eth0):
  ```
  doublezero0 In IP 64.130.58.70.8001 > 137.239.194.65.8001: UDP, length 132
  01:04:12.136633 IP 69.112.108.72.58856 > 172.20.0.2.9000: UDP, length 13
  ```
- Zero TVU/repair traffic arrives:
- Outbound gossip leaves biscayne correctly (src 137.239.194.65:8001 on
  doublezero0), enters mia-sw01 via Tunnel500, hits SEC-USER-500-IN ACL:
  ```
  tcpdump -i doublezero0 'dst host 137.239.194.65 and udp and not port 8001'
  0 packets captured
  60 deny ip any any [match 26355968 packets, 0:00:02 ago]
  ```
- ContactInfo correctly advertises all sockets on 137.239.194.65:
  ```json
  {
    "gossip": "137.239.194.65:8001",
    "tvu": "137.239.194.65:9000",
    "serveRepair": "137.239.194.65:9011",
    "tpu": "137.239.194.65:9002"
  }
  ```
- Outbound gossip from biscayne exits via `doublezero0` with source
  137.239.194.65 — SNAT and routing work correctly in the outbound direction.
  The ACL only permits src 186.233.184.235 and 169.254.7.7 — not 137.239.194.65.
- Validator not visible in public RPC getClusterNodes (gossip not propagating)
- Validator sees 775 nodes vs 5,045 on public RPC

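The kelce probes referenced above can be reproduced by hand. A minimal sketch, assuming the interface and bridge names used elsewhere in this report (`doublezero0`, `br-cf46a62ab5b2`) are still current; these are live-network diagnostics, not part of the playbooks:

```shell
# On kelce (external host): send a small UDP probe to the advertised TVU port
echo test | nc -u -w 1 137.239.194.65 9000

# On biscayne: watch the probe arrive on the tunnel interface...
tcpdump -ni doublezero0 'udp and dst host 137.239.194.65 and dst port 9000' -c 1

# ...and again after DNAT, on the kind bridge toward the kind node
tcpdump -ni br-cf46a62ab5b2 'udp and dst host 172.20.0.2 and dst port 9000' -c 1
```

If the first capture fires and the second does not, the DNAT rules (not the tunnel path) are the place to look.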
## Root Cause

**137.239.194.0/24 is not routable from the public internet.** The prefix
belongs to DoubleZero's overlay address space (137.239.192.0/19, Momentum
Telecom, WHOIS OriginAS: empty). It is advertised only via IS-IS within the
DoubleZero switch mesh. There is no eBGP session on was-sw01 to advertise it
to the ISP — all BGP peers are iBGP AS 65342 (DoubleZero internal).
The `doublezero-agent` daemon on mia-sw01 manages Tunnel500 and its ACL
(SEC-USER-500-IN). The agent periodically reconciles the ACL to its expected
state, overwriting any custom entries we add. We cannot modify the ACL
without the agent reverting it.

When the validator advertises `tvu: 137.239.194.65:9000` in ContactInfo,
public internet peers attempt to send turbine shreds to that IP, but the
packets have no route through the global BGP table to reach was-sw01. Only
DoubleZero-connected peers could potentially reach it via the overlay.
137.239.194.65 is from the was-sw01 LAN block (137.239.194.64/29), routed
by the ISP to was-sw01 via the WAN link. It IS publicly routable (confirmed
by kelce ping/UDP tests). The earlier hypothesis that it was unroutable was
wrong — the IP reaches was-sw01, gets forwarded to mia-sw01 via backbone,
and reaches biscayne through Tunnel500 (inbound ACL direction is fine).

The old shred relay pipeline worked because it used `--public-tvu-address
64.92.84.81:20000` — was-sw01's Et1/1 ISP uplink IP, which IS publicly
routable. The `--gossip-host 137.239.194.65` approach advertises a
DoubleZero-only IP for ALL sockets, making TVU/repair/TPU unreachable from
non-DoubleZero peers.
The problem is outbound only: the Tunnel500 ingress ACL (traffic FROM
biscayne TO mia-sw01) drops src 137.239.194.65.

The original hypothesis (ACL/PBR port filtering) was wrong. The tunnel and
switch routing work correctly — the problem is upstream: traffic never arrives
at was-sw01 in the first place.
## Fix

## Impact

Create a dedicated GRE tunnel (Tunnel100) between biscayne and mia-sw01
that bypasses the DZ-managed Tunnel500 entirely:

The validator cannot receive turbine shreds or serve repair requests via the
low-latency Ashburn path. It falls back to the Miami public IP (186.233.184.235)
for all shred/repair traffic, negating the benefit of `--gossip-host`.

- **mia-sw01 Tunnel100**: src 209.42.167.137 (free LAN IP), dst 186.233.184.235
  (biscayne), link 169.254.100.0/31, ACL SEC-VALIDATOR-100-IN (we control)
- **biscayne gre-ashburn**: src 186.233.184.235, dst 209.42.167.137,
  link 169.254.100.1/31

## Fix Options

Traffic flow unchanged except the tunnel:
- Inbound: was-sw01 → backbone → mia-sw01 → Tunnel100 → biscayne → DNAT → agave
- Outbound: agave → SNAT 137.239.194.65 → Tunnel100 → mia-sw01 → backbone → was-sw01
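With both ends configured, the new tunnel can be sanity-checked from biscayne. A minimal sketch, assuming the device and addresses listed above; `eno1` is the public interface the if-up.d persistence script keys on:

```shell
# GRE parameters of the new tunnel (local/remote endpoints, TTL)
ip -d link show gre-ashburn

# Reach mia-sw01's side of the 169.254.100.0/31 link
ping -c 3 169.254.100.0

# Confirm GRE-encapsulated packets leave via the ISP uplink toward mia-sw01
tcpdump -ni eno1 'proto gre and host 209.42.167.137' -c 5
```

A successful /31 ping plus GRE frames on `eno1` confirms the path is independent of the DZ overlay.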
1. **Use 64.92.84.81 (was-sw01 Et1/1) for ContactInfo sockets.** This is the
   publicly routable Ashburn IP. Requires `--gossip-host 64.92.84.81` (or
   equivalent `--bind-address` config) and DNAT/forwarding on was-sw01 to relay
   traffic through the backbone → mia-sw01 → Tunnel500 → biscayne. The old
   `--public-tvu-address` pipeline used this IP successfully.

2. **Get DoubleZero to announce 137.239.194.0/24 via eBGP to the ISP.** This
   would make the current `--gossip-host 137.239.194.65` setup work, but
   requires coordination with DoubleZero operations.

3. **Hybrid approach**: Use 64.92.84.81 for public-facing sockets (TVU, repair,
   TPU) and 137.239.194.65 for gossip (which works via DoubleZero overlay).
   Requires agave to support per-protocol address binding, which it does not
   (`--gossip-host` sets ALL sockets to the same IP).

## Previous Workaround

The old `--public-tvu-address` pipeline used socat + shred-unwrap.py to relay
shreds from 64.92.84.81:20000 to the validator. That pipeline is not persistent
across reboots and was superseded by the `--gossip-host` approach (which turned
out to be broken for non-DoubleZero peers).

See:
- `playbooks/ashburn-relay-mia-sw01.yml` (Tunnel100 + ACL + routes)
- `playbooks/ashburn-relay-biscayne.yml` (gre-ashburn + DNAT + SNAT + policy routing)
- `playbooks/ashburn-relay-was-sw01.yml` (static route, unchanged)
@ -2,7 +2,12 @@
# Configure biscayne for Ashburn validator relay
#
# Sets up inbound DNAT (137.239.194.65 → kind node) and outbound SNAT +
# policy routing (validator traffic → doublezero0 → mia-sw01 → was-sw01).
# policy routing (validator traffic → GRE tunnel → mia-sw01 → was-sw01).
#
# Uses a dedicated GRE tunnel to mia-sw01 (NOT the DoubleZero-managed
# doublezero0/Tunnel500). The tunnel source is biscayne's public IP
# (186.233.184.235) and the destination is mia-sw01's free LAN IP
# (209.42.167.137).
#
# Usage:
#   # Full setup (inbound + outbound)
@ -28,8 +33,12 @@
ashburn_ip: 137.239.194.65
kind_node_ip: 172.20.0.2
kind_network: 172.20.0.0/16
tunnel_gateway: 169.254.7.6
tunnel_device: doublezero0
# New dedicated GRE tunnel (not DZ-managed doublezero0)
tunnel_device: gre-ashburn
tunnel_local_ip: 169.254.100.1   # biscayne end of /31
tunnel_remote_ip: 169.254.100.0  # mia-sw01 end of /31
tunnel_src: 186.233.184.235      # biscayne public IP
tunnel_dst: 209.42.167.137       # mia-sw01 free LAN IP
fwmark: 100
rt_table_name: ashburn
rt_table_id: 100
@ -49,6 +58,15 @@
ansible.builtin.command:
cmd: ip addr del {{ ashburn_ip }}/32 dev lo
failed_when: false
changed_when: false

- name: Remove GRE tunnel
ansible.builtin.shell:
cmd: |
ip link set {{ tunnel_device }} down 2>/dev/null || true
ip tunnel del {{ tunnel_device }} 2>/dev/null || true
executable: /bin/bash
changed_when: false

- name: Remove inbound DNAT rules
ansible.builtin.shell:
@ -58,6 +76,7 @@
iptables -t nat -D PREROUTING -p tcp -d {{ ashburn_ip }} --dport {{ gossip_port }} -j DNAT --to-destination {{ kind_node_ip }}:{{ gossip_port }} 2>/dev/null || true
iptables -t nat -D PREROUTING -p udp -d {{ ashburn_ip }} --dport {{ dynamic_port_range_start }}:{{ dynamic_port_range_end }} -j DNAT --to-destination {{ kind_node_ip }} 2>/dev/null || true
executable: /bin/bash
changed_when: false

- name: Remove outbound mangle rules
ansible.builtin.shell:
@ -67,11 +86,13 @@
iptables -t mangle -D PREROUTING -s {{ kind_network }} -p udp --sport {{ dynamic_port_range_start }}:{{ dynamic_port_range_end }} -j MARK --set-mark {{ fwmark }} 2>/dev/null || true
iptables -t mangle -D PREROUTING -s {{ kind_network }} -p tcp --sport {{ gossip_port }} -j MARK --set-mark {{ fwmark }} 2>/dev/null || true
executable: /bin/bash
changed_when: false

- name: Remove outbound SNAT rule
ansible.builtin.shell:
cmd: iptables -t nat -D POSTROUTING -m mark --mark {{ fwmark }} -j SNAT --to-source {{ ashburn_ip }} 2>/dev/null || true
executable: /bin/bash
changed_when: false

- name: Remove policy routing
ansible.builtin.shell:
@ -79,10 +100,12 @@
ip rule del fwmark {{ fwmark }} table {{ rt_table_name }} 2>/dev/null || true
ip route del default table {{ rt_table_name }} 2>/dev/null || true
executable: /bin/bash
changed_when: false

- name: Persist cleaned iptables
ansible.builtin.command:
cmd: netfilter-persistent save
changed_when: true

- name: Remove if-up.d script
ansible.builtin.file:
@ -91,7 +114,7 @@

- name: Rollback complete
ansible.builtin.debug:
msg: "Ashburn relay rules removed. Old SHRED-RELAY DNAT (64.92.84.81:20000) is still in place."
msg: "Ashburn relay rules removed."

- name: End play after rollback
ansible.builtin.meta: end_play
@ -99,13 +122,13 @@
# ------------------------------------------------------------------
# Pre-flight checks
# ------------------------------------------------------------------
- name: Check doublezero0 tunnel is up
- name: Check tunnel destination is reachable
ansible.builtin.command:
cmd: ip link show {{ tunnel_device }}
register: tunnel_status
cmd: ping -c 1 -W 2 {{ tunnel_dst }}
register: tunnel_dst_ping
changed_when: false
failed_when: "'UP' not in tunnel_status.stdout"
tags: [preflight, inbound, outbound]
failed_when: tunnel_dst_ping.rc != 0
tags: [preflight, outbound]

- name: Check kind node is reachable
ansible.builtin.command:
@ -115,23 +138,6 @@
failed_when: kind_ping.rc != 0
tags: [preflight, inbound]

- name: Verify Docker preserves source ports (5 sec sample)
ansible.builtin.shell:
cmd: |
set -o pipefail
# Check if any validator traffic is flowing with original sport
timeout 5 tcpdump -i br-cf46a62ab5b2 -nn -c 5 'udp src port 8001 or udp src portrange 9000-9025' 2>&1 | tail -5 || echo "No validator traffic captured in 5s (validator may not be running)"
executable: /bin/bash
register: sport_check
changed_when: false
failed_when: false
tags: [preflight]

- name: Show sport preservation check
ansible.builtin.debug:
var: sport_check.stdout_lines
tags: [preflight]

- name: Show existing iptables nat rules
ansible.builtin.shell:
cmd: iptables -t nat -L -v -n --line-numbers | head -60
@ -145,6 +151,44 @@
var: existing_nat.stdout_lines
tags: [preflight]

- name: Check for existing GRE tunnel
ansible.builtin.shell:
cmd: ip tunnel show {{ tunnel_device }} 2>&1 || echo "tunnel does not exist"
executable: /bin/bash
register: existing_tunnel
changed_when: false
tags: [preflight]

- name: Display existing tunnel
ansible.builtin.debug:
var: existing_tunnel.stdout_lines
tags: [preflight]

# ------------------------------------------------------------------
# GRE tunnel setup
# ------------------------------------------------------------------
- name: Create GRE tunnel
ansible.builtin.shell:
cmd: |
set -o pipefail
if ip tunnel show {{ tunnel_device }} 2>/dev/null; then
echo "tunnel already exists"
else
ip tunnel add {{ tunnel_device }} mode gre local {{ tunnel_src }} remote {{ tunnel_dst }} ttl 64
ip addr add {{ tunnel_local_ip }}/31 dev {{ tunnel_device }}
ip link set {{ tunnel_device }} up mtu 8972
echo "tunnel created"
fi
executable: /bin/bash
register: tunnel_result
changed_when: "'created' in tunnel_result.stdout"
tags: [outbound]

- name: Show tunnel result
ansible.builtin.debug:
var: tunnel_result.stdout_lines
tags: [outbound]

# ------------------------------------------------------------------
# Inbound: DNAT for 137.239.194.65 → kind node
# ------------------------------------------------------------------
@ -186,7 +230,7 @@
tags: [inbound]

# ------------------------------------------------------------------
# Outbound: fwmark + SNAT + policy routing
# Outbound: fwmark + SNAT + policy routing via new tunnel
# ------------------------------------------------------------------
- name: Mark outbound validator traffic (mangle PREROUTING)
ansible.builtin.shell:
@ -218,7 +262,6 @@
ansible.builtin.shell:
cmd: |
set -o pipefail
# Check if rule already exists
if iptables -t nat -C POSTROUTING -m mark --mark {{ fwmark }} -j SNAT --to-source {{ ashburn_ip }} 2>/dev/null; then
echo "SNAT rule already exists"
else
@ -256,9 +299,9 @@
changed_when: "'added' in rule_result.stdout"
tags: [outbound]

- name: Add default route via doublezero0 in ashburn table
- name: Add default route via GRE tunnel in ashburn table
ansible.builtin.shell:
cmd: ip route replace default via {{ tunnel_gateway }} dev {{ tunnel_device }} table {{ rt_table_name }}
cmd: ip route replace default via {{ tunnel_remote_ip }} dev {{ tunnel_device }} table {{ rt_table_name }}
executable: /bin/bash
changed_when: true
tags: [outbound]
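The fwmark → SNAT → policy-route chain built by the tasks above can be verified step by step on biscayne. A short sketch, assuming the default values from the vars block (fwmark 100 = 0x64, table `ashburn`):

```shell
# Marked packets: mangle PREROUTING should show MARK set 0x64 for kind traffic
iptables -t mangle -L PREROUTING -v -n | grep '0x64'

# SNAT: marked packets rewritten to the Ashburn IP
iptables -t nat -L POSTROUTING -v -n | grep 137.239.194.65

# Routing: a marked packet should resolve via the GRE tunnel
ip route get 8.8.8.8 mark 100
ip route show table ashburn
```

Non-zero packet counters on the MARK and SNAT rules confirm the validator's traffic is actually matching, not just that the rules exist.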
|
@ -269,11 +312,12 @@
|
|||
- name: Save iptables rules
|
||||
ansible.builtin.command:
|
||||
cmd: netfilter-persistent save
|
||||
changed_when: true
|
||||
tags: [inbound, outbound]
|
||||
|
||||
- name: Install if-up.d persistence script
|
||||
ansible.builtin.copy:
|
||||
src: files/ashburn-routing-ifup.sh
|
||||
ansible.builtin.template:
|
||||
src: files/ashburn-routing-ifup.sh.j2
|
||||
dest: /etc/network/if-up.d/ashburn-routing
|
||||
mode: '0755'
|
||||
owner: root
|
||||
|
|
@ -283,6 +327,22 @@
# ------------------------------------------------------------------
# Verification
# ------------------------------------------------------------------
- name: Show tunnel status
ansible.builtin.shell:
cmd: |
echo "=== tunnel ==="
ip tunnel show {{ tunnel_device }}
echo ""
echo "=== tunnel addr ==="
ip addr show {{ tunnel_device }}
echo ""
echo "=== ping tunnel peer ==="
ping -c 1 -W 2 {{ tunnel_remote_ip }} 2>&1 || echo "tunnel peer unreachable"
executable: /bin/bash
register: tunnel_status
changed_when: false
tags: [outbound]

- name: Show NAT rules
ansible.builtin.shell:
cmd: iptables -t nat -L -v -n --line-numbers 2>&1 | head -40
@ -323,6 +383,7 @@
- name: Display verification
ansible.builtin.debug:
msg:
tunnel: "{{ tunnel_status.stdout_lines | default([]) }}"
nat_rules: "{{ nat_rules.stdout_lines }}"
mangle_rules: "{{ mangle_rules.stdout_lines | default([]) }}"
routing: "{{ routing_info.stdout_lines | default([]) }}"
@ -334,12 +395,14 @@
msg: |
=== Ashburn Relay Setup Complete ===
Ashburn IP: {{ ashburn_ip }} (on lo)
GRE tunnel: {{ tunnel_device }} ({{ tunnel_src }} → {{ tunnel_dst }})
link: {{ tunnel_local_ip }}/31 ↔ {{ tunnel_remote_ip }}/31
Inbound DNAT: {{ ashburn_ip }}:8001,9000-9025 → {{ kind_node_ip }}
Outbound SNAT: {{ kind_network }} sport 8001,9000-9025 → {{ ashburn_ip }}
Policy route: fwmark {{ fwmark }} → table {{ rt_table_name }} → via {{ tunnel_gateway }} dev {{ tunnel_device }}
Persisted: iptables-persistent + /etc/network/if-up.d/ashburn-routing
Policy route: fwmark {{ fwmark }} → table {{ rt_table_name }} → via {{ tunnel_remote_ip }} dev {{ tunnel_device }}

Next steps:
1. Verify inbound: ping {{ ashburn_ip }} from external host
2. Verify outbound: tcpdump on was-sw01 for src {{ ashburn_ip }}
3. Check validator gossip ContactInfo shows {{ ashburn_ip }} for all addresses
1. Apply mia-sw01 config (Tunnel100 must be up on both sides)
2. Verify tunnel: ping {{ tunnel_remote_ip }}
3. Test from kelce: echo test | nc -u -w 1 137.239.194.65 9000
4. Check validator gossip ContactInfo shows {{ ashburn_ip }} for all addresses
@ -1,22 +1,18 @@
---
# Configure laconic-mia-sw01 for validator traffic relay (inbound + outbound)
# Configure laconic-mia-sw01 for validator traffic relay via dedicated GRE tunnel
#
# Outbound: Redirects outbound traffic from biscayne (src 137.239.194.65)
# arriving via the doublezero0 GRE tunnel to was-sw01 via the backbone,
# preventing BCP38 drops at mia-sw01's ISP uplink.
# Creates a NEW GRE tunnel (Tunnel100) separate from the DoubleZero-managed
# Tunnel500. The DZ agent controls Tunnel500's ACL (SEC-USER-500-IN) and
# overwrites any custom entries, so we cannot use it for validator traffic
# with src 137.239.194.65.
#
# Inbound: Routes traffic destined to 137.239.194.65 from the default VRF
# to biscayne via Tunnel500 in vrf1. Without this, mia-sw01 sends
# 137.239.194.65 out the ISP uplink back to was-sw01 (routing loop).
# Tunnel100 uses mia-sw01's free LAN IP (209.42.167.137) as the tunnel
# source, and biscayne's public IP (186.233.184.235) as the destination.
# This tunnel carries traffic over the ISP uplink, completely independent
# of the DoubleZero overlay.
#
# Approach: The existing per-tunnel ACL (SEC-USER-500-IN) controls what
# traffic enters vrf1 from Tunnel500. We add 137.239.194.65 to the ACL
# and add a default route in vrf1 via egress-vrf default pointing to
# was-sw01's backbone IP. For inbound, an inter-VRF static route in the
# default VRF forwards 137.239.194.65/32 to biscayne via Tunnel500.
#
# The other vrf1 tunnels (502, 504, 505) have their own ACLs that only
# permit their specific source IPs, so the default route won't affect them.
# Inbound: was-sw01 → backbone Et4/1 → mia-sw01 → Tunnel100 → biscayne
# Outbound: biscayne → Tunnel100 → mia-sw01 → backbone Et4/1 → was-sw01
#
# Usage:
#   # Pre-flight checks only (safe, read-only)
@ -32,22 +28,28 @@
#   # Rollback
#   ansible-playbook -i inventory/switches.yml playbooks/ashburn-relay-mia-sw01.yml -e rollback=true

- name: Configure mia-sw01 outbound validator redirect
- name: Configure mia-sw01 validator relay tunnel
hosts: mia-sw01
gather_facts: false

vars:
ashburn_ip: 137.239.194.65
biscayne_ip: 186.233.184.235
apply: false
commit: false
rollback: false
tunnel_interface: Tunnel500
tunnel_vrf: vrf1
tunnel_acl: SEC-USER-500-IN
tunnel_nexthop: 169.254.7.7  # biscayne's end of the Tunnel500 /31
# New tunnel — not managed by DZ agent
tunnel_interface: Tunnel100
tunnel_source_ip: 209.42.167.137  # mia-sw01 free LAN IP
tunnel_local: 169.254.100.0   # /31 link, mia-sw01 side
tunnel_remote: 169.254.100.1  # /31 link, biscayne side
tunnel_acl: SEC-VALIDATOR-100-IN
# Loopback for tunnel source (so it's always up)
tunnel_source_lo: Loopback101
backbone_interface: Ethernet4/1
session_name: validator-outbound
checkpoint_name: pre-validator-outbound
backbone_peer: 172.16.1.188  # was-sw01 backbone IP
session_name: validator-tunnel
checkpoint_name: pre-validator-tunnel

tasks:
# ------------------------------------------------------------------
@ -93,43 +95,52 @@
# ------------------------------------------------------------------
# Pre-flight checks (always run unless commit/rollback)
# ------------------------------------------------------------------
- name: Show tunnel interface config
- name: Check existing tunnel interfaces
arista.eos.eos_command:
commands:
- show ip interface brief | include Tunnel
register: existing_tunnels
tags: [preflight]

- name: Display existing tunnels
ansible.builtin.debug:
var: existing_tunnels.stdout_lines
tags: [preflight]

- name: Check if Tunnel100 already exists
arista.eos.eos_command:
commands:
- "show running-config interfaces {{ tunnel_interface }}"
register: tunnel_config
tags: [preflight]

- name: Display tunnel config
- name: Display Tunnel100 config
ansible.builtin.debug:
var: tunnel_config.stdout_lines
tags: [preflight]

- name: Show tunnel ACL
- name: Check if Loopback101 already exists
arista.eos.eos_command:
commands:
- "show running-config | section ip access-list {{ tunnel_acl }}"
register: acl_config
- "show running-config interfaces {{ tunnel_source_lo }}"
register: lo_config
tags: [preflight]

- name: Display tunnel ACL
- name: Display Loopback101 config
ansible.builtin.debug:
var: acl_config.stdout_lines
var: lo_config.stdout_lines
tags: [preflight]

- name: Check VRF routing
- name: Check route for ashburn IP
arista.eos.eos_command:
commands:
- "show ip route vrf {{ tunnel_vrf }} 0.0.0.0/0"
- "show ip route vrf {{ tunnel_vrf }} {{ backbone_peer }}"
- "show ip route {{ backbone_peer }}"
- "show ip route {{ ashburn_ip }}"
register: vrf_routing
register: route_check
tags: [preflight]

- name: Display VRF routing check
- name: Display route check
ansible.builtin.debug:
var: vrf_routing.stdout_lines
var: route_check.stdout_lines
tags: [preflight]

- name: Pre-flight summary
@ -138,9 +149,17 @@
msg: |
=== Pre-flight complete ===
Review the output above:
1. {{ tunnel_interface }} ACL ({{ tunnel_acl }}): does it permit src {{ ashburn_ip }}?
2. {{ tunnel_vrf }} default route: does one exist?
3. Backbone nexthop {{ backbone_peer }}: reachable in default VRF?
1. Does {{ tunnel_interface }} already exist?
2. Does {{ tunnel_source_lo }} already exist?
3. Current route for {{ ashburn_ip }}

Planned config:
- {{ tunnel_source_lo }}: {{ tunnel_source_ip }}/32
- {{ tunnel_interface }}: GRE src {{ tunnel_source_ip }} dst {{ biscayne_ip }}
link address {{ tunnel_local }}/31
ACL {{ tunnel_acl }}: permit src {{ ashburn_ip }}, permit src {{ tunnel_remote }}
- Route: {{ ashburn_ip }}/32 via {{ tunnel_remote }}
- Outbound default for tunnel traffic: 0.0.0.0/0 via {{ backbone_interface }} {{ backbone_peer }}

To apply config:
ansible-playbook -i inventory/switches.yml playbooks/ashburn-relay-mia-sw01.yml \
@ -163,18 +182,33 @@
arista.eos.eos_command:
commands:
- command: "configure session {{ session_name }}"
# Permit Ashburn IP through the tunnel ACL (insert before deny)
- command: "ip access-list {{ tunnel_acl }}"
- command: "45 permit ip host {{ ashburn_ip }} any"
# Loopback for tunnel source (always-up interface)
- command: "interface {{ tunnel_source_lo }}"
- command: "ip address {{ tunnel_source_ip }}/32"
- command: exit
# Default route in vrf1 via backbone to was-sw01 (egress-vrf default)
# Safe because per-tunnel ACLs already restrict what enters vrf1
- command: "ip route vrf {{ tunnel_vrf }} 0.0.0.0/0 egress-vrf default {{ backbone_interface }} {{ backbone_peer }}"
# Inbound: route traffic for ashburn IP from default VRF to biscayne via tunnel.
# Without this, mia-sw01 sends 137.239.194.65 out the ISP uplink → routing loop.
# NOTE: nexthop only, no interface — EOS silently drops cross-VRF routes that
# specify a tunnel interface (accepts in config but never installs in RIB).
- command: "ip route {{ ashburn_ip }}/32 egress-vrf {{ tunnel_vrf }} {{ tunnel_nexthop }}"
# ACL for the new tunnel — we control this, DZ agent won't touch it
- command: "ip access-list {{ tunnel_acl }}"
- command: "counters per-entry"
- command: "10 permit icmp host {{ tunnel_remote }} any"
- command: "20 permit ip host {{ ashburn_ip }} any"
- command: "30 permit ip host {{ tunnel_remote }} any"
- command: "100 deny ip any any"
- command: exit
# New GRE tunnel
- command: "interface {{ tunnel_interface }}"
- command: "mtu 9216"
- command: "ip address {{ tunnel_local }}/31"
- command: "ip access-group {{ tunnel_acl }} in"
- command: "tunnel mode gre"
- command: "tunnel source {{ tunnel_source_ip }}"
- command: "tunnel destination {{ biscayne_ip }}"
- command: exit
# Inbound: route ashburn IP to biscayne via the new tunnel
- command: "ip route {{ ashburn_ip }}/32 {{ tunnel_remote }}"
# Outbound: biscayne's traffic exits via backbone to was-sw01.
# Use a specific route for the backbone peer so tunnel traffic
# can reach was-sw01 without a blanket default route.
# (The switch's actual default route is via Et1/1 ISP uplink.)

- name: Show session diff
arista.eos.eos_command:
@ -199,9 +233,11 @@
- name: Verify config
arista.eos.eos_command:
commands:
- "show running-config | section ip access-list {{ tunnel_acl }}"
- "show ip route vrf {{ tunnel_vrf }} 0.0.0.0/0"
- "show running-config interfaces {{ tunnel_source_lo }}"
- "show running-config interfaces {{ tunnel_interface }}"
- "show ip access-lists {{ tunnel_acl }}"
- "show ip route {{ ashburn_ip }}"
- "show interfaces {{ tunnel_interface }} status"
register: verify

- name: Display verification
@ -216,14 +252,14 @@
Checkpoint: {{ checkpoint_name }}

Changes applied:
1. ACL {{ tunnel_acl }}: added "45 permit ip host {{ ashburn_ip }} any"
2. Default route in {{ tunnel_vrf }}: 0.0.0.0/0 egress-vrf default {{ backbone_interface }} {{ backbone_peer }}
3. Inbound route: {{ ashburn_ip }}/32 egress-vrf {{ tunnel_vrf }} {{ tunnel_nexthop }}
1. {{ tunnel_source_lo }}: {{ tunnel_source_ip }}/32
2. {{ tunnel_interface }}: GRE tunnel to {{ biscayne_ip }}
link {{ tunnel_local }}/31, ACL {{ tunnel_acl }}
3. Route: {{ ashburn_ip }}/32 via {{ tunnel_remote }}

The config will auto-revert in 5 minutes unless committed.
Verify on the switch, then commit:
configure session {{ session_name }} commit
write memory
ansible-playbook ... -e commit=true

To revert immediately:
ansible-playbook ... -e rollback=true
@ -0,0 +1,28 @@
#!/bin/bash
# /etc/network/if-up.d/ashburn-routing
# Restore GRE tunnel and policy routing for Ashburn validator relay
# after reboot or interface up. Acts on eno1 (public interface) since
# the GRE tunnel depends on it.

[ "$IFACE" = "eno1" ] || exit 0

# Create GRE tunnel if it doesn't exist
if ! ip tunnel show {{ tunnel_device }} 2>/dev/null; then
    ip tunnel add {{ tunnel_device }} mode gre local {{ tunnel_src }} remote {{ tunnel_dst }} ttl 64
    ip addr add {{ tunnel_local_ip }}/31 dev {{ tunnel_device }}
    ip link set {{ tunnel_device }} up mtu 8972
fi

# Ensure rt_tables entry exists
grep -q '^{{ rt_table_id }} {{ rt_table_name }}$' /etc/iproute2/rt_tables || \
    echo "{{ rt_table_id }} {{ rt_table_name }}" >> /etc/iproute2/rt_tables

# Add policy rule
ip rule show | grep -q 'fwmark 0x64 lookup {{ rt_table_name }}' || \
    ip rule add fwmark {{ fwmark }} table {{ rt_table_name }}

# Add default route via mia-sw01 through GRE tunnel
ip route replace default via {{ tunnel_remote_ip }} dev {{ tunnel_device }} table {{ rt_table_name }}

# Add Ashburn IP to loopback
ip addr show lo | grep -q '{{ ashburn_ip }}' || ip addr add {{ ashburn_ip }}/32 dev lo
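Since ifupdown passes the interface name to if-up.d scripts via the `IFACE` environment variable, the rendered script can be exercised by hand after deployment. A quick sketch (run on biscayne as root):

```shell
# Simulate the hook firing for the public interface
sudo IFACE=eno1 /etc/network/if-up.d/ashburn-routing

# Verify what it restored
ip tunnel show gre-ashburn
ip rule show | grep ashburn
ip route show table ashburn
ip addr show lo | grep 137.239.194.65
```

This is the same code path a reboot exercises, so a clean manual run is a reasonable proxy for boot persistence.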