stack-orchestrator/docs/ashburn-relay-checklist.md

169 lines
5.4 KiB
Markdown
Raw Normal View History

# Ashburn Relay / ip_echo Port Reachability Checklist
The validator exits when it can't verify UDP ports (8001, 9000, 9002, 9003) are
reachable from entrypoint servers. The ip_echo protocol: validator TCP-connects
to entrypoint on port 8001, entrypoint sees source IP, sends UDP probes back to
that IP on the validator's ports. If probes don't arrive, validator crashes.
## Layer 1: Biscayne outbound path
Validator's outbound ip_echo TCP (dport 8001) must exit via GRE tunnel so
entrypoints see `137.239.194.65`, not biscayne's real IP via Docker MASQUERADE.
```
[ ] 1.1 Mangle rules (4 rules in mangle PREROUTING):
- udp sport 8001 (gossip outbound)
- udp sport 9000:9025 (TVU/repair outbound)
- tcp sport 8001 (gossip TCP outbound)
- tcp dport 8001 (ip_echo outbound — THE CRITICAL ONE)
[ ] 1.2 SNAT rule at position 1 (before Docker MASQUERADE):
POSTROUTING -m mark --mark 100 -j SNAT --to-source 137.239.194.65
[ ] 1.3 Policy routing rule:
fwmark 0x64 lookup ashburn
[ ] 1.4 Ashburn routing table default route:
default via 169.254.100.0 dev gre-ashburn
[ ] 1.5 Mangle counters incrementing (pkts/bytes on tcp dport 8001 rule)
```
## Layer 2: GRE tunnel (biscayne ↔ mia-sw01)
```
[ ] 2.1 Tunnel exists and UP:
gre-ashburn with 169.254.100.1/31
[ ] 2.2 Tunnel peer reachable:
ping 169.254.100.0
[ ] 2.3 Ashburn IP on loopback:
137.239.194.65/32 dev lo
```
## Layer 3: Biscayne inbound path (DNAT + DOCKER-USER)
Entrypoint UDP probes arrive at `137.239.194.65` and must reach kind node
`172.20.0.2`.
```
[ ] 3.1 DNAT rules at position 1 in nat PREROUTING
(before Docker's ADDRTYPE LOCAL rule):
- udp dport 8001 → 172.20.0.2:8001
- tcp dport 8001 → 172.20.0.2:8001
- udp dport 9000:9025 → 172.20.0.2
[ ] 3.2 DOCKER-USER ACCEPT rules (3 rules):
- udp dport 8001 → ACCEPT
- tcp dport 8001 → ACCEPT
- udp dport 9000:9025 → ACCEPT
[ ] 3.3 DNAT counters incrementing
```
## Layer 4: mia-sw01
```
[ ] 4.1 Tunnel100 UP in VRF relay
src 209.42.167.137, dst 186.233.184.235, link 169.254.100.0/31
[ ] 4.2 VRF relay default route:
0.0.0.0/0 egress-vrf default 172.16.1.188
[ ] 4.3 Default VRF route to relay IP:
137.239.194.65/32 egress-vrf relay 169.254.100.1
[ ] 4.4 ACL SEC-VALIDATOR-100-IN permits all needed traffic
[ ] 4.5 Backbone Et4/1 UP (172.16.1.189/31)
```
## Layer 5: was-sw01
```
[ ] 5.1 Static route: 137.239.194.65/32 via 172.16.1.189
[ ] 5.2 Backbone Et4/1 UP (172.16.1.188/31)
[ ] 5.3 No Loopback101 (removed to avoid absorbing traffic locally)
```
## Layer 6: Persistence
```
[ ] 6.1 ashburn-relay.service enabled and active (runs After=docker.service)
[ ] 6.2 /usr/local/sbin/ashburn-relay-setup.sh exists
```
## Layer 7: End-to-end tests
All tests run via Ansible playbooks. The test scripts in `scripts/` are
utilities invoked by the playbooks — never run them manually via SSH.
```
[ ] 7.1 relay-test-tcp-dport.py (via ashburn-relay-check.yml or ad-hoc play)
Tests: outbound tcp dport 8001 mangle → SNAT → tunnel
Pass: entrypoint sees 137.239.194.65
Fail: entrypoint sees 186.233.184.235 (Docker MASQUERADE)
[ ] 7.2 relay-test-ip-echo.py (via ashburn-relay-check.yml or ad-hoc play)
Tests: FULL END-TO-END (outbound SNAT + inbound DNAT + DOCKER-USER)
Pass: UDP probe received from entrypoint
Fail: no UDP probes — inbound path broken
[ ] 7.3 relay-inbound-udp-test.yml (cross-inventory: biscayne + kelce)
Tests: inbound UDP from external host → DNAT → kind node
Pass: UDP arrives in kind netns
```
## Playbooks
```bash
# Read-only check of all relay state (biscayne + both switches):
ansible-playbook -i inventory-switches/switches.yml \
-i inventory/biscayne.yml playbooks/ashburn-relay-check.yml
# Apply all biscayne relay rules (idempotent):
ansible-playbook -i inventory/biscayne.yml playbooks/ashburn-relay-biscayne.yml
# Apply outbound only (the ip_echo fix):
ansible-playbook -i inventory/biscayne.yml \
playbooks/ashburn-relay-biscayne.yml -t outbound
# Apply inbound only (DNAT + DOCKER-USER):
ansible-playbook -i inventory/biscayne.yml \
playbooks/ashburn-relay-biscayne.yml -t inbound
# Apply mia-sw01 config:
ansible-playbook -i inventory-switches/switches.yml \
playbooks/ashburn-relay-mia-sw01.yml
# Apply was-sw01 config:
ansible-playbook -i inventory-switches/switches.yml \
playbooks/ashburn-relay-was-sw01.yml
# Cross-inventory inbound UDP test (biscayne + kelce):
ansible-playbook -i inventory/biscayne.yml -i inventory/kelce.yml \
playbooks/relay-inbound-udp-test.yml
```
## Historical root causes
1. **TCP dport 8001 mangle rule missing** — ip_echo TCP exits via Docker
MASQUERADE, entrypoint sees wrong IP, UDP probes go to wrong address.
2. **DOCKER-USER ACCEPT rules missing** — DNAT'd traffic hits Docker's FORWARD
DROP policy, never reaches kind node.
3. **DNAT rule position wrong** — Docker's `ADDRTYPE LOCAL` rule in PREROUTING
catches traffic to loopback IPs before our DNAT rules. Must use `-I
PREROUTING 1`.
4. **mia-sw01 egress-vrf route with interface specified** — silently fails in
EOS (accepted in config, never installed in RIB). Must use nexthop-only form.
5. **was-sw01 Loopback101 absorbing traffic** — local delivery instead of
forwarding to mia-sw01 via backbone.