# Post-Mortem: Ashburn Relay Outbound Path Failure

**Date resolved**: 2026-03-10
**Duration of impact**: Unknown, likely since firewalld was enabled (post-reboot
2026-03-09 ~21:24 UTC). The relay worked before this with firewalld disabled.
**Symptoms**: Validator CrashLoopBackOff on ip_echo port reachability check.
Entrypoint never receives the validator's outbound TCP connection, so it can't
verify UDP port reachability and the validator refuses to start.

## Timeline

### Session d02959a7 (2026-03-06 to 2026-03-08)

Initial relay infrastructure build-out. Multi-day effort across three repos.

1. **Validator deployed**, replaying at 0.24 slots/sec. RTT between Miami and
   peers (~150ms per repair round-trip) identified as the bottleneck. Ashburn
   relay identified as the fix.

2. **GRE tunnel created** (gre-ashburn: biscayne 186.233.184.235 ↔ mia-sw01
   209.42.167.137). Tunnel100 on mia-sw01 in VRF relay. Policy routing with
   fwmark 0x64 routes validator traffic through the tunnel.

3. **Inbound path debugged end-to-end**:
   - Cross-VRF routing on mia-sw01 investigated (egress-vrf route form, hardware
     FIB programming, TCAM profile).
   - GRE decapsulation on biscayne verified (kernel source read to understand
     ip_tunnel_lookup matching logic).
   - **DOCKER chain drop rule found**: Docker's FORWARD chain only had ACCEPT
     for TCP 6443/443/80. DNAT'd relay UDP was dropped. Fix: DOCKER-USER
     ACCEPT rules for UDP 8001 and 9000-9025.
   - Inbound UDP relay test passed (kelce → was-sw01 → mia-sw01 → Tunnel100 →
     biscayne → DNAT → kind node).

4. **Outbound path partially verified**: Relay test scripts confirmed TCP and
   UDP traffic from the kind container exits via gre-ashburn with correct SNAT.
   But the **validator's own ip_echo check was never end-to-end verified** with
   a successful startup. The validator entered CrashLoopBackOff after the
   DOCKER-USER fix for unrelated reasons (monitoring container crashes, log path
   issues).

5. **Ashburn relay checklist** written at `docs/ashburn-relay-checklist.md`:
   7 layers covering the full path. All items remained unchecked.

### Session 0b5908a4 (2026-03-09)

Container rebuild, graceful shutdown implementation, ZFS upgrade, storage
migration. The validator was **running and catching up from a ~5,649 slot gap**,
confirming the relay was working. Then:

- io_uring/ZFS deadlock from ungraceful shutdown (ZFS 2.2.2, fixed in 2.2.8+)
- Reboot required to clear zombie processes
- **Firewalld was enabled/started on the reboot** (previously disabled)

### Session cc6c8c55 (2026-03-10, this session)

User asked to review session d02959a7 to confirm the ip_echo problem was
actually solved. It wasn't.

1. **ip_echo preflight tool written** (`scripts/agave-container/ip_echo_preflight.py`):
   reimplements the Solana ip_echo client protocol in Python, called from
   `entrypoint.py` before snapshot download. Tested successfully against live
   entrypoints from the host.

2. **Tested from kind netns**: TCP to entrypoint:8001 returns "No route to
   host". Mangle PREROUTING counter increments (marking works) but SNAT
   POSTROUTING counter stays at 0 (packets never reach POSTROUTING).

3. **Misdiagnoses**:
   - `src_valid_mark=0` suspected as root cause. Set to 1, no change. The
     `ip route get X from Y mark Z` command was misleading: it simulates
     locally-originated traffic, not forwarded. The correct test is
     `ip route get X from Y iif <iface> mark Z`, which showed routing works.
   - Firewalld nftables backend not setting `src_valid_mark` was a red herring.

4. **Root cause found**: Firewalld's nftables `filter_FORWARD` chain (priority
   filter+10) rejects forwarded traffic between interfaces not in known zones.
   Docker bridges and gre-ashburn were not in any firewalld zone. The chain's
   `filter_FORWARD_POLICIES` only had rules for eno1, eno2, and mesh.
   Traffic from br-cf46a62ab5b2 to gre-ashburn fell through to
   `reject with icmpx admin-prohibited`.

```
# The reject that was killing outbound relay traffic:
chain filter_FORWARD {
    ...
    jump filter_FORWARD_POLICIES
    reject with icmpx admin-prohibited  # packets from unknown interfaces
}
```

5. **Fix applied**:
   - Docker bridges (br-cf46a62ab5b2, docker0, br-4fb6f6795448) → `docker` zone
   - gre-ashburn → `trusted` zone
   - New `docker-to-relay` policy: docker → trusted, ACCEPT
   - All permanent (`firewall-cmd --permanent` + reload)

6. **Verified**: ip_echo from kind netns returns `seen_ip=137.239.194.65
   shred_version=50093`. Full outbound path works.

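For readers unfamiliar with the ip_echo exchange referenced in steps 1 and 6, the following is a minimal sketch of the message framing. It is not the code in `ip_echo_preflight.py`. The wire layout (a 4-byte zero header, two fixed arrays of four little-endian u16 ports, a trailing newline, and a bincode-style response carrying an address and an optional shred version) is an assumption based on the upstream Agave `net-utils` crate and should be checked against the version in use. The function names are hypothetical.

```python
import ipaddress
import struct

HEADER_LENGTH = 4  # assumption: four zero bytes precede request and response
MAX_PORTS = 4      # assumption: each port array holds exactly four u16 slots


def build_request(tcp_ports=(), udp_ports=()):
    """Frame a request: zero header, tcp u16[4], udp u16[4], newline."""
    def pad(ports):
        ports = list(ports)
        if len(ports) > MAX_PORTS:
            raise ValueError("at most 4 ports per message")
        return ports + [0] * (MAX_PORTS - len(ports))

    body = struct.pack("<4H", *pad(tcp_ports)) + struct.pack("<4H", *pad(udp_ports))
    return bytes(HEADER_LENGTH) + body + b"\n"


def parse_response(buf):
    """Return (seen_ip, shred_version or None) from a response buffer."""
    if len(buf) < HEADER_LENGTH + 4 or buf[:HEADER_LENGTH] != bytes(HEADER_LENGTH):
        raise ValueError("unexpected header; is this an ip_echo server?")
    payload = buf[HEADER_LENGTH:]
    (variant,) = struct.unpack_from("<I", payload, 0)  # 0 = IPv4, 1 = IPv6
    if variant == 0:
        addr_len, addr_type = 4, ipaddress.IPv4Address
    elif variant == 1:
        addr_len, addr_type = 16, ipaddress.IPv6Address
    else:
        raise ValueError(f"unknown address variant {variant}")
    offset = 4
    if len(payload) < offset + addr_len:
        raise ValueError("truncated address")
    seen_ip = str(addr_type(payload[offset:offset + addr_len]))
    offset += addr_len
    shred_version = None
    # Optional u16: one tag byte (1 = present) followed by the value.
    if len(payload) >= offset + 3 and payload[offset] == 1:
        (shred_version,) = struct.unpack_from("<H", payload, offset + 1)
    return seen_ip, shred_version


if __name__ == "__main__":
    # Synthetic response built from the values reported in step 6.
    sample = (bytes(HEADER_LENGTH) + struct.pack("<I", 0)
              + bytes([137, 239, 194, 65]) + b"\x01" + struct.pack("<H", 50093))
    print(parse_response(sample.ljust(27, b"\x00")))
```

Keeping framing and parsing as pure functions lets the same code be exercised without a socket, which is useful when the network path itself is the thing under suspicion.
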
## Root Cause

**Firewalld was enabled on biscayne after a reboot. Its nftables FORWARD chain
rejected forwarded traffic from Docker bridges to gre-ashburn because neither
interface was assigned to a firewalld zone.**

The relay worked before because firewalld was disabled. The iptables rules
(mangle marks, SNAT, DNAT, DOCKER-USER) operated without interference. When
firewalld was enabled, its nftables filter_FORWARD chain (priority filter+10)
added a second layer of forwarding policy enforcement that the iptables rules
couldn't bypass.

### Why Docker outbound to the internet still worked

Docker's outbound traffic to eno1 was accepted by firewalld because eno1 IS in
the `public` zone. The `filter_FWD_public_allow` chain has `oifname "eno1"
accept`. Only traffic to gre-ashburn (not in any zone) was rejected.

### Why iptables rules alone weren't enough

Linux netfilter processes hooks in priority order. At the FORWARD hook:

1. **Priority filter (0)**: iptables `FORWARD` chain, containing Docker's
   DOCKER-USER and DOCKER-FORWARD chains. These accept the traffic.
2. **Priority filter+10**: nftables `filter_FORWARD` chain, containing
   firewalld's zone policies. These reject the traffic if interfaces aren't in
   known zones.

Both chains must accept for the packet to pass. An accept verdict only ends
evaluation of the current base chain; the packet still traverses later base
chains on the same hook. The iptables acceptance at priority 0 therefore does
not protect the packet from the nftables rejection at priority filter+10.

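The evaluation order described above can be illustrated with a toy model. This is a sketch of the semantics only, not of the real rulesets: the chain names and interface names come from this document, while `BaseChain`, `forward`, and the idea of reducing firewalld's zone policies to a set of allowed egress interfaces are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Callable

ACCEPT, REJECT = "accept", "reject"


@dataclass
class BaseChain:
    name: str
    priority: int
    verdict: Callable[[str, str], str]  # (iif, oif) -> ACCEPT or REJECT


def forward(chains, iif, oif):
    """Walk base chains in ascending priority; the first non-accept is final."""
    for chain in sorted(chains, key=lambda c: c.priority):
        if chain.verdict(iif, oif) != ACCEPT:
            return f"{REJECT} by {chain.name}"
    return ACCEPT


def docker_forward(iif, oif):
    # Stand-in for DOCKER-USER / DOCKER-FORWARD, which accepted relay traffic.
    return ACCEPT


def firewalld_forward(allowed_oifs):
    # Simplified stand-in for filter_FORWARD: accept known egress interfaces,
    # fall through to reject for everything else.
    def verdict(iif, oif):
        return ACCEPT if oif in allowed_oifs else REJECT
    return verdict


def ruleset(allowed_oifs):
    return [
        BaseChain("firewalld filter_FORWARD", 10, firewalld_forward(allowed_oifs)),
        BaseChain("iptables FORWARD", 0, docker_forward),
    ]


before = ruleset({"eno1", "eno2", "mesh"})
after = ruleset({"eno1", "eno2", "mesh", "gre-ashburn"})

print(forward(before, "br-cf46a62ab5b2", "eno1"))         # accept
print(forward(before, "br-cf46a62ab5b2", "gre-ashburn"))  # reject by firewalld filter_FORWARD
print(forward(after, "br-cf46a62ab5b2", "gre-ashburn"))   # accept
```

The model shows why no change to the priority 0 chain could have fixed the problem: the packet was already accepted there, and the only chain returning a non-accept verdict was the one at filter+10.
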
## Architecture After Fix

Firewalld manages forwarding policy. Iptables handles Docker-specific rules
that firewalld can't replace (DNAT ordering, DOCKER-USER chain, mangle marks,
SNAT). Both coexist because they operate at different netfilter priorities.

```
Firewalld (permanent, survives reboots):
  docker zone: br-cf46a62ab5b2, docker0, br-4fb6f6795448
  trusted zone: mesh, gre-ashburn
  docker-forwarding policy: ANY → docker, ACCEPT (existing)
  docker-to-relay policy: docker → trusted, ACCEPT (new)

Systemd service (ashburn-relay.service, After=docker+firewalld):
  GRE tunnel creation (iproute2)
  Ashburn IP on loopback (iproute2)
  DNAT rules at PREROUTING position 1 (iptables, before Docker's chain)
  DOCKER-USER ACCEPT rules (iptables, for Docker's FORWARD chain)
  Mangle marks for policy routing (iptables)
  SNAT for marked traffic (iptables)
  ip rule + ip route for ashburn table (iproute2)
```

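The firewalld half of this layout can be expressed as a short command list. The exact commands run during the incident are not recorded here, so the following is a sketch of what the zone and policy assignments would typically look like with `firewall-cmd`, assuming a firewalld version that supports policy objects. The helper name `firewalld_fix_commands` is hypothetical, and the script only prints the commands rather than executing them.

```python
import shlex


def firewalld_fix_commands(bridges, tunnel, policy="docker-to-relay"):
    """Return argv lists for the permanent zone and policy configuration."""
    perm = ["firewall-cmd", "--permanent"]
    # Docker bridges go into the docker zone, the tunnel into trusted.
    cmds = [perm + ["--zone=docker", f"--add-interface={b}"] for b in bridges]
    cmds.append(perm + ["--zone=trusted", f"--add-interface={tunnel}"])
    # Policy object: docker -> trusted, ACCEPT.
    cmds.append(perm + [f"--new-policy={policy}"])
    for opt in ("--add-ingress-zone=docker",
                "--add-egress-zone=trusted",
                "--set-target=ACCEPT"):
        cmds.append(perm + [f"--policy={policy}", opt])
    # Permanent changes take effect only after a reload.
    cmds.append(["firewall-cmd", "--reload"])
    return cmds


if __name__ == "__main__":
    bridges = ["br-cf46a62ab5b2", "docker0", "br-4fb6f6795448"]
    for argv in firewalld_fix_commands(bridges, "gre-ashburn"):
        print(shlex.join(argv))
```

Generating the commands as data makes it easy to review them, or to feed them into the playbook, before anything touches the live firewall.
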
## Lessons

1. **Firewalld with nftables backend and Docker iptables coexist but don't
   coordinate.** Adding an interface that Docker uses to forward traffic
   requires explicitly assigning it to a firewalld zone. Docker's iptables
   ACCEPT is necessary but not sufficient.

2. **`ip route get X from Y mark Z` is misleading for forwarded traffic.**
   It simulates local origination and fails on source address validation. Use
   `ip route get X from Y iif <iface> mark Z` to simulate forwarded packets.
   This wasted significant debugging time.

3. **SNAT counter = 0 means packets die before POSTROUTING, but the cause
   could be in either the routing decision OR a filter chain between PREROUTING
   and POSTROUTING.** The nftables filter_FORWARD chain was invisible when only
   checking iptables rules.

4. **The validator passed ip_echo and ran successfully before.** That prior
   success was the strongest evidence that the infrastructure was correct and
   something changed. The change was firewalld being enabled.

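The difference between the two `ip route get` forms in lesson 2 is only the `iif` argument. A small helper that builds both forms makes that explicit. This is an illustrative sketch: `route_get_cmd` is a hypothetical name, the destination and source addresses are placeholders, and only the mark (0x64) and bridge name are taken from this document.

```python
import shlex


def route_get_cmd(dst, src, mark, iif=None):
    """Build an `ip route get` argv; pass iif to simulate a forwarded packet."""
    argv = ["ip", "route", "get", dst, "from", src]
    if iif is not None:
        # With iif, the lookup treats the packet as arriving on that interface
        # instead of as locally originated.
        argv += ["iif", iif]
    argv += ["mark", hex(mark)]
    return argv


if __name__ == "__main__":
    dst, src = "198.51.100.10", "172.18.0.2"  # placeholder addresses
    # Misleading for this case: simulates locally originated traffic.
    print(shlex.join(route_get_cmd(dst, src, 0x64)))
    # Correct for this case: simulates a packet forwarded from the bridge.
    print(shlex.join(route_get_cmd(dst, src, 0x64, iif="br-cf46a62ab5b2")))
```
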
## Related Documents

- `docs/ashburn-relay-checklist.md`: 7-layer checklist for relay verification
- `docs/bug-ashburn-tunnel-port-filtering.md`: prior DOCKER chain drop bug
- `.claude/skills/biscayne-relay-debugging/SKILL.md`: debugging skill
- `playbooks/ashburn-relay-biscayne.yml`: migrated playbook (firewalld + iptables)
- `scripts/agave-container/ip_echo_preflight.py`: preflight diagnostic tool

## Related Sessions

- `d02959a7-2ec6-4d27-8326-1bc4aaf3ebf1` (2026-03-06): Initial relay build,
  DOCKER-USER fix, inbound path verified, outbound not end-to-end verified
- `0b5908a4-eff7-46de-9024-a11440bd68a8` (2026-03-09): Relay working (validator
  catching up), then reboot introduced firewalld
- `cc6c8c55-fb4c-4482-b161-332ddf175300` (2026-03-10): Root cause found and
  fixed (firewalld zone assignment)