fix: ashburn relay playbooks and document DZ tunnel ACL root cause

Playbook fixes from testing:
- ashburn-relay-biscayne: insert DNAT rules at position 1 before
  Docker's ADDRTYPE LOCAL rule (was being swallowed at position 3+)
- ashburn-relay-mia-sw01: add inbound route for 137.239.194.65 via
  egress-vrf vrf1 (nexthop only, no interface — EOS silently drops
  cross-VRF routes that specify a tunnel interface)
- ashburn-relay-was-sw01: replace PBR with static route, remove
  Loopback101

Bug doc (bug-ashburn-tunnel-port-filtering.md): root cause is the
DoubleZero agent on mia-sw01 overwrites SEC-USER-500-IN ACL, dropping
outbound gossip with src 137.239.194.65. The DZ agent controls
Tunnel500's lifecycle. Fix requires a separate GRE tunnel using
mia-sw01's free LAN IP (209.42.167.137) to bypass DZ infrastructure.

Also adds all repo docs, scripts, inventory, and remaining playbooks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix/kind-mount-propagation
A. F. Dudley 2026-03-07 01:44:25 +00:00
parent 6841d5e3c3
commit 0b52fc99d7
41 changed files with 40587 additions and 134 deletions

.gitignore vendored 100644

@@ -0,0 +1,3 @@
.venv/
sessions.duckdb
sessions.duckdb.wal

CLAUDE.md 100644

@@ -0,0 +1,204 @@
# Biscayne Agave Runbook
## Cluster Operations
### Shutdown Order
The agave validator runs inside a kind-based k8s cluster managed by `laconic-so`.
The kind node is a Docker container. **Never restart or kill the kind node container
while the validator is running.** Agave uses `io_uring` for async I/O, and on ZFS,
killing the process can produce unkillable kernel threads (D-state in
`io_wq_put_and_exit` blocked on ZFS transaction commits). This deadlocks the
container's PID namespace, making `docker stop`, `docker restart`, `docker exec`,
and even `reboot` hang.
Correct shutdown sequence:
1. Scale the deployment to 0 and wait for the pod to terminate:
```
kubectl scale deployment laconic-70ce4c4b47e23b85-deployment \
-n laconic-laconic-70ce4c4b47e23b85 --replicas=0
kubectl wait --for=delete pod -l app=laconic-70ce4c4b47e23b85-deployment \
-n laconic-laconic-70ce4c4b47e23b85 --timeout=120s
```
2. Only then restart the kind node if needed:
```
docker restart laconic-70ce4c4b47e23b85-control-plane
```
3. Scale back up:
```
kubectl scale deployment laconic-70ce4c4b47e23b85-deployment \
-n laconic-laconic-70ce4c4b47e23b85 --replicas=1
```
### Ramdisk
The accounts directory must be on a ramdisk for performance. `/dev/ram0` loses its
filesystem on reboot and must be reformatted before mounting.
**Boot ordering is handled by systemd units** (installed by `biscayne-boot.yml`):
- `format-ramdisk.service`: runs `mkfs.xfs -f /dev/ram0` before `local-fs.target`
- fstab entry: mounts `/dev/ram0` at `/srv/solana/ramdisk` with
`x-systemd.requires=format-ramdisk.service`
- `ramdisk-accounts.service`: creates `/srv/solana/ramdisk/accounts` and sets
ownership after the mount
These units run before docker, so the kind node's bind mounts always see the
ramdisk. **No manual intervention is needed after reboot.**
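Roughly, the units amount to the following sketch (a hypothetical rendering — the real units are installed by `biscayne-boot.yml`; the paths and names here just mirror the list above):
```
# format-ramdisk.service — reformat /dev/ram0 before local filesystems mount
[Unit]
DefaultDependencies=no
Before=local-fs.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/mkfs.xfs -f /dev/ram0

[Install]
WantedBy=local-fs.target
```
with an fstab line along the lines of:
```
/dev/ram0  /srv/solana/ramdisk  xfs  defaults,x-systemd.requires=format-ramdisk.service  0 0
```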
**Mount propagation**: The kind node bind-mounts `/srv/kind` → `/mnt`. Because
the ramdisk is mounted at `/srv/solana/ramdisk` and symlinked/overlaid through
`/srv/kind/solana/ramdisk`, mount propagation makes it visible inside the kind
node at `/mnt/solana/ramdisk` without restarting the kind node. **Do NOT restart
the kind node just to pick up a ramdisk mount.**
### KUBECONFIG
kubectl must be told where the kubeconfig is when running as root or via ansible:
```
KUBECONFIG=/home/rix/.kube/config kubectl ...
```
The ansible playbooks set `environment: KUBECONFIG: /home/rix/.kube/config`.
### SSH Agent
SSH to biscayne goes through a ProxyCommand jump host (abernathy.ch2.vaasl.io).
The SSH agent socket rotates when the user reconnects. Find the current one:
```
ls -t /tmp/ssh-*/agent.* | head -1
```
Then export it:
```
export SSH_AUTH_SOCK=/tmp/ssh-XXXX/agent.NNNN
```
### io_uring/ZFS Deadlock — Root Cause
When agave-validator is killed while performing I/O against ZFS-backed paths (not
the ramdisk), io_uring worker threads get stuck in D-state:
```
io_wq_put_and_exit → dsl_dir_tempreserve_space (ZFS module)
```
These threads are unkillable (SIGKILL has no effect on D-state processes). They
prevent the container's PID namespace from being reaped (`zap_pid_ns_processes`
waits forever), which breaks `docker stop`, `docker restart`, `docker exec`, and
even `reboot`. The only fix is a hard power cycle.
**Prevention**: Always scale the deployment to 0 and wait for the pod to terminate
before any destructive operation (namespace delete, kind restart, host reboot).
The `biscayne-stop.yml` playbook enforces this.
### laconic-so Architecture
`laconic-so` manages kind clusters atomically — `deployment start` creates the
kind cluster, namespace, PVs, PVCs, and deployment in one shot. There is no way
to create the cluster without deploying the pod.
Key code paths in stack-orchestrator:
- `deploy_k8s.py:up()` — creates everything atomically
- `cluster_info.py:get_pvs()` — translates host paths using `kind-mount-root`
- `helpers_k8s.py:get_kind_pv_bind_mount_path()` — strips `kind-mount-root`
prefix and prepends `/mnt/`
- `helpers_k8s.py:_generate_kind_mounts()` — when `kind-mount-root` is set,
emits a single `/srv/kind` → `/mnt` mount instead of individual mounts
The `kind-mount-root: /srv/kind` setting in `spec.yml` means all data volumes
whose host paths start with `/srv/kind` get translated to `/mnt/...` inside the
kind node via a single bind mount.
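The translation is just a prefix swap; a standalone sketch (not the actual stack-orchestrator code — the variable names are illustrative):
```
# Mimic get_kind_pv_bind_mount_path(): strip the kind-mount-root prefix, prepend /mnt
kind_mount_root="/srv/kind"
host_path="/srv/kind/solana/ledger"
echo "/mnt${host_path#"$kind_mount_root"}"   # prints /mnt/solana/ledger
```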
### Key Identifiers
- Kind cluster: `laconic-70ce4c4b47e23b85`
- Namespace: `laconic-laconic-70ce4c4b47e23b85`
- Deployment: `laconic-70ce4c4b47e23b85-deployment`
- Kind node container: `laconic-70ce4c4b47e23b85-control-plane`
- Deployment dir: `/srv/deployments/agave`
- Snapshot dir: `/srv/solana/snapshots`
- Ledger dir: `/srv/solana/ledger`
- Accounts dir: `/srv/solana/ramdisk/accounts`
- Log dir: `/srv/solana/log`
- Host bind mount root: `/srv/kind` -> kind node `/mnt`
- laconic-so: `/home/rix/.local/bin/laconic-so` (editable install)
### PV Mount Paths (inside kind node)
| PV Name | hostPath |
|----------------------|-------------------------------|
| validator-snapshots | /mnt/solana/snapshots |
| validator-ledger | /mnt/solana/ledger |
| validator-accounts | /mnt/solana/ramdisk/accounts |
| validator-log | /mnt/solana/log |
### Snapshot Freshness
If the snapshot is more than **20,000 slots behind** the current mainnet tip, it is
too old. Stop the validator, download a fresh snapshot, and restart. Do NOT let it
try to catch up from an old snapshot — it will take too long and may never converge.
Check with:
```
# Snapshot slot (from filename)
ls /srv/solana/snapshots/snapshot-*.tar.*
# Current mainnet slot
curl -s -X POST -H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"getSlot","params":[{"commitment":"finalized"}]}' \
https://api.mainnet-beta.solana.com
```
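Extracting the snapshot slot from the filename can be scripted; a sketch assuming the usual `snapshot-<slot>-<hash>.tar.zst` naming:
```
# Slot of the newest local snapshot (second dash-separated field of the name)
snap=$(ls -t /srv/solana/snapshots/snapshot-*.tar.* 2>/dev/null | head -1)
snap_slot=$(basename "$snap" | cut -d- -f2)
echo "snapshot slot: $snap_slot"
# Subtract from the getSlot result above; more than 20000 behind means stale
```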
### Snapshot Leapfrog Recovery
When the validator is stuck in a repair-dependent gap (incomplete shreds from a
relay outage or insufficient turbine coverage), "grinding through" doesn't work.
At 0.4 slots/sec replay through incomplete blocks vs 2.5 slots/sec chain
production, the gap grows faster than it shrinks.
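The arithmetic behind that claim, using the rates above (expressed as integer slots per 10 seconds to avoid float noise; pure illustration):
```
# replay 0.4 slots/s = 4 per 10 s; chain 2.5 slots/s = 25 per 10 s
awk 'BEGIN {
  growth10 = 25 - 4                          # slots the gap widens every 10 s
  printf "gap grows %.1f slots/sec (+%d slots/hour)\n", growth10 / 10, growth10 * 360
}'
```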
**Strategy**: Download a fresh snapshot whose slot lands *past* the incomplete zone,
into the range where turbine+relay shreds are accumulating in the blockstore.
**Keep the existing ledger** — it has those shreds. The validator replays from
local blockstore data instead of waiting on repair.
**Steps**:
1. Let the validator run — turbine+relay accumulate shreds at the tip
2. Monitor shred completeness at the tip:
`scripts/check-shred-completeness.sh 500`
3. When there's a contiguous run of complete blocks (>100 slots), note the
starting slot of that run
4. Scale to 0, wipe accounts (ramdisk), wipe old snapshots
5. **Do NOT wipe ledger** — it has the turbine shreds
6. Download a fresh snapshot (its slot should be within the complete run)
7. Scale to 1 — validator replays from local blockstore at 3-5 slots/sec
**Why this works**: Turbine delivers ~60% of shreds in real-time. Repair fills
the rest for recent slots quickly (peers prioritize recent data). The only
problem is repair for *old* slots (minutes/hours behind) which peers deprioritize.
By snapshotting past the gap, we skip the old-slot repair bottleneck entirely.
### Shred Relay (Ashburn)
The TVU shred relay from laconic-was-sw01 provides ~4,000 additional shreds/sec.
Without it, turbine alone delivers ~60% of blocks. With it, completeness improves
but still requires repair for full coverage.
**Current state**: Old pipeline (monitor session + socat + shred-unwrap.py).
The traffic-policy redirect was never committed (auto-revert after 5 min timer).
See `docs/tvu-shred-relay.md` for the traffic-policy config that needs to be
properly applied.
**Boot dependency**: `shred-unwrap.py` must be running on biscayne for the old
pipeline to work. It is NOT persistent across reboots. The iptables DNAT rule
for the new pipeline IS persistent (iptables-persistent installed).
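For reference, the persistent DNAT insert for the new pipeline looks roughly like this (a hedged sketch — the exact port range and kind-node IP are assumptions consistent with this runbook, not copied from the live host):
```
# Insert at position 1 so the rule precedes Docker's ADDRTYPE --dst-type LOCAL
# rule in nat PREROUTING, which would otherwise swallow the traffic
iptables -t nat -I PREROUTING 1 -d 137.239.194.65 -p udp --dport 9000:9025 \
  -j DNAT --to-destination 172.20.0.2
netfilter-persistent save   # iptables-persistent replays this at boot
```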
### Redeploy Flow
See `playbooks/biscayne-redeploy.yml`. The scale-to-0 pattern is required because
`laconic-so` creates the cluster and deploys the pod atomically:
1. Delete namespace (teardown)
2. Optionally wipe data
3. `laconic-so deployment start` (creates cluster + pod)
4. Immediately scale to 0
5. Download snapshot via aria2c
6. Scale to 1
7. Verify

README.md 100644

@@ -0,0 +1,3 @@
# biscayne-agave-runbook
Ansible playbooks for operating the kind-based agave-stack deployment on biscayne.vaasl.io.

ansible.cfg 100644

@@ -0,0 +1,13 @@
[defaults]
inventory = inventory/
stdout_callback = ansible.builtin.default
result_format = yaml
callbacks_enabled = profile_tasks
retry_files_enabled = false
[privilege_escalation]
become = true
become_method = sudo
[ssh_connection]
pipelining = true


@@ -0,0 +1,114 @@
# Arista EOS Reference Notes
Collected from live switch CLI (`?` help) and Arista documentation search
results. Switch platform: 7280CR3A, EOS 4.34.0F.
## PBR (Policy-Based Routing)
EOS uses `policy-map type pbr` — NOT `traffic-policy` (which is a different
feature for ASIC-level traffic policies, not available on all platforms/modes).
### Syntax
```
! ACL to match traffic
ip access-list <ACL-NAME>
10 permit <proto> <src> <dst> [ports]
! Class-map referencing the ACL
class-map type pbr match-any <CLASS-NAME>
match ip access-group <ACL-NAME>
! Policy-map with nexthop redirect
policy-map type pbr <POLICY-NAME>
class <CLASS-NAME>
set nexthop <A.B.C.D> ! direct nexthop IP
set nexthop recursive <A.B.C.D> ! recursive resolution
! set nexthop-group <NAME> ! nexthop group
! set ttl <value> ! TTL override
! Apply on interface
interface <INTF>
service-policy type pbr input <POLICY-NAME>
```
### PBR `set` options (from CLI `?`)
```
set ?
nexthop Next hop IP address for forwarding
nexthop-group next hop group name
ttl TTL effective with nexthop/nexthop-group
```
```
set nexthop ?
A.B.C.D next hop IP address
A:B:C:D:E:F:G:H next hop IPv6 address
recursive Enable Recursive Next hop resolution
```
**No VRF qualifier on `set nexthop`.** The nexthop must be reachable in the
VRF where the policy is applied. For cross-VRF PBR, use a static inter-VRF
route to make the nexthop reachable (see below).
## Static Inter-VRF Routes
Source: [EOS 4.34.0F - Static Inter-VRF Route](https://www.arista.com/en/um-eos/eos-static-inter-vrf-route)
Allows configuring a static route in one VRF with a nexthop evaluated in a
different VRF. Uses the `egress-vrf` keyword.
### Syntax
```
ip route vrf <ingress-vrf> <prefix>/<mask> egress-vrf <egress-vrf> <nexthop-ip>
ip route vrf <ingress-vrf> <prefix>/<mask> egress-vrf <egress-vrf> <interface>
```
### Examples (from Arista docs)
```
! Route in vrf1 with nexthop resolved in default VRF
ip route vrf vrf1 1.0.1.0/24 egress-vrf default 1.0.0.2
! show ip route vrf vrf1 output:
! S 1.0.1.0/24 [1/0] via 1.0.0.2, Vlan2180 (egress VRF default)
```
### Key points
- For bidirectional traffic, static inter-VRF routes must be configured in
both VRFs.
- ECMP next-hop sets across same or heterogeneous egress VRFs are supported.
- The `show ip route vrf` output displays the egress VRF name when it differs
from the source VRF.
## Inter-VRF Local Route Leaking
Source: [EOS 4.35.1F - Inter-VRF Local Route Leaking](https://www.arista.com/en/um-eos/eos-inter-vrf-local-route-leaking)
An alternative to static inter-VRF routes that leaks routes dynamically from
one VRF (source) to another VRF (destination) on the same router.
## Config Sessions
```
configure session <name> ! enter named session
show session-config diffs ! MUST be run from inside the session
commit timer HH:MM:SS ! commit with auto-revert timer
abort ! discard session
```
From enable mode:
```
configure session <name> commit ! finalize a pending session
```
## Checkpoints and Rollback
```
configure checkpoint save <name>
rollback running-config checkpoint <name>
write memory
```

File diff suppressed because it is too large


@@ -0,0 +1,181 @@
<!-- Source: https://www.arista.com/um-eos/eos-ingress-and-egress-per-port-for-ipv4-and-ipv6-counters -->
<!-- Scraped: 2026-03-06T20:50:41.080Z -->
# Ingress and Egress Per-Port for IPv4 and IPv6 Counters
This feature supports per-interface ingress and egress packet and byte counters for IPv4
and IPv6.
This section describes Ingress and Egress per-port for IPv4 and IPv6 counters, including
configuration instructions and command descriptions.
Topics covered by this chapter include:
- Configuration
- Show commands
- Dedicated ARP Entry for TX IPv4 and IPv6 Counters
- Considerations
## Configuration
IPv4 and IPv6 ingress counters (which count **bridged and routed**
traffic and are supported only on front-panel ports) can be enabled and disabled
using the **hardware counter feature ip in**
command:
```
[no] hardware counter feature ip in
```
For IPv4 and IPv6 ingress and egress counters that include only
**routed** traffic (supported on Layer3 interfaces such as
routed ports and L3 subinterfaces only), use the following commands:
Note: The DCS-7300X, DCS-7250X, DCS-7050X, and DCS-7060X platforms
do not require configuration for IPv4 and IPv6 packet counters for only routed
traffic. They are collected by default. Other platforms (DCS-7280SR, DCS-7280CR, and
DCS-7500-R) need the feature enabled.
```
[no] hardware counter feature ip in layer3
```
```
[no] hardware counter feature ip out layer3
```
### hardware counter feature ip
Use the **hardware counter feature ip** command to enable ingress
and egress counters at Layer 3. The **no** and **default** forms of the command
disable the feature. The feature is enabled by default.
**Command Mode**
Configuration mode
**Command Syntax**
**hardware counter feature ip in|out layer3**
**no hardware counter feature ip in|out layer3**
**default hardware counter feature ip in|out layer3**
**Example**
This example enables ingress and egress ip counters for Layer 3.
```
switch(config)# hardware counter feature ip in layer3
```
```
switch(config)# hardware counter feature ip out layer3
```
## Show commands
Use the [**show interfaces counters ip**](/um-eos/eos-ethernet-ports#xzx_RbdvgrfI6B) command to
display IPv4, IPv6 packets, and octets.
**Example**
```
switch# show interfaces counters ip
Interface IPv4InOctets IPv4InPkts IPv6InOctets IPv6InPkts
Et1/1 0 0 0 0
Et1/2 0 0 0 0
Et1/3 0 0 0 0
Et1/4 0 0 0 0
...
Interface IPv4OutOctets IPv4OutPkts IPv6OutOctets IPv6OutPkts
Et1/1 0 0 0 0
Et1/2 0 0 0 0
Et1/3 0 0 0 0
Et1/4 0 0 0 0
...
```
You can also query the output from the **show interfaces counters
ip** command through SNMP via the ARISTA-IP-MIB.
To clear the IPv4 or IPv6 counters, use the [**clear
counters**](/um-eos/eos-ethernet-ports#topic_dnd_1nm_vnb) command.
**Example**
```
switch# clear counters
```
## Dedicated ARP Entry for TX IPv4 and IPv6 Counters
IPv4/IPv6 egress Layer 3 (**hardware counter feature ip out layer3**)
counting on the DCS-7280SR, DCS-7280CR, and DCS-7500-R platforms works based on the
ARP entry of the next hop. By default, the IPv4 next-hop and the IPv6 next-hop resolve
to the same MAC address and interface, sharing one ARP entry.
To differentiate the counters between IPv4 and IPv6, disable
**arp** entry sharing with the following command:
```
ip hardware fib next-hop arp dedicated
```
Note: This command is required for IPv4 and IPv6 egress counters
to operate on the DCS-7280SR, DCS-7280CR, and DCS-7500-R platforms.
## Considerations
- Packet sizes greater than 9236 bytes are not counted by per-port IPv4 and IPv6 counters.
- Only the DCS-7260X3, DCS-7368, DCS-7300, DCS-7050SX3, DCS-7050CX3, DCS-7280SR,
DCS-7280CR and DCS-7500-R platforms support the **hardware counter feature ip in** command.
- Only the DCS-7280SR, DCS-7280CR and DCS-7500-R platforms support the **hardware counter feature ip [in|out] layer3** command.


@@ -0,0 +1,305 @@
<!-- Source: https://www.arista.com/en/um-eos/eos-inter-vrf-local-route-leaking -->
<!-- Scraped: 2026-03-06T20:43:28.363Z -->
# Inter-VRF Local Route Leaking
Inter-VRF local route leaking allows the leaking of routes from one VRF (the source VRF) to
another VRF (the destination VRF) on the same router.
Inter-VRF routes can exist in any VRF (including the
default VRF) on the system. Routes can be leaked using the
following methods:
- Inter-VRF Local Route Leaking using BGP VPN
- Inter-VRF Local Route Leaking using VRF-leak Agent
## Inter-VRF Local Route Leaking using BGP VPN
Inter-VRF local route leaking allows the user to export and import routes from one VRF to another
on the same device. This is implemented by exporting routes from a VRF to the local VPN table
using the route target extended community list and importing the same route target extended
community lists from the local VPN table into the target VRF. VRF route leaking is supported
on VPN-IPv4, VPN-IPv6, and EVPN types.
Figure 1. Inter-VRF Local Route Leaking using Local VPN Table
### Accessing Shared Resources Across VPNs
To access shared resources across VPNs, all the routes from the shared services VRF must be
leaked into each of the VPN VRFs, and customer routes must be leaked into the shared
services VRF for return traffic. Accessing shared resources allows the route target of the
shared services VRF to be exported into all customer VRFs, and allows the shared services
VRF to import route targets from customers A and B. The following figure shows how to
provide customers, corresponding to multiple VPN domains, access to services like DHCP
available in the shared VRF.
Route leaking across the VRFs is supported
on VPN-IPv4, VPN-IPv6, and EVPN.
Figure 2. Accessing Shared Resources Across VPNs
### Configuring Inter-VRF Local Route Leaking
Inter-VRF local route leaking is configured using VPN-IPv4, VPN-IPv6, and EVPN. Prefixes can be
exported and imported using any of the configured VPN types. Ensure that the same VPN
type that is exported is used while importing.
Leaking unicast IPv4 or IPv6 prefixes is supported and achieved by exporting prefixes locally to
the VPN table and importing locally from the VPN table into the target VRF on the same
device as shown in the figure titled **Inter-VRF Local Route Leaking using Local VPN
Table** using the **route-target** command.
Exporting or importing the routes to or from the EVPN table is accomplished with the following
two methods:
- Using VXLAN for encapsulation
- Using MPLS for encapsulation
#### Using VXLAN for Encapsulation
To use VXLAN encapsulation type, make sure that VRF to VNI mapping is present and the interface
status for the VXLAN interface is up. This is the default encapsulation type for
EVPN.
**Example**
The configuration for VXLAN encapsulation type is as
follows:
```
switch(config)# router bgp 65001
switch(config-router-bgp)# address-family evpn
switch(config-router-bgp-af)# neighbor default encapsulation VXLAN next-hop-self source-interface Loopback0
switch(config)# hardware tcam
switch(config-hw-tcam)# system profile VXLAN-routing
switch(config-hw-tcam)# interface VXLAN1
switch(config-hw-tcam-if-Vx1)# VXLAN source-interface Loopback0
switch(config-hw-tcam-if-Vx1)# VXLAN udp-port 4789
switch(config-hw-tcam-if-Vx1)# VXLAN vrf vrf-blue vni 20001
switch(config-hw-tcam-if-Vx1)# VXLAN vrf vrf-red vni 10001
```
#### Using MPLS for Encapsulation
To use the MPLS encapsulation type to export
to the EVPN table, MPLS needs to be enabled globally on the device, and
the encapsulation method needs to be changed from the default type
(VXLAN) to MPLS under the EVPN address-family sub-mode.
**Example**
```
switch(config)# router bgp 65001
switch(config-router-bgp)# address-family evpn
switch(config-router-bgp-af)# neighbor default encapsulation mpls next-hop-self source-interface Loopback0
```
### Route-Distinguisher
Route-Distinguisher (RD) uniquely identifies routes from a particular VRF.
A Route-Distinguisher is configured for every VRF that routes are exported from or
imported into.
The following commands are used to configure Route-Distinguisher for a VRF.
```
switch(config-router-bgp)# vrf vrf-services
switch(config-router-bgp-vrf-vrf-services)# rd 1.0.0.1:1
switch(config-router-bgp)# vrf vrf-blue
switch(config-router-bgp-vrf-vrf-blue)# rd 2.0.0.1:2
```
### Exporting Routes from a VRF
Use the **route-target export** command to export routes from a VRF to the
local VPN or EVPN table using the route target
extended community list.
**Examples**
- These commands export routes from
**vrf-red** to the local VPN
table.
```
switch(config)# service routing protocols model multi-agent
switch(config)# mpls ip
switch(config)# router bgp 65001
switch(config-router-bgp)# vrf vrf-red
switch(config-router-bgp-vrf-vrf-red)# rd 1:1
switch(config-router-bgp-vrf-vrf-red)# route-target export vpn-ipv4 10:10
switch(config-router-bgp-vrf-vrf-red)# route-target export vpn-ipv6 10:20
```
- These commands export routes from
**vrf-red** to the EVPN
table.
```
switch(config)# router bgp 65001
switch(config-router-bgp)# vrf vrf-red
switch(config-router-bgp-vrf-vrf-red)# rd 1:1
switch(config-router-bgp-vrf-vrf-red)# route-target export evpn 10:1
```
### Importing Routes into a VRF
Use the **route-target import** command to import the exported routes from
the local VPN or EVPN table to the target VRF
using the route target extended community
list.
**Examples**
- These commands import routes from the VPN
table to
**vrf-blue**.
```
switch(config)# service routing protocols model multi-agent
switch(config)# mpls ip
switch(config)# router bgp 65001
switch(config-router-bgp)# vrf vrf-blue
switch(config-router-bgp-vrf-vrf-blue)# rd 2:2
switch(config-router-bgp-vrf-vrf-blue)# route-target import vpn-ipv4 10:10
switch(config-router-bgp-vrf-vrf-blue)# route-target import vpn-ipv6 10:20
```
- These commands import routes from the EVPN
table to
**vrf-blue**.
```
switch(config)# router bgp 65001
switch(config-router-bgp)# vrf vrf-blue
switch(config-router-bgp-vrf-vrf-blue)# rd 2:2
switch(config-router-bgp-vrf-vrf-blue)# route-target import evpn 10:1
```
### Exporting and Importing Routes using Route Map
To manage VRF route leaking, control the exported and imported prefixes with the
route-map export or import commands. The route map is effective only if the VRF or VPN
paths are already candidates for export or import. The route-target
export or import command must be configured first. Setting BGP
attributes using route maps is effective only on the export end.
Note: Prefixes that are leaked are not re-exported to the VPN table from the target VRF.
**Examples**
- These commands export routes from
**vrf-red** to the local VPN
table.
```
switch(config)# service routing protocols model multi-agent
switch(config)# mpls ip
switch(config)# router bgp 65001
switch(config-router-bgp)# vrf vrf-red
switch(config-router-bgp-vrf-vrf-red)# rd 1:1
switch(config-router-bgp-vrf-vrf-red)# route-target export vpn-ipv4 10:10
switch(config-router-bgp-vrf-vrf-red)# route-target export vpn-ipv6 10:20
switch(config-router-bgp-vrf-vrf-red)# route-target export vpn-ipv4 route-map EXPORT_V4_ROUTES_T0_VPN_TABLE
switch(config-router-bgp-vrf-vrf-red)# route-target export vpn-ipv6 route-map EXPORT_V6_ROUTES_T0_VPN_TABLE
```
- These commands export routes from
**vrf-red** to the EVPN
table.
```
switch(config)# router bgp 65001
switch(config-router-bgp)# vrf vrf-red
switch(config-router-bgp-vrf-vrf-red)# rd 1:1
switch(config-router-bgp-vrf-vrf-red)# route-target export evpn 10:1
switch(config-router-bgp-vrf-vrf-red)# route-target export evpn route-map EXPORT_ROUTES_T0_EVPN_TABLE
```
- These commands import routes from the VPN table to
**vrf-blue**.
```
switch(config)# service routing protocols model multi-agent
switch(config)# mpls ip
switch(config)# router bgp 65001
switch(config-router-bgp)# vrf vrf-blue
switch(config-router-bgp-vrf-vrf-blue)# rd 1:1
switch(config-router-bgp-vrf-vrf-blue)# route-target import vpn-ipv4 10:10
switch(config-router-bgp-vrf-vrf-blue)# route-target import vpn-ipv6 10:20
switch(config-router-bgp-vrf-vrf-blue)# route-target import vpn-ipv4 route-map IMPORT_V4_ROUTES_VPN_TABLE
switch(config-router-bgp-vrf-vrf-blue)# route-target import vpn-ipv6 route-map IMPORT_V6_ROUTES_VPN_TABLE
```
- These commands import routes from the EVPN table to
**vrf-blue**.
```
switch(config)# router bgp 65001
switch(config-router-bgp)# vrf vrf-blue
switch(config-router-bgp-vrf-vrf-blue)# rd 2:2
switch(config-router-bgp-vrf-vrf-blue)# route-target import evpn 10:1
switch(config-router-bgp-vrf-vrf-blue)# route-target import evpn route-map IMPORT_ROUTES_FROM_EVPN_TABLE
```
## Inter-VRF Local Route Leaking using VRF-leak Agent
Inter-VRF local route leaking allows routes to leak from one VRF to another using a route
map as a VRF-leak agent. VRFs are leaked based on the preferences assigned to each
VRF.
### Configuring Route Maps
To leak routes from one VRF to another using a route map, use the [router general](/um-eos/eos-evpn-and-vcs-commands#xx1351777) command to enter Router-General
Configuration Mode, then enter the VRF submode for the destination VRF, and use the
[leak routes](/um-eos/eos-evpn-and-vcs-commands#reference_g2h_2z3_hwb) command to specify the source
VRF and the route map to be used. Routes in the source VRF that match the policy in the
route map will then be considered for leaking into the configuration-mode VRF. If two or
more policies specify leaking the same prefix to the same destination VRF, the route
with a higher (post-set-clause) distance and preference is chosen.
**Example**
These commands configure a route map to leak routes from **VRF1**
to **VRF2** using route map
**RM1**.
```
switch(config)# router general
switch(config-router-general)# vrf VRF2
switch(config-router-general-vrf-VRF2)# leak routes source-vrf VRF1 subscribe-policy RM1
switch(config-router-general-vrf-VRF2)#
```

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -0,0 +1,82 @@
<!-- Source: https://www.arista.com/en/um-eos/eos-static-inter-vrf-route -->
<!-- Scraped: 2026-03-06T20:43:17.977Z -->
# Static Inter-VRF Route
The Static Inter-VRF Route feature adds support for static inter-VRF routes. This enables the configuration of routes to destinations in one ingress VRF with an ability to specify a next-hop in a different egress VRF through a static configuration.
You can configure static inter-VRF routes in default and non-default VRFs. A different
egress VRF is achieved by “tagging” the **next-hop** or **forwarding
via** with a reference to an egress VRF (different from the source
VRF) in which that next-hop should be evaluated. Static inter-VRF routes
with ECMP next-hop sets in the same egress VRF or heterogeneous egress VRFs
can be specified.
The Static Inter-VRF Route feature is independent of and complementary to other mechanisms that can be used to set up local inter-VRF routes. The other supported mechanisms in EOS and the broader use-cases they support are documented here:
- [Inter-VRF Local Route Leaking using BGP VPN](/um-eos/eos-inter-vrf-local-route-leaking#xx1348142)
- [Inter-VRF Local Route Leaking using VRF-leak Agent](/um-eos/eos-inter-vrf-local-route-leaking#xx1346287)
## Configuration
The configuration to set up static inter-VRF routes in an ingress (source) VRF that forward IP traffic to a different egress (target) VRF is as follows:
- This command creates a static route in one ingress VRF that points to a next-hop
  in a different egress VRF:
  `[ip | ipv6] route [vrf vrf-name] destination-prefix [egress-vrf egress-next-hop-vrf-name] next-hop`
## Show Commands
Use the **show ip route vrf** command to display the egress VRF name if it
differs from the source VRF.
**Example**
```
switch# show ip route vrf vrf1
VRF: vrf1
Codes: C - connected, S - static, K - kernel,
O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
NG - Nexthop Group Static Route, V - VXLAN Control Service,
DH - DHCP client installed default route, M - Martian,
DP - Dynamic Policy Route, L - VRF Leaked
Gateway of last resort is not set
S 1.0.1.0/24 [1/0] via 1.0.0.2, Vlan2180 (egress VRF default)
S 1.0.7.0/24 [1/0] via 1.0.6.2, Vlan2507 (egress VRF vrf3)
```
## Limitations
- For bidirectional traffic to work correctly between a pair of VRFs, static inter-VRF
routes in both VRFs must be configured.
- Static Inter-VRF routing is supported only in multi-agent routing protocol mode.

File diff suppressed because it is too large


@@ -0,0 +1,275 @@
# Ashburn Validator Relay — Full Traffic Redirect
## Overview
All validator traffic (gossip, repair, TVU, TPU) enters and exits from
`137.239.194.65` (laconic-was-sw01, Ashburn). Peers see the validator as an
Ashburn node. This improves repair peer count and slot catchup rate by reducing
RTT to the TeraSwitch/Pittsburgh cluster from ~30ms (direct Miami) to ~5ms
(Ashburn).
Supersedes the previous TVU-only shred relay (see `tvu-shred-relay.md`).
## Architecture
```
OUTBOUND (validator → peers)
agave-validator (kind pod, ports 8001, 9000-9025)
↓ Docker bridge → host FORWARD chain
biscayne host (186.233.184.235)
↓ mangle PREROUTING: fwmark 100 on sport 8001,9000-9025 from 172.20.0.0/16
↓ nat POSTROUTING: SNAT → src 137.239.194.65
↓ policy route: fwmark 100 → table ashburn → via 169.254.7.6 dev doublezero0
laconic-mia-sw01 (209.42.167.133, Miami)
↓ traffic-policy VALIDATOR-OUTBOUND: src 137.239.194.65 → nexthop 172.16.1.188
↓ backbone Et4/1 (25.4ms)
laconic-was-sw01 Et4/1 (Ashburn)
↓ default route via 64.92.84.80 out Et1/1
Internet (peers see src 137.239.194.65)
INBOUND (peers → validator)
Solana peers → 137.239.194.65:8001,9000-9025
↓ internet routing to was-sw01
laconic-was-sw01 Et1/1 (Ashburn)
↓ traffic-policy VALIDATOR-RELAY: ASIC redirect, line rate
↓ nexthop 172.16.1.189 via Et4/1 backbone (25.4ms)
laconic-mia-sw01 Et4/1 (Miami)
↓ L3 forward → biscayne via doublezero0 GRE or ISP routing
biscayne (186.233.184.235)
↓ nat PREROUTING: DNAT dst 137.239.194.65:* → 172.20.0.2:* (kind node)
↓ Docker bridge → validator pod
agave-validator
```
RPC traffic (port 8899) is NOT relayed — clients connect directly to biscayne.
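On the biscayne side, the outbound leg of the diagram corresponds to a mark/SNAT/policy-route triple. A sketch under the diagram's assumptions (mark 100, a table named `ashburn` defined in `/etc/iproute2/rt_tables`, device `doublezero0`; not verified against the live host):
```
# Mark validator egress from the Docker bridge subnet on its gossip/TVU ports
iptables -t mangle -A PREROUTING -s 172.20.0.0/16 -p udp \
  -m multiport --sports 8001,9000:9025 -j MARK --set-mark 100
# Rewrite the source so peers see the Ashburn IP
iptables -t nat -A POSTROUTING -m mark --mark 100 -j SNAT --to-source 137.239.194.65
# Send marked traffic into the GRE tunnel via the dedicated table
ip rule add fwmark 100 table ashburn
ip route add default via 169.254.7.6 dev doublezero0 table ashburn
```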
## Switch Config: laconic-was-sw01
SSH: `install@137.239.200.198`
### Pre-change
```
configure checkpoint save pre-validator-relay
```
Rollback: `rollback running-config checkpoint pre-validator-relay` then `write memory`.
### Config session with auto-revert
```
configure session validator-relay
! Loopback for 137.239.194.65 (do NOT touch Loopback100 which has .64)
interface Loopback101
ip address 137.239.194.65/32
! ACL covering all validator ports
ip access-list VALIDATOR-RELAY-ACL
10 permit udp any any eq 8001
20 permit udp any any range 9000 9025
30 permit tcp any any eq 8001
! Traffic-policy: ASIC redirect to backbone (mia-sw01)
traffic-policy VALIDATOR-RELAY
match VALIDATOR-RELAY-ACL
set nexthop 172.16.1.189
! Replace old SHRED-RELAY on Et1/1
interface Ethernet1/1
no traffic-policy input SHRED-RELAY
traffic-policy input VALIDATOR-RELAY
! system-rule overriding-action redirect (already present from SHRED-RELAY)
show session-config diffs
commit timer 00:05:00
```
After verification: `configure session validator-relay commit` then `write memory`.
### Cleanup (after stable)
Old SHRED-RELAY policy and ACL can be removed once VALIDATOR-RELAY is confirmed:
```
configure session cleanup-shred-relay
no traffic-policy SHRED-RELAY
no ip access-list SHRED-RELAY-ACL
show session-config diffs
commit
write memory
```
## Switch Config: laconic-mia-sw01
### Pre-flight checks
Before applying config, verify:
1. Which EOS interface terminates the doublezero0 GRE from biscayne
(endpoint 209.42.167.133). Check with `show interfaces tunnel` or
`show ip interface brief | include Tunnel`.
2. Whether `system-rule overriding-action redirect` is already configured.
Check with `show running-config | include system-rule`.
3. Whether EOS traffic-policy works on tunnel interfaces. If not, apply on
the physical interface where GRE packets arrive (likely Et<X> facing
biscayne's ISP network or the DZ infrastructure).
### Config session
```
configure checkpoint save pre-validator-outbound
configure session validator-outbound
! ACL matching outbound validator traffic (source = Ashburn IP)
ip access-list VALIDATOR-OUTBOUND-ACL
10 permit ip 137.239.194.65/32 any
! Redirect to was-sw01 via backbone
traffic-policy VALIDATOR-OUTBOUND
match VALIDATOR-OUTBOUND-ACL
set nexthop 172.16.1.188
! Apply on the interface where biscayne GRE traffic arrives
! Replace Tunnel<X> with the actual interface from pre-flight check #1
interface Tunnel<X>
traffic-policy input VALIDATOR-OUTBOUND
! Add system-rule if not already present (pre-flight check #2)
system-rule overriding-action redirect
show session-config diffs
commit timer 00:05:00
```
After verification: commit + `write memory`.
## Host Config: biscayne
Automated via ansible playbook `playbooks/ashburn-validator-relay.yml`.
### Manual equivalent
```bash
# 1. Accept packets destined for 137.239.194.65
sudo ip addr add 137.239.194.65/32 dev lo
# 2. Inbound DNAT to kind node (172.20.0.2). INSERT at position 1 so these
#    rules precede Docker's ADDRTYPE LOCAL rule, which otherwise matches
#    first and swallows the traffic.
sudo iptables -t nat -I PREROUTING 1 -p udp -d 137.239.194.65 --dport 8001 \
-j DNAT --to-destination 172.20.0.2:8001
sudo iptables -t nat -I PREROUTING 1 -p tcp -d 137.239.194.65 --dport 8001 \
-j DNAT --to-destination 172.20.0.2:8001
sudo iptables -t nat -I PREROUTING 1 -p udp -d 137.239.194.65 --dport 9000:9025 \
-j DNAT --to-destination 172.20.0.2
# 3. Outbound: mark validator traffic
sudo iptables -t mangle -A PREROUTING -s 172.20.0.0/16 -p udp --sport 8001 \
-j MARK --set-mark 100
sudo iptables -t mangle -A PREROUTING -s 172.20.0.0/16 -p udp --sport 9000:9025 \
-j MARK --set-mark 100
sudo iptables -t mangle -A PREROUTING -s 172.20.0.0/16 -p tcp --sport 8001 \
-j MARK --set-mark 100
# 4. Outbound: SNAT to Ashburn IP (INSERT before Docker MASQUERADE)
sudo iptables -t nat -I POSTROUTING 1 -m mark --mark 100 \
-j SNAT --to-source 137.239.194.65
# 5. Policy routing table
echo "100 ashburn" | sudo tee -a /etc/iproute2/rt_tables
sudo ip rule add fwmark 100 table ashburn
sudo ip route add default via 169.254.7.6 dev doublezero0 table ashburn
# 6. Persist
sudo netfilter-persistent save
# ip rule + ip route persist via /etc/network/if-up.d/ashburn-routing
```
### Docker NAT port preservation
**Must verify before going live:** Docker masquerade must preserve source ports
for kind's hostNetwork pods. If Docker rewrites the source port, the mangle
PREROUTING match on `--sport 8001,9000-9025` will miss traffic.
Test: `tcpdump -i br-cf46a62ab5b2 -nn 'udp src port 8001'` — if you see
packets with sport 8001 from 172.20.0.2, port preservation works.
If Docker does NOT preserve ports, the mark must be set inside the kind node
container (on the pod's veth) rather than on the host.
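The tcpdump check above needs traffic with a known source port. A small probe can generate it; this is a hypothetical helper (the destination here is a placeholder for a dry run), not part of the playbooks:

```python
# Hypothetical probe: emit one UDP datagram from a fixed source port so the
# tcpdump on the Docker bridge can confirm whether the source port survives
# Docker's NAT. Run it from inside the kind node (or a pod) while capturing.
import socket

def send_probe(dst_ip, dst_port, sport=8001):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", sport))  # fixed source port: the value the mangle rules match on
    try:
        return s.sendto(b"probe", (dst_ip, dst_port))  # bytes sent
    finally:
        s.close()

send_probe("127.0.0.1", 9)  # dry run against a placeholder destination
```

If the capture on the bridge still shows sport 8001 for the probe's packet, port preservation holds and the host-side mangle rules will match.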
## Execution Order
1. **was-sw01**: checkpoint → config session with 5min auto-revert → verify counters → commit
2. **biscayne**: add 137.239.194.65/32 to lo, add inbound DNAT rules
3. **Verify inbound**: `ping 137.239.194.65` from external host, check DNAT counters
4. **mia-sw01**: pre-flight checks → config session with 5min auto-revert → commit
5. **biscayne**: add outbound fwmark + policy routing + SNAT rules
6. **Test outbound**: from biscayne, send UDP from port 8001, verify src 137.239.194.65 on was-sw01
7. **Verify**: traffic-policy counters on both switches, iptables hit counts on biscayne
8. **Restart validator** if needed (gossip should auto-refresh, but restart ensures clean state)
9. **was-sw01 + mia-sw01**: `write memory` to persist
10. **Cleanup**: remove old SHRED-RELAY and 64.92.84.81:20000 DNAT after stable
## Verification
1. `show traffic-policy counters` on was-sw01 — VALIDATOR-RELAY-ACL matches
2. `show traffic-policy counters` on mia-sw01 — VALIDATOR-OUTBOUND-ACL matches
3. `sudo iptables -t nat -L -v -n` on biscayne — DNAT and SNAT hit counts
4. `sudo iptables -t mangle -L -v -n` on biscayne — fwmark hit counts
5. `ip rule show` on biscayne — fwmark 100 lookup ashburn
6. Validator gossip ContactInfo shows 137.239.194.65 for ALL addresses (gossip, repair, TVU, TPU)
7. Repair peer count increases (target: 20+ peers)
8. Slot catchup rate improves from ~0.9 toward ~2.5 slots/sec
9. `traceroute --sport=8001 <remote_peer>` from biscayne routes via doublezero0/was-sw01
## Rollback
### biscayne
```bash
sudo ip addr del 137.239.194.65/32 dev lo
sudo iptables -t nat -D PREROUTING -p udp -d 137.239.194.65 --dport 8001 -j DNAT --to-destination 172.20.0.2:8001
sudo iptables -t nat -D PREROUTING -p tcp -d 137.239.194.65 --dport 8001 -j DNAT --to-destination 172.20.0.2:8001
sudo iptables -t nat -D PREROUTING -p udp -d 137.239.194.65 --dport 9000:9025 -j DNAT --to-destination 172.20.0.2
sudo iptables -t mangle -D PREROUTING -s 172.20.0.0/16 -p udp --sport 8001 -j MARK --set-mark 100
sudo iptables -t mangle -D PREROUTING -s 172.20.0.0/16 -p udp --sport 9000:9025 -j MARK --set-mark 100
sudo iptables -t mangle -D PREROUTING -s 172.20.0.0/16 -p tcp --sport 8001 -j MARK --set-mark 100
sudo iptables -t nat -D POSTROUTING -m mark --mark 100 -j SNAT --to-source 137.239.194.65
sudo ip rule del fwmark 100 table ashburn
sudo ip route del default table ashburn
sudo netfilter-persistent save
```
### was-sw01
```
rollback running-config checkpoint pre-validator-relay
write memory
```
### mia-sw01
```
rollback running-config checkpoint pre-validator-outbound
write memory
```
## Key Details
| Item | Value |
|------|-------|
| Ashburn relay IP | `137.239.194.65` (Loopback101 on was-sw01) |
| Ashburn LAN block | `137.239.194.64/29` on was-sw01 Et1/1 |
| Biscayne IP | `186.233.184.235` |
| Kind node IP | `172.20.0.2` (Docker bridge br-cf46a62ab5b2) |
| Validator ports | 8001 (gossip), 9000-9025 (TVU/repair/TPU) |
| Excluded ports | 8899 (RPC), 8900 (WebSocket) — direct to biscayne |
| GRE tunnel | doublezero0: 169.254.7.7 ↔ 169.254.7.6, remote 209.42.167.133 |
| Backbone | was-sw01 Et4/1 172.16.1.188/31 ↔ mia-sw01 Et4/1 172.16.1.189/31 |
| Policy routing table | 100 ashburn |
| Fwmark | 100 |
| was-sw01 SSH | `install@137.239.200.198` |
| EOS version | 4.34.0F |

# Blue-Green Upgrades for Biscayne
Zero-downtime upgrade procedures for the agave-stack deployment on biscayne.
Uses ZFS clones for instant data duplication, Caddy health-check routing for
traffic shifting, and k8s native sidecars for independent container upgrades.
## Architecture
```
Caddy ingress (biscayne.vaasl.io)
├── upstream A: localhost:8899 ← health: /health
└── upstream B: localhost:8897 ← health: /health
┌─────────────────┴──────────────────┐
│ kind cluster │
│ │
│ Deployment A Deployment B │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ agave :8899 │ │ agave :8897 │ │
│ │ doublezerod │ │ doublezerod │ │
│ └──────┬──────┘ └──────┬──────┘ │
└─────────┼─────────────────┼─────────┘
│ │
ZFS dataset A ZFS clone B
(original) (instant CoW copy)
```
Both deployments run in the same kind cluster with `hostNetwork: true`.
Caddy active health checks route traffic to whichever deployment has a
healthy `/health` endpoint.
## Storage Layout
| Data | Path | Type | Survives restart? |
|------|------|------|-------------------|
| Ledger | `/srv/solana/ledger` | ZFS zvol (xfs) | Yes |
| Snapshots | `/srv/solana/snapshots` | ZFS zvol (xfs) | Yes |
| Accounts | `/srv/solana/ramdisk/accounts` | `/dev/ram0` (xfs) | Until host reboot |
| Validator config | `/srv/deployments/agave/data/validator-config` | ZFS | Yes |
| DZ config | `/srv/deployments/agave/data/doublezero-config` | ZFS | Yes |
The ZFS zvol `biscayne/DATA/volumes/solana` backs `/srv/solana` (ledger, snapshots).
The ramdisk at `/dev/ram0` holds accounts — it's a block device, not tmpfs, so it
survives process restarts but not host reboots.
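The "what backs this path" distinction can be scripted by parsing `/proc/mounts`; a minimal sketch, with a sample line mirroring the table above:

```python
# Sketch: find the block device backing a mount point by parsing /proc/mounts.
def device_for(mountpoint, mounts_text):
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mountpoint:
            return fields[0]  # device column
    return None

# On the host: device_for("/srv/solana/ramdisk/accounts", open("/proc/mounts").read())
sample = "/dev/ram0 /srv/solana/ramdisk/accounts xfs rw 0 0"
assert device_for("/srv/solana/ramdisk/accounts", sample) == "/dev/ram0"
```

A `/dev/ram*` answer means the data is gone on host reboot; a ZFS zvol answer means it persists.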
---
## Procedure 1: DoubleZero Binary Upgrade (zero downtime, single pod)
The GRE tunnel (`doublezero0`) and BGP routes live in kernel space. They persist
across doublezerod process restarts. Upgrading the DZ binary does not require
tearing down the tunnel or restarting the validator.
### Prerequisites
- doublezerod is defined as a k8s native sidecar (`spec.initContainers` with
`restartPolicy: Always`). See [Required Changes](#required-changes) below.
- k8s 1.29+ (biscayne runs 1.35.1)
### Steps
1. Build or pull the new doublezero container image.
2. Patch the pod's sidecar image:
```bash
kubectl -n <ns> patch pod <pod> --type='json' -p='[
{"op": "replace", "path": "/spec/initContainers/0/image",
"value": "laconicnetwork/doublezero:new-version"}
]'
```
3. Only the doublezerod container restarts. The agave container is unaffected.
The GRE tunnel interface and BGP routes remain in the kernel throughout.
4. Verify:
```bash
kubectl -n <ns> exec <pod> -c doublezerod -- doublezero --version
kubectl -n <ns> exec <pod> -c doublezerod -- doublezero status
ip route | grep doublezero0 # routes still present
```
### Rollback
Patch the image back to the previous version. Same process, same zero downtime.
---
## Procedure 2: Agave Version Upgrade (zero RPC downtime, blue-green)
Agave is the main container and must be restarted for a version change. To maintain
zero RPC downtime, we run two deployments simultaneously and let Caddy shift traffic
based on health checks.
### Prerequisites
- Caddy ingress configured with dual upstreams and active health checks
- A parameterized spec.yml that accepts alternate ports and volume paths
- ZFS snapshot/clone scripts
### Steps
#### Phase 1: Prepare (no downtime, no risk)
1. **ZFS snapshot** for rollback safety:
```bash
zfs snapshot -r biscayne/DATA@pre-upgrade-$(date +%Y%m%d)
```
2. **ZFS clone** the validator volumes:
```bash
zfs clone biscayne/DATA/volumes/solana@pre-upgrade-$(date +%Y%m%d) \
biscayne/DATA/volumes/solana-blue
```
This is instant (copy-on-write). No additional storage until writes diverge.
3. **Clone the ramdisk accounts** (not on ZFS):
```bash
mkdir -p /srv/solana-blue/ramdisk/accounts
cp -a /srv/solana/ramdisk/accounts/* /srv/solana-blue/ramdisk/accounts/
```
This is the slow step — 460GB on ramdisk. Consider `rsync` with `--inplace`
to minimize copy time, or investigate whether the ramdisk can move to a ZFS
dataset for instant cloning in future deployments.
4. **Build or pull** the new agave container image.
#### Phase 2: Start blue deployment (no downtime)
5. **Create Deployment B** in the same kind cluster, pointing at cloned volumes,
with RPC on port 8897:
```bash
# Apply the blue deployment manifest (parameterized spec)
kubectl apply -f deployment/k8s-manifests/agave-blue.yaml
```
6. **Deployment B catches up.** It starts from the snapshot point and replays.
Monitor progress:
```bash
kubectl -n <ns> exec <blue-pod> -c agave-validator -- \
solana -u http://127.0.0.1:8897 slot
```
7. **Validate** the new version works:
- RPC responds: `curl -sf http://localhost:8897/health`
- Correct version: `kubectl -n <ns> exec <blue-pod> -c agave-validator -- agave-validator --version`
- doublezerod connected (if applicable)
Take as long as needed. Deployment A is still serving all traffic.
#### Phase 3: Traffic shift (zero downtime)
8. **Caddy routes traffic to B.** Once B's `/health` returns 200, Caddy's active
health check automatically starts routing to it. Alternatively, update the
Caddy upstream config to prefer B.
9. **Verify** B is serving live traffic:
```bash
curl -sf https://biscayne.vaasl.io/health
# Check Caddy access logs for requests hitting port 8897
```
#### Phase 4: Cleanup
10. **Stop Deployment A:**
```bash
kubectl -n <ns> delete deployment agave-green
```
11. **Reconfigure B to use standard port** (8899) if desired, or update Caddy
to only route to 8897.
12. **Clean up ZFS clone** (or keep as rollback):
```bash
zfs destroy biscayne/DATA/volumes/solana-blue
```
### Rollback
At any point before Phase 4:
- Deployment A is untouched and still serving traffic (or can be restarted)
- Delete Deployment B: `kubectl -n <ns> delete deployment agave-blue`
- Destroy the ZFS clone: `zfs destroy biscayne/DATA/volumes/solana-blue`
After Phase 4 (A already stopped):
- `zfs rollback` to restore original data
- Redeploy A with old image
---
## Required Changes to agave-stack
### 1. Move doublezerod to native sidecar
In the pod spec generation (laconic-so or compose override), doublezerod must be
defined as a native sidecar container instead of a regular container:
```yaml
spec:
initContainers:
- name: doublezerod
image: laconicnetwork/doublezero:local
restartPolicy: Always # makes it a native sidecar
securityContext:
privileged: true
capabilities:
add: [NET_ADMIN]
env:
- name: DOUBLEZERO_RPC_ENDPOINT
value: https://api.mainnet-beta.solana.com
volumeMounts:
- name: doublezero-config
mountPath: /root/.config/doublezero
containers:
- name: agave-validator
image: laconicnetwork/agave:local
# ... existing config
```
This change means:
- doublezerod starts before agave and stays running
- Patching the doublezerod image restarts only that container
- agave can be restarted independently without affecting doublezerod
This requires a laconic-so change to support `initContainers` with `restartPolicy`
in compose-to-k8s translation — or a post-deployment patch.
### 2. Caddy dual-upstream config
Add health-checked upstreams for both blue and green deployments:
```caddyfile
biscayne.vaasl.io {
reverse_proxy {
to localhost:8899 localhost:8897
health_uri /health
health_interval 5s
health_timeout 3s
lb_policy first
}
}
```
`lb_policy first` routes to the first healthy upstream. When only A is running,
all traffic goes to :8899. When B comes up healthy, traffic shifts.
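The selection behavior can be modeled as "route to the first upstream whose health check passes"; a toy sketch (upstream names are illustrative):

```python
# Toy model of Caddy's `lb_policy first`: the first healthy upstream wins.
def pick_upstream(upstreams, is_healthy):
    for u in upstreams:
        if is_healthy(u):
            return u
    return None  # no healthy backend: Caddy would fail the request

upstreams = ["localhost:8899", "localhost:8897"]
# Only blue (:8897) healthy -> traffic shifts to it
assert pick_upstream(upstreams, lambda u: u.endswith(":8897")) == "localhost:8897"
# Both healthy -> green (:8899) keeps the traffic
assert pick_upstream(upstreams, lambda u: True) == "localhost:8899"
```

Note the asymmetry: bringing B up healthy does not steal traffic from a healthy A; only A going unhealthy (or being stopped) shifts traffic to B.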
### 3. Parameterized deployment spec
Create a parameterized spec or kustomize overlay that accepts:
- RPC port (8899 vs 8897)
- Volume paths (original vs ZFS clone)
- Deployment name suffix (green vs blue)
### 4. Delete DaemonSet workaround
Remove `deployment/k8s-manifests/doublezero-daemonset.yaml` from agave-stack.
### 5. Fix container DZ identity
Copy the registered identity into the container volume:
```bash
sudo cp /home/solana/.config/doublezero/id.json \
/srv/deployments/agave/data/doublezero-config/id.json
```
### 6. Disable host systemd doublezerod
After the container sidecar is working:
```bash
sudo systemctl stop doublezerod
sudo systemctl disable doublezerod
```
---
## Implementation Order
This is a spec-driven, test-driven plan. Each step produces a testable artifact.
### Step 1: Fix existing DZ bugs (no code changes to laconic-so)
Fixes BUG-1 through BUG-5 from [doublezero-status.md](doublezero-status.md).
**Spec:** Container doublezerod shows correct identity, connects to laconic-mia-sw01,
host systemd doublezerod is disabled.
**Test:**
```bash
kubectl -n <ns> exec <pod> -c doublezerod -- doublezero address
# assert: 3Bw6v7EruQvTwoY79h2QjQCs2KBQFzSneBdYUbcXK1Tr
kubectl -n <ns> exec <pod> -c doublezerod -- doublezero status
# assert: BGP Session Up, laconic-mia-sw01
systemctl is-active doublezerod
# assert: inactive
```
**Changes:**
- Copy `id.json` to container volume
- Update `DOUBLEZERO_RPC_ENDPOINT` in spec.yml
- Deploy with hostNetwork-enabled stack-orchestrator
- Stop and disable host doublezerod
- Delete DaemonSet manifest from agave-stack
### Step 2: Native sidecar for doublezerod
**Spec:** doublezerod image can be patched without restarting the agave container.
GRE tunnel and routes persist across doublezerod restart.
**Test:**
```bash
# Record current agave container start time
BEFORE=$(kubectl -n <ns> get pod <pod> -o jsonpath='{.status.containerStatuses[?(@.name=="agave-validator")].state.running.startedAt}')
# Patch DZ image
kubectl -n <ns> patch pod <pod> --type='json' -p='[
{"op":"replace","path":"/spec/initContainers/0/image","value":"laconicnetwork/doublezero:test"}
]'
# Wait for DZ container to restart
sleep 10
# Verify agave was NOT restarted
AFTER=$(kubectl -n <ns> get pod <pod> -o jsonpath='{.status.containerStatuses[?(@.name=="agave-validator")].state.running.startedAt}')
[ "$BEFORE" = "$AFTER" ] # assert: same start time
# Verify tunnel survived
ip route | grep doublezero0 # assert: routes present
```
**Changes:**
- laconic-so: support `initContainers` with `restartPolicy: Always` in
compose-to-k8s translation (or: define doublezerod as native sidecar in
compose via `x-kubernetes-init-container` extension or equivalent)
- Alternatively: post-deploy kubectl patch to move doublezerod to initContainers
### Step 3: Caddy dual-upstream routing
**Spec:** Caddy routes RPC traffic to whichever backend is healthy. Adding a second
healthy backend on :8897 causes traffic to shift without configuration changes.
**Test:**
```bash
# Start a test HTTP server on :8897 with /health
python3 -c "
from http.server import HTTPServer, BaseHTTPRequestHandler
class H(BaseHTTPRequestHandler):
def do_GET(self):
self.send_response(200); self.end_headers(); self.wfile.write(b'ok')
HTTPServer(('', 8897), H).serve_forever()
" &
# Verify Caddy discovers it
sleep 10
curl -sf https://biscayne.vaasl.io/health
# assert: 200
kill %1
```
**Changes:**
- Update Caddy ingress config with dual upstreams and health checks
### Step 4: ZFS clone and blue-green tooling
**Spec:** A script creates a ZFS clone, starts a blue deployment on alternate ports
using the cloned data, and the deployment catches up and becomes healthy.
**Test:**
```bash
# Run the clone + deploy script
./scripts/blue-green-prepare.sh --target-version v2.2.1
# assert: ZFS clone exists
zfs list biscayne/DATA/volumes/solana-blue
# assert: blue deployment exists and is catching up
kubectl -n <ns> get deployment agave-blue
# assert: blue RPC eventually becomes healthy
timeout 600 bash -c 'until curl -sf http://localhost:8897/health; do sleep 5; done'
```
**Changes:**
- `scripts/blue-green-prepare.sh` — ZFS snapshot, clone, deploy B
- `scripts/blue-green-promote.sh` — tear down A, optional port swap
- `scripts/blue-green-rollback.sh` — destroy B, restore A
- Parameterized deployment spec (kustomize overlay or env-driven)
### Step 5: End-to-end upgrade test
**Spec:** Full upgrade cycle completes with zero dropped RPC requests.
**Test:**
```bash
# Start continuous health probe in background
while true; do
curl -sf -o /dev/null -w "%{http_code} %{time_total}\n" \
https://biscayne.vaasl.io/health || echo "FAIL $(date)"
sleep 0.5
done > /tmp/health-probe.log &
# Execute full blue-green upgrade
./scripts/blue-green-prepare.sh --target-version v2.2.1
# wait for blue to sync...
./scripts/blue-green-promote.sh
# Stop probe
kill %1
# assert: no FAIL lines in probe log
grep -c FAIL /tmp/health-probe.log
# assert: 0
```

# Bug: Ashburn Relay — 137.239.194.65 Not Routable from Public Internet
## Summary
`--gossip-host 137.239.194.65` correctly advertises the Ashburn relay IP in
ContactInfo for all sockets (gossip, TVU, repair, TPU). However, 137.239.194.65
is a DoubleZero overlay IP (137.239.192.0/19, IS-IS only) that is NOT announced
via BGP to the public internet. Public peers cannot route to it, so TVU shreds,
repair requests, and TPU traffic never arrive at was-sw01.
## Evidence
- Gossip traffic arrives on `doublezero0` interface:
```
doublezero0 In IP 64.130.58.70.8001 > 137.239.194.65.8001: UDP, length 132
```
- Zero TVU/repair traffic arrives:
```
tcpdump -i doublezero0 'dst host 137.239.194.65 and udp and not port 8001'
0 packets captured
```
- ContactInfo correctly advertises all sockets on 137.239.194.65:
```json
{
"gossip": "137.239.194.65:8001",
"tvu": "137.239.194.65:9000",
"serveRepair": "137.239.194.65:9011",
"tpu": "137.239.194.65:9002"
}
```
- Outbound gossip from biscayne exits via `doublezero0` with source
137.239.194.65 — SNAT and routing work correctly in the outbound direction.
## Root Cause
**137.239.194.0/24 is not routable from the public internet.** The prefix
belongs to DoubleZero's overlay address space (137.239.192.0/19, Momentum
Telecom, WHOIS OriginAS: empty). It is advertised only via IS-IS within the
DoubleZero switch mesh. There is no eBGP session on was-sw01 to advertise it
to the ISP — all BGP peers are iBGP AS 65342 (DoubleZero internal).
When the validator advertises `tvu: 137.239.194.65:9000` in ContactInfo,
public internet peers attempt to send turbine shreds to that IP, but the
packets have no route through the global BGP table to reach was-sw01. Only
DoubleZero-connected peers could potentially reach it via the overlay.
The old shred relay pipeline worked because it used `--public-tvu-address
64.92.84.81:20000` — was-sw01's Et1/1 ISP uplink IP, which IS publicly
routable. The `--gossip-host 137.239.194.65` approach advertises a
DoubleZero-only IP for ALL sockets, making TVU/repair/TPU unreachable from
non-DoubleZero peers.
The original hypothesis (ACL/PBR port filtering) was wrong. The tunnel and
switch routing work correctly — the problem is upstream: traffic never arrives
at was-sw01 in the first place.
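The containment claim is easy to sanity-check with Python's `ipaddress` module:

```python
# 137.239.194.65 sits inside the DZ overlay /19 (and its /24), so without a
# BGP announcement for that space, public peers have no route to it.
import ipaddress

relay = ipaddress.ip_address("137.239.194.65")
assert relay in ipaddress.ip_network("137.239.192.0/19")  # DZ overlay block
assert relay in ipaddress.ip_network("137.239.194.0/24")  # advertised prefix
```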
## Impact
The validator cannot receive turbine shreds or serve repair requests via the
low-latency Ashburn path. It falls back to the Miami public IP (186.233.184.235)
for all shred/repair traffic, negating the benefit of `--gossip-host`.
## Fix Options
1. **Use 64.92.84.81 (was-sw01 Et1/1) for ContactInfo sockets.** This is the
publicly routable Ashburn IP. Requires `--gossip-host 64.92.84.81` (or
equivalent `--bind-address` config) and DNAT/forwarding on was-sw01 to relay
traffic through the backbone → mia-sw01 → Tunnel500 → biscayne. The old
`--public-tvu-address` pipeline used this IP successfully.
2. **Get DoubleZero to announce 137.239.194.0/24 via eBGP to the ISP.** This
would make the current `--gossip-host 137.239.194.65` setup work, but
requires coordination with DoubleZero operations.
3. **Hybrid approach**: Use 64.92.84.81 for public-facing sockets (TVU, repair,
TPU) and 137.239.194.65 for gossip (which works via DoubleZero overlay).
Requires agave to support per-protocol address binding, which it does not
(`--gossip-host` sets ALL sockets to the same IP).
## Previous Workaround
The old `--public-tvu-address` pipeline used socat + shred-unwrap.py to relay
shreds from 64.92.84.81:20000 to the validator. That pipeline is not persistent
across reboots and was superseded by the `--gossip-host` approach (which turned
out to be broken for non-DoubleZero peers).

# Bug: laconic-so etcd cleanup wipes core kubernetes service
## Summary
`_clean_etcd_keeping_certs()` in laconic-stack-orchestrator 1.1.0 deletes the `kubernetes` service from etcd, breaking cluster networking on restart.
## Component
`stack_orchestrator/deploy/k8s/helpers.py``_clean_etcd_keeping_certs()`
## Reproduction
1. Deploy with `laconic-so` to a k8s-kind target with persisted etcd (hostPath mount in kind-config.yml)
2. `laconic-so deployment --dir <dir> stop` (destroys cluster)
3. `laconic-so deployment --dir <dir> start` (recreates cluster with cleaned etcd)
## Symptoms
- `kindnet` pods enter CrashLoopBackOff with: `panic: unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined`
- `kubectl get svc kubernetes -n default` returns `NotFound`
- coredns, caddy, local-path-provisioner stuck in Pending (no CNI without kindnet)
- No pods can be scheduled
## Root Cause
`_clean_etcd_keeping_certs()` uses a whitelist that only preserves `/registry/secrets/caddy-system` keys. All other etcd keys are deleted, including `/registry/services/specs/default/kubernetes` — the core `kubernetes` ClusterIP service that kube-apiserver auto-creates.
When the kind cluster starts with the cleaned etcd, kube-apiserver sees the existing etcd data and does not re-create the `kubernetes` service. kindnet depends on the `KUBERNETES_SERVICE_HOST` environment variable which is injected by the kubelet from this service — without it, kindnet panics.
## Fix Options
1. **Expand the whitelist** to include `/registry/services/specs/default/kubernetes` and other core cluster resources
2. **Fully wipe etcd** instead of selective cleanup — let the cluster bootstrap fresh (simpler, but loses Caddy TLS certs)
3. **Don't persist etcd at all** — ephemeral etcd means clean state every restart (recommended for kind deployments)
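Fix option 1 amounts to widening the prefix whitelist the cleanup keeps. A minimal sketch; any prefix beyond the two named in this doc is an assumption:

```python
# Sketch of fix option 1: preserve core cluster keys alongside the Caddy secrets.
PRESERVE_PREFIXES = (
    "/registry/secrets/caddy-system",                   # current whitelist
    "/registry/services/specs/default/kubernetes",      # core kubernetes service
    "/registry/services/endpoints/default/kubernetes",  # assumed companion key
)

def should_delete(key):
    # str.startswith accepts a tuple of prefixes
    return not key.startswith(PRESERVE_PREFIXES)

assert should_delete("/registry/pods/default/some-pod")
assert not should_delete("/registry/services/specs/default/kubernetes")
```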
## Workaround
Fully delete the kind cluster before `start`:
```bash
kind delete cluster --name <cluster-name>
laconic-so deployment --dir <dir> start
```
This forces fresh etcd bootstrap. Downside: all other services deployed to the cluster (DaemonSets, other namespaces) are destroyed.
## Impact
- Affects any k8s-kind deployment with persisted etcd
- Cluster is unrecoverable without full destroy+recreate
- All non-laconic-so-managed workloads in the cluster are lost

# Bug: laconic-so crashes on re-deploy when caddy ingress already exists
## Summary
`laconic-so deployment start` crashes with `FailToCreateError` when the kind cluster already has caddy ingress resources installed. The deployer uses `create_from_yaml()` which fails on `AlreadyExists` conflicts instead of applying idempotently. This prevents the application deployment from ever being reached — the crash happens before any app manifests are applied.
## Component
`stack_orchestrator/deploy/k8s/deploy_k8s.py:366``up()` method
`stack_orchestrator/deploy/k8s/helpers.py:369``install_ingress_for_kind()`
## Reproduction
1. `kind delete cluster --name laconic-70ce4c4b47e23b85`
2. `laconic-so deployment --dir /srv/deployments/agave start` — creates cluster, loads images, installs caddy ingress, but times out or is interrupted before app deployment completes
3. `laconic-so deployment --dir /srv/deployments/agave start` — crashes immediately after image loading
## Symptoms
- Traceback ending in:
```
kubernetes.utils.create_from_yaml.FailToCreateError:
Error from server (Conflict): namespaces "caddy-system" already exists
Error from server (Conflict): serviceaccounts "caddy-ingress-controller" already exists
Error from server (Conflict): clusterroles.rbac.authorization.k8s.io "caddy-ingress-controller" already exists
...
```
- Namespace `laconic-laconic-70ce4c4b47e23b85` exists but is empty — no pods, no deployments, no events
- Cluster is healthy, images are loaded, but no app manifests are applied
## Root Cause
`install_ingress_for_kind()` calls `kubernetes.utils.create_from_yaml()` which uses `POST` (create) semantics. If the resources already exist (from a previous partial run), every resource returns `409 Conflict` and `create_from_yaml` raises `FailToCreateError`, aborting the entire `up()` method before the app deployment step.
The first `laconic-so start` after a fresh `kind delete` works because:
1. Image loading into the kind node takes 5-10 minutes (images are ~10GB+)
2. Caddy ingress is installed successfully
3. App deployment begins
But if that first run is interrupted (timeout, Ctrl-C, ansible timeout), the second run finds caddy already installed and crashes.
## Fix Options
1. **Use server-side apply** instead of `create_from_yaml()``kubectl apply` is idempotent
2. **Check if ingress exists before installing** — skip `install_ingress_for_kind()` if caddy-system namespace exists
3. **Catch `AlreadyExists` and continue** — treat 409 as success for infrastructure resources
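Fix option 3 can be sketched as a wrapper that swallows failures made up entirely of 409s. `FakeApiException` stands in for the kubernetes client's `ApiException` here so the sketch is self-contained:

```python
# Sketch of fix option 3: treat a FailToCreateError whose underlying failures
# are all 409 Conflict as success for infrastructure resources.
class FakeApiException(Exception):
    def __init__(self, status):
        self.status = status

def only_conflicts(api_exceptions):
    """True when every underlying failure is a 409 AlreadyExists."""
    return all(e.status == 409 for e in api_exceptions)

# At the call site in install_ingress_for_kind(), roughly:
#   try:
#       create_from_yaml(k8s_client, ingress_yaml)
#   except FailToCreateError as e:
#       if not only_conflicts(e.api_exceptions):
#           raise  # a real failure, not just "already installed"

assert only_conflicts([FakeApiException(409), FakeApiException(409)])
assert not only_conflicts([FakeApiException(409), FakeApiException(500)])
```

`FailToCreateError` exposes the per-resource failures via its `api_exceptions` attribute, which is what makes this filter possible.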
## Workaround
Delete the caddy ingress resources before re-running:
```bash
kubectl delete namespace caddy-system
kubectl delete clusterrole caddy-ingress-controller
kubectl delete clusterrolebinding caddy-ingress-controller
kubectl delete ingressclass caddy
laconic-so deployment --dir /srv/deployments/agave start
```
Or nuke the entire cluster and start fresh:
```bash
kind delete cluster --name laconic-70ce4c4b47e23b85
laconic-so deployment --dir /srv/deployments/agave start
```
## Interaction with ansible timeout
The `biscayne-redeploy.yml` playbook sets a 600s timeout on the `laconic-so deployment start` task. Image loading alone can exceed this on a fresh cluster (images must be re-loaded into the new kind node). When ansible kills the process at 600s, the caddy ingress is already installed but the app is not — putting the cluster into the broken state described above. Subsequent playbook runs hit this bug on every attempt.
## Impact
- Blocks all re-deploys on biscayne without manual cleanup
- The playbook cannot recover automatically — every retry hits the same conflict
- Discovered 2026-03-05 during full wipe redeploy of biscayne validator

# DoubleZero Multicast Access Requests
## Status (2026-03-06)
DZ multicast is **still in testnet** (client v0.2.2). Multicast groups are defined
on the DZ ledger with on-chain access control (publishers/subscribers). The testnet
allocates addresses from 233.84.178.0/24 (AS21682). Not yet available for production
Solana shred delivery.
## Biscayne Connection Details
Provide these details when requesting subscriber access:
| Field | Value |
|-------|-------|
| Client IP | 186.233.184.235 |
| Validator identity | 4WeLUxfQghbhsLEuwaAzjZiHg2VBw87vqHc4iZrGvKPr |
| DZ identity | 3Bw6v7EruQvTwoY79h2QjQCs2KBQFzSneBdYUbcXK1Tr |
| DZ device | laconic-mia-sw01 |
| Contributor / tenant | laconic |
## Jito ShredStream
**Not a DZ multicast group.** ShredStream is Jito's own shred delivery service,
independent of DoubleZero multicast. It provides low-latency shreds from leaders
on the Solana network via a proxy client that connects to the Jito Block Engine.
| Property | Value |
|----------|-------|
| What it does | Delivers shreds from Jito-connected leaders with low latency. Provides a redundant shred path for servers in remote locations. |
| How it works | `shredstream-proxy` authenticates to a Jito Block Engine via keypair, receives shreds, forwards them to configured UDP destinations (e.g. validator TVU port). |
| Cost | **Unknown.** Docs don't list pricing. Was previously "complimentary" for searchers (2024). May require approval. |
| Requirements | Approved Solana pubkey (form submission), auth keypair, firewall open on UDP 20000, TVU port of your node. |
| Regions | Amsterdam, Dublin, Frankfurt, London, New York, Salt Lake City, Singapore, Tokyo. Max 2 regions selectable. |
| Limitations | No NAT support. Bridge networking incompatible with multicast mode. |
| Repo | https://github.com/jito-labs/shredstream-proxy |
| Docs | https://docs.jito.wtf/lowlatencytxnfeed/ |
| Status for biscayne | **Not yet requested.** Need to submit pubkey for approval. |
ShredStream is relevant to our shred completeness problem — it provides an additional
shred source beyond turbine and the Ashburn relay. It would run as a sidecar process
forwarding shreds to the validator's TVU port.
## DZ Multicast Groups
DZ multicast uses PIM (Protocol Independent Multicast) and MSDP (Multicast Source
Discovery Protocol). Group owners define allowed publishers and subscribers on the
DZ ledger. Switch ASICs handle packet replication — no CPU overhead.
### bebop
Listed in earlier notes as a multicast shred distribution group. **No public
documentation found.** Cannot confirm this exists as a DZ multicast group.
- **Owner:** Unknown
- **Status:** Unverified — may not exist as described
### turbine (future)
Solana's native shred propagation via DZ multicast. Jito has expressed interest
in leveraging multicast for shred delivery. Not yet available for production use.
- **Owner:** Solana Foundation / Anza (native turbine), Jito (shredstream)
- **Status:** Testnet only (DZ client v0.2.2)
## bloXroute OFR (Optimized Feed Relay)
Commercial shred delivery service. Runs a gateway docker container on your node that
connects to bloXroute's BDN (Blockchain Distribution Network) to receive shreds
faster than default turbine (~30-50ms improvement, beats turbine ~98% of the time).
| Property | Value |
|----------|-------|
| What it does | Delivers shreds via bloXroute's BDN with optimized relay topologies. Not just a different turbine path — uses their own distribution network. |
| How it works | Docker gateway container on your node, communicates with bloXroute OFR relay over UDP 18888. Forwards shreds to your validator. |
| Cost | **$300/mo** (Professional, 1500 tx/day), **$1,250/mo** (Enterprise, unlimited tx). OFR gateway without local node requires Enterprise Elite ($5,000+/mo). |
| Requirements | Docker, UDP port 18888 open, bloXroute subscription. |
| Open source | Gateway at https://github.com/bloXroute-Labs/solana-gateway |
| Docs | https://docs.bloxroute.com/solana/optimized-feed-relay |
| Status for biscayne | **Not yet evaluated.** Monthly cost may not be justified. |
bloXroute's value proposition: they operate nodes at multiple turbine tree positions
across their network, aggregate shreds, and redistribute via their BDN. This is the
"multiple identities collecting different shreds" approach — but operated by bloXroute,
not by us.
## How These Services Get More Shreds
Turbine tree position is determined by validator identity (pubkey). A single validator
gets shreds from one position in the tree per slot. Services like Jito ShredStream
and bloXroute OFR operate many nodes with different identities across the turbine
tree, aggregate the shreds they each receive, and redistribute the combined set to
subscribers. This is why they can deliver shreds the subscriber's own turbine position
would never see.
**An open-source equivalent would require running multiple lightweight validator
identities (non-voting, minimal stake) at different locations, each collecting shreds
from their unique turbine tree position, and forwarding them to the main validator.**
No known open-source project implements this pattern.
## Sources
- [Jito ShredStream docs](https://docs.jito.wtf/lowlatencytxnfeed/)
- [shredstream-proxy repo](https://github.com/jito-labs/shredstream-proxy)
- [bloXroute OFR docs](https://docs.bloxroute.com/solana/optimized-feed-relay)
- [bloXroute pricing](https://bloxroute.com/pricing/)
- [bloXroute OFR intro](https://bloxroute.com/pulse/introducing-ofrs-faster-shreds-better-performance-on-solana/)
- [DZ multicast announcement](https://doublezero.xyz/journal/doublezero-introduces-multicast-support-smarter-faster-data-delivery-for-distributed-systems)
## Request Template
When contacting a group owner, use something like:
> We'd like to subscribe to your DoubleZero multicast group for our Solana
> validator. Our details:
>
> - Validator: 4WeLUxfQghbhsLEuwaAzjZiHg2VBw87vqHc4iZrGvKPr
> - DZ identity: 3Bw6v7EruQvTwoY79h2QjQCs2KBQFzSneBdYUbcXK1Tr
> - Client IP: 186.233.184.235
> - Device: laconic-mia-sw01
> - Tenant: laconic

View File

@ -0,0 +1,121 @@
# DoubleZero Current State and Bug Fixes
## Biscayne Connection Details
| Field | Value |
|-------|-------|
| Host | biscayne.vaasl.io (186.233.184.235) |
| DZ identity | `3Bw6v7EruQvTwoY79h2QjQCs2KBQFzSneBdYUbcXK1Tr` |
| Validator identity | `4WeLUxfQghbhsLEuwaAzjZiHg2VBw87vqHc4iZrGvKPr` |
| Nearest device | laconic-mia-sw01 (0.3ms) |
| DZ version (host) | 0.8.10 |
| DZ version (container) | 0.8.11 |
| k8s version | 1.35.1 (kind) |
## Current State (2026-03-03)
The host systemd `doublezerod` is connected and working. The container sidecar
doublezerod is broken. Both are running simultaneously.
| Instance | Identity | Status |
|----------|----------|--------|
| Host systemd | `3Bw6v7...` (correct) | BGP Session Up, IBRL to laconic-mia-sw01 |
| Container sidecar | `Cw9qun...` (wrong) | Disconnected, error loop |
| DaemonSet manifest | N/A | Never applied, dead code |
### Access pass
The access pass for 186.233.184.235 is registered and connected:
```
type: prepaid
payer: 3Bw6v7EruQvTwoY79h2QjQCs2KBQFzSneBdYUbcXK1Tr
status: connected
owner: DZfLKFDgLShjY34WqXdVVzHUvVtrYXb7UtdrALnGa8jw
```
## Bugs
### BUG-1: Container doublezerod has wrong identity
The entrypoint script (`entrypoint.sh`) auto-generates a new `id.json` if one isn't
found. The volume at `/srv/deployments/agave/data/doublezero-config/` was empty at
first boot, so it generated `Cw9qun...` instead of using the registered identity.
**Root cause:** The real `id.json` lives at `/home/solana/.config/doublezero/id.json`
(created by the host-level DZ install). The container volume is a separate path that
was never seeded.
**Fix:**
```bash
sudo cp /home/solana/.config/doublezero/id.json \
/srv/deployments/agave/data/doublezero-config/id.json
```
### BUG-2: Container doublezerod can't resolve DZ passport program
`DOUBLEZERO_RPC_ENDPOINT` in `spec.yml` is `http://127.0.0.1:8899` — the local
validator. But the local validator hasn't replayed enough slots to have the DZ
passport program accounts (`ser2VaTMAcYTaauMrTSfSrxBaUDq7BLNs2xfUugTAGv`).
doublezerod calls `getProgramAccounts` every 30 seconds and gets empty results.
**Fix in `deployment/spec.yml`:**
```yaml
# Use public RPC for DZ bootstrapping until local validator is caught up
DOUBLEZERO_RPC_ENDPOINT: https://api.mainnet-beta.solana.com
```
Switch back to `http://127.0.0.1:8899` once the local validator is synced.
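To know when it is safe to switch back, compare the local validator's slot against a public RPC. A minimal sketch (assumes only the standard Solana JSON-RPC `getSlot` method; the lag threshold is an arbitrary choice, not from this doc):

```python
import json
import urllib.request

def get_slot(rpc_url: str) -> int:
    """Fetch the current slot from a Solana JSON-RPC endpoint via getSlot."""
    payload = json.dumps({"jsonrpc": "2.0", "id": 1, "method": "getSlot"}).encode()
    req = urllib.request.Request(
        rpc_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["result"]

def caught_up(local_slot: int, remote_slot: int, max_lag: int = 100) -> bool:
    """True once the local validator is within max_lag slots of the cluster."""
    return remote_slot - local_slot <= max_lag

# Usage (live network calls, so commented out here):
# local = get_slot("http://127.0.0.1:8899")
# remote = get_slot("https://api.mainnet-beta.solana.com")
# print(caught_up(local, remote))
```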
### BUG-3: Container doublezerod lacks hostNetwork
laconic-so was not translating `network_mode: host` from compose files to
`hostNetwork: true` in generated k8s pod specs. Without host network access, the
container can't create GRE tunnels (IP proto 47) or run BGP (tcp/179 on
169.254.0.0/16).
**Fix:** Deploy with stack-orchestrator branch `fix/k8s-port-mappings-hostnetwork-v2`
(commit `fb69cc58`, 2026-03-03) which adds automatic hostNetwork detection.
### BUG-4: DaemonSet workaround is dead code
`deployment/k8s-manifests/doublezero-daemonset.yaml` was a workaround for BUG-3.
Now that laconic-so supports hostNetwork natively, it should be deleted.
**Fix:** Remove `deployment/k8s-manifests/doublezero-daemonset.yaml` from agave-stack.
### BUG-5: Two doublezerod instances running simultaneously
The host systemd `doublezerod` and the container sidecar are both running. Once the
container is fixed (BUG-1 through BUG-3), the host service must be disabled to avoid
two processes fighting over the GRE tunnel.
**Fix:**
```bash
sudo systemctl stop doublezerod
sudo systemctl disable doublezerod
```
## Diagnostic Commands
Always use `sudo -u solana` for host-level DZ commands — the identity is under
`/home/solana/.config/doublezero/`.
```bash
# Host
sudo -u solana doublezero address # expect 3Bw6v7...
sudo -u solana doublezero status # tunnel state
sudo -u solana doublezero latency # device reachability
sudo -u solana doublezero access-pass list | grep 186.233.184 # access pass
sudo -u solana doublezero balance # credits
ip route | grep doublezero0 # BGP routes
# Container (from kind node)
kubectl -n <ns> exec <pod> -c doublezerod -- doublezero address
kubectl -n <ns> exec <pod> -c doublezerod -- doublezero status
kubectl -n <ns> exec <pod> -c doublezerod -- doublezero --version
# Logs
kubectl -n <ns> logs <pod> -c doublezerod --tail=30
sudo journalctl -u doublezerod -f # host systemd logs
```

View File

@ -0,0 +1,65 @@
# Feature: Use local registry for kind image loading
## Summary
`laconic-so deployment start` uses `kind load docker-image` to copy container images from the host Docker daemon into the kind node's containerd. This serializes the full image (`docker save`), pipes it through `docker exec`, and deserializes it (`ctr image import`). For biscayne's ~837MB agave image plus the doublezero image, this takes 5-10 minutes on every cluster recreate — copying between two container runtimes on the same machine.
## Current behavior
```
docker build → host Docker daemon (image stored once)
kind load docker-image → docker save | docker exec kind-node ctr import (full copy)
```
This happens in `stack_orchestrator/deploy/k8s/deploy_k8s.py` every time `laconic-so deployment start` runs and the image isn't already present in the kind node.
## Proposed behavior
Run a persistent local registry (`registry:2`) on the host. `laconic-so` pushes images there after build. Kind's containerd is configured to pull from it.
```
docker build → docker tag localhost:5001/image → docker push localhost:5001/image
kind node containerd → pulls from localhost:5001 (fast, no serialization)
```
The registry container persists across kind cluster deletions. Images are always available without reloading.
## Implementation
1. **Registry container**: `docker run -d --restart=always -p 5001:5000 --name kind-registry registry:2`
2. **Kind config** — add registry mirror to `containerdConfigPatches` in kind-config.yml:
```yaml
containerdConfigPatches:
- |-
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."localhost:5001"]
endpoint = ["http://kind-registry:5000"]
```
3. **Connect registry to kind network**: `docker network connect kind kind-registry`
4. **laconic-so change** — in `deploy_k8s.py`, replace `kind load docker-image` with:
```python
# Sketch: tag and push to the local registry instead of `kind load docker-image`.
# Variable names are illustrative; deploy_k8s.py's actual helpers will differ.
local_tag = f"localhost:5001/{image_tag}"
subprocess.run(["docker", "tag", image_tag, local_tag], check=True)
subprocess.run(["docker", "push", local_tag], check=True)
```
5. **Compose files** — image references change from `laconicnetwork/agave:local` to `localhost:5001/laconicnetwork/agave:local`
Kind documents this pattern: https://kind.sigs.k8s.io/docs/user/local-registry/
## Impact
- Eliminates 5-10 minute image loading step on every cluster recreate
- Registry persists across `kind delete cluster` — no re-push needed unless the image itself changes
- `docker push` to a local registry is near-instant (shared filesystem, layer dedup)
- Unblocks faster iteration on redeploy cycles
## Scope
This is a `stack-orchestrator` change, specifically in `deploy_k8s.py`. The kind-config.yml also needs the registry mirror config, which `laconic-so` generates from `spec.yml`.
## Discovered
2026-03-05 — during biscayne full wipe redeploy, `laconic-so start` spent most of its runtime on `kind load docker-image`, causing ansible timeouts and cascading failures (caddy ingress conflict bug).

View File

@ -0,0 +1,78 @@
# Known Issues
## BUG-6: Validator logging not configured, only stdout available
**Observed:** 2026-03-03
The validator only logs to stdout. kubectl logs retains ~2 minutes of history
at current log volume before the buffer fills. When diagnosing a replay stall,
the startup logs (snapshot load, initial replay, error conditions) were gone.
**Impact:** Cannot determine why the validator replay stage stalled — the
startup logs that would show the root cause are not available.
**Fix:** Configure the `--log` flag in the validator start script to write to
a persistent volume, so logs survive container restarts and aren't limited
to the kubectl buffer.
## BUG-7: Metrics endpoint unreachable from validator pod
**Observed:** 2026-03-03
```
WARN solana_metrics::metrics submit error: error sending request for url
(http://localhost:8086/write?db=agave_metrics&u=admin&p=admin&precision=n)
```
The validator is configured with `SOLANA_METRICS_CONFIG` pointing to
`http://172.20.0.1:8086` (the kind docker bridge gateway), but the logs show
it trying `localhost:8086`. The InfluxDB container (`solana-monitoring-influxdb-1`)
is running on the host, but the validator can't reach it.
**Impact:** No metrics collection. Cannot use Grafana dashboards to diagnose
performance issues or track sync progress over time.
## BUG-8: sysctl values not visible inside kind container
**Observed:** 2026-03-03
```
ERROR solana_core::system_monitor_service Failed to query value for net.core.rmem_max: no such sysctl
WARN solana_core::system_monitor_service net.core.rmem_max: recommended=134217728, current=-1 too small
```
The host has correct sysctl values (`net.core.rmem_max = 134217728`), but
`/proc/sys/net/core/` does not exist inside the kind node container. The
validator reads `-1` and reports the buffer as too small.
The network buffers themselves may still be effective (they're set on the
host network namespace which the pod shares via `hostNetwork: true`), but
this is unverified. If the buffers are not effective, it could limit shred
ingestion throughput and contribute to slow repair.
**Fix options:**
- Set sysctls on the kind node container at creation time
(`kind` supports `kubeadmConfigPatches` and sysctl configuration)
- Verify empirically whether the host sysctls apply to hostNetwork pods
by checking actual socket buffer sizes from inside the pod
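The second option can be checked with a few lines of Python run inside the pod. On Linux the kernel silently caps the grant at `net.core.rmem_max`, and `getsockopt` reports double the effective value, so a result far below 2 × 134217728 means the host sysctl is not applying:

```python
import socket

def effective_rcvbuf(requested: int) -> int:
    """Request a large SO_RCVBUF and return what the kernel actually granted."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # The kernel caps this at net.core.rmem_max without raising an error.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        s.close()

print(effective_rcvbuf(134217728))  # the recommended rmem_max from the warning
```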
## Validator replay stall (under investigation)
**Observed:** 2026-03-03
The validator root has been stuck at slot 403,892,310 for 55+ minutes.
The gap to the cluster tip is ~120,000 slots and growing.
**Observed symptoms:**
- Zero `Frozen` banks in log history — replay stage is not processing slots
- All incoming slots show `bank_status: Unprocessed`
- Repair only requests tip slots and two specific old slots (403,892,310,
403,909,228) — not the ~120k slot gap
- Repair peer count is 3-12 per cycle (vs 1,000+ gossip peers)
- Startup logs have rotated out (BUG-6), so initialization context is lost
**Unknown:**
- What snapshot the validator loaded at boot
- Whether replay ever started or was blocked from the beginning
- Whether the sysctl issue (BUG-8) is limiting repair throughput
- Whether the missing metrics (BUG-7) would show what's happening internally

View File

@ -0,0 +1,191 @@
# Shred Collector Relay
## Problem
Turbine assigns each validator a single position in the shred distribution tree
per slot, determined by its pubkey. A validator in Miami with one identity receives
shreds from one set of tree neighbors — typically ~60-70% of shreds for any given
slot. The remaining 30-40% must come from the repair protocol, which is too slow
to keep pace with chain production (see analysis below).
Commercial services (Jito ShredStream, bloXroute OFR) solve this by running many
nodes with different identities across the turbine tree, aggregating shreds, and
redistributing the combined set to subscribers. This works but costs $300-5,000/mo
and adds a dependency on a third party.
## Concept
Run lightweight **shred collector** nodes at multiple geographic locations on
the Laconic network (Ashburn, Dallas, etc.). Each collector has its own keypair,
joins gossip with a unique identity, receives turbine shreds from its unique tree
position, and forwards raw shred packets to the main validator in Miami. The main
validator inserts these shreds into its blockstore alongside its own turbine shreds,
increasing completeness toward 100% without relying on repair.
```
Turbine Tree
/ | \
/ | \
collector-ash collector-dfw biscayne (main validator)
(Ashburn) (Dallas) (Miami)
identity A identity B identity C
~60% shreds ~60% shreds ~60% shreds
\ | /
\ | /
→ UDP forward via DZ backbone →
|
biscayne blockstore
~95%+ shreds (union of ABC)
```
Each collector sees a different ~60% slice of the turbine tree. The union of
three independent positions yields ~94% coverage (1 - 0.4³ = 0.936). Four
collectors yield ~97%. The main validator fills the remaining few percent via
repair, which is fast when only 3-6% of shreds are missing.
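The coverage numbers above come from treating each collector's slice as an independent draw; a quick check of 1 - (miss rate)^n:

```python
def union_coverage(per_node: float, n: int) -> float:
    """Fraction of shreds seen by at least one of n independent collectors."""
    return 1.0 - (1.0 - per_node) ** n

for n in range(1, 5):
    print(n, round(union_coverage(0.6, n), 3))  # 1: 0.6, 2: 0.84, 3: 0.936, 4: 0.974
```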
## Why This Works
The math from biscayne's recovery (2026-03-06):
| Metric | Value |
|--------|-------|
| Compute-bound replay (complete blocks) | 5.2 slots/sec |
| Repair-bound replay (incomplete blocks) | 0.5 slots/sec |
| Chain production rate | 2.5 slots/sec |
| Turbine + relay delivery per identity | ~60-70% |
| Repair bandwidth | ~600 shreds/sec (estimated) |
| Repair needed to converge at 60% delivery | 5x current bandwidth |
| Repair needed to converge at 95% delivery | Easily sufficient |
At 60% shred delivery, repair must fill 40% per slot — too slow to converge.
At 95% delivery (3 collectors), repair fills 5% per slot — well within capacity.
The validator replays at near compute-bound speed (5+ slots/sec) and converges.
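The repair rows can be sanity-checked against the shred-rate figures used elsewhere in this doc (~3,000 shreds/slot, 2.5 slots/sec, ~600 shreds/sec repair bandwidth):

```python
SHREDS_PER_SLOT = 3000        # approximate, per the Infrastructure section
SLOTS_PER_SEC = 2.5           # chain production rate
REPAIR_SHREDS_PER_SEC = 600   # estimated repair bandwidth

def repair_demand(delivery_fraction: float) -> float:
    """Shreds/sec repair must supply to keep pace at a given delivery rate."""
    return (1.0 - delivery_fraction) * SHREDS_PER_SLOT * SLOTS_PER_SEC

print(round(repair_demand(0.60) / REPAIR_SHREDS_PER_SEC, 3))  # 5.0: needs 5x
print(round(repair_demand(0.95) / REPAIR_SHREDS_PER_SEC, 3))  # 0.625: sufficient
```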
## Infrastructure
Laconic already has DZ-connected switches at multiple sites:
| Site | Device | Latency to Miami | Backbone |
|------|--------|-------------------|----------|
| Miami | laconic-mia-sw01 | 0.24ms | local |
| Ashburn | laconic-was-sw01 | ~29ms | Et4/1 25.4ms |
| Dallas | laconic-dfw-sw01 | ~30ms | TBD |
The DZ backbone carries traffic between sites at line rate. Shred packets are
~1280 bytes each. At ~3,000 shreds/slot and 2.5 slots/sec, each collector
forwards ~7,500 packets/sec (~10 MB/s) — trivial bandwidth for the backbone.
## Collector Architecture
The collector does NOT need to be a full validator. It needs to:
1. **Join gossip** — advertise a ContactInfo with its own pubkey and a TVU
address (the site's IP)
2. **Receive turbine shreds** — UDP packets on the advertised TVU port
3. **Forward shreds** — retransmit raw UDP packets to biscayne's TVU port
It does NOT need to: replay transactions, maintain accounts state, store a
ledger, load a snapshot, vote, or run RPC.
### Option A: Firedancer Minimal Build
Firedancer (Apache 2, C) has a tile-based architecture where each function
(net, gossip, shred, bank, store, etc.) runs as an independent Linux process.
A minimal build using only the networking + gossip + shred tiles would:
- Join gossip and advertise a TVU address
- Receive turbine shreds via the shred tile
- Forward shreds to a configured destination instead of to bank/store
This requires modifying the shred tile to add a UDP forwarder output instead
of (or in addition to) the normal bank handoff. The rest of the tile pipeline
(bank, pack, poh, store) is simply not started.
**Estimated effort:** Moderate. Firedancer's tile architecture is designed for
this kind of composition. The main work is adding a forwarder sink to the shred
tile and testing gossip participation without the full validator stack.
**Source:** https://github.com/firedancer-io/firedancer
### Option B: Agave Non-Voting Minimal
Run `agave-validator --no-voting` with `--limit-ledger-size 0` and minimal
config. Agave still requires a snapshot to start and runs the full process, but
with no voting and minimal ledger it would be lighter than a full node.
**Downside:** Agave is monolithic — you can't easily disable replay/accounts.
It still loads a snapshot, builds the accounts index, and runs replay. This
defeats the purpose of a lightweight collector.
### Option C: Custom Gossip + TVU Receiver
Write a minimal Rust binary using agave's `solana-gossip` and `solana-streamer`
crates to:
1. Bootstrap into gossip via entrypoints
2. Advertise ContactInfo with TVU socket
3. Receive shred packets on TVU
4. Forward them via UDP
**Estimated effort:** Significant. Gossip protocol participation is complex
(CRDS protocol, pull/push protocol, protocol versioning). Using the agave
crates directly is possible but poorly documented for standalone use.
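The forwarding half (items 3 and 4) is the easy part; the complexity is all in gossip. A sketch of the relay core, with illustrative addresses:

```python
import socket

SHRED_BUF = 1500  # turbine shreds are ~1280 bytes, so one datagram always fits

def relay_once(tvu_sock: socket.socket, out_sock: socket.socket,
               dst: tuple) -> int:
    """Receive one datagram on the collector's TVU socket and forward it
    unchanged to the main validator's TVU address. Returns bytes relayed."""
    data, _src = tvu_sock.recvfrom(SHRED_BUF)
    out_sock.sendto(data, dst)
    return len(data)

# Collector main loop (addresses illustrative; biscayne TVU from this doc):
# tvu = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# tvu.bind(("0.0.0.0", 9000))
# out = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# while True:
#     relay_once(tvu, out, ("186.233.184.235", 9000))
```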
### Option D: Run Collectors on Biscayne
Run the collector processes on biscayne itself, each advertising a TVU address
at a remote site. The switches at each site forward inbound TVU traffic to
biscayne via the DZ backbone using traffic-policy redirects (same pattern as
`ashburn-validator-relay.md`).
**Advantage:** No compute needed at remote sites. Just switch config + loopback
IPs. All collector processes run in Miami.
**Risk:** Gossip advertises IP + port. If the collector runs on biscayne but
advertises an Ashburn IP, gossip protocol interactions (pull requests, pings)
arrive at the Ashburn IP and must be forwarded back to biscayne. This adds
~58ms RTT to gossip protocol messages, which may cause timeouts or peer
quality degradation. Needs testing.
## Recommendation
Option A (Firedancer minimal build) is the correct long-term approach. It
produces a single binary that does exactly one thing: collect shreds from a
unique turbine tree position and forward them. It runs on minimal hardware
(a small VM or container at each site, or on biscayne with remote TVU
addresses).
Option D (collectors on biscayne with switch forwarding) is the fastest to
test since it needs no new software — just switch config and multiple
agave-validator instances with `--no-voting`. The question is whether agave
can start without a snapshot if we only care about gossip + TVU.
## Deployment Topology
```
biscayne (186.233.184.235)
├── agave-validator (main, identity C, TVU 186.233.184.235:9000)
├── collector-ash (identity A, TVU 137.239.194.65:9000)
│ └── shreds forwarded via was-sw01 traffic-policy
├── collector-dfw (identity B, TVU <dfw-ip>:9000)
│ └── shreds forwarded via dfw-sw01 traffic-policy
└── blockstore receives union of ABC shreds
was-sw01 (Ashburn)
└── Loopback: 137.239.194.65
└── traffic-policy: UDP dst 137.239.194.65:9000 → nexthop mia-sw01
dfw-sw01 (Dallas)
└── Loopback: <assigned IP>
└── traffic-policy: UDP dst <assigned IP>:9000 → nexthop mia-sw01
```
## Open Questions
1. Can agave-validator start in gossip-only mode without a snapshot?
2. Does Firedancer's shred tile work standalone without bank/replay?
3. What is the gossip protocol timeout for remote TVU addresses (Option D)?
4. How does the turbine tree handle multiple identities from the same IP
(if running all collectors on biscayne)?
5. Do we need stake on collector identities to be placed in the turbine tree,
or do unstaked nodes still participate?
6. What IP block is available on dfw-sw01 for a collector loopback?

View File

@ -0,0 +1,161 @@
# TVU Shred Relay — Data-Plane Redirect
## Overview
Biscayne's agave validator advertises `64.92.84.81:20000` (laconic-was-sw01 Et1/1) as its TVU
address. Turbine shreds arrive as normal UDP to the switch's front-panel IP. The 7280CR3A ASIC
handles front-panel traffic without punting to Linux userspace — it sees a local interface IP
with no service and drops at the hardware level.
### Previous approach (monitor + socat)
EOS monitor session mirrored matched packets to CPU (mirror0 interface). socat read from mirror0
and relayed to biscayne. shred-unwrap.py on biscayne stripped encapsulation headers.
Fragile: socat ran as a foreground process, died on disconnect.
### New approach (traffic-policy redirect)
EOS `traffic-policy` with `set nexthop` and `system-rule overriding-action redirect` overrides
the ASIC's "local IP, handle myself" decision. The ASIC forwards matched packets to the
specified next-hop at line rate. Pure data plane, no CPU involvement, persists in startup-config.
Available since EOS 4.28.0F on R3 platforms. Confirmed on 4.34.0F.
## Architecture
```
Turbine peers (hundreds of validators)
|
v UDP shreds to 64.92.84.81:20000
laconic-was-sw01 Et1/1 (Ashburn)
| ASIC matches traffic-policy SHRED-RELAY
| Redirects to nexthop 172.16.1.189 (data plane, line rate)
v Et4/1 backbone (25.4ms)
laconic-mia-sw01 Et4/1 (Miami)
| forwards via default route (same metro)
v 0.13ms
biscayne (186.233.184.235, Miami)
| iptables DNAT: dst 64.92.84.81:20000 -> 127.0.0.1:9000
v
agave-validator TVU port (localhost:9000)
```
## Production Config: laconic-was-sw01
### Pre-change safety
```
configure checkpoint save pre-shred-relay
```
Rollback: `rollback running-config checkpoint pre-shred-relay` then `write memory`.
### Config session with auto-revert
```
configure session shred-relay
! ACL for traffic-policy match
ip access-list SHRED-RELAY-ACL
10 permit udp any any eq 20000
! Traffic policy: redirect matched packets to backbone next-hop
traffic-policy SHRED-RELAY
match SHRED-RELAY-ACL
set nexthop 172.16.1.189
! Override ASIC punt-to-CPU for redirected traffic
system-rule overriding-action redirect
! Apply to Et1/1 ingress
interface Ethernet1/1
traffic-policy input SHRED-RELAY
! Remove old monitor session and its ACL
no monitor session 1
no ip access-list SHRED-RELAY
! Review before committing
show session-config diffs
! Commit with 5-minute auto-revert safety net
commit timer 00:05:00
```
After verification: `configure session shred-relay commit` then `write memory`.
### Linux cleanup on was-sw01
```bash
# Kill socat relay (PID 27743)
kill 27743
# Remove Linux kernel route
ip route del 186.233.184.235/32
```
The EOS static route `ip route 186.233.184.235/32 172.16.1.189` stays (general reachability).
## Production Config: biscayne
### iptables DNAT
Traffic-policy sends normal L3-forwarded UDP packets (no mirror encapsulation). Packets arrive
with dst `64.92.84.81:20000` containing clean shred payloads directly in the UDP body.
```bash
# DNAT to 127.0.0.1 normally also requires route_localnet on the ingress
# interface, or the kernel drops the rewritten packet as a martian
# (replace eth0 with the actual ingress interface).
sudo sysctl -w net.ipv4.conf.eth0.route_localnet=1

sudo iptables -t nat -A PREROUTING -p udp -d 64.92.84.81 --dport 20000 \
  -j DNAT --to-destination 127.0.0.1:9000

# Persist across reboot
sudo apt install -y iptables-persistent
sudo netfilter-persistent save
```
### Cleanup
```bash
# Kill shred-unwrap.py (PID 2497694)
kill 2497694
rm /tmp/shred-unwrap.py
```
## Verification
1. `show traffic-policy interface Ethernet1/1` — policy applied
2. `show traffic-policy counters` — packets matching and redirected
3. `sudo iptables -t nat -L PREROUTING -v -n` — DNAT rule with packet counts
4. Validator logs: slot replay rate should maintain ~3.3 slots/sec
5. `ss -unp | grep 9000` — validator receiving on TVU port
## What was removed
| Component | Host |
|-----------|------|
| monitor session 1 | was-sw01 |
| SHRED-RELAY ACL (old) | was-sw01 |
| socat relay process | was-sw01 |
| Linux kernel static route | was-sw01 |
| shred-unwrap.py | biscayne |
## What was added
| Component | Host | Persistent? |
|-----------|------|-------------|
| traffic-policy SHRED-RELAY | was-sw01 | Yes (startup-config) |
| SHRED-RELAY-ACL | was-sw01 | Yes (startup-config) |
| system-rule overriding-action redirect | was-sw01 | Yes (startup-config) |
| iptables DNAT rule | biscayne | Yes (iptables-persistent) |
## Key Details
| Item | Value |
|------|-------|
| Biscayne validator identity | `4WeLUxfQghbhsLEuwaAzjZiHg2VBw87vqHc4iZrGvKPr` |
| Biscayne IP | `186.233.184.235` |
| laconic-was-sw01 public IP | `64.92.84.81` (Et1/1) |
| laconic-was-sw01 backbone IP | `172.16.1.188` (Et4/1) |
| laconic-was-sw01 SSH | `install@137.239.200.198` |
| laconic-mia-sw01 backbone IP | `172.16.1.189` (Et4/1) |
| Backbone RTT (WAS-MIA) | 25.4ms |
| EOS version | 4.34.0F |

View File

@ -0,0 +1,14 @@
all:
hosts:
biscayne:
ansible_host: biscayne.vaasl.io
ansible_user: rix
ansible_become: true
# DoubleZero identities
dz_identity: 3Bw6v7EruQvTwoY79h2QjQCs2KBQFzSneBdYUbcXK1Tr
validator_identity: 4WeLUxfQghbhsLEuwaAzjZiHg2VBw87vqHc4iZrGvKPr
client_ip: 186.233.184.235
dz_device: laconic-mia-sw01
dz_tenant: laconic
dz_environment: mainnet-beta

View File

@ -0,0 +1,23 @@
all:
children:
switches:
vars:
ansible_connection: ansible.netcommon.network_cli
ansible_network_os: arista.eos.eos
ansible_user: install
ansible_become: true
ansible_become_method: enable
hosts:
was-sw01:
ansible_host: 137.239.200.198
# Et1/1: 64.92.84.81 (Ashburn uplink)
# Et4/1: 172.16.1.188 (backbone to mia-sw01)
# Loopback100: 137.239.194.64/32
backbone_ip: 172.16.1.188
backbone_peer: 172.16.1.189
uplink_gateway: 64.92.84.80
mia-sw01:
ansible_host: 209.42.167.133
# Et4/1: 172.16.1.189 (backbone to was-sw01)
backbone_ip: 172.16.1.189
backbone_peer: 172.16.1.188

View File

@ -156,73 +156,62 @@
     failed_when: "add_ip.rc != 0 and 'RTNETLINK answers: File exists' not in add_ip.stderr"
     tags: [inbound]

-  - name: Add DNAT for gossip UDP
-    ansible.builtin.iptables:
-      table: nat
-      chain: PREROUTING
-      protocol: udp
-      destination: "{{ ashburn_ip }}"
-      destination_port: "{{ gossip_port }}"
-      jump: DNAT
-      to_destination: "{{ kind_node_ip }}:{{ gossip_port }}"
+  - name: Add DNAT rules (inserted before DOCKER chain)
+    ansible.builtin.shell:
+      cmd: |
+        set -o pipefail
+        # DNAT rules must be before Docker's ADDRTYPE LOCAL rule, otherwise
+        # Docker's PREROUTING chain swallows traffic to 137.239.194.65 (which
+        # is on loopback and therefore type LOCAL).
+        for rule in \
+          "-p udp -d {{ ashburn_ip }} --dport {{ gossip_port }} -j DNAT --to-destination {{ kind_node_ip }}:{{ gossip_port }}" \
+          "-p tcp -d {{ ashburn_ip }} --dport {{ gossip_port }} -j DNAT --to-destination {{ kind_node_ip }}:{{ gossip_port }}" \
+          "-p udp -d {{ ashburn_ip }} --dport {{ dynamic_port_range_start }}:{{ dynamic_port_range_end }} -j DNAT --to-destination {{ kind_node_ip }}" \
+        ; do
+          if ! iptables -t nat -C PREROUTING $rule 2>/dev/null; then
+            iptables -t nat -I PREROUTING 1 $rule
+            echo "added: $rule"
+          else
+            echo "exists: $rule"
+          fi
+        done
+      executable: /bin/bash
+    register: dnat_result
+    changed_when: "'added' in dnat_result.stdout"
     tags: [inbound]

-  - name: Add DNAT for gossip TCP
-    ansible.builtin.iptables:
-      table: nat
-      chain: PREROUTING
-      protocol: tcp
-      destination: "{{ ashburn_ip }}"
-      destination_port: "{{ gossip_port }}"
-      jump: DNAT
-      to_destination: "{{ kind_node_ip }}:{{ gossip_port }}"
-    tags: [inbound]
-
-  - name: Add DNAT for dynamic ports (UDP 9000-9025)
-    ansible.builtin.iptables:
-      table: nat
-      chain: PREROUTING
-      protocol: udp
-      destination: "{{ ashburn_ip }}"
-      destination_port: "{{ dynamic_port_range_start }}:{{ dynamic_port_range_end }}"
-      jump: DNAT
-      to_destination: "{{ kind_node_ip }}"
+  - name: Show DNAT result
+    ansible.builtin.debug:
+      var: dnat_result.stdout_lines
     tags: [inbound]

   # ------------------------------------------------------------------
   # Outbound: fwmark + SNAT + policy routing
   # ------------------------------------------------------------------

-  - name: Mark outbound validator UDP gossip traffic
-    ansible.builtin.iptables:
-      table: mangle
-      chain: PREROUTING
-      protocol: udp
-      source: "{{ kind_network }}"
-      source_port: "{{ gossip_port }}"
-      jump: MARK
-      set_mark: "{{ fwmark }}"
-    tags: [outbound]
-
-  - name: Mark outbound validator UDP dynamic port traffic
-    ansible.builtin.iptables:
-      table: mangle
-      chain: PREROUTING
-      protocol: udp
-      source: "{{ kind_network }}"
-      source_port: "{{ dynamic_port_range_start }}:{{ dynamic_port_range_end }}"
-      jump: MARK
-      set_mark: "{{ fwmark }}"
-    tags: [outbound]
-
-  - name: Mark outbound validator TCP gossip traffic
-    ansible.builtin.iptables:
-      table: mangle
-      chain: PREROUTING
-      protocol: tcp
-      source: "{{ kind_network }}"
-      source_port: "{{ gossip_port }}"
-      jump: MARK
-      set_mark: "{{ fwmark }}"
+  - name: Mark outbound validator traffic (mangle PREROUTING)
+    ansible.builtin.shell:
+      cmd: |
+        set -o pipefail
+        for rule in \
+          "-p udp -s {{ kind_network }} --sport {{ gossip_port }} -j MARK --set-mark {{ fwmark }}" \
+          "-p udp -s {{ kind_network }} --sport {{ dynamic_port_range_start }}:{{ dynamic_port_range_end }} -j MARK --set-mark {{ fwmark }}" \
+          "-p tcp -s {{ kind_network }} --sport {{ gossip_port }} -j MARK --set-mark {{ fwmark }}" \
+        ; do
+          if ! iptables -t mangle -C PREROUTING $rule 2>/dev/null; then
+            iptables -t mangle -A PREROUTING $rule
+            echo "added: $rule"
+          else
+            echo "exists: $rule"
+          fi
+        done
+      executable: /bin/bash
+    register: mangle_result
+    changed_when: "'added' in mangle_result.stdout"
+    tags: [outbound]
+
+  - name: Show mangle result
+    ansible.builtin.debug:
+      var: mangle_result.stdout_lines
     tags: [outbound]

   - name: SNAT marked traffic to Ashburn IP (before Docker MASQUERADE)
@ -337,7 +326,7 @@
nat_rules: "{{ nat_rules.stdout_lines }}" nat_rules: "{{ nat_rules.stdout_lines }}"
mangle_rules: "{{ mangle_rules.stdout_lines | default([]) }}" mangle_rules: "{{ mangle_rules.stdout_lines | default([]) }}"
routing: "{{ routing_info.stdout_lines | default([]) }}" routing: "{{ routing_info.stdout_lines | default([]) }}"
loopback: "{{ lo_addrs.stdout_lines }}" loopback: "{{ lo_addrs.stdout_lines | default([]) }}"
tags: [inbound, outbound] tags: [inbound, outbound]
- name: Summary - name: Summary
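The mangle marking above is one half of the redirect; the marked packets also need SNAT plus a policy-routing rule to steer them into the tunnel. A minimal sketch of the companion fwmark plumbing, with hypothetical mark, table, network, and device names (the real values live in this playbook's vars):

```
FWMARK=0x51              # hypothetical; must match the MARK set in mangle PREROUTING
TABLE=151                # hypothetical dedicated routing table for marked traffic
TUNNEL_DEV=gre-ashburn   # hypothetical tunnel device name

# Same idempotent pattern as the playbook: check (-C) before append (-A).
rule="-s 172.18.0.0/16 -p udp --sport 8001 -j MARK --set-mark $FWMARK"
iptables -t mangle -C PREROUTING $rule 2>/dev/null \
  || iptables -t mangle -A PREROUTING $rule

# Marked packets consult the dedicated table, which defaults into the tunnel.
ip rule list | grep -q "fwmark $FWMARK" || ip rule add fwmark $FWMARK table $TABLE
ip route replace default dev "$TUNNEL_DEV" table $TABLE
```

These commands require root and a live tunnel device, so treat them as a reading aid for the playbook tasks rather than a standalone script.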


@@ -1,14 +1,19 @@
---
# Configure laconic-mia-sw01 for validator traffic relay (inbound + outbound)
#
# Outbound: Redirects outbound traffic from biscayne (src 137.239.194.65)
# arriving via the doublezero0 GRE tunnel to was-sw01 via the backbone,
# preventing BCP38 drops at mia-sw01's ISP uplink.
#
# Inbound: Routes traffic destined to 137.239.194.65 from the default VRF
# to biscayne via Tunnel500 in vrf1. Without this, mia-sw01 sends
# 137.239.194.65 out the ISP uplink back to was-sw01 (routing loop).
#
# Approach: The existing per-tunnel ACL (SEC-USER-500-IN) controls what
# traffic enters vrf1 from Tunnel500. We add 137.239.194.65 to the ACL
# and add a default route in vrf1 via egress-vrf default pointing to
# was-sw01's backbone IP. For inbound, an inter-VRF static route in the
# default VRF forwards 137.239.194.65/32 to biscayne via Tunnel500.
#
# The other vrf1 tunnels (502, 504, 505) have their own ACLs that only
# permit their specific source IPs, so the default route won't affect them.

@@ -39,6 +44,7 @@
    tunnel_interface: Tunnel500
    tunnel_vrf: vrf1
    tunnel_acl: SEC-USER-500-IN
    tunnel_nexthop: 169.254.7.7  # biscayne's end of the Tunnel500 /31
    backbone_interface: Ethernet4/1
    session_name: validator-outbound
    checkpoint_name: pre-validator-outbound

@@ -117,6 +123,7 @@
            - "show ip route vrf {{ tunnel_vrf }} 0.0.0.0/0"
            - "show ip route vrf {{ tunnel_vrf }} {{ backbone_peer }}"
            - "show ip route {{ backbone_peer }}"
            - "show ip route {{ ashburn_ip }}"
        register: vrf_routing
        tags: [preflight]

@@ -163,6 +170,11 @@
            # Default route in vrf1 via backbone to was-sw01 (egress-vrf default)
            # Safe because per-tunnel ACLs already restrict what enters vrf1
            - command: "ip route vrf {{ tunnel_vrf }} 0.0.0.0/0 egress-vrf default {{ backbone_interface }} {{ backbone_peer }}"
            # Inbound: route traffic for ashburn IP from default VRF to biscayne via tunnel.
            # Without this, mia-sw01 sends 137.239.194.65 out the ISP uplink → routing loop.
            # NOTE: nexthop only, no interface — EOS silently drops cross-VRF routes that
            # specify a tunnel interface (accepts in config but never installs in RIB).
            - command: "ip route {{ ashburn_ip }}/32 egress-vrf {{ tunnel_vrf }} {{ tunnel_nexthop }}"

      - name: Show session diff
        arista.eos.eos_command:

@@ -189,6 +201,7 @@
          commands:
            - "show running-config | section ip access-list {{ tunnel_acl }}"
            - "show ip route vrf {{ tunnel_vrf }} 0.0.0.0/0"
            - "show ip route {{ ashburn_ip }}"
        register: verify

      - name: Display verification

@@ -205,6 +218,7 @@
            Changes applied:
              1. ACL {{ tunnel_acl }}: added "45 permit ip host {{ ashburn_ip }} any"
              2. Default route in {{ tunnel_vrf }}: 0.0.0.0/0 egress-vrf default {{ backbone_interface }} {{ backbone_peer }}
              3. Inbound route: {{ ashburn_ip }}/32 egress-vrf {{ tunnel_vrf }} {{ tunnel_nexthop }}

            The config will auto-revert in 5 minutes unless committed.
            Verify on the switch, then commit:
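The cross-VRF route syntax is the part EOS is picky about, and the failure mode is silent. A sketch of the failing versus working forms, using the addresses from this playbook (the behavior described is what was observed on mia-sw01):

```
! Accepted by the CLI but never installed in the RIB (silently dropped):
ip route 137.239.194.65/32 egress-vrf vrf1 Tunnel500 169.254.7.7

! Works: nexthop only, no interface.
ip route 137.239.194.65/32 egress-vrf vrf1 169.254.7.7

! Always confirm the route actually landed in the RIB:
show ip route 137.239.194.65
```

Because the broken form produces no error, the `show ip route` check after commit is the only reliable signal.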


@@ -1,15 +1,20 @@
---
# Configure laconic-was-sw01 for inbound validator traffic relay
#
# Routes all traffic destined to 137.239.194.65 to mia-sw01 via backbone.
# A single static route replaces the previous Loopback101 + PBR approach.
#
# 137.239.194.65 is already routed to was-sw01 by its covering prefix
# (advertised via IS-IS on Loopback100). No loopback needed — the static
# route forwards traffic before the switch tries to deliver it locally.
#
# This playbook also removes the old PBR config if present (Loopback101,
# VALIDATOR-RELAY-ACL, VALIDATOR-RELAY-CLASS, VALIDATOR-RELAY policy-map,
# service-policy on Et1/1).
#
# Usage:
#   ansible-playbook -i inventory/switches.yml playbooks/ashburn-relay-was-sw01.yml
#   ansible-playbook -i inventory/switches.yml playbooks/ashburn-relay-was-sw01.yml -e apply=true
#   ansible-playbook -i inventory/switches.yml playbooks/ashburn-relay-was-sw01.yml -e commit=true
#   ansible-playbook -i inventory/switches.yml playbooks/ashburn-relay-was-sw01.yml -e rollback=true

@@ -19,10 +24,11 @@
  vars:
    ashburn_ip: 137.239.194.65
    apply: false
    commit: false
    rollback: false
    session_name: validator-relay-v2
    checkpoint_name: pre-validator-relay-v2

  tasks:
    # ------------------------------------------------------------------

@@ -66,77 +72,78 @@
        ansible.builtin.meta: end_play

      # ------------------------------------------------------------------
      # Pre-flight checks
      # ------------------------------------------------------------------
      - name: Show current Et1/1 config
        arista.eos.eos_command:
          commands:
            - show running-config interfaces Ethernet1/1
        register: et1_config
        tags: [preflight]

      - name: Display Et1/1 config
        ansible.builtin.debug:
          var: et1_config.stdout_lines
        tags: [preflight]

      - name: Check for existing Loopback101 and PBR
        arista.eos.eos_command:
          commands:
            - "show running-config interfaces Loopback101"
            - "show running-config | include service-policy"
            - "show running-config section policy-map type pbr"
            - "show ip route {{ ashburn_ip }}"
        register: existing_config
        tags: [preflight]

      - name: Display existing config
        ansible.builtin.debug:
          var: existing_config.stdout_lines
        tags: [preflight]

      - name: Pre-flight summary
        when: not (apply | bool)
        ansible.builtin.debug:
          msg: |
            === Pre-flight complete ===
            Review the output above:
              1. Does Loopback101 exist with {{ ashburn_ip }}? (will be removed)
              2. Is service-policy VALIDATOR-RELAY on Et1/1? (will be removed)
              3. Current route for {{ ashburn_ip }}
            To apply config:
              ansible-playbook -i inventory/switches.yml playbooks/ashburn-relay-was-sw01.yml \
                -e apply=true
        tags: [preflight]

      - name: End play if not applying
        when: not (apply | bool)
        ansible.builtin.meta: end_play

      # ------------------------------------------------------------------
      # Apply config via session with 5-minute auto-revert
      # ------------------------------------------------------------------
      - name: Save checkpoint
        arista.eos.eos_command:
          commands:
            - "configure checkpoint save {{ checkpoint_name }}"

      - name: Apply config session
        arista.eos.eos_command:
          commands:
            - command: "configure session {{ session_name }}"
            # Remove old PBR service-policy from Et1/1
            - command: interface Ethernet1/1
            - command: no service-policy type pbr input VALIDATOR-RELAY
            - command: exit
            # Remove old PBR policy-map, class-map, ACL
            - command: no policy-map type pbr VALIDATOR-RELAY
            - command: no class-map type pbr match-any VALIDATOR-RELAY-CLASS
            - command: no ip access-list VALIDATOR-RELAY-ACL
            # Remove Loopback101
            - command: no interface Loopback101
            # Add static route to forward all traffic for ashburn IP to mia-sw01
            - command: "ip route {{ ashburn_ip }}/32 {{ backbone_peer }}"

      - name: Show session diff
        arista.eos.eos_command:

@@ -154,32 +161,20 @@
        arista.eos.eos_command:
          commands:
            - "configure session {{ session_name }} commit timer 00:05:00"

      # ------------------------------------------------------------------
      # Verify
      # ------------------------------------------------------------------
      - name: Verify config
        arista.eos.eos_command:
          commands:
            - "show ip route {{ ashburn_ip }}"
            - show running-config interfaces Ethernet1/1
        register: verify

      - name: Display verification
        ansible.builtin.debug:
          var: verify.stdout_lines

      - name: Reminder
        ansible.builtin.debug:

@@ -188,8 +183,12 @@
            Session: {{ session_name }}
            Checkpoint: {{ checkpoint_name }}

            Changes applied:
              1. Removed: Loopback101, VALIDATOR-RELAY PBR (ACL, class-map, policy-map, service-policy)
              2. Added: ip route {{ ashburn_ip }}/32 {{ backbone_peer }}

            The config will auto-revert in 5 minutes unless committed.
            Verify on the switch, then commit:
              configure session {{ session_name }} commit
              write memory
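Both switch playbooks lean on the same EOS safety net: a named checkpoint plus a config session with a commit timer. The manual equivalent, useful when debugging outside Ansible, looks like this (session and checkpoint names are this playbook's; the nexthop address is a hypothetical placeholder):

```
configure checkpoint save pre-validator-relay-v2
configure session validator-relay-v2
   ip route 137.239.194.65/32 10.0.0.2   ! hypothetical backbone nexthop
   exit
configure session validator-relay-v2 commit timer 00:05:00
! If the change breaks management connectivity, EOS reverts it when the
! timer expires. Otherwise confirm and persist:
configure session validator-relay-v2 commit
write memory
```

The timer is what makes it safe to push routing changes to a switch you can only reach through the path you are changing.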


@@ -0,0 +1,107 @@
---
# Configure biscayne OS-level services for agave validator
#
# Installs a systemd unit that formats and mounts the ramdisk on boot.
# /dev/ram0 loses its filesystem on reboot, so mkfs.xfs must run before
# the fstab mount. This unit runs before docker, ensuring the kind node's
# bind mounts always see the ramdisk.
#
# This playbook is idempotent — safe to run multiple times.
#
# Usage:
# ansible-playbook -i biscayne.vaasl.io, playbooks/biscayne-boot.yml
#
- name: Configure OS-level services for agave
hosts: all
gather_facts: false
become: true
vars:
ramdisk_device: /dev/ram0
ramdisk_mount: /srv/solana/ramdisk
accounts_dir: /srv/solana/ramdisk/accounts
tasks:
- name: Install ramdisk format service
copy:
dest: /etc/systemd/system/format-ramdisk.service
mode: "0644"
content: |
[Unit]
Description=Format /dev/ram0 as XFS for Solana accounts
DefaultDependencies=no
Before=local-fs.target
After=systemd-modules-load.service
ConditionPathExists={{ ramdisk_device }}
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/mkfs.xfs -f {{ ramdisk_device }}
[Install]
WantedBy=local-fs.target
register: unit_file
- name: Install ramdisk post-mount service
copy:
dest: /etc/systemd/system/ramdisk-accounts.service
mode: "0644"
content: |
[Unit]
Description=Create Solana accounts directory on ramdisk
After=srv-solana-ramdisk.mount
Requires=srv-solana-ramdisk.mount
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/bash -c 'mkdir -p {{ accounts_dir }} && chown solana:solana {{ ramdisk_mount }} {{ accounts_dir }}'
[Install]
WantedBy=multi-user.target
register: accounts_unit
- name: Ensure fstab entry uses nofail
lineinfile:
path: /etc/fstab
regexp: '^{{ ramdisk_device }}\s+{{ ramdisk_mount }}'
line: '{{ ramdisk_device }} {{ ramdisk_mount }} xfs noatime,nodiratime,nofail,x-systemd.requires=format-ramdisk.service 0 0'
register: fstab_entry
- name: Reload systemd
systemd:
daemon_reload: true
when: unit_file.changed or accounts_unit.changed or fstab_entry.changed
- name: Enable ramdisk services
systemd:
name: "{{ item }}"
enabled: true
loop:
- format-ramdisk.service
- ramdisk-accounts.service
# ---- apply now if ramdisk not mounted ------------------------------------
- name: Check if ramdisk is mounted
command: mountpoint -q {{ ramdisk_mount }}
register: ramdisk_mounted
failed_when: false
changed_when: false
- name: Format and mount ramdisk now
shell: |
mkfs.xfs -f {{ ramdisk_device }}
mount {{ ramdisk_mount }}
mkdir -p {{ accounts_dir }}
chown solana:solana {{ ramdisk_mount }} {{ accounts_dir }}
when: ramdisk_mounted.rc != 0
# ---- verify --------------------------------------------------------------
- name: Verify ramdisk
command: df -hT {{ ramdisk_mount }}
register: ramdisk_df
changed_when: false
- name: Show ramdisk status
debug:
msg: "{{ ramdisk_df.stdout_lines }}"
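After a reboot, the whole chain (format-ramdisk.service, the fstab mount, ramdisk-accounts.service) can be spot-checked with standard tools. A quick sketch of the checks this playbook's design implies:

```
findmnt -no FSTYPE /srv/solana/ramdisk        # expect xfs, not the underlying ZFS
systemctl is-enabled format-ramdisk.service ramdisk-accounts.service
systemctl status srv-solana-ramdisk.mount --no-pager
ls -ld /srv/solana/ramdisk/accounts           # owner should be solana:solana
```

If `findmnt` reports anything other than xfs, the mkfs unit did not run before the fstab mount and the validator would write accounts to the backing filesystem instead of the ramdisk.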


@@ -0,0 +1,220 @@
---
# Recover agave validator from any state to healthy
#
# This playbook is idempotent — it assesses current state and picks up
# from wherever the system is. Each step checks its precondition and
# skips if already satisfied.
#
# Steps:
# 1. Scale deployment to 0
# 2. Wait for pods to terminate
# 3. Wipe accounts ramdisk
# 4. Clean old snapshots
# 5. Download fresh snapshot via aria2c
# 6. Verify snapshot accessible via PV (kubectl)
# 7. Scale deployment to 1
# 8. Wait for pod Running
# 9. Verify validator log shows snapshot unpacking
# 10. Check RPC health
#
# Usage:
# ansible-playbook -i biscayne.vaasl.io, playbooks/biscayne-recover.yml
#
# # Pass extra args to snapshot-download.py
# ansible-playbook -i biscayne.vaasl.io, playbooks/biscayne-recover.yml \
# -e 'snapshot_args=--version 2.2'
#
- name: Recover agave validator
hosts: all
gather_facts: false
environment:
KUBECONFIG: /home/rix/.kube/config
vars:
kind_cluster: laconic-70ce4c4b47e23b85
k8s_namespace: "laconic-{{ kind_cluster }}"
deployment_name: "{{ kind_cluster }}-deployment"
snapshot_dir: /srv/solana/snapshots
accounts_dir: /srv/solana/ramdisk/accounts
ramdisk_mount: /srv/solana/ramdisk
ramdisk_device: /dev/ram0
snapshot_script_local: "{{ playbook_dir }}/../scripts/snapshot-download.py"
snapshot_script: /tmp/snapshot-download.py
snapshot_args: ""
# Mainnet RPC for slot comparison
mainnet_rpc: https://api.mainnet-beta.solana.com
# Maximum slots behind before snapshot is considered stale
max_slot_lag: 20000
tasks:
# ---- step 1: scale to 0 ---------------------------------------------------
- name: Get current replica count
command: >
kubectl get deployment {{ deployment_name }}
-n {{ k8s_namespace }}
-o jsonpath='{.spec.replicas}'
register: current_replicas
failed_when: false
changed_when: false
- name: Scale deployment to 0
command: >
kubectl scale deployment {{ deployment_name }}
-n {{ k8s_namespace }} --replicas=0
when: current_replicas.stdout | default('0') | int > 0
changed_when: true
# ---- step 2: wait for pods to terminate ------------------------------------
- name: Wait for pods to terminate
command: >
kubectl get pods -n {{ k8s_namespace }}
-l app={{ deployment_name }}
-o jsonpath='{.items}'
register: pods_remaining
retries: 60
delay: 5
until: pods_remaining.stdout == "[]" or pods_remaining.stdout == ""
changed_when: false
when: current_replicas.stdout | default('0') | int > 0
- name: Verify no agave processes in kind node (io_uring safety check)
command: >
docker exec {{ kind_cluster }}-control-plane
pgrep -c agave-validator
register: agave_procs
failed_when: false
changed_when: false
- name: Fail if agave zombie detected
ansible.builtin.fail:
msg: >-
agave-validator process still running inside kind node after pod
termination. This is the io_uring/ZFS deadlock. Do NOT proceed —
host reboot required. See CLAUDE.md.
when: agave_procs.rc == 0
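The retries/until polling above works anywhere; where a recent kubectl is available, `kubectl wait` is a more compact equivalent for the pod-termination step. A sketch (label selector and namespace taken from this playbook's vars; not a drop-in task replacement):

```
# Block until all matching pods are deleted, or time out after 5 minutes.
kubectl wait --for=delete pod \
  -l app=laconic-70ce4c4b47e23b85-deployment \
  -n laconic-laconic-70ce4c4b47e23b85 \
  --timeout=300s
```

The io_uring zombie check afterwards is still needed either way: pod deletion only proves kubelet gave up on the container, not that the agave process actually exited.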
# ---- step 3: wipe accounts ramdisk -----------------------------------------
# Cannot umount+mkfs because the kind node's bind mount holds it open.
# Instead, delete contents. This is sufficient — agave starts clean.
- name: Wipe accounts data
ansible.builtin.shell: |
rm -rf {{ accounts_dir }}/*
chown solana:solana {{ ramdisk_mount }} {{ accounts_dir }}
become: true
changed_when: true
# ---- step 4: clean old snapshots -------------------------------------------
- name: Remove all old snapshots
ansible.builtin.shell: rm -f {{ snapshot_dir }}/*.tar.* {{ snapshot_dir }}/*.tar
become: true
changed_when: true
# ---- step 5: download fresh snapshot ---------------------------------------
- name: Verify aria2c installed
command: which aria2c
changed_when: false
- name: Copy snapshot script to remote
ansible.builtin.copy:
src: "{{ snapshot_script_local }}"
dest: "{{ snapshot_script }}"
mode: "0755"
- name: Download snapshot and scale to 1
ansible.builtin.shell: |
python3 {{ snapshot_script }} \
-o {{ snapshot_dir }} \
--max-snapshot-age {{ max_slot_lag }} \
--max-latency 500 \
{{ snapshot_args }} \
&& KUBECONFIG=/home/rix/.kube/config kubectl scale deployment \
{{ deployment_name }} -n {{ k8s_namespace }} --replicas=1
become: true
register: snapshot_result
timeout: 3600
changed_when: true
# ---- step 6: verify snapshot accessible via PV -----------------------------
- name: Get snapshot filename
ansible.builtin.shell: ls -1 {{ snapshot_dir }}/snapshot-*.tar.* | head -1 | xargs basename
register: snapshot_filename
changed_when: false
- name: Extract snapshot slot from filename
ansible.builtin.set_fact:
snapshot_slot: "{{ snapshot_filename.stdout | regex_search('snapshot-([0-9]+)-', '\\1') | first }}"
- name: Get current mainnet slot
ansible.builtin.uri:
url: "{{ mainnet_rpc }}"
method: POST
body_format: json
body:
jsonrpc: "2.0"
id: 1
method: getSlot
params:
- commitment: finalized
return_content: true
register: mainnet_slot_response
- name: Check snapshot freshness
ansible.builtin.fail:
msg: >-
Snapshot too old: slot {{ snapshot_slot }}, mainnet at
{{ mainnet_slot_response.json.result }},
{{ mainnet_slot_response.json.result | int - snapshot_slot | int }} slots behind
(max {{ max_slot_lag }}).
when: (mainnet_slot_response.json.result | int - snapshot_slot | int) > max_slot_lag
- name: Report snapshot freshness
ansible.builtin.debug:
msg: >-
Snapshot slot {{ snapshot_slot }}, mainnet {{ mainnet_slot_response.json.result }},
{{ mainnet_slot_response.json.result | int - snapshot_slot | int }} slots behind.
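The freshness math above is simple enough to check by hand: the slot is the first numeric field of the snapshot filename, and the lag is its distance from the mainnet tip. A shell sketch with hypothetical slot numbers:

```shell
snapshot_file="snapshot-369000000-AbCdEf.tar.zst"   # hypothetical filename
slot="${snapshot_file#snapshot-}"   # strip the leading "snapshot-"
slot="${slot%%-*}"                  # keep the digits up to the next dash
mainnet_slot=369015000              # hypothetical finalized slot from getSlot
lag=$((mainnet_slot - slot))
echo "slot=$slot lag=$lag"
```

With the default `max_slot_lag` of 20000, this hypothetical snapshot (15000 slots behind) would pass the freshness check.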
# ---- step 7: scale already done in download step above ----------------------
# ---- step 8: wait for pod running ------------------------------------------
- name: Wait for pod to be running
command: >
kubectl get pods -n {{ k8s_namespace }}
-l app={{ deployment_name }}
-o jsonpath='{.items[0].status.phase}'
register: pod_status
retries: 60
delay: 10
until: pod_status.stdout == "Running"
changed_when: false
# ---- step 9: verify validator log ------------------------------------------
- name: Wait for validator log file
command: >
kubectl exec -n {{ k8s_namespace }}
deployment/{{ deployment_name }}
-c agave-validator -- test -f /data/log/validator.log
register: log_file_check
retries: 12
delay: 10
until: log_file_check.rc == 0
changed_when: false
# ---- step 10: check RPC health ---------------------------------------------
- name: Check RPC health (non-blocking)
ansible.builtin.uri:
url: http://{{ inventory_hostname }}:8899/health
return_content: true
register: rpc_health
retries: 6
delay: 30
until: rpc_health.status == 200
failed_when: false
- name: Report final status
ansible.builtin.debug:
msg: >-
Recovery complete.
Snapshot: slot {{ snapshot_slot }}
({{ mainnet_slot_response.json.result | int - snapshot_slot | int }} slots behind).
Pod: {{ pod_status.stdout }}.
Log: {{ 'writing' if log_file_check.rc == 0 else 'not yet' }}.
RPC: {{ rpc_health.content | default('not yet responding — still catching up') }}.
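The `/health` endpoint polled above corresponds to the JSON-RPC `getHealth` method; either form can be used to probe the validator's RPC port from the host. A sketch:

```
# Plain HTTP health probe; a caught-up node answers "ok".
curl -s http://127.0.0.1:8899/health

# Equivalent JSON-RPC call:
curl -s -X POST -H 'content-type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"getHealth"}' \
  http://127.0.0.1:8899
```

Both return an error or a "behind" status while the validator is still catching up, which is why the playbook treats a failed health check as non-blocking.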


@@ -0,0 +1,321 @@
---
# Redeploy agave-stack on biscayne with aria2c snapshot pre-download
#
# The validator's built-in downloader fetches snapshots at ~18 MB/s (single
# connection). snapshot-download.py uses aria2c with 16 parallel connections to
# saturate available bandwidth, cutting 90+ min downloads to ~10 min.
#
# Flow:
# 1. [teardown] Delete k8s namespace (preserve kind cluster)
# 2. [wipe] Conditionally clear ledger / accounts / old snapshots
# 3. [deploy] laconic-so deployment start, then immediately scale to 0
# 4. [snapshot] Download snapshot via aria2c to host bind mount
# 5. [snapshot] Verify snapshot visible inside kind node
# 6. [deploy] Scale validator back to 1
# 7. [verify] Wait for pod Running, check logs + RPC health
#
# The validator cannot run during snapshot download — it would lock/use the
# snapshot files. laconic-so creates the cluster AND deploys the pod in one
# shot, so we scale to 0 immediately after deploy, download, then scale to 1.
#
# Usage:
# # Standard redeploy (download snapshot, preserve accounts + ledger)
# ansible-playbook -i biscayne.vaasl.io, playbooks/biscayne-redeploy.yml
#
# # Full wipe (accounts + ledger) — slow rebuild
# ansible-playbook -i biscayne.vaasl.io, playbooks/biscayne-redeploy.yml \
# -e wipe_accounts=true -e wipe_ledger=true
#
# # Skip snapshot download (use existing)
# ansible-playbook -i biscayne.vaasl.io, playbooks/biscayne-redeploy.yml \
# -e skip_snapshot=true
#
# # Pass extra args to snapshot-download.py
# ansible-playbook -i biscayne.vaasl.io, playbooks/biscayne-redeploy.yml \
# -e 'snapshot_args=--version 2.2 --min-download-speed 50'
#
# # Snapshot only (no teardown/deploy)
# ansible-playbook -i biscayne.vaasl.io, playbooks/biscayne-redeploy.yml \
# --tags snapshot
#
- name: Redeploy agave validator on biscayne
hosts: all
gather_facts: false
environment:
KUBECONFIG: /home/rix/.kube/config
vars:
deployment_dir: /srv/deployments/agave
laconic_so: /home/rix/.local/bin/laconic-so
kind_cluster: laconic-70ce4c4b47e23b85
k8s_namespace: "laconic-{{ kind_cluster }}"
deployment_name: "{{ kind_cluster }}-deployment"
snapshot_dir: /srv/solana/snapshots
ledger_dir: /srv/solana/ledger
accounts_dir: /srv/solana/ramdisk/accounts
ramdisk_mount: /srv/solana/ramdisk
ramdisk_device: /dev/ram0
snapshot_script_local: "{{ playbook_dir }}/../scripts/snapshot-download.py"
snapshot_script: /tmp/snapshot-download.py
# Flags — non-destructive by default
wipe_accounts: false
wipe_ledger: false
skip_snapshot: false
snapshot_args: ""
tasks:
# ---- teardown: graceful stop, then delete namespace ----------------------
#
# IMPORTANT: Scale to 0 first, wait for agave to exit cleanly.
# Deleting the namespace while agave is running causes io_uring/ZFS
# deadlock (unkillable D-state threads). See CLAUDE.md.
- name: Scale deployment to 0 (graceful stop)
command: >
kubectl scale deployment {{ deployment_name }}
-n {{ k8s_namespace }} --replicas=0
register: pre_teardown_scale
failed_when: false
tags: [teardown]
- name: Wait for agave to exit
command: >
kubectl get pods -n {{ k8s_namespace }}
-l app={{ deployment_name }}
-o jsonpath='{.items}'
register: pre_teardown_pods
retries: 60
delay: 5
until: pre_teardown_pods.stdout == "[]" or pre_teardown_pods.stdout == "" or pre_teardown_pods.rc != 0
failed_when: false
when: pre_teardown_scale.rc == 0
tags: [teardown]
- name: Delete deployment namespace
command: >
kubectl delete namespace {{ k8s_namespace }} --timeout=120s
register: ns_delete
failed_when: false
tags: [teardown]
- name: Wait for namespace to terminate
command: >
kubectl get namespace {{ k8s_namespace }}
-o jsonpath='{.status.phase}'
register: ns_status
retries: 30
delay: 5
until: ns_status.rc != 0
failed_when: false
when: ns_delete.rc == 0
tags: [teardown]
# ---- wipe: opt-in data cleanup ------------------------------------------
- name: Wipe ledger data
shell: rm -rf {{ ledger_dir }}/*
become: true
when: wipe_ledger | bool
tags: [wipe]
- name: Wipe accounts ramdisk (umount + mkfs.xfs + mount)
shell: |
mountpoint -q {{ ramdisk_mount }} && umount {{ ramdisk_mount }} || true
mkfs.xfs -f {{ ramdisk_device }}
mount {{ ramdisk_mount }}
mkdir -p {{ accounts_dir }}
chown solana:solana {{ ramdisk_mount }} {{ accounts_dir }}
become: true
when: wipe_accounts | bool
tags: [wipe]
- name: Clean old snapshots (keep newest full + incremental)
shell: |
cd {{ snapshot_dir }} || exit 0
newest=$(ls -t snapshot-*.tar.* 2>/dev/null | head -1)
if [ -n "$newest" ]; then
newest_inc=$(ls -t incremental-snapshot-*.tar.* 2>/dev/null | head -1)
find . -maxdepth 1 -name '*.tar.*' \
! -name "$newest" \
! -name "${newest_inc:-__none__}" \
-delete
fi
become: true
when: not skip_snapshot | bool
tags: [wipe]
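The keep-newest selection in the cleanup task can be exercised safely in a scratch directory with hypothetical filenames:

```shell
dir=$(mktemp -d) && cd "$dir"
touch -d '2 hours ago' snapshot-100-aaa.tar.zst
touch -d '1 hour ago'  snapshot-200-bbb.tar.zst
touch incremental-snapshot-200-210-ccc.tar.zst

# Same logic as the playbook: keep the newest full and newest incremental.
newest=$(ls -t snapshot-*.tar.* 2>/dev/null | head -1)
newest_inc=$(ls -t incremental-snapshot-*.tar.* 2>/dev/null | head -1)
find . -maxdepth 1 -name '*.tar.*' \
  ! -name "$newest" \
  ! -name "${newest_inc:-__none__}" \
  -delete

ls -1   # only the newest full and the incremental remain
```

Note the `incremental-snapshot-*` files never match the `snapshot-*` glob, so the newest full snapshot is selected independently of any incrementals.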
# ---- preflight: verify ramdisk and mounts before deploy ------------------
- name: Verify ramdisk is mounted
command: mountpoint -q {{ ramdisk_mount }}
register: ramdisk_check
failed_when: ramdisk_check.rc != 0
changed_when: false
tags: [deploy, preflight]
- name: Verify ramdisk is xfs (not the underlying ZFS)
shell: df -T {{ ramdisk_mount }} | grep -q xfs
register: ramdisk_type
failed_when: ramdisk_type.rc != 0
changed_when: false
tags: [deploy, preflight]
- name: Verify ramdisk visible inside kind node
shell: >
docker exec {{ kind_cluster }}-control-plane
df -T /mnt/solana/ramdisk 2>/dev/null | grep -q xfs
register: kind_ramdisk_check
failed_when: kind_ramdisk_check.rc != 0
changed_when: false
tags: [deploy, preflight]
# ---- deploy: bring up cluster, scale to 0 immediately -------------------
- name: Verify kind-config.yml has unified mount root
command: "grep -c 'containerPath: /mnt$' {{ deployment_dir }}/kind-config.yml"
register: mount_root_check
failed_when: mount_root_check.stdout | int < 1
tags: [deploy]
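The grep above asserts that kind-config.yml bind-mounts a single unified root at `/mnt`. A sketch of the kind config fragment that check implies; the `propagation` value is an assumption based on the requirement that mounts created on the host after node start (like the ramdisk) must appear inside the node:

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    extraMounts:
      - hostPath: /srv                 # assumed host root for solana data
        containerPath: /mnt            # the unified mount root the task greps for
        propagation: HostToContainer   # assumed: lets new host mounts become visible in-node
```

With the default propagation, a filesystem mounted on the host after the kind node starts would not be visible inside it, which is exactly the failure the preflight ramdisk checks guard against.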
- name: Start deployment (creates kind cluster + deploys pod)
command: "{{ laconic_so }} deployment --dir {{ deployment_dir }} start"
timeout: 1200
tags: [deploy]
- name: Wait for deployment to exist
command: >
kubectl get deployment {{ deployment_name }}
-n {{ k8s_namespace }}
-o jsonpath='{.metadata.name}'
register: deploy_exists
retries: 30
delay: 10
until: deploy_exists.rc == 0
tags: [deploy]
- name: Scale validator to 0 (stop before snapshot download)
command: >
kubectl scale deployment {{ deployment_name }}
-n {{ k8s_namespace }} --replicas=0
tags: [deploy]
- name: Wait for pods to terminate
command: >
kubectl get pods -n {{ k8s_namespace }}
-l app={{ deployment_name }}
-o jsonpath='{.items}'
register: pods_gone
retries: 30
delay: 5
until: pods_gone.stdout == "[]" or pods_gone.stdout == ""
failed_when: false
tags: [deploy]
# ---- snapshot: download via aria2c, verify in kind node ------------------
- name: Verify aria2c installed
command: which aria2c
changed_when: false
when: not skip_snapshot | bool
tags: [snapshot]
- name: Copy snapshot script to remote
copy:
src: "{{ snapshot_script_local }}"
dest: "{{ snapshot_script }}"
mode: "0755"
when: not skip_snapshot | bool
tags: [snapshot]
- name: Verify kind node mounts
command: >
docker exec {{ kind_cluster }}-control-plane
ls /mnt/solana/snapshots/
register: kind_mount_check
tags: [snapshot]
- name: Download snapshot via aria2c
shell: >
python3 {{ snapshot_script }}
-o {{ snapshot_dir }}
{{ snapshot_args }}
become: true
register: snapshot_result
when: not skip_snapshot | bool
timeout: 3600
tags: [snapshot]
- name: Show snapshot download result
debug:
msg: "{{ snapshot_result.stdout_lines | default(['skipped']) }}"
tags: [snapshot]
- name: Verify snapshot visible inside kind node
shell: >
docker exec {{ kind_cluster }}-control-plane
ls -lhS /mnt/solana/snapshots/*.tar.* 2>/dev/null | head -5
register: kind_snapshot_check
failed_when: kind_snapshot_check.stdout == ""
when: not skip_snapshot | bool
tags: [snapshot]
- name: Show snapshot files in kind node
debug:
msg: "{{ kind_snapshot_check.stdout_lines | default(['skipped']) }}"
when: not skip_snapshot | bool
tags: [snapshot]
# ---- deploy (cont): scale validator back up with snapshot ----------------
- name: Scale validator to 1 (start with downloaded snapshot)
command: >
kubectl scale deployment {{ deployment_name }}
-n {{ k8s_namespace }} --replicas=1
tags: [deploy]
# ---- verify: confirm validator is running --------------------------------
- name: Wait for pod to be running
command: >
kubectl get pods -n {{ k8s_namespace }}
-o jsonpath='{.items[0].status.phase}'
register: pod_status
retries: 60
delay: 10
until: pod_status.stdout == "Running"
tags: [verify]
- name: Verify unified mount inside kind node
command: "docker exec {{ kind_cluster }}-control-plane ls /mnt/solana/"
register: mount_check
tags: [verify]
- name: Show mount contents
debug:
msg: "{{ mount_check.stdout_lines }}"
tags: [verify]
- name: Check validator log file is being written
command: >
kubectl exec -n {{ k8s_namespace }}
deployment/{{ deployment_name }}
-c agave-validator -- test -f /data/log/validator.log
register: log_file_check
retries: 12
delay: 10
until: log_file_check.rc == 0
failed_when: false
tags: [verify]
- name: Check RPC health
uri:
url: http://127.0.0.1:8899/health
return_content: true
register: rpc_health
retries: 6
delay: 10
until: rpc_health.status == 200
failed_when: false
delegate_to: "{{ inventory_hostname }}"
tags: [verify]
- name: Report status
debug:
msg: >-
Deployment complete.
Log: {{ 'writing' if log_file_check.rc == 0 else 'not yet created' }}.
RPC: {{ rpc_health.content | default('not responding') }}.
Wiped: ledger={{ wipe_ledger }}, accounts={{ wipe_accounts }}.
tags: [verify]
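The scale-down wait in this play hinges on a single condition: `kubectl get pods -o jsonpath='{.items}'` prints `[]` (or nothing) once every pod is gone. A minimal sketch of that retry loop, with kubectl stubbed so the control flow runs anywhere (the stub's "pods gone on the third poll" behavior is made up):

```shell
#!/bin/bash
# Stubbed version of the "wait for pods to terminate" loop. The real
# playbook retries the kubectl jsonpath query until it prints "[]";
# here a stub function reports the pods gone on the third poll.
set -euo pipefail
POLLS=0
poll_pods() {
  POLLS=$((POLLS + 1))
  if [ "$POLLS" -ge 3 ]; then OUT='[]'; else OUT='[{"metadata":{}}]'; fi
}
until poll_pods; [ "$OUT" = '[]' ] || [ -z "$OUT" ]; do
  :  # the real playbook waits `delay: 5` between polls
done
echo "pods terminated after $POLLS polls"
```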


@ -0,0 +1,106 @@
---
# Graceful shutdown of agave validator on biscayne
#
# Scales the deployment to 0 and waits for the pod to terminate.
# This MUST be done before any kind node restart, host reboot,
# or docker operations.
#
# The agave validator uses io_uring for async I/O. On ZFS, killing
# the process ungracefully (SIGKILL, docker kill, etc.) can produce
# unkillable kernel threads stuck in io_wq_put_and_exit, deadlocking
# the container's PID namespace. A graceful SIGTERM via k8s scale-down
# allows agave to flush and close its io_uring contexts cleanly.
#
# Usage:
# # Stop the validator
# ansible-playbook -i biscayne.vaasl.io, playbooks/biscayne-stop.yml
#
# # Stop and restart kind node (LAST RESORT — e.g., broken namespace)
# # Normally unnecessary: mount propagation means ramdisk/ZFS changes
# # are visible in the kind node without restarting it.
# ansible-playbook -i biscayne.vaasl.io, playbooks/biscayne-stop.yml \
# -e restart_kind=true
#
- name: Graceful validator shutdown
hosts: all
gather_facts: false
environment:
KUBECONFIG: /home/rix/.kube/config
vars:
kind_cluster: laconic-70ce4c4b47e23b85
k8s_namespace: "laconic-{{ kind_cluster }}"
deployment_name: "{{ kind_cluster }}-deployment"
restart_kind: false
tasks:
- name: Get current replica count
command: >
kubectl get deployment {{ deployment_name }}
-n {{ k8s_namespace }}
-o jsonpath='{.spec.replicas}'
register: current_replicas
failed_when: false
changed_when: false
- name: Scale deployment to 0
command: >
kubectl scale deployment {{ deployment_name }}
-n {{ k8s_namespace }} --replicas=0
when: current_replicas.stdout | default('0') | int > 0
- name: Wait for pods to terminate
command: >
kubectl get pods -n {{ k8s_namespace }}
-l app={{ deployment_name }}
-o jsonpath='{.items}'
register: pods_gone
retries: 60
delay: 5
until: pods_gone.stdout == "[]" or pods_gone.stdout == ""
when: current_replicas.stdout | default('0') | int > 0
- name: Verify no agave processes in kind node
command: >
docker exec {{ kind_cluster }}-control-plane
pgrep -c agave-validator
register: agave_procs
failed_when: false
changed_when: false
- name: Fail if agave still running
fail:
msg: >-
agave-validator process still running inside kind node after
pod termination. Do NOT restart the kind node — investigate
first to avoid io_uring/ZFS deadlock.
when: agave_procs.rc == 0
- name: Report stopped
debug:
msg: >-
Validator stopped. Replicas: {{ current_replicas.stdout | default('0') }} -> 0.
No agave processes detected in kind node.
when: not restart_kind | bool
# ---- optional: restart kind node -----------------------------------------
- name: Restart kind node
command: docker restart {{ kind_cluster }}-control-plane
when: restart_kind | bool
timeout: 120
- name: Wait for kind node ready
command: >
kubectl get node {{ kind_cluster }}-control-plane
-o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
register: node_ready
retries: 30
delay: 10
until: node_ready.stdout == "True"
when: restart_kind | bool
- name: Report restarted
debug:
msg: >-
Kind node restarted and ready.
Deployment at 0 replicas — scale up when ready.
when: restart_kind | bool
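The pgrep gate above is the safety interlock: exit code 0 from `pgrep -c agave-validator` means at least one matching process survived pod termination, and the play aborts rather than risk the io_uring/ZFS deadlock. The branch in isolation, with pgrep stubbed (hypothetically reporting a clean node) so it runs anywhere:

```shell
#!/bin/bash
# pgrep exits 0 when a process matches, 1 when none do. The playbook
# fails when rc == 0; this stub simulates a clean node (no matches).
set -euo pipefail
pgrep() { return 1; }   # stub: no agave-validator processes found
if pgrep -c agave-validator >/dev/null; then
  VERDICT="abort: agave still running, do NOT restart the kind node"
else
  VERDICT="safe to proceed"
fi
echo "$VERDICT"
```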


@ -0,0 +1,134 @@
---
# Connect biscayne to DoubleZero multicast via laconic-mia-sw01
#
# Establishes a GRE tunnel to the nearest DZ hybrid device and subscribes
# to jito-shredstream and bebop multicast groups.
#
# Usage:
# ansible-playbook playbooks/connect-doublezero-multicast.yml
# ansible-playbook playbooks/connect-doublezero-multicast.yml --check # dry-run
- name: Connect biscayne to DoubleZero multicast
hosts: biscayne
gather_facts: false
vars:
dz_multicast_groups:
- jito-shredstream
- bebop
tasks:
# ------------------------------------------------------------------
# Pre-checks
# ------------------------------------------------------------------
- name: Verify doublezerod service is running
ansible.builtin.systemd:
name: doublezerod
state: started
check_mode: true
register: dz_service
failed_when: dz_service.status.ActiveState != "active"
- name: Get doublezero identity address
ansible.builtin.command:
cmd: doublezero address
register: dz_address
changed_when: false
- name: Verify doublezero identity matches expected pubkey
ansible.builtin.assert:
that:
- dz_address.stdout | trim == dz_identity
fail_msg: >-
DZ identity mismatch: got '{{ dz_address.stdout | trim }}',
expected '{{ dz_identity }}'
- name: Check current DZ connection status
ansible.builtin.command:
cmd: "doublezero -e {{ dz_environment }} status"
register: dz_status
changed_when: false
failed_when: false
- name: Fail if already connected (tunnel is up)
ansible.builtin.fail:
msg: >-
DoubleZero tunnel is already connected. To reconnect, first
disconnect manually with: doublezero -e {{ dz_environment }} disconnect
when: "'connected' in dz_status.stdout | lower"
# ------------------------------------------------------------------
# Create access pass
# ------------------------------------------------------------------
- name: Create DZ access pass for multicast subscriber
ansible.builtin.command:
cmd: >-
doublezero -e {{ dz_environment }} access-pass set
--accesspass-type solana-multicast-subscriber
--client-ip {{ client_ip }}
--user-payer {{ dz_identity }}
--solana-validator {{ validator_identity }}
--tenant {{ dz_tenant }}
register: dz_access_pass
changed_when: "'created' in dz_access_pass.stdout | lower or 'updated' in dz_access_pass.stdout | lower"
- name: Show access pass result
ansible.builtin.debug:
var: dz_access_pass.stdout_lines
# ------------------------------------------------------------------
# Connect to DZ multicast
# ------------------------------------------------------------------
- name: Connect to DoubleZero multicast via {{ dz_device }}
ansible.builtin.command:
cmd: >-
doublezero -e {{ dz_environment }} connect multicast
{% for group in dz_multicast_groups %}
--subscribe {{ group }}
{% endfor %}
--device {{ dz_device }}
--client-ip {{ client_ip }}
register: dz_connect
changed_when: true
- name: Show connect result
ansible.builtin.debug:
var: dz_connect.stdout_lines
# ------------------------------------------------------------------
# Post-checks
# ------------------------------------------------------------------
- name: Verify tunnel status is connected
ansible.builtin.command:
cmd: "doublezero -e {{ dz_environment }} status"
register: dz_post_status
changed_when: false
failed_when: "'connected' not in dz_post_status.stdout | lower"
- name: Show tunnel status
ansible.builtin.debug:
var: dz_post_status.stdout_lines
- name: Verify routes are installed
ansible.builtin.command:
cmd: "doublezero -e {{ dz_environment }} routes"
register: dz_routes
changed_when: false
- name: Show installed routes
ansible.builtin.debug:
var: dz_routes.stdout_lines
- name: Check multicast group membership
ansible.builtin.command:
cmd: "doublezero -e {{ dz_environment }} status"
register: dz_multicast_status
changed_when: false
- name: Connection summary
ansible.builtin.debug:
msg: >-
DoubleZero multicast connected via {{ dz_device }}.
Subscribed groups: {{ dz_multicast_groups | join(', ') }}.
Next step: request allowlist access from group owners
(see docs/doublezero-multicast-access.md).
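The Jinja `{% for %}` in the connect task expands to one `--subscribe` flag per group. The equivalent expansion in plain shell (the device name stands in for the playbook's `dz_device` variable and is assumed here to be mia-sw01):

```shell
#!/bin/bash
# Build the --subscribe flag list the way the Jinja loop does.
set -euo pipefail
MCAST_GROUPS=(jito-shredstream bebop)
ARGS=()
for g in "${MCAST_GROUPS[@]}"; do
  ARGS+=(--subscribe "$g")
done
CMD="doublezero connect multicast ${ARGS[*]} --device mia-sw01"
echo "$CMD"
```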


@ -0,0 +1,18 @@
#!/bin/bash
# /etc/network/if-up.d/ashburn-routing
# Restore policy routing for Ashburn validator relay after reboot/interface up.
# Only act when doublezero0 comes up.
[ "$IFACE" = "doublezero0" ] || exit 0
# Ensure rt_tables entry exists
grep -q '^100 ashburn$' /etc/iproute2/rt_tables || echo "100 ashburn" >> /etc/iproute2/rt_tables
# Add policy rule. Guarded because `ip rule add` appends duplicates;
# `ip rule show` prints the fwmark in hex (100 == 0x64).
ip rule show | grep -q 'fwmark 0x64 lookup ashburn' || ip rule add fwmark 100 table ashburn
# Add default route via mia-sw01 through doublezero0 tunnel
ip route replace default via 169.254.7.6 dev doublezero0 table ashburn
# Add Ashburn IP to loopback (idempotent)
ip addr show lo | grep -q '137.239.194.65' || ip addr add 137.239.194.65/32 dev lo
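Each line of the hook uses the same add-if-missing guard, so the script is safe to fire on every interface event. A self-contained illustration of the pattern against a temp file (standing in for /etc/iproute2/rt_tables):

```shell
#!/bin/bash
# The guard: grep -q exits nonzero when the entry is absent, so the
# append runs at most once no matter how often the hook fires.
set -euo pipefail
RT_TABLES=$(mktemp)
add_table() {
  grep -q '^100 ashburn$' "$RT_TABLES" || echo "100 ashburn" >> "$RT_TABLES"
}
add_table
add_table   # no-op on the second call
ENTRIES=$(grep -c '^100 ashburn$' "$RT_TABLES")
rm -f "$RT_TABLES"
echo "entries: $ENTRIES"
```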


@ -0,0 +1,166 @@
---
# Verify PV hostPaths match expected kind-node paths, fix if wrong.
#
# Checks each PV's hostPath against the expected path derived from the
# spec.yml volume mapping through the kind extraMounts. If any PV has a
# wrong path, fails unless -e fix=true is passed.
#
# Does NOT touch the deployment.
#
# Usage:
# # Check only (fails if mounts are bad)
# ansible-playbook -i biscayne.vaasl.io, playbooks/fix-pv-mounts.yml
#
# # Fix stale PVs
# ansible-playbook -i biscayne.vaasl.io, playbooks/fix-pv-mounts.yml -e fix=true
#
- name: Verify and fix PV mount paths
hosts: all
gather_facts: false
environment:
KUBECONFIG: /home/rix/.kube/config
vars:
kind_cluster: laconic-70ce4c4b47e23b85
k8s_namespace: "laconic-{{ kind_cluster }}"
fix: false
volumes:
- name: validator-snapshots
host_path: /mnt/solana/snapshots
capacity: 200Gi
- name: validator-ledger
host_path: /mnt/solana/ledger
capacity: 2Ti
- name: validator-accounts
host_path: /mnt/solana/ramdisk/accounts
capacity: 800Gi
- name: validator-log
host_path: /mnt/solana/log
capacity: 10Gi
tasks:
- name: Read current PV hostPaths
command: >
kubectl get pv {{ kind_cluster }}-{{ item.name }}
-o jsonpath='{.spec.hostPath.path}'
register: current_paths
loop: "{{ volumes }}"
failed_when: false
changed_when: false
- name: Build path comparison
set_fact:
path_mismatches: "{{ (path_mismatches | default([])) + ([item.item.name] if item.stdout != '' and item.stdout != item.item.host_path else []) }}"
path_missing: "{{ (path_missing | default([])) + ([item.item.name] if item.stdout == '' else []) }}"
loop: "{{ current_paths.results }}"
loop_control:
label: "{{ item.item.name }}"
- name: Show current vs expected paths
debug:
msg: >-
{{ item.item.name }}:
current={{ item.stdout if item.stdout else 'NOT FOUND' }}
expected={{ item.item.host_path }}
{{ 'OK' if item.stdout == item.item.host_path else 'NEEDS FIX' }}
loop: "{{ current_paths.results }}"
loop_control:
label: "{{ item.item.name }}"
- name: Check for mismatched PVs
fail:
msg: >-
PV {{ item.item.name }} has wrong hostPath:
{{ item.stdout if item.stdout else 'NOT FOUND' }}
(expected {{ item.item.host_path }}).
Run with -e fix=true to delete and recreate.
when: item.stdout != item.item.host_path and not fix | bool
loop: "{{ current_paths.results }}"
loop_control:
label: "{{ item.item.name }}"
# ---- Fix mode ---------------------------------------------------------
- name: Delete stale PVCs
command: >
kubectl delete pvc {{ kind_cluster }}-{{ item.item.name }}
-n {{ k8s_namespace }} --timeout=60s
when: fix | bool and item.stdout != item.item.host_path
loop: "{{ current_paths.results }}"
loop_control:
label: "{{ item.item.name }}"
failed_when: false
- name: Delete stale PVs
command: >
kubectl delete pv {{ kind_cluster }}-{{ item.item.name }}
--timeout=60s
when: fix | bool and item.stdout != item.item.host_path
loop: "{{ current_paths.results }}"
loop_control:
label: "{{ item.item.name }}"
failed_when: false
- name: Create PVs with correct hostPaths
command: >
kubectl apply -f -
args:
stdin: |
apiVersion: v1
kind: PersistentVolume
metadata:
name: {{ kind_cluster }}-{{ item.item.name }}
spec:
capacity:
storage: {{ item.item.capacity }}
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: manual
hostPath:
path: {{ item.item.host_path }}
when: fix | bool and item.stdout != item.item.host_path
loop: "{{ current_paths.results }}"
loop_control:
label: "{{ item.item.name }}"
- name: Create PVCs
command: >
kubectl apply -f -
args:
stdin: |
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: {{ kind_cluster }}-{{ item.item.name }}
namespace: {{ k8s_namespace }}
spec:
accessModes:
- ReadWriteOnce
storageClassName: manual
volumeName: {{ kind_cluster }}-{{ item.item.name }}
resources:
requests:
storage: {{ item.item.capacity }}
when: fix | bool and item.stdout != item.item.host_path
loop: "{{ current_paths.results }}"
loop_control:
label: "{{ item.item.name }}"
# ---- Final verify -----------------------------------------------------
- name: Verify PV paths
command: >
kubectl get pv {{ kind_cluster }}-{{ item.name }}
-o jsonpath='{.spec.hostPath.path}'
register: final_paths
loop: "{{ volumes }}"
changed_when: false
when: fix | bool
- name: Assert all PV paths correct
assert:
that: item.stdout == item.item.host_path
fail_msg: "{{ item.item.name }}: {{ item.stdout }} != {{ item.item.host_path }}"
success_msg: "{{ item.item.name }}: {{ item.stdout }} OK"
loop: "{{ final_paths.results }}"
loop_control:
label: "{{ item.item.name }}"
when: fix | bool
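The check/fix split above reduces to one comparison per PV: current hostPath versus expected. A pure-shell sketch of that comparison on made-up data, with one deliberately stale path:

```shell
#!/bin/bash
# Compare expected PV hostPaths against current ones; collect mismatches.
set -euo pipefail
declare -A EXPECTED=(
  [validator-snapshots]=/mnt/solana/snapshots
  [validator-ledger]=/mnt/solana/ledger
)
declare -A CURRENT=(
  [validator-snapshots]=/mnt/solana/snapshots
  # stale path, standing in for a PV created before a mount move
  [validator-ledger]=/srv/old/ledger
)
MISMATCHES=()
for pv in validator-snapshots validator-ledger; do
  if [ "${CURRENT[$pv]:-}" != "${EXPECTED[$pv]}" ]; then
    MISMATCHES+=("$pv")
  fi
done
echo "needs fix: ${MISMATCHES[*]}"
```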


@ -0,0 +1,340 @@
---
# Health check for biscayne agave-stack deployment
#
# Gathers system, validator, DoubleZero, and network status in a single run.
# All tasks are read-only — safe to run at any time.
#
# Usage:
# ansible-playbook playbooks/health-check.yml
# ansible-playbook playbooks/health-check.yml -t validator # just validator checks
# ansible-playbook playbooks/health-check.yml -t doublezero # just DZ checks
# ansible-playbook playbooks/health-check.yml -t network # just network checks
- name: Biscayne agave-stack health check
hosts: biscayne
gather_facts: false
tasks:
# ------------------------------------------------------------------
# Discover kind cluster and namespace
# ------------------------------------------------------------------
- name: Get kind cluster name
ansible.builtin.command:
cmd: kind get clusters
register: kind_clusters
changed_when: false
failed_when: kind_clusters.rc != 0 or kind_clusters.stdout_lines | length == 0
- name: Set cluster name fact
ansible.builtin.set_fact:
kind_cluster: "{{ kind_clusters.stdout_lines[0] }}"
- name: Discover agave namespace
ansible.builtin.shell:
cmd: >-
set -o pipefail &&
kubectl get namespaces --no-headers -o custom-columns=':metadata.name'
| grep '^laconic-'
executable: /bin/bash
register: ns_result
changed_when: false
failed_when: ns_result.stdout_lines | length == 0
- name: Set namespace fact
ansible.builtin.set_fact:
agave_ns: "{{ ns_result.stdout_lines[0] }}"
- name: Get pod name
ansible.builtin.shell:
cmd: >-
set -o pipefail &&
kubectl get pods -n {{ agave_ns }} --no-headers
-o custom-columns=':metadata.name' | head -1
executable: /bin/bash
register: pod_result
changed_when: false
failed_when: pod_result.stdout | trim == ''
- name: Set pod fact
ansible.builtin.set_fact:
agave_pod: "{{ pod_result.stdout | trim }}"
- name: Show discovered resources
ansible.builtin.debug:
msg: "cluster={{ kind_cluster }} ns={{ agave_ns }} pod={{ agave_pod }}"
# ------------------------------------------------------------------
# Pod status
# ------------------------------------------------------------------
- name: Get pod status
ansible.builtin.command:
cmd: kubectl get pods -n {{ agave_ns }} -o wide
register: pod_status
changed_when: false
tags: [validator]
- name: Show pod status
ansible.builtin.debug:
var: pod_status.stdout_lines
tags: [validator]
- name: Get container restart counts
ansible.builtin.shell:
cmd: >-
kubectl get pod {{ agave_pod }} -n {{ agave_ns }}
-o jsonpath='{range .status.containerStatuses[*]}{.name}{" restarts="}{.restartCount}{" ready="}{.ready}{"\n"}{end}'
register: restart_counts
changed_when: false
tags: [validator]
- name: Show restart counts
ansible.builtin.debug:
var: restart_counts.stdout_lines
tags: [validator]
# ------------------------------------------------------------------
# Validator sync status
# ------------------------------------------------------------------
- name: Get validator recent logs (replay progress)
ansible.builtin.command:
cmd: >-
kubectl logs -n {{ agave_ns }} {{ agave_pod }}
-c agave-validator --tail=30
register: validator_logs
changed_when: false
tags: [validator]
- name: Show validator logs
ansible.builtin.debug:
var: validator_logs.stdout_lines
tags: [validator]
- name: Check RPC health endpoint
ansible.builtin.uri:
url: http://127.0.0.1:8899/health
method: GET
return_content: true
timeout: 5
register: rpc_health
failed_when: false
tags: [validator]
- name: Show RPC health
ansible.builtin.debug:
msg: "RPC health: {{ rpc_health.status | default('unreachable') }} — {{ rpc_health.content | default('no response') }}"
tags: [validator]
- name: Get validator version
ansible.builtin.shell:
cmd: >-
kubectl exec -n {{ agave_ns }} {{ agave_pod }}
-c agave-validator -- agave-validator --version 2>&1 || true
register: validator_version
changed_when: false
tags: [validator]
- name: Show validator version
ansible.builtin.debug:
var: validator_version.stdout
tags: [validator]
# ------------------------------------------------------------------
# DoubleZero status
# ------------------------------------------------------------------
- name: Get host DZ identity
ansible.builtin.command:
cmd: sudo -u solana doublezero address
register: dz_address
changed_when: false
failed_when: false
tags: [doublezero]
- name: Get host DZ tunnel status
ansible.builtin.command:
cmd: sudo -u solana doublezero -e {{ dz_environment }} status
register: dz_status
changed_when: false
failed_when: false
tags: [doublezero]
- name: Get DZ routes
ansible.builtin.shell:
cmd: set -o pipefail && ip route | grep doublezero0 || echo "no doublezero0 routes"
executable: /bin/bash
register: dz_routes
changed_when: false
tags: [doublezero]
- name: Get host doublezerod service state
ansible.builtin.systemd:
name: doublezerod
register: dz_systemd_info
failed_when: false
check_mode: true
tags: [doublezero]
- name: Set DZ systemd state
ansible.builtin.set_fact:
dz_systemd_state: "{{ dz_systemd_info.status.ActiveState | default('unknown') }}"
tags: [doublezero]
- name: Get container DZ status
ansible.builtin.shell:
cmd: >-
kubectl exec -n {{ agave_ns }} {{ agave_pod }}
-c doublezerod -- doublezero status 2>&1 || echo "container DZ unavailable"
register: dz_container_status
changed_when: false
tags: [doublezero]
- name: Show DoubleZero status
ansible.builtin.debug:
msg:
identity: "{{ dz_address.stdout | default('unknown') }}"
host_tunnel: "{{ dz_status.stdout_lines | default(['unknown']) }}"
host_systemd: "{{ dz_systemd_state }}"
container: "{{ dz_container_status.stdout_lines | default(['unknown']) }}"
routes: "{{ dz_routes.stdout_lines | default([]) }}"
tags: [doublezero]
# ------------------------------------------------------------------
# Storage
# ------------------------------------------------------------------
- name: Check ramdisk usage
ansible.builtin.command:
cmd: df -h /srv/solana/ramdisk
register: ramdisk_df
changed_when: false
failed_when: false
tags: [storage]
- name: Check ZFS dataset usage
ansible.builtin.command:
cmd: zfs list -o name,used,avail,mountpoint -r biscayne/DATA
register: zfs_list
changed_when: false
tags: [storage]
- name: Check ZFS zvol I/O
ansible.builtin.shell:
cmd: set -o pipefail && iostat -x zd0 1 2 | tail -3
executable: /bin/bash
register: zvol_io
changed_when: false
failed_when: false
tags: [storage]
- name: Show storage status
ansible.builtin.debug:
msg:
ramdisk: "{{ ramdisk_df.stdout_lines | default(['not mounted']) }}"
zfs: "{{ zfs_list.stdout_lines | default([]) }}"
zvol_io: "{{ zvol_io.stdout_lines | default([]) }}"
tags: [storage]
# ------------------------------------------------------------------
# System resources
# ------------------------------------------------------------------
- name: Check memory
ansible.builtin.command:
cmd: free -h
register: mem
changed_when: false
tags: [system]
- name: Check load average
ansible.builtin.command:
cmd: cat /proc/loadavg
register: loadavg
changed_when: false
tags: [system]
- name: Check swap
ansible.builtin.command:
cmd: swapon --show
register: swap
changed_when: false
failed_when: false
tags: [system]
- name: Show system resources
ansible.builtin.debug:
msg:
memory: "{{ mem.stdout_lines }}"
load: "{{ loadavg.stdout }}"
swap: "{{ swap.stdout | default('none') }}"
tags: [system]
# ------------------------------------------------------------------
# Network / shred throughput
# ------------------------------------------------------------------
- name: Count shred packets per interface (5 sec sample)
ansible.builtin.shell:
cmd: |
set -o pipefail
for iface in eno1 doublezero0; do
count=$(timeout 5 tcpdump -i "$iface" -nn 'udp dst portrange 9000-10000' -q 2>&1 | grep -oP '\d+(?= packets captured)' || echo 0)
echo "$iface: $count packets/5s"
done
executable: /bin/bash
register: shred_counts
changed_when: false
failed_when: false
tags: [network]
- name: Check interface throughput
ansible.builtin.shell:
cmd: >-
set -o pipefail &&
grep -E 'eno1|doublezero0' /proc/net/dev
| awk '{printf "%s rx=%s tx=%s\n", $1, $2, $10}'
executable: /bin/bash
register: iface_stats
changed_when: false
tags: [network]
- name: Check gossip/repair port connections
ansible.builtin.shell:
cmd: >-
set -o pipefail &&
ss -tupn | grep -E ':8001|:900[0-9]' | head -20 || echo "no connections"
executable: /bin/bash
register: gossip_ports
changed_when: false
tags: [network]
- name: Check iptables DNAT rule (TVU shred relay)
ansible.builtin.shell:
cmd: >-
set -o pipefail &&
iptables -t nat -L PREROUTING -v -n | grep -E '64.92.84.81|20000' || echo "no DNAT rule"
executable: /bin/bash
register: dnat_rule
changed_when: false
tags: [network]
- name: Show network status
ansible.builtin.debug:
msg:
shred_counts: "{{ shred_counts.stdout_lines | default([]) }}"
interfaces: "{{ iface_stats.stdout_lines | default([]) }}"
gossip_ports: "{{ gossip_ports.stdout_lines | default([]) }}"
tvu_dnat: "{{ dnat_rule.stdout_lines | default([]) }}"
tags: [network]
# ------------------------------------------------------------------
# Summary
# ------------------------------------------------------------------
- name: Health check summary
ansible.builtin.debug:
msg: |
=== Biscayne Health Check ===
Cluster: {{ kind_cluster }}
Namespace: {{ agave_ns }}
Pod: {{ agave_pod }}
RPC: {{ rpc_health.status | default('unreachable') }}
DZ identity: {{ dz_address.stdout | default('unknown') | trim }}
DZ tunnel: {{ 'UP' if 'connected' in (dz_status.stdout | default('') | lower) else 'DOWN' }}
DZ systemd: {{ dz_systemd_state }}
Ramdisk: {{ ramdisk_df.stdout_lines[-1] | default('unknown') }}
Load: {{ loadavg.stdout | default('unknown') }}
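A tunnel-up check is more reliable keyed on the status text than on the exit code, since the CLI can exit 0 even when no tunnel is provisioned. The text check in isolation (the status line is made-up sample output):

```shell
#!/bin/bash
# Decide UP/DOWN from the status text rather than the command's rc.
# The sample status line below is invented for the demo.
set -euo pipefail
STATUS_TEXT="Tunnel status: connected via mia-sw01"
LOWER=$(echo "$STATUS_TEXT" | tr '[:upper:]' '[:lower:]')
case "$LOWER" in
  *connected*) TUNNEL=UP ;;
  *)           TUNNEL=DOWN ;;
esac
echo "DZ tunnel: $TUNNEL"
```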


@ -0,0 +1,98 @@
#!/bin/bash
# Check shred completeness at the tip of the blockstore.
#
# Samples the most recent N slots and reports how many are full.
# Use this to determine when enough complete blocks have accumulated
# to safely download a new snapshot that lands within the complete range.
#
# Usage: kubectl exec ... -- bash -c "$(cat check-shred-completeness.sh)"
# Or: ssh biscayne ... 'KUBECONFIG=... kubectl exec ... -- agave-ledger-tool ...'
set -euo pipefail
KUBECONFIG="${KUBECONFIG:-/home/rix/.kube/config}"
NS="laconic-laconic-70ce4c4b47e23b85"
DEPLOY="laconic-70ce4c4b47e23b85-deployment"
SAMPLE_SIZE="${1:-200}"
# Get blockstore bounds
BOUNDS=$(kubectl exec -n "$NS" deployment/"$DEPLOY" -c agave-validator -- \
agave-ledger-tool -l /data/ledger blockstore bounds 2>&1 | grep "^Ledger")
HIGHEST=$(echo "$BOUNDS" | grep -oP 'to \K[0-9]+')
START=$((HIGHEST - SAMPLE_SIZE))
echo "Blockstore highest slot: $HIGHEST"
echo "Sampling slots $START to $HIGHEST ($SAMPLE_SIZE slots)"
echo ""
# Get slot metadata
OUTPUT=$(kubectl exec -n "$NS" deployment/"$DEPLOY" -c agave-validator -- \
agave-ledger-tool -l /data/ledger blockstore print \
--starting-slot "$START" --ending-slot "$HIGHEST" 2>&1 \
| grep -E "^Slot|is_full")
TOTAL=$(echo "$OUTPUT" | grep -c "^Slot" || true)
FULL=$(echo "$OUTPUT" | grep -c "is_full: true" || true)
INCOMPLETE=$(echo "$OUTPUT" | grep -c "is_full: false" || true)
echo "Total slots with data: $TOTAL / $SAMPLE_SIZE"
echo "Complete (is_full: true): $FULL"
echo "Incomplete (is_full: false): $INCOMPLETE"
if [ "$TOTAL" -gt 0 ]; then
PCT=$((FULL * 100 / TOTAL))
echo "Completeness: ${PCT}%"
else
echo "Completeness: N/A (no data)"
fi
echo ""
# Find the first full slot counting backward from the tip
# This tells us where the contiguous complete run starts
echo "--- Contiguous complete run from tip ---"
# Get just the slot numbers and is_full in reverse order
REVERSED=$(echo "$OUTPUT" | paste - - | awk '{
slot = $2;
full = ($NF == "true") ? 1 : 0;
print slot, full
}' | sort -rn)
CONTIGUOUS=0
FIRST_FULL=""
while IFS=' ' read -r slot full; do
if [ "$full" -eq 1 ]; then
CONTIGUOUS=$((CONTIGUOUS + 1))
FIRST_FULL="$slot"
else
break
fi
done <<< "$REVERSED"
if [ -n "$FIRST_FULL" ]; then
echo "Contiguous complete slots from tip: $CONTIGUOUS"
echo "Run starts at slot: $FIRST_FULL"
echo "Run ends at slot: $HIGHEST"
echo ""
echo "A snapshot with slot >= $FIRST_FULL would replay from local blockstore."
# Check against mainnet
MAINNET_SLOT=$(curl -s -X POST -H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"getSlot","params":[{"commitment":"finalized"}]}' \
https://api.mainnet-beta.solana.com | grep -oP '"result":\K[0-9]+')
GAP=$((MAINNET_SLOT - HIGHEST))
echo "Mainnet tip: $MAINNET_SLOT (blockstore is $GAP slots behind tip)"
if [ "$CONTIGUOUS" -gt 100 ]; then
echo ""
echo ">>> READY: $CONTIGUOUS contiguous complete slots. Safe to download a snapshot."
else
echo ""
echo ">>> NOT READY: Only $CONTIGUOUS contiguous complete slots. Wait for more."
fi
else
echo "No contiguous complete run from tip found."
fi
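The backward scan above is pure text processing and can be exercised standalone. Here the slot sample is hardcoded (made up): slot 102 is incomplete, so the contiguous run from the tip is 103 through 105.

```shell
#!/bin/bash
# Standalone version of the tip scan: sort slots descending, count the
# leading run of is_full slots, note where the run starts.
set -euo pipefail
SAMPLE='100 1
101 1
102 0
103 1
104 1
105 1'
CONTIGUOUS=0
FIRST_FULL=""
while read -r slot full; do
  if [ "$full" -eq 1 ]; then
    CONTIGUOUS=$((CONTIGUOUS + 1))
    FIRST_FULL="$slot"
  else
    break
  fi
done <<< "$(echo "$SAMPLE" | sort -rn)"
echo "run: $CONTIGUOUS slots, starting at $FIRST_FULL"   # prints: run: 3 slots, starting at 103
```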


@ -0,0 +1,38 @@
#!/bin/bash
# Run a command in a tmux pane and capture its output.
# User sees it streaming in the pane; caller gets stdout back.
#
# Usage: pane-exec.sh <pane-id> <command...>
# Example: pane-exec.sh %6565 ansible-playbook -i inventory/switches.yml playbooks/foo.yml
set -euo pipefail
PANE="$1"
shift
CMD="$*"
TMPFILE=$(mktemp /tmp/pane-output.XXXXXX)
MARKER="__PANE_EXEC_DONE_${RANDOM}_$$__"
cleanup() {
tmux pipe-pane -t "$PANE" 2>/dev/null || true
rm -f "$TMPFILE"
}
trap cleanup EXIT
# Start capturing pane output
tmux pipe-pane -o -t "$PANE" "cat >> $TMPFILE"
# Send the command, then echo a marker so we know when it's done
tmux send-keys -t "$PANE" "$CMD; echo $MARKER" Enter
# Wait for the marker
while ! grep -q "$MARKER" "$TMPFILE" 2>/dev/null; do
sleep 0.5
done
# Stop capturing
tmux pipe-pane -t "$PANE"
# Strip ANSI escape codes, remove the marker line, output the rest
sed 's/\x1b\[[0-9;]*[a-zA-Z]//g; s/\x1b\[[?][0-9]*[a-zA-Z]//g' "$TMPFILE" | grep -v "$MARKER"
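The final sed/grep pass can be tried on a constructed sample: color codes stripped, marker line dropped, payload returned untouched.

```shell
#!/bin/bash
# Same cleanup pane-exec.sh applies to the captured pane output:
# strip ANSI color sequences, then drop the completion-marker line.
set -euo pipefail
MARKER="__PANE_EXEC_DONE_1234_1__"
RAW=$(printf '\033[32mdeploy ok\033[0m\n%s\n' "$MARKER")
CLEAN=$(printf '%s\n' "$RAW" \
  | sed 's/\x1b\[[0-9;]*[a-zA-Z]//g' \
  | grep -v "$MARKER")
echo "$CLEAN"   # prints: deploy ok
```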


@ -0,0 +1,151 @@
import { chromium } from 'playwright';
import { writeFileSync, mkdirSync } from 'fs';
import { join } from 'path';
const OUT_DIR = join(import.meta.dirname, '..', 'docs', 'arista-scraped');
mkdirSync(OUT_DIR, { recursive: true });
const pages = [
{ url: 'https://www.arista.com/en/um-eos/eos-static-inter-vrf-route', file: 'static-inter-vrf-route.md' },
{ url: 'https://www.arista.com/en/um-eos/eos-inter-vrf-local-route-leaking', file: 'inter-vrf-local-route-leaking.md' },
{ url: 'https://www.arista.com/en/um-eos/eos-policy-based-routing', file: 'policy-based-routing.md' },
{ url: 'https://www.arista.com/en/um-eos/eos-traffic-management', file: 'traffic-management.md' },
{ url: 'https://www.arista.com/en/um-eos/eos-policy-based-routing-pbr', file: 'pbr.md' },
{ url: 'https://www.arista.com/en/um-eos/eos-configuring-vrf-instances', file: 'configuring-vrf.md' },
{ url: 'https://www.arista.com/en/um-eos/eos-gre-tunnels', file: 'gre-tunnels.md' },
{ url: 'https://www.arista.com/en/um-eos/eos-access-control-lists', file: 'access-control-lists.md' },
{ url: 'https://www.arista.com/en/um-eos/eos-static-routes', file: 'static-routes.md' },
{ url: 'https://www.arista.com/en/um-eos/eos-configuration-sessions', file: 'configuration-sessions.md' },
{ url: 'https://www.arista.com/en/um-eos/eos-checkpoint-and-rollback', file: 'checkpoint-rollback.md' },
{ url: 'https://www.arista.com/en/um-eos', file: '_index.md' },
];
async function scrapePage(page, url, filename) {
console.log(`Scraping: ${url}`);
try {
const resp = await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
console.log(` Status: ${resp.status()}`);
// Wait for JS to render
await page.waitForTimeout(8000);
// Check for CAPTCHA
const bodyText = await page.evaluate(() => document.body.innerText.substring(0, 200));
if (bodyText.includes('CAPTCHA') || bodyText.includes("couldn't load")) {
console.log(`  BLOCKED by CAPTCHA/anti-bot on ${url}`);
writeFileSync(join(OUT_DIR, filename), `# BLOCKED BY CAPTCHA\n\nURL: ${url}\nThe Arista docs site requires CAPTCHA verification for headless browsers.\n`);
return false;
}
// Extract content
const content = await page.evaluate(() => {
const selectors = [
'#content', '.article-content', '.content-area', '#main-content',
'article', '.item-page', '#sp-component', '.com-content-article',
'main', '#sp-main-body',
];
let el = null;
for (const sel of selectors) {
el = document.querySelector(sel);
if (el && el.textContent.trim().length > 100) break;
}
if (!el) el = document.body;
function nodeToMd(node) {
if (node.nodeType === Node.TEXT_NODE) return node.textContent;
if (node.nodeType !== Node.ELEMENT_NODE) return '';
const tag = node.tagName.toLowerCase();
if (['nav', 'footer', 'script', 'style', 'noscript', 'iframe'].includes(tag)) return '';
if (node.classList && (node.classList.contains('nav') || node.classList.contains('sidebar') ||
node.classList.contains('menu') || node.classList.contains('footer') ||
node.classList.contains('header'))) return '';
let children = Array.from(node.childNodes).map(c => nodeToMd(c)).join('');
switch (tag) {
case 'h1': return `\n# ${children.trim()}\n\n`;
case 'h2': return `\n## ${children.trim()}\n\n`;
case 'h3': return `\n### ${children.trim()}\n\n`;
case 'h4': return `\n#### ${children.trim()}\n\n`;
case 'p': return `\n${children.trim()}\n\n`;
case 'br': return '\n';
case 'li': return `- ${children.trim()}\n`;
case 'ul': case 'ol': return `\n${children}\n`;
case 'pre': return `\n\`\`\`\n${children.trim()}\n\`\`\`\n\n`;
case 'code': return `\`${children.trim()}\``;
case 'strong': case 'b': return `**${children.trim()}**`;
case 'em': case 'i': return `*${children.trim()}*`;
case 'table': return `\n${children}\n`;
case 'tr': return `${children}|\n`;
case 'th': case 'td': return `| ${children.trim()} `;
case 'a': {
const href = node.getAttribute('href');
if (href && !href.startsWith('#') && !href.startsWith('javascript'))
return `[${children.trim()}](${href})`;
return children;
}
default: return children;
}
}
return nodeToMd(el);
});
const cleaned = content.replace(/\n{4,}/g, '\n\n\n').replace(/[ \t]+$/gm, '').trim();
const header = `<!-- Source: ${url} -->\n<!-- Scraped: ${new Date().toISOString()} -->\n\n`;
writeFileSync(join(OUT_DIR, filename), header + cleaned + '\n');
console.log(`  Saved ${filename} (${cleaned.length} chars)`);
return true;
} catch (e) {
console.error(` FAILED: ${e.message}`);
writeFileSync(join(OUT_DIR, filename), `# FAILED TO LOAD\n\nURL: ${url}\nError: ${e.message}\n`);
return false;
}
}
async function main() {
// Launch with stealth-like settings
const browser = await chromium.launch({
headless: false, // suppress Playwright's default --headless so the flag below takes effect
args: [
'--headless=new', // Chromium's "new" headless mode (less detectable than classic headless)
'--disable-blink-features=AutomationControlled',
'--no-sandbox',
],
});
const context = await browser.newContext({
userAgent: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
locale: 'en-US',
timezoneId: 'America/New_York',
viewport: { width: 1920, height: 1080 },
});
// Remove webdriver property
await context.addInitScript(() => {
Object.defineProperty(navigator, 'webdriver', { get: () => false });
// Override permissions
const originalQuery = window.navigator.permissions.query;
window.navigator.permissions.query = (parameters) =>
parameters.name === 'notifications'
? Promise.resolve({ state: Notification.permission })
: originalQuery(parameters);
});
const page = await context.newPage();
let anySuccess = false;
for (const { url, file } of pages) {
const ok = await scrapePage(page, url, file);
if (ok) anySuccess = true;
// Add delay between requests
await page.waitForTimeout(2000);
}
if (!anySuccess) {
console.log('\nAll pages blocked by CAPTCHA. Arista docs require human verification.');
}
await browser.close();
console.log('\nDone!');
}
main().catch(e => { console.error(e); process.exit(1); });
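For reference, the tag-to-Markdown mapping that `nodeToMd()` walks recursively can be approximated offline with Python's stdlib `html.parser`. This is a minimal sketch with abbreviated tag coverage and a hypothetical class name, not the scraper's actual logic:

```python
from html.parser import HTMLParser

# Minimal offline analogue of nodeToMd(): map a few HTML tags onto
# Markdown prefixes/suffixes while skipping nav/script/style subtrees.
class MdConverter(HTMLParser):
    SKIP = {"nav", "footer", "script", "style", "noscript", "iframe"}
    PREFIX = {"h1": "\n# ", "h2": "\n## ", "h3": "\n### ", "p": "\n", "li": "- "}
    SUFFIX = {"h1": "\n", "h2": "\n", "h3": "\n", "p": "\n", "li": "\n"}

    def __init__(self) -> None:
        super().__init__()
        self.out: list[str] = []
        self.skip_depth = 0  # >0 while inside a skipped subtree

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self.out.append(self.PREFIX.get(tag, ""))

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.out.append(self.SUFFIX.get(tag, ""))

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(data)

conv = MdConverter()
conv.feed("<h2>Intro</h2><script>x()</script><p>Hello</p>")
print("".join(conv.out))  # "\n## Intro\n\nHello\n"
```

Unlike the in-page walker, this sees serialized HTML rather than the live DOM, so class-based filtering (`.sidebar`, `.menu`) would need attribute inspection in `handle_starttag`.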


@ -0,0 +1,34 @@
#!/usr/bin/env python3
"""Strip IP+UDP headers from mirrored packets and forward raw UDP payload."""
import socket
import sys
LISTEN_PORT = int(sys.argv[1]) if len(sys.argv) > 1 else 9100
FORWARD_HOST = sys.argv[2] if len(sys.argv) > 2 else "127.0.0.1"
FORWARD_PORT = int(sys.argv[3]) if len(sys.argv) > 3 else 9000
sock_in = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock_in.bind(("0.0.0.0", LISTEN_PORT))
sock_out = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
count = 0
while True:
data, addr = sock_in.recvfrom(65535)
if len(data) < 28:
continue
# IP header: first nibble is version (4), second nibble is IHL (words)
if (data[0] >> 4) != 4:
continue
ihl = (data[0] & 0x0F) * 4
# Protocol should be UDP (17)
if data[9] != 17:
continue
# Payload starts after IP header + 8-byte UDP header
offset = ihl + 8
payload = data[offset:]
if payload:
sock_out.sendto(payload, (FORWARD_HOST, FORWARD_PORT))
count += 1
if count % 10000 == 0:
print(f"Forwarded {count} shreds", flush=True)
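The header arithmetic above (IHL nibble times 4, then an 8-byte UDP header) can be sanity-checked against a hand-built packet; this is a standalone sketch, not part of the forwarder:

```python
import struct

# Build a minimal IPv4 header (20 bytes, no options) + UDP header + payload,
# mirroring what the stripper expects on its listen socket.
payload = b"shred-bytes"
ip_header = struct.pack(
    "!BBHHHBBH4s4s",
    (4 << 4) | 5,              # version 4, IHL 5 words (20 bytes)
    0, 20 + 8 + len(payload),  # TOS, total length
    0, 0,                      # identification, flags/fragment offset
    64, 17, 0,                 # TTL, protocol=UDP(17), checksum (unverified)
    bytes(4), bytes(4),        # src/dst addresses (unused here)
)
udp_header = struct.pack("!HHHH", 9100, 9000, 8 + len(payload), 0)
packet = ip_header + udp_header + payload

# Same arithmetic as the forwarder: IHL nibble * 4, then skip 8 UDP bytes.
ihl = (packet[0] & 0x0F) * 4
assert packet[9] == 17
print(packet[ihl + 8:])  # b'shred-bytes'
```

Because the offset is computed from the IHL field rather than hardcoded to 28, packets carrying IP options are handled correctly too.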


@ -0,0 +1,546 @@
#!/usr/bin/env python3
"""Download Solana snapshots using aria2c for parallel multi-connection downloads.
Discovers snapshot sources by querying getClusterNodes for all RPCs in the
cluster, probing each for available snapshots, benchmarking download speed,
and downloading from the fastest source using aria2c (16 connections by default).
Based on the discovery approach from etcusr/solana-snapshot-finder but replaces
the single-connection wget download with aria2c parallel chunked downloads.
Usage:
# Download to /srv/solana/snapshots (mainnet, 16 connections)
./snapshot-download.py -o /srv/solana/snapshots
# Dry run — find best source, print URL
./snapshot-download.py --dry-run
# Custom RPC for cluster node discovery + 32 connections
./snapshot-download.py -r https://api.mainnet-beta.solana.com -n 32
# Testnet
./snapshot-download.py -c testnet -o /data/snapshots
Requirements:
- aria2c (apt install aria2)
- python3 >= 3.10 (stdlib only, no pip dependencies)
"""
from __future__ import annotations
import argparse
import concurrent.futures
import json
import logging
import os
import re
import shutil
import subprocess
import sys
import time
import urllib.error
import urllib.request
from dataclasses import dataclass, field
from http.client import HTTPResponse
from pathlib import Path
from typing import NoReturn
from urllib.request import Request
log: logging.Logger = logging.getLogger("snapshot-download")
CLUSTER_RPC: dict[str, str] = {
"mainnet-beta": "https://api.mainnet-beta.solana.com",
"testnet": "https://api.testnet.solana.com",
"devnet": "https://api.devnet.solana.com",
}
# Snapshot filenames:
# snapshot-<slot>-<hash>.tar.zst
# incremental-snapshot-<base_slot>-<slot>-<hash>.tar.zst
FULL_SNAP_RE: re.Pattern[str] = re.compile(
r"^snapshot-(\d+)-([A-Za-z0-9]+)\.tar\.(zst|bz2)$"
)
INCR_SNAP_RE: re.Pattern[str] = re.compile(
r"^incremental-snapshot-(\d+)-(\d+)-([A-Za-z0-9]+)\.tar\.(zst|bz2)$"
)
@dataclass
class SnapshotSource:
"""A snapshot file available from a specific RPC node."""
rpc_address: str
# Full redirect paths as returned by the server (e.g. /snapshot-123-hash.tar.zst)
file_paths: list[str] = field(default_factory=list)
slots_diff: int = 0
latency_ms: float = 0.0
download_speed: float = 0.0 # bytes/sec
# -- JSON-RPC helpers ----------------------------------------------------------
class _NoRedirectHandler(urllib.request.HTTPRedirectHandler):
"""Handler that captures redirect Location instead of following it."""
def redirect_request(
self,
req: Request,
fp: HTTPResponse,
code: int,
msg: str,
headers: dict[str, str], # type: ignore[override]
newurl: str,
) -> None:
return None
def rpc_post(url: str, method: str, params: list[object] | None = None,
timeout: int = 25) -> object | None:
"""JSON-RPC POST. Returns parsed 'result' field or None on error."""
payload: bytes = json.dumps({
"jsonrpc": "2.0", "id": 1,
"method": method, "params": params or [],
}).encode()
req = Request(url, data=payload,
headers={"Content-Type": "application/json"})
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
data: dict[str, object] = json.loads(resp.read())
return data.get("result")
except (urllib.error.URLError, json.JSONDecodeError, OSError, TimeoutError) as e:
log.debug("rpc_post %s %s failed: %s", url, method, e)
return None
def head_no_follow(url: str, timeout: float = 3) -> tuple[str | None, float]:
"""HEAD request without following redirects.
Returns (Location header value, latency_sec) if the server returned a
3xx redirect, (None, latency_sec) for a non-redirect 2xx response, and
(None, 0.0) on any error.
"""
opener: urllib.request.OpenerDirector = urllib.request.build_opener(_NoRedirectHandler)
req = Request(url, method="HEAD")
try:
start: float = time.monotonic()
resp: HTTPResponse = opener.open(req, timeout=timeout) # type: ignore[assignment]
latency: float = time.monotonic() - start
# Non-redirect (2xx) — server didn't redirect, not useful for discovery
location: str | None = resp.headers.get("Location")
resp.close()
return location, latency
except urllib.error.HTTPError as e:
# 3xx redirects raise HTTPError with the redirect info
latency = time.monotonic() - start # type: ignore[possibly-undefined]
location = e.headers.get("Location")
if location and 300 <= e.code < 400:
return location, latency
return None, 0.0
except (urllib.error.URLError, OSError, TimeoutError):
return None, 0.0
# -- Discovery -----------------------------------------------------------------
def get_current_slot(rpc_url: str) -> int | None:
"""Get current slot from RPC."""
result: object | None = rpc_post(rpc_url, "getSlot")
if isinstance(result, int):
return result
return None
def get_cluster_rpc_nodes(rpc_url: str, version_filter: str | None = None) -> list[str]:
"""Get all RPC node addresses from getClusterNodes."""
result: object | None = rpc_post(rpc_url, "getClusterNodes")
if not isinstance(result, list):
return []
rpc_addrs: list[str] = []
for node in result:
if not isinstance(node, dict):
continue
if version_filter is not None:
node_version: str | None = node.get("version")
if node_version and not node_version.startswith(version_filter):
continue
rpc: str | None = node.get("rpc")
if rpc:
rpc_addrs.append(rpc)
return list(set(rpc_addrs))
def _parse_snapshot_filename(location: str) -> tuple[str, str | None]:
"""Extract filename and full redirect path from Location header.
Returns (filename, full_path). full_path includes any path prefix
the server returned (e.g. '/snapshots/snapshot-123-hash.tar.zst').
"""
# Location may be absolute URL or relative path
if location.startswith("http://") or location.startswith("https://"):
# Absolute URL — extract path
from urllib.parse import urlparse
path: str = urlparse(location).path
else:
path = location
filename: str = path.rsplit("/", 1)[-1]
return filename, path
def probe_rpc_snapshot(
rpc_address: str,
current_slot: int,
max_age_slots: int,
max_latency_ms: float,
) -> SnapshotSource | None:
"""Probe a single RPC node for available snapshots.
Probes for full snapshot first (required), then incremental. Records all
available files. Which files to actually download is decided at download
time based on what already exists locally, not here.
Based on the discovery approach from etcusr/solana-snapshot-finder.
"""
full_url: str = f"http://{rpc_address}/snapshot.tar.bz2"
# Full snapshot is required — every source must have one
full_location, full_latency = head_no_follow(full_url, timeout=2)
if not full_location:
return None
latency_ms: float = full_latency * 1000
if latency_ms > max_latency_ms:
return None
full_filename, full_path = _parse_snapshot_filename(full_location)
fm: re.Match[str] | None = FULL_SNAP_RE.match(full_filename)
if not fm:
return None
full_snap_slot: int = int(fm.group(1))
slots_diff: int = current_slot - full_snap_slot
if slots_diff > max_age_slots or slots_diff < -100:
return None
file_paths: list[str] = [full_path]
# Also check for incremental snapshot
inc_url: str = f"http://{rpc_address}/incremental-snapshot.tar.bz2"
inc_location, _ = head_no_follow(inc_url, timeout=2)
if inc_location:
inc_filename, inc_path = _parse_snapshot_filename(inc_location)
m: re.Match[str] | None = INCR_SNAP_RE.match(inc_filename)
if m:
inc_base_slot: int = int(m.group(1))
# Incremental must be based on this source's full snapshot
if inc_base_slot == full_snap_slot:
file_paths.append(inc_path)
return SnapshotSource(
rpc_address=rpc_address,
file_paths=file_paths,
slots_diff=slots_diff,
latency_ms=latency_ms,
)
def discover_sources(
rpc_url: str,
current_slot: int,
max_age_slots: int,
max_latency_ms: float,
threads: int,
version_filter: str | None,
) -> list[SnapshotSource]:
"""Discover all snapshot sources from the cluster."""
rpc_nodes: list[str] = get_cluster_rpc_nodes(rpc_url, version_filter)
if not rpc_nodes:
log.error("No RPC nodes found via getClusterNodes")
return []
log.info("Found %d RPC nodes, probing for snapshots...", len(rpc_nodes))
sources: list[SnapshotSource] = []
with concurrent.futures.ThreadPoolExecutor(max_workers=threads) as pool:
futures: dict[concurrent.futures.Future[SnapshotSource | None], str] = {
pool.submit(
probe_rpc_snapshot, addr, current_slot,
max_age_slots, max_latency_ms,
): addr
for addr in rpc_nodes
}
done: int = 0
for future in concurrent.futures.as_completed(futures):
done += 1
if done % 200 == 0:
log.info(" probed %d/%d nodes, %d sources found",
done, len(rpc_nodes), len(sources))
try:
result: SnapshotSource | None = future.result()
except (urllib.error.URLError, OSError, TimeoutError) as e:
log.debug("Probe failed for %s: %s", futures[future], e)
continue
if result:
sources.append(result)
log.info("Found %d RPC nodes with suitable snapshots", len(sources))
return sources
# -- Speed benchmark -----------------------------------------------------------
def measure_speed(rpc_address: str, measure_time: int = 7) -> float:
"""Measure download speed from an RPC node. Returns bytes/sec."""
url: str = f"http://{rpc_address}/snapshot.tar.bz2"
req = Request(url)
try:
with urllib.request.urlopen(req, timeout=measure_time + 5) as resp:
start: float = time.monotonic()
total: int = 0
while True:
elapsed: float = time.monotonic() - start
if elapsed >= measure_time:
break
chunk: bytes = resp.read(81920)
if not chunk:
break
total += len(chunk)
elapsed = time.monotonic() - start
if elapsed <= 0:
return 0.0
return total / elapsed
except (urllib.error.URLError, OSError, TimeoutError):
return 0.0
# -- Download ------------------------------------------------------------------
def download_aria2c(
urls: list[str],
output_dir: str,
filename: str,
connections: int = 16,
) -> bool:
"""Download a file using aria2c with parallel connections.
When multiple URLs are provided, aria2c treats them as mirrors of the
same file and distributes chunks across all of them.
"""
num_mirrors: int = len(urls)
total_splits: int = max(connections, connections * num_mirrors)
cmd: list[str] = [
"aria2c",
"--file-allocation=none",
"--continue=true",
f"--max-connection-per-server={connections}",
f"--split={total_splits}",
"--min-split-size=50M",
# aria2c retries individual chunk connections on transient network
# errors (TCP reset, timeout). This is transport-level retry analogous
# to TCP retransmit, not application-level retry of a failed operation.
"--max-tries=5",
"--retry-wait=5",
"--timeout=60",
"--connect-timeout=10",
"--summary-interval=10",
"--console-log-level=notice",
f"--dir={output_dir}",
f"--out={filename}",
"--auto-file-renaming=false",
"--allow-overwrite=true",
*urls,
]
log.info("Downloading %s", filename)
log.info(" aria2c: %d connections × %d mirrors (%d splits)",
connections, num_mirrors, total_splits)
start: float = time.monotonic()
result: subprocess.CompletedProcess[bytes] = subprocess.run(cmd)
elapsed: float = time.monotonic() - start
if result.returncode != 0:
log.error("aria2c failed with exit code %d", result.returncode)
return False
filepath: Path = Path(output_dir) / filename
if not filepath.exists():
log.error("aria2c reported success but %s does not exist", filepath)
return False
size_bytes: int = filepath.stat().st_size
size_gb: float = size_bytes / (1024 ** 3)
avg_mb: float = size_bytes / elapsed / (1024 ** 2) if elapsed > 0 else 0
log.info(" Done: %.1f GB in %.0fs (%.1f MiB/s avg)", size_gb, elapsed, avg_mb)
return True
# -- Main ----------------------------------------------------------------------
def main() -> int:
p: argparse.ArgumentParser = argparse.ArgumentParser(
description="Download Solana snapshots with aria2c parallel downloads",
)
p.add_argument("-o", "--output", default="/srv/solana/snapshots",
help="Snapshot output directory (default: /srv/solana/snapshots)")
p.add_argument("-c", "--cluster", default="mainnet-beta",
choices=list(CLUSTER_RPC),
help="Solana cluster (default: mainnet-beta)")
p.add_argument("-r", "--rpc", default=None,
help="RPC URL for cluster discovery (default: public RPC)")
p.add_argument("-n", "--connections", type=int, default=16,
help="aria2c connections per download (default: 16)")
p.add_argument("-t", "--threads", type=int, default=500,
help="Threads for parallel RPC probing (default: 500)")
p.add_argument("--max-snapshot-age", type=int, default=1300,
help="Max snapshot age in slots (default: 1300)")
p.add_argument("--max-latency", type=float, default=100,
help="Max RPC probe latency in ms (default: 100)")
p.add_argument("--min-download-speed", type=int, default=20,
help="Min download speed in MiB/s (default: 20)")
p.add_argument("--measurement-time", type=int, default=7,
help="Speed measurement duration in seconds (default: 7)")
p.add_argument("--max-speed-checks", type=int, default=15,
help="Max nodes to benchmark before giving up (default: 15)")
p.add_argument("--version", default=None,
help="Filter nodes by version prefix (e.g. '2.2')")
p.add_argument("--full-only", action="store_true",
help="Download only full snapshot, skip incremental")
p.add_argument("--dry-run", action="store_true",
help="Find best source and print URL, don't download")
p.add_argument("-v", "--verbose", action="store_true")
args: argparse.Namespace = p.parse_args()
logging.basicConfig(
level=logging.DEBUG if args.verbose else logging.INFO,
format="%(asctime)s %(levelname)s %(message)s",
datefmt="%H:%M:%S",
)
rpc_url: str = args.rpc or CLUSTER_RPC[args.cluster]
# aria2c is required for actual downloads (not dry-run)
if not args.dry_run and not shutil.which("aria2c"):
log.error("aria2c not found. Install with: apt install aria2")
return 1
# Get current slot
log.info("Cluster: %s | RPC: %s", args.cluster, rpc_url)
current_slot: int | None = get_current_slot(rpc_url)
if current_slot is None:
log.error("Cannot get current slot from %s", rpc_url)
return 1
log.info("Current slot: %d", current_slot)
# Discover sources
sources: list[SnapshotSource] = discover_sources(
rpc_url, current_slot,
max_age_slots=args.max_snapshot_age,
max_latency_ms=args.max_latency,
threads=args.threads,
version_filter=args.version,
)
if not sources:
log.error("No snapshot sources found")
return 1
# Sort by latency (lowest first) for speed benchmarking
sources.sort(key=lambda s: s.latency_ms)
# Benchmark top candidates — all speeds in MiB/s (binary, 1 MiB = 1048576 bytes)
log.info("Benchmarking download speed on top %d sources...", args.max_speed_checks)
fast_sources: list[SnapshotSource] = []
checked: int = 0
min_speed_bytes: int = args.min_download_speed * 1024 * 1024 # MiB to bytes
for source in sources:
if checked >= args.max_speed_checks:
break
checked += 1
speed: float = measure_speed(source.rpc_address, args.measurement_time)
source.download_speed = speed
speed_mib: float = speed / (1024 ** 2)
if speed < min_speed_bytes:
log.info(" %s: %.1f MiB/s (too slow, need >=%d MiB/s)",
source.rpc_address, speed_mib, args.min_download_speed)
continue
log.info(" %s: %.1f MiB/s (latency: %.0fms, age: %d slots)",
source.rpc_address, speed_mib,
source.latency_ms, source.slots_diff)
fast_sources.append(source)
if not fast_sources:
log.error("No source met minimum speed requirement (%d MiB/s)",
args.min_download_speed)
log.info("Try: --min-download-speed 10")
return 1
# Use the fastest source as primary, collect mirrors for each file
best: SnapshotSource = fast_sources[0]
file_paths: list[str] = best.file_paths
if args.full_only:
file_paths = [fp for fp in file_paths
if fp.rsplit("/", 1)[-1].startswith("snapshot-")]
# Build mirror URL lists: for each file, collect URLs from all fast sources
# that serve the same filename
download_plan: list[tuple[str, list[str]]] = []
for fp in file_paths:
filename: str = fp.rsplit("/", 1)[-1]
mirror_urls: list[str] = [f"http://{best.rpc_address}{fp}"]
for other in fast_sources[1:]:
for other_fp in other.file_paths:
if other_fp.rsplit("/", 1)[-1] == filename:
mirror_urls.append(f"http://{other.rpc_address}{other_fp}")
break
download_plan.append((filename, mirror_urls))
speed_mib: float = best.download_speed / (1024 ** 2)
log.info("Best source: %s (%.1f MiB/s), %d mirrors total",
best.rpc_address, speed_mib, len(fast_sources))
for filename, mirror_urls in download_plan:
log.info(" %s (%d mirrors)", filename, len(mirror_urls))
for url in mirror_urls:
log.info(" %s", url)
if args.dry_run:
for _, mirror_urls in download_plan:
for url in mirror_urls:
print(url)
return 0
# Download — skip files that already exist locally
os.makedirs(args.output, exist_ok=True)
total_start: float = time.monotonic()
for filename, mirror_urls in download_plan:
filepath: Path = Path(args.output) / filename
if filepath.exists() and filepath.stat().st_size > 0:
log.info("Skipping %s (already exists: %.1f GB)",
filename, filepath.stat().st_size / (1024 ** 3))
continue
if not download_aria2c(mirror_urls, args.output, filename, args.connections):
log.error("Failed to download %s", filename)
return 1
total_elapsed: float = time.monotonic() - total_start
log.info("All downloads complete in %.0fs", total_elapsed)
for filename, _ in download_plan:
fp: Path = Path(args.output) / filename
if fp.exists():
log.info(" %s (%.1f GB)", fp.name, fp.stat().st_size / (1024 ** 3))
return 0
if __name__ == "__main__":
sys.exit(main())
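For reference, the two filename regexes defined near the top of the script accept names in these shapes (the slots and hashes below are made up for illustration):

```python
import re

# Same patterns as FULL_SNAP_RE / INCR_SNAP_RE in the script above.
FULL = re.compile(r"^snapshot-(\d+)-([A-Za-z0-9]+)\.tar\.(zst|bz2)$")
INCR = re.compile(r"^incremental-snapshot-(\d+)-(\d+)-([A-Za-z0-9]+)\.tar\.(zst|bz2)$")

m = FULL.match("snapshot-250000000-AbCd1234.tar.zst")
assert m and int(m.group(1)) == 250000000          # slot

m = INCR.match("incremental-snapshot-250000000-250000500-XyZ9.tar.zst")
assert m and int(m.group(1)) == 250000000          # base (full) slot
assert int(m.group(2)) == 250000500                # incremental slot

# The anchored ^snapshot- prefix keeps FULL from matching incrementals,
# which is what lets probe_rpc_snapshot() pair an incremental with its base.
assert FULL.match("incremental-snapshot-1-2-X.tar.zst") is None
```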