fix: ashburn relay playbooks and document DZ tunnel ACL root cause

Playbook fixes from testing:

- ashburn-relay-biscayne: insert DNAT rules at position 1 before Docker's ADDRTYPE LOCAL rule (was being swallowed at position 3+)
- ashburn-relay-mia-sw01: add inbound route for 137.239.194.65 via egress-vrf vrf1 (nexthop only, no interface — EOS silently drops cross-VRF routes that specify a tunnel interface)
- ashburn-relay-was-sw01: replace PBR with static route, remove Loopback101

Bug doc (bug-ashburn-tunnel-port-filtering.md): root cause is that the DoubleZero agent on mia-sw01 overwrites the SEC-USER-500-IN ACL, dropping outbound gossip with src 137.239.194.65. The DZ agent controls Tunnel500's lifecycle. Fix requires a separate GRE tunnel using mia-sw01's free LAN IP (209.42.167.137) to bypass DZ infrastructure.

Also adds all repo docs, scripts, inventory, and remaining playbooks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent 6841d5e3c3
commit 0b52fc99d7
@ -0,0 +1,3 @@
.venv/
sessions.duckdb
sessions.duckdb.wal
@ -0,0 +1,204 @@
# Biscayne Agave Runbook

## Cluster Operations

### Shutdown Order

The agave validator runs inside a kind-based k8s cluster managed by `laconic-so`.
The kind node is a Docker container. **Never restart or kill the kind node container
while the validator is running.** Agave uses `io_uring` for async I/O, and on ZFS,
killing the process can produce unkillable kernel threads (D-state in
`io_wq_put_and_exit` blocked on ZFS transaction commits). This deadlocks the
container's PID namespace, making `docker stop`, `docker restart`, `docker exec`,
and even `reboot` hang.

Correct shutdown sequence:

1. Scale the deployment to 0 and wait for the pod to terminate:
```
kubectl scale deployment laconic-70ce4c4b47e23b85-deployment \
  -n laconic-laconic-70ce4c4b47e23b85 --replicas=0
kubectl wait --for=delete pod -l app=laconic-70ce4c4b47e23b85-deployment \
  -n laconic-laconic-70ce4c4b47e23b85 --timeout=120s
```
2. Only then restart the kind node if needed:
```
docker restart laconic-70ce4c4b47e23b85-control-plane
```
3. Scale back up:
```
kubectl scale deployment laconic-70ce4c4b47e23b85-deployment \
  -n laconic-laconic-70ce4c4b47e23b85 --replicas=1
```

### Ramdisk

The accounts directory must be on a ramdisk for performance. `/dev/ram0` loses its
filesystem on reboot and must be reformatted before mounting.

**Boot ordering is handled by systemd units** (installed by `biscayne-boot.yml`):
- `format-ramdisk.service`: runs `mkfs.xfs -f /dev/ram0` before `local-fs.target`
- fstab entry: mounts `/dev/ram0` at `/srv/solana/ramdisk` with
  `x-systemd.requires=format-ramdisk.service`
- `ramdisk-accounts.service`: creates `/srv/solana/ramdisk/accounts` and sets
  ownership after the mount

These units run before docker, so the kind node's bind mounts always see the
ramdisk. **No manual intervention is needed after reboot.**
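
A sketch of what the format unit and fstab entry look like — the unit contents here are an assumption for illustration; the authoritative versions are whatever `biscayne-boot.yml` installs:

```ini
# /etc/systemd/system/format-ramdisk.service (sketch, not the installed unit)
[Unit]
Description=Format /dev/ram0 before local filesystems mount
DefaultDependencies=no
Before=local-fs.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/mkfs.xfs -f /dev/ram0

[Install]
WantedBy=local-fs.target
```

```
# /etc/fstab entry (sketch): x-systemd.requires orders the mount after the format
/dev/ram0  /srv/solana/ramdisk  xfs  defaults,x-systemd.requires=format-ramdisk.service  0 0
```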

**Mount propagation**: The kind node bind-mounts `/srv/kind` → `/mnt`. Because
the ramdisk is mounted at `/srv/solana/ramdisk` and symlinked/overlaid through
`/srv/kind/solana/ramdisk`, mount propagation makes it visible inside the kind
node at `/mnt/solana/ramdisk` without restarting the kind node. **Do NOT restart
the kind node just to pick up a ramdisk mount.**

### KUBECONFIG

kubectl must be told where the kubeconfig is when running as root or via ansible:
```
KUBECONFIG=/home/rix/.kube/config kubectl ...
```

The ansible playbooks set `environment: KUBECONFIG: /home/rix/.kube/config`.

### SSH Agent

SSH to biscayne goes through a ProxyCommand jump host (abernathy.ch2.vaasl.io).
The SSH agent socket rotates when the user reconnects. Find the current one:
```
ls -t /tmp/ssh-*/agent.* | head -1
```
Then export it:
```
export SSH_AUTH_SOCK=/tmp/ssh-XXXX/agent.NNNN
```
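
The two steps above can be collapsed into one line — a sketch that assumes the most recently modified socket belongs to the live agent (usually true after a reconnect):

```shell
# Pick the newest agent socket; empty if no agent socket exists yet
export SSH_AUTH_SOCK=$(ls -t /tmp/ssh-*/agent.* 2>/dev/null | head -1)
echo "SSH_AUTH_SOCK=$SSH_AUTH_SOCK"
```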

### io_uring/ZFS Deadlock — Root Cause

When agave-validator is killed while performing I/O against ZFS-backed paths (not
the ramdisk), io_uring worker threads get stuck in D-state:
```
io_wq_put_and_exit → dsl_dir_tempreserve_space (ZFS module)
```
These threads are unkillable (SIGKILL has no effect on D-state processes). They
prevent the container's PID namespace from being reaped (`zap_pid_ns_processes`
waits forever), which breaks `docker stop`, `docker restart`, `docker exec`, and
even `reboot`. The only fix is a hard power cycle.
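
To check whether the host is already in this state, list uninterruptible tasks and the kernel symbol they are blocked in — stuck io_uring workers show up as D-state threads (e.g. in `io_wq_put_and_exit`):

```shell
# Header plus any task whose state starts with D (uninterruptible sleep);
# wchan shows the kernel function the task is blocked in
ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /^D/'
```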

**Prevention**: Always scale the deployment to 0 and wait for the pod to terminate
before any destructive operation (namespace delete, kind restart, host reboot).
The `biscayne-stop.yml` playbook enforces this.

### laconic-so Architecture

`laconic-so` manages kind clusters atomically — `deployment start` creates the
kind cluster, namespace, PVs, PVCs, and deployment in one shot. There is no way
to create the cluster without deploying the pod.

Key code paths in stack-orchestrator:
- `deploy_k8s.py:up()` — creates everything atomically
- `cluster_info.py:get_pvs()` — translates host paths using `kind-mount-root`
- `helpers_k8s.py:get_kind_pv_bind_mount_path()` — strips `kind-mount-root`
  prefix and prepends `/mnt/`
- `helpers_k8s.py:_generate_kind_mounts()` — when `kind-mount-root` is set,
  emits a single `/srv/kind` → `/mnt` mount instead of individual mounts

The `kind-mount-root: /srv/kind` setting in `spec.yml` means all data volumes
whose host paths start with `/srv/kind` get translated to `/mnt/...` inside the
kind node via a single bind mount.
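
The path translation can be sketched as a few lines of python — this mirrors what `get_kind_pv_bind_mount_path()` is described as doing above, not the actual stack-orchestrator source:

```python
def kind_pv_bind_mount_path(host_path: str, kind_mount_root: str = "/srv/kind") -> str:
    """Sketch of the kind-mount-root translation: strip the configured
    prefix from the host path and prepend /mnt (the kind node's mount of it)."""
    if host_path.startswith(kind_mount_root):
        return "/mnt" + host_path[len(kind_mount_root):]
    # Paths outside kind-mount-root are left untranslated
    return host_path

print(kind_pv_bind_mount_path("/srv/kind/solana/ledger"))  # /mnt/solana/ledger
```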

### Key Identifiers

- Kind cluster: `laconic-70ce4c4b47e23b85`
- Namespace: `laconic-laconic-70ce4c4b47e23b85`
- Deployment: `laconic-70ce4c4b47e23b85-deployment`
- Kind node container: `laconic-70ce4c4b47e23b85-control-plane`
- Deployment dir: `/srv/deployments/agave`
- Snapshot dir: `/srv/solana/snapshots`
- Ledger dir: `/srv/solana/ledger`
- Accounts dir: `/srv/solana/ramdisk/accounts`
- Log dir: `/srv/solana/log`
- Host bind mount root: `/srv/kind` -> kind node `/mnt`
- laconic-so: `/home/rix/.local/bin/laconic-so` (editable install)

### PV Mount Paths (inside kind node)

| PV Name             | hostPath                     |
|---------------------|------------------------------|
| validator-snapshots | /mnt/solana/snapshots        |
| validator-ledger    | /mnt/solana/ledger           |
| validator-accounts  | /mnt/solana/ramdisk/accounts |
| validator-log       | /mnt/solana/log              |

### Snapshot Freshness

If the snapshot is more than **20,000 slots behind** the current mainnet tip, it is
too old. Stop the validator, download a fresh snapshot, and restart. Do NOT let it
try to catch up from an old snapshot — it will take too long and may never converge.

Check with:
```
# Snapshot slot (from filename)
ls /srv/solana/snapshots/snapshot-*.tar.*

# Current mainnet slot
curl -s -X POST -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"getSlot","params":[{"commitment":"finalized"}]}' \
  https://api.mainnet-beta.solana.com
```
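
The staleness check can be scripted — a sketch that parses the slot out of a full-snapshot filename (the `snapshot-<slot>-<hash>.tar.zst` naming convention; incremental snapshots use a different prefix and are not handled here):

```python
import re

def snapshot_slot(filename: str) -> int:
    """Extract the slot number from a full agave snapshot filename,
    e.g. snapshot-312345678-<hash>.tar.zst -> 312345678."""
    m = re.match(r"snapshot-(\d+)-", filename)
    if not m:
        raise ValueError(f"not a full snapshot filename: {filename}")
    return int(m.group(1))

def too_stale(snapshot_file: str, current_slot: int, max_lag: int = 20_000) -> bool:
    """True if the snapshot is more than max_lag slots behind the mainnet tip."""
    return current_slot - snapshot_slot(snapshot_file) > max_lag
```

Feed `current_slot` from the `getSlot` RPC call above.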

### Snapshot Leapfrog Recovery

When the validator is stuck in a repair-dependent gap (incomplete shreds from a
relay outage or insufficient turbine coverage), "grinding through" doesn't work.
At 0.4 slots/sec replay through incomplete blocks vs 2.5 slots/sec chain
production, the gap grows faster than it shrinks.
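
The arithmetic is worth making explicit — with the rates above, a net +2.1 slots/sec means the gap widens by roughly 7,500 slots per hour of "catching up":

```python
def gap_after(initial_gap: int, replay_rate: float,
              production_rate: float, seconds: float) -> float:
    """Slots behind after `seconds` of catch-up; a positive net rate
    (production faster than replay) means the gap grows, never shrinks."""
    return initial_gap + (production_rate - replay_rate) * seconds

# A 10k-slot gap after one hour at 0.4 slots/s replay vs 2.5 slots/s production
print(gap_after(10_000, 0.4, 2.5, 3600))
```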

**Strategy**: Download a fresh snapshot whose slot lands *past* the incomplete zone,
into the range where turbine+relay shreds are accumulating in the blockstore.
**Keep the existing ledger** — it has those shreds. The validator replays from
local blockstore data instead of waiting on repair.

**Steps**:
1. Let the validator run — turbine+relay accumulate shreds at the tip
2. Monitor shred completeness at the tip:
   `scripts/check-shred-completeness.sh 500`
3. When there's a contiguous run of complete blocks (>100 slots), note the
   starting slot of that run
4. Scale to 0, wipe accounts (ramdisk), wipe old snapshots
5. **Do NOT wipe ledger** — it has the turbine shreds
6. Download a fresh snapshot (its slot should be within the complete run)
7. Scale to 1 — validator replays from local blockstore at 3-5 slots/sec

**Why this works**: Turbine delivers ~60% of shreds in real-time. Repair fills
the rest for recent slots quickly (peers prioritize recent data). The only
problem is repair for *old* slots (minutes/hours behind), which peers deprioritize.
By snapshotting past the gap, we skip the old-slot repair bottleneck entirely.

### Shred Relay (Ashburn)

The TVU shred relay from laconic-was-sw01 provides ~4,000 additional shreds/sec.
Without it, turbine alone delivers ~60% of blocks. With it, completeness improves
but still requires repair for full coverage.

**Current state**: Old pipeline (monitor session + socat + shred-unwrap.py).
The traffic-policy redirect was never committed (auto-revert after 5 min timer).
See `docs/tvu-shred-relay.md` for the traffic-policy config that needs to be
properly applied.

**Boot dependency**: `shred-unwrap.py` must be running on biscayne for the old
pipeline to work. It is NOT persistent across reboots. The iptables DNAT rule
for the new pipeline IS persistent (iptables-persistent installed).

### Redeploy Flow

See `playbooks/biscayne-redeploy.yml`. The scale-to-0 pattern is required because
`laconic-so` creates the cluster and deploys the pod atomically:

1. Delete namespace (teardown)
2. Optionally wipe data
3. `laconic-so deployment start` (creates cluster + pod)
4. Immediately scale to 0
5. Download snapshot via aria2c
6. Scale to 1
7. Verify
@ -0,0 +1,3 @@
# biscayne-agave-runbook

Ansible playbooks for operating the kind-based agave-stack deployment on biscayne.vaasl.io.
@ -0,0 +1,13 @@
[defaults]
inventory = inventory/
stdout_callback = ansible.builtin.default
result_format = yaml
callbacks_enabled = profile_tasks
retry_files_enabled = false

[privilege_escalation]
become = true
become_method = sudo

[ssh_connection]
pipelining = true
@ -0,0 +1,114 @@
# Arista EOS Reference Notes

Collected from live switch CLI (`?` help) and Arista documentation search
results. Switch platform: 7280CR3A, EOS 4.34.0F.

## PBR (Policy-Based Routing)

EOS uses `policy-map type pbr` — NOT `traffic-policy` (which is a different
feature for ASIC-level traffic policies, not available on all platforms/modes).

### Syntax

```
! ACL to match traffic
ip access-list <ACL-NAME>
   10 permit <proto> <src> <dst> [ports]

! Class-map referencing the ACL
class-map type pbr match-any <CLASS-NAME>
   match ip access-group <ACL-NAME>

! Policy-map with nexthop redirect
policy-map type pbr <POLICY-NAME>
   class <CLASS-NAME>
      set nexthop <A.B.C.D>            ! direct nexthop IP
      set nexthop recursive <A.B.C.D>  ! recursive resolution
      ! set nexthop-group <NAME>       ! nexthop group
      ! set ttl <value>                ! TTL override

! Apply on interface
interface <INTF>
   service-policy type pbr input <POLICY-NAME>
```

### PBR `set` options (from CLI `?`)

```
set ?
  nexthop        Next hop IP address for forwarding
  nexthop-group  next hop group name
  ttl            TTL effective with nexthop/nexthop-group
```

```
set nexthop ?
  A.B.C.D          next hop IP address
  A:B:C:D:E:F:G:H  next hop IPv6 address
  recursive        Enable Recursive Next hop resolution
```

**No VRF qualifier on `set nexthop`.** The nexthop must be reachable in the
VRF where the policy is applied. For cross-VRF PBR, use a static inter-VRF
route to make the nexthop reachable (see below).

## Static Inter-VRF Routes

Source: [EOS 4.34.0F - Static Inter-VRF Route](https://www.arista.com/en/um-eos/eos-static-inter-vrf-route)

Allows configuring a static route in one VRF with a nexthop evaluated in a
different VRF. Uses the `egress-vrf` keyword.

### Syntax

```
ip route vrf <ingress-vrf> <prefix>/<mask> egress-vrf <egress-vrf> <nexthop-ip>
ip route vrf <ingress-vrf> <prefix>/<mask> egress-vrf <egress-vrf> <interface>
```

### Examples (from Arista docs)

```
! Route in vrf1 with nexthop resolved in default VRF
ip route vrf vrf1 1.0.1.0/24 egress-vrf default 1.0.0.2

! show ip route vrf vrf1 output:
! S 1.0.1.0/24 [1/0] via 1.0.0.2, Vlan2180 (egress VRF default)
```

### Key points

- For bidirectional traffic, static inter-VRF routes must be configured in
  both VRFs.
- ECMP next-hop sets across same or heterogeneous egress VRFs are supported.
- The `show ip route vrf` output displays the egress VRF name when it differs
  from the source VRF.

## Inter-VRF Local Route Leaking

Source: [EOS 4.35.1F - Inter-VRF Local Route Leaking](https://www.arista.com/en/um-eos/eos-inter-vrf-local-route-leaking)

An alternative to static inter-VRF routes that leaks routes dynamically from
one VRF (source) to another VRF (destination) on the same router.

## Config Sessions

```
configure session <name>       ! enter named session
show session-config diffs      ! MUST be run from inside the session
commit timer HH:MM:SS          ! commit with auto-revert timer
abort                          ! discard session
```

From enable mode:
```
configure session <name> commit   ! finalize a pending session
```
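
Putting the two together, a typical safe-change workflow looks like the following — the session name is illustrative:

```
configure session risky-change        ! stage the change in a named session
   ! ...configuration changes...
show session-config diffs             ! review from inside the session
commit timer 00:05:00                 ! apply; auto-reverts in 5 min if not confirmed
! verify traffic is healthy, then from enable mode:
configure session risky-change commit ! make the change permanent
```

If verification fails, simply let the timer expire (or `abort`) and the running-config reverts on its own.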

## Checkpoints and Rollback

```
configure checkpoint save <name>
rollback running-config checkpoint <name>
write memory
```
File diff suppressed because it is too large
@ -0,0 +1,181 @@
<!-- Source: https://www.arista.com/um-eos/eos-ingress-and-egress-per-port-for-ipv4-and-ipv6-counters -->
<!-- Scraped: 2026-03-06T20:50:41.080Z -->

# Ingress and Egress Per-Port for IPv4 and IPv6 Counters

This feature supports per-interface ingress and egress packet and byte counters for IPv4
and IPv6.

This section describes ingress and egress per-port IPv4 and IPv6 counters, including
configuration instructions and command descriptions.

Topics covered by this chapter include:

- Configuration
- Show commands
- Dedicated ARP Entry for TX IPv4 and IPv6 Counters
- Considerations

## Configuration

IPv4 and IPv6 ingress counters (count **bridged and routed**
traffic, supported only on front-panel ports) can be enabled and disabled using the
**hardware counter feature ip in** command:

```
[no] hardware counter feature ip in
```

For IPv4 and IPv6 ingress and egress counters that include only
**routed** traffic (supported on Layer 3 interfaces such as
routed ports and L3 subinterfaces only), use the following commands:

Note: The DCS-7300X, DCS-7250X, DCS-7050X, and DCS-7060X platforms
do not require configuration for IPv4 and IPv6 packet counters for only routed
traffic. They are collected by default. Other platforms (DCS-7280SR, DCS-7280CR, and
DCS-7500-R) need the feature enabled.

```
[no] hardware counter feature ip in layer3
```

```
[no] hardware counter feature ip out layer3
```

### hardware counter feature ip

Use the **hardware counter feature ip** command to enable ingress
and egress counters at Layer 3. The **no** and **default** forms of the command
disable the feature. The feature is enabled by default.

**Command Mode**

Configuration mode

**Command Syntax**

**hardware counter feature ip in|out layer3**

**no hardware counter feature ip in|out layer3**

**default hardware counter feature ip in|out layer3**

**Example**

This example enables ingress and egress IP counters for Layer 3.
```
switch(config)# hardware counter feature ip in layer3
```

```
switch(config)# hardware counter feature ip out layer3
```

## Show commands

Use the **show interfaces counters ip** command to
display IPv4 and IPv6 packets and octets.

**Example**

```
switch# show interfaces counters ip
Interface     IPv4InOctets    IPv4InPkts    IPv6InOctets    IPv6InPkts
Et1/1         0               0             0               0
Et1/2         0               0             0               0
Et1/3         0               0             0               0
Et1/4         0               0             0               0
...
Interface     IPv4OutOctets   IPv4OutPkts   IPv6OutOctets   IPv6OutPkts
Et1/1         0               0             0               0
Et1/2         0               0             0               0
Et1/3         0               0             0               0
Et1/4         0               0             0               0
...
```

You can also query the output of the **show interfaces counters ip**
command through SNMP via the ARISTA-IP-MIB.

To clear the IPv4 or IPv6 counters, use the **clear counters** command.

**Example**
```
switch# clear counters
```

## Dedicated ARP Entry for TX IPv4 and IPv6 Counters

IPv4/IPv6 egress Layer 3 (**hardware counter feature ip out layer3**)
counting on the DCS-7280SR, DCS-7280CR, and DCS-7500-R platforms works based on the
ARP entry of the next hop. By default, the IPv4 next hop and the IPv6 next hop
resolve to the same MAC address and interface, sharing one ARP entry.

To differentiate the counters between IPv4 and IPv6, disable
ARP entry sharing with the following command:

```
ip hardware fib next-hop arp dedicated
```

Note: This command is required for IPv4 and IPv6 egress counters
to operate on the DCS-7280SR, DCS-7280CR, and DCS-7500-R platforms.

## Considerations

- Packet sizes greater than 9236 bytes are not counted by per-port IPv4 and IPv6 counters.
- Only the DCS-7260X3, DCS-7368, DCS-7300, DCS-7050SX3, DCS-7050CX3, DCS-7280SR,
  DCS-7280CR and DCS-7500-R platforms support the **hardware counter feature ip in** command.
- Only the DCS-7280SR, DCS-7280CR and DCS-7500-R platforms support the
  **hardware counter feature ip [in|out] layer3** command.
@ -0,0 +1,305 @@
|
||||||
|
<!-- Source: https://www.arista.com/en/um-eos/eos-inter-vrf-local-route-leaking -->
|
||||||
|
<!-- Scraped: 2026-03-06T20:43:28.363Z -->
|
||||||
|
|
||||||
|
# Inter-VRF Local Route Leaking
|
||||||
|
|
||||||
|
|
||||||
|
Inter-VRF local route leaking allows the leaking of routes from one VRF (the source VRF) to
|
||||||
|
another VRF (the destination VRF) on the same router.
|
||||||
|
Inter-VRF routes can exist in any VRF (including the
|
||||||
|
default VRF) on the system. Routes can be leaked using the
|
||||||
|
following methods:
|
||||||
|
|
||||||
|
- Inter-VRF Local Route Leaking using BGP
|
||||||
|
VPN
|
||||||
|
|
||||||
|
- Inter-VRF Local Route Leaking using VRF-leak
|
||||||
|
Agent
|
||||||
|
|
||||||
|
|
||||||
|
## Inter-VRF Local Route Leaking using BGP VPN
|
||||||
|
|
||||||
|
|
||||||
|
Inter-VRF local route leaking allows the user to export and import routes from one VRF to another
|
||||||
|
on the same device. This is implemented by exporting routes from a VRF to the local VPN table
|
||||||
|
using the route target extended community list and importing the same route target extended
|
||||||
|
community lists from the local VPN table into the target VRF. VRF route leaking is supported
|
||||||
|
on VPN-IPv4, VPN-IPv6, and EVPN types.
|
||||||
|
|
||||||
|
|
||||||
|
Figure 1. Inter-VRF Local Route Leaking using Local VPN Table
|
||||||
|
|
||||||
|
|
||||||
|
### Accessing Shared Resources Across VPNs
|
||||||
|
|
||||||
|
|
||||||
|
To access shared resources across VPNs, all the routes from the shared services VRF must be
|
||||||
|
leaked into each of the VPN VRFs, and customer routes must be leaked into the shared
|
||||||
|
services VRF for return traffic. Accessing shared resources allows the route target of the
|
||||||
|
shared services VRF to be exported into all customer VRFs, and allows the shared services
|
||||||
|
VRF to import route targets from customers A and B. The following figure shows how to
|
||||||
|
provide customers, corresponding to multiple VPN domains, access to services like DHCP
|
||||||
|
available in the shared VRF.
|
||||||
|
|
||||||
|
|
||||||
|
Route leaking across the VRFs is supported
|
||||||
|
on VPN-IPv4, VPN-IPv6, and EVPN.
|
||||||
|
|
||||||
|
|
||||||
|
Figure 2. Accessing Shared Resources Across VPNs
|
||||||
|
|
||||||
|
|
||||||
|
### Configuring Inter-VRF Local Route Leaking
|
||||||
|
|
||||||
|
|
||||||
|
Inter-VRF local route leaking is configured using VPN-IPv4, VPN-IPv6, and EVPN. Prefixes can be
|
||||||
|
exported and imported using any of the configured VPN types. Ensure that the same VPN
|
||||||
|
type that is exported is used while importing.
|
||||||
|
|
||||||
|
|
||||||
|
Leaking unicast IPv4 or IPv6 prefixes is supported and achieved by exporting prefixes locally to
|
||||||
|
the VPN table and importing locally from the VPN table into the target VRF on the same
|
||||||
|
device as shown in the figure titled **Inter-VRF Local Route Leaking using Local VPN
|
||||||
|
Table** using the **route-target** command.
|
||||||
|
|
||||||
|
|
||||||
|
Exporting or importing the routes to or from the EVPN table is accomplished with the following
|
||||||
|
two methods:
|
||||||
|
|
||||||
|
- Using VXLAN for encapsulation
|
||||||
|
|
||||||
|
- Using MPLS for encapsulation
|
||||||
|
|
||||||
|
|
||||||
|
#### Using VXLAN for Encapsulation
|
||||||
|
|
||||||
|
|
||||||
|
To use VXLAN encapsulation type, make sure that VRF to VNI mapping is present and the interface
|
||||||
|
status for the VXLAN interface is up. This is the default encapsulation type for
|
||||||
|
EVPN.
|
||||||
|
|
||||||
|
|
||||||
|
**Example**
|
||||||
|
|
||||||
|
|
||||||
|
The configuration for VXLAN encapsulation type is as
|
||||||
|
follows:
|
||||||
|
```
|
||||||
|
`switch(config)# **router bgp 65001**
|
||||||
|
switch(config-router-bgp)# **address-family evpn**
|
||||||
|
switch(config-router-bgp-af)# **neighbor default encapsulation VXLAN next-hop-self source-interface Loopback0**
|
||||||
|
switch(config)# **hardware tcam**
|
||||||
|
switch(config-hw-tcam)# **system profile VXLAN-routing**
|
||||||
|
switch(config-hw-tcam)# **interface VXLAN1**
|
||||||
|
switch(config-hw-tcam-if-Vx1)# **VXLAN source-interface Loopback0**
|
||||||
|
switch(config-hw-tcam-if-Vx1)# **VXLAN udp-port 4789**
|
||||||
|
switch(config-hw-tcam-if-Vx1)# **VXLAN vrf vrf-blue vni 20001**
|
||||||
|
switch(config-hw-tcam-if-Vx1)# **VXLAN vrf vrf-red vni 10001**`
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
#### Using MPLS for Encapsulation
|
||||||
|
|
||||||
|
|
||||||
|
To use MPLS encapsulation type to export
|
||||||
|
to the EVPN table, MPLS needs to be enabled globally on the device and
|
||||||
|
the encapsulation method needs to be changed from default type, that
|
||||||
|
is VXLAN to MPLS under the EVPN address-family sub-mode.
|
||||||
|
|
||||||
|
|
||||||
|
**Example**
|
||||||
|
```
|
||||||
|
`switch(config)# **router bgp 65001**
|
||||||
|
switch(config-router-bgp)# **address-family evpn**
|
||||||
|
switch(config-router-bgp-af)# **neighbor default encapsulation mpls next-hop-self source-interface Loopback0**`
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
### Route-Distinguisher
|
||||||
|
|
||||||
|
|
||||||
|
Route-Distinguisher (RD) uniquely identifies routes from a particular VRF.
|
||||||
|
Route-Distinguisher is configured for every VRF from which routes are exported from or
|
||||||
|
imported into.
|
||||||
|
|
||||||
|
|
||||||
|
The following commands are used to configure Route-Distinguisher for a VRF.
|
||||||
|
|
||||||
|
|
||||||
|
```
|
||||||
|
`switch(config-router-bgp)# **vrf vrf-services**
|
||||||
|
switch(config-router-bgp-vrf-vrf-services)# **rd 1.0.0.1:1**
|
||||||
|
|
||||||
|
switch(config-router-bgp)# **vrf vrf-blue**
|
||||||
|
switch(config-router-bgp-vrf-vrf-blue)# **rd 2.0.0.1:2**`
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
### Exporting Routes from a VRF
|
||||||
|
|
||||||
|
|
||||||
|
Use the **route-target export** command to export routes from a VRF to the
|
||||||
|
local VPN or EVPN table using the route target
|
||||||
|
extended community list.
|
||||||
|
|
||||||
|
|
||||||
|
**Examples**
|
||||||
|
|
||||||
|
- These commands export routes from
|
||||||
|
**vrf-red** to the local VPN
|
||||||
|
table.
|
||||||
|
```
|
||||||
|
`switch(config)# **service routing protocols model multi-agent**
|
||||||
|
switch(config)# **mpls ip**
|
||||||
|
switch(config)# **router bgp 65001**
|
||||||
|
switch(config-router-bgp)# **vrf vrf-red**
|
||||||
|
switch(config-router-bgp-vrf-vrf-red)# **rd 1:1**
|
||||||
|
switch(config-router-bgp-vrf-vrf-red)# **route-target export vpn-ipv4 10:10**
|
||||||
|
switch(config-router-bgp-vrf-vrf-red)# **route-target export vpn-ipv6 10:20**`
|
||||||
|
```
|
||||||
|
|
||||||
|
- These commands export routes from
|
||||||
|
**vrf-red** to the EVPN
|
||||||
|
table.
|
||||||
|
```
|
||||||
|
`switch(config)# **router bgp 65001**
|
||||||
|
switch(config-router-bgp)# **vrf vrf-red**
|
||||||
|
switch(config-router-bgp-vrf-vrf-red)# **rd 1:1**
|
||||||
|
switch(config-router-bgp-vrf-vrf-red)# **route-target export evpn 10:1**`
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
### Importing Routes into a VRF
|
||||||
|
|
||||||
|
|
||||||
|
Use the **route-target import** command to import the exported routes from
|
||||||
|
the local VPN or EVPN table to the target VRF
|
||||||
|
using the route target extended community
|
||||||
|
list.
|
||||||
|
|
||||||
|
|
||||||
|
**Examples**
|
||||||
|
|
||||||
|
- These commands import routes from the VPN
|
||||||
|
table to
|
||||||
|
**vrf-blue**.
|
||||||
|
```
|
||||||
|
`switch(config)# **service routing protocols model multi-agent**
|
||||||
|
switch(config)# **mpls ip**
|
||||||
|
switch(config)# **router bgp 65001**
|
||||||
|
switch(config-router-bgp)# **vrf vrf-blue**
|
||||||
|
switch(config-router-bgp-vrf-vrf-blue)# **rd 2:2**
|
||||||
|
switch(config-router-bgp-vrf-vrf-blue)# **route-target import vpn-ipv4 10:10**
|
||||||
|
switch(config-router-bgp-vrf-vrf-blue)# **route-target import vpn-ipv6 10:20**`
|
||||||
|
```
|
||||||
|
|
||||||
|
- These commands import routes from the EVPN
|
||||||
|
table to
|
||||||
|
**vrf-blue**.
|
||||||
|
```
|
||||||
|
`switch(config)# **router bgp 65001**
|
||||||
|
switch(config-router-bgp)# **vrf vrf-blue**
|
||||||
|
switch(config-router-bgp-vrf-vrf-blue)# **rd 2:2**
|
||||||
|
switch(config-router-bgp-vrf-vrf-blue)# **route-target import evpn 10:1**`
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
### Exporting and Importing Routes using Route
|
||||||
|
Map
|
||||||
|
|
||||||
|
|
||||||
|
To manage VRF route leaking, control the export and import prefixes with route-map export or
|
||||||
|
import commands. The route map is effective only if the VRF or the VPN
|
||||||
|
paths are already candidates for export or import. The route-target
|
||||||
|
export or import commandmust be configured first. Setting BGP
|
||||||
|
attributes using route maps is effective only on the export end.
|
||||||
|
|
||||||
|
|
||||||
|
Note: Prefixes that are leaked are not re-exported to the VPN table from the target VRF.
|
||||||
|
|
||||||
|
**Examples**

- These commands export routes from **vrf-red** to the local VPN table.

```
switch(config)# service routing protocols model multi-agent
switch(config)# mpls ip
switch(config)# router bgp 65001
switch(config-router-bgp)# vrf vrf-red
switch(config-router-bgp-vrf-vrf-red)# rd 1:1
switch(config-router-bgp-vrf-vrf-red)# route-target export vpn-ipv4 10:10
switch(config-router-bgp-vrf-vrf-red)# route-target export vpn-ipv6 10:20
switch(config-router-bgp-vrf-vrf-red)# route-target export vpn-ipv4 route-map EXPORT_V4_ROUTES_T0_VPN_TABLE
switch(config-router-bgp-vrf-vrf-red)# route-target export vpn-ipv6 route-map EXPORT_V6_ROUTES_T0_VPN_TABLE
```

- These commands export routes from **vrf-red** to the EVPN table.

```
switch(config)# router bgp 65001
switch(config-router-bgp)# vrf vrf-red
switch(config-router-bgp-vrf-vrf-red)# rd 1:1
switch(config-router-bgp-vrf-vrf-red)# route-target export evpn 10:1
switch(config-router-bgp-vrf-vrf-red)# route-target export evpn route-map EXPORT_ROUTES_T0_EVPN_TABLE
```

- These commands import routes from the VPN table to **vrf-blue**.

```
switch(config)# service routing protocols model multi-agent
switch(config)# mpls ip
switch(config)# router bgp 65001
switch(config-router-bgp)# vrf vrf-blue
switch(config-router-bgp-vrf-vrf-blue)# rd 1:1
switch(config-router-bgp-vrf-vrf-blue)# route-target import vpn-ipv4 10:10
switch(config-router-bgp-vrf-vrf-blue)# route-target import vpn-ipv6 10:20
switch(config-router-bgp-vrf-vrf-blue)# route-target import vpn-ipv4 route-map IMPORT_V4_ROUTES_VPN_TABLE
switch(config-router-bgp-vrf-vrf-blue)# route-target import vpn-ipv6 route-map IMPORT_V6_ROUTES_VPN_TABLE
```

- These commands import routes from the EVPN table to **vrf-blue**.

```
switch(config)# router bgp 65001
switch(config-router-bgp)# vrf vrf-blue
switch(config-router-bgp-vrf-vrf-blue)# rd 2:2
switch(config-router-bgp-vrf-vrf-blue)# route-target import evpn 10:1
switch(config-router-bgp-vrf-vrf-blue)# route-target import evpn route-map IMPORT_ROUTES_FROM_EVPN_TABLE
```

## Inter-VRF Local Route Leaking using VRF-leak Agent

Inter-VRF local route leaking allows routes to leak from one VRF to another using a route map as a VRF-leak agent. VRFs are leaked based on the preferences assigned to each VRF.

### Configuring Route Maps

To leak routes from one VRF to another using a route map, use the [router general](/um-eos/eos-evpn-and-vcs-commands#xx1351777) command to enter Router-General Configuration Mode, then enter the VRF submode for the destination VRF, and use the [leak routes](/um-eos/eos-evpn-and-vcs-commands#reference_g2h_2z3_hwb) command to specify the source VRF and the route map to be used. Routes in the source VRF that match the policy in the route map will then be considered for leaking into the configuration-mode VRF. If two or more policies specify leaking the same prefix to the same destination VRF, the route with a higher (post-set-clause) distance and preference is chosen.

**Example**

These commands configure a route map to leak routes from **VRF1** to **VRF2** using route map **RM1**.

```
switch(config)# router general
switch(config-router-general)# vrf VRF2
switch(config-router-general-vrf-VRF2)# leak routes source-vrf VRF1 subscribe-policy RM1
switch(config-router-general-vrf-VRF2)#
```
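
To confirm the leak took effect, check the destination VRF's routing table; leaked entries carry the `L - VRF Leaked` code. The prefix, next-hop, and interface below are placeholders for illustration, not captured output:

```
switch# show ip route vrf VRF2
...
 L        10.0.1.0/24 [1/0] via 10.0.0.1, Vlan100
```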

<!-- Source: https://www.arista.com/en/um-eos/eos-static-inter-vrf-route -->
<!-- Scraped: 2026-03-06T20:43:17.977Z -->

# Static Inter-VRF Route

The Static Inter-VRF Route feature adds support for static inter-VRF routes. This enables the configuration of routes to destinations in one ingress VRF with an ability to specify a next-hop in a different egress VRF through a static configuration.

You can configure static inter-VRF routes in default and non-default VRFs. A different egress VRF is achieved by “tagging” the **next-hop** or **forwarding via** with a reference to an egress VRF (different from the source VRF) in which that next-hop should be evaluated. Static inter-VRF routes with ECMP next-hop sets in the same egress VRF or heterogeneous egress VRFs can be specified.

The Static Inter-VRF Route feature is independent of and complementary to other mechanisms that can be used to set up local inter-VRF routes. The other supported mechanisms in EOS, and the broader use cases they support, are documented here:

- [Inter-VRF Local Route Leaking using BGP VPN](/um-eos/eos-inter-vrf-local-route-leaking#xx1348142)
- [Inter-VRF Local Route Leaking using VRF-leak Agent](/um-eos/eos-inter-vrf-local-route-leaking#xx1346287)

## Configuration

The configuration to set up static inter-VRF routes in an ingress (source) VRF to forward IP traffic to a different egress (target) VRF can be done in the following modes:

- This command creates a static route in one ingress VRF that points to a next-hop in a different egress VRF.

```
{ip | ipv6} route [vrf vrf-name] destination-prefix [egress-vrf egress-next-hop-vrf-name] next-hop
```

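As a concrete instance of this syntax, the route that produces the `1.0.7.0/24 ... (egress VRF vrf3)` entry in the show output below could be configured as follows (a sketch; the VRF names and addresses are taken from that example):

```
switch(config)# ip route vrf vrf1 1.0.7.0/24 egress-vrf vrf3 1.0.6.2
```
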
## Show Commands

Use the **show ip route vrf** command to display the egress VRF name if it differs from the source VRF.

**Example**

```
switch# show ip route vrf vrf1

VRF: vrf1
Codes: C - connected, S - static, K - kernel,
       O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
       E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
       R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
       O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
       NG - Nexthop Group Static Route, V - VXLAN Control Service,
       DH - DHCP client installed default route, M - Martian,
       DP - Dynamic Policy Route, L - VRF Leaked

Gateway of last resort is not set

 S        1.0.1.0/24 [1/0] via 1.0.0.2, Vlan2180 (egress VRF default)
 S        1.0.7.0/24 [1/0] via 1.0.6.2, Vlan2507 (egress VRF vrf3)
```

## Limitations

- For bidirectional traffic to work correctly between a pair of VRFs, static inter-VRF routes in both VRFs must be configured.
- Static inter-VRF routing is supported only in multi-agent routing protocol mode.

# Ashburn Validator Relay — Full Traffic Redirect

## Overview

All validator traffic (gossip, repair, TVU, TPU) enters and exits through `137.239.194.65` (laconic-was-sw01, Ashburn). Peers see the validator as an Ashburn node. This improves repair peer count and slot catch-up rate by reducing RTT to the TeraSwitch/Pittsburgh cluster from ~30ms (direct Miami) to ~5ms (Ashburn).

Supersedes the previous TVU-only shred relay (see `tvu-shred-relay.md`).

## Architecture

```
OUTBOUND (validator → peers)
agave-validator (kind pod, ports 8001, 9000-9025)
  ↓ Docker bridge → host FORWARD chain
biscayne host (186.233.184.235)
  ↓ mangle PREROUTING: fwmark 100 on sport 8001,9000-9025 from 172.20.0.0/16
  ↓ nat POSTROUTING: SNAT → src 137.239.194.65
  ↓ policy route: fwmark 100 → table ashburn → via 169.254.7.6 dev doublezero0
laconic-mia-sw01 (209.42.167.133, Miami)
  ↓ traffic-policy VALIDATOR-OUTBOUND: src 137.239.194.65 → nexthop 172.16.1.188
  ↓ backbone Et4/1 (25.4ms)
laconic-was-sw01 Et4/1 (Ashburn)
  ↓ default route via 64.92.84.80 out Et1/1
Internet (peers see src 137.239.194.65)

INBOUND (peers → validator)
Solana peers → 137.239.194.65:8001,9000-9025
  ↓ internet routing to was-sw01
laconic-was-sw01 Et1/1 (Ashburn)
  ↓ traffic-policy VALIDATOR-RELAY: ASIC redirect, line rate
  ↓ nexthop 172.16.1.189 via Et4/1 backbone (25.4ms)
laconic-mia-sw01 Et4/1 (Miami)
  ↓ L3 forward → biscayne via doublezero0 GRE or ISP routing
biscayne (186.233.184.235)
  ↓ nat PREROUTING: DNAT dst 137.239.194.65:* → 172.20.0.2:* (kind node)
  ↓ Docker bridge → validator pod
agave-validator
```

RPC traffic (port 8899) is NOT relayed — clients connect directly to biscayne.

## Switch Config: laconic-was-sw01

SSH: `install@137.239.200.198`

### Pre-change

```
configure checkpoint save pre-validator-relay
```

Rollback: `rollback running-config checkpoint pre-validator-relay` then `write memory`.

### Config session with auto-revert

```
configure session validator-relay

! Loopback for 137.239.194.65 (do NOT touch Loopback100 which has .64)
interface Loopback101
   ip address 137.239.194.65/32

! ACL covering all validator ports
ip access-list VALIDATOR-RELAY-ACL
   10 permit udp any any eq 8001
   20 permit udp any any range 9000 9025
   30 permit tcp any any eq 8001

! Traffic-policy: ASIC redirect to backbone (mia-sw01)
traffic-policy VALIDATOR-RELAY
   match VALIDATOR-RELAY-ACL
   set nexthop 172.16.1.189

! Replace old SHRED-RELAY on Et1/1
interface Ethernet1/1
   no traffic-policy input SHRED-RELAY
   traffic-policy input VALIDATOR-RELAY

! system-rule overriding-action redirect (already present from SHRED-RELAY)

show session-config diffs
commit timer 00:05:00
```

After verification: `configure session validator-relay commit` then `write memory`.

### Cleanup (after stable)

Old SHRED-RELAY policy and ACL can be removed once VALIDATOR-RELAY is confirmed:

```
configure session cleanup-shred-relay
no traffic-policy SHRED-RELAY
no ip access-list SHRED-RELAY-ACL
show session-config diffs
commit
write memory
```

## Switch Config: laconic-mia-sw01

### Pre-flight checks

Before applying config, verify:

1. Which EOS interface terminates the doublezero0 GRE from biscayne (endpoint 209.42.167.133). Check with `show interfaces tunnel` or `show ip interface brief | include Tunnel`.

2. Whether `system-rule overriding-action redirect` is already configured. Check with `show running-config | include system-rule`.

3. Whether EOS traffic-policy works on tunnel interfaces. If not, apply on the physical interface where GRE packets arrive (likely Et<X> facing biscayne's ISP network or the DZ infrastructure).

### Config session

```
configure checkpoint save pre-validator-outbound

configure session validator-outbound

! ACL matching outbound validator traffic (source = Ashburn IP)
ip access-list VALIDATOR-OUTBOUND-ACL
   10 permit ip 137.239.194.65/32 any

! Redirect to was-sw01 via backbone
traffic-policy VALIDATOR-OUTBOUND
   match VALIDATOR-OUTBOUND-ACL
   set nexthop 172.16.1.188

! Apply on the interface where biscayne GRE traffic arrives
! Replace Tunnel<X> with the actual interface from pre-flight check #1
interface Tunnel<X>
   traffic-policy input VALIDATOR-OUTBOUND

! Add system-rule if not already present (pre-flight check #2)
system-rule overriding-action redirect

show session-config diffs
commit timer 00:05:00
```

After verification: commit + `write memory`.

## Host Config: biscayne

Automated via ansible playbook `playbooks/ashburn-validator-relay.yml`.

### Manual equivalent

```bash
# 1. Accept packets destined for 137.239.194.65
sudo ip addr add 137.239.194.65/32 dev lo

# 2. Inbound DNAT to kind node (172.20.0.2)
#    INSERT at the top of PREROUTING — Docker's ADDRTYPE LOCAL rule
#    otherwise matches first and swallows these packets
sudo iptables -t nat -I PREROUTING 1 -p udp -d 137.239.194.65 --dport 8001 \
  -j DNAT --to-destination 172.20.0.2:8001
sudo iptables -t nat -I PREROUTING 2 -p tcp -d 137.239.194.65 --dport 8001 \
  -j DNAT --to-destination 172.20.0.2:8001
sudo iptables -t nat -I PREROUTING 3 -p udp -d 137.239.194.65 --dport 9000:9025 \
  -j DNAT --to-destination 172.20.0.2

# 3. Outbound: mark validator traffic
sudo iptables -t mangle -A PREROUTING -s 172.20.0.0/16 -p udp --sport 8001 \
  -j MARK --set-mark 100
sudo iptables -t mangle -A PREROUTING -s 172.20.0.0/16 -p udp --sport 9000:9025 \
  -j MARK --set-mark 100
sudo iptables -t mangle -A PREROUTING -s 172.20.0.0/16 -p tcp --sport 8001 \
  -j MARK --set-mark 100

# 4. Outbound: SNAT to Ashburn IP (INSERT before Docker MASQUERADE)
sudo iptables -t nat -I POSTROUTING 1 -m mark --mark 100 \
  -j SNAT --to-source 137.239.194.65

# 5. Policy routing table
echo "100 ashburn" | sudo tee -a /etc/iproute2/rt_tables
sudo ip rule add fwmark 100 table ashburn
sudo ip route add default via 169.254.7.6 dev doublezero0 table ashburn

# 6. Persist
sudo netfilter-persistent save
# ip rule + ip route persist via /etc/network/if-up.d/ashburn-routing
```
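
Because rule ordering is what the Docker interaction hinges on, the check can be scripted. This is a sketch: the sample ruleset is hard-coded for illustration, and in practice you would feed it from `iptables-save -t nat`:

```shell
# Verify the relay DNAT rules sit ABOVE Docker's ADDRTYPE LOCAL jump in
# nat PREROUTING; if Docker's rule comes first it swallows packets for
# 137.239.194.65 before the DNAT can match.
rules='-A PREROUTING -d 137.239.194.65/32 -p udp --dport 8001 -j DNAT --to-destination 172.20.0.2:8001
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER'

dnat_line=$(printf '%s\n' "$rules" | grep -n 'to-destination 172.20.0.2' | head -n1 | cut -d: -f1)
docker_line=$(printf '%s\n' "$rules" | grep -n 'addrtype' | head -n1 | cut -d: -f1)

if [ "$dnat_line" -lt "$docker_line" ]; then
  echo "OK: DNAT precedes Docker's ADDRTYPE LOCAL rule"
else
  echo "BAD: Docker will swallow relay traffic"
fi
```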

### Docker NAT port preservation

**Must verify before going live:** Docker masquerade must preserve source ports for kind's hostNetwork pods. If Docker rewrites the source port, the mangle PREROUTING match on `--sport 8001,9000-9025` will miss traffic.

Test: `tcpdump -i br-cf46a62ab5b2 -nn 'udp src port 8001'` — if you see packets with sport 8001 from 172.20.0.2, port preservation works.

If Docker does NOT preserve ports, the mark must be set inside the kind node container (on the pod's veth) rather than on the host.
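
The tcpdump check reduces to a string test on each captured line. A sketch (the sample line is fabricated; in practice pipe real tcpdump output through the same test):

```shell
# Decide from one tcpdump line whether the pod's source port survived
# Docker NAT: "172.20.0.2.8001 >" means src IP 172.20.0.2, src port 8001.
line="12:00:01.000000 IP 172.20.0.2.8001 > 203.0.113.7.8000: UDP, length 132"
case "$line" in
  *"172.20.0.2.8001 >"*) verdict="port preserved" ;;
  *)                     verdict="port rewritten: mark inside the kind node instead" ;;
esac
echo "$verdict"
```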

## Execution Order

1. **was-sw01**: checkpoint → config session with 5min auto-revert → verify counters → commit
2. **biscayne**: add 137.239.194.65/32 to lo, add inbound DNAT rules
3. **Verify inbound**: `ping 137.239.194.65` from external host, check DNAT counters
4. **mia-sw01**: pre-flight checks → config session with 5min auto-revert → commit
5. **biscayne**: add outbound fwmark + policy routing + SNAT rules
6. **Test outbound**: from biscayne, send UDP from port 8001, verify src 137.239.194.65 on was-sw01
7. **Verify**: traffic-policy counters on both switches, iptables hit counts on biscayne
8. **Restart validator** if needed (gossip should auto-refresh, but restart ensures clean state)
9. **was-sw01 + mia-sw01**: `write memory` to persist
10. **Cleanup**: remove old SHRED-RELAY and 64.92.84.81:20000 DNAT after stable

## Verification

1. `show traffic-policy counters` on was-sw01 — VALIDATOR-RELAY-ACL matches
2. `show traffic-policy counters` on mia-sw01 — VALIDATOR-OUTBOUND-ACL matches
3. `sudo iptables -t nat -L -v -n` on biscayne — DNAT and SNAT hit counts
4. `sudo iptables -t mangle -L -v -n` on biscayne — fwmark hit counts
5. `ip rule show` on biscayne — fwmark 100 lookup ashburn
6. Validator gossip ContactInfo shows 137.239.194.65 for ALL addresses (gossip, repair, TVU, TPU)
7. Repair peer count increases (target: 20+ peers)
8. Slot catch-up rate improves from ~0.9 toward ~2.5 slots/sec
9. `traceroute --sport=8001 <remote_peer>` from biscayne routes via doublezero0/was-sw01

## Rollback

### biscayne

```bash
sudo ip addr del 137.239.194.65/32 dev lo
sudo iptables -t nat -D PREROUTING -p udp -d 137.239.194.65 --dport 8001 -j DNAT --to-destination 172.20.0.2:8001
sudo iptables -t nat -D PREROUTING -p tcp -d 137.239.194.65 --dport 8001 -j DNAT --to-destination 172.20.0.2:8001
sudo iptables -t nat -D PREROUTING -p udp -d 137.239.194.65 --dport 9000:9025 -j DNAT --to-destination 172.20.0.2
sudo iptables -t mangle -D PREROUTING -s 172.20.0.0/16 -p udp --sport 8001 -j MARK --set-mark 100
sudo iptables -t mangle -D PREROUTING -s 172.20.0.0/16 -p udp --sport 9000:9025 -j MARK --set-mark 100
sudo iptables -t mangle -D PREROUTING -s 172.20.0.0/16 -p tcp --sport 8001 -j MARK --set-mark 100
sudo iptables -t nat -D POSTROUTING -m mark --mark 100 -j SNAT --to-source 137.239.194.65
sudo ip rule del fwmark 100 table ashburn
sudo ip route del default table ashburn
sudo netfilter-persistent save
```

### was-sw01

```
rollback running-config checkpoint pre-validator-relay
write memory
```

### mia-sw01

```
rollback running-config checkpoint pre-validator-outbound
write memory
```

## Key Details

| Item | Value |
|------|-------|
| Ashburn relay IP | `137.239.194.65` (Loopback101 on was-sw01) |
| Ashburn LAN block | `137.239.194.64/29` on was-sw01 Et1/1 |
| Biscayne IP | `186.233.184.235` |
| Kind node IP | `172.20.0.2` (Docker bridge br-cf46a62ab5b2) |
| Validator ports | 8001 (gossip), 9000-9025 (TVU/repair/TPU) |
| Excluded ports | 8899 (RPC), 8900 (WebSocket) — direct to biscayne |
| GRE tunnel | doublezero0: 169.254.7.7 ↔ 169.254.7.6, remote 209.42.167.133 |
| Backbone | was-sw01 Et4/1 172.16.1.188/31 ↔ mia-sw01 Et4/1 172.16.1.189/31 |
| Policy routing table | 100 ashburn |
| Fwmark | 100 |
| was-sw01 SSH | `install@137.239.200.198` |
| EOS version | 4.34.0F |

# Blue-Green Upgrades for Biscayne

Zero-downtime upgrade procedures for the agave-stack deployment on biscayne. Uses ZFS clones for instant data duplication, Caddy health-check routing for traffic shifting, and k8s native sidecars for independent container upgrades.

## Architecture

```
Caddy ingress (biscayne.vaasl.io)
 ├── upstream A: localhost:8899  ← health: /health
 └── upstream B: localhost:8897  ← health: /health
                  │
┌─────────────────┴──────────────────┐
│            kind cluster            │
│                                    │
│  Deployment A       Deployment B   │
│  ┌─────────────┐   ┌─────────────┐ │
│  │ agave :8899 │   │ agave :8897 │ │
│  │ doublezerod │   │ doublezerod │ │
│  └──────┬──────┘   └──────┬──────┘ │
└─────────┼─────────────────┼────────┘
          │                 │
   ZFS dataset A       ZFS clone B
   (original)        (instant CoW copy)
```

Both deployments run in the same kind cluster with `hostNetwork: true`. Caddy active health checks route traffic to whichever deployment has a healthy `/health` endpoint.

## Storage Layout

| Data | Path | Type | Survives restart? |
|------|------|------|-------------------|
| Ledger | `/srv/solana/ledger` | ZFS zvol (xfs) | Yes |
| Snapshots | `/srv/solana/snapshots` | ZFS zvol (xfs) | Yes |
| Accounts | `/srv/solana/ramdisk/accounts` | `/dev/ram0` (xfs) | Until host reboot |
| Validator config | `/srv/deployments/agave/data/validator-config` | ZFS | Yes |
| DZ config | `/srv/deployments/agave/data/doublezero-config` | ZFS | Yes |

The ZFS zvol `biscayne/DATA/volumes/solana` backs `/srv/solana` (ledger, snapshots). The ramdisk at `/dev/ram0` holds accounts — it's a block device, not tmpfs, so it survives process restarts but not host reboots.

---

## Procedure 1: DoubleZero Binary Upgrade (zero downtime, single pod)

The GRE tunnel (`doublezero0`) and BGP routes live in kernel space. They persist across doublezerod process restarts. Upgrading the DZ binary does not require tearing down the tunnel or restarting the validator.

### Prerequisites

- doublezerod is defined as a k8s native sidecar (`spec.initContainers` with `restartPolicy: Always`). See [Required Changes](#required-changes) below.
- k8s 1.29+ (biscayne runs 1.35.1)

### Steps

1. Build or pull the new doublezero container image.

2. Patch the pod's sidecar image:
   ```bash
   kubectl -n <ns> patch pod <pod> --type='json' -p='[
     {"op": "replace", "path": "/spec/initContainers/0/image",
      "value": "laconicnetwork/doublezero:new-version"}
   ]'
   ```

3. Only the doublezerod container restarts. The agave container is unaffected. The GRE tunnel interface and BGP routes remain in the kernel throughout.

4. Verify:
   ```bash
   kubectl -n <ns> exec <pod> -c doublezerod -- doublezero --version
   kubectl -n <ns> exec <pod> -c doublezerod -- doublezero status
   ip route | grep doublezero0   # routes still present
   ```

### Rollback

Patch the image back to the previous version. Same process, same zero downtime.

---

## Procedure 2: Agave Version Upgrade (zero RPC downtime, blue-green)

Agave is the main container and must be restarted for a version change. To maintain zero RPC downtime, we run two deployments simultaneously and let Caddy shift traffic based on health checks.

### Prerequisites

- Caddy ingress configured with dual upstreams and active health checks
- A parameterized spec.yml that accepts alternate ports and volume paths
- ZFS snapshot/clone scripts

### Steps

#### Phase 1: Prepare (no downtime, no risk)

1. **ZFS snapshot** for rollback safety:
   ```bash
   zfs snapshot -r biscayne/DATA@pre-upgrade-$(date +%Y%m%d)
   ```

2. **ZFS clone** the validator volumes:
   ```bash
   zfs clone biscayne/DATA/volumes/solana@pre-upgrade-$(date +%Y%m%d) \
     biscayne/DATA/volumes/solana-blue
   ```
   This is instant (copy-on-write). No additional storage is used until writes diverge.

3. **Clone the ramdisk accounts** (not on ZFS):
   ```bash
   mkdir -p /srv/solana-blue/ramdisk/accounts
   cp -a /srv/solana/ramdisk/accounts/* /srv/solana-blue/ramdisk/accounts/
   ```
   This is the slow step — 460GB on ramdisk. Consider `rsync` with `--inplace` to minimize copy time, or investigate whether the ramdisk can move to a ZFS dataset for instant cloning in future deployments.

4. **Build or pull** the new agave container image.

#### Phase 2: Start blue deployment (no downtime)

5. **Create Deployment B** in the same kind cluster, pointing at cloned volumes, with RPC on port 8897:
   ```bash
   # Apply the blue deployment manifest (parameterized spec)
   kubectl apply -f deployment/k8s-manifests/agave-blue.yaml
   ```

6. **Deployment B catches up.** It starts from the snapshot point and replays. Monitor progress:
   ```bash
   kubectl -n <ns> exec <blue-pod> -c agave-validator -- \
     solana -u http://127.0.0.1:8897 slot
   ```

7. **Validate** the new version works:
   - RPC responds: `curl -sf http://localhost:8897/health`
   - Correct version: `kubectl -n <ns> exec <blue-pod> -c agave-validator -- agave-validator --version`
   - doublezerod connected (if applicable)

   Take as long as needed. Deployment A is still serving all traffic.

#### Phase 3: Traffic shift (zero downtime)

8. **Caddy routes traffic to B.** Once B's `/health` returns 200, Caddy's active health check automatically starts routing to it. Alternatively, update the Caddy upstream config to prefer B.

9. **Verify** B is serving live traffic:
   ```bash
   curl -sf https://biscayne.vaasl.io/health
   # Check Caddy access logs for requests hitting port 8897
   ```

#### Phase 4: Cleanup

10. **Stop Deployment A:**
    ```bash
    kubectl -n <ns> delete deployment agave-green
    ```

11. **Reconfigure B to use the standard port** (8899) if desired, or update Caddy to route only to 8897.

12. **Clean up the ZFS clone** (or keep it as a rollback):
    ```bash
    zfs destroy biscayne/DATA/volumes/solana-blue
    ```

### Rollback

At any point before Phase 4:
- Deployment A is untouched and still serving traffic (or can be restarted)
- Delete Deployment B: `kubectl -n <ns> delete deployment agave-blue`
- Destroy the ZFS clone: `zfs destroy biscayne/DATA/volumes/solana-blue`

After Phase 4 (A already stopped):
- `zfs rollback` to restore original data
- Redeploy A with old image

---

## Required Changes to agave-stack

### 1. Move doublezerod to native sidecar

In the pod spec generation (laconic-so or compose override), doublezerod must be defined as a native sidecar container instead of a regular container:

```yaml
spec:
  initContainers:
    - name: doublezerod
      image: laconicnetwork/doublezero:local
      restartPolicy: Always   # makes it a native sidecar
      securityContext:
        privileged: true
        capabilities:
          add: [NET_ADMIN]
      env:
        - name: DOUBLEZERO_RPC_ENDPOINT
          value: https://api.mainnet-beta.solana.com
      volumeMounts:
        - name: doublezero-config
          mountPath: /root/.config/doublezero
  containers:
    - name: agave-validator
      image: laconicnetwork/agave:local
      # ... existing config
```

This change means:
- doublezerod starts before agave and stays running
- Patching the doublezerod image restarts only that container
- agave can be restarted independently without affecting doublezerod

This requires a laconic-so change to support `initContainers` with `restartPolicy` in compose-to-k8s translation — or a post-deployment patch.

### 2. Caddy dual-upstream config

Add health-checked upstreams for both blue and green deployments:

```caddyfile
biscayne.vaasl.io {
    reverse_proxy {
        to localhost:8899 localhost:8897

        health_uri /health
        health_interval 5s
        health_timeout 3s

        lb_policy first
    }
}
```

`lb_policy first` routes to the first healthy upstream. When only A is running, all traffic goes to :8899. When B comes up healthy, traffic shifts.
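
The `first` policy is simple enough to model in a few lines. This toy sketch (the function and argument format are invented here) mirrors how traffic shifts as health states change:

```shell
# Toy model of Caddy's lb_policy "first": route to the first upstream
# whose health state is "up". Arguments look like addr=up / addr=down.
pick_upstream() {
  for pair in "$@"; do
    addr=${pair%%=*}
    state=${pair##*=}
    [ "$state" = "up" ] && { echo "$addr"; return 0; }
  done
  echo "none"
  return 1
}

pick_upstream "localhost:8899=up"   "localhost:8897=up"   # A healthy → 8899
pick_upstream "localhost:8899=down" "localhost:8897=up"   # A drained → 8897
```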

### 3. Parameterized deployment spec

Create a parameterized spec or kustomize overlay that accepts:
- RPC port (8899 vs 8897)
- Volume paths (original vs ZFS clone)
- Deployment name suffix (green vs blue)
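
One lightweight way to satisfy this is plain substitution over a template. The placeholder names (`{{SUFFIX}}` etc.) below are invented for illustration, not the repo's actual spec format:

```shell
# Render a blue/green variant by substituting the three parameters
# into a template read from stdin. Placeholder names are illustrative.
render() {
  suffix=$1 port=$2 volume=$3
  sed -e "s/{{SUFFIX}}/$suffix/g" \
      -e "s/{{RPC_PORT}}/$port/g" \
      -e "s/{{VOLUME}}/$volume/g"
}

printf 'name: agave-{{SUFFIX}}\nrpc_port: {{RPC_PORT}}\nvolume: {{VOLUME}}\n' \
  | render blue 8897 solana-blue
```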

### 4. Delete DaemonSet workaround

Remove `deployment/k8s-manifests/doublezero-daemonset.yaml` from agave-stack.

### 5. Fix container DZ identity

Copy the registered identity into the container volume:
```bash
sudo cp /home/solana/.config/doublezero/id.json \
  /srv/deployments/agave/data/doublezero-config/id.json
```

### 6. Disable host systemd doublezerod

After the container sidecar is working:
```bash
sudo systemctl stop doublezerod
sudo systemctl disable doublezerod
```

---

## Implementation Order

This is a spec-driven, test-driven plan. Each step produces a testable artifact.

### Step 1: Fix existing DZ bugs (no code changes to laconic-so)

Fixes BUG-1 through BUG-5 from [doublezero-status.md](doublezero-status.md).

**Spec:** Container doublezerod shows the correct identity, connects to laconic-mia-sw01, and host systemd doublezerod is disabled.

**Test:**
```bash
kubectl -n <ns> exec <pod> -c doublezerod -- doublezero address
# assert: 3Bw6v7EruQvTwoY79h2QjQCs2KBQFzSneBdYUbcXK1Tr

kubectl -n <ns> exec <pod> -c doublezerod -- doublezero status
# assert: BGP Session Up, laconic-mia-sw01

systemctl is-active doublezerod
# assert: inactive
```
|
||||||
|
|
||||||
|
**Changes:**
|
||||||
|
- Copy `id.json` to container volume
|
||||||
|
- Update `DOUBLEZERO_RPC_ENDPOINT` in spec.yml
|
||||||
|
- Deploy with hostNetwork-enabled stack-orchestrator
|
||||||
|
- Stop and disable host doublezerod
|
||||||
|
- Delete DaemonSet manifest from agave-stack
|
||||||
|
|
||||||
|
### Step 2: Native sidecar for doublezerod
|
||||||
|
|
||||||
|
**Spec:** doublezerod image can be patched without restarting the agave container.
|
||||||
|
GRE tunnel and routes persist across doublezerod restart.
|
||||||
|
|
||||||
|
**Test:**
|
||||||
|
```bash
|
||||||
|
# Record current agave container start time
|
||||||
|
BEFORE=$(kubectl -n <ns> get pod <pod> -o jsonpath='{.status.containerStatuses[?(@.name=="agave-validator")].state.running.startedAt}')
|
||||||
|
|
||||||
|
# Patch DZ image
|
||||||
|
kubectl -n <ns> patch pod <pod> --type='json' -p='[
|
||||||
|
{"op":"replace","path":"/spec/initContainers/0/image","value":"laconicnetwork/doublezero:test"}
|
||||||
|
]'
|
||||||
|
|
||||||
|
# Wait for DZ container to restart
|
||||||
|
sleep 10
|
||||||
|
|
||||||
|
# Verify agave was NOT restarted
|
||||||
|
AFTER=$(kubectl -n <ns> get pod <pod> -o jsonpath='{.status.containerStatuses[?(@.name=="agave-validator")].state.running.startedAt}')
|
||||||
|
[ "$BEFORE" = "$AFTER" ] # assert: same start time
|
||||||
|
|
||||||
|
# Verify tunnel survived
|
||||||
|
ip route | grep doublezero0 # assert: routes present
|
||||||
|
```
|
||||||
|
|
||||||
|
**Changes:**
|
||||||
|
- laconic-so: support `initContainers` with `restartPolicy: Always` in
|
||||||
|
compose-to-k8s translation (or: define doublezerod as native sidecar in
|
||||||
|
compose via `x-kubernetes-init-container` extension or equivalent)
|
||||||
|
- Alternatively: post-deploy kubectl patch to move doublezerod to initContainers
|
||||||
|
|
||||||
|
### Step 3: Caddy dual-upstream routing
|
||||||
|
|
||||||
|
**Spec:** Caddy routes RPC traffic to whichever backend is healthy. Adding a second
|
||||||
|
healthy backend on :8897 causes traffic to shift without configuration changes.
|
||||||
|
|
||||||
|
**Test:**
|
||||||
|
```bash
|
||||||
|
# Start a test HTTP server on :8897 with /health
|
||||||
|
python3 -c "
|
||||||
|
from http.server import HTTPServer, BaseHTTPRequestHandler
|
||||||
|
class H(BaseHTTPRequestHandler):
|
||||||
|
def do_GET(self):
|
||||||
|
self.send_response(200); self.end_headers(); self.wfile.write(b'ok')
|
||||||
|
HTTPServer(('', 8897), H).serve_forever()
|
||||||
|
" &
|
||||||
|
|
||||||
|
# Verify Caddy discovers it
|
||||||
|
sleep 10
|
||||||
|
curl -sf https://biscayne.vaasl.io/health
|
||||||
|
# assert: 200
|
||||||
|
|
||||||
|
kill %1
|
||||||
|
```
|
||||||
|
|
||||||
|
**Changes:**
|
||||||
|
- Update Caddy ingress config with dual upstreams and health checks
|
||||||
|
|
||||||
|
### Step 4: ZFS clone and blue-green tooling
|
||||||
|
|
||||||
|
**Spec:** A script creates a ZFS clone, starts a blue deployment on alternate ports
|
||||||
|
using the cloned data, and the deployment catches up and becomes healthy.
|
||||||
|
|
||||||
|
**Test:**
|
||||||
|
```bash
|
||||||
|
# Run the clone + deploy script
|
||||||
|
./scripts/blue-green-prepare.sh --target-version v2.2.1
|
||||||
|
|
||||||
|
# assert: ZFS clone exists
|
||||||
|
zfs list biscayne/DATA/volumes/solana-blue
|
||||||
|
|
||||||
|
# assert: blue deployment exists and is catching up
|
||||||
|
kubectl -n <ns> get deployment agave-blue
|
||||||
|
|
||||||
|
# assert: blue RPC eventually becomes healthy
|
||||||
|
timeout 600 bash -c 'until curl -sf http://localhost:8897/health; do sleep 5; done'
|
||||||
|
```
|
||||||
|
|
||||||
|
**Changes:**
|
||||||
|
- `scripts/blue-green-prepare.sh` — ZFS snapshot, clone, deploy B
|
||||||
|
- `scripts/blue-green-promote.sh` — tear down A, optional port swap
|
||||||
|
- `scripts/blue-green-rollback.sh` — destroy B, restore A
|
||||||
|
- Parameterized deployment spec (kustomize overlay or env-driven)
|
||||||
|
|
||||||
|
### Step 5: End-to-end upgrade test
|
||||||
|
|
||||||
|
**Spec:** Full upgrade cycle completes with zero dropped RPC requests.
|
||||||
|
|
||||||
|
**Test:**
|
||||||
|
```bash
|
||||||
|
# Start continuous health probe in background
|
||||||
|
while true; do
|
||||||
|
curl -sf -o /dev/null -w "%{http_code} %{time_total}\n" \
|
||||||
|
https://biscayne.vaasl.io/health || echo "FAIL $(date)"
|
||||||
|
sleep 0.5
|
||||||
|
done > /tmp/health-probe.log &
|
||||||
|
|
||||||
|
# Execute full blue-green upgrade
|
||||||
|
./scripts/blue-green-prepare.sh --target-version v2.2.1
|
||||||
|
# wait for blue to sync...
|
||||||
|
./scripts/blue-green-promote.sh
|
||||||
|
|
||||||
|
# Stop probe
|
||||||
|
kill %1
|
||||||
|
|
||||||
|
# assert: no FAIL lines in probe log
|
||||||
|
grep -c FAIL /tmp/health-probe.log
|
||||||
|
# assert: 0
|
||||||
|
```
|
||||||
|
|

@@ -0,0 +1,85 @@
# Bug: Ashburn Relay — 137.239.194.65 Not Routable from Public Internet

## Summary

`--gossip-host 137.239.194.65` correctly advertises the Ashburn relay IP in
ContactInfo for all sockets (gossip, TVU, repair, TPU). However, 137.239.194.65
is a DoubleZero overlay IP (137.239.192.0/19, IS-IS only) that is NOT announced
via BGP to the public internet. Public peers cannot route to it, so TVU shreds,
repair requests, and TPU traffic never arrive at was-sw01.

## Evidence

- Gossip traffic arrives on the `doublezero0` interface:
  ```
  doublezero0 In IP 64.130.58.70.8001 > 137.239.194.65.8001: UDP, length 132
  ```
- Zero TVU/repair traffic arrives:
  ```
  tcpdump -i doublezero0 'dst host 137.239.194.65 and udp and not port 8001'
  0 packets captured
  ```
- ContactInfo correctly advertises all sockets on 137.239.194.65:
  ```json
  {
    "gossip": "137.239.194.65:8001",
    "tvu": "137.239.194.65:9000",
    "serveRepair": "137.239.194.65:9011",
    "tpu": "137.239.194.65:9002"
  }
  ```
- Outbound gossip from biscayne exits via `doublezero0` with source
  137.239.194.65 — SNAT and routing work correctly in the outbound direction.

## Root Cause

**137.239.194.0/24 is not routable from the public internet.** The prefix
belongs to DoubleZero's overlay address space (137.239.192.0/19, Momentum
Telecom, WHOIS OriginAS: empty). It is advertised only via IS-IS within the
DoubleZero switch mesh. There is no eBGP session on was-sw01 to advertise it
to the ISP — all BGP peers are iBGP AS 65342 (DoubleZero internal).

When the validator advertises `tvu: 137.239.194.65:9000` in ContactInfo,
public internet peers attempt to send turbine shreds to that IP, but the
packets have no route through the global BGP table to reach was-sw01. Only
DoubleZero-connected peers could potentially reach it via the overlay.

The old shred relay pipeline worked because it used `--public-tvu-address
64.92.84.81:20000` — was-sw01's Et1/1 ISP uplink IP, which IS publicly
routable. The `--gossip-host 137.239.194.65` approach advertises a
DoubleZero-only IP for ALL sockets, making TVU/repair/TPU unreachable from
non-DoubleZero peers.

The original hypothesis (ACL/PBR port filtering) was wrong. The tunnel and
switch routing work correctly — the problem is upstream: traffic never arrives
at was-sw01 in the first place.

## Impact

The validator cannot receive turbine shreds or serve repair requests via the
low-latency Ashburn path. It falls back to the Miami public IP (186.233.184.235)
for all shred/repair traffic, negating the benefit of `--gossip-host`.

## Fix Options

1. **Use 64.92.84.81 (was-sw01 Et1/1) for ContactInfo sockets.** This is the
   publicly routable Ashburn IP. Requires `--gossip-host 64.92.84.81` (or
   equivalent `--bind-address` config) and DNAT/forwarding on was-sw01 to relay
   traffic through the backbone → mia-sw01 → Tunnel500 → biscayne. The old
   `--public-tvu-address` pipeline used this IP successfully.

2. **Get DoubleZero to announce 137.239.194.0/24 via eBGP to the ISP.** This
   would make the current `--gossip-host 137.239.194.65` setup work, but
   requires coordination with DoubleZero operations.

3. **Hybrid approach**: Use 64.92.84.81 for public-facing sockets (TVU, repair,
   TPU) and 137.239.194.65 for gossip (which works via the DoubleZero overlay).
   Requires agave to support per-protocol address binding, which it does not
   (`--gossip-host` sets ALL sockets to the same IP).

## Previous Workaround

The old `--public-tvu-address` pipeline used socat + shred-unwrap.py to relay
shreds from 64.92.84.81:20000 to the validator. That pipeline is not persistent
across reboots and was superseded by the `--gossip-host` approach (which turned
out to be broken for non-DoubleZero peers).

@@ -0,0 +1,51 @@
# Bug: laconic-so etcd cleanup wipes core kubernetes service

## Summary

`_clean_etcd_keeping_certs()` in laconic-stack-orchestrator 1.1.0 deletes the `kubernetes` service from etcd, breaking cluster networking on restart.

## Component

`stack_orchestrator/deploy/k8s/helpers.py` — `_clean_etcd_keeping_certs()`

## Reproduction

1. Deploy with `laconic-so` to a k8s-kind target with persisted etcd (hostPath mount in kind-config.yml)
2. `laconic-so deployment --dir <dir> stop` (destroys the cluster)
3. `laconic-so deployment --dir <dir> start` (recreates the cluster with the cleaned etcd)

## Symptoms

- `kindnet` pods enter CrashLoopBackOff with: `panic: unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined`
- `kubectl get svc kubernetes -n default` returns `NotFound`
- coredns, caddy, local-path-provisioner stuck in Pending (no CNI without kindnet)
- No pods can be scheduled

## Root Cause

`_clean_etcd_keeping_certs()` uses a whitelist that only preserves `/registry/secrets/caddy-system` keys. All other etcd keys are deleted, including `/registry/services/specs/default/kubernetes` — the core `kubernetes` ClusterIP service that kube-apiserver auto-creates.

When the kind cluster starts with the cleaned etcd, kube-apiserver sees the existing etcd data and does not re-create the `kubernetes` service. kindnet depends on the `KUBERNETES_SERVICE_HOST` environment variable, which the kubelet injects from this service — without it, kindnet panics.

## Fix Options

1. **Expand the whitelist** to include `/registry/services/specs/default/kubernetes` and other core cluster resources
2. **Fully wipe etcd** instead of selective cleanup — let the cluster bootstrap fresh (simpler, but loses the Caddy TLS certs)
3. **Don't persist etcd at all** — ephemeral etcd means clean state on every restart (recommended for kind deployments)
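
Fix option 1 reduces to a prefix-whitelist predicate over etcd keys. A minimal sketch, not laconic-so's actual code — `PRESERVE_PREFIXES` and `should_preserve` are hypothetical names, and the endpoints entry is an assumption about what else core networking needs:

```python
# Sketch of fix option 1: a prefix whitelist that keeps core cluster state
# alongside the Caddy certs. Key paths follow the bug description above.
PRESERVE_PREFIXES = (
    "/registry/secrets/caddy-system",                    # existing entry: TLS certs
    "/registry/services/specs/default/kubernetes",       # core ClusterIP service
    "/registry/services/endpoints/default/kubernetes",   # assumed: its endpoints object
)

def should_preserve(etcd_key: str) -> bool:
    """Return True if this etcd key must survive the selective cleanup."""
    return etcd_key.startswith(PRESERVE_PREFIXES)

keys = [
    "/registry/secrets/caddy-system/tls-cert",
    "/registry/services/specs/default/kubernetes",
    "/registry/pods/default/agave-0",
]
kept = [k for k in keys if should_preserve(k)]  # the pod key is still deleted
```

The cleanup loop would then delete only keys for which `should_preserve` returns `False`, leaving the `kubernetes` service in place so kindnet gets its `KUBERNETES_SERVICE_HOST` injection on restart.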

## Workaround

Fully delete the kind cluster before `start`:

```bash
kind delete cluster --name <cluster-name>
laconic-so deployment --dir <dir> start
```

This forces a fresh etcd bootstrap. Downside: all other services deployed to the cluster (DaemonSets, other namespaces) are destroyed.

## Impact

- Affects any k8s-kind deployment with persisted etcd
- Cluster is unrecoverable without full destroy+recreate
- All non-laconic-so-managed workloads in the cluster are lost

@@ -0,0 +1,75 @@
# Bug: laconic-so crashes on re-deploy when caddy ingress already exists

## Summary

`laconic-so deployment start` crashes with `FailToCreateError` when the kind cluster already has caddy ingress resources installed. The deployer uses `create_from_yaml()`, which fails on `AlreadyExists` conflicts instead of applying idempotently. This prevents the application deployment from ever being reached — the crash happens before any app manifests are applied.

## Component

`stack_orchestrator/deploy/k8s/deploy_k8s.py:366` — `up()` method
`stack_orchestrator/deploy/k8s/helpers.py:369` — `install_ingress_for_kind()`

## Reproduction

1. `kind delete cluster --name laconic-70ce4c4b47e23b85`
2. `laconic-so deployment --dir /srv/deployments/agave start` — creates the cluster, loads images, installs caddy ingress, but times out or is interrupted before the app deployment completes
3. `laconic-so deployment --dir /srv/deployments/agave start` — crashes immediately after image loading

## Symptoms

- Traceback ending in:
  ```
  kubernetes.utils.create_from_yaml.FailToCreateError:
  Error from server (Conflict): namespaces "caddy-system" already exists
  Error from server (Conflict): serviceaccounts "caddy-ingress-controller" already exists
  Error from server (Conflict): clusterroles.rbac.authorization.k8s.io "caddy-ingress-controller" already exists
  ...
  ```
- Namespace `laconic-laconic-70ce4c4b47e23b85` exists but is empty — no pods, no deployments, no events
- Cluster is healthy, images are loaded, but no app manifests are applied

## Root Cause

`install_ingress_for_kind()` calls `kubernetes.utils.create_from_yaml()`, which uses `POST` (create) semantics. If the resources already exist (from a previous partial run), every resource returns `409 Conflict` and `create_from_yaml` raises `FailToCreateError`, aborting the entire `up()` method before the app deployment step.

The first `laconic-so start` after a fresh `kind delete` works because:
1. Image loading into the kind node takes 5-10 minutes (images are ~10GB+)
2. Caddy ingress is installed successfully
3. App deployment begins

But if that first run is interrupted (timeout, Ctrl-C, ansible timeout), the second run finds caddy already installed and crashes.

## Fix Options

1. **Use server-side apply** instead of `create_from_yaml()` — `kubectl apply` is idempotent
2. **Check if the ingress exists before installing** — skip `install_ingress_for_kind()` if the caddy-system namespace exists
3. **Catch `AlreadyExists` and continue** — treat 409 as success for infrastructure resources
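
The control flow of fix option 3 can be sketched without the kubernetes client: run each create step and count a conflict as "already present" instead of aborting. Everything below is a hypothetical stand-in — a real fix would catch `kubernetes.utils.FailToCreateError` and inspect the 409s — but the pattern is the same:

```python
# Sketch of fix option 3: tolerate "already exists" when (re)installing
# infrastructure resources, so a second run after an interrupted first run
# proceeds to the app deployment instead of crashing.
class AlreadyExists(Exception):
    """Stand-in for a 409 Conflict from the API server."""

def ensure_resources(create_fns):
    """Run each create step; a conflict means the resource is already there."""
    created, existed = 0, 0
    for fn in create_fns:
        try:
            fn()
            created += 1
        except AlreadyExists:
            existed += 1  # left over from a previous partial run — fine
    return created, existed

def make_create(cluster, name):
    def create():
        if name in cluster:
            raise AlreadyExists(name)
        cluster.add(name)
    return create

cluster = {"caddy-system"}  # namespace left over from an interrupted run
steps = [make_create(cluster, n)
         for n in ("caddy-system", "caddy-ingress-controller")]
created, existed = ensure_resources(steps)  # no crash: (1 created, 1 existed)
```

With this behavior, the second `laconic-so deployment start` would skip past the pre-installed caddy resources and reach the app manifests.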

## Workaround

Delete the caddy ingress resources before re-running:

```bash
kubectl delete namespace caddy-system
kubectl delete clusterrole caddy-ingress-controller
kubectl delete clusterrolebinding caddy-ingress-controller
kubectl delete ingressclass caddy
laconic-so deployment --dir /srv/deployments/agave start
```

Or nuke the entire cluster and start fresh:

```bash
kind delete cluster --name laconic-70ce4c4b47e23b85
laconic-so deployment --dir /srv/deployments/agave start
```

## Interaction with ansible timeout

The `biscayne-redeploy.yml` playbook sets a 600s timeout on the `laconic-so deployment start` task. Image loading alone can exceed this on a fresh cluster (images must be re-loaded into the new kind node). When ansible kills the process at 600s, the caddy ingress is already installed but the app is not — putting the cluster into the broken state described above. Subsequent playbook runs hit this bug on every attempt.

## Impact

- Blocks all re-deploys on biscayne without manual cleanup
- The playbook cannot recover automatically — every retry hits the same conflict
- Discovered 2026-03-05 during a full wipe redeploy of the biscayne validator

@@ -0,0 +1,121 @@
# DoubleZero Multicast Access Requests

## Status (2026-03-06)

DZ multicast is **still in testnet** (client v0.2.2). Multicast groups are defined
on the DZ ledger with on-chain access control (publishers/subscribers). The testnet
allocates addresses from 233.84.178.0/24 (AS21682). Not yet available for production
Solana shred delivery.

## Biscayne Connection Details

Provide these details when requesting subscriber access:

| Field | Value |
|-------|-------|
| Client IP | 186.233.184.235 |
| Validator identity | 4WeLUxfQghbhsLEuwaAzjZiHg2VBw87vqHc4iZrGvKPr |
| DZ identity | 3Bw6v7EruQvTwoY79h2QjQCs2KBQFzSneBdYUbcXK1Tr |
| DZ device | laconic-mia-sw01 |
| Contributor / tenant | laconic |

## Jito ShredStream

**Not a DZ multicast group.** ShredStream is Jito's own shred delivery service,
independent of DoubleZero multicast. It provides low-latency shreds from leaders
on the Solana network via a proxy client that connects to the Jito Block Engine.

| Property | Value |
|----------|-------|
| What it does | Delivers shreds from Jito-connected leaders with low latency. Provides a redundant shred path for servers in remote locations. |
| How it works | `shredstream-proxy` authenticates to a Jito Block Engine via keypair, receives shreds, forwards them to configured UDP destinations (e.g. validator TVU port). |
| Cost | **Unknown.** Docs don't list pricing. Was previously "complimentary" for searchers (2024). May require approval. |
| Requirements | Approved Solana pubkey (form submission), auth keypair, firewall open on UDP 20000, TVU port of your node. |
| Regions | Amsterdam, Dublin, Frankfurt, London, New York, Salt Lake City, Singapore, Tokyo. Max 2 regions selectable. |
| Limitations | No NAT support. Bridge networking incompatible with multicast mode. |
| Repo | https://github.com/jito-labs/shredstream-proxy |
| Docs | https://docs.jito.wtf/lowlatencytxnfeed/ |
| Status for biscayne | **Not yet requested.** Need to submit pubkey for approval. |

ShredStream is relevant to our shred completeness problem — it provides an additional
shred source beyond turbine and the Ashburn relay. It would run as a sidecar process
forwarding shreds to the validator's TVU port.

## DZ Multicast Groups

DZ multicast uses PIM (Protocol Independent Multicast) and MSDP (Multicast Source
Discovery Protocol). Group owners define allowed publishers and subscribers on the
DZ ledger. Switch ASICs handle packet replication — no CPU overhead.

### bebop

Listed in earlier notes as a multicast shred distribution group. **No public
documentation found.** Cannot confirm this exists as a DZ multicast group.

- **Owner:** Unknown
- **Status:** Unverified — may not exist as described

### turbine (future)

Solana's native shred propagation via DZ multicast. Jito has expressed interest
in leveraging multicast for shred delivery. Not yet available for production use.

- **Owner:** Solana Foundation / Anza (native turbine), Jito (shredstream)
- **Status:** Testnet only (DZ client v0.2.2)

## bloXroute OFR (Optimized Feed Relay)

Commercial shred delivery service. Runs a gateway docker container on your node that
connects to bloXroute's BDN (Blockchain Distribution Network) to receive shreds
faster than default turbine (~30-50ms improvement, beats turbine ~98% of the time).

| Property | Value |
|----------|-------|
| What it does | Delivers shreds via bloXroute's BDN with optimized relay topologies. Not just a different turbine path — uses their own distribution network. |
| How it works | Docker gateway container on your node, communicates with bloXroute OFR relay over UDP 18888. Forwards shreds to your validator. |
| Cost | **$300/mo** (Professional, 1500 tx/day), **$1,250/mo** (Enterprise, unlimited tx). OFR gateway without local node requires Enterprise Elite ($5,000+/mo). |
| Requirements | Docker, UDP port 18888 open, bloXroute subscription. |
| Open source | Gateway at https://github.com/bloXroute-Labs/solana-gateway |
| Docs | https://docs.bloxroute.com/solana/optimized-feed-relay |
| Status for biscayne | **Not yet evaluated.** Monthly cost may not be justified. |

bloXroute's value proposition: they operate nodes at multiple turbine tree positions
across their network, aggregate shreds, and redistribute via their BDN. This is the
"multiple identities collecting different shreds" approach — but operated by bloXroute,
not by us.

## How These Services Get More Shreds

Turbine tree position is determined by validator identity (pubkey). A single validator
gets shreds from one position in the tree per slot. Services like Jito ShredStream
and bloXroute OFR operate many nodes with different identities across the turbine
tree, aggregate the shreds they each receive, and redistribute the combined set to
subscribers. This is why they can deliver shreds the subscriber's own turbine position
would never see.

**An open-source equivalent would require running multiple lightweight validator
identities (non-voting, minimal stake) at different locations, each collecting shreds
from their unique turbine tree position, and forwarding them to the main validator.**
No known open-source project implements this pattern.
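
The aggregation described above is, at its core, a set union across collector identities. A minimal illustrative sketch — not an existing implementation, with shred IDs simplified to `(slot, index)` tuples:

```python
# Each collector identity sits at a different turbine tree position and so
# receives a different subset of a slot's shreds; the union recovers shreds
# that no single position would see on its own.
def aggregate_shreds(collectors):
    """Union the shred sets received by each collector identity."""
    combined = set()
    for shreds in collectors.values():
        combined |= shreds
    return combined

collectors = {
    "identity-a": {(1000, 0), (1000, 1), (1000, 5)},
    "identity-b": {(1000, 1), (1000, 2), (1000, 6)},
    "identity-c": {(1000, 3), (1000, 4)},
}
combined = aggregate_shreds(collectors)
# 7 distinct shreds — more than any single identity's 3
```

A real implementation would also need deduplication across time, shred signature verification, and forwarding of the combined set to the main validator's TVU port.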

## Sources

- [Jito ShredStream docs](https://docs.jito.wtf/lowlatencytxnfeed/)
- [shredstream-proxy repo](https://github.com/jito-labs/shredstream-proxy)
- [bloXroute OFR docs](https://docs.bloxroute.com/solana/optimized-feed-relay)
- [bloXroute pricing](https://bloxroute.com/pricing/)
- [bloXroute OFR intro](https://bloxroute.com/pulse/introducing-ofrs-faster-shreds-better-performance-on-solana/)
- [DZ multicast announcement](https://doublezero.xyz/journal/doublezero-introduces-multicast-support-smarter-faster-data-delivery-for-distributed-systems)

## Request Template

When contacting a group owner, use something like:

> We'd like to subscribe to your DoubleZero multicast group for our Solana
> validator. Our details:
>
> - Validator: 4WeLUxfQghbhsLEuwaAzjZiHg2VBw87vqHc4iZrGvKPr
> - DZ identity: 3Bw6v7EruQvTwoY79h2QjQCs2KBQFzSneBdYUbcXK1Tr
> - Client IP: 186.233.184.235
> - Device: laconic-mia-sw01
> - Tenant: laconic

@@ -0,0 +1,121 @@
# DoubleZero Current State and Bug Fixes

## Biscayne Connection Details

| Field | Value |
|-------|-------|
| Host | biscayne.vaasl.io (186.233.184.235) |
| DZ identity | `3Bw6v7EruQvTwoY79h2QjQCs2KBQFzSneBdYUbcXK1Tr` |
| Validator identity | `4WeLUxfQghbhsLEuwaAzjZiHg2VBw87vqHc4iZrGvKPr` |
| Nearest device | laconic-mia-sw01 (0.3ms) |
| DZ version (host) | 0.8.10 |
| DZ version (container) | 0.8.11 |
| k8s version | 1.35.1 (kind) |

## Current State (2026-03-03)

The host systemd `doublezerod` is connected and working. The container sidecar
doublezerod is broken. Both are running simultaneously.

| Instance | Identity | Status |
|----------|----------|--------|
| Host systemd | `3Bw6v7...` (correct) | BGP Session Up, IBRL to laconic-mia-sw01 |
| Container sidecar | `Cw9qun...` (wrong) | Disconnected, error loop |
| DaemonSet manifest | N/A | Never applied, dead code |

### Access pass

The access pass for 186.233.184.235 is registered and connected:

```
type: prepaid
payer: 3Bw6v7EruQvTwoY79h2QjQCs2KBQFzSneBdYUbcXK1Tr
status: connected
owner: DZfLKFDgLShjY34WqXdVVzHUvVtrYXb7UtdrALnGa8jw
```

## Bugs

### BUG-1: Container doublezerod has wrong identity

The entrypoint script (`entrypoint.sh`) auto-generates a new `id.json` if one isn't
found. The volume at `/srv/deployments/agave/data/doublezero-config/` was empty at
first boot, so it generated `Cw9qun...` instead of using the registered identity.

**Root cause:** The real `id.json` lives at `/home/solana/.config/doublezero/id.json`
(created by the host-level DZ install). The container volume is a separate path that
was never seeded.

**Fix:**
```bash
sudo cp /home/solana/.config/doublezero/id.json \
  /srv/deployments/agave/data/doublezero-config/id.json
```

### BUG-2: Container doublezerod can't resolve DZ passport program

`DOUBLEZERO_RPC_ENDPOINT` in `spec.yml` is `http://127.0.0.1:8899` — the local
validator. But the local validator hasn't replayed enough slots to have the DZ
passport program accounts (`ser2VaTMAcYTaauMrTSfSrxBaUDq7BLNs2xfUugTAGv`).
doublezerod calls `GetProgramAccounts` every 30 seconds and gets empty results.

**Fix in `deployment/spec.yml`:**
```yaml
# Use public RPC for DZ bootstrapping until the local validator is caught up
DOUBLEZERO_RPC_ENDPOINT: https://api.mainnet-beta.solana.com
```

Switch back to `http://127.0.0.1:8899` once the local validator is synced.

### BUG-3: Container doublezerod lacks hostNetwork

laconic-so was not translating `network_mode: host` from compose files to
`hostNetwork: true` in generated k8s pod specs. Without host network access, the
container can't create GRE tunnels (IP proto 47) or run BGP (tcp/179 on
169.254.0.0/16).

**Fix:** Deploy with stack-orchestrator branch `fix/k8s-port-mappings-hostnetwork-v2`
(commit `fb69cc58`, 2026-03-03), which adds automatic hostNetwork detection.

### BUG-4: DaemonSet workaround is dead code

`deployment/k8s-manifests/doublezero-daemonset.yaml` was a workaround for BUG-3.
Now that laconic-so supports hostNetwork natively, it should be deleted.

**Fix:** Remove `deployment/k8s-manifests/doublezero-daemonset.yaml` from agave-stack.

### BUG-5: Two doublezerod instances running simultaneously

The host systemd `doublezerod` and the container sidecar are both running. Once the
container is fixed (BUG-1 through BUG-3), the host service must be disabled to avoid
two processes fighting over the GRE tunnel.

**Fix:**
```bash
sudo systemctl stop doublezerod
sudo systemctl disable doublezerod
```

## Diagnostic Commands

Always use `sudo -u solana` for host-level DZ commands — the identity is under
`/home/solana/.config/doublezero/`.

```bash
# Host
sudo -u solana doublezero address                              # expect 3Bw6v7...
sudo -u solana doublezero status                               # tunnel state
sudo -u solana doublezero latency                              # device reachability
sudo -u solana doublezero access-pass list | grep 186.233.184  # access pass
sudo -u solana doublezero balance                              # credits
ip route | grep doublezero0                                    # BGP routes

# Container (from kind node)
kubectl -n <ns> exec <pod> -c doublezerod -- doublezero address
kubectl -n <ns> exec <pod> -c doublezerod -- doublezero status
kubectl -n <ns> exec <pod> -c doublezerod -- doublezero --version

# Logs
kubectl -n <ns> logs <pod> -c doublezerod --tail=30
sudo journalctl -u doublezerod -f   # host systemd logs
```

@@ -0,0 +1,65 @@
# Feature: Use local registry for kind image loading

## Summary

`laconic-so deployment start` uses `kind load docker-image` to copy container images from the host Docker daemon into the kind node's containerd. This serializes the full image (`docker save`), pipes it through `docker exec`, and deserializes it (`ctr image import`). For biscayne's ~837MB agave image plus the doublezero image, this takes 5-10 minutes on every cluster recreate — copying between two container runtimes on the same machine.

## Current behavior

```
docker build           → host Docker daemon (image stored once)
kind load docker-image → docker save | docker exec kind-node ctr import (full copy)
```

This happens in `stack_orchestrator/deploy/k8s/deploy_k8s.py` every time `laconic-so deployment start` runs and the image isn't already present in the kind node.

## Proposed behavior

Run a persistent local registry (`registry:2`) on the host. `laconic-so` pushes images there after build. Kind's containerd is configured to pull from it.

```
docker build         → docker tag localhost:5001/image → docker push localhost:5001/image
kind node containerd → pulls from localhost:5001 (fast, no serialization)
```

The registry container persists across kind cluster deletions. Images are always available without reloading.

## Implementation

1. **Registry container**: `docker run -d --restart=always -p 5001:5000 --name kind-registry registry:2`

2. **Kind config** — add a registry mirror to `containerdConfigPatches` in kind-config.yml:
   ```yaml
   containerdConfigPatches:
     - |-
       [plugins."io.containerd.grpc.v1.cri".registry.mirrors."localhost:5001"]
         endpoint = ["http://kind-registry:5000"]
   ```

3. **Connect the registry to the kind network**: `docker network connect kind kind-registry`

4. **laconic-so change** — in `deploy_k8s.py`, replace `kind load docker-image` with:
   ```bash
   # Tag and push to the local registry instead of kind load
   docker tag image:local localhost:5001/image:local
   docker push localhost:5001/image:local
   ```

5. **Compose files** — image references change from `laconicnetwork/agave:local` to `localhost:5001/laconicnetwork/agave:local`
|
||||||
|
|
||||||
|
Kind documents this pattern: https://kind.sigs.k8s.io/docs/user/local-registry/
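In `deploy_k8s.py` terms, step 4 might look roughly like the following. This is a sketch only: `push_to_local_registry`, the `REGISTRY` constant, and the subprocess calls are illustrative assumptions, not the actual stack-orchestrator API.

```python
import subprocess

REGISTRY = "localhost:5001"  # assumption: the registry started in step 1

def push_to_local_registry(image: str) -> str:
    """Replacement for `kind load docker-image`: retag the locally built
    image under the registry host and push it. Returns the reference the
    compose files should use (step 5)."""
    ref = f"{REGISTRY}/{image}"
    subprocess.run(["docker", "tag", image, ref], check=True)
    subprocess.run(["docker", "push", ref], check=True)
    return ref

# e.g. push_to_local_registry("laconicnetwork/agave:local")
# → "localhost:5001/laconicnetwork/agave:local"
```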

## Impact

- Eliminates the 5-10 minute image loading step on every cluster recreate
- Registry persists across `kind delete cluster` — no re-push needed unless the image itself changes
- `docker push` to a local registry is near-instant (shared filesystem, layer dedup)
- Unblocks faster iteration on redeploy cycles

## Scope

This is a `stack-orchestrator` change, specifically in `deploy_k8s.py`. The kind-config.yml also needs the registry mirror config, which `laconic-so` generates from `spec.yml`.

## Discovered

2026-03-05 — during the biscayne full wipe redeploy, `laconic-so start` spent most of its runtime on `kind load docker-image`, causing ansible timeouts and cascading failures (caddy ingress conflict bug).
@ -0,0 +1,78 @@
# Known Issues

## BUG-6: Validator logging not configured, only stdout available

**Observed:** 2026-03-03

The validator only logs to stdout. At current log volume, kubectl logs retains ~2 minutes of history before the buffer fills. When diagnosing a replay stall, the startup logs (snapshot load, initial replay, error conditions) were gone.

**Impact:** Cannot determine why the validator replay stage stalled — the startup logs that would show the root cause are not available.

**Fix:** Configure the `--log` flag in the validator start script to write to a persistent volume, so logs survive container restarts and aren't limited to the kubectl buffer.

## BUG-7: Metrics endpoint unreachable from validator pod

**Observed:** 2026-03-03

```
WARN solana_metrics::metrics submit error: error sending request for url
(http://localhost:8086/write?db=agave_metrics&u=admin&p=admin&precision=n)
```

The validator is configured with `SOLANA_METRICS_CONFIG` pointing to `http://172.20.0.1:8086` (the kind docker bridge gateway), but the logs show it trying `localhost:8086`. The InfluxDB container (`solana-monitoring-influxdb-1`) is running on the host, but the validator can't reach it.

**Impact:** No metrics collection. Cannot use Grafana dashboards to diagnose performance issues or track sync progress over time.

## BUG-8: sysctl values not visible inside kind container

**Observed:** 2026-03-03

```
ERROR solana_core::system_monitor_service Failed to query value for net.core.rmem_max: no such sysctl
WARN solana_core::system_monitor_service net.core.rmem_max: recommended=134217728, current=-1 too small
```

The host has correct sysctl values (`net.core.rmem_max = 134217728`), but `/proc/sys/net/core/` does not exist inside the kind node container. The validator reads `-1` and reports the buffer as too small.

The network buffers themselves may still be effective (they're set on the host network namespace, which the pod shares via `hostNetwork: true`), but this is unverified. If the buffers are not effective, it could limit shred ingestion throughput and contribute to slow repair.

**Fix options:**
- Set sysctls on the kind node container at creation time (`kind` supports `kubeadmConfigPatches` and sysctl configuration)
- Verify empirically whether the host sysctls apply to hostNetwork pods by checking actual socket buffer sizes from inside the pod
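The second option can be scripted. A minimal sketch, assuming plain Python is available inside the pod (illustrative, not part of the repo): request the recommended buffer size and see what the kernel actually grants, since the grant is capped by `net.core.rmem_max` in whichever network namespace the socket lives in.

```python
import socket

def effective_rcvbuf(requested: int) -> int:
    """Ask the kernel for a receive buffer of `requested` bytes and report
    what it actually granted. On Linux the returned value is doubled for
    kernel bookkeeping, and the grant is capped by net.core.rmem_max."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        s.close()

if __name__ == "__main__":
    recommended = 134_217_728  # agave's recommended net.core.rmem_max
    granted = effective_rcvbuf(recommended)
    print(f"requested={recommended} granted={granted}")
    if granted < recommended:
        print("rmem_max cap is in effect; the host sysctl does NOT apply here")
```

Run inside the pod: a grant near the recommendation means the host sysctl is effective for hostNetwork pods; a much smaller grant confirms the cap.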

## Validator replay stall (under investigation)

**Observed:** 2026-03-03

The validator root has been stuck at slot 403,892,310 for 55+ minutes. The gap to the cluster tip is ~120,000 slots and growing.

**Observed symptoms:**
- Zero `Frozen` banks in log history — replay stage is not processing slots
- All incoming slots show `bank_status: Unprocessed`
- Repair only requests tip slots and two specific old slots (403,892,310, 403,909,228) — not the ~120k slot gap
- Repair peer count is 3-12 per cycle (vs 1,000+ gossip peers)
- Startup logs have rotated out (BUG-6), so initialization context is lost

**Unknown:**
- What snapshot the validator loaded at boot
- Whether replay ever started or was blocked from the beginning
- Whether the sysctl issue (BUG-8) is limiting repair throughput
- Whether the missing metrics (BUG-7) would show what's happening internally
@ -0,0 +1,191 @@
# Shred Collector Relay

## Problem

Turbine assigns each validator a single position in the shred distribution tree per slot, determined by its pubkey. A validator in Miami with one identity receives shreds from one set of tree neighbors — typically ~60-70% of shreds for any given slot. The remaining 30-40% must come from the repair protocol, which is too slow to keep pace with chain production (see analysis below).

Commercial services (Jito ShredStream, bloXroute OFR) solve this by running many nodes with different identities across the turbine tree, aggregating shreds, and redistributing the combined set to subscribers. This works but costs $300-5,000/mo and adds a dependency on a third party.

## Concept

Run lightweight **shred collector** nodes at multiple geographic locations on the Laconic network (Ashburn, Dallas, etc.). Each collector has its own keypair, joins gossip with a unique identity, receives turbine shreds from its unique tree position, and forwards raw shred packets to the main validator in Miami. The main validator inserts these shreds into its blockstore alongside its own turbine shreds, increasing completeness toward 100% without relying on repair.

```
                   Turbine Tree
                  /      |      \
                 /       |       \
    collector-ash  collector-dfw  biscayne (main validator)
      (Ashburn)      (Dallas)       (Miami)
     identity A     identity B     identity C
    ~60% shreds    ~60% shreds    ~60% shreds
                \        |        /
                 \       |       /
           → UDP forward via DZ backbone →
                         |
                biscayne blockstore
           ~95%+ shreds (union of A∪B∪C)
```

Each collector sees a different ~60% slice of the turbine tree. The union of three independent positions yields ~94% coverage (1 - 0.4³ = 0.936). Four collectors yield ~97%. The main validator fills the remaining few percent via repair, which is fast when only 3-6% of shreds are missing.
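The union estimate treats each collector's missed fraction as independent. A quick sketch of that arithmetic (illustrative only):

```python
def union_coverage(n_collectors: int, per_node_delivery: float = 0.6) -> float:
    """Probability a shred reaches at least one of n independent tree positions."""
    miss = 1.0 - per_node_delivery
    return 1.0 - miss ** n_collectors

for n in (1, 2, 3, 4):
    print(f"{n} collector(s): {union_coverage(n):.1%}")
    # 1 → 60.0%, 2 → 84.0%, 3 → 93.6%, 4 → 97.4%
```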

## Why This Works

The math from biscayne's recovery (2026-03-06):

| Metric | Value |
|--------|-------|
| Compute-bound replay (complete blocks) | 5.2 slots/sec |
| Repair-bound replay (incomplete blocks) | 0.5 slots/sec |
| Chain production rate | 2.5 slots/sec |
| Turbine + relay delivery per identity | ~60-70% |
| Repair bandwidth | ~600 shreds/sec (estimated) |
| Repair needed to converge at 60% delivery | 5x current bandwidth |
| Repair needed to converge at 95% delivery | Easily sufficient |

At 60% shred delivery, repair must fill 40% per slot — too slow to converge. At 95% delivery (3 collectors), repair fills 5% per slot — well within capacity. The validator replays at near compute-bound speed (5+ slots/sec) and converges.
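Combining the table with the shred-rate estimate from the Infrastructure section (~3,000 shreds/slot) gives the convergence arithmetic. A sketch using the doc's own estimates (illustrative only):

```python
SHREDS_PER_SLOT = 3_000    # approximate, from the infrastructure estimate
PRODUCTION_RATE = 2.5      # slots/sec
REPAIR_BANDWIDTH = 600.0   # shreds/sec, estimated

def repair_demand(delivery: float) -> float:
    """Shreds/sec repair must supply to keep up with chain production."""
    return (1.0 - delivery) * SHREDS_PER_SLOT * PRODUCTION_RATE

for delivery in (0.60, 0.95):
    demand = repair_demand(delivery)
    print(f"delivery {delivery:.0%}: need {demand:.0f} shreds/sec "
          f"({demand / REPAIR_BANDWIDTH:.1f}x available)")
    # 60% → 3000 shreds/sec (5.0x available); 95% → 375 shreds/sec (0.6x)
```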

## Infrastructure

Laconic already has DZ-connected switches at multiple sites:

| Site | Device | Latency to Miami | Backbone |
|------|--------|------------------|----------|
| Miami | laconic-mia-sw01 | 0.24ms | local |
| Ashburn | laconic-was-sw01 | ~29ms | Et4/1 25.4ms |
| Dallas | laconic-dfw-sw01 | ~30ms | TBD |

The DZ backbone carries traffic between sites at line rate. Shred packets are ~1280 bytes each. At ~3,000 shreds/slot and 2.5 slots/sec, each collector forwards ~7,500 packets/sec (~10 MB/s) — trivial bandwidth for the backbone.
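The bandwidth figure is straightforward arithmetic (illustrative check only):

```python
SHREDS_PER_SLOT = 3_000   # approximate shreds per slot
SLOTS_PER_SEC = 2.5       # chain production rate
SHRED_BYTES = 1_280       # per-packet size

pps = SHREDS_PER_SLOT * SLOTS_PER_SEC      # packets/sec per collector
mb_per_sec = pps * SHRED_BYTES / 1e6       # MB/s per collector
print(f"{pps:.0f} pps, {mb_per_sec:.1f} MB/s")  # 7500 pps, 9.6 MB/s
```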

## Collector Architecture

The collector does NOT need to be a full validator. It needs to:

1. **Join gossip** — advertise a ContactInfo with its own pubkey and a TVU address (the site's IP)
2. **Receive turbine shreds** — UDP packets on the advertised TVU port
3. **Forward shreds** — retransmit raw UDP packets to biscayne's TVU port

It does NOT need to: replay transactions, maintain accounts state, store a ledger, load a snapshot, vote, or run RPC.
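Steps 2-3 are a plain UDP relay loop. A minimal Python sketch of that data-plane core (illustrative only; step 1, gossip participation, is the hard part and is not shown):

```python
import socket

def forward_shreds(listen_port, dst, max_packets=None):
    """Receive UDP shreds on the advertised TVU port and retransmit the raw
    bytes unchanged to `dst` (host, port). Returns the number forwarded."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("0.0.0.0", listen_port))
    forwarded = 0
    try:
        while max_packets is None or forwarded < max_packets:
            pkt, _src = rx.recvfrom(2048)  # shreds are ~1280 bytes
            tx.sendto(pkt, dst)            # raw UDP, no re-framing
            forwarded += 1
    finally:
        rx.close()
        tx.close()
    return forwarded

# e.g. forward_shreds(9000, ("186.233.184.235", 9000))  # biscayne's TVU port
```

The `max_packets` parameter exists only so the sketch can be exercised; a real collector runs the loop forever.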

### Option A: Firedancer Minimal Build

Firedancer (Apache 2, C) has a tile-based architecture where each function (net, gossip, shred, bank, store, etc.) runs as an independent Linux process. A minimal build using only the networking + gossip + shred tiles would:

- Join gossip and advertise a TVU address
- Receive turbine shreds via the shred tile
- Forward shreds to a configured destination instead of to bank/store

This requires modifying the shred tile to add a UDP forwarder output instead of (or in addition to) the normal bank handoff. The rest of the tile pipeline (bank, pack, poh, store) is simply not started.

**Estimated effort:** Moderate. Firedancer's tile architecture is designed for this kind of composition. The main work is adding a forwarder sink to the shred tile and testing gossip participation without the full validator stack.

**Source:** https://github.com/firedancer-io/firedancer

### Option B: Agave Non-Voting Minimal

Run `agave-validator --no-voting` with `--limit-ledger-size 0` and minimal config. Agave still requires a snapshot to start and runs the full process, but with no voting and a minimal ledger it would be lighter than a full node.

**Downside:** Agave is monolithic — you can't easily disable replay/accounts. It still loads a snapshot, builds the accounts index, and runs replay. This defeats the purpose of a lightweight collector.

### Option C: Custom Gossip + TVU Receiver

Write a minimal Rust binary using agave's `solana-gossip` and `solana-streamer` crates to:

1. Bootstrap into gossip via entrypoints
2. Advertise ContactInfo with a TVU socket
3. Receive shred packets on TVU
4. Forward them via UDP

**Estimated effort:** Significant. Gossip protocol participation is complex (CRDS, pull/push, protocol versioning). Using the agave crates directly is possible but poorly documented for standalone use.

### Option D: Run Collectors on Biscayne

Run the collector processes on biscayne itself, each advertising a TVU address at a remote site. The switches at each site forward inbound TVU traffic to biscayne via the DZ backbone using traffic-policy redirects (same pattern as `ashburn-validator-relay.md`).

**Advantage:** No compute needed at remote sites. Just switch config + loopback IPs. All collector processes run in Miami.

**Risk:** Gossip advertises IP + port. If the collector runs on biscayne but advertises an Ashburn IP, gossip protocol interactions (pull requests, pings) arrive at the Ashburn IP and must be forwarded back to biscayne. This adds ~58ms RTT to gossip protocol messages, which may cause timeouts or peer quality degradation. Needs testing.

## Recommendation

Option A (Firedancer minimal build) is the correct long-term approach. It produces a single binary that does exactly one thing: collect shreds from a unique turbine tree position and forward them. It runs on minimal hardware (a small VM or container at each site, or on biscayne with remote TVU addresses).

Option D (collectors on biscayne with switch forwarding) is the fastest to test since it needs no new software — just switch config and multiple agave-validator instances with `--no-voting`. The question is whether agave can start without a snapshot if we only care about gossip + TVU.

## Deployment Topology

```
biscayne (186.233.184.235)
├── agave-validator (main, identity C, TVU 186.233.184.235:9000)
├── collector-ash (identity A, TVU 137.239.194.65:9000)
│   └── shreds forwarded via was-sw01 traffic-policy
├── collector-dfw (identity B, TVU <dfw-ip>:9000)
│   └── shreds forwarded via dfw-sw01 traffic-policy
└── blockstore receives union of A∪B∪C shreds

was-sw01 (Ashburn)
└── Loopback: 137.239.194.65
    └── traffic-policy: UDP dst 137.239.194.65:9000 → nexthop mia-sw01

dfw-sw01 (Dallas)
└── Loopback: <assigned IP>
    └── traffic-policy: UDP dst <assigned IP>:9000 → nexthop mia-sw01
```

## Open Questions

1. Can agave-validator start in gossip-only mode without a snapshot?
2. Does Firedancer's shred tile work standalone without bank/replay?
3. What is the gossip protocol timeout for remote TVU addresses (Option D)?
4. How does the turbine tree handle multiple identities from the same IP (if running all collectors on biscayne)?
5. Do we need stake on collector identities to be placed in the turbine tree, or do unstaked nodes still participate?
6. What IP block is available on dfw-sw01 for a collector loopback?
@ -0,0 +1,161 @@
# TVU Shred Relay — Data-Plane Redirect

## Overview

Biscayne's agave validator advertises `64.92.84.81:20000` (laconic-was-sw01 Et1/1) as its TVU address. Turbine shreds arrive as normal UDP to the switch's front-panel IP. The 7280CR3A ASIC handles front-panel traffic without punting to Linux userspace — it sees a local interface IP with no service and drops at the hardware level.

### Previous approach (monitor + socat)

An EOS monitor session mirrored matched packets to the CPU (mirror0 interface). socat read from mirror0 and relayed to biscayne; shred-unwrap.py on biscayne stripped the encapsulation headers.

Fragile: socat ran as a foreground process and died on disconnect.

### New approach (traffic-policy redirect)

EOS `traffic-policy` with `set nexthop` and `system-rule overriding-action redirect` overrides the ASIC's "local IP, handle myself" decision. The ASIC forwards matched packets to the specified next-hop at line rate. Pure data plane, no CPU involvement, persists in startup-config.

Available since EOS 4.28.0F on R3 platforms. Confirmed on 4.34.0F.

## Architecture

```
Turbine peers (hundreds of validators)
        |
        v  UDP shreds to 64.92.84.81:20000
laconic-was-sw01 Et1/1 (Ashburn)
        |  ASIC matches traffic-policy SHRED-RELAY
        |  Redirects to nexthop 172.16.1.189 (data plane, line rate)
        v  Et4/1 backbone (25.4ms)
laconic-mia-sw01 Et4/1 (Miami)
        |  forwards via default route (same metro)
        v  0.13ms
biscayne (186.233.184.235, Miami)
        |  iptables DNAT: dst 64.92.84.81:20000 -> 127.0.0.1:9000
        v
agave-validator TVU port (localhost:9000)
```

## Production Config: laconic-was-sw01

### Pre-change safety

```
configure checkpoint save pre-shred-relay
```

Rollback: `rollback running-config checkpoint pre-shred-relay` then `write memory`.

### Config session with auto-revert

```
configure session shred-relay

! ACL for traffic-policy match
ip access-list SHRED-RELAY-ACL
10 permit udp any any eq 20000

! Traffic policy: redirect matched packets to backbone next-hop
traffic-policy SHRED-RELAY
match SHRED-RELAY-ACL
set nexthop 172.16.1.189

! Override ASIC punt-to-CPU for redirected traffic
system-rule overriding-action redirect

! Apply to Et1/1 ingress
interface Ethernet1/1
traffic-policy input SHRED-RELAY

! Remove old monitor session and its ACL
no monitor session 1
no ip access-list SHRED-RELAY

! Review before committing
show session-config diffs

! Commit with 5-minute auto-revert safety net
commit timer 00:05:00
```

After verification: `configure session shred-relay commit` then `write memory`.

### Linux cleanup on was-sw01

```bash
# Kill socat relay (PID 27743)
kill 27743
# Remove Linux kernel route
ip route del 186.233.184.235/32
```

The EOS static route `ip route 186.233.184.235/32 172.16.1.189` stays (general reachability).

## Production Config: biscayne

### iptables DNAT

Traffic-policy sends normal L3-forwarded UDP packets (no mirror encapsulation). Packets arrive with dst `64.92.84.81:20000` containing clean shred payloads directly in the UDP body.

```bash
sudo iptables -t nat -A PREROUTING -p udp -d 64.92.84.81 --dport 20000 \
  -j DNAT --to-destination 127.0.0.1:9000

# Persist across reboot
sudo apt install -y iptables-persistent
sudo netfilter-persistent save
```

### Cleanup

```bash
# Kill shred-unwrap.py (PID 2497694)
kill 2497694
rm /tmp/shred-unwrap.py
```

## Verification

1. `show traffic-policy interface Ethernet1/1` — policy applied
2. `show traffic-policy counters` — packets matching and redirected
3. `sudo iptables -t nat -L PREROUTING -v -n` — DNAT rule with packet counts
4. Validator logs: slot replay rate should maintain ~3.3 slots/sec
5. `ss -unp | grep 9000` — validator receiving on TVU port

## What was removed

| Component | Host |
|-----------|------|
| monitor session 1 | was-sw01 |
| SHRED-RELAY ACL (old) | was-sw01 |
| socat relay process | was-sw01 |
| Linux kernel static route | was-sw01 |
| shred-unwrap.py | biscayne |

## What was added

| Component | Host | Persistent? |
|-----------|------|-------------|
| traffic-policy SHRED-RELAY | was-sw01 | Yes (startup-config) |
| SHRED-RELAY-ACL | was-sw01 | Yes (startup-config) |
| system-rule overriding-action redirect | was-sw01 | Yes (startup-config) |
| iptables DNAT rule | biscayne | Yes (iptables-persistent) |

## Key Details

| Item | Value |
|------|-------|
| Biscayne validator identity | `4WeLUxfQghbhsLEuwaAzjZiHg2VBw87vqHc4iZrGvKPr` |
| Biscayne IP | `186.233.184.235` |
| laconic-was-sw01 public IP | `64.92.84.81` (Et1/1) |
| laconic-was-sw01 backbone IP | `172.16.1.188` (Et4/1) |
| laconic-was-sw01 SSH | `install@137.239.200.198` |
| laconic-mia-sw01 backbone IP | `172.16.1.189` (Et4/1) |
| Backbone RTT (WAS-MIA) | 25.4ms |
| EOS version | 4.34.0F |
@ -0,0 +1,14 @@
all:
  hosts:
    biscayne:
      ansible_host: biscayne.vaasl.io
      ansible_user: rix
      ansible_become: true

      # DoubleZero identities
      dz_identity: 3Bw6v7EruQvTwoY79h2QjQCs2KBQFzSneBdYUbcXK1Tr
      validator_identity: 4WeLUxfQghbhsLEuwaAzjZiHg2VBw87vqHc4iZrGvKPr
      client_ip: 186.233.184.235
      dz_device: laconic-mia-sw01
      dz_tenant: laconic
      dz_environment: mainnet-beta
@ -0,0 +1,23 @@
all:
  children:
    switches:
      vars:
        ansible_connection: ansible.netcommon.network_cli
        ansible_network_os: arista.eos.eos
        ansible_user: install
        ansible_become: true
        ansible_become_method: enable
      hosts:
        was-sw01:
          ansible_host: 137.239.200.198
          # Et1/1: 64.92.84.81 (Ashburn uplink)
          # Et4/1: 172.16.1.188 (backbone to mia-sw01)
          # Loopback100: 137.239.194.64/32
          backbone_ip: 172.16.1.188
          backbone_peer: 172.16.1.189
          uplink_gateway: 64.92.84.80
        mia-sw01:
          ansible_host: 209.42.167.133
          # Et4/1: 172.16.1.189 (backbone to was-sw01)
          backbone_ip: 172.16.1.189
          backbone_peer: 172.16.1.188
@ -156,73 +156,62 @@
   failed_when: "add_ip.rc != 0 and 'RTNETLINK answers: File exists' not in add_ip.stderr"
   tags: [inbound]

-- name: Add DNAT for gossip UDP
-  ansible.builtin.iptables:
-    table: nat
-    chain: PREROUTING
-    protocol: udp
-    destination: "{{ ashburn_ip }}"
-    destination_port: "{{ gossip_port }}"
-    jump: DNAT
-    to_destination: "{{ kind_node_ip }}:{{ gossip_port }}"
+- name: Add DNAT rules (inserted before DOCKER chain)
+  ansible.builtin.shell:
+    cmd: |
+      set -o pipefail
+      # DNAT rules must be before Docker's ADDRTYPE LOCAL rule, otherwise
+      # Docker's PREROUTING chain swallows traffic to 137.239.194.65 (which
+      # is on loopback and therefore type LOCAL).
+      for rule in \
+        "-p udp -d {{ ashburn_ip }} --dport {{ gossip_port }} -j DNAT --to-destination {{ kind_node_ip }}:{{ gossip_port }}" \
+        "-p tcp -d {{ ashburn_ip }} --dport {{ gossip_port }} -j DNAT --to-destination {{ kind_node_ip }}:{{ gossip_port }}" \
+        "-p udp -d {{ ashburn_ip }} --dport {{ dynamic_port_range_start }}:{{ dynamic_port_range_end }} -j DNAT --to-destination {{ kind_node_ip }}" \
+      ; do
+        if ! iptables -t nat -C PREROUTING $rule 2>/dev/null; then
+          iptables -t nat -I PREROUTING 1 $rule
+          echo "added: $rule"
+        else
+          echo "exists: $rule"
+        fi
+      done
+    executable: /bin/bash
+  register: dnat_result
+  changed_when: "'added' in dnat_result.stdout"
   tags: [inbound]

-- name: Add DNAT for gossip TCP
-  ansible.builtin.iptables:
-    table: nat
-    chain: PREROUTING
-    protocol: tcp
-    destination: "{{ ashburn_ip }}"
-    destination_port: "{{ gossip_port }}"
-    jump: DNAT
-    to_destination: "{{ kind_node_ip }}:{{ gossip_port }}"
-  tags: [inbound]
-
-- name: Add DNAT for dynamic ports (UDP 9000-9025)
-  ansible.builtin.iptables:
-    table: nat
-    chain: PREROUTING
-    protocol: udp
-    destination: "{{ ashburn_ip }}"
-    destination_port: "{{ dynamic_port_range_start }}:{{ dynamic_port_range_end }}"
-    jump: DNAT
-    to_destination: "{{ kind_node_ip }}"
+- name: Show DNAT result
+  ansible.builtin.debug:
+    var: dnat_result.stdout_lines
   tags: [inbound]

 # ------------------------------------------------------------------
 # Outbound: fwmark + SNAT + policy routing
 # ------------------------------------------------------------------
-- name: Mark outbound validator UDP gossip traffic
-  ansible.builtin.iptables:
-    table: mangle
-    chain: PREROUTING
-    protocol: udp
-    source: "{{ kind_network }}"
-    source_port: "{{ gossip_port }}"
-    jump: MARK
-    set_mark: "{{ fwmark }}"
+- name: Mark outbound validator traffic (mangle PREROUTING)
+  ansible.builtin.shell:
+    cmd: |
+      set -o pipefail
+      for rule in \
+        "-p udp -s {{ kind_network }} --sport {{ gossip_port }} -j MARK --set-mark {{ fwmark }}" \
+        "-p udp -s {{ kind_network }} --sport {{ dynamic_port_range_start }}:{{ dynamic_port_range_end }} -j MARK --set-mark {{ fwmark }}" \
+        "-p tcp -s {{ kind_network }} --sport {{ gossip_port }} -j MARK --set-mark {{ fwmark }}" \
+      ; do
+        if ! iptables -t mangle -C PREROUTING $rule 2>/dev/null; then
+          iptables -t mangle -A PREROUTING $rule
+          echo "added: $rule"
+        else
+          echo "exists: $rule"
+        fi
+      done
+    executable: /bin/bash
+  register: mangle_result
+  changed_when: "'added' in mangle_result.stdout"
   tags: [outbound]

-- name: Mark outbound validator UDP dynamic port traffic
-  ansible.builtin.iptables:
-    table: mangle
-    chain: PREROUTING
-    protocol: udp
-    source: "{{ kind_network }}"
-    source_port: "{{ dynamic_port_range_start }}:{{ dynamic_port_range_end }}"
-    jump: MARK
-    set_mark: "{{ fwmark }}"
-  tags: [outbound]
-
-- name: Mark outbound validator TCP gossip traffic
-  ansible.builtin.iptables:
-    table: mangle
-    chain: PREROUTING
-    protocol: tcp
-    source: "{{ kind_network }}"
-    source_port: "{{ gossip_port }}"
-    jump: MARK
-    set_mark: "{{ fwmark }}"
+- name: Show mangle result
+  ansible.builtin.debug:
+    var: mangle_result.stdout_lines
   tags: [outbound]

 - name: SNAT marked traffic to Ashburn IP (before Docker MASQUERADE)

@ -337,7 +326,7 @@
       nat_rules: "{{ nat_rules.stdout_lines }}"
       mangle_rules: "{{ mangle_rules.stdout_lines | default([]) }}"
       routing: "{{ routing_info.stdout_lines | default([]) }}"
-      loopback: "{{ lo_addrs.stdout_lines }}"
+      loopback: "{{ lo_addrs.stdout_lines | default([]) }}"
   tags: [inbound, outbound]

 - name: Summary

@ -1,14 +1,19 @@
 ---
-# Configure laconic-mia-sw01 for outbound validator traffic redirect
+# Configure laconic-mia-sw01 for validator traffic relay (inbound + outbound)
 #
-# Redirects outbound traffic from biscayne (src 137.239.194.65) arriving
-# via the doublezero0 GRE tunnel to was-sw01 via the backbone, preventing
-# BCP38 drops at mia-sw01's ISP uplink.
+# Outbound: Redirects outbound traffic from biscayne (src 137.239.194.65)
+# arriving via the doublezero0 GRE tunnel to was-sw01 via the backbone,
+# preventing BCP38 drops at mia-sw01's ISP uplink.
+#
+# Inbound: Routes traffic destined to 137.239.194.65 from the default VRF
+# to biscayne via Tunnel500 in vrf1. Without this, mia-sw01 sends
+# 137.239.194.65 out the ISP uplink back to was-sw01 (routing loop).
 #
 # Approach: The existing per-tunnel ACL (SEC-USER-500-IN) controls what
 # traffic enters vrf1 from Tunnel500. We add 137.239.194.65 to the ACL
 # and add a default route in vrf1 via egress-vrf default pointing to
-# was-sw01's backbone IP. No PBR needed — the ACL is the filter.
+# was-sw01's backbone IP. For inbound, an inter-VRF static route in the
+# default VRF forwards 137.239.194.65/32 to biscayne via Tunnel500.
 #
 # The other vrf1 tunnels (502, 504, 505) have their own ACLs that only
 # permit their specific source IPs, so the default route won't affect them.

@ -39,6 +44,7 @@
     tunnel_interface: Tunnel500
     tunnel_vrf: vrf1
     tunnel_acl: SEC-USER-500-IN
+    tunnel_nexthop: 169.254.7.7  # biscayne's end of the Tunnel500 /31
     backbone_interface: Ethernet4/1
     session_name: validator-outbound
     checkpoint_name: pre-validator-outbound

@ -117,6 +123,7 @@
         - "show ip route vrf {{ tunnel_vrf }} 0.0.0.0/0"
         - "show ip route vrf {{ tunnel_vrf }} {{ backbone_peer }}"
         - "show ip route {{ backbone_peer }}"
+        - "show ip route {{ ashburn_ip }}"
     register: vrf_routing
     tags: [preflight]

@ -163,6 +170,11 @@
       # Default route in vrf1 via backbone to was-sw01 (egress-vrf default)
       # Safe because per-tunnel ACLs already restrict what enters vrf1
       - command: "ip route vrf {{ tunnel_vrf }} 0.0.0.0/0 egress-vrf default {{ backbone_interface }} {{ backbone_peer }}"
+      # Inbound: route traffic for ashburn IP from default VRF to biscayne via tunnel.
+      # Without this, mia-sw01 sends 137.239.194.65 out the ISP uplink → routing loop.
+      # NOTE: nexthop only, no interface — EOS silently drops cross-VRF routes that
+      # specify a tunnel interface (accepts in config but never installs in RIB).
+      - command: "ip route {{ ashburn_ip }}/32 egress-vrf {{ tunnel_vrf }} {{ tunnel_nexthop }}"
|
||||||
|
|
||||||
- name: Show session diff
|
- name: Show session diff
|
||||||
arista.eos.eos_command:
|
arista.eos.eos_command:
|
||||||
|
|
@ -189,6 +201,7 @@
|
||||||
commands:
|
commands:
|
||||||
- "show running-config | section ip access-list {{ tunnel_acl }}"
|
- "show running-config | section ip access-list {{ tunnel_acl }}"
|
||||||
- "show ip route vrf {{ tunnel_vrf }} 0.0.0.0/0"
|
- "show ip route vrf {{ tunnel_vrf }} 0.0.0.0/0"
|
||||||
|
- "show ip route {{ ashburn_ip }}"
|
||||||
register: verify
|
register: verify
|
||||||
|
|
||||||
- name: Display verification
|
- name: Display verification
|
||||||
|
|
@ -205,6 +218,7 @@
|
||||||
Changes applied:
|
Changes applied:
|
||||||
1. ACL {{ tunnel_acl }}: added "45 permit ip host {{ ashburn_ip }} any"
|
1. ACL {{ tunnel_acl }}: added "45 permit ip host {{ ashburn_ip }} any"
|
||||||
2. Default route in {{ tunnel_vrf }}: 0.0.0.0/0 egress-vrf default {{ backbone_interface }} {{ backbone_peer }}
|
2. Default route in {{ tunnel_vrf }}: 0.0.0.0/0 egress-vrf default {{ backbone_interface }} {{ backbone_peer }}
|
||||||
|
3. Inbound route: {{ ashburn_ip }}/32 egress-vrf {{ tunnel_vrf }} {{ tunnel_nexthop }}
|
||||||
|
|
||||||
The config will auto-revert in 5 minutes unless committed.
|
The config will auto-revert in 5 minutes unless committed.
|
||||||
Verify on the switch, then commit:
|
Verify on the switch, then commit:
|
||||||
|
|
|
||||||
|
|
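Taken together, the session commands in this playbook leave mia-sw01 with configuration equivalent to the following fragment (a sketch with the playbook's variables substituted where their values are known; `{{ backbone_peer }}` is left symbolic because its value is defined elsewhere in the playbook):

```
ip access-list SEC-USER-500-IN
   45 permit ip host 137.239.194.65 any
!
! Outbound: vrf1 default route leaks into the default VRF toward was-sw01
ip route vrf vrf1 0.0.0.0/0 egress-vrf default Ethernet4/1 {{ backbone_peer }}
!
! Inbound: nexthop only. Per the playbook's note, adding "Tunnel500" here
! makes EOS silently drop the route (accepted in config, never in the RIB).
ip route 137.239.194.65/32 egress-vrf vrf1 169.254.7.7
```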
@ -1,15 +1,20 @@
---
# Configure laconic-was-sw01 for inbound validator traffic relay
#
# Routes all traffic destined to 137.239.194.65 to mia-sw01 via backbone.
# A single static route replaces the previous Loopback101 + PBR approach.
#
# 137.239.194.65 is already routed to was-sw01 by its covering prefix
# (advertised via IS-IS on Loopback100). No loopback needed — the static
# route forwards traffic before the switch tries to deliver it locally.
#
# This playbook also removes the old PBR config if present (Loopback101,
# VALIDATOR-RELAY-ACL, VALIDATOR-RELAY-CLASS, VALIDATOR-RELAY policy-map,
# service-policy on Et1/1).
#
# Usage:
#   ansible-playbook -i inventory/switches.yml playbooks/ashburn-relay-was-sw01.yml
#   ansible-playbook -i inventory/switches.yml playbooks/ashburn-relay-was-sw01.yml -e apply=true
#   ansible-playbook -i inventory/switches.yml playbooks/ashburn-relay-was-sw01.yml -e commit=true
#   ansible-playbook -i inventory/switches.yml playbooks/ashburn-relay-was-sw01.yml -e rollback=true

@ -19,10 +24,11 @@
  vars:
    ashburn_ip: 137.239.194.65
    apply: false
    commit: false
    rollback: false
    session_name: validator-relay-v2
    checkpoint_name: pre-validator-relay-v2

  tasks:
    # ------------------------------------------------------------------

@ -66,77 +72,78 @@
      ansible.builtin.meta: end_play

    # ------------------------------------------------------------------
    # Pre-flight checks
    # ------------------------------------------------------------------
    - name: Show current Et1/1 config
      arista.eos.eos_command:
        commands:
          - show running-config interfaces Ethernet1/1
      register: et1_config
      tags: [preflight]

    - name: Display Et1/1 config
      ansible.builtin.debug:
        var: et1_config.stdout_lines
      tags: [preflight]

    - name: Check for existing Loopback101 and PBR
      arista.eos.eos_command:
        commands:
          - "show running-config interfaces Loopback101"
          - "show running-config | include service-policy"
          - "show running-config section policy-map type pbr"
          - "show ip route {{ ashburn_ip }}"
      register: existing_config
      tags: [preflight]

    - name: Display existing config
      ansible.builtin.debug:
        var: existing_config.stdout_lines
      tags: [preflight]

    - name: Pre-flight summary
      when: not (apply | bool)
      ansible.builtin.debug:
        msg: |
          === Pre-flight complete ===
          Review the output above:
          1. Does Loopback101 exist with {{ ashburn_ip }}? (will be removed)
          2. Is service-policy VALIDATOR-RELAY on Et1/1? (will be removed)
          3. Current route for {{ ashburn_ip }}

          To apply config:
            ansible-playbook -i inventory/switches.yml playbooks/ashburn-relay-was-sw01.yml \
              -e apply=true
      tags: [preflight]

    - name: End play if not applying
      when: not (apply | bool)
      ansible.builtin.meta: end_play

    # ------------------------------------------------------------------
    # Apply config via session with 5-minute auto-revert
    # ------------------------------------------------------------------
    - name: Save checkpoint
      arista.eos.eos_command:
        commands:
          - "configure checkpoint save {{ checkpoint_name }}"

    - name: Apply config session
      arista.eos.eos_command:
        commands:
          - command: "configure session {{ session_name }}"
          # Remove old PBR service-policy from Et1/1
          - command: interface Ethernet1/1
          - command: no service-policy type pbr input VALIDATOR-RELAY
          - command: exit
          # Remove old PBR policy-map, class-map, ACL
          - command: no policy-map type pbr VALIDATOR-RELAY
          - command: no class-map type pbr match-any VALIDATOR-RELAY-CLASS
          - command: no ip access-list VALIDATOR-RELAY-ACL
          # Remove Loopback101
          - command: no interface Loopback101
          # Add static route to forward all traffic for ashburn IP to mia-sw01
          - command: "ip route {{ ashburn_ip }}/32 {{ backbone_peer }}"

    - name: Show session diff
      arista.eos.eos_command:

@ -154,32 +161,20 @@
      arista.eos.eos_command:
        commands:
          - "configure session {{ session_name }} commit timer 00:05:00"

    # ------------------------------------------------------------------
    # Verify
    # ------------------------------------------------------------------
    - name: Verify config
      arista.eos.eos_command:
        commands:
          - "show ip route {{ ashburn_ip }}"
          - show running-config interfaces Ethernet1/1
      register: verify

    - name: Display verification
      ansible.builtin.debug:
        var: verify.stdout_lines

    - name: Reminder
      ansible.builtin.debug:

@ -188,8 +183,12 @@
          Session: {{ session_name }}
          Checkpoint: {{ checkpoint_name }}

          Changes applied:
          1. Removed: Loopback101, VALIDATOR-RELAY PBR (ACL, class-map, policy-map, service-policy)
          2. Added: ip route {{ ashburn_ip }}/32 {{ backbone_peer }}

          The config will auto-revert in 5 minutes unless committed.
          Verify on the switch, then commit:
            configure session {{ session_name }} commit
            write memory
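The same PBR removal plus single static route can be exercised by hand in an EOS config session; this sketch strings together the exact commands the playbook issues, with the commit-timer safety net it relies on (`{{ backbone_peer }}` left symbolic, as above):

```
configure session validator-relay-v2
   interface Ethernet1/1
      no service-policy type pbr input VALIDATOR-RELAY
   no policy-map type pbr VALIDATOR-RELAY
   no class-map type pbr match-any VALIDATOR-RELAY-CLASS
   no ip access-list VALIDATOR-RELAY-ACL
   no interface Loopback101
   ip route 137.239.194.65/32 {{ backbone_peer }}
   abort
configure session validator-relay-v2 commit timer 00:05:00
! verify "show ip route 137.239.194.65", then finalize:
configure session validator-relay-v2 commit
write memory
```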
@ -0,0 +1,107 @@
---
# Configure biscayne OS-level services for agave validator
#
# Installs a systemd unit that formats and mounts the ramdisk on boot.
# /dev/ram0 loses its filesystem on reboot, so mkfs.xfs must run before
# the fstab mount. This unit runs before docker, ensuring the kind node's
# bind mounts always see the ramdisk.
#
# This playbook is idempotent — safe to run multiple times.
#
# Usage:
#   ansible-playbook -i biscayne.vaasl.io, playbooks/biscayne-boot.yml
#
- name: Configure OS-level services for agave
  hosts: all
  gather_facts: false
  become: true
  vars:
    ramdisk_device: /dev/ram0
    ramdisk_mount: /srv/solana/ramdisk
    accounts_dir: /srv/solana/ramdisk/accounts

  tasks:
    - name: Install ramdisk format service
      copy:
        dest: /etc/systemd/system/format-ramdisk.service
        mode: "0644"
        content: |
          [Unit]
          Description=Format /dev/ram0 as XFS for Solana accounts
          DefaultDependencies=no
          Before=local-fs.target
          After=systemd-modules-load.service
          ConditionPathExists={{ ramdisk_device }}

          [Service]
          Type=oneshot
          RemainAfterExit=yes
          ExecStart=/sbin/mkfs.xfs -f {{ ramdisk_device }}

          [Install]
          WantedBy=local-fs.target
      register: unit_file

    - name: Install ramdisk post-mount service
      copy:
        dest: /etc/systemd/system/ramdisk-accounts.service
        mode: "0644"
        content: |
          [Unit]
          Description=Create Solana accounts directory on ramdisk
          After=srv-solana-ramdisk.mount
          Requires=srv-solana-ramdisk.mount

          [Service]
          Type=oneshot
          RemainAfterExit=yes
          ExecStart=/bin/bash -c 'mkdir -p {{ accounts_dir }} && chown solana:solana {{ ramdisk_mount }} {{ accounts_dir }}'

          [Install]
          WantedBy=multi-user.target
      register: accounts_unit

    - name: Ensure fstab entry uses nofail
      lineinfile:
        path: /etc/fstab
        regexp: '^{{ ramdisk_device }}\s+{{ ramdisk_mount }}'
        line: '{{ ramdisk_device }} {{ ramdisk_mount }} xfs noatime,nodiratime,nofail,x-systemd.requires=format-ramdisk.service 0 0'
      register: fstab_entry

    - name: Reload systemd
      systemd:
        daemon_reload: true
      when: unit_file.changed or accounts_unit.changed or fstab_entry.changed

    - name: Enable ramdisk services
      systemd:
        name: "{{ item }}"
        enabled: true
      loop:
        - format-ramdisk.service
        - ramdisk-accounts.service

    # ---- apply now if ramdisk not mounted ------------------------------------
    - name: Check if ramdisk is mounted
      command: mountpoint -q {{ ramdisk_mount }}
      register: ramdisk_mounted
      failed_when: false
      changed_when: false

    - name: Format and mount ramdisk now
      shell: |
        mkfs.xfs -f {{ ramdisk_device }}
        mount {{ ramdisk_mount }}
        mkdir -p {{ accounts_dir }}
        chown solana:solana {{ ramdisk_mount }} {{ accounts_dir }}
      when: ramdisk_mounted.rc != 0

    # ---- verify --------------------------------------------------------------
    - name: Verify ramdisk
      command: df -hT {{ ramdisk_mount }}
      register: ramdisk_df
      changed_when: false

    - name: Show ramdisk status
      debug:
        msg: "{{ ramdisk_df.stdout_lines }}"
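The `lineinfile` regexp in the fstab task anchors on device and mount point, so reruns update the mount options in place instead of appending duplicate entries. A quick sketch of the match semantics (the sample fstab lines are hypothetical):

```python
import re

# Same pattern the lineinfile task uses, with the playbook variables substituted.
pattern = re.compile(r'^/dev/ram0\s+/srv/solana/ramdisk')

old_entry = "/dev/ram0 /srv/solana/ramdisk xfs noatime 0 0"
other = "/dev/sda1 / ext4 defaults 0 1"

# An existing ramdisk entry matches and would be rewritten in place...
assert pattern.match(old_entry)
# ...while unrelated fstab lines are left alone.
assert not pattern.match(other)
```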
@ -0,0 +1,220 @@
---
# Recover agave validator from any state to healthy
#
# This playbook is idempotent — it assesses current state and picks up
# from wherever the system is. Each step checks its precondition and
# skips if already satisfied.
#
# Steps:
#   1. Scale deployment to 0
#   2. Wait for pods to terminate
#   3. Wipe accounts ramdisk
#   4. Clean old snapshots
#   5. Download fresh snapshot via aria2c
#   6. Verify snapshot accessible via PV (kubectl)
#   7. Scale deployment to 1
#   8. Wait for pod Running
#   9. Verify validator log shows snapshot unpacking
#   10. Check RPC health
#
# Usage:
#   ansible-playbook -i biscayne.vaasl.io, playbooks/biscayne-recover.yml
#
#   # Pass extra args to snapshot-download.py
#   ansible-playbook -i biscayne.vaasl.io, playbooks/biscayne-recover.yml \
#     -e 'snapshot_args=--version 2.2'
#
- name: Recover agave validator
  hosts: all
  gather_facts: false
  environment:
    KUBECONFIG: /home/rix/.kube/config
  vars:
    kind_cluster: laconic-70ce4c4b47e23b85
    k8s_namespace: "laconic-{{ kind_cluster }}"
    deployment_name: "{{ kind_cluster }}-deployment"
    snapshot_dir: /srv/solana/snapshots
    accounts_dir: /srv/solana/ramdisk/accounts
    ramdisk_mount: /srv/solana/ramdisk
    ramdisk_device: /dev/ram0
    snapshot_script_local: "{{ playbook_dir }}/../scripts/snapshot-download.py"
    snapshot_script: /tmp/snapshot-download.py
    snapshot_args: ""
    # Mainnet RPC for slot comparison
    mainnet_rpc: https://api.mainnet-beta.solana.com
    # Maximum slots behind before snapshot is considered stale
    max_slot_lag: 20000

  tasks:
    # ---- step 1: scale to 0 ---------------------------------------------------
    - name: Get current replica count
      command: >
        kubectl get deployment {{ deployment_name }}
        -n {{ k8s_namespace }}
        -o jsonpath='{.spec.replicas}'
      register: current_replicas
      failed_when: false
      changed_when: false

    - name: Scale deployment to 0
      command: >
        kubectl scale deployment {{ deployment_name }}
        -n {{ k8s_namespace }} --replicas=0
      when: current_replicas.stdout | default('0') | int > 0
      changed_when: true

    # ---- step 2: wait for pods to terminate ------------------------------------
    - name: Wait for pods to terminate
      command: >
        kubectl get pods -n {{ k8s_namespace }}
        -l app={{ deployment_name }}
        -o jsonpath='{.items}'
      register: pods_remaining
      retries: 60
      delay: 5
      until: pods_remaining.stdout == "[]" or pods_remaining.stdout == ""
      changed_when: false
      when: current_replicas.stdout | default('0') | int > 0

    - name: Verify no agave processes in kind node (io_uring safety check)
      command: >
        docker exec {{ kind_cluster }}-control-plane
        pgrep -c agave-validator
      register: agave_procs
      failed_when: false
      changed_when: false

    - name: Fail if agave zombie detected
      ansible.builtin.fail:
        msg: >-
          agave-validator process still running inside kind node after pod
          termination. This is the io_uring/ZFS deadlock. Do NOT proceed —
          host reboot required. See CLAUDE.md.
      when: agave_procs.rc == 0

    # ---- step 3: wipe accounts ramdisk -----------------------------------------
    # Cannot umount+mkfs because the kind node's bind mount holds it open.
    # Instead, delete contents. This is sufficient — agave starts clean.
    - name: Wipe accounts data
      ansible.builtin.shell: |
        rm -rf {{ accounts_dir }}/*
        chown solana:solana {{ ramdisk_mount }} {{ accounts_dir }}
      become: true
      changed_when: true

    # ---- step 4: clean old snapshots -------------------------------------------
    - name: Remove all old snapshots
      ansible.builtin.shell: rm -f {{ snapshot_dir }}/*.tar.* {{ snapshot_dir }}/*.tar
      become: true
      changed_when: true

    # ---- step 5: download fresh snapshot ---------------------------------------
    - name: Verify aria2c installed
      command: which aria2c
      changed_when: false

    - name: Copy snapshot script to remote
      ansible.builtin.copy:
        src: "{{ snapshot_script_local }}"
        dest: "{{ snapshot_script }}"
        mode: "0755"

    - name: Download snapshot and scale to 1
      ansible.builtin.shell: |
        python3 {{ snapshot_script }} \
          -o {{ snapshot_dir }} \
          --max-snapshot-age {{ max_slot_lag }} \
          --max-latency 500 \
          {{ snapshot_args }} \
        && KUBECONFIG=/home/rix/.kube/config kubectl scale deployment \
          {{ deployment_name }} -n {{ k8s_namespace }} --replicas=1
      become: true
      register: snapshot_result
      timeout: 3600
      changed_when: true

    # ---- step 6: verify snapshot accessible via PV -----------------------------
    - name: Get snapshot filename
      ansible.builtin.shell: ls -1 {{ snapshot_dir }}/snapshot-*.tar.* | head -1 | xargs basename
      register: snapshot_filename
      changed_when: false

    - name: Extract snapshot slot from filename
      ansible.builtin.set_fact:
        snapshot_slot: "{{ snapshot_filename.stdout | regex_search('snapshot-([0-9]+)-', '\\1') | first }}"

    - name: Get current mainnet slot
      ansible.builtin.uri:
        url: "{{ mainnet_rpc }}"
        method: POST
        body_format: json
        body:
          jsonrpc: "2.0"
          id: 1
          method: getSlot
          params:
            - commitment: finalized
        return_content: true
      register: mainnet_slot_response

    - name: Check snapshot freshness
      ansible.builtin.fail:
        msg: >-
          Snapshot too old: slot {{ snapshot_slot }}, mainnet at
          {{ mainnet_slot_response.json.result }},
          {{ mainnet_slot_response.json.result | int - snapshot_slot | int }} slots behind
          (max {{ max_slot_lag }}).
      when: (mainnet_slot_response.json.result | int - snapshot_slot | int) > max_slot_lag

    - name: Report snapshot freshness
      ansible.builtin.debug:
        msg: >-
          Snapshot slot {{ snapshot_slot }}, mainnet {{ mainnet_slot_response.json.result }},
          {{ mainnet_slot_response.json.result | int - snapshot_slot | int }} slots behind.

    # ---- step 7: scale already done in download step above ----------------------

    # ---- step 8: wait for pod running ------------------------------------------
    - name: Wait for pod to be running
      command: >
        kubectl get pods -n {{ k8s_namespace }}
        -l app={{ deployment_name }}
        -o jsonpath='{.items[0].status.phase}'
      register: pod_status
      retries: 60
      delay: 10
      until: pod_status.stdout == "Running"
      changed_when: false

    # ---- step 9: verify validator log ------------------------------------------
    - name: Wait for validator log file
      command: >
        kubectl exec -n {{ k8s_namespace }}
        deployment/{{ deployment_name }}
        -c agave-validator -- test -f /data/log/validator.log
      register: log_file_check
      retries: 12
      delay: 10
      until: log_file_check.rc == 0
      changed_when: false

    # ---- step 10: check RPC health ---------------------------------------------
    - name: Check RPC health (non-blocking)
      ansible.builtin.uri:
        url: http://{{ inventory_hostname }}:8899/health
        return_content: true
      register: rpc_health
      retries: 6
      delay: 30
      until: rpc_health.status == 200
      failed_when: false

    - name: Report final status
      ansible.builtin.debug:
        msg: >-
          Recovery complete.
          Snapshot: slot {{ snapshot_slot }}
          ({{ mainnet_slot_response.json.result | int - snapshot_slot | int }} slots behind).
          Pod: {{ pod_status.stdout }}.
          Log: {{ 'writing' if log_file_check.rc == 0 else 'not yet' }}.
          RPC: {{ rpc_health.content | default('not yet responding — still catching up') }}.
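The freshness gate in this playbook boils down to a regex capture on the snapshot filename plus a slot subtraction against mainnet. A minimal sketch of the same logic (the filename and slot numbers below are made up for illustration):

```python
import re

def snapshot_slot(filename: str) -> int:
    # Mirrors the playbook's regex_search('snapshot-([0-9]+)-', '\\1')
    return int(re.search(r"snapshot-([0-9]+)-", filename).group(1))

def is_fresh(filename: str, mainnet_slot: int, max_slot_lag: int = 20000) -> bool:
    # The run fails when the snapshot trails mainnet by more than max_slot_lag.
    return mainnet_slot - snapshot_slot(filename) <= max_slot_lag

# Hypothetical example: snapshot at slot 250000000
name = "snapshot-250000000-AbCdEf.tar.zst"
assert snapshot_slot(name) == 250000000
assert is_fresh(name, 250015000)      # 15000 slots behind: within the limit
assert not is_fresh(name, 250025000)  # 25000 slots behind: stale
```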
@ -0,0 +1,321 @@
---
# Redeploy agave-stack on biscayne with aria2c snapshot pre-download
#
# The validator's built-in downloader fetches snapshots at ~18 MB/s (single
# connection). snapshot-download.py uses aria2c with 16 parallel connections to
# saturate available bandwidth, cutting 90+ min downloads to ~10 min.
#
# Flow:
#   1. [teardown] Delete k8s namespace (preserve kind cluster)
#   2. [wipe]     Conditionally clear ledger / accounts / old snapshots
#   3. [deploy]   laconic-so deployment start, then immediately scale to 0
#   4. [snapshot] Download snapshot via aria2c to host bind mount
#   5. [snapshot] Verify snapshot visible inside kind node
#   6. [deploy]   Scale validator back to 1
#   7. [verify]   Wait for pod Running, check logs + RPC health
#
# The validator cannot run during snapshot download — it would lock/use the
# snapshot files. laconic-so creates the cluster AND deploys the pod in one
# shot, so we scale to 0 immediately after deploy, download, then scale to 1.
#
# Usage:
#   # Standard redeploy (download snapshot, preserve accounts + ledger)
#   ansible-playbook -i biscayne.vaasl.io, playbooks/biscayne-redeploy.yml
#
#   # Full wipe (accounts + ledger) — slow rebuild
#   ansible-playbook -i biscayne.vaasl.io, playbooks/biscayne-redeploy.yml \
#     -e wipe_accounts=true -e wipe_ledger=true
#
#   # Skip snapshot download (use existing)
#   ansible-playbook -i biscayne.vaasl.io, playbooks/biscayne-redeploy.yml \
#     -e skip_snapshot=true
#
#   # Pass extra args to snapshot-download.py
#   ansible-playbook -i biscayne.vaasl.io, playbooks/biscayne-redeploy.yml \
#     -e 'snapshot_args=--version 2.2 --min-download-speed 50'
#
#   # Snapshot only (no teardown/deploy)
#   ansible-playbook -i biscayne.vaasl.io, playbooks/biscayne-redeploy.yml \
#     --tags snapshot
#
- name: Redeploy agave validator on biscayne
  hosts: all
  gather_facts: false
  environment:
    KUBECONFIG: /home/rix/.kube/config
  vars:
    deployment_dir: /srv/deployments/agave
    laconic_so: /home/rix/.local/bin/laconic-so
    kind_cluster: laconic-70ce4c4b47e23b85
    k8s_namespace: "laconic-{{ kind_cluster }}"
    deployment_name: "{{ kind_cluster }}-deployment"
    snapshot_dir: /srv/solana/snapshots
    ledger_dir: /srv/solana/ledger
    accounts_dir: /srv/solana/ramdisk/accounts
    ramdisk_mount: /srv/solana/ramdisk
    ramdisk_device: /dev/ram0
    snapshot_script_local: "{{ playbook_dir }}/../scripts/snapshot-download.py"
    snapshot_script: /tmp/snapshot-download.py
    # Flags — non-destructive by default
    wipe_accounts: false
    wipe_ledger: false
    skip_snapshot: false
    snapshot_args: ""

  tasks:
    # ---- teardown: graceful stop, then delete namespace ----------------------
    #
    # IMPORTANT: Scale to 0 first, wait for agave to exit cleanly.
    # Deleting the namespace while agave is running causes io_uring/ZFS
    # deadlock (unkillable D-state threads). See CLAUDE.md.
    - name: Scale deployment to 0 (graceful stop)
      command: >
        kubectl scale deployment {{ deployment_name }}
        -n {{ k8s_namespace }} --replicas=0
      register: pre_teardown_scale
      failed_when: false
      tags: [teardown]

    - name: Wait for agave to exit
      command: >
        kubectl get pods -n {{ k8s_namespace }}
        -l app={{ deployment_name }}
        -o jsonpath='{.items}'
      register: pre_teardown_pods
      retries: 60
      delay: 5
      until: pre_teardown_pods.stdout == "[]" or pre_teardown_pods.stdout == "" or pre_teardown_pods.rc != 0
      failed_when: false
      when: pre_teardown_scale.rc == 0
      tags: [teardown]

    - name: Delete deployment namespace
      command: >
        kubectl delete namespace {{ k8s_namespace }} --timeout=120s
      register: ns_delete
      failed_when: false
      tags: [teardown]

    - name: Wait for namespace to terminate
      command: >
        kubectl get namespace {{ k8s_namespace }}
        -o jsonpath='{.status.phase}'
      register: ns_status
      retries: 30
      delay: 5
      until: ns_status.rc != 0
      failed_when: false
      when: ns_delete.rc == 0
      tags: [teardown]

    # ---- wipe: opt-in data cleanup ------------------------------------------
    - name: Wipe ledger data
      shell: rm -rf {{ ledger_dir }}/*
      become: true
      when: wipe_ledger | bool
      tags: [wipe]

    - name: Wipe accounts ramdisk (umount + mkfs.xfs + mount)
      shell: |
        mountpoint -q {{ ramdisk_mount }} && umount {{ ramdisk_mount }} || true
        mkfs.xfs -f {{ ramdisk_device }}
        mount {{ ramdisk_mount }}
        mkdir -p {{ accounts_dir }}
        chown solana:solana {{ ramdisk_mount }} {{ accounts_dir }}
      become: true
      when: wipe_accounts | bool
      tags: [wipe]

    - name: Clean old snapshots (keep newest full + incremental)
      shell: |
        cd {{ snapshot_dir }} || exit 0
        newest=$(ls -t snapshot-*.tar.* 2>/dev/null | head -1)
        if [ -n "$newest" ]; then
          newest_inc=$(ls -t incremental-snapshot-*.tar.* 2>/dev/null | head -1)
          find . -maxdepth 1 -name '*.tar.*' \
            ! -name "$newest" \
            ! -name "${newest_inc:-__none__}" \
            -delete
        fi
      become: true
      when: not skip_snapshot | bool
|
||||||
|
tags: [wipe]
|
||||||
|
|
||||||
|
# ---- preflight: verify ramdisk and mounts before deploy ------------------
|
||||||
|
- name: Verify ramdisk is mounted
|
||||||
|
command: mountpoint -q {{ ramdisk_mount }}
|
||||||
|
register: ramdisk_check
|
||||||
|
failed_when: ramdisk_check.rc != 0
|
||||||
|
changed_when: false
|
||||||
|
tags: [deploy, preflight]
|
||||||
|
|
||||||
|
- name: Verify ramdisk is xfs (not the underlying ZFS)
|
||||||
|
shell: df -T {{ ramdisk_mount }} | grep -q xfs
|
||||||
|
register: ramdisk_type
|
||||||
|
failed_when: ramdisk_type.rc != 0
|
||||||
|
changed_when: false
|
||||||
|
tags: [deploy, preflight]
|
||||||
|
|
||||||
|
- name: Verify ramdisk visible inside kind node
|
||||||
|
shell: >
|
||||||
|
docker exec {{ kind_cluster }}-control-plane
|
||||||
|
df -T /mnt/solana/ramdisk 2>/dev/null | grep -q xfs
|
||||||
|
register: kind_ramdisk_check
|
||||||
|
failed_when: kind_ramdisk_check.rc != 0
|
||||||
|
changed_when: false
|
||||||
|
tags: [deploy, preflight]
|
||||||
|
|
||||||
|
# ---- deploy: bring up cluster, scale to 0 immediately -------------------
|
||||||
|
- name: Verify kind-config.yml has unified mount root
|
||||||
|
command: "grep -c 'containerPath: /mnt$' {{ deployment_dir }}/kind-config.yml"
|
||||||
|
register: mount_root_check
|
||||||
|
failed_when: mount_root_check.stdout | int < 1
|
||||||
|
tags: [deploy]
|
||||||
|
|
||||||
|
- name: Start deployment (creates kind cluster + deploys pod)
|
||||||
|
command: "{{ laconic_so }} deployment --dir {{ deployment_dir }} start"
|
||||||
|
timeout: 1200
|
||||||
|
tags: [deploy]
|
||||||
|
|
||||||
|
- name: Wait for deployment to exist
|
||||||
|
command: >
|
||||||
|
kubectl get deployment {{ deployment_name }}
|
||||||
|
-n {{ k8s_namespace }}
|
||||||
|
-o jsonpath='{.metadata.name}'
|
||||||
|
register: deploy_exists
|
||||||
|
retries: 30
|
||||||
|
delay: 10
|
||||||
|
until: deploy_exists.rc == 0
|
||||||
|
tags: [deploy]
|
||||||
|
|
||||||
|
- name: Scale validator to 0 (stop before snapshot download)
|
||||||
|
command: >
|
||||||
|
kubectl scale deployment {{ deployment_name }}
|
||||||
|
-n {{ k8s_namespace }} --replicas=0
|
||||||
|
tags: [deploy]
|
||||||
|
|
||||||
|
- name: Wait for pods to terminate
|
||||||
|
command: >
|
||||||
|
kubectl get pods -n {{ k8s_namespace }}
|
||||||
|
-l app={{ deployment_name }}
|
||||||
|
-o jsonpath='{.items}'
|
||||||
|
register: pods_gone
|
||||||
|
retries: 30
|
||||||
|
delay: 5
|
||||||
|
until: pods_gone.stdout == "[]" or pods_gone.stdout == ""
|
||||||
|
failed_when: false
|
||||||
|
tags: [deploy]
|
||||||
|
|
||||||
|
# ---- snapshot: download via aria2c, verify in kind node ------------------
|
||||||
|
- name: Verify aria2c installed
|
||||||
|
command: which aria2c
|
||||||
|
changed_when: false
|
||||||
|
when: not skip_snapshot | bool
|
||||||
|
tags: [snapshot]
|
||||||
|
|
||||||
|
- name: Copy snapshot script to remote
|
||||||
|
copy:
|
||||||
|
src: "{{ snapshot_script_local }}"
|
||||||
|
dest: "{{ snapshot_script }}"
|
||||||
|
mode: "0755"
|
||||||
|
when: not skip_snapshot | bool
|
||||||
|
tags: [snapshot]
|
||||||
|
|
||||||
|
- name: Verify kind node mounts
|
||||||
|
command: >
|
||||||
|
docker exec {{ kind_cluster }}-control-plane
|
||||||
|
ls /mnt/solana/snapshots/
|
||||||
|
register: kind_mount_check
|
||||||
|
tags: [snapshot]
|
||||||
|
|
||||||
|
- name: Download snapshot via aria2c
|
||||||
|
shell: >
|
||||||
|
python3 {{ snapshot_script }}
|
||||||
|
-o {{ snapshot_dir }}
|
||||||
|
{{ snapshot_args }}
|
||||||
|
become: true
|
||||||
|
register: snapshot_result
|
||||||
|
when: not skip_snapshot | bool
|
||||||
|
timeout: 3600
|
||||||
|
tags: [snapshot]
|
||||||
|
|
||||||
|
- name: Show snapshot download result
|
||||||
|
debug:
|
||||||
|
msg: "{{ snapshot_result.stdout_lines | default(['skipped']) }}"
|
||||||
|
tags: [snapshot]
|
||||||
|
|
||||||
|
- name: Verify snapshot visible inside kind node
|
||||||
|
shell: >
|
||||||
|
docker exec {{ kind_cluster }}-control-plane
|
||||||
|
ls -lhS /mnt/solana/snapshots/*.tar.* 2>/dev/null | head -5
|
||||||
|
register: kind_snapshot_check
|
||||||
|
failed_when: kind_snapshot_check.stdout == ""
|
||||||
|
when: not skip_snapshot | bool
|
||||||
|
tags: [snapshot]
|
||||||
|
|
||||||
|
- name: Show snapshot files in kind node
|
||||||
|
debug:
|
||||||
|
msg: "{{ kind_snapshot_check.stdout_lines | default(['skipped']) }}"
|
||||||
|
when: not skip_snapshot | bool
|
||||||
|
tags: [snapshot]
|
||||||
|
|
||||||
|
# ---- deploy (cont): scale validator back up with snapshot ----------------
|
||||||
|
- name: Scale validator to 1 (start with downloaded snapshot)
|
||||||
|
command: >
|
||||||
|
kubectl scale deployment {{ deployment_name }}
|
||||||
|
-n {{ k8s_namespace }} --replicas=1
|
||||||
|
tags: [deploy]
|
||||||
|
|
||||||
|
# ---- verify: confirm validator is running --------------------------------
|
||||||
|
- name: Wait for pod to be running
|
||||||
|
command: >
|
||||||
|
kubectl get pods -n {{ k8s_namespace }}
|
||||||
|
-o jsonpath='{.items[0].status.phase}'
|
||||||
|
register: pod_status
|
||||||
|
retries: 60
|
||||||
|
delay: 10
|
||||||
|
until: pod_status.stdout == "Running"
|
||||||
|
tags: [verify]
|
||||||
|
|
||||||
|
- name: Verify unified mount inside kind node
|
||||||
|
command: "docker exec {{ kind_cluster }}-control-plane ls /mnt/solana/"
|
||||||
|
register: mount_check
|
||||||
|
tags: [verify]
|
||||||
|
|
||||||
|
- name: Show mount contents
|
||||||
|
debug:
|
||||||
|
msg: "{{ mount_check.stdout_lines }}"
|
||||||
|
tags: [verify]
|
||||||
|
|
||||||
|
- name: Check validator log file is being written
|
||||||
|
command: >
|
||||||
|
kubectl exec -n {{ k8s_namespace }}
|
||||||
|
deployment/{{ deployment_name }}
|
||||||
|
-c agave-validator -- test -f /data/log/validator.log
|
||||||
|
retries: 12
|
||||||
|
delay: 10
|
||||||
|
until: log_file_check.rc == 0
|
||||||
|
register: log_file_check
|
||||||
|
failed_when: false
|
||||||
|
tags: [verify]
|
||||||
|
|
||||||
|
- name: Check RPC health
|
||||||
|
uri:
|
||||||
|
url: http://127.0.0.1:8899/health
|
||||||
|
return_content: true
|
||||||
|
register: rpc_health
|
||||||
|
retries: 6
|
||||||
|
delay: 10
|
||||||
|
until: rpc_health.status == 200
|
||||||
|
failed_when: false
|
||||||
|
delegate_to: "{{ inventory_hostname }}"
|
||||||
|
tags: [verify]
|
||||||
|
|
||||||
|
- name: Report status
|
||||||
|
debug:
|
||||||
|
msg: >-
|
||||||
|
Deployment complete.
|
||||||
|
Log: {{ 'writing' if log_file_check.rc == 0 else 'not yet created' }}.
|
||||||
|
RPC: {{ rpc_health.content | default('not responding') }}.
|
||||||
|
Wiped: ledger={{ wipe_ledger }}, accounts={{ wipe_accounts }}.
|
||||||
|
tags: [verify]
|
||||||
|
|
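The retention rule in the "Clean old snapshots" task above (keep the newest full snapshot plus the newest incremental, delete every other archive) can be exercised locally against scratch files. A minimal sketch; the filenames and timestamps are made up, only the `ls -t` / `find ! -name` pattern comes from the playbook:

```shell
# Stand-in archives in a scratch dir (GNU touch -d sets fake mtimes).
dir=$(mktemp -d) && cd "$dir"
touch -d '2 hours ago' snapshot-100.tar.zst
touch -d '1 hour ago' snapshot-200.tar.zst
touch -d '30 minutes ago' incremental-snapshot-200-210.tar.zst

# Same logic as the task: newest full, newest incremental, drop the rest.
# Note the glob snapshot-* does not match incremental-snapshot-*.
newest=$(ls -t snapshot-*.tar.* 2>/dev/null | head -1)
newest_inc=$(ls -t incremental-snapshot-*.tar.* 2>/dev/null | head -1)
find . -maxdepth 1 -name '*.tar.*' \
  ! -name "$newest" \
  ! -name "${newest_inc:-__none__}" \
  -delete
ls -1 | sort   # snapshot-100 is gone; the newer full + incremental remain
```

The `${newest_inc:-__none__}` fallback matters: with no incremental present, an empty `! -name ""` pattern would otherwise stop `find` from deleting anything that should go.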
@@ -0,0 +1,106 @@
---
# Graceful shutdown of agave validator on biscayne
#
# Scales the deployment to 0 and waits for the pod to terminate.
# This MUST be done before any kind node restart, host reboot,
# or docker operations.
#
# The agave validator uses io_uring for async I/O. On ZFS, killing
# the process ungracefully (SIGKILL, docker kill, etc.) can produce
# unkillable kernel threads stuck in io_wq_put_and_exit, deadlocking
# the container's PID namespace. A graceful SIGTERM via k8s scale-down
# allows agave to flush and close its io_uring contexts cleanly.
#
# Usage:
#   # Stop the validator
#   ansible-playbook -i biscayne.vaasl.io, playbooks/biscayne-stop.yml
#
#   # Stop and restart kind node (LAST RESORT — e.g., broken namespace)
#   # Normally unnecessary: mount propagation means ramdisk/ZFS changes
#   # are visible in the kind node without restarting it.
#   ansible-playbook -i biscayne.vaasl.io, playbooks/biscayne-stop.yml \
#     -e restart_kind=true
#
- name: Graceful validator shutdown
  hosts: all
  gather_facts: false
  environment:
    KUBECONFIG: /home/rix/.kube/config
  vars:
    kind_cluster: laconic-70ce4c4b47e23b85
    k8s_namespace: "laconic-{{ kind_cluster }}"
    deployment_name: "{{ kind_cluster }}-deployment"
    restart_kind: false

  tasks:
    - name: Get current replica count
      command: >
        kubectl get deployment {{ deployment_name }}
        -n {{ k8s_namespace }}
        -o jsonpath='{.spec.replicas}'
      register: current_replicas
      failed_when: false
      changed_when: false

    - name: Scale deployment to 0
      command: >
        kubectl scale deployment {{ deployment_name }}
        -n {{ k8s_namespace }} --replicas=0
      when: current_replicas.stdout | default('0') | int > 0

    - name: Wait for pods to terminate
      command: >
        kubectl get pods -n {{ k8s_namespace }}
        -l app={{ deployment_name }}
        -o jsonpath='{.items}'
      register: pods_gone
      retries: 60
      delay: 5
      until: pods_gone.stdout == "[]" or pods_gone.stdout == ""
      when: current_replicas.stdout | default('0') | int > 0

    - name: Verify no agave processes in kind node
      command: >
        docker exec {{ kind_cluster }}-control-plane
        pgrep -c agave-validator
      register: agave_procs
      failed_when: false
      changed_when: false

    - name: Fail if agave still running
      fail:
        msg: >-
          agave-validator process still running inside kind node after
          pod termination. Do NOT restart the kind node — investigate
          first to avoid io_uring/ZFS deadlock.
      when: agave_procs.rc == 0

    - name: Report stopped
      debug:
        msg: >-
          Validator stopped. Replicas: {{ current_replicas.stdout | default('0') }} -> 0.
          No agave processes detected in kind node.
      when: not restart_kind | bool

    # ---- optional: restart kind node -----------------------------------------
    - name: Restart kind node
      command: docker restart {{ kind_cluster }}-control-plane
      timeout: 120
      when: restart_kind | bool

    - name: Wait for kind node ready
      command: >
        kubectl get node {{ kind_cluster }}-control-plane
        -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
      register: node_ready
      retries: 30
      delay: 10
      until: node_ready.stdout == "True"
      when: restart_kind | bool

    - name: Report restarted
      debug:
        msg: >-
          Kind node restarted and ready.
          Deployment at 0 replicas — scale up when ready.
      when: restart_kind | bool
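The deadlock described in the shutdown playbook's header shows up as processes stuck in state `D` (uninterruptible sleep). A quick way to look for them before touching the kind node is to scan the process table; this helper is illustrative (the `count_dstate` name is not part of the repo), but the `agave-validator` comm name is the one the playbook checks with `pgrep`:

```shell
# Count processes in uninterruptible sleep (state D) whose comm matches $1.
# A non-zero count for agave-validator means the io_uring/ZFS hazard is live:
# do NOT restart the kind node container.
count_dstate() {
  # ps -eo state=,comm= prints "<state> <comm>" for every process
  ps -eo state=,comm= | awk -v name="$1" '$1 ~ /^D/ && $2 == name { n++ } END { print n+0 }'
}
count_dstate agave-validator
```

On a healthy host this prints `0`; any other value is the signal to investigate rather than reboot.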
@@ -0,0 +1,134 @@
---
# Connect biscayne to DoubleZero multicast via laconic-mia-sw01
#
# Establishes a GRE tunnel to the nearest DZ hybrid device and subscribes
# to jito-shredstream and bebop multicast groups.
#
# Usage:
#   ansible-playbook playbooks/connect-doublezero-multicast.yml
#   ansible-playbook playbooks/connect-doublezero-multicast.yml --check  # dry-run

- name: Connect biscayne to DoubleZero multicast
  hosts: biscayne
  gather_facts: false

  vars:
    dz_multicast_groups:
      - jito-shredstream
      - bebop

  tasks:
    # ------------------------------------------------------------------
    # Pre-checks
    # ------------------------------------------------------------------
    - name: Verify doublezerod service is running
      ansible.builtin.systemd:
        name: doublezerod
        state: started
      check_mode: true
      register: dz_service
      failed_when: dz_service.status.ActiveState != "active"

    - name: Get doublezero identity address
      ansible.builtin.command:
        cmd: doublezero address
      register: dz_address
      changed_when: false

    - name: Verify doublezero identity matches expected pubkey
      ansible.builtin.assert:
        that:
          - dz_address.stdout | trim == dz_identity
        fail_msg: >-
          DZ identity mismatch: got '{{ dz_address.stdout | trim }}',
          expected '{{ dz_identity }}'

    - name: Check current DZ connection status
      ansible.builtin.command:
        cmd: "doublezero -e {{ dz_environment }} status"
      register: dz_status
      changed_when: false
      failed_when: false

    - name: Fail if already connected (tunnel is up)
      ansible.builtin.fail:
        msg: >-
          DoubleZero tunnel is already connected. To reconnect, first
          disconnect manually with: doublezero -e {{ dz_environment }} disconnect
      when: "'connected' in dz_status.stdout | lower"

    # ------------------------------------------------------------------
    # Create access pass
    # ------------------------------------------------------------------
    - name: Create DZ access pass for multicast subscriber
      ansible.builtin.command:
        cmd: >-
          doublezero -e {{ dz_environment }} access-pass set
          --accesspass-type solana-multicast-subscriber
          --client-ip {{ client_ip }}
          --user-payer {{ dz_identity }}
          --solana-validator {{ validator_identity }}
          --tenant {{ dz_tenant }}
      register: dz_access_pass
      changed_when: "'created' in dz_access_pass.stdout | lower or 'updated' in dz_access_pass.stdout | lower"

    - name: Show access pass result
      ansible.builtin.debug:
        var: dz_access_pass.stdout_lines

    # ------------------------------------------------------------------
    # Connect to DZ multicast
    # ------------------------------------------------------------------
    - name: Connect to DoubleZero multicast via {{ dz_device }}
      ansible.builtin.command:
        cmd: >-
          doublezero -e {{ dz_environment }} connect multicast
          {% for group in dz_multicast_groups %}
          --subscribe {{ group }}
          {% endfor %}
          --device {{ dz_device }}
          --client-ip {{ client_ip }}
      register: dz_connect
      changed_when: true

    - name: Show connect result
      ansible.builtin.debug:
        var: dz_connect.stdout_lines

    # ------------------------------------------------------------------
    # Post-checks
    # ------------------------------------------------------------------
    - name: Verify tunnel status is connected
      ansible.builtin.command:
        cmd: "doublezero -e {{ dz_environment }} status"
      register: dz_post_status
      changed_when: false
      failed_when: "'connected' not in dz_post_status.stdout | lower"

    - name: Show tunnel status
      ansible.builtin.debug:
        var: dz_post_status.stdout_lines

    - name: Verify routes are installed
      ansible.builtin.command:
        cmd: "doublezero -e {{ dz_environment }} routes"
      register: dz_routes
      changed_when: false

    - name: Show installed routes
      ansible.builtin.debug:
        var: dz_routes.stdout_lines

    - name: Check multicast group membership
      ansible.builtin.command:
        cmd: "doublezero -e {{ dz_environment }} status"
      register: dz_multicast_status
      changed_when: false

    - name: Connection summary
      ansible.builtin.debug:
        msg: >-
          DoubleZero multicast connected via {{ dz_device }}.
          Subscribed groups: {{ dz_multicast_groups | join(', ') }}.
          Next step: request allowlist access from group owners
          (see docs/doublezero-multicast-access.md).
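The `{% for %}` block inside the connect task's `cmd:` expands to one `--subscribe` flag per entry in `dz_multicast_groups` before the command runs. A sketch of the same expansion in plain shell (the `<dz_device>` / `<client_ip>` placeholders stand in for inventory variables):

```shell
# Mirror the Jinja loop: one --subscribe flag per multicast group.
groups="jito-shredstream bebop"
flags=""
for g in $groups; do
  flags="$flags --subscribe $g"
done
echo "doublezero connect multicast$flags --device <dz_device> --client-ip <client_ip>"
```

So with the default vars the CLI receives `--subscribe jito-shredstream --subscribe bebop`, one flag pair per group rather than a single comma-joined list.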
@@ -0,0 +1,18 @@
#!/bin/bash
# /etc/network/if-up.d/ashburn-routing
# Restore policy routing for Ashburn validator relay after reboot/interface up.
# Only act when doublezero0 comes up.

[ "$IFACE" = "doublezero0" ] || exit 0

# Ensure rt_tables entry exists
grep -q '^100 ashburn$' /etc/iproute2/rt_tables || echo "100 ashburn" >> /etc/iproute2/rt_tables

# Add policy rule (idempotent — ip rule skips duplicates silently on some kernels)
ip rule show | grep -q 'fwmark 0x64 lookup ashburn' || ip rule add fwmark 100 table ashburn

# Add default route via mia-sw01 through doublezero0 tunnel
ip route replace default via 169.254.7.6 dev doublezero0 table ashburn

# Add Ashburn IP to loopback (idempotent)
ip addr show lo | grep -q '137.239.194.65' || ip addr add 137.239.194.65/32 dev lo
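One subtlety in the script above: the rule is added with a decimal mark (`fwmark 100`), but `ip rule show` prints marks in hex, which is why the idempotency grep looks for `fwmark 0x64` (100 == 0x64). A self-contained sketch of that check; the `sample` rule listing is fabricated for illustration, not captured from mia-sw01:

```shell
# Decimal 100 renders as 0x64 in `ip rule show` output.
printf 'mark=0x%x\n' 100    # prints mark=0x64

# The same grep the if-up.d script uses, run against sample output
# instead of the live kernel, so it can be tested anywhere.
sample='0:	from all lookup local
32765:	from all fwmark 0x64 lookup ashburn
32766:	from all lookup main'
echo "$sample" | grep -q 'fwmark 0x64 lookup ashburn' && echo "rule present"
```

Grepping for the decimal form would always miss, so the script would re-add the rule on every interface bounce and accumulate duplicates on kernels that allow them.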
@@ -0,0 +1,166 @@
---
# Verify PV hostPaths match expected kind-node paths, fix if wrong.
#
# Checks each PV's hostPath against the expected path derived from the
# spec.yml volume mapping through the kind extraMounts. If any PV has a
# wrong path, fails unless -e fix=true is passed.
#
# Does NOT touch the deployment.
#
# Usage:
#   # Check only (fails if mounts are bad)
#   ansible-playbook -i biscayne.vaasl.io, playbooks/fix-pv-mounts.yml
#
#   # Fix stale PVs
#   ansible-playbook -i biscayne.vaasl.io, playbooks/fix-pv-mounts.yml -e fix=true
#
- name: Verify and fix PV mount paths
  hosts: all
  gather_facts: false
  environment:
    KUBECONFIG: /home/rix/.kube/config
  vars:
    kind_cluster: laconic-70ce4c4b47e23b85
    k8s_namespace: "laconic-{{ kind_cluster }}"
    fix: false
    volumes:
      - name: validator-snapshots
        host_path: /mnt/solana/snapshots
        capacity: 200Gi
      - name: validator-ledger
        host_path: /mnt/solana/ledger
        capacity: 2Ti
      - name: validator-accounts
        host_path: /mnt/solana/ramdisk/accounts
        capacity: 800Gi
      - name: validator-log
        host_path: /mnt/solana/log
        capacity: 10Gi

  tasks:
    - name: Read current PV hostPaths
      command: >
        kubectl get pv {{ kind_cluster }}-{{ item.name }}
        -o jsonpath='{.spec.hostPath.path}'
      register: current_paths
      loop: "{{ volumes }}"
      failed_when: false
      changed_when: false

    # Accumulate across iterations — a plain assignment here would be
    # overwritten on every loop pass and only reflect the last volume.
    - name: Build path comparison
      set_fact:
        path_mismatches: "{{ (path_mismatches | default([])) + ([item.item.name] if item.stdout != '' and item.stdout != item.item.host_path else []) }}"
        path_missing: "{{ (path_missing | default([])) + ([item.item.name] if item.stdout == '' else []) }}"
      loop: "{{ current_paths.results }}"
      loop_control:
        label: "{{ item.item.name }}"

    - name: Show current vs expected paths
      debug:
        msg: >-
          {{ item.item.name }}:
          current={{ item.stdout if item.stdout else 'NOT FOUND' }}
          expected={{ item.item.host_path }}
          {{ 'OK' if item.stdout == item.item.host_path else 'NEEDS FIX' }}
      loop: "{{ current_paths.results }}"
      loop_control:
        label: "{{ item.item.name }}"

    - name: Check for mismatched PVs
      fail:
        msg: >-
          PV {{ item.item.name }} has wrong hostPath:
          {{ item.stdout if item.stdout else 'NOT FOUND' }}
          (expected {{ item.item.host_path }}).
          Run with -e fix=true to delete and recreate.
      when: item.stdout != item.item.host_path and not fix | bool
      loop: "{{ current_paths.results }}"
      loop_control:
        label: "{{ item.item.name }}"

    # ---- Fix mode ---------------------------------------------------------
    - name: Delete stale PVCs
      command: >
        kubectl delete pvc {{ kind_cluster }}-{{ item.item.name }}
        -n {{ k8s_namespace }} --timeout=60s
      when: fix | bool and item.stdout != item.item.host_path
      loop: "{{ current_paths.results }}"
      loop_control:
        label: "{{ item.item.name }}"
      failed_when: false

    - name: Delete stale PVs
      command: >
        kubectl delete pv {{ kind_cluster }}-{{ item.item.name }}
        --timeout=60s
      when: fix | bool and item.stdout != item.item.host_path
      loop: "{{ current_paths.results }}"
      loop_control:
        label: "{{ item.item.name }}"
      failed_when: false

    - name: Create PVs with correct hostPaths
      command: >
        kubectl apply -f -
      args:
        stdin: |
          apiVersion: v1
          kind: PersistentVolume
          metadata:
            name: {{ kind_cluster }}-{{ item.item.name }}
          spec:
            capacity:
              storage: {{ item.item.capacity }}
            accessModes:
              - ReadWriteOnce
            persistentVolumeReclaimPolicy: Retain
            storageClassName: manual
            hostPath:
              path: {{ item.item.host_path }}
      when: fix | bool and item.stdout != item.item.host_path
      loop: "{{ current_paths.results }}"
      loop_control:
        label: "{{ item.item.name }}"

    - name: Create PVCs
      command: >
        kubectl apply -f -
      args:
        stdin: |
          apiVersion: v1
          kind: PersistentVolumeClaim
          metadata:
            name: {{ kind_cluster }}-{{ item.item.name }}
            namespace: {{ k8s_namespace }}
          spec:
            accessModes:
              - ReadWriteOnce
            storageClassName: manual
            volumeName: {{ kind_cluster }}-{{ item.item.name }}
            resources:
              requests:
                storage: {{ item.item.capacity }}
      when: fix | bool and item.stdout != item.item.host_path
      loop: "{{ current_paths.results }}"
      loop_control:
        label: "{{ item.item.name }}"

    # ---- Final verify -----------------------------------------------------
    - name: Verify PV paths
      command: >
        kubectl get pv {{ kind_cluster }}-{{ item.name }}
        -o jsonpath='{.spec.hostPath.path}'
      register: final_paths
      loop: "{{ volumes }}"
      changed_when: false
      when: fix | bool

    - name: Assert all PV paths correct
      assert:
        that: item.stdout == item.item.host_path
        fail_msg: "{{ item.item.name }}: {{ item.stdout }} != {{ item.item.host_path }}"
        success_msg: "{{ item.item.name }}: {{ item.stdout }} OK"
      loop: "{{ final_paths.results }}"
      loop_control:
        label: "{{ item.item.name }}"
      when: fix | bool
@@ -0,0 +1,340 @@
---
# Health check for biscayne agave-stack deployment
#
# Gathers system, validator, DoubleZero, and network status in a single run.
# All tasks are read-only — safe to run at any time.
#
# Usage:
#   ansible-playbook playbooks/health-check.yml
#   ansible-playbook playbooks/health-check.yml -t validator   # just validator checks
#   ansible-playbook playbooks/health-check.yml -t doublezero  # just DZ checks
#   ansible-playbook playbooks/health-check.yml -t network     # just network checks

- name: Biscayne agave-stack health check
  hosts: biscayne
  gather_facts: false

  tasks:
    # ------------------------------------------------------------------
    # Discover kind cluster and namespace
    # ------------------------------------------------------------------
    - name: Get kind cluster name
      ansible.builtin.command:
        cmd: kind get clusters
      register: kind_clusters
      changed_when: false
      failed_when: kind_clusters.rc != 0 or kind_clusters.stdout_lines | length == 0

    - name: Set cluster name fact
      ansible.builtin.set_fact:
        kind_cluster: "{{ kind_clusters.stdout_lines[0] }}"

    - name: Discover agave namespace
      ansible.builtin.shell:
        cmd: >-
          set -o pipefail &&
          kubectl get namespaces --no-headers -o custom-columns=':metadata.name'
          | grep '^laconic-'
        executable: /bin/bash
      register: ns_result
      changed_when: false
      failed_when: ns_result.stdout_lines | length == 0

    - name: Set namespace fact
      ansible.builtin.set_fact:
        agave_ns: "{{ ns_result.stdout_lines[0] }}"

    - name: Get pod name
      ansible.builtin.shell:
        cmd: >-
          set -o pipefail &&
          kubectl get pods -n {{ agave_ns }} --no-headers
          -o custom-columns=':metadata.name' | head -1
        executable: /bin/bash
      register: pod_result
      changed_when: false
      failed_when: pod_result.stdout | trim == ''

    - name: Set pod fact
      ansible.builtin.set_fact:
        agave_pod: "{{ pod_result.stdout | trim }}"

    - name: Show discovered resources
      ansible.builtin.debug:
        msg: "cluster={{ kind_cluster }} ns={{ agave_ns }} pod={{ agave_pod }}"

    # ------------------------------------------------------------------
    # Pod status
    # ------------------------------------------------------------------
    - name: Get pod status
      ansible.builtin.command:
        cmd: kubectl get pods -n {{ agave_ns }} -o wide
      register: pod_status
      changed_when: false
      tags: [validator]

    - name: Show pod status
      ansible.builtin.debug:
        var: pod_status.stdout_lines
      tags: [validator]

    - name: Get container restart counts
      ansible.builtin.shell:
        cmd: >-
          kubectl get pod {{ agave_pod }} -n {{ agave_ns }}
          -o jsonpath='{range .status.containerStatuses[*]}{.name}{" restarts="}{.restartCount}{" ready="}{.ready}{"\n"}{end}'
      register: restart_counts
      changed_when: false
      tags: [validator]

    - name: Show restart counts
      ansible.builtin.debug:
        var: restart_counts.stdout_lines
      tags: [validator]

    # ------------------------------------------------------------------
    # Validator sync status
    # ------------------------------------------------------------------
    - name: Get validator recent logs (replay progress)
      ansible.builtin.command:
        cmd: >-
          kubectl logs -n {{ agave_ns }} {{ agave_pod }}
          -c agave-validator --tail=30
      register: validator_logs
      changed_when: false
      tags: [validator]

    - name: Show validator logs
      ansible.builtin.debug:
        var: validator_logs.stdout_lines
      tags: [validator]

    - name: Check RPC health endpoint
      ansible.builtin.uri:
        url: http://127.0.0.1:8899/health
        method: GET
        return_content: true
        timeout: 5
      register: rpc_health
      failed_when: false
      tags: [validator]

    - name: Show RPC health
      ansible.builtin.debug:
        msg: "RPC health: {{ rpc_health.status | default('unreachable') }} — {{ rpc_health.content | default('no response') }}"
      tags: [validator]

    - name: Get validator version
      ansible.builtin.shell:
        cmd: >-
          kubectl exec -n {{ agave_ns }} {{ agave_pod }}
          -c agave-validator -- agave-validator --version 2>&1 || true
      register: validator_version
      changed_when: false
      tags: [validator]

    - name: Show validator version
      ansible.builtin.debug:
        var: validator_version.stdout
      tags: [validator]

    # ------------------------------------------------------------------
    # DoubleZero status
    # ------------------------------------------------------------------
    - name: Get host DZ identity
      ansible.builtin.command:
        cmd: sudo -u solana doublezero address
      register: dz_address
      changed_when: false
      failed_when: false
      tags: [doublezero]

    - name: Get host DZ tunnel status
      ansible.builtin.command:
        cmd: sudo -u solana doublezero -e {{ dz_environment }} status
      register: dz_status
      changed_when: false
      failed_when: false
      tags: [doublezero]

    - name: Get DZ routes
      ansible.builtin.shell:
        cmd: set -o pipefail && ip route | grep doublezero0 || echo "no doublezero0 routes"
        executable: /bin/bash
      register: dz_routes
      changed_when: false
      tags: [doublezero]

    - name: Get host doublezerod service state
      ansible.builtin.systemd:
        name: doublezerod
      register: dz_systemd_info
      failed_when: false
      check_mode: true
      tags: [doublezero]

    - name: Set DZ systemd state
      ansible.builtin.set_fact:
        dz_systemd_state: "{{ dz_systemd_info.status.ActiveState | default('unknown') }}"
      tags: [doublezero]

    - name: Get container DZ status
      ansible.builtin.shell:
        cmd: >-
          kubectl exec -n {{ agave_ns }} {{ agave_pod }}
          -c doublezerod -- doublezero status 2>&1 || echo "container DZ unavailable"
      register: dz_container_status
      changed_when: false
      tags: [doublezero]

    - name: Show DoubleZero status
      ansible.builtin.debug:
        msg:
          identity: "{{ dz_address.stdout | default('unknown') }}"
          host_tunnel: "{{ dz_status.stdout_lines | default(['unknown']) }}"
          host_systemd: "{{ dz_systemd_state }}"
          container: "{{ dz_container_status.stdout_lines | default(['unknown']) }}"
          routes: "{{ dz_routes.stdout_lines | default([]) }}"
      tags: [doublezero]

    # ------------------------------------------------------------------
    # Storage
    # ------------------------------------------------------------------
    - name: Check ramdisk usage
      ansible.builtin.command:
        cmd: df -h /srv/solana/ramdisk
|
||||||
|
register: ramdisk_df
|
||||||
|
changed_when: false
|
||||||
|
failed_when: false
|
||||||
|
tags: [storage]
|
||||||
|
|
||||||
|
- name: Check ZFS dataset usage
|
||||||
|
ansible.builtin.command:
|
||||||
|
cmd: zfs list -o name,used,avail,mountpoint -r biscayne/DATA
|
||||||
|
register: zfs_list
|
||||||
|
changed_when: false
|
||||||
|
tags: [storage]
|
||||||
|
|
||||||
|
- name: Check ZFS zvol I/O
|
||||||
|
ansible.builtin.shell:
|
||||||
|
cmd: set -o pipefail && iostat -x zd0 1 2 | tail -3
|
||||||
|
executable: /bin/bash
|
||||||
|
register: zvol_io
|
||||||
|
changed_when: false
|
||||||
|
failed_when: false
|
||||||
|
tags: [storage]
|
||||||
|
|
||||||
|
- name: Show storage status
|
||||||
|
ansible.builtin.debug:
|
||||||
|
msg:
|
||||||
|
ramdisk: "{{ ramdisk_df.stdout_lines | default(['not mounted']) }}"
|
||||||
|
zfs: "{{ zfs_list.stdout_lines | default([]) }}"
|
||||||
|
zvol_io: "{{ zvol_io.stdout_lines | default([]) }}"
|
||||||
|
tags: [storage]
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# System resources
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
- name: Check memory
|
||||||
|
ansible.builtin.command:
|
||||||
|
cmd: free -h
|
||||||
|
register: mem
|
||||||
|
changed_when: false
|
||||||
|
tags: [system]
|
||||||
|
|
||||||
|
- name: Check load average
|
||||||
|
ansible.builtin.command:
|
||||||
|
cmd: cat /proc/loadavg
|
||||||
|
register: loadavg
|
||||||
|
changed_when: false
|
||||||
|
tags: [system]
|
||||||
|
|
||||||
|
- name: Check swap
|
||||||
|
ansible.builtin.command:
|
||||||
|
cmd: swapon --show
|
||||||
|
register: swap
|
||||||
|
changed_when: false
|
||||||
|
failed_when: false
|
||||||
|
tags: [system]
|
||||||
|
|
||||||
|
- name: Show system resources
|
||||||
|
ansible.builtin.debug:
|
||||||
|
msg:
|
||||||
|
memory: "{{ mem.stdout_lines }}"
|
||||||
|
load: "{{ loadavg.stdout }}"
|
||||||
|
swap: "{{ swap.stdout | default('none') }}"
|
||||||
|
tags: [system]
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Network / shred throughput
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
- name: Count shred packets per interface (5 sec sample)
|
||||||
|
ansible.builtin.shell:
|
||||||
|
cmd: |
|
||||||
|
set -o pipefail
|
||||||
|
for iface in eno1 doublezero0; do
|
||||||
|
count=$(timeout 5 tcpdump -i "$iface" -nn 'udp dst portrange 9000-10000' -q 2>&1 | grep -oP '\d+(?= packets captured)' || echo 0)
|
||||||
|
echo "$iface: $count packets/5s"
|
||||||
|
done
|
||||||
|
executable: /bin/bash
|
||||||
|
register: shred_counts
|
||||||
|
changed_when: false
|
||||||
|
failed_when: false
|
||||||
|
tags: [network]
|
||||||
|
|
||||||
|
- name: Check interface throughput
|
||||||
|
ansible.builtin.shell:
|
||||||
|
cmd: >-
|
||||||
|
set -o pipefail &&
|
||||||
|
grep -E 'eno1|doublezero0' /proc/net/dev
|
||||||
|
| awk '{printf "%s rx=%s tx=%s\n", $1, $2, $10}'
|
||||||
|
executable: /bin/bash
|
||||||
|
register: iface_stats
|
||||||
|
changed_when: false
|
||||||
|
tags: [network]
|
||||||
|
|
||||||
|
- name: Check gossip/repair port connections
|
||||||
|
ansible.builtin.shell:
|
||||||
|
cmd: >-
|
||||||
|
set -o pipefail &&
|
||||||
|
ss -tupn | grep -E ':8001|:900[0-9]' | head -20 || echo "no connections"
|
||||||
|
executable: /bin/bash
|
||||||
|
register: gossip_ports
|
||||||
|
changed_when: false
|
||||||
|
tags: [network]
|
||||||
|
|
||||||
|
- name: Check iptables DNAT rule (TVU shred relay)
|
||||||
|
ansible.builtin.shell:
|
||||||
|
cmd: >-
|
||||||
|
set -o pipefail &&
|
||||||
|
iptables -t nat -L PREROUTING -v -n | grep -E '64.92.84.81|20000' || echo "no DNAT rule"
|
||||||
|
executable: /bin/bash
|
||||||
|
register: dnat_rule
|
||||||
|
changed_when: false
|
||||||
|
tags: [network]
|
||||||
|
|
||||||
|
- name: Show network status
|
||||||
|
ansible.builtin.debug:
|
||||||
|
msg:
|
||||||
|
shred_counts: "{{ shred_counts.stdout_lines | default([]) }}"
|
||||||
|
interfaces: "{{ iface_stats.stdout_lines | default([]) }}"
|
||||||
|
gossip_ports: "{{ gossip_ports.stdout_lines | default([]) }}"
|
||||||
|
tvu_dnat: "{{ dnat_rule.stdout_lines | default([]) }}"
|
||||||
|
tags: [network]
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Summary
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
- name: Health check summary
|
||||||
|
ansible.builtin.debug:
|
||||||
|
msg: |
|
||||||
|
=== Biscayne Health Check ===
|
||||||
|
Cluster: {{ kind_cluster }}
|
||||||
|
Namespace: {{ agave_ns }}
|
||||||
|
Pod: {{ agave_pod }}
|
||||||
|
RPC: {{ rpc_health.status | default('unreachable') }}
|
||||||
|
DZ identity: {{ dz_address.stdout | default('unknown') | trim }}
|
||||||
|
DZ tunnel: {{ 'UP' if dz_status.rc | default(1) == 0 else 'DOWN' }}
|
||||||
|
DZ systemd: {{ dz_systemd_state }}
|
||||||
|
Ramdisk: {{ ramdisk_df.stdout_lines[-1] | default('unknown') }}
|
||||||
|
Load: {{ loadavg.stdout | default('unknown') }}
|
||||||
|
|
@ -0,0 +1,98 @@
#!/bin/bash
# Check shred completeness at the tip of the blockstore.
#
# Samples the most recent N slots and reports how many are full.
# Use this to determine when enough complete blocks have accumulated
# to safely download a new snapshot that lands within the complete range.
#
# Usage: kubectl exec ... -- bash -c "$(cat check-shred-completeness.sh)"
# Or: ssh biscayne ... 'KUBECONFIG=... kubectl exec ... -- agave-ledger-tool ...'

set -euo pipefail

KUBECONFIG="${KUBECONFIG:-/home/rix/.kube/config}"
NS="laconic-laconic-70ce4c4b47e23b85"
DEPLOY="laconic-70ce4c4b47e23b85-deployment"
SAMPLE_SIZE="${1:-200}"

# Get blockstore bounds
BOUNDS=$(kubectl exec -n "$NS" deployment/"$DEPLOY" -c agave-validator -- \
  agave-ledger-tool -l /data/ledger blockstore bounds 2>&1 | grep "^Ledger")

HIGHEST=$(echo "$BOUNDS" | grep -oP 'to \K[0-9]+')
START=$((HIGHEST - SAMPLE_SIZE))

echo "Blockstore highest slot: $HIGHEST"
echo "Sampling slots $START to $HIGHEST ($SAMPLE_SIZE slots)"
echo ""

# Get slot metadata
OUTPUT=$(kubectl exec -n "$NS" deployment/"$DEPLOY" -c agave-validator -- \
  agave-ledger-tool -l /data/ledger blockstore print \
  --starting-slot "$START" --ending-slot "$HIGHEST" 2>&1 \
  | grep -E "^Slot|is_full")

TOTAL=$(echo "$OUTPUT" | grep -c "^Slot" || true)
FULL=$(echo "$OUTPUT" | grep -c "is_full: true" || true)
INCOMPLETE=$(echo "$OUTPUT" | grep -c "is_full: false" || true)

echo "Total slots with data: $TOTAL / $SAMPLE_SIZE"
echo "Complete (is_full: true): $FULL"
echo "Incomplete (is_full: false): $INCOMPLETE"

if [ "$TOTAL" -gt 0 ]; then
  PCT=$((FULL * 100 / TOTAL))
  echo "Completeness: ${PCT}%"
else
  echo "Completeness: N/A (no data)"
fi

echo ""

# Find the first full slot counting backward from the tip
# This tells us where the contiguous complete run starts
echo "--- Contiguous complete run from tip ---"

# Get just the slot numbers and is_full in reverse order
REVERSED=$(echo "$OUTPUT" | paste - - | awk '{
  slot = $2;
  full = ($NF == "true") ? 1 : 0;
  print slot, full
}' | sort -rn)

CONTIGUOUS=0
FIRST_FULL=""
while IFS=' ' read -r slot full; do
  if [ "$full" -eq 1 ]; then
    CONTIGUOUS=$((CONTIGUOUS + 1))
    FIRST_FULL="$slot"
  else
    break
  fi
done <<< "$REVERSED"

if [ -n "$FIRST_FULL" ]; then
  echo "Contiguous complete slots from tip: $CONTIGUOUS"
  echo "Run starts at slot: $FIRST_FULL"
  echo "Run ends at slot: $HIGHEST"
  echo ""
  echo "A snapshot with slot >= $FIRST_FULL would replay from local blockstore."

  # Check against mainnet
  MAINNET_SLOT=$(curl -s -X POST -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","id":1,"method":"getSlot","params":[{"commitment":"finalized"}]}' \
    https://api.mainnet-beta.solana.com | grep -oP '"result":\K[0-9]+')

  GAP=$((MAINNET_SLOT - HIGHEST))
  echo "Mainnet tip: $MAINNET_SLOT (blockstore is $GAP slots behind tip)"

  if [ "$CONTIGUOUS" -gt 100 ]; then
    echo ""
    echo ">>> READY: $CONTIGUOUS contiguous complete slots. Safe to download a snapshot."
  else
    echo ""
    echo ">>> NOT READY: Only $CONTIGUOUS contiguous complete slots. Wait for more."
  fi
else
  echo "No contiguous complete run from tip found."
fi

@ -0,0 +1,38 @@
#!/bin/bash
# Run a command in a tmux pane and capture its output.
# User sees it streaming in the pane; caller gets stdout back.
#
# Usage: pane-exec.sh <pane-id> <command...>
# Example: pane-exec.sh %6565 ansible-playbook -i inventory/switches.yml playbooks/foo.yml

set -euo pipefail

PANE="$1"
shift
CMD="$*"

TMPFILE=$(mktemp /tmp/pane-output.XXXXXX)
MARKER="__PANE_EXEC_DONE_${RANDOM}_$$__"

cleanup() {
  tmux pipe-pane -t "$PANE" 2>/dev/null || true
  rm -f "$TMPFILE"
}
trap cleanup EXIT

# Start capturing pane output
tmux pipe-pane -o -t "$PANE" "cat >> $TMPFILE"

# Send the command, then echo a marker so we know when it's done
tmux send-keys -t "$PANE" "$CMD; echo $MARKER" Enter

# Wait for the marker
while ! grep -q "$MARKER" "$TMPFILE" 2>/dev/null; do
  sleep 0.5
done

# Stop capturing
tmux pipe-pane -t "$PANE"

# Strip ANSI escape codes, remove the marker line, output the rest
sed 's/\x1b\[[0-9;]*[a-zA-Z]//g; s/\x1b\[[?][0-9]*[a-zA-Z]//g' "$TMPFILE" | grep -v "$MARKER"

@ -0,0 +1,151 @@
import { chromium } from 'playwright';
import { writeFileSync, mkdirSync } from 'fs';
import { join } from 'path';

const OUT_DIR = join(import.meta.dirname, '..', 'docs', 'arista-scraped');
mkdirSync(OUT_DIR, { recursive: true });

const pages = [
  { url: 'https://www.arista.com/en/um-eos/eos-static-inter-vrf-route', file: 'static-inter-vrf-route.md' },
  { url: 'https://www.arista.com/en/um-eos/eos-inter-vrf-local-route-leaking', file: 'inter-vrf-local-route-leaking.md' },
  { url: 'https://www.arista.com/en/um-eos/eos-policy-based-routing', file: 'policy-based-routing.md' },
  { url: 'https://www.arista.com/en/um-eos/eos-traffic-management', file: 'traffic-management.md' },
  { url: 'https://www.arista.com/en/um-eos/eos-policy-based-routing-pbr', file: 'pbr.md' },
  { url: 'https://www.arista.com/en/um-eos/eos-configuring-vrf-instances', file: 'configuring-vrf.md' },
  { url: 'https://www.arista.com/en/um-eos/eos-gre-tunnels', file: 'gre-tunnels.md' },
  { url: 'https://www.arista.com/en/um-eos/eos-access-control-lists', file: 'access-control-lists.md' },
  { url: 'https://www.arista.com/en/um-eos/eos-static-routes', file: 'static-routes.md' },
  { url: 'https://www.arista.com/en/um-eos/eos-configuration-sessions', file: 'configuration-sessions.md' },
  { url: 'https://www.arista.com/en/um-eos/eos-checkpoint-and-rollback', file: 'checkpoint-rollback.md' },
  { url: 'https://www.arista.com/en/um-eos', file: '_index.md' },
];

async function scrapePage(page, url, filename) {
  console.log(`Scraping: ${url}`);
  try {
    const resp = await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
    console.log(` Status: ${resp.status()}`);

    // Wait for JS to render
    await page.waitForTimeout(8000);

    // Check for CAPTCHA
    const bodyText = await page.evaluate(() => document.body.innerText.substring(0, 200));
    if (bodyText.includes('CAPTCHA') || bodyText.includes("couldn't load")) {
      console.log(` BLOCKED by CAPTCHA/anti-bot on ${url}`);
      writeFileSync(join(OUT_DIR, filename), `# BLOCKED BY CAPTCHA\n\nURL: ${url}\nThe Arista docs site requires CAPTCHA verification for headless browsers.\n`);
      return false;
    }

    // Extract content
    const content = await page.evaluate(() => {
      const selectors = [
        '#content', '.article-content', '.content-area', '#main-content',
        'article', '.item-page', '#sp-component', '.com-content-article',
        'main', '#sp-main-body',
      ];

      let el = null;
      for (const sel of selectors) {
        el = document.querySelector(sel);
        if (el && el.textContent.trim().length > 100) break;
      }
      if (!el) el = document.body;

      function nodeToMd(node) {
        if (node.nodeType === Node.TEXT_NODE) return node.textContent;
        if (node.nodeType !== Node.ELEMENT_NODE) return '';
        const tag = node.tagName.toLowerCase();
        if (['nav', 'footer', 'script', 'style', 'noscript', 'iframe'].includes(tag)) return '';
        if (node.classList && (node.classList.contains('nav') || node.classList.contains('sidebar') ||
            node.classList.contains('menu') || node.classList.contains('footer') ||
            node.classList.contains('header'))) return '';
        let children = Array.from(node.childNodes).map(c => nodeToMd(c)).join('');
        switch (tag) {
          case 'h1': return `\n# ${children.trim()}\n\n`;
          case 'h2': return `\n## ${children.trim()}\n\n`;
          case 'h3': return `\n### ${children.trim()}\n\n`;
          case 'h4': return `\n#### ${children.trim()}\n\n`;
          case 'p': return `\n${children.trim()}\n\n`;
          case 'br': return '\n';
          case 'li': return `- ${children.trim()}\n`;
          case 'ul': case 'ol': return `\n${children}\n`;
          case 'pre': return `\n\`\`\`\n${children.trim()}\n\`\`\`\n\n`;
          case 'code': return `\`${children.trim()}\``;
          case 'strong': case 'b': return `**${children.trim()}**`;
          case 'em': case 'i': return `*${children.trim()}*`;
          case 'table': return `\n${children}\n`;
          case 'tr': return `${children}|\n`;
          case 'th': case 'td': return `| ${children.trim()} `;
          case 'a': {
            const href = node.getAttribute('href');
            if (href && !href.startsWith('#') && !href.startsWith('javascript'))
              return `[${children.trim()}](${href})`;
            return children;
          }
          default: return children;
        }
      }
      return nodeToMd(el);
    });

    const cleaned = content.replace(/\n{4,}/g, '\n\n\n').replace(/[ \t]+$/gm, '').trim();
    const header = `<!-- Source: ${url} -->\n<!-- Scraped: ${new Date().toISOString()} -->\n\n`;
    writeFileSync(join(OUT_DIR, filename), header + cleaned + '\n');
    console.log(` Saved ${filename} (${cleaned.length} chars)`);
    return true;
  } catch (e) {
    console.error(` FAILED: ${e.message}`);
    writeFileSync(join(OUT_DIR, filename), `# FAILED TO LOAD\n\nURL: ${url}\nError: ${e.message}\n`);
    return false;
  }
}

async function main() {
  // Launch with stealth-like settings
  const browser = await chromium.launch({
    headless: false, // Use headed mode via Xvfb if available, else new headless
    args: [
      '--headless=new', // New headless mode (less detectable)
      '--disable-blink-features=AutomationControlled',
      '--no-sandbox',
    ],
  });

  const context = await browser.newContext({
    userAgent: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    locale: 'en-US',
    timezoneId: 'America/New_York',
    viewport: { width: 1920, height: 1080 },
  });

  // Remove webdriver property
  await context.addInitScript(() => {
    Object.defineProperty(navigator, 'webdriver', { get: () => false });
    // Override permissions
    const originalQuery = window.navigator.permissions.query;
    window.navigator.permissions.query = (parameters) =>
      parameters.name === 'notifications'
        ? Promise.resolve({ state: Notification.permission })
        : originalQuery(parameters);
  });

  const page = await context.newPage();

  let anySuccess = false;
  for (const { url, file } of pages) {
    const ok = await scrapePage(page, url, file);
    if (ok) anySuccess = true;
    // Add delay between requests
    await page.waitForTimeout(2000);
  }

  if (!anySuccess) {
    console.log('\nAll pages blocked by CAPTCHA. Arista docs require human verification.');
  }

  await browser.close();
  console.log('\nDone!');
}

main().catch(e => { console.error(e); process.exit(1); });

@ -0,0 +1,34 @@
#!/usr/bin/env python3
"""Strip IP+UDP headers from mirrored packets and forward raw UDP payload."""
import socket
import sys

LISTEN_PORT = int(sys.argv[1]) if len(sys.argv) > 1 else 9100
FORWARD_HOST = sys.argv[2] if len(sys.argv) > 2 else "127.0.0.1"
FORWARD_PORT = int(sys.argv[3]) if len(sys.argv) > 3 else 9000

sock_in = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock_in.bind(("0.0.0.0", LISTEN_PORT))

sock_out = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

count = 0
while True:
    data, addr = sock_in.recvfrom(65535)
    if len(data) < 28:
        continue
    # IP header: first nibble is version (4), second nibble is IHL (words)
    if (data[0] >> 4) != 4:
        continue
    ihl = (data[0] & 0x0F) * 4
    # Protocol should be UDP (17)
    if data[9] != 17:
        continue
    # Payload starts after IP header + 8-byte UDP header
    offset = ihl + 8
    payload = data[offset:]
    if payload:
        sock_out.sendto(payload, (FORWARD_HOST, FORWARD_PORT))
        count += 1
        if count % 10000 == 0:
            print(f"Forwarded {count} shreds", flush=True)

@ -0,0 +1,546 @@
|
||||||
|
#!/usr/bin/env python3
|
||||||
|
"""Download Solana snapshots using aria2c for parallel multi-connection downloads.
|
||||||
|
|
||||||
|
Discovers snapshot sources by querying getClusterNodes for all RPCs in the
|
||||||
|
cluster, probing each for available snapshots, benchmarking download speed,
|
||||||
|
and downloading from the fastest source using aria2c (16 connections by default).
|
||||||
|
|
||||||
|
Based on the discovery approach from etcusr/solana-snapshot-finder but replaces
|
||||||
|
the single-connection wget download with aria2c parallel chunked downloads.
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
# Download to /srv/solana/snapshots (mainnet, 16 connections)
|
||||||
|
./snapshot-download.py -o /srv/solana/snapshots
|
||||||
|
|
||||||
|
# Dry run — find best source, print URL
|
||||||
|
./snapshot-download.py --dry-run
|
||||||
|
|
||||||
|
# Custom RPC for cluster node discovery + 32 connections
|
||||||
|
./snapshot-download.py -r https://api.mainnet-beta.solana.com -n 32
|
||||||
|
|
||||||
|
# Testnet
|
||||||
|
./snapshot-download.py -c testnet -o /data/snapshots
|
||||||
|
|
||||||
|
Requirements:
|
||||||
|
- aria2c (apt install aria2)
|
||||||
|
- python3 >= 3.10 (stdlib only, no pip dependencies)
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import concurrent.futures
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import os
|
||||||
|
import re
|
||||||
|
import shutil
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
import time
|
||||||
|
import urllib.error
|
||||||
|
import urllib.request
|
||||||
|
from dataclasses import dataclass, field
|
||||||
|
from http.client import HTTPResponse
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import NoReturn
|
||||||
|
from urllib.request import Request
|
||||||
|
|
||||||
|
log: logging.Logger = logging.getLogger("snapshot-download")
|
||||||
|
|
||||||
|
CLUSTER_RPC: dict[str, str] = {
|
||||||
|
"mainnet-beta": "https://api.mainnet-beta.solana.com",
|
||||||
|
"testnet": "https://api.testnet.solana.com",
|
||||||
|
"devnet": "https://api.devnet.solana.com",
|
||||||
|
}
|
||||||
|
|
||||||
|
# Snapshot filenames:
|
||||||
|
# snapshot-<slot>-<hash>.tar.zst
|
||||||
|
# incremental-snapshot-<base_slot>-<slot>-<hash>.tar.zst
|
||||||
|
FULL_SNAP_RE: re.Pattern[str] = re.compile(
|
||||||
|
r"^snapshot-(\d+)-([A-Za-z0-9]+)\.tar\.(zst|bz2)$"
|
||||||
|
)
|
||||||
|
INCR_SNAP_RE: re.Pattern[str] = re.compile(
|
||||||
|
r"^incremental-snapshot-(\d+)-(\d+)-([A-Za-z0-9]+)\.tar\.(zst|bz2)$"
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class SnapshotSource:
|
||||||
|
"""A snapshot file available from a specific RPC node."""
|
||||||
|
|
||||||
|
rpc_address: str
|
||||||
|
# Full redirect paths as returned by the server (e.g. /snapshot-123-hash.tar.zst)
|
||||||
|
file_paths: list[str] = field(default_factory=list)
|
||||||
|
slots_diff: int = 0
|
||||||
|
latency_ms: float = 0.0
|
||||||
|
download_speed: float = 0.0 # bytes/sec
|
||||||
|
|
||||||
|
|
||||||
|
# -- JSON-RPC helpers ----------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
class _NoRedirectHandler(urllib.request.HTTPRedirectHandler):
|
||||||
|
"""Handler that captures redirect Location instead of following it."""
|
||||||
|
|
||||||
|
def redirect_request(
|
||||||
|
self,
|
||||||
|
req: Request,
|
||||||
|
fp: HTTPResponse,
|
||||||
|
code: int,
|
||||||
|
msg: str,
|
||||||
|
headers: dict[str, str], # type: ignore[override]
|
||||||
|
newurl: str,
|
||||||
|
) -> None:
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def rpc_post(url: str, method: str, params: list[object] | None = None,
|
||||||
|
timeout: int = 25) -> object | None:
|
||||||
|
"""JSON-RPC POST. Returns parsed 'result' field or None on error."""
|
||||||
|
payload: bytes = json.dumps({
|
||||||
|
"jsonrpc": "2.0", "id": 1,
|
||||||
|
"method": method, "params": params or [],
|
||||||
|
}).encode()
|
||||||
|
req = Request(url, data=payload,
|
||||||
|
headers={"Content-Type": "application/json"})
|
||||||
|
try:
|
||||||
|
with urllib.request.urlopen(req, timeout=timeout) as resp:
|
||||||
|
data: dict[str, object] = json.loads(resp.read())
|
||||||
|
return data.get("result")
|
||||||
|
except (urllib.error.URLError, json.JSONDecodeError, OSError, TimeoutError) as e:
|
||||||
|
log.debug("rpc_post %s %s failed: %s", url, method, e)
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def head_no_follow(url: str, timeout: float = 3) -> tuple[str | None, float]:
|
||||||
|
"""HEAD request without following redirects.
|
||||||
|
|
||||||
|
Returns (Location header value, latency_sec) if the server returned a
|
||||||
|
3xx redirect. Returns (None, 0.0) on any error or non-redirect response.
|
||||||
|
"""
|
||||||
|
opener: urllib.request.OpenerDirector = urllib.request.build_opener(_NoRedirectHandler)
|
||||||
|
req = Request(url, method="HEAD")
|
||||||
|
try:
|
||||||
|
start: float = time.monotonic()
|
||||||
|
resp: HTTPResponse = opener.open(req, timeout=timeout) # type: ignore[assignment]
|
||||||
|
latency: float = time.monotonic() - start
|
||||||
|
# Non-redirect (2xx) — server didn't redirect, not useful for discovery
|
||||||
|
location: str | None = resp.headers.get("Location")
|
||||||
|
resp.close()
|
||||||
|
return location, latency
|
||||||
|
except urllib.error.HTTPError as e:
|
||||||
|
# 3xx redirects raise HTTPError with the redirect info
|
||||||
|
latency = time.monotonic() - start # type: ignore[possibly-undefined]
|
||||||
|
location = e.headers.get("Location")
|
||||||
|
if location and 300 <= e.code < 400:
|
||||||
|
return location, latency
|
||||||
|
return None, 0.0
|
||||||
|
except (urllib.error.URLError, OSError, TimeoutError):
|
||||||
|
return None, 0.0
|
||||||
|
|
||||||
|
|
||||||
|
# -- Discovery -----------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def get_current_slot(rpc_url: str) -> int | None:
|
||||||
|
"""Get current slot from RPC."""
|
||||||
|
result: object | None = rpc_post(rpc_url, "getSlot")
|
||||||
|
if isinstance(result, int):
|
||||||
|
return result
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def get_cluster_rpc_nodes(rpc_url: str, version_filter: str | None = None) -> list[str]:
|
||||||
|
"""Get all RPC node addresses from getClusterNodes."""
|
||||||
|
result: object | None = rpc_post(rpc_url, "getClusterNodes")
|
||||||
|
if not isinstance(result, list):
|
||||||
|
return []
|
||||||
|
|
||||||
|
rpc_addrs: list[str] = []
|
||||||
|
for node in result:
|
||||||
|
if not isinstance(node, dict):
|
||||||
|
continue
|
||||||
|
if version_filter is not None:
|
||||||
|
node_version: str | None = node.get("version")
|
||||||
|
if node_version and not node_version.startswith(version_filter):
|
||||||
|
continue
|
||||||
|
rpc: str | None = node.get("rpc")
|
||||||
|
if rpc:
|
||||||
|
rpc_addrs.append(rpc)
|
||||||
|
return list(set(rpc_addrs))
|
||||||
|
|
||||||
|
|
||||||
|
def _parse_snapshot_filename(location: str) -> tuple[str, str | None]:
|
||||||
|
"""Extract filename and full redirect path from Location header.
|
||||||
|
|
||||||
|
Returns (filename, full_path). full_path includes any path prefix
|
||||||
|
the server returned (e.g. '/snapshots/snapshot-123-hash.tar.zst').
|
||||||
|
"""
|
||||||
|
# Location may be absolute URL or relative path
|
||||||
|
if location.startswith("http://") or location.startswith("https://"):
|
||||||
|
# Absolute URL — extract path
|
||||||
|
from urllib.parse import urlparse
|
||||||
|
path: str = urlparse(location).path
|
||||||
|
else:
|
||||||
|
path = location
|
||||||
|
|
||||||
|
filename: str = path.rsplit("/", 1)[-1]
|
||||||
|
return filename, path
|
||||||
|
|
||||||
|
|
||||||
|
def probe_rpc_snapshot(
|
||||||
|
rpc_address: str,
|
||||||
|
current_slot: int,
|
||||||
|
max_age_slots: int,
|
||||||
|
max_latency_ms: float,
|
||||||
|
) -> SnapshotSource | None:
|
||||||
|
"""Probe a single RPC node for available snapshots.
|
||||||
|
|
    Probes for full snapshot first (required), then incremental. Records all
    available files. Which files to actually download is decided at download
    time based on what already exists locally — not here.

    Based on the discovery approach from etcusr/solana-snapshot-finder.
    """
    full_url: str = f"http://{rpc_address}/snapshot.tar.bz2"

    # Full snapshot is required — every source must have one
    full_location, full_latency = head_no_follow(full_url, timeout=2)
    if not full_location:
        return None

    latency_ms: float = full_latency * 1000
    if latency_ms > max_latency_ms:
        return None

    full_filename, full_path = _parse_snapshot_filename(full_location)
    fm: re.Match[str] | None = FULL_SNAP_RE.match(full_filename)
    if not fm:
        return None

    full_snap_slot: int = int(fm.group(1))
    slots_diff: int = current_slot - full_snap_slot

    if slots_diff > max_age_slots or slots_diff < -100:
        return None

    file_paths: list[str] = [full_path]

    # Also check for incremental snapshot
    inc_url: str = f"http://{rpc_address}/incremental-snapshot.tar.bz2"
    inc_location, _ = head_no_follow(inc_url, timeout=2)
    if inc_location:
        inc_filename, inc_path = _parse_snapshot_filename(inc_location)
        m: re.Match[str] | None = INCR_SNAP_RE.match(inc_filename)
        if m:
            inc_base_slot: int = int(m.group(1))
            # Incremental must be based on this source's full snapshot
            if inc_base_slot == full_snap_slot:
                file_paths.append(inc_path)

    return SnapshotSource(
        rpc_address=rpc_address,
        file_paths=file_paths,
        slots_diff=slots_diff,
        latency_ms=latency_ms,
    )

def discover_sources(
    rpc_url: str,
    current_slot: int,
    max_age_slots: int,
    max_latency_ms: float,
    threads: int,
    version_filter: str | None,
) -> list[SnapshotSource]:
    """Discover all snapshot sources from the cluster."""
    rpc_nodes: list[str] = get_cluster_rpc_nodes(rpc_url, version_filter)
    if not rpc_nodes:
        log.error("No RPC nodes found via getClusterNodes")
        return []

    log.info("Found %d RPC nodes, probing for snapshots...", len(rpc_nodes))

    sources: list[SnapshotSource] = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=threads) as pool:
        futures: dict[concurrent.futures.Future[SnapshotSource | None], str] = {
            pool.submit(
                probe_rpc_snapshot, addr, current_slot,
                max_age_slots, max_latency_ms,
            ): addr
            for addr in rpc_nodes
        }
        done: int = 0
        for future in concurrent.futures.as_completed(futures):
            done += 1
            if done % 200 == 0:
                log.info(" probed %d/%d nodes, %d sources found",
                         done, len(rpc_nodes), len(sources))
            try:
                result: SnapshotSource | None = future.result()
            except (urllib.error.URLError, OSError, TimeoutError) as e:
                log.debug("Probe failed for %s: %s", futures[future], e)
                continue
            if result:
                sources.append(result)

    log.info("Found %d RPC nodes with suitable snapshots", len(sources))
    return sources

# -- Speed benchmark -----------------------------------------------------------


def measure_speed(rpc_address: str, measure_time: int = 7) -> float:
    """Measure download speed from an RPC node. Returns bytes/sec."""
    url: str = f"http://{rpc_address}/snapshot.tar.bz2"
    req = Request(url)
    try:
        with urllib.request.urlopen(req, timeout=measure_time + 5) as resp:
            start: float = time.monotonic()
            total: int = 0
            while True:
                elapsed: float = time.monotonic() - start
                if elapsed >= measure_time:
                    break
                chunk: bytes = resp.read(81920)
                if not chunk:
                    break
                total += len(chunk)
            elapsed = time.monotonic() - start
            if elapsed <= 0:
                return 0.0
            return total / elapsed
    except (urllib.error.URLError, OSError, TimeoutError):
        return 0.0

# -- Download ------------------------------------------------------------------


def download_aria2c(
    urls: list[str],
    output_dir: str,
    filename: str,
    connections: int = 16,
) -> bool:
    """Download a file using aria2c with parallel connections.

    When multiple URLs are provided, aria2c treats them as mirrors of the
    same file and distributes chunks across all of them.
    """
    num_mirrors: int = len(urls)
    total_splits: int = max(connections, connections * num_mirrors)
    cmd: list[str] = [
        "aria2c",
        "--file-allocation=none",
        "--continue=true",
        f"--max-connection-per-server={connections}",
        f"--split={total_splits}",
        "--min-split-size=50M",
        # aria2c retries individual chunk connections on transient network
        # errors (TCP reset, timeout). This is transport-level retry analogous
        # to TCP retransmit, not application-level retry of a failed operation.
        "--max-tries=5",
        "--retry-wait=5",
        "--timeout=60",
        "--connect-timeout=10",
        "--summary-interval=10",
        "--console-log-level=notice",
        f"--dir={output_dir}",
        f"--out={filename}",
        "--auto-file-renaming=false",
        "--allow-overwrite=true",
        *urls,
    ]

    log.info("Downloading %s", filename)
    log.info(" aria2c: %d connections × %d mirrors (%d splits)",
             connections, num_mirrors, total_splits)

    start: float = time.monotonic()
    result: subprocess.CompletedProcess[bytes] = subprocess.run(cmd)
    elapsed: float = time.monotonic() - start

    if result.returncode != 0:
        log.error("aria2c failed with exit code %d", result.returncode)
        return False

    filepath: Path = Path(output_dir) / filename
    if not filepath.exists():
        log.error("aria2c reported success but %s does not exist", filepath)
        return False

    size_bytes: int = filepath.stat().st_size
    size_gb: float = size_bytes / (1024 ** 3)
    avg_mb: float = size_bytes / elapsed / (1024 ** 2) if elapsed > 0 else 0
    log.info(" Done: %.1f GB in %.0fs (%.1f MiB/s avg)", size_gb, elapsed, avg_mb)
    return True

# -- Main ----------------------------------------------------------------------


def main() -> int:
    p: argparse.ArgumentParser = argparse.ArgumentParser(
        description="Download Solana snapshots with aria2c parallel downloads",
    )
    p.add_argument("-o", "--output", default="/srv/solana/snapshots",
                   help="Snapshot output directory (default: /srv/solana/snapshots)")
    p.add_argument("-c", "--cluster", default="mainnet-beta",
                   choices=list(CLUSTER_RPC),
                   help="Solana cluster (default: mainnet-beta)")
    p.add_argument("-r", "--rpc", default=None,
                   help="RPC URL for cluster discovery (default: public RPC)")
    p.add_argument("-n", "--connections", type=int, default=16,
                   help="aria2c connections per download (default: 16)")
    p.add_argument("-t", "--threads", type=int, default=500,
                   help="Threads for parallel RPC probing (default: 500)")
    p.add_argument("--max-snapshot-age", type=int, default=1300,
                   help="Max snapshot age in slots (default: 1300)")
    p.add_argument("--max-latency", type=float, default=100,
                   help="Max RPC probe latency in ms (default: 100)")
    p.add_argument("--min-download-speed", type=int, default=20,
                   help="Min download speed in MiB/s (default: 20)")
    p.add_argument("--measurement-time", type=int, default=7,
                   help="Speed measurement duration in seconds (default: 7)")
    p.add_argument("--max-speed-checks", type=int, default=15,
                   help="Max nodes to benchmark before giving up (default: 15)")
    p.add_argument("--version", default=None,
                   help="Filter nodes by version prefix (e.g. '2.2')")
    p.add_argument("--full-only", action="store_true",
                   help="Download only full snapshot, skip incremental")
    p.add_argument("--dry-run", action="store_true",
                   help="Find best source and print URL, don't download")
    p.add_argument("-v", "--verbose", action="store_true")
    args: argparse.Namespace = p.parse_args()

    logging.basicConfig(
        level=logging.DEBUG if args.verbose else logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
        datefmt="%H:%M:%S",
    )

    rpc_url: str = args.rpc or CLUSTER_RPC[args.cluster]

    # aria2c is required for actual downloads (not dry-run)
    if not args.dry_run and not shutil.which("aria2c"):
        log.error("aria2c not found. Install with: apt install aria2")
        return 1

    # Get current slot
    log.info("Cluster: %s | RPC: %s", args.cluster, rpc_url)
    current_slot: int | None = get_current_slot(rpc_url)
    if current_slot is None:
        log.error("Cannot get current slot from %s", rpc_url)
        return 1
    log.info("Current slot: %d", current_slot)

    # Discover sources
    sources: list[SnapshotSource] = discover_sources(
        rpc_url, current_slot,
        max_age_slots=args.max_snapshot_age,
        max_latency_ms=args.max_latency,
        threads=args.threads,
        version_filter=args.version,
    )
    if not sources:
        log.error("No snapshot sources found")
        return 1

    # Sort by latency (lowest first) for speed benchmarking
    sources.sort(key=lambda s: s.latency_ms)

    # Benchmark top candidates — all speeds in MiB/s (binary, 1 MiB = 1048576 bytes)
    log.info("Benchmarking download speed on top %d sources...", args.max_speed_checks)
    fast_sources: list[SnapshotSource] = []
    checked: int = 0
    min_speed_bytes: int = args.min_download_speed * 1024 * 1024  # MiB to bytes

    for source in sources:
        if checked >= args.max_speed_checks:
            break
        checked += 1

        speed: float = measure_speed(source.rpc_address, args.measurement_time)
        source.download_speed = speed
        speed_mib: float = speed / (1024 ** 2)

        if speed < min_speed_bytes:
            log.info(" %s: %.1f MiB/s (too slow, need >=%d MiB/s)",
                     source.rpc_address, speed_mib, args.min_download_speed)
            continue

        log.info(" %s: %.1f MiB/s (latency: %.0fms, age: %d slots)",
                 source.rpc_address, speed_mib,
                 source.latency_ms, source.slots_diff)
        fast_sources.append(source)

    if not fast_sources:
        log.error("No source met minimum speed requirement (%d MiB/s)",
                  args.min_download_speed)
        log.info("Try: --min-download-speed 10")
        return 1

    # Use the fastest source as primary, collect mirrors for each file
    best: SnapshotSource = fast_sources[0]
    file_paths: list[str] = best.file_paths
    if args.full_only:
        file_paths = [fp for fp in file_paths
                      if fp.rsplit("/", 1)[-1].startswith("snapshot-")]

    # Build mirror URL lists: for each file, collect URLs from all fast sources
    # that serve the same filename
    download_plan: list[tuple[str, list[str]]] = []
    for fp in file_paths:
        filename: str = fp.rsplit("/", 1)[-1]
        mirror_urls: list[str] = [f"http://{best.rpc_address}{fp}"]
        for other in fast_sources[1:]:
            for other_fp in other.file_paths:
                if other_fp.rsplit("/", 1)[-1] == filename:
                    mirror_urls.append(f"http://{other.rpc_address}{other_fp}")
                    break
        download_plan.append((filename, mirror_urls))

    speed_mib = best.download_speed / (1024 ** 2)
    log.info("Best source: %s (%.1f MiB/s), %d mirrors total",
             best.rpc_address, speed_mib, len(fast_sources))
    for filename, mirror_urls in download_plan:
        log.info(" %s (%d mirrors)", filename, len(mirror_urls))
        for url in mirror_urls:
            log.info(" %s", url)

    if args.dry_run:
        for _, mirror_urls in download_plan:
            for url in mirror_urls:
                print(url)
        return 0

    # Download — skip files that already exist locally
    os.makedirs(args.output, exist_ok=True)
    total_start: float = time.monotonic()

    for filename, mirror_urls in download_plan:
        filepath: Path = Path(args.output) / filename
        if filepath.exists() and filepath.stat().st_size > 0:
            log.info("Skipping %s (already exists: %.1f GB)",
                     filename, filepath.stat().st_size / (1024 ** 3))
            continue
        if not download_aria2c(mirror_urls, args.output, filename, args.connections):
            log.error("Failed to download %s", filename)
            return 1

    total_elapsed: float = time.monotonic() - total_start
    log.info("All downloads complete in %.0fs", total_elapsed)
    for filename, _ in download_plan:
        fp: Path = Path(args.output) / filename
        if fp.exists():
            log.info(" %s (%.1f GB)", fp.name, fp.stat().st_size / (1024 ** 3))

    return 0


if __name__ == "__main__":
    sys.exit(main())