# Biscayne Agave Runbook

## Deployment Layers
Operations on biscayne follow a strict layering. Each layer assumes the layers below it are correct. Playbooks belong to exactly one layer.
| Layer | What | Playbooks |
|---|---|---|
| 1. Base system | Docker, ZFS, packages | Out of scope (manual/PXE) |
| 2. Prepare kind | `/srv/kind` exists (ZFS dataset) | None needed (ZFS handles it) |
| 3. Install kind | `laconic-so deployment start` creates kind cluster, mounts `/srv/kind` → `/mnt` in kind node | biscayne-redeploy.yml (deploy tags) |
| 4. Prepare agave | Host storage for agave: ZFS dataset, ramdisk | biscayne-prepare-agave.yml |
| 5. Deploy agave | Deploy agave-stack into kind, snapshot download, scale up | biscayne-redeploy.yml (snapshot/verify tags), biscayne-recover.yml |
Layer 4 invariants (asserted by biscayne-prepare-agave.yml):

- `/srv/kind/solana` is a ZFS dataset (`biscayne/DATA/srv/kind/solana`), child of the `/srv/kind` dataset
- `/srv/kind/solana/ramdisk` is tmpfs (1TB) — accounts must be in RAM
- `/srv/solana` is NOT the data path — it's a directory on the parent ZFS dataset. All data paths use `/srv/kind/solana`
These invariants are checked at runtime and persisted to fstab/systemd so they survive reboot.
Cross-cutting: `health-check.yml` (read-only diagnostics), `biscayne-stop.yml`
(layer 5 — graceful shutdown), `fix-pv-mounts.yml` (layer 5 — PV repair).
## Cluster Operations

### Shutdown Order
The agave validator runs inside a kind-based k8s cluster managed by laconic-so.
The kind node is a Docker container. Never restart or kill the kind node container
while the validator is running. Use agave-validator exit --force via the admin
RPC socket for graceful shutdown, or scale the deployment to 0 and wait.
Correct shutdown sequence:

- Scale the deployment to 0 and wait for the pod to terminate:

  ```shell
  kubectl scale deployment laconic-70ce4c4b47e23b85-deployment \
    -n laconic-laconic-70ce4c4b47e23b85 --replicas=0
  kubectl wait --for=delete pod -l app=laconic-70ce4c4b47e23b85-deployment \
    -n laconic-laconic-70ce4c4b47e23b85 --timeout=120s
  ```

- Only then restart the kind node if needed:

  ```shell
  docker restart laconic-70ce4c4b47e23b85-control-plane
  ```

- Scale back up:

  ```shell
  kubectl scale deployment laconic-70ce4c4b47e23b85-deployment \
    -n laconic-laconic-70ce4c4b47e23b85 --replicas=1
  ```
### Ramdisk

The accounts directory must be in RAM for performance. tmpfs is used instead of
/dev/ram0: it is simpler (no format-on-boot service needed), can be resized on
the fly with `mount -o remount,size=<new>`, and is what most Solana operators use.
Boot ordering: /srv/kind/solana is a ZFS dataset mounted automatically by
zfs-mount.service. The tmpfs ramdisk fstab entry uses
x-systemd.requires=zfs-mount.service to ensure the dataset is mounted first.
No manual intervention after reboot.
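The resulting fstab entry might look roughly like this (a sketch only; the exact options and size on biscayne may differ, so verify against the real `/etc/fstab` there):

```
# /etc/fstab (sketch; verify against the actual entry on biscayne)
# tmpfs ramdisk for accounts, ordered after the ZFS dataset mount
tmpfs  /srv/kind/solana/ramdisk  tmpfs  size=1T,x-systemd.requires=zfs-mount.service  0  0
```

The `x-systemd.requires=` option makes systemd generate an ordering dependency on `zfs-mount.service`, so the tmpfs cannot mount onto an empty directory before the ZFS dataset appears.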
Mount propagation: The kind node bind-mounts /srv/kind → /mnt at container
start. laconic-so sets propagation: HostToContainer on all kind extraMounts
(commit a11d40f2 in stack-orchestrator), so host submounts propagate into the
kind node automatically. A kind restart is required to pick up the new config
after updating laconic-so.
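For reference, a kind cluster config carrying such a mount looks roughly like this (field names are from kind's `v1alpha4` config API; the actual config is generated by laconic-so, so treat this as a sketch):

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    extraMounts:
      - hostPath: /srv/kind
        containerPath: /mnt
        # HostToContainer: new submounts under /srv/kind on the host
        # become visible inside the kind node without a restart
        propagation: HostToContainer
```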
### KUBECONFIG

kubectl must be told where the kubeconfig is when running as root or via ansible:

```shell
KUBECONFIG=/home/rix/.kube/config kubectl ...
```

The ansible playbooks set `environment: KUBECONFIG: /home/rix/.kube/config`.
### SSH Agent

SSH to biscayne goes through a ProxyCommand jump host (abernathy.ch2.vaasl.io).
The SSH agent socket rotates when the user reconnects. Find the current one:

```shell
ls -t /tmp/ssh-*/agent.* | head -1
```

Then export it:

```shell
export SSH_AUTH_SOCK=/tmp/ssh-XXXX/agent.NNNN
```
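The two steps above can be combined into one line (harmless if no agent socket exists; the variable just ends up empty):

```shell
# Export the newest agent socket, if any
export SSH_AUTH_SOCK=$(ls -t /tmp/ssh-*/agent.* 2>/dev/null | head -1)
echo "SSH_AUTH_SOCK=${SSH_AUTH_SOCK:-<none found>}"
```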
## io_uring/ZFS Deadlock — Historical Note
Agave uses io_uring for async I/O. Killing agave ungracefully while it has
outstanding I/O against ZFS can produce unkillable D-state kernel threads
(io_wq_put_and_exit blocked on ZFS transactions), deadlocking the container.
Prevention: Use graceful shutdown (agave-validator exit --force via admin
RPC, or scale to 0 and wait). The biscayne-stop.yml playbook enforces this.
With graceful shutdown, io_uring contexts are closed cleanly and ZFS storage
is safe to use directly (no zvol/XFS workaround needed).
ZFS fix: The underlying io_uring bug is fixed in ZFS 2.2.8+ (PR #17298). Biscayne currently runs ZFS 2.2.2. Upgrading ZFS will eliminate the deadlock risk entirely, even for ungraceful shutdowns.
## laconic-so Architecture
laconic-so manages kind clusters atomically — deployment start creates the
kind cluster, namespace, PVs, PVCs, and deployment in one shot. There is no way
to create the cluster without deploying the pod.
Key code paths in stack-orchestrator:
- `deploy_k8s.py:up()` — creates everything atomically
- `cluster_info.py:get_pvs()` — translates host paths using `kind-mount-root`
- `helpers_k8s.py:get_kind_pv_bind_mount_path()` — strips the `kind-mount-root` prefix and prepends `/mnt`
- `helpers_k8s.py:_generate_kind_mounts()` — when `kind-mount-root` is set, emits a single `/srv/kind` → `/mnt` mount instead of individual mounts
The `kind-mount-root: /srv/kind` setting in spec.yml means all data volumes
whose host paths start with `/srv/kind` get translated to `/mnt/...` inside the
kind node via a single bind mount.
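A minimal sketch of that translation (illustrative only, not the actual stack-orchestrator code; the real logic lives in `helpers_k8s.py:get_kind_pv_bind_mount_path()`):

```python
KIND_MOUNT_ROOT = "/srv/kind"

def kind_pv_bind_mount_path(host_path: str) -> str:
    """Strip the kind-mount-root prefix and prepend /mnt.

    Sketch of the behavior described above; paths outside the mount
    root pass through unchanged.
    """
    if host_path.startswith(KIND_MOUNT_ROOT):
        return "/mnt" + host_path[len(KIND_MOUNT_ROOT):]
    return host_path

# A PV whose host path sits under the mount root lands under /mnt:
print(kind_pv_bind_mount_path("/srv/kind/validator-ledger"))  # /mnt/validator-ledger
```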
## Key Identifiers

- Kind cluster: `laconic-70ce4c4b47e23b85`
- Namespace: `laconic-laconic-70ce4c4b47e23b85`
- Deployment: `laconic-70ce4c4b47e23b85-deployment`
- Kind node container: `laconic-70ce4c4b47e23b85-control-plane`
- Deployment dir: `/srv/deployments/agave`
- Snapshot dir: `/srv/kind/solana/snapshots` (ZFS dataset, visible to kind at `/mnt/validator-snapshots`)
- Ledger dir: `/srv/kind/solana/ledger` (ZFS dataset, visible to kind at `/mnt/validator-ledger`)
- Accounts dir: `/srv/kind/solana/ramdisk/accounts` (tmpfs ramdisk, visible to kind at `/mnt/validator-accounts`)
- Log dir: `/srv/kind/solana/log` (ZFS dataset, visible to kind at `/mnt/validator-log`)
- WARNING: `/srv/solana` is a different directory (on the parent ZFS dataset), not the data path. All data paths use `/srv/kind/solana`.
- Host bind mount root: `/srv/kind` → kind node `/mnt`
- laconic-so: `/home/rix/.local/bin/laconic-so` (editable install)
## PV Mount Paths (inside kind node)
| PV Name | hostPath |
|---|---|
| validator-snapshots | /mnt/validator-snapshots |
| validator-ledger | /mnt/validator-ledger |
| validator-accounts | /mnt/validator-accounts |
| validator-log | /mnt/validator-log |
## Snapshot Freshness
If the snapshot is more than 20,000 slots behind the current mainnet tip, it is too old. Stop the validator, download a fresh snapshot, and restart. Do NOT let it try to catch up from an old snapshot — it will take too long and may never converge.
Check with:

```shell
# Snapshot slot (from filename)
ls /srv/kind/solana/snapshots/snapshot-*.tar.*

# Current mainnet slot
curl -s -X POST -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"getSlot","params":[{"commitment":"finalized"}]}' \
  https://api.mainnet-beta.solana.com
```
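The comparison can be scripted. A hedged sketch (it assumes the usual `snapshot-<slot>-<hash>.tar.zst` filename pattern, and takes the current slot as an argument rather than calling the RPC itself):

```python
import re

STALE_SLOTS = 20_000  # staleness threshold from this runbook

def snapshot_slot(filename: str) -> int:
    """Parse the slot out of a name like snapshot-250123456-<hash>.tar.zst."""
    m = re.search(r"snapshot-(\d+)-", filename)
    if not m:
        raise ValueError(f"not a snapshot filename: {filename!r}")
    return int(m.group(1))

def is_stale(filename: str, current_slot: int) -> bool:
    """True if the snapshot is more than STALE_SLOTS behind the tip."""
    return current_slot - snapshot_slot(filename) > STALE_SLOTS

print(is_stale("snapshot-250123456-AbCd.tar.zst", 250_150_000))  # True (26,544 behind)
```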
## Snapshot Leapfrog Recovery
When the validator is stuck in a repair-dependent gap (incomplete shreds from a relay outage or insufficient turbine coverage), "grinding through" doesn't work. At 0.4 slots/sec replay through incomplete blocks vs 2.5 slots/sec chain production, the gap grows faster than it shrinks.
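The arithmetic behind that claim, using the rates quoted above:

```python
# Why catching up through the incomplete zone fails: the gap widens.
replay_rate = 0.4   # slots/sec replayed through incomplete blocks
chain_rate = 2.5    # slots/sec of mainnet block production
growth = chain_rate - replay_rate
print(f"gap grows by {growth:.1f} slots/sec (~{growth * 3600:.0f} slots/hour)")
```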
Strategy: Download a fresh snapshot whose slot lands past the incomplete zone, into the range where turbine+relay shreds are accumulating in the blockstore. Keep the existing ledger — it has those shreds. The validator replays from local blockstore data instead of waiting on repair.
Steps:

- Let the validator run — turbine+relay accumulate shreds at the tip
- Monitor shred completeness at the tip: `scripts/check-shred-completeness.sh 500`
- When there's a contiguous run of complete blocks (>100 slots), note the starting slot of that run
- Scale to 0, wipe accounts (ramdisk), wipe old snapshots
- Do NOT wipe the ledger — it has the turbine shreds
- Download a fresh snapshot (its slot should be within the complete run)
- Scale to 1 — validator replays from local blockstore at 3-5 slots/sec
Why this works: Turbine delivers ~60% of shreds in real-time. Repair fills the rest for recent slots quickly (peers prioritize recent data). The only problem is repair for old slots (minutes/hours behind) which peers deprioritize. By snapshotting past the gap, we skip the old-slot repair bottleneck entirely.
## Shred Relay (Ashburn)
The TVU shred relay from laconic-was-sw01 provides ~4,000 additional shreds/sec. Without it, turbine alone delivers ~60% of blocks. With it, completeness improves but still requires repair for full coverage.
Current state: Old pipeline (monitor session + socat + shred-unwrap.py).
The traffic-policy redirect was never committed (auto-revert after 5 min timer).
See docs/tvu-shred-relay.md for the traffic-policy config that needs to be
properly applied.
Boot dependency: shred-unwrap.py must be running on biscayne for the old
pipeline to work. It is NOT persistent across reboots. The iptables DNAT rule
for the new pipeline IS persistent (iptables-persistent installed).
## Redeploy Flow
See playbooks/biscayne-redeploy.yml. The scale-to-0 pattern is required because
laconic-so creates the cluster and deploys the pod atomically:
- Delete namespace (teardown)
- Optionally wipe data
- `laconic-so deployment start` (creates cluster + pod)
- Immediately scale to 0
- Download snapshot via aria2c
- Scale to 1
- Verify