scripts/agave-container/ is a git subtree of agave-stack's container-build
directory. It replaces the fragile cross-repo symlink with a proper subtree.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remounting tmpfs is effectively instant (the kernel just frees the pages),
while rm -rf on 400GB+ of accounts files traverses every inode. The recover
playbook keeps rm -rf because the kind node's bind mount prevents umount
while the container is running.
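
As a rough sketch of that choice (not the playbooks' actual tasks), the two
wipe strategies can be written as command lists. The function name and the
example mount point are hypothetical, find -delete is used here as a stand-in
for rm -rf that leaves the mount point itself in place, and the remount branch
assumes an fstab entry exists for the mount point.

```python
def wipe_accounts_commands(mount_point: str, container_running: bool) -> list[list[str]]:
    """Return the argv lists that would empty a tmpfs-backed accounts dir."""
    if container_running:
        # The kind node's bind mount keeps the tmpfs busy, so umount would
        # fail; delete the contents in place instead (slow: walks every inode).
        return [["find", mount_point, "-mindepth", "1", "-delete"]]
    # Nothing holds the mount: dropping and re-creating the tmpfs discards
    # all pages at once. "mount <dir>" relies on an fstab entry (assumed).
    return [["umount", mount_point], ["mount", mount_point]]


if __name__ == "__main__":
    # Hypothetical mount point, for illustration only.
    for argv in wipe_accounts_commands("/srv/kind/solana/ramdisk", container_running=False):
        print(" ".join(argv))
```

Note that a plain mount -o remount would keep the files; the fast path is a
full umount followed by a fresh mount.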
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The /dev/ram0 + XFS + format-ramdisk.service approach was unnecessary
complexity that came out of confusion during a migration: there was no
actual tmpfs bug with io_uring. tmpfs is simpler (no format-on-boot),
resizable on the fly, and what every other Solana operator uses.
Changes:
- prepare-agave: remove format-ramdisk.service and ramdisk-accounts.service;
use a tmpfs fstab entry with size=1024G (was a 600G /dev/ram0, too small)
- recover: remove ramdisk_device var (no longer needed)
- redeploy: wipe accounts by rm -rf instead of umount+mkfs
- snapshot-download.py: extract download_best_snapshot() public API for
use by the new container entrypoint.py (in agave-stack)
- CLAUDE.md: update ramdisk docs, fix /srv/solana → /srv/kind/solana paths
- health-check: fix ramdisk path references
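
For illustration, the tmpfs fstab entry and the on-the-fly resize mentioned
above could look roughly like what the helpers below produce. Only the
size=1024G value comes from this change; the helper names and the example
mount point are hypothetical, and the real entry in prepare-agave may carry
additional mount options.

```python
def tmpfs_fstab_line(mount_point: str, size: str = "1024G") -> str:
    """Build an fstab line for a tmpfs mount of the given size."""
    # fstab fields: device, mount point, fs type, options, dump, fsck pass.
    return f"tmpfs {mount_point} tmpfs size={size} 0 0"


def tmpfs_resize_argv(mount_point: str, size: str) -> list[str]:
    """Build the mount command that resizes a live tmpfs."""
    # tmpfs accepts a new size= on remount, with no unmount or reformat.
    return ["mount", "-o", f"remount,size={size}", mount_point]


if __name__ == "__main__":
    # Hypothetical mount point, for illustration only.
    print(tmpfs_fstab_line("/srv/kind/solana/ramdisk"))
    print(" ".join(tmpfs_resize_argv("/srv/kind/solana/ramdisk", "1200G")))
```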
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Revert snapshot_dir to /srv/solana/snapshots — aria2c runs on the host
where this is the direct zvol mount (always available), unlike
/srv/kind/solana/snapshots which depends on the bind mount
- Add laconic_so_branch variable (default: main) and use it in both
git reset commands so the branch can be overridden via -e
- Move "Verify ramdisk visible inside kind node" from preflight to after
"Wait for deployment to exist" — the kind container may not exist
during preflight after teardown
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add laconic_so_repo variable (/home/rix/stack-orchestrator) and a
git pull task before deployment start — the editable install must be
current or stale code causes deploy failures
- Downgrade unified mount root check from fatal assertion to debug
warning — the mount style depends on which laconic-so version is
deployed, and individual PV mounts (/mnt/validator-*) work fine
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix snapshot_dir: /srv/solana/snapshots → /srv/kind/solana/snapshots
(kind node reads from the bind mount, not the zvol mount directly)
- Fix kind-internal paths: /mnt/solana/... → /mnt/validator-... to match
actual PV hostPath layout (individual mounts, not unified)
- Add 'scale-up' tag to "Scale validator to 1" task for partial recovery
(--tags snapshot,scale-up,verify resumes without re-running deploy)
- Make 'Start deployment' idempotent: failed_when: false plus a follow-up
check, so an existing deployment doesn't fail the play
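
The idempotent start pattern from the last item can be sketched outside
Ansible as "tolerate the start failure, then verify the end state". The
function and its two callables are hypothetical stand-ins for the playbook's
start command and follow-up check, not the actual tasks.

```python
from typing import Callable


def start_deployment_idempotent(
    start: Callable[[], int], exists: Callable[[], bool]
) -> bool:
    """Run start, ignore its exit code, then require the deployment to exist.

    Returns True if start succeeded (something changed), False if it failed
    but the deployment was already there.
    """
    rc = start()  # like failed_when: false, a non-zero rc is not fatal here
    if not exists():  # the follow-up check decides success or failure
        raise RuntimeError("deployment does not exist after start")
    return rc == 0


if __name__ == "__main__":
    # Simulate "already running": start fails, but the deployment exists.
    print(start_deployment_idempotent(lambda: 1, lambda: True))
```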
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- FQCN for all modules (ansible.builtin.*)
- changed_when/failed_when on all command/shell tasks
- set -o pipefail on all shell tasks
- Add KUBECONFIG environment to health-check.yml
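
A minimal sketch of how the first three conventions could be checked on a
task that has already been parsed into a dict. This is not ansible-lint and
not part of the playbooks; the keyword list is deliberately incomplete, the
"at least one of changed_when/failed_when" reading is an interpretation, and
treating every non-ansible.builtin module as a violation is a simplification
that only suits playbooks using builtin modules.

```python
# Task-level keys that are not module names (partial list, an assumption).
TASK_KEYWORDS = {
    "name", "args", "become", "changed_when", "environment", "failed_when",
    "register", "tags", "vars", "when",
}
SHELL_LIKE = {"ansible.builtin.command", "ansible.builtin.shell"}


def lint_task(task: dict) -> list[str]:
    """Return convention violations for one task (empty list means clean)."""
    problems = []
    modules = [key for key in task if key not in TASK_KEYWORDS]
    for module in modules:
        if not module.startswith("ansible.builtin."):
            problems.append(f"{module}: not a fully qualified ansible.builtin name")
        # Map short names such as "shell" to their FQCN for the checks below.
        fqcn = module if "." in module else f"ansible.builtin.{module}"
        if fqcn not in SHELL_LIKE:
            continue
        if "changed_when" not in task and "failed_when" not in task:
            problems.append(f"{module}: missing changed_when/failed_when")
        body = task[module]
        script = body.get("cmd", "") if isinstance(body, dict) else str(body)
        if fqcn.endswith(".shell") and "set -o pipefail" not in script:
            problems.append(f"{module}: shell task without set -o pipefail")
    return problems


if __name__ == "__main__":
    # Hypothetical task, for illustration only.
    task = {
        "name": "Count pods",
        "ansible.builtin.shell": "set -o pipefail\nkubectl get pods | wc -l",
        "args": {"executable": "/bin/bash"},
        "changed_when": False,
    }
    print(lint_task(task))
```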
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- ansible.cfg: enable SSH agent forwarding for git operations
- biscayne-redeploy.yml: add git pull, deploy create --update, and
clear stale PV claimRefs after namespace deletion
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Playbook fixes from testing:
- ashburn-relay-biscayne: insert DNAT rules at position 1, ahead of
Docker's ADDRTYPE LOCAL rule (at position 3+ they were never reached
because Docker's rule matched first)
- ashburn-relay-mia-sw01: add inbound route for 137.239.194.65 via
egress-vrf vrf1 (nexthop only, no interface — EOS silently drops
cross-VRF routes that specify a tunnel interface)
- ashburn-relay-was-sw01: replace PBR with static route, remove
Loopback101
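
To illustrate the rule-ordering fix for ashburn-relay-biscayne, the sketch
below builds an iptables command that inserts a DNAT rule at a given position
instead of appending it. The nat PREROUTING chain, matching on destination
137.239.194.65, the UDP port, and the target address are all assumptions made
for the example; the playbook's real rules may differ.

```python
def dnat_insert_argv(
    public_ip: str, proto: str, port: int, target: str, position: int = 1
) -> list[str]:
    """Build an iptables command inserting a DNAT rule at a fixed position."""
    return [
        "iptables", "-t", "nat",
        # -I with a rule number inserts before the existing rule at that
        # position, so position 1 is evaluated ahead of Docker's rule.
        "-I", "PREROUTING", str(position),
        "-d", public_ip, "-p", proto, "--dport", str(port),
        "-j", "DNAT", "--to-destination", f"{target}:{port}",
    ]


if __name__ == "__main__":
    # Port and target address are hypothetical placeholders.
    print(" ".join(dnat_insert_argv("137.239.194.65", "udp", 8001, "172.20.0.2")))
```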
Bug doc (bug-ashburn-tunnel-port-filtering.md): the root cause is that the
DoubleZero agent on mia-sw01 overwrites the SEC-USER-500-IN ACL, which then
drops outbound gossip with src 137.239.194.65. The DZ agent controls
Tunnel500's lifecycle, so the fix requires a separate GRE tunnel that uses
mia-sw01's free LAN IP (209.42.167.137) to bypass DZ infrastructure.
Also adds all repo docs, scripts, inventory, and remaining playbooks.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>