A. F. Dudley 3bf87a2e9b feat: snapshot leapfrog — auto-recovery when validator falls behind
Entrypoint changes:
- Always require full + incremental before starting (retry until found)
- Check incremental freshness against convergence threshold (500 slots)
- Gap monitor thread: if validator falls >5000 slots behind for 3
  consecutive checks, graceful stop + restart with fresh incremental
- cmd_serve is now a loop: download → run → monitor → leapfrog → repeat
- --no-snapshot-fetch moved to common args (both RPC and validator modes)
- --maximum-full-snapshots-to-retain default 1 (validator deletes
  downloaded full after generating its own)
- SNAPSHOT_MAX_AGE_SLOTS default 100000 (one full snapshot generation)
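
The gap monitor rule above (more than 5000 slots behind for 3 consecutive checks) can be sketched as a small streak counter. This is a minimal illustration, not the entrypoint's actual code: the function name `should_leapfrog` and the constant names are hypothetical, and only the two numeric thresholds come from the commit message.

```python
from typing import Iterable

# Thresholds taken from the commit message; the names are hypothetical.
GAP_THRESHOLD_SLOTS = 5000
CONSECUTIVE_CHECKS = 3


def should_leapfrog(
    gaps: Iterable[int],
    threshold: int = GAP_THRESHOLD_SLOTS,
    required: int = CONSECUTIVE_CHECKS,
) -> bool:
    """Return True once `required` consecutive samples exceed `threshold`.

    `gaps` is a sequence of (network slot - validator slot) samples,
    one per monitor check. A single sample at or below the threshold
    resets the streak, so a brief recovery cancels a pending restart.
    """
    streak = 0
    for gap in gaps:
        streak = streak + 1 if gap > threshold else 0
        if streak >= required:
            return True
    return False
```

Requiring a streak rather than a single reading keeps a transient spike from triggering a stop and restart; for example, `should_leapfrog([6000, 100, 6000, 6000])` stays false because the low sample resets the count.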

snapshot_download.py refactoring:
- Extract _discover_and_benchmark() and _rolling_incremental_download()
  as shared helpers
- Restore download_incremental_for_slot() using shared helpers (downloads
  only an incremental for an existing full snapshot)
- download_best_snapshot() uses shared helpers, downloads full then
  incremental as separate operations

The leapfrog cycle: the validator generates full snapshots at the standard
100k-slot interval (the same slots as the rest of the network). When the
gap monitor triggers, the entrypoint loops back to maybe_download_snapshot,
which finds the validator's local full snapshot, downloads a fresh network
incremental (generated every ~40s, converges within the ~11hr full
generation window), and restarts.
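
The figures in this message can be cross-checked with a little arithmetic. The sketch below assumes a nominal slot time of 0.4 s (an assumption, not stated in the commit) and uses hypothetical helper names; the 100000-slot and 500-slot values are the ones quoted above.

```python
# Assumption: nominal slot duration in seconds (not stated in the commit).
SLOT_SECONDS = 0.4

# Values quoted in the commit message; the constant names are hypothetical.
FULL_INTERVAL_SLOTS = 100_000
CONVERGENCE_THRESHOLD_SLOTS = 500


def slots_to_hours(slots: int, slot_seconds: float = SLOT_SECONDS) -> float:
    """Convert a slot count to wall-clock hours at the assumed slot time."""
    return slots * slot_seconds / 3600


def incremental_is_fresh(
    incremental_slot: int,
    network_slot: int,
    threshold: int = CONVERGENCE_THRESHOLD_SLOTS,
) -> bool:
    """True if the incremental is within `threshold` slots of the network tip."""
    return network_slot - incremental_slot <= threshold


if __name__ == "__main__":
    # 100000 slots at 0.4 s per slot is roughly 11.1 hours.
    print(f"{slots_to_hours(FULL_INTERVAL_SLOTS):.1f} h")
```

Under that assumed slot time, a 100000-slot full snapshot interval comes out at about 11.1 hours, which matches the ~11hr window mentioned above.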

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 05:53:56 +00:00
.pebbles chore: populate pebbles with known bugs and feature requests 2026-03-08 06:59:07 +00:00
ashburn-relay-lab chore: add containerlab topologies for relay testing 2026-03-07 22:30:03 +00:00
docs docs: document DoubleZero agent managed config on both switches 2026-03-07 23:45:36 +00:00
inventory fix: DOCKER-USER rules for inbound relay, add UDP test playbooks 2026-03-08 02:43:31 +00:00
inventory-switches fix: separate switch inventory to prevent accidental targeting 2026-03-07 10:56:48 +00:00
playbooks fix: recovery playbook fixes grafana PV ownership before scale-up 2026-03-10 00:57:36 +00:00
scripts feat: snapshot leapfrog — auto-recovery when validator falls behind 2026-03-10 05:53:56 +00:00
shred-relay-lab chore: add containerlab topologies for relay testing 2026-03-07 22:30:03 +00:00
.gitignore fix: switch ramdisk from /dev/ram0 to tmpfs, refactor snapshot-download.py 2026-03-08 18:43:41 +00:00
CLAUDE.md feat: graceful shutdown, ZFS upgrade, storage migration, sync-tools build 2026-03-09 07:58:37 +00:00
README.md fix: ashburn relay playbooks and document DZ tunnel ACL root cause 2026-03-07 01:44:25 +00:00
ansible.cfg fix: redeploy playbook handles SSH agent, git pull, config regen, stale PVs 2026-03-07 09:58:29 +00:00

README.md

biscayne-agave-runbook

Ansible playbooks for operating the kind-based agave-stack deployment on biscayne.vaasl.io.