A. F. Dudley 3bf87a2e9b	2026-03-10 05:53:56 +00:00

feat: snapshot leapfrog - auto-recovery when validator falls behind

Entrypoint changes:
- Always require full + incremental before starting (retry until found)
- Check incremental freshness against convergence threshold (500 slots)
- Gap monitor thread: if validator falls >5000 slots behind for 3 consecutive checks, graceful stop + restart with fresh incremental
- cmd_serve is now a loop: download → run → monitor → leapfrog → repeat
- --no-snapshot-fetch moved to common args (both RPC and validator modes)
- --maximum-full-snapshots-to-retain default 1 (validator deletes downloaded full after generating its own)
- SNAPSHOT_MAX_AGE_SLOTS default 100000 (one full snapshot generation)

snapshot_download.py refactoring:
- Extract _discover_and_benchmark() and _rolling_incremental_download() as shared helpers
- Restore download_incremental_for_slot() using shared helpers (downloads only an incremental for an existing full snapshot)
- download_best_snapshot() uses shared helpers, downloads full then incremental as separate operations

The leapfrog cycle: validator generates full snapshots at standard 100k block height intervals (same slots as the rest of the network). When the gap monitor triggers, the entrypoint loops back to maybe_download_snapshot which finds the validator's local full, downloads a fresh network incremental (generated every ~40s, converges within the ~11hr full generation window), and restarts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
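
The gap-monitor rule in the commit message above (more than 5000 slots behind for 3 consecutive checks triggers a stop and restart) could be sketched roughly as follows. This is not the entrypoint's actual code: the `GapMonitor` class, its `observe` method, and the way slot numbers are passed in are all hypothetical, and resetting the counter on a healthy check is an assumption drawn from the word "consecutive". Only the two numeric thresholds come from the commit message.

```python
from dataclasses import dataclass

# Values taken from the commit message; the names are illustrative only.
GAP_THRESHOLD_SLOTS = 5000
CONSECUTIVE_CHECKS = 3


@dataclass
class GapMonitor:
    """Hypothetical helper: tracks consecutive checks where the validator lags."""

    threshold: int = GAP_THRESHOLD_SLOTS
    required: int = CONSECUTIVE_CHECKS
    strikes: int = 0

    def observe(self, network_slot: int, local_slot: int) -> bool:
        """Record one check; return True when a leapfrog restart is due."""
        gap = network_slot - local_slot
        if gap > self.threshold:
            self.strikes += 1
        else:
            # Assumption: any check within the threshold resets the streak.
            self.strikes = 0
        return self.strikes >= self.required
```

In a loop like the one described for cmd_serve, a monitor thread would call `observe()` on each poll and, on the first `True`, request a graceful stop so the outer loop can fetch a fresh incremental and restart.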
.pebbles	chore: populate pebbles with known bugs and feature requests	2026-03-08 06:59:07 +00:00
ashburn-relay-lab	chore: add containerlab topologies for relay testing	2026-03-07 22:30:03 +00:00
docs	docs: document DoubleZero agent managed config on both switches	2026-03-07 23:45:36 +00:00
inventory	fix: DOCKER-USER rules for inbound relay, add UDP test playbooks	2026-03-08 02:43:31 +00:00
inventory-switches	fix: separate switch inventory to prevent accidental targeting	2026-03-07 10:56:48 +00:00
playbooks	fix: recovery playbook fixes grafana PV ownership before scale-up	2026-03-10 00:57:36 +00:00
scripts	feat: snapshot leapfrog — auto-recovery when validator falls behind	2026-03-10 05:53:56 +00:00
shred-relay-lab	chore: add containerlab topologies for relay testing	2026-03-07 22:30:03 +00:00
.gitignore	fix: switch ramdisk from /dev/ram0 to tmpfs, refactor snapshot-download.py	2026-03-08 18:43:41 +00:00
CLAUDE.md	feat: graceful shutdown, ZFS upgrade, storage migration, sync-tools build	2026-03-09 07:58:37 +00:00
README.md	fix: ashburn relay playbooks and document DZ tunnel ACL root cause	2026-03-07 01:44:25 +00:00
ansible.cfg	fix: redeploy playbook handles SSH agent, git pull, config regen, stale PVs	2026-03-07 09:58:29 +00:00

README.md

biscayne-agave-runbook

Ansible playbooks for operating the kind-based agave-stack deployment on biscayne.vaasl.io.