Commit Graph

5 Commits (0bbc3b5a64aceb5086d74f38593b575d6dd757fc)

Author SHA1 Message Date
A. F. Dudley cd36bfe5ee fix: check-status.py smooth in-place redraw, remove comment bars
- Overwrite lines in place instead of clear+redraw (no flicker)
- Pad lines to terminal width to clear stale characters
- Blank leftover rows when output shrinks between frames
- Hide cursor during watch mode
- Remove section comment bars
- Replace unicode checkmarks with +/x

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 01:00:36 +00:00
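
The redraw approach in the bullets above (overwrite in place, pad to terminal width, blank leftover rows, hide the cursor) can be sketched roughly as follows. This is not the code from check-status.py; `render_frame` and `watch` are hypothetical names, and the sketch assumes an ANSI-capable terminal whose screen was cleared once before the first frame.

```python
import shutil
import sys
import time

HIDE_CURSOR = "\x1b[?25l"
SHOW_CURSOR = "\x1b[?25h"
CURSOR_HOME = "\x1b[H"  # move to the top-left corner without clearing the screen


def render_frame(lines, prev_rows, width):
    """Build one frame that overwrites the previous one in place.

    Returns (text_to_write, rows_drawn). Hypothetical helper, not the original.
    """
    out = [CURSOR_HOME]
    for line in lines:
        # Truncate, then pad to the full width so stale characters are overwritten.
        out.append(line[:width].ljust(width) + "\n")
    # Blank rows left behind when this frame is shorter than the previous one.
    for _ in range(max(0, prev_rows - len(lines))):
        out.append(" " * width + "\n")
    return "".join(out), len(lines)


def watch(get_lines, frames, interval=1.0, stream=sys.stdout):
    """Redraw `frames` times; the cursor stays hidden while the loop runs."""
    prev_rows = 0
    stream.write(HIDE_CURSOR)
    try:
        for _ in range(frames):
            width = shutil.get_terminal_size().columns
            frame, prev_rows = render_frame(get_lines(), prev_rows, width)
            stream.write(frame)
            stream.flush()
            time.sleep(interval)
    finally:
        # Always restore the cursor, even on Ctrl-C.
        stream.write(SHOW_CURSOR)
        stream.flush()
```

Because nothing is cleared between frames, there is no blank intermediate state for the terminal to display, which is what removes the flicker.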
A. F. Dudley b88af2be70 feat: graceful shutdown, ZFS upgrade, storage migration, sync-tools build
- entrypoint.py: Python stays PID 1, traps SIGTERM, requests graceful exit
  via admin RPC (agave-validator exit --force) before falling back to signals
- snapshot_download.py: fix break-on-failure bug in incremental download loop
  (continue + re-probe instead of giving up)
- biscayne-upgrade-zfs.yml: upgrade ZFS 2.2.2 → 2.2.9 via arter97/zfs-lts
  PPA to fix io_uring deadlock at kernel module level
- biscayne-migrate-storage.yml: one-time migration from zvol/XFS to ZFS
  dataset (zvol workaround no longer needed with graceful shutdown + ZFS fix)
- biscayne-stop.yml: patch terminationGracePeriodSeconds to 300 before
  scaling to 0, update docs for admin RPC shutdown
- biscayne-sync-tools.yml: fix SSH agent forwarding (vars: ansible_become),
  add --tags build-container support, add set -e to shell blocks
- biscayne-recover.yml: update to account for graceful shutdown
- check-status.py: add --pane flag for tmux, clean redraw in watch mode
- CLAUDE.md: update docs for ZFS dataset storage, graceful shutdown

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:58:37 +00:00
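
The shutdown sequence described for entrypoint.py (stay PID 1, trap SIGTERM, request exit over the admin RPC, then fall back to signals) might look something like the sketch below. It is an illustration under stated assumptions, not the actual entrypoint: `shutdown`, `supervise`, and `main` are hypothetical names, the timeouts are guesses chosen to fit inside the 300-second grace period mentioned above, and the exit command is passed in because the log shows only `agave-validator exit --force` and not the full invocation.

```python
import signal
import subprocess
import sys
import time


def shutdown(child, exit_cmd, rpc_timeout=30.0, grace_seconds=240.0, kill_after=10.0):
    """Stop `child` gracefully, escalating to signals only if needed."""
    try:
        # Step 1: request a clean exit through the admin RPC command.
        subprocess.run(exit_cmd, timeout=rpc_timeout, check=False)
    except (OSError, subprocess.TimeoutExpired):
        pass  # RPC unavailable or hung; fall through to the signal path
    try:
        # Step 2: give the child time to exit on its own.
        return child.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        pass
    # Step 3: fall back to SIGTERM, then SIGKILL as a last resort.
    child.terminate()
    try:
        return child.wait(timeout=kill_after)
    except subprocess.TimeoutExpired:
        child.kill()
        return child.wait()


def supervise(child_cmd, exit_cmd, poll_seconds=0.5):
    """Run the child under this process and react to SIGTERM."""
    stop_requested = []
    # The handler only records the request; the main loop does the work.
    signal.signal(signal.SIGTERM, lambda signum, frame: stop_requested.append(signum))
    child = subprocess.Popen(child_cmd)
    while child.poll() is None and not stop_requested:
        time.sleep(poll_seconds)
    if child.poll() is None:
        return shutdown(child, exit_cmd)
    return child.returncode


def main():
    # Assumption: the real validator arguments are omitted here.
    rc = supervise(["agave-validator"], ["agave-validator", "exit", "--force"])
    sys.exit(0 if rc == 0 else 1)
```

Keeping the signal handler trivial and doing the RPC call from the main loop avoids running blocking subprocess work inside a handler.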
A. F. Dudley 173b807451 fix: check-status.py discovers cluster-id from deployment.yml
Instead of hardcoding the laconic cluster ID, namespace, deployment
name, and pod label, read cluster-id from deployment.yml on biscayne
and derive everything from it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 06:48:19 +00:00
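
A minimal sketch of the discovery step, assuming deployment.yml carries a top-level `cluster-id:` key. The derivation rules below are assumptions for illustration: the log confirms only the pod label form `app=<cluster-id>` (from the earlier label-selector fix), while the namespace and deployment naming are guesses, and both function names are hypothetical.

```python
import re


def read_cluster_id(deployment_yml_text):
    """Extract the top-level cluster-id value from deployment.yml text."""
    match = re.search(r"^cluster-id:[ \t]*(\S+)", deployment_yml_text, re.MULTILINE)
    if match is None:
        raise ValueError("cluster-id not found in deployment.yml")
    return match.group(1).strip("'\"")


def derive_names(cluster_id):
    """Derive the Kubernetes identifiers from the cluster ID."""
    return {
        "namespace": cluster_id,                    # assumption, not confirmed by the log
        "deployment": f"{cluster_id}-deployment",   # assumption, not confirmed by the log
        "pod_selector": f"app={cluster_id}",        # matches the label quoted in the log
    }


names = derive_names(read_cluster_id("cluster-id: laconic-70ce4c4b47e23b85\n"))
print(names["pod_selector"])  # app=laconic-70ce4c4b47e23b85
```

A regular expression is used instead of a YAML parser only to keep the sketch dependency-free; a real script would more likely load the file with a YAML library.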
A. F. Dudley ed6f6bfd59 fix: check-status.py pod label selector matches actual k8s labels
The pod label is app=laconic-70ce4c4b47e23b85, not
app=laconic-70ce4c4b47e23b85-deployment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 06:46:17 +00:00
A. F. Dudley 09728a719c fix: recovery playbook is fire-and-forget, add check-status.py
The recovery playbook now exits after scaling to 1. The container
entrypoint handles snapshot download (60+ min) and validator startup
autonomously. Removed all polling/verification steps that would
time out waiting.

Added scripts/check-status.py for monitoring download progress,
validator slot, gap to mainnet, catch-up rate, and ramdisk usage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 06:39:25 +00:00
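
The gap and catch-up figures mentioned above can be computed from two samples of the validator slot and the mainnet slot. The sketch below shows one way to do the arithmetic; `catch_up_stats` is a hypothetical name, and how check-status.py actually samples or smooths these values is not shown in the log.

```python
def catch_up_stats(prev, curr):
    """Compute gap, catch-up rate, and ETA from two samples.

    Each sample is (timestamp_seconds, validator_slot, mainnet_slot).
    Returns (gap_slots, rate_slots_per_second, eta_seconds_or_None).
    """
    t0, v0, m0 = prev
    t1, v1, m1 = curr
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    gap_before = m0 - v0
    gap_now = m1 - v1
    # Positive rate: the gap is shrinking. Negative: the validator is falling behind.
    rate = (gap_before - gap_now) / elapsed
    if gap_now <= 0:
        eta = 0.0
    elif rate > 0:
        eta = gap_now / rate
    else:
        eta = None  # not catching up, so no meaningful estimate
    return gap_now, rate, eta


# The validator advanced 600 slots while mainnet advanced 100 over 10 seconds.
print(catch_up_stats((0, 1000, 2000), (10, 1600, 2100)))  # (500, 50.0, 10.0)
```

The rate is measured on the gap rather than on the validator slot alone, because mainnet keeps advancing while the validator replays.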