stack-orchestrator/.pebbles/events.jsonl

{"type":"create","timestamp":"2026-03-08T06:56:07.080584539Z","issue_id":"so-076","payload":{"description":"Currently laconic-so maps one stack to one deployment to one pod. All containers\nin a stack's compose files become containers in a single k8s pod. This means:\n\n- Can't upgrade doublezero without restarting agave-validator\n- Can't restart monitoring without disrupting the validator\n- Can't independently scale or lifecycle-manage components\n\nThe fix is stack composition. A meta-stack (e.g. biscayne-stack) composes\nsub-stacks (agave, doublezero, agave-monitoring), each becoming its own\nk8s Deployment with independent lifecycle.","priority":"2","title":"Stack composition: deploy multiple stacks into one kind cluster","type":"epic"}}
{"type":"create","timestamp":"2026-03-08T06:56:07.551986919Z","issue_id":"so-ab0","payload":{"description":"Add laconic-so deployment prepare that creates cluster infrastructure without pods. Already implemented, needs review.","priority":"2","title":"deployment prepare command","type":"task"}}
{"type":"create","timestamp":"2026-03-08T06:56:07.884418759Z","issue_id":"so-04f","payload":{"description":"deployment stop on ANY deployment deletes the shared kind cluster. Should only delete its own namespace.","priority":"2","title":"deployment stop should not destroy shared cluster","type":"bug"}}
{"type":"create","timestamp":"2026-03-08T06:56:08.253520249Z","issue_id":"so-370","payload":{"description":"Allow stack.yml to reference sub-stacks. Each sub-stack becomes its own k8s Deployment sharing namespace and PVs.","priority":"2","title":"Add stacks: field to stack.yml for composition","type":"task"}}
{"type":"create","timestamp":"2026-03-08T06:56:08.646764337Z","issue_id":"so-f7c","payload":{"description":"Create three independent stacks from the monolithic agave-stack. Each gets its own compose file and independent lifecycle.","priority":"2","title":"Split agave-stack into agave + doublezero + monitoring","type":"task"}}
{"type":"rename","timestamp":"2026-03-08T06:56:14.499990161Z","issue_id":"so-ab0","payload":{"new_id":"so-076.1"}}
{"type":"dep_add","timestamp":"2026-03-08T06:56:14.499992031Z","issue_id":"so-076.1","payload":{"dep_type":"parent-child","depends_on":"so-076"}}
{"type":"rename","timestamp":"2026-03-08T06:56:14.786407752Z","issue_id":"so-04f","payload":{"new_id":"so-076.2"}}
{"type":"dep_add","timestamp":"2026-03-08T06:56:14.786409842Z","issue_id":"so-076.2","payload":{"dep_type":"parent-child","depends_on":"so-076"}}
{"type":"rename","timestamp":"2026-03-08T06:56:15.058959714Z","issue_id":"so-370","payload":{"new_id":"so-076.3"}}
{"type":"dep_add","timestamp":"2026-03-08T06:56:15.058961364Z","issue_id":"so-076.3","payload":{"dep_type":"parent-child","depends_on":"so-076"}}
{"type":"rename","timestamp":"2026-03-08T06:56:15.410080785Z","issue_id":"so-f7c","payload":{"new_id":"so-076.4"}}
{"type":"dep_add","timestamp":"2026-03-08T06:56:15.410082305Z","issue_id":"so-076.4","payload":{"dep_type":"parent-child","depends_on":"so-076"}}
{"type":"dep_add","timestamp":"2026-03-08T06:56:16.313585082Z","issue_id":"so-076.3","payload":{"dep_type":"blocks","depends_on":"so-076.2"}}
{"type":"dep_add","timestamp":"2026-03-08T06:56:16.567629422Z","issue_id":"so-076.4","payload":{"dep_type":"blocks","depends_on":"so-076.3"}}
{"type": "create", "timestamp": "2026-03-18T14:45:07.038870Z", "issue_id": "so-a1a", "payload": {"title": "deploy create should support external credential injection", "type": "feature", "priority": "2", "description": "deploy create generates config.env but provides no mechanism to inject external credentials (API keys, tokens, etc.) at creation time. Operators must append to config.env after the fact, which mutates a build artifact. deploy create should accept --credentials-file or similar to include secrets in the generated config.env."}}
{"type": "create", "timestamp": "2026-03-18T14:45:07.038942Z", "issue_id": "so-b2b", "payload": {"title": "REGISTRY_TOKEN / imagePullSecret flow undocumented", "type": "bug", "priority": "2", "description": "create_registry_secret() exists in deployment_create.py and is called during up(), but REGISTRY_TOKEN is not documented in spec.yml or any user-facing docs. The restart command warns \"Registry token env var REGISTRY_TOKEN not set, skipping registry secret\" but doesn't explain how to set it. For GHCR private images, this is required and the flow from spec.yml -> config.env -> imagePullSecret needs documentation."}}
{"type": "create", "timestamp": "2026-03-18T19:10:00.000000Z", "issue_id": "so-k1k", "payload": {"title": "Stack path resolution differs between deploy create and deployment restart", "type": "bug", "priority": "2", "description": "deploy create resolves --stack as a relative path from cwd. deployment restart resolves --stack-path as absolute, then computes repo_root as 4 parents up (assuming stack_orchestrator/data/stacks/name structure). External stacks with different nesting depths (e.g. stack-orchestrator/stacks/name = 3 levels) get wrong repo_root, causing --spec-file resolution to fail. The two commands should use the same path resolution logic."}}
{"type": "create", "timestamp": "2026-03-18T19:25:00.000000Z", "issue_id": "so-l2l", "payload": {"title": "deployment restart should update in place, not delete/recreate", "type": "bug", "priority": "1", "description": "deployment restart deletes the entire namespace then recreates everything from scratch. This causes:\n\n1. **Downtime** — nothing serves traffic between delete and successful recreate\n2. **No rollback** — deleting the namespace destroys ReplicaSet revision history\n3. **Race conditions** — namespace may still be terminating when up() tries to create\n4. **Cascading failures** — if ANY container fails to start, the entire site is down with no fallback\n\nFix: three changes needed.\n\n**A. up() should create-or-update, not just create.** Use patch/apply semantics for Deployments, Services, Ingresses. When the pod spec changes (new env vars, new image), k8s creates a new ReplicaSet, scales it up, waits for readiness probes, then scales the old one down. Old pods serve traffic until new pods are healthy.\n\n**B. down() should never delete the namespace on restart.** Only on explicit teardown. The namespace owns the revision history. Current code: _delete_namespace() on every down(). Should: delete individual resources by label for teardown, do nothing for restart (let update-in-place handle it).\n\n**C. All containers need readiness probes.** Without them k8s considers pods ready immediately, defeating rolling update safety. laconic-so should generate readiness probes from the http-proxy routes in spec.yml (if a container has an http route, probe that port).\n\nWith these changes, k8s native rolling updates provide zero-downtime deploys and automatic rollback (if new pods fail readiness, rollout stalls, old pods keep serving).\n\nSource files:\n- deploy_k8s.py: up(), down(), _create_deployment(), _delete_namespace()\n- cluster_info.py: pod spec generation (needs readiness probes)\n- deployment.py: restart() orchestration"}}
{"type": "create", "timestamp": "2026-03-18T20:15:03.000000Z", "issue_id": "so-m3m", "payload": {"title": "Add credentials-files spec key for on-disk credential injection", "type": "feature", "priority": "1", "description": "deployment restart regenerates config.env from spec.yml, wiping credentials that were appended from on-disk files (e.g. ~/.credentials/*.env). Operators must append credentials after deploy create, which is fragile and breaks on restart.\n\nFix: New top-level spec key credentials-files. _write_config_file() reads each file and appends its contents to config.env after writing config vars. Files are read at deploy time from the deployment host.\n\nSpec syntax:\n  credentials-files:\n    - ~/.credentials/dumpster-secrets.env\n    - ~/.credentials/dumpster-r2.env\n\nFiles:\n- deploy/spec.py: add get_credentials_files() returning list of paths\n- deploy/deployment_create.py: in _write_config_file(), after writing config vars, read and append each credentials file (expand ~ to home dir)\n\nAlso update dumpster-stack spec.yml to use the new key and remove the ansible credential append workaround from woodburn_deployer (group_vars/all.yml credentials_env_files, stack_deploy role append tasks, restart_dumpster.yml credential steps). Those cleanups are in the woodburn_deployer repo."}}
{"type":"status_update","timestamp":"2026-03-18T21:54:12.59148256Z","issue_id":"so-m3m","payload":{"status":"in_progress"}}
{"type":"close","timestamp":"2026-03-18T21:55:31.6035544Z","issue_id":"so-m3m","payload":{}}
{"type": "create", "timestamp": "2026-03-20T23:05:00.000000Z", "issue_id": "so-n1n", "payload": {"title": "Merge kind-mount-propagation branch — HostToContainer propagation for extraMounts", "type": "feature", "priority": "2", "description": "The kind-mount-root feature was cherry-picked to main (commit 8d03083d) but the mount propagation fix (commit 929bdab8 on branch enya-ac868cc4-kind-mount-propagation-fix) adds HostToContainer propagation so host submounts propagate into the Kind node. This is needed for ZFS child datasets and tmpfs mounts under the root. Cherry-pick 929bdab8 to main."}}
{"type": "create", "timestamp": "2026-03-20T23:05:00.000000Z", "issue_id": "so-o2o", "payload": {"title": "etcd cert backup not persisting across cluster deletion", "type": "bug", "priority": "1", "description": "The extraMount for etcd at data/cluster-backups/<id>/etcd is configured but after cluster deletion the directory is empty. Caddy TLS certificates stored in etcd are lost. Either etcd isn't writing to the host mount, or the cleanup code is deleting the backup. Investigate _clean_etcd_keeping_certs in helpers.py."}}
{"type": "create", "timestamp": "2026-03-21T00:20:00.000000Z", "issue_id": "so-p3p", "payload": {"title": "laconic-so should manage Caddy ingress image lifecycle", "type": "feature", "priority": "2", "description": "The Caddy ingress controller image is hardcoded in ingress-caddy-kind-deploy.yaml. There's no mechanism to update it without manual kubectl commands or cluster recreation. laconic-so should: 1) Allow spec.yml to specify a custom Caddy image, 2) Support updating the Caddy image as part of deployment restart, 3) Set strategy: Recreate on the Caddy Deployment (hostPort pods can't do RollingUpdate). This would let cryovial or similar tooling trigger Caddy updates through the normal deployment pipeline."}}
{"type":"create","timestamp":"2026-04-08T05:51:31.557582604Z","issue_id":"so-5cd","payload":{"description":"The DockerDeployer.up() in stack_orchestrator/deploy/compose/deploy_docker.py accepts image_overrides as a parameter but silently drops it — only k8s mode (deploy_k8s.py) actually applies overrides.\n\nImpact: the --image container=image CLI flag on 'laconic-so deployment start' is a no-op for compose-mode deployments. Spec-level image-overrides: keys are also ignored in compose mode (they reach up() via deployment.py but are never applied).\n\nUse case: gorchain-stacks test scripts build :local images via build-containers, but compose files reference ghcr.io/gorbagana-dev/*:latest (so prod pulls work). Without image override support in compose mode, tests either need to docker tag the builds or the compose file needs to be rewritten before start — both ugly workarounds for what should be a first-class mechanism.\n\nFix sketch: in DockerDeployer.up(), when image_overrides is non-empty, write a temporary docker-compose.override.yml with {services: {name: {image: ref}}} and construct a new DockerClient with compose_files + [override_path]. Keeps k8s path untouched, reuses existing --image CLI flag and spec-level image-overrides: plumbing.","priority":"2","title":"Compose deployer ignores image_overrides","type":"bug"}}
{"type": "create", "timestamp": "2026-04-13T09:54:05.207241Z", "issue_id": "so-c71", "payload": {"title": "extraPortMappings maps all compose ports unconditionally", "type": "bug", "priority": "2", "description": "Commit fb69cc58 added compose service port mapping to Kind extraPortMappings. The intent was to support network_mode: host services (RPC, gossip), but the implementation maps ALL compose ports unconditionally. Internal-only ports (postgres 5432, redis 6379) get exposed on the host, causing conflicts with local services. The port mapping should only apply to services with network_mode: host, or be controlled by a spec-level opt-in.", "source_commit": "fb69cc58"}}
{"type": "create", "timestamp": "2026-04-14T09:53:31.040118Z", "issue_id": "so-078", "payload": {"title": "Deployments should be self-sufficient: copy hooks into deployment dir", "type": "feature", "priority": "1", "description": "deploy/commands.py hooks are resolved from the stack repo at runtime via get_stack_path. The deployment dir has no copy. This means: (1) the repo must remain at the same path after deploy create, (2) deployment start/restart fail with 'stack does not exist' if cwd differs from deploy create time (stack-source in deployment.yml is relative), (3) deployments cannot be moved or run independently of the source repo. Fix: deploy create should copy deploy/commands.py into the deployment dir alongside compose files and configmaps. call_stack_deploy_start should load from the deployment dir. The deployment becomes self-sufficient."}}
{"type": "update", "timestamp": "2026-04-14T10:01:14.937483Z", "issue_id": "so-c71", "payload": {"status": "resolved", "resolution": "Fixed in commit e909357a on fix/extraport-host-only branch. Only map ports for services with network_mode: host. Ports 80/443 for Caddy always mapped."}}