{"type": "create", "timestamp": "2026-03-18T14:45:07.038870Z", "issue_id": "so-a1a", "payload": {"title": "deploy create should support external credential injection", "type": "feature", "priority": "2", "description": "deploy create generates config.env but provides no mechanism to inject external credentials (API keys, tokens, etc.) at creation time. Operators must append to config.env after the fact, which mutates a build artifact. deploy create should accept --credentials-file or similar to include secrets in the generated config.env."}}
{"type": "create", "timestamp": "2026-03-18T14:45:07.038942Z", "issue_id": "so-b2b", "payload": {"title": "REGISTRY_TOKEN / imagePullSecret flow undocumented", "type": "bug", "priority": "2", "description": "create_registry_secret() exists in deployment_create.py and is called during up(), but REGISTRY_TOKEN is not documented in spec.yml or any user-facing docs. The restart command warns \"Registry token env var REGISTRY_TOKEN not set, skipping registry secret\" but doesn't explain how to set it. For GHCR private images, this is required and the flow from spec.yml -> config.env -> imagePullSecret needs documentation."}}
{"type": "create", "timestamp": "2026-03-18T19:10:00.000000Z", "issue_id": "so-k1k", "payload": {"title": "Stack path resolution differs between deploy create and deployment restart", "type": "bug", "priority": "2", "description": "deploy create resolves --stack as a relative path from cwd. deployment restart resolves --stack-path as absolute, then computes repo_root as 4 parents up (assuming stack_orchestrator/data/stacks/name structure). External stacks with different nesting depths (e.g. stack-orchestrator/stacks/name = 3 levels) get wrong repo_root, causing --spec-file resolution to fail. The two commands should use the same path resolution logic."}}
{"type": "create", "timestamp": "2026-03-18T19:25:00.000000Z", "issue_id": "so-l2l", "payload": {"title": "deployment restart should update in place, not delete/recreate", "type": "bug", "priority": "1", "description": "deployment restart deletes the entire namespace then recreates everything from scratch. This causes:\n\n1. **Downtime** — nothing serves traffic between delete and successful recreate\n2. **No rollback** — deleting the namespace destroys ReplicaSet revision history\n3. **Race conditions** — namespace may still be terminating when up() tries to create\n4. **Cascading failures** — if ANY container fails to start, the entire site is down with no fallback\n\nFix: three changes needed.\n\n**A. up() should create-or-update, not just create.** Use patch/apply semantics for Deployments, Services, Ingresses. When the pod spec changes (new env vars, new image), k8s creates a new ReplicaSet, scales it up, waits for readiness probes, then scales the old one down. Old pods serve traffic until new pods are healthy.\n\n**B. down() should never delete the namespace on restart.** Only on explicit teardown. The namespace owns the revision history. Current code: _delete_namespace() on every down(). Should: delete individual resources by label for teardown, do nothing for restart (let update-in-place handle it).\n\n**C. All containers need readiness probes.** Without them k8s considers pods ready immediately, defeating rolling update safety. laconic-so should generate readiness probes from the http-proxy routes in spec.yml (if a container has an http route, probe that port).\n\nWith these changes, k8s native rolling updates provide zero-downtime deploys and automatic rollback (if new pods fail readiness, rollout stalls, old pods keep serving).\n\nSource files:\n- deploy_k8s.py: up(), down(), _create_deployment(), _delete_namespace()\n- cluster_info.py: pod spec generation (needs readiness probes)\n- deployment.py: restart() orchestration"}}
{"type": "create", "timestamp": "2026-03-18T20:15:03.000000Z", "issue_id": "so-m3m", "payload": {"title": "Add credentials-files spec key for on-disk credential injection", "type": "feature", "priority": "1", "description": "deployment restart regenerates config.env from spec.yml, wiping credentials that were appended from on-disk files (e.g. ~/.credentials/*.env). Operators must append credentials after deploy create, which is fragile and breaks on restart.\n\nFix: New top-level spec key credentials-files. _write_config_file() reads each file and appends its contents to config.env after writing config vars. Files are read at deploy time from the deployment host.\n\nSpec syntax:\n credentials-files:\n - ~/.credentials/dumpster-secrets.env\n - ~/.credentials/dumpster-r2.env\n\nFiles:\n- deploy/spec.py: add get_credentials_files() returning list of paths\n- deploy/deployment_create.py: in _write_config_file(), after writing config vars, read and append each credentials file (expand ~ to home dir)\n\nAlso update dumpster-stack spec.yml to use the new key and remove the ansible credential append workaround from woodburn_deployer (group_vars/all.yml credentials_env_files, stack_deploy role append tasks, restart_dumpster.yml credential steps). Those cleanups are in the woodburn_deployer repo."}}
{"type":"status_update","timestamp":"2026-03-18T21:54:12.59148256Z","issue_id":"so-m3m","payload":{"status":"in_progress"}}
{"type":"close","timestamp":"2026-03-18T21:55:31.6035544Z","issue_id":"so-m3m","payload":{}}
{"type": "create", "timestamp": "2026-03-20T23:05:00.000000Z", "issue_id": "so-n1n", "payload": {"title": "Merge kind-mount-propagation branch — HostToContainer propagation for extraMounts", "type": "feature", "priority": "2", "description": "The kind-mount-root feature was cherry-picked to main (commit 8d03083d) but the mount propagation fix (commit 929bdab8 on branch enya-ac868cc4-kind-mount-propagation-fix) adds HostToContainer propagation so host submounts propagate into the Kind node. This is needed for ZFS child datasets and tmpfs mounts under the root. Cherry-pick 929bdab8 to main."}}
{"type": "create", "timestamp": "2026-03-20T23:05:00.000000Z", "issue_id": "so-o2o", "payload": {"title": "etcd cert backup not persisting across cluster deletion", "type": "bug", "priority": "1", "description": "The extraMount for etcd at data/cluster-backups/<id>/etcd is configured but after cluster deletion the directory is empty. Caddy TLS certificates stored in etcd are lost. Either etcd isn't writing to the host mount, or the cleanup code is deleting the backup. Investigate _clean_etcd_keeping_certs in helpers.py."}}