The Caddy ingress image was hardcoded in the component manifest and
had no update path short of cluster recreation or a manual kubectl
patch. That forced woodburn to run an out-of-band Ansible playbook to
bump Caddy, and broke the "spec.yml is source of truth" model.
Changes:
- spec.yml: new `caddy-ingress-image` key (default
`ghcr.io/laconicnetwork/caddy-ingress:latest`).
- Deployment manifest: `strategy: Recreate` on the Caddy Deployment
— required because the pod binds hostPort 80/443, which prevents
any rolling update from completing (new pod hangs Pending forever
waiting for old pod to release the ports).
- install_ingress_for_kind: accepts caddy_image and templates the
manifest before applying, same pattern as the existing acme-email
templating.
- update_caddy_ingress_image: patches the running Caddy Deployment
when the spec image differs from the live image. No-op if they
match. Returns True if a patch was applied so the caller can wait
for the rollout.
- deploy_k8s._setup_cluster: on cluster reuse (ingress already up),
reconcile the running image against the spec. Installs path
unchanged; only the "already running, maybe needs update" branch
is new.
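The reconcile contract in `update_caddy_ingress_image` can be sketched as a small pure function; the name is from this change, but the callable parameters (`get_live_image`, `patch_image`) are stand-ins for the kubectl/API calls and exist only for illustration:

```python
def update_caddy_ingress_image(spec_image, get_live_image, patch_image):
    """Patch the running Caddy Deployment only when the spec image
    differs from the live image. Returns True when a patch was applied
    so the caller knows to wait for the rollout; no-op otherwise."""
    live = get_live_image()
    if live == spec_image:
        return False  # already running the requested image
    patch_image(spec_image)
    return True
```

Returning a bool (rather than patching unconditionally) is what lets `_setup_cluster` skip the rollout wait on the common already-in-sync path.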
Cluster-scoped caveat: caddy-system is shared by every deployment on
the cluster, so the spec value in any one deployment rolls Caddy for
all of them — last `deployment start` wins. Documented in
deployment_patterns.md alongside the other cluster-scoped concerns
(kind-mount-root, namespace ownership).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- **Kind extraMount compatibility**: fail fast at `deployment start` when a new deployment's mounts don't match the running cluster; warn when the first cluster is created without a `kind-mount-root` umbrella; replace the cryptic `ConfigException` with readable errors when the cluster is missing
- **Auto-ConfigMap for file-level host-path compose volumes** (so-7fc): `../config/foo.sh:/opt/foo.sh`-style binds become per-namespace ConfigMaps at deploy start instead of aliasing via the kind extraMount chain. `deploy create` rejects `:rw`, subdirs, and over-budget sources. Deployment-dir layout unchanged
- **Namespace ownership**: stamp the namespace with `laconic.com/deployment-dir` on create; fail loudly if another deployment tries to land in the same namespace. Pre-existing namespaces adopt ownership on next start
- **deployment-id / cluster-id decoupling**: split the two roles (kube context vs resource-name prefix) into separate `deployment.yml` fields. Backward-compat fallback keeps existing resource names stable
- Close stale pebbles `so-n1n` and `so-ad7`
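The bind-validation rules for file-level host-path volumes might look like the following sketch; `classify_bind` is a hypothetical name, and the subdirectory and size-budget checks are omitted since they need filesystem access:

```python
def classify_bind(bind: str):
    """Validate a compose bind like '../config/foo.sh:/opt/foo.sh'.
    Per the `deploy create` rules above, :rw binds are rejected;
    read-only file binds become per-namespace ConfigMaps.
    Returns (host_path, container_path) for an acceptable bind."""
    parts = bind.split(":")
    if len(parts) == 3 and parts[2] == "rw":
        raise ValueError(f"read-write file bind not supported: {bind}")
    return parts[0], parts[1]
```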
Implementation shipped in PR #746. Woodburn migration (one-shot
host-kubectl export to seed the backup file) completed manually.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the hardcoded `gcr.io/etcd-development/etcd:v3.5.9` in `_clean_etcd_keeping_certs` with a dynamic ref captured from the running Kind node via `crictl`, persisted to `{backup_dir}/etcd-image.txt` and reused on subsequent cleanup runs. Self-adapts to Kind upgrades, no version table to maintain.
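The capture-and-persist pattern can be sketched as follows; `etcd_image_ref` is a hypothetical name, and `detect_image` stands in for the `crictl` call against the running Kind node:

```python
import os

def etcd_image_ref(backup_dir, detect_image):
    """Return the etcd image ref, detecting it from the running node
    on first use and caching it in {backup_dir}/etcd-image.txt so
    subsequent cleanup runs reuse the same ref."""
    cache = os.path.join(backup_dir, "etcd-image.txt")
    if os.path.exists(cache):
        with open(cache) as f:
            return f.read().strip()
    image = detect_image()  # stand-in for the crictl query
    with open(cache, "w") as f:
        f.write(image + "\n")
    return image
```

Because the cached file wins, a Kind upgrade only changes the ref when the cache is seeded fresh, which matches the "self-adapts, no version table" goal.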
Testing on Kind v0.32 / etcd 3.6 surfaced two additional bugs in the whitelist cleanup that this PR does **not** fix (see so-o2o comments):
(a) the restore step pipes raw protobuf values through bash `echo`, corrupting binary bytes;
(b) the whitelist omits cluster-admin RBAC, SAs, and bootstrap tokens needed by kubeadm's pre-addon health check.
Merging this narrow fix + diagnosis trail; follow-up branch will replace the etcd-surgery approach with a kubectl-level Caddy secret backup/restore.
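For context on bug (a): raw protobuf values contain NUL and high bytes that shell `echo` and command substitution mangle, so any follow-up needs the values base64-encoded end-to-end. A minimal demonstration of the safe round-trip:

```python
import base64

# Bytes typical of protobuf payloads that text pipelines corrupt:
# NUL, newline, a high byte, backspace.
raw = bytes([0x00, 0x0a, 0xff, 0x08])

# Base64 is plain ASCII, so it survives shells and files unchanged.
encoded = base64.b64encode(raw).decode("ascii")
assert base64.b64decode(encoded) == raw
```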
- Maintenance-page swap during `restart` was broken: Ingress got patched to point at `{app_name}-{pod_name}-service` for the maintenance pod, but that Service was never created. Caddy had no valid backend, users saw "site cannot be reached" instead of the maintenance page
- Root cause: `get_services()` only builds per-pod Services for pods referenced by `http-proxy` routes; the maintenance pod has no http-proxy route by design
- Fix: `get_services()` now also includes the container named by `maintenance-service:` in the container-ports map, so its per-pod `Service` gets built and sits idle until the swap window
- Also files `so-b9a` (P4) noting the latent fragility in the resolver/builder contract
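The corrected selection rule in `get_services()` amounts to a union; this is a sketch with hypothetical parameter names, not the real signature:

```python
def services_to_build(http_proxy_pods, maintenance_service, all_pods):
    """Build a per-pod Service for every pod referenced by an
    http-proxy route, plus the container named by maintenance-service:
    (which has no http-proxy route by design, so the old rule
    silently skipped it)."""
    wanted = set(http_proxy_pods)
    if maintenance_service and maintenance_service in all_pods:
        wanted.add(maintenance_service)
    return sorted(wanted)
```

The maintenance Service then exists from the start and sits idle until `restart` swaps the Ingress over to it.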
- `down()` scopes cleanup to a single stack via `app.kubernetes.io/stack` and keeps the namespace `Active` by default
- New `stop/down --delete-namespace` flag for opt-in full teardown
- `down()` is synchronous: it waits until resources are actually gone before returning, so callers can drop their own wait loops
- `up()` skip-if-exists for Jobs completes the create-or-replace coverage (other kinds already had it)
- Orphan PVs from a prior `stop --delete-namespace` get cleaned on the next `stop --delete-volumes`
- Every k8s resource SO creates now carries `app.kubernetes.io/stack` via a new `ClusterInfo._stack_labels()` helper
- Closes so-l2l, so-076.2. Also includes pebble audit: closes so-c71, so-b2b, so-k1k; files so-328
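The two mechanics above can be sketched together; the label key is from this change, while the function shapes (`wait_until_gone`, its `list_resources` callable) are illustrative stand-ins for the `_stack_labels()` helper and the synchronous `down()` wait:

```python
import time

def stack_labels(stack: str) -> dict:
    """Label stamped on every SO-created resource so down() can scope
    cleanup to a single stack within a shared namespace."""
    return {"app.kubernetes.io/stack": stack}

def wait_until_gone(list_resources, timeout=60, poll=2):
    """Poll until the label-scoped listing is empty; True on success,
    False on timeout. list_resources stands in for an API list call
    filtered by the stack label."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not list_resources():
            return True
        time.sleep(poll)
    return False
```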
- Only map host ports for services with `network_mode: host` (Caddy's 80/443 are always mapped). Previously every compose service port was mapped unconditionally, conflicting with local services such as postgres and redis
- Use spec configmap values as source paths instead of ignoring them. Fixes configmaps with user-defined paths (e.g. `stack-orchestrator/compose/maintenance`) and home-relative paths (e.g. `~/.credentials/local-certs/s3`)
- Read configmap files from deployment dir (`configmaps/{name}/`) when building k8s ConfigMap objects, not from the spec's source path which doesn't exist in the deployment dir
- File pebbles: `so-c71` (resolved), `so-078`: self-sufficient deployments (hooks should be copied to deployment dir)
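The two-phase path rule for configmaps might be expressed as below; `configmap_source_path` is a hypothetical helper name capturing the create-time vs start-time distinction described above:

```python
import os

def configmap_source_path(spec_value, deployment_dir, name, creating):
    """At deploy create, resolve the user-supplied spec value
    (honoring ~ for home-relative paths) to copy files in; at start,
    read from the deployment dir's configmaps/{name}/ copy, since the
    original source path may not exist there."""
    if creating:
        return os.path.expanduser(spec_value)
    return os.path.join(deployment_dir, "configmaps", name)
```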
Each pod entry in stack.yml now creates its own k8s Deployment with
independent lifecycle and update strategy. Pods with PVCs get Recreate,
pods without get RollingUpdate. This enables maintenance services that
survive main pod restarts.
- cluster_info: get_deployments() builds per-pod Deployments, Services
- cluster_info: Ingress routes to correct per-pod Service
- deploy_k8s: _create_deployment() iterates all Deployments/Services
- deployment: restart swaps Ingress to maintenance service during Recreate
- spec: add maintenance-service key
Single-pod stacks are backward compatible (same resource names).
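The strategy rule can be sketched as a one-decision function (hypothetical name, returning a dict shaped like the Deployment `spec.strategy` field):

```python
def deployment_strategy(pod_has_pvc: bool) -> dict:
    """Recreate for pods mounting PVCs: a ReadWriteOnce volume cannot
    attach to the old and new pod simultaneously, so a rolling update
    would deadlock. Stateless pods roll normally."""
    if pod_has_pvc:
        return {"type": "Recreate"}
    return {"type": "RollingUpdate"}
```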
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The secret name `{app}-registry` is ambiguous — it could be a container
registry credential or a Laconic registry config. Rename to
`{app}-image-pull-secret` which clearly describes its purpose as a
Kubernetes imagePullSecret for private container registries.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Track stack-orchestrator work items with pebbles (append-only event log).
Epic so-076: Stack composition — deploy multiple stacks into one kind cluster
with independent lifecycle management per sub-stack.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>