`host-metrics` is a native stack -- spec.yml and `laconic-so --stack`
both take the bare stack name, not a path. Replace the `docker ps -qf`
filter with `laconic-so deployment --dir ... logs` so the verify
recipe works regardless of the laconic deployment-hash prefix on the
container name.
Add telegraf-entrypoint.sh to render telegraf.conf from the template
(replacing @@HOST_TAG_BLOCK@@ and @@ZFS_BLOCK@@ markers via awk) and
exec telegraf. Add test-telegraf-entrypoint.sh with 8 offline tests
(10 assertions) covering marker substitution and required-env validation.
Fix run() stderr redirect from >/dev/null 2>&1 to >/dev/null so that
entrypoint error output reaches the T6-T8 assertion captures.
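The marker substitution the entrypoint performs with awk can be sketched in Python (function and marker-handling names here are illustrative, not the shipped script):

```python
def render_template(template: str, blocks: dict) -> str:
    """Replace lines consisting of an @@MARKER@@ with a multi-line block."""
    out = []
    for line in template.splitlines():
        key = line.strip()
        if key.startswith("@@") and key.endswith("@@") and key in blocks:
            out.append(blocks[key])  # substitute the rendered block
        else:
            out.append(line)         # pass everything else through
    return "\n".join(out)
```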
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Operator-reported: editing source files mounted into a service via
bind volumes (alert rules, dashboards, scripts, templates, telegraf
config) and running 'laconic-so deployment ... restart' did not
take effect. Operator had to fall back to 'stop && start' to pick
up changes.
Root cause: 'restart' calls up_operation, which translates to
'docker compose up -d'. Compose's up only recreates a container
when the *service definition* itself (image, env, ports, volume
declarations) changes. Bind-mount target file content is not part
of that hash, so the running container kept its old in-memory
state (e.g. Grafana's pre-edit provisioning).
Add force_recreate kwarg through the deployer interface and have
restart pass force_recreate=True. compose path threads through to
python_on_whales' compose.up(force_recreate=...). k8s path accepts
the kwarg but is a no-op for now (rolling update on
unchanged-spec needs a separate fix that stamps the
kubectl.kubernetes.io/restartedAt annotation on managed
Deployments; tracked in a follow-up).
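The kwarg threading can be sketched as follows (class and method names are illustrative; the real compose path forwards to python_on_whales' compose.up):

```python
class ComposeDeployer:
    def up(self, detach=True, force_recreate=False):
        # real path: docker.compose.up(detach=detach, force_recreate=force_recreate)
        return {"detach": detach, "force_recreate": force_recreate}

class K8sDeployer:
    def up(self, detach=True, force_recreate=False):
        # kwarg accepted but a no-op for now (needs restartedAt annotation stamping)
        return {"detach": detach}

def restart(deployer):
    # restart forces recreation so edits to bind-mounted files take effect
    return deployer.up(force_recreate=True)
```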
- Cluster setup only considered images from the containers list in `stack.yml` for kind-loading into the cluster; images from `image_overrides` in the spec were not being loaded
- This also sometimes caused laconic-so to attempt kind-loading images not present locally
- Fix: union the `image_overrides` values (user-specified local images) with those from the container list, filtered to only the images actually present on the docker host
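A minimal sketch of the union-and-filter step (function and argument names are illustrative):

```python
def images_to_kind_load(container_images, image_overrides, local_images):
    # union the stack's container images with user-specified overrides,
    # then keep only images actually present on the docker host
    candidates = set(container_images) | set(image_overrides.values())
    return sorted(img for img in candidates if img in set(local_images))
```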
- `deploy create` now copies each pod's `commands.py` into `<deployment>/hooks/`. `call_stack_deploy_start` loads from there, so `deployment start` / `restart` no longer need the live stack source on disk to run the `start()` hook
- Only the `start()` hook is affected. `init`, `setup`, and `create` still load from the live source — they only run at `deploy create` time, when the source is guaranteed to be present
- Multi-repo stacks produce `hooks/commands_0.py`, `hooks/commands_1.py`, …; `call_stack_deploy_start` loads them all in sorted order
- Adds `tests/k8s-deploy/run-restart-test.sh` covering the full single-repo restart cycle (v1 -> mutate working tree -> `restart` re-copies and re-executes v2) and the multi-repo file-naming + multi-hook invocation. Wired into the existing **K8s Deploy Test** workflow
Closes so-p3p:
- New spec key `caddy-ingress-image`: on fresh install, deploys Caddy with this image; on subsequent `deployment start`, patches the running Caddy Deployment if the image differs. Defaults to the manifest's hardcoded image when absent
- When the spec key is absent, SO does **not** touch a running Caddy — avoids silently reverting an image set out-of-band (ansible playbook, another deployment's spec)
- `strategy: Recreate` on the Caddy Deployment manifest (required — hostPort 80/443 deadlocks rolling updates)
- Reconcile runs under both `--perform-cluster-management` and the default `--skip-cluster-management` (it's a k8s-API patch, not a cluster-lifecycle op)
- Image template by container name rather than string match, so the spec override wins regardless of what the shipped manifest hardcodes
- Cluster-scoped caveat documented: `caddy-system` is shared across deployments, so the last `deployment start` that sets the key wins for everyone
- **Kind extraMount compatibility**: fail fast at `deployment start` when a new deployment's mounts don't match the running cluster; warn when the first cluster is created without a `kind-mount-root` umbrella; replace the cryptic `ConfigException` with readable errors when the cluster is missing
- **Auto-ConfigMap for file-level host-path compose volumes** (so-7fc): `../config/foo.sh:/opt/foo.sh`-style binds become per-namespace ConfigMaps at deploy start instead of aliasing via the kind extraMount chain. `deploy create` rejects `:rw`, subdirs, and over-budget sources. Deployment-dir layout unchanged
- **Namespace ownership**: stamp the namespace with `laconic.com/deployment-dir` on create; fail loudly if another deployment tries to land in the same namespace. Pre-existing namespaces adopt ownership on next start
- **deployment-id / cluster-id decoupling**: split the two roles (kube context vs resource-name prefix) into separate `deployment.yml` fields. Backward-compat fallback keeps existing resource names stable
- Close stale pebbles `so-n1n` and `so-ad7`
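The patch-if-differs decision above can be sketched as (a simplification with illustrative names, not the shipped code):

```python
def caddy_image_action(spec, running_image, manifest_default):
    """Return the image to deploy/patch, or None to leave Caddy untouched."""
    override = spec.get("caddy-ingress-image")
    if running_image is None:
        # fresh install: use the override, else the manifest's hardcoded image
        return override or manifest_default
    if override is None:
        # key absent: never touch a running Caddy (may be set out-of-band)
        return None
    # patch only when the running image actually differs
    return override if override != running_image else None
```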
Implementation shipped in PR #746. Woodburn migration (one-shot
host-kubectl export to seed the backup file) completed manually.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the etcd-surgery persistence approach with a CronJob that dumps `manager=caddy` Secrets to `{kind-mount-root}/caddy-cert-backup/` every 5 min, and a restore step that applies the file before Caddy starts on a fresh cluster. Closes so-o2o.
Deletes `_clean_etcd_keeping_certs` and the etcd+PKI extraMounts. No new spec keys - activates when `kind-mount-root` is set.
Replaces the hardcoded `gcr.io/etcd-development/etcd:v3.5.9` in `_clean_etcd_keeping_certs` with a dynamic ref captured from the running Kind node via `crictl`, persisted to `{backup_dir}/etcd-image.txt` and reused on subsequent cleanup runs. Self-adapts to Kind upgrades, no version table to maintain.
Testing on Kind v0.32 / etcd 3.6 surfaced two additional bugs in the whitelist cleanup that this PR does **not** fix (see so-o2o comments):
(a) the restore step pipes raw protobuf values through bash `echo`, corrupting binary bytes;
(b) the whitelist omits cluster-admin RBAC, SAs, and bootstrap tokens needed by kubeadm's pre-addon health check.
Merging this narrow fix + diagnosis trail; follow-up branch will replace the etcd-surgery approach with a kubectl-level Caddy secret backup/restore.
- Maintenance-page swap during `restart` was broken: Ingress got patched to point at `{app_name}-{pod_name}-service` for the maintenance pod, but that Service was never created. Caddy had no valid backend, users saw "site cannot be reached" instead of the maintenance page
- Root cause: `get_services()` only builds per-pod Services for pods referenced by `http-proxy` routes; the maintenance pod has no http-proxy route by design
- Fix: `get_services()` now also includes the container named by `maintenance-service:` in the container-ports map, so its per-pod `Service` gets built and sits idle until the swap window
- Also files `so-b9a` (P4) noting the latent fragility in the resolver/builder contract
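The corrected service-selection logic can be sketched as (a simplification; names are illustrative):

```python
def pods_needing_service(http_proxy_routes, maintenance_service, pods):
    # per-pod Services used to be built only for pods behind http-proxy
    # routes; the maintenance pod has no route by design, so include it
    # explicitly so its Service exists before the swap window
    wanted = {route["pod"] for route in http_proxy_routes}
    if maintenance_service:
        wanted.add(maintenance_service)
    return [p for p in pods if p in wanted]
```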
- `down()` scopes cleanup to a single stack via `app.kubernetes.io/stack` and keeps the namespace `Active` by default
- New `stop/down --delete-namespace` flag for opt-in full teardown
- `down()` is synchronous - waits until resources are actually gone before returning. Callers can drop their own wait loops
- `up()` skip-if-exists for Jobs completes the create-or-replace coverage (other kinds already had it)
- Orphan PVs from a prior `stop --delete-namespace` get cleaned on the next `stop --delete-volumes`
- Every k8s resource SO creates now carries `app.kubernetes.io/stack` via a new `ClusterInfo._stack_labels()` helper
- Closes so-l2l, so-076.2. Also includes pebble audit: closes so-c71, so-b2b, so-k1k; files so-328
- Only map host ports for services with network_mode: host (80/443 for Caddy always mapped). Previously all compose service ports were mapped unconditionally, causing conflicts with local services like postgres and redis
- Use spec configmap values as source paths instead of ignoring them. Fixes configmaps with user-defined paths (e.g. `stack-orchestrator/compose/maintenance`) and home-relative paths (e.g. `~/.credentials/local-certs/s3`)
- Read configmap files from deployment dir (`configmaps/{name}/`) when building k8s ConfigMap objects, not from the spec's source path which doesn't exist in the deployment dir
- File pebbles: `so-c71` (resolved), `so-078`: self-sufficient deployments (hooks should be copied to deployment dir)
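The host-port selection rule from the first bullet can be sketched as (illustrative names; the real code walks the parsed compose services):

```python
def ports_to_map(services, caddy_ports=(80, 443)):
    # only services running with network_mode: host get their ports
    # mapped on the host; Caddy's 80/443 are always mapped
    ports = set(caddy_ports)
    for svc in services.values():
        if svc.get("network_mode") == "host":
            ports.update(svc.get("ports", []))
    return sorted(ports)
```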
ExternalName services only support DNS names (CNAME records), not
raw IP addresses. Add an ip mode that creates a headless Service +
Endpoints with a static IP, enabling routing to host-network
services like Kind gateway IPs or bare-metal endpoints.
Spec format:

  external-services:
    my-service:
      ip: 172.18.0.1
      port: 8899
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
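The mode dispatch can be sketched as (a simplification; the mode labels here are descriptive, not real identifiers):

```python
def external_service_mode(entry):
    """Pick the k8s object shape for one external-services entry."""
    if "ip" in entry:
        return "headless-service+static-endpoints"  # new ip mode
    if "host" in entry:
        return "externalname"                       # DNS CNAME only
    if "selector" in entry:
        return "headless-service+selector"
    raise ValueError("external-services entry needs ip, host, or selector")
```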
- Add `--perform-cluster-management` to container-registry, k8s-deployment-control, and database test scripts (`--skip-cluster-management` is now the default)
- Fix `wait_for_log_output()` in all k8s tests - "No logs available" is non-empty, so the check was passing prematurely
- Use HTTPS for container-registry catalog check (Caddy redirects HTTP->HTTPS)
- Fix external-stack sync test: sed pattern used `=` but spec is YAML (`: `), so the substitution never matched
- Work around the hyphenated env var name (`test-variable-1`) from the upstream test-external-stack repo - docker compose v2 rejects hyphens
- Quote `echo $log_output` vars to prevent glob expansion in error output
- Use stack name (instead of cluster-id) derived namespace in k8s-deployment-control test
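The corrected `wait_for_log_output()` check can be sketched as (illustrative name; the real tests are shell scripts):

```python
def logs_ready(output: str) -> bool:
    # a bare non-empty check passes prematurely: the placeholder
    # "No logs available" text is itself non-empty
    return bool(output.strip()) and "No logs available" not in output
```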
Keep upstream's schedule/path triggers and install scripts, add
workflow_dispatch and workflow_call so publish.yml can gate on it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update all self-references from `git.vdb.to/cerc-io/stack-orchestrator` to
`github.com/cerc-io/stack-orchestrator` (setup.py, pyproject.toml, README,
docs, install scripts, cloud-init scripts, stack READMEs)
- Fix release download URL pattern (`releases/download/latest` -> `releases/latest/download`)
- Port 5 Gitea-only CI workflows to GitHub Actions (k8s-deploy, k8s-deployment-control, container-registry, database, external-stack)
- Pin `shiv==1.0.8` in all workflows for reproducible builds
- Restrict smoke/deploy/webapp test push triggers to `main` only
- Remove `.gitea/` directory - Gitea repo to be archived
- Apply black reformatting to deployer.py, cluster_info.py, deploy_k8s.py
- Shorten docstrings exceeding 88 char line limit
- Add assert for pyright Optional type narrowing on tls list
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Compose files with both "8001" (TCP) and "8001/udp" produce separate
V1ContainerPort entries that k8s rejects as duplicates. Deduplicate
after parsing by (container_port, protocol) key.
This was blocking biscayne's agave deployment — the spec has both
TCP 8001 (ip_echo) and UDP 8001 (gossip), which generated two UDP
8001 entries.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
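The dedup described above can be sketched as (illustrative names; the real code dedups V1ContainerPort objects):

```python
def dedupe_container_ports(ports):
    # ports: (container_port, protocol) tuples parsed from compose entries;
    # k8s rejects duplicate (port, protocol) pairs on one container
    seen, out = set(), []
    for port, proto in ports:
        key = (port, proto.upper())
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out
```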
- test-k8s-deploy.yml: trigger on workflow_call and workflow_dispatch
only (not every push/PR)
- publish.yml: add needs: e2e job that calls test-k8s-deploy.yml —
release is blocked until the k8s e2e suite passes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- .github/workflows/test-k8s-deploy.yml: new workflow that installs
kind+kubectl and runs tests/k8s-deploy/run-deploy-test.sh on every
push and PR. Same script used locally and in release validation.
- .pre-commit-config.yaml: add local pre-push hook that runs the k8s
e2e test (~3 min) before pushing to remote.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- deploy_k8s.py: default imagePullPolicy to IfNotPresent for kind
(local images loaded via kind load, not pulled from registry)
- cluster_info.py: add job images to image_set so they're loaded into kind
- deploy_k8s.py: remove duplicate create_registry_secret call (merge artifact)
- deploy_k8s.py: fix indentation in run_job job_pull_policy (replace_all damage)
- tests/k8s-deploy: update namespace from laconic-{id} to laconic-{stack_name}
to match the new stack-derived namespace scheme from wd-a7b
All 15 k8s deploy e2e tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ids.py: use base36 (lowercase+digits) instead of base62 — kind
cluster names must match ^[a-z0-9.-]+$
- k8s deploy test: pass --perform-cluster-management on first start
since 'start' defaults to --skip-cluster-management
Found by running tests/k8s-deploy/run-deploy-test.sh locally.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Combines timestamp-based cluster IDs, namespace derived from stack name,
_build_containers refactor, jobs support, multi-ingress certificates,
user-declared secrets, and label-based resource cleanup with the existing
idempotent deploy, mount propagation, and port mapping fixes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolve conflicts keeping HostToContainer propagation on mount root
entry and per-container resource layering from the propagation branch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`create_registry_secret()` was hardcoded to use the "default" namespace,
but pods are deployed to the spec's configured namespace. The secret
must be in the same namespace as the pods for `imagePullSecrets` to work.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/998
Co-authored-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
Co-committed-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
Spec can override container images:

  image-overrides:
    dumpster-kubo: ghcr.io/.../dumpster-kubo:test-tag
Merged with CLI overrides (CLI wins). Enables testing with
GHCR-pushed test tags without modifying compose files.
Also reverts the image-pull-policy spec key (not needed —
the fix is to use proper GHCR tags, not IfNotPresent).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Testing specs can set image-pull-policy: IfNotPresent so kind-loaded
local images are used instead of pulling from the registry. Production
specs omit the key and get the default Always behavior.
Root cause: with Always, k8s pulled the GHCR kubo image (with baked
R2 endpoint) instead of the locally-built image (with https://s3:443),
causing kubo to connect to R2 directly and get Unauthorized.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CA certs mounted via subPath into /etc/ssl/certs/ so Go's x509
picks them up (directory mount replaces the entire dir)
- get_configmaps() now expands ~ in paths via os.path.expanduser()
- Both changes discovered during testing with mkcert + MinIO
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New spec.yml features for routing external service dependencies:

  external-services:
    s3:
      host: example.com        # ExternalName Service (production)
      port: 443
    # or, selector mode:
    s3:
      selector: {app: mock}    # headless Service + Endpoints (testing)
      namespace: mock-ns
      port: 443
  ca-certificates:
    - ~/.local/share/mkcert/rootCA.pem  # testing only
laconic-so creates the appropriate k8s Service type per mode:
- host mode: ExternalName (DNS CNAME to external provider)
- selector mode: headless Service + Endpoints with pod IPs
discovered from the target namespace at deploy time
ca-certificates mounts CA files into all containers at
/etc/ssl/certs/ and sets NODE_EXTRA_CA_CERTS for Node/Bun.
Also includes the previously committed PV Released state fix.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
deployment stop deletes the namespace (and PVCs) but preserves PVs
by default. On the next deployment start, PVs are in Released state
with a stale claimRef pointing at the deleted PVC. New PVCs cannot
bind to Released PVs, so pods get stuck in Pending.
Clear the claimRef on any Released PV during _create_volume_data()
so the PV returns to Available and can accept new PVC bindings.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
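The claimRef clearing can be sketched on plain dicts (a simplification: in a real cluster the controller moves the PV to Available after the patch; here PVs are plain dicts, not API objects):

```python
def clear_released_claim_refs(pvs):
    # a Released PV keeps a stale claimRef to the deleted PVC and can
    # never bind a new claim; clearing the ref lets it return to Available
    for pv in pvs:
        if pv.get("phase") == "Released":
            pv["claimRef"] = None
    return pvs
```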
Switch from caddy/ingress:latest to ghcr.io/laconicnetwork/caddy-ingress:latest
which has the List()/Stat() fix for secret_store. This fixes multi-domain
ACME provisioning deadlock where the second domain's cert request fails
because List() returns mangled keys and Stat() returns wrong IsTerminal.
Source: LaconicNetwork/ingress@109d69a (fix/acme-account-reuse branch)
Fixes: so-o2o (partially — etcd backup investigation still needed)
Closes: ds-v22v (Caddy sequential provisioning no longer needed)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When kind-mount-root is set in spec.yml, emit a single extraMount
mapping the root to /mnt instead of per-volume mounts. This allows
adding new volumes without recreating the Kind cluster.
Volumes whose host path is under the root skip individual extraMounts
and their PV paths resolve to /mnt/{relative_path}. Volumes outside
the root keep individual extraMounts as before.
Cherry-picked from branch enya-ac868cc4-kind-mount-propagation-fix
(commits b6d6ad81, 929bdab8) and adapted for current main.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
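The path resolution rule can be sketched as (illustrative function name; /mnt is the fixed mount target from the change above):

```python
from pathlib import PurePosixPath

def resolve_pv_path(volume_host_path, mount_root):
    """Return (needs_extra_mount, path_as_seen_by_the_node) for one volume."""
    volume = PurePosixPath(volume_host_path)
    root = PurePosixPath(mount_root)
    try:
        rel = volume.relative_to(root)
    except ValueError:
        # outside the root: keep an individual extraMount as before
        return True, str(volume)
    # under the root: covered by the single root extraMount, lives in /mnt
    return False, str(PurePosixPath("/mnt") / rel)
```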
Adds token-file key to image-pull-secret spec config. Reads the
registry token from a file on disk instead of requiring an environment
variable. File path supports ~ expansion. Falls back to token-env
if token-file is not set or file doesn't exist.
This lets operators store the GHCR token in ~/.credentials/ alongside
other secrets, removing the need for ansible to pass REGISTRY_TOKEN
as an env var.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The stack path in spec.yml is relative — both create_operation and
up_operation need cwd at the repo root for stack_is_external() to
resolve it. Move os.chdir(prev_cwd) to after up_operation completes
instead of between the two operations.
Reverts the SystemExit catch in call_stack_deploy_start — the root
cause was cwd, not the hook.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes for multi-deployment:
1. _pod_has_pvcs now excludes ConfigMap volumes from PVC detection.
Pods with only ConfigMap volumes (like maintenance) correctly get
RollingUpdate strategy instead of Recreate.
2. call_stack_deploy_start catches SystemExit when stack path doesn't
resolve from cwd (common during restart). Most stacks don't have
deploy hooks, so this is non-fatal.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>