Split up() into _setup_cluster(), _create_ingress(), _create_nodeports().
Reduces cyclomatic complexity below the flake8 threshold.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- chdir to git repo root before create_operation so relative stack
paths in spec.yml resolve correctly via stack_is_external()
- Update deploy test: config.env is now regenerated from spec on
--update (matching 72aabe7d behavior), verify backup exists
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The spec's "stack:" value is a relative path that must resolve from
the repo root. stack_is_external() checks Path(stack).exists() from
cwd, which fails when cwd isn't the repo root.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The repo_root calculation assumed stack paths are always 4 levels deep
(stack_orchestrator/data/stacks/name). External stacks with different
nesting (e.g. stack-orchestrator/stacks/name = 3 levels) got the wrong
root, causing --spec-file resolution to fail.
Use git rev-parse --show-toplevel instead.
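A rough sketch of that resolution, assuming a helper named get_repo_root()
(the name is illustrative, not necessarily the function in the tree):

    import subprocess
    from pathlib import Path

    def get_repo_root() -> Path:
        # Ask git for the working-tree root instead of counting path
        # components, so external stacks with any nesting depth resolve.
        out = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            capture_output=True, text=True, check=True,
        )
        return Path(out.stdout.strip())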
Fixes: so-k1k
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each pod entry in stack.yml now creates its own k8s Deployment with
independent lifecycle and update strategy. Pods with PVCs get Recreate,
pods without get RollingUpdate. This enables maintenance services that
survive main pod restarts.
- cluster_info: get_deployments() builds per-pod Deployments, Services
- cluster_info: Ingress routes to correct per-pod Service
- deploy_k8s: _create_deployment() iterates all Deployments/Services
- deployment: restart swaps Ingress to maintenance service during Recreate
- spec: add maintenance-service key
Single-pod stacks are backward compatible (same resource names).
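A sketch of how the per-pod strategy choice could look when building each
Deployment with the kubernetes Python client (helper name and rolling-update
parameters are illustrative):

    from kubernetes import client

    def deployment_strategy_for_pod(pod_has_pvcs: bool) -> client.V1DeploymentStrategy:
        # Pods backed by a PVC must be torn down before the replacement
        # starts (Recreate); stateless pods can roll over gradually.
        if pod_has_pvcs:
            return client.V1DeploymentStrategy(type="Recreate")
        return client.V1DeploymentStrategy(
            type="RollingUpdate",
            rolling_update=client.V1RollingUpdateDeployment(
                max_surge=1, max_unavailable=0,
            ),
        )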
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Strategic merge patch preserves fields not present in the patch body.
This means removed volumes, ports, and env vars persist in the running
Deployment after a restart. Replace sends the complete spec built from
the current compose files — removed fields are actually deleted.
Affects Deployment, Service, Ingress, and NodePort updates. Service
replace preserves clusterIP (immutable field) by reading it from the
existing resource before replacing.
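A sketch of the Service replace path (client method names are real; the
surrounding helper is illustrative):

    from kubernetes import client

    def replace_service(core_api: client.CoreV1Api, namespace: str,
                        svc: client.V1Service) -> None:
        existing = core_api.read_namespaced_service(
            name=svc.metadata.name, namespace=namespace)
        # clusterIP is immutable and resourceVersion is required for PUT,
        # so carry both over from the live object before replacing.
        svc.spec.cluster_ip = existing.spec.cluster_ip
        svc.metadata.resource_version = existing.metadata.resource_version
        core_api.replace_namespaced_service(
            name=svc.metadata.name, namespace=namespace, body=svc)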
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Allows callers to override container images during restart, e.g.:
laconic-so deployment restart --image backend=ghcr.io/org/app:sha123
The override is applied to the k8s Deployment spec before
create-or-patch. Docker/compose deployers accept the parameter
but ignore it.
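Roughly, the override amounts to swapping the image on the matching container
before the Deployment is applied (sketch; names are illustrative):

    from kubernetes import client

    def apply_image_overrides(deployment: client.V1Deployment,
                              overrides: dict[str, str]) -> None:
        # overrides maps container name -> image reference,
        # e.g. {"backend": "ghcr.io/org/app:sha123"}
        for container in deployment.spec.template.spec.containers:
            if container.name in overrides:
                container.image = overrides[container.name]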
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
_write_config_file() now reads each file listed under the credentials-files
top-level spec key and appends its contents to config.env after config vars.
Paths support ~ expansion. Missing files fail hard with sys.exit(1).
Also adds get_credentials_files() to Spec class following the same pattern
as get_image_registry_config().
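A sketch of the append step, assuming the spec has already been parsed into a
list of paths (helper name is illustrative):

    import sys
    from pathlib import Path

    def append_credentials_files(config_env: Path, credential_paths: list[str]) -> None:
        with config_env.open("a") as out:
            for entry in credential_paths:
                path = Path(entry).expanduser()   # supports ~ expansion
                if not path.exists():
                    print(f"Error: credentials file {path} not found")
                    sys.exit(1)                   # missing files fail hard
                out.write(path.read_text())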
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The spec key `registry-credentials` was ambiguous — could mean container
registry auth or Laconic registry config. Rename to `image-pull-secret`
which matches the k8s secret name it creates.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Containers referenced in spec.yml http-proxy routes now get TCP
readiness probes on the proxied port. This tells k8s when a container
is actually ready to serve traffic.
Without readiness probes, k8s considers pods ready immediately after
start, which means:
- Rolling updates cut over before the app is listening
- Broken containers look "ready" and receive traffic (502s)
- kubectl rollout undo has nothing to roll back to
The probes use TCP socket checks (not HTTP) to work with any protocol.
Initial delay 5s, check every 10s, fail after 3 consecutive failures.
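With the kubernetes Python client the probe looks roughly like this (the port
value is illustrative; in practice it comes from the http-proxy route in
spec.yml):

    from kubernetes import client

    readiness_probe = client.V1Probe(
        tcp_socket=client.V1TCPSocketAction(port=8080),  # proxied port
        initial_delay_seconds=5,   # wait before the first check
        period_seconds=10,         # check every 10s
        failure_threshold=3,       # 3 consecutive failures -> not ready
    )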
Closes so-l2l part C.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the destroy-and-recreate deployment model with in-place updates.
deploy_k8s.py: All resource creation (Deployment, Service, Ingress,
NodePort, ConfigMap) now uses create-or-update semantics. If a resource
already exists (409 Conflict), it patches instead of failing. For
Deployments, this triggers a k8s rolling update — old pods serve traffic
until new pods pass readiness checks.
deployment.py: restart() no longer calls down(). It just calls up()
which patches existing resources. No namespace deletion, no downtime
gap, no race conditions. k8s handles the rollout.
This gives:
- Zero-downtime deploys (old pods serve during rollout)
- Automatic rollback (if new pods fail readiness, rollout stalls)
- Manual rollback via kubectl rollout undo
Closes so-l2l (parts A and B).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
create_registry_secret() hardcoded namespace="default" but deployments
now run in dedicated laconic-* namespaces. The secret was invisible
to pods in the deployment namespace, causing 401 on GHCR pulls.
Accept namespace as parameter, passed from deploy_k8s.py which knows
the correct namespace.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reverts the label-based deletion approach: resources created by older
laconic-so versions lack the labels, so label-selector queries miss them and
they are never cleaned up. Namespace deletion is the only reliable cleanup.
Adds _wait_for_namespace_gone() so down() blocks until the namespace
is fully terminated. This prevents the race condition where up() tries
to create resources in a still-terminating namespace (403 Forbidden).
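A sketch of the wait loop (timeout and poll interval are illustrative):

    import time
    from kubernetes import client
    from kubernetes.client.rest import ApiException

    def wait_for_namespace_gone(core_api: client.CoreV1Api, namespace: str,
                                timeout: int = 120) -> None:
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                core_api.read_namespace(name=namespace)
            except ApiException as e:
                if e.status == 404:
                    return            # namespace fully terminated
                raise
            time.sleep(2)
        raise TimeoutError(f"namespace {namespace} still terminating")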
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
down() deleted the entire namespace when it wasn't explicitly set in
the spec. This causes a race condition on restart: up() tries to create
resources in a namespace that's still terminating, getting 403 Forbidden.
Always use _delete_resources_by_label() instead. The namespace is cheap
to keep and required for immediate up() after down(). This also matches
the shared-namespace behavior, making down() consistent regardless of
namespace configuration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
config.env is regenerated from spec.yml on every deploy create and
restart, silently overwriting manual edits. Add a header comment
explaining this so operators know to edit spec.yml instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The secret name `{app}-registry` is ambiguous — it could be a container
registry credential or a Laconic registry config. Rename to
`{app}-image-pull-secret` which clearly describes its purpose as a
Kubernetes imagePullSecret for private container registries.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kind deployments used imagePullPolicy=None (defaults to IfNotPresent),
which means the kind node caches images by tag and never re-pulls from
the local registry. After a container rebuild + registry push, the pod
keeps using the stale cached image.
Set Always for all deployment types so k8s re-pulls on every pod
restart. With a local registry this adds negligible overhead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The --update path excluded config.env from the safe_copy_tree, which
meant new config vars added to spec.yml were never written to
config.env. The XXX comment already flagged this as broken.
Remove config.env from exclude_patterns so --update regenerates it
from spec.yml like the non-update path does.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace token_hex cluster IDs with sortable timestamp-based IDs
(laconic-{base62_timestamp}{random_suffix}) via new ids.py module
- Check for existing Kind cluster before generating a new cluster-id
- Derive k8s namespace from stack name instead of compose_project_name
(e.g. laconic-dumpster instead of laconic-<random>)
- Plumb namespace through to secret generation instead of hardcoding
'default'
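A sketch of what a sortable, timestamp-based ID generator could look like
(alphabet handling and suffix length are illustrative; the real code lives in
ids.py):

    import secrets
    import string
    import time

    _BASE62 = string.digits + string.ascii_uppercase + string.ascii_lowercase

    def _base62(n: int) -> str:
        digits = ""
        while n:
            n, r = divmod(n, 62)
            digits = _BASE62[r] + digits
        return digits or "0"

    def new_cluster_id() -> str:
        # Timestamp prefix keeps IDs lexically sortable by creation time;
        # a short random suffix avoids collisions within the same second.
        suffix = "".join(secrets.choice(_BASE62) for _ in range(4))
        return f"laconic-{_base62(int(time.time()))}{suffix}"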
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously get_ingress() only used the first http-proxy entry,
silently ignoring additional hostnames. Now iterates over all
entries, creating an Ingress rule and TLS config per hostname.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Refactors K8sDeployer.up() into three composable methods:
- _setup_cluster_and_namespace(): kind cluster, API, namespace, ingress
- _create_infrastructure(): PVs, PVCs, ConfigMaps, Services, NodePorts
- _create_deployment(): Deployment resource (pods)
`prepare` calls the first two only — creates all cluster infrastructure
without starting pods. This eliminates the scale-to-0 workaround where
operators had to run `deployment start` then immediately scale down.
Usage: laconic-so deployment --dir <dir> prepare
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Track stack-orchestrator work items with pebbles (append-only event log).
Epic so-076: Stack composition — deploy multiple stacks into one kind cluster
with independent lifecycle management per sub-stack.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
K8s PUT (replace) operations require metadata.resourceVersion for
optimistic concurrency control. Services additionally have immutable
spec.clusterIP that must be preserved from the existing object.
On 409 conflict, all _ensure_* methods now read the existing resource
first and copy resourceVersion (and clusterIP for Services) into the
body before calling replace.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All K8s resource creation in deploy_k8s.py now uses try-create, catch
ApiException(409), then replace — matching the pattern already used for
secrets in deployment_create.py. This allows `deployment start` to be
safely re-run without 409 Conflict errors.
Resources made idempotent:
- Deployment (create_namespaced_deployment → replace on 409)
- Service (create_namespaced_service → replace on 409)
- Ingress (create_namespaced_ingress → replace on 409)
- NodePort services (same as Service)
- ConfigMap (create_namespaced_config_map → replace on 409)
- PV/PVC: bare `except: pass` replaced with explicit ApiException
catch for 404
Extracted _ensure_deployment(), _ensure_service(), _ensure_ingress(),
and _ensure_config_map() helpers to keep cyclomatic complexity in check.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Destroying the kind cluster on stop/start is almost never the intent.
The cluster holds PVs, ConfigMaps, and networking state that are
expensive to recreate. Default to preserving the cluster; pass
--perform-cluster-management explicitly when a full teardown is needed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The update command only patches environment variables and adds a
restart annotation. It does not update ports, volumes, configmaps,
or any other deployment spec. The old name was misleading — it
implied a full spec update, causing operators to expect changes
that never took effect.
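The patch amounts to something like this sketch (the annotation key and helper
are illustrative; a dict body sent to patch_namespaced_deployment is applied
as a strategic merge patch, so only the named fields change):

    import datetime
    from kubernetes import client

    def patch_env_and_restart(apps_api: client.AppsV1Api, name: str,
                              namespace: str, container: str, env: list) -> None:
        now = datetime.datetime.now(datetime.timezone.utc).isoformat()
        patch = {"spec": {"template": {
            # Changing a template annotation forces a pod restart.
            "metadata": {"annotations": {"laconic.com/restartedAt": now}},
            # Containers merge by name, so only env on this container changes.
            "spec": {"containers": [{"name": container, "env": env}]},
        }}}
        apps_api.patch_namespaced_deployment(
            name=name, namespace=namespace, body=patch)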
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
_clean_etcd_keeping_certs() only preserved /registry/secrets/caddy-system,
deleting everything else including the kubernetes ClusterIP service in the
default namespace. When kind recreated the cluster with the cleaned etcd,
kube-apiserver saw existing data and skipped bootstrapping the service.
kindnet then panicked because KUBERNETES_SERVICE_HOST was missing, blocking
all pod networking.
Expand the whitelist to also preserve:
- /registry/services/specs/default/kubernetes
- /registry/services/endpoints/default/kubernetes
Loop over multiple prefixes instead of a single etcdctl get --prefix call.
See docs/bug-laconic-so-etcd-cleanup.md in biscayne-agave-runbook.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Without propagation, rbind submounts on the host (e.g., XFS zvol at
/srv/kind/solana) are invisible inside the kind node — it sees the
underlying filesystem (ZFS) instead. This causes agave's io_uring to
deadlock on ZFS transaction commits (D-state in dsl_dir_tempreserve_space).
HostToContainer propagation ensures host submounts propagate into the
kind node, so /mnt/solana correctly resolves to the XFS zvol.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The kind-mount-root extraMount entry used kind's default propagation
(None), so new bind mounts under the root on the host (e.g. zvols
mounted under /srv/kind) were not visible inside the kind node until
restart. Setting propagation to HostToContainer makes host-side mount
changes propagate into the kind node automatically.
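In the generated kind config the entry amounts to the following mapping
(shown as the Python dict that gets serialized; paths are illustrative):

    extra_mount = {
        "hostPath": "/srv/kind",     # kind-mount-root from spec.yml
        "containerPath": "/mnt",
        # Propagate new host-side bind mounts (e.g. zvols mounted later
        # under /srv/kind) into the kind node without a restart.
        "propagation": "HostToContainer",
    }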
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When kind-mount-root is set in spec.yml, emit a single extraMount
mapping the root to /mnt instead of per-volume mounts. This allows
adding new volumes without recreating the kind cluster.
Volumes whose host path is under the root are skipped for individual
extraMounts and their PV paths resolve to /mnt/{relative_path}.
Volumes outside the root keep individual extraMounts as before.
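A sketch of the path resolution described above (helper name is illustrative):

    from pathlib import Path

    def pv_host_path(volume_host_path: str, kind_mount_root: str | None) -> str:
        vol = Path(volume_host_path).resolve()
        if kind_mount_root:
            root = Path(kind_mount_root).resolve()
            if vol.is_relative_to(root):
                # Volume lives under the root mount: address it through
                # the single /mnt extraMount inside the kind node.
                return str(Path("/mnt") / vol.relative_to(root))
        # Outside the root (or no root configured): the volume keeps its
        # own extraMount, so the PV path is the host path itself.
        return str(vol)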
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve container resources using layered priority:
1. spec.yml per-container override (resources.containers.<name>)
2. Compose file deploy.resources block
3. spec.yml global resources
4. DEFAULT_CONTAINER_RESOURCES fallback
This prevents monitoring sidecars from inheriting the validator's
resource requests (e.g., 256G memory). Each service gets appropriate
resources from its compose definition unless explicitly overridden.
Note: existing deployments with a global resources block in spec.yml
can remove it once compose files declare per-service defaults.
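The lookup order amounts to something like this sketch (names and the fallback
values are illustrative; the real code lives in the spec/cluster_info modules):

    DEFAULT_CONTAINER_RESOURCES = {"memory": "512M", "cpus": "0.5"}  # placeholder values

    def resolve_container_resources(container_name, spec, compose_service):
        # 1. spec.yml per-container override wins outright.
        per_container = (spec.get("resources", {})
                             .get("containers", {})
                             .get(container_name))
        if per_container:
            return per_container
        # 2. Then the compose file's own deploy.resources block.
        compose_resources = compose_service.get("deploy", {}).get("resources")
        if compose_resources:
            return compose_resources
        # 3. Then a global resources block in spec.yml (simplified here).
        if spec.get("resources"):
            return spec["resources"]
        # 4. Finally the hard-coded fallback.
        return DEFAULT_CONTAINER_RESOURCES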
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Compose file owns application defaults. spec.yml config: section is for
deployment-specific overrides only (hostnames, IPs, secrets). Start
scripts should not have their own defaults — they read what the compose
file provides.
Annotations added:
- CLAUDE.md: config layering table and anti-pattern callout
- spec.py: Spec class docstring with good/bad config examples
- deployment_create.py: _write_config_file docstring
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pods_in_deployment() and containers_in_pod() were hardcoded to search
the "default" namespace, but deployments are created in a per-deployment
namespace (laconic-{name}). This caused logs() to report "Pods not
running" even when pods were healthy.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Kind's extraPortMappings only included ports 80/443 for Caddy. Compose
service ports (RPC, gossip, UDP) were never forwarded, making them
unreachable from the host. Also adds hostNetwork/dnsPolicy to the k8s
pod spec when any compose service uses network_mode: host.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
In docker-compose, services can reference each other by name (e.g., 'db:5432').
In Kubernetes, when multiple containers are in the same pod (sidecars), they
share the same network namespace and must use 'localhost' instead.
This fix adds translate_sidecar_service_names() which replaces docker-compose
service name references with 'localhost' in environment variable values for
containers that share the same pod.
Fixes an issue where multi-container pods fail because one container tries to
connect to a sibling using the compose service name instead of localhost.
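A simplified sketch of the substitution (the actual implementation may
tokenize values differently):

    import re

    def translate_sidecar_service_names(env: dict[str, str],
                                        sidecar_names: set[str]) -> dict[str, str]:
        # Replace references like "db:5432" with "localhost:5432" when the
        # referenced service runs as a sidecar in the same pod.
        translated = {}
        for key, value in env.items():
            for name in sidecar_names:
                value = re.sub(rf"\b{re.escape(name)}\b", "localhost", value)
            translated[key] = value
        return translated

    # e.g. {"DATABASE_URL": "postgres://db:5432/app"} with sidecar {"db"}
    # becomes {"DATABASE_URL": "postgres://localhost:5432/app"}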
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Each deployment now gets its own Kubernetes namespace (laconic-{deployment_id}).
This provides:
- Resource isolation between deployments on the same cluster
- Simplified cleanup: deleting the namespace cascades to all namespaced resources
- No orphaned resources possible when deployment IDs change
Changes:
- Set k8s_namespace based on deployment name in __init__
- Add _ensure_namespace() to create namespace before deploying resources
- Add _delete_namespace() for cleanup
- Simplify down() to just delete PVs (cluster-scoped) and the namespace
- Fix hardcoded "default" namespace in logs function
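A sketch of the namespace creation step (helper name is illustrative):

    from kubernetes import client
    from kubernetes.client.rest import ApiException

    def ensure_namespace(core_api: client.CoreV1Api, namespace: str) -> None:
        body = client.V1Namespace(
            metadata=client.V1ObjectMeta(name=namespace))
        try:
            core_api.create_namespace(body=body)
        except ApiException as e:
            if e.status != 409:   # already exists is fine
                raise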
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, down() generated resource names from the deployment config
and deleted those specific names. This failed to clean up orphaned
resources when deployment IDs changed (e.g., after force_redeploy).
Changes:
- Add 'app' label to all resources: Ingress, Service, NodePort, ConfigMap, PV
- Refactor down() to query K8s by label selector instead of generating names
- This ensures all resources for a deployment are cleaned up, even if
the deployment config has changed or been deleted
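For example, deleting the labeled Services could look like this sketch; the
same pattern applies to the other resource kinds (label key/value and helper
name are illustrative):

    from kubernetes import client

    def delete_labeled_services(core_api: client.CoreV1Api, namespace: str,
                                app_name: str) -> None:
        selector = f"app={app_name}"
        for svc in core_api.list_namespaced_service(
                namespace=namespace, label_selector=selector).items:
            core_api.delete_namespaced_service(
                name=svc.metadata.name, namespace=namespace)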
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Mount /var/lib/etcd and /etc/kubernetes/pki to host filesystem
so cluster state is preserved for offline recovery. Each deployment
gets its own backup directory keyed by deployment ID.
Directory structure:
data/cluster-backups/{deployment_id}/etcd/
data/cluster-backups/{deployment_id}/pki/
This enables extracting secrets from etcd backups using etcdctl
with the preserved PKI certificates.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>