Commit Graph

18 Commits (ac4a509d6f0ed4cb6d7d311653663e81165bacd2)

Author SHA1 Message Date
Prathamesh Musale ac4a509d6f feat(k8s): decouple deployment-id from cluster-id
cluster-id plays two roles today: (a) which kind cluster this
deployment attaches to (used for the kube-config context name) and
(b) compose_project_name -> app_name, the prefix for every k8s
resource the deployment creates. _get_existing_kind_cluster() in
deploy create forces (a) to inherit the running cluster's name, and
because (a) and (b) are the same field, (b) inherits too — so two
deployments that share a cluster also share an app_name and collide
on every resource whose suffix isn't naturally distinct (PVs are
cluster-scoped; same-stack deployments collide there in particular).

Decouple: add a distinct `deployment-id` field. cluster-id keeps its
current behavior (inherit running cluster, else fresh). deployment-id
is always fresh per `deploy create`. K8sDeployer sources
kind_cluster_name from cluster-id and app_name from deployment-id.

Backward compatibility:
- Existing deployment.yml files have only cluster-id; no on-disk
  change until the next `deploy create`.
- DeploymentContext.init() falls back: deployment-id = cluster-id
  when the field is absent. Existing deployments keep their current
  app_name and resource names on next start — no PV renames, no
  re-binds, no data orphaning.
- `compose_project_name` parameter to K8sDeployer is retained (still
  used by the compose deployer path); only the k8s-side internals
  switch to deployment_context getters.
- The helm chart generator continues to derive chart names from
  cluster-id; untouched here, worth a follow-up for consistency.

Effect on woodburn: dumpster/rpc/trashscan each already carry a
distinct cluster-id in their deployment.yml (pre-`_get_existing_kind_cluster`
era). Under the fallback, they all adopt their existing cluster-id
as deployment-id, so resource names are identical to today.

Effect on new deployments: even when they share a running cluster
(kind-cluster-name in kube-config matches cluster-id), they get
distinct deployment-ids at deploy create, and thus distinct resource
name prefixes. The same-stack PV collision the namespace ownership
check surfaces goes away by construction.

Test: run-deploy-test.sh now reads deployment-id from the new field,
falling back to cluster-id for pre-decouple fixtures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 05:27:11 +00:00
Prathamesh Musale cb84388d00 feat(k8s): auto-ConfigMap for file-level host-path compose volumes
File-level host-path compose volumes (e.g. `../config/foo.sh:/opt/foo.sh`)
were synthesized into a kind extraMount + hostPath PV chain with a
sanitized containerPath (`/mnt/host-path-<sanitized>`). The sanitized
name is derived from the compose volume source and is identical across
deployments of the same stack, so two deployments sharing a cluster
collided at the containerPath — kind only honors the first deployment's
bind, subsequent deployments' pods silently read the first's content.
The same code path was also broken on real k8s, which has no way to
populate `/mnt/host-path-*` on worker nodes.

File-level compose binds are conceptually k8s ConfigMaps. The snowball
stack already uses the ConfigMap-backed named-volume pattern by hand.
Make that automatic at the k8s object-generation layer, without
touching deployment-dir compose or spec files.

Behavior at deploy create (validation only, no file mutation):
- :rw on a host-path bind        -> DeployerException (use a named
                                     volume for writable data)
- Directory with subdirectories  -> DeployerException (embed in image,
                                     split into configmaps, or use
                                     initContainer)
- Directory or file > ~700 KiB   -> DeployerException (ConfigMap budget)
- File, or flat small directory  -> accepted, handled at deploy start

Behavior at deploy start:
- cluster_info.get_configmaps() additionally walks pod + job compose
  volumes and emits a V1ConfigMap per host-path bind (deduped by
  sanitized name across all pods/services). Content read from
  {deployment_dir}/config/<pod>/<file> (already populated by
  _copy_extra_config_dirs).
- volumes_for_pod_files emits V1ConfigMapVolumeSource instead of
  V1HostPathVolumeSource for host-path binds.
- volume_mounts_for_service stats the source and sets V1VolumeMount
  sub_path to the filename when source is a regular file — single-key
  ConfigMaps land as files, whole-dir ConfigMaps land as directories.
- _generate_kind_mounts no longer emits `/mnt/host-path-*` extraMounts
  for these binds (the ConfigMap path bypasses kind node FS entirely).

Deployment dir layout is unchanged. Compose files, spec.yml, and
{deployment_dir}/config/<pod>/ remain exactly as today — trivially
diffable against stack source, no synthetic volume names. ConfigMaps
are visible only in k8s (kubectl get cm -n <ns>).

The existing `/mnt/host-path-*` skip in check_mounts_compatible is
retained as a transition tolerance for deployments created before
this change.

Updates:
- deployment_create: _validate_host_path_mounts() called per pod/job
  in the create loops; 700 KiB ConfigMap budget (accounts for base64
  + metadata overhead)
- helpers: _generate_kind_mounts skips host-path entries;
  volumes_for_pod_files emits ConfigMap-backed V1Volume;
  volume_mounts_for_service takes optional deployment_dir and
  auto-sets sub_path for single-file sources
- cluster_info: new _host_path_bind_configmaps() walked from
  get_configmaps(); volume_mounts_for_service call passes
  deployment_dir from spec.file_path
- docs: document the behavior and the rejected shapes in
  deployment_patterns.md
- tests: k8s-deploy asserts the host-path ConfigMaps exist,
  compose/spec unchanged, and no `/mnt/host-path-*` extraMounts

Refs: so-b86

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:13:43 +00:00
prathamesh0 7f4b058066
so-o2o: kubectl-level Caddy cert backup/restore (#746)
Publish / Gate: k8s deploy e2e (push) Failing after 3s Details
Publish / Build and publish (push) Has been skipped Details
K8s Deploy Test / Run deploy test suite on kind/k8s (push) Failing after 0s Details
Lint Checks / Run linter (push) Failing after 0s Details
Deploy Test / Run deploy test suite (push) Failing after 0s Details
Webapp Test / Run webapp test suite (push) Failing after 0s Details
Smoke Test / Run basic test suite (push) Failing after 0s Details
Replaces the etcd-surgery persistence approach with a CronJob that dumps `manager=caddy` Secrets to `{kind-mount-root}/caddy-cert-backup/` every 5 min, and a restore step that applies the file before Caddy starts on a fresh cluster. Closes so-o2o.

Deletes `_clean_etcd_keeping_certs` and the etcd+PKI extraMounts. No new spec keys - activates when `kind-mount-root` is set.
2026-04-17 15:36:40 +05:30
prathamesh0 fc5dc80058
so-l2l: in-place stop/restart via label-scoped cleanup (#743)
- `down()` scopes cleanup to a single stack via `app.kubernetes.io/stack` and keeps the namespace `Active` by default
- New `stop/down --delete-namespace` flag for opt-in full teardown
- `down()` is synchronous - waits until resources are actually gone before returning. Callers can drop their own wait loops
- `up()` skip-if-exists for Jobs completes the create-or-replace coverage (other kinds already had it)
- Orphan PVs from a prior `stop --delete-namespace` get cleaned on the next `stop --delete-volumes`
- Every k8s resource SO creates now carries `app.kubernetes.io/stack` via a new `ClusterInfo._stack_labels()` helper
- Closes so-l2l, so-076.2. Also includes pebble audit: closes so-c71, so-b2b, so-k1k; files so-328
2026-04-16 12:10:04 +05:30
prathamesh0 185ebf17f9
Fix failing k8s and external-stack CI test scripts (#739)
- Add `--perform-cluster-management` to container-registry, k8s-deployment-control, and database test scripts (`--skip-cluster-management` is now the default)
- Fix `wait_for_log_output()` in all k8s tests - "No logs available" is non-empty, so the check was passing prematurely
- Use HTTPS for container-registry catalog check (Caddy redirects HTTP->HTTPS)
- Fix external-stack sync test: sed pattern used `=` but spec is YAML (`: `), so the substitution never matched
- Workaround hyphenated env var name (`test-variable-1`) from upstream test-external-stack repo - docker compose v2 rejects hyphens
- Quote `echo $log_output` vars to prevent glob expansion in error output
- Use stack name (instead of cluster-id) derived namespace in k8s-deployment-control test
2026-04-02 15:00:57 +05:30
A. F. Dudley 87761c7041 fix: imagePullPolicy for kind, job images, duplicate registry call, test namespace
- deploy_k8s.py: default imagePullPolicy to IfNotPresent for kind
  (local images loaded via kind load, not pulled from registry)
- cluster_info.py: add job images to image_set so they're loaded into kind
- deploy_k8s.py: remove duplicate create_registry_secret call (merge artifact)
- deploy_k8s.py: fix indentation in run_job job_pull_policy (replace_all damage)
- tests/k8s-deploy: update namespace from laconic-{id} to laconic-{stack_name}
  to match the new stack-derived namespace scheme from wd-a7b

All 15 k8s deploy e2e tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 23:34:51 +00:00
A. F. Dudley 66da312f67 fix: base36 IDs for kind-compatible cluster names, test --perform-cluster-management
- ids.py: use base36 (lowercase+digits) instead of base62 — kind
  cluster names must match ^[a-z0-9.-]+$
- k8s deploy test: pass --perform-cluster-management on first start
  since 'start' defaults to --skip-cluster-management

Found by running tests/k8s-deploy/run-deploy-test.sh locally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 21:50:53 +00:00
Prathamesh Musale 5af6a83fa2 Add Job and secrets support for k8s-kind deployments (#995)
Publish / Build and publish (push) Failing after 0s Details
K8s Deployment Control Test / Run deployment control suite on kind/k8s (push) Failing after 0s Details
K8s Deploy Test / Run deploy test suite on kind/k8s (push) Failing after 0s Details
Webapp Test / Run webapp test suite (push) Failing after 0s Details
Smoke Test / Run basic test suite (push) Failing after 0s Details
Lint Checks / Run linter (push) Failing after 0s Details
Deploy Test / Run deploy test suite (push) Failing after 0s Details
Part of https://plan.wireit.in/deepstack/browse/VUL-315

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/995
Co-authored-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
Co-committed-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
2026-03-11 03:56:21 +00:00
Thomas E Lackey 65d67dba10 Fix k8s and enable it by default on PRs (#742)
Publish / Build and publish (push) Successful in 52s Details
Smoke Test / Run basic test suite (push) Successful in 3m45s Details
Lint Checks / Run linter (push) Failing after 3s Details
Deploy Test / Run deploy test suite (push) Successful in 3m32s Details
K8s Deploy Test / Run deploy test suite on kind/k8s (push) Failing after 1m2s Details
Webapp Test / Run webapp test suite (push) Successful in 3m22s Details
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/742
Co-authored-by: Thomas E Lackey <telackey@bozemanpass.com>
Co-committed-by: Thomas E Lackey <telackey@bozemanpass.com>
2024-02-14 23:50:09 +00:00
Thomas E Lackey b22c72e715 For k8s, use provisioner-managed volumes when an absolute host path is not specified. (#741)
In kind, when we bind-mount a host directory it is first mounted into the kind container at /mnt, then into the pod at the desired location.

We accidentally picked this up for full-blown k8s, and were creating volumes at /mnt.  This changes the behavior for both kind and regular k8s so that bind mounts are only allowed if a fully-qualified path is specified.  If no path is specified at all, a default storageClass is assumed to be present, and the volume managed by a provisioner.

Eg, for kind, the default provisioner is: https://github.com/rancher/local-path-provisioner

```
stack: test
deploy-to: k8s-kind
config:
  test-variable-1: test-value-1
network:
  ports:
    test:
     - '80'
volumes:
  # this will be bind-mounted to a host-path
  test-data-bind: /srv/data
  # this will be managed by the k8s node
  test-data-auto:
configmaps:
  test-config: ./configmap/test-config
```

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/741
Co-authored-by: Thomas E Lackey <telackey@bozemanpass.com>
Co-committed-by: Thomas E Lackey <telackey@bozemanpass.com>
2024-02-14 21:45:01 +00:00
David Boreham 8be1e684e8 Process environment variables defined in compose files (#736)
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/736
Co-authored-by: David Boreham <david@bozemanpass.com>
Co-committed-by: David Boreham <david@bozemanpass.com>
2024-02-08 19:41:57 +00:00
Thomas E Lackey 36bb068983
Add ConfigMap test. (#726)
Publish / Build and publish (push) Successful in 1m24s Details
Deploy Test / Run deploy test suite (push) Successful in 2m52s Details
K8s Deploy Test / Run deploy test suite on kind/k8s (push) Failing after 1m3s Details
Webapp Test / Run webapp test suite (push) Successful in 2m40s Details
Smoke Test / Run basic test suite (push) Successful in 4m24s Details
* Add ConfigMap test.

* eof

* Minor tweak

* Trigger test

---------

Co-authored-by: David Boreham <david@bozemanpass.com>
2024-02-05 14:15:11 -06:00
David Boreham a750b645b9
Merge Ci test branch fixes (#717) 2024-01-30 11:18:08 -07:00
David Boreham b7f215d9bf
k8s test fixes (#713)
Publish / Build and publish (push) Successful in 51s Details
Deploy Test / Run deploy test suite (push) Successful in 3m2s Details
K8s Deploy Test / Run deploy test suite on kind/k8s (push) Failing after 1m3s Details
Webapp Test / Run webapp test suite (push) Successful in 2m34s Details
Smoke Test / Run basic test suite (push) Successful in 3m48s Details
* Add cgroup setup, increase test timeouts

* Trigger from test script or CI job changes too
2024-01-28 16:21:39 -07:00
David Boreham 635aa7037b Build test container 2024-01-16 21:15:21 -07:00
David Boreham 1f9653e6f7
Fix kind mode and add k8s deployment test (#704)
* Fix kind mode and add k8s deployment test

* Fix lint errors
2024-01-16 15:55:58 -07:00
David Boreham c9c6a0eee3
Changes for remote k8s (#655)
Publish / Build and publish (push) Successful in 1m2s Details
Deploy Test / Run deploy test suite (push) Successful in 3m6s Details
K8s Deploy Test / Run deploy test suite (push) Failing after 3m4s Details
Webapp Test / Run webapp test suite (push) Failing after 3m36s Details
Smoke Test / Run basic test suite (push) Successful in 4m4s Details
2023-11-20 09:12:57 -07:00
David Boreham a27cf86748
Add basic k8s test (#635)
* Add CI job

* Add basic k8s test
2023-11-08 19:12:48 -07:00