Commit Graph

1261 Commits (3d83c6ad272815ff96a2ab01b3abd443889805ed)

Author SHA1 Message Date
A. F. Dudley dc15c0f4a5 feat: auto-generate readiness probes from http-proxy routes
Lint Checks / Run linter (push) Failing after 0s Details
Publish / Build and publish (push) Failing after 0s Details
Deploy Test / Run deploy test suite (push) Failing after 0s Details
Webapp Test / Run webapp test suite (push) Failing after 0s Details
Smoke Test / Run basic test suite (push) Failing after 0s Details
Containers referenced in spec.yml http-proxy routes now get TCP
readiness probes on the proxied port. This tells k8s when a container
is actually ready to serve traffic.

Without readiness probes, k8s considers pods ready immediately after
start, which means:
- Rolling updates cut over before the app is listening
- Broken containers look "ready" and receive traffic (502s)
- kubectl rollout undo has nothing to roll back to

The probes use TCP socket checks (not HTTP) to work with any protocol.
Initial delay 5s, check every 10s, fail after 3 consecutive failures.

Closes so-l2l part C.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 19:43:09 +00:00
A. F. Dudley 2d11ca7bb0 feat: update-in-place deployments with rolling updates
Replace the destroy-and-recreate deployment model with in-place updates.

deploy_k8s.py: All resource creation (Deployment, Service, Ingress,
NodePort, ConfigMap) now uses create-or-update semantics. If a resource
already exists (409 Conflict), it patches instead of failing. For
Deployments, this triggers a k8s rolling update — old pods serve traffic
until new pods pass readiness checks.

deployment.py: restart() no longer calls down(). It just calls up()
which patches existing resources. No namespace deletion, no downtime
gap, no race conditions. k8s handles the rollout.

This gives:
- Zero-downtime deploys (old pods serve during rollout)
- Automatic rollback (if new pods fail readiness, rollout stalls)
- Manual rollback via kubectl rollout undo

Closes so-l2l (parts A and B).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 19:40:20 +00:00
A. F. Dudley ba39c991f1 fix: create imagePullSecret in deployment namespace, not default
create_registry_secret() hardcoded namespace="default" but deployments
now run in dedicated laconic-* namespaces. The secret was invisible
to pods in the deployment namespace, causing 401 on GHCR pulls.

Accept namespace as parameter, passed from deploy_k8s.py which knows
the correct namespace.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 19:08:52 +00:00
A. F. Dudley 0b3e5559d0 fix: wait for namespace termination in down() before returning
Reverts the label-based deletion approach — resources created by older
laconic-so lack labels, so label queries return empty results. Namespace
deletion is the only reliable cleanup.

Adds _wait_for_namespace_gone() so down() blocks until the namespace
is fully terminated. This prevents the race condition where up() tries
to create resources in a still-terminating namespace (403 Forbidden).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 18:49:38 +00:00
A. F. Dudley ae2cea3410 fix: never delete namespace on deployment down
down() deleted the entire namespace when it wasn't explicitly set in
the spec. This causes a race condition on restart: up() tries to create
resources in a namespace that's still terminating, getting 403 Forbidden.

Always use _delete_resources_by_label() instead. The namespace is cheap
to keep and required for immediate up() after down(). This also matches
the shared-namespace behavior, making down() consistent regardless of
namespace configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 18:47:05 +00:00
A. F. Dudley e298e7444f fix: add auto-generated header to config.env
config.env is regenerated from spec.yml on every deploy create and
restart, silently overwriting manual edits. Add a header comment
explaining this so operators know to edit spec.yml instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 18:24:27 +00:00
A. F. Dudley e5a8ec5f06 fix: rename registry secret to image-pull-secret
The secret name `{app}-registry` is ambiguous — it could be a container
registry credential or a Laconic registry config. Rename to
`{app}-image-pull-secret` which clearly describes its purpose as a
Kubernetes imagePullSecret for private container registries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 15:33:11 +00:00
A. F. Dudley 0bbb51067c fix: set imagePullPolicy=Always for kind deployments
Lint Checks / Run linter (push) Failing after 0s Details
Kind deployments used imagePullPolicy=None (defaults to IfNotPresent),
which means the kind node caches images by tag and never re-pulls from
the local registry. After a container rebuild + registry push, the pod
keeps using the stale cached image.

Set Always for all deployment types so k8s re-pulls on every pod
restart. With a local registry this adds negligible overhead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 17:44:35 +00:00
A. F. Dudley 72aabe7d9a fix: deploy create --update now syncs config.env from spec
Publish / Build and publish (push) Failing after 0s Details
Deploy Test / Run deploy test suite (push) Failing after 0s Details
Webapp Test / Run webapp test suite (push) Failing after 0s Details
Smoke Test / Run basic test suite (push) Failing after 0s Details
Lint Checks / Run linter (push) Failing after 0s Details
The --update path excluded config.env from the safe_copy_tree, which
meant new config vars added to spec.yml were never written to
config.env. The XXX comment already flagged this as broken.

Remove config.env from exclude_patterns so --update regenerates it
from spec.yml like the non-update path does.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 08:20:45 +00:00
A. F. Dudley 36c37d2bde wd-a7b: Fix cluster-id and namespace naming
- Replace token_hex cluster IDs with sortable timestamp-based IDs
  (laconic-{base62_timestamp}{random_suffix}) via new ids.py module
- Check for existing Kind cluster before generating a new cluster-id
- Derive k8s namespace from stack name instead of compose_project_name
  (e.g. laconic-dumpster instead of laconic-<random>)
- Plumb namespace through to secret generation instead of hardcoding
  'default'

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 08:01:11 +00:00
afd 8a7491d3e0 Support multiple http-proxy entries in a single deployment
Lint Checks / Run linter (push) Failing after 0s Details
Previously get_ingress() only used the first http-proxy entry,
silently ignoring additional hostnames. Now iterates over all
entries, creating an Ingress rule and TLS config per hostname.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 06:16:28 +00:00
Prathamesh Musale e7483bc7d1 Add init containers, shared namespaces, per-volume sizing, and user/label support (#997)
Lint Checks / Run linter (push) Failing after 0s Details
Publish / Build and publish (push) Failing after 0s Details
Webapp Test / Run webapp test suite (push) Failing after 0s Details
Deploy Test / Run deploy test suite (push) Failing after 0s Details
Smoke Test / Run basic test suite (push) Failing after 0s Details
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/997
Co-authored-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
Co-committed-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
2026-03-12 10:34:45 +00:00
Prathamesh Musale 5af6a83fa2 Add Job and secrets support for k8s-kind deployments (#995)
Publish / Build and publish (push) Failing after 0s Details
K8s Deployment Control Test / Run deployment control suite on kind/k8s (push) Failing after 0s Details
K8s Deploy Test / Run deploy test suite on kind/k8s (push) Failing after 0s Details
Webapp Test / Run webapp test suite (push) Failing after 0s Details
Smoke Test / Run basic test suite (push) Failing after 0s Details
Lint Checks / Run linter (push) Failing after 0s Details
Deploy Test / Run deploy test suite (push) Failing after 0s Details
Part of https://plan.wireit.in/deepstack/browse/VUL-315

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/995
Co-authored-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
Co-committed-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
2026-03-11 03:56:21 +00:00
AFDudley 8cc0a9a19a add/local-test-runner (#996)
Publish / Build and publish (push) Failing after 0s Details
Deploy Test / Run deploy test suite (push) Failing after 0s Details
Webapp Test / Run webapp test suite (push) Failing after 0s Details
Lint Checks / Run linter (push) Failing after 0s Details
Smoke Test / Run basic test suite (push) Failing after 0s Details
Co-authored-by: A. F. Dudley <a.frederick.dudley@gmail.com>
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/996
2026-03-09 20:04:58 +00:00
A. F. Dudley 974eed0c73 feat: add `deployment prepare` command (so-076.1)
Refactors K8sDeployer.up() into three composable methods:
- _setup_cluster_and_namespace(): kind cluster, API, namespace, ingress
- _create_infrastructure(): PVs, PVCs, ConfigMaps, Services, NodePorts
- _create_deployment(): Deployment resource (pods)

`prepare` calls the first two only — creates all cluster infrastructure
without starting pods. This eliminates the scale-to-0 workaround where
operators had to run `deployment start` then immediately scale down.

Usage: laconic-so deployment --dir <dir> prepare

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 06:56:34 +00:00
A. F. Dudley 9c5b8e3f4e chore: initialize pebbles issue tracker
Track stack-orchestrator work items with pebbles (append-only event log).

Epic so-076: Stack composition — deploy multiple stacks into one kind cluster
with independent lifecycle management per sub-stack.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 06:56:25 +00:00
A. F. Dudley 14f423ea0c fix(k8s): read existing resourceVersion/clusterIP before replace
K8s PUT (replace) operations require metadata.resourceVersion for
optimistic concurrency control. Services additionally have immutable
spec.clusterIP that must be preserved from the existing object.

On 409 conflict, all _ensure_* methods now read the existing resource
first and copy resourceVersion (and clusterIP for Services) into the
body before calling replace.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 04:32:20 +00:00
A. F. Dudley 1da69cf739 fix(k8s): make deploy_k8s.py idempotent with create-or-replace semantics
All K8s resource creation in deploy_k8s.py now uses try-create, catch
ApiException(409), then replace — matching the pattern already used for
secrets in deployment_create.py. This allows `deployment start` to be
safely re-run without 409 Conflict errors.

Resources made idempotent:
- Deployment (create_namespaced_deployment → replace on 409)
- Service (create_namespaced_service → replace on 409)
- Ingress (create_namespaced_ingress → replace on 409)
- NodePort services (same as Service)
- ConfigMap (create_namespaced_config_map → replace on 409)
- PV/PVC: bare `except: pass` replaced with explicit ApiException
  catch for 404

Extracted _ensure_deployment(), _ensure_service(), _ensure_ingress(),
and _ensure_config_map() helpers to keep cyclomatic complexity in check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 04:15:03 +00:00
A. F. Dudley cc6acd5f09 fix: default skip-cluster-management to true
Destroying the kind cluster on stop/start is almost never the intent.
The cluster holds PVs, ConfigMaps, and networking state that are
expensive to recreate. Default to preserving the cluster; pass
--perform-cluster-management explicitly when a full teardown is needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 02:41:25 +00:00
A. F. Dudley 806c1bb723 refactor: rename `deployment update` to `deployment update-envs`
The update command only patches environment variables and adds a
restart annotation. It does not update ports, volumes, configmaps,
or any other deployment spec. The old name was misleading — it
implied a full spec update, causing operators to expect changes
that never took effect.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 02:33:20 +00:00
A. F. Dudley 7f205732f2 fix(k8s): expand etcd cleanup whitelist to preserve core cluster services
_clean_etcd_keeping_certs() only preserved /registry/secrets/caddy-system,
deleting everything else including the kubernetes ClusterIP service in the
default namespace. When kind recreated the cluster with the cleaned etcd,
kube-apiserver saw existing data and skipped bootstrapping the service.
kindnet panicked on KUBERNETES_SERVICE_HOST missing, blocking all pod
networking.

Expand the whitelist to also preserve:
- /registry/services/specs/default/kubernetes
- /registry/services/endpoints/default/kubernetes

Loop over multiple prefixes instead of a single etcdctl get --prefix call.

See docs/bug-laconic-so-etcd-cleanup.md in biscayne-agave-runbook.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 17:56:13 +00:00
A. F. Dudley a11d40f2f3 fix(k8s): add HostToContainer mount propagation to kind extraMounts
Without propagation, rbind submounts on the host (e.g., XFS zvol at
/srv/kind/solana) are invisible inside the kind node — it sees the
underlying filesystem (ZFS) instead. This causes agave's io_uring to
deadlock on ZFS transaction commits (D-state in dsl_dir_tempreserve_space).

HostToContainer propagation ensures host submounts propagate into the
kind node, so /mnt/solana correctly resolves to the XFS zvol.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 13:07:12 +00:00
A. F. Dudley 929bdab8a4 fix(k8s): add HostToContainer mount propagation to kind-mount-root
The kind-mount-root extraMount entry used kind's default propagation
(None), so new bind mounts under the root on the host (e.g. zvols
mounted under /srv/kind) were not visible inside the kind node until
restart. Setting propagation to HostToContainer makes host-side mount
changes propagate into the kind node automatically.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 12:58:04 +00:00
A. F. Dudley b6d6ad8145 feat(k8s): add kind-mount-root for unified kind extraMount
When kind-mount-root is set in spec.yml, emit a single extraMount
mapping the root to /mnt instead of per-volume mounts. This allows
adding new volumes without recreating the kind cluster.

Volumes whose host path is under the root are skipped for individual
extraMounts and their PV paths resolve to /mnt/{relative_path}.
Volumes outside the root keep individual extraMounts as before.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 12:57:19 +00:00
A. F. Dudley eae4c3cdff feat(k8s): per-service resource layering in deployer
Resolve container resources using layered priority:
1. spec.yml per-container override (resources.containers.<name>)
2. Compose file deploy.resources block
3. spec.yml global resources
4. DEFAULT_CONTAINER_RESOURCES fallback

This prevents monitoring sidecars from inheriting the validator's
resource requests (e.g., 256G memory). Each service gets appropriate
resources from its compose definition unless explicitly overridden.

Note: existing deployments with a global resources block in spec.yml
can remove it once compose files declare per-service defaults.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 10:26:10 +00:00
A. F. Dudley 8a8b882e32 bug: deploy create doesn't auto-generate volume mappings for new pods
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 09:56:28 +00:00
A. F. Dudley d4dcbedd48 bug: deploy create doesn't auto-generate volume mappings for new pods
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 09:55:24 +00:00
A. F. Dudley d090f2064e docs: annotate spec.yml config layering conventions
Compose file owns application defaults. spec.yml config: section is for
deployment-specific overrides only (hostnames, IPs, secrets). Start
scripts should not have their own defaults — they read what the compose
file provides.

Annotations added:
- CLAUDE.md: config layering table and anti-pattern callout
- spec.py: Spec class docstring with good/bad config examples
- deployment_create.py: _write_config_file docstring

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 08:47:12 +00:00
A. F. Dudley 26dea540e9 fix(k8s): use deployment namespace for pod and container lookups
pods_in_deployment() and containers_in_pod() were hardcoded to search
the "default" namespace, but deployments are created in a per-deployment
namespace (laconic-{name}). This caused logs() to report "Pods not
running" even when pods were healthy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 17:13:08 +00:00
A. F. Dudley 7cd5043a83 feat(k8s): add kind-mount-root for unified kind extraMount
When kind-mount-root is set in spec.yml, emit a single extraMount
mapping the root to /mnt instead of per-volume mounts. This allows
adding new volumes without recreating the kind cluster.

Volumes whose host path is under the root are skipped for individual
extraMounts and their PV paths resolve to /mnt/{relative_path}.
Volumes outside the root keep individual extraMounts as before.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 16:41:16 +00:00
A. F. Dudley f305214ce1 add local test runner script
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 05:28:55 +00:00
A. F. Dudley fb69cc58ff feat(k8s): map compose service ports to Kind extraPortMappings and support hostNetwork
Kind's extraPortMappings only included ports 80/443 for Caddy. Compose
service ports (RPC, gossip, UDP) were never forwarded, making them
unreachable from the host. Also adds hostNetwork/dnsPolicy to the k8s
pod spec when any compose service uses network_mode: host.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 05:28:52 +00:00
AFDudley 4a1b5d86fd Merge pull request 'fix(k8s): translate service names to localhost for sidecar containers' (#989) from fix-sidecar-localhost into main
Webapp Test / Run webapp test suite (push) Failing after 0s Details
Smoke Test / Run basic test suite (push) Failing after 0s Details
Lint Checks / Run linter (push) Failing after 0s Details
Publish / Build and publish (push) Failing after 0s Details
Deploy Test / Run deploy test suite (push) Failing after 0s Details
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/989
2026-02-03 23:13:27 +00:00
A. F. Dudley 019225ca18 fix(k8s): translate service names to localhost for sidecar containers
In docker-compose, services can reference each other by name (e.g., 'db:5432').
In Kubernetes, when multiple containers are in the same pod (sidecars), they
share the same network namespace and must use 'localhost' instead.

This fix adds translate_sidecar_service_names() which replaces docker-compose
service name references with 'localhost' in environment variable values for
containers that share the same pod.

Fixes issue where multi-container pods fail because one container tries to
connect to a sibling using the compose service name instead of localhost.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 18:10:32 -05:00
AFDudley 0296da6f64 Merge pull request 'feat(k8s): namespace-per-deployment for resource isolation and cleanup' (#988) from feat-namespace-per-deployment into main
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/988
2026-02-03 23:09:16 +00:00
A. F. Dudley d913926144 feat(k8s): namespace-per-deployment for resource isolation and cleanup
Each deployment now gets its own Kubernetes namespace (laconic-{deployment_id}).
This provides:
- Resource isolation between deployments on the same cluster
- Simplified cleanup: deleting the namespace cascades to all namespaced resources
- No orphaned resources possible when deployment IDs change

Changes:
- Set k8s_namespace based on deployment name in __init__
- Add _ensure_namespace() to create namespace before deploying resources
- Add _delete_namespace() for cleanup
- Simplify down() to just delete PVs (cluster-scoped) and the namespace
- Fix hardcoded "default" namespace in logs function

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 18:04:52 -05:00
AFDudley b41e0cb2f5 Merge pull request 'fix(k8s): query resources by label in down() for proper cleanup' (#987) from fix-down-cleanup-by-label into main
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/987
2026-02-03 22:57:52 +00:00
A. F. Dudley 47d3d10ead fix(k8s): query resources by label in down() for proper cleanup
Previously, down() generated resource names from the deployment config
and deleted those specific names. This failed to clean up orphaned
resources when deployment IDs changed (e.g., after force_redeploy).

Changes:
- Add 'app' label to all resources: Ingress, Service, NodePort, ConfigMap, PV
- Refactor down() to query K8s by label selector instead of generating names
- This ensures all resources for a deployment are cleaned up, even if
  the deployment config has changed or been deleted

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:55:14 -05:00
AFDudley 21d47908cc Merge pull request 'feat(k8s): ACME email fix, etcd persistence, volume paths' (#986) from fix-caddy-acme-email-rbac into main
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/986
2026-02-03 22:31:47 +00:00
A. F. Dudley f70e87b848 Add etcd + PKI extraMounts for offline data recovery
Mount /var/lib/etcd and /etc/kubernetes/pki to host filesystem
so cluster state is preserved for offline recovery. Each deployment
gets its own backup directory keyed by deployment ID.

Directory structure:
  data/cluster-backups/{deployment_id}/etcd/
  data/cluster-backups/{deployment_id}/pki/

This enables extracting secrets from etcd backups using etcdctl
with the preserved PKI certificates.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:19:52 -05:00
A. F. Dudley 5bc6c978ac feat(k8s): support acme-email config for Caddy ingress
Adds support for configuring ACME email for Let's Encrypt certificates
in kind deployments. The email can be specified in the spec under
network.acme-email and will be used to configure the Caddy ingress
controller ConfigMap.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:19:52 -05:00
A. F. Dudley ee59918082 Allow relative volume paths for k8s-kind deployments
For k8s-kind, relative paths (e.g., ./data/rpc-config) are resolved to
$DEPLOYMENT_DIR/path by _make_absolute_host_path() during kind config
generation. This provides Docker Host persistence that survives cluster
restarts.

Previously, validation threw an exception before paths could be resolved,
making it impossible to use relative paths for persistent storage.

Changes:
- deployment_create.py: Skip relative path check for k8s-kind
- cluster_info.py: Allow relative paths to reach PV generation
- docs/deployment_patterns.md: Document volume persistence patterns

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:17:44 -05:00
A. F. Dudley 581ceaea94 docs: Add cluster and volume management section
Document that:
- Volumes persist across cluster deletion by design
- Only use --delete-volumes when explicitly requested
- Multiple deployments share one kind cluster
- Use --skip-cluster-management to stop single deployment

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley 7cecf2caa6 Fix Caddy ACME email race condition by templating YAML
Previously, install_ingress_for_kind() applied the YAML (which starts
the Caddy pod with email: ""), then patched the ConfigMap afterward.
The pod had already read the empty email and Caddy doesn't hot-reload.

Now template the email into the YAML before applying, so the pod starts
with the correct email from the beginning.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley cb6fdb77a6 Rename image-registry to registry-credentials to avoid collision
The existing 'image-registry' key is used for pushing images to a remote
registry (URL string). Rename the new auth config to 'registry-credentials'
to avoid collision.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley 73ba13aaa5 Add private registry authentication support
Add ability to configure private container registry credentials in spec.yml
for deployments using images from registries like GHCR.

- Add get_image_registry_config() to spec.py for parsing image-registry config
- Add create_registry_secret() to create K8s docker-registry secrets
- Update cluster_info.py to use dynamic {deployment}-registry secret names
- Update deploy_k8s.py to create registry secret before deployment
- Document feature in deployment_patterns.md

The token-env pattern keeps credentials out of git - the spec references an
environment variable name, and the actual token is passed at runtime.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley d82b3fb881 Only load locally-built images into kind, auto-detect ingress
- Check stack.yml containers: field to determine which images are local builds
- Only load local images via kind load; let k8s pull registry images directly
- Add is_ingress_running() to skip ingress installation if already running
- Fixes deployment failures when public registry images aren't in local Docker

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley 3bc7832d8c Fix deployment name extraction from path
When stack: field in spec.yml contains a path (e.g., stack_orchestrator/data/stacks/name),
extract just the final name component for K8s secret naming. K8s resource names must
be valid RFC 1123 subdomains and cannot contain slashes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley a75138093b Add setup-repositories to key files list
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley 1128c95969 Split documentation: README for users, CLAUDE.md for agents
README.md: deployment types, external stacks, commands, spec.yml reference
CLAUDE.md: implementation details, code locations, codebase navigation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00