Commit Graph

1205 Commits (b85c12e4da87adef9bb41d4fb5c532cc5d37ba6f)

Author SHA1 Message Date
Prathamesh Musale b85c12e4da fix(test): use --skip-cluster-management for stop/start volume test
Recreating a kind cluster in the same CI run fails due to stale
etcd/certs and cgroup detection issues. Use --skip-cluster-management
to reuse the existing cluster, and --delete-volumes to clear PVs so
fresh PVCs can bind on restart.

The volume retention semantics are preserved: bind-mount host path
data survives (filesystem is old), provisioner volumes are fresh
(PVs were deleted).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 06:49:42 +00:00
Prathamesh Musale a1c6c35834 style: wrap long line in cluster_info.py to fix flake8 E501
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 06:31:25 +00:00
Prathamesh Musale 91f4e5fe38 fix(k8s): use distinct app label for job pods
Job pod templates used the same app={deployment_id} label as
deployment pods, causing pods_in_deployment() to return both.
This made the logs command warn about multiple pods and pick
the wrong one.

Use app={deployment_id}-job for job pod templates so they are
not matched by pods_in_deployment(). The Job metadata itself
retains the original app label for stack-level queries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 06:26:03 +00:00
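A minimal sketch of the labeling scheme this commit describes (helper names here are hypothetical; the real logic lives in stack-orchestrator's cluster_info.py):

```python
def job_pod_labels(deployment_id: str) -> dict:
    # Job pod templates get a "-job" suffix so a selector on
    # app={deployment_id} no longer matches them.
    return {"app": f"{deployment_id}-job"}


def pods_in_deployment(pods: list, deployment_id: str) -> list:
    # Mimics the label-selector query: only pods whose app label
    # equals the bare deployment id are returned.
    return [p for p in pods if p.get("labels", {}).get("app") == deployment_id]
```

With this split, the logs command sees exactly one pod per deployment even while a job pod is running.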
Prathamesh Musale 68ef9de016 fix(k8s): resolve internal job compose files from data/compose-jobs
resolve_job_compose_file() used Path(stack).parent.parent for the
internal fallback, which resolved to data/stacks/compose-jobs/ instead
of data/compose-jobs/. This meant deploy create couldn't find job
compose files for internal stacks, so they were never copied to the
deployment directory and never created as k8s Jobs.

Use the same data directory resolution pattern as resolve_compose_file.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 06:15:15 +00:00
Prathamesh Musale a1b5220e40 fix(test): prevent set -e from killing kubectl queries in test checks
kubectl commands that query jobs or pod specs exit non-zero when the
resource doesn't exist yet. Under set -e, a bare command substitution
like var=$(kubectl ...) aborts the script silently. Add || true so
the polling loop and assertion logic can handle failures gracefully.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 05:59:35 +00:00
Prathamesh Musale 464215c72a fix(test): replace empty secrets key instead of appending duplicate
deploy init already writes 'secrets: {}' into the spec file. The test
was appending a second secrets block via heredoc, which ruamel.yaml
rejects as a duplicate key. Use sed to replace the empty value instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 05:34:37 +00:00
Prathamesh Musale 108f13a09b fix(test): wait for kind cluster cleanup before recreating
Replace the fixed `sleep 20` with a polling loop that waits for
`kind get clusters` to report no clusters. The previous approach
was flaky on CI runners where Docker takes longer to tear down
cgroup hierarchies after `kind delete cluster`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 05:26:48 +00:00
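The fixed-sleep vs. polling trade-off generalizes to a small wait helper. The actual test is a shell loop around `kind get clusters`; this is an illustrative Python equivalent with hypothetical names:

```python
import time


def wait_until(predicate, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll `predicate` until it returns True or `timeout` elapses.

    Returns True on success, False on timeout, instead of guessing a
    fixed sleep that may be too short on slow CI runners.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False
```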
Prathamesh Musale d64046df55 Revert "fix(test): reuse kind cluster on stop/start cycle in deploy test"
This reverts commit 35f179b755.
2026-03-10 05:24:00 +00:00
Prathamesh Musale 35f179b755 fix(test): reuse kind cluster on stop/start cycle in deploy test
Use --skip-cluster-management to avoid destroying and recreating the
kind cluster during the stop/start volume retention test. The second
kind create fails on some CI runners due to cgroups detection issues.

Use --delete-volumes to clear PVs so fresh PVCs can bind on restart.
Bind-mount data survives on the host filesystem; provisioner volumes
are recreated fresh.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 05:13:26 +00:00
Prathamesh Musale 1375f209d3 test(k8s): add tests for jobs, secrets, labels, and namespace isolation
Add a job compose file for the test stack and extend the k8s deploy
test to verify new features:
- Namespace isolation: pod exists in laconic-{id}, not default
- Stack labels: app.kubernetes.io/stack label set on pods
- Job completion: test-job runs to completion (status.succeeded=1)
- Secrets: spec secrets: key results in envFrom secretRef on pod

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 05:06:31 +00:00
Prathamesh Musale 241cd75671 fix(test): use deployment namespace in k8s control test
The deployment control test queried pods with raw kubectl but didn't
specify the namespace. Since pods now live in laconic-{deployment_id}
instead of default, the query returned empty results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:53:52 +00:00
Prathamesh Musale 183a188874 ci: upgrade Kind to v0.25.0 and pin kubectl to v1.31.2
Kind v0.20.0 defaults to k8s v1.27.3 which fails on newer CI runners
(kubelet cgroups issue). Upgrade to Kind v0.25.0 (k8s v1.31.2) and
pin kubectl to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
Prathamesh Musale 517e102830 fix(k8s): use deployment namespace for pod/container lookups
pods_in_deployment() and containers_in_pod() hardcoded
namespace="default", but pods are created in the deployment-specific
namespace (laconic-{cluster-id}). This caused logs() to return
"Pods not running" even when pods were healthy.

Add namespace parameter to both functions and pass
self.k8s_namespace from the logs() caller.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
Prathamesh Musale ef07b2c86e k8s: extract basename from stack path for labels
Stack.name contains the full absolute path from the spec file's
"stack:" key (e.g. /home/.../stacks/hyperlane-minio). K8s labels
must be <= 63 bytes and alphanumeric. Extract just the directory
basename (e.g. "hyperlane-minio") before using it as a label value.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
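The basename extraction above can be sketched as follows (a simplified illustration; the real code may also sanitize characters to satisfy the alphanumeric label rules, not just truncate):

```python
from pathlib import Path


def stack_label_value(stack: str, max_len: int = 63) -> str:
    # Keep only the directory basename of the stack path, then
    # truncate to the 63-character k8s label-value limit.
    return Path(stack).name[:max_len]
```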
Prathamesh Musale 7c8a4d91e7 k8s: add start() hook for post-deployment k8s resource creation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
Prathamesh Musale b8702f0bfc k8s: add app.kubernetes.io/stack label to pods and jobs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
Prathamesh Musale d7a742032e fix(webapp): use YAML round-trip instead of raw string append in _fixup_url_spec
The secrets: {} key added by init_operation for k8s deployments became
the last key in the spec file, breaking the raw string append that
assumed network: was always last. Replace with proper YAML load/modify/dump.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
Prathamesh Musale b77037c73d fix: remove shadowed os import in cluster_info
Inline `import os` at line 663 shadowed the top-level import,
causing flake8 F402.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
Prathamesh Musale 8d65cb13a0 fix(k8s): copy configmap dirs for jobs-only stacks during deploy create
The k8s configmap directory copying was inside the `for pod in pods:`
loop. For jobs-only stacks (no pods), the loop never executes, so
configmap files were never copied into the deployment directory.
The ConfigMaps were created as empty objects, leaving volume mounts
with no files.

Move the k8s configmap copying outside the pod loop so it runs
regardless of whether the stack has pods.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
Prathamesh Musale 47f3068e70 fix(k8s): include job volumes in PVC/ConfigMap/PV creation
For jobs-only stacks, named_volumes_from_pod_files() returned empty
because it only scanned parsed_pod_yaml_map. This caused ConfigMaps
and PVCs declared in the spec to be silently skipped.

- Add _all_named_volumes() helper that scans both pod and job maps
- Guard update() against empty parsed_pod_yaml_map (uncaught 404)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
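The _all_named_volumes() idea can be sketched like this (a simplified stand-in for the real helper, assuming compose-style parsed YAML dicts):

```python
def all_named_volumes(parsed_pod_yaml_map: dict, parsed_job_yaml_map: dict) -> set:
    # Collect named volumes from both pod and job compose files so
    # jobs-only stacks still get their PVCs/ConfigMaps/PVs created.
    volumes = set()
    for parsed in (*parsed_pod_yaml_map.values(), *parsed_job_yaml_map.values()):
        # compose allows "volumes: {name: null}", hence the `or {}` guard
        volumes.update((parsed.get("volumes") or {}).keys())
    return volumes
```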
Prathamesh Musale 9b304b8990 fix(k8s): remove job-name label that conflicts with k8s auto-label
Kubernetes automatically adds a job-name label to Job pod templates
matching the full Job name. Our custom job-name label used the short
name, causing a 422 validation error. Let k8s manage this label.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
Prathamesh Musale e0a8477326 fix(k8s): skip Deployment creation for jobs-only stacks
When a stack defines only jobs: (no pods:), the parsed_pod_yaml_map
is empty. Creating a Deployment with no containers causes a 422 error
from the k8s API. Skip Deployment creation when there are no pods.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
Prathamesh Musale 74deb3f8d6 feat(k8s): add Job support for non-Helm k8s-kind deployments
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
Prathamesh Musale 589ed3cf69 docs: update CLI reference to match actual code
cli.md:
- Document `start`/`stop` as preferred commands (`up`/`down` as legacy)
- Add --skip-cluster-management flag for start and stop
- Add --delete-volumes flag for stop
- Add missing subcommands: restart, exec, status, port, push-images, run-job
- Add --helm-chart option to deploy create
- Reorganize deploy vs deployment sections for clarity

deployment_patterns.md:
- Add missing --stack flag to deploy create example

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:38:49 +00:00
Prathamesh Musale 641052558a feat: add secrets support for k8s deployments
Adds a `secrets:` key to spec.yml that references pre-existing k8s
Secrets by name. SO mounts them as envFrom.secretRef on all pod
containers. Secret contents are managed out-of-band by the operator.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:38:48 +00:00
AFDudley 8cc0a9a19a add/local-test-runner (#996)

Co-authored-by: A. F. Dudley <a.frederick.dudley@gmail.com>
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/996
2026-03-09 20:04:58 +00:00
AFDudley 4a1b5d86fd Merge pull request 'fix(k8s): translate service names to localhost for sidecar containers' (#989) from fix-sidecar-localhost into main
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/989
2026-02-03 23:13:27 +00:00
A. F. Dudley 019225ca18 fix(k8s): translate service names to localhost for sidecar containers
In docker-compose, services can reference each other by name (e.g., 'db:5432').
In Kubernetes, when multiple containers are in the same pod (sidecars), they
share the same network namespace and must use 'localhost' instead.

This fix adds translate_sidecar_service_names() which replaces docker-compose
service name references with 'localhost' in environment variable values for
containers that share the same pod.

Fixes issue where multi-container pods fail because one container tries to
connect to a sibling using the compose service name instead of localhost.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 18:10:32 -05:00
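A simplified sketch of the translation this commit adds (the real translate_sidecar_service_names() may rewrite more reference forms; here only `service:port` occurrences are handled):

```python
import re


def translate_sidecar_service_names(value: str, sibling_services: set) -> str:
    # Containers in the same pod share a network namespace, so a
    # compose-style "db:5432" reference must become "localhost:5432".
    for svc in sibling_services:
        value = re.sub(rf"\b{re.escape(svc)}(?=:\d+\b)", "localhost", value)
    return value
```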
AFDudley 0296da6f64 Merge pull request 'feat(k8s): namespace-per-deployment for resource isolation and cleanup' (#988) from feat-namespace-per-deployment into main
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/988
2026-02-03 23:09:16 +00:00
A. F. Dudley d913926144 feat(k8s): namespace-per-deployment for resource isolation and cleanup
Each deployment now gets its own Kubernetes namespace (laconic-{deployment_id}).
This provides:
- Resource isolation between deployments on the same cluster
- Simplified cleanup: deleting the namespace cascades to all namespaced resources
- No orphaned resources possible when deployment IDs change

Changes:
- Set k8s_namespace based on deployment name in __init__
- Add _ensure_namespace() to create namespace before deploying resources
- Add _delete_namespace() for cleanup
- Simplify down() to just delete PVs (cluster-scoped) and the namespace
- Fix hardcoded "default" namespace in logs function

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 18:04:52 -05:00
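The namespace naming can be sketched as below. The commit only specifies the laconic-{deployment_id} shape; the RFC 1123 sanitization shown here is an assumption about what a robust implementation would also need:

```python
import re


def namespace_for(deployment_id: str) -> str:
    # Namespace names must be valid RFC 1123 labels: lowercase
    # alphanumerics and hyphens, at most 63 characters.
    name = f"laconic-{deployment_id.lower()}"
    name = re.sub(r"[^a-z0-9-]", "-", name)
    return name[:63].rstrip("-")
```

Deleting this one namespace then cascades to every namespaced resource of the deployment.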
AFDudley b41e0cb2f5 Merge pull request 'fix(k8s): query resources by label in down() for proper cleanup' (#987) from fix-down-cleanup-by-label into main
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/987
2026-02-03 22:57:52 +00:00
A. F. Dudley 47d3d10ead fix(k8s): query resources by label in down() for proper cleanup
Previously, down() generated resource names from the deployment config
and deleted those specific names. This failed to clean up orphaned
resources when deployment IDs changed (e.g., after force_redeploy).

Changes:
- Add 'app' label to all resources: Ingress, Service, NodePort, ConfigMap, PV
- Refactor down() to query K8s by label selector instead of generating names
- This ensures all resources for a deployment are cleaned up, even if
  the deployment config has changed or been deleted

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:55:14 -05:00
AFDudley 21d47908cc Merge pull request 'feat(k8s): ACME email fix, etcd persistence, volume paths' (#986) from fix-caddy-acme-email-rbac into main
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/986
2026-02-03 22:31:47 +00:00
A. F. Dudley f70e87b848 Add etcd + PKI extraMounts for offline data recovery
Mount /var/lib/etcd and /etc/kubernetes/pki to host filesystem
so cluster state is preserved for offline recovery. Each deployment
gets its own backup directory keyed by deployment ID.

Directory structure:
  data/cluster-backups/{deployment_id}/etcd/
  data/cluster-backups/{deployment_id}/pki/

This enables extracting secrets from etcd backups using etcdctl
with the preserved PKI certificates.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:19:52 -05:00
A. F. Dudley 5bc6c978ac feat(k8s): support acme-email config for Caddy ingress
Adds support for configuring ACME email for Let's Encrypt certificates
in kind deployments. The email can be specified in the spec under
network.acme-email and will be used to configure the Caddy ingress
controller ConfigMap.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:19:52 -05:00
A. F. Dudley ee59918082 Allow relative volume paths for k8s-kind deployments
For k8s-kind, relative paths (e.g., ./data/rpc-config) are resolved to
$DEPLOYMENT_DIR/path by _make_absolute_host_path() during kind config
generation. This provides Docker Host persistence that survives cluster
restarts.

Previously, validation threw an exception before paths could be resolved,
making it impossible to use relative paths for persistent storage.

Changes:
- deployment_create.py: Skip relative path check for k8s-kind
- cluster_info.py: Allow relative paths to reach PV generation
- docs/deployment_patterns.md: Document volume persistence patterns

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:17:44 -05:00
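The resolution rule described above can be sketched as follows (an illustrative equivalent of _make_absolute_host_path(); the real function's signature may differ):

```python
import os


def make_absolute_host_path(path: str, deployment_dir: str) -> str:
    # Absolute paths pass through; relative paths (e.g. ./data/rpc-config)
    # are anchored at the deployment directory so data persists on the
    # Docker host across kind cluster restarts.
    if os.path.isabs(path):
        return os.path.normpath(path)
    return os.path.normpath(os.path.join(deployment_dir, path))
```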
A. F. Dudley 581ceaea94 docs: Add cluster and volume management section
Document that:
- Volumes persist across cluster deletion by design
- Only use --delete-volumes when explicitly requested
- Multiple deployments share one kind cluster
- Use --skip-cluster-management to stop single deployment

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley 7cecf2caa6 Fix Caddy ACME email race condition by templating YAML
Previously, install_ingress_for_kind() applied the YAML (which starts
the Caddy pod with email: ""), then patched the ConfigMap afterward.
The pod had already read the empty email and Caddy doesn't hot-reload.

Now template the email into the YAML before applying, so the pod starts
with the correct email from the beginning.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley cb6fdb77a6 Rename image-registry to registry-credentials to avoid collision
The existing 'image-registry' key is used for pushing images to a remote
registry (URL string). Rename the new auth config to 'registry-credentials'
to avoid collision.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley 73ba13aaa5 Add private registry authentication support
Add ability to configure private container registry credentials in spec.yml
for deployments using images from registries like GHCR.

- Add get_image_registry_config() to spec.py for parsing image-registry config
- Add create_registry_secret() to create K8s docker-registry secrets
- Update cluster_info.py to use dynamic {deployment}-registry secret names
- Update deploy_k8s.py to create registry secret before deployment
- Document feature in deployment_patterns.md

The token-env pattern keeps credentials out of git - the spec references an
environment variable name, and the actual token is passed at runtime.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley d82b3fb881 Only load locally-built images into kind, auto-detect ingress
- Check stack.yml containers: field to determine which images are local builds
- Only load local images via kind load; let k8s pull registry images directly
- Add is_ingress_running() to skip ingress installation if already running
- Fixes deployment failures when public registry images aren't in local Docker

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley 3bc7832d8c Fix deployment name extraction from path
When the stack: field in spec.yml contains a path (e.g., stack_orchestrator/data/stacks/name),
extract just the final name component for K8s secret naming. K8s resource names must
be valid RFC 1123 subdomains and cannot contain slashes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley a75138093b Add setup-repositories to key files list
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley 1128c95969 Split documentation: README for users, CLAUDE.md for agents
README.md: deployment types, external stacks, commands, spec.yml reference
CLAUDE.md: implementation details, code locations, codebase navigation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley d292e7c48d Add k8s-kind architecture documentation to CLAUDE.md
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:25 -05:00
A. F. Dudley b057969ddd Clarify create_cluster docstring: one cluster per host by design
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley ca090d2cd5 Add $generate:type:length$ token support for K8s secrets
- Add GENERATE_TOKEN_PATTERN to detect $generate:hex:N$ and $generate:base64:N$ tokens
- Add _generate_and_store_secrets() to create K8s Secrets from spec.yml config
- Modify _write_config_file() to separate secrets from regular config
- Add env_from with secretRef to container spec in cluster_info.py
- Secrets are injected directly into containers via K8s native mechanism

This enables declarative secret generation in spec.yml:
  config:
    SESSION_SECRET: $generate:hex:32$
    DB_PASSWORD: $generate:hex:16$

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
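A sketch of the token expansion (the exact pattern shape and whether N counts bytes or output characters are assumptions; here N is taken as random bytes):

```python
import base64
import os
import re

# Assumed shape of GENERATE_TOKEN_PATTERN; the real regex may differ.
GENERATE_TOKEN_PATTERN = re.compile(r"^\$generate:(hex|base64):(\d+)\$$")


def resolve_generate_token(value: str) -> str:
    # Ordinary config values pass through untouched; only exact
    # $generate:...$ tokens are expanded into fresh random material.
    m = GENERATE_TOKEN_PATTERN.match(value)
    if not m:
        return value
    kind, n = m.group(1), int(m.group(2))
    raw = os.urandom(n)
    return raw.hex() if kind == "hex" else base64.b64encode(raw).decode()
```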
A. F. Dudley 2d3721efa4 Add cluster reuse for multi-stack k8s-kind deployments
When deploying a second stack to k8s-kind, automatically reuse an existing
kind cluster instead of trying to create a new one (which would fail due
to port 80/443 conflicts).

Changes:
- helpers.py: create_cluster() now checks for existing cluster first
- deploy_k8s.py: up() captures returned cluster name and updates self

This enables deploying multiple stacks (e.g., gorbagana-rpc + trashscan-explorer)
to the same kind cluster.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley 4408725b08 Fix repo root path calculation (4 parents from stack path) 2026-02-03 17:15:19 -05:00
A. F. Dudley 22d64f1e97 Add --spec-file option to restart and auto-detect GitOps spec
- Add --spec-file option to specify spec location in repo
- Auto-detect deployment/spec.yml in repo as GitOps location
- Fall back to deployment dir if no repo spec found

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00