Compare commits

...

7 Commits

Author SHA1 Message Date
AFDudley 8cc0a9a19a add/local-test-runner (#996)
Co-authored-by: A. F. Dudley <a.frederick.dudley@gmail.com>
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/996
2026-03-09 20:04:58 +00:00
AFDudley 4a1b5d86fd Merge pull request 'fix(k8s): translate service names to localhost for sidecar containers' (#989) from fix-sidecar-localhost into main
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/989
2026-02-03 23:13:27 +00:00
A. F. Dudley 019225ca18 fix(k8s): translate service names to localhost for sidecar containers
In docker-compose, services can reference each other by name (e.g., 'db:5432').
In Kubernetes, when multiple containers are in the same pod (sidecars), they
share the same network namespace and must use 'localhost' instead.

This fix adds translate_sidecar_service_names() which replaces docker-compose
service name references with 'localhost' in environment variable values for
containers that share the same pod.

Fixes issue where multi-container pods fail because one container tries to
connect to a sibling using the compose service name instead of localhost.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 18:10:32 -05:00
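The substitution described in this commit can be sketched as a standalone function (a simplified illustration of the regex-based replacement; the function name here is illustrative, not the project's exact API):

```python
import re
from typing import List, Mapping


def localhost_for_siblings(envs: Mapping[str, str], siblings: List[str]) -> dict:
    """Replace sibling service-name references (with optional :port) by localhost."""
    result = {}
    for key, val in envs.items():
        new_val = str(val)
        for name in siblings:
            # Matches 'db' or 'db:5432', including inside URLs like
            # postgres://user:pass@db:5432/app, but not substrings like 'dbname'
            pattern = rf"\b{re.escape(name)}(:\d+)?\b"
            new_val = re.sub(pattern, lambda m: f'localhost{m.group(1) or ""}', new_val)
        result[key] = new_val
    return result


envs = {"DATABASE_URL": "postgres://user:pass@db:5432/app", "CACHE": "redis:6379"}
print(localhost_for_siblings(envs, ["db", "redis"]))
# {'DATABASE_URL': 'postgres://user:pass@localhost:5432/app', 'CACHE': 'localhost:6379'}
```

The trailing `\b` in the pattern is what keeps a sibling named `db` from rewriting `dbname`.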
AFDudley 0296da6f64 Merge pull request 'feat(k8s): namespace-per-deployment for resource isolation and cleanup' (#988) from feat-namespace-per-deployment into main
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/988
2026-02-03 23:09:16 +00:00
A. F. Dudley d913926144 feat(k8s): namespace-per-deployment for resource isolation and cleanup
Each deployment now gets its own Kubernetes namespace (laconic-{deployment_id}).
This provides:
- Resource isolation between deployments on the same cluster
- Simplified cleanup: deleting the namespace cascades to all namespaced resources
- No orphaned resources possible when deployment IDs change

Changes:
- Set k8s_namespace based on deployment name in __init__
- Add _ensure_namespace() to create namespace before deploying resources
- Add _delete_namespace() for cleanup
- Simplify down() to just delete PVs (cluster-scoped) and the namespace
- Fix hardcoded "default" namespace in logs function

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 18:04:52 -05:00
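Kubernetes namespace names must be valid RFC 1123 labels (lowercase alphanumerics and `-`, at most 63 characters), so a `laconic-{deployment_id}` derivation needs guarding when IDs can contain other characters. A hedged sketch (the helper name and prefix-handling are hypothetical, not taken from the commit):

```python
import re


def namespace_for(deployment_id: str, prefix: str = "laconic") -> str:
    """Derive a per-deployment namespace that is a valid RFC 1123 label."""
    name = f"{prefix}-{deployment_id}".lower()
    # Replace anything outside [a-z0-9-], then trim to the 63-char limit
    name = re.sub(r"[^a-z0-9-]", "-", name)
    return name[:63].strip("-")


print(namespace_for("my_app.v2"))  # laconic-my-app-v2
```

Deleting such a namespace then cascades to every namespaced resource inside it, which is what makes the cleanup path in this commit simple.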
AFDudley b41e0cb2f5 Merge pull request 'fix(k8s): query resources by label in down() for proper cleanup' (#987) from fix-down-cleanup-by-label into main
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/987
2026-02-03 22:57:52 +00:00
A. F. Dudley 47d3d10ead fix(k8s): query resources by label in down() for proper cleanup
Previously, down() generated resource names from the deployment config
and deleted those specific names. This failed to clean up orphaned
resources when deployment IDs changed (e.g., after force_redeploy).

Changes:
- Add 'app' label to all resources: Ingress, Service, NodePort, ConfigMap, PV
- Refactor down() to query K8s by label selector instead of generating names
- This ensures all resources for a deployment are cleaned up, even if
  the deployment config has changed or been deleted

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:55:14 -05:00
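The advantage of label selection over regenerated names can be shown with a small standalone sketch that mimics how a `app=<name>` label selector matches resources regardless of what they are called (plain dicts stand in for Kubernetes object metadata):

```python
from typing import Dict, List


def matching_resources(resources: List[Dict], app_name: str) -> List[str]:
    """Select resource names carrying app=<app_name>, the way a
    label selector 'app=<app_name>' would, independent of naming."""
    return [
        r["name"]
        for r in resources
        if r.get("labels", {}).get("app") == app_name
    ]


cluster = [
    {"name": "web-old-id-pv", "labels": {"app": "web"}},  # survives a deployment-id change
    {"name": "web-new-id-pv", "labels": {"app": "web"}},
    {"name": "other-pv", "labels": {"app": "other"}},
]
print(matching_resources(cluster, "web"))  # ['web-old-id-pv', 'web-new-id-pv']
```

Name-based deletion would miss `web-old-id-pv` once the deployment ID changed; the label query finds both.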
6 changed files with 253 additions and 108 deletions

TODO.md (19 lines added)
View File

@@ -7,6 +7,25 @@ We need an "update stack" command in stack orchestrator and cleaner documentatio
**Context**: Currently, `deploy init` generates a spec file and `deploy create` creates a deployment directory. The `deployment update` command (added by Thomas Lackey) only syncs env vars and restarts - it doesn't regenerate configurations. There's a gap in the workflow for updating stack configurations after initial deployment.
## Bugs
### `deploy create` doesn't auto-generate volume mappings for new pods
When a new pod is added to `stack.yml` (e.g. `monitoring`), `deploy create`
does not generate default host path mappings in spec.yml for the new pod's
volumes. The deployment then fails at scheduling because the PVCs don't exist.
**Expected**: `deploy create` enumerates all volumes from all compose files
in the stack and generates default host paths for any that aren't already
mapped in the spec.yml `volumes:` section.
**Actual**: Only volumes already in spec.yml get PVs. New volumes are silently
missing, causing `FailedScheduling: persistentvolumeclaim not found`.
**Workaround**: Manually add volume entries to spec.yml and create host dirs.
**Files**: `deployment_create.py` (`_write_config_file`, volume handling)
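The expected enumeration could be sketched as follows (a hypothetical helper, assuming the compose files are already parsed into dicts; the default-path scheme is illustrative):

```python
from typing import Dict, List


def missing_volume_mappings(
    parsed_compose_files: List[Dict], spec_volumes: Dict[str, str]
) -> Dict[str, str]:
    """Return default host-path mappings for compose volumes absent from spec.yml."""
    defaults = {}
    for compose in parsed_compose_files:
        for volume_name in compose.get("volumes", {}) or {}:
            if volume_name not in spec_volumes:
                defaults[volume_name] = f"./data/{volume_name}"
    return defaults


compose_files = [{"volumes": {"db-data": None}}, {"volumes": {"metrics-data": None}}]
spec = {"db-data": "./data/db-data"}
print(missing_volume_mappings(compose_files, spec))
# {'metrics-data': './data/metrics-data'}
```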
## Architecture Refactoring
### Separate Deployer from Stack Orchestrator CLI

View File

@@ -31,6 +31,7 @@ from stack_orchestrator.deploy.k8s.helpers import (
envs_from_environment_variables_map,
envs_from_compose_file,
merge_envs,
translate_sidecar_service_names,
)
from stack_orchestrator.deploy.deploy_util import (
parsed_pod_files_map_from_file_names,
@@ -125,7 +126,8 @@ class ClusterInfo:
name=(
f"{self.app_name}-nodeport-"
f"{pod_port}-{protocol.lower()}"
)
),
labels={"app": self.app_name},
),
spec=client.V1ServiceSpec(
type="NodePort",
@@ -208,7 +210,9 @@ class ClusterInfo:
ingress = client.V1Ingress(
metadata=client.V1ObjectMeta(
name=f"{self.app_name}-ingress", annotations=ingress_annotations
name=f"{self.app_name}-ingress",
labels={"app": self.app_name},
annotations=ingress_annotations,
),
spec=spec,
)
@@ -238,7 +242,10 @@ class ClusterInfo:
]
service = client.V1Service(
metadata=client.V1ObjectMeta(name=f"{self.app_name}-service"),
metadata=client.V1ObjectMeta(
name=f"{self.app_name}-service",
labels={"app": self.app_name},
),
spec=client.V1ServiceSpec(
type="ClusterIP",
ports=service_ports,
@@ -320,7 +327,7 @@ class ClusterInfo:
spec = client.V1ConfigMap(
metadata=client.V1ObjectMeta(
name=f"{self.app_name}-{cfg_map_name}",
labels={"configmap-label": cfg_map_name},
labels={"app": self.app_name, "configmap-label": cfg_map_name},
),
binary_data=data,
)
@@ -377,20 +384,53 @@ class ClusterInfo:
pv = client.V1PersistentVolume(
metadata=client.V1ObjectMeta(
name=f"{self.app_name}-{volume_name}",
labels={"volume-label": f"{self.app_name}-{volume_name}"},
labels={
"app": self.app_name,
"volume-label": f"{self.app_name}-{volume_name}",
},
),
spec=spec,
)
result.append(pv)
return result
def _any_service_has_host_network(self):
for pod_name in self.parsed_pod_yaml_map:
pod = self.parsed_pod_yaml_map[pod_name]
for svc in pod.get("services", {}).values():
if svc.get("network_mode") == "host":
return True
return False
def _resolve_container_resources(
self, container_name: str, service_info: dict, global_resources: Resources
) -> Resources:
"""Resolve resources for a container using layered priority.
Priority: spec per-container > compose deploy.resources
> spec global > DEFAULT
"""
# 1. Check spec.yml for per-container override
per_container = self.spec.get_container_resources_for(container_name)
if per_container:
return per_container
# 2. Check compose service_info for deploy.resources
deploy_block = service_info.get("deploy", {})
compose_resources = deploy_block.get("resources", {}) if deploy_block else {}
if compose_resources:
return Resources(compose_resources)
# 3. Fall back to spec.yml global (already resolved with DEFAULT fallback)
return global_resources
# TODO: put things like image pull policy into an object-scope struct
def get_deployment(self, image_pull_policy: Optional[str] = None):
containers = []
services = {}
resources = self.spec.get_container_resources()
if not resources:
resources = DEFAULT_CONTAINER_RESOURCES
global_resources = self.spec.get_container_resources()
if not global_resources:
global_resources = DEFAULT_CONTAINER_RESOURCES
for pod_name in self.parsed_pod_yaml_map:
pod = self.parsed_pod_yaml_map[pod_name]
services = pod["services"]
@@ -430,6 +470,12 @@ class ClusterInfo:
if "environment" in service_info
else self.environment_variables.map
)
# Translate docker-compose service names to localhost for sidecars
# All services in the same pod share the network namespace
sibling_services = [s for s in services.keys() if s != service_name]
merged_envs = translate_sidecar_service_names(
merged_envs, sibling_services
)
envs = envs_from_environment_variables_map(merged_envs)
if opts.o.debug:
print(f"Merged envs: {envs}")
@@ -467,6 +513,9 @@ class ClusterInfo:
)
)
]
container_resources = self._resolve_container_resources(
container_name, service_info, global_resources
)
container = client.V1Container(
name=container_name,
image=image_to_use,
@@ -485,7 +534,7 @@ class ClusterInfo:
if self.spec.get_capabilities()
else None,
),
resources=to_k8s_resource_requirements(resources),
resources=to_k8s_resource_requirements(container_resources),
)
containers.append(container)
volumes = volumes_for_pod_files(
@@ -552,6 +601,7 @@ class ClusterInfo:
)
)
use_host_network = self._any_service_has_host_network()
template = client.V1PodTemplateSpec(
metadata=client.V1ObjectMeta(annotations=annotations, labels=labels),
spec=client.V1PodSpec(
@@ -561,6 +611,8 @@ class ClusterInfo:
affinity=affinity,
tolerations=tolerations,
runtime_class_name=self.spec.get_runtime_class(),
host_network=use_host_network or None,
dns_policy=("ClusterFirstWithHostNet" if use_host_network else None),
),
)
spec = client.V1DeploymentSpec(

View File
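The layered resource resolution added above (spec per-container, then compose `deploy.resources`, then spec global, then a default) can be sketched as a standalone function, with plain dicts standing in for the project's `Resources` class:

```python
DEFAULT = {"reservations": {"cpus": "0.5", "memory": "512M"}}


def resolve_resources(container_name, spec_containers, compose_service, spec_global=None):
    """Priority: spec per-container > compose deploy.resources > spec global > DEFAULT."""
    per_container = spec_containers.get(container_name)
    # Only a dict with reservations/limits counts as a per-container override
    if per_container and ("reservations" in per_container or "limits" in per_container):
        return per_container
    compose_resources = (compose_service.get("deploy") or {}).get("resources")
    if compose_resources:
        return compose_resources
    return spec_global or DEFAULT


# Compose-level resources win when the spec has no per-container entry:
svc = {"deploy": {"resources": {"limits": {"memory": "1G"}}}}
print(resolve_resources("db", {}, svc))  # {'limits': {'memory': '1G'}}
```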

@@ -96,7 +96,7 @@ class K8sDeployer(Deployer):
core_api: client.CoreV1Api
apps_api: client.AppsV1Api
networking_api: client.NetworkingV1Api
k8s_namespace: str = "default"
k8s_namespace: str
kind_cluster_name: str
skip_cluster_management: bool
cluster_info: ClusterInfo
@@ -113,6 +113,7 @@ class K8sDeployer(Deployer):
) -> None:
self.type = type
self.skip_cluster_management = False
self.k8s_namespace = "default" # Will be overridden below if context exists
# TODO: workaround pending refactoring above to cope with being
# created with a null deployment_context
if deployment_context is None:
@@ -120,6 +121,8 @@ class K8sDeployer(Deployer):
self.deployment_dir = deployment_context.deployment_dir
self.deployment_context = deployment_context
self.kind_cluster_name = compose_project_name
# Use deployment-specific namespace for resource isolation and easy cleanup
self.k8s_namespace = f"laconic-{compose_project_name}"
self.cluster_info = ClusterInfo()
self.cluster_info.int(
compose_files,
@@ -149,6 +152,46 @@ class K8sDeployer(Deployer):
self.apps_api = client.AppsV1Api()
self.custom_obj_api = client.CustomObjectsApi()
def _ensure_namespace(self):
"""Create the deployment namespace if it doesn't exist."""
if opts.o.dry_run:
print(f"Dry run: would create namespace {self.k8s_namespace}")
return
try:
self.core_api.read_namespace(name=self.k8s_namespace)
if opts.o.debug:
print(f"Namespace {self.k8s_namespace} already exists")
except ApiException as e:
if e.status == 404:
# Create the namespace
ns = client.V1Namespace(
metadata=client.V1ObjectMeta(
name=self.k8s_namespace,
labels={"app": self.cluster_info.app_name},
)
)
self.core_api.create_namespace(body=ns)
if opts.o.debug:
print(f"Created namespace {self.k8s_namespace}")
else:
raise
def _delete_namespace(self):
"""Delete the deployment namespace and all resources within it."""
if opts.o.dry_run:
print(f"Dry run: would delete namespace {self.k8s_namespace}")
return
try:
self.core_api.delete_namespace(name=self.k8s_namespace)
if opts.o.debug:
print(f"Deleted namespace {self.k8s_namespace}")
except ApiException as e:
if e.status == 404:
if opts.o.debug:
print(f"Namespace {self.k8s_namespace} not found")
else:
raise
def _create_volume_data(self):
# Create the host-path-mounted PVs for this deployment
pvs = self.cluster_info.get_pvs()
@@ -314,6 +357,8 @@ class K8sDeployer(Deployer):
load_images_into_kind(self.kind_cluster_name, local_images)
# Note: if no local containers defined, all images come from registries
self.connect_api()
# Create deployment-specific namespace for resource isolation
self._ensure_namespace()
if self.is_kind() and not self.skip_cluster_management:
# Configure ingress controller (not installed by default in kind)
# Skip if already running (idempotent for shared cluster)
@@ -381,107 +426,30 @@ class K8sDeployer(Deployer):
print("NodePort created:")
print(f"{nodeport_resp}")
def down(self, timeout, volumes, skip_cluster_management): # noqa: C901
def down(self, timeout, volumes, skip_cluster_management):
self.skip_cluster_management = skip_cluster_management
self.connect_api()
# Delete the k8s objects
# PersistentVolumes are cluster-scoped (not namespaced), so delete by label
if volumes:
# Create the host-path-mounted PVs for this deployment
pvs = self.cluster_info.get_pvs()
for pv in pvs:
if opts.o.debug:
print(f"Deleting this pv: {pv}")
try:
pv_resp = self.core_api.delete_persistent_volume(
name=pv.metadata.name
pvs = self.core_api.list_persistent_volume(
label_selector=f"app={self.cluster_info.app_name}"
)
for pv in pvs.items:
if opts.o.debug:
print("PV deleted:")
print(f"{pv_resp}")
print(f"Deleting PV: {pv.metadata.name}")
try:
self.core_api.delete_persistent_volume(name=pv.metadata.name)
except ApiException as e:
_check_delete_exception(e)
except ApiException as e:
if opts.o.debug:
print(f"Error listing PVs: {e}")
# Figure out the PVCs for this deployment
pvcs = self.cluster_info.get_pvcs()
for pvc in pvcs:
if opts.o.debug:
print(f"Deleting this pvc: {pvc}")
try:
pvc_resp = self.core_api.delete_namespaced_persistent_volume_claim(
name=pvc.metadata.name, namespace=self.k8s_namespace
)
if opts.o.debug:
print("PVCs deleted:")
print(f"{pvc_resp}")
except ApiException as e:
_check_delete_exception(e)
# Figure out the ConfigMaps for this deployment
cfg_maps = self.cluster_info.get_configmaps()
for cfg_map in cfg_maps:
if opts.o.debug:
print(f"Deleting this ConfigMap: {cfg_map}")
try:
cfg_map_resp = self.core_api.delete_namespaced_config_map(
name=cfg_map.metadata.name, namespace=self.k8s_namespace
)
if opts.o.debug:
print("ConfigMap deleted:")
print(f"{cfg_map_resp}")
except ApiException as e:
_check_delete_exception(e)
deployment = self.cluster_info.get_deployment()
if opts.o.debug:
print(f"Deleting this deployment: {deployment}")
if deployment and deployment.metadata and deployment.metadata.name:
try:
self.apps_api.delete_namespaced_deployment(
name=deployment.metadata.name, namespace=self.k8s_namespace
)
except ApiException as e:
_check_delete_exception(e)
service = self.cluster_info.get_service()
if opts.o.debug:
print(f"Deleting service: {service}")
if service and service.metadata and service.metadata.name:
try:
self.core_api.delete_namespaced_service(
namespace=self.k8s_namespace, name=service.metadata.name
)
except ApiException as e:
_check_delete_exception(e)
ingress = self.cluster_info.get_ingress(use_tls=not self.is_kind())
if ingress and ingress.metadata and ingress.metadata.name:
if opts.o.debug:
print(f"Deleting this ingress: {ingress}")
try:
self.networking_api.delete_namespaced_ingress(
name=ingress.metadata.name, namespace=self.k8s_namespace
)
except ApiException as e:
_check_delete_exception(e)
else:
if opts.o.debug:
print("No ingress to delete")
nodeports: List[client.V1Service] = self.cluster_info.get_nodeports()
for nodeport in nodeports:
if opts.o.debug:
print(f"Deleting this nodeport: {nodeport}")
if nodeport.metadata and nodeport.metadata.name:
try:
self.core_api.delete_namespaced_service(
namespace=self.k8s_namespace, name=nodeport.metadata.name
)
except ApiException as e:
_check_delete_exception(e)
else:
if opts.o.debug:
print("No nodeport to delete")
# Delete the deployment namespace - this cascades to all namespaced resources
# (PVCs, ConfigMaps, Deployments, Services, Ingresses, etc.)
self._delete_namespace()
if self.is_kind() and not self.skip_cluster_management:
# Destroy the kind cluster
@@ -619,7 +587,7 @@ class K8sDeployer(Deployer):
log_data = ""
for container in containers:
container_log = self.core_api.read_namespaced_pod_log(
k8s_pod_name, namespace="default", container=container
k8s_pod_name, namespace=self.k8s_namespace, container=container
)
container_log_lines = container_log.splitlines()
for line in container_log_lines:

View File

@@ -942,6 +942,41 @@ def envs_from_compose_file(
return result
def translate_sidecar_service_names(
envs: Mapping[str, str], sibling_service_names: List[str]
) -> Mapping[str, str]:
"""Translate docker-compose service names to localhost for sidecar containers.
In docker-compose, services can reference each other by name (e.g., 'db:5432').
In Kubernetes, when multiple containers are in the same pod (sidecars), they
share the same network namespace and must use 'localhost' instead.
This function replaces service name references with 'localhost' in env values.
"""
import re
if not sibling_service_names:
return envs
result = {}
for env_var, env_val in envs.items():
if env_val is None:
result[env_var] = env_val
continue
new_val = str(env_val)
for service_name in sibling_service_names:
# Match service name followed by optional port (e.g., 'db:5432', 'db')
# Handle URLs like: postgres://user:pass@db:5432/dbname
# and simple refs like: db:5432 or just db
pattern = rf"\b{re.escape(service_name)}(:\d+)?\b"
new_val = re.sub(pattern, lambda m: f'localhost{m.group(1) or ""}', new_val)
result[env_var] = new_val
return result
def envs_from_environment_variables_map(
map: Mapping[str, str]
) -> List[client.V1EnvVar]:

View File

@@ -120,6 +120,27 @@ class Spec:
self.obj.get(constants.resources_key, {}).get("containers", {})
)
def get_container_resources_for(
self, container_name: str
) -> typing.Optional[Resources]:
"""Look up per-container resource overrides from spec.yml.
Checks resources.containers.<container_name> in the spec. Returns None
if no per-container override exists (caller falls back to other sources).
"""
containers_block = self.obj.get(constants.resources_key, {}).get(
"containers", {}
)
if container_name in containers_block:
entry = containers_block[container_name]
# Only treat it as a per-container override if it's a dict with
# reservations/limits nested inside (not a top-level global key)
if isinstance(entry, dict) and (
"reservations" in entry or "limits" in entry
):
return Resources(entry)
return None
def get_volume_resources(self):
return Resources(
self.obj.get(constants.resources_key, {}).get(constants.volumes_key, {})
@@ -128,9 +149,6 @@ class Spec:
def get_http_proxy(self):
return self.obj.get(constants.network_key, {}).get(constants.http_proxy_key, [])
def get_acme_email(self):
return self.obj.get(constants.network_key, {}).get("acme-email", "")
def get_annotations(self):
return self.obj.get(constants.annotations_key, {})

View File

@@ -0,0 +1,53 @@
#!/bin/bash
# Run a test suite locally in an isolated venv.
#
# Usage:
# ./tests/scripts/run-test-local.sh <test-script>
#
# Examples:
# ./tests/scripts/run-test-local.sh tests/webapp-test/run-webapp-test.sh
# ./tests/scripts/run-test-local.sh tests/smoke-test/run-smoke-test.sh
# ./tests/scripts/run-test-local.sh tests/k8s-deploy/run-deploy-test.sh
#
# The script creates a temporary venv, installs shiv, builds the laconic-so
# package, runs the requested test, then cleans up.
set -euo pipefail
if [ $# -lt 1 ]; then
echo "Usage: $0 <test-script> [args...]"
exit 1
fi
TEST_SCRIPT="$1"
shift
if [ ! -f "$TEST_SCRIPT" ]; then
echo "Error: $TEST_SCRIPT not found"
exit 1
fi
REPO_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
VENV_DIR=$(mktemp -d /tmp/so-test-XXXXXX)
cleanup() {
echo "Cleaning up venv: $VENV_DIR"
rm -rf "$VENV_DIR"
}
trap cleanup EXIT
cd "$REPO_DIR"
echo "==> Creating venv in $VENV_DIR"
python3 -m venv "$VENV_DIR"
source "$VENV_DIR/bin/activate"
echo "==> Installing shiv"
pip install -q shiv
echo "==> Building laconic-so package"
./scripts/create_build_tag_file.sh
./scripts/build_shiv_package.sh
echo "==> Running: $TEST_SCRIPT $*"
exec "./$TEST_SCRIPT" "$@"