Compare commits

..

5 Commits

Author SHA1 Message Date
AFDudley 8cc0a9a19a add/local-test-runner (#996)
Publish / Build and publish (push) Failing after 0s Details
Deploy Test / Run deploy test suite (push) Failing after 0s Details
Webapp Test / Run webapp test suite (push) Failing after 0s Details
Lint Checks / Run linter (push) Failing after 0s Details
Smoke Test / Run basic test suite (push) Failing after 0s Details
Co-authored-by: A. F. Dudley <a.frederick.dudley@gmail.com>
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/996
2026-03-09 20:04:58 +00:00
AFDudley 4a1b5d86fd Merge pull request 'fix(k8s): translate service names to localhost for sidecar containers' (#989) from fix-sidecar-localhost into main
Webapp Test / Run webapp test suite (push) Failing after 0s Details
Smoke Test / Run basic test suite (push) Failing after 0s Details
Lint Checks / Run linter (push) Failing after 0s Details
Publish / Build and publish (push) Failing after 0s Details
Deploy Test / Run deploy test suite (push) Failing after 0s Details
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/989
2026-02-03 23:13:27 +00:00
A. F. Dudley 019225ca18 fix(k8s): translate service names to localhost for sidecar containers
In docker-compose, services can reference each other by name (e.g., 'db:5432').
In Kubernetes, when multiple containers are in the same pod (sidecars), they
share the same network namespace and must use 'localhost' instead.

This fix adds translate_sidecar_service_names() which replaces docker-compose
service name references with 'localhost' in environment variable values for
containers that share the same pod.

Fixes issue where multi-container pods fail because one container tries to
connect to a sibling using the compose service name instead of localhost.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 18:10:32 -05:00
AFDudley 0296da6f64 Merge pull request 'feat(k8s): namespace-per-deployment for resource isolation and cleanup' (#988) from feat-namespace-per-deployment into main
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/988
2026-02-03 23:09:16 +00:00
A. F. Dudley d913926144 feat(k8s): namespace-per-deployment for resource isolation and cleanup
Each deployment now gets its own Kubernetes namespace (laconic-{deployment_id}).
This provides:
- Resource isolation between deployments on the same cluster
- Simplified cleanup: deleting the namespace cascades to all namespaced resources
- No orphaned resources possible when deployment IDs change

Changes:
- Set k8s_namespace based on deployment name in __init__
- Add _ensure_namespace() to create namespace before deploying resources
- Add _delete_namespace() for cleanup
- Simplify down() to just delete PVs (cluster-scoped) and the namespace
- Fix hardcoded "default" namespace in logs function

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 18:04:52 -05:00
6 changed files with 227 additions and 107 deletions

19
TODO.md
View File

@ -7,6 +7,25 @@ We need an "update stack" command in stack orchestrator and cleaner documentatio
**Context**: Currently, `deploy init` generates a spec file and `deploy create` creates a deployment directory. The `deployment update` command (added by Thomas Lackey) only syncs env vars and restarts - it doesn't regenerate configurations. There's a gap in the workflow for updating stack configurations after initial deployment.
## Bugs
### `deploy create` doesn't auto-generate volume mappings for new pods
When a new pod is added to `stack.yml` (e.g. `monitoring`), `deploy create`
does not generate default host path mappings in spec.yml for the new pod's
volumes. The deployment then fails at scheduling because the PVCs don't exist.
**Expected**: `deploy create` enumerates all volumes from all compose files
in the stack and generates default host paths for any that aren't already
mapped in the spec.yml `volumes:` section.
**Actual**: Only volumes already in spec.yml get PVs. New volumes are silently
missing, causing `FailedScheduling: persistentvolumeclaim not found`.
**Workaround**: Manually add volume entries to spec.yml and create host dirs.
**Files**: `deployment_create.py` (`_write_config_file`, volume handling)
## Architecture Refactoring
### Separate Deployer from Stack Orchestrator CLI

View File

@ -31,6 +31,7 @@ from stack_orchestrator.deploy.k8s.helpers import (
envs_from_environment_variables_map,
envs_from_compose_file,
merge_envs,
translate_sidecar_service_names,
)
from stack_orchestrator.deploy.deploy_util import (
parsed_pod_files_map_from_file_names,
@ -393,13 +394,43 @@ class ClusterInfo:
result.append(pv)
return result
def _any_service_has_host_network(self):
for pod_name in self.parsed_pod_yaml_map:
pod = self.parsed_pod_yaml_map[pod_name]
for svc in pod.get("services", {}).values():
if svc.get("network_mode") == "host":
return True
return False
def _resolve_container_resources(
self, container_name: str, service_info: dict, global_resources: Resources
) -> Resources:
"""Resolve resources for a container using layered priority.
Priority: spec per-container > compose deploy.resources
> spec global > DEFAULT
"""
# 1. Check spec.yml for per-container override
per_container = self.spec.get_container_resources_for(container_name)
if per_container:
return per_container
# 2. Check compose service_info for deploy.resources
deploy_block = service_info.get("deploy", {})
compose_resources = deploy_block.get("resources", {}) if deploy_block else {}
if compose_resources:
return Resources(compose_resources)
# 3. Fall back to spec.yml global (already resolved with DEFAULT fallback)
return global_resources
# TODO: put things like image pull policy into an object-scope struct
def get_deployment(self, image_pull_policy: Optional[str] = None):
containers = []
services = {}
resources = self.spec.get_container_resources()
if not resources:
resources = DEFAULT_CONTAINER_RESOURCES
global_resources = self.spec.get_container_resources()
if not global_resources:
global_resources = DEFAULT_CONTAINER_RESOURCES
for pod_name in self.parsed_pod_yaml_map:
pod = self.parsed_pod_yaml_map[pod_name]
services = pod["services"]
@ -439,6 +470,12 @@ class ClusterInfo:
if "environment" in service_info
else self.environment_variables.map
)
# Translate docker-compose service names to localhost for sidecars
# All services in the same pod share the network namespace
sibling_services = [s for s in services.keys() if s != service_name]
merged_envs = translate_sidecar_service_names(
merged_envs, sibling_services
)
envs = envs_from_environment_variables_map(merged_envs)
if opts.o.debug:
print(f"Merged envs: {envs}")
@ -476,6 +513,9 @@ class ClusterInfo:
)
)
]
container_resources = self._resolve_container_resources(
container_name, service_info, global_resources
)
container = client.V1Container(
name=container_name,
image=image_to_use,
@ -494,7 +534,7 @@ class ClusterInfo:
if self.spec.get_capabilities()
else None,
),
resources=to_k8s_resource_requirements(resources),
resources=to_k8s_resource_requirements(container_resources),
)
containers.append(container)
volumes = volumes_for_pod_files(
@ -561,6 +601,7 @@ class ClusterInfo:
)
)
use_host_network = self._any_service_has_host_network()
template = client.V1PodTemplateSpec(
metadata=client.V1ObjectMeta(annotations=annotations, labels=labels),
spec=client.V1PodSpec(
@ -570,6 +611,8 @@ class ClusterInfo:
affinity=affinity,
tolerations=tolerations,
runtime_class_name=self.spec.get_runtime_class(),
host_network=use_host_network or None,
dns_policy=("ClusterFirstWithHostNet" if use_host_network else None),
),
)
spec = client.V1DeploymentSpec(

View File

@ -96,7 +96,7 @@ class K8sDeployer(Deployer):
core_api: client.CoreV1Api
apps_api: client.AppsV1Api
networking_api: client.NetworkingV1Api
k8s_namespace: str = "default"
k8s_namespace: str
kind_cluster_name: str
skip_cluster_management: bool
cluster_info: ClusterInfo
@ -113,6 +113,7 @@ class K8sDeployer(Deployer):
) -> None:
self.type = type
self.skip_cluster_management = False
self.k8s_namespace = "default" # Will be overridden below if context exists
# TODO: workaround pending refactoring above to cope with being
# created with a null deployment_context
if deployment_context is None:
@ -120,6 +121,8 @@ class K8sDeployer(Deployer):
self.deployment_dir = deployment_context.deployment_dir
self.deployment_context = deployment_context
self.kind_cluster_name = compose_project_name
# Use deployment-specific namespace for resource isolation and easy cleanup
self.k8s_namespace = f"laconic-{compose_project_name}"
self.cluster_info = ClusterInfo()
self.cluster_info.int(
compose_files,
@ -149,6 +152,46 @@ class K8sDeployer(Deployer):
self.apps_api = client.AppsV1Api()
self.custom_obj_api = client.CustomObjectsApi()
def _ensure_namespace(self):
"""Create the deployment namespace if it doesn't exist."""
if opts.o.dry_run:
print(f"Dry run: would create namespace {self.k8s_namespace}")
return
try:
self.core_api.read_namespace(name=self.k8s_namespace)
if opts.o.debug:
print(f"Namespace {self.k8s_namespace} already exists")
except ApiException as e:
if e.status == 404:
# Create the namespace
ns = client.V1Namespace(
metadata=client.V1ObjectMeta(
name=self.k8s_namespace,
labels={"app": self.cluster_info.app_name},
)
)
self.core_api.create_namespace(body=ns)
if opts.o.debug:
print(f"Created namespace {self.k8s_namespace}")
else:
raise
def _delete_namespace(self):
"""Delete the deployment namespace and all resources within it."""
if opts.o.dry_run:
print(f"Dry run: would delete namespace {self.k8s_namespace}")
return
try:
self.core_api.delete_namespace(name=self.k8s_namespace)
if opts.o.debug:
print(f"Deleted namespace {self.k8s_namespace}")
except ApiException as e:
if e.status == 404:
if opts.o.debug:
print(f"Namespace {self.k8s_namespace} not found")
else:
raise
def _create_volume_data(self):
# Create the host-path-mounted PVs for this deployment
pvs = self.cluster_info.get_pvs()
@ -314,6 +357,8 @@ class K8sDeployer(Deployer):
load_images_into_kind(self.kind_cluster_name, local_images)
# Note: if no local containers defined, all images come from registries
self.connect_api()
# Create deployment-specific namespace for resource isolation
self._ensure_namespace()
if self.is_kind() and not self.skip_cluster_management:
# Configure ingress controller (not installed by default in kind)
# Skip if already running (idempotent for shared cluster)
@ -381,17 +426,12 @@ class K8sDeployer(Deployer):
print("NodePort created:")
print(f"{nodeport_resp}")
def down(self, timeout, volumes, skip_cluster_management): # noqa: C901
def down(self, timeout, volumes, skip_cluster_management):
self.skip_cluster_management = skip_cluster_management
self.connect_api()
# Query K8s for resources by label selector instead of generating names
# from config. This ensures we clean up orphaned resources when deployment
# IDs change (e.g., after force_redeploy).
label_selector = f"app={self.cluster_info.app_name}"
# PersistentVolumes are cluster-scoped (not namespaced), so delete by label
if volumes:
# Delete PVs for this deployment (PVs use volume-label pattern)
try:
pvs = self.core_api.list_persistent_volume(
label_selector=f"app={self.cluster_info.app_name}"
@ -407,97 +447,9 @@ class K8sDeployer(Deployer):
if opts.o.debug:
print(f"Error listing PVs: {e}")
# Delete PVCs for this deployment
try:
pvcs = self.core_api.list_namespaced_persistent_volume_claim(
namespace=self.k8s_namespace, label_selector=label_selector
)
for pvc in pvcs.items:
if opts.o.debug:
print(f"Deleting PVC: {pvc.metadata.name}")
try:
self.core_api.delete_namespaced_persistent_volume_claim(
name=pvc.metadata.name, namespace=self.k8s_namespace
)
except ApiException as e:
_check_delete_exception(e)
except ApiException as e:
if opts.o.debug:
print(f"Error listing PVCs: {e}")
# Delete ConfigMaps for this deployment
try:
cfg_maps = self.core_api.list_namespaced_config_map(
namespace=self.k8s_namespace, label_selector=label_selector
)
for cfg_map in cfg_maps.items:
if opts.o.debug:
print(f"Deleting ConfigMap: {cfg_map.metadata.name}")
try:
self.core_api.delete_namespaced_config_map(
name=cfg_map.metadata.name, namespace=self.k8s_namespace
)
except ApiException as e:
_check_delete_exception(e)
except ApiException as e:
if opts.o.debug:
print(f"Error listing ConfigMaps: {e}")
# Delete Deployments for this deployment
try:
deployments = self.apps_api.list_namespaced_deployment(
namespace=self.k8s_namespace, label_selector=label_selector
)
for deployment in deployments.items:
if opts.o.debug:
print(f"Deleting Deployment: {deployment.metadata.name}")
try:
self.apps_api.delete_namespaced_deployment(
name=deployment.metadata.name, namespace=self.k8s_namespace
)
except ApiException as e:
_check_delete_exception(e)
except ApiException as e:
if opts.o.debug:
print(f"Error listing Deployments: {e}")
# Delete Services for this deployment (includes both ClusterIP and NodePort)
try:
services = self.core_api.list_namespaced_service(
namespace=self.k8s_namespace, label_selector=label_selector
)
for service in services.items:
if opts.o.debug:
print(f"Deleting Service: {service.metadata.name}")
try:
self.core_api.delete_namespaced_service(
namespace=self.k8s_namespace, name=service.metadata.name
)
except ApiException as e:
_check_delete_exception(e)
except ApiException as e:
if opts.o.debug:
print(f"Error listing Services: {e}")
# Delete Ingresses for this deployment
try:
ingresses = self.networking_api.list_namespaced_ingress(
namespace=self.k8s_namespace, label_selector=label_selector
)
for ingress in ingresses.items:
if opts.o.debug:
print(f"Deleting Ingress: {ingress.metadata.name}")
try:
self.networking_api.delete_namespaced_ingress(
name=ingress.metadata.name, namespace=self.k8s_namespace
)
except ApiException as e:
_check_delete_exception(e)
if not ingresses.items and opts.o.debug:
print("No ingress to delete")
except ApiException as e:
if opts.o.debug:
print(f"Error listing Ingresses: {e}")
# Delete the deployment namespace - this cascades to all namespaced resources
# (PVCs, ConfigMaps, Deployments, Services, Ingresses, etc.)
self._delete_namespace()
if self.is_kind() and not self.skip_cluster_management:
# Destroy the kind cluster
@ -635,7 +587,7 @@ class K8sDeployer(Deployer):
log_data = ""
for container in containers:
container_log = self.core_api.read_namespaced_pod_log(
k8s_pod_name, namespace="default", container=container
k8s_pod_name, namespace=self.k8s_namespace, container=container
)
container_log_lines = container_log.splitlines()
for line in container_log_lines:

View File

@ -942,6 +942,41 @@ def envs_from_compose_file(
return result
def translate_sidecar_service_names(
envs: Mapping[str, str], sibling_service_names: List[str]
) -> Mapping[str, str]:
"""Translate docker-compose service names to localhost for sidecar containers.
In docker-compose, services can reference each other by name (e.g., 'db:5432').
In Kubernetes, when multiple containers are in the same pod (sidecars), they
share the same network namespace and must use 'localhost' instead.
This function replaces service name references with 'localhost' in env values.
"""
import re
if not sibling_service_names:
return envs
result = {}
for env_var, env_val in envs.items():
if env_val is None:
result[env_var] = env_val
continue
new_val = str(env_val)
for service_name in sibling_service_names:
# Match service name followed by optional port (e.g., 'db:5432', 'db')
# Handle URLs like: postgres://user:pass@db:5432/dbname
# and simple refs like: db:5432 or just db
pattern = rf"\b{re.escape(service_name)}(:\d+)?\b"
new_val = re.sub(pattern, lambda m: f'localhost{m.group(1) or ""}', new_val)
result[env_var] = new_val
return result
def envs_from_environment_variables_map(
map: Mapping[str, str]
) -> List[client.V1EnvVar]:

View File

@ -120,6 +120,27 @@ class Spec:
self.obj.get(constants.resources_key, {}).get("containers", {})
)
def get_container_resources_for(
self, container_name: str
) -> typing.Optional[Resources]:
"""Look up per-container resource overrides from spec.yml.
Checks resources.containers.<container_name> in the spec. Returns None
if no per-container override exists (caller falls back to other sources).
"""
containers_block = self.obj.get(constants.resources_key, {}).get(
"containers", {}
)
if container_name in containers_block:
entry = containers_block[container_name]
# Only treat it as a per-container override if it's a dict with
# reservations/limits nested inside (not a top-level global key)
if isinstance(entry, dict) and (
"reservations" in entry or "limits" in entry
):
return Resources(entry)
return None
def get_volume_resources(self):
return Resources(
self.obj.get(constants.resources_key, {}).get(constants.volumes_key, {})
@ -128,9 +149,6 @@ class Spec:
def get_http_proxy(self):
return self.obj.get(constants.network_key, {}).get(constants.http_proxy_key, [])
def get_acme_email(self):
return self.obj.get(constants.network_key, {}).get("acme-email", "")
def get_annotations(self):
return self.obj.get(constants.annotations_key, {})

View File

@ -0,0 +1,53 @@
#!/bin/bash
# Run a test suite locally in an isolated venv.
#
# Usage:
# ./tests/scripts/run-test-local.sh <test-script>
#
# Examples:
# ./tests/scripts/run-test-local.sh tests/webapp-test/run-webapp-test.sh
# ./tests/scripts/run-test-local.sh tests/smoke-test/run-smoke-test.sh
# ./tests/scripts/run-test-local.sh tests/k8s-deploy/run-deploy-test.sh
#
# The script creates a temporary venv, installs shiv, builds the laconic-so
# package, runs the requested test, then cleans up.
set -euo pipefail
if [ $# -lt 1 ]; then
echo "Usage: $0 <test-script> [args...]"
exit 1
fi
TEST_SCRIPT="$1"
shift
if [ ! -f "$TEST_SCRIPT" ]; then
echo "Error: $TEST_SCRIPT not found"
exit 1
fi
REPO_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
VENV_DIR=$(mktemp -d /tmp/so-test-XXXXXX)
cleanup() {
echo "Cleaning up venv: $VENV_DIR"
rm -rf "$VENV_DIR"
}
trap cleanup EXIT
cd "$REPO_DIR"
echo "==> Creating venv in $VENV_DIR"
python3 -m venv "$VENV_DIR"
source "$VENV_DIR/bin/activate"
echo "==> Installing shiv"
pip install -q shiv
echo "==> Building laconic-so package"
./scripts/create_build_tag_file.sh
./scripts/build_shiv_package.sh
echo "==> Running: $TEST_SCRIPT $*"
exec "./$TEST_SCRIPT" "$@"