so-l2l: in-place stop/restart via label-scoped cleanup (#743)
- `down()` scopes cleanup to a single stack via `app.kubernetes.io/stack` and keeps the namespace `Active` by default
- New `stop/down --delete-namespace` flag for opt-in full teardown
- `down()` is synchronous - waits until resources are actually gone before returning. Callers can drop their own wait loops
- `up()` skip-if-exists for Jobs completes the create-or-replace coverage (other kinds already had it)
- Orphan PVs from a prior `stop --delete-namespace` get cleaned on the next `stop --delete-volumes`
- Every k8s resource SO creates now carries `app.kubernetes.io/stack` via a new `ClusterInfo._stack_labels()` helper
- Closes so-l2l, so-076.2. Also includes pebble audit: closes so-c71, so-b2b, so-k1k; files so-328
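Typical invocations after this change (a sketch only; the deployment directory name is a placeholder, flags as introduced in this PR):

```
# in-place stop: labeled resources removed, namespace stays Active, kind cluster reused
laconic-so deployment --dir my-deployment stop --delete-volumes --skip-cluster-management

# opt-in full teardown, including the namespace
laconic-so deployment --dir my-deployment stop --delete-volumes --delete-namespace
```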
parent f40913d187
commit fc5dc80058
@@ -26,4 +26,16 @@
{"type":"create","timestamp":"2026-04-08T05:51:31.557582604Z","issue_id":"so-5cd","payload":{"description":"The DockerDeployer.up() in stack_orchestrator/deploy/compose/deploy_docker.py accepts image_overrides as a parameter but silently drops it — only k8s mode (deploy_k8s.py) actually applies overrides.\n\nImpact: the --image container=image CLI flag on 'laconic-so deployment start' is a no-op for compose-mode deployments. Spec-level image-overrides: keys are also ignored in compose mode (they reach up() via deployment.py but are never applied).\n\nUse case: gorchain-stacks test scripts build :local images via build-containers, but compose files reference ghcr.io/gorbagana-dev/*:latest (so prod pulls work). Without image override support in compose mode, tests either need to docker tag the builds or the compose file needs to be rewritten before start — both ugly workarounds for what should be a first-class mechanism.\n\nFix sketch: in DockerDeployer.up(), when image_overrides is non-empty, write a temporary docker-compose.override.yml with {services: {name: {image: ref}}} and construct a new DockerClient with compose_files + [override_path]. Keeps k8s path untouched, reuses existing --image CLI flag and spec-level image-overrides: plumbing.","priority":"2","title":"Compose deployer ignores image_overrides","type":"bug"}}
{"type": "create", "timestamp": "2026-04-13T09:54:05.207241Z", "issue_id": "so-c71", "payload": {"title": "extraPortMappings maps all compose ports unconditionally", "type": "bug", "priority": "2", "description": "Commit fb69cc58 added compose service port mapping to Kind extraPortMappings. The intent was to support network_mode: host services (RPC, gossip), but the implementation maps ALL compose ports unconditionally. Internal-only ports (postgres 5432, redis 6379) get exposed on the host, causing conflicts with local services. The port mapping should only apply to services with network_mode: host, or be controlled by a spec-level opt-in.", "source_commit": "fb69cc58"}}
{"type": "create", "timestamp": "2026-04-14T09:53:31.040118Z", "issue_id": "so-078", "payload": {"title": "Deployments should be self-sufficient: copy hooks into deployment dir", "type": "feature", "priority": "1", "description": "deploy/commands.py hooks are resolved from the stack repo at runtime via get_stack_path. The deployment dir has no copy. This means: (1) the repo must remain at the same path after deploy create, (2) deployment start/restart fail with 'stack does not exist' if cwd differs from deploy create time (stack-source in deployment.yml is relative), (3) deployments cannot be moved or run independently of the source repo. Fix: deploy create should copy deploy/commands.py into the deployment dir alongside compose files and configmaps. call_stack_deploy_start should load from the deployment dir. The deployment becomes self-sufficient."}}
{"type": "update", "timestamp": "2026-04-14T10:01:14.937483Z", "issue_id": "so-c71", "payload": {"status": "resolved", "resolution": "Fixed in commit e909357a on fix/extraport-host-only branch. Only map ports for services with network_mode: host. Ports 80/443 for Caddy always mapped."}}
{"type":"comment","timestamp":"2026-04-15T06:12:45.58660796Z","issue_id":"so-c71","payload":{"body":"Fixed in commit e909357a on fix/extraport-host-only branch. Only map ports for services with network_mode: host. Ports 80/443 for Caddy always mapped."}}
{"type":"close","timestamp":"2026-04-15T06:12:45.832454065Z","issue_id":"so-c71","payload":{}}
{"type":"comment","timestamp":"2026-04-15T06:18:02.64056792Z","issue_id":"so-b2b","payload":{"body":"Fixed. create_registry_secret() in deployment_create.py:583 reads image-pull-secret from spec, resolves token via token-env/token-file. Spec key renamed from registry-credentials to image-pull-secret (spec.py:140). Documented in docs/deployment_patterns.md with REGISTRY_TOKEN usage example."}}
{"type":"close","timestamp":"2026-04-15T06:18:02.965856003Z","issue_id":"so-b2b","payload":{}}
{"type":"comment","timestamp":"2026-04-15T06:18:04.543850719Z","issue_id":"so-k1k","payload":{"body":"Largely resolved. deployment restart (deployment.py:324) now uses 'git rev-parse --show-toplevel' to find repo_root dynamically (lines 364-378), removing the fixed 4-parents-up assumption. External stacks with varying nesting depths now work for restart. deploy create still uses get_stack_path(stack_name) which is a different mechanism but works correctly with --stack-path. Closing — the underlying breakage is gone."}}
{"type":"close","timestamp":"2026-04-15T06:18:04.856542806Z","issue_id":"so-k1k","payload":{}}
{"type":"comment","timestamp":"2026-04-15T06:18:08.436540869Z","issue_id":"so-076.2","payload":{"body":"Partially mitigated by commit cc6acd5f which flipped --skip-cluster-management default to true, so 'deployment stop' no longer destroys the cluster by default. Root fix still open: down() in deploy_k8s.py:904-936 unconditionally calls _delete_namespace() (line 929) and destroy_cluster() (line 936) when --perform-cluster-management is passed. No logic distinguishes shared vs dedicated clusters."}}
{"type":"comment","timestamp":"2026-04-15T06:18:11.374723274Z","issue_id":"so-l2l","payload":{"body":"Partially addressed. Readiness probes are now generated in cluster_info.py:652-671 (part C of the original fix). Parts A and B still open: up() does not use patch/apply (delete/recreate semantics remain), and down() still calls _delete_namespace() unconditionally at deploy_k8s.py:929 on every restart. A 'fix: never delete namespace on deployment down' commit (ae2cea34) exists on a remote branch but is not merged to main."}}
{"type":"create","timestamp":"2026-04-15T11:11:15.584733236Z","issue_id":"so-328","payload":{"description":"deployment restart runs create_operation(update=True) which uses copytree(dirs_exist_ok=True) to sync the stack repo into the deployment dir (deployment_create.py:1079, 1130). This is additive only — it overwrites and adds files, but never removes them. Two resulting gaps:\n\n1. Deletions don't propagate. If a script, configmap file, or compose service is removed from the stack repo, the deployment dir keeps it, and up_operation keeps applying it. The k8s ConfigMap retains removed keys; removed Deployments/Services are not cleaned up (up() is create/patch, not full reconciliation). Operators see stale files and orphan workloads that won't disappear without manual kubectl intervention or a full teardown.\n\n2. stack.yml structural changes don't auto-surface in the spec. If a stack.yml gains a new configmap declaration or a new compose file reference, restart doesn't pull it in unless the operator's spec.yml already references it. The spec is the contract, so additions to the stack aren't applied to live deployments just by pulling the repo.\n\nTeardown + redeploy is the only reliable way to clean up today, which is not practical in production.\n\nFix direction: create_operation(update=True) should treat the deployment dir as reconciled state — diff the desired tree (from the stack repo + spec) against what's on disk and remove files that no longer exist upstream. up_operation should then delete k8s resources (Deployments, Services, ConfigMaps) that are no longer declared by any compose/configmap source, likely scoped by an 'app.kubernetes.io/managed-by: laconic-so' label to avoid nuking unrelated resources. For new stack.yml entries, consider whether the spec needs operator action or whether restart can auto-detect and warn.","priority":"3","title":"deployment restart does not propagate repo deletions or new stack.yml entries","type":"bug"}}
{"type":"comment","timestamp":"2026-04-16T06:24:38.826132538Z","issue_id":"so-l2l","payload":{"body":"Fixed in so-l2l Parts A and B on this branch:\n\n**Part A (up() as create-or-update):** Deployments, Services, ConfigMaps, Secrets, Ingresses, and Endpoints already used create-or-replace in up(). Completed coverage by adding 409 skip-if-exists for Jobs (one-shot, re-run unwanted). Readiness probes (Part C) were already present.\n\n**Part B (down() preserves namespace):** _delete_labeled_resources now deletes by 'app.kubernetes.io/stack' label and keeps the namespace Active. Full-teardown option is a new --delete-namespace flag on stop/down. down() is synchronous (waits for resources to actually be gone before returning) so tests and ansible can assume clean state on return. Orphan PVs from prior --delete-namespace runs are also cleaned on subsequent stop --delete-volumes.\n\nrestart no longer calls down() at all (deployment.py:421-468), so the original wd-d92-style namespace termination race is structurally impossible. In-cluster rolling updates work via k8s native semantics when Deployment pod specs change via replace."}}
{"type":"close","timestamp":"2026-04-16T06:24:39.175431401Z","issue_id":"so-l2l","payload":{}}
{"type":"comment","timestamp":"2026-04-16T06:24:41.70556861Z","issue_id":"so-076.2","payload":{"body":"Fixed on chore/pebble-status-audit. stop now uses label-scoped cleanup (app.kubernetes.io/stack=\u003cstack\u003e) and keeps the namespace Active by default. The Kind cluster is not destroyed unless --perform-cluster-management is passed. Full namespace teardown is opt-in via the new --delete-namespace flag. Multiple stacks sharing a namespace/cluster are now cleaned up independently, not blown away en masse."}}
{"type":"close","timestamp":"2026-04-16T06:24:42.153940477Z","issue_id":"so-076.2","payload":{}}
@@ -55,7 +55,8 @@ class DockerDeployer(Deployer):
         except DockerException as e:
             raise DeployerException(e)
 
-    def down(self, timeout, volumes, skip_cluster_management):
+    def down(self, timeout, volumes, skip_cluster_management, delete_namespace=False):
+        # delete_namespace is k8s-only; ignored in compose mode.
         if not opts.o.dry_run:
             try:
                 return self.docker.compose.down(timeout=timeout, volumes=volumes)
@@ -172,7 +172,13 @@ def up_operation(
     )
 
 
-def down_operation(ctx, delete_volumes, extra_args_list, skip_cluster_management=False):
+def down_operation(
+    ctx,
+    delete_volumes,
+    extra_args_list,
+    skip_cluster_management=False,
+    delete_namespace=False,
+):
     timeout_arg = None
     if extra_args_list:
         timeout_arg = extra_args_list[0]
@@ -182,6 +188,7 @@ def down_operation(ctx, delete_volumes, extra_args_list, skip_cluster_management
         timeout=timeout_arg,
         volumes=delete_volumes,
         skip_cluster_management=skip_cluster_management,
+        delete_namespace=delete_namespace,
     )
 
@@ -24,7 +24,7 @@ class Deployer(ABC):
         pass
 
     @abstractmethod
-    def down(self, timeout, volumes, skip_cluster_management):
+    def down(self, timeout, volumes, skip_cluster_management, delete_namespace=False):
         pass
 
     @abstractmethod
@@ -157,13 +157,21 @@ def prepare(ctx, skip_cluster_management):
     default=True,
     help="Skip cluster initialization/tear-down (only for kind-k8s deployments)",
 )
+@click.option(
+    "--delete-namespace",
+    is_flag=True,
+    default=False,
+    help="Also delete the k8s namespace (full teardown)",
+)
 @click.argument("extra_args", nargs=-1) # help: command: down <service1> <service2>
 @click.pass_context
-def down(ctx, delete_volumes, skip_cluster_management, extra_args):
+def down(ctx, delete_volumes, skip_cluster_management, delete_namespace, extra_args):
     # Get the stack config file name
     # TODO: add cluster name and env file here
     ctx.obj = make_deploy_context(ctx)
-    down_operation(ctx, delete_volumes, extra_args, skip_cluster_management)
+    down_operation(
+        ctx, delete_volumes, extra_args, skip_cluster_management, delete_namespace
+    )
 
 
 # stop is the preferred alias for down
@@ -176,12 +184,20 @@ def down(ctx, delete_volumes, skip_cluster_management, extra_args):
     default=True,
     help="Skip cluster initialization/tear-down (only for kind-k8s deployments)",
 )
+@click.option(
+    "--delete-namespace",
+    is_flag=True,
+    default=False,
+    help="Also delete the k8s namespace (full teardown)",
+)
 @click.argument("extra_args", nargs=-1) # help: command: down <service1> <service2>
 @click.pass_context
-def stop(ctx, delete_volumes, skip_cluster_management, extra_args):
+def stop(ctx, delete_volumes, skip_cluster_management, delete_namespace, extra_args):
     # TODO: add cluster name and env file here
     ctx.obj = make_deploy_context(ctx)
-    down_operation(ctx, delete_volumes, extra_args, skip_cluster_management)
+    down_operation(
+        ctx, delete_volumes, extra_args, skip_cluster_management, delete_namespace
+    )
 
 
 @command.command()
@@ -118,6 +118,17 @@ class ClusterInfo:
         volumes.extend(named_volumes_from_pod_files(self.parsed_job_yaml_map))
         return volumes
 
+    def _stack_labels(self, extra: Optional[dict] = None) -> dict:
+        """Standard resource labels. Use on every k8s resource SO creates so
+        label-based cleanup (down by stack) can find them all.
+        """
+        labels = {"app": self.app_name}
+        if self.stack_name:
+            labels["app.kubernetes.io/stack"] = self.stack_name
+        if extra:
+            labels.update(extra)
+        return labels
+
     def get_nodeports(self):
         nodeports = []
         for pod_name in self.parsed_pod_yaml_map:
@@ -151,7 +162,7 @@ class ClusterInfo:
                     f"{self.app_name}-nodeport-"
                     f"{pod_port}-{protocol.lower()}"
                 ),
-                labels={"app": self.app_name},
+                labels=self._stack_labels(),
             ),
             spec=client.V1ServiceSpec(
                 type="NodePort",
@@ -268,7 +279,7 @@ class ClusterInfo:
         ingress = client.V1Ingress(
             metadata=client.V1ObjectMeta(
                 name=f"{self.app_name}-ingress",
-                labels={"app": self.app_name},
+                labels=self._stack_labels(),
                 annotations=ingress_annotations,
             ),
             spec=spec,
@@ -323,7 +334,7 @@ class ClusterInfo:
         service = client.V1Service(
             metadata=client.V1ObjectMeta(
                 name=f"{self.app_name}-service",
-                labels={"app": self.app_name},
+                labels=self._stack_labels(),
             ),
             spec=client.V1ServiceSpec(
                 type="ClusterIP",
@@ -355,10 +366,9 @@ class ClusterInfo:
             self.spec.get_volume_resources_for(volume_name) or global_resources
         )
 
-        labels = {
-            "app": self.app_name,
-            "volume-label": f"{self.app_name}-{volume_name}",
-        }
+        labels = self._stack_labels(
+            {"volume-label": f"{self.app_name}-{volume_name}"}
+        )
         if volume_path:
             storage_class_name = "manual"
             k8s_volume_name = f"{self.app_name}-{volume_name}"
@@ -418,7 +428,7 @@ class ClusterInfo:
         spec = client.V1ConfigMap(
             metadata=client.V1ObjectMeta(
                 name=f"{self.app_name}-{cfg_map_name}",
-                labels={"app": self.app_name, "configmap-label": cfg_map_name},
+                labels=self._stack_labels({"configmap-label": cfg_map_name}),
             ),
             binary_data=data,
         )
@@ -482,10 +492,9 @@ class ClusterInfo:
         pv = client.V1PersistentVolume(
             metadata=client.V1ObjectMeta(
                 name=f"{self.app_name}-{volume_name}",
-                labels={
-                    "app": self.app_name,
-                    "volume-label": f"{self.app_name}-{volume_name}",
-                },
+                labels=self._stack_labels(
+                    {"volume-label": f"{self.app_name}-{volume_name}"}
+                ),
             ),
             spec=spec,
         )
@@ -737,9 +746,7 @@ class ClusterInfo:
         Returns (annotations, labels, affinity, tolerations).
         """
         annotations = None
-        labels = {"app": self.app_name}
-        if self.stack_name:
-            labels["app.kubernetes.io/stack"] = self.stack_name
+        labels = self._stack_labels()
         affinity = None
         tolerations = None
 
@@ -920,21 +927,11 @@ class ClusterInfo:
             kind="Deployment",
             metadata=client.V1ObjectMeta(
                 name=deployment_name,
-                labels={
-                    "app": self.app_name,
-                    **(
-                        {
-                            "app.kubernetes.io/stack": self.stack_name,
-                        }
-                        if self.stack_name
-                        else {}
-                    ),
-                    **(
-                        {"app.kubernetes.io/component": pod_name}
-                        if multi_pod
-                        else {}
-                    ),
-                },
+                labels=self._stack_labels(
+                    {"app.kubernetes.io/component": pod_name}
+                    if multi_pod
+                    else None
+                ),
             ),
             spec=spec,
         )
@@ -1001,7 +998,7 @@ class ClusterInfo:
         service = client.V1Service(
             metadata=client.V1ObjectMeta(
                 name=f"{self.app_name}-{pod_name}-service",
-                labels={"app": self.app_name},
+                labels=self._stack_labels(),
             ),
             spec=client.V1ServiceSpec(
                 type="ClusterIP",
@@ -1054,14 +1051,9 @@ class ClusterInfo:
 
         # Use a distinct app label for job pods so they don't get
         # picked up by pods_in_deployment() which queries app={app_name}.
-        pod_labels = {
-            "app": f"{self.app_name}-job",
-            **(
-                {"app.kubernetes.io/stack": self.stack_name}
-                if self.stack_name
-                else {}
-            ),
-        }
+        # Use a distinct app label for job pods (see comment above) so we
+        # still build via _stack_labels then override.
+        pod_labels = self._stack_labels({"app": f"{self.app_name}-job"})
         template = client.V1PodTemplateSpec(
             metadata=client.V1ObjectMeta(labels=pod_labels),
             spec=client.V1PodSpec(
@@ -1076,14 +1068,7 @@ class ClusterInfo:
             template=template,
             backoff_limit=0,
         )
-        job_labels = {
-            "app": self.app_name,
-            **(
-                {"app.kubernetes.io/stack": self.stack_name}
-                if self.stack_name
-                else {}
-            ),
-        }
+        job_labels = self._stack_labels()
         job = client.V1Job(
             api_version="batch/v1",
             kind="Job",
@@ -1121,7 +1106,7 @@ class ClusterInfo:
         svc = client.V1Service(
             metadata=client.V1ObjectMeta(
                 name=name,
-                labels={"app": self.app_name},
+                labels=self._stack_labels(),
             ),
             spec=client.V1ServiceSpec(
                 type="ExternalName",
@@ -1138,7 +1123,7 @@ class ClusterInfo:
         svc = client.V1Service(
             metadata=client.V1ObjectMeta(
                 name=name,
-                labels={"app": self.app_name},
+                labels=self._stack_labels(),
             ),
             spec=client.V1ServiceSpec(
                 cluster_ip="None",
@@ -1156,7 +1141,7 @@ class ClusterInfo:
         svc = client.V1Service(
             metadata=client.V1ObjectMeta(
                 name=name,
-                labels={"app": self.app_name},
+                labels=self._stack_labels(),
             ),
             spec=client.V1ServiceSpec(
                 cluster_ip="None",
@@ -1199,7 +1184,7 @@ class ClusterInfo:
         secret = client.V1Secret(
             metadata=client.V1ObjectMeta(
                 name=secret_name,
-                labels={"app": self.app_name},
+                labels=self._stack_labels(),
             ),
             data=secret_data,
         )
@@ -189,7 +189,7 @@ class K8sDeployer(Deployer):
            ns = client.V1Namespace(
                metadata=client.V1ObjectMeta(
                    name=self.k8s_namespace,
-                   labels={"app": self.cluster_info.app_name},
+                   labels=self.cluster_info._stack_labels(),
                )
            )
            self.core_api.create_namespace(body=ns)
@@ -475,7 +475,7 @@ class K8sDeployer(Deployer):
         endpoints = client.V1Endpoints(
             metadata=client.V1ObjectMeta(
                 name=name,
-                labels={"app": self.cluster_info.app_name},
+                labels=self.cluster_info._stack_labels(),
             ),
             subsets=[
                 client.V1EndpointSubset(
@@ -535,7 +535,7 @@ class K8sDeployer(Deployer):
         endpoints = client.V1Endpoints(
             metadata=client.V1ObjectMeta(
                 name=name,
-                labels={"app": self.cluster_info.app_name},
+                labels=self.cluster_info._stack_labels(),
             ),
             subsets=[
                 client.V1EndpointSubset(
@@ -709,16 +709,27 @@ class K8sDeployer(Deployer):
         if opts.o.debug:
             print(f"Sending this job: {job}")
         if not opts.o.dry_run:
-            job_resp = self.batch_api.create_namespaced_job(
-                body=job, namespace=self.k8s_namespace
-            )
-            if opts.o.debug:
-                print("Job created:")
-                if job_resp.metadata:
-                    print(
-                        f" {job_resp.metadata.namespace} "
-                        f"{job_resp.metadata.name}"
-                    )
+            job_name = job.metadata.name
+            try:
+                job_resp = self.batch_api.create_namespaced_job(
+                    body=job, namespace=self.k8s_namespace
+                )
+                if opts.o.debug:
+                    print("Job created:")
+                    if job_resp.metadata:
+                        print(
+                            f" {job_resp.metadata.namespace} "
+                            f"{job_resp.metadata.name}"
+                        )
+            except ApiException as e:
+                if e.status == 409:
+                    # Job already exists from a prior run. Jobs are one-
+                    # shot — don't recreate on restart. Delete the Job
+                    # explicitly to re-run (stop --delete-volumes also
+                    # clears them via label-based cleanup).
+                    print(f"Job {job_name} already exists, skipping")
+                else:
+                    raise
 
     def _find_certificate_for_host_name(self, host_name):
         all_certificates = self.custom_obj_api.list_namespaced_custom_object(
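Per the comment in the hunk above, a Job that already exists is skipped rather than re-run; to re-run it, delete the Job first and start the deployment again. A rough sketch (the job and deployment names are illustrative):

```
kubectl delete job my-app-job -n my-namespace
laconic-so deployment --dir my-deployment start --skip-cluster-management
```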
@ -901,40 +912,261 @@ class K8sDeployer(Deployer):
|
||||||
|
|
||||||
call_stack_deploy_start(self.deployment_context)
|
call_stack_deploy_start(self.deployment_context)
|
||||||
|
|
||||||
def down(self, timeout, volumes, skip_cluster_management):
|
def down(
|
||||||
|
self, timeout, volumes, skip_cluster_management, delete_namespace=False
|
||||||
|
):
|
||||||
|
"""Tear down stack-labeled resources. Phases:
|
||||||
|
|
||||||
|
1. Delete namespaced resources (if namespace still exists).
|
||||||
|
2. Delete cluster-scoped PVs (if --delete-volumes, regardless of (1)).
|
||||||
|
3. Wait for everything we triggered to actually be gone.
|
||||||
|
4. Optionally delete the namespace itself (--delete-namespace).
|
||||||
|
5. Optionally destroy the kind cluster (--perform-cluster-management).
|
||||||
|
|
||||||
|
Steps 1-3 scope cleanup to a single stack via app.kubernetes.io/stack,
|
||||||
|
so multiple stacks sharing a namespace tear down independently.
|
||||||
|
"""
|
||||||
self.skip_cluster_management = skip_cluster_management
|
self.skip_cluster_management = skip_cluster_management
|
||||||
self.connect_api()
|
self.connect_api()
|
||||||
|
|
||||||
app_label = f"app={self.cluster_info.app_name}"
|
selector = self._stack_label_selector()
|
||||||
|
ns = self.k8s_namespace
|
||||||
|
ns_exists = self._namespace_exists(ns)
|
||||||
|
|
||||||
# PersistentVolumes are cluster-scoped (not namespaced), so delete by label
|
if ns_exists:
|
||||||
|
self._delete_namespaced_labeled_resources(ns, selector, volumes)
|
||||||
if volumes:
|
if volumes:
|
||||||
try:
|
self._delete_labeled_pvs(selector)
|
||||||
pvs = self.core_api.list_persistent_volume(label_selector=app_label)
|
self._wait_for_labeled_gone(
|
||||||
for pv in pvs.items:
|
ns, selector, delete_volumes=volumes, namespace_present=ns_exists
|
||||||
if opts.o.debug:
|
)
|
||||||
print(f"Deleting PV: {pv.metadata.name}")
|
|
||||||
try:
|
|
||||||
self.core_api.delete_persistent_volume(name=pv.metadata.name)
|
|
||||||
except ApiException as e:
|
|
||||||
_check_delete_exception(e)
|
|
||||||
except ApiException as e:
|
|
||||||
if opts.o.debug:
|
|
||||||
print(f"Error listing PVs: {e}")
|
|
||||||
|
|
||||||
# Delete the namespace to ensure clean slate.
|
if delete_namespace and ns_exists:
|
||||||
# Resources created by older laconic-so versions lack labels, so
|
self._delete_namespace()
|
||||||
# label-based deletion can't find them. Namespace deletion is the
|
self._wait_for_namespace_gone()
|
||||||
# only reliable cleanup.
|
|
||||||
self._delete_namespace()
|
|
||||||
# Wait for namespace to finish terminating before returning,
|
|
||||||
# so that up() can recreate it immediately.
|
|
||||||
self._wait_for_namespace_gone()
|
|
||||||
|
|
||||||
if self.is_kind() and not self.skip_cluster_management:
|
if self.is_kind() and not self.skip_cluster_management:
|
||||||
# Destroy the kind cluster
|
|
||||||
destroy_cluster(self.kind_cluster_name)
|
destroy_cluster(self.kind_cluster_name)
|
||||||
|
|
||||||
|
def _stack_label_selector(self) -> str:
|
||||||
|
"""Selector used for stack-scoped cleanup.
|
||||||
|
|
||||||
|
Prefer app.kubernetes.io/stack (per-stack) and fall back to the
|
||||||
|
legacy app= label (cluster-id scoped) for deployments that predate
|
||||||
|
the stack label.
|
||||||
|
"""
|
||||||
|
stack_name = self.cluster_info.stack_name
|
||||||
|
if stack_name:
|
||||||
|
return f"app.kubernetes.io/stack={stack_name}"
|
||||||
|
return f"app={self.cluster_info.app_name}"
|
||||||
|
|
||||||
|
def _namespace_exists(self, namespace: str) -> bool:
|
||||||
|
try:
|
||||||
|
self.core_api.read_namespace(name=namespace)
|
||||||
|
return True
|
||||||
|
except ApiException as e:
|
||||||
|
if e.status == 404:
|
||||||
|
if opts.o.debug:
|
||||||
|
print(f"Namespace {namespace} not found")
|
||||||
|
return False
|
||||||
|
raise
|
||||||
|
|
||||||
|
def _delete_namespaced_labeled_resources(
|
||||||
|
self, namespace: str, selector: str, delete_volumes: bool
|
||||||
|
):
|
||||||
|
"""Delete Ingresses, Deployments, Jobs, Services, ConfigMaps,
|
||||||
|
Secrets, Endpoints, Pods, and (if delete_volumes) PVCs in the
|
||||||
|
namespace. Order matters: Ingresses first so external traffic
|
||||||
|
stops, then workloads, then support objects, then Pods, then PVCs.
|
||||||
|
"""
|
||||||
|
if opts.o.dry_run:
|
||||||
|
print(
|
||||||
|
f"Dry run: would delete namespaced resources in {namespace} "
|
||||||
|
f"matching {selector}"
|
||||||
|
)
|
||||||
|
return
|
||||||
|
|
||||||
|
def swallow_404(fn):
|
||||||
|
try:
|
||||||
|
fn()
|
||||||
|
except ApiException as e:
|
||||||
|
if e.status not in (404, 405):
|
||||||
|
raise
|
||||||
|
|
||||||
|
# Ingresses first so external traffic stops before pods disappear.
|
||||||
|
swallow_404(
|
||||||
|
lambda: self.networking_api.delete_collection_namespaced_ingress(
|
||||||
|
namespace=namespace, label_selector=selector
|
||||||
|
)
|
||||||
|
)
|
||||||
|
# Deployments (owns ReplicaSets + Pods via GC).
|
||||||
|
swallow_404(
|
||||||
|
lambda: self.apps_api.delete_collection_namespaced_deployment(
|
||||||
|
namespace=namespace, label_selector=selector
|
||||||
|
)
|
||||||
|
)
|
||||||
|
# Jobs — propagation=Background cascades to child pods.
|
||||||
|
swallow_404(
|
||||||
|
lambda: self.batch_api.delete_collection_namespaced_job(
|
||||||
|
namespace=namespace,
|
||||||
|
label_selector=selector,
|
||||||
|
propagation_policy="Background",
|
||||||
|
)
|
||||||
|
)
|
||||||
|
# Services have no delete_collection on core_api; list + delete.
|
||||||
|
self._list_delete_namespaced(
|
||||||
|
namespace,
|
||||||
|
selector,
|
||||||
|
list_fn=self.core_api.list_namespaced_service,
|
||||||
|
delete_fn=self.core_api.delete_namespaced_service,
|
||||||
|
)
|
||||||
|
# ConfigMaps, Secrets.
|
||||||
|
swallow_404(
|
||||||
|
lambda: self.core_api.delete_collection_namespaced_config_map(
|
||||||
|
namespace=namespace, label_selector=selector
|
||||||
|
)
|
||||||
|
)
|
||||||
|
swallow_404(
|
||||||
|
lambda: self.core_api.delete_collection_namespaced_secret(
|
||||||
|
namespace=namespace, label_selector=selector
|
||||||
|
)
|
||||||
|
)
|
||||||
|
# Endpoints usually GC with Services, but we create a few directly
|
||||||
|
# (external-services) that aren't owned by a Service — clean those.
|
||||||
|
self._list_delete_namespaced(
|
||||||
|
namespace,
|
||||||
|
selector,
|
||||||
|
list_fn=self.core_api.list_namespaced_endpoints,
|
||||||
|
delete_fn=self.core_api.delete_namespaced_endpoints,
|
||||||
|
)
|
||||||
|
# Stray pods (owned pods are GC'd with their Deployment/Job).
|
||||||
|
swallow_404(
|
||||||
|
lambda: self.core_api.delete_collection_namespaced_pod(
|
||||||
|
namespace=namespace, label_selector=selector
|
||||||
|
)
|
||||||
|
)
|
||||||
|
if delete_volumes:
|
||||||
|
swallow_404(
|
||||||
|
lambda: self.core_api.delete_collection_namespaced_persistent_volume_claim( # noqa: E501
|
||||||
|
namespace=namespace, label_selector=selector
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
def _list_delete_namespaced(self, namespace, selector, list_fn, delete_fn):
|
||||||
|
"""List by selector and delete each item. Use for resources where
|
||||||
|
the k8s python client lacks delete_collection (Services, Endpoints).
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
items = list_fn(namespace=namespace, label_selector=selector).items
|
||||||
|
except ApiException as e:
|
||||||
|
if e.status == 404:
|
||||||
|
return
|
||||||
|
raise
|
||||||
|
for item in items:
|
||||||
|
try:
|
||||||
|
delete_fn(name=item.metadata.name, namespace=namespace)
|
||||||
|
except ApiException as e:
|
||||||
|
if e.status not in (404, 405):
|
||||||
|
raise
|
||||||
|
|
||||||
|
def _delete_labeled_pvs(self, selector: str):
|
||||||
|
"""Delete cluster-scoped PVs matching the stack label."""
|
||||||
|
if opts.o.dry_run:
|
||||||
|
print(f"Dry run: would delete PVs matching {selector}")
|
||||||
|
return
|
||||||
|
try:
|
||||||
|
pvs = self.core_api.list_persistent_volume(label_selector=selector)
|
||||||
|
except ApiException as e:
|
||||||
|
if opts.o.debug:
|
||||||
|
print(f"Error listing PVs: {e}")
|
||||||
|
return
|
||||||
|
for pv in pvs.items:
|
||||||
|
if opts.o.debug:
|
||||||
|
print(f"Deleting PV: {pv.metadata.name}")
|
||||||
|
try:
|
||||||
|
self.core_api.delete_persistent_volume(name=pv.metadata.name)
|
||||||
|
except ApiException as e:
|
||||||
|
_check_delete_exception(e)
|
||||||
|
|
||||||
|
def _wait_for_labeled_gone(
|
||||||
|
self,
|
||||||
|
namespace: str,
|
||||||
|
selector: str,
|
||||||
|
delete_volumes: bool,
|
||||||
|
namespace_present: bool,
|
||||||
|
timeout_seconds: int = 120,
|
||||||
|
):
|
||||||
|
"""Poll until every kind we triggered a delete for is gone.
|
||||||
|
|
||||||
|
delete_collection/delete are async — finalizers (PV bound-by-PVC,
|
||||||
|
PVC bound-by-VolumeAttachment, pod graceful shutdown) propagate
|
||||||
|
after the API call returns. Blocking here makes down() a
|
||||||
|
synchronous contract for callers (tests, ansible, cryovial).
|
||||||
|
"""
|
||||||
|
import time
|
||||||
|
|
||||||
|
listers = []
|
||||||
|
if namespace_present:
|
||||||
|
listers += [
|
||||||
|
("deployment", lambda: self.apps_api.list_namespaced_deployment(
|
||||||
|
namespace=namespace, label_selector=selector)),
|
||||||
|
("ingress", lambda: self.networking_api.list_namespaced_ingress(
|
||||||
|
namespace=namespace, label_selector=selector)),
|
||||||
|
("job", lambda: self.batch_api.list_namespaced_job(
|
||||||
|
namespace=namespace, label_selector=selector)),
|
||||||
|
("service", lambda: self.core_api.list_namespaced_service(
|
||||||
|
namespace=namespace, label_selector=selector)),
|
||||||
|
("configmap", lambda: self.core_api.list_namespaced_config_map(
|
||||||
|
namespace=namespace, label_selector=selector)),
|
||||||
|
("secret", lambda: self.core_api.list_namespaced_secret(
|
||||||
|
namespace=namespace, label_selector=selector)),
|
||||||
|
("pod", lambda: self.core_api.list_namespaced_pod(
|
||||||
|
namespace=namespace, label_selector=selector)),
|
||||||
|
]
|
||||||
|
if delete_volumes:
|
||||||
|
listers.append(
|
||||||
|
("persistentvolumeclaim",
|
||||||
|
lambda: self.core_api.list_namespaced_persistent_volume_claim(
|
||||||
|
namespace=namespace, label_selector=selector))
|
||||||
|
)
|
||||||
|
# PVs are cluster-scoped — wait for them even when the namespace
|
||||||
|
# is already gone (orphaned from a prior --delete-namespace).
|
||||||
|
if delete_volumes:
|
||||||
|
listers.append(
|
||||||
|
("persistentvolume",
|
||||||
|
lambda: self.core_api.list_persistent_volume(
|
||||||
|
label_selector=selector))
|
||||||
|
)
|
||||||
|
|
||||||
|
def remaining():
|
||||||
|
out = []
|
||||||
|
for kind, lister in listers:
|
||||||
|
try:
|
||||||
|
items = lister().items
|
||||||
|
except ApiException as e:
|
||||||
|
if e.status == 404:
|
||||||
|
continue
|
||||||
|
raise
|
||||||
|
if items:
|
||||||
|
out.append((kind, len(items)))
|
||||||
|
return out
|
||||||
|
|
||||||
|
deadline = time.monotonic() + timeout_seconds
|
||||||
|
while time.monotonic() < deadline:
|
||||||
|
left = remaining()
|
||||||
|
if not left:
|
||||||
|
return
|
||||||
|
if opts.o.debug:
|
||||||
|
print(f"Waiting for deletions: {left}")
|
||||||
|
time.sleep(2)
|
||||||
|
|
||||||
|
left = remaining()
|
||||||
|
if left:
|
||||||
|
print(
|
||||||
|
f"Warning: resources still present after {timeout_seconds}s: "
|
||||||
|
f"{left}"
|
||||||
|
)
|
||||||
|
|
||||||
def status(self):
|
def status(self):
|
||||||
self.connect_api()
|
self.connect_api()
|
||||||
# Call whatever API we need to get the running container list
|
# Call whatever API we need to get the running container list
|
||||||
|
|
|
||||||
|
|
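The synchronous-teardown contract called out in the commit message can be approximated from the outside by polling until nothing stack-labeled remains; a rough shell sketch (namespace and stack values are illustrative, not the exact helper in the diff above):

```
# rough equivalent of what down() now waits for before returning
until [ -z "$(kubectl get deploy,job,svc,pod,pvc -n my-namespace \
  -l app.kubernetes.io/stack=my-stack --no-headers 2>/dev/null)" ]; do
  sleep 2
done
```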
@@ -23,7 +23,7 @@ wait_for_pods_started () {
   done
   # Timed out, error exit
   echo "waiting for pods to start: FAILED"
-  delete_cluster_exit
+  cleanup_and_exit
 }
 
 wait_for_log_output () {
@@ -42,15 +42,42 @@ wait_for_log_output () {
   done
   # Timed out, error exit
   echo "waiting for pods log content: FAILED"
-  delete_cluster_exit
+  cleanup_and_exit
 }
 
 
-delete_cluster_exit () {
-  $TEST_TARGET_SO deployment --dir $test_deployment_dir stop --delete-volumes
+cleanup_and_exit () {
+  # Full teardown so CI runners don't leak namespaces/PVs between runs.
+  $TEST_TARGET_SO deployment --dir $test_deployment_dir \
+    stop --delete-volumes --delete-namespace --skip-cluster-management || true
   exit 1
 }
 
+assert_ns_phase () {
+  local expected=$1
+  local phase
+  phase=$(kubectl get namespace ${deployment_ns} -o jsonpath='{.status.phase}' 2>/dev/null || echo "Missing")
+  if [ "$phase" != "$expected" ]; then
+    echo "namespace phase test: FAILED (expected ${expected}, got ${phase})"
+    cleanup_and_exit
+  fi
+}
+
+# Count labeled resources in the deployment namespace. down() is
+# synchronous on its own cleanup (waits for PVCs/pods to terminate
+# before returning) so callers can assert immediately.
+# Usage: assert_no_labeled_resources <kind>
+assert_no_labeled_resources () {
+  local kind=$1
+  local count
+  count=$(kubectl get ${kind} -n ${deployment_ns} \
+    -l app.kubernetes.io/stack=test --no-headers 2>/dev/null | wc -l)
+  if [ "$count" -ne 0 ]; then
+    echo "labeled cleanup test: FAILED (${kind} still present: ${count})"
+    cleanup_and_exit
+  fi
+}
+
 # Note: eventually this test should be folded into ../deploy/
 # but keeping it separate for now for convenience
 TEST_TARGET_SO=$( ls -t1 ./package/laconic-so* | head -1 )
@@ -130,7 +157,7 @@ if [[ "$log_output_3" == *"filesystem is fresh"* ]]; then
 else
   echo "deployment logs test: FAILED"
   echo "$log_output_3"
-  delete_cluster_exit
+  cleanup_and_exit
 fi
 
 # Check the config variable CERC_TEST_PARAM_1 was passed correctly
@@ -138,7 +165,7 @@ if [[ "$log_output_3" == *"Test-param-1: PASSED"* ]]; then
   echo "deployment config test: passed"
 else
   echo "deployment config test: FAILED"
-  delete_cluster_exit
+  cleanup_and_exit
 fi
 
 # Check the config variable CERC_TEST_PARAM_2 was passed correctly from the compose file
@@ -155,7 +182,7 @@ if [[ "$log_output_4" == *"/config/test_config:"* ]] && [[ "$log_output_4" == *"
   echo "deployment ConfigMap test: passed"
 else
   echo "deployment ConfigMap test: FAILED"
-  delete_cluster_exit
+  cleanup_and_exit
 fi
 
 # Check that the bind-mount volume is mounted.
@@ -165,7 +192,7 @@ if [[ "$log_output_5" == *"/data: MOUNTED"* ]]; then
 else
   echo "deployment bind volumes test: FAILED"
   echo "$log_output_5"
-  delete_cluster_exit
+  cleanup_and_exit
 fi
 
 # Check that the provisioner managed volume is mounted.
@@ -175,7 +202,7 @@ if [[ "$log_output_6" == *"/data2: MOUNTED"* ]]; then
 else
   echo "deployment provisioner volumes test: FAILED"
   echo "$log_output_6"
-  delete_cluster_exit
+  cleanup_and_exit
 fi
 
 # --- New feature tests: namespace, labels, jobs, secrets ---
@@ -187,7 +214,7 @@ if [ "$ns_pod_count" -gt 0 ]; then
 else
   echo "namespace isolation test: FAILED"
   echo "Expected pod in namespace ${deployment_ns}"
-  delete_cluster_exit
+  cleanup_and_exit
 fi
 
 # Check that the stack label is set on the pod
@@ -196,7 +223,7 @@ if [ "$stack_label_count" -gt 0 ]; then
   echo "stack label test: passed"
 else
   echo "stack label test: FAILED"
-  delete_cluster_exit
+  cleanup_and_exit
 fi
 
 # Check that the job completed successfully
@@ -212,7 +239,7 @@ if [ "$job_status" == "1" ]; then
 else
   echo "job completion test: FAILED"
   echo "Job status.succeeded: ${job_status}"
-  delete_cluster_exit
+  cleanup_and_exit
 fi
 
 # Check that the secrets spec results in an envFrom secretRef on the pod
@@ -223,25 +250,24 @@ if [ "$secret_ref" == "test-secret" ]; then
 else
   echo "secrets envFrom test: FAILED"
   echo "Expected secretRef 'test-secret', got: ${secret_ref}"
-  delete_cluster_exit
+  cleanup_and_exit
 fi
 
-# Stop then start again and check the volume was preserved.
-# Use --skip-cluster-management to reuse the existing kind cluster instead of
-# destroying and recreating it (which fails on CI runners due to stale etcd/certs
-# and cgroup detection issues).
-# Use --delete-volumes to clear PVs so fresh PVCs can bind on restart.
-# Bind-mount data survives on the host filesystem; provisioner volumes are recreated fresh.
+# Stop with --delete-volumes (but not --delete-namespace) and verify:
+# - namespace stays Active (no termination race on restart)
+# - stack-labeled workloads are gone
+# - bind-mount data on the host survives; provisioner volumes are recreated
 $TEST_TARGET_SO deployment --dir $test_deployment_dir stop --delete-volumes --skip-cluster-management
-# Wait for the namespace to be fully terminated before restarting.
-# Without this, 'start' fails with 403 Forbidden because the namespace
-# is still in Terminating state.
-for i in {1..60}; do
-  if ! kubectl get namespace ${deployment_ns} 2>/dev/null | grep -q .; then
-    break
-  fi
-  sleep 2
+assert_ns_phase "Active"
+echo "stop preserves namespace test: passed"
+
+for kind in deployment job ingress service configmap secret pvc pod; do
+  assert_no_labeled_resources "$kind"
 done
+echo "stop cleans labeled resources test: passed"
 
+# Restart — no wait needed, the namespace is still Active.
 $TEST_TARGET_SO deployment --dir $test_deployment_dir start --skip-cluster-management
 wait_for_pods_started
 wait_for_log_output
@@ -252,7 +278,7 @@ if [[ "$log_output_10" == *"/data filesystem is old"* ]]; then
   echo "Retain bind volumes test: passed"
 else
   echo "Retain bind volumes test: FAILED"
-  delete_cluster_exit
+  cleanup_and_exit
 fi
 
 # Provisioner volumes are destroyed when PVs are deleted (--delete-volumes on stop).
@@ -263,9 +289,17 @@ if [[ "$log_output_11" == *"/data2 filesystem is fresh"* ]]; then
   echo "Fresh provisioner volumes test: passed"
 else
   echo "Fresh provisioner volumes test: FAILED"
-  delete_cluster_exit
+  cleanup_and_exit
 fi
 
-# Stop and clean up
-$TEST_TARGET_SO deployment --dir $test_deployment_dir stop --delete-volumes
+# Full teardown: --delete-namespace nukes the namespace after labeled cleanup.
+# Verify the namespace is actually gone.
+$TEST_TARGET_SO deployment --dir $test_deployment_dir \
+  stop --delete-volumes --delete-namespace --skip-cluster-management
+if kubectl get namespace ${deployment_ns} >/dev/null 2>&1; then
+  echo "delete-namespace test: FAILED (namespace still present)"
+  exit 1
+fi
+echo "delete-namespace test: passed"
+
 echo "Test passed"