chore(pebbles): file so-ad7 for broken maintenance swap

Maintenance swap during restart patches Ingress to point at a
Service that get_services() never creates, so during the swap
window Caddy has no valid backend and users see 'site cannot be
reached' instead of the maintenance page.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pull/744/head
Prathamesh Musale 2026-04-16 07:27:07 +00:00
parent fc5dc80058
commit 79bae5bf9d
1 changed files with 1 additions and 0 deletions

View File

@ -39,3 +39,4 @@
{"type":"close","timestamp":"2026-04-16T06:24:39.175431401Z","issue_id":"so-l2l","payload":{}} {"type":"close","timestamp":"2026-04-16T06:24:39.175431401Z","issue_id":"so-l2l","payload":{}}
{"type":"comment","timestamp":"2026-04-16T06:24:41.70556861Z","issue_id":"so-076.2","payload":{"body":"Fixed on chore/pebble-status-audit. stop now uses label-scoped cleanup (app.kubernetes.io/stack=\u003cstack\u003e) and keeps the namespace Active by default. The Kind cluster is not destroyed unless --perform-cluster-management is passed. Full namespace teardown is opt-in via the new --delete-namespace flag. Multiple stacks sharing a namespace/cluster are now cleaned up independently, not blown away en masse."}} {"type":"comment","timestamp":"2026-04-16T06:24:41.70556861Z","issue_id":"so-076.2","payload":{"body":"Fixed on chore/pebble-status-audit. stop now uses label-scoped cleanup (app.kubernetes.io/stack=\u003cstack\u003e) and keeps the namespace Active by default. The Kind cluster is not destroyed unless --perform-cluster-management is passed. Full namespace teardown is opt-in via the new --delete-namespace flag. Multiple stacks sharing a namespace/cluster are now cleaned up independently, not blown away en masse."}}
{"type":"close","timestamp":"2026-04-16T06:24:42.153940477Z","issue_id":"so-076.2","payload":{}} {"type":"close","timestamp":"2026-04-16T06:24:42.153940477Z","issue_id":"so-076.2","payload":{}}
{"type":"create","timestamp":"2026-04-16T07:26:56.820142001Z","issue_id":"so-ad7","payload":{"description":"_restart_with_maintenance in deployment.py patches Ingress backends to point at the maintenance Service, but that Service is never created. get_services() in cluster_info.py only builds per-pod ClusterIP Services for pods referenced by http-proxy routes (cluster_info.py:991-992 'if not ports_set: continue'). The maintenance pod has no http-proxy route by design, so no Service is built for it.\n\nResult: during a restart with maintenance-service configured, the Ingress points to a non-existent Service. Caddy has no valid backend, connection fails, users see 'site cannot be reached' instead of the maintenance page. Cryovial logs correctly report the swap happened.\n\n_resolve_service_name_for_container (cluster_info.py:183) and get_services() (cluster_info.py:945) operate on inconsistent premises — the resolver assumes every pod has a {app_name}-{pod_name}-service; the builder only creates one for http-proxy-referenced pods.\n\nFix: create_services() should also build a Service for the container named by spec's maintenance-service: key.","priority":"3","title":"Maintenance swap routes Ingress to non-existent Service","type":"bug"}}