# Deployment Patterns

## GitOps Pattern

For production deployments, we recommend a GitOps approach where your deployment configuration is tracked in version control.

### Overview

- **spec.yml is your source of truth**: Maintain it in your operator repository
- **Don't regenerate on every restart**: Run `deploy init` once, then customize and commit
- **Use restart for updates**: The restart command respects your git-tracked spec.yml

### Workflow

1. **Initial setup**: Run `deploy init` once to generate a spec.yml template
2. **Customize and commit**: Edit spec.yml with your configuration (hostnames, resources, etc.) and commit to your operator repo
3. **Deploy from git**: Use the committed spec.yml for deployments
4. **Update via git**: Make changes in git, then restart to apply
```bash
# Initial setup (run once)
laconic-so --stack my-stack deploy init --output spec.yml

# Customize for your environment
vim spec.yml  # Set hostname, resources, etc.

# Commit to your operator repository
git add spec.yml
git commit -m "Add my-stack deployment configuration"
git push

# On deployment server: deploy from git-tracked spec
laconic-so --stack my-stack deploy create \
  --spec-file /path/to/operator-repo/spec.yml \
  --deployment-dir my-deployment

laconic-so deployment --dir my-deployment start
```
### Updating Deployments

When you need to update a deployment:

```bash
# 1. Make changes in your operator repo
vim /path/to/operator-repo/spec.yml
git commit -am "Update configuration"
git push

# 2. On deployment server: pull and restart
cd /path/to/operator-repo && git pull
laconic-so deployment --dir my-deployment restart
```

The `restart` command:
- Pulls latest code from the stack repository
- Uses your git-tracked spec.yml (does NOT regenerate from defaults)
- Syncs the deployment directory
- Restarts services
### Anti-patterns

**Don't do this:**
```bash
# BAD: Regenerating spec on every deployment
laconic-so --stack my-stack deploy init --output spec.yml
laconic-so deploy create --spec-file spec.yml ...
```

This overwrites your customizations with defaults from the stack's `commands.py`.

**Do this instead:**
```bash
# GOOD: Use your git-tracked spec
git pull  # Get latest spec.yml from your operator repo
laconic-so deployment --dir my-deployment restart
```
## Private Registry Authentication

For deployments using images from private container registries (e.g., GitHub Container Registry), configure authentication in your spec.yml:

### Configuration

Add a `registry-credentials` section to your spec.yml:

```yaml
registry-credentials:
  server: ghcr.io
  username: your-org-or-username
  token-env: REGISTRY_TOKEN
```
**Fields:**
- `server`: The registry hostname (e.g., `ghcr.io`, `docker.io`, `gcr.io`)
- `username`: Registry username (for GHCR, use your GitHub username or org name)
- `token-env`: Name of the environment variable containing your API token/PAT

### Token Environment Variable

The `token-env` pattern keeps credentials out of version control. Set the environment variable when running `deployment start`:

```bash
export REGISTRY_TOKEN="your-personal-access-token"
laconic-so deployment --dir my-deployment start
```

For GHCR, create a Personal Access Token (PAT) with `read:packages` scope.
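To sanity-check the token before wiring it into a deployment (assuming the `docker` CLI is available on the machine), you can try logging in to the registry directly:

```bash
# Hypothetical check: confirm the PAT authenticates against ghcr.io
echo "$REGISTRY_TOKEN" | docker login ghcr.io -u your-org-or-username --password-stdin
```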
### Ansible Integration

When using Ansible for deployments, pass the token from a credentials file:

```yaml
- name: Start deployment
  ansible.builtin.command:
    cmd: laconic-so deployment --dir {{ deployment_dir }} start
  environment:
    REGISTRY_TOKEN: "{{ lookup('file', '~/.credentials/ghcr_token') }}"
```
### How It Works

1. laconic-so reads the `registry-credentials` config from spec.yml
2. Creates a Kubernetes `docker-registry` secret named `{deployment}-registry` (see the sketch after this list)
3. The deployment's pods reference this secret for image pulls
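For reference only, step 2 is roughly equivalent to the following manual commands (laconic-so does this for you; the secret name is illustrative and `kubectl` access to the target cluster is assumed):

```bash
# Create the image-pull secret by hand (sketch of what step 2 amounts to)
kubectl create secret docker-registry my-deployment-registry \
  --docker-server=ghcr.io \
  --docker-username=your-org-or-username \
  --docker-password="$REGISTRY_TOKEN"

# Confirm it exists
kubectl get secret my-deployment-registry
```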
## Cluster and Volume Management

### Stopping Deployments

The `deployment stop` command has two important flags: `--delete-volumes`, shown below, and `--skip-cluster-management`, covered under [Shared Cluster Architecture](#shared-cluster-architecture).
```bash
# Default: stops deployment, deletes cluster, PRESERVES volumes
laconic-so deployment --dir my-deployment stop

# Explicitly delete volumes (USE WITH CAUTION)
laconic-so deployment --dir my-deployment stop --delete-volumes
```
### Volume Persistence

Volumes persist across cluster deletion by design. This is important because:
- **Data survives cluster recreation**: Ledger data, databases, and other state are preserved
- **Faster recovery**: No need to re-sync or rebuild data after cluster issues
- **Safe cluster upgrades**: Delete and recreate the cluster without data loss

**Only use `--delete-volumes` when:**
- You explicitly want to start fresh with no data
- The user specifically requests volume deletion
- You're cleaning up a test/dev environment completely
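If there is any doubt about whether the data will be needed again, archive the host-path volume data first. A minimal sketch, assuming the deployment keeps its volumes under `$DEPLOYMENT_DIR/data` as described in [Volume Persistence in k8s-kind](#volume-persistence-in-k8s-kind):

```bash
# Snapshot the deployment's host-path volume data, then delete the volumes
tar -czf my-deployment-data-$(date +%Y%m%d).tar.gz -C my-deployment/data .
laconic-so deployment --dir my-deployment stop --delete-volumes
```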
### Shared Cluster Architecture

In kind deployments, multiple stacks share a single cluster:
- First `deployment start` creates the cluster
- Subsequent deployments reuse the existing cluster
- `deployment stop` on ANY deployment deletes the shared cluster
- Other deployments will fail until the cluster is recreated

To stop a single deployment without affecting the cluster:
```bash
laconic-so deployment --dir my-deployment stop --skip-cluster-management
```

Stacks sharing a cluster must agree on mount topology. See
[Volume Persistence in k8s-kind](#volume-persistence-in-k8s-kind).
### cluster-id vs deployment-id

Each deployment's `deployment.yml` carries two identifiers with
different roles:

- **`cluster-id`** — which kind cluster this deployment attaches to.
  Used for the kube-config context name (`kind-{cluster-id}`) and for
  kind lifecycle ops. Inherited from the running cluster at
  `deploy create` time when one exists; freshly generated otherwise.
  Shared across every deployment that joins the same cluster.
- **`deployment-id`** — this particular deployment's identity.
  Generated fresh on every `deploy create` and never inherited. Flows
  into `app_name`, the prefix on every k8s resource name this
  deployment creates (PVs, ConfigMaps, Deployments, PVCs, …). Distinct
  per deployment even when the cluster is shared.

The split prevents silent resource-name collisions between
deployments sharing a cluster: two deployments of the same stack,
or any two deployments that happen to declare a volume with the same
name, still produce distinct `{deployment-id}-{vol}` PV names.
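For illustration only (ids invented; field layout per the description above), two deployments attached to the same kind cluster might carry:

```yaml
# Deployment A's deployment.yml (excerpt)
cluster-id: laconic-abc123      # shared with deployment B
deployment-id: dep-aaaa
# Deployment B would carry the same cluster-id but its own deployment-id
# (say dep-bbbb), so a volume named "chain-data" declared by both stacks
# still yields distinct PVs: dep-aaaa-chain-data vs dep-bbbb-chain-data.
```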
**Backward compatibility**: `deployment.yml` files written before the
`deployment-id` field existed fall back to using `cluster-id` as the
deployment-id. Existing resource names stay stable across this
upgrade — no PV renames, no re-bind, no data orphaning. The next
`deploy create` writes both fields going forward.

**Namespace ownership**: on top of distinct resource names, SO stamps
the k8s namespace with a `laconic.com/deployment-dir` annotation on
first creation. A subsequent `deployment start` from a different
deployment directory that would land in the same namespace fails
with a `DeployerException` pointing at the `namespace:` spec
override. Catches operator-error cases where the same deployment dir
is effectively registered twice.
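To check which deployment directory currently owns a namespace (the namespace name below is illustrative; assumes `kubectl` access to the cluster):

```bash
# Show the ownership annotation stamped on the namespace
kubectl get namespace my-namespace -o yaml | grep laconic.com/deployment-dir
```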
### Caddy ingress image lifecycle

The Caddy ingress controller lives in the cluster-scoped
`caddy-system` namespace and is installed on first `deployment start`.
Its image is configurable per deployment:

```yaml
# spec.yml
caddy-ingress-image: ghcr.io/laconicnetwork/caddy-ingress:v1.2.3
```

Two cases, intentionally different:

- **Spec key set**: on first install the manifest is templated with
  this image. On subsequent `deployment start`, if the running Caddy
  Deployment's image differs, laconic-so patches it and waits for the
  rollout. The Deployment uses `strategy: Recreate` (hostPort 80/443
  blocks rolling updates from ever completing), so expect ~10–30s of
  ingress downtime while the old pod terminates and the new one
  starts.
- **Spec key absent**: on first install the manifest's hardcoded
  default (`ghcr.io/laconicnetwork/caddy-ingress:latest`) is used.
  On subsequent `deployment start`, laconic-so does **not** touch the
  running Caddy Deployment. This matters when the image was set
  out-of-band (via an ansible playbook, or by another deployment's
  spec that's since been removed) — a silent revert to the default
  would be worse than doing nothing. If you want to go back to the
  default image, set `caddy-ingress-image` to it explicitly.

**Cluster-scoped caveat**: `caddy-system` is shared by every
deployment on the cluster. Setting `caddy-ingress-image` in any one
deployment's spec rolls the controller for all of them — last
`deployment start` wins. Treat it as a cluster-level knob; keep the
value consistent across the deployments sharing a cluster.
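To see which image the shared controller is actually running before or after a `deployment start` (assumes `kubectl` access; the Deployment's exact name may vary):

```bash
# List the Deployments in caddy-system along with their container images
kubectl -n caddy-system get deployments -o wide
```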
## Volume Persistence in k8s-kind

k8s-kind has 3 storage layers:

- **Docker Host**: The physical server running Docker
- **Kind Node**: A Docker container simulating a k8s node
- **Pod Container**: Your workload

Volumes with paths are mounted from Docker Host → Kind Node → Pod via kind
`extraMounts`. Kind applies `extraMounts` only at cluster creation — they
cannot be added to a running cluster.

| spec.yml volume  | Storage Location | Survives Pod Restart | Survives Cluster Restart |
|------------------|------------------|----------------------|--------------------------|
| `vol:` (empty)   | Kind Node PVC    | ✅                   | ❌                       |
| `vol: ./data/x`  | Docker Host      | ✅                   | ✅                       |
| `vol: /abs/path` | Docker Host      | ✅                   | ✅                       |

**Recommendation**: Always use paths for data you want to keep. Relative paths
(e.g., `./data/rpc-config`) resolve to `$DEPLOYMENT_DIR/data/rpc-config` on the
Docker Host.
### Example

```yaml
# In spec.yml
volumes:
  rpc-config: ./data/rpc-config   # Persists to $DEPLOYMENT_DIR/data/rpc-config
  chain-data: ./data/chain        # Persists to $DEPLOYMENT_DIR/data/chain
  temp-cache:                     # Empty = Kind Node PVC (lost on cluster delete)
```
### The Antipattern

Empty-path volumes appear persistent because they survive pod restarts (the data lives
in the Kind Node container). However, this data is lost when the kind cluster is
recreated. This "false persistence" has caused data loss when operators assumed
their data was safe.
### Shared Clusters: Use `kind-mount-root`

Because kind `extraMounts` can only be set at cluster creation, the first
deployment to start locks in the mount topology. Later deployments that
declare new `extraMounts` have them silently ignored — their PVs fall
through to the kind node's overlay filesystem and lose data on cluster
destroy.

The fix is an umbrella mount. Set `kind-mount-root` in the spec, pointing
at a host directory all stacks will share:
```yaml
# spec.yml
kind-mount-root: /srv/kind

volumes:
  my-data: /srv/kind/my-stack/data   # visible at /mnt/my-stack/data in-node
```
SO emits a single `extraMount` (`<kind-mount-root>` → `/mnt`). Any new
host subdirectory under the root is visible in the node immediately — no
cluster recreate needed to add stacks.
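For reference, the generated kind cluster config then contains roughly the following (a sketch, not laconic-so's literal output):

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    extraMounts:
      - hostPath: /srv/kind    # the spec's kind-mount-root
        containerPath: /mnt    # umbrella mount inside the kind node
```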
**All stacks sharing a cluster must agree on `kind-mount-root`** and keep
their host paths under it.
### Mount Compatibility Enforcement

`laconic-so deployment start` validates mount topology:

- **On first cluster creation** without an umbrella mount: prints a
  warning (future stacks may require a full recreate to add mounts).
- **On cluster reuse**: compares the new deployment's `extraMounts`
  against the live mounts on the control-plane container. Any mismatch
  (wrong host path, or mount missing) fails the deploy (see the check below).
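You can inspect those live mounts yourself when a deploy is rejected. The container name follows kind's `<cluster-name>-control-plane` convention; the cluster name below is invented, and `jq` is optional:

```bash
# Show host-path to node-path mounts on the kind control-plane container
docker inspect laconic-abc123-control-plane --format '{{ json .Mounts }}' \
  | jq '.[] | {Source, Destination}'
```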
### Static files in compose volumes → auto-ConfigMap

Compose volumes that bind a host file or flat directory into a container
(e.g. `../config/test/script.sh:/opt/run.sh`) are used to inject static
content that ships with the stack. k8s doesn't have a native notion of
this — the canonical way to inject static content is a ConfigMap.

At `deploy start`, laconic-so auto-generates a namespace-scoped
ConfigMap per host-path compose volume (deduped by source) and mounts
it into the pod instead of routing the bind through the kind node:

| Source shape | Behavior |
|---|---|
| Single file | ConfigMap with one key (the filename); pod mount uses `subPath` so the single key lands at the compose target path |
| Flat directory (no subdirs, ≤ ~700 KiB) | ConfigMap with one key per file; pod mount exposes all keys at the target path |
| Directory with subdirs, or over budget | Rejected at `deploy create` — embed in the container image, split into multiple ConfigMaps, or use an initContainer |
| `:rw` on any host-path bind | Rejected at `deploy create` — use a named volume with a spec-configured host path for writable data |

The deployment dir layout is unchanged: compose files stay verbatim and
`spec.yml` is not rewritten. Source files remain under
`{deployment_dir}/config/{pod}/` (as copied by `deploy create`); the
ConfigMap is built from them at deploy start and no kind extraMount is
emitted for these paths.

This works identically on kind and real k8s (ConfigMaps are
cluster-native; no node-side landing pad required), and two deployments
of the same stack sharing a cluster get their own per-namespace
ConfigMaps — no aliasing.
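In plain Kubernetes terms, the single-file case from the table above amounts to something like the following (a sketch with invented names and content, not laconic-so's literal output):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-deployment-run-sh          # invented name
data:
  run.sh: |
    #!/bin/sh
    echo "static content shipped with the stack"
---
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
    - name: app
      image: alpine:3.19
      volumeMounts:
        - name: run-sh
          mountPath: /opt/run.sh      # the compose target path
          subPath: run.sh             # single key lands exactly at the target
  volumes:
    - name: run-sh
      configMap:
        name: my-deployment-run-sh
```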
### Writable / generated data → named volume + host path

For volumes the workload *writes to* (databases, ledgers, caches, logs),
use a named volume backed by a spec-configured host path under
`kind-mount-root`:
```yaml
# compose
volumes:
  - my-data:/var/lib/foo

# spec.yml
kind-mount-root: /srv/kind
volumes:
  my-data: /srv/kind/my-stack/data
```
Works on both kind (via the umbrella mount) and real k8s (operator
provisions `/srv/kind/my-stack/data` on each node).
### Migrating an Existing Cluster

If a cluster was created without an umbrella mount and you need to add a
stack that requires new host-path mounts, the cluster must be recreated:

1. Back up ephemeral state (DBs, caches) from PVs that lack host mounts —
   these are in the kind node overlay FS and do not survive `kind delete`.
2. Update every stack's spec to set a shared `kind-mount-root` and place
   host paths under it.
3. Stop all deployments, destroy the cluster, recreate it by starting any
   stack (umbrella now active), and restore state (see the sketch below).
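A sketch of the stop/start sequence in step 3, assuming two deployments (directory names invented) whose specs have already been updated with the shared `kind-mount-root`:

```bash
# Stop all but one deployment without touching the shared cluster
laconic-so deployment --dir stack-a-deployment stop --skip-cluster-management

# The final stop tears down the kind cluster itself (host-path volume data survives)
laconic-so deployment --dir stack-b-deployment stop

# The first start recreates the cluster, now with the umbrella mount active
laconic-so deployment --dir stack-a-deployment start
laconic-so deployment --dir stack-b-deployment start
```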