Commit Graph

206 Commits (eb4704b563d53b2c9c93793530c94987ed5c2b25)

Author SHA1 Message Date
A. F. Dudley 14f423ea0c fix(k8s): read existing resourceVersion/clusterIP before replace
K8s PUT (replace) operations require metadata.resourceVersion for
optimistic concurrency control. Services additionally have immutable
spec.clusterIP that must be preserved from the existing object.

On 409 conflict, all _ensure_* methods now read the existing resource
first and copy resourceVersion (and clusterIP for Services) into the
body before calling replace.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 04:32:20 +00:00
A. F. Dudley 1da69cf739 fix(k8s): make deploy_k8s.py idempotent with create-or-replace semantics
All K8s resource creation in deploy_k8s.py now uses try-create, catch
ApiException(409), then replace — matching the pattern already used for
secrets in deployment_create.py. This allows `deployment start` to be
safely re-run without 409 Conflict errors.

Resources made idempotent:
- Deployment (create_namespaced_deployment → replace on 409)
- Service (create_namespaced_service → replace on 409)
- Ingress (create_namespaced_ingress → replace on 409)
- NodePort services (same as Service)
- ConfigMap (create_namespaced_config_map → replace on 409)
- PV/PVC: bare `except: pass` replaced with explicit ApiException
  catch for 404

Extracted _ensure_deployment(), _ensure_service(), _ensure_ingress(),
and _ensure_config_map() helpers to keep cyclomatic complexity in check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 04:15:03 +00:00
A. F. Dudley cc6acd5f09 fix: default skip-cluster-management to true
Destroying the kind cluster on stop/start is almost never the intent.
The cluster holds PVs, ConfigMaps, and networking state that are
expensive to recreate. Default to preserving the cluster; pass
--perform-cluster-management explicitly when a full teardown is needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 02:41:25 +00:00
A. F. Dudley 806c1bb723 refactor: rename `deployment update` to `deployment update-envs`
The update command only patches environment variables and adds a
restart annotation. It does not update ports, volumes, configmaps,
or any other deployment spec. The old name was misleading — it
implied a full spec update, causing operators to expect changes
that never took effect.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 02:33:20 +00:00
A. F. Dudley 7f205732f2 fix(k8s): expand etcd cleanup whitelist to preserve core cluster services
_clean_etcd_keeping_certs() only preserved /registry/secrets/caddy-system,
deleting everything else including the kubernetes ClusterIP service in the
default namespace. When kind recreated the cluster with the cleaned etcd,
kube-apiserver saw existing data and skipped bootstrapping the service.
kindnet panicked on KUBERNETES_SERVICE_HOST missing, blocking all pod
networking.

Expand the whitelist to also preserve:
- /registry/services/specs/default/kubernetes
- /registry/services/endpoints/default/kubernetes

Loop over multiple prefixes instead of a single etcdctl get --prefix call.

See docs/bug-laconic-so-etcd-cleanup.md in biscayne-agave-runbook.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 17:56:13 +00:00
A. F. Dudley a11d40f2f3 fix(k8s): add HostToContainer mount propagation to kind extraMounts
Without propagation, rbind submounts on the host (e.g., XFS zvol at
/srv/kind/solana) are invisible inside the kind node — it sees the
underlying filesystem (ZFS) instead. This causes agave's io_uring to
deadlock on ZFS transaction commits (D-state in dsl_dir_tempreserve_space).

HostToContainer propagation ensures host submounts propagate into the
kind node, so /mnt/solana correctly resolves to the XFS zvol.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 13:07:12 +00:00
A. F. Dudley 929bdab8a4 fix(k8s): add HostToContainer mount propagation to kind-mount-root
The kind-mount-root extraMount entry used kind's default propagation
(None), so new bind mounts under the root on the host (e.g. zvols
mounted under /srv/kind) were not visible inside the kind node until
restart. Setting propagation to HostToContainer makes host-side mount
changes propagate into the kind node automatically.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 12:58:04 +00:00
A. F. Dudley b6d6ad8145 feat(k8s): add kind-mount-root for unified kind extraMount
When kind-mount-root is set in spec.yml, emit a single extraMount
mapping the root to /mnt instead of per-volume mounts. This allows
adding new volumes without recreating the kind cluster.

Volumes whose host path is under the root are skipped for individual
extraMounts and their PV paths resolve to /mnt/{relative_path}.
Volumes outside the root keep individual extraMounts as before.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 12:57:19 +00:00
A. F. Dudley eae4c3cdff feat(k8s): per-service resource layering in deployer
Resolve container resources using layered priority:
1. spec.yml per-container override (resources.containers.<name>)
2. Compose file deploy.resources block
3. spec.yml global resources
4. DEFAULT_CONTAINER_RESOURCES fallback

This prevents monitoring sidecars from inheriting the validator's
resource requests (e.g., 256G memory). Each service gets appropriate
resources from its compose definition unless explicitly overridden.

Note: existing deployments with a global resources block in spec.yml
can remove it once compose files declare per-service defaults.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 10:26:10 +00:00
A. F. Dudley d090f2064e docs: annotate spec.yml config layering conventions
Compose file owns application defaults. spec.yml config: section is for
deployment-specific overrides only (hostnames, IPs, secrets). Start
scripts should not have their own defaults — they read what the compose
file provides.

Annotations added:
- CLAUDE.md: config layering table and anti-pattern callout
- spec.py: Spec class docstring with good/bad config examples
- deployment_create.py: _write_config_file docstring

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 08:47:12 +00:00
A. F. Dudley 26dea540e9 fix(k8s): use deployment namespace for pod and container lookups
pods_in_deployment() and containers_in_pod() were hardcoded to search
the "default" namespace, but deployments are created in a per-deployment
namespace (laconic-{name}). This caused logs() to report "Pods not
running" even when pods were healthy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 17:13:08 +00:00
A. F. Dudley 7cd5043a83 feat(k8s): add kind-mount-root for unified kind extraMount
When kind-mount-root is set in spec.yml, emit a single extraMount
mapping the root to /mnt instead of per-volume mounts. This allows
adding new volumes without recreating the kind cluster.

Volumes whose host path is under the root are skipped for individual
extraMounts and their PV paths resolve to /mnt/{relative_path}.
Volumes outside the root keep individual extraMounts as before.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 16:41:16 +00:00
A. F. Dudley fb69cc58ff feat(k8s): map compose service ports to Kind extraPortMappings and support hostNetwork
Kind's extraPortMappings only included ports 80/443 for Caddy. Compose
service ports (RPC, gossip, UDP) were never forwarded, making them
unreachable from the host. Also adds hostNetwork/dnsPolicy to the k8s
pod spec when any compose service uses network_mode: host.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 05:28:52 +00:00
A. F. Dudley 019225ca18 fix(k8s): translate service names to localhost for sidecar containers
In docker-compose, services can reference each other by name (e.g., 'db:5432').
In Kubernetes, when multiple containers are in the same pod (sidecars), they
share the same network namespace and must use 'localhost' instead.

This fix adds translate_sidecar_service_names() which replaces docker-compose
service name references with 'localhost' in environment variable values for
containers that share the same pod.

Fixes issue where multi-container pods fail because one container tries to
connect to a sibling using the compose service name instead of localhost.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 18:10:32 -05:00
A. F. Dudley d913926144 feat(k8s): namespace-per-deployment for resource isolation and cleanup
Each deployment now gets its own Kubernetes namespace (laconic-{deployment_id}).
This provides:
- Resource isolation between deployments on the same cluster
- Simplified cleanup: deleting the namespace cascades to all namespaced resources
- No orphaned resources possible when deployment IDs change

Changes:
- Set k8s_namespace based on deployment name in __init__
- Add _ensure_namespace() to create namespace before deploying resources
- Add _delete_namespace() for cleanup
- Simplify down() to just delete PVs (cluster-scoped) and the namespace
- Fix hardcoded "default" namespace in logs function

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 18:04:52 -05:00
A. F. Dudley 47d3d10ead fix(k8s): query resources by label in down() for proper cleanup
Previously, down() generated resource names from the deployment config
and deleted those specific names. This failed to clean up orphaned
resources when deployment IDs changed (e.g., after force_redeploy).

Changes:
- Add 'app' label to all resources: Ingress, Service, NodePort, ConfigMap, PV
- Refactor down() to query K8s by label selector instead of generating names
- This ensures all resources for a deployment are cleaned up, even if
  the deployment config has changed or been deleted

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:55:14 -05:00
A. F. Dudley f70e87b848 Add etcd + PKI extraMounts for offline data recovery
Mount /var/lib/etcd and /etc/kubernetes/pki to host filesystem
so cluster state is preserved for offline recovery. Each deployment
gets its own backup directory keyed by deployment ID.

Directory structure:
  data/cluster-backups/{deployment_id}/etcd/
  data/cluster-backups/{deployment_id}/pki/

This enables extracting secrets from etcd backups using etcdctl
with the preserved PKI certificates.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:19:52 -05:00
A. F. Dudley 5bc6c978ac feat(k8s): support acme-email config for Caddy ingress
Adds support for configuring ACME email for Let's Encrypt certificates
in kind deployments. The email can be specified in the spec under
network.acme-email and will be used to configure the Caddy ingress
controller ConfigMap.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:19:52 -05:00
A. F. Dudley ee59918082 Allow relative volume paths for k8s-kind deployments
For k8s-kind, relative paths (e.g., ./data/rpc-config) are resolved to
$DEPLOYMENT_DIR/path by _make_absolute_host_path() during kind config
generation. This provides Docker Host persistence that survives cluster
restarts.

Previously, validation threw an exception before paths could be resolved,
making it impossible to use relative paths for persistent storage.

Changes:
- deployment_create.py: Skip relative path check for k8s-kind
- cluster_info.py: Allow relative paths to reach PV generation
- docs/deployment_patterns.md: Document volume persistence patterns

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:17:44 -05:00
A. F. Dudley 7cecf2caa6 Fix Caddy ACME email race condition by templating YAML
Previously, install_ingress_for_kind() applied the YAML (which starts
the Caddy pod with email: ""), then patched the ConfigMap afterward.
The pod had already read the empty email and Caddy doesn't hot-reload.

Now template the email into the YAML before applying, so the pod starts
with the correct email from the beginning.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley cb6fdb77a6 Rename image-registry to registry-credentials to avoid collision
The existing 'image-registry' key is used for pushing images to a remote
registry (URL string). Rename the new auth config to 'registry-credentials'
to avoid collision.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley 73ba13aaa5 Add private registry authentication support
Add ability to configure private container registry credentials in spec.yml
for deployments using images from registries like GHCR.

- Add get_image_registry_config() to spec.py for parsing image-registry config
- Add create_registry_secret() to create K8s docker-registry secrets
- Update cluster_info.py to use dynamic {deployment}-registry secret names
- Update deploy_k8s.py to create registry secret before deployment
- Document feature in deployment_patterns.md

The token-env pattern keeps credentials out of git - the spec references an
environment variable name, and the actual token is passed at runtime.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley d82b3fb881 Only load locally-built images into kind, auto-detect ingress
- Check stack.yml containers: field to determine which images are local builds
- Only load local images via kind load; let k8s pull registry images directly
- Add is_ingress_running() to skip ingress installation if already running
- Fixes deployment failures when public registry images aren't in local Docker

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley 3bc7832d8c Fix deployment name extraction from path
When stack: field in spec.yml contains a path (e.g., stack_orchestrator/data/stacks/name),
extract just the final name component for K8s secret naming. K8s resource names must
be valid RFC 1123 subdomains and cannot contain slashes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley b057969ddd Clarify create_cluster docstring: one cluster per host by design
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley ca090d2cd5 Add $generate:type:length$ token support for K8s secrets
- Add GENERATE_TOKEN_PATTERN to detect $generate:hex:N$ and $generate:base64:N$ tokens
- Add _generate_and_store_secrets() to create K8s Secrets from spec.yml config
- Modify _write_config_file() to separate secrets from regular config
- Add env_from with secretRef to container spec in cluster_info.py
- Secrets are injected directly into containers via K8s native mechanism

This enables declarative secret generation in spec.yml:
  config:
    SESSION_SECRET: $generate:hex:32$
    DB_PASSWORD: $generate:hex:16$

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley 2d3721efa4 Add cluster reuse for multi-stack k8s-kind deployments
When deploying a second stack to k8s-kind, automatically reuse an existing
kind cluster instead of trying to create a new one (which would fail due
to port 80/443 conflicts).

Changes:
- helpers.py: create_cluster() now checks for existing cluster first
- deploy_k8s.py: up() captures returned cluster name and updates self

This enables deploying multiple stacks (e.g., gorbagana-rpc + trashscan-explorer)
to the same kind cluster.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley 4408725b08 Fix repo root path calculation (4 parents from stack path) 2026-02-03 17:15:19 -05:00
A. F. Dudley 22d64f1e97 Add --spec-file option to restart and auto-detect GitOps spec
- Add --spec-file option to specify spec location in repo
- Auto-detect deployment/spec.yml in repo as GitOps location
- Fall back to deployment dir if no repo spec found

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley 14258500bc Fix restart command for GitOps deployments
- Remove init_operation() from restart - don't regenerate spec from
  commands.py defaults, use existing git-tracked spec.yml instead
- Add docs/deployment_patterns.md documenting GitOps workflow
- Add pre-commit rule to CLAUDE.md
- Fix line length issues in helpers.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley 3fbd854b8c Use docker for etcd existence check (root-owned dir)
The etcd directory is root-owned, so shell test -f fails.
Use docker with volume mount to check file existence.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley e2d3c44321 Keep timestamped backup of etcd forever
Create member.backup-YYYYMMDD-HHMMSS before cleaning.
Each cluster recreation creates a new backup, preserving history.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley 720e01fc75 Preserve original etcd backup until restore is verified
Move original to .bak, move new into place, then delete bak.
If anything fails before the swap, original remains intact.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley 5b06cffe17 Use whitelist approach for etcd cleanup
Instead of trying to delete specific stale resources (blacklist),
keep only the valuable data (caddy TLS certs) and delete everything
else. This is more robust as we don't need to maintain a list of
all possible stale resources.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley 8948f5bfec Fix etcd cleanup to use docker for root-owned files
Use docker containers with volume mounts to handle all file
operations on root-owned etcd directories, avoiding the need
for sudo on the host.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley 675ee87544 Clear stale CNI resources from persisted etcd before cluster creation
When etcd is persisted (for certificate backup) and a cluster is
recreated, kind tries to install CNI (kindnet) fresh but the
persisted etcd already has those resources, causing 'AlreadyExists'
errors and cluster creation failure.

This fix:
- Detects etcd mount path from kind config
- Before cluster creation, clears stale CNI resources (kindnet, coredns)
- Preserves certificate and other important data

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley 8d3191e4fd Fix Caddy ingress ACME email and RBAC issues
- Add acme_email_key constant for spec.yml parsing
- Add get_acme_email() method to Spec class
- Modify install_ingress_for_kind() to patch ConfigMap with email
- Pass acme-email from spec to ingress installation
- Add 'delete' verb to leases RBAC for certificate lock cleanup

The acme-email field in spec.yml was previously ignored, causing
Let's Encrypt to fail with "unable to parse email address".
The missing delete permission on leases caused lock cleanup failures.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley c197406cc7 feat(deploy): add deployment restart command
Add `laconic-so deployment restart` command that:
- Pulls latest code from stack git repository
- Regenerates spec.yml from stack's commands.py
- Verifies DNS if hostname changed (with --force to skip)
- Syncs deployment directory preserving cluster ID and data
- Stops and restarts deployment with --skip-cluster-management

Also stores stack-source path in deployment.yml during create
for automatic stack location on restart.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley 76c0c17c3b fix(deploy): merge volumes from stack init() instead of overwriting
Previously, volumes defined in a stack's commands.py init() function
were being overwritten by volumes discovered from compose files.
This prevented stacks from adding infrastructure volumes like caddy-data
that aren't defined in the compose files.

Now volumes are merged, with init() volumes taking precedence over
compose-discovered defaults.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 18:23:20 -05:00
A. F. Dudley 458b548dcf fix(k8s): add hostPath support for compose host path mounts
Add support for Docker Compose host path mounts (like ../config/file:/path)
in k8s deployments. Previously these were silently skipped, causing k8s
deployments to fail when compose files used host path mounts.

Changes:
- Add helper functions for host path detection and name sanitization
- Generate kind extraMounts for host path mounts
- Create hostPath volumes in pod specs for host path mounts
- Create volumeMounts with sanitized names for host path mounts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 19:25:28 -05:00
Roy Crihfield 789b2dd3a7 Add --update option to deploy create
To allow updating an existing deployment

- Check the deployment dir exists when updating
- Write to temp dir, then safely copy tree
- Don't overwrite data dir or config.env
2026-01-29 08:25:05 -06:00
A. F. Dudley d07a3afd27 Merge origin/main into multi-port-service
Resolve conflicts:
- deployment_context.py: Keep single modify_yaml method from main
- fixturenet-optimism/commands.py: Use modify_yaml helper from main
- deployment_create.py: Keep helm-chart, network-dir, initial-peers options
- deploy_webapp.py: Update create_operation call signature

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 16:48:11 -05:00
A. F. Dudley a5b373da26 Check for None before creating k8s service
get_service() returns None when there are no http-proxy routes,
so we must check before calling create_namespaced_service().

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 16:39:11 -05:00
A. F. Dudley 4f01054781 Expose all ports from http-proxy routes in k8s Service
Previously get_service() only exposed the first port from pod definition.
Now it collects all unique ports from http-proxy routes and exposes them
all in the Service spec.

This is needed for WebSocket support where RPC runs on one port (8899)
and WebSocket pubsub on another (8900) - both need to be accessible
through the ingress.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 15:14:48 -05:00
A. F. Dudley 8d9682eb47 Use caddy ingress class instead of nginx in cluster_info.py
Lint Checks / Run linter (push) Failing after 0s Details
The ingress annotation was still set to nginx class even though we're now
using Caddy as the ingress controller. Caddy won't pick up ingresses
annotated with the nginx class.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 03:41:35 -05:00
A. F. Dudley 638435873c Add port 443 mapping for kind clusters with Caddy ingress
Caddy provides automatic HTTPS with Let's Encrypt, but needs port 443
mapped from the kind container to the host. Previously only port 80 was
mapped.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 03:35:03 -05:00
A. F. Dudley 97a85359ff Fix helpers.py to use Caddy ingress instead of nginx
The helm-charts-with-caddy branch had the Caddy manifest file but was still
using nginx in the code. This change:

- Switch install_ingress_for_kind() to use ingress-caddy-kind-deploy.yaml
- Update wait_for_ingress_in_kind() to watch caddy-system namespace
- Use correct label selector for Caddy ingress controller pods

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 03:22:07 -05:00
A. F. Dudley ffa00767d4 Add extra_args support to deploy create command
- Add @click.argument for generic args passthrough to stack commands
- Keep explicit --network-dir and --initial-peers options
- Add DeploymentContext.get_compose_file() helper
- Add DeploymentContext.modify_yaml() helper for stack commands
- Update init() to use absolute paths

This allows stack-specific create commands to receive arbitrary
arguments via: laconic-so deploy create ... -- --custom-arg value

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 03:06:45 -05:00
A. F. Dudley 86462c940f Fix high-memlock spec to include complete OCI runtime config
The base_runtime_spec for containerd requires a complete OCI spec,
not just the rlimits section. The minimal spec was causing runc to
fail with "open /proc/self/fd: no such file or directory" because
essential mounts and namespaces were missing.

This commit uses kind's default cri-base.json as the base and adds
the rlimits configuration on top. The spec includes all necessary
mounts, namespaces, capabilities, and kind-specific hooks.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 02:12:11 -05:00
A. F. Dudley 87db167d7f Add RuntimeClass support for unlimited RLIMIT_MEMLOCK
The previous approach of mounting cri-base.json into kind nodes failed
because we didn't tell containerd to use it via containerdConfigPatches.

RuntimeClass allows different stacks to have different rlimit profiles,
which is essential since kind only supports one cluster per host and
multiple stacks share the same cluster.

Changes:
- Add containerdConfigPatches to kind-config.yml to define runtime handlers
- Create RuntimeClass resources after cluster creation
- Add runtimeClassName to pod specs based on stack's security settings
- Rename cri-base.json to high-memlock-spec.json for clarity
- Add get_runtime_class() method to Spec that auto-derives from
  unlimited-memlock setting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 01:58:38 -05:00
A. F. Dudley dd856af2d3 Fix pyright type errors across codebase
- Add pyrightconfig.json for pyright 1.1.408 TOML parsing workaround
- Add NoReturn annotations to fatal() functions for proper type narrowing
- Add None checks and assertions after require=True get_record() calls
- Fix AttrDict class with __getattr__ for dynamic attribute access
- Add type annotations and casts for Kubernetes client objects
- Store compose config as DockerDeployer instance attributes
- Filter None values from dotenv and environment mappings
- Use hasattr/getattr patterns for optional container attributes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 01:10:36 -05:00
A. F. Dudley cd3d908d0d Apply pre-commit linting fixes
- Format code with black (line length 88)
- Fix E501 line length errors by breaking long strings and comments
- Fix F841 unused variable (removed unused 'quiet' variable)
- Configure pyright to disable common type issues in existing codebase
  (reportGeneralTypeIssues, reportOptionalMemberAccess, etc.)
- All pre-commit hooks now pass

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 20:58:31 -05:00
A. F. Dudley 03f9acf869 Add unlimited-memlock support for Kind clusters
Lint Checks / Run linter (push) Failing after 0s Details
Container Registry Test / Run contaier registry hosting test on kind/k8s (push) Failing after 0s Details
Add spec.yml option `security.unlimited-memlock` that configures
RLIMIT_MEMLOCK to unlimited for Kind cluster pods. This is needed
for workloads like Solana validators that require large amounts of
locked memory for memory-mapped files during snapshot decompression.

When enabled, generates a cri-base.json file with rlimits and mounts
it into the Kind node to override the default containerd runtime spec.

Also includes flake8 line-length fixes for affected files.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 20:20:19 -05:00
A. F. Dudley dc36a6564a Fix misleading error message in load_images_into_kind 2026-01-21 19:32:53 -05:00
A. F. Dudley d8da9b6515 Add missing get_kind_cluster function to helpers.py
Fixes ImportError in k8s_command.py that was causing CI failure.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 19:04:46 -05:00
A. F. Dudley 89db6e1e92 Add Caddy ingress and k8s cluster management features
- Add Caddy ingress controller manifest for kind deployments
- Add k8s cluster list command for kind cluster management
- Add k8s_command import and registration in deploy.py
- Fix network section merge to preserve http-proxy settings
- Increase default container resources (4 CPUs, 8GB memory)
- Add UDP protocol support for K8s port definitions
- Add command/entrypoint support for K8s deployments
- Implement docker-compose variable expansion for K8s
- Set ConfigMap defaultMode to 0755 for executable scripts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 23:14:22 -05:00
Prathamesh Musale 8afae1904b Add support for running jobs from a stack (#975)
Lint Checks / Run linter (push) Failing after 5s Details
Part of https://plan.wireit.in/deepstack/browse/VUL-265/

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/975
Co-authored-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
Co-committed-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
2025-12-04 06:13:28 +00:00
Prathamesh Musale 7acabb0743 Add support for generating Helm charts when creating a deployment (#974)
Lint Checks / Run linter (push) Failing after 3s Details
Part of https://plan.wireit.in/deepstack/browse/VUL-265/

- Added a flag `--helm-chart` to `deploy create` command
- Uses Kompose CLI wrapper to generate a helm chart from compose files in a stack
- To be handled in a follow on PR(s):
  - Templatize generated charts and generate a `values.yml` file with defaults

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/974
Co-authored-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
Co-committed-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
2025-11-27 06:43:07 +00:00
Roy Crihfield ccccd9f957 Pass extra args to custom create command (#972)
Publish / Build and publish (push) Failing after 5s Details
Deploy Test / Run deploy test suite (push) Failing after 2s Details
Smoke Test / Run basic test suite (push) Failing after 2s Details
Lint Checks / Run linter (push) Failing after 2s Details
Webapp Test / Run webapp test suite (push) Failing after 2s Details
This is needed to allow custom deploy commands to handle arbitrary args.

* Adds a `DeploymentContext.modify_yaml` helper
* Removes `laconicd` from test stack to simplify it

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/972
Reviewed-by: ashwin <ashwin@noreply.git.vdb.to>
2025-11-25 03:05:35 +00:00
Roy Crihfield 34f3b719e4 Path is not a context manager in python 3.13 (#971)
Publish / Build and publish (push) Failing after 9s Details
Deploy Test / Run deploy test suite (push) Failing after 3s Details
Lint Checks / Run linter (push) Failing after 2s Details
Webapp Test / Run webapp test suite (push) Failing after 2s Details
Smoke Test / Run basic test suite (push) Failing after 3s Details
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/971
Reviewed-by: rachmaninovquar <rachmaninovquar@noreply.git.vdb.to>
2025-10-16 17:55:44 +00:00
Prathamesh Musale 873a6d472c Update webapp deployment flow for supporting custom domains (#963)
Webapp Test / Run webapp test suite (push) Failing after 7s Details
Smoke Test / Run basic test suite (push) Failing after 4s Details
Deploy Test / Run deploy test suite (push) Failing after 3s Details
Lint Checks / Run linter (push) Failing after 3s Details
Publish / Build and publish (push) Failing after 4s Details
Part of https://www.notion.so/Support-custom-domains-in-deploy-laconic-com-18aa6b22d4728067a44ae27090c02ce5 and https://git.vdb.to/cerc-io/snowballtools-base/issues/47

- Set `value` (IP address for `A` resource) in the DNS records
- Update `deploy-webapp-from-registry` command with an option to pass k8s cluster IP address (only required with fqdn policy `allow`)

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/963
Reviewed-by: ashwin <ashwin@noreply.git.vdb.to>
Co-authored-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
Co-committed-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
2025-02-04 13:26:31 +00:00
Prathamesh Musale 39df4683ac Allow payment reuse for same app LRN (#961)
Publish / Build and publish (push) Failing after 5s Details
Smoke Test / Run basic test suite (push) Failing after 3s Details
Lint Checks / Run linter (push) Failing after 3s Details
Deploy Test / Run deploy test suite (push) Failing after 3s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
Part of [Service provider auctions for web deployments](https://www.notion.so/Service-provider-auctions-for-web-deployments-104a6b22d47280dbad51d28aa3a91d75)

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/961
Reviewed-by: ashwin <ashwin@noreply.git.vdb.to>
Co-authored-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
Co-committed-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
2024-10-29 11:30:03 +00:00
Prathamesh Musale 23ca4c4341 Allow payment reuse for application redeployment (#960)
Deploy Test / Run deploy test suite (push) Failing after 4s Details
Smoke Test / Run basic test suite (push) Failing after 3s Details
Lint Checks / Run linter (push) Failing after 3s Details
Publish / Build and publish (push) Failing after 5s Details
Webapp Test / Run webapp test suite (push) Failing after 2s Details
Part of [Service provider auctions for web deployments](https://www.notion.so/Service-provider-auctions-for-web-deployments-104a6b22d47280dbad51d28aa3a91d75)

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/960
Reviewed-by: ashwin <ashwin@noreply.git.vdb.to>
Co-authored-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
Co-committed-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
2024-10-29 06:51:48 +00:00
Prathamesh Musale f64ef5d128 Use file existence for registry mutex (#959)
Part of [Service provider auctions for web deployments](https://www.notion.so/Service-provider-auctions-for-web-deployments-104a6b22d47280dbad51d28aa3a91d75)

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/959
Reviewed-by: ashwin <ashwin@noreply.git.vdb.to>
Co-authored-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
Co-committed-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
2024-10-29 04:05:35 +00:00
Prathamesh Musale 5f8e809b2d Add mutex lock file path to registry CLI wrapper class (#958)
Lint Checks / Run linter (push) Failing after 4s Details
Publish / Build and publish (push) Failing after 5s Details
Deploy Test / Run deploy test suite (push) Failing after 3s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
Smoke Test / Run basic test suite (push) Failing after 2s Details
Part of [Service provider auctions for web deployments](https://www.notion.so/Service-provider-auctions-for-web-deployments-104a6b22d47280dbad51d28aa3a91d75)
Follows https://git.vdb.to/cerc-io/stack-orchestrator/pulls/957

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/958
Reviewed-by: ashwin <ashwin@noreply.git.vdb.to>
Co-authored-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
Co-committed-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
2024-10-28 06:03:13 +00:00
Prathamesh Musale 4a7df2de33 Use a mutex for registry CLI txs in webapp deployment commands (#957)
Publish / Build and publish (push) Failing after 9s Details
Deploy Test / Run deploy test suite (push) Failing after 3s Details
Lint Checks / Run linter (push) Failing after 3s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
Smoke Test / Run basic test suite (push) Failing after 3s Details
Part of [Service provider auctions for web deployments](https://www.notion.so/Service-provider-auctions-for-web-deployments-104a6b22d47280dbad51d28aa3a91d75) and https://git.vdb.to/cerc-io/stack-orchestrator/issues/948

- Add a registry mutex decorator over tx methods in `LaconicRegistryClient` wrapper
- Required to allow multiple process to run webapp deployment tooling without running into account sequence errors when sending laconicd txs

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/957
Reviewed-by: ashwin <ashwin@noreply.git.vdb.to>
Co-authored-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
Co-committed-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
2024-10-25 08:40:54 +00:00
Prathamesh Musale 0c47da42fe Integrate SP auctions in webapp deployment flow (#950)
Publish / Build and publish (push) Failing after 7s Details
Deploy Test / Run deploy test suite (push) Failing after 2s Details
Smoke Test / Run basic test suite (push) Failing after 2s Details
Lint Checks / Run linter (push) Failing after 2s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
Part of [Service provider auctions for web deployments](https://www.notion.so/Service-provider-auctions-for-web-deployments-104a6b22d47280dbad51d28aa3a91d75) and https://git.vdb.to/cerc-io/stack-orchestrator/issues/948

- Add a command `publish-deployment-auction` to create and publish an app deployment auction
- Add a command `handle-deployment-auction` to handle auctions on deployer side
- Update `request-webapp-deployment` command to allow using an auction id in deployment requests
- Update `deploy-webapp-from-registry` command to handle deployment requests with auction
- Add a command `request-webapp-undeployment` to request an application undeployment

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/950
Reviewed-by: ashwin <ashwin@noreply.git.vdb.to>
Co-authored-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
Co-committed-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
2024-10-21 07:02:06 +00:00
Thomas E Lackey f1fdc48aaa Work around this bug: https://github.com/python/cpython/pull/14064 (#941)
Publish / Build and publish (push) Failing after 6s Details
Lint Checks / Run linter (push) Failing after 3s Details
Deploy Test / Run deploy test suite (push) Failing after 3s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
Smoke Test / Run basic test suite (push) Failing after 3s Details
Otherwise we sometimes see errors like:

```
cerc-webapp-deployer:   File "/root/.shiv/laconic-so_0f937aa98c2748ef9af8585d6f441dbc01546ace0d6660cbb159d1e5040aeddf/site-packages/stack_orchestrator/deploy/webapp/deploy_webapp_from_registry.py", line 671, in command
cerc-webapp-deployer:     shutil.rmtree(tempdir)
cerc-webapp-deployer:   File "/usr/lib/python3.10/shutil.py", line 725, in rmtree
cerc-webapp-deployer:     _rmtree_safe_fd(fd, path, onerror)
cerc-webapp-deployer:   File "/usr/lib/python3.10/shutil.py", line 681, in _rmtree_safe_fd
cerc-webapp-deployer:     onerror(os.unlink, fullname, sys.exc_info())
cerc-webapp-deployer:   File "/usr/lib/python3.10/shutil.py", line 679, in _rmtree_safe_fd
cerc-webapp-deployer:     os.unlink(entry.name, dir_fd=topfd)
cerc-webapp-deployer: FileNotFoundError: [Errno 2] No such file or directory: 'S.gpg-agent.extra'
```

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/941
Co-authored-by: Thomas E Lackey <telackey@bozemanpass.com>
Co-committed-by: Thomas E Lackey <telackey@bozemanpass.com>
2024-08-28 23:17:13 +00:00
Thomas E Lackey a54072de6c Add --config-ref flag. (#939)
Lint Checks / Run linter (push) Failing after 3s Details
Publish / Build and publish (push) Failing after 4s Details
Deploy Test / Run deploy test suite (push) Failing after 3s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
Smoke Test / Run basic test suite (push) Failing after 3s Details
Add a flag to re-use config.

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/939
Co-authored-by: Thomas E Lackey <telackey@bozemanpass.com>
Co-committed-by: Thomas E Lackey <telackey@bozemanpass.com>
2024-08-28 17:32:52 +00:00
Thomas E Lackey fa21ff2627 Support uploaded config, add 'publish-webapp-deployer' and 'request-webapp-deployment' commands (#938)
Lint Checks / Run linter (push) Failing after 5s Details
Publish / Build and publish (push) Failing after 4s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
Deploy Test / Run deploy test suite (push) Failing after 3s Details
Smoke Test / Run basic test suite (push) Failing after 3s Details
This adds two new commands: `publish-webapp-deployer` and `request-webapp-deployment`.

`publish-webapp-deployer` creates a `WebappDeployer` record, which provides information to requestors like the API URL, minimum required payment, payment address, and public key to use for encrypting config.

```
$ laconic-so publish-deployer-to-registry \
  --laconic-config ~/.laconic/laconic.yml \
  --api-url https://webapp-deployer-api.dev.vaasl.io \
  --public-key-file webapp-deployer-api.dev.vaasl.io.pgp.pub  \
  --lrn lrn://laconic/deployers/webapp-deployer-api.dev.vaasl.io  \
  --min-required-payment 100000
```

`request-webapp-deployment` simplifies publishing a `WebappDeploymentRequest` and can also handle automatic payment, and encryption and upload of configuration.

```
$ laconic-so request-webapp-deployment \
  --laconic-config ~/.laconic/laconic.yml \
  --deployer lrn://laconic/deployers/webapp-deployer-api.dev.vaasl.io \
  --app lrn://cerc-io/applications/webapp-hello-world@0.1.3 \
  --env-file ~/yaml/hello.env \
  --make-payment auto
```

Related changes are included for the deploy/undeploy commands for decrypting and using config, using the payment address from the WebappDeployer record, etc.

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/938
2024-08-27 19:55:06 +00:00
Thomas E Lackey 75ff60752a Require payment for app deployment requests. (#928)
Lint Checks / Run linter (push) Failing after 4s Details
Publish / Build and publish (push) Failing after 5s Details
Deploy Test / Run deploy test suite (push) Failing after 3s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
Smoke Test / Run basic test suite (push) Failing after 3s Details
Adds three new options for deployment/undeployment:

```
    "--min-required-payment",
    help="Requests must have a minimum payment to be processed",

    "--payment-address",
    help="The address to which payments should be made.  Default is the current laconic account.",

    "--all-requests",
    help="Handle requests addressed to anyone (by default only requests to my payment address are examined).",
```

In this mode, requests should be designated for a particular address with the attribute `to` and include a `payment` attribute which is the tx hash for the payment.

The deployer will confirm the payment (to the right account, right amount, not used before, etc.) and then proceed with the deployment or undeployment.

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/928
Reviewed-by: David Boreham <dboreham@noreply.git.vdb.to>
Co-authored-by: Thomas E Lackey <telackey@bozemanpass.com>
Co-committed-by: Thomas E Lackey <telackey@bozemanpass.com>
2024-08-21 14:39:20 +00:00
David Boreham e56da7dcc1 Add support for k8s pod to node affinity and taint toleration (#917)
Deploy Test / Run deploy test suite (push) Failing after 4s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
Smoke Test / Run basic test suite (push) Failing after 4s Details
Publish / Build and publish (push) Failing after 4s Details
K8s Deployment Control Test / Run deployment control suite on kind/k8s (push) Failing after 3s Details
Lint Checks / Run linter (push) Failing after 3s Details
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/917
Reviewed-by: Thomas E Lackey <telackey@noreply.git.vdb.to>
Co-authored-by: David Boreham <david@bozemanpass.com>
Co-committed-by: David Boreham <david@bozemanpass.com>
2024-08-15 20:32:58 +00:00
Thomas E Lackey 60d34217f8 More logging for webapp deployment (#923)
Lint Checks / Run linter (push) Failing after 3s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
Publish / Build and publish (push) Failing after 5s Details
Deploy Test / Run deploy test suite (push) Failing after 3s Details
Smoke Test / Run basic test suite (push) Failing after 3s Details
```
cerc-webapp-deployer: ############ DEPLOY #############
cerc-webapp-deployer: 2024-08-15 02:13:08.321991 -  - 0:00:00.000031 (step): Discovering deployment requests...
cerc-webapp-deployer: laconic -c /etc/config/laconic.yml registry record list --all --type ApplicationDeploymentRequest
cerc-webapp-deployer: 2024-08-15 02:13:08.815428 -  - 0:00:00.493420 (step): Loading known requests from /srv/deployments/autodeploy.state...
cerc-webapp-deployer: 2024-08-15 02:13:08.815626 -  - 0:00:00.000158 (step): BEGIN: Examining request bafyreigiltcdscwt7rqldnilo4ohrhgoulrlfceixde5ycewsym64sefgi
cerc-webapp-deployer: 2024-08-15 02:13:08.815645 -  - 0:00:00.000008 (step): Skipping request bafyreigiltcdscwt7rqldnilo4ohrhgoulrlfceixde5ycewsym64sefgi, we've already seen it.
cerc-webapp-deployer: 2024-08-15 02:13:08.815653 -  - 0:00:00.000005 (step): DONE Examining request bafyreigiltcdscwt7rqldnilo4ohrhgoulrlfceixde5ycewsym64sefgi with result SKIP.
cerc-webapp-deployer: 2024-08-15 02:13:08.815664 -  - 0:00:00.000005 (step): BEGIN: Examining request bafyreicoxippgdwab6cz72py4rgv63rvvbsea73y62hashlhqpcsxyfkue
cerc-webapp-deployer: 2024-08-15 02:13:08.815674 -  - 0:00:00.000006 (step): Skipping request bafyreicoxippgdwab6cz72py4rgv63rvvbsea73y62hashlhqpcsxyfkue, we've already seen it.
cerc-webapp-deployer: 2024-08-15 02:13:08.815684 -  - 0:00:00.000004 (step): DONE Examining request bafyreicoxippgdwab6cz72py4rgv63rvvbsea73y62hashlhqpcsxyfkue with result SKIP.
cerc-webapp-deployer: 2024-08-15 02:13:08.815692 -  - 0:00:00.000005 (step): BEGIN: Examining request bafyreih3gt44pvahnbg7ag26mlk3iie4s5m5znhygajja5dcovheti72ne
cerc-webapp-deployer: 2024-08-15 02:13:08.815705 -  - 0:00:00.000007 (step): Skipping request bafyreih3gt44pvahnbg7ag26mlk3iie4s5m5znhygajja5dcovheti72ne, we've already seen it.
cerc-webapp-deployer: 2024-08-15 02:13:08.815714 -  - 0:00:00.000005 (step): DONE Examining request bafyreih3gt44pvahnbg7ag26mlk3iie4s5m5znhygajja5dcovheti72ne with result SKIP.
cerc-webapp-deployer: 2024-08-15 02:13:08.815724 -  - 0:00:00.000004 (step): BEGIN: Examining request bafyreigjnbio47rug6x5tufzc6cwfcqpl3ck3xldzotrlz5bt663dh2pua
cerc-webapp-deployer: 2024-08-15 02:13:08.815733 -  - 0:00:00.000005 (step): Skipping request bafyreigjnbio47rug6x5tufzc6cwfcqpl3ck3xldzotrlz5bt663dh2pua, we've already seen it.
cerc-webapp-deployer: 2024-08-15 02:13:08.815743 -  - 0:00:00.000005 (step): DONE Examining request bafyreigjnbio47rug6x5tufzc6cwfcqpl3ck3xldzotrlz5bt663dh2pua with result SKIP.
cerc-webapp-deployer: 2024-08-15 02:13:08.815751 -  - 0:00:00.000004 (step): BEGIN: Examining request bafyreihsfno4s6lkxcp5a7g7pjj7kklrp3xaqo57mr2pz76nk3h4jukayy
cerc-webapp-deployer: 2024-08-15 02:13:08.815761 -  - 0:00:00.000006 (step): Skipping request bafyreihsfno4s6lkxcp5a7g7pjj7kklrp3xaqo57mr2pz76nk3h4jukayy, we've already seen it.
cerc-webapp-deployer: 2024-08-15 02:13:08.815770 -  - 0:00:00.000005 (step): DONE Examining request bafyreihsfno4s6lkxcp5a7g7pjj7kklrp3xaqo57mr2pz76nk3h4jukayy with result SKIP.
cerc-webapp-deployer: 2024-08-15 02:13:08.815779 -  - 0:00:00.000005 (step): BEGIN: Examining request bafyreicyfyj4ncmtuy5pain2rvc67v645cg2bbsiakizvhdiwvkx7asvdy
cerc-webapp-deployer: 2024-08-15 02:13:08.815791 -  - 0:00:00.000007 (step): Skipping request bafyreicyfyj4ncmtuy5pain2rvc67v645cg2bbsiakizvhdiwvkx7asvdy, we've already seen it.
cerc-webapp-deployer: 2024-08-15 02:13:08.815800 -  - 0:00:00.000004 (step): DONE Examining request bafyreicyfyj4ncmtuy5pain2rvc67v645cg2bbsiakizvhdiwvkx7asvdy with result SKIP.
cerc-webapp-deployer: 2024-08-15 02:13:08.815808 -  - 0:00:00.000004 (step): Discovering existing app deployments...
cerc-webapp-deployer: laconic -c /etc/config/laconic.yml registry record list --all --type ApplicationDeploymentRecord
cerc-webapp-deployer: 2024-08-15 02:13:09.330655 -  - 0:00:00.514858 (step): Discovering deployment removal and cancellation requests...
cerc-webapp-deployer: laconic -c /etc/config/laconic.yml registry record list --all --type ApplicationDeploymentRemovalRequest
cerc-webapp-deployer: 2024-08-15 02:13:09.825145 -  - 0:00:00.494460 (step): Found 0 unsatisfied request(s) to process.
cerc-webapp-deployer: ############ DEPLOY SUCCESS #############
```

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/923
Co-authored-by: Thomas E Lackey <telackey@bozemanpass.com>
Co-committed-by: Thomas E Lackey <telackey@bozemanpass.com>
2024-08-15 02:57:47 +00:00
Thomas E Lackey 952389abb0 Add option to recreate deployments rather than update them. (#920)
cherry-pick from https://git.vdb.to/cerc-io/stack-orchestrator/pulls/912

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/920
Reviewed-by: David Boreham <dboreham@noreply.git.vdb.to>
2024-08-14 20:14:40 +00:00
Thomas E Lackey 5c275aa622 Defensively handle errors examining app requests. (#922)
Deploy Test / Run deploy test suite (push) Failing after 3s Details
Smoke Test / Run basic test suite (push) Failing after 3s Details
Lint Checks / Run linter (push) Failing after 2s Details
Publish / Build and publish (push) Failing after 4s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
Related to https://git.vdb.to/cerc-io/webapp-deployment-status-api/issues/10

There are two issues in that.  One is that the output probably changed recently, whether in the client or server, where no matching record is found by ID (Note this is specific to `laconic record get --id <v>` and does not seem to apply to the similar command to retrieve a record by name, `laconic name resolve <n>`).

Rather than returning `[]` it is now returning `[ null ]`.  This cause us to think there *was* an application record found, and we attempt to treat the `null` entry like an Application object.  That's fixed by filtering out null responses, which is a good precaution for the deployer, though I think it makes sense to ask whether this new behavior by the client/server is correct.  Seems suspicious.

The other issue is that all the defensive checks we had in place to deal with broken/bad AppDeploymentRequests were around the _build_.  This error was coming much earlier, merely when parsing and examining the request to see if it needed to be handled at all.

I have now added similar defensive error handling around that portion of the code.

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/922
Reviewed-by: zramsay <zramsay@noreply.git.vdb.to>
Co-authored-by: Thomas E Lackey <telackey@bozemanpass.com>
Co-committed-by: Thomas E Lackey <telackey@bozemanpass.com>
2024-08-14 18:04:31 +00:00
Thomas E Lackey 8576137557 Convert port to string. (#919)
Lint Checks / Run linter (push) Failing after 4s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
Publish / Build and publish (push) Failing after 5s Details
Deploy Test / Run deploy test suite (push) Failing after 3s Details
Smoke Test / Run basic test suite (push) Failing after 3s Details
The str type check doesn't work if the port is a ruamel.yaml.scalarstring.SingleQuotedScalarString or ruamel.yaml.scalarstring.DoubleQuotedScalarString

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/919
Co-authored-by: Thomas E Lackey <telackey@bozemanpass.com>
Co-committed-by: Thomas E Lackey <telackey@bozemanpass.com>
2024-08-14 00:25:35 +00:00
David Boreham 65c1cdf6b1 Fix crash if port has int type in yaml (#918)
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/918
Reviewed-by: Thomas E Lackey <telackey@noreply.git.vdb.to>
Co-authored-by: David Boreham <david@bozemanpass.com>
Co-committed-by: David Boreham <david@bozemanpass.com>
2024-08-13 20:47:09 +00:00
David Boreham 265699bc38 Allow to disable kind cluster management for testing (#915)
Deploy Test / Run deploy test suite (push) Failing after 5s Details
Webapp Test / Run webapp test suite (push) Failing after 2s Details
Smoke Test / Run basic test suite (push) Failing after 2s Details
Lint Checks / Run linter (push) Failing after 3s Details
Publish / Build and publish (push) Failing after 4s Details
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/915
Co-authored-by: David Boreham <david@bozemanpass.com>
Co-committed-by: David Boreham <david@bozemanpass.com>
2024-08-13 17:48:14 +00:00
Thomas E Lackey 6087e1cd31 Copy config under a volume for Docker (similar to a ConfigMap for K8S). (#914)
Smoke Test / Run basic test suite (push) Failing after 4s Details
Lint Checks / Run linter (push) Failing after 2s Details
Publish / Build and publish (push) Failing after 4s Details
Deploy Test / Run deploy test suite (push) Failing after 3s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
This emulates the K8S ConfigMap behavior on Docker by using a regular volume.

If a directory exists under `config/` which matches a named volume, the contents will be copied to the volume on `create` (provided the destination volume is empty).  That is, rather than a ConfigMap, it is essentially a "config volume".

For example, with a compose file like:

```
version: '3.7'
services:
  caddy:
    image: cerc/caddy-ethcache:local
    restart: always
    volumes:
      - caddyconfig:/etc/caddy:ro
volumes:
  caddyconfig:
```

And a directory:

```
❯ ls stack-orchestrator/config/caddyconfig/
Caddyfile
```

After `laconic-so deploy create --spec-file caddy.yml --deployment-dir /srv/caddy` there will be:

```
❯ ls /srv/caddy/data/caddyconfig
Caddyfile
```

Mounted at `/etc/caddy`

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/914
Reviewed-by: David Boreham <dboreham@noreply.git.vdb.to>
Co-authored-by: Thomas E Lackey <telackey@bozemanpass.com>
Co-committed-by: Thomas E Lackey <telackey@bozemanpass.com>
2024-08-10 02:32:21 +00:00
Thomas E Lackey 1def279d26 Support multiple NodePorts, static NodePort mapping, and add 'replicas' spec option (#913)
Deploy Test / Run deploy test suite (push) Failing after 3s Details
Webapp Test / Run webapp test suite (push) Failing after 2s Details
Publish / Build and publish (push) Failing after 4s Details
Smoke Test / Run basic test suite (push) Failing after 2s Details
Lint Checks / Run linter (push) Failing after 2s Details
NodePort example:

```
network:
  ports:
    caddy:
     - 1234
     - 32020:2020
```

Replicas example:

```
replicas: 2
```

This also adds an optimization for k8s where if a directory matching the name of a configmap exists in beneath config/ in the stack, its contents will be copied into the corresponding configmap.

For example:

```
# Config files in the stack
❯ ls stack-orchestrator/config/caddyconfig
Caddyfile  Caddyfile.one-req-per-upstream-example

# ConfigMap in the spec
❯ cat foo.yml | grep config
...
configmaps:
  caddyconfig: ./configmaps/caddyconfig

# Create the deployment
❯ laconic-so --stack ~/cerc/caddy-ethcache/stack-orchestrator/stacks/caddy-ethcache deploy create --spec-file foo.yml

# The files from beneath config/<config_map_name> have been copied to the ConfigMap directory from the spec.
❯ ls deployment-001/configmaps/caddyconfig
Caddyfile  Caddyfile.one-req-per-upstream-example
```

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/913
Reviewed-by: David Boreham <dboreham@noreply.git.vdb.to>
Co-authored-by: Thomas E Lackey <telackey@bozemanpass.com>
Co-committed-by: Thomas E Lackey <telackey@bozemanpass.com>
2024-08-09 02:32:06 +00:00
David Boreham aef5986135 Allow gentx-files to be omitted 2024-08-07 14:11:06 -06:00
David Boreham 7590d6e237 Add stage 1 support 2024-08-07 11:28:10 -06:00
David Boreham 7d18334953 Mainnet-laconic stack fixes for laconicd2 (#904)
Lint Checks / Run linter (push) Failing after 3s Details
Publish / Build and publish (push) Failing after 4s Details
Fixturenet-Laconicd-Test / Run Laconicd fixturenet and Laconic CLI tests (push) Failing after 3s Details
Deploy Test / Run deploy test suite (push) Failing after 3s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
Smoke Test / Run basic test suite (push) Failing after 3s Details
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/904
2024-07-31 13:51:28 +00:00
Thomas E Lackey 913c3a8557 Back to v2 now that we have a working webapp deployer build again. (#896)
Publish / Build and publish (push) Failing after 5s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
Smoke Test / Run basic test suite (push) Failing after 3s Details
Lint Checks / Run linter (push) Failing after 3s Details
Deploy Test / Run deploy test suite (push) Failing after 3s Details
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/896
Reviewed-by: David Boreham <dboreham@noreply.git.vdb.to>
Co-authored-by: Thomas E Lackey <telackey@bozemanpass.com>
Co-committed-by: Thomas E Lackey <telackey@bozemanpass.com>
2024-07-27 18:59:42 +00:00
Thomas E Lackey 2f5b0cdd13 Revert recent laconicd deployment changes to restore production webapp deployer function. (#895)
Deploy Test / Run deploy test suite (push) Failing after 5s Details
Publish / Build and publish (push) Failing after 5s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
Smoke Test / Run basic test suite (push) Failing after 3s Details
Lint Checks / Run linter (push) Failing after 3s Details
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/895
2024-07-27 17:04:03 +00:00
zramsay 80cff73344 crn --> lrn 2024-07-23 20:20:01 -04:00
zramsay 21b1270d27 fix lint 2024-07-23 20:16:16 -04:00
zramsay 008389dcd8 cns --> registry and other fixes 2024-07-23 20:10:06 -04:00
Roy Crihfield 36d4969b2d Fixes for external stack deployment (#851)
Publish / Build and publish (push) Failing after 11s Details
Deploy Test / Run deploy test suite (push) Failing after 3s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
Lint Checks / Run linter (push) Failing after 2s Details
Smoke Test / Run basic test suite (push) Failing after 3s Details
Fixes
- stack path resolution for `build`
- external stack path resolution for deployments
- "extra" config detection
- `deployment ports` command
- `version` command in dist or source install (without build_tag.txt)
- `setup-repos`, so it won't die when an existing repo is not at a branch or exact tag

Used in https://git.vdb.to/cerc-io/fixturenet-eth-stacks/pulls/14

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/851
Reviewed-by: David Boreham <dboreham@noreply.git.vdb.to>
2024-07-09 15:37:35 +00:00
David Boreham 62f7825ec2 Remove quotes from git config 2024-07-05 09:55:14 -06:00
David Boreham 6d24d4a7e6 Set github auth token if present (#868)
Smoke Test / Run basic test suite (push) Failing after 3s Details
Deploy Test / Run deploy test suite (push) Failing after 3s Details
Publish / Build and publish (push) Failing after 5s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
Lint Checks / Run linter (push) Failing after 4s Details
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/868
Co-authored-by: David Boreham <david@bozemanpass.com>
Co-committed-by: David Boreham <david@bozemanpass.com>
2024-07-05 12:27:22 +00:00
David Boreham f06e5f9a2a Don't try to tag remote images (#866)
Deploy Test / Run deploy test suite (push) Failing after 4s Details
Smoke Test / Run basic test suite (push) Failing after 3s Details
Lint Checks / Run linter (push) Failing after 2s Details
Publish / Build and publish (push) Failing after 4s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/866
Co-authored-by: David Boreham <david@bozemanpass.com>
Co-committed-by: David Boreham <david@bozemanpass.com>
2024-07-04 23:51:06 +00:00
David Boreham c3a1402042 Derive the webapp host container id from stable data (#865)
Deploy Test / Run deploy test suite (push) Failing after 5s Details
Smoke Test / Run basic test suite (push) Failing after 3s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
Lint Checks / Run linter (push) Failing after 3s Details
Publish / Build and publish (push) Failing after 5s Details
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/865
Co-authored-by: David Boreham <david@bozemanpass.com>
Co-committed-by: David Boreham <david@bozemanpass.com>
2024-07-04 22:38:07 +00:00
David Boreham ca5fffaed5 Add logging to webapp deployer (#863)
Helps with diagnosing failures and odd behavior seen in the deployer in production.

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/863
Co-authored-by: David Boreham <david@bozemanpass.com>
Co-committed-by: David Boreham <david@bozemanpass.com>
2024-07-04 19:46:42 +00:00
David Boreham bf1eccd486 Fix image tag name 2024-06-13 08:31:45 -06:00
David Boreham 3fb025b5c9 Make remote image tags unique to the deployment (#838)
Webapp Test / Run webapp test suite (push) Failing after 4s Details
Deploy Test / Run deploy test suite (push) Failing after 3s Details
Publish / Build and publish (push) Failing after 4s Details
Smoke Test / Run basic test suite (push) Failing after 3s Details
Lint Checks / Run linter (push) Failing after 3s Details
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/838
Reviewed-by: Thomas E Lackey <telackey@noreply.git.vdb.to>
Co-authored-by: David Boreham <david@bozemanpass.com>
Co-committed-by: David Boreham <david@bozemanpass.com>
2024-06-13 03:26:58 +00:00
David Boreham 13ce521d84 Fix config dir processing for external stacks (#810)
Lint Checks / Run linter (push) Failing after 6s Details
Publish / Build and publish (push) Failing after 5s Details
Smoke Test / Run basic test suite (push) Failing after 4s Details
Deploy Test / Run deploy test suite (push) Failing after 3s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/810
Co-authored-by: David Boreham <david@bozemanpass.com>
Co-committed-by: David Boreham <david@bozemanpass.com>
2024-04-23 21:47:20 +00:00
David Boreham 6e4dae9777 Add external stack support (#806)
Deploy Test / Run deploy test suite (push) Failing after 5s Details
External Stack Test / Run external stack test suite (push) Failing after 3s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
Smoke Test / Run basic test suite (push) Failing after 4s Details
Lint Checks / Run linter (push) Failing after 4s Details
Publish / Build and publish (push) Failing after 5s Details
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/806
Co-authored-by: David Boreham <david@bozemanpass.com>
Co-committed-by: David Boreham <david@bozemanpass.com>
2024-04-18 21:22:47 +00:00
Thomas E Lackey 9043a67c7c Skip checks on requests we've already seen (#805)
Publish / Build and publish (push) Failing after 6s Details
Deploy Test / Run deploy test suite (push) Failing after 3s Details
Webapp Test / Run webapp test suite (push) Failing after 3s Details
Smoke Test / Run basic test suite (push) Failing after 3s Details
Lint Checks / Run linter (push) Failing after 4s Details
Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/805
Co-authored-by: Thomas E Lackey <telackey@bozemanpass.com>
Co-committed-by: Thomas E Lackey <telackey@bozemanpass.com>
2024-04-15 20:27:35 +00:00
Thomas E Lackey 4126f2fc43 Add --fqdn-policy option to deploy-webapp-from-registry. (#802)
Webapp Test / Run webapp test suite (push) Failing after 5s Details
Lint Checks / Run linter (push) Failing after 4s Details
Deploy Test / Run deploy test suite (push) Failing after 3s Details
Publish / Build and publish (push) Failing after 4s Details
Smoke Test / Run basic test suite (push) Failing after 3s Details
This add a new option `--fqdn-policy` to the `deploy-webapp-from-registry`.

The default policy, `prohibit` means that `ApplicationDeploymentRequests` which specify a FQDN will be rejected.  The `allow` policy will cause them to be processed.  The `preexisting` policy will only process them if an existing `DnsRecord` exists in the registry with the correct ownership.

The latter would be useful in conjunction with a pre-checking scheme in the UI (eg, that the DNS entry is properly configured, the domain is under the control of the requestor, etc.)  Only after all the checks were successful would the `DnsRecord` be created, allowing for `ApplicationDeploymentRequests` to use it.

Reviewed-on: https://git.vdb.to/cerc-io/stack-orchestrator/pulls/802
Reviewed-by: David Boreham <dboreham@noreply.git.vdb.to>
Co-authored-by: Thomas E Lackey <telackey@bozemanpass.com>
Co-committed-by: Thomas E Lackey <telackey@bozemanpass.com>
2024-04-15 12:20:35 +00:00