stack-orchestrator/docs/doublezero-status.md

122 lines
4.3 KiB
Markdown

# DoubleZero Current State and Bug Fixes
## Biscayne Connection Details
| Field | Value |
|-------|-------|
| Host | biscayne.vaasl.io (186.233.184.235) |
| DZ identity | `3Bw6v7EruQvTwoY79h2QjQCs2KBQFzSneBdYUbcXK1Tr` |
| Validator identity | `4WeLUxfQghbhsLEuwaAzjZiHg2VBw87vqHc4iZrGvKPr` |
| Nearest device | laconic-mia-sw01 (0.3ms) |
| DZ version (host) | 0.8.10 |
| DZ version (container) | 0.8.11 |
| k8s version | 1.35.1 (kind) |
## Current State (2026-03-03)
The host systemd `doublezerod` is connected and working. The container sidecar
doublezerod is broken. Both are running simultaneously.
| Instance | Identity | Status |
|----------|----------|--------|
| Host systemd | `3Bw6v7...` (correct) | BGP Session Up, IBRL to laconic-mia-sw01 |
| Container sidecar | `Cw9qun...` (wrong) | Disconnected, error loop |
| DaemonSet manifest | N/A | Never applied, dead code |
### Access pass
The access pass for 186.233.184.235 is registered and connected:
```
type: prepaid
payer: 3Bw6v7EruQvTwoY79h2QjQCs2KBQFzSneBdYUbcXK1Tr
status: connected
owner: DZfLKFDgLShjY34WqXdVVzHUvVtrYXb7UtdrALnGa8jw
```
## Bugs
### BUG-1: Container doublezerod has wrong identity
The entrypoint script (`entrypoint.sh`) auto-generates a new `id.json` if one isn't
found. The volume at `/srv/deployments/agave/data/doublezero-config/` was empty at
first boot, so it generated `Cw9qun...` instead of using the registered identity.
**Root cause:** The real `id.json` lives at `/home/solana/.config/doublezero/id.json`
(created by the host-level DZ install). The container volume is a separate path that
was never seeded.
**Fix:**
```bash
sudo cp /home/solana/.config/doublezero/id.json \
/srv/deployments/agave/data/doublezero-config/id.json
```
### BUG-2: Container doublezerod can't resolve DZ passport program
`DOUBLEZERO_RPC_ENDPOINT` in `spec.yml` is `http://127.0.0.1:8899` — the local
validator. But the local validator hasn't replayed enough slots to have the DZ
passport program accounts (`ser2VaTMAcYTaauMrTSfSrxBaUDq7BLNs2xfUugTAGv`).
doublezerod calls `GetProgramAccounts` every 30 seconds and gets empty results.
**Fix in `deployment/spec.yml`:**
```yaml
# Use public RPC for DZ bootstrapping until local validator is caught up
DOUBLEZERO_RPC_ENDPOINT: https://api.mainnet-beta.solana.com
```
Switch back to `http://127.0.0.1:8899` once the local validator is synced.
### BUG-3: Container doublezerod lacks hostNetwork
laconic-so was not translating `network_mode: host` from compose files to
`hostNetwork: true` in generated k8s pod specs. Without host network access, the
container can't create GRE tunnels (IP proto 47) or run BGP (tcp/179 on
169.254.0.0/16).
**Fix:** Deploy with stack-orchestrator branch `fix/k8s-port-mappings-hostnetwork-v2`
(commit `fb69cc58`, 2026-03-03) which adds automatic hostNetwork detection.
### BUG-4: DaemonSet workaround is dead code
`deployment/k8s-manifests/doublezero-daemonset.yaml` was a workaround for BUG-3.
Now that laconic-so supports hostNetwork natively, it should be deleted.
**Fix:** Remove `deployment/k8s-manifests/doublezero-daemonset.yaml` from agave-stack.
### BUG-5: Two doublezerod instances running simultaneously
The host systemd `doublezerod` and the container sidecar are both running. Once the
container is fixed (BUG-1 through BUG-3), the host service must be disabled to avoid
two processes fighting over the GRE tunnel.
**Fix:**
```bash
sudo systemctl stop doublezerod
sudo systemctl disable doublezerod
```
## Diagnostic Commands
Always use `sudo -u solana` for host-level DZ commands — the identity is under
`/home/solana/.config/doublezero/`.
```bash
# Host
sudo -u solana doublezero address # expect 3Bw6v7...
sudo -u solana doublezero status # tunnel state
sudo -u solana doublezero latency # device reachability
sudo -u solana doublezero access-pass list | grep 186.233.184 # access pass
sudo -u solana doublezero balance # credits
ip route | grep doublezero0 # BGP routes
# Container (from kind node)
kubectl -n <ns> exec <pod> -c doublezerod -- doublezero address
kubectl -n <ns> exec <pod> -c doublezerod -- doublezero status
kubectl -n <ns> exec <pod> -c doublezerod -- doublezero --version
# Logs
kubectl -n <ns> logs <pod> -c doublezerod --tail=30
sudo journalctl -u doublezerod -f # host systemd logs
```