# Shred Collector Relay

## Problem

Turbine assigns each validator a single position in the shred distribution tree
per slot, determined by its pubkey. A validator in Miami with one identity receives
shreds from one set of tree neighbors — typically ~60-70% of shreds for any given
slot. The remaining 30-40% must come from the repair protocol, which is too slow
to keep pace with chain production (see analysis below).

Commercial services (Jito ShredStream, bloXroute OFR) solve this by running many
nodes with different identities across the turbine tree, aggregating shreds, and
redistributing the combined set to subscribers. This works but costs $300-5,000/mo
and adds a dependency on a third party.

## Concept

Run lightweight **shred collector** nodes at multiple geographic locations on
the Laconic network (Ashburn, Dallas, etc.). Each collector has its own keypair,
joins gossip with a unique identity, receives turbine shreds from its unique tree
position, and forwards raw shred packets to the main validator in Miami. The main
validator inserts these shreds into its blockstore alongside its own turbine shreds,
increasing completeness toward 100% without relying on repair.

```
                      Turbine Tree
                     /     |      \
                    /      |       \
     collector-ash   collector-dfw   biscayne (main validator)
       (Ashburn)       (Dallas)        (Miami)
      identity A      identity B      identity C
      ~60% shreds     ~60% shreds     ~60% shreds
                    \      |       /
                     \     |      /
              → UDP forward via DZ backbone →
                           |
                  biscayne blockstore
           ~95%+ shreds (union of A∪B∪C)
```

Each collector sees a different ~60% slice of the turbine tree. The union of
three independent positions yields ~94% coverage (1 - 0.4³ = 0.936). Four
collectors yield ~97%. The main validator fills the remaining few percent via
repair, which is fast when only 3-6% of shreds are missing.
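
The coverage figures follow from treating each identity's misses as independent.
A quick check of the arithmetic (the 40% miss rate is this document's estimate,
and real tree positions are not perfectly independent, so treat this as an
idealization):

```rust
/// Expected fraction of shreds covered by the union of `n` turbine
/// positions, assuming each identity independently misses `miss` of
/// the shreds in a slot.
fn union_coverage(miss: f64, n: i32) -> f64 {
    1.0 - miss.powi(n)
}

fn main() {
    for n in 1..=4 {
        // 1 → 60.0%, 2 → 84.0%, 3 → 93.6%, 4 → 97.4%
        println!("{n} positions: {:.1}% coverage", 100.0 * union_coverage(0.4, n));
    }
}
```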

## Why This Works

The math from biscayne's recovery (2026-03-06):

| Metric | Value |
|--------|-------|
| Compute-bound replay (complete blocks) | 5.2 slots/sec |
| Repair-bound replay (incomplete blocks) | 0.5 slots/sec |
| Chain production rate | 2.5 slots/sec |
| Turbine + relay delivery per identity | ~60-70% |
| Repair bandwidth | ~600 shreds/sec (estimated) |
| Repair needed to converge at 60% delivery | 5x current bandwidth |
| Repair needed to converge at 95% delivery | Easily sufficient |

At 60% shred delivery, repair must fill 40% per slot — too slow to converge.
At 95% delivery (3 collectors), repair fills 5% per slot — well within capacity.
The validator replays at near compute-bound speed (5+ slots/sec) and converges.
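
The convergence claim can be restated numerically from the table's figures
(~3,000 shreds/slot, 2.5 slots/sec of production, ~600 shreds/sec of repair
bandwidth — the last of these an estimate):

```rust
// Repair throughput (shreds/sec) required to backfill missing shreds
// at chain speed, using the figures from the recovery analysis.
const SHREDS_PER_SLOT: f64 = 3_000.0;
const SLOTS_PER_SEC: f64 = 2.5;
const REPAIR_BW: f64 = 600.0; // shreds/sec, estimated

fn repair_needed(delivery: f64) -> f64 {
    (1.0 - delivery) * SHREDS_PER_SLOT * SLOTS_PER_SEC
}

fn main() {
    // 60% delivery → ~3,000 shreds/sec needed, 5x the estimated budget.
    println!("at 60%: {:.0} shreds/sec", repair_needed(0.60));
    // 95% delivery → ~375 shreds/sec needed, comfortably under budget.
    println!("at 95%: {:.0} shreds/sec", repair_needed(0.95));
    assert!(repair_needed(0.95) < REPAIR_BW);
}
```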

## Infrastructure

Laconic already has DZ-connected switches at multiple sites:

| Site | Device | Latency to Miami | Backbone |
|------|--------|------------------|----------|
| Miami | laconic-mia-sw01 | 0.24ms | local |
| Ashburn | laconic-was-sw01 | ~29ms | Et4/1 25.4ms |
| Dallas | laconic-dfw-sw01 | ~30ms | TBD |

The DZ backbone carries traffic between sites at line rate. Shred packets are
~1280 bytes each. At ~3,000 shreds/slot and 2.5 slots/sec, each collector
forwards ~7,500 packets/sec (~10 MB/s) — trivial bandwidth for the backbone.
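
The per-collector load works out as follows (shred size and rates as stated
above):

```rust
/// Forwarded bytes/sec for one collector: shreds/slot × slots/sec × bytes/shred.
fn forward_rate_bytes(shred_bytes: f64, shreds_per_slot: f64, slots_per_sec: f64) -> f64 {
    shreds_per_slot * slots_per_sec * shred_bytes
}

fn main() {
    // ~3,000 shreds/slot × 2.5 slots/sec × ~1,280 bytes ≈ 9.6 MB/s
    let bps = forward_rate_bytes(1_280.0, 3_000.0, 2.5);
    println!("{:.1} MB/s per collector", bps / 1e6);
}
```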

## Collector Architecture

The collector does NOT need to be a full validator. It needs to:

1. **Join gossip** — advertise a ContactInfo with its own pubkey and a TVU
   address (the site's IP)
2. **Receive turbine shreds** — UDP packets on the advertised TVU port
3. **Forward shreds** — retransmit raw UDP packets to biscayne's TVU port

It does NOT need to: replay transactions, maintain accounts state, store a
ledger, load a snapshot, vote, or run RPC.
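
The forwarding step (3) is deliberately dumb: receive a UDP datagram, resend it
verbatim. A minimal sketch using only the Rust standard library — the function
name is ours, not from any agave or Firedancer API, and socket setup, addresses,
and batching are left out:

```rust
use std::net::UdpSocket;

/// Receive one packet on the collector's TVU socket and retransmit it,
/// unmodified, to the main validator's TVU address. A real collector
/// would loop over this (likely with recvmmsg-style batching) and never
/// parse or validate the shred — dedup and verification happen in the
/// receiving validator's normal shred ingestion path.
fn relay_one(tvu: &UdpSocket, dest: &str) -> std::io::Result<usize> {
    let mut buf = [0u8; 1500]; // shreds are ~1,280 bytes, under one MTU
    let (len, _src) = tvu.recv_from(&mut buf)?;
    tvu.send_to(&buf[..len], dest)?;
    Ok(len)
}
```

Steps 1-2 (gossip participation and turbine placement) are the hard part; the
options below differ mainly in how they obtain them.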

### Option A: Firedancer Minimal Build

Firedancer (Apache 2, C) has a tile-based architecture where each function
(net, gossip, shred, bank, store, etc.) runs as an independent Linux process.
A minimal build using only the networking + gossip + shred tiles would:

- Join gossip and advertise a TVU address
- Receive turbine shreds via the shred tile
- Forward shreds to a configured destination instead of to bank/store

This requires modifying the shred tile to add a UDP forwarder output instead
of (or in addition to) the normal bank handoff. The rest of the tile pipeline
(bank, pack, poh, store) is simply not started.

**Estimated effort:** Moderate. Firedancer's tile architecture is designed for
this kind of composition. The main work is adding a forwarder sink to the shred
tile and testing gossip participation without the full validator stack.

**Source:** https://github.com/firedancer-io/firedancer

### Option B: Agave Non-Voting Minimal

Run `agave-validator --no-voting` with `--limit-ledger-size 0` and minimal
config. Agave still requires a snapshot to start and runs the full process, but
with no voting and minimal ledger it would be lighter than a full node.

**Downside:** Agave is monolithic — you can't easily disable replay/accounts.
It still loads a snapshot, builds the accounts index, and runs replay. This
defeats the purpose of a lightweight collector.

### Option C: Custom Gossip + TVU Receiver

Write a minimal Rust binary using agave's `solana-gossip` and `solana-streamer`
crates to:

1. Bootstrap into gossip via entrypoints
2. Advertise ContactInfo with TVU socket
3. Receive shred packets on TVU
4. Forward them via UDP

**Estimated effort:** Significant. Gossip participation is complex (CRDS,
push/pull exchange, protocol versioning). Using the agave crates directly is
possible but poorly documented for standalone use.

### Option D: Run Collectors on Biscayne

Run the collector processes on biscayne itself, each advertising a TVU address
at a remote site. The switches at each site forward inbound TVU traffic to
biscayne via the DZ backbone using traffic-policy redirects (same pattern as
`ashburn-validator-relay.md`).

**Advantage:** No compute needed at remote sites. Just switch config + loopback
IPs. All collector processes run in Miami.

**Risk:** Gossip advertises IP + port. If the collector runs on biscayne but
advertises an Ashburn IP, gossip protocol interactions (pull requests, pings)
arrive at the Ashburn IP and must be forwarded back to biscayne. This adds
~58ms RTT to gossip protocol messages, which may cause timeouts or peer
quality degradation. Needs testing.

## Recommendation

Option A (Firedancer minimal build) is the correct long-term approach. It
produces a single binary that does exactly one thing: collect shreds from a
unique turbine tree position and forward them. It runs on minimal hardware
(a small VM or container at each site, or on biscayne with remote TVU
addresses).

Option D (collectors on biscayne with switch forwarding) is the fastest to
test since it needs no new software — just switch config and multiple
agave-validator instances with `--no-voting`. The question is whether agave
can start without a snapshot if we only care about gossip + TVU.

## Deployment Topology

```
biscayne (186.233.184.235)
├── agave-validator (main, identity C, TVU 186.233.184.235:9000)
├── collector-ash (identity A, TVU 137.239.194.65:9000)
│   └── shreds forwarded via was-sw01 traffic-policy
├── collector-dfw (identity B, TVU <dfw-ip>:9000)
│   └── shreds forwarded via dfw-sw01 traffic-policy
└── blockstore receives union of A∪B∪C shreds

was-sw01 (Ashburn)
└── Loopback: 137.239.194.65
    └── traffic-policy: UDP dst 137.239.194.65:9000 → nexthop mia-sw01

dfw-sw01 (Dallas)
└── Loopback: <assigned IP>
    └── traffic-policy: UDP dst <assigned IP>:9000 → nexthop mia-sw01
```
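
The was-sw01 entry implies a loopback plus redirect policy roughly like the
following. This is an illustrative, EOS-flavored sketch, not verified syntax —
the exact traffic-policy keywords and the next-hop address must come from the
working pattern in `ashburn-validator-relay.md`:

```
! was-sw01 (sketch, not verified syntax)
interface Loopback1
   ip address 137.239.194.65/32
!
traffic-policy collector-tvu-redirect
   match udp destination 137.239.194.65 port 9000
   action redirect next-hop <mia-sw01 backbone IP>
```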

## Open Questions

1. Can agave-validator start in gossip-only mode without a snapshot?
2. Does Firedancer's shred tile work standalone without bank/replay?
3. What is the gossip protocol timeout for remote TVU addresses (Option D)?
4. How does the turbine tree handle multiple identities from the same IP
   (if running all collectors on biscayne)?
5. Do we need stake on collector identities to be placed in the turbine tree,
   or do unstaked nodes still participate?
6. What IP block is available on dfw-sw01 for a collector loopback?