# Edge And Trial Surface

This document defines the supported product boundary for the edge bundle and the lightweight trial surface.

## APIGateway

APIGateway is supported as stateless replicated instances behind an external L4 or VIP layer; live in-process reload is not part of the product contract.

Supported operator contract:

1. Render gateway config from Nix or `ultracloud.cluster` generated inputs, and restart or replace the process when routes, auth providers, or credit providers change.
2. Scale out by running multiple identical gateway instances behind FiberLB, an external load balancer, or another L4 or VIP distribution layer.
3. Treat route distribution as configuration rollout, not as a dynamic control-plane API.

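The restart-based contract above can be illustrated with a rolling replacement loop. The following is a minimal Python sketch, not APIGateway or FiberLB tooling: the `Replica` class, its `drain`/`restart`/`healthy`/`undrain` methods, the placeholder health check, and the use of a SHA-256 digest to skip replicas already running the rendered config are all hypothetical. It only shows the ordering the contract implies: take one identical replica out of the L4 or VIP pool, restart it with the newly rendered config, and return it to the pool once healthy before touching the next one.

```python
import hashlib
from dataclasses import dataclass


def config_digest(rendered: str) -> str:
    """Hash the rendered gateway config so unchanged replicas can be skipped."""
    return hashlib.sha256(rendered.encode("utf-8")).hexdigest()


@dataclass
class Replica:
    # Hypothetical stand-in for one stateless gateway process behind the L4/VIP layer.
    name: str
    config: str = ""
    in_pool: bool = True
    restarts: int = 0

    def drain(self) -> None:
        # Take the replica out of the load-balancer pool before touching it.
        self.in_pool = False

    def restart(self, rendered: str) -> None:
        # Config only changes through a restart; there is no hot reload.
        self.config = rendered
        self.restarts += 1

    def healthy(self) -> bool:
        # Placeholder health check; a real one would probe the listener.
        return bool(self.config)

    def undrain(self) -> None:
        self.in_pool = True


def rolling_rollout(replicas: list[Replica], rendered: str) -> list[str]:
    """Restart replicas one at a time; return the names that were restarted."""
    target = config_digest(rendered)
    restarted = []
    for replica in replicas:
        if replica.config and config_digest(replica.config) == target:
            continue  # already running the rendered config
        replica.drain()
        replica.restart(rendered)
        if not replica.healthy():
            # Stop the rollout and leave this replica out of the pool.
            raise RuntimeError(f"{replica.name} failed health check")
        replica.undrain()
        restarted.append(replica.name)
    return restarted
```

Calling `rolling_rollout(pool, rendered)` twice with the same rendered config restarts each replica once and then nothing. A failed health check stops the rollout with that replica still drained, so the replicas not yet processed keep serving the previous config.
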
Explicit non-supported behavior:

1. Hot route reload through an admin API or `SIGHUP`.
2. Stateful leader election or in-process config distribution between gateway replicas.
3. A release promise that every HA topology is directly exercised by `fresh-matrix`.

Current proof scope:

1. `fresh-matrix` proves the shipped single gateway-node composition on `node06`.
2. The HA story is a supported operator shape, but the release-facing proof remains one stateless gateway instance plus restart-based rollout.

## NightLight

NightLight is supported as a single-node WAL/snapshot service; replicated HA metrics storage is not part of the product contract.

Supported operator contract:

1. Use one NightLight instance per edge bundle, per lab, or per tenant environment when you need a hard operational boundary.
2. Use `retention_days`, the WAL, and periodic snapshots as the retention contract for that instance.
3. Put shared access control in front of NightLight with APIGateway or another authenticated front door when multiple writers or readers share the same endpoint.

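To illustrate how a day-based retention window interacts with WAL segments and snapshots, here is a minimal Python sketch. The `Segment` type and the rule that a segment may be dropped only when its newest sample is older than the cutoff are assumptions made for the example; NightLight's actual on-disk layout and compaction logic may differ.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class Segment(NamedTuple):
    # Hypothetical unit of NightLight storage: a WAL segment or a snapshot,
    # described by its name and the timestamp of the newest sample it holds.
    name: str
    newest_sample: datetime


def expired_segments(
    segments: list[Segment], retention_days: int, now: datetime
) -> list[str]:
    """Return names of segments that lie entirely outside the retention window."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    cutoff = now - timedelta(days=retention_days)
    # Every sample in a segment is older than the cutoff exactly when its
    # newest sample is, so only the newest timestamp needs to be compared.
    return [s.name for s in segments if s.newest_sample < cutoff]


if __name__ == "__main__":
    now = datetime(2024, 6, 30, tzinfo=timezone.utc)
    segs = [
        Segment("snap-0001", datetime(2024, 6, 1, tzinfo=timezone.utc)),
        Segment("wal-0002", datetime(2024, 6, 25, tzinfo=timezone.utc)),
    ]
    print(expired_segments(segs, retention_days=14, now=now))
```

With `retention_days=14` and `now` at 2024-06-30, the cutoff is 2024-06-16, so the snapshot ending on 2024-06-01 is reported as expired while the WAL segment reaching 2024-06-25 is kept.
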
Explicit non-supported behavior:

1. Multi-node or quorum-backed NightLight replication.
2. Per-tenant retention enforcement inside NightLight itself.
3. Treating NightLight labels as a hard security boundary.

The supported tenant contract is therefore deployment-scoped: one NightLight instance can serve one environment or a carefully trusted shared bundle, but tenant isolation is not enforced inside the process.

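Because isolation is deployment-scoped, it is the authenticated front door that decides which NightLight instance a caller reaches. The minimal Python sketch here assumes a hypothetical token-to-endpoint table; `FrontDoor`, `Route`, and the endpoint URLs are illustrative and are not APIGateway's configuration format. It shows the property that matters: an unknown credential never reaches any instance, and each known credential maps to exactly one instance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    tenant: str
    endpoint: str  # hypothetical base URL of that tenant's own NightLight instance


class FrontDoor:
    """Hypothetical authenticated front door; not APIGateway's real config model."""

    def __init__(self, routes_by_token: dict[str, Route]):
        self._routes = dict(routes_by_token)

    def resolve(self, token: Optional[str]) -> str:
        # Reject unauthenticated callers before any request reaches NightLight.
        if token is None or token not in self._routes:
            raise PermissionError("unknown or missing credential")
        # Isolation comes from which instance the caller reaches;
        # labels inside one instance are not a security boundary.
        return self._routes[token].endpoint
```

Labels inside a single instance play no part in this check, which matches the contract above that labels are not a security boundary.
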
## CreditService

CreditService export and backend migration are supported as offline export/import or backend-native snapshot workflows, not live mixed-writer migration.

Supported operator contract:

1. Keep CreditService scoped to quota, wallet, reservation, and admission-control behavior.
2. Use backend-native snapshots or logical API replay as the export baseline.
3. Drain or quiesce writes before moving between FlareDB, PostgreSQL, or SQLite backends.
4. Rehydrate the target backend, then cut APIGateway or callers over to the new endpoint.

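The drain, export, rehydrate, and cut-over sequence above can be sketched as an offline migration. This minimal Python sketch is not CreditService code: a hypothetical in-memory `Backend` stands in for FlareDB, PostgreSQL, or SQLite, and a JSON round-trip stands in for logical API replay. The point is the ordering: the source is made read-only before export, and the cut-over happens only after the target's contents match the source.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Backend:
    # Hypothetical in-memory stand-in for a FlareDB, PostgreSQL, or SQLite backend.
    name: str
    records: dict[str, dict] = field(default_factory=dict)
    read_only: bool = False

    def write(self, key: str, value: dict) -> None:
        if self.read_only:
            raise RuntimeError(f"{self.name} is quiesced")
        self.records[key] = value


def digest(records: dict[str, dict]) -> str:
    # Key-order-independent fingerprint used to verify import against export.
    canonical = json.dumps(records, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def migrate(source: Backend, target: Backend) -> str:
    """Offline migration: quiesce, export, import, verify. Returns the new endpoint name."""
    if target.records:
        raise RuntimeError("target must start empty")
    source.read_only = True  # 1. quiesce: no more writes to the source
    exported = json.loads(json.dumps(source.records))  # 2. logical export (deep copy)
    for key, value in exported.items():  # 3. rehydrate the target
        target.write(key, value)
    if digest(target.records) != digest(source.records):  # 4. verify before cut-over
        raise RuntimeError("export/import mismatch")
    return target.name  # 5. repoint APIGateway or callers here
```

Because the source rejects writes once `migrate` starts, the sketch never has two writers at the same time, which is the mixed-writer case the contract excludes.
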
Explicit non-supported behavior:

1. Finance-grade ledger ownership.
2. Live mixed-writer backend migration.
3. Turning the service into a pricing, invoicing, or settlement platform.

## Trial Surface

An OCI/Docker artifact is intentionally not the public trial surface.

The supported lightweight trial remains:

1. `nix build .#single-node-trial-vm`
2. `nix run .#single-node-trial`
3. `nix run .#single-node-quickstart`

That boundary exists because the supported VM-platform contract needs a guest kernel plus host KVM, `/dev/net/tun`, and OVS or libvirt semantics. A Docker or OCI image would either be host-coupled and privileged or prove a different, weaker contract.

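The host requirements above can be checked before trying the VM trial. This is a hedged Python preflight sketch, not part of the shipped trial: it checks for `/dev/kvm` and `/dev/net/tun` and, as a rough heuristic for OVS or libvirt availability, looks for an `ovs-vsctl` or `virsh` binary on `PATH`. The `exists` and `which` parameters are injectable (a design choice for this example) so the check can be exercised without a real KVM host.

```python
import os
import shutil
from typing import Callable, Optional

# Device nodes the VM trial relies on: KVM for the guest, TUN/TAP for networking.
REQUIRED_DEVICES = ("/dev/kvm", "/dev/net/tun")


def missing_host_requirements(
    exists: Callable[[str], bool] = os.path.exists,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """List host prerequisites for the VM trial that appear to be missing."""
    missing = [dev for dev in REQUIRED_DEVICES if not exists(dev)]
    # Heuristic only: treat either an OVS or a libvirt CLI on PATH as "available".
    if which("ovs-vsctl") is None and which("virsh") is None:
        missing.append("ovs-vsctl or virsh")
    return missing


if __name__ == "__main__":
    problems = missing_host_requirements()
    print(("missing: " + ", ".join(problems)) if problems else "no obvious gaps")
```

An empty list only means none of these particular prerequisites is obviously missing; it does not prove the trial VM will boot.
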
## Work Root Budget

Use the work-root budget helper as follows:

1. `./nix/test-cluster/work-root-budget.sh status` for reporting.
2. `./nix/test-cluster/work-root-budget.sh enforce` for a stronger local budget gate.
3. `./nix/test-cluster/work-root-budget.sh prune-proof-logs 2` for safer cleanup of dated proof roots.

Recommended soft budgets on a local AMD/KVM proof host:

1. Keep `./work/test-cluster/state` under roughly 35 GiB.
2. Keep disposable runtime state such as `./work/tmp` and `./work/publishable-kvm-runtime` under roughly 10 GiB combined.
3. Keep dated proof roots trimmed so combined proof logs stay under roughly 20 GiB unless you are intentionally archiving a release snapshot.

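The soft budgets above can be expressed as a small table plus a size check. This Python sketch is a rough illustration of the reporting idea, not the implementation of `work-root-budget.sh`; the group names are hypothetical, and because the location of the dated proof roots is not stated here, the caller supplies their size.

```python
import os

GIB = 1024 ** 3

# Soft budgets from the list above, keyed by hypothetical group names.
SOFT_BUDGETS_GIB = {
    "state": 35,       # ./work/test-cluster/state
    "runtime": 10,     # ./work/tmp + ./work/publishable-kvm-runtime, combined
    "proof_logs": 20,  # all dated proof roots, combined
}


def dir_size(path: str) -> int:
    """Apparent size in bytes of the files under path, not following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def overruns(sizes_bytes: dict[str, int]) -> dict[str, float]:
    """Map each group over its soft budget to how many GiB it is over."""
    over = {}
    for group, budget in SOFT_BUDGETS_GIB.items():
        size_gib = sizes_bytes.get(group, 0) / GIB
        if size_gib > budget:
            over[group] = round(size_gib - budget, 2)
    return over


if __name__ == "__main__":
    sizes = {
        "state": dir_size("./work/test-cluster/state"),
        "runtime": dir_size("./work/tmp") + dir_size("./work/publishable-kvm-runtime"),
        # "proof_logs" omitted: add the summed size of your dated proof roots.
    }
    for group, excess in overruns(sizes).items():
        print(f"{group}: about {excess} GiB over its soft budget")
```

Here an overrun is only reported; turning it into a failing gate is what the `enforce` mode is described as doing.
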
The helper prints current sizes, highlights budget overruns, and lists safe cleanup steps such as stopping the cluster, cleaning runtime state, deleting disposable log roots, and then running a Nix store GC once old result symlinks are no longer needed. The `enforce` mode lets local proof lanes fail fast when the operator has let `./work` drift beyond the documented soft budget, and `prune-proof-logs` gives a dry-run-first workflow for trimming dated proof roots.

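The dry-run-first pruning workflow can be sketched in Python. This sketch is hypothetical: it assumes dated proof roots are directories whose names start with an ISO `YYYY-MM-DD` date, and it interprets the count as the number of newest roots to keep. The meaning of the `2` argument to `prune-proof-logs` is not specified in this document, so treat that interpretation as an assumption.

```python
import re
import shutil
from pathlib import Path

# Hypothetical naming convention: dated proof roots start with an ISO date, e.g. 2024-06-01-...
DATED = re.compile(r"^\d{4}-\d{2}-\d{2}")


def plan_prune(names: list[str], keep: int) -> list[str]:
    """Return dated root names to delete, keeping the `keep` newest; undated names are ignored."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    # ISO date prefixes sort chronologically as plain strings.
    dated = sorted((n for n in names if DATED.match(n)), reverse=True)
    return sorted(dated[keep:])


def prune(parent: Path, keep: int, dry_run: bool = True) -> list[str]:
    """Dry-run by default: only report what would be removed unless dry_run=False."""
    victims = plan_prune([p.name for p in parent.iterdir() if p.is_dir()], keep)
    for name in victims:
        if dry_run:
            print(f"would remove {parent / name}")
        else:
            shutil.rmtree(parent / name)
    return victims
```

With `dry_run=True` (the default), `prune` only prints what it would remove; deletion requires an explicit `dry_run=False`. Directories without a date prefix are never selected.
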