# UltraCloud
UltraCloud is a Nix-first cloud platform workspace that assembles a small control plane, network services, VM hosting, shared storage, object storage, and gateway services into one reproducible repository.
The fastest public entrypoint is the one-command single-node quickstart. The 3-node HA control plane profile lives in nixosConfigurations.node01, nixosConfigurations.node02, and nixosConfigurations.node03; the six-node VM cluster under nix/test-cluster is the publishable harness that extends that HA baseline with worker and optional service bundles on host-built QEMU guests.
The canonical bare-metal bootstrap proof is the ISO-on-QEMU path under nix/test-cluster, which drives phone-home, Disko install, reboot, and desired-system convergence for one control-plane node and one worker-equivalent node.
## Components
- `chainfire`: replicated coordination store
- `flaredb`: replicated KV and metadata store
- `iam`: identity, token issuance, and authorization
- `prismnet`: tenant networking control plane
- `flashdns`: authoritative DNS service
- `fiberlb`: load balancer control plane and dataplane
- `plasmavmc`: VM control plane and worker agents
- `coronafs`: shared filesystem for mutable VM volumes
- `lightningstor`: object storage and VM image backing
- `k8shost`: Kubernetes-style hosting control plane for tenant pods and services
- `apigateway`: external API and proxy surface
- `nightlight`: metrics ingestion and query service
- `creditservice`: quota, reservation, and admission-control service
- `deployer`: bootstrap and phone-home deployment service that owns install plans and desired-system intent
- `fleet-scheduler`: non-Kubernetes service scheduler for bare-metal cluster services
## Core API Notes
- `chainfire` ships a live cluster-management API on the supported surface. Public cluster management is `MemberAdd`, `MemberRemove`, `MemberList`, `Status`, and `LeaderTransfer`, and the internal Raft transport surface is `Vote`, `AppendEntries`, plus `TimeoutNow`. `chainfire-core` is workspace-internal only; the old embeddable builder and distributed-KV scaffold are not part of the supported product contract.
- `flaredb` ships SQL on both gRPC and REST. The supported REST SQL surface is `POST /api/v1/sql` for statement execution and `GET /api/v1/tables` for table discovery, alongside the existing KV and scan endpoints.
- `plasmavmc` ships a KVM-only public VM backend contract. The supported create and recovery surface is the KVM path exercised in `single-node-quickstart`, `fresh-smoke`, and `fresh-matrix`; Firecracker and mvisor remain archived non-product backends outside the supported surface until they have real tenant-network coverage.
- `lightningstor` keeps its optional gRPC surface live: bucket versioning, bucket policy, bucket tagging, and explicit object version listing are part of the supported contract for the canonical optional bundle.
- `fiberlb` backend `Https` health checks currently do not verify backend TLS certificates. Supported scope is limited to TCP reachability plus HTTP status for the backend endpoint until CA-aware verification is wired through config, server code, and the canonical harness.
- `k8shost` keeps `WatchPods` on the supported surface as a bounded snapshot stream for the current matching pod set. The published contract is the tenant workload API, not a separate long-lived controller event bus. `k8shost` is fixed as an API/control-plane product surface; runtime dataplane helpers stay archived non-product until they have their own published contract and proof.
- `k8shost-cni`, `k8shost-controllers`, `lightningstor-csi`, `nixosConfigurations.netboot-worker`, and the older scripts under `baremetal/vm-cluster` are archived internal scaffolds or legacy/manual debugging paths outside the supported surface.
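The FlareDB REST SQL surface above amounts to two HTTP calls. The endpoint paths come from the supported contract; the host, port, and JSON body shape in this sketch are illustrative assumptions, not a published schema:

```shell
# Hypothetical FlareDB endpoint; adjust host/port to your cluster.
FLAREDB="${FLAREDB:-http://127.0.0.1:8080}"
SQL='SELECT 1'

# Build the request body. NOTE: the {"sql": ...} shape is an assumption;
# printf does not escape embedded quotes, so keep statements simple here.
body=$(printf '{"sql":"%s"}' "$SQL")

# Statement execution and table discovery on the supported REST surface:
echo "POST $FLAREDB/api/v1/sql  body=$body"
echo "GET  $FLAREDB/api/v1/tables"
# Against a live cluster you would run, for example:
#   curl -fsS -X POST "$FLAREDB/api/v1/sql" -H 'Content-Type: application/json' -d "$body"
#   curl -fsS "$FLAREDB/api/v1/tables"
```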
## Core Control Plane Operations
The control-plane operator contract is fixed in docs/control-plane-ops.md.
- ChainFire supports live membership add, remove, promotion, endpoint replacement, and leader transfer for voters and learners on the public surface, including current-leader removal followed by election on the remaining voters. The supported reconfiguration boundary is sequential one-voter transitions until joint consensus lands. The fallback operator path remains backup plus restore through `durability-proof`, and the dedicated KVM proof lane is `nix run ./nix/test-cluster#cluster -- chainfire-live-membership-proof`.
- FlareDB online migration and schema evolution must start from the `durability-proof` backup/restore baseline and stay additive-first until a later destructive cleanup window. FlareDB destructive DDL and fully automated online migration remain outside the supported product contract for this release.
- IAM bootstrap hardening requires an explicit admin token, an explicit signing key, and a 32-byte `IAM_CRED_MASTER_KEY`. Signing-key rotation, credential overlap-and-revoke rotation, and mTLS overlap-and-cutover rotation are part of the supported operator contract; multi-node IAM failover remains outside the supported product contract. The standalone proof is `./nix/test-cluster/run-core-control-plane-ops-proof.sh`.
## Edge And Trial Surface
The edge-bundle and trial-surface contract is fixed in docs/edge-trial-surface.md.
- APIGateway is supported as stateless replicated instances behind an external L4 or VIP layer; live in-process reload is not part of the product contract.
- NightLight is supported as a single-node WAL/snapshot service; replicated HA metrics storage is not part of the product contract.
- CreditService export and backend migration are supported as offline export/import or backend-native snapshot workflows, not live mixed-writer migration.
- OCI/Docker artifact is intentionally not the public trial surface.
- Use `./nix/test-cluster/work-root-budget.sh status` for disk budget, GC, and cleanup guidance, `./nix/test-cluster/work-root-budget.sh enforce` for a stronger local budget gate, and `./nix/test-cluster/work-root-budget.sh prune-proof-logs 2` for safer dated-proof cleanup.
## Quick Start
Single-node quickstart:
```
nix run .#single-node-quickstart
```
This app is also the automated smoke check for the smallest realistic trial surface. It builds the minimal VM stack, boots a QEMU VM, waits for chainfire, flaredb, iam, prismnet, and plasmavmc, checks their health endpoints, and verifies the in-guest VM runtime prerequisites. For an interactive session, keep the VM running:
```
ULTRACLOUD_QUICKSTART_KEEP_VM=1 nix run .#single-node-quickstart
```
Buildable trial artifact:
```
nix build .#single-node-trial-vm
nix run .#single-node-trial
```
single-node-trial-vm is the lightest supported artifact for local use: a host-built NixOS VM appliance for the VM-platform core. OCI/Docker artifact is intentionally not the public trial surface here, because the supported scope needs a guest kernel plus host KVM, /dev/net/tun, and OVS/libvirt semantics. A privileged container would be host-coupled and would not prove the same contract.
The legacy name .#all-in-one-quickstart is kept as an alias, and .#single-node-trial is a friendlier alias for the same smoke launcher.
Portable local proof on hosts without /dev/kvm:
```
nix build .#checks.x86_64-linux.canonical-profile-eval-guards
nix build .#checks.x86_64-linux.portable-control-plane-regressions
```
This TCG-safe lane keeps canonical profile drift, the core chainfire / deployer control-plane path, the deployer -> nix-agent boundary, and the fleet-scheduler -> node-agent boundary under regression coverage without requiring nested virtualization.
Publishable nested-KVM suite:
```
nix develop
nix run ./nix/test-cluster#cluster -- fresh-smoke
nix run ./nix/test-cluster#cluster -- fresh-demo-vm-webapp
nix run ./nix/test-cluster#cluster -- fresh-matrix
nix run ./nix/test-cluster#cluster -- chainfire-live-membership-proof
./nix/test-cluster/run-publishable-kvm-suite.sh ./work/publishable-kvm-suite
```
The checked-in entrypoint for the publishable nested-KVM suite is the local wrapper ./nix/test-cluster/run-publishable-kvm-suite.sh. Runner-specific workflow wiring from task/f5c70db0-baseline-profiles is intentionally not part of this re-aggregated baseline.
For the full supported-surface proof on a local AMD/KVM host, use ./nix/test-cluster/run-supported-surface-final-proof.sh ./work/final-proofs/latest; it keeps builders local, builds single-node-trial-vm, runs single-node-quickstart, and captures the publishable KVM suite logs in one place.
nix run ./nix/test-cluster#cluster -- durability-proof is the canonical ChainFire, FlareDB, and Deployer backup/restore lane. It persists artifacts under ./work/durability-proof/latest, proves logical backup/restore for ChainFire keys and FlareDB SQL rows, uses the canonical Deployer admin pre-register request itself as the backup artifact, verifies that the pre-registered node survives a deployer.service restart, replays the same request idempotently, and injects CoronaFS plus LightningStor failures against the same live KVM cluster.
nix run ./nix/test-cluster#cluster -- rollout-soak is the longer-running control-plane and rollout companion lane. It rebuilds from clean local KVM runtime state, persists artifacts under ./work/rollout-soak/latest, validates exactly one planned draining maintenance cycle and one fail-stop worker-loss cycle on the two native-runtime workers, holds each degraded state for the configured soak window, then restarts deployer, fleet-scheduler, node-agent, chainfire, and flaredb before revalidating the cluster. The soak root also carries explicit scope markers so the supported boundary is encoded in the proof artifacts rather than only in docs. The steady-state KVM nodes do not run nix-agent.service, so the soak lane records explicit nix-agent scope markers instead of pretending a live-cluster nix-agent restart happened.
nix run ./nix/test-cluster#cluster -- chainfire-live-membership-proof is the focused local-KVM live-reconfiguration lane for ChainFire. It rebuilds from clean local runtime state, starts a temporary ChainFire replica on node04, proves learner add plus local replication, voter promotion, live leader transfer, temporary-voter restart and rejoin, current-leader removal followed by re-election, removed-leader re-add, and final scale-in back to the canonical 3-node control-plane shape, and stores the resulting membership or local-read artifacts under ./work/chainfire-live-membership-proof/latest.
nix run ./nix/test-cluster#cluster -- provider-vm-reality-proof is the focused local-KVM reality lane for the provider and VM-hosting bundles. It stores artifacts under ./work/provider-vm-reality-proof/latest, captures authoritative FlashDNS answers, FiberLB backend drain and restore evidence, and PlasmaVMC KVM shared-storage migration plus post-migration restart state.
The 2026-04-10 local AMD/KVM proof logs are in ./work/final-proofs/32f64c10-1b74-4d8a-8d7d-b2cc6bf6b4f0-final for supported-surface-guard, single-node-trial-vm, and single-node-quickstart, and in ./work/publishable-kvm-suite for the final passing fresh-smoke, fresh-demo-vm-webapp, and fresh-matrix run through ./nix/test-cluster/run-publishable-kvm-suite.sh.
The exact bare-metal check-runner proof from 2026-04-10 is in ./work/baremetal-iso-e2e/0de75570-dabd-471b-95fe-5898c54e2e8c; its outer environment.txt records execution_model=materialized-check-runner, and state/environment.txt records vm_accelerator_mode=kvm.
The 2026-04-10 durability and failure-injection proof logs are in ./work/durability-proof/20260410T120618+0900; result.json records success=true, deployer_restore_mode="admin pre-register request replay with pre/post-restart list verification", and the artifact set includes chainfire-backup-response.json, flaredb-restored.json, deployer-post-restart-list.json, coronafs-node04-local-state.json, and lightningstor-head-during-node05-outage.json.
The 2026-04-10 longer-running rollout and control-plane soak is in ./work/rollout-soak/20260410T164549+0900; result.json records success=true, fleet_supported_native_runtime_nodes=2, validated_maintenance_cycles=1, validated_power_loss_cycles=1, and soak_hold_secs=30, while the artifact set includes maintenance-held.json, power-loss-held.json, deployer-post-restart-nodes.json, chainfire-post-restart-put.json, flaredb-post-restart.json, scope-fixed-contract.json, deployer-scope-fixed.txt, fleet-scheduler-scope-fixed.txt, and the node01-nix-agent-scope.txt / node04-nix-agent-scope.txt boundary markers.
The 2026-04-10 provider and VM-hosting reality proof logs are in ./work/provider-vm-reality-proof/20260410T135827+0900; result.json records success=true, and the artifact set includes network-provider/fiberlb-drain-summary.txt, network-provider/flashdns-service-authoritative-answer.txt, vm-hosting/migration-summary.json, and vm-hosting/root-volume-after-post-migration-restart.json.
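The proof lanes above encode their outcome in a machine-readable result.json, so follow-up automation can gate on it rather than on log scraping. A minimal sketch, assuming only the success field named above; the sample file and path are illustrative, not a real proof artifact:

```shell
# Create an illustrative result.json like the proof lanes write under ./work.
mkdir -p /tmp/uc-proof-demo
cat > /tmp/uc-proof-demo/result.json <<'EOF'
{"success": true, "validated_maintenance_cycles": 1, "validated_power_loss_cycles": 1}
EOF

# Gate on success; grep keeps this dependency-free (jq -e '.success' works too).
if grep -q '"success": true' /tmp/uc-proof-demo/result.json; then
  echo "proof: PASS"
else
  echo "proof: FAIL"
fi
```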
Physical-node bring-up now has a canonical preflight wrapper as well: nix run ./nix/test-cluster#hardware-smoke -- preflight. It writes kernel-params.txt, expected markers, failure markers, and a machine-readable blocked or ready state under ./work/hardware-smoke/latest, and the same entrypoint can later be rerun as run or capture when USB or BMC/Redfish transport is actually present.
Within that suite, fresh-matrix is the public provider-bundle proof: it exercises PrismNet VPC/subnet/port flows plus security-group ACL add/remove, FlashDNS record publication, and FiberLB TCP plus TLS-terminated Https / TerminatedHttps listeners in one tenant-scoped composition run. The published FiberLB L4 algorithms are kept honest with targeted server unit tests in-tree. provider-vm-reality-proof is the artifact-producing companion lane for the same bundle and for the VM-hosting path, and chainfire-live-membership-proof is the dedicated control-plane live-reconfiguration companion for ChainFire.
PrismNet real OVS/OVN dataplane validation remains outside the supported local KVM surface. FiberLB native BGP or BFD peer interop plus hardware VIP ownership also remain outside the supported local KVM surface. PlasmaVMC real-hardware migration or storage handoff remains a later hardware proof; the current local-KVM proof fixes the release surface to KVM shared-storage migration on the worker pair.
Project-done release proof now requires both halves of the public validation surface to be green:
- `baremetal-iso` and `baremetal-iso-e2e` for the canonical `deployer -> installer -> nix-agent` bare-metal bootstrap path
- the KVM publishable suite (`fresh-smoke`, `fresh-demo-vm-webapp`, `fresh-matrix`, `chainfire-live-membership-proof`) for the nested-KVM multi-node VM-hosting and live-control-plane path
Canonical bare-metal bootstrap proof:
```
nix run ./nix/test-cluster#cluster -- baremetal-iso
nix build .#checks.x86_64-linux.baremetal-iso-e2e
./result/bin/baremetal-iso-e2e ./work/baremetal-iso-e2e/latest
```
baremetal-iso-e2e now materializes the exact local-KVM proof runner instead of trying to boot QEMU inside a sandboxed nixbld build. That older build-time execution model degraded to TCG; the built runner keeps the canonical attr name but executes the same verify-baremetal-iso.sh harness as the direct QEMU proof, with host KVM and persistent logs under ./work.
The QEMU ISO proof is a stand-in for the real install route, not a separate workflow. Build nixosConfigurations.ultracloud-iso, boot it under KVM locally or write the same ISO to USB or BMC virtual media on hardware, and pass the same bootstrap inputs that the installer consumes in the harness: ultracloud.deployer_url=<scheme://host:port>, ultracloud.bootstrap_token=<token> for authenticated bootstrap or a lab-only deployer configured with allow_unauthenticated=true, optional ultracloud.ca_cert_url=<https://.../ca.crt>, optional ultracloud.binary_cache_url=<http://cache:8090>, and optional ultracloud.node_id= / ultracloud.hostname= overrides when DMI serials or DHCP names are not the desired identity.
The networking contract is the same in QEMU and on hardware: the live ISO needs DHCP or equivalent L3 reachability to deployer before Disko starts, and it needs reachability to the optional binary cache if you want it to pull prebuilt closures instead of compiling locally. The local QEMU proof relies on the 10.0.2.2 fallback addresses from user-mode NAT; real hardware should set ultracloud.deployer_url and, when used, ultracloud.binary_cache_url to routable control-plane endpoints. USB media and BMC virtual media are only transport differences for the same ISO and kernel parameters. For the local proof keep ./work or ULTRACLOUD_WORK_ROOT on a large disk; the checked-in wrappers force local builders and derive Nix parallelism from the host CPU count unless you override it explicitly.
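Put together, a hardware boot entry for the same ISO might carry kernel parameters like the following. Every value here is an illustrative placeholder for your own routable endpoints, not a real deployment:

```
ultracloud.deployer_url=https://10.20.0.10:8443 ultracloud.bootstrap_token=EXAMPLE-TOKEN ultracloud.ca_cert_url=https://10.20.0.10:8443/ca.crt ultracloud.binary_cache_url=http://10.20.0.11:8090 ultracloud.node_id=node07 ultracloud.hostname=node07
```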
Canonical hardware preflight and handoff for the same path:
```
nix run ./nix/test-cluster#hardware-smoke -- preflight
nix run ./nix/test-cluster#hardware-smoke -- run
nix run ./nix/test-cluster#hardware-smoke -- capture
```
That wrapper keeps the QEMU proof and the physical-node proof on one contract by writing the exact kernel parameters, expected ULTRACLOUD_MARKER sequence, failure markers, and artifact root under ./work/hardware-smoke/latest.
Canonical hardware handoff for that path:
- Build `nixosConfigurations.ultracloud-iso` plus the target role configs (`baremetal-qemu-control-plane`, `baremetal-qemu-worker`, or their hardware-specific successors) and expose `deployer` plus an optional HTTP Nix cache on addresses the installer can reach.
- Publish cluster state so that the reusable node class owns the install contract: `install_plan.nixos_configuration`, `install_plan.disko_config_path`, and preferably `install_plan.target_disk_by_id`. Node entries should only bind identity, pool, and any desired-system override that truly differs per host. When you expose a binary cache, prefer setting `desired_system.target_system` to the prebuilt class-owned closure as well so post-install convergence does not rebuild a dirty local variant on each node.
- Boot the same ISO through USB or BMC virtual media and pass `ultracloud.deployer_url=...`, `ultracloud.bootstrap_token=...`, and, when used, `ultracloud.binary_cache_url=...` on the kernel command line.
- Watch the canonical marker sequence from the installer journal: `pre-install.boot`, `pre-install.phone-home.complete`, `install.bundle-downloaded`, `install.disko.complete`, `install.nixos-install.complete`, `reboot`, `post-install.boot`.
- Treat `nix-agent` reporting the desired system as `active` as the final convergence gate. The QEMU harness proves the same sequence, only with virtio disks and host-local endpoints standing in for the real chassis.
The checked-in QEMU proof now mirrors the disk-selection contract that hardware should use. Its node classes install by stable /dev/disk/by-id/virtio-uc-control-root and /dev/disk/by-id/virtio-uc-worker-root selectors, backed by explicit QEMU disk serials, while the ISO resolves the prebuilt Disko script and target system from the install profile name embedded into the ISO. Hardware should keep the same class/profile structure and swap only the disk selector, routable URLs, and physical media transport.
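For reference, that by-id selector pattern is what an explicit serial on a QEMU virtio disk produces; the flag layout below is a sketch of the mechanism, not the harness's exact invocation:

```
-drive file=control-root.qcow2,if=none,id=rootdisk,format=qcow2 \
-device virtio-blk-pci,drive=rootdisk,serial=uc-control-root
```

Linux udev then exposes the disk as /dev/disk/by-id/virtio-uc-control-root, which is the stable name the Disko config can select regardless of probe order.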
## Canonical Profiles
UltraCloud now fixes the public support surface to three canonical profiles:
| Profile | Canonical entrypoints | Required components | Optional components |
|---|---|---|---|
| single-node dev | `nix run .#single-node-quickstart`, `nix run .#single-node-trial`, `nix build .#single-node-trial-vm`, `nixosConfigurations.single-node-quickstart`, companion install image `nixosConfigurations.netboot-all-in-one` | `chainfire`, `flaredb`, `iam`, `plasmavmc`, `prismnet` | `lightningstor`, `coronafs`, `flashdns`, `fiberlb`, `apigateway`, `nightlight`, `creditservice`, `k8shost` |
| 3-node HA control plane | `nixosConfigurations.node01`, `nixosConfigurations.node02`, `nixosConfigurations.node03`, companion install image `nixosConfigurations.netboot-control-plane` | `chainfire`, `flaredb`, `iam`, `nix-agent` on every control-plane node, plus `deployer` on the bootstrap node | `fleet-scheduler`, `node-agent`, `prismnet`, `flashdns`, `fiberlb`, `plasmavmc`, `lightningstor`, `coronafs`, `k8shost`, `apigateway`, `nightlight`, `creditservice` |
| bare-metal bootstrap | `nix run ./nix/test-cluster#cluster -- baremetal-iso`, `nixosConfigurations.ultracloud-iso`, `nixosConfigurations.baremetal-qemu-control-plane`, `nixosConfigurations.baremetal-qemu-worker`, `checks.x86_64-linux.baremetal-iso-e2e` | `deployer`, `first-boot-automation`, `install-target`, `nix-agent` | `node-agent`, `fleet-scheduler`, and higher-level storage or edge services after bootstrap |
nixosConfigurations.netboot-all-in-one and nixosConfigurations.netboot-control-plane are canonical companion images for the supported single-node dev and 3-node HA control plane profiles. packages.single-node-trial-vm is the low-friction trial artifact for the minimal VM-platform core. nixosConfigurations.netboot-worker, netboot-base, pxe-server, vm-smoke-target, and older launch flows under baremetal/vm-cluster are archived helpers or legacy/manual debugging paths outside the canonical profiles and their guard set.
## Cluster Authoring
ultracloud.cluster backed by nix/lib/cluster-schema.nix is the only supported cluster authoring source. It is the canonical place to define nodes, reusable deployer classes and pools, rollout objects, service placement intent, and the generated per-node bootstrap metadata consumed by deployer, fleet-scheduler, nix-agent, and node-agent.
nix-nos is limited to legacy compatibility and low-level network primitives such as interfaces, VLANs, BGP, and static routing. It is not the canonical source for cluster topology, rollout intent, scheduler state, or bootstrap inventory.
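A minimal, hypothetical authoring sketch against that module might look like the following. The nesting mirrors the concepts in this README (classes owning install plans, nodes binding identity and pool), but the exact option paths are defined by nix/lib/cluster-schema.nix, so treat this as illustrative only:

```nix
{
  # Illustrative only: the option names below are assumptions, not the real schema.
  ultracloud.cluster = {
    classes.control-plane = {
      install_plan = {
        nixos_configuration = "baremetal-qemu-control-plane";
        target_disk_by_id = "/dev/disk/by-id/virtio-uc-control-root";
      };
    };
    nodes.node01 = {
      class = "control-plane";
      pool = "default";
    };
  };
}
```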
## Responsibility Boundaries
- `plasmavmc` owns tenant VM lifecycle plus KVM worker registration. It can run against explicit remote IAM, PrismNet, and FlareDB endpoints, but it does not own machine enrollment, desired-system rollout, or host-native service placement.
- `k8shost` owns Kubernetes-style pod and service APIs for tenant workloads, then translates them into `prismnet`, `flashdns`, and `fiberlb` objects. It does not place host-native cluster daemons, and its runtime dataplane helpers remain archived non-product.
- `fleet-scheduler` owns placement and failover of host-native service instances from declarative cluster state derived from `ultracloud.cluster`. It consumes `node-agent` heartbeats and writes instance placement, but it does not expose tenant-facing Kubernetes semantics.
- `deployer` owns machine enrollment, `/api/v1/phone-home`, install plans, cluster metadata, and desired-system references. The supported declarative input for that state is the JSON generated from `ultracloud.cluster`; it decides what a node should become, but it does not execute the host-local switch.
- `nix-agent` owns host-local NixOS convergence only. It reads desired-system state from `deployer` or `chainfire`, activates the target closure, and rolls back on failed health checks.
- `node-agent` owns host-local runtime execution only. It reports heartbeats and applies scheduled service-instance state, but it does not install the base OS or rewrite desired-system targets.
The single-node quickstart deliberately stops below that rollout stack: it ships only the VM-platform core plus optional add-ons, not deployer, nix-agent, node-agent, or fleet-scheduler.
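The enrollment boundary can be illustrated with a request sketch. The /api/v1/phone-home path is deployer's published endpoint per the boundaries above; the host, port, and payload fields here are assumptions for illustration only:

```shell
# Hypothetical deployer endpoint; adjust to your bootstrap node.
DEPLOYER="${DEPLOYER:-http://127.0.0.1:8070}"

# Hypothetical payload: identity the installer could derive from DMI/DHCP,
# or from ultracloud.node_id / ultracloud.hostname kernel-parameter overrides.
payload=$(printf '{"node_id":"%s","hostname":"%s"}' "node07" "node07")

echo "POST $DEPLOYER/api/v1/phone-home  body=$payload"
# Against a live deployer you would run, for example:
#   curl -fsS -X POST "$DEPLOYER/api/v1/phone-home" \
#     -H 'Content-Type: application/json' -d "$payload"
```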
## Standalone Stories
- `single-node-trial-vm` and `single-node-quickstart` are the standalone VM-platform story. They keep the minimal KVM-backed VM surface light and intentionally exclude `deployer`, `nix-agent`, `fleet-scheduler`, and `node-agent`.
- `deployer-vm-smoke`, `portable-control-plane-regressions`, and `baremetal-iso` are the standalone rollout-stack story. They validate `deployer -> nix-agent` and `deployer -> fleet-scheduler -> node-agent` without requiring the full VM-hosting bundle.
## Rollout Bundle Operations
The rollout-bundle operator contract is fixed in docs/rollout-bundle.md. As of 2026-04-10 the supported deployer recovery model is scope-fixed to one active writer plus optional cold-standby restore that reuses the same ChainFire namespace, credentials, bootstrap bundle, and local state backup; automatic ChainFire-backed multi-instance failover is outside the supported product contract for this release.
The same operator doc also fixes the nix-agent health-check and rollback contract, the node-agent logs/secrets/volume/upgrade contract, and the fleet-scheduler supported upper limit: fleet-scheduler is scope-fixed to the two native-runtime worker lab with one planned drain cycle, one fail-stop worker-loss cycle, and 30-second held degraded states in rollout-soak. The canonical proofs are nix build .#checks.x86_64-linux.deployer-vm-rollback, nix build .#checks.x86_64-linux.fleet-scheduler-e2e, nix build .#checks.x86_64-linux.portable-control-plane-regressions, nix run ./nix/test-cluster#cluster -- fresh-smoke, nix run ./nix/test-cluster#cluster -- rollout-soak, and nix run ./nix/test-cluster#cluster -- durability-proof.
## Main Entrypoints
- workspace flake: flake.nix
- single-node quickstart smoke: `nix run .#single-node-quickstart`
- single-node trial artifact: `nix build .#single-node-trial-vm`, `nix run .#single-node-trial`
- smallest rollback proof for `deployer -> nix-agent`: `nix build .#checks.x86_64-linux.deployer-vm-rollback`
- 3-node HA control plane configs: `nixosConfigurations.node01`, `nixosConfigurations.node02`, `nixosConfigurations.node03`, companion image `nixosConfigurations.netboot-control-plane`
- portable local proof: `nix build .#checks.x86_64-linux.portable-control-plane-regressions`
- longer-running control-plane and rollout soak: `nix run ./nix/test-cluster#cluster -- rollout-soak`
- canonical bare-metal bootstrap smoke: `nix run ./nix/test-cluster#cluster -- baremetal-iso`
- canonical bare-metal exact proof runner: `nix build .#checks.x86_64-linux.baremetal-iso-e2e`, then `./result/bin/baremetal-iso-e2e`
- canonical physical-node preflight and handoff: `nix run ./nix/test-cluster#hardware-smoke -- preflight`, then `run` or `capture`
- canonical profile guards: `nix build .#checks.x86_64-linux.canonical-profile-eval-guards`, `nix build .#checks.x86_64-linux.canonical-profile-build-guards`
- supported surface guard: `nix build .#checks.x86_64-linux.supported-surface-guard` for public docs wording, shipped server API completeness, and high-signal TODO or best-effort markers in the supported provider/backend servers
- VM validation harness: nix/test-cluster/README.md
- work-root budget helper: `./nix/test-cluster/work-root-budget.sh status`, `enforce`, and `prune-proof-logs`
- shared volume notes: coronafs/README.md
- apigateway supported scope: apigateway/README.md
- nightlight supported scope: nightlight/README.md
- creditservice supported scope: creditservice/README.md
- k8shost supported scope: k8shost/README.md
## Repository Guide
- docs/README.md: documentation entrypoint
- docs/testing.md: validation path summary
- docs/component-matrix.md: canonical profiles and optional bundles
- docs/rollout-bundle.md: rollout-bundle HA, rollback, drain, logs, secrets, and volume contract
- docs/control-plane-ops.md: ChainFire membership boundary, FlareDB schema or destructive-DDL boundary, and IAM bootstrap hardening plus signing-key, credential, and mTLS rotation
- docs/edge-trial-surface.md: APIGateway, NightLight, CreditService, trial-surface, and work-root budget contract
- docs/provider-vm-reality.md: PrismNet, FlashDNS, FiberLB, and PlasmaVMC local-KVM proof scope plus artifact contract
- docs/hardware-bringup.md: USB/BMC/Redfish preflight, artifact capture, and hardware-smoke handoff
- docs/storage-benchmarks.md: latest CoronaFS and LightningStor lab numbers
- plans/: design notes and exploration documents
## Scope
UltraCloud is centered on reproducible infrastructure behavior. Optional add-ons such as creditservice and k8shost remain part of the supported surface only when the documented scope, harness coverage, and public contract stay aligned with what the repository actually ships.
Host-level NixOS rollout validation is also expected to stay reproducible. baremetal-iso-e2e is now the materialized exact proof runner for the full install path, and canonical-profile-eval-guards plus canonical-profile-build-guards fail fast when supported outputs drift. supported-surface-guard now rejects unfinished public wording, shipped server API stubs, high-signal completeness markers such as TODO: or best-effort in the supported network or backend servers, and archived-helper regressions such as worker netboot or backend scaffolds re-entering the default product surface. portable-control-plane-regressions remains the non-KVM developer lane that keeps the main control-plane and rollout boundaries green on TCG-only hosts before the publishable nested-KVM suite is rerun.