Testing
UltraCloud treats VM-first validation as the canonical local proof path and keeps the public support contract limited to three profiles.
Canonical Profiles
| Profile | Canonical entrypoints | Required components | Optional components |
|---|---|---|---|
| single-node dev | nix run .#single-node-quickstart, nix run .#single-node-trial, nix build .#single-node-trial-vm, nixosConfigurations.single-node-quickstart, companion install image nixosConfigurations.netboot-all-in-one | chainfire, flaredb, iam, plasmavmc, prismnet | lightningstor, coronafs, flashdns, fiberlb, apigateway, nightlight, creditservice, k8shost |
| 3-node HA control plane | nixosConfigurations.node01, nixosConfigurations.node02, nixosConfigurations.node03, companion install image nixosConfigurations.netboot-control-plane | chainfire, flaredb, iam, nix-agent on every control-plane node, plus deployer on the bootstrap node | fleet-scheduler, node-agent, prismnet, flashdns, fiberlb, plasmavmc, lightningstor, coronafs, k8shost, apigateway, nightlight, creditservice |
| bare-metal bootstrap | nix run ./nix/test-cluster#cluster -- baremetal-iso, nixosConfigurations.ultracloud-iso, nixosConfigurations.baremetal-qemu-control-plane, nixosConfigurations.baremetal-qemu-worker, checks.x86_64-linux.baremetal-iso-e2e | deployer, first-boot-automation, install-target, nix-agent | node-agent, fleet-scheduler, and higher-level storage or edge services after bootstrap |
nixosConfigurations.netboot-all-in-one and nixosConfigurations.netboot-control-plane are canonical companion images for the single-node and HA profiles. nixosConfigurations.netboot-worker is an archived worker helper outside the canonical profiles and their guard set, and baremetal/vm-cluster remains a legacy/manual debugging path rather than a publishable entrypoint.
Cluster Authoring Source
ultracloud.cluster backed by nix/lib/cluster-schema.nix is the only supported cluster authoring source. The supported rollout and scheduling tests consume cluster state generated from that module rather than treating nix-nos or ad hoc shell state as a primary source.
nix-nos is limited to legacy compatibility and low-level network primitives such as interfaces, VLANs, BGP, and static routing.
Quickstart Smoke
nix flake show . --all-systems | rg -n "quickstart|single-node|trial|container|oci"
nix build .#single-node-trial-vm
nix eval --no-eval-cache .#nixosConfigurations.single-node-quickstart.config.system.build.toplevel.drvPath --raw
nix run .#single-node-quickstart
single-node-trial-vm is the buildable trial artifact for the minimal VM-platform core, and single-node-quickstart is the automated smoke launcher for that same surface. The launcher boots the minimal VM stack under QEMU, waits for chainfire, flaredb, iam, prismnet, and plasmavmc, verifies their health from inside the guest, and checks the machine-readable product-surface manifest shipped in the VM. The launcher uses the generated NixOS VM runner, so it can fall back to TCG when /dev/kvm is absent.
single-node-trial is a public alias for the same smoke launcher. An OCI/Docker artifact is intentionally not the public trial surface because the supported scope needs a guest kernel plus host KVM, /dev/net/tun, and OVS/libvirt semantics; a privileged container would not represent the same contract.
For debugging, keep the VM alive after the smoke passes:
ULTRACLOUD_QUICKSTART_KEEP_VM=1 nix run .#single-node-quickstart
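Because the launcher silently falls back to TCG, it can help to check the accelerator up front; a minimal sketch, assuming nothing beyond a readable and writable /dev/kvm:

```
# Predict whether the quickstart VM gets KVM acceleration or the slow TCG fallback
if [ -r /dev/kvm ] && [ -w /dev/kvm ]; then
  echo "KVM available: quickstart will run hardware-accelerated"
else
  echo "no usable /dev/kvm: QEMU will fall back to TCG (debugging only)"
fi
```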
3-Node HA Control Plane
nix eval --no-eval-cache .#nixosConfigurations.node01.config.system.build.toplevel.drvPath --raw
nix eval --no-eval-cache .#nixosConfigurations.node02.config.system.build.toplevel.drvPath --raw
nix eval --no-eval-cache .#nixosConfigurations.node03.config.system.build.toplevel.drvPath --raw
nix eval --no-eval-cache .#nixosConfigurations.netboot-control-plane.config.system.build.toplevel.drvPath --raw
These are the canonical HA control-plane entrypoints. The publishable six-node VM-cluster suite under ./nix/test-cluster extends this baseline with worker and optional service nodes, but it does not redefine the supported profile names.
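The same four evaluations can be scripted in a single loop; this is only a convenience wrapper around the commands above, not a separate entrypoint:

```
# Evaluate every canonical HA control-plane attribute; a broken attr fails here before any VM work
for cfg in node01 node02 node03 netboot-control-plane; do
  nix eval --no-eval-cache ".#nixosConfigurations.${cfg}.config.system.build.toplevel.drvPath" --raw
  echo   # --raw omits the trailing newline
done
```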
Canonical Bare-Metal Proof
nix eval --no-eval-cache .#nixosConfigurations.baremetal-qemu-control-plane.config.system.build.toplevel.drvPath --raw
nix eval --no-eval-cache .#nixosConfigurations.baremetal-qemu-worker.config.system.build.toplevel.drvPath --raw
nix run ./nix/test-cluster#cluster -- baremetal-iso
nix build .#checks.x86_64-linux.baremetal-iso-e2e
./result/bin/baremetal-iso-e2e ./work/baremetal-iso-e2e/latest
nix run ./nix/test-cluster#hardware-smoke -- preflight
baremetal-iso is the canonical install path for QEMU-as-bare-metal validation. It boots nixosConfigurations.ultracloud-iso, waits for /api/v1/phone-home, downloads the flake bundle from deployer, runs Disko, reboots, confirms the first post-install boot markers, and waits for nix-agent to report the desired system as active for both baremetal-qemu-control-plane and baremetal-qemu-worker.
baremetal-iso-e2e now keeps the exact flake attr but changes the execution model: nix build .#checks.x86_64-linux.baremetal-iso-e2e materializes ./result/bin/baremetal-iso-e2e, and that built runner executes the same nix/test-cluster/verify-baremetal-iso.sh harness with host KVM and logs under ./work by default. This avoids the old daemon-sandbox path where a nixbld build fell back to TCG instead of the host's /dev/kvm.
The local proof intentionally mirrors the real hardware route. Build nixosConfigurations.ultracloud-iso, then either boot that ISO in QEMU with KVM or put the same image on USB or BMC virtual media for the target machine. The live installer consumes the same bootstrap parameters in every environment:
- `ultracloud.deployer_url=<scheme://host:port>` for the reachable `deployer` endpoint
- `ultracloud.bootstrap_token=<token>` for authenticated phone-home, or a lab-only `deployer` with `allow_unauthenticated=true`
- `ultracloud.ca_cert_url=<https://.../ca.crt>` when `deployer` is TLS-enabled with a private CA
- `ultracloud.binary_cache_url=<http://cache:8090>` when you want the installer to fetch host-built closures instead of compiling locally
- `ultracloud.node_id=` and `ultracloud.hostname=` only when you need to override the DMI-serial or hostname-derived identity
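Put together, a physical install appends these parameters to the ISO boot entry. The host names, port, and token below are hypothetical placeholders, not shipped defaults:

```
# Hypothetical kernel command line additions (written on one line in the real boot entry)
ultracloud.deployer_url=https://deployer.lab.internal:8443 ultracloud.bootstrap_token=REPLACE_WITH_TOKEN ultracloud.ca_cert_url=https://deployer.lab.internal/ca.crt ultracloud.binary_cache_url=http://cache.lab.internal:8090
```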
The networking assumptions are also the same. The ISO needs DHCP or equivalent IP configuration that can reach deployer before Disko starts, and it must also reach the optional binary cache when that URL is set. The QEMU harness uses user-mode NAT and the built-in 10.0.2.2 fallback endpoints for the local host; physical installs should set the deployer and cache URLs explicitly to routable control-plane addresses.
The proven marker sequence from nix/test-cluster/verify-baremetal-iso.sh is the same sequence you should expect on hardware: pre-install.boot, pre-install.phone-home.complete, install.bundle-downloaded, install.disko.complete, install.nixos-install.complete, reboot, post-install.boot, and finally nix-agent reporting the desired system as active. USB and BMC virtual media change only how the ISO is presented to the machine; they do not change the bootstrap contract.
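When reviewing a captured console or serial log from either environment, the marker progression can be scanned directly; a hedged sketch where console.log stands in for whatever file your console capture produces:

```
# Extract the expected marker progression from a captured log (file name is illustrative)
grep -E 'ULTRACLOUD_MARKER (pre-install\.boot|pre-install\.phone-home\.complete|install\.bundle-downloaded|install\.disko\.complete|install\.nixos-install\.complete|post-install\.boot)' console.log
```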
Hardware Bring-Up Pack
nix run ./nix/test-cluster#hardware-smoke -- preflight
nix run ./nix/test-cluster#hardware-smoke -- run
nix run ./nix/test-cluster#hardware-smoke -- capture
hardware-smoke is the canonical USB/BMC/Redfish bridge for the physical-node proof. It always writes artifacts under ./work/hardware-smoke/<run-id> and refreshes ./work/hardware-smoke/latest.
- `preflight` emits `kernel-params.txt`, `expected-markers.txt`, `failure-markers.txt`, `operator-handoff.md`, and `status.env`.
- With no USB device or BMC/Redfish credentials, `preflight` records `status=blocked` and the exact missing transport inputs in `missing-requirements.txt`.
- With transport present, the same wrapper can write USB media or call Redfish virtual media and then capture the real `desired-system active` evidence through SSH or a supplied serial log.
- The expected hardware markers are the same `ULTRACLOUD_MARKER pre-install.boot.*`, `pre-install.phone-home.complete.*`, `install.disko.complete.*`, `reboot.*`, `post-install.boot.*`, and `desired-system-active.*` lines used by `verify-baremetal-iso.sh`.
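Reading the preflight outcome only needs the stable latest pointer and the artifact names listed above; a short sketch:

```
nix run ./nix/test-cluster#hardware-smoke -- preflight
cat ./work/hardware-smoke/latest/status.env                  # e.g. status=blocked without transport inputs
cat ./work/hardware-smoke/latest/missing-requirements.txt    # written only in the blocked case
cat ./work/hardware-smoke/latest/operator-handoff.md         # handoff notes for the physical run
```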
Hardware runbook for the same canonical path:
- Build `nixosConfigurations.ultracloud-iso` and the target install profiles you want the installer to materialize.
- Publish cluster state where each reusable node class owns `install_plan.nixos_configuration`, `install_plan.disko_config_path`, and a stable disk selector. Prefer `install_plan.target_disk_by_id` on hardware; the QEMU proof now uses `/dev/disk/by-id/virtio-uc-control-root` and `/dev/disk/by-id/virtio-uc-worker-root` to exercise the same contract (see the selector check after this list). When the live ISO can reach a binary cache, also publish `desired_system.target_system` with the prebuilt closure for that class so `nix-agent` converges to the exact shipped system instead of rebuilding a dirty local copy.
- Make `deployer` and the optional binary cache reachable from the live ISO, then boot the ISO through USB or BMC virtual media with `ultracloud.deployer_url=...`, `ultracloud.bootstrap_token=...`, and optional `ultracloud.binary_cache_url=...`.
- Confirm the live installer resolves the install profile, downloads the flake bundle, runs Disko against the selected disk, reboots, and lands on the post-install marker.
- Confirm `nix-agent` on the installed node converges the desired system to `active`.
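The selector check referenced in the runbook is nothing more than listing the stable device links; the virtio-uc-* names are the QEMU harness serials, while real hardware exposes vendor- and serial-specific names:

```
# Confirm the stable disk selector that install_plan.target_disk_by_id should reference
ls -l /dev/disk/by-id/
# In the QEMU proof, expect virtio-uc-control-root / virtio-uc-worker-root here;
# on hardware, pick the NVMe, SATA, or RAID-controller by-id link for the intended root disk
```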
QEMU-to-hardware mapping for the proof:
| QEMU harness proof | Hardware proof |
|---|---|
| nix run ./nix/test-cluster#cluster -- baremetal-iso | boot the same nixosConfigurations.ultracloud-iso through USB or BMC virtual media |
| user-mode NAT fallback to 10.0.2.2 | routable ultracloud.deployer_url and optional ultracloud.binary_cache_url |
| virtio disk by-id selectors seeded by explicit QEMU serials | server, NVMe, or RAID-controller /dev/disk/by-id/... selectors in the node class |
| host-local QEMU logs and SSH on 127.0.0.1:22231/22232 | serial-over-LAN, BMC console, or physical console plus SSH on the installed host |
| same marker sequence and nix-agent active gate | same marker sequence and nix-agent active gate |
Host prerequisites for the KVM-backed proof are a Linux host with readable and writable /dev/kvm, nested virtualization enabled, and enough free space under ./work or ULTRACLOUD_WORK_ROOT for VM disks, logs, and temporary build state. The checked-in wrappers force local Nix builders and derive max-jobs and per-build cores from the host CPU count unless ULTRACLOUD_LOCAL_NIX_MAX_JOBS, ULTRACLOUD_LOCAL_NIX_BUILD_CORES, PHOTON_CLUSTER_NIX_MAX_JOBS, or PHOTON_CLUSTER_NIX_BUILD_CORES override them.
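On a shared or constrained host the derived defaults can be overridden before any wrapper runs; the values below are arbitrary examples, not recommendations:

```
# Example only: cap local Nix parallelism and relocate runtime state
export ULTRACLOUD_WORK_ROOT=/var/tmp/ultracloud-work   # arbitrary example path
export ULTRACLOUD_LOCAL_NIX_MAX_JOBS=4
export ULTRACLOUD_LOCAL_NIX_BUILD_CORES=8
nix run ./nix/test-cluster#cluster -- baremetal-iso
```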
Regression Guards
nix build .#checks.x86_64-linux.canonical-profile-eval-guards
nix build .#checks.x86_64-linux.canonical-profile-build-guards
These two checks are the fast fail-first drift gates for the supported surface:
- `canonical-profile-eval-guards`: forces evaluation of every canonical profile entrypoint, so broken attrs fail before any long-running harness work starts.
- `canonical-profile-build-guards`: realizes the single-node VM, the HA control-plane configs and companion image, and the ISO or bare-metal outputs so build-time drift is caught even when a cluster harness is not running.
- `supported-surface-guard`: rejects unfinished public-surface wording across the published docs, add-on workspaces, and VM-cluster harness files; fails on shipped public server code that still contains `Status::unimplemented`, `unimplemented!()`, `todo!()`, or other intentional stub responses; blocks high-signal completeness markers such as `TODO:`, `FIXME`, or `best-effort` in the supported FiberLB, PrismNet, PlasmaVMC, and K8sHost server code paths; and also fails if archived helpers such as `netboot-worker`, `plasmavmc-firecracker`, `k8shost-cni`, `k8shost-csi`, or `k8shost-controllers` re-enter the default product surface.
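Both canonical-profile gates can be realized in one invocation, since nix build accepts multiple installables:

```
nix build \
  .#checks.x86_64-linux.canonical-profile-eval-guards \
  .#checks.x86_64-linux.canonical-profile-build-guards
```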
Portable Local Proof
nix build .#checks.x86_64-linux.canonical-profile-eval-guards
nix build .#checks.x86_64-linux.portable-control-plane-regressions
Use this lane on Linux hosts that do not expose /dev/kvm:
- `portable-control-plane-regressions`: TCG-safe aggregate check that keeps the canonical profile eval guard, `deployer-bootstrap-e2e`, `host-lifecycle-e2e`, `deployer-vm-smoke`, and `fleet-scheduler-e2e` green together.
- It also links in `supported-surface-guard`, so unsupported product-surface wording, code-level public API stubs, or high-signal completeness markers in the supported provider/backend servers fail in the same low-cost lane before a publishable rerun.
- It intentionally does not boot the six-node nested-KVM VM suite, so it is a developer regression path, not the publishable multi-node proof.
- CI runs `canonical-profile-eval-guards` and `portable-control-plane-regressions` on every relevant change from `.github/workflows/nix.yml`.
Publishable Checks
nix run .#single-node-quickstart
nix run ./nix/test-cluster#cluster -- baremetal-iso
nix run ./nix/test-cluster#cluster -- fresh-smoke
nix run ./nix/test-cluster#cluster -- fresh-demo-vm-webapp
nix run ./nix/test-cluster#cluster -- fresh-matrix
nix run ./nix/test-cluster#cluster -- chainfire-live-membership-proof
nix run ./nix/test-cluster#cluster -- provider-vm-reality-proof
nix run ./nix/test-cluster#cluster -- rollout-soak
./nix/test-cluster/run-publishable-kvm-suite.sh ./work/publishable-kvm-suite
./nix/test-cluster/run-supported-surface-final-proof.sh ./work/final-proofs/latest
nix build .#checks.x86_64-linux.baremetal-iso-e2e
nix build .#checks.x86_64-linux.baremetal-iso-e2e && ./result/bin/baremetal-iso-e2e ./work/baremetal-iso-e2e/latest
nix build .#checks.x86_64-linux.deployer-vm-smoke
Use these commands as the release-facing local proof set:
- `single-node-quickstart`: productized one-command quickstart gate for the minimal VM platform profile
- `single-node-trial-vm`: buildable VM appliance for the same minimal VM-platform profile
- `baremetal-iso`: canonical bare-metal bootstrap gate covering pre-install boot, phone-home, flake bundle fetch, Disko install, reboot, post-install boot, and desired-system activation on one control-plane node plus one worker-equivalent node
- `fresh-smoke`: base VM-cluster gate for the six-node harness that extends the canonical 3-node HA control plane, including readiness, core behavior, and fault injection
- `fresh-smoke` also proves the supported PlasmaVMC backend contract by requiring both worker registrations to advertise `HYPERVISOR_TYPE_KVM` and nothing broader on the public surface
- `fresh-demo-vm-webapp`: optional VM-hosting bundle proof for `plasmavmc + prismnet` with state persisted through `lightningstor`
- `fresh-matrix`: optional composition proof for provider bundles such as `prismnet + flashdns + fiberlb` and `plasmavmc + coronafs + lightningstor`, including PrismNet security-group ACL add/remove, FiberLB TCP plus TLS-terminated `Https/TerminatedHttps` listeners, LightningStor bucket metadata plus object-version APIs, the published `k8shost` pod-watch surface, and the KVM-only PlasmaVMC worker contract
- `chainfire-live-membership-proof`: focused local-KVM ChainFire lane that starts from the canonical 3-node control plane, adds a temporary learner on `node04`, promotes it to voter, transfers leadership to another live voter, restarts the temporary voter, removes the current leader, re-adds the removed leader, and scales back into the canonical 3-node shape while proving local serializable reads through each transition
- `provider-vm-reality-proof`: focused local-KVM provider and VM-hosting lane that writes dated artifacts under `./work/provider-vm-reality-proof/latest`, captures authoritative FlashDNS answers, FiberLB backend drain and re-convergence, and PlasmaVMC KVM shared-storage migration plus post-migration restart state
- `rollout-soak`: focused longer-run control-plane and rollout lane that rebuilds from clean local runtime state, writes dated artifacts under `./work/rollout-soak/latest`, repeats `draining` maintenance and worker power-loss, then restarts `deployer`, `fleet-scheduler`, `node-agent`, `chainfire`, and `flaredb` while recording explicit `nix-agent` scope markers for the steady-state KVM nodes
- `durability-proof`: canonical `chainfire`/`flaredb`/`deployer` backup/restore lane. It stores artifacts under `./work/durability-proof/latest`, proves logical backup/restore for ChainFire keys and FlareDB SQL rows, uses the canonical Deployer admin pre-register request itself as the backup artifact, verifies that the pre-registered node survives a `deployer.service` restart, replays the same request idempotently, and injects CoronaFS plus LightningStor failures on the live KVM cluster
- `run-publishable-kvm-suite.sh`: reproducible wrapper that captures the KVM environment, requires real `/dev/kvm` access, keeps runtime state under `./work` by default, and runs the publishable nested-KVM application lanes plus the focused ChainFire live-membership proof in a single command
- `run-supported-surface-final-proof.sh`: one-shot local wrapper that keeps builders local, records environment metadata, builds `single-node-trial-vm`, runs `supported-surface-guard`, `single-node-quickstart`, and then the publishable nested-KVM suite into one dated log root
- `baremetal-iso-e2e`: materialized exact proof runner for the same canonical ISO harness; the build output keeps the attr stable, and `./result/bin/baremetal-iso-e2e` runs the real host-KVM proof with persisted log/meta
- `deployer-vm-smoke`: lightweight regression proving that `nix-agent` can activate a host-built target closure without guest-side compilation
- `deployer-vm-rollback`: smallest reproducible `nix-agent` rollback proof. It publishes a desired system with a failing `health_check_command`, expects observed status `rolled-back`, and confirms the node does not stay on the rejected target generation
single-node-trial-vm and single-node-quickstart are the standalone VM-platform story. They keep the minimal KVM-backed surface separate from the rollout stack.
The checked-in local entrypoint for the publishable KVM proof is ./nix/test-cluster/run-publishable-kvm-suite.sh. The repository-owned remote entrypoint is .github/workflows/kvm-publishable-selfhosted.yml, which runs the same wrapper on Forgejo runners labeled nix-host and cn-nixos-mouse-runner.
The 2026-04-10 local AMD/KVM proof snapshot is recorded under ./work/final-proofs/32f64c10-1b74-4d8a-8d7d-b2cc6bf6b4f0-final for supported-surface-guard, single-node-trial-vm, and single-node-quickstart, under ./work/publishable-kvm-suite for the passing fresh-smoke, fresh-demo-vm-webapp, fresh-matrix, and wrapper environment capture, and under ./work/rollout-soak/20260410T164549+0900 for the longer-running rollout/control-plane soak.
The 2026-04-10 exact bare-metal check-runner proof is recorded under ./work/baremetal-iso-e2e/0de75570-dabd-471b-95fe-5898c54e2e8c; its outer environment.txt records execution_model=materialized-check-runner, while state/environment.txt records vm_accelerator_mode=kvm.
Responsibility Coverage
- `baremetal-iso` and `baremetal-iso-e2e` are the canonical proof for `deployer -> installer -> nix-agent`. They cover phone-home, install-plan materialization, Disko, reboot, and desired-system activation, and they now share the same `verify-baremetal-iso.sh` runtime harness.
- `deployer-vm-smoke` is the smallest regression for the same `deployer -> nix-agent` boundary. It proves that a node can receive a prebuilt target closure and activate it without guest-side compilation.
- `deployer-vm-rollback` is the canonical operator proof for `nix-agent` health-check, rollback, and partial failure recovery. Use it with rollout-bundle.md when documenting or changing the host-local rollback contract.
- `portable-control-plane-regressions` keeps the main non-KVM-safe boundaries under continuous coverage by composing `deployer-bootstrap-e2e`, `host-lifecycle-e2e`, `deployer-vm-smoke`, and `fleet-scheduler-e2e` behind the canonical profile eval guard.
- `fresh-smoke` and `fresh-matrix` are the canonical proof for `deployer -> fleet-scheduler -> node-agent`. They cover native service placement, heartbeats, failover, and runtime reconciliation.
- `fresh-smoke` proves the supported `fleet-scheduler` maintenance semantics: short-lived `active -> draining -> active` transitions, fail-stop worker loss, and replica restoration after the node returns.
- `chainfire-live-membership-proof` is the canonical KVM proof for ChainFire live reconfiguration on the supported surface. It covers learner add, local replica catch-up, voter promotion, live leader transfer, temporary-voter restart and rejoin, current-leader removal, removed-leader re-add, and final scale-in on the canonical control-plane shape.
- `rollout-soak` is the longer-running companion lane for the same bundle. It validates exactly one planned drain cycle and one fail-stop worker-loss cycle on the two native-runtime workers, holds each degraded state for 30 seconds, restarts `deployer`, `fleet-scheduler`, `node-agent`, `chainfire`, and `flaredb`, and then revalidates the live cluster. It also writes `scope-fixed-contract.json`, `deployer-scope-fixed.txt`, and `fleet-scheduler-scope-fixed.txt` so the supported release boundary is captured in the proof root. The steady-state KVM nodes do not ship `nix-agent.service`, so the lane records scope markers there and leaves executable `nix-agent` proof to `deployer-vm-rollback`, `baremetal-iso`, and `baremetal-iso-e2e`.
- Multi-hour maintenance windows, arbitrary multi-voter ChainFire swaps that still need joint consensus, larger-cluster or hardware ChainFire live membership reconfiguration beyond the canonical KVM proof lane, destructive FlareDB schema rewrites, fully automated online migration, and large-cluster drain storms remain outside the release-proven scope and are called out explicitly in rollout-bundle.md and control-plane-ops.md.
- `fresh-smoke` also covers `k8shost` separately from `fleet-scheduler`: `k8shost` exposes tenant pod and service semantics, while `fleet-scheduler` handles bare-metal host services. `k8shost` is fixed as an API/control-plane product surface; runtime dataplane helpers stay archived non-product.
- `fresh-matrix` keeps the shipped add-on surface honest: it exercises the supported `creditservice` quota, wallet, reservation, and API-gateway flows, the published `k8shost-server` API contract, the supported LightningStor bucket metadata plus object-version APIs, and the network-provider bundle contract for PrismNet ACL lifecycle plus FiberLB TCP and TLS-terminated listeners.
- `provider-vm-reality-proof` is the artifact-producing companion lane for that same provider or VM-hosting bundle. It records PrismNet port and ACL state, authoritative FlashDNS answers, FiberLB listener drain or restore artifacts, and PlasmaVMC migration or storage-handoff state in one dated proof root.
- PrismNet real OVS/OVN dataplane validation remains outside the supported local KVM surface. The current provider proof keeps tenant API lifecycle and attached-VM networking honest, but not a release-grade `ovn-nbctl` or hardware-switch dataplane path.
- FiberLB native BGP or BFD peer interop and hardware VIP ownership remain outside the supported local KVM surface. The current provider proof fixes the shipped contract to listener publication plus backend drain and re-convergence inside the lab.
- PlasmaVMC real-hardware migration or storage handoff remains a later hardware proof. The current provider proof fixes the release surface to KVM shared-storage migration on the local worker pair.
- Within that edge bundle, APIGateway is supported as stateless replicated instances behind an external L4 or VIP layer, but the release-facing proof remains the shipped single gateway-node layout on `node06`; live in-process reload is not promised, and config rollout stays restart-based.
- NightLight is supported as a single-node WAL/snapshot service; replicated HA metrics storage and per-tenant retention enforcement are not part of the current product contract.
- CreditService export and backend migration are supported as offline export/import or backend-native snapshot workflows, not live mixed-writer migration.
- FiberLB HTTPS health checks currently do not verify backend TLS certificates. Supported scope is limited to TCP reachability plus HTTP status for the backend endpoint until CA-aware verification is wired through config, server code, and the canonical harness.
- `durability-proof` is the canonical backup, restore, and failure-injection companion lane for the publishable KVM suite. Use it after `fresh-matrix` when you need persisted artifacts for `chainfire`, `flaredb`, `deployer`, `coronafs`, and `lightningstor`.
- `rollout-soak` is the longer-running maintenance and DR companion lane for the same control-plane and rollout bundle. Use it when a change is supposed to survive the current release boundary of one planned drain cycle, one fail-stop worker-loss cycle, and service-restart churn on the live KVM lab instead of only the short `fresh-smoke` window.
- `run-core-control-plane-ops-proof.sh` is the focused operator lifecycle proof for the core control plane. It records the published ChainFire API boundary, the FlareDB additive-first migration and destructive-DDL boundary, and the standalone IAM bootstrap hardening plus signing-key, credential, and mTLS rotation proof under `./work/core-control-plane-ops-proof`.
- The supported `deployer` HA and DR boundary is scope-fixed to one active writer plus optional cold-standby restore, not automatic multi-instance failover. The canonical runbook is to recover one writer, re-apply `ultracloud.cluster` generated state with `deployer-ctl apply`, replay preserved admin pre-register requests, and then verify state through the admin API or `deployer-ctl node inspect`; the unsupported multi-instance boundary is fixed in rollout-bundle.md.
- The supported `node-agent` product contract is also fixed in rollout-bundle.md: per-instance logs and pid metadata live under `${stateDir}/pids`, secrets must already exist in the rendered spec or mounted host files, host-path volumes are passed through but not provisioned, and upgrades are replace-and-reconcile operations rather than in-place patching.
- The dated 2026-04-10 proof root for that lane is `./work/durability-proof/20260410T120618+0900`; `result.json` records `success=true`, and the artifact set includes `deployer-post-restart-list.json`, `coronafs-node04-local-state.json`, and `lightningstor-head-during-node05-outage.json`.
- `single-node-quickstart` intentionally excludes `deployer`, `nix-agent`, `node-agent`, and `fleet-scheduler`, so the smallest trial surface stays focused on the VM-platform core instead of mixing rollout and scheduling responsibilities.
The three fresh-* VM-cluster commands plus chainfire-live-membership-proof make up the publishable nested-KVM suite. They require a Linux host with /dev/kvm and nested virtualization, and the harness stops at preflight by design when that device is absent. single-node-quickstart and baremetal-iso can still fall back to TCG for debugging, but the release-facing baremetal-iso-e2e runner now requires host KVM so the exact proof lane matches the shipped hardware proxy route. deployer-vm-smoke and portable-control-plane-regressions remain the supported non-KVM developer lanes.
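A hedged host preflight for that nested-KVM requirement; the sysfs paths are the standard kvm_intel/kvm_amd module parameters and are not UltraCloud-specific:

```
# Check the /dev/kvm device and whether nested virtualization is enabled
test -w /dev/kvm && echo "/dev/kvm: ok" || echo "/dev/kvm: missing or not writable"
cat /sys/module/kvm_intel/parameters/nested 2>/dev/null \
  || cat /sys/module/kvm_amd/parameters/nested 2>/dev/null \
  || echo "kvm module not loaded"
```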
Release-facing completion now requires both of these to be green on the same branch:
- the canonical bare-metal proof: `nix run ./nix/test-cluster#cluster -- baremetal-iso` plus `nix build .#checks.x86_64-linux.baremetal-iso-e2e` and `./result/bin/baremetal-iso-e2e`
- the publishable nested-KVM suite: `fresh-smoke`, `fresh-demo-vm-webapp`, `fresh-matrix`, and `chainfire-live-membership-proof`, preferably through `./nix/test-cluster/run-publishable-kvm-suite.sh`
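Run back to back on the release branch, the two requirements reduce to the commands already listed above:

```
# Canonical bare-metal proof
nix run ./nix/test-cluster#cluster -- baremetal-iso
nix build .#checks.x86_64-linux.baremetal-iso-e2e && ./result/bin/baremetal-iso-e2e ./work/baremetal-iso-e2e/latest

# Publishable nested-KVM suite via the checked-in wrapper
./nix/test-cluster/run-publishable-kvm-suite.sh ./work/publishable-kvm-suite
```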
Focused operator lifecycle proof for the core control plane:
./nix/test-cluster/run-core-control-plane-ops-proof.sh ./work/core-control-plane-ops-proof/latest
This proof is lighter than the full KVM suite. It keeps supported-surface-guard honest for the control-plane contract, runs the standalone IAM signing-key rotation, credential rotation, and mTLS overlap rotation tests, and records the explicit ChainFire membership, FlareDB schema migration or destructive-DDL boundary, and IAM bootstrap hardening markers that the public docs now promise.
The dated 2026-04-10 artifact root for that lane is ./work/core-control-plane-ops-proof/20260410T172148+09:00; it includes iam-key-rotation-tests.log, iam-credential-rotation-tests.log, iam-mtls-rotation-tests.log, scope-fixed-contract.json, and result.json.
Work Root Budget
./nix/test-cluster/work-root-budget.sh status
./nix/test-cluster/work-root-budget.sh enforce
./nix/test-cluster/work-root-budget.sh cleanup-advice
./nix/test-cluster/work-root-budget.sh prune-proof-logs 2
Use ./nix/test-cluster/work-root-budget.sh status for reporting, ./nix/test-cluster/work-root-budget.sh enforce when a local proof run should fail on budget overrun, and ./nix/test-cluster/work-root-budget.sh prune-proof-logs 2 for a safer dated-proof cleanup dry-run.
The helper keeps the local proof path practical by reporting the current size of ./work, ./work/test-cluster/state, disposable runtime directories such as ./work/tmp and ./work/publishable-kvm-runtime, and the dated proof roots including ./work/provider-vm-reality-proof and ./work/hardware-smoke. The enforce mode turns those soft budgets into a non-zero local gate, and prune-proof-logs gives a safer dated-proof cleanup workflow before the final nix store gc.
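A typical cleanup pass before reclaiming store space uses only the helper's own subcommands; the ordering here is a suggestion, not an enforced workflow:

```
./nix/test-cluster/work-root-budget.sh status               # report current ./work usage
./nix/test-cluster/work-root-budget.sh prune-proof-logs 2   # safer dated-proof cleanup dry-run
./nix/test-cluster/work-root-budget.sh enforce              # non-zero exit on budget overrun
nix store gc                                                # final store garbage collection
```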
Extended Measurements
nix run ./nix/test-cluster#cluster -- fresh-bench-storage
fresh-bench-storage remains useful for storage regression tracking, but it is a benchmark path, not part of the minimal canonical publish gate.
Operational Commands
nix run ./nix/test-cluster#cluster -- status
nix run ./nix/test-cluster#cluster -- logs node01
nix run ./nix/test-cluster#cluster -- ssh node04
nix run ./nix/test-cluster#cluster -- demo-vm-webapp
nix run ./nix/test-cluster#cluster -- serve-vm-webapp
nix run ./nix/test-cluster#cluster -- matrix
nix run ./nix/test-cluster#cluster -- bench-storage
nix run ./nix/test-cluster#cluster -- fresh-matrix
nix run ./nix/test-cluster#cluster -- fresh-bench-storage
nix run ./nix/test-cluster#cluster -- stop
nix run ./nix/test-cluster#cluster -- clean
Validation Philosophy
- package unit tests are useful but not sufficient
- host-built VM clusters are the main integration signal
- bootstrap and rollout paths must stay evaluable independently of the larger VM-hosting feature set
- distributed storage and virtualization paths must be checked under failure, not only at steady state
Legacy And Experimental Paths
- `baremetal/vm-cluster` manual launch scripts are legacy/manual, not canonical validation
- direct `nix develop ./nix/test-cluster -c ./nix/test-cluster/run-cluster.sh ...` usage is a debugging path, not the publishable entrypoint
- standalone use of `netboot-control-plane` or `netboot-all-in-one` outside the documented profiles is a debugging path, not a fourth supported profile
- `netboot-worker`, Firecracker, mvisor, `k8shost-cni`, `k8shost-controllers`, and `lightningstor-csi` are archived non-product helpers and should not be presented as canonical entrypoints
- `netboot-base`, `pxe-server`, `vm-smoke-target`, and other helper images are internal or legacy building blocks, not supported profiles by themselves