# Testing

UltraCloud treats VM-first validation as the canonical local proof path and keeps the public support contract limited to three profiles.

## Canonical Profiles

| Profile | Primary outputs | Required components | Optional components |
| --- | --- | --- | --- |
| `single-node dev` | `nix run .#single-node-quickstart`, `nixosConfigurations.single-node-quickstart`, companion install image `nixosConfigurations.netboot-all-in-one` | `chainfire`, `flaredb`, `iam`, `plasmavmc`, `prismnet` | `lightningstor`, `coronafs`, `flashdns`, `fiberlb`, `apigateway`, `nightlight`, `creditservice`, `k8shost`, `deployer` |
| `3-node HA control plane` | `nixosConfigurations.node01`, `node02`, `node03`, `netboot-control-plane` | `chainfire`, `flaredb`, `iam`, `nix-agent` on every control-plane node, plus `deployer` on the bootstrap node | `fleet-scheduler`, `node-agent`, `prismnet`, `flashdns`, `fiberlb`, `plasmavmc`, `lightningstor`, `coronafs`, `k8shost`, `apigateway`, `nightlight`, `creditservice` |
| `bare-metal bootstrap` | `nixosConfigurations.ultracloud-iso`, `nixosConfigurations.baremetal-qemu-control-plane`, `nixosConfigurations.baremetal-qemu-worker`, `checks.x86_64-linux.baremetal-iso-e2e` | `deployer`, `first-boot-automation`, `install-target`, `nix-agent` | `netboot-control-plane`, `netboot-worker`, and `netboot-all-in-one` as experimental helper images, plus `node-agent`, `fleet-scheduler`, and higher-level storage or edge services after bootstrap |

## Quickstart Smoke

```bash
nix flake show . --all-systems | rg -n "single|all-in-one|quickstart"
nix eval --no-eval-cache .#nixosConfigurations.single-node-quickstart.config.system.build.toplevel.drvPath --raw
nix run .#single-node-quickstart
```

`single-node-quickstart` is the supported one-box entrypoint. It boots the minimal VM stack under QEMU, waits for `chainfire`, `flaredb`, `iam`, `prismnet`, and `plasmavmc`, and verifies their health from inside the guest.
The launcher uses the generated NixOS VM runner, so it can fall back to TCG when `/dev/kvm` is absent.

For debugging, keep the VM alive after the smoke passes:

```bash
ULTRACLOUD_QUICKSTART_KEEP_VM=1 nix run .#single-node-quickstart
```

## Canonical Bare-Metal Proof

```bash
nix eval --no-eval-cache .#nixosConfigurations.baremetal-qemu-control-plane.config.system.build.toplevel.drvPath --raw
nix eval --no-eval-cache .#nixosConfigurations.baremetal-qemu-worker.config.system.build.toplevel.drvPath --raw
nix run ./nix/test-cluster#cluster -- baremetal-iso
nix build .#checks.x86_64-linux.baremetal-iso-e2e
```

`baremetal-iso` is the canonical install path for QEMU-as-bare-metal validation. It boots `nixosConfigurations.ultracloud-iso`, waits for `/api/v1/phone-home`, downloads the flake bundle from `deployer`, runs Disko, reboots, confirms the first post-install boot markers, and waits for `nix-agent` to report the desired system as `active` for both `baremetal-qemu-control-plane` and `baremetal-qemu-worker`. `baremetal-iso-e2e` runs the same flow under `flake check`.

## Regression Guards

```bash
nix build .#checks.x86_64-linux.canonical-profile-eval-guards
nix build .#checks.x86_64-linux.canonical-profile-build-guards
```

These two checks are the fast fail-first drift gates for the supported surface:

- `canonical-profile-eval-guards`: forces evaluation of every canonical profile output, including `netboot-worker` and `netboot-all-in-one`, so broken attrs fail before any long-running harness work starts.
- `canonical-profile-build-guards`: realizes the canonical VM, ISO, control-plane, and helper-image outputs so build-time drift is caught even when a cluster harness is not running.
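
The fail-first ordering of the two guards can be sketched as a small wrapper. This is only an illustration, not a script shipped in the repository: `guard_attr`, `run_guards`, and the `NIX_CMD` dry-run override are hypothetical names introduced here.

```bash
# Sketch: run the two drift gates in order and stop at the first failure.
# guard_attr, run_guards, and NIX_CMD are hypothetical helpers for illustration.

guard_attr() {
  # Map a check name to the flake attribute path used in this document.
  printf '.#checks.x86_64-linux.%s' "$1"
}

run_guards() {
  local name
  # The eval guard runs first so broken attrs fail before anything is built.
  for name in canonical-profile-eval-guards canonical-profile-build-guards; do
    # NIX_CMD defaults to the real nix binary; set NIX_CMD=echo for a dry run.
    ${NIX_CMD:-nix} build "$(guard_attr "$name")" || return 1
  done
}
```

Calling `run_guards` builds both checks; `NIX_CMD=echo run_guards` prints the build arguments without invoking Nix, which is a cheap way to confirm the attribute paths.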
## Portable Local Proof

```bash
nix build .#checks.x86_64-linux.canonical-profile-eval-guards
nix build .#checks.x86_64-linux.portable-control-plane-regressions
```

Use this lane on Linux hosts that do not expose `/dev/kvm`:

- `portable-control-plane-regressions`: TCG-safe aggregate check that keeps the canonical profile eval guard, `deployer-bootstrap-e2e`, `host-lifecycle-e2e`, `deployer-vm-smoke`, and `fleet-scheduler-e2e` green together.
- It intentionally does not boot the six-node nested-KVM VM suite, so it is a developer regression path, not the publishable multi-node proof.
- CI runs `canonical-profile-eval-guards` and `portable-control-plane-regressions` on every relevant change from `.github/workflows/nix.yml`.

## Publishable Checks

```bash
nix run .#single-node-quickstart
nix run ./nix/test-cluster#cluster -- baremetal-iso
nix run ./nix/test-cluster#cluster -- fresh-smoke
nix run ./nix/test-cluster#cluster -- fresh-demo-vm-webapp
nix run ./nix/test-cluster#cluster -- fresh-matrix
./nix/test-cluster/run-publishable-kvm-suite.sh ./work/publishable-kvm-suite
nix build .#checks.x86_64-linux.baremetal-iso-e2e
nix build .#checks.x86_64-linux.deployer-vm-smoke
```

Use these commands as the release-facing local proof set:

- `single-node-quickstart`: productized one-command quickstart gate for the minimal VM platform profile
- `baremetal-iso`: canonical bare-metal bootstrap gate covering pre-install boot, phone-home, flake bundle fetch, Disko install, reboot, post-install boot, and desired-system activation on one control-plane node plus one worker-equivalent node
- `fresh-smoke`: base VM-cluster gate for the canonical multi-node topology, including readiness, core behavior, and fault injection
- `fresh-demo-vm-webapp`: optional VM-hosting bundle proof for `plasmavmc + prismnet` with state persisted through `lightningstor`
- `fresh-matrix`: optional composition proof for provider bundles such as `prismnet + flashdns + fiberlb` and `plasmavmc + coronafs + lightningstor`
- `run-publishable-kvm-suite.sh`: reproducible wrapper that captures the KVM environment and runs the full publishable nested-KVM trio in a single command
- `baremetal-iso-e2e`: flake-check wrapper around the same canonical ISO harness
- `deployer-vm-smoke`: lightweight regression proving that `nix-agent` can activate a host-built target closure without guest-side compilation

## Responsibility Coverage

- `baremetal-iso` and `baremetal-iso-e2e` are the canonical proof for `deployer -> installer -> nix-agent`. They cover phone-home, install-plan materialization, Disko, reboot, and desired-system activation.
- `deployer-vm-smoke` is the smallest regression for the same `deployer -> nix-agent` boundary. It proves that a node can receive a prebuilt target closure and activate it without guest-side compilation.
- `portable-control-plane-regressions` keeps the main non-KVM-safe boundaries under continuous coverage by composing `deployer-bootstrap-e2e`, `host-lifecycle-e2e`, `deployer-vm-smoke`, and `fleet-scheduler-e2e` behind the canonical profile eval guard.
- `fresh-smoke` and `fresh-matrix` are the canonical proof for `deployer -> fleet-scheduler -> node-agent`. They cover native service placement, heartbeats, failover, and runtime reconciliation.
- `fresh-smoke` also covers `k8shost` separately from `fleet-scheduler`: `k8shost` exposes tenant pod and service semantics, while `fleet-scheduler` handles bare-metal host services.

The three `fresh-*` VM-cluster commands are the publishable nested-KVM suite. They require a Linux host with `/dev/kvm` and nested virtualization, and the harness stops at preflight by design when that device is absent. `single-node-quickstart`, `baremetal-iso`, `baremetal-iso-e2e`, `deployer-vm-smoke`, and `portable-control-plane-regressions` can run on TCG-only hosts, but they are slower without host KVM.
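
The split between KVM and TCG-only hosts described above could be turned into a small lane selector. The sketch below is a hedged illustration: `select_lane` and `lane_commands` are hypothetical helpers, the lane names are made up for this example, and the check covers only the device node, not nested virtualization, so the harness's own preflight remains the authority.

```bash
# Sketch: pick a validation lane from the presence of a usable KVM device.
# select_lane and lane_commands are hypothetical; they are not repository scripts.

select_lane() {
  # The optional first argument overrides the device path (useful for dry runs).
  local dev="${1:-/dev/kvm}"
  # Require a character device that the current user can read and write.
  if [ -c "$dev" ] && [ -r "$dev" ] && [ -w "$dev" ]; then
    echo "publishable-kvm"
  else
    echo "portable"
  fi
}

lane_commands() {
  # Print, without running, the commands this document lists for each lane.
  case "$1" in
    publishable-kvm)
      echo "./nix/test-cluster/run-publishable-kvm-suite.sh ./work/publishable-kvm-suite"
      ;;
    portable)
      echo "nix build .#checks.x86_64-linux.canonical-profile-eval-guards"
      echo "nix build .#checks.x86_64-linux.portable-control-plane-regressions"
      ;;
    *)
      echo "unknown lane: $1" >&2
      return 1
      ;;
  esac
}
```

Running `lane_commands "$(select_lane)"` prints the commands to run on the current host. Printing rather than executing keeps the helper safe to call on any machine.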

Release-facing completion now requires both of these to be green on the same branch:

- the canonical bare-metal proof: `nix run ./nix/test-cluster#cluster -- baremetal-iso` plus `nix build .#checks.x86_64-linux.baremetal-iso-e2e`
- the publishable nested-KVM suite: `fresh-smoke`, `fresh-demo-vm-webapp`, and `fresh-matrix`, preferably through `./nix/test-cluster/run-publishable-kvm-suite.sh`

## Extended Measurements

```bash
nix run ./nix/test-cluster#cluster -- fresh-bench-storage
```

`fresh-bench-storage` remains useful for storage regression tracking, but it is a benchmark path, not part of the minimal canonical publish gate.

## Operational Commands

```bash
nix run ./nix/test-cluster#cluster -- status
nix run ./nix/test-cluster#cluster -- logs node01
nix run ./nix/test-cluster#cluster -- ssh node04
nix run ./nix/test-cluster#cluster -- demo-vm-webapp
nix run ./nix/test-cluster#cluster -- serve-vm-webapp
nix run ./nix/test-cluster#cluster -- matrix
nix run ./nix/test-cluster#cluster -- bench-storage
nix run ./nix/test-cluster#cluster -- fresh-matrix
nix run ./nix/test-cluster#cluster -- fresh-bench-storage
nix run ./nix/test-cluster#cluster -- stop
nix run ./nix/test-cluster#cluster -- clean
```

## Validation Philosophy

- package unit tests are useful but not sufficient
- host-built VM clusters are the main integration signal
- bootstrap and rollout paths must stay evaluable independently of the larger VM-hosting feature set
- distributed storage and virtualization paths must be checked under failure, not only at steady state

## Legacy And Experimental Paths

- `baremetal/vm-cluster` manual launch scripts are `legacy/manual`, not canonical validation
- direct `nix develop ./nix/test-cluster -c ./nix/test-cluster/run-cluster.sh ...` usage is a debugging path, not the publishable entrypoint
- `netboot-control-plane`, `netboot-worker`, `netboot-all-in-one`, `netboot-base`, `pxe-server`, and other helper images are internal or experimental building blocks, not supported profiles by themselves