# UltraCloud

UltraCloud is a Nix-first cloud platform workspace that assembles a small control plane, network services, VM hosting, shared storage, object storage, and gateway services into one reproducible repository.

The fastest public entrypoint is the one-command single-node quickstart. The canonical multi-node integration proof remains the six-node VM cluster under [`nix/test-cluster`](nix/test-cluster/README.md), which builds all guest images on the host, boots them as hardware-like QEMU nodes, and validates real multi-node behavior. The canonical bare-metal bootstrap proof is the ISO-on-QEMU path under [`nix/test-cluster`](nix/test-cluster/README.md), which drives phone-home, Disko install, reboot, and desired-system convergence for one control-plane node and one worker-equivalent node.

## Components

- `chainfire`: replicated coordination store
- `flaredb`: replicated KV and metadata store
- `iam`: identity, token issuance, and authorization
- `prismnet`: tenant networking control plane
- `flashdns`: authoritative DNS service
- `fiberlb`: load balancer control plane and dataplane
- `plasmavmc`: VM control plane and worker agents
- `coronafs`: shared filesystem for mutable VM volumes
- `lightningstor`: object storage and VM image backing
- `k8shost`: Kubernetes-style hosting control plane for tenant pods and services
- `apigateway`: external API and proxy surface
- `nightlight`: metrics ingestion and query service
- `creditservice`: minimal reference quota/credit service
- `deployer`: bootstrap and phone-home deployment service that owns install plans and desired-system intent
- `fleet-scheduler`: non-Kubernetes service scheduler for bare-metal cluster services

## Quick Start

Single-node quickstart:

```bash
nix run .#single-node-quickstart
```

This app builds the minimal VM stack, boots a QEMU VM, waits for `chainfire`, `flaredb`, `iam`, `prismnet`, and `plasmavmc`, checks their health endpoints, and verifies the in-guest VM runtime prerequisites.
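The wait-and-health-check step above amounts to a retry loop against each service. A minimal sketch of that pattern, assuming a generic probe command (the function name, timeout, and the `/healthz` path in the usage note are illustrative assumptions, not the quickstart's actual implementation):

```bash
# wait_for_ok DEADLINE PROBE...: rerun PROBE until it exits 0 or
# DEADLINE seconds have passed. A sketch of the quickstart's wait loop,
# not the real code.
wait_for_ok() {
  local deadline="$1"; shift
  local start elapsed
  start=$(date +%s)
  until "$@" >/dev/null 2>&1; do
    elapsed=$(( $(date +%s) - start ))
    if (( elapsed >= deadline )); then
      echo "timed out after ${deadline}s: $*" >&2
      return 1
    fi
    sleep 1
  done
  echo "ok: $*"
}

# Hypothetical usage against one service's health endpoint (path and
# port are assumptions):
# wait_for_ok 120 curl -fsS http://127.0.0.1:8080/healthz
```

Passing the probe as a command rather than hard-coding `curl` keeps the loop reusable for TCP checks or in-guest unit checks.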
For an interactive session, keep the VM running:

```bash
ULTRACLOUD_QUICKSTART_KEEP_VM=1 nix run .#single-node-quickstart
```

The legacy name `.#all-in-one-quickstart` is kept as an alias.

Portable local proof on hosts without `/dev/kvm`:

```bash
nix build .#checks.x86_64-linux.canonical-profile-eval-guards
nix build .#checks.x86_64-linux.portable-control-plane-regressions
```

This TCG-safe lane keeps canonical profile drift, the core `chainfire` / `deployer` control-plane path, the `deployer -> nix-agent` boundary, and the `fleet-scheduler -> node-agent` boundary under regression coverage without requiring nested virtualization.

Publishable nested-KVM suite:

```bash
nix develop
nix run ./nix/test-cluster#cluster -- fresh-smoke
nix run ./nix/test-cluster#cluster -- fresh-demo-vm-webapp
nix run ./nix/test-cluster#cluster -- fresh-matrix
./nix/test-cluster/run-publishable-kvm-suite.sh ./work/publishable-kvm-suite
```

The repository-owned remote entrypoint for the same suite is [`.github/workflows/kvm-publishable-selfhosted.yml`](.github/workflows/kvm-publishable-selfhosted.yml). It runs the wrapper on Forgejo runners labeled `nix-host`, and those runners must expose `/dev/kvm` with nested virtualization enabled.
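Which lane applies to a given host can be detected mechanically from the presence of `/dev/kvm`. A minimal sketch (the helper name and the idea of auto-selecting a lane are illustrative, not part of the repository's tooling):

```bash
# select_lane [KVM_DEVICE]: print "kvm" when the KVM device exists,
# "tcg" otherwise. Takes the device path as an argument so the logic is
# testable; on a real host pass /dev/kvm (the default).
select_lane() {
  local kvm_dev="${1:-/dev/kvm}"
  if [ -e "$kvm_dev" ]; then
    echo "kvm"  # run the publishable nested-KVM suite
  else
    echo "tcg"  # run the TCG-safe guards and portable regressions
  fi
}
```

Note that `/dev/kvm` existing does not by itself guarantee nested virtualization is enabled; the Forgejo runner requirement above is stricter than this check.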
Project-done release proof now requires both halves of the public validation surface to be green:

- `baremetal-iso` and `baremetal-iso-e2e` for the canonical `deployer -> installer -> nix-agent` bare-metal bootstrap path
- the KVM publishable suite (`fresh-smoke`, `fresh-demo-vm-webapp`, `fresh-matrix`) for the nested-KVM multi-node VM-hosting path

Canonical bare-metal bootstrap proof:

```bash
nix run ./nix/test-cluster#cluster -- baremetal-iso
nix build .#checks.x86_64-linux.baremetal-iso-e2e
```

## Canonical Profiles

UltraCloud now fixes the public support surface to three canonical profiles:

| Profile | Primary Nix outputs | Required components | Optional components |
| --- | --- | --- | --- |
| `single-node dev` | `nix run .#single-node-quickstart`, `nixosConfigurations.single-node-quickstart`, companion install image `nixosConfigurations.netboot-all-in-one` | `chainfire`, `flaredb`, `iam`, `plasmavmc`, `prismnet` | `lightningstor`, `coronafs`, `flashdns`, `fiberlb`, `apigateway`, `nightlight`, `creditservice`, `k8shost`, `deployer` |
| `3-node HA control plane` | `nixosConfigurations.node01`, `node02`, `node03`, `netboot-control-plane` | `chainfire`, `flaredb`, `iam`, `nix-agent` on every control-plane node, plus `deployer` on the bootstrap node | `fleet-scheduler`, `node-agent`, `prismnet`, `flashdns`, `fiberlb`, `plasmavmc`, `lightningstor`, `coronafs`, `k8shost`, `apigateway`, `nightlight`, `creditservice` |
| `bare-metal bootstrap` | `nixosConfigurations.ultracloud-iso`, `nixosConfigurations.baremetal-qemu-control-plane`, `nixosConfigurations.baremetal-qemu-worker`, `checks.x86_64-linux.baremetal-iso-e2e` | `deployer`, `first-boot-automation`, `install-target`, `nix-agent` | `netboot-control-plane`, `netboot-worker`, and `netboot-all-in-one` as experimental helper images, plus `node-agent`, `fleet-scheduler`, and higher-level storage or edge services after bootstrap |

`netboot-base` is an internal helper image, not a public profile.
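The required-component column of a profile can be spot-checked inside a running guest. A minimal sketch, assuming the systemd unit names match the component names (an assumption; adjust for the repository's real unit names):

```bash
# check_profile_units UNIT...: report each required unit as active or
# missing; exit nonzero if any required unit is not active. Unit names
# are assumed to match component names.
check_profile_units() {
  local status=0 unit
  for unit in "$@"; do
    if systemctl is-active --quiet "$unit"; then
      echo "active: $unit"
    else
      echo "missing: $unit" >&2
      status=1
    fi
  done
  return $status
}

# Hypothetical usage for the single-node dev profile:
# check_profile_units chainfire flaredb iam plasmavmc prismnet
```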
`netboot-control-plane`, `netboot-worker`, and `netboot-all-in-one` remain experimental helper images until they implement the same phone-home and install semantics as the ISO path. Older launch flows under `baremetal/vm-cluster` are `legacy/manual`, not canonical.

## Responsibility Boundaries

- `k8shost` owns Kubernetes-style pod and service APIs for tenant workloads, then translates them into `prismnet`, `flashdns`, and `fiberlb` objects. It does not place host-native cluster daemons.
- `fleet-scheduler` owns placement and failover of host-native service instances from declarative cluster state. It consumes `node-agent` heartbeats and writes instance placement, but it does not expose tenant-facing Kubernetes semantics.
- `deployer` owns machine enrollment, `/api/v1/phone-home`, install plans, cluster metadata, and desired-system references. It decides what a node should become, but it does not execute the host-local switch.
- `nix-agent` owns host-local NixOS convergence only. It reads desired-system state from `deployer` or `chainfire`, activates the target closure, and rolls back on failed health checks.
- `node-agent` owns host-local runtime execution only. It reports heartbeats and applies scheduled service-instance state, but it does not install the base OS or rewrite desired-system targets.
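The `deployer` boundary above implies a node-side enrollment call against `/api/v1/phone-home`. A minimal sketch of what that request might look like; only the endpoint path appears in this document, so the payload shape, field names, host, and port below are all assumptions:

```bash
# phone_home_payload HOSTNAME MAC: compose a hypothetical JSON body for
# the phone-home request. Field names are illustrative; the real payload
# contract is owned by deployer.
phone_home_payload() {
  local hostname="$1" mac="$2"
  printf '{"hostname":"%s","mac":"%s"}' "$hostname" "$mac"
}

# On a booting node this might be POSTed to deployer (host and port are
# assumptions):
# curl -fsS -X POST -H 'Content-Type: application/json' \
#   -d "$(phone_home_payload "$(hostname)" "$mac")" \
#   http://deployer.internal:8080/api/v1/phone-home
```

Keeping payload construction separate from the transport makes the enrollment body easy to log and test before any network call.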
## Main Entrypoints

- workspace flake: [flake.nix](flake.nix)
- single-node quickstart smoke: [`nix run .#single-node-quickstart`](docs/testing.md)
- portable local proof: [`nix build .#checks.x86_64-linux.portable-control-plane-regressions`](docs/testing.md)
- canonical bare-metal bootstrap smoke: [`nix run ./nix/test-cluster#cluster -- baremetal-iso`](docs/testing.md)
- canonical profile guards: [`nix build .#checks.x86_64-linux.canonical-profile-eval-guards`](docs/testing.md), [`nix build .#checks.x86_64-linux.canonical-profile-build-guards`](docs/testing.md)
- VM validation harness: [nix/test-cluster/README.md](nix/test-cluster/README.md)
- shared volume notes: [coronafs/README.md](coronafs/README.md)
- minimal quota-service rationale: [creditservice/README.md](creditservice/README.md)
- legacy/manual VM launch scripts: [baremetal/vm-cluster/README.md](baremetal/vm-cluster/README.md)

## Repository Guide

- [docs/README.md](docs/README.md): documentation entrypoint
- [docs/testing.md](docs/testing.md): validation path summary
- [docs/component-matrix.md](docs/component-matrix.md): canonical profiles and optional bundles
- [docs/storage-benchmarks.md](docs/storage-benchmarks.md): latest CoronaFS and LightningStor lab numbers
- `plans/`: design notes and exploration documents

## Scope

UltraCloud is centered on reproducible infrastructure behavior rather than polished end-user product surfaces. Some services, such as `creditservice`, are intentionally minimal reference implementations that prove integration points rather than full products.
Host-level NixOS rollout validation is also expected to stay reproducible: `baremetal-iso-e2e` is now the full install-path proof; `canonical-profile-eval-guards` and `canonical-profile-build-guards` fail fast when supported outputs drift; and `portable-control-plane-regressions` is the non-KVM developer lane that keeps the main control-plane and rollout boundaries green on TCG-only hosts before the publishable nested-KVM suite is rerun.
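The ordering above (portable lane green first, publishable nested-KVM suite rerun after) can be wrapped in a small pre-push helper. A sketch, not repository tooling; the `RUNNER` override is an illustrative hook for dry runs:

```bash
# portable_lane: run the TCG-safe guards in order, stopping at the first
# failure. RUNNER defaults to nix; set RUNNER=echo for a dry run.
portable_lane() {
  local runner="${RUNNER:-nix}"
  "$runner" build .#checks.x86_64-linux.canonical-profile-eval-guards &&
  "$runner" build .#checks.x86_64-linux.canonical-profile-build-guards &&
  "$runner" build .#checks.x86_64-linux.portable-control-plane-regressions &&
  echo "portable lane green; rerun the publishable nested-KVM suite next"
}
```

Chaining the builds with `&&` preserves the fail-fast behavior the guards are designed for: a drifted output stops the lane before the slower regression check runs.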