# UltraCloud

UltraCloud is a Nix-first cloud platform workspace that assembles a small control plane, network services, VM hosting, shared storage, object storage, and gateway services into one reproducible repository.

The fastest public entrypoint is the one-command single-node quickstart. The canonical multi-node integration proof remains the six-node VM cluster under [`nix/test-cluster`](nix/test-cluster/README.md), which builds all guest images on the host, boots them as hardware-like QEMU nodes, and validates real multi-node behavior.

The canonical bare-metal bootstrap proof is the ISO-on-QEMU path under [`nix/test-cluster`](nix/test-cluster/README.md), which drives phone-home, Disko install, reboot, and desired-system convergence for one control-plane node and one worker-equivalent node.

## Components

- `chainfire`: replicated coordination store
- `flaredb`: replicated KV and metadata store
- `iam`: identity, token issuance, and authorization
- `prismnet`: tenant networking control plane
- `flashdns`: authoritative DNS service
- `fiberlb`: load balancer control plane and dataplane
- `plasmavmc`: VM control plane and worker agents
- `coronafs`: shared filesystem for mutable VM volumes
- `lightningstor`: object storage and VM image backing
- `k8shost`: Kubernetes-style hosting control plane for tenant pods and services
- `apigateway`: external API and proxy surface
- `nightlight`: metrics ingestion and query service
- `creditservice`: minimal reference quota/credit service
- `deployer`: bootstrap and phone-home deployment service that owns install plans and desired-system intent
- `fleet-scheduler`: non-Kubernetes service scheduler for bare-metal cluster services

## Quick Start

Single-node quickstart:

```bash
nix run .#single-node-quickstart
```

This app builds the minimal VM stack, boots a QEMU VM, waits for `chainfire`, `flaredb`, `iam`, `prismnet`, and `plasmavmc`, checks their health endpoints, and verifies the in-guest VM runtime prerequisites. To keep the VM running for an interactive session, set `ULTRACLOUD_QUICKSTART_KEEP_VM=1`:

```bash
ULTRACLOUD_QUICKSTART_KEEP_VM=1 nix run .#single-node-quickstart
```

The legacy name `.#all-in-one-quickstart` remains available as an alias.

Portable local proof on hosts without `/dev/kvm`:

```bash
nix build .#checks.x86_64-linux.canonical-profile-eval-guards
nix build .#checks.x86_64-linux.portable-control-plane-regressions
```

This TCG-safe lane guards against canonical profile drift. It also keeps three paths under regression coverage: the core `chainfire` / `deployer` control-plane path, the `deployer -> nix-agent` boundary, and the `fleet-scheduler -> node-agent` boundary. It needs neither `/dev/kvm` nor nested virtualization.
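A local script can choose between the portable lane and the nested-KVM suite by checking for `/dev/kvm`. The sketch below is a hypothetical helper and is not shipped in this repository. `pick_validation_lane` and `lane_commands` are illustrative names, and the optional device-path argument exists only for testing. The helper prints the documented commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not shipped with UltraCloud: pick a validation lane
# from the presence of a KVM character device and print its commands.
set -euo pipefail

# Prints "kvm" if the device path (default /dev/kvm) is a character device,
# otherwise "portable". The argument exists only to make testing easy.
pick_validation_lane() {
  local kvm_dev=${1:-/dev/kvm}
  if [[ -c "$kvm_dev" ]]; then
    echo kvm
  else
    echo portable
  fi
}

# Prints the commands documented in this README for the given lane.
lane_commands() {
  case "$1" in
    portable)
      echo "nix build .#checks.x86_64-linux.canonical-profile-eval-guards"
      echo "nix build .#checks.x86_64-linux.portable-control-plane-regressions"
      ;;
    kvm)
      echo "./nix/test-cluster/run-publishable-kvm-suite.sh ./work/publishable-kvm-suite"
      ;;
    *)
      echo "unknown lane: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `lane_commands "$(pick_validation_lane)"` prints the two `nix build` checks on a TCG-only host. A present `/dev/kvm` does not by itself prove that nested virtualization is enabled. The KVM wrapper is meant to be run from inside `nix develop`.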

Publishable nested-KVM suite:

```bash
# Enter the development shell first, then run the commands below from inside it.
nix develop
nix run ./nix/test-cluster#cluster -- fresh-smoke
nix run ./nix/test-cluster#cluster -- fresh-demo-vm-webapp
nix run ./nix/test-cluster#cluster -- fresh-matrix
./nix/test-cluster/run-publishable-kvm-suite.sh ./work/publishable-kvm-suite
```

The repository-owned remote entrypoint for the same suite is [`.github/workflows/kvm-publishable-selfhosted.yml`](.github/workflows/kvm-publishable-selfhosted.yml). It runs the wrapper on Forgejo runners labeled `nix-host`. Those runners must expose `/dev/kvm` with nested virtualization enabled.
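Before you register a machine as a `nix-host` runner, you can confirm both requirements with a preflight check. The sketch below is hypothetical and is not part of the workflow file. It checks that `/dev/kvm` is a readable and writable character device, then reads the standard Linux `kvm_intel` / `kvm_amd` `nested` module parameters from sysfs. The function names and the optional path arguments, which exist only for testing, are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical preflight for a self-hosted `nix-host` runner: confirm that KVM
# is usable and that the host's KVM module has nested virtualization enabled.
set -euo pipefail

# Succeeds if kvm_intel or kvm_amd reports nested virtualization as enabled.
# The sysfs root is a parameter only so the function can be tested.
nested_virt_enabled() {
  local sysfs=${1:-/sys} mod param value
  for mod in kvm_intel kvm_amd; do
    param="$sysfs/module/$mod/parameters/nested"
    [[ -r "$param" ]] || continue
    value=$(<"$param")
    # Depending on kernel version and vendor, the value is Y/N or 1/0.
    if [[ "$value" == "Y" || "$value" == "1" ]]; then
      return 0
    fi
  done
  return 1
}

# Fails with a message if /dev/kvm is unusable or nesting is disabled.
kvm_runner_preflight() {
  local kvm_dev=${1:-/dev/kvm} sysfs=${2:-/sys}
  if [[ ! -c "$kvm_dev" || ! -r "$kvm_dev" || ! -w "$kvm_dev" ]]; then
    echo "preflight: $kvm_dev is missing or not read/write for the current user" >&2
    return 1
  fi
  if ! nested_virt_enabled "$sysfs"; then
    echo "preflight: nested virtualization is not enabled" >&2
    return 1
  fi
  echo "preflight: ok"
}
```

Run `kvm_runner_preflight` as the runner's service user. That catches the common case where `/dev/kvm` exists but the user lacks permission to open it.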

A release counts as project-done only when both halves of the public validation surface are green:

- `baremetal-iso` and `baremetal-iso-e2e` for the canonical `deployer -> installer -> nix-agent` bare-metal bootstrap path
- the KVM publishable suite (`fresh-smoke`, `fresh-demo-vm-webapp`, `fresh-matrix`) for the nested-KVM multi-node VM-hosting path
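You can check both halves in one pass with a small driver script. This is a hypothetical sketch built only from the commands documented in this README. `run_step` and `release_gate` are illustrative names, and the script assumes you start it from the repository root inside `nix develop`. It deliberately omits `set -e`, so a failure in one step does not hide the results of the others.

```bash
#!/usr/bin/env bash
# Hypothetical release gate: run both halves of the release proof and report
# every failing step instead of stopping at the first one.
set -uo pipefail

# run_step NAME CMD...: run CMD, print PASS/FAIL with NAME, propagate failure.
run_step() {
  local name=$1
  shift
  if "$@"; then
    echo "PASS $name"
  else
    echo "FAIL $name"
    return 1
  fi
}

# Runs the bare-metal half, then the nested-KVM half; returns 1 if any failed.
# Assumes the repository root as working directory, inside `nix develop`.
release_gate() {
  local failed=0 kvm_case
  run_step baremetal-iso nix run ./nix/test-cluster#cluster -- baremetal-iso || failed=1
  run_step baremetal-iso-e2e nix build .#checks.x86_64-linux.baremetal-iso-e2e || failed=1
  for kvm_case in fresh-smoke fresh-demo-vm-webapp fresh-matrix; do
    run_step "$kvm_case" nix run ./nix/test-cluster#cluster -- "$kvm_case" || failed=1
  done
  return "$failed"
}
```

Source the file and call `release_gate`. It prints one `PASS` or `FAIL` line per step and returns non-zero if any step failed.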

Canonical bare-metal bootstrap proof:

```bash
nix run ./nix/test-cluster#cluster -- baremetal-iso
nix build .#checks.x86_64-linux.baremetal-iso-e2e
```

## Canonical Profiles

UltraCloud limits its public support surface to three canonical profiles:

| Profile | Primary Nix outputs | Required components | Optional components |
| --- | --- | --- | --- |
| `single-node dev` | `nix run .#single-node-quickstart`, `nixosConfigurations.single-node-quickstart`, companion install image `nixosConfigurations.netboot-all-in-one` | `chainfire`, `flaredb`, `iam`, `plasmavmc`, `prismnet` | `lightningstor`, `coronafs`, `flashdns`, `fiberlb`, `apigateway`, `nightlight`, `creditservice`, `k8shost`, `deployer` |
| `3-node HA control plane` | `nixosConfigurations.node01`, `node02`, `node03`, `netboot-control-plane` | `chainfire`, `flaredb`, `iam`, `nix-agent` on every control-plane node, plus `deployer` on the bootstrap node | `fleet-scheduler`, `node-agent`, `prismnet`, `flashdns`, `fiberlb`, `plasmavmc`, `lightningstor`, `coronafs`, `k8shost`, `apigateway`, `nightlight`, `creditservice` |
| `bare-metal bootstrap` | `nixosConfigurations.ultracloud-iso`, `nixosConfigurations.baremetal-qemu-control-plane`, `nixosConfigurations.baremetal-qemu-worker`, `checks.x86_64-linux.baremetal-iso-e2e` | `deployer`, `first-boot-automation`, `install-target`, `nix-agent` | `netboot-control-plane`, `netboot-worker`, and `netboot-all-in-one` as experimental helper images, plus `node-agent`, `fleet-scheduler`, and higher-level storage or edge services after bootstrap |

`netboot-base` is an internal helper image, not a public profile. `netboot-control-plane`, `netboot-worker`, and `netboot-all-in-one` remain experimental helper images until they implement the same phone-home and install semantics as the ISO path. Older launch flows under `baremetal/vm-cluster` are `legacy/manual`, not canonical.
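You can build the system closures behind a profile without booting anything. Every `nixosConfigurations.<name>` output exposes the standard NixOS `config.system.build.toplevel` attribute, and building it is enough. The helper below is hypothetical. The short keys `single-node`, `ha-control-plane`, and `bare-metal` are illustrative names, not flake outputs. The configuration names are copied from the profile table, and the experimental netboot helper images are left out on purpose.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print `nix build` installables for the NixOS
# configurations behind each canonical profile. The profile keys are
# illustrative; the configuration names come from the profile table.
set -euo pipefail

# Maps an illustrative profile key to its nixosConfigurations names.
profile_configs() {
  case "$1" in
    single-node)      echo single-node-quickstart ;;
    ha-control-plane) echo node01 node02 node03 ;;
    bare-metal)       echo ultracloud-iso baremetal-qemu-control-plane baremetal-qemu-worker ;;
    *) echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

# Every NixOS configuration exposes config.system.build.toplevel; building it
# realises the full system closure without activating it anywhere.
profile_installables() {
  local configs cfg
  configs=$(profile_configs "$1") || return 1
  # Word splitting is intended: configuration names contain no spaces.
  for cfg in $configs; do
    echo ".#nixosConfigurations.${cfg}.config.system.build.toplevel"
  done
}
```

For example, `nix build --no-link $(profile_installables ha-control-plane)` builds the three control-plane node closures. This only shows that the closures build. The `canonical-profile-*-guards` checks and `baremetal-iso-e2e` remain the actual validation.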
## Responsibility Boundaries

- `k8shost` owns Kubernetes-style pod and service APIs for tenant workloads, then translates them into `prismnet`, `flashdns`, and `fiberlb` objects. It does not place host-native cluster daemons.
- `fleet-scheduler` owns placement and failover of host-native service instances from declarative cluster state. It consumes `node-agent` heartbeats and writes instance placement, but it does not expose tenant-facing Kubernetes semantics.
- `deployer` owns machine enrollment, `/api/v1/phone-home`, install plans, cluster metadata, and desired-system references. It decides what a node should become, but it does not execute the host-local switch.
- `nix-agent` owns host-local NixOS convergence only. It reads desired-system state from `deployer` or `chainfire`, activates the target closure, and rolls back on failed health checks.
- `node-agent` owns host-local runtime execution only. It reports heartbeats and applies scheduled service-instance state, but it does not install the base OS or rewrite desired-system targets.
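The `nix-agent` contract (activate the target closure, roll back on failed health checks) can be approximated by hand with the standard NixOS system profile. That can help when debugging a node. The sketch below is hypothetical and is not `nix-agent`'s implementation. It leaves out fetching desired-system state from `deployer` or `chainfire`, and the function names are illustrative. `health_check` is a placeholder that runs `systemctl is-system-running --wait`.

```bash
#!/usr/bin/env bash
# Hypothetical manual equivalent of host-local convergence with rollback.
# Uses the standard NixOS system profile; this is NOT nix-agent's code.
set -euo pipefail

SYSTEM_PROFILE=/nix/var/nix/profiles/system

# Point the system profile at the target closure, then switch to it.
# Returns 2 if the profile could not be updated, 1 if activation failed.
activate_closure() {
  local closure=$1
  nix-env --profile "$SYSTEM_PROFILE" --set "$closure" || return 2
  "$closure/bin/switch-to-configuration" switch || return 1
}

# Placeholder probe: succeeds only when systemd reports "running".
# A real deployment would check its own services here.
health_check() {
  systemctl is-system-running --wait
}

# Step the system profile back one generation and switch to that system.
rollback_closure() {
  nix-env --profile "$SYSTEM_PROFILE" --rollback
  "$SYSTEM_PROFILE/bin/switch-to-configuration" switch
}

# converge CLOSURE: activate, verify, and roll back if either step fails.
converge() {
  local closure=$1 rc=0
  activate_closure "$closure" || rc=$?
  if [[ $rc -eq 2 ]]; then
    echo "could not update $SYSTEM_PROFILE; nothing to roll back" >&2
    return 1
  fi
  if [[ $rc -eq 0 ]] && health_check; then
    echo "converged: $closure"
    return 0
  fi
  echo "activation or health check failed; rolling back" >&2
  rollback_closure
  return 1
}
```

A return code of 2 from `activate_closure` means the profile was never updated. In that case `converge` skips the rollback, because rolling back would step past the generation that was still active.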
## Main Entrypoints

- workspace flake: [flake.nix](flake.nix)
- single-node quickstart smoke: [`nix run .#single-node-quickstart`](docs/testing.md)
- portable local proof: [`nix build .#checks.x86_64-linux.portable-control-plane-regressions`](docs/testing.md)
- canonical bare-metal bootstrap smoke: [`nix run ./nix/test-cluster#cluster -- baremetal-iso`](docs/testing.md)
- canonical profile guards: [`nix build .#checks.x86_64-linux.canonical-profile-eval-guards`](docs/testing.md), [`nix build .#checks.x86_64-linux.canonical-profile-build-guards`](docs/testing.md)
- VM validation harness: [nix/test-cluster/README.md](nix/test-cluster/README.md)
- shared volume notes: [coronafs/README.md](coronafs/README.md)
- minimal quota-service rationale: [creditservice/README.md](creditservice/README.md)
- legacy/manual VM launch scripts: [baremetal/vm-cluster/README.md](baremetal/vm-cluster/README.md)

## Repository Guide

- [docs/README.md](docs/README.md): documentation entrypoint
- [docs/testing.md](docs/testing.md): validation path summary
- [docs/component-matrix.md](docs/component-matrix.md): canonical profiles and optional bundles
- [docs/storage-benchmarks.md](docs/storage-benchmarks.md): latest CoronaFS and LightningStor lab numbers
- `plans/`: design notes and exploration documents

## Scope

UltraCloud is centered on reproducible infrastructure behavior rather than polished end-user product surfaces. Some services, such as `creditservice`, are intentionally minimal reference implementations that prove integration points rather than full products.

Host-level NixOS rollout validation is expected to stay reproducible as well:

- `baremetal-iso-e2e` is the full install-path proof.
- `canonical-profile-eval-guards` and `canonical-profile-build-guards` fail fast when supported outputs drift.
- `portable-control-plane-regressions` is the non-KVM developer lane. It keeps the main control-plane and rollout boundaries green on TCG-only hosts before the publishable nested-KVM suite is rerun.