UltraCloud

UltraCloud is a Nix-first cloud platform workspace that assembles a small control plane, network services, VM hosting, shared storage, object storage, and gateway services into one reproducible repository.

The fastest public entrypoint is the one-command single-node quickstart. The 3-node HA control plane profile lives in nixosConfigurations.node01, nixosConfigurations.node02, and nixosConfigurations.node03; the six-node VM cluster under nix/test-cluster is the publishable harness that extends that HA baseline with worker and optional service bundles on host-built QEMU guests. The canonical bare-metal bootstrap proof is the ISO-on-QEMU path under nix/test-cluster, which drives phone-home, Disko install, reboot, and desired-system convergence for one control-plane node and one worker-equivalent node.

Components

  • chainfire: replicated coordination store
  • flaredb: replicated KV and metadata store
  • iam: identity, token issuance, and authorization
  • prismnet: tenant networking control plane
  • flashdns: authoritative DNS service
  • fiberlb: load balancer control plane and dataplane
  • plasmavmc: VM control plane and worker agents
  • coronafs: shared filesystem for mutable VM volumes
  • lightningstor: object storage and VM image backing
  • k8shost: Kubernetes-style hosting control plane for tenant pods and services
  • apigateway: external API and proxy surface
  • nightlight: metrics ingestion and query service
  • creditservice: quota, reservation, and admission-control service
  • deployer: bootstrap and phone-home deployment service that owns install plans and desired-system intent
  • fleet-scheduler: non-Kubernetes service scheduler for bare-metal cluster services

Core API Notes

  • chainfire ships a fixed-membership cluster API on the supported surface. Public cluster management is MemberList plus Status, and the internal Raft transport surface is Vote plus AppendEntries. chainfire-core is workspace-internal only; the old embeddable builder and distributed-KV scaffold are not part of the supported product contract.
  • flaredb ships SQL on both gRPC and REST. The supported REST SQL surface is POST /api/v1/sql for statement execution and GET /api/v1/tables for table discovery, alongside the existing KV and scan endpoints; an example request sketch follows this list.
  • plasmavmc ships a KVM-only public VM backend contract. The supported create and recovery surface is the KVM path exercised in single-node-quickstart, fresh-smoke, and fresh-matrix; Firecracker and mvisor remain archived non-product backends outside the supported surface until they have real tenant-network coverage.
  • lightningstor keeps its optional gRPC surface live: bucket versioning, bucket policy, bucket tagging, and explicit object version listing are part of the supported contract for the canonical optional bundle.
  • fiberlb backend Https health checks currently do not verify backend TLS certificates. Supported scope is limited to TCP reachability plus HTTP status for the backend endpoint until CA-aware verification is wired through config, server code, and the canonical harness.
  • k8shost keeps WatchPods on the supported surface as a bounded snapshot stream for the current matching pod set. The published contract is the tenant workload API, not a separate long-lived controller event bus.
  • k8shost is fixed as an API/control-plane product surface; runtime dataplane helpers stay archived non-product until they have their own published contract and proof.
  • k8shost-cni, k8shost-controllers, lightningstor-csi, nixosConfigurations.netboot-worker, and the older scripts under baremetal/vm-cluster are archived internal scaffolds or legacy/manual debugging paths outside the supported surface.
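
A minimal sketch of the flaredb REST SQL surface noted above, assuming a locally reachable flaredb instance; the listen address and the JSON request field (statement) are illustrative assumptions, not the fixed wire contract:

# list tables, then execute one statement through the supported REST SQL endpoints (sketch)
curl -s http://127.0.0.1:8080/api/v1/tables
curl -s -X POST http://127.0.0.1:8080/api/v1/sql \
  -H 'Content-Type: application/json' \
  -d '{"statement": "SELECT 1"}'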

Core Control Plane Operations

The control-plane operator contract is fixed in docs/control-plane-ops.md.

  • ChainFire dynamic membership, replace-node, and scale-out are outside the supported surface; the supported operator path is fixed-membership restore or whole-cluster replacement backed by the durability-proof backup/restore baseline.
  • FlareDB online migration and schema evolution must start from the durability-proof backup/restore baseline and stay additive-first until a later destructive cleanup window. FlareDB destructive DDL and fully automated online migration remain outside the supported product contract for this release.
  • IAM bootstrap hardening requires an explicit admin token, an explicit signing key, and a 32-byte IAM_CRED_MASTER_KEY. Signing-key rotation, credential overlap-and-revoke rotation, and mTLS overlap-and-cutover rotation are part of the supported operator contract; multi-node IAM failover remains outside the supported product contract. The standalone proof is ./nix/test-cluster/run-core-control-plane-ops-proof.sh.
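
A minimal sketch of the IAM bootstrap inputs named above, assuming a shell environment; the 32-byte length comes from the contract, but whether IAM expects raw bytes, hex, or base64 for IAM_CRED_MASTER_KEY, and how the explicit admin token and signing key are supplied, are deployment-specific assumptions:

# generate a 32-byte master key (hex-encoded here; match the encoding your IAM config expects)
export IAM_CRED_MASTER_KEY="$(openssl rand -hex 32)"
# provide the explicit admin token and signing key through your usual secrets mechanism,
# then run the standalone core control-plane operations proof
./nix/test-cluster/run-core-control-plane-ops-proof.sh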

Edge And Trial Surface

The edge-bundle and trial-surface contract is fixed in docs/edge-trial-surface.md.

  • APIGateway is supported as stateless replicated instances behind an external L4 or VIP layer; live in-process reload is not part of the product contract.
  • NightLight is supported as a single-node WAL/snapshot service; replicated HA metrics storage is not part of the product contract.
  • CreditService export and backend migration are supported as offline export/import or backend-native snapshot workflows, not live mixed-writer migration.
  • An OCI/Docker artifact is intentionally not the public trial surface.
  • Use ./nix/test-cluster/work-root-budget.sh status for disk budget, GC, and cleanup guidance, ./nix/test-cluster/work-root-budget.sh enforce for a stronger local budget gate, and ./nix/test-cluster/work-root-budget.sh prune-proof-logs 2 for safer dated-proof cleanup.

Quick Start

Single-node quickstart:

nix run .#single-node-quickstart

This app is also the automated smoke check for the smallest realistic trial surface. It builds the minimal VM stack, boots a QEMU VM, waits for chainfire, flaredb, iam, prismnet, and plasmavmc, checks their health endpoints, and verifies the in-guest VM runtime prerequisites. For an interactive session, keep the VM running:

ULTRACLOUD_QUICKSTART_KEEP_VM=1 nix run .#single-node-quickstart

Buildable trial artifact:

nix build .#single-node-trial-vm
nix run .#single-node-trial

single-node-trial-vm is the lightest supported artifact for local use: a host-built NixOS VM appliance for the VM-platform core. An OCI/Docker artifact is intentionally not the public trial surface here, because the supported scope needs a guest kernel plus host KVM, /dev/net/tun, and OVS/libvirt semantics. A privileged container would be host-coupled and would not prove the same contract.
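
A quick host-side check of those prerequisites before launching the trial VM; this is a convenience sketch, not shipped tooling:

# confirm KVM and TUN/TAP are available on the host before booting the trial VM
test -e /dev/kvm     && echo "/dev/kvm present"     || echo "missing /dev/kvm (hardware or nested virtualization not enabled)"
test -e /dev/net/tun && echo "/dev/net/tun present" || echo "missing /dev/net/tun"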

The legacy name .#all-in-one-quickstart is kept as an alias, and .#single-node-trial is a friendlier alias for the same smoke launcher.

Portable local proof on hosts without /dev/kvm:

nix build .#checks.x86_64-linux.canonical-profile-eval-guards
nix build .#checks.x86_64-linux.portable-control-plane-regressions

This TCG-safe lane keeps canonical profile drift, the core chainfire / deployer control-plane path, the deployer -> nix-agent boundary, and the fleet-scheduler -> node-agent boundary under regression coverage without requiring nested virtualization.

Publishable nested-KVM suite:

nix develop
nix run ./nix/test-cluster#cluster -- fresh-smoke
nix run ./nix/test-cluster#cluster -- fresh-demo-vm-webapp
nix run ./nix/test-cluster#cluster -- fresh-matrix
./nix/test-cluster/run-publishable-kvm-suite.sh ./work/publishable-kvm-suite

The checked-in entrypoint for the publishable nested-KVM suite is the local wrapper ./nix/test-cluster/run-publishable-kvm-suite.sh. Runner-specific workflow wiring from task/f5c70db0-baseline-profiles is intentionally not part of this re-aggregated baseline. For the full supported-surface proof on a local AMD/KVM host, use ./nix/test-cluster/run-supported-surface-final-proof.sh ./work/final-proofs/latest; it keeps builders local, builds single-node-trial-vm, runs single-node-quickstart, and captures the publishable KVM suite logs in one place.

nix run ./nix/test-cluster#cluster -- durability-proof is the canonical chainfire / flaredb / deployer backup/restore lane. It persists artifacts under ./work/durability-proof/latest, proves logical backup/restore for ChainFire keys and FlareDB SQL rows, uses the canonical Deployer admin pre-register request itself as the backup artifact, verifies that the pre-registered node survives a deployer.service restart, replays the same request idempotently, and injects CoronaFS plus LightningStor failures against the same live KVM cluster.

nix run ./nix/test-cluster#cluster -- rollout-soak is the longer-running control-plane and rollout companion lane. It rebuilds from clean local KVM runtime state, persists artifacts under ./work/rollout-soak/latest, validates exactly one planned draining maintenance cycle and one fail-stop worker-loss cycle on the two native-runtime workers, holds each degraded state for the configured soak window, then restarts deployer, fleet-scheduler, node-agent, chainfire, and flaredb before revalidating the cluster. The soak root also carries explicit scope markers so the supported boundary is encoded in the proof artifacts rather than only in docs. The steady-state KVM nodes do not run nix-agent.service, so the soak lane records explicit nix-agent scope markers instead of pretending a live-cluster nix-agent restart happened.

nix run ./nix/test-cluster#cluster -- provider-vm-reality-proof is the focused local-KVM reality lane for the provider and VM-hosting bundles. It stores artifacts under ./work/provider-vm-reality-proof/latest and captures authoritative FlashDNS answers, FiberLB backend drain and restore evidence, and PlasmaVMC KVM shared-storage migration plus post-migration restart state.

The 2026-04-10 local AMD/KVM proof logs are in ./work/final-proofs/32f64c10-1b74-4d8a-8d7d-b2cc6bf6b4f0-final for supported-surface-guard, single-node-trial-vm, and single-node-quickstart, and in ./work/publishable-kvm-suite for the final passing fresh-smoke, fresh-demo-vm-webapp, and fresh-matrix run through ./nix/test-cluster/run-publishable-kvm-suite.sh. The exact bare-metal check-runner proof from 2026-04-10 is in ./work/baremetal-iso-e2e/0de75570-dabd-471b-95fe-5898c54e2e8c; its outer environment.txt records execution_model=materialized-check-runner, and state/environment.txt records vm_accelerator_mode=kvm.

The 2026-04-10 durability and failure-injection proof logs are in ./work/durability-proof/20260410T120618+0900; result.json records success=true, deployer_restore_mode="admin pre-register request replay with pre/post-restart list verification", and the artifact set includes chainfire-backup-response.json, flaredb-restored.json, deployer-post-restart-list.json, coronafs-node04-local-state.json, and lightningstor-head-during-node05-outage.json.

The 2026-04-10 longer-running rollout and control-plane soak is in ./work/rollout-soak/20260410T164549+0900; result.json records success=true, fleet_supported_native_runtime_nodes=2, validated_maintenance_cycles=1, validated_power_loss_cycles=1, and soak_hold_secs=30, while the artifact set includes maintenance-held.json, power-loss-held.json, deployer-post-restart-nodes.json, chainfire-post-restart-put.json, flaredb-post-restart.json, scope-fixed-contract.json, deployer-scope-fixed.txt, fleet-scheduler-scope-fixed.txt, and the node01-nix-agent-scope.txt / node04-nix-agent-scope.txt boundary markers.

The 2026-04-10 provider and VM-hosting reality proof logs are in ./work/provider-vm-reality-proof/20260410T135827+0900; result.json records success=true, and the artifact set includes network-provider/fiberlb-drain-summary.txt, network-provider/flashdns-service-authoritative-answer.txt, vm-hosting/migration-summary.json, and vm-hosting/root-volume-after-post-migration-restart.json.

Physical-node bring-up now has a canonical preflight wrapper as well: nix run ./nix/test-cluster#hardware-smoke -- preflight. It writes kernel-params.txt, expected markers, failure markers, and a machine-readable blocked or ready state under ./work/hardware-smoke/latest, and the same entrypoint can later be rerun as run or capture when USB or BMC/Redfish transport is actually present.
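
Each lane's dated work root carries a result.json; assuming jq is available on the host and the files sit at the top of the lane roots as in the 2026-04-10 runs, the recorded outcomes can be checked directly (field names as listed above):

# durability-proof: overall outcome and the recorded deployer restore mode
jq '.success, .deployer_restore_mode' ./work/durability-proof/latest/result.json
# rollout-soak: scope counters for the supported two-worker lab
jq '.validated_maintenance_cycles, .validated_power_loss_cycles, .soak_hold_secs' ./work/rollout-soak/latest/result.json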

Within that suite, fresh-matrix is the public provider-bundle proof: it exercises PrismNet VPC/subnet/port flows plus security-group ACL add/remove, FlashDNS record publication, and FiberLB TCP plus TLS-terminated Https / TerminatedHttps listeners in one tenant-scoped composition run. The published FiberLB L4 algorithms are kept honest with targeted server unit tests in-tree. provider-vm-reality-proof is the artifact-producing companion lane for the same bundle and for the VM-hosting path. PrismNet real OVS/OVN dataplane validation remains outside the supported local KVM surface. FiberLB native BGP or BFD peer interop plus hardware VIP ownership also remain outside the supported local KVM surface. PlasmaVMC real-hardware migration or storage handoff remains a later hardware proof; the current local-KVM proof fixes the release surface to KVM shared-storage migration on the worker pair.

Project-done release proof now requires both halves of the public validation surface to be green:

  • baremetal-iso and baremetal-iso-e2e for the canonical deployer -> installer -> nix-agent bare-metal bootstrap path
  • the KVM publishable suite (fresh-smoke, fresh-demo-vm-webapp, fresh-matrix) for the nested-KVM multi-node VM-hosting path

Canonical bare-metal bootstrap proof:

nix run ./nix/test-cluster#cluster -- baremetal-iso
nix build .#checks.x86_64-linux.baremetal-iso-e2e
./result/bin/baremetal-iso-e2e ./work/baremetal-iso-e2e/latest

baremetal-iso-e2e now materializes the exact local-KVM proof runner instead of trying to boot QEMU inside a sandboxed nixbld build. That older build-time execution model degraded to TCG; the built runner keeps the canonical attr name but executes the same verify-baremetal-iso.sh harness as the direct QEMU proof, with host KVM and persistent logs under ./work.

The QEMU ISO proof is a stand-in for the real install route, not a separate workflow. Build nixosConfigurations.ultracloud-iso, boot it under KVM locally or write the same ISO to USB or BMC virtual media on hardware, and pass the same bootstrap inputs that the installer consumes in the harness: ultracloud.deployer_url=<scheme://host:port>, ultracloud.bootstrap_token=<token> for authenticated bootstrap or a lab-only deployer configured with allow_unauthenticated=true, optional ultracloud.ca_cert_url=<https://.../ca.crt>, optional ultracloud.binary_cache_url=<http://cache:8090>, and optional ultracloud.node_id= / ultracloud.hostname= overrides when DMI serials or DHCP names are not the desired identity.
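
Put together, a lab boot of the same ISO might append kernel parameters like the following; only the ultracloud.* parameter names come from the contract above, while the addresses, port numbers, token, and node identity are placeholders:

# appended to the ISO kernel command line (a single line; wrapped here for readability)
ultracloud.deployer_url=http://192.0.2.10:8443 ultracloud.bootstrap_token=REPLACE_WITH_TOKEN \
  ultracloud.ca_cert_url=https://192.0.2.10/ca.crt ultracloud.binary_cache_url=http://192.0.2.11:8090 \
  ultracloud.node_id=node07 ultracloud.hostname=node07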

The networking contract is the same in QEMU and on hardware: the live ISO needs DHCP or equivalent L3 reachability to deployer before Disko starts, and it needs reachability to the optional binary cache if you want it to pull prebuilt closures instead of compiling locally. The local QEMU proof relies on the 10.0.2.2 fallback addresses from user-mode NAT; real hardware should set ultracloud.deployer_url and, when used, ultracloud.binary_cache_url to routable control-plane endpoints. USB media and BMC virtual media are only transport differences for the same ISO and kernel parameters. For the local proof keep ./work or ULTRACLOUD_WORK_ROOT on a large disk; the checked-in wrappers force local builders and derive Nix parallelism from the host CPU count unless you override it explicitly.
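
For example, assuming the checked-in wrappers read ULTRACLOUD_WORK_ROOT from the environment, pointing the proof at a roomier disk looks like this (the path is a placeholder):

# keep proof artifacts on a large disk instead of the default ./work
ULTRACLOUD_WORK_ROOT=/data/ultracloud-work nix run ./nix/test-cluster#cluster -- baremetal-iso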

Canonical hardware preflight and handoff for the same path:

nix run ./nix/test-cluster#hardware-smoke -- preflight
nix run ./nix/test-cluster#hardware-smoke -- run
nix run ./nix/test-cluster#hardware-smoke -- capture

That wrapper keeps the QEMU proof and the physical-node proof on one contract by writing the exact kernel parameters, expected ULTRACLOUD_MARKER sequence, failure markers, and artifact root under ./work/hardware-smoke/latest.
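
After a preflight run the recorded inputs can be reviewed directly; kernel-params.txt is the only filename fixed above, so the directory listing is the authoritative view of the rest:

# inspect what the hardware-smoke preflight wrapper recorded
ls ./work/hardware-smoke/latest
cat ./work/hardware-smoke/latest/kernel-params.txt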

Canonical hardware handoff for that path:

  1. Build nixosConfigurations.ultracloud-iso plus the target role configs (baremetal-qemu-control-plane, baremetal-qemu-worker, or their hardware-specific successors) and expose deployer plus an optional HTTP Nix cache on addresses the installer can reach.
  2. Publish cluster state so that the reusable node class owns the install contract: install_plan.nixos_configuration, install_plan.disko_config_path, and preferably install_plan.target_disk_by_id (a shape sketch follows this list). Node entries should only bind identity, pool, and any desired-system override that truly differs per host. When you expose a binary cache, prefer setting desired_system.target_system to the prebuilt class-owned closure as well so post-install convergence does not rebuild a dirty local variant on each node.
  3. Boot the same ISO through USB or BMC virtual media and pass ultracloud.deployer_url=..., ultracloud.bootstrap_token=..., and, when used, ultracloud.binary_cache_url=... on the kernel command line.
  4. Watch the canonical marker sequence from the installer journal: pre-install.boot, pre-install.phone-home.complete, install.bundle-downloaded, install.disko.complete, install.nixos-install.complete, reboot, post-install.boot.
  5. Treat nix-agent reporting the desired system as active as the final convergence gate. The QEMU harness proves the same sequence, only with virtio disks and host-local endpoints standing in for the real chassis.
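
A hedged sketch of the class-owned state from step 2, written as JSON for illustration only; the dotted field names come from the contract above, while the envelope, nesting, values, and the way the document reaches deployer (file, API call, or generation from ultracloud.cluster) are assumptions:

# illustrative shape only; not a fixed deployer document format
cat > cluster-state.example.json <<'EOF'
{
  "classes": {
    "control-plane": {
      "install_plan": {
        "nixos_configuration": "baremetal-qemu-control-plane",
        "disko_config_path": "disko/control-plane.nix",
        "target_disk_by_id": "/dev/disk/by-id/virtio-uc-control-root"
      },
      "desired_system": { "target_system": "/nix/store/<prebuilt-class-owned-closure>" }
    }
  },
  "nodes": {
    "node07": { "class": "control-plane", "pool": "default" }
  }
}
EOF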

The checked-in QEMU proof now mirrors the disk-selection contract that hardware should use. Its node classes install by stable /dev/disk/by-id/virtio-uc-control-root and /dev/disk/by-id/virtio-uc-worker-root selectors, backed by explicit QEMU disk serials, while the ISO resolves the prebuilt Disko script and target system from the install profile name embedded into the ISO. Hardware should keep the same class/profile structure and swap only the disk selector, routable URLs, and physical media transport.
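
The by-id selectors come from the virtio disk serials; as a rough illustration of that mapping (image path is a placeholder, and machine, memory, and network flags are omitted), a QEMU disk attached with a matching serial appears under the stable selector inside the guest:

# illustrative mapping from a QEMU disk serial to the stable by-id selector used by the install plan
qemu-system-x86_64 \
  -drive file=control-root.qcow2,if=none,id=ucroot,format=qcow2 \
  -device virtio-blk-pci,drive=ucroot,serial=uc-control-root
# inside the guest, udev exposes this disk as /dev/disk/by-id/virtio-uc-control-root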

Canonical Profiles

UltraCloud now fixes the public support surface to three canonical profiles:

single-node dev
  • Canonical entrypoints: nix run .#single-node-quickstart, nix run .#single-node-trial, nix build .#single-node-trial-vm, nixosConfigurations.single-node-quickstart, and the companion install image nixosConfigurations.netboot-all-in-one
  • Required components: chainfire, flaredb, iam, plasmavmc, prismnet
  • Optional components: lightningstor, coronafs, flashdns, fiberlb, apigateway, nightlight, creditservice, k8shost

3-node HA control plane
  • Canonical entrypoints: nixosConfigurations.node01, nixosConfigurations.node02, nixosConfigurations.node03, and the companion install image nixosConfigurations.netboot-control-plane
  • Required components: chainfire, flaredb, iam, nix-agent on every control-plane node, plus deployer on the bootstrap node
  • Optional components: fleet-scheduler, node-agent, prismnet, flashdns, fiberlb, plasmavmc, lightningstor, coronafs, k8shost, apigateway, nightlight, creditservice

bare-metal bootstrap
  • Canonical entrypoints: nix run ./nix/test-cluster#cluster -- baremetal-iso, nixosConfigurations.ultracloud-iso, nixosConfigurations.baremetal-qemu-control-plane, nixosConfigurations.baremetal-qemu-worker, and checks.x86_64-linux.baremetal-iso-e2e
  • Required components: deployer, first-boot-automation, install-target, nix-agent
  • Optional components: node-agent, fleet-scheduler, and higher-level storage or edge services after bootstrap

nixosConfigurations.netboot-all-in-one and nixosConfigurations.netboot-control-plane are canonical companion images for the supported single-node dev and 3-node HA control plane profiles. packages.single-node-trial-vm is the low-friction trial artifact for the minimal VM-platform core. nixosConfigurations.netboot-worker, netboot-base, pxe-server, vm-smoke-target, and older launch flows under baremetal/vm-cluster are archived helpers or legacy/manual debugging paths outside the canonical profiles and their guard set.

Cluster Authoring

ultracloud.cluster backed by nix/lib/cluster-schema.nix is the only supported cluster authoring source. It is the canonical place to define nodes, reusable deployer classes and pools, rollout objects, service placement intent, and the generated per-node bootstrap metadata consumed by deployer, fleet-scheduler, nix-agent, and node-agent.

nix-nos is limited to legacy compatibility and low-level network primitives such as interfaces, VLANs, BGP, and static routing. It is not the canonical source for cluster topology, rollout intent, scheduler state, or bootstrap inventory.

Responsibility Boundaries

  • plasmavmc owns tenant VM lifecycle plus KVM worker registration. It can run against explicit remote IAM, PrismNet, and FlareDB endpoints, but it does not own machine enrollment, desired-system rollout, or host-native service placement.
  • k8shost owns Kubernetes-style pod and service APIs for tenant workloads, then translates them into prismnet, flashdns, and fiberlb objects. It does not place host-native cluster daemons, and its runtime dataplane helpers remain archived non-product.
  • fleet-scheduler owns placement and failover of host-native service instances from declarative cluster state derived from ultracloud.cluster. It consumes node-agent heartbeats and writes instance placement, but it does not expose tenant-facing Kubernetes semantics.
  • deployer owns machine enrollment, /api/v1/phone-home, install plans, cluster metadata, and desired-system references. The supported declarative input for that state is the JSON generated from ultracloud.cluster; it decides what a node should become, but it does not execute the host-local switch.
  • nix-agent owns host-local NixOS convergence only. It reads desired-system state from deployer or chainfire, activates the target closure, and rolls back on failed health checks.
  • node-agent owns host-local runtime execution only. It reports heartbeats and applies scheduled service-instance state, but it does not install the base OS or rewrite desired-system targets.

The single-node quickstart deliberately stops below that rollout stack: it ships only the VM-platform core plus optional add-ons, not deployer, nix-agent, node-agent, or fleet-scheduler.

Standalone Stories

  • single-node-trial-vm and single-node-quickstart are the standalone VM-platform story. They keep the minimal KVM-backed VM surface light and intentionally exclude deployer, nix-agent, fleet-scheduler, and node-agent.
  • deployer-vm-smoke, portable-control-plane-regressions, and baremetal-iso are the standalone rollout-stack story. They validate deployer -> nix-agent and deployer -> fleet-scheduler -> node-agent without requiring the full VM-hosting bundle.

Rollout Bundle Operations

The rollout-bundle operator contract is fixed in docs/rollout-bundle.md. As of 2026-04-10, deployer is scope-fixed to one active writer plus optional cold-standby restore that reuses the same ChainFire namespace, credentials, bootstrap bundle, and local state backup; automatic ChainFire-backed multi-instance failover is outside the supported product contract for this release.

The same operator doc also fixes the nix-agent health-check and rollback contract, the node-agent logs/secrets/volume/upgrade contract, and the fleet-scheduler supported upper limit: fleet-scheduler is scope-fixed to the two native-runtime worker lab with one planned drain cycle, one fail-stop worker-loss cycle, and 30-second held degraded states in rollout-soak. The canonical proofs are nix build .#checks.x86_64-linux.deployer-vm-rollback, nix build .#checks.x86_64-linux.fleet-scheduler-e2e, nix build .#checks.x86_64-linux.portable-control-plane-regressions, nix run ./nix/test-cluster#cluster -- fresh-smoke, nix run ./nix/test-cluster#cluster -- rollout-soak, and nix run ./nix/test-cluster#cluster -- durability-proof.

Main Entrypoints

Repository Guide

Scope

UltraCloud is centered on reproducible infrastructure behavior. Optional add-ons such as creditservice and k8shost remain part of the supported surface only when the documented scope, harness coverage, and public contract stay aligned with what the repository actually ships.

Host-level NixOS rollout validation is also expected to stay reproducible. baremetal-iso-e2e is now the materialized exact proof runner for the full install path, and canonical-profile-eval-guards plus canonical-profile-build-guards fail fast when supported outputs drift. supported-surface-guard now rejects unfinished public wording, shipped server API stubs, high-signal completeness markers such as TODO: or best-effort in the supported network or backend servers, and archived helper regressions such as worker netboot or backend scaffolds re-entering the default product surface. portable-control-plane-regressions remains the non-KVM developer lane that keeps the main control-plane and rollout boundaries green on TCG-only hosts before the publishable nested-KVM suite is rerun.