PhotonCloud VM Test Cluster

nix/test-cluster is the canonical local validation path for PhotonCloud. It boots six QEMU VMs, treats them as hardware-like nodes, and validates representative control-plane, worker, and gateway behavior over SSH and service endpoints. All VM images are built on the host in a single Nix invocation and then booted as prebuilt artifacts. The guests do not compile the stack locally.

What it validates

  • 3-node control-plane formation for chainfire, flaredb, and iam
  • control-plane service health for prismnet, flashdns, fiberlb, plasmavmc, lightningstor, and k8shost
  • worker-node plasmavmc and lightningstor startup
  • PrismNet port binding for PlasmaVMC guests, including lifecycle cleanup on VM deletion
  • nested KVM inside worker VMs by booting an inner guest with qemu-system-x86_64 -accel kvm
  • gateway-node apigateway, nightlight, and minimal creditservice startup
  • host-forwarded access to the API gateway and NightLight HTTP surfaces
  • cross-node data replication smoke tests for chainfire and flaredb
  • deployer-seeded native runtime scheduling from declarative Nix service definitions, including drain/failover recovery
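The nested-KVM item above boots an inner guest from inside a worker VM with qemu-system-x86_64 -accel kvm. A minimal sketch of how such an inner-guest launch command can be assembled (only the -accel kvm flag comes from this document; the helper name and the remaining flags are illustrative, not the harness's actual invocation):

```shell
#!/bin/sh
# Hypothetical helper: assemble the qemu command used to boot an inner
# guest with KVM acceleration inside a worker VM. Only `-accel kvm` is
# taken from the harness description; the other flags are illustrative.
inner_guest_cmd() {
  image="$1"        # path to the inner guest disk image
  memory="${2:-512}"
  printf 'qemu-system-x86_64 -accel kvm -m %s -nographic -drive file=%s,format=qcow2' \
    "$memory" "$image"
}

# A worker-side check would also confirm /dev/kvm is exposed to the VM:
has_kvm() { [ -e /dev/kvm ]; }

inner_guest_cmd guest.qcow2
```

The point of the check is that /dev/kvm must be visible inside the worker VM, which only holds when nested virtualization is enabled on the host.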

Validation layers

  • image build: build all six VM derivations on the host in one nix build
  • boot and unit readiness: boot the nodes in dependency order and wait for SSH plus the expected systemd units
  • protocol surfaces: probe the expected HTTP, TCP, UDP, and metrics endpoints for each role
  • replicated state: write and read convergence checks across the 3-node chainfire and flaredb clusters
  • worker virtualization: launch a nested KVM guest inside both worker VMs
  • external entrypoints: verify host-forwarded API gateway and NightLight access from outside the guest
  • auth-integrated minimal services: confirm creditservice stays up and verifiably connects to IAM
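The boot-and-unit-readiness layer amounts to polling probes until they pass. A generic retry loop in that spirit (the function and the example probe commands are illustrative, not the harness's actual code; node and unit names are assumptions):

```shell
#!/bin/sh
# Generic retry loop: run a probe command until it succeeds or the
# attempt budget is exhausted. Illustrative sketch, not harness code.
wait_for() {
  probe="$1" attempts="$2" delay="$3"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if eval "$probe"; then return 0; fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example probes in the spirit of the readiness layer (hypothetical names):
#   wait_for "ssh node01 true" 60 5
#   wait_for "ssh node01 systemctl is-active --quiet chainfire" 60 5
```

Booting in dependency order means each node's probes only start once the nodes it depends on have already passed theirs.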

Requirements

  • minimal host requirements:
    • Linux host with /dev/kvm
    • nested virtualization enabled on the host hypervisor
    • nix
  • if you do not use nix run or nix develop, install:
    • qemu-system-x86_64
    • ssh
    • sshpass
    • curl

Main commands

nix run ./nix/test-cluster#cluster -- build
nix run ./nix/test-cluster#cluster -- start
nix run ./nix/test-cluster#cluster -- smoke
nix run ./nix/test-cluster#cluster -- fresh-smoke
nix run ./nix/test-cluster#cluster -- matrix
nix run ./nix/test-cluster#cluster -- fresh-matrix
nix run ./nix/test-cluster#cluster -- bench-storage
nix run ./nix/test-cluster#cluster -- fresh-bench-storage
nix run ./nix/test-cluster#cluster -- validate
nix run ./nix/test-cluster#cluster -- status
nix run ./nix/test-cluster#cluster -- ssh node04
nix run ./nix/test-cluster#cluster -- stop
nix run ./nix/test-cluster#cluster -- clean
make cluster-smoke

Preferred entrypoint for publishable verification: nix run ./nix/test-cluster#cluster -- fresh-smoke

make cluster-smoke is a convenience wrapper around the same clean, host-built VM validation flow.

nix run ./nix/test-cluster#cluster -- matrix reuses the current running cluster to exercise composed service scenarios such as prismnet + flashdns + fiberlb, PrismNet-backed VM hosting with plasmavmc + prismnet + coronafs + lightningstor, the Kubernetes-style hosting bundle, and API-gateway-mediated nightlight / creditservice flows.

Preferred entrypoint for publishable matrix verification: nix run ./nix/test-cluster#cluster -- fresh-matrix

nix run ./nix/test-cluster#cluster -- bench-storage benchmarks CoronaFS controller-export vs node-local-export I/O, worker-side materialization latency, and LightningStor large/small-object S3 throughput, then writes a report to docs/storage-benchmarks.md.

Preferred entrypoint for publishable storage numbers: nix run ./nix/test-cluster#cluster -- fresh-bench-storage

nix run ./nix/test-cluster#cluster -- bench-coronafs-local-matrix runs the local single-process CoronaFS export benchmark across the supported cache/aio combinations so software-path regressions can be separated from VM-lab network limits. On the current lab hosts, cache=none with aio=io_uring is the strongest local-export profile and should be treated as the reference point when CoronaFS remote numbers are being distorted by the nested-QEMU/VDE network path.
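The local matrix sweeps cache/aio combinations for the export path. The enumeration can be sketched as below; the cache=/aio= option names follow QEMU's -drive syntax, but the exact combination set the harness runs is defined in the repo, so treat this list as illustrative:

```shell
#!/bin/sh
# Enumerate cache/aio drive-option combinations in the spirit of the
# local CoronaFS export matrix. QEMU's aio=native requires O_DIRECT
# (cache=none or directsync); the harness's actual set may differ.
coronafs_matrix() {
  for cache in none writeback; do
    for aio in threads io_uring native; do
      # skip the invalid pairing: aio=native without O_DIRECT caching
      [ "$aio" = native ] && [ "$cache" != none ] && continue
      echo "cache=${cache},aio=${aio}"
    done
  done
}

coronafs_matrix
```

The cache=none,aio=io_uring row in this enumeration is the profile the document singles out as the local-export reference point.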

Advanced usage

Use the script entrypoint only for local debugging inside a prepared Nix shell:

nix develop ./nix/test-cluster -c ./nix/test-cluster/run-cluster.sh smoke

For the strongest local check, use:

nix develop ./nix/test-cluster -c ./nix/test-cluster/run-cluster.sh fresh-smoke

Runtime state

The harness stores build links and VM runtime state under ${PHOTON_VM_DIR:-$HOME/.photoncloud-test-cluster} for the default profile and uses profile-suffixed siblings such as ${PHOTON_VM_DIR:-$HOME/.photoncloud-test-cluster}-storage for alternate build profiles. Logs for each VM are written to <state-dir>/<node>/vm.log.
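The state-directory layout described above can be expressed as a small helper; this is an illustrative restatement, and run-cluster.sh's actual variable handling may differ:

```shell
#!/bin/sh
# Resolve the harness state directory for a given build profile,
# mirroring the layout described above. Illustrative, not harness code.
state_dir() {
  profile="$1"
  base="${PHOTON_VM_DIR:-$HOME/.photoncloud-test-cluster}"
  if [ -n "$profile" ]; then
    echo "${base}-${profile}"
  else
    echo "$base"
  fi
}

# Per-VM logs then live at: "$(state_dir)/<node>/vm.log"
state_dir
state_dir storage
```

Setting PHOTON_VM_DIR relocates the default profile and every profile-suffixed sibling together, since the suffix is appended to the same base path.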

Scope note

This harness is intentionally VM-first. Older ad hoc launch scripts under baremetal/vm-cluster are legacy/manual paths and should not be treated as the primary local validation entrypoint.