photoncloud-monorepo/nix/test-cluster/README.md

91 lines
4.3 KiB
Markdown

# PhotonCloud VM Test Cluster
`nix/test-cluster` is the canonical local validation path for PhotonCloud.
It boots six QEMU VMs, treats them as hardware-like nodes, and validates representative control-plane, worker, and gateway behavior over SSH and service endpoints.
All VM images are built on the host in a single Nix invocation and then booted as prebuilt artifacts. The guests do not compile the stack locally.
## What it validates
- 3-node control-plane formation for `chainfire`, `flaredb`, and `iam`
- control-plane service health for `prismnet`, `flashdns`, `fiberlb`, `plasmavmc`, `lightningstor`, and `k8shost`
- worker-node `plasmavmc` and `lightningstor` startup
- nested KVM inside worker VMs by booting an inner guest with `qemu-system-x86_64 -accel kvm`
- gateway-node `apigateway`, `nightlight`, and minimal `creditservice` startup
- host-forwarded access to the API gateway and NightLight HTTP surfaces
- cross-node data replication smoke tests for `chainfire` and `flaredb`
## Validation layers
- image build: build all six VM derivations on the host in one `nix build`
- boot and unit readiness: boot the nodes in dependency order and wait for SSH plus the expected `systemd` units
- protocol surfaces: probe the expected HTTP, TCP, UDP, and metrics endpoints for each role
- replicated state: write and read convergence checks across the 3-node `chainfire` and `flaredb` clusters
- worker virtualization: launch a nested KVM guest inside both worker VMs
- external entrypoints: verify host-forwarded API gateway and NightLight access from outside the guest
- auth-integrated minimal services: confirm `creditservice` stays up and actually connects to IAM
## Requirements
- minimal host requirements:
- Linux host with `/dev/kvm`
- nested virtualization enabled on the host hypervisor
- `nix`
- if you do not use `nix run` or `nix develop`, install:
- `qemu-system-x86_64`
- `ssh`
- `sshpass`
- `curl`
## Main commands
```bash
nix run ./nix/test-cluster#cluster -- build
nix run ./nix/test-cluster#cluster -- start
nix run ./nix/test-cluster#cluster -- smoke
nix run ./nix/test-cluster#cluster -- fresh-smoke
nix run ./nix/test-cluster#cluster -- matrix
nix run ./nix/test-cluster#cluster -- fresh-matrix
nix run ./nix/test-cluster#cluster -- bench-storage
nix run ./nix/test-cluster#cluster -- fresh-bench-storage
nix run ./nix/test-cluster#cluster -- validate
nix run ./nix/test-cluster#cluster -- status
nix run ./nix/test-cluster#cluster -- ssh node04
nix run ./nix/test-cluster#cluster -- stop
nix run ./nix/test-cluster#cluster -- clean
make cluster-smoke
```
Preferred entrypoint for publishable verification: `nix run ./nix/test-cluster#cluster -- fresh-smoke`
`make cluster-smoke` is a convenience wrapper for the same clean host-build VM validation flow.
`nix run ./nix/test-cluster#cluster -- matrix` reuses the current running cluster to exercise composed service scenarios such as `prismnet + flashdns + fiberlb`, VM hosting with `plasmavmc + coronafs + lightningstor`, and the Kubernetes-style hosting bundle.
Preferred entrypoint for publishable matrix verification: `nix run ./nix/test-cluster#cluster -- fresh-matrix`
`nix run ./nix/test-cluster#cluster -- bench-storage` benchmarks CoronaFS local-vs-shared-volume I/O, queued random-read behavior, cross-worker direct-I/O shared-volume reads, and LightningStor large/small-object S3 throughput and writes a report to `docs/storage-benchmarks.md`.
Preferred entrypoint for publishable storage numbers: `nix run ./nix/test-cluster#cluster -- fresh-storage-bench`
## Advanced usage
Use the script entrypoint only for local debugging inside a prepared Nix shell:
```bash
nix develop ./nix/test-cluster -c ./nix/test-cluster/run-cluster.sh smoke
```
For the strongest local check, use:
```bash
nix develop ./nix/test-cluster -c ./nix/test-cluster/run-cluster.sh fresh-smoke
```
## Runtime state
The harness stores build links and VM runtime state under `${PHOTON_VM_DIR:-$HOME/.photoncloud-test-cluster}` for the default profile and uses profile-suffixed siblings such as `${PHOTON_VM_DIR:-$HOME/.photoncloud-test-cluster}-storage` for alternate build profiles.
Logs for each VM are written to `<state-dir>/<node>/vm.log`.
## Scope note
This harness is intentionally VM-first. Older ad hoc launch scripts under `baremetal/vm-cluster` are legacy/manual paths and should not be treated as the primary local validation entrypoint.