# PlasmaVMC Nested KVM & App Validation (Draft)
## Nested KVM quick check
1) On host: `cat /sys/module/kvm_intel/parameters/nested` (or `kvm_amd` on AMD). Expect `Y` (Intel) or `1` (AMD) when enabled, `N` or `0` when disabled.
2) If disabled, add to the NixOS configuration (Intel example):
```
boot.kernelModules = [ "kvm-intel" ];
boot.extraModprobeConfig = ''
options kvm-intel nested=1
'';
```
For AMD, use `kvm-amd` and `options kvm-amd nested=1`.
3) Rebuild (`nixos-rebuild switch`), reboot once, then repeat step 1.
4) Inside a guest VM: prove nesting with a minimal KVM launch:
```
qemu-system-x86_64 -accel kvm -cpu host -m 512 -nographic \
-kernel /run/current-system/kernel -append "console=ttyS0" < /dev/null
```
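If kernel messages appear on the serial console, nesting works. With no initrd or root disk attached, a final panic about a missing root filesystem is expected.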
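
The checks in steps 1 and 4 can be scripted. The following is a minimal sketch, not part of the PlasmaVMC tooling: the function names `nested_status` and `has_virt_flag` are hypothetical, and the only inputs assumed are the sysfs parameter file from step 1 and a `/proc/cpuinfo`-style file.

```sh
#!/bin/sh
# nested_status FILE: translate a kvm module's `nested` parameter file
# into a word. kvm_intel reports Y/N; kvm_amd reports 1/0.
nested_status() {
    if [ ! -r "$1" ]; then
        # Module not loaded (or wrong vendor): the file does not exist.
        echo "missing"
        return 0
    fi
    case "$(cat "$1")" in
        Y|y|1) echo "enabled" ;;
        N|n|0) echo "disabled" ;;
        *)     echo "unknown" ;;
    esac
}

# has_virt_flag CPUINFO: succeed if the CPU flags line advertises
# vmx (Intel) or svm (AMD). Inside a guest this shows whether the
# host passed the virtualization extension through.
has_virt_flag() {
    grep '^flags' "$1" | grep -Eqw 'vmx|svm'
}
```

On the host, `nested_status /sys/module/kvm_intel/parameters/nested` covers step 1. Inside the guest, `has_virt_flag /proc/cpuinfo && [ -e /dev/kvm ]` is a cheap precheck before the QEMU launch in step 4.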
## App scenario (lightweight)
- Topology: 2x app VMs on PrismNET, FiberLB front, FlashDNS record -> LB VIP.
- Data: FlareDB SQL (guestbook-style) for metadata; ChainFire backs control-plane metadata.
- Controls: CreditService Admission Control enforced on VM create (low quota); NightLight metrics exported.
### Steps
1) Provision: create 2 VMs via PlasmaVMC API; attach PrismNET network; ensure watcher persists VM metadata to FlareDB.
2) Configure: deploy small web app on each VM that writes/reads FlareDB SQL; register DNS record in FlashDNS pointing to FiberLB listener.
3) Gate: set a low wallet balance; attempt VM create/update to confirm the CAS-based (compare-and-swap) debit and its rollback on failure.
4) Observe: ensure NightLight scrapes app + system metrics; add alerts for latency > target and billing failures.
5) Failover drills:
   - Kill one app VM: FiberLB should reroute; CreditService must not double-charge retries.
   - Restart PlasmaVMC node: watcher should replay state from FlareDB/ChainFire; VM lifecycle ops continue.
6) Exit criteria: steps 1-5 pass 5 times in a row; NightLight shows zero SLO violations; CreditService balances are consistent before and after the drills.
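
The "pass 5 times in a row" criterion can be enforced with a small wrapper. This is a sketch under assumptions: `run_consecutive` is a hypothetical helper, and the command it wraps (for example a drill script) is whatever the team uses to run steps 1-5 end to end.

```sh
#!/bin/sh
# run_consecutive N CMD...: run CMD up to N times and stop at the
# first failure, so a single flaky run fails the whole criterion.
run_consecutive() {
    n=$1
    shift
    i=1
    while [ "$i" -le "$n" ]; do
        if ! "$@"; then
            echo "run $i of $n failed" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $n runs passed"
}
```

For example, `run_consecutive 5 ./drill.sh` would exit non-zero as soon as any run fails, where `./drill.sh` is a placeholder name for the scenario runner.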
## Notes
- Full disk HA is not covered; disk replication would need distributed block storage (future).
- Keep tests env-gated (ignored by default) so CI doesn't require nested virt.
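
One way to env-gate the nested tests from a wrapper script is sketched below. The variable names `NESTED_KVM_TESTS` and `KVM_DEVICE` are assumptions made for illustration, not names taken from the repository.

```sh
#!/bin/sh
# should_run_nested_tests: succeed only when the opt-in variable is
# set to 1 AND a KVM device node is present. Both names are
# hypothetical; KVM_DEVICE defaults to /dev/kvm.
should_run_nested_tests() {
    kvm_dev=${KVM_DEVICE:-/dev/kvm}
    [ "${NESTED_KVM_TESTS:-0}" = "1" ] && [ -e "$kvm_dev" ]
}
```

A test entry point could start with `should_run_nested_tests || { echo "skipping nested KVM tests"; exit 0; }`, so CI runners without nested virtualization skip the suite instead of failing it.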