# T036 VM Cluster Deployment - Configuration Guide
This document describes the node configurations prepared for the 3-node PlasmaCloud test cluster.
## Overview

**Goal:** Deploy and validate a 3-node PlasmaCloud cluster using T032 bare-metal provisioning tools in a VM environment.

**Deployment Profile:** Control-plane (all 8 PlasmaCloud services on each node)

**Cluster Mode:** Bootstrap (3-node Raft quorum initialization)
## Node Configurations

### Network Topology

| Node | IP | Hostname | MAC | Role |
|---|---|---|---|---|
| node01 | 192.168.100.11 | node01.plasma.local | 52:54:00:00:01:01 | control-plane |
| node02 | 192.168.100.12 | node02.plasma.local | 52:54:00:00:01:02 | control-plane |
| node03 | 192.168.100.13 | node03.plasma.local | 52:54:00:00:01:03 | control-plane |
**Network:** 192.168.100.0/24 (QEMU multicast socket: 230.0.0.1:1234)

**Gateway:** 192.168.100.1 (PXE server)
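
The static addressing in the table above can be mirrored on the deployment host for name resolution before FlashDNS is up. A minimal sketch, assuming `/etc/hosts`-style entries are acceptable in the test environment:

```shell
# Emit /etc/hosts entries for the three cluster nodes in the table above.
# (Sketch only; append the output to /etc/hosts on the deployment host.)
for i in 1 2 3; do
  printf '192.168.100.1%d node0%d.plasma.local node0%d\n' "$i" "$i" "$i"
done
```

This keeps the hostnames in the `nixos-anywhere` and validation commands below resolvable even when the cluster's own DNS is not yet serving.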
### Directory Structure

```text
T036-vm-cluster-deployment/
├── DEPLOYMENT.md                # this file
├── task.yaml
├── node01/
│   ├── configuration.nix        # NixOS system configuration
│   ├── disko.nix                # Disk partitioning layout
│   └── secrets/
│       ├── cluster-config.json  # Raft cluster configuration
│       ├── ca.crt               # [S3] CA certificate (to be added)
│       ├── node01.crt           # [S3] Node certificate (to be added)
│       ├── node01.key           # [S3] Node private key (to be added)
│       └── README.md            # Secrets documentation
├── node02/  (same structure)
└── node03/  (same structure)
```
## Configuration Details

### Control-Plane Services (Enabled on All Nodes)

- Chainfire - Distributed configuration (ports: 2379/2380/2381)
- FlareDB - KV database (ports: 2479/2480)
- IAM - Identity management (port: 8080)
- PlasmaVMC - VM control plane (port: 8081)
- PrismNET - SDN controller (port: 8082)
- FlashDNS - DNS server (port: 8053)
- FiberLB - Load balancer (port: 8084)
- LightningStor - Block storage (port: 8085)
- K8sHost - Kubernetes component (port: 8086)
### Disk Layout (disko.nix)

All nodes use an identical single-disk LVM layout:

- Device: `/dev/vda` (100GB QCOW2)
- Partitions:
  - ESP (boot): 512MB, FAT32, mounted at `/boot`
  - LVM Physical Volume: remaining space (~99.5GB)
- LVM Volume Group: `pool`
  - `root` LV: 80GB, ext4, mounted at `/`
  - `data` LV: ~19.5GB, ext4, mounted at `/var/lib`
### Cluster Configuration (cluster-config.json)

All nodes are configured for bootstrap mode (3-node simultaneous initialization):

```json
{
  "bootstrap": true,
  "initial_peers": ["node01:2380", "node02:2380", "node03:2380"],
  "flaredb_peers": ["node01:2480", "node02:2480", "node03:2480"]
}
```
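
A quick pre-deployment sanity check on this file can catch a node accidentally left out of bootstrap mode. A sketch, assuming the check runs on the deployment host (the temp-file copy here only makes the example self-contained; in practice point `cfg` at `node0N/secrets/cluster-config.json`):

```shell
# Write a sample cluster-config.json, then verify it is set for bootstrap
# mode and lists all three Raft peers.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
{
  "bootstrap": true,
  "initial_peers": ["node01:2380", "node02:2380", "node03:2380"],
  "flaredb_peers": ["node01:2480", "node02:2480", "node03:2480"]
}
EOF

grep -q '"bootstrap": true' "$cfg" && echo "bootstrap: OK"

# Count distinct Raft peer entries; bootstrap requires all three.
peers=$(grep -o 'node0[123]:2380' "$cfg" | sort -u | wc -l)
[ "$peers" -eq 3 ] && echo "peers: OK"
```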
**Key Points:**

- All 3 nodes have `bootstrap: true` (Raft bootstrap cluster)
- `leader_url` points to node01 (first node) for reference
- `initial_peers` is identical on all nodes (required for bootstrap)
- First-boot automation will initialize the cluster automatically
## First-Boot Automation

Enabled on all nodes via `services.first-boot-automation`:

1. Wait for local service health (Chainfire, FlareDB, IAM)
2. Detect bootstrap mode (`bootstrap: true`)
3. Skip cluster join (bootstrap nodes auto-form the cluster via `initial_peers`)
4. Create marker files (`.chainfire-initialized`, `.flaredb-initialized`)
5. Run health checks
**Expected Behavior:**

- All 3 nodes start simultaneously
- Raft consensus auto-elects leader
- Cluster operational within 30-60 seconds
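
The 30-60 second window above can be checked mechanically rather than by eyeballing logs. A hedged sketch of a polling helper, assuming the `/health` endpoint mentioned in Troubleshooting answers with HTTP 2xx once the node is up:

```shell
# Poll a node's Chainfire health endpoint until it answers or 60s elapse.
# Requires bash (uses the SECONDS builtin).
wait_healthy() {
  local host=$1 deadline=$((SECONDS + 60))
  while [ "$SECONDS" -lt "$deadline" ]; do
    if curl -ksf "https://$host:2379/health" >/dev/null 2>&1; then
      echo "$host healthy"
      return 0
    fi
    sleep 2
  done
  echo "$host did not become healthy within 60s" >&2
  return 1
}

# Usage: wait_healthy 192.168.100.11
```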
## Next Steps (After S4)

### S3: TLS Certificate Generation (PeerA)

Generate certificates and copy them to each node's `secrets/` directory:

```shell
# Generate CA and node certificates (see T032 QUICKSTART)
cd /home/centra/cloud/baremetal/tls
./generate-ca.sh
./generate-node-cert.sh node01.plasma.local 192.168.100.11
./generate-node-cert.sh node02.plasma.local 192.168.100.12
./generate-node-cert.sh node03.plasma.local 192.168.100.13

# Copy to node configuration directories
cp ca.crt docs/por/T036-vm-cluster-deployment/node01/secrets/
cp node01.crt node01.key docs/por/T036-vm-cluster-deployment/node01/secrets/
# Repeat for node02 and node03
```
### S5: Cluster Provisioning (PeerA + PeerB)

Deploy using nixos-anywhere:

```shell
cd /home/centra/cloud

# Start VMs (S1 - already done by PeerA)
# VMs should be running and accessible via the PXE network

# Deploy all 3 nodes in parallel
for node in node01 node02 node03; do
  nixos-anywhere --flake docs/por/T036-vm-cluster-deployment/$node \
    root@$node.plasma.local &
done
wait

# Monitor first-boot logs
ssh root@node01.plasma.local 'journalctl -u chainfire-cluster-join.service -f'
```
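
One caveat with the parallel loop above: a bare `wait` discards individual exit codes, so a single failed deploy can go unnoticed. A sketch of per-node status tracking, with a hypothetical `deploy_node` standing in for the `nixos-anywhere` invocation:

```shell
# Track each background deploy's exit status instead of using a bare `wait`.
deploy_node() { echo "deploying $1"; }   # stand-in for nixos-anywhere

nodes=(node01 node02 node03)
pids=()
for node in "${nodes[@]}"; do
  deploy_node "$node" &
  pids+=($!)
done

fail=0
for i in "${!nodes[@]}"; do
  wait "${pids[$i]}" || { echo "${nodes[$i]} failed" >&2; fail=1; }
done
echo "overall: $([ "$fail" -eq 0 ] && echo ok || echo failed)"
```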
### S6: Cluster Validation (Both)

Verify cluster health:

```shell
# Check Chainfire cluster
curl -k https://192.168.100.11:2379/admin/cluster/members | jq
# Expected: 3 members, all healthy, leader elected

# Check FlareDB cluster
curl -k https://192.168.100.11:2479/admin/cluster/members | jq

# Test CRUD operations
curl -k -X PUT https://192.168.100.11:2479/api/v1/kv/test-key \
  -H "Content-Type: application/json" \
  -d '{"value": "hello-cluster"}'
curl -k https://192.168.100.11:2479/api/v1/kv/test-key

# Verify data replicated to all nodes
curl -k https://192.168.100.12:2479/api/v1/kv/test-key
curl -k https://192.168.100.13:2479/api/v1/kv/test-key
```
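
The three replication GETs above can be collapsed into a loop that reports per-node results. A sketch, assuming the KV GET returns the stored value in its response body:

```shell
# Check that test-key replicated to every node; report each result.
# --connect-timeout keeps the check from hanging on an unreachable node.
check_key() { curl -ks --connect-timeout 2 "https://$1:2479/api/v1/kv/test-key"; }

for ip in 192.168.100.11 192.168.100.12 192.168.100.13; do
  if check_key "$ip" | grep -q 'hello-cluster'; then
    echo "$ip: replicated"
  else
    echo "$ip: MISSING" >&2
  fi
done
```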
## Coordination with PeerA

**PeerA Status (from S1):**

- ✅ VM infrastructure created (QEMU multicast socket)
- ✅ Disk images created (node01/02/03.qcow2, pxe-server.qcow2)
- ✅ Launch scripts ready
- ⏳ S2 (PXE Server) - Waiting on Full PXE decision (Foreman MID: 000620)
- ⏳ S3 (TLS Certs) - Pending
**PeerB Status (S4):**

- ✅ Node configurations complete (configuration.nix, disko.nix)
- ✅ Cluster configs ready (cluster-config.json)
- ✅ TLS directory structure prepared
- ⏳ Awaiting S3 certificates from PeerA
**Dependency Flow:**

```text
S1 (VMs) → S2 (PXE) → S3 (TLS) → S4 (Configs) → S5 (Provision) → S6 (Validate)
 PeerA      PeerA      PeerA       PeerB          Both             Both
```
## Configuration Files Reference

### configuration.nix

- Imports: `hardware-configuration.nix`, `disko.nix`, `nix/modules/default.nix`
- Network: static IP, hostname, firewall rules
- Services: All control-plane services enabled
- First-boot: Enabled with cluster-config.json
- SSH: Key-based authentication only
- System packages: vim, htop, curl, jq, tcpdump, etc.
### disko.nix

- Based on the disko project format
- Declarative disk partitioning
- Executed by nixos-anywhere during provisioning
- Creates: EFI boot partition + LVM (root + data)

### cluster-config.json

- Read by the first-boot-automation systemd services
- Defines: node identity, Raft peers, bootstrap mode
- Deployed to: `/etc/nixos/secrets/cluster-config.json`
## Troubleshooting

### If Provisioning Fails

1. Check VM network connectivity: `ping 192.168.100.11`
2. Verify the PXE server is serving netboot images (S2)
3. Check that TLS certificates exist in the `secrets/` directories (S3)
4. Review nixos-anywhere logs
5. Check disko.nix syntax: `nix eval --json -f disko.nix`
### If Cluster Join Fails

1. SSH to the node: `ssh root@192.168.100.11`
2. Check service status: `systemctl status chainfire.service`
3. View first-boot logs: `journalctl -u chainfire-cluster-join.service`
4. Verify cluster-config.json: `jq . /etc/nixos/secrets/cluster-config.json`
5. Test the health endpoint: `curl -k https://localhost:2379/health`
### If Cluster Not Forming

1. Verify all 3 nodes started simultaneously (bootstrap requirement)
2. Check that `initial_peers` matches on all nodes
3. Check network connectivity between nodes: `ping 192.168.100.12`
4. Check that the firewall allows the Raft ports (2380, 2480)
5. Review Chainfire logs: `journalctl -u chainfire.service`
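
For step 4 (firewall/Raft ports), TCP reachability can be tested without installing `nc`, using bash's `/dev/tcp` pseudo-device. A sketch, assuming bash and coreutils `timeout` are present on the node:

```shell
# Probe a TCP port; bash's /dev/tcp opens a real connection on redirect.
# timeout bounds the wait so a filtered port fails fast.
check_port() {
  if timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "$1:$2 open"
  else
    echo "$1:$2 closed"
  fi
}

# e.g. from node01: check the Raft ports on node02
check_port 192.168.100.12 2380
check_port 192.168.100.12 2480
```

If a port reports closed while the service is running, re-check the firewall rules in configuration.nix before digging into Raft itself.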
## Documentation References

- T032 Bare-Metal Provisioning: `/home/centra/cloud/docs/por/T032-baremetal-provisioning/`
- First-Boot Automation: `/home/centra/cloud/baremetal/first-boot/README.md`
- Image Builder: `/home/centra/cloud/baremetal/image-builder/README.md`
- VM Cluster Setup: `/home/centra/cloud/baremetal/vm-cluster/README.md`
- NixOS Modules: `/home/centra/cloud/nix/modules/`
## Notes

- **Bootstrap vs Join:** All 3 nodes use bootstrap mode (simultaneous start). Additional nodes would use `bootstrap: false` and join via `leader_url`.
- **PXE vs Direct:** The Foreman decision (MID: 000620) confirms Full PXE validation. S2 will build and deploy the netboot artifacts.
- **Hardware Config:** `hardware-configuration.nix` will be auto-generated by nixos-anywhere during provisioning.
- **SSH Keys:** The placeholder key in configuration.nix will be replaced with the actual provisioning key during nixos-anywhere deployment.
## Success Criteria (T036 Acceptance)

- ✅ 3 VMs deployed with QEMU
- ✅ Virtual network configured (multicast socket)
- ⏳ PXE server operational (S2)
- ⏳ All 3 nodes provisioned via nixos-anywhere (S5)
- ⏳ Chainfire + FlareDB Raft clusters formed (S6)
- ⏳ IAM service operational on all nodes (S6)
- ⏳ Health checks passing (S6)
- ⏳ T032 RUNBOOK validated end-to-end (S6)
**S4 Status:** COMPLETE (node configs ready for S5)

**Next:** Awaiting S3 (TLS Certs) + S2 (PXE Server) from PeerA