T032 Bare-Metal Provisioning Design Document
Status: Draft
Author: peerB
Created: 2025-12-10
Last Updated: 2025-12-10
1. Architecture Overview
This document outlines the design for automated bare-metal provisioning of the PlasmaCloud platform, which consists of 8 core services (Chainfire, FlareDB, IAM, PlasmaVMC, PrismNET, FlashDNS, FiberLB, and K8sHost). The provisioning system leverages NixOS's declarative configuration capabilities to enable fully automated deployment from bare hardware to a running, clustered platform.
The high-level flow follows this sequence: PXE Boot → kexec NixOS Installer → disko Disk Partitioning → nixos-anywhere Installation → First-Boot Configuration → Running Cluster. A bare-metal server performs a network boot via PXE/iPXE, which loads a minimal NixOS installer into RAM using kexec. The installer then connects to a provisioning server, which uses nixos-anywhere to declaratively partition disks (via disko), install NixOS with pre-configured services, and inject node-specific configuration (SSH keys, network settings, cluster join parameters, TLS certificates). On first boot, the system automatically joins existing Raft clusters (Chainfire/FlareDB) or bootstraps new ones, and all 8 services start with proper dependencies and TLS enabled.
The key components are:
- PXE/iPXE Boot Server: Serves boot binaries and configuration scripts via TFTP/HTTP
- nixos-anywhere: SSH-based remote installation tool that orchestrates the entire deployment
- disko: Declarative disk partitioning engine integrated with nixos-anywhere
- kexec: Linux kernel feature enabling fast boot into NixOS installer without full reboot
- NixOS Flake (from T024): Provides all service packages and NixOS modules
- Configuration Injection System: Manages node-specific secrets, network config, and cluster metadata
- First-Boot Automation: Systemd units that perform cluster join and service initialization
2. PXE Boot Flow
2.1 Boot Sequence
┌─────────────┐
│ Bare Metal │
│ Server │
└──────┬──────┘
│ 1. UEFI/BIOS PXE ROM
▼
┌──────────────┐
│ DHCP Server │ Option 93: Client Architecture (0=BIOS, 7=UEFI x64)
│ │ Option 67: Boot filename (undionly.kpxe or ipxe.efi)
│ │ Option 66: TFTP server address
└──────┬───────┘
│ 2. DHCP OFFER with boot parameters
▼
┌──────────────┐
│ TFTP/HTTP │
│ Server │ Serves: undionly.kpxe (BIOS) or ipxe.efi (UEFI)
└──────┬───────┘
│ 3. Download iPXE bootloader
▼
┌──────────────┐
│ iPXE Running │ User-class="iPXE" in DHCP request
│ (in RAM) │
└──────┬───────┘
│ 4. Second DHCP request (now with iPXE user-class)
▼
┌──────────────┐
│ DHCP Server │ Detects user-class="iPXE"
│ │ Option 67: http://boot.server/boot.ipxe
└──────┬───────┘
│ 5. DHCP OFFER with script URL
▼
┌──────────────┐
│ HTTP Server │ Serves: boot.ipxe (iPXE script)
└──────┬───────┘
│ 6. Download and execute boot script
▼
┌──────────────┐
│ iPXE Script │ Loads: NixOS kernel + initrd + kexec
│ Execution │
└──────┬───────┘
│ 7. kexec into NixOS installer
▼
┌──────────────┐
│ NixOS Live │ SSH enabled, waiting for nixos-anywhere
│ Installer │
└──────────────┘
2.2 DHCP Configuration Requirements
The DHCP server must support architecture-specific boot file selection and iPXE user-class detection. For ISC DHCP server (/etc/dhcp/dhcpd.conf):
# Architecture detection (RFC 4578)
option architecture-type code 93 = unsigned integer 16;
# iPXE detection
option user-class code 77 = string;
subnet 10.0.0.0 netmask 255.255.255.0 {
range 10.0.0.100 10.0.0.200;
option routers 10.0.0.1;
option domain-name-servers 10.0.0.1;
# Boot server
next-server 10.0.0.2; # TFTP/HTTP server IP
# Chainloading logic
if exists user-class and option user-class = "iPXE" {
# iPXE is already loaded, provide boot script via HTTP
filename "http://10.0.0.2:8080/boot.ipxe";
} elsif option architecture-type = 00:00 {
# BIOS (legacy) - load iPXE via TFTP
filename "undionly.kpxe";
} elsif option architecture-type = 00:07 {
# UEFI x86_64 - load iPXE via TFTP
filename "ipxe.efi";
} elsif option architecture-type = 00:09 {
# UEFI x86_64 (alternate) - load iPXE via TFTP
filename "ipxe.efi";
} else {
# Fallback
filename "ipxe.efi";
}
}
Key Points:
- Option 93 (architecture-type): Distinguishes BIOS (0x0000) vs UEFI (0x0007/0x0009)
- Option 66 / next-server field: TFTP server address for initial boot files
- Option 67 (filename): Boot file name, changes based on architecture and iPXE presence
- User-class detection: Prevents infinite loop (iPXE downloading itself)
- HTTP chainloading: After iPXE loads, switch to HTTP for faster downloads
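For sites running dnsmasq instead of ISC dhcpd, the same chainloading logic can be sketched as follows. This is an assumption-laden sketch, not a tested config: the tag names are arbitrary, and the addresses mirror the example above.

```
# dnsmasq equivalent (sketch; requires a dnsmasq with built-in TFTP)
enable-tftp
tftp-root=/srv/tftp
dhcp-range=10.0.0.100,10.0.0.200,12h
dhcp-option=option:router,10.0.0.1

# Tag requests already running iPXE (Option 77 user-class)
dhcp-userclass=set:ipxe,iPXE
# Tag UEFI x86_64 clients (Option 93 architecture type 7)
dhcp-match=set:efi64,option:client-arch,7

# Chainload order: iPXE gets the HTTP script, everyone else gets a binary
dhcp-boot=tag:ipxe,http://10.0.0.2:8080/boot.ipxe
dhcp-boot=tag:efi64,tag:!ipxe,ipxe.efi
dhcp-boot=tag:!efi64,tag:!ipxe,undionly.kpxe
```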
2.3 iPXE Script Structure
The boot script (/srv/boot/boot.ipxe) provides a menu for deployment profiles:
#!ipxe
# Variables
set boot-server 10.0.0.2:8080
set nix-cache http://${boot-server}/nix-cache
# Display system info
echo System information:
echo - Platform: ${platform}
echo - Architecture: ${buildarch}
echo - MAC: ${net0/mac}
echo - IP: ${net0/ip}
echo
# Menu with timeout
:menu
menu PlasmaCloud Bare-Metal Provisioning
item --gap -- ──────────── Deployment Profiles ────────────
item control-plane Install Control Plane Node (Chainfire + FlareDB + IAM)
item worker Install Worker Node (PlasmaVMC + PrismNET + Storage)
item all-in-one Install All-in-One (All 8 Services)
item shell Boot to NixOS Installer Shell
item --gap -- ─────────────────────────────────────────────
item --key r reboot Reboot System
choose --timeout 30000 --default all-in-one target || goto menu
# Execute selection
goto ${target}
:control-plane
echo Booting Control Plane installer...
set profile control-plane
goto boot
:worker
echo Booting Worker Node installer...
set profile worker
goto boot
:all-in-one
echo Booting All-in-One installer...
set profile all-in-one
goto boot
:shell
echo Booting to installer shell...
set profile shell
goto boot
:boot
# Load NixOS netboot artifacts (from nixos-images or custom build)
kernel http://${boot-server}/nixos/bzImage init=/nix/store/...-nixos-system/init loglevel=4 console=ttyS0 console=tty0 nixos.profile=${profile}
initrd http://${boot-server}/nixos/initrd
boot
:reboot
reboot
:failed
echo Boot failed, dropping to shell...
sleep 10
shell
Features:
- Multi-profile support: Different service combinations per node type
- Hardware detection: Shows MAC/IP for inventory tracking
- Timeout with default: Unattended deployment after 30 seconds
- Kernel parameters: Pass profile to NixOS installer for conditional configuration
- Error handling: Falls back to shell on failure
2.4 HTTP vs TFTP Trade-offs
| Aspect | TFTP | HTTP |
|---|---|---|
| Speed | ~1-5 MB/s (UDP, no windowing) | ~50-100+ MB/s (TCP with pipelining) |
| Reliability | Low (UDP, prone to timeouts) | High (TCP with retries) |
| Firmware Support | Universal (all PXE ROMs) | UEFI 2.5+ only (HTTP Boot) |
| Complexity | Simple protocol, minimal config | Requires web server (nginx/apache) |
| Use Case | Initial iPXE binary (~100KB) | Kernel/initrd/images (~100-500MB) |
Recommended Hybrid Approach:
- TFTP for initial iPXE binary delivery (universal compatibility)
- HTTP for all subsequent artifacts (kernel, initrd, scripts, packages)
- Configure iPXE with embedded HTTP support
- NixOS netboot images served via HTTP with range request support for resumability
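The HTTP side of the hybrid setup could be served by an nginx vhost along these lines (a sketch; the /srv/boot root and port 8080 are assumptions carried over from the examples above; nginx serves byte-range requests for static files by default, which is what gives iPXE resumability):

```nginx
# Sketch: nginx vhost for netboot artifacts (paths are assumptions)
server {
    listen 8080;
    root /srv/boot;

    # Kernel and initrd downloads (range requests supported by default)
    location /nixos/ {
        autoindex off;
    }

    # iPXE boot script
    location = /boot.ipxe {
        default_type text/plain;
    }
}
```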
UEFI HTTP Boot Alternative: For pure UEFI environments, skip TFTP entirely by using DHCP Option 60 (Vendor Class = "HTTPClient") and Option 67 (HTTP URI). However, this lacks BIOS compatibility and requires newer firmware (2015+).
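For reference, the ISC dhcpd side of UEFI HTTP Boot might look like the following sketch (the class name and URL are assumptions; firmware behavior varies by vendor):

```
# Sketch: UEFI HTTP Boot clients identify themselves with
# vendor-class-identifier "HTTPClient" (Option 60)
class "httpclients" {
  match if substring(option vendor-class-identifier, 0, 10) = "HTTPClient";
  # Firmware expects the same vendor class echoed back
  option vendor-class-identifier "HTTPClient";
  # Option 67 carries a full HTTP URI to a UEFI executable
  filename "http://10.0.0.2:8080/ipxe.efi";
}
```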
3. Image Generation Strategy
3.1 Building NixOS Netboot Images
NixOS provides built-in netboot image generation. We extend this to include PlasmaCloud services:
Option 1: Custom Netboot Configuration (Recommended)
Create nix/images/netboot.nix:
{ config, pkgs, lib, modulesPath, ... }:
{
imports = [
"${modulesPath}/installer/netboot/netboot-minimal.nix"
../../nix/modules # PlasmaCloud service modules
];
# Networking for installer phase
networking = {
usePredictableInterfaceNames = false; # Use eth0 instead of enpXsY
useDHCP = true;
firewall.enable = false; # Open during installation
};
# SSH for nixos-anywhere
services.openssh = {
enable = true;
settings = {
PermitRootLogin = "yes";
PasswordAuthentication = false; # Key-based only
};
};
# Authorized keys for provisioning server
users.users.root.openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIProvisioning Server Key..."
];
# Minimal kernel for hardware support
boot.kernelPackages = pkgs.linuxPackages_latest;
boot.supportedFilesystems = [ "ext4" "xfs" "btrfs" "zfs" ];
# Include disko for disk management
environment.systemPackages = with pkgs; [
disko
parted
cryptsetup
lvm2
];
# Disable unnecessary services for installer
documentation.enable = false;
documentation.nixos.enable = false;
sound.enable = false;
# Build artifacts needed for netboot
system.build = {
netbootRamdisk = config.system.build.initialRamdisk;
kernel = config.system.build.kernel;
netbootIpxeScript = pkgs.writeText "netboot.ipxe" ''
#!ipxe
kernel \${boot-url}/bzImage init=${config.system.build.toplevel}/init ${toString config.boot.kernelParams}
initrd \${boot-url}/initrd
boot
'';
};
}
Build the netboot artifacts:
nix build .#nixosConfigurations.netboot.config.system.build.netbootRamdisk
nix build .#nixosConfigurations.netboot.config.system.build.kernel
# Copy to HTTP server
cp result/bzImage /srv/boot/nixos/
cp result/initrd /srv/boot/nixos/
Option 2: Use Pre-built Images (Faster Development)
The nix-community/nixos-images project provides pre-built netboot images:
# Use their iPXE chainload directly
chain https://github.com/nix-community/nixos-images/releases/download/nixos-unstable/netboot-x86_64-linux.ipxe
# Or download artifacts
curl -L https://github.com/nix-community/nixos-images/releases/download/nixos-unstable/bzImage -o /srv/boot/nixos/bzImage
curl -L https://github.com/nix-community/nixos-images/releases/download/nixos-unstable/initrd -o /srv/boot/nixos/initrd
3.2 Configuration Injection Approach
Configuration must be injected at installation time (not baked into netboot image) to support:
- Node-specific networking (static IPs, VLANs)
- Cluster join parameters (existing Raft leader addresses)
- TLS certificates (unique per node)
- Hardware-specific disk layouts
Three-Phase Configuration Model:
Phase 1: Netboot Image (Generic)
- Universal kernel with broad hardware support
- SSH server with provisioning key
- disko + installer tools
- No node-specific data
Phase 2: nixos-anywhere Deployment (Node-Specific)
- Pull node configuration from provisioning server based on MAC/hostname
- Partition disks per disko spec
- Install NixOS with flake: github:yourorg/plasmacloud#node-hostname
- Inject secrets into /etc/nixos/secrets/ (TLS certs, cluster tokens)
Phase 3: First Boot (Service Initialization)
- systemd service reads /etc/nixos/secrets/cluster-config.json
- Auto-join Chainfire cluster (or bootstrap if first node)
- FlareDB joins after Chainfire is healthy
- IAM initializes with FlareDB backend
- Other services start with proper dependencies
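For illustration, a cluster-config.json consistent with the fields consumed by the first-boot unit in section 4.1 might look like this (the exact schema is not yet fixed; bootstrap is a string because the unit compares it against "true"):

```json
{
  "node_id": "node02",
  "bootstrap": "false",
  "leader_url": "https://node01.example.com:2379",
  "raft_addr": "node02.example.com:2380"
}
```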
Configuration Repository Structure:
/srv/provisioning/
├── nodes/
│ ├── node01.example.com/
│ │ ├── hardware.nix # Generated from nixos-generate-config
│ │ ├── configuration.nix # Node-specific service config
│ │ ├── disko.nix # Disk layout
│ │ └── secrets/
│ │ ├── tls-cert.pem
│ │ ├── tls-key.pem
│ │ ├── tls-ca.pem
│ │ └── cluster-config.json
│ └── node02.example.com/
│ └── ...
├── profiles/
│ ├── control-plane.nix # Chainfire + FlareDB + IAM
│ ├── worker.nix # PlasmaVMC + storage
│ └── all-in-one.nix # All 8 services
└── common/
├── base.nix # Common settings (SSH, users, firewall)
└── networking.nix # Network defaults
Node Configuration Example (nodes/node01.example.com/configuration.nix):
{ config, pkgs, lib, ... }:
{
imports = [
../../profiles/control-plane.nix
../../common/base.nix
./hardware.nix
./disko.nix
];
networking = {
hostName = "node01";
domain = "example.com";
interfaces.eth0 = {
useDHCP = false;
ipv4.addresses = [{
address = "10.0.1.10";
prefixLength = 24;
}];
};
defaultGateway = "10.0.1.1";
nameservers = [ "10.0.1.1" ];
};
# Service configuration
services.chainfire = {
enable = true;
port = 2379;
raftPort = 2380;
gossipPort = 2381;
settings = {
node_id = "node01";
cluster_name = "prod-cluster";
# Initial cluster peers (for bootstrap)
initial_peers = [
"node01.example.com:2380"
"node02.example.com:2380"
"node03.example.com:2380"
];
tls = {
cert_path = "/etc/nixos/secrets/tls-cert.pem";
key_path = "/etc/nixos/secrets/tls-key.pem";
ca_path = "/etc/nixos/secrets/tls-ca.pem";
};
};
};
services.flaredb = {
enable = true;
port = 2479;
raftPort = 2480;
settings = {
node_id = "node01";
cluster_name = "prod-cluster";
chainfire_endpoint = "https://localhost:2379";
tls = {
cert_path = "/etc/nixos/secrets/tls-cert.pem";
key_path = "/etc/nixos/secrets/tls-key.pem";
ca_path = "/etc/nixos/secrets/tls-ca.pem";
};
};
};
services.iam = {
enable = true;
port = 8080;
settings = {
flaredb_endpoint = "https://localhost:2479";
tls = {
cert_path = "/etc/nixos/secrets/tls-cert.pem";
key_path = "/etc/nixos/secrets/tls-key.pem";
};
};
};
system.stateVersion = "24.11";
}
3.3 Hardware Detection vs Explicit Hardware Config
Hardware Detection (Automatic):
During installation, nixos-generate-config scans hardware and creates hardware-configuration.nix:
# On live installer, after disk setup
nixos-generate-config --root /mnt --show-hardware-config > /tmp/hardware.nix
# Upload to provisioning server
curl -X POST -F "file=@/tmp/hardware.nix" http://provisioning-server/api/hardware/node01
Explicit Hardware Config (Declarative):
For homogeneous hardware (e.g., fleet of identical servers), use a template:
# profiles/hardware/dell-r640.nix
{ config, lib, pkgs, modulesPath, ... }:
{
imports = [ (modulesPath + "/installer/scan/not-detected.nix") ];
boot.initrd.availableKernelModules = [ "xhci_pci" "ahci" "nvme" "usbhid" "sd_mod" ];
boot.kernelModules = [ "kvm-intel" ];
# Network interfaces (predictable naming)
networking.interfaces = {
enp59s0f0 = {}; # 10GbE Port 1
enp59s0f1 = {}; # 10GbE Port 2
};
# CPU microcode updates
hardware.cpu.intel.updateMicrocode = true;
# Power management
powerManagement.cpuFreqGovernor = "performance";
nixpkgs.hostPlatform = "x86_64-linux";
}
Recommendation:
- Phase 1 (Development): Auto-detect hardware for flexibility
- Phase 2 (Production): Standardize on explicit hardware profiles for consistency and faster deployments
3.4 Image Size Optimization
Netboot images must fit in RAM (typically 1-4 GB available after kexec). Strategies:
1. Exclude Documentation and Locales:
documentation.enable = false;
documentation.nixos.enable = false;
i18n.supportedLocales = [ "en_US.UTF-8/UTF-8" ];
2. Minimal Kernel:
boot.kernelPackages = pkgs.linuxPackages_latest;
boot.kernelParams = [ "modprobe.blacklist=nouveau" ]; # Exclude unused drivers
3. Squashfs Compression: NixOS netboot uses squashfs for the Nix store, achieving ~2.5x compression:
# Automatically applied by netboot-minimal.nix
system.build.squashfsStore = ...; # Default: gzip compression
4. On-Demand Package Fetching: Instead of bundling all packages, fetch from HTTP substituter during installation:
nix.settings.substituters = [ "http://10.0.0.2:8080/nix-cache" ];
nix.settings.trusted-public-keys = [ "cache-key-here" ];
Expected Sizes:
- Minimal installer (no services): ~150-250 MB (initrd)
- Installer + PlasmaCloud packages: ~400-600 MB (with on-demand fetch)
- Full offline installer: ~1-2 GB (includes all service closures)
4. Installation Flow
4.1 Step-by-Step Process
1. PXE Boot to NixOS Installer (Automated)
- Server powers on, sends DHCP request
- DHCP provides iPXE binary (via TFTP)
- iPXE loads, sends second DHCP request with user-class
- DHCP provides boot script URL (via HTTP)
- iPXE downloads script, executes, loads kernel+initrd
- kexec into NixOS installer (in RAM, ~30-60 seconds)
- Installer boots, acquires IP via DHCP, starts SSH server
2. Provisioning Server Detects Node (Semi-Automated)
Provisioning server monitors DHCP leases or receives webhook from installer:
# Installer sends registration on boot (custom init script)
curl -X POST http://provisioning-server/api/register \
-d '{"mac":"aa:bb:cc:dd:ee:ff","ip":"10.0.0.100","hostname":"node01"}'
Provisioning server looks up node in inventory:
# /srv/provisioning/inventory.json
{
"nodes": {
"aa:bb:cc:dd:ee:ff": {
"hostname": "node01.example.com",
"profile": "control-plane",
"config_path": "/srv/provisioning/nodes/node01.example.com"
}
}
}
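The provisioning scripts need to turn a registering MAC address into a hostname and profile. A minimal sketch of that lookup with jq, assuming the inventory.json layout above (the lookup_node helper is hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: resolve hostname and profile from inventory.json by MAC address.
# Assumes the inventory layout shown above; requires jq.
set -euo pipefail

lookup_node() {
  local inventory="$1" mac="$2"
  # Fail loudly on unknown MACs instead of provisioning a rogue node
  jq -r --arg mac "$mac" '
    .nodes[$mac] // error("unknown MAC: " + $mac)
    | "\(.hostname) \(.profile)"
  ' "$inventory"
}

# Self-contained demo against a sample inventory
inv=$(mktemp)
cat > "$inv" <<'EOF'
{
  "nodes": {
    "aa:bb:cc:dd:ee:ff": {
      "hostname": "node01.example.com",
      "profile": "control-plane",
      "config_path": "/srv/provisioning/nodes/node01.example.com"
    }
  }
}
EOF

lookup_node "$inv" "aa:bb:cc:dd:ee:ff"   # prints: node01.example.com control-plane
rm -f "$inv"
```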
3. Run nixos-anywhere (Automated)
Provisioning server executes nixos-anywhere:
#!/bin/bash
# /srv/provisioning/scripts/provision-node.sh
NODE_MAC="$1"
NODE_IP=$(get_ip_from_dhcp "$NODE_MAC")
NODE_HOSTNAME=$(lookup_hostname "$NODE_MAC")
CONFIG_PATH="/srv/provisioning/nodes/$NODE_HOSTNAME"
# Copy secrets to installer (will be injected during install)
ssh root@$NODE_IP "mkdir -p /tmp/secrets"
scp $CONFIG_PATH/secrets/* root@$NODE_IP:/tmp/secrets/
# Run nixos-anywhere with disko
nix run github:nix-community/nixos-anywhere -- \
--flake "/srv/provisioning#$NODE_HOSTNAME" \
--build-on-remote \
--disk-encryption-keys /tmp/disk.key <(cat $CONFIG_PATH/secrets/disk-encryption.key) \
root@$NODE_IP
nixos-anywhere performs:
- Detects existing OS (if any)
- Loads kexec if needed (already done via PXE)
- Runs disko to partition disks (based on $CONFIG_PATH/disko.nix)
- Builds NixOS system closure (either locally or on target)
- Copies closure to /mnt (mounted root)
- Installs bootloader (GRUB/systemd-boot)
- Copies secrets to /mnt/etc/nixos/secrets/
- Unmounts, reboots
4. First Boot into Installed System (Automated)
Server reboots from disk (GRUB/systemd-boot), loads NixOS:
- systemd starts chainfire.service (waits 30s for network)
- If initial_peers matches only self → bootstrap new cluster
- If initial_peers includes others → attempt to join existing cluster
- flaredb.service starts after chainfire is healthy
- iam.service starts after flaredb is healthy
- Other services start based on profile
First-boot cluster join logic (systemd unit):
# /etc/nixos/first-boot-cluster-join.nix
{ config, lib, pkgs, ... }:
let
clusterConfig = builtins.fromJSON (builtins.readFile /etc/nixos/secrets/cluster-config.json);
in
{
systemd.services.chainfire-cluster-join = {
description = "Chainfire Cluster Join";
after = [ "network-online.target" "chainfire.service" ];
wants = [ "network-online.target" ];
wantedBy = [ "multi-user.target" ];
serviceConfig = {
Type = "oneshot";
RemainAfterExit = true;
};
script = ''
# Wait for local chainfire to be ready
until ${pkgs.curl}/bin/curl -kfs https://localhost:2379/health >/dev/null; do
echo "Waiting for local chainfire..."
sleep 5
done
# Check if this is the first node (bootstrap)
if [ "${clusterConfig.bootstrap}" = "true" ]; then
echo "Bootstrap node, cluster already initialized"
exit 0
fi
# Join existing cluster
LEADER_URL="${clusterConfig.leader_url}"
NODE_ID="${clusterConfig.node_id}"
RAFT_ADDR="${clusterConfig.raft_addr}"
${pkgs.curl}/bin/curl -k -X POST "$LEADER_URL/admin/member/add" \
-H "Content-Type: application/json" \
-d "{\"id\":\"$NODE_ID\",\"raft_addr\":\"$RAFT_ADDR\"}"
echo "Cluster join initiated"
'';
};
# Similar for flaredb
systemd.services.flaredb-cluster-join = {
description = "FlareDB Cluster Join";
after = [ "chainfire-cluster-join.service" "flaredb.service" ];
requires = [ "chainfire-cluster-join.service" ];
# ... similar logic
};
}
5. Validation (Manual/Automated)
Provisioning server polls health endpoints:
# Health check script
curl -k https://10.0.1.10:2379/health # Chainfire
curl -k https://10.0.1.10:2479/health # FlareDB
curl -k https://10.0.1.10:8080/health # IAM
# Cluster status
curl -k https://10.0.1.10:2379/admin/cluster/members | jq
4.2 Error Handling and Recovery
Boot Failures:
- Symptom: Server stuck in PXE boot loop
- Diagnosis: Check DHCP server logs, verify TFTP/HTTP server accessibility
- Recovery: Fix DHCP config, restart services, retry boot
Disk Partitioning Failures:
- Symptom: nixos-anywhere fails during disko phase
- Diagnosis: SSH to installer, run dmesg | grep -i error, check disk accessibility
- Recovery: Adjust disko config (e.g., wrong disk device), re-run nixos-anywhere
Installation Failures:
- Symptom: nixos-anywhere fails during installation phase
- Diagnosis: Check nixos-anywhere output, SSH to the installer and inspect /mnt
- Recovery: Fix configuration errors, re-run nixos-anywhere (will reformat)
Cluster Join Failures:
- Symptom: Service starts but not in cluster
- Diagnosis: journalctl -u chainfire-cluster-join, check leader reachability
- Recovery: Manually run join command, verify TLS certs, check firewall
Rollback Strategy:
- NixOS generations provide atomic rollback: nixos-rebuild switch --rollback
- For catastrophic failure: Re-provision from PXE (data loss if not replicated)
4.3 Network Requirements
DHCP:
- Option 66/67 for PXE boot
- Option 93 for architecture detection
- User-class filtering for iPXE chainload
- Static reservations for production nodes (optional)
DNS:
- Forward and reverse DNS for all nodes (required for TLS cert CN verification)
- Example: node01.example.com → 10.0.1.10 (forward), 10.0.1.10 → node01.example.com (reverse)
Firewall:
- Allow TFTP (UDP 69) from nodes to boot server
- Allow HTTP (TCP 80/8080) from nodes to boot/provisioning server
- Allow SSH (TCP 22) from provisioning server to nodes
- Allow service ports (2379-2381, 2479-2480, 8080, etc.) between cluster nodes
Internet Access:
- During installation: Required for Nix binary cache (cache.nixos.org) unless using local cache
- After installation: Optional (recommended for updates), can run air-gapped with local cache
- Workaround: Set up a local binary cache with nix-serve + nginx
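On a NixOS provisioning server, the local-cache workaround could be wired up roughly like this (a sketch; the vhost name, port, and key path are assumptions):

```nix
# Sketch: local binary cache via nix-serve behind the existing HTTP server
{ config, pkgs, ... }:
{
  services.nix-serve = {
    enable = true;
    port = 5000;
    secretKeyFile = "/var/lib/nix-serve/cache-priv-key.pem";
  };
  # Expose http://10.0.0.2:8080/nix-cache as used by the installer
  services.nginx.virtualHosts."boot.server".locations."/nix-cache/" = {
    proxyPass = "http://127.0.0.1:5000/";
  };
}
```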
Bandwidth:
- PXE boot: ~200 MB (kernel + initrd) per node, sequential is acceptable
- Installation: ~1-5 GB (Nix closures) per node, parallel ok if cache is local
- Recommendation: 1 Gbps link between provisioning server and nodes
5. Integration Points
5.1 T024 NixOS Modules
The NixOS modules from T024 (nix/modules/*.nix) provide declarative service configuration. They are included in node configurations:
{ config, pkgs, lib, ... }:
{
imports = [
# Import PlasmaCloud service modules
inputs.plasmacloud.nixosModules.default
];
# Enable services declaratively
services.chainfire.enable = true;
services.flaredb.enable = true;
services.iam.enable = true;
# ... etc
}
Module Integration Strategy:
- Flake Inputs: Node configurations reference the PlasmaCloud flake:

  # flake.nix for provisioning repo
  inputs.plasmacloud.url = "github:yourorg/plasmacloud";
  # or path-based for development
  inputs.plasmacloud.url = "path:/path/to/plasmacloud/repo";

- Service Packages: Packages are injected via overlay:

  nixpkgs.overlays = [ inputs.plasmacloud.overlays.default ];
  # Now pkgs.chainfire-server, pkgs.flaredb-server, etc. are available

- Dependency Graph: systemd units respect T024 dependencies:

  chainfire.service
      ↓ requires/after
  flaredb.service
      ↓ requires/after
  iam.service
      ↓ requires/after
  plasmavmc.service, flashdns.service, ... (parallel)

- Configuration Schema: Use services.<name>.settings for service-specific config:

  services.chainfire.settings = {
    node_id = "node01";
    cluster_name = "prod";
    tls = { ... };
  };
5.2 T027 Config Unification
T027 established a unified configuration approach (clap + config file/env). This integrates with NixOS in two ways:
1. NixOS Module → Config File Generation:
The NixOS module translates services.<name>.settings to a config file:
# In nix/modules/chainfire.nix
systemd.services.chainfire = {
preStart = ''
# Generate config file from settings
cat > /var/lib/chainfire/config.toml <<EOF
node_id = "${cfg.settings.node_id}"
cluster_name = "${cfg.settings.cluster_name}"
[tls]
cert_path = "${cfg.settings.tls.cert_path}"
key_path = "${cfg.settings.tls.key_path}"
ca_path = "${cfg.settings.tls.ca_path or ""}"
EOF
'';
serviceConfig.ExecStart = "${cfg.package}/bin/chainfire-server --config /var/lib/chainfire/config.toml";
};
2. Environment Variable Injection:
For secrets not suitable for Nix store:
systemd.services.chainfire.serviceConfig = {
EnvironmentFile = "/etc/nixos/secrets/chainfire.env";
# File contains: CHAINFIRE_API_TOKEN=secret123
};
Best Practices:
- Public config: Use services.<name>.settings (stored in Nix store, world-readable)
- Secrets: Use EnvironmentFile or systemd credentials
- Hybrid: Config file with placeholders, secrets injected at runtime
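For the systemd-credentials route, a hedged sketch (requires systemd 250+; the token path and credential name are assumptions):

```nix
# Sketch: systemd credentials instead of EnvironmentFile. The service
# reads the secret from $CREDENTIALS_DIRECTORY/api-token at runtime;
# the secret never enters the world-readable Nix store.
{
  systemd.services.chainfire.serviceConfig = {
    LoadCredential = [ "api-token:/etc/nixos/secrets/chainfire-api-token" ];
  };
}
```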
5.3 T031 TLS Certificates
T031 added TLS to all 8 services. Provisioning must handle certificate distribution:
Certificate Provisioning Strategies:
Option 1: Pre-Generated Certificates (Simple)
- Generate certs on provisioning server per node:

  # /srv/provisioning/scripts/generate-certs.sh node01.example.com
  openssl req -x509 -newkey rsa:4096 -nodes \
    -keyout node01-key.pem -out node01-cert.pem \
    -days 365 -subj "/CN=node01.example.com"

- Copy to node secrets directory:

  cp node01-*.pem /srv/provisioning/nodes/node01.example.com/secrets/

- nixos-anywhere installs them to /etc/nixos/secrets/ (mode 0400, owner root)

- NixOS module references them:

  services.chainfire.settings.tls = {
    cert_path = "/etc/nixos/secrets/tls-cert.pem";
    key_path = "/etc/nixos/secrets/tls-key.pem";
    ca_path = "/etc/nixos/secrets/tls-ca.pem";
  };
Option 2: ACME (Let's Encrypt) for External Services
For internet-facing services (e.g., PlasmaVMC API):
security.acme = {
acceptTerms = true;
defaults.email = "admin@example.com";
};
services.plasmavmc.settings.tls = {
cert_path = config.security.acme.certs."plasmavmc.example.com".directory + "/cert.pem";
key_path = config.security.acme.certs."plasmavmc.example.com".directory + "/key.pem";
};
security.acme.certs."plasmavmc.example.com" = {
domain = "plasmavmc.example.com";
# Use DNS-01 challenge for internal servers
dnsProvider = "cloudflare";
credentialsFile = "/etc/nixos/secrets/cloudflare-api-token";
};
Option 3: Internal CA with Cert-Manager (Advanced)
- Deploy cert-manager as a service on control plane
- Generate per-node CSRs during first boot
- Cert-manager signs and distributes certs
- Systemd timer renews certs before expiry
Recommendation:
- Phase 1 (MVP): Pre-generated certs (Option 1)
- Phase 2 (Production): ACME for external + internal CA for internal (Option 2+3)
5.4 Chainfire/FlareDB Cluster Join
Bootstrap (First 3 Nodes):
First node (node01):
services.chainfire.settings = {
node_id = "node01";
initial_peers = [
"node01.example.com:2380"
"node02.example.com:2380"
"node03.example.com:2380"
];
bootstrap = true; # This node starts the cluster
};
Subsequent nodes (node02, node03):
services.chainfire.settings = {
node_id = "node02";
initial_peers = [
"node01.example.com:2380"
"node02.example.com:2380"
"node03.example.com:2380"
];
bootstrap = false; # Join existing cluster
};
Runtime Join (After Bootstrap):
New nodes added to running cluster:
- Provision node with bootstrap = false, initial_peers = []
- First-boot service calls leader's admin API:

  curl -k -X POST https://node01.example.com:2379/admin/member/add \
    -H "Content-Type: application/json" \
    -d '{"id":"node04","raft_addr":"node04.example.com:2380"}'

- Node receives cluster state, starts Raft
- Leader replicates to new node
FlareDB Follows Same Pattern:
FlareDB depends on Chainfire for coordination but maintains its own Raft cluster:
services.flaredb.settings = {
node_id = "node01";
chainfire_endpoint = "https://localhost:2379";
initial_peers = [ "node01:2480" "node02:2480" "node03:2480" ];
};
Critical: Ensure chainfire.service is healthy before starting flaredb.service (enforced by systemd requires/after).
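In the NixOS module, that ordering constraint can be expressed directly on the unit (a sketch; unit names assumed to match the T024 modules):

```nix
# Sketch: flaredb must not start before chainfire, and must stop if
# chainfire stops. after= controls ordering, requires= the dependency.
{
  systemd.services.flaredb = {
    after = [ "chainfire.service" ];
    requires = [ "chainfire.service" ];
  };
}
```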
5.5 IAM Bootstrap
IAM requires initial admin user creation. Two approaches:
Option 1: First-Boot Initialization Script
systemd.services.iam-bootstrap = {
description = "IAM Initial Admin User";
after = [ "iam.service" ];
wantedBy = [ "multi-user.target" ];
serviceConfig = {
Type = "oneshot";
RemainAfterExit = true;
};
script = ''
# Check if admin exists
if ${pkgs.curl}/bin/curl -k https://localhost:8080/api/users/admin 2>&1 | grep -q "not found"; then
# Create admin user
ADMIN_PASSWORD=$(cat /etc/nixos/secrets/iam-admin-password)
${pkgs.curl}/bin/curl -k -X POST https://localhost:8080/api/users \
-H "Content-Type: application/json" \
-d "{\"username\":\"admin\",\"password\":\"$ADMIN_PASSWORD\",\"role\":\"admin\"}"
echo "Admin user created"
else
echo "Admin user already exists"
fi
'';
};
Option 2: Environment Variable for Default Admin
IAM service creates admin on first start if DB is empty:
// In iam-server main.rs
if user_count() == 0 {
let admin_password = env::var("IAM_INITIAL_ADMIN_PASSWORD")
.expect("IAM_INITIAL_ADMIN_PASSWORD must be set for first boot");
create_user("admin", &admin_password, Role::Admin)?;
info!("Initial admin user created");
}
systemd.services.iam.serviceConfig = {
EnvironmentFile = "/etc/nixos/secrets/iam.env";
# File contains: IAM_INITIAL_ADMIN_PASSWORD=random-secure-password
};
Recommendation: Use Option 2 (environment variable) for simplicity. Generate random password during node provisioning, store in secrets.
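A sketch of the provisioning-time generation step under that recommendation. Paths follow the secrets layout in section 3.2; the iam.env file name matches the EnvironmentFile above, and the temp-dir default is only there to keep the demo self-contained.

```shell
#!/usr/bin/env bash
# Sketch: generate the IAM initial admin password for a node at
# provisioning time. In practice pass the node's secrets directory,
# e.g. /srv/provisioning/nodes/node01.example.com/secrets.
set -euo pipefail

SECRETS_DIR="${1:-$(mktemp -d)}"
mkdir -p "$SECRETS_DIR"

# 32 random bytes, base64-encoded (44 characters)
PASSWORD=$(openssl rand -base64 32)

# Restrict permissions before writing the secret
umask 077
printf 'IAM_INITIAL_ADMIN_PASSWORD=%s\n' "$PASSWORD" > "$SECRETS_DIR/iam.env"
echo "wrote $SECRETS_DIR/iam.env"
```

nixos-anywhere then copies iam.env to /etc/nixos/secrets/ alongside the TLS material, where the iam unit's EnvironmentFile picks it up on first boot.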
6. Alternatives Considered
6.1 nixos-anywhere vs Custom Installer
nixos-anywhere (Chosen):
- Pros:
- Mature, actively maintained by nix-community
- Handles kexec, disko integration, bootloader install automatically
- SSH-based, works from any OS (no need for NixOS on provisioning server)
- Supports remote builds and disk encryption out of box
- Well-documented with many examples
- Cons:
- Requires SSH access (not suitable for zero-touch provisioning without PXE+SSH)
- Opinionated workflow (less flexible than custom scripts)
- Dependency on external project (but very stable)
Custom Installer (Rejected):
- Pros:
- Full control over installation flow
- Could implement zero-touch (e.g., installer pulls config from server without SSH)
- Tailored to PlasmaCloud-specific needs
- Cons:
- Significant development effort (partitioning, bootloader, error handling)
- Reinvents well-tested code (disko, kexec integration)
- Maintenance burden (keep up with NixOS changes)
- Higher risk of bugs (partitioning is error-prone)
Decision: Use nixos-anywhere for reliability and speed. The SSH requirement is acceptable since PXE boot already provides network access, and adding SSH keys to the netboot image is straightforward.
6.2 Disk Management Tools
disko (Chosen):
- Pros:
- Declarative, fits NixOS philosophy
- Integrates with nixos-anywhere out of box
- Supports complex layouts (RAID, LVM, LUKS, ZFS, btrfs)
- Idempotent (can reformat or verify existing layout)
- Cons:
- Nix-based DSL (learning curve)
- Limited to Linux filesystems (no Windows support, not relevant here)
Kickstart/Preseed (Rejected):
- Used by Fedora/Debian installers
- Not NixOS-native, would require custom integration
Terraform with Libvirt (Rejected):
- Good for VMs, not bare metal
- Doesn't handle disk partitioning directly
Decision: disko is the clear choice for NixOS deployments.
6.3 Boot Methods
iPXE over TFTP/HTTP (Chosen):
- Pros:
- Universal support (BIOS + UEFI)
- Flexible scripting (boot menus, conditional logic)
- HTTP support for fast downloads
- Open source, widely deployed
- Cons:
- Requires DHCP configuration (Option 66/67 setup)
- Chainloading adds complexity (but solved problem)
UEFI HTTP Boot (Rejected):
- Pros:
- Native UEFI, no TFTP needed
- Simpler DHCP config (just Option 60/67)
- Cons:
- UEFI only (no BIOS support)
- Firmware support inconsistent (pre-2015 servers)
- Less flexible than iPXE scripting
Preboot USB (Rejected):
- Manual, not scalable for fleet deployment
- Useful for one-off installs only
Decision: iPXE for flexibility and compatibility. UEFI HTTP Boot could be considered later for pure UEFI fleets.
6.4 Configuration Management
NixOS Flakes (Chosen):
- Pros:
- Native to NixOS, declarative
- Reproducible builds with lock files
- Git-based, version controlled
- No external agent needed (systemd handles state)
- Cons:
- Steep learning curve for operators unfamiliar with Nix
- Less dynamic than Ansible (changes require rebuild)
Ansible (Rejected for Provisioning, Useful for Orchestration):
- Pros:
- Agentless, SSH-based
- Large ecosystem of modules
- Dynamic, easy to patch running systems
- Cons:
- Imperative (harder to guarantee state)
- Doesn't integrate with NixOS packages/modules
- Adds another tool to stack
Terraform (Rejected):
- Infrastructure-as-code, not config management
- Better for cloud VMs than bare metal
Decision: Use NixOS flakes for provisioning and base config. Ansible may be added later for operational tasks (e.g., rolling updates, health checks) that don't fit NixOS's declarative model.
7. Open Questions / Decisions Needed
7.1 Hardware Inventory Management
Question: How do we map MAC addresses to node roles and configurations?
Options:
- Manual Inventory File: Operator maintains JSON/YAML with MAC → hostname → config mapping
- Auto-Discovery: First boot prompts operator to assign role (e.g., via serial console or web UI)
- External CMDB: Integrate with existing Configuration Management Database (e.g., NetBox, Nautobot)
Recommendation: Start with manual inventory file (simple), migrate to CMDB integration in Phase 2.
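As a sketch of the manual-inventory option, a minimal mapping file might look like the following. The field names and values are illustrative, not a fixed schema; `flake_attr` here just points at the node's flake output used by nixos-anywhere.

```json
{
  "nodes": [
    {
      "mac": "aa:bb:cc:dd:ee:01",
      "hostname": "node01",
      "role": "chainfire-bootstrap",
      "ip": "10.0.0.101",
      "flake_attr": "node01"
    },
    {
      "mac": "aa:bb:cc:dd:ee:02",
      "hostname": "node02",
      "role": "chainfire-member",
      "ip": "10.0.0.102",
      "flake_attr": "node02"
    }
  ]
}
```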
7.2 Secrets Management
Question: How are secrets (TLS keys, passwords) generated, stored, and rotated?
Options:
- File-Based (Current): Secrets in `/srv/provisioning/nodes/*/secrets/`, copied during install
- Vault Integration: Fetch secrets from HashiCorp Vault at boot time
- systemd Credentials: Use systemd's encrypted credentials feature (requires systemd 250+)
Recommendation: Phase 1 uses file-based (simple, works today). Phase 2 adds Vault for production (centralized, auditable, rotation support).
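For the systemd credentials option, the flow would look roughly like this (paths and credential names are illustrative; requires systemd 250+ on the target):

```
# Encrypt a TLS key against the target host's key (run on the node itself)
systemd-creds encrypt --name=chainfire-tls \
  /srv/secrets/node01-tls.key /etc/credstore.encrypted/chainfire-tls

# The service unit then references it:
#   [Service]
#   LoadCredentialEncrypted=chainfire-tls
# and reads the plaintext at runtime from $CREDENTIALS_DIRECTORY/chainfire-tls
```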
7.3 Network Boot Security
Question: How do we prevent rogue nodes from joining the cluster?
Concerns:
- Attacker boots unauthorized server on network
- Installer has SSH key, could be accessed
- Node joins cluster with malicious intent
Mitigations:
- MAC Whitelist: DHCP only serves known MAC addresses
- Network Segmentation: PXE boot on isolated provisioning VLAN
- SSH Key Per Node: Each node has unique authorized_keys in netboot image (complex)
- Cluster Authentication: Raft join requires cluster token (not yet implemented)
Recommendation: Use MAC whitelist + provisioning VLAN for Phase 1. Add cluster join tokens in Phase 2 (requires Chainfire/FlareDB changes).
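The MAC whitelist mitigation maps directly onto dnsmasq configuration; a sketch (MACs and IPs are placeholders):

```
# Answer DHCP only for explicitly listed machines
dhcp-host=aa:bb:cc:dd:ee:01,10.0.0.101,node01
dhcp-host=aa:bb:cc:dd:ee:02,10.0.0.102,node02

# Ignore any client not matched by a dhcp-host line
# (dnsmasq sets the "known" tag for matched hosts)
dhcp-ignore=tag:!known
```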
7.4 Multi-Datacenter Deployment
Question: How does provisioning work across geographically distributed datacenters?
Challenges:
- WAN latency for Nix cache fetches
- PXE boot requires local DHCP/TFTP
- Cluster join across WAN (Raft latency)
Options:
- Replicated Provisioning Server: Deploy boot server in each datacenter, sync configs
- Central Provisioning with Local Cache: Single source of truth, local Nix cache mirrors
- Per-DC Clusters: Each datacenter is independent cluster, federated at application layer
Recommendation: Defer to Phase 2. Phase 1 assumes single datacenter or low-latency LAN.
7.5 Disk Encryption
Question: Should disks be encrypted at rest?
Trade-offs:
- Pros: Compliance (GDPR, PCI-DSS), protection against physical theft
- Cons: Key management complexity, can't auto-reboot (manual unlock), performance overhead (~5-10%)
Options:
- No Encryption: Rely on physical security
- LUKS with Network Unlock: Tang/Clevis for automated unlocking (requires network on boot)
- LUKS with Manual Unlock: Operator enters passphrase via KVM/IPMI
Recommendation: Optional, configurable per deployment. Provide disko template for LUKS, let operator decide.
7.6 Rolling Updates
Question: How do we update a running cluster without downtime?
Challenges:
- Raft requires quorum (can't update majority simultaneously)
- Service dependencies (Chainfire → FlareDB → others)
- NixOS rebuild requires reboot (for kernel/init changes)
Strategy:
- Update one node at a time (rolling)
- Verify health before proceeding to next
- Use `nixos-rebuild test` first (activates without bootloader change), then `switch` after validation
Tooling:
- Ansible playbook for orchestration
- Health check scripts (curl endpoints + check Raft status)
- Rollback plan (NixOS generations + Raft snapshot restore)
Recommendation: Document as runbook in Phase 1, implement automated rolling update in Phase 2 (T033?).
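A single runbook step for one node might look like the following (host names, port, and health endpoints are illustrative; `--target-host` lets the operator build locally and activate remotely):

```
# 1. Activate the new configuration without changing the bootloader entry
nixos-rebuild test --flake .#node02 --target-host root@node02

# 2. Verify health before committing (endpoint is an example)
curl -fsS https://node02:8443/healthz

# 3. Persist across reboots; if checks fail, the previous NixOS
#    generation is still the boot default
nixos-rebuild switch --flake .#node02 --target-host root@node02
```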
7.7 Monitoring and Alerting
Question: How do we monitor provisioning success/failure?
Options:
- Manual: Operator watches terminal, checks health endpoints
- Log Aggregation: Collect installer logs, index in Loki/Elasticsearch
- Event Webhook: Installer posts events to monitoring system (Grafana, PagerDuty)
Recommendation: Phase 1 uses manual monitoring. Phase 2 adds structured logging + webhooks for fleet deployments.
7.8 Compatibility with Existing Infrastructure
Question: Can this provisioning system coexist with existing PXE infrastructure (e.g., for other OS deployments)?
Concerns:
- Existing DHCP config may conflict
- TFTP server may serve other boot files
- Network team may control PXE infrastructure
Solutions:
- Dedicated Provisioning VLAN: PlasmaCloud nodes on separate network
- Conditional DHCP: Use vendor-class or subnet matching to route to correct boot server
- Multi-Boot Menu: iPXE menu includes options for PlasmaCloud and other OSes
Recommendation: Document network requirements, provide example DHCP config for common scenarios (dedicated VLAN, shared infrastructure). Coordinate with network team.
Appendices
A. Example Disko Configuration
Single Disk with GPT and ext4:
# nodes/node01/disko.nix
{ disks ? [ "/dev/sda" ], ... }:
{
disko.devices = {
disk = {
main = {
type = "disk";
device = builtins.head disks;
content = {
type = "gpt";
partitions = {
ESP = {
size = "512M";
type = "EF00";
content = {
type = "filesystem";
format = "vfat";
mountpoint = "/boot";
};
};
root = {
size = "100%";
content = {
type = "filesystem";
format = "ext4";
mountpoint = "/";
};
};
};
};
};
};
};
}
RAID1 with LUKS Encryption:
{ disks ? [ "/dev/sda" "/dev/sdb" ], ... }:
{
disko.devices = {
disk = {
disk1 = {
device = builtins.elemAt disks 0;
type = "disk";
content = {
type = "gpt";
partitions = {
boot = {
size = "1M";
type = "EF02"; # BIOS boot
};
mdraid = {
size = "100%";
content = {
type = "mdraid";
name = "raid1";
};
};
};
};
};
disk2 = {
device = builtins.elemAt disks 1;
type = "disk";
content = {
type = "gpt";
partitions = {
boot = {
size = "1M";
type = "EF02";
};
mdraid = {
size = "100%";
content = {
type = "mdraid";
name = "raid1";
};
};
};
};
};
};
mdadm = {
raid1 = {
type = "mdadm";
level = 1;
content = {
type = "luks";
name = "cryptroot";
settings.allowDiscards = true;
content = {
type = "filesystem";
format = "ext4";
mountpoint = "/";
};
};
};
};
};
}
B. Complete nixos-anywhere Command Examples
Basic Deployment:
nix run github:nix-community/nixos-anywhere -- \
--flake .#node01 \
root@10.0.0.100
With Build on Remote (Slow Local Machine):
nix run github:nix-community/nixos-anywhere -- \
--flake .#node01 \
--build-on-remote \
root@10.0.0.100
With Disk Encryption Key:
nix run github:nix-community/nixos-anywhere -- \
--flake .#node01 \
--disk-encryption-keys /tmp/luks.key <(cat /secrets/node01-luks.key) \
root@10.0.0.100
Debug Mode (Keep Installer After Failure):
nix run github:nix-community/nixos-anywhere -- \
--flake .#node01 \
--debug \
--no-reboot \
root@10.0.0.100
C. Provisioning Server Setup Script
#!/bin/bash
# /srv/provisioning/scripts/setup-provisioning-server.sh
set -euo pipefail
# Install dependencies
apt-get update
apt-get install -y nginx tftpd-hpa dnsmasq curl
# Configure TFTP
cat > /etc/default/tftpd-hpa <<EOF
TFTP_USERNAME="tftp"
TFTP_DIRECTORY="/srv/boot/tftp"
TFTP_ADDRESS="0.0.0.0:69"
TFTP_OPTIONS="--secure"
EOF
mkdir -p /srv/boot/tftp
systemctl restart tftpd-hpa
# Download iPXE binaries
curl -L http://boot.ipxe.org/undionly.kpxe -o /srv/boot/tftp/undionly.kpxe
curl -L http://boot.ipxe.org/ipxe.efi -o /srv/boot/tftp/ipxe.efi
# Configure nginx for HTTP boot
cat > /etc/nginx/sites-available/pxe <<EOF
server {
listen 8080;
server_name _;
root /srv/boot;
location / {
autoindex on;
try_files \$uri \$uri/ =404;
}
# Enable range requests for large files
location ~* \.(iso|img|bin|efi|kpxe)$ {
add_header Accept-Ranges bytes;
}
}
EOF
ln -sf /etc/nginx/sites-available/pxe /etc/nginx/sites-enabled/
systemctl restart nginx
# Create directory structure
mkdir -p /srv/boot/{nixos,nix-cache,scripts}
mkdir -p /srv/provisioning/{nodes,profiles,common,scripts}
echo "Provisioning server setup complete!"
echo "Next steps:"
echo "1. Configure DHCP server (see design doc Section 2.2)"
echo "2. Build NixOS netboot image (see Section 3.1)"
echo "3. Create node configurations (see Section 3.2)"
D. First-Boot Cluster Config JSON Schema
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Cluster Configuration",
"type": "object",
"properties": {
"node_id": {
"type": "string",
"description": "Unique identifier for this node"
},
"bootstrap": {
"type": "boolean",
"description": "True if this node should bootstrap a new cluster"
},
"leader_url": {
"type": "string",
"format": "uri",
"description": "URL of existing cluster leader (for join)"
},
"raft_addr": {
"type": "string",
"description": "This node's Raft address (host:port)"
},
"cluster_token": {
"type": "string",
"description": "Shared secret for cluster authentication (future)"
}
},
"required": ["node_id", "bootstrap"],
"if": {
"properties": { "bootstrap": { "const": false } }
},
"then": {
"required": ["leader_url", "raft_addr"]
}
}
Example for bootstrap node:
{
"node_id": "node01",
"bootstrap": true,
"raft_addr": "node01.example.com:2380"
}
Example for joining node:
{
"node_id": "node04",
"bootstrap": false,
"leader_url": "https://node01.example.com:2379",
"raft_addr": "node04.example.com:2380"
}
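The branching this schema implies can be sketched in the first-boot script. The fragment below uses an inline copy of the joining-node example for illustration; a real script would read the config injected during installation, and the commented-out `chainfire-admin` calls are hypothetical placeholders, not an existing CLI. Assumes `jq` is present in the installed image.

```shell
#!/usr/bin/env bash
# First-boot decision sketch: bootstrap a new cluster or join an existing
# leader, based on cluster-config.json. Uses an inline example config;
# a real script would read the file injected at install time.
set -euo pipefail

cfg=$(mktemp)
cat > "$cfg" <<'EOF'
{
  "node_id": "node04",
  "bootstrap": false,
  "leader_url": "https://node01.example.com:2379",
  "raft_addr": "node04.example.com:2380"
}
EOF

node_id=$(jq -r '.node_id' "$cfg")
bootstrap=$(jq -r '.bootstrap' "$cfg")

if [ "$bootstrap" = "true" ]; then
  echo "bootstrap: initializing new cluster as ${node_id}"
  # chainfire-admin bootstrap --node-id "$node_id"         # hypothetical CLI
else
  leader=$(jq -r '.leader_url' "$cfg")
  raft=$(jq -r '.raft_addr' "$cfg")
  echo "join: ${node_id} -> ${leader} (raft ${raft})"
  # chainfire-admin join --leader "$leader" --raft "$raft" # hypothetical CLI
fi
rm -f "$cfg"
```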
Revision History
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | 2025-12-10 | peerB | Initial draft |
End of Design Document