# T032 Bare-Metal Provisioning Design Document

**Status:** Draft
**Author:** peerB
**Created:** 2025-12-10
**Last Updated:** 2025-12-10

## 1. Architecture Overview

This document outlines the design for automated bare-metal provisioning of the PlasmaCloud platform, which consists of 8 core services (Chainfire, FlareDB, IAM, PlasmaVMC, PrismNET, FlashDNS, FiberLB, and K8sHost). The provisioning system leverages NixOS's declarative configuration capabilities to enable fully automated deployment from bare hardware to a running, clustered platform.

The high-level flow follows this sequence: **PXE Boot → kexec NixOS Installer → disko Disk Partitioning → nixos-anywhere Installation → First-Boot Configuration → Running Cluster**. A bare-metal server performs a network boot via PXE/iPXE, which loads a minimal NixOS installer into RAM using kexec. The installer then connects to a provisioning server, which uses nixos-anywhere to declaratively partition disks (via disko), install NixOS with pre-configured services, and inject node-specific configuration (SSH keys, network settings, cluster join parameters, TLS certificates). On first boot, the system automatically joins existing Raft clusters (Chainfire/FlareDB) or bootstraps new ones, and all 8 services start with proper dependencies and TLS enabled.

The key components are:

- **PXE/iPXE Boot Server**: Serves boot binaries and configuration scripts via TFTP/HTTP
- **nixos-anywhere**: SSH-based remote installation tool that orchestrates the entire deployment
- **disko**: Declarative disk partitioning engine integrated with nixos-anywhere
- **kexec**: Linux kernel feature enabling a fast boot into the NixOS installer without a full reboot
- **NixOS Flake** (from T024): Provides all service packages and NixOS modules
- **Configuration Injection System**: Manages node-specific secrets, network config, and cluster metadata
- **First-Boot Automation**: systemd units that perform cluster join and service initialization

## 2. PXE Boot Flow

### 2.1 Boot Sequence

```
┌─────────────┐
│ Bare Metal  │
│   Server    │
└──────┬──────┘
       │ 1. UEFI/BIOS PXE ROM
       ▼
┌──────────────┐
│ DHCP Server  │ Option 93: Client Architecture (0=BIOS, 7=UEFI x64)
│              │ Option 67: Boot filename (undionly.kpxe or ipxe.efi)
│              │ Option 66: TFTP server address
└──────┬───────┘
       │ 2. DHCP OFFER with boot parameters
       ▼
┌──────────────┐
│  TFTP/HTTP   │
│   Server     │ Serves: undionly.kpxe (BIOS) or ipxe.efi (UEFI)
└──────┬───────┘
       │ 3. Download iPXE bootloader
       ▼
┌──────────────┐
│ iPXE Running │ User-class="iPXE" in DHCP request
│   (in RAM)   │
└──────┬───────┘
       │ 4. Second DHCP request (now with iPXE user-class)
       ▼
┌──────────────┐
│ DHCP Server  │ Detects user-class="iPXE"
│              │ Option 67: http://boot.server/boot.ipxe
└──────┬───────┘
       │ 5. DHCP OFFER with script URL
       ▼
┌──────────────┐
│ HTTP Server  │ Serves: boot.ipxe (iPXE script)
└──────┬───────┘
       │ 6. Download and execute boot script
       ▼
┌──────────────┐
│ iPXE Script  │ Loads: NixOS kernel + initrd + kexec
│  Execution   │
└──────┬───────┘
       │ 7. kexec into NixOS installer
       ▼
┌──────────────┐
│  NixOS Live  │ SSH enabled, waiting for nixos-anywhere
│  Installer   │
└──────────────┘
```

### 2.2 DHCP Configuration Requirements

The DHCP server must support architecture-specific boot file selection and iPXE user-class detection.
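As an alternative to the ISC configuration that follows, the same chainloading logic can be expressed with dnsmasq, which bundles DHCP and TFTP in one daemon. This is a sketch, not a drop-in config; the IPs and file names mirror the ISC example and are assumptions:

```
# /etc/dnsmasq.conf (sketch)
dhcp-range=10.0.0.100,10.0.0.200,12h
dhcp-option=option:router,10.0.0.1

# Tag requests: iPXE sets user-class "iPXE"; client-arch 7 = UEFI x86_64
dhcp-userclass=set:ipxe,iPXE
dhcp-match=set:efi64,option:client-arch,7

# Chainload: iPXE gets the HTTP script, everyone else gets an iPXE binary
dhcp-boot=tag:ipxe,http://10.0.0.2:8080/boot.ipxe
dhcp-boot=tag:efi64,tag:!ipxe,ipxe.efi,,10.0.0.2
dhcp-boot=tag:!efi64,tag:!ipxe,undionly.kpxe,,10.0.0.2

# Built-in TFTP server for the initial binaries
enable-tftp
tftp-root=/srv/tftp
```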
For ISC DHCP server (`/etc/dhcp/dhcpd.conf`):

```dhcp
# Architecture detection (RFC 4578)
option architecture-type code 93 = unsigned integer 16;
# iPXE detection
option user-class code 77 = string;

subnet 10.0.0.0 netmask 255.255.255.0 {
  range 10.0.0.100 10.0.0.200;
  option routers 10.0.0.1;
  option domain-name-servers 10.0.0.1;

  # Boot server
  next-server 10.0.0.2;  # TFTP/HTTP server IP

  # Chainloading logic
  if exists user-class and option user-class = "iPXE" {
    # iPXE is already loaded, provide boot script via HTTP
    filename "http://10.0.0.2:8080/boot.ipxe";
  } elsif option architecture-type = 00:00 {
    # BIOS (legacy) - load iPXE via TFTP
    filename "undionly.kpxe";
  } elsif option architecture-type = 00:07 {
    # UEFI x86_64 - load iPXE via TFTP
    filename "ipxe.efi";
  } elsif option architecture-type = 00:09 {
    # UEFI x86_64 (alternate) - load iPXE via TFTP
    filename "ipxe.efi";
  } else {
    # Fallback
    filename "ipxe.efi";
  }
}
```

**Key Points:**

- **Option 93** (architecture-type): Distinguishes BIOS (0x0000) vs UEFI (0x0007/0x0009)
- **Option 66** (next-server): TFTP server IP for initial boot files
- **Option 67** (filename): Boot file name, changes based on architecture and iPXE presence
- **User-class detection**: Prevents an infinite loop (iPXE downloading itself)
- **HTTP chainloading**: After iPXE loads, switch to HTTP for faster downloads

### 2.3 iPXE Script Structure

The boot script (`/srv/boot/boot.ipxe`) provides a menu for deployment profiles:

```ipxe
#!ipxe

# Variables
set boot-server 10.0.0.2:8080
set nix-cache http://${boot-server}/nix-cache

# Display system info
echo System information:
echo - Platform: ${platform}
echo - Architecture: ${buildarch}
echo - MAC: ${net0/mac}
echo - IP: ${net0/ip}
echo

# Menu with timeout
:menu
menu PlasmaCloud Bare-Metal Provisioning
item --gap -- ──────────── Deployment Profiles ────────────
item control-plane Install Control Plane Node (Chainfire + FlareDB + IAM)
item worker        Install Worker Node (PlasmaVMC + PrismNET + Storage)
item all-in-one    Install All-in-One (All 8 Services)
item shell         Boot to NixOS Installer Shell
item --gap -- ─────────────────────────────────────────────
item --key r reboot Reboot System
choose --timeout 30000 --default all-in-one target || goto menu

# Execute selection
goto ${target}

:control-plane
echo Booting Control Plane installer...
set profile control-plane
goto boot

:worker
echo Booting Worker Node installer...
set profile worker
goto boot

:all-in-one
echo Booting All-in-One installer...
set profile all-in-one
goto boot

:shell
echo Booting to installer shell...
set profile shell
goto boot

:boot
# Load NixOS netboot artifacts (from nixos-images or custom build)
kernel http://${boot-server}/nixos/bzImage init=/nix/store/...-nixos-system/init loglevel=4 console=ttyS0 console=tty0 nixos.profile=${profile}
initrd http://${boot-server}/nixos/initrd
boot

:reboot
reboot

:failed
echo Boot failed, dropping to shell...
sleep 10
shell
```

**Features:**

- **Multi-profile support**: Different service combinations per node type
- **Hardware detection**: Shows MAC/IP for inventory tracking
- **Timeout with default**: Unattended deployment after 30 seconds
- **Kernel parameters**: Pass the profile to the NixOS installer for conditional configuration
- **Error handling**: Falls back to a shell on failure

### 2.4 HTTP vs TFTP Trade-offs

| Aspect | TFTP | HTTP |
|--------|------|------|
| **Speed** | ~1-5 MB/s (UDP, no windowing) | ~50-100+ MB/s (TCP with pipelining) |
| **Reliability** | Low (UDP, prone to timeouts) | High (TCP with retries) |
| **Firmware Support** | Universal (all PXE ROMs) | UEFI 2.5+ only (HTTP Boot) |
| **Complexity** | Simple protocol, minimal config | Requires web server (nginx/apache) |
| **Use Case** | Initial iPXE binary (~100KB) | Kernel/initrd/images (~100-500MB) |

**Recommended Hybrid Approach:**

1. **TFTP** for initial iPXE binary delivery (universal compatibility)
2. **HTTP** for all subsequent artifacts (kernel, initrd, scripts, packages)
3. Configure iPXE with embedded HTTP support
4. NixOS netboot images served via HTTP with range-request support for resumability

**UEFI HTTP Boot Alternative:** For pure UEFI environments, skip TFTP entirely by using DHCP Option 60 (Vendor Class = "HTTPClient") and Option 67 (HTTP URI). However, this lacks BIOS compatibility and requires newer firmware (2015+).

## 3. Image Generation Strategy

### 3.1 Building NixOS Netboot Images

NixOS provides built-in netboot image generation. We extend this to include the PlasmaCloud services.

**Option 1: Custom Netboot Configuration (Recommended)**

Create `nix/images/netboot.nix`:

```nix
{ config, pkgs, lib, modulesPath, ... }:

{
  imports = [
    "${modulesPath}/installer/netboot/netboot-minimal.nix"
    ../../nix/modules  # PlasmaCloud service modules
  ];

  # Networking for installer phase
  networking = {
    usePredictableInterfaceNames = false;  # Use eth0 instead of enpXsY
    useDHCP = true;
    firewall.enable = false;  # Open during installation
  };

  # SSH for nixos-anywhere
  services.openssh = {
    enable = true;
    settings = {
      PermitRootLogin = "yes";
      PasswordAuthentication = false;  # Key-based only
    };
  };

  # Authorized keys for provisioning server
  users.users.root.openssh.authorizedKeys.keys = [
    "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIProvisioning Server Key..."
  ];

  # Minimal kernel for hardware support
  boot.kernelPackages = pkgs.linuxPackages_latest;
  boot.supportedFilesystems = [ "ext4" "xfs" "btrfs" "zfs" ];

  # Include disko for disk management
  environment.systemPackages = with pkgs; [
    disko
    parted
    cryptsetup
    lvm2
  ];

  # Disable unnecessary services for the installer
  documentation.enable = false;
  documentation.nixos.enable = false;
  sound.enable = false;

  # Build artifacts needed for netboot
  system.build = {
    netbootRamdisk = config.system.build.initialRamdisk;
    kernel = config.system.build.kernel;
    netbootIpxeScript = pkgs.writeText "netboot.ipxe" ''
      #!ipxe
      kernel \${boot-url}/bzImage init=${config.system.build.toplevel}/init ${toString config.boot.kernelParams}
      initrd \${boot-url}/initrd
      boot
    '';
  };
}
```

Build the netboot artifacts:

```bash
nix build .#nixosConfigurations.netboot.config.system.build.netbootRamdisk
nix build .#nixosConfigurations.netboot.config.system.build.kernel

# Copy to HTTP server
cp result/bzImage /srv/boot/nixos/
cp result/initrd /srv/boot/nixos/
```

**Option 2: Use Pre-built Images (Faster Development)**

The [nix-community/nixos-images](https://github.com/nix-community/nixos-images) project provides pre-built netboot images:

```bash
# Use their iPXE chainload directly
chain https://github.com/nix-community/nixos-images/releases/download/nixos-unstable/netboot-x86_64-linux.ipxe

# Or download artifacts
curl -L https://github.com/nix-community/nixos-images/releases/download/nixos-unstable/bzImage -o /srv/boot/nixos/bzImage
curl -L https://github.com/nix-community/nixos-images/releases/download/nixos-unstable/initrd -o /srv/boot/nixos/initrd
```

### 3.2 Configuration Injection Approach

Configuration must be injected at installation time (not baked into the netboot image) to support:

- Node-specific networking (static IPs, VLANs)
- Cluster join parameters (existing Raft leader addresses)
- TLS certificates (unique per node)
- Hardware-specific disk layouts

**Three-Phase Configuration Model:**

**Phase 1: Netboot Image (Generic)**

- Universal kernel with broad hardware support
- SSH server with provisioning key
- disko + installer tools
- No node-specific data

**Phase 2: nixos-anywhere Deployment (Node-Specific)**

- Pull node configuration from the provisioning server based on MAC/hostname
- Partition disks per the disko spec
- Install NixOS with flake: `github:yourorg/plasmacloud#node-hostname`
- Inject secrets: `/etc/nixos/secrets/` (TLS certs, cluster tokens)

**Phase 3: First Boot (Service Initialization)**

- A systemd service reads `/etc/nixos/secrets/cluster-config.json`
- Auto-join the Chainfire cluster (or bootstrap if first node)
- FlareDB joins after Chainfire is healthy
- IAM initializes with the FlareDB backend
- Other services start with proper dependencies

**Configuration Repository Structure:**

```
/srv/provisioning/
├── nodes/
│   ├── node01.example.com/
│   │   ├── hardware.nix          # Generated from nixos-generate-config
│   │   ├── configuration.nix     # Node-specific service config
│   │   ├── disko.nix             # Disk layout
│   │   └── secrets/
│   │       ├── tls-cert.pem
│   │       ├── tls-key.pem
│   │       ├── tls-ca.pem
│   │       └── cluster-config.json
│   └── node02.example.com/
│       └── ...
├── profiles/
│   ├── control-plane.nix         # Chainfire + FlareDB + IAM
│   ├── worker.nix                # PlasmaVMC + storage
│   └── all-in-one.nix            # All 8 services
└── common/
    ├── base.nix                  # Common settings (SSH, users, firewall)
    └── networking.nix            # Network defaults
```

**Node Configuration Example (`nodes/node01.example.com/configuration.nix`):**

```nix
{ config, pkgs, lib, ... }:

{
  imports = [
    ../../profiles/control-plane.nix
    ../../common/base.nix
    ./hardware.nix
    ./disko.nix
  ];

  networking = {
    hostName = "node01";
    domain = "example.com";
    interfaces.eth0 = {
      useDHCP = false;
      ipv4.addresses = [{
        address = "10.0.1.10";
        prefixLength = 24;
      }];
    };
    defaultGateway = "10.0.1.1";
    nameservers = [ "10.0.1.1" ];
  };

  # Service configuration
  services.chainfire = {
    enable = true;
    port = 2379;
    raftPort = 2380;
    gossipPort = 2381;
    settings = {
      node_id = "node01";
      cluster_name = "prod-cluster";
      # Initial cluster peers (for bootstrap)
      initial_peers = [
        "node01.example.com:2380"
        "node02.example.com:2380"
        "node03.example.com:2380"
      ];
      tls = {
        cert_path = "/etc/nixos/secrets/tls-cert.pem";
        key_path = "/etc/nixos/secrets/tls-key.pem";
        ca_path = "/etc/nixos/secrets/tls-ca.pem";
      };
    };
  };

  services.flaredb = {
    enable = true;
    port = 2479;
    raftPort = 2480;
    settings = {
      node_id = "node01";
      cluster_name = "prod-cluster";
      chainfire_endpoint = "https://localhost:2379";
      tls = {
        cert_path = "/etc/nixos/secrets/tls-cert.pem";
        key_path = "/etc/nixos/secrets/tls-key.pem";
        ca_path = "/etc/nixos/secrets/tls-ca.pem";
      };
    };
  };

  services.iam = {
    enable = true;
    port = 8080;
    settings = {
      flaredb_endpoint = "https://localhost:2479";
      tls = {
        cert_path = "/etc/nixos/secrets/tls-cert.pem";
        key_path = "/etc/nixos/secrets/tls-key.pem";
      };
    };
  };

  system.stateVersion = "24.11";
}
```

### 3.3 Hardware Detection vs Explicit Hardware Config

**Hardware Detection (Automatic):**

During installation, `nixos-generate-config` scans hardware and creates `hardware-configuration.nix`:

```bash
# On the live installer, after disk setup
nixos-generate-config --root /mnt --show-hardware-config > /tmp/hardware.nix

# Upload to the provisioning server
curl -X POST -F "file=@/tmp/hardware.nix" http://provisioning-server/api/hardware/node01
```

**Explicit Hardware Config (Declarative):**

For homogeneous hardware (e.g., a fleet of identical servers), use a template:

```nix
# profiles/hardware/dell-r640.nix
{ config, lib, pkgs, modulesPath, ... }:

{
  imports = [ (modulesPath + "/installer/scan/not-detected.nix") ];

  boot.initrd.availableKernelModules = [ "xhci_pci" "ahci" "nvme" "usbhid" "sd_mod" ];
  boot.kernelModules = [ "kvm-intel" ];

  # Network interfaces (predictable naming)
  networking.interfaces = {
    enp59s0f0 = {};  # 10GbE Port 1
    enp59s0f1 = {};  # 10GbE Port 2
  };

  # CPU microcode updates
  hardware.cpu.intel.updateMicrocode = true;

  # Power management
  powerManagement.cpuFreqGovernor = "performance";

  nixpkgs.hostPlatform = "x86_64-linux";
}
```

**Recommendation:**

- **Phase 1 (Development):** Auto-detect hardware for flexibility
- **Phase 2 (Production):** Standardize on explicit hardware profiles for consistency and faster deployments

### 3.4 Image Size Optimization

Netboot images must fit in RAM (typically 1-4 GB available after kexec). Strategies:

**1. Exclude Documentation and Locales:**

```nix
documentation.enable = false;
documentation.nixos.enable = false;
i18n.supportedLocales = [ "en_US.UTF-8/UTF-8" ];
```

**2. Minimal Kernel:**

```nix
boot.kernelPackages = pkgs.linuxPackages_latest;
boot.kernelParams = [ "modprobe.blacklist=nouveau" ];  # Exclude unused drivers
```

**3. Squashfs Compression:**

NixOS netboot uses squashfs for the Nix store, achieving ~2.5x compression:

```nix
# Automatically applied by netboot-minimal.nix
system.build.squashfsStore = ...;  # Default: gzip compression
```

**4. On-Demand Package Fetching:**

Instead of bundling all packages, fetch from an HTTP substituter during installation:

```nix
nix.settings.substituters = [ "http://10.0.0.2:8080/nix-cache" ];
nix.settings.trusted-public-keys = [ "cache-key-here" ];
```

**Expected Sizes:**

- **Minimal installer (no services):** ~150-250 MB (initrd)
- **Installer + PlasmaCloud packages:** ~400-600 MB (with on-demand fetch)
- **Full offline installer:** ~1-2 GB (includes all service closures)

## 4. Installation Flow

### 4.1 Step-by-Step Process

**1. PXE Boot to NixOS Installer (Automated)**

- Server powers on, sends a DHCP request
- DHCP provides the iPXE binary (via TFTP)
- iPXE loads, sends a second DHCP request with its user-class
- DHCP provides the boot script URL (via HTTP)
- iPXE downloads the script, executes it, loads kernel + initrd
- kexec into the NixOS installer (in RAM, ~30-60 seconds)
- Installer boots, acquires an IP via DHCP, starts the SSH server

**2. Provisioning Server Detects Node (Semi-Automated)**

The provisioning server monitors DHCP leases or receives a webhook from the installer:

```bash
# Installer sends registration on boot (custom init script)
curl -X POST http://provisioning-server/api/register \
  -d '{"mac":"aa:bb:cc:dd:ee:ff","ip":"10.0.0.100","hostname":"node01"}'
```

The provisioning server looks up the node in its inventory (`/srv/provisioning/inventory.json`):

```json
{
  "nodes": {
    "aa:bb:cc:dd:ee:ff": {
      "hostname": "node01.example.com",
      "profile": "control-plane",
      "config_path": "/srv/provisioning/nodes/node01.example.com"
    }
  }
}
```

**3. Run nixos-anywhere (Automated)**

The provisioning server executes nixos-anywhere:

```bash
#!/bin/bash
# /srv/provisioning/scripts/provision-node.sh

NODE_MAC="$1"
NODE_IP=$(get_ip_from_dhcp "$NODE_MAC")
NODE_HOSTNAME=$(lookup_hostname "$NODE_MAC")
CONFIG_PATH="/srv/provisioning/nodes/$NODE_HOSTNAME"

# Copy secrets to the installer (will be injected during install)
ssh root@$NODE_IP "mkdir -p /tmp/secrets"
scp $CONFIG_PATH/secrets/* root@$NODE_IP:/tmp/secrets/

# Run nixos-anywhere with disko
nix run github:nix-community/nixos-anywhere -- \
  --flake "/srv/provisioning#$NODE_HOSTNAME" \
  --build-on-remote \
  --disk-encryption-keys /tmp/disk.key <(cat $CONFIG_PATH/secrets/disk-encryption.key) \
  root@$NODE_IP
```

nixos-anywhere performs:

- Detects an existing OS (if any)
- Loads kexec if needed (already done via PXE)
- Runs disko to partition disks (based on `$CONFIG_PATH/disko.nix`)
- Builds the NixOS system closure (either locally or on the target)
- Copies the closure to `/mnt` (mounted root)
- Installs the bootloader (GRUB/systemd-boot)
- Copies secrets to `/mnt/etc/nixos/secrets/`
- Unmounts, reboots

**4. First Boot into Installed System (Automated)**

The server reboots from disk (GRUB/systemd-boot) and loads NixOS:

- systemd starts
- `chainfire.service` starts (waits 30s for network)
- If `initial_peers` matches only self → bootstrap a new cluster
- If `initial_peers` includes others → attempt to join the existing cluster
- `flaredb.service` starts after chainfire is healthy
- `iam.service` starts after flaredb is healthy
- Other services start based on the profile

**First-boot cluster join logic** (systemd unit):

```nix
# /etc/nixos/first-boot-cluster-join.nix
{ config, lib, pkgs, ... }:

let
  clusterConfig = builtins.fromJSON (builtins.readFile /etc/nixos/secrets/cluster-config.json);
in
{
  systemd.services.chainfire-cluster-join = {
    description = "Chainfire Cluster Join";
    after = [ "network-online.target" "chainfire.service" ];
    wants = [ "network-online.target" ];
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
    };
    script = ''
      # Wait for the local chainfire to be ready
      until ${pkgs.curl}/bin/curl -k https://localhost:2379/health; do
        echo "Waiting for local chainfire..."
        sleep 5
      done

      # Check if this is the first node (bootstrap); note booleans must be
      # rendered with lib.boolToString before string interpolation
      if [ "${lib.boolToString clusterConfig.bootstrap}" = "true" ]; then
        echo "Bootstrap node, cluster already initialized"
        exit 0
      fi

      # Join the existing cluster
      LEADER_URL="${clusterConfig.leader_url}"
      NODE_ID="${clusterConfig.node_id}"
      RAFT_ADDR="${clusterConfig.raft_addr}"

      ${pkgs.curl}/bin/curl -k -X POST "$LEADER_URL/admin/member/add" \
        -H "Content-Type: application/json" \
        -d "{\"id\":\"$NODE_ID\",\"raft_addr\":\"$RAFT_ADDR\"}"

      echo "Cluster join initiated"
    '';
  };

  # Similar for flaredb
  systemd.services.flaredb-cluster-join = {
    description = "FlareDB Cluster Join";
    after = [ "chainfire-cluster-join.service" "flaredb.service" ];
    requires = [ "chainfire-cluster-join.service" ];
    # ... similar logic
  };
}
```

**5. Validation (Manual/Automated)**

The provisioning server polls health endpoints:

```bash
# Health check script
curl -k https://10.0.1.10:2379/health   # Chainfire
curl -k https://10.0.1.10:2479/health   # FlareDB
curl -k https://10.0.1.10:8080/health   # IAM

# Cluster status
curl -k https://10.0.1.10:2379/admin/cluster/members | jq
```

### 4.2 Error Handling and Recovery

**Boot Failures:**

- **Symptom:** Server stuck in a PXE boot loop
- **Diagnosis:** Check DHCP server logs, verify TFTP/HTTP server accessibility
- **Recovery:** Fix the DHCP config, restart services, retry boot

**Disk Partitioning Failures:**

- **Symptom:** nixos-anywhere fails during the disko phase
- **Diagnosis:** SSH to the installer, run `dmesg | grep -i error`, check disk accessibility
- **Recovery:** Adjust the disko config (e.g., wrong disk device), re-run nixos-anywhere

**Installation Failures:**

- **Symptom:** nixos-anywhere fails during the installation phase
- **Diagnosis:** Check nixos-anywhere output, SSH in and inspect `/mnt`
- **Recovery:** Fix configuration errors, re-run nixos-anywhere (will reformat)

**Cluster Join Failures:**

- **Symptom:** Service starts but is not in the cluster
- **Diagnosis:** `journalctl -u chainfire-cluster-join`, check leader reachability
- **Recovery:** Manually run the join command, verify TLS certs, check the firewall

**Rollback Strategy:**

- NixOS generations provide atomic rollback: `nixos-rebuild switch --rollback`
- For catastrophic failure: re-provision from PXE (data loss if not replicated)

### 4.3 Network Requirements

**DHCP:**

- Option 66/67 for PXE boot
- Option 93 for architecture detection
- User-class filtering for iPXE chainload
- Static reservations for production nodes (optional)

**DNS:**

- Forward and reverse DNS for all nodes (required for TLS cert CN verification)
- Example: `node01.example.com` → `10.0.1.10`, `10.0.1.10` → `node01.example.com`

**Firewall:**

- Allow TFTP (UDP 69) from nodes to the boot server
- Allow HTTP (TCP 80/8080) from nodes to the boot/provisioning server
- Allow SSH (TCP 22) from the provisioning server to nodes
- Allow service ports (2379-2381, 2479-2480, 8080, etc.) between cluster nodes

**Internet Access:**

- **During installation:** Required for the Nix binary cache (cache.nixos.org) unless using a local cache
- **After installation:** Optional (recommended for updates); can run air-gapped with a local cache
- **Workaround:** Set up a local binary cache: `nix-serve` + nginx

**Bandwidth:**

- **PXE boot:** ~200 MB (kernel + initrd) per node; sequential is acceptable
- **Installation:** ~1-5 GB (Nix closures) per node; parallel is fine if the cache is local
- **Recommendation:** 1 Gbps link between the provisioning server and nodes

## 5. Integration Points

### 5.1 T024 NixOS Modules

The NixOS modules from T024 (`nix/modules/*.nix`) provide declarative service configuration. They are included in node configurations:

```nix
{ config, pkgs, lib, ... }:

{
  imports = [
    # Import PlasmaCloud service modules
    inputs.plasmacloud.nixosModules.default
  ];

  # Enable services declaratively
  services.chainfire.enable = true;
  services.flaredb.enable = true;
  services.iam.enable = true;
  # ... etc
}
```

**Module Integration Strategy:**

1. **Flake Inputs:** Node configurations reference the PlasmaCloud flake:

   ```nix
   # flake.nix for the provisioning repo
   inputs.plasmacloud.url = "github:yourorg/plasmacloud";
   # or path-based for development
   inputs.plasmacloud.url = "path:/path/to/plasmacloud/repo";
   ```

2. **Service Packages:** Packages are injected via an overlay:

   ```nix
   nixpkgs.overlays = [ inputs.plasmacloud.overlays.default ];
   # Now pkgs.chainfire-server, pkgs.flaredb-server, etc. are available
   ```

3. **Dependency Graph:** systemd units respect the T024 dependencies:

   ```
   chainfire.service
     ↓ requires/after
   flaredb.service
     ↓ requires/after
   iam.service
     ↓ requires/after
   plasmavmc.service, flashdns.service, ... (parallel)
   ```

4. **Configuration Schema:** Use `services.<name>.settings` for service-specific config:

   ```nix
   services.chainfire.settings = {
     node_id = "node01";
     cluster_name = "prod";
     tls = { ... };
   };
   ```

### 5.2 T027 Config Unification

T027 established a unified configuration approach (clap + config file/env). This integrates with NixOS in two ways:

**1. NixOS Module → Config File Generation:**

The NixOS module renders `services.<name>.settings` into a config file, e.g. minimally in `preStart`:

```nix
# In nix/modules/chainfire.nix (sketch; cfg = config.services.chainfire)
systemd.services.chainfire = {
  preStart = ''
    # Generate the config file from settings
    cat > /var/lib/chainfire/config.toml <<EOF
    node_id = "${cfg.settings.node_id}"
    cluster_name = "${cfg.settings.cluster_name}"
    EOF
  '';
};
```

**2. Secret Handling:**

- **Non-secrets:** `services.<name>.settings` (stored in the Nix store, world-readable)
- **Secrets:** Use `EnvironmentFile` or systemd credentials
- **Hybrid:** Config file with placeholders, secrets injected at runtime

### 5.3 T031 TLS Certificates

T031 added TLS to all 8 services. Provisioning must handle certificate distribution.

**Certificate Provisioning Strategies:**

**Option 1: Pre-Generated Certificates (Simple)**

1. Generate certs on the provisioning server per node:

   ```bash
   # /srv/provisioning/scripts/generate-certs.sh node01.example.com
   openssl req -x509 -newkey rsa:4096 -nodes \
     -keyout node01-key.pem -out node01-cert.pem \
     -days 365 -subj "/CN=node01.example.com"
   ```

2. Copy to the node's secrets directory:

   ```bash
   cp node01-*.pem /srv/provisioning/nodes/node01.example.com/secrets/
   ```

3. nixos-anywhere installs them to `/etc/nixos/secrets/` (mode 0400, owner root)
4. The NixOS module references them:

   ```nix
   services.chainfire.settings.tls = {
     cert_path = "/etc/nixos/secrets/tls-cert.pem";
     key_path = "/etc/nixos/secrets/tls-key.pem";
     ca_path = "/etc/nixos/secrets/tls-ca.pem";
   };
   ```

**Option 2: ACME (Let's Encrypt) for External Services**

For internet-facing services (e.g., the PlasmaVMC API):

```nix
security.acme = {
  acceptTerms = true;
  defaults.email = "admin@example.com";
};

services.plasmavmc.settings.tls = {
  cert_path = config.security.acme.certs."plasmavmc.example.com".directory + "/cert.pem";
  key_path = config.security.acme.certs."plasmavmc.example.com".directory + "/key.pem";
};

security.acme.certs."plasmavmc.example.com" = {
  domain = "plasmavmc.example.com";
  # Use the DNS-01 challenge for internal servers
  dnsProvider = "cloudflare";
  credentialsFile = "/etc/nixos/secrets/cloudflare-api-token";
};
```

**Option 3: Internal CA with Cert-Manager (Advanced)**

1. Deploy cert-manager as a service on the control plane
2. Generate per-node CSRs during first boot
3. Cert-manager signs and distributes certs
4. A systemd timer renews certs before expiry

**Recommendation:**

- **Phase 1 (MVP):** Pre-generated certs (Option 1)
- **Phase 2 (Production):** ACME for external + internal CA for internal (Option 2+3)

### 5.4 Chainfire/FlareDB Cluster Join

**Bootstrap (First 3 Nodes):**

First node (`node01`):

```nix
services.chainfire.settings = {
  node_id = "node01";
  initial_peers = [
    "node01.example.com:2380"
    "node02.example.com:2380"
    "node03.example.com:2380"
  ];
  bootstrap = true;  # This node starts the cluster
};
```

Subsequent nodes (`node02`, `node03`):

```nix
services.chainfire.settings = {
  node_id = "node02";
  initial_peers = [
    "node01.example.com:2380"
    "node02.example.com:2380"
    "node03.example.com:2380"
  ];
  bootstrap = false;  # Join the existing cluster
};
```

**Runtime Join (After Bootstrap):**

New nodes are added to a running cluster as follows:

1. Provision the node with `bootstrap = false`, `initial_peers = []`
2. The first-boot service calls the leader's admin API:

   ```bash
   curl -k -X POST https://node01.example.com:2379/admin/member/add \
     -H "Content-Type: application/json" \
     -d '{"id":"node04","raft_addr":"node04.example.com:2380"}'
   ```

3. The node receives cluster state, starts Raft
4. The leader replicates to the new node

**FlareDB Follows the Same Pattern:**

FlareDB depends on Chainfire for coordination but maintains its own Raft cluster:

```nix
services.flaredb.settings = {
  node_id = "node01";
  chainfire_endpoint = "https://localhost:2379";
  initial_peers = [ "node01:2480" "node02:2480" "node03:2480" ];
};
```

**Critical:** Ensure `chainfire.service` is healthy before starting `flaredb.service` (enforced by systemd `requires`/`after`).

### 5.5 IAM Bootstrap

IAM requires initial admin user creation. Two approaches:

**Option 1: First-Boot Initialization Script**

```nix
systemd.services.iam-bootstrap = {
  description = "IAM Initial Admin User";
  after = [ "iam.service" ];
  wantedBy = [ "multi-user.target" ];
  serviceConfig = {
    Type = "oneshot";
    RemainAfterExit = true;
  };
  script = ''
    # Check whether the admin user exists
    if ${pkgs.curl}/bin/curl -k https://localhost:8080/api/users/admin 2>&1 | grep -q "not found"; then
      # Create the admin user
      ADMIN_PASSWORD=$(cat /etc/nixos/secrets/iam-admin-password)
      ${pkgs.curl}/bin/curl -k -X POST https://localhost:8080/api/users \
        -H "Content-Type: application/json" \
        -d "{\"username\":\"admin\",\"password\":\"$ADMIN_PASSWORD\",\"role\":\"admin\"}"
      echo "Admin user created"
    else
      echo "Admin user already exists"
    fi
  '';
};
```

**Option 2: Environment Variable for Default Admin**

The IAM service creates the admin on first start if the DB is empty:

```rust
// In iam-server main.rs
if user_count() == 0 {
    let admin_password = env::var("IAM_INITIAL_ADMIN_PASSWORD")
        .expect("IAM_INITIAL_ADMIN_PASSWORD must be set for first boot");
    create_user("admin", &admin_password, Role::Admin)?;
    info!("Initial admin user created");
}
```

```nix
systemd.services.iam.serviceConfig = {
  EnvironmentFile = "/etc/nixos/secrets/iam.env";
  # File contains: IAM_INITIAL_ADMIN_PASSWORD=random-secure-password
};
```

**Recommendation:** Use Option 2 (environment variable) for simplicity. Generate a random password during node provisioning and store it in secrets.

## 6. Alternatives Considered

### 6.1 nixos-anywhere vs Custom Installer

**nixos-anywhere (Chosen):**

- **Pros:**
  - Mature, actively maintained by nix-community
  - Handles kexec, disko integration, and bootloader install automatically
  - SSH-based, works from any OS (no need for NixOS on the provisioning server)
  - Supports remote builds and disk encryption out of the box
  - Well-documented with many examples
- **Cons:**
  - Requires SSH access (not suitable for zero-touch provisioning without PXE+SSH)
  - Opinionated workflow (less flexible than custom scripts)
  - Dependency on an external project (but very stable)

**Custom Installer (Rejected):**

- **Pros:**
  - Full control over the installation flow
  - Could implement zero-touch (e.g., installer pulls config from the server without SSH)
  - Tailored to PlasmaCloud-specific needs
- **Cons:**
  - Significant development effort (partitioning, bootloader, error handling)
  - Reinvents well-tested code (disko, kexec integration)
  - Maintenance burden (keeping up with NixOS changes)
  - Higher risk of bugs (partitioning is error-prone)

**Decision:** Use nixos-anywhere for reliability and speed. The SSH requirement is acceptable since PXE boot already provides network access, and adding SSH keys to the netboot image is straightforward.
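Since nixos-anywhere drives disko from the node's flake, a minimal single-disk `disko.nix` is shown here for reference. This is a sketch; the device path and partition sizes are assumptions, and a real `nodes/<host>/disko.nix` would match the node's actual hardware:

```nix
# Hypothetical nodes/node01.example.com/disko.nix: GPT with an EFI System
# Partition and an ext4 root on a single NVMe disk.
{
  disko.devices.disk.main = {
    device = "/dev/nvme0n1";  # assumption; verify against the target host
    type = "disk";
    content = {
      type = "gpt";
      partitions = {
        esp = {
          size = "1G";
          type = "EF00";  # EFI System Partition
          content = {
            type = "filesystem";
            format = "vfat";
            mountpoint = "/boot";
          };
        };
        root = {
          size = "100%";
          content = {
            type = "filesystem";
            format = "ext4";
            mountpoint = "/";
          };
        };
      };
    };
  };
}
```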
### 6.2 Disk Management Tools

**disko (Chosen):**

- **Pros:**
  - Declarative, fits the NixOS philosophy
  - Integrates with nixos-anywhere out of the box
  - Supports complex layouts (RAID, LVM, LUKS, ZFS, btrfs)
  - Idempotent (can reformat or verify an existing layout)
- **Cons:**
  - Nix-based DSL (learning curve)
  - Limited to Linux filesystems (no Windows support, not relevant here)

**Kickstart/Preseed (Rejected):**

- Used by Fedora/Debian installers
- Not NixOS-native; would require custom integration

**Terraform with Libvirt (Rejected):**

- Good for VMs, not bare metal
- Doesn't handle disk partitioning directly

**Decision:** disko is the clear choice for NixOS deployments.

### 6.3 Boot Methods

**iPXE over TFTP/HTTP (Chosen):**

- **Pros:**
  - Universal support (BIOS + UEFI)
  - Flexible scripting (boot menus, conditional logic)
  - HTTP support for fast downloads
  - Open source, widely deployed
- **Cons:**
  - Requires DHCP configuration (Option 66/67 setup)
  - Chainloading adds complexity (but it is a solved problem)

**UEFI HTTP Boot (Rejected):**

- **Pros:**
  - Native UEFI, no TFTP needed
  - Simpler DHCP config (just Option 60/67)
- **Cons:**
  - UEFI only (no BIOS support)
  - Firmware support inconsistent (pre-2015 servers)
  - Less flexible than iPXE scripting

**Preboot USB (Rejected):**

- Manual, not scalable for fleet deployment
- Useful for one-off installs only

**Decision:** iPXE for flexibility and compatibility. UEFI HTTP Boot could be considered later for pure UEFI fleets.
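Tying the chosen tools together, a hypothetical skeleton of the provisioning repo's `flake.nix` is sketched below. The input names and node attribute are assumptions that mirror the `/srv/provisioning#$NODE_HOSTNAME` reference in Section 4.1 (`yourorg` is a placeholder):

```nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";
    disko.url = "github:nix-community/disko";
    plasmacloud.url = "github:yourorg/plasmacloud";
  };

  outputs = { self, nixpkgs, disko, plasmacloud, ... }: {
    # One attribute per node; nixos-anywhere is pointed at
    # "/srv/provisioning#node01.example.com"
    nixosConfigurations."node01.example.com" = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        disko.nixosModules.disko
        plasmacloud.nixosModules.default
        ./nodes/node01.example.com/configuration.nix
      ];
    };
  };
}
```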
### 6.4 Configuration Management

**NixOS Flakes (Chosen):**

- **Pros:**
  - Native to NixOS, declarative
  - Reproducible builds with lock files
  - Git-based, version controlled
  - No external agent needed (systemd handles state)
- **Cons:**
  - Steep learning curve for operators unfamiliar with Nix
  - Less dynamic than Ansible (changes require a rebuild)

**Ansible (Rejected for Provisioning, Useful for Orchestration):**

- **Pros:**
  - Agentless, SSH-based
  - Large ecosystem of modules
  - Dynamic; easy to patch running systems
- **Cons:**
  - Imperative (harder to guarantee state)
  - Doesn't integrate with NixOS packages/modules
  - Adds another tool to the stack

**Terraform (Rejected):**

- Infrastructure-as-code, not config management
- Better suited to cloud VMs than bare metal

**Decision:** Use NixOS flakes for provisioning and base config. Ansible may be added later for operational tasks (e.g., rolling updates, health checks) that don't fit NixOS's declarative model.

## 7. Open Questions / Decisions Needed

### 7.1 Hardware Inventory Management

**Question:** How do we map MAC addresses to node roles and configurations?

**Options:**

1. **Manual Inventory File:** Operator maintains a JSON/YAML file with MAC → hostname → config mapping
2. **Auto-Discovery:** First boot prompts the operator to assign a role (e.g., via serial console or web UI)
3. **External CMDB:** Integrate with an existing Configuration Management Database (e.g., NetBox, Nautobot)

**Recommendation:** Start with a manual inventory file (simple); migrate to CMDB integration in Phase 2.

### 7.2 Secrets Management

**Question:** How are secrets (TLS keys, passwords) generated, stored, and rotated?

**Options:**

1. **File-Based (Current):** Secrets in `/srv/provisioning/nodes/*/secrets/`, copied during install
2. **Vault Integration:** Fetch secrets from HashiCorp Vault at boot time
3. **systemd Credentials:** Use systemd's encrypted credentials feature (requires systemd 250+)

**Recommendation:** Phase 1 uses file-based secrets (simple, works today).
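The Phase 1 file-based flow could be sketched as follows. The directory layout follows the `/srv/provisioning/nodes/*/secrets/` convention above and the `IAM_INITIAL_ADMIN_PASSWORD` env file from Section 5; the helper names are illustrative:

```bash
#!/usr/bin/env bash
# Sketch: generate per-node file-based secrets during provisioning (Phase 1).
set -euo pipefail

gen_secret() {
  # 32 random bytes, base64-encoded, stripped of newlines and padding
  openssl rand -base64 32 | tr -d '\n='
}

write_node_secrets() {
  local node=$1 root=${2:-/srv/provisioning/nodes}
  local dir="${root}/${node}/secrets"
  umask 077                      # secrets readable only by the owner
  mkdir -p "$dir"
  printf 'IAM_INITIAL_ADMIN_PASSWORD=%s\n' "$(gen_secret)" > "${dir}/iam.env"
}

# Usage: write_node_secrets node01
# writes /srv/provisioning/nodes/node01/secrets/iam.env
```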
Phase 2 adds Vault for production (centralized, auditable, rotation support).

### 7.3 Network Boot Security

**Question:** How do we prevent rogue nodes from joining the cluster?

**Concerns:**

- An attacker boots an unauthorized server on the network
- The installer has an SSH key and could be accessed
- A node joins the cluster with malicious intent

**Mitigations:**

1. **MAC Whitelist:** DHCP only serves known MAC addresses
2. **Network Segmentation:** PXE boot on an isolated provisioning VLAN
3. **SSH Key Per Node:** Each node has unique authorized_keys in the netboot image (complex)
4. **Cluster Authentication:** Raft join requires a cluster token (not yet implemented)

**Recommendation:** Use a MAC whitelist + provisioning VLAN for Phase 1. Add cluster join tokens in Phase 2 (requires Chainfire/FlareDB changes).

### 7.4 Multi-Datacenter Deployment

**Question:** How does provisioning work across geographically distributed datacenters?

**Challenges:**

- WAN latency for Nix cache fetches
- PXE boot requires local DHCP/TFTP
- Cluster join across the WAN (Raft latency)

**Options:**

1. **Replicated Provisioning Server:** Deploy a boot server in each datacenter, sync configs
2. **Central Provisioning with Local Cache:** Single source of truth; local Nix cache mirrors
3. **Per-DC Clusters:** Each datacenter is an independent cluster, federated at the application layer

**Recommendation:** Defer to Phase 2. Phase 1 assumes a single datacenter or a low-latency LAN.

### 7.5 Disk Encryption

**Question:** Should disks be encrypted at rest?

**Trade-offs:**

- **Pros:** Compliance (GDPR, PCI-DSS), protection against physical theft
- **Cons:** Key management complexity, no unattended reboot (manual unlock), performance overhead (~5-10%)

**Options:**

1. **No Encryption:** Rely on physical security
2. **LUKS with Network Unlock:** Tang/Clevis for automated unlocking (requires network at boot)
3. **LUKS with Manual Unlock:** Operator enters a passphrase via KVM/IPMI

**Recommendation:** Optional, configurable per deployment.
Provide a disko template for LUKS and let the operator decide.

### 7.6 Rolling Updates

**Question:** How do we update a running cluster without downtime?

**Challenges:**

- Raft requires quorum (can't update a majority simultaneously)
- Service dependencies (Chainfire → FlareDB → others)
- A NixOS rebuild requires a reboot (for kernel/init changes)

**Strategy:**

1. Update one node at a time (rolling)
2. Verify health before proceeding to the next
3. Use `nixos-rebuild test` first (activates without a bootloader change), then `switch` after validation

**Tooling:**

- Ansible playbook for orchestration
- Health check scripts (curl endpoints + check Raft status)
- Rollback plan (NixOS generations + Raft snapshot restore)

**Recommendation:** Document as a runbook in Phase 1; implement automated rolling updates in Phase 2 (T033?).

### 7.7 Monitoring and Alerting

**Question:** How do we monitor provisioning success/failure?

**Options:**

1. **Manual:** Operator watches the terminal and checks health endpoints
2. **Log Aggregation:** Collect installer logs, index in Loki/Elasticsearch
3. **Event Webhook:** Installer posts events to a monitoring system (Grafana, PagerDuty)

**Recommendation:** Phase 1 uses manual monitoring. Phase 2 adds structured logging + webhooks for fleet deployments.

### 7.8 Compatibility with Existing Infrastructure

**Question:** Can this provisioning system coexist with existing PXE infrastructure (e.g., for other OS deployments)?

**Concerns:**

- Existing DHCP config may conflict
- The TFTP server may serve other boot files
- The network team may control PXE infrastructure

**Solutions:**

1. **Dedicated Provisioning VLAN:** PlasmaCloud nodes on a separate network
2. **Conditional DHCP:** Use vendor-class or subnet matching to route to the correct boot server
3. **Multi-Boot Menu:** iPXE menu includes options for PlasmaCloud and other OSes

**Recommendation:** Document network requirements and provide example DHCP configs for common scenarios (dedicated VLAN, shared infrastructure).
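One such example, sketched for dnsmasq (MAC addresses, IPs, and the boot-server hostname are placeholders), combines the MAC whitelist from Section 7.3 with the architecture- and user-class-based boot file selection from Section 2.2:

```
# Sketch: dnsmasq fragment serving PXE only to known PlasmaCloud nodes.

# Whitelist: tag known MACs and ignore requests from unknown hosts.
dhcp-host=aa:bb:cc:dd:ee:01,set:plasmacloud,10.0.10.101
dhcp-host=aa:bb:cc:dd:ee:02,set:plasmacloud,10.0.10.102
dhcp-ignore=tag:!known

# First stage: pick the bootloader by client architecture (Option 93).
dhcp-match=set:efi-x64,option:client-arch,7
dhcp-boot=tag:plasmacloud,tag:!efi-x64,undionly.kpxe
dhcp-boot=tag:plasmacloud,tag:efi-x64,ipxe.efi

# Second stage: iPXE identifies itself via user-class and gets the script URL.
dhcp-userclass=set:ipxe,iPXE
dhcp-boot=tag:plasmacloud,tag:ipxe,http://boot.server/boot.ipxe
```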
Coordinate with the network team.

---

## Appendices

### A. Example Disko Configuration

**Single Disk with GPT and ext4:**

```nix
# nodes/node01/disko.nix
{ disks ? [ "/dev/sda" ], ... }: {
  disko.devices = {
    disk = {
      main = {
        type = "disk";
        device = builtins.head disks;
        content = {
          type = "gpt";
          partitions = {
            ESP = {
              size = "512M";
              type = "EF00";
              content = {
                type = "filesystem";
                format = "vfat";
                mountpoint = "/boot";
              };
            };
            root = {
              size = "100%";
              content = {
                type = "filesystem";
                format = "ext4";
                mountpoint = "/";
              };
            };
          };
        };
      };
    };
  };
}
```

**RAID1 with LUKS Encryption:**

```nix
{ disks ? [ "/dev/sda" "/dev/sdb" ], ... }: {
  disko.devices = {
    disk = {
      disk1 = {
        device = builtins.elemAt disks 0;
        type = "disk";
        content = {
          type = "gpt";
          partitions = {
            boot = {
              size = "1M";
              type = "EF02"; # BIOS boot
            };
            mdraid = {
              size = "100%";
              content = {
                type = "mdraid";
                name = "raid1";
              };
            };
          };
        };
      };
      disk2 = {
        device = builtins.elemAt disks 1;
        type = "disk";
        content = {
          type = "gpt";
          partitions = {
            boot = {
              size = "1M";
              type = "EF02";
            };
            mdraid = {
              size = "100%";
              content = {
                type = "mdraid";
                name = "raid1";
              };
            };
          };
        };
      };
    };
    mdadm = {
      raid1 = {
        type = "mdadm";
        level = 1;
        content = {
          type = "luks";
          name = "cryptroot";
          settings.allowDiscards = true;
          content = {
            type = "filesystem";
            format = "ext4";
            mountpoint = "/";
          };
        };
      };
    };
  };
}
```

### B. Complete nixos-anywhere Command Examples

**Basic Deployment:**

```bash
nix run github:nix-community/nixos-anywhere -- \
  --flake .#node01 \
  root@10.0.0.100
```

**With Build on Remote (Slow Local Machine):**

```bash
nix run github:nix-community/nixos-anywhere -- \
  --flake .#node01 \
  --build-on-remote \
  root@10.0.0.100
```

**With Disk Encryption Key:**

```bash
nix run github:nix-community/nixos-anywhere -- \
  --flake .#node01 \
  --disk-encryption-keys /tmp/luks.key <(cat /secrets/node01-luks.key) \
  root@10.0.0.100
```

**Debug Mode (Keep Installer After Failure):**

```bash
nix run github:nix-community/nixos-anywhere -- \
  --flake .#node01 \
  --debug \
  --no-reboot \
  root@10.0.0.100
```

### C. Provisioning Server Setup Script

```bash
#!/bin/bash
# /srv/provisioning/scripts/setup-provisioning-server.sh
set -euo pipefail

# Install dependencies
apt-get update
apt-get install -y nginx tftpd-hpa dnsmasq curl

# Configure TFTP (values follow the standard tftpd-hpa defaults file)
cat > /etc/default/tftpd-hpa <<'EOF'
TFTP_USERNAME="tftp"
TFTP_DIRECTORY="/srv/provisioning/tftp"
TFTP_ADDRESS=":69"
TFTP_OPTIONS="--secure"
EOF

# Configure nginx to serve boot scripts and images over HTTP
cat > /etc/nginx/sites-available/pxe <<'EOF'
```
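The health-check gate referenced in the rolling-update tooling (Section 7.6) could be sketched as follows. The endpoint URLs, port, and Raft status JSON shape are assumptions, not fixed service APIs:

```bash
#!/usr/bin/env bash
# Sketch: per-node health gate run between rolling-update steps.
set -euo pipefail

# Retry a command up to $1 times with $2 seconds between attempts.
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then return 0; fi
    sleep "$delay"
  done
  return 1
}

# Hypothetical health endpoints; adjust to the services' real APIs.
node_healthy() {
  local host=$1
  curl -fsS "https://${host}:8443/healthz" > /dev/null &&
    curl -fsS "https://${host}:8443/raft/status" |
    grep -Eq '"state":"(leader|follower)"'
}

# Usage during a rolling update (abort if the node never becomes healthy):
#   retry 30 10 node_healthy node01 || { echo "node01 unhealthy" >&2; exit 1; }
```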