# T032 Bare-Metal Provisioning Design Document

**Status:** Draft
**Author:** peerB
**Created:** 2025-12-10
**Last Updated:** 2025-12-10

## 1. Architecture Overview

This document outlines the design for automated bare-metal provisioning of the PlasmaCloud platform, which consists of 8 core services (Chainfire, FlareDB, IAM, PlasmaVMC, PrismNET, FlashDNS, FiberLB, and K8sHost). The provisioning system leverages NixOS's declarative configuration capabilities to enable fully automated deployment from bare hardware to a running, clustered platform.

The high-level flow follows this sequence: **PXE Boot → kexec NixOS Installer → disko Disk Partitioning → nixos-anywhere Installation → First-Boot Configuration → Running Cluster**. A bare-metal server performs a network boot via PXE/iPXE, which loads a minimal NixOS installer into RAM using kexec. The installer then connects to a provisioning server, which uses nixos-anywhere to declaratively partition disks (via disko), install NixOS with pre-configured services, and inject node-specific configuration (SSH keys, network settings, cluster join parameters, TLS certificates). On first boot, the system automatically joins existing Raft clusters (Chainfire/FlareDB) or bootstraps new ones, and all 8 services start with proper dependencies and TLS enabled.

The key components are:

- **PXE/iPXE Boot Server**: Serves boot binaries and configuration scripts via TFTP/HTTP
- **nixos-anywhere**: SSH-based remote installation tool that orchestrates the entire deployment
- **disko**: Declarative disk partitioning engine integrated with nixos-anywhere
- **kexec**: Linux kernel feature enabling fast boot into the NixOS installer without a full reboot
- **NixOS Flake** (from T024): Provides all service packages and NixOS modules
- **Configuration Injection System**: Manages node-specific secrets, network config, and cluster metadata
- **First-Boot Automation**: Systemd units that perform cluster join and service initialization

## 2. PXE Boot Flow
### 2.1 Boot Sequence

```
┌─────────────┐
│  Bare Metal │
│   Server    │
└──────┬──────┘
       │ 1. UEFI/BIOS PXE ROM
       ▼
┌──────────────┐
│ DHCP Server  │  Option 93: Client Architecture (0=BIOS, 7=UEFI x64)
│              │  Option 67: Boot filename (undionly.kpxe or ipxe.efi)
│              │  Option 66: TFTP server address
└──────┬───────┘
       │ 2. DHCP OFFER with boot parameters
       ▼
┌──────────────┐
│ TFTP/HTTP    │
│  Server      │  Serves: undionly.kpxe (BIOS) or ipxe.efi (UEFI)
└──────┬───────┘
       │ 3. Download iPXE bootloader
       ▼
┌──────────────┐
│ iPXE Running │  User-class="iPXE" in DHCP request
│  (in RAM)    │
└──────┬───────┘
       │ 4. Second DHCP request (now with iPXE user-class)
       ▼
┌──────────────┐
│ DHCP Server  │  Detects user-class="iPXE"
│              │  Option 67: http://boot.server/boot.ipxe
└──────┬───────┘
       │ 5. DHCP OFFER with script URL
       ▼
┌──────────────┐
│ HTTP Server  │  Serves: boot.ipxe (iPXE script)
└──────┬───────┘
       │ 6. Download and execute boot script
       ▼
┌──────────────┐
│ iPXE Script  │  Loads: NixOS kernel + initrd + kexec
│  Execution   │
└──────┬───────┘
       │ 7. kexec into NixOS installer
       ▼
┌──────────────┐
│ NixOS Live   │  SSH enabled, waiting for nixos-anywhere
│  Installer   │
└──────────────┘
```

### 2.2 DHCP Configuration Requirements

The DHCP server must support architecture-specific boot file selection and iPXE user-class detection. For ISC DHCP server (`/etc/dhcp/dhcpd.conf`):

```dhcp
# Architecture detection (RFC 4578)
option architecture-type code 93 = unsigned integer 16;

# iPXE detection
option user-class code 77 = string;

subnet 10.0.0.0 netmask 255.255.255.0 {
  range 10.0.0.100 10.0.0.200;

  option routers 10.0.0.1;
  option domain-name-servers 10.0.0.1;

  # Boot server
  next-server 10.0.0.2;  # TFTP/HTTP server IP

  # Chainloading logic
  if exists user-class and option user-class = "iPXE" {
    # iPXE is already loaded, provide boot script via HTTP
    filename "http://10.0.0.2:8080/boot.ipxe";
  } elsif option architecture-type = 00:00 {
    # BIOS (legacy) - load iPXE via TFTP
    filename "undionly.kpxe";
  } elsif option architecture-type = 00:07 {
    # UEFI x86_64 - load iPXE via TFTP
    filename "ipxe.efi";
  } elsif option architecture-type = 00:09 {
    # UEFI x86_64 (alternate) - load iPXE via TFTP
    filename "ipxe.efi";
  } else {
    # Fallback
    filename "ipxe.efi";
  }
}
```

**Key Points:**
- **Option 93** (architecture-type): Distinguishes BIOS (0x0000) vs UEFI (0x0007/0x0009)
- **Option 66** (next-server): TFTP server IP for initial boot files
- **Option 67** (filename): Boot file name, changes based on architecture and iPXE presence
- **User-class detection**: Prevents an infinite loop (iPXE downloading itself)
- **HTTP chainloading**: After iPXE loads, switch to HTTP for faster downloads

### 2.3 iPXE Script Structure

The boot script (`/srv/boot/boot.ipxe`) provides a menu for deployment profiles:

```ipxe
#!ipxe

# Variables
set boot-server 10.0.0.2:8080
set nix-cache http://${boot-server}/nix-cache

# Display system info
echo System information:
echo - Platform: ${platform}
echo - Architecture: ${buildarch}
echo - MAC: ${net0/mac}
echo - IP: ${net0/ip}
echo

# Menu with timeout
:menu
menu PlasmaCloud Bare-Metal Provisioning
item --gap -- ──────────── Deployment Profiles ────────────
item control-plane  Install Control Plane Node (Chainfire + FlareDB + IAM)
item worker         Install Worker Node (PlasmaVMC + PrismNET + Storage)
item all-in-one     Install All-in-One (All 8 Services)
item shell          Boot to NixOS Installer Shell
item --gap -- ─────────────────────────────────────────────
item --key r reboot Reboot System
choose --timeout 30000 --default all-in-one target || goto menu

# Execute selection
goto ${target}

:control-plane
echo Booting Control Plane installer...
set profile control-plane
goto boot

:worker
echo Booting Worker Node installer...
set profile worker
goto boot

:all-in-one
echo Booting All-in-One installer...
set profile all-in-one
goto boot

:shell
echo Booting to installer shell...
set profile shell
goto boot

:boot
# Load NixOS netboot artifacts (from nixos-images or custom build)
kernel http://${boot-server}/nixos/bzImage init=/nix/store/...-nixos-system/init loglevel=4 console=ttyS0 console=tty0 nixos.profile=${profile} || goto failed
initrd http://${boot-server}/nixos/initrd || goto failed
boot || goto failed

:reboot
reboot

:failed
echo Boot failed, dropping to shell...
sleep 10
shell
```

**Features:**
- **Multi-profile support**: Different service combinations per node type
- **Hardware detection**: Shows MAC/IP for inventory tracking
- **Timeout with default**: Unattended deployment after 30 seconds
- **Kernel parameters**: Pass the profile to the NixOS installer for conditional configuration
- **Error handling**: Falls back to a shell on failure (via `|| goto failed`)

### 2.4 HTTP vs TFTP Trade-offs

| Aspect | TFTP | HTTP |
|--------|------|------|
| **Speed** | ~1-5 MB/s (UDP, no windowing) | ~50-100+ MB/s (TCP with pipelining) |
| **Reliability** | Low (UDP, prone to timeouts) | High (TCP with retries) |
| **Firmware Support** | Universal (all PXE ROMs) | UEFI 2.5+ only (HTTP Boot) |
| **Complexity** | Simple protocol, minimal config | Requires web server (nginx/apache) |
| **Use Case** | Initial iPXE binary (~100KB) | Kernel/initrd/images (~100-500MB) |

**Recommended Hybrid Approach:**
1. **TFTP** for initial iPXE binary delivery (universal compatibility)
2. **HTTP** for all subsequent artifacts (kernel, initrd, scripts, packages)
3. Configure iPXE with embedded HTTP support
4. NixOS netboot images served via HTTP with range request support for resumability
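
The HTTP side of the hybrid approach needs nothing exotic; a minimal nginx sketch for the artifact server (the port and paths are illustrative, matching the addresses used elsewhere in this document — nginx serves HTTP Range requests for static files by default, which is what makes downloads resumable):

```nginx
server {
    listen 8080;
    root /srv/boot;      # contains boot.ipxe and nixos/{bzImage,initrd}
    sendfile on;         # efficient static-file delivery
    tcp_nopush on;

    location / {
        try_files $uri =404;
    }
}
```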

**UEFI HTTP Boot Alternative:**
For pure UEFI environments, skip TFTP entirely by using DHCP Option 60 (Vendor Class = "HTTPClient") and Option 67 (HTTP URI). However, this lacks BIOS compatibility and requires newer firmware (2015+).
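
A sketch of what that looks like in ISC dhcpd, assuming the same boot server as above (the class name and boot URI are illustrative; the key detail is that the server must echo `HTTPClient` back in Option 60 for the firmware to accept the offer):

```dhcp
class "httpclients" {
  match if substring(option vendor-class-identifier, 0, 10) = "HTTPClient";
  # Firmware requires the server to echo this vendor class back
  option vendor-class-identifier "HTTPClient";
  filename "http://10.0.0.2:8080/nixos/boot.efi";
}
```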
## 3. Image Generation Strategy
### 3.1 Building NixOS Netboot Images

NixOS provides built-in netboot image generation. We extend this to include PlasmaCloud services:

**Option 1: Custom Netboot Configuration (Recommended)**

Create `nix/images/netboot.nix`:

```nix
{ config, pkgs, lib, modulesPath, ... }:

{
  imports = [
    "${modulesPath}/installer/netboot/netboot-minimal.nix"
    ../../nix/modules  # PlasmaCloud service modules
  ];

  # Networking for installer phase
  networking = {
    usePredictableInterfaceNames = false;  # Use eth0 instead of enpXsY
    useDHCP = true;
    firewall.enable = false;  # Open during installation
  };

  # SSH for nixos-anywhere
  services.openssh = {
    enable = true;
    settings = {
      PermitRootLogin = "yes";
      PasswordAuthentication = false;  # Key-based only
    };
  };

  # Authorized keys for provisioning server
  users.users.root.openssh.authorizedKeys.keys = [
    "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIProvisioning Server Key..."
  ];

  # Minimal kernel for hardware support
  boot.kernelPackages = pkgs.linuxPackages_latest;
  boot.supportedFilesystems = [ "ext4" "xfs" "btrfs" "zfs" ];

  # Include disko for disk management
  environment.systemPackages = with pkgs; [
    disko
    parted
    cryptsetup
    lvm2
  ];

  # Disable unnecessary services for installer
  documentation.enable = false;
  documentation.nixos.enable = false;
  sound.enable = false;

  # Build artifacts needed for netboot
  system.build = {
    netbootRamdisk = config.system.build.initialRamdisk;
    kernel = config.system.build.kernel;
    netbootIpxeScript = pkgs.writeText "netboot.ipxe" ''
      #!ipxe
      kernel ''${boot-url}/bzImage init=${config.system.build.toplevel}/init ${toString config.boot.kernelParams}
      initrd ''${boot-url}/initrd
      boot
    '';
  };
}
```

Note the `''${boot-url}` escaping: inside a Nix indented string, `''${` produces a literal `${` so the variable is expanded by iPXE rather than by Nix.

Build the netboot artifacts:

```bash
nix build .#nixosConfigurations.netboot.config.system.build.netbootRamdisk
nix build .#nixosConfigurations.netboot.config.system.build.kernel

# Copy to HTTP server
cp result/bzImage /srv/boot/nixos/
cp result/initrd /srv/boot/nixos/
```

**Option 2: Use Pre-built Images (Faster Development)**

The [nix-community/nixos-images](https://github.com/nix-community/nixos-images) project provides pre-built netboot images:

```bash
# Use their iPXE chainload directly
chain https://github.com/nix-community/nixos-images/releases/download/nixos-unstable/netboot-x86_64-linux.ipxe

# Or download artifacts
curl -L https://github.com/nix-community/nixos-images/releases/download/nixos-unstable/bzImage -o /srv/boot/nixos/bzImage
curl -L https://github.com/nix-community/nixos-images/releases/download/nixos-unstable/initrd -o /srv/boot/nixos/initrd
```

### 3.2 Configuration Injection Approach

Configuration must be injected at installation time (not baked into the netboot image) to support:
- Node-specific networking (static IPs, VLANs)
- Cluster join parameters (existing Raft leader addresses)
- TLS certificates (unique per node)
- Hardware-specific disk layouts

**Three-Phase Configuration Model:**

**Phase 1: Netboot Image (Generic)**
- Universal kernel with broad hardware support
- SSH server with provisioning key
- disko + installer tools
- No node-specific data

**Phase 2: nixos-anywhere Deployment (Node-Specific)**
- Pull node configuration from the provisioning server based on MAC/hostname
- Partition disks per disko spec
- Install NixOS with flake: `github:yourorg/plasmacloud#node-hostname`
- Inject secrets: `/etc/nixos/secrets/` (TLS certs, cluster tokens)

**Phase 3: First Boot (Service Initialization)**
- systemd service reads `/etc/nixos/secrets/cluster-config.json`
- Auto-join Chainfire cluster (or bootstrap if first node)
- FlareDB joins after Chainfire is healthy
- IAM initializes with FlareDB backend
- Other services start with proper dependencies
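
The cluster metadata consumed in Phase 3 can be a small JSON document; a sketch of `/etc/nixos/secrets/cluster-config.json` for a joining node (field names match those read by the first-boot join unit in Section 4.1; the values are illustrative):

```json
{
  "bootstrap": false,
  "node_id": "node02",
  "raft_addr": "node02.example.com:2380",
  "leader_url": "https://node01.example.com:2379"
}
```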

**Configuration Repository Structure:**

```
/srv/provisioning/
├── nodes/
│   ├── node01.example.com/
│   │   ├── hardware.nix          # Generated from nixos-generate-config
│   │   ├── configuration.nix     # Node-specific service config
│   │   ├── disko.nix             # Disk layout
│   │   └── secrets/
│   │       ├── tls-cert.pem
│   │       ├── tls-key.pem
│   │       ├── tls-ca.pem
│   │       └── cluster-config.json
│   └── node02.example.com/
│       └── ...
├── profiles/
│   ├── control-plane.nix         # Chainfire + FlareDB + IAM
│   ├── worker.nix                # PlasmaVMC + storage
│   └── all-in-one.nix            # All 8 services
└── common/
    ├── base.nix                  # Common settings (SSH, users, firewall)
    └── networking.nix            # Network defaults
```

**Node Configuration Example (`nodes/node01.example.com/configuration.nix`):**

```nix
{ config, pkgs, lib, ... }:

{
  imports = [
    ../../profiles/control-plane.nix
    ../../common/base.nix
    ./hardware.nix
    ./disko.nix
  ];

  networking = {
    hostName = "node01";
    domain = "example.com";
    interfaces.eth0 = {
      useDHCP = false;
      ipv4.addresses = [{
        address = "10.0.1.10";
        prefixLength = 24;
      }];
    };
    defaultGateway = "10.0.1.1";
    nameservers = [ "10.0.1.1" ];
  };

  # Service configuration
  services.chainfire = {
    enable = true;
    port = 2379;
    raftPort = 2380;
    gossipPort = 2381;
    settings = {
      node_id = "node01";
      cluster_name = "prod-cluster";
      # Initial cluster peers (for bootstrap)
      initial_peers = [
        "node01.example.com:2380"
        "node02.example.com:2380"
        "node03.example.com:2380"
      ];
      tls = {
        cert_path = "/etc/nixos/secrets/tls-cert.pem";
        key_path = "/etc/nixos/secrets/tls-key.pem";
        ca_path = "/etc/nixos/secrets/tls-ca.pem";
      };
    };
  };

  services.flaredb = {
    enable = true;
    port = 2479;
    raftPort = 2480;
    settings = {
      node_id = "node01";
      cluster_name = "prod-cluster";
      chainfire_endpoint = "https://localhost:2379";
      tls = {
        cert_path = "/etc/nixos/secrets/tls-cert.pem";
        key_path = "/etc/nixos/secrets/tls-key.pem";
        ca_path = "/etc/nixos/secrets/tls-ca.pem";
      };
    };
  };

  services.iam = {
    enable = true;
    port = 8080;
    settings = {
      flaredb_endpoint = "https://localhost:2479";
      tls = {
        cert_path = "/etc/nixos/secrets/tls-cert.pem";
        key_path = "/etc/nixos/secrets/tls-key.pem";
      };
    };
  };

  system.stateVersion = "24.11";
}
```

### 3.3 Hardware Detection vs Explicit Hardware Config

**Hardware Detection (Automatic):**

During installation, `nixos-generate-config` scans hardware and creates `hardware-configuration.nix`:

```bash
# On live installer, after disk setup
nixos-generate-config --root /mnt --show-hardware-config > /tmp/hardware.nix

# Upload to provisioning server
curl -X POST -F "file=@/tmp/hardware.nix" http://provisioning-server/api/hardware/node01
```

**Explicit Hardware Config (Declarative):**

For homogeneous hardware (e.g., a fleet of identical servers), use a template:

```nix
# profiles/hardware/dell-r640.nix
{ config, lib, pkgs, modulesPath, ... }:

{
  imports = [ (modulesPath + "/installer/scan/not-detected.nix") ];

  boot.initrd.availableKernelModules = [ "xhci_pci" "ahci" "nvme" "usbhid" "sd_mod" ];
  boot.kernelModules = [ "kvm-intel" ];

  # Network interfaces (predictable naming)
  networking.interfaces = {
    enp59s0f0 = {};  # 10GbE Port 1
    enp59s0f1 = {};  # 10GbE Port 2
  };

  # CPU microcode updates
  hardware.cpu.intel.updateMicrocode = true;

  # Power management
  powerManagement.cpuFreqGovernor = "performance";

  nixpkgs.hostPlatform = "x86_64-linux";
}
```

**Recommendation:**
- **Phase 1 (Development):** Auto-detect hardware for flexibility
- **Phase 2 (Production):** Standardize on explicit hardware profiles for consistency and faster deployments

### 3.4 Image Size Optimization

Netboot images must fit in RAM (typically 1-4 GB available after kexec). Strategies:

**1. Exclude Documentation and Locales:**
```nix
documentation.enable = false;
documentation.nixos.enable = false;
i18n.supportedLocales = [ "en_US.UTF-8/UTF-8" ];
```

**2. Minimal Kernel:**
```nix
boot.kernelPackages = pkgs.linuxPackages_latest;
boot.kernelParams = [ "modprobe.blacklist=nouveau" ];  # Exclude unused drivers
```

**3. Squashfs Compression:**
NixOS netboot uses squashfs for the Nix store, achieving ~2.5x compression:
```nix
# Automatically applied by netboot-minimal.nix
system.build.squashfsStore = ...;  # Default: gzip compression
```

**4. On-Demand Package Fetching:**
Instead of bundling all packages, fetch from an HTTP substituter during installation:
```nix
nix.settings.substituters = [ "http://10.0.0.2:8080/nix-cache" ];
nix.settings.trusted-public-keys = [ "cache-key-here" ];
```

**Expected Sizes:**
- **Minimal installer (no services):** ~150-250 MB (initrd)
- **Installer + PlasmaCloud packages:** ~400-600 MB (with on-demand fetch)
- **Full offline installer:** ~1-2 GB (includes all service closures)

## 4. Installation Flow
### 4.1 Step-by-Step Process

**1. PXE Boot to NixOS Installer (Automated)**

- Server powers on, sends DHCP request
- DHCP provides iPXE binary (via TFTP)
- iPXE loads, sends second DHCP request with user-class
- DHCP provides boot script URL (via HTTP)
- iPXE downloads script, executes, loads kernel+initrd
- kexec into NixOS installer (in RAM, ~30-60 seconds)
- Installer boots, acquires IP via DHCP, starts SSH server

**2. Provisioning Server Detects Node (Semi-Automated)**

The provisioning server monitors DHCP leases or receives a webhook from the installer:

```bash
# Installer sends registration on boot (custom init script)
curl -X POST http://provisioning-server/api/register \
  -d '{"mac":"aa:bb:cc:dd:ee:ff","ip":"10.0.0.100","hostname":"node01"}'
```

The provisioning server looks up the node in its inventory (`/srv/provisioning/inventory.json`):

```json
{
  "nodes": {
    "aa:bb:cc:dd:ee:ff": {
      "hostname": "node01.example.com",
      "profile": "control-plane",
      "config_path": "/srv/provisioning/nodes/node01.example.com"
    }
  }
}
```

**3. Run nixos-anywhere (Automated)**

The provisioning server executes nixos-anywhere:

```bash
#!/bin/bash
# /srv/provisioning/scripts/provision-node.sh

NODE_MAC="$1"
NODE_IP=$(get_ip_from_dhcp "$NODE_MAC")       # helper: resolve IP from DHCP leases
NODE_HOSTNAME=$(lookup_hostname "$NODE_MAC")  # helper: resolve hostname from inventory.json
CONFIG_PATH="/srv/provisioning/nodes/$NODE_HOSTNAME"

# Copy secrets to installer (will be injected during install)
ssh "root@$NODE_IP" "mkdir -p /tmp/secrets"
scp "$CONFIG_PATH"/secrets/* "root@$NODE_IP:/tmp/secrets/"

# Run nixos-anywhere with disko
nix run github:nix-community/nixos-anywhere -- \
  --flake "/srv/provisioning#$NODE_HOSTNAME" \
  --build-on-remote \
  --disk-encryption-keys /tmp/disk.key <(cat "$CONFIG_PATH/secrets/disk-encryption.key") \
  "root@$NODE_IP"
```
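
The two helpers in the script above are left undefined; a hypothetical implementation against the inventory.json layout from step 2 could be a thin `jq` wrapper (the `INVENTORY` path and function names are assumptions, not an existing API):

```shell
#!/bin/bash
# Sketch of the inventory-lookup helpers used by provision-node.sh,
# assuming the inventory.json structure shown in step 2.
INVENTORY="${INVENTORY:-/srv/provisioning/inventory.json}"

lookup_hostname() {
  # Print the hostname registered for a MAC address, or nothing if unknown.
  jq -r --arg mac "$1" '.nodes[$mac].hostname // empty' "$INVENTORY"
}

lookup_profile() {
  # Print the deployment profile registered for a MAC address.
  jq -r --arg mac "$1" '.nodes[$mac].profile // empty' "$INVENTORY"
}
```

`get_ip_from_dhcp` would be analogous but reads the DHCP server's lease file, which depends on the DHCP implementation in use.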

nixos-anywhere performs:
- Detects existing OS (if any)
- Loads kexec if needed (already done via PXE)
- Runs disko to partition disks (based on `$CONFIG_PATH/disko.nix`)
- Builds the NixOS system closure (either locally or on the target)
- Copies the closure to `/mnt` (mounted root)
- Installs the bootloader (GRUB/systemd-boot)
- Copies secrets to `/mnt/etc/nixos/secrets/`
- Unmounts, reboots

**4. First Boot into Installed System (Automated)**

Server reboots from disk (GRUB/systemd-boot), loads NixOS:

- systemd starts
- `chainfire.service` starts (waits 30s for network)
  - If `initial_peers` matches only self → bootstrap new cluster
  - If `initial_peers` includes others → attempt to join existing cluster
- `flaredb.service` starts after chainfire is healthy
- `iam.service` starts after flaredb is healthy
- Other services start based on profile

**First-boot cluster join logic** (systemd unit):

```nix
# /etc/nixos/first-boot-cluster-join.nix
{ config, lib, pkgs, ... }:

let
  clusterConfig = builtins.fromJSON (builtins.readFile /etc/nixos/secrets/cluster-config.json);
in
{
  systemd.services.chainfire-cluster-join = {
    description = "Chainfire Cluster Join";
    after = [ "network-online.target" "chainfire.service" ];
    wants = [ "network-online.target" ];
    wantedBy = [ "multi-user.target" ];

    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
    };

    script = ''
      # Wait for local chainfire to be ready
      until ${pkgs.curl}/bin/curl -fsk https://localhost:2379/health; do
        echo "Waiting for local chainfire..."
        sleep 5
      done

      # Check if this is the first node (bootstrap)
      if [ "${lib.boolToString clusterConfig.bootstrap}" = "true" ]; then
        echo "Bootstrap node, cluster already initialized"
        exit 0
      fi

      # Join existing cluster
      LEADER_URL="${clusterConfig.leader_url}"
      NODE_ID="${clusterConfig.node_id}"
      RAFT_ADDR="${clusterConfig.raft_addr}"

      ${pkgs.curl}/bin/curl -k -X POST "$LEADER_URL/admin/member/add" \
        -H "Content-Type: application/json" \
        -d "{\"id\":\"$NODE_ID\",\"raft_addr\":\"$RAFT_ADDR\"}"

      echo "Cluster join initiated"
    '';
  };

  # Similar for flaredb
  systemd.services.flaredb-cluster-join = {
    description = "FlareDB Cluster Join";
    after = [ "chainfire-cluster-join.service" "flaredb.service" ];
    requires = [ "chainfire-cluster-join.service" ];
    # ... similar logic
  };
}
```

Note `lib.boolToString`: a JSON boolean cannot be interpolated into a Nix string directly, so it must be converted before the shell comparison.

**5. Validation (Manual/Automated)**

The provisioning server polls health endpoints:

```bash
# Health check script
curl -k https://10.0.1.10:2379/health  # Chainfire
curl -k https://10.0.1.10:2479/health  # FlareDB
curl -k https://10.0.1.10:8080/health  # IAM

# Cluster status
curl -k https://10.0.1.10:2379/admin/cluster/members | jq
```

### 4.2 Error Handling and Recovery

**Boot Failures:**
- **Symptom:** Server stuck in a PXE boot loop
- **Diagnosis:** Check DHCP server logs, verify TFTP/HTTP server accessibility
- **Recovery:** Fix DHCP config, restart services, retry boot

**Disk Partitioning Failures:**
- **Symptom:** nixos-anywhere fails during the disko phase
- **Diagnosis:** SSH to the installer, run `dmesg | grep -i error`, check disk accessibility
- **Recovery:** Adjust the disko config (e.g., wrong disk device), re-run nixos-anywhere

**Installation Failures:**
- **Symptom:** nixos-anywhere fails during the installation phase
- **Diagnosis:** Check nixos-anywhere output, SSH to the installer and inspect `/mnt`
- **Recovery:** Fix configuration errors, re-run nixos-anywhere (will reformat)

**Cluster Join Failures:**
- **Symptom:** Service starts but is not in the cluster
- **Diagnosis:** `journalctl -u chainfire-cluster-join`, check leader reachability
- **Recovery:** Manually run the join command, verify TLS certs, check firewall

**Rollback Strategy:**
- NixOS generations provide atomic rollback: `nixos-rebuild switch --rollback`
- For catastrophic failure: Re-provision from PXE (data loss if not replicated)

### 4.3 Network Requirements

**DHCP:**
- Option 66/67 for PXE boot
- Option 93 for architecture detection
- User-class filtering for iPXE chainload
- Static reservations for production nodes (optional)

**DNS:**
- Forward and reverse DNS for all nodes (required for TLS cert CN verification)
- Example: `node01.example.com` → `10.0.1.10`, `10.0.1.10` → `node01.example.com`

**Firewall:**
- Allow TFTP (UDP 69) from nodes to boot server
- Allow HTTP (TCP 80/8080) from nodes to boot/provisioning server
- Allow SSH (TCP 22) from provisioning server to nodes
- Allow service ports (2379-2381, 2479-2480, 8080, etc.) between cluster nodes

**Internet Access:**
- **During installation:** Required for the Nix binary cache (cache.nixos.org) unless using a local cache
- **After installation:** Optional (recommended for updates); can run air-gapped with a local cache
- **Workaround:** Set up a local binary cache: `nix-serve` + nginx
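
A minimal NixOS sketch of that workaround on the provisioning server (the signing-key path, ports, and URL prefix are illustrative — the prefix matches the `nix-cache` URL used by the iPXE script and substituter settings in this document):

```nix
{ config, pkgs, ... }:
{
  # Serve the local Nix store as a signed binary cache.
  services.nix-serve = {
    enable = true;
    port = 5000;
    secretKeyFile = "/var/lib/nix-serve/cache-priv-key.pem";
  };

  # Front it with nginx on the port clients expect.
  services.nginx = {
    enable = true;
    virtualHosts."nix-cache" = {
      listen = [ { addr = "0.0.0.0"; port = 8080; } ];
      locations."/nix-cache/".proxyPass = "http://127.0.0.1:5000/";
    };
  };
}
```

Clients then point `nix.settings.substituters` at `http://10.0.0.2:8080/nix-cache` and trust the corresponding public key, as shown in Section 3.4.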

**Bandwidth:**
- **PXE boot:** ~200 MB (kernel + initrd) per node; sequential is acceptable
- **Installation:** ~1-5 GB (Nix closures) per node; parallel is fine if the cache is local
- **Recommendation:** 1 Gbps link between the provisioning server and nodes

## 5. Integration Points
### 5.1 T024 NixOS Modules

The NixOS modules from T024 (`nix/modules/*.nix`) provide declarative service configuration. They are included in node configurations:

```nix
{ config, pkgs, lib, inputs, ... }:

{
  imports = [
    # Import PlasmaCloud service modules
    inputs.plasmacloud.nixosModules.default
  ];

  # Enable services declaratively
  services.chainfire.enable = true;
  services.flaredb.enable = true;
  services.iam.enable = true;
  # ... etc
}
```

(Note `inputs` in the module arguments; it must be passed in, e.g. via `specialArgs`, for `inputs.plasmacloud` to resolve.)

**Module Integration Strategy:**

1. **Flake Inputs:** Node configurations reference the PlasmaCloud flake:

   ```nix
   # flake.nix for provisioning repo
   inputs.plasmacloud.url = "github:yourorg/plasmacloud";
   # or path-based for development
   inputs.plasmacloud.url = "path:/path/to/plasmacloud/repo";
   ```

2. **Service Packages:** Packages are injected via overlay:

   ```nix
   nixpkgs.overlays = [ inputs.plasmacloud.overlays.default ];
   # Now pkgs.chainfire-server, pkgs.flaredb-server, etc. are available
   ```

3. **Dependency Graph:** systemd units respect T024 dependencies:

   ```
   chainfire.service
       ↓ requires/after
   flaredb.service
       ↓ requires/after
   iam.service
       ↓ requires/after
   plasmavmc.service, flashdns.service, ... (parallel)
   ```

4. **Configuration Schema:** Use `services.<name>.settings` for service-specific config:

   ```nix
   services.chainfire.settings = {
     node_id = "node01";
     cluster_name = "prod";
     tls = { ... };
   };
   ```
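
The ordering in item 3 can be expressed directly in a module; a sketch using standard systemd unit options (the T024 modules may already wire this up, in which case node configs need not repeat it):

```nix
{
  # Enforce the start-up chain: flaredb waits for chainfire, iam for flaredb.
  systemd.services.flaredb = {
    requires = [ "chainfire.service" ];
    after = [ "chainfire.service" ];
  };
  systemd.services.iam = {
    requires = [ "flaredb.service" ];
    after = [ "flaredb.service" ];
  };
}
```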
### 5.2 T027 Config Unification

T027 established a unified configuration approach (clap + config file/env). This integrates with NixOS in two ways:

**1. NixOS Module → Config File Generation:**

The NixOS module translates `services.<name>.settings` to a config file:

```nix
# In nix/modules/chainfire.nix
systemd.services.chainfire = {
  preStart = ''
    # Generate config file from settings
    cat > /var/lib/chainfire/config.toml <<EOF
    node_id = "${cfg.settings.node_id}"
    cluster_name = "${cfg.settings.cluster_name}"

    [tls]
    cert_path = "${cfg.settings.tls.cert_path}"
    key_path = "${cfg.settings.tls.key_path}"
    ca_path = "${cfg.settings.tls.ca_path or ""}"
    EOF
  '';

  serviceConfig.ExecStart = "${cfg.package}/bin/chainfire-server --config /var/lib/chainfire/config.toml";
};
```

**2. Environment Variable Injection:**

For secrets not suitable for the Nix store:

```nix
systemd.services.chainfire.serviceConfig = {
  EnvironmentFile = "/etc/nixos/secrets/chainfire.env";
  # File contains: CHAINFIRE_API_TOKEN=secret123
};
```

**Best Practices:**
- **Public config:** Use `services.<name>.settings` (stored in the Nix store, world-readable)
- **Secrets:** Use `EnvironmentFile` or systemd credentials
- **Hybrid:** Config file with placeholders, secrets injected at runtime
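
A sketch of the hybrid pattern: the Nix-generated config carries a placeholder, and a `preStart` step splices in the secret at runtime. Here `configTemplate` (a store path rendered from `settings`), the `@API_TOKEN@` marker, and the secret path are illustrative names, not part of the T027 scheme:

```nix
systemd.services.chainfire.preStart = ''
  # Render the world-readable template, substituting the runtime secret.
  ${pkgs.gnused}/bin/sed \
    "s|@API_TOKEN@|$(cat /etc/nixos/secrets/chainfire-api-token)|" \
    ${configTemplate} > /var/lib/chainfire/config.toml
  chmod 0400 /var/lib/chainfire/config.toml
'';
```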
### 5.3 T031 TLS Certificates

T031 added TLS to all 8 services. Provisioning must handle certificate distribution:

**Certificate Provisioning Strategies:**

**Option 1: Pre-Generated Certificates (Simple)**

1. Generate certs on the provisioning server per node:

   ```bash
   # /srv/provisioning/scripts/generate-certs.sh node01.example.com
   openssl req -x509 -newkey rsa:4096 -nodes \
     -keyout node01-key.pem -out node01-cert.pem \
     -days 365 -subj "/CN=node01.example.com"
   ```

2. Copy to the node's secrets directory:

   ```bash
   cp node01-*.pem /srv/provisioning/nodes/node01.example.com/secrets/
   ```

3. nixos-anywhere installs them to `/etc/nixos/secrets/` (mode 0400, owner root)

4. The NixOS module references them:

   ```nix
   services.chainfire.settings.tls = {
     cert_path = "/etc/nixos/secrets/tls-cert.pem";
     key_path = "/etc/nixos/secrets/tls-key.pem";
     ca_path = "/etc/nixos/secrets/tls-ca.pem";
   };
   ```

**Option 2: ACME (Let's Encrypt) for External Services**

For internet-facing services (e.g., the PlasmaVMC API):

```nix
security.acme = {
  acceptTerms = true;
  defaults.email = "admin@example.com";
};

services.plasmavmc.settings.tls = {
  cert_path = config.security.acme.certs."plasmavmc.example.com".directory + "/cert.pem";
  key_path = config.security.acme.certs."plasmavmc.example.com".directory + "/key.pem";
};

security.acme.certs."plasmavmc.example.com" = {
  domain = "plasmavmc.example.com";
  # Use DNS-01 challenge for internal servers
  dnsProvider = "cloudflare";
  credentialsFile = "/etc/nixos/secrets/cloudflare-api-token";
};
```

**Option 3: Internal CA with Cert-Manager (Advanced)**

1. Deploy cert-manager as a service on the control plane
2. Generate per-node CSRs during first boot
3. Cert-manager signs and distributes certs
4. A systemd timer renews certs before expiry
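
Independent of the tooling, the signing flow in Option 3 reduces to a CA key pair plus per-node CSR signing; a minimal openssl sketch (file names, subjects, and lifetimes are illustrative):

```shell
#!/bin/bash
set -euo pipefail

# One-time: create the internal CA.
openssl req -x509 -newkey rsa:2048 -nodes -days 1825 \
  -keyout ca-key.pem -out ca.pem -subj "/CN=PlasmaCloud Internal CA"

# Per node: generate a key + CSR, then sign it with the CA.
openssl req -newkey rsa:2048 -nodes \
  -keyout node01-key.pem -out node01.csr -subj "/CN=node01.example.com"
openssl x509 -req -in node01.csr -CA ca.pem -CAkey ca-key.pem \
  -CAcreateserial -days 90 -out node01-cert.pem

# Verify the resulting chain.
openssl verify -CAfile ca.pem node01-cert.pem
```

Distributing `ca.pem` to every node as `tls-ca.pem` is what lets the services verify each other, matching the `ca_path` settings used throughout this document.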

**Recommendation:**
- **Phase 1 (MVP):** Pre-generated certs (Option 1)
- **Phase 2 (Production):** ACME for external + internal CA for internal (Option 2+3)

### 5.4 Chainfire/FlareDB Cluster Join
|
|
|
|
**Bootstrap (First 3 Nodes):**
|
|
|
|
First node (`node01`):
|
|
```nix
|
|
services.chainfire.settings = {
|
|
node_id = "node01";
|
|
initial_peers = [
|
|
"node01.example.com:2380"
|
|
"node02.example.com:2380"
|
|
"node03.example.com:2380"
|
|
];
|
|
bootstrap = true; # This node starts the cluster
|
|
};
|
|
```
|
|
|
|
Subsequent nodes (`node02`, `node03`):
|
|
```nix
|
|
services.chainfire.settings = {
|
|
node_id = "node02";
|
|
initial_peers = [
|
|
"node01.example.com:2380"
|
|
"node02.example.com:2380"
|
|
"node03.example.com:2380"
|
|
];
|
|
bootstrap = false; # Join existing cluster
|
|
};
|
|
```
|
|
|
|
**Runtime Join (After Bootstrap):**

New nodes added to a running cluster:

1. Provision the node with `bootstrap = false`, `initial_peers = []`
2. The first-boot service calls the leader's admin API:
   ```bash
   curl -k -X POST https://node01.example.com:2379/admin/member/add \
     -H "Content-Type: application/json" \
     -d '{"id":"node04","raft_addr":"node04.example.com:2380"}'
   ```
3. The node receives cluster state and starts Raft
4. The leader replicates to the new node

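The runtime-join steps could be scripted on first boot roughly as follows. This is a hedged Python sketch: only the `/admin/member/add` endpoint and its JSON payload come from the design above; the helper names (`member_add_payload`, `join_cluster`), the retry parameters, and the injectable `post` hook are illustrative.

```python
import json
import time
import urllib.request


def member_add_payload(node_id: str, raft_addr: str) -> bytes:
    """Build the JSON body for the leader's /admin/member/add endpoint."""
    return json.dumps({"id": node_id, "raft_addr": raft_addr}).encode()


def join_cluster(leader_url, node_id, raft_addr, retries=30, delay=10, post=None):
    """Call the leader's admin API until the join succeeds or retries run out.

    `post(url, body)` returns an HTTP status; it is injectable for testing,
    and defaults to a plain urllib request.
    """
    if post is None:
        def post(url, body):
            req = urllib.request.Request(
                url, data=body, headers={"Content-Type": "application/json"}
            )
            return urllib.request.urlopen(req).status

    url = leader_url.rstrip("/") + "/admin/member/add"
    body = member_add_payload(node_id, raft_addr)
    for _ in range(retries):
        try:
            if post(url, body) == 200:
                return True
        except OSError:
            pass  # leader not reachable yet; keep retrying
        time.sleep(delay)
    return False
```

Retrying matters here because the new node may come up before the leader is reachable (e.g., while the provisioning VLAN is still converging).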

**FlareDB Follows Same Pattern:**

FlareDB depends on Chainfire for coordination but maintains its own Raft cluster:

```nix
services.flaredb.settings = {
  node_id = "node01";
  chainfire_endpoint = "https://localhost:2379";
  initial_peers = [ "node01:2480" "node02:2480" "node03:2480" ];
};
```

**Critical:** Ensure `chainfire.service` is healthy before starting `flaredb.service` (enforced by systemd `requires`/`after`).


### 5.5 IAM Bootstrap

IAM requires initial admin user creation. Two approaches:

**Option 1: First-Boot Initialization Script**

```nix
systemd.services.iam-bootstrap = {
  description = "IAM Initial Admin User";
  after = [ "iam.service" ];
  wantedBy = [ "multi-user.target" ];

  serviceConfig = {
    Type = "oneshot";
    RemainAfterExit = true;
  };

  script = ''
    # Check if admin exists
    if ${pkgs.curl}/bin/curl -k https://localhost:8080/api/users/admin 2>&1 | grep -q "not found"; then
      # Create admin user
      ADMIN_PASSWORD=$(cat /etc/nixos/secrets/iam-admin-password)
      ${pkgs.curl}/bin/curl -k -X POST https://localhost:8080/api/users \
        -H "Content-Type: application/json" \
        -d "{\"username\":\"admin\",\"password\":\"$ADMIN_PASSWORD\",\"role\":\"admin\"}"
      echo "Admin user created"
    else
      echo "Admin user already exists"
    fi
  '';
};
```


**Option 2: Environment Variable for Default Admin**

The IAM service creates the admin on first start if the DB is empty:

```rust
// In iam-server main.rs
if user_count() == 0 {
    let admin_password = env::var("IAM_INITIAL_ADMIN_PASSWORD")
        .expect("IAM_INITIAL_ADMIN_PASSWORD must be set for first boot");
    create_user("admin", &admin_password, Role::Admin)?;
    info!("Initial admin user created");
}
```

```nix
systemd.services.iam.serviceConfig = {
  EnvironmentFile = "/etc/nixos/secrets/iam.env";
  # File contains: IAM_INITIAL_ADMIN_PASSWORD=random-secure-password
};
```

**Recommendation:** Use Option 2 (environment variable) for simplicity. Generate a random password during node provisioning and store it in secrets.
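The "generate a random password during node provisioning" step could look like the following sketch. The `iam.env` filename and the `IAM_INITIAL_ADMIN_PASSWORD` variable come from the Nix snippet above; the helper name, token length, and file mode are illustrative assumptions.

```python
import secrets
from pathlib import Path


def write_iam_env(secret_dir: str) -> str:
    """Generate a random admin password and write it as a systemd EnvironmentFile.

    Returns the generated password so the operator can record it securely.
    """
    password = secrets.token_urlsafe(24)  # ~32 URL-safe characters
    env_path = Path(secret_dir) / "iam.env"
    env_path.write_text(f"IAM_INITIAL_ADMIN_PASSWORD={password}\n")
    env_path.chmod(0o600)  # readable only by the provisioning user
    return password
```

Run once per node during provisioning, writing into that node's secrets directory (e.g., `/srv/provisioning/nodes/node01/secrets/`).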

## 6. Alternatives Considered

### 6.1 nixos-anywhere vs Custom Installer

**nixos-anywhere (Chosen):**
- **Pros:**
  - Mature, actively maintained by nix-community
  - Handles kexec, disko integration, and bootloader install automatically
  - SSH-based, works from any OS (no need for NixOS on the provisioning server)
  - Supports remote builds and disk encryption out of the box
  - Well-documented with many examples
- **Cons:**
  - Requires SSH access (not suitable for zero-touch provisioning without PXE+SSH)
  - Opinionated workflow (less flexible than custom scripts)
  - Dependency on an external project (but very stable)

**Custom Installer (Rejected):**
- **Pros:**
  - Full control over the installation flow
  - Could implement zero-touch (e.g., installer pulls config from a server without SSH)
  - Tailored to PlasmaCloud-specific needs
- **Cons:**
  - Significant development effort (partitioning, bootloader, error handling)
  - Reinvents well-tested code (disko, kexec integration)
  - Maintenance burden (keeping up with NixOS changes)
  - Higher risk of bugs (partitioning is error-prone)

**Decision:** Use nixos-anywhere for reliability and speed. The SSH requirement is acceptable since PXE boot already provides network access, and adding SSH keys to the netboot image is straightforward.

### 6.2 Disk Management Tools

**disko (Chosen):**
- **Pros:**
  - Declarative, fits the NixOS philosophy
  - Integrates with nixos-anywhere out of the box
  - Supports complex layouts (RAID, LVM, LUKS, ZFS, btrfs)
  - Idempotent (can reformat or verify an existing layout)
- **Cons:**
  - Nix-based DSL (learning curve)
  - Limited to Linux filesystems (no Windows support; not relevant here)

**Kickstart/Preseed (Rejected):**
- Used by Fedora/Debian installers
- Not NixOS-native; would require custom integration

**Terraform with Libvirt (Rejected):**
- Good for VMs, not bare metal
- Doesn't handle disk partitioning directly

**Decision:** disko is the clear choice for NixOS deployments.

### 6.3 Boot Methods

**iPXE over TFTP/HTTP (Chosen):**
- **Pros:**
  - Universal support (BIOS + UEFI)
  - Flexible scripting (boot menus, conditional logic)
  - HTTP support for fast downloads
  - Open source, widely deployed
- **Cons:**
  - Requires DHCP configuration (Option 66/67 setup)
  - Chainloading adds complexity (but it is a solved problem)

**UEFI HTTP Boot (Rejected):**
- **Pros:**
  - Native UEFI, no TFTP needed
  - Simpler DHCP config (just Option 60/67)
- **Cons:**
  - UEFI only (no BIOS support)
  - Firmware support is inconsistent (pre-2015 servers)
  - Less flexible than iPXE scripting

**Bootable USB (Rejected):**
- Manual, not scalable for fleet deployment
- Useful for one-off installs only

**Decision:** iPXE for flexibility and compatibility. UEFI HTTP Boot could be considered later for pure UEFI fleets.

### 6.4 Configuration Management

**NixOS Flakes (Chosen):**
- **Pros:**
  - Native to NixOS, declarative
  - Reproducible builds with lock files
  - Git-based, version controlled
  - No external agent needed (systemd handles state)
- **Cons:**
  - Steep learning curve for operators unfamiliar with Nix
  - Less dynamic than Ansible (changes require a rebuild)

**Ansible (Rejected for Provisioning, Useful for Orchestration):**
- **Pros:**
  - Agentless, SSH-based
  - Large ecosystem of modules
  - Dynamic, easy to patch running systems
- **Cons:**
  - Imperative (harder to guarantee state)
  - Doesn't integrate with NixOS packages/modules
  - Adds another tool to the stack

**Terraform (Rejected):**
- Infrastructure-as-code, not config management
- Better for cloud VMs than bare metal

**Decision:** Use NixOS flakes for provisioning and base config. Ansible may be added later for operational tasks (e.g., rolling updates, health checks) that don't fit NixOS's declarative model.

## 7. Open Questions / Decisions Needed

### 7.1 Hardware Inventory Management

**Question:** How do we map MAC addresses to node roles and configurations?

**Options:**
1. **Manual Inventory File:** Operator maintains JSON/YAML with MAC → hostname → config mapping
2. **Auto-Discovery:** First boot prompts the operator to assign a role (e.g., via serial console or web UI)
3. **External CMDB:** Integrate with an existing Configuration Management Database (e.g., NetBox, Nautobot)

**Recommendation:** Start with a manual inventory file (simple), migrate to CMDB integration in Phase 2.
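A manual inventory file could be as simple as JSON keyed by normalized MAC address. A hedged Python sketch of the lookup (the file layout, field names, and helper names are illustrative, not part of the design):

```python
import json


def normalize_mac(mac: str) -> str:
    """Canonicalize a MAC address: lowercase, colon-separated pairs."""
    digits = mac.lower().replace("-", "").replace(":", "").replace(".", "")
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def lookup_node(inventory: dict, mac: str) -> dict:
    """Return the node entry (hostname, role, flake attr) for a MAC address."""
    return inventory[normalize_mac(mac)]


# Illustrative inventory, e.g. loaded from /srv/provisioning/inventory.json
INVENTORY = json.loads("""
{
  "aa:bb:cc:dd:ee:01": {"hostname": "node01", "role": "control-plane", "flake": ".#node01"},
  "aa:bb:cc:dd:ee:04": {"hostname": "node04", "role": "worker", "flake": ".#node04"}
}
""")
```

Normalizing the key avoids the classic mismatch between DHCP logs (`AA-BB-CC-...`) and hand-edited inventory entries (`aa:bb:cc:...`).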

### 7.2 Secrets Management

**Question:** How are secrets (TLS keys, passwords) generated, stored, and rotated?

**Options:**
1. **File-Based (Current):** Secrets in `/srv/provisioning/nodes/*/secrets/`, copied during install
2. **Vault Integration:** Fetch secrets from HashiCorp Vault at boot time
3. **systemd Credentials:** Use systemd's encrypted credentials feature (requires systemd 250+)

**Recommendation:** Phase 1 uses file-based secrets (simple, works today). Phase 2 adds Vault for production (centralized, auditable, with rotation support).

### 7.3 Network Boot Security

**Question:** How do we prevent rogue nodes from joining the cluster?

**Concerns:**
- An attacker boots an unauthorized server on the network
- The installer carries an SSH key that could be extracted
- A node joins the cluster with malicious intent

**Mitigations:**
1. **MAC Whitelist:** DHCP only serves known MAC addresses
2. **Network Segmentation:** PXE boot on an isolated provisioning VLAN
3. **SSH Key Per Node:** Each node has unique authorized_keys in its netboot image (complex)
4. **Cluster Authentication:** Raft join requires a cluster token (not yet implemented)

**Recommendation:** Use MAC whitelist + provisioning VLAN for Phase 1. Add cluster join tokens in Phase 2 (requires Chainfire/FlareDB changes).
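Since the inventory file already maps MACs to nodes, the DHCP MAC whitelist can be generated from it rather than maintained by hand. A sketch for dnsmasq (the `dhcp-host=` and `dhcp-ignore=tag:!known` directives are standard dnsmasq syntax; the inventory shape and helper name are illustrative):

```python
def dnsmasq_whitelist(inventory: dict) -> str:
    """Render dnsmasq config so only known MAC addresses get a DHCP lease.

    `dhcp-ignore=tag:!known` tells dnsmasq to ignore any client that is not
    matched by a dhcp-host entry; each inventory row becomes one such entry.
    """
    lines = ["dhcp-ignore=tag:!known"]
    for mac, node in sorted(inventory.items()):
        lines.append(f"dhcp-host={mac},{node['hostname']},{node['ip']}")
    return "\n".join(lines) + "\n"


# Illustrative inventory entries (MAC -> node metadata)
inv = {
    "aa:bb:cc:dd:ee:01": {"hostname": "node01", "ip": "10.0.0.101"},
    "aa:bb:cc:dd:ee:02": {"hostname": "node02", "ip": "10.0.0.102"},
}
```

Regenerating this file from the inventory keeps the DHCP whitelist and the node configs from drifting apart.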

### 7.4 Multi-Datacenter Deployment

**Question:** How does provisioning work across geographically distributed datacenters?

**Challenges:**
- WAN latency for Nix cache fetches
- PXE boot requires local DHCP/TFTP
- Cluster join across the WAN (Raft latency)

**Options:**
1. **Replicated Provisioning Server:** Deploy a boot server in each datacenter, sync configs
2. **Central Provisioning with Local Cache:** Single source of truth, local Nix cache mirrors
3. **Per-DC Clusters:** Each datacenter is an independent cluster, federated at the application layer

**Recommendation:** Defer to Phase 2. Phase 1 assumes a single datacenter or low-latency LAN.

### 7.5 Disk Encryption

**Question:** Should disks be encrypted at rest?

**Trade-offs:**
- **Pros:** Compliance (GDPR, PCI-DSS), protection against physical theft
- **Cons:** Key management complexity, can't auto-reboot (manual unlock), performance overhead (~5-10%)

**Options:**
1. **No Encryption:** Rely on physical security
2. **LUKS with Network Unlock:** Tang/Clevis for automated unlocking (requires network on boot)
3. **LUKS with Manual Unlock:** Operator enters a passphrase via KVM/IPMI

**Recommendation:** Optional, configurable per deployment. Provide a disko template for LUKS and let the operator decide.

### 7.6 Rolling Updates

**Question:** How do we update a running cluster without downtime?

**Challenges:**
- Raft requires quorum (can't update a majority simultaneously)
- Service dependencies (Chainfire → FlareDB → others)
- NixOS rebuilds require a reboot for kernel/init changes

**Strategy:**
1. Update one node at a time (rolling)
2. Verify health before proceeding to the next
3. Use `nixos-rebuild test` first (activates without a bootloader change), then `switch` after validation

**Tooling:**
- Ansible playbook for orchestration
- Health check scripts (curl endpoints + check Raft status)
- Rollback plan (NixOS generations + Raft snapshot restore)

**Recommendation:** Document as a runbook in Phase 1, implement automated rolling updates in Phase 2 (T033?).
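The one-node-at-a-time strategy boils down to a small orchestration loop. A hedged Python sketch (the function names and return shape are illustrative; in practice `update` would shell out to `nixos-rebuild test`/`switch` over SSH and `healthy` would probe the health endpoints and Raft status):

```python
import time


def rolling_update(nodes, update, healthy, retries=30, delay=10):
    """Update nodes one at a time, aborting if a node never returns healthy.

    `update(node)` applies the new configuration; `healthy(node)` probes the
    services. Both are injected so the orchestration logic stays testable.
    """
    done = []
    for node in nodes:
        update(node)
        for _ in range(retries):
            if healthy(node):
                break
            time.sleep(delay)
        else:
            # Node never converged: stop the rollout so quorum is preserved,
            # and report how far we got for manual rollback.
            return {"ok": False, "updated": done, "failed": node}
        done.append(node)
    return {"ok": True, "updated": done, "failed": None}
```

Aborting on the first unhealthy node is the key property: at most one member of a Raft group is ever in an unknown state, so quorum survives a bad update.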

### 7.7 Monitoring and Alerting

**Question:** How do we monitor provisioning success/failure?

**Options:**
1. **Manual:** Operator watches the terminal, checks health endpoints
2. **Log Aggregation:** Collect installer logs, index in Loki/Elasticsearch
3. **Event Webhook:** Installer posts events to a monitoring system (Grafana, PagerDuty)

**Recommendation:** Phase 1 uses manual monitoring. Phase 2 adds structured logging + webhooks for fleet deployments.

### 7.8 Compatibility with Existing Infrastructure

**Question:** Can this provisioning system coexist with existing PXE infrastructure (e.g., for other OS deployments)?

**Concerns:**
- Existing DHCP config may conflict
- The TFTP server may serve other boot files
- The network team may control the PXE infrastructure

**Solutions:**
1. **Dedicated Provisioning VLAN:** PlasmaCloud nodes on a separate network
2. **Conditional DHCP:** Use vendor-class or subnet matching to route to the correct boot server
3. **Multi-Boot Menu:** iPXE menu includes options for PlasmaCloud and other OSes

**Recommendation:** Document network requirements and provide example DHCP configs for common scenarios (dedicated VLAN, shared infrastructure). Coordinate with the network team.

---

## Appendices

### A. Example Disko Configuration

**Single Disk with GPT and ext4:**

```nix
# nodes/node01/disko.nix
{ disks ? [ "/dev/sda" ], ... }:
{
  disko.devices = {
    disk = {
      main = {
        type = "disk";
        device = builtins.head disks;
        content = {
          type = "gpt";
          partitions = {
            ESP = {
              size = "512M";
              type = "EF00";
              content = {
                type = "filesystem";
                format = "vfat";
                mountpoint = "/boot";
              };
            };
            root = {
              size = "100%";
              content = {
                type = "filesystem";
                format = "ext4";
                mountpoint = "/";
              };
            };
          };
        };
      };
    };
  };
}
```


**RAID1 with LUKS Encryption:**

```nix
{ disks ? [ "/dev/sda" "/dev/sdb" ], ... }:
{
  disko.devices = {
    disk = {
      disk1 = {
        device = builtins.elemAt disks 0;
        type = "disk";
        content = {
          type = "gpt";
          partitions = {
            boot = {
              size = "1M";
              type = "EF02"; # BIOS boot
            };
            mdraid = {
              size = "100%";
              content = {
                type = "mdraid";
                name = "raid1";
              };
            };
          };
        };
      };
      disk2 = {
        device = builtins.elemAt disks 1;
        type = "disk";
        content = {
          type = "gpt";
          partitions = {
            boot = {
              size = "1M";
              type = "EF02";
            };
            mdraid = {
              size = "100%";
              content = {
                type = "mdraid";
                name = "raid1";
              };
            };
          };
        };
      };
    };
    mdadm = {
      raid1 = {
        type = "mdadm";
        level = 1;
        content = {
          type = "luks";
          name = "cryptroot";
          settings.allowDiscards = true;
          content = {
            type = "filesystem";
            format = "ext4";
            mountpoint = "/";
          };
        };
      };
    };
  };
}
```


### B. Complete nixos-anywhere Command Examples

**Basic Deployment:**

```bash
nix run github:nix-community/nixos-anywhere -- \
  --flake .#node01 \
  root@10.0.0.100
```

**With Build on Remote (Slow Local Machine):**

```bash
nix run github:nix-community/nixos-anywhere -- \
  --flake .#node01 \
  --build-on-remote \
  root@10.0.0.100
```

**With Disk Encryption Key:**

```bash
nix run github:nix-community/nixos-anywhere -- \
  --flake .#node01 \
  --disk-encryption-keys /tmp/luks.key <(cat /secrets/node01-luks.key) \
  root@10.0.0.100
```

**Debug Mode (Keep Installer After Failure):**

```bash
nix run github:nix-community/nixos-anywhere -- \
  --flake .#node01 \
  --debug \
  --no-reboot \
  root@10.0.0.100
```


### C. Provisioning Server Setup Script

```bash
#!/bin/bash
# /srv/provisioning/scripts/setup-provisioning-server.sh

set -euo pipefail

# Install dependencies
apt-get update
apt-get install -y nginx tftpd-hpa dnsmasq curl

# Configure TFTP
cat > /etc/default/tftpd-hpa <<EOF
TFTP_USERNAME="tftp"
TFTP_DIRECTORY="/srv/boot/tftp"
TFTP_ADDRESS="0.0.0.0:69"
TFTP_OPTIONS="--secure"
EOF

mkdir -p /srv/boot/tftp
systemctl restart tftpd-hpa

# Download iPXE binaries
curl -L http://boot.ipxe.org/undionly.kpxe -o /srv/boot/tftp/undionly.kpxe
curl -L http://boot.ipxe.org/ipxe.efi -o /srv/boot/tftp/ipxe.efi

# Configure nginx for HTTP boot
cat > /etc/nginx/sites-available/pxe <<EOF
server {
    listen 8080;
    server_name _;
    root /srv/boot;

    location / {
        autoindex on;
        try_files \$uri \$uri/ =404;
    }

    # Enable range requests for large files
    location ~* \.(iso|img|bin|efi|kpxe)\$ {
        add_header Accept-Ranges bytes;
    }
}
EOF

ln -sf /etc/nginx/sites-available/pxe /etc/nginx/sites-enabled/
systemctl restart nginx

# Create directory structure
mkdir -p /srv/boot/{nixos,nix-cache,scripts}
mkdir -p /srv/provisioning/{nodes,profiles,common,scripts}

echo "Provisioning server setup complete!"
echo "Next steps:"
echo "1. Configure DHCP server (see design doc Section 2.2)"
echo "2. Build NixOS netboot image (see Section 3.1)"
echo "3. Create node configurations (see Section 3.2)"
```


### D. First-Boot Cluster Config JSON Schema

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Cluster Configuration",
  "type": "object",
  "properties": {
    "node_id": {
      "type": "string",
      "description": "Unique identifier for this node"
    },
    "bootstrap": {
      "type": "boolean",
      "description": "True if this node should bootstrap a new cluster"
    },
    "leader_url": {
      "type": "string",
      "format": "uri",
      "description": "URL of existing cluster leader (for join)"
    },
    "raft_addr": {
      "type": "string",
      "description": "This node's Raft address (host:port)"
    },
    "cluster_token": {
      "type": "string",
      "description": "Shared secret for cluster authentication (future)"
    }
  },
  "required": ["node_id", "bootstrap"],
  "if": {
    "properties": { "bootstrap": { "const": false } }
  },
  "then": {
    "required": ["leader_url", "raft_addr"]
  }
}
```

**Example for bootstrap node:**

```json
{
  "node_id": "node01",
  "bootstrap": true,
  "raft_addr": "node01.example.com:2380"
}
```

**Example for joining node:**

```json
{
  "node_id": "node04",
  "bootstrap": false,
  "leader_url": "https://node01.example.com:2379",
  "raft_addr": "node04.example.com:2380"
}
```
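The first-boot service should reject a malformed config before touching Raft. A hand-rolled check of the schema's key constraints, sketched in stdlib-only Python so it can run inside the minimal installer (the function name and error-list shape are illustrative; the rules themselves come from the schema above):

```python
import json


def validate_cluster_config(raw: str) -> list:
    """Check a cluster config against the schema's key constraints.

    Covers the required fields, basic types, and the conditional rule that
    bootstrap=false requires leader_url and raft_addr. Returns a list of
    error strings; an empty list means the config is valid.
    """
    errors = []
    cfg = json.loads(raw)
    if not isinstance(cfg.get("node_id"), str):
        errors.append("node_id must be a string")
    if not isinstance(cfg.get("bootstrap"), bool):
        errors.append("bootstrap must be a boolean")
    elif cfg["bootstrap"] is False:
        for key in ("leader_url", "raft_addr"):
            if not isinstance(cfg.get(key), str):
                errors.append(f"{key} is required when bootstrap is false")
    return errors
```

Avoiding a `jsonschema` dependency keeps the netboot image small; if the schema grows, switching to a real validator would be justified.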

### E. References and Further Reading

**Primary Documentation:**
- [nixos-anywhere Quickstart](https://nix-community.github.io/nixos-anywhere/quickstart.html)
- [disko Documentation](https://github.com/nix-community/disko)
- [iPXE Examples](https://ipxe.org/examples)
- [NixOS Netboot](https://nixos.wiki/wiki/Netboot)

**Technical Specifications:**
- [RFC 4578 - DHCP Options for PXE](https://www.rfc-editor.org/rfc/rfc4578)
- [UEFI HTTP Boot Specification](https://uefi.org/specs/UEFI/2.10/32_Network_Protocols.html#http-boot)

**Community Resources:**
- [NixOS Discourse - Netboot Discussions](https://discourse.nixos.org/tag/netboot)
- [nixos-anywhere Examples](https://github.com/nix-community/nixos-anywhere/tree/main/docs)

**Related Blog Posts:**
- [iPXE Booting with NixOS (2024)](https://carlosvaz.com/posts/ipxe-booting-with-nixos/)
- [Remote Deployment with nixos-anywhere and disko](https://mich-murphy.com/nixos-anywhere-and-disko/)

---

## Revision History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2025-12-10 | peerB | Initial draft |

---

**End of Design Document**