# PlasmaCloud Netboot Image Builder - Technical Overview

## Introduction

This document provides a technical overview of the PlasmaCloud NixOS Image Builder, which generates bootable netboot images for bare-metal provisioning. It is part of T032 (Bare-Metal Provisioning) and implements deliverable S3 (NixOS Image Builder).

## System Architecture

### High-Level Flow

```
┌─────────────────────┐
│  Nix Flake          │
│  (flake.nix)        │
└──────────┬──────────┘
           │
           ├─── nixosConfigurations
           │    ├── netboot-control-plane
           │    ├── netboot-worker
           │    └── netboot-all-in-one
           │
           ├─── packages (T024)
           │    ├── chainfire-server
           │    ├── flaredb-server
           │    └── ... (8 services)
           │
           └─── modules (T024)
                ├── chainfire.nix
                ├── flaredb.nix
                └── ... (8 modules)

                Build Process
                      ↓

┌─────────────────────┐
│  build-images.sh    │
└──────────┬──────────┘
           │
           ├─── nix build netbootRamdisk
           ├─── nix build kernel
           └─── copy to artifacts/

                Output
                   ↓

┌─────────────────────┐
│  Netboot Artifacts  │
├─────────────────────┤
│  bzImage (kernel)   │
│  initrd (ramdisk)   │
│  netboot.ipxe       │
└─────────────────────┘
           │
           ├─── PXE Server
           │    (HTTP/TFTP)
           │
           └─── Target Machine
                (PXE Boot)
```

## Component Breakdown

### 1. Netboot Configurations

Located in `nix/images/`, these NixOS configurations define the netboot environment:

#### `netboot-base.nix`

**Purpose**: Common base configuration for all profiles

**Key Features**:
- Extends `netboot-minimal.nix` from nixpkgs
- SSH server with root login (key-based only)
- Generic kernel with broad hardware support
- Disk management tools (disko, parted, cryptsetup, lvm2)
- Network configuration (DHCP, predictable interface names)
- Console output on both the serial port (ttyS0) and the local display (tty0)
- Minimal system (no docs, no sound)

**Package Inclusions**:
```nix
environment.systemPackages = with pkgs; [
  disko parted gptfdisk     # Disk management
  cryptsetup lvm2           # Encryption and LVM
  e2fsprogs xfsprogs        # Filesystem tools
  iproute2 curl tcpdump     # Network tools
  vim tmux htop             # System tools
];
```

**Kernel Configuration**:
```nix
boot.kernelPackages = pkgs.linuxPackages_latest;
boot.kernelParams = [
  "console=ttyS0,115200"
  "console=tty0"
  "loglevel=4"
];
```

#### `netboot-control-plane.nix`

**Purpose**: Full control plane deployment

**Imports**:
- `netboot-base.nix` (base configuration)
- `../modules` (PlasmaCloud service modules)

**Service Inclusions**:
- Chainfire (ports 2379, 2380, 2381)
- FlareDB (ports 2479, 2480)
- IAM (port 8080)
- PlasmaVMC (port 8081)
- NovaNET (port 8082)
- FlashDNS (port 53)
- FiberLB (port 8083)
- LightningStor (port 8084)
- K8sHost (port 8085)

**Service State**: All services are **disabled** by default via `lib.mkDefault false`

**Resource Limits** (for netboot environment):
```nix
# systemd serviceConfig limits applied to each service
MemoryMax = "512M";
CPUQuota = "50%";
```

#### `netboot-worker.nix`

**Purpose**: Compute-focused worker nodes

**Imports**:
- `netboot-base.nix`
- `../modules`

**Service Inclusions**:
- PlasmaVMC (VM management)
- NovaNET (SDN)

**Additional Features**:
- KVM virtualization support
- Open vSwitch for SDN
- QEMU and libvirt tools
- Optimized sysctl for VM workloads

**Performance Tuning**:
```nix
boot.kernel.sysctl = {
  "fs.file-max" = 1000000;
  "net.ipv4.ip_forward" = 1;
  "net.core.netdev_max_backlog" = 5000;
};
```

#### `netboot-all-in-one.nix`

**Purpose**: Single-node deployment with all services

**Imports**:
- `netboot-base.nix`
- `../modules`

**Combines**: All features from control-plane + worker

**Use Cases**:
- Development environments
- Small deployments
- Edge locations
- POC installations

### 2. Flake Integration

The main `flake.nix` exposes the netboot configurations:

```nix
nixosConfigurations = {
  netboot-control-plane = nixpkgs.lib.nixosSystem {
    system = "x86_64-linux";
    modules = [ ./nix/images/netboot-control-plane.nix ];
  };
  netboot-worker = nixpkgs.lib.nixosSystem {
    system = "x86_64-linux";
    modules = [ ./nix/images/netboot-worker.nix ];
  };
  netboot-all-in-one = nixpkgs.lib.nixosSystem {
    system = "x86_64-linux";
    modules = [ ./nix/images/netboot-all-in-one.nix ];
  };
};
```

### 3. Build Script

`build-images.sh` orchestrates the build process:

**Workflow**:
1. Parse command-line arguments (`--profile`, `--output-dir`)
2. Create output directories
3. For each profile:
   - Build netboot ramdisk: `nix build ...netbootRamdisk`
   - Build kernel: `nix build ...kernel`
   - Copy artifacts (bzImage, initrd)
   - Generate iPXE boot script
   - Calculate and display sizes
4. Verify outputs (file existence, size sanity checks)
5. Copy to PXE server (if available)
6. Print summary

**Build Commands**:
```bash
# Quote the installable so the shell does not interpret '#' or split the argument
nix build ".#nixosConfigurations.netboot-${profile}.config.system.build.netbootRamdisk"
nix build ".#nixosConfigurations.netboot-${profile}.config.system.build.kernel"
```

**Output Structure**:
```
artifacts/
├── control-plane/
│   ├── bzImage          # ~10-30 MB
│   ├── initrd           # size varies by profile (see Expected Sizes)
│   ├── netboot.ipxe     # iPXE script
│   ├── build.log        # Build log
│   ├── initrd-link      # Nix result symlink
│   └── kernel-link      # Nix result symlink
├── worker/
│   └── ... (same structure)
└── all-in-one/
    └── ... (same structure)
```

## Integration Points

### T024 NixOS Modules

The netboot configurations build on the T024 service modules:

**Module Structure** (example: chainfire.nix):
```nix
{ config, lib, pkgs, ... }:
let
  # Shorthand for this module's option values
  cfg = config.services.chainfire;
in
{
  options.services.chainfire = {
    enable = lib.mkEnableOption "chainfire service";
    port = lib.mkOption { ... };
    raftPort = lib.mkOption { ... };
    package = lib.mkOption { ... };
  };

  config = lib.mkIf cfg.enable {
    users.users.chainfire = { ... };
    systemd.services.chainfire = { ... };
  };
}
```

**Package Availability**:
```nix
# In netboot-control-plane.nix
environment.systemPackages = with pkgs; [
  chainfire-server  # From flake overlay
  flaredb-server    # From flake overlay
  # ...
];
```

### T032.S2 PXE Infrastructure

The build script integrates with the PXE server:

**Copy Workflow**:
```
# Build script copies to:
chainfire/baremetal/pxe-server/assets/nixos/
├── control-plane/
│   ├── bzImage
│   └── initrd
├── worker/
│   ├── bzImage
│   └── initrd
└── all-in-one/
    ├── bzImage
    └── initrd
```

**iPXE Boot Script** (generated):
```ipxe
#!ipxe
kernel ${boot-server}/control-plane/bzImage init=/nix/store/*/init console=ttyS0,115200
initrd ${boot-server}/control-plane/initrd
boot
```

In this listing, `/nix/store/*/init` stands in for the concrete store path of the system's init script. The kernel command line does not expand globs, so the generated script must contain the full path.

## Build Process Deep Dive

### NixOS Netboot Build Internals

1. **netboot-minimal.nix** (from nixpkgs):
   - Provides base netboot functionality
   - Configures initrd with kexec support
   - Sets up squashfs for Nix store

2. **Our Extensions**:
   - Add PlasmaCloud service packages
   - Configure SSH for nixos-anywhere
   - Include provisioning tools (disko, etc.)
   - Customize kernel and modules

3. **Build Outputs**:
   - **bzImage**: Compressed Linux kernel
   - **initrd**: Compressed initial ramdisk that embeds a squashfs image of the Nix store, containing:
     - Minimal NixOS system
     - Nix store with service packages
     - Init scripts for booting

### Size Optimization Strategies

**Current Optimizations** (savings are approximate):
```nix
documentation.enable = false;                      # -50MB
documentation.nixos.enable = false;                # -20MB
i18n.supportedLocales = [ "en_US.UTF-8/UTF-8" ];   # -100MB
```

**Additional Strategies** (if needed):
- Use `linuxPackages_hardened`, if measurement confirms it yields a smaller kernel
- Remove unused kernel modules
- Compress with xz instead of gzip
- On-demand package fetching from HTTP substituter

**Expected Sizes**:
- **Control Plane**: ~250-350 MB (initrd)
- **Worker**: ~150-250 MB (initrd)
- **All-in-One**: ~300-400 MB (initrd)

## Boot Flow

### From PXE to Running System

```
1. PXE Boot
   ├─ DHCP discovers boot server
   ├─ TFTP loads iPXE binary
   └─ iPXE executes boot script

2. Netboot Download
   ├─ HTTP downloads bzImage (~20MB)
   ├─ HTTP downloads initrd (~200MB)
   └─ iPXE boots the kernel into the NixOS installer

3. NixOS Installer (in RAM)
   ├─ Init system starts
   ├─ Network configuration (DHCP)
   ├─ SSH server starts
   └─ Ready for nixos-anywhere

4. Installation (nixos-anywhere)
   ├─ SSH connection established
   ├─ Disk partitioning (disko)
   ├─ NixOS system installation
   ├─ Secret injection
   └─ Bootloader installation

5. First Boot (from disk)
   ├─ GRUB/systemd-boot loads
   ├─ Services start (enabled)
   ├─ Cluster join (if configured)
   └─ Running PlasmaCloud node
```

## Customization Guide

### Adding a New Service

**Step 1**: Create NixOS module
```nix
# nix/modules/myservice.nix
{ config, lib, pkgs, ... }:
let
  # Shorthand for this module's option values
  cfg = config.services.myservice;
in
{
  options.services.myservice = {
    enable = lib.mkEnableOption "myservice";
  };

  config = lib.mkIf cfg.enable {
    systemd.services.myservice = { ... };
  };
}
```

**Step 2**: Add to flake packages
```nix
# flake.nix
packages.myservice-server = buildRustWorkspace { ... };
```

**Step 3**: Include in netboot profile
```nix
# nix/images/netboot-control-plane.nix
environment.systemPackages = with pkgs; [
  myservice-server
];

services.myservice = {
  enable = lib.mkDefault false;
};
```

### Creating a Custom Profile

**Step 1**: Create new netboot configuration
```nix
# nix/images/netboot-custom.nix
{ config, pkgs, lib, ... }:
{
  imports = [
    ./netboot-base.nix
    ../modules
  ];

  # Your customizations
  environment.systemPackages = [ ... ];
}
```

**Step 2**: Add to flake
```nix
# flake.nix
nixosConfigurations.netboot-custom = nixpkgs.lib.nixosSystem {
  system = "x86_64-linux";
  modules = [ ./nix/images/netboot-custom.nix ];
};
```

**Step 3**: Update build script
```bash
# build-images.sh
profiles_to_build=("control-plane" "worker" "all-in-one" "custom")
```

## Security Model

### Netboot Phase

**Risk**: Netboot image has root SSH access enabled

**Mitigations**:
1. **Key-based authentication only** (no passwords)
2. **Isolated provisioning VLAN**
3. **MAC address whitelist in DHCP**
4. **Firewall disabled only during install**

### Post-Installation

Services remain disabled until the final configuration enables them:

```nix
# In installed system configuration
services.chainfire.enable = true;  # Overrides lib.mkDefault false
```

### Secret Management

Secrets are **NOT** embedded in netboot images:

```nix
# During nixos-anywhere installation (shell command, run from the deploy host):
#   scp secrets/* root@target:/tmp/secrets/

# Installed system references:
services.chainfire.settings.tls = {
  cert_path = "/etc/nixos/secrets/tls-cert.pem";
};
```

## Performance Characteristics

### Build Times

- **First build**: 30-60 minutes (downloads all dependencies)
- **Incremental builds**: 5-15 minutes (reuses cached artifacts)
- **With local cache**: 2-5 minutes

### Network Requirements

- **Initial download**: ~2GB (nixpkgs + dependencies)
- **Netboot download**: ~200-400MB per node
- **Installation**: ~500MB-2GB (depending on services)

### Hardware Requirements

**Build Machine**:
- CPU: 4+ cores recommended
- RAM: 8GB minimum, 16GB recommended
- Disk: 50GB free space
- Network: Broadband connection

**Target Machine**:
- RAM: 4GB minimum for netboot (8GB+ for production)
- Network: PXE boot support, DHCP
- Disk: Depends on disko configuration

## Testing Strategy

### Verification Steps

1. **Syntax Validation**:
   ```bash
   nix flake check
   ```

2. **Build Test**:
   ```bash
   ./build-images.sh --profile control-plane
   ```

3. **Artifact Verification**:
   ```bash
   file artifacts/control-plane/bzImage  # Should be Linux kernel
   file artifacts/control-plane/initrd   # Should be compressed data
   ```

4. **PXE Boot Test**:
   - Boot VM from netboot image
   - Verify SSH access
   - Check available tools (disko, parted, etc.)

5. **Installation Test**:
   - Run nixos-anywhere on test target
   - Verify successful installation
   - Check service availability

## Troubleshooting Matrix

| Symptom | Possible Cause | Solution |
|---------|---------------|----------|
| Build fails | Flakes not enabled | Set `experimental-features = nix-command flakes` in the Nix configuration |
| Large initrd | Too many packages | Remove unused packages |
| SSH fails | Wrong SSH key | Update authorized_keys |
| Boot hangs | Wrong kernel params | Check `console=` settings |
| No network | DHCP issues | Verify `networking.useDHCP = true` |
| Service missing | Package not built | Check flake overlay |

## Future Enhancements

### Planned Improvements

1. **Image Variants**:
   - Minimal installer (no services)
   - Debug variant (with extra tools)
   - Rescue mode (for recovery)

2. **Build Optimizations**:
   - Parallel profile builds
   - Incremental rebuild detection
   - Binary cache integration

3. **Security Enhancements**:
   - Per-node SSH keys
   - TPM-based secrets
   - Measured boot support

4. **Monitoring**:
   - Build metrics collection
   - Size trend tracking
   - Performance benchmarking

## References

- **NixOS Netboot**: https://nixos.wiki/wiki/Netboot
- **nixos-anywhere**: https://github.com/nix-community/nixos-anywhere
- **disko**: https://github.com/nix-community/disko
- **T032 Design**: `docs/por/T032-baremetal-provisioning/design.md`
- **T024 Modules**: `nix/modules/`

## Revision History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2025-12-10 | T032.S3 | Initial implementation |
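
## Appendix: Artifact Verification Sketch

The Build Script workflow lists a "verify outputs" step (file existence and size sanity checks) without showing it. The following is a minimal sketch of what such a check could look like, not the actual contents of `build-images.sh`. The function names (`file_size`, `check_artifact`, `verify_profile`) and the size thresholds are hypothetical and chosen only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an artifact sanity check.
# Thresholds are illustrative assumptions, not values from build-images.sh.

MIN_KERNEL_BYTES=$((1 * 1024 * 1024))    # 1 MiB
MIN_INITRD_BYTES=$((10 * 1024 * 1024))   # 10 MiB

# file_size PATH: print the size of PATH in bytes.
file_size() {
  wc -c < "$1" | tr -d '[:space:]'
}

# check_artifact PATH MIN_BYTES: succeed if PATH is a regular file
# of at least MIN_BYTES bytes; otherwise report the problem on stderr.
check_artifact() {
  local path="$1" min_bytes="$2" size
  if [ ! -f "$path" ]; then
    echo "MISSING: $path" >&2
    return 1
  fi
  size="$(file_size "$path")"
  if [ "$size" -lt "$min_bytes" ]; then
    echo "TOO SMALL: $path ($size bytes, expected >= $min_bytes)" >&2
    return 1
  fi
  echo "OK: $path ($size bytes)"
}

# verify_profile ARTIFACT_DIR PROFILE: check all three artifacts of one
# profile and return non-zero if any check fails.
verify_profile() {
  local dir="$1/$2" status=0
  check_artifact "$dir/bzImage" "$MIN_KERNEL_BYTES" || status=1
  check_artifact "$dir/initrd" "$MIN_INITRD_BYTES" || status=1
  check_artifact "$dir/netboot.ipxe" 1 || status=1
  return "$status"
}
```

Under these assumptions, `verify_profile artifacts control-plane` would print one line per artifact and return a non-zero status if any file is missing or implausibly small. Every artifact is checked before returning, so a single run reports all problems for a profile.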