# Centra Cloud PXE Boot Server

This directory contains the PXE (Preboot eXecution Environment) boot infrastructure for bare-metal provisioning of Centra Cloud nodes. It enables network-based installation of NixOS on physical servers with automated profile selection.

## Table of Contents

- [Architecture Overview](#architecture-overview)
- [Components](#components)
- [Quick Start](#quick-start)
- [Detailed Setup](#detailed-setup)
- [Configuration](#configuration)
- [Boot Profiles](#boot-profiles)
- [Network Requirements](#network-requirements)
- [Troubleshooting](#troubleshooting)
- [Advanced Topics](#advanced-topics)

## Architecture Overview

The PXE boot infrastructure consists of three main services:

```
┌─────────────────────────────────────────────────────────────────┐
│                          PXE Boot Flow                          │
└─────────────────────────────────────────────────────────────────┘

  Bare-Metal Server                    PXE Boot Server
  ─────────────────                    ───────────────

  1. Power on
     │
     ├─► DHCP Request ──────────────► DHCP Server
     │                                (ISC DHCP)
     │                                  │
     │                                  ├─ Assigns IP
     │                                  ├─ Detects BIOS/UEFI
     │                                  └─ Provides bootloader path
     │
     ├◄─ DHCP Response ───────────────┤
     │   (IP, next-server, filename)
     │
     ├─► TFTP Get bootloader ─────────► TFTP Server
     │   (undionly.kpxe or ipxe.efi)    (atftpd)
     │
     ├◄─ Bootloader file ─────────────┤
     │
     ├─► Execute iPXE bootloader
     │   │
     │   ├─► HTTP Get boot.ipxe ──────► HTTP Server
     │   │                              (nginx)
     │   │
     │   ├◄─ boot.ipxe script ─────────┤
     │   │
     │   ├─► Display menu / Auto-select profile
     │   │
     │   ├─► HTTP Get kernel ──────────► HTTP Server
     │   │
     │   ├◄─ bzImage ───────────────────┤
     │   │
     │   ├─► HTTP Get initrd ───────────► HTTP Server
     │   │
     │   ├◄─ initrd ────────────────────┤
     │   │
     │   └─► Boot NixOS
     │
     └─► NixOS Installer
         └─ Provisions node based on profile
```

## Components

### 1. DHCP Server (ISC DHCP)

- **Purpose**: Assigns IP addresses and directs PXE clients to bootloader
- **Config**: `dhcp/dhcpd.conf`
- **Features**:
  - BIOS/UEFI detection via option 93 (architecture type)
  - Per-host configuration for fixed IP assignment
  - Automatic next-server and filename configuration

### 2. TFTP Server (atftpd)

- **Purpose**: Serves iPXE bootloader files to PXE clients
- **Files served**:
  - `undionly.kpxe` - BIOS bootloader
  - `ipxe.efi` - UEFI x86-64 bootloader
  - `ipxe-i386.efi` - UEFI x86 32-bit bootloader (optional)

### 3. HTTP Server (nginx)

- **Purpose**: Serves iPXE scripts and NixOS boot images
- **Config**: `http/nginx.conf`
- **Endpoints**:
  - `/boot/ipxe/boot.ipxe` - Main boot menu script
  - `/boot/nixos/bzImage` - NixOS kernel
  - `/boot/nixos/initrd` - NixOS initial ramdisk
  - `/health` - Health check endpoint

### 4. iPXE Boot Scripts

- **Main script**: `ipxe/boot.ipxe`
- **Features**:
  - Interactive boot menu with 3 profiles
  - MAC-based automatic profile selection
  - Serial console support for remote management
  - Detailed error messages and debugging options

### 5. NixOS Service Module

- **File**: `nixos-module.nix`
- **Purpose**: Declarative NixOS configuration for all services
- **Features**:
  - Single configuration file for entire stack
  - Firewall rules auto-configured
  - Systemd service dependencies managed
  - Directory structure auto-created

## Quick Start

### Prerequisites

- NixOS server with network connectivity
- Network interface on the same subnet as bare-metal servers
- Sufficient disk space (5-10 GB for boot images)

### Installation Steps

1. **Clone this repository** (or copy `baremetal/pxe-server/` to your NixOS system)

2. **Run the setup script**:

   ```bash
   sudo ./setup.sh --install --download --validate
   ```

   This will:
   - Create directory structure at `/var/lib/pxe-boot`
   - Download iPXE bootloaders from boot.ipxe.org
   - Install boot scripts
   - Validate configurations

3. **Configure network settings**:

   Edit `nixos-module.nix` or create a NixOS configuration:

   ```nix
   # /etc/nixos/configuration.nix
   imports = [
     /path/to/baremetal/pxe-server/nixos-module.nix
   ];

   services.centra-pxe-server = {
     enable = true;
     interface = "eth0";              # Your network interface
     serverAddress = "10.0.100.10";   # PXE server IP

     dhcp = {
       subnet = "10.0.100.0";
       netmask = "255.255.255.0";
       broadcast = "10.0.100.255";
       range = {
         start = "10.0.100.100";
         end = "10.0.100.200";
       };
       router = "10.0.100.1";
     };

     # Optional: Define known nodes with MAC addresses
     nodes = {
       "52:54:00:12:34:56" = {
         profile = "control-plane";
         hostname = "control-plane-01";
         ipAddress = "10.0.100.50";
       };
     };
   };
   ```

4. **Deploy NixOS configuration**:

   ```bash
   sudo nixos-rebuild switch
   ```

5. **Verify services are running**:

   ```bash
   sudo ./setup.sh --test
   ```

6. **Add NixOS boot images** (will be provided by T032.S3):

   ```bash
   # Placeholder - actual images will be built by image builder
   # For testing, you can use any NixOS netboot image
   sudo mkdir -p /var/lib/pxe-boot/nixos
   # Copy bzImage and initrd to /var/lib/pxe-boot/nixos/
   ```

7. **Boot a bare-metal server**:
   - Configure server BIOS to boot from network (PXE)
   - Connect to same network segment
   - Power on server
   - Watch for DHCP discovery and iPXE boot menu

## Detailed Setup

### Option 1: NixOS Module (Recommended)

The NixOS module provides a declarative way to configure the entire PXE server stack.

**Advantages**:
- Single configuration file
- Automatic service dependencies
- Rollback capability
- Integration with NixOS firewall

**Configuration Example**:

See the NixOS configuration example in [Quick Start](#quick-start).

### Option 2: Manual Installation

For non-NixOS systems or manual setup:

1. **Install required packages**:

   ```bash
   # Debian/Ubuntu
   apt-get install isc-dhcp-server atftpd nginx curl

   # RHEL/CentOS
   yum install dhcp tftp-server nginx curl
   ```

2. **Run setup script**:

   ```bash
   sudo ./setup.sh --install --download
   ```

3. **Copy configuration files**:

   ```bash
   # DHCP configuration
   sudo cp dhcp/dhcpd.conf /etc/dhcp/dhcpd.conf

   # Edit to match your network
   sudo vim /etc/dhcp/dhcpd.conf

   # Nginx configuration
   sudo cp http/nginx.conf /etc/nginx/sites-available/pxe-boot
   sudo ln -s /etc/nginx/sites-available/pxe-boot /etc/nginx/sites-enabled/
   ```

4. **Start services**:

   ```bash
   sudo systemctl enable --now isc-dhcp-server
   sudo systemctl enable --now atftpd
   sudo systemctl enable --now nginx
   ```

5. **Configure firewall**:

   ```bash
   # UFW (Ubuntu)
   sudo ufw allow 67/udp  # DHCP
   sudo ufw allow 68/udp  # DHCP
   sudo ufw allow 69/udp  # TFTP
   sudo ufw allow 80/tcp  # HTTP

   # firewalld (RHEL)
   sudo firewall-cmd --permanent --add-service=dhcp
   sudo firewall-cmd --permanent --add-service=tftp
   sudo firewall-cmd --permanent --add-service=http
   sudo firewall-cmd --reload
   ```

## Configuration

### DHCP Configuration

The DHCP server configuration is in `dhcp/dhcpd.conf`. Key sections:

**Network Settings**:

```conf
subnet 10.0.100.0 netmask 255.255.255.0 {
    range 10.0.100.100 10.0.100.200;
    option routers 10.0.100.1;
    option domain-name-servers 10.0.100.1, 8.8.8.8;
    next-server 10.0.100.10;  # PXE server IP
    # ...
}
```

**Boot File Selection** (automatic BIOS/UEFI detection):

```conf
if exists user-class and option user-class = "iPXE" {
    filename "http://10.0.100.10/boot/ipxe/boot.ipxe";
} elsif option architecture-type = 00:00 {
    filename "undionly.kpxe";  # BIOS
} elsif option architecture-type = 00:07 {
    filename "ipxe.efi";       # UEFI x86-64
}
```

**Host-Specific Configuration**:

```conf
host control-plane-01 {
    hardware ethernet 52:54:00:12:34:56;
    fixed-address 10.0.100.50;
    option host-name "control-plane-01";
}
```

### iPXE Boot Script

The main boot script is `ipxe/boot.ipxe`. It provides:

1. **MAC-based automatic selection**:

   ```ipxe
   iseq ${mac} 52:54:00:12:34:56 && set profile control-plane && goto boot ||
   ```

2. **Interactive menu** (if no MAC match):

   ```ipxe
   :menu
   menu Centra Cloud - Bare-Metal Provisioning
   item control-plane  1. Control Plane Node (All Services)
   item worker         2. Worker Node (Compute Services)
   item all-in-one     3. All-in-One Node (Testing/Homelab)
   ```

3. **Kernel parameters**:

   ```ipxe
   set kernel-params centra.profile=${profile}
   set kernel-params ${kernel-params} centra.hostname=${hostname}
   set kernel-params ${kernel-params} console=tty0 console=ttyS0,115200n8
   ```

### Adding New Nodes

To add a new node to the infrastructure:

1. **Get the MAC address** from the server (check BIOS or network card label)

2. **Add to MAC mappings** (`ipxe/mac-mappings.txt`):

   ```
   52:54:00:12:34:5d worker worker-04
   ```

3. **Update boot script** (`ipxe/boot.ipxe`):

   ```ipxe
   iseq ${mac} 52:54:00:12:34:5d && set profile worker && set hostname worker-04 && goto boot ||
   ```

4. **Add DHCP host entry** (`dhcp/dhcpd.conf`):

   ```conf
   host worker-04 {
       hardware ethernet 52:54:00:12:34:5d;
       fixed-address 10.0.100.64;
       option host-name "worker-04";
   }
   ```

5. **Restart DHCP service**:

   ```bash
   sudo systemctl restart dhcpd4
   ```

## Boot Profiles

### 1. Control Plane Profile

**Purpose**: Nodes that run core infrastructure services

**Services included**:
- FlareDB (PD, Store, TiKV-compatible database)
- IAM (Identity and Access Management)
- PlasmaVMC (Virtual Machine Controller)
- K8sHost (Kubernetes node agent)
- FlashDNS (High-performance DNS)
- ChainFire (Firewall/networking)
- Object Storage (S3-compatible)
- Monitoring (Prometheus, Grafana)

**Resource requirements**:
- CPU: 8+ cores recommended
- RAM: 32+ GB recommended
- Disk: 500+ GB SSD

**Use case**: Production control plane nodes in a cluster

### 2. Worker Profile

**Purpose**: Nodes that run customer workloads

**Services included**:
- K8sHost (Kubernetes node agent) - primary service
- PlasmaVMC (Virtual Machine Controller) - VM workloads
- ChainFire (Network policy enforcement)
- FlashDNS (Local DNS caching)
- Basic monitoring agents

**Resource requirements**:
- CPU: 16+ cores recommended
- RAM: 64+ GB recommended
- Disk: 1+ TB SSD

**Use case**: Worker nodes for running customer applications

### 3. All-in-One Profile

**Purpose**: Single-node deployment for testing and development

**Services included**:
- Complete Centra Cloud stack on one node
- All services from control-plane profile
- Suitable for testing, development, homelab

**Resource requirements**:
- CPU: 16+ cores recommended
- RAM: 64+ GB recommended
- Disk: 1+ TB SSD

**Use case**: Development, testing, homelab deployments

**Warning**: Not recommended for production use (no HA, resource intensive)

## Network Requirements

### Network Topology

The PXE server must be on the same network segment as the bare-metal servers, or you must configure DHCP relay.
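A quick way to tell which of the two cases applies is to check whether a node's planned address falls inside the PXE server's subnet. The following is a minimal Python sketch, assuming each L2 segment maps to exactly one IP subnet and using the `10.0.100.0/24` network from the examples in this README. `needs_dhcp_relay` is a hypothetical helper for illustration and is not part of `setup.sh`.

```python
import ipaddress

# Assumption: the PXE subnet used in the configuration examples of this README.
PXE_SUBNET = ipaddress.ip_network("10.0.100.0/24")


def needs_dhcp_relay(node_ip: str, pxe_subnet=PXE_SUBNET) -> bool:
    """Return True when node_ip lies outside the PXE server's subnet."""
    return ipaddress.ip_address(node_ip) not in pxe_subnet


if __name__ == "__main__":
    # Example addresses are illustrative only.
    for ip in ("10.0.100.50", "10.0.200.17"):
        case = "DHCP relay required" if needs_dhcp_relay(ip) else "same segment"
        print(f"{ip}: {case}")
```

If VLANs or secondary subnets are in play, the subnet check alone is not sufficient and the switch configuration should be consulted as well.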

**Same Segment** (recommended for initial setup):

```
┌──────────────┐         ┌──────────────────┐
│ PXE Server   │         │ Bare-Metal Srv   │
│ 10.0.100.10  │◄────────┤ (DHCP client)    │
└──────────────┘  L2 SW  └──────────────────┘
```

**Different Segments** (requires DHCP relay):

```
┌──────────────┐         ┌──────────┐         ┌──────────────────┐
│ PXE Server   │         │ Router   │         │ Bare-Metal Srv   │
│ 10.0.100.10  │◄────────┤ (relay)  │◄────────┤ (DHCP client)    │
└──────────────┘         └──────────┘         └──────────────────┘
   Segment A              ip helper               Segment B
```

### DHCP Relay Configuration

If your PXE server is on a different network segment:

**Cisco IOS**:

```
interface vlan 100
 ip helper-address 10.0.100.10
```

**Linux (dhcp-helper)**:

```bash
apt-get install dhcp-helper
# Edit /etc/default/dhcp-helper
DHCPHELPER_OPTS="-s 10.0.100.10"
systemctl restart dhcp-helper
```

**Linux (dhcrelay)**:

```bash
apt-get install isc-dhcp-relay
dhcrelay -i eth0 -i eth1 10.0.100.10
```

### Firewall Rules

The following ports must be open on the PXE server:

| Port | Protocol | Service | Direction | Description |
|------|----------|---------|-----------|-------------|
| 67 | UDP | DHCP | Inbound | DHCP server |
| 68 | UDP | DHCP | Outbound | DHCP client responses |
| 69 | UDP | TFTP | Inbound | TFTP bootloader downloads |
| 80 | TCP | HTTP | Inbound | iPXE scripts and boot images |
| 443 | TCP | HTTPS | Inbound | Optional: secure boot images |

### Network Bandwidth

Estimated bandwidth requirements:

- Per-node boot: ~500 MB download (kernel + initrd)
- Concurrent boots: Multiply by number of simultaneous boots
- Recommended: 1 Gbps link for PXE server

Example: Booting 10 nodes simultaneously transfers ~5 GB in total, which takes at least 40 seconds on a saturated 1 Gbps link (about 125 MB/s). If that delay matters, stagger boots or use a 10 Gbps link.
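
The estimate can be turned into a small calculation: total data divided by link rate gives a lower bound on the transfer time. This is a minimal Python sketch, assuming the ~500 MB per-node figure above, decimal units (1 Gbps = 1000 Mbit/s), and a fully saturated link with no protocol overhead. `boot_transfer_seconds` is a hypothetical helper, not part of this repository.

```python
def boot_transfer_seconds(nodes: int, image_mb: float = 500.0,
                          link_gbps: float = 1.0) -> float:
    """Lower bound on the time to serve boot images to `nodes` at once.

    image_mb  -- per-node download size in megabytes (kernel + initrd)
    link_gbps -- PXE server link rate in gigabits per second
    """
    total_megabits = nodes * image_mb * 8        # megabytes -> megabits
    link_megabits_per_s = link_gbps * 1000       # Gbit/s -> Mbit/s
    return total_megabits / link_megabits_per_s


if __name__ == "__main__":
    for gbps in (1.0, 10.0):
        secs = boot_transfer_seconds(10, link_gbps=gbps)
        print(f"10 nodes on a {gbps:g} Gbps link: at least {secs:g} s")
```

Real boots will take longer than this bound because of TCP overhead, disk read speed on the server, and the other traffic sharing the link.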

## Troubleshooting

### DHCP Issues

**Problem**: Server doesn't get IP address

**Diagnosis**:

```bash
# On PXE server, monitor DHCP requests
sudo tcpdump -i eth0 -n port 67 or port 68

# Check DHCP server logs
sudo journalctl -u dhcpd4 -f

# Verify DHCP server is running
sudo systemctl status dhcpd4
```

**Common causes**:
- DHCP server not running on correct interface
- Firewall blocking UDP 67/68
- Network cable/switch issue
- DHCP range exhausted

**Solution**:

```bash
# Check interface configuration
ip addr show

# Verify DHCP config syntax
sudo dhcpd -t -cf /etc/dhcp/dhcpd.conf

# Check firewall
sudo iptables -L -n | grep -E "67|68"

# Restart DHCP server
sudo systemctl restart dhcpd4
```

### TFTP Issues

**Problem**: PXE client gets IP but fails to download bootloader

**Diagnosis**:

```bash
# Monitor TFTP requests
sudo tcpdump -i eth0 -n port 69

# Check TFTP server logs
sudo journalctl -u atftpd -f

# Test TFTP locally
tftp localhost -c get undionly.kpxe /tmp/test.kpxe
```

**Common causes**:
- TFTP server not running
- Bootloader files missing
- Permissions incorrect
- Firewall blocking UDP 69

**Solution**:

```bash
# Check files exist
ls -la /var/lib/tftpboot/

# Fix permissions
sudo chmod 644 /var/lib/tftpboot/*.{kpxe,efi}

# Restart TFTP server
sudo systemctl restart atftpd

# Check firewall
sudo iptables -L -n | grep 69
```

### HTTP Issues

**Problem**: iPXE loads but can't download boot script or kernel

**Diagnosis**:

```bash
# Monitor HTTP requests
sudo tail -f /var/log/nginx/access.log

# Test HTTP locally
curl -v http://localhost/boot/ipxe/boot.ipxe
curl -v http://localhost/health

# Check nginx status
sudo systemctl status nginx
```

**Common causes**:
- Nginx not running
- Boot files missing
- Permissions incorrect
- Firewall blocking TCP 80
- Wrong server IP in boot.ipxe

**Solution**:

```bash
# Check nginx config
sudo nginx -t

# Verify files exist
ls -la /var/lib/pxe-boot/ipxe/
ls -la /var/lib/pxe-boot/nixos/

# Fix permissions
sudo chown -R nginx:nginx /var/lib/pxe-boot
sudo chmod -R 755 /var/lib/pxe-boot

# Restart nginx
sudo systemctl restart nginx
```

### Boot Script Issues

**Problem**: Boot menu appears but fails to load kernel

**Diagnosis**:
- Check iPXE error messages on console
- Verify URLs in boot.ipxe match actual paths
- Test kernel download manually:

  ```bash
  curl -I http://10.0.100.10/boot/nixos/bzImage
  ```

**Common causes**:
- NixOS boot images not deployed yet (normal for T032.S2)
- Wrong paths in boot.ipxe
- Files too large (check disk space)

**Solution**:

```bash
# Wait for T032.S3 (Image Builder) to generate boot images
# OR manually place NixOS netboot images:
sudo mkdir -p /var/lib/pxe-boot/nixos
# Copy bzImage and initrd from NixOS netboot
```

### Serial Console Debugging

For remote debugging without physical access:

1. **Enable serial console in BIOS**:
   - Configure COM1/ttyS0 at 115200 baud
   - Enable console redirection

2. **Connect via IPMI SOL** (if available):

   ```bash
   ipmitool -I lanplus -H -U admin sol activate
   ```

3. **Watch boot process**:
   - DHCP discovery messages
   - TFTP download progress
   - iPXE boot menu
   - Kernel boot messages

4. **Kernel parameters include serial console**:

   ```
   console=tty0 console=ttyS0,115200n8
   ```

### Common Error Messages

| Error | Cause | Solution |
|-------|-------|----------|
| `PXE-E51: No DHCP or proxyDHCP offers were received` | DHCP server not responding | Check DHCP server running, network connectivity |
| `PXE-E53: No boot filename received` | DHCP not providing filename | Check dhcpd.conf has `filename` option |
| `PXE-E32: TFTP open timeout` | TFTP server not responding | Check TFTP server running, firewall rules |
| `Not found: /boot/ipxe/boot.ipxe` | HTTP 404 error | Check file exists, nginx config, permissions |
| `Could not boot: Exec format error` | Corrupted boot file | Re-download/rebuild bootloader |

## Advanced Topics

### Building iPXE from Source

For production deployments, building iPXE from source provides:

- Custom branding
- Embedded certificates for HTTPS
- Optimized size
- Security hardening

**Build instructions**:

```bash
sudo ./setup.sh --build-ipxe
```

Or manually:

```bash
git clone https://github.com/ipxe/ipxe.git
cd ipxe/src

# BIOS bootloader
make bin/undionly.kpxe

# UEFI bootloader
make bin-x86_64-efi/ipxe.efi

# Copy to PXE server
sudo cp bin/undionly.kpxe /var/lib/pxe-boot/ipxe/
sudo cp bin-x86_64-efi/ipxe.efi /var/lib/pxe-boot/ipxe/
```

### HTTPS Boot (Secure Boot)

For enhanced security, serve boot images over HTTPS:

1. **Generate SSL certificate**:

   ```bash
   sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
     -keyout /etc/ssl/private/pxe-server.key \
     -out /etc/ssl/certs/pxe-server.crt
   ```

2. **Configure nginx for HTTPS** (uncomment HTTPS block in `http/nginx.conf`)

3. **Update boot.ipxe** to use `https://` URLs

4. **Rebuild iPXE with embedded certificate** (for secure boot without prompts)

### Multiple NixOS Versions

To support multiple NixOS versions for testing/rollback:

```
/var/lib/pxe-boot/nixos/
├── 24.05/
│   ├── bzImage
│   └── initrd
├── 24.11/
│   ├── bzImage
│   └── initrd
└── latest -> 24.11/   # Symlink to current version
```

Update `boot.ipxe` to use `/boot/nixos/latest/bzImage` or add menu items for version selection.

### Integration with BMC/IPMI

For fully automated provisioning:

1. **Discover new hardware** via IPMI/Redfish API

2. **Configure PXE boot** via IPMI:

   ```bash
   ipmitool -I lanplus -H -U admin chassis bootdev pxe options=persistent
   ```

3. **Power on server**:

   ```bash
   ipmitool -I lanplus -H -U admin power on
   ```

4. **Monitor via SOL** (serial-over-LAN)

### Monitoring and Metrics

Track PXE boot activity:

1. **DHCP leases**:

   ```bash
   cat /var/lib/dhcp/dhcpd.leases
   ```

2. **HTTP access logs**:

   ```bash
   sudo tail -f /var/log/nginx/access.log | grep -E "boot.ipxe|bzImage|initrd"
   ```

3. **Prometheus metrics** (if nginx-module-vts installed):
   - Boot file download counts
   - Bandwidth usage
   - Response times

4. **Custom metrics endpoint**:
   - Parse nginx access logs
   - Count boots per profile
   - Alert on failed boots

## Files and Directory Structure

```
baremetal/pxe-server/
├── README.md                    # This file
├── setup.sh                     # Setup and management script
├── nixos-module.nix             # NixOS service module
│
├── dhcp/
│   └── dhcpd.conf               # DHCP server configuration
│
├── ipxe/
│   ├── boot.ipxe                # Main boot menu script
│   └── mac-mappings.txt         # MAC address documentation
│
├── http/
│   ├── nginx.conf               # HTTP server configuration
│   └── directory-structure.txt  # Directory layout documentation
│
└── assets/                      # (Created at runtime)
    └── /var/lib/pxe-boot/
        ├── ipxe/
        │   ├── undionly.kpxe
        │   ├── ipxe.efi
        │   └── boot.ipxe
        └── nixos/
            ├── bzImage
            └── initrd
```

## Next Steps

After completing the PXE server setup:

1. **T032.S3 - Image Builder**: Automated NixOS image generation with profile-specific configurations
2. **T032.S4 - Provisioning Orchestrator**: API-driven provisioning workflow and node lifecycle management
3. **Integration with IAM**: Authentication for provisioning API
4. **Integration with FlareDB**: Node inventory and state management

## References

- [iPXE Documentation](https://ipxe.org/)
- [ISC DHCP Documentation](https://www.isc.org/dhcp/)
- [NixOS Manual - Netboot](https://nixos.org/manual/nixos/stable/index.html#sec-building-netboot)
- [PXE Specification](https://www.intel.com/content/www/us/en/architecture-and-technology/intel-boot-executive.html)

## Support

For issues or questions:

- Check [Troubleshooting](#troubleshooting) section
- Review logs: `sudo journalctl -u dhcpd4 -u atftpd -u nginx -f`
- Run diagnostic: `sudo ./setup.sh --test`

## License

Part of Centra Cloud infrastructure - see project root for license information.