Centra Cloud PXE Boot Server
This directory contains the PXE (Preboot eXecution Environment) boot infrastructure for bare-metal provisioning of Centra Cloud nodes. It enables network-based installation of NixOS on physical servers with automated profile selection.
Table of Contents
- Architecture Overview
- Components
- Quick Start
- Detailed Setup
- Configuration
- Boot Profiles
- Network Requirements
- Troubleshooting
- Advanced Topics
Architecture Overview
The PXE boot infrastructure consists of three main services:
┌─────────────────────────────────────────────────────────────────┐
│                          PXE Boot Flow                          │
└─────────────────────────────────────────────────────────────────┘

Bare-Metal Server                         PXE Boot Server
─────────────────                         ───────────────

1. Power on
   │
   ├─► DHCP Request ────────────────────► DHCP Server (ISC DHCP)
   │                                      ├─ Assigns IP
   │                                      ├─ Detects BIOS/UEFI
   │                                      ├─ Provides bootloader path
   ├◄─ DHCP Response ─────────────────────┘
   │   (IP, next-server, filename)
   │
   ├─► TFTP Get bootloader ─────────────► TFTP Server (atftpd)
   │   (undionly.kpxe or ipxe.efi)        │
   ├◄─ Bootloader file ───────────────────┘
   │
   ├─► Execute iPXE bootloader
   │   │
   │   ├─► HTTP Get boot.ipxe ──────────► HTTP Server (nginx)
   │   │                                  │
   │   ├◄─ boot.ipxe script ──────────────┘
   │   │
   │   ├─► Display menu / Auto-select profile
   │   │
   │   ├─► HTTP Get kernel ─────────────► HTTP Server
   │   ├◄─ bzImage ───────────────────────┘
   │   │
   │   ├─► HTTP Get initrd ─────────────► HTTP Server
   │   ├◄─ initrd ────────────────────────┘
   │   │
   │   └─► Boot NixOS
   │
   └─► NixOS Installer
       └─ Provisions node based on profile
Components
1. DHCP Server (ISC DHCP)
- Purpose: Assigns IP addresses and directs PXE clients to bootloader
- Config: dhcp/dhcpd.conf
- Features:
  - BIOS/UEFI detection via option 93 (client system architecture type)
  - Per-host configuration for fixed IP assignment
  - Automatic next-server and filename configuration
2. TFTP Server (atftpd)
- Purpose: Serves iPXE bootloader files to PXE clients
- Files served:
  - undionly.kpxe - BIOS bootloader
  - ipxe.efi - UEFI x86-64 bootloader
  - ipxe-i386.efi - UEFI x86 32-bit bootloader (optional)
3. HTTP Server (nginx)
- Purpose: Serves iPXE scripts and NixOS boot images
- Config: http/nginx.conf
- Endpoints:
  - /boot/ipxe/boot.ipxe - Main boot menu script
  - /boot/nixos/bzImage - NixOS kernel
  - /boot/nixos/initrd - NixOS initial ramdisk
  - /health - Health check endpoint
4. iPXE Boot Scripts
- Main script: ipxe/boot.ipxe
- Features:
  - Interactive boot menu with 3 profiles
  - MAC-based automatic profile selection
  - Serial console support for remote management
  - Detailed error messages and debugging options
5. NixOS Service Module
- File: nixos-module.nix
- Purpose: Declarative NixOS configuration for all services
- Features:
  - Single configuration file for the entire stack
  - Firewall rules auto-configured
  - Systemd service dependencies managed
  - Directory structure auto-created
Quick Start
Prerequisites
- NixOS server with network connectivity
- Network interface on the same subnet as bare-metal servers
- Sufficient disk space (5-10 GB for boot images)
Installation Steps
1. Clone this repository (or copy baremetal/pxe-server/ to your NixOS system).
2. Run the setup script:
   sudo ./setup.sh --install --download --validate
   This will:
   - Create the directory structure at /var/lib/pxe-boot
   - Download iPXE bootloaders from boot.ipxe.org
   - Install boot scripts
   - Validate configurations
3. Configure network settings. Edit nixos-module.nix or create a NixOS configuration:
   # /etc/nixos/configuration.nix
   { config, pkgs, ... }:
   {
     imports = [ /path/to/baremetal/pxe-server/nixos-module.nix ];

     services.centra-pxe-server = {
       enable = true;
       interface = "eth0";             # Your network interface
       serverAddress = "10.0.100.10";  # PXE server IP

       dhcp = {
         subnet = "10.0.100.0";
         netmask = "255.255.255.0";
         broadcast = "10.0.100.255";
         range = {
           start = "10.0.100.100";
           end = "10.0.100.200";
         };
         router = "10.0.100.1";
       };

       # Optional: Define known nodes with MAC addresses
       nodes = {
         "52:54:00:12:34:56" = {
           profile = "control-plane";
           hostname = "control-plane-01";
           ipAddress = "10.0.100.50";
         };
       };
     };
   }
4. Deploy the NixOS configuration:
   sudo nixos-rebuild switch
5. Verify the services are running:
   sudo ./setup.sh --test
6. Add NixOS boot images (will be provided by T032.S3):
   # Placeholder - actual images will be built by the image builder
   # For testing, you can use any NixOS netboot image
   sudo mkdir -p /var/lib/pxe-boot/nixos
   # Copy bzImage and initrd to /var/lib/pxe-boot/nixos/
7. Boot a bare-metal server:
   - Configure the server BIOS to boot from network (PXE)
   - Connect it to the same network segment
   - Power on the server
   - Watch for DHCP discovery and the iPXE boot menu
Detailed Setup
Option 1: NixOS Module (Recommended)
The NixOS module provides a declarative way to configure the entire PXE server stack.
Advantages:
- Single configuration file
- Automatic service dependencies
- Rollback capability
- Integration with NixOS firewall
Configuration Example:
See the NixOS configuration example in Quick Start.
Option 2: Manual Installation
For non-NixOS systems or manual setup:
1. Install required packages:
   # Debian/Ubuntu
   sudo apt-get install isc-dhcp-server atftpd nginx curl
   # RHEL/CentOS
   sudo yum install dhcp tftp-server nginx curl
2. Run the setup script:
   sudo ./setup.sh --install --download
3. Copy configuration files:
   # DHCP configuration
   sudo cp dhcp/dhcpd.conf /etc/dhcp/dhcpd.conf
   # Edit to match your network
   sudo vim /etc/dhcp/dhcpd.conf
   # Nginx configuration
   sudo cp http/nginx.conf /etc/nginx/sites-available/pxe-boot
   sudo ln -s /etc/nginx/sites-available/pxe-boot /etc/nginx/sites-enabled/
4. Start services (on RHEL/CentOS the units are dhcpd and tftp.socket instead of isc-dhcp-server and atftpd):
   sudo systemctl enable --now isc-dhcp-server
   sudo systemctl enable --now atftpd
   sudo systemctl enable --now nginx
5. Configure the firewall:
   # UFW (Ubuntu)
   sudo ufw allow 67/udp   # DHCP
   sudo ufw allow 68/udp   # DHCP
   sudo ufw allow 69/udp   # TFTP
   sudo ufw allow 80/tcp   # HTTP
   # firewalld (RHEL)
   sudo firewall-cmd --permanent --add-service=dhcp
   sudo firewall-cmd --permanent --add-service=tftp
   sudo firewall-cmd --permanent --add-service=http
   sudo firewall-cmd --reload
Configuration
DHCP Configuration
The DHCP server configuration is in dhcp/dhcpd.conf. Key sections:
Network Settings:
subnet 10.0.100.0 netmask 255.255.255.0 {
range 10.0.100.100 10.0.100.200;
option routers 10.0.100.1;
option domain-name-servers 10.0.100.1, 8.8.8.8;
next-server 10.0.100.10; # PXE server IP
# ...
}
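Boot File Selection (automatic BIOS/UEFI detection). The comparisons assume architecture-type has been defined as DHCP option 93 earlier in dhcpd.conf, for example with option architecture-type code 93 = unsigned integer 16;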
if exists user-class and option user-class = "iPXE" {
filename "http://10.0.100.10/boot/ipxe/boot.ipxe";
} elsif option architecture-type = 00:00 {
filename "undionly.kpxe"; # BIOS
} elsif option architecture-type = 00:07 {
filename "ipxe.efi"; # UEFI x86-64
}
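To check which file a given client should receive, the decision above can be mirrored in a small shell function. This is a debugging sketch, not part of setup.sh: the function name pxe_bootfile and the PXE_SERVER variable are hypothetical, the architecture code is passed as a decimal number, and code 9 (EFI x86-64, which some UEFI firmware reports instead of 7) is treated like 7, which the dhcpd.conf excerpt does not do.

```shell
#!/usr/bin/env bash
# Sketch: mirror the dhcpd.conf boot-file decision for one client.
# Usage: pxe_bootfile <arch-code> [user-class]
#   arch-code  : DHCP option 93 as a decimal number (0 = BIOS, 7/9 = UEFI x86-64)
#   user-class : "iPXE" once the iPXE bootloader itself is asking (chainload stage)
# PXE_SERVER is a hypothetical variable; the default is this README's example IP.
PXE_SERVER="${PXE_SERVER:-10.0.100.10}"

pxe_bootfile() {
  local arch="$1" user_class="${2:-}"
  # Second stage: iPXE identifies itself via user-class, so it gets the script URL.
  if [[ "$user_class" == "iPXE" ]]; then
    echo "http://${PXE_SERVER}/boot/ipxe/boot.ipxe"
    return 0
  fi
  # First stage: pick a bootloader by firmware architecture.
  case "$arch" in
    0)   echo "undionly.kpxe" ;;   # legacy BIOS
    7|9) echo "ipxe.efi" ;;        # UEFI x86-64 (EFI BC / EFI x86-64)
    *)   echo "unsupported architecture: $arch" >&2; return 1 ;;
  esac
}
```

For example, pxe_bootfile 0 prints undionly.kpxe, while pxe_bootfile 7 iPXE prints the boot.ipxe URL because the user-class check takes precedence, as it does in dhcpd.conf.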
Host-Specific Configuration:
host control-plane-01 {
hardware ethernet 52:54:00:12:34:56;
fixed-address 10.0.100.50;
option host-name "control-plane-01";
}
iPXE Boot Script
The main boot script is ipxe/boot.ipxe. It provides:
- MAC-based automatic selection:
  iseq ${mac} 52:54:00:12:34:56 && set profile control-plane && goto boot ||
- Interactive menu (if no MAC match):
  :menu
  menu Centra Cloud - Bare-Metal Provisioning
  item control-plane 1. Control Plane Node (All Services)
  item worker 2. Worker Node (Compute Services)
  item all-in-one 3. All-in-One Node (Testing/Homelab)
- Kernel parameters:
  set kernel-params centra.profile=${profile}
  set kernel-params ${kernel-params} centra.hostname=${hostname}
  set kernel-params ${kernel-params} console=tty0 console=ttyS0,115200n8
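On the node side, the installer has to read these values back from the kernel command line. A minimal sketch, assuming the first-boot automation reads /proc/cmdline; the helper name cmdline_param is hypothetical, and the actual Nix modules may parse the parameters differently:

```shell
#!/usr/bin/env bash
# Sketch: read a key=value kernel parameter such as centra.profile.
# Usage: cmdline_param <key> [cmdline]
# Without a second argument, the running kernel's /proc/cmdline is used.
cmdline_param() {
  local key="$1"
  local cmdline="${2:-$(cat /proc/cmdline 2>/dev/null)}"
  local word value=""
  local -a words
  # Parameters are whitespace-separated; read -a splits without glob expansion.
  read -r -a words <<< "$cmdline"
  for word in "${words[@]}"; do
    case "$word" in
      "$key"=*) value="${word#*=}" ;;  # a later occurrence overrides an earlier one
    esac
  done
  [[ -n "$value" ]] || return 1        # missing or empty parameter
  printf '%s\n' "$value"
}
```

On a node booted with the parameters above, cmdline_param centra.profile would print the selected profile and cmdline_param console would print ttyS0,115200n8 (the last console= entry).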
Adding New Nodes
To add a new node to the infrastructure:
1. Get the MAC address from the server (check the BIOS or the network card label).
2. Add it to the MAC mappings (ipxe/mac-mappings.txt):
   52:54:00:12:34:5d worker worker-04
3. Update the boot script (ipxe/boot.ipxe):
   iseq ${mac} 52:54:00:12:34:5d && set profile worker && set hostname worker-04 && goto boot ||
4. Add a DHCP host entry (dhcp/dhcpd.conf):
   host worker-04 {
     hardware ethernet 52:54:00:12:34:5d;
     fixed-address 10.0.100.64;
     option host-name "worker-04";
   }
5. Restart the DHCP service (dhcpd4 on NixOS, isc-dhcp-server on Debian/Ubuntu):
   sudo systemctl restart dhcpd4
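The mapping line, the boot-script line and the DHCP host entry all repeat the same MAC, profile and hostname, so a small generator helps keep them consistent. A sketch, not part of setup.sh (new_node_entries is a hypothetical helper): it validates the MAC and profile and prints the three snippets for you to paste into the files.

```shell
#!/usr/bin/env bash
# Sketch: print the snippets needed to register one node.
# Usage: new_node_entries <mac> <profile> <hostname> <ip>
new_node_entries() {
  local mac="${1,,}" profile="$2" hostname="$3" ip="$4"   # ${1,,} lower-cases the MAC
  local mac_re='^([0-9a-f]{2}:){5}[0-9a-f]{2}$'

  # Reject malformed MACs and profiles that boot.ipxe does not offer.
  if [[ ! "$mac" =~ $mac_re ]]; then
    echo "invalid MAC address: $1" >&2
    return 1
  fi
  case "$profile" in
    control-plane|worker|all-in-one) ;;
    *) echo "unknown profile: $profile" >&2; return 1 ;;
  esac

  echo "# ipxe/mac-mappings.txt"
  echo "$mac $profile $hostname"
  echo
  echo "# ipxe/boot.ipxe"
  echo "iseq \${mac} $mac && set profile $profile && set hostname $hostname && goto boot ||"
  echo
  echo "# dhcp/dhcpd.conf"
  cat <<EOF
host $hostname {
  hardware ethernet $mac;
  fixed-address $ip;
  option host-name "$hostname";
}
EOF
}
```

new_node_entries 52:54:00:12:34:5D worker worker-04 10.0.100.64 reproduces the worker-04 example. The MAC is lower-cased because iseq is an exact string comparison and the existing mappings use lower-case MACs.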
Boot Profiles
1. Control Plane Profile
Purpose: Nodes that run core infrastructure services
Services included:
- FlareDB (PD, Store, TiKV-compatible database)
- IAM (Identity and Access Management)
- PlasmaVMC (Virtual Machine Controller)
- K8sHost (Kubernetes node agent)
- FlashDNS (High-performance DNS)
- ChainFire (Firewall/networking)
- Object Storage (S3-compatible)
- Monitoring (Prometheus, Grafana)
Resource requirements:
- CPU: 8+ cores recommended
- RAM: 32+ GB recommended
- Disk: 500+ GB SSD
Use case: Production control plane nodes in a cluster
2. Worker Profile
Purpose: Nodes that run customer workloads
Services included:
- K8sHost (Kubernetes node agent) - primary service
- PlasmaVMC (Virtual Machine Controller) - VM workloads
- ChainFire (Network policy enforcement)
- FlashDNS (Local DNS caching)
- Basic monitoring agents
Resource requirements:
- CPU: 16+ cores recommended
- RAM: 64+ GB recommended
- Disk: 1+ TB SSD
Use case: Worker nodes for running customer applications
3. All-in-One Profile
Purpose: Single-node deployment for testing and development
Services included:
- Complete Centra Cloud stack on one node
- All services from control-plane profile
- Suitable for testing, development, homelab
Resource requirements:
- CPU: 16+ cores recommended
- RAM: 64+ GB recommended
- Disk: 1+ TB SSD
Use case: Development, testing, homelab deployments
Warning: Not recommended for production use (no HA, resource intensive)
Network Requirements
Network Topology
The PXE server must be on the same network segment as the bare-metal servers, or you must configure DHCP relay.
Same Segment (recommended for initial setup):
┌──────────────┐         ┌──────────────────┐
│  PXE Server  │         │  Bare-Metal Srv  │
│ 10.0.100.10  │◄────────┤  (DHCP client)   │
└──────────────┘  L2 SW  └──────────────────┘
Different Segments (requires DHCP relay):
┌──────────────┐         ┌──────────┐         ┌──────────────────┐
│  PXE Server  │         │  Router  │         │  Bare-Metal Srv  │
│ 10.0.100.10  │◄────────┤ (relay)  │◄────────┤  (DHCP client)   │
└──────────────┘         └──────────┘         └──────────────────┘
   Segment A              ip helper                Segment B
DHCP Relay Configuration
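If your PXE server is on a different network segment, configure a DHCP relay in the clients' segment. dhcpd.conf must also contain a subnet declaration for the clients' subnet, because ISC dhcpd does not answer relayed requests from subnets it has no declaration for.
Cisco IOS:
! Apply on the interface facing the bare-metal servers
interface vlan 100
 ip helper-address 10.0.100.10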
Linux (dhcp-helper):
apt-get install dhcp-helper
# Edit /etc/default/dhcp-helper
DHCPHELPER_OPTS="-s 10.0.100.10"
systemctl restart dhcp-helper
Linux (dhcrelay):
apt-get install isc-dhcp-relay
dhcrelay -i eth0 -i eth1 10.0.100.10
Firewall Rules
The following ports must be open on the PXE server:
| Port | Protocol | Service | Direction | Description |
|---|---|---|---|---|
| 67 | UDP | DHCP | Inbound | DHCP server |
| 68 | UDP | DHCP | Outbound | DHCP client responses |
| 69 | UDP | TFTP | Inbound | TFTP bootloader downloads |
| 80 | TCP | HTTP | Inbound | iPXE scripts and boot images |
| 443 | TCP | HTTPS | Inbound | Optional: secure boot images |
Network Bandwidth
Estimated bandwidth requirements:
- Per-node boot: ~500 MB download (kernel + initrd)
- Concurrent boots: Multiply by number of simultaneous boots
- Recommended: 1 Gbps link for PXE server
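Example: Booting 10 nodes simultaneously transfers about 5 GB (10 × 500 MB), which keeps a 1 Gbps link saturated for at least 40 seconds (5 GB × 8 bits ÷ 1 Gbps); a 10 Gbps link cuts that to about 4 seconds. Stagger boots or use a 10 Gbps link.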
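To estimate how long a batch of concurrent boots keeps the server link busy, multiply the node count by the image size, convert to bits and divide by the link rate. A sketch with a hypothetical helper, boot_transfer_seconds; it gives a lower bound because it ignores protocol overhead and assumes the link is the only bottleneck:

```shell
#!/usr/bin/env bash
# Sketch: lower bound on how long N simultaneous boots saturate the PXE link.
# Usage: boot_transfer_seconds <nodes> <image-MB> <link-Mbit/s>
# Uses decimal units (1 MB = 8 Mbit) and integer arithmetic.
boot_transfer_seconds() {
  local nodes="$1" image_mb="$2" link_mbps="$3"
  local total_mbit=$(( nodes * image_mb * 8 ))
  # Divide and round up to whole seconds.
  echo $(( (total_mbit + link_mbps - 1) / link_mbps ))
}
```

boot_transfer_seconds 10 500 1000 prints 40 (a 1 Gbps link); passing 10000 for a 10 Gbps link prints 4.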
Troubleshooting
DHCP Issues
Problem: Server doesn't get IP address
Diagnosis:
# On PXE server, monitor DHCP requests
sudo tcpdump -i eth0 -n port 67 or port 68
# Check DHCP server logs
sudo journalctl -u dhcpd4 -f
# Verify DHCP server is running
sudo systemctl status dhcpd4
Common causes:
- DHCP server not running on correct interface
- Firewall blocking UDP 67/68
- Network cable/switch issue
- DHCP range exhausted
Solution:
# Check interface configuration
ip addr show
# Verify DHCP config syntax
sudo dhcpd -t -cf /etc/dhcp/dhcpd.conf
# Check firewall
sudo iptables -L -n | grep -E "67|68"
# Restart DHCP server
sudo systemctl restart dhcpd4
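To check whether the range is exhausted, compare the number of active leases with the pool size (10.0.100.100 to 10.0.100.200 is 101 addresses in the example configuration). A sketch using awk; count_active_leases is a hypothetical helper, and its default path is the one used under Monitoring and Metrics, so adjust it if your system stores leases elsewhere:

```shell
#!/usr/bin/env bash
# Sketch: count leases whose most recent record is "binding state active".
# Usage: count_active_leases [leases-file]
# ISC dhcpd appends lease records, so one IP can appear several times;
# only the last record seen for each IP is treated as current.
count_active_leases() {
  local file="${1:-/var/lib/dhcp/dhcpd.leases}"
  awk '
    $1 == "lease" { ip = $2 }                   # start of a lease block
    $1 == "binding" && $2 == "state" {          # e.g. "binding state active;"
      st = $3; sub(/;$/, "", st); state[ip] = st
    }
    END {
      n = 0
      for (k in state) if (state[k] == "active") n++
      print n
    }
  ' "$file"
}
```

Lines such as "next binding state free;" start with a different keyword and are deliberately ignored.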
TFTP Issues
Problem: PXE client gets IP but fails to download bootloader
Diagnosis:
# Monitor TFTP requests
sudo tcpdump -i eth0 -n port 69
# Check TFTP server logs
sudo journalctl -u atftpd -f
# Test TFTP locally (tftp-hpa client syntax)
tftp localhost -c get undionly.kpxe /tmp/test.kpxe
Common causes:
- TFTP server not running
- Bootloader files missing
- Permissions incorrect
- Firewall blocking UDP 69
Solution:
# Check files exist (if your TFTP root is /var/lib/pxe-boot/ipxe/, check there instead)
ls -la /var/lib/tftpboot/
# Fix permissions
sudo chmod 644 /var/lib/tftpboot/*.{kpxe,efi}
# Restart TFTP server
sudo systemctl restart atftpd
# Check firewall
sudo iptables -L -n | grep 69
HTTP Issues
Problem: iPXE loads but can't download boot script or kernel
Diagnosis:
# Monitor HTTP requests
sudo tail -f /var/log/nginx/access.log
# Test HTTP locally
curl -v http://localhost/boot/ipxe/boot.ipxe
curl -v http://localhost/health
# Check nginx status
sudo systemctl status nginx
Common causes:
- Nginx not running
- Boot files missing
- Permissions incorrect
- Firewall blocking TCP 80
- Wrong server IP in boot.ipxe
Solution:
# Check nginx config
sudo nginx -t
# Verify files exist
ls -la /var/lib/pxe-boot/ipxe/
ls -la /var/lib/pxe-boot/nixos/
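# Fix permissions (the nginx user is www-data on Debian/Ubuntu)
sudo chown -R nginx:nginx /var/lib/pxe-boot
sudo find /var/lib/pxe-boot -type d -exec chmod 755 {} +
sudo find /var/lib/pxe-boot -type f -exec chmod 644 {} +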
# Restart nginx
sudo systemctl restart nginx
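The curl checks above can be bundled into one probe over all four nginx endpoints. A sketch (check_pxe_endpoints is a hypothetical helper, not part of setup.sh) that sends HEAD requests so the kernel and initrd are not downloaded in full:

```shell
#!/usr/bin/env bash
# Sketch: probe the nginx endpoints listed under Components.
# Usage: check_pxe_endpoints [base-url]   (default: http://localhost)
# Prints OK/FAIL per path; the exit status is the number of failures (0 = healthy).
check_pxe_endpoints() {
  local base="${1:-http://localhost}" path failures=0
  for path in /health /boot/ipxe/boot.ipxe /boot/nixos/bzImage /boot/nixos/initrd; do
    # -f: HTTP status >= 400 counts as failure; -I: HEAD request only
    if curl -fsSI --max-time 5 -o /dev/null "${base}${path}"; then
      echo "OK   ${path}"
    else
      echo "FAIL ${path}"
      failures=$((failures + 1))
    fi
  done
  return "$failures"
}
```

Running it from another machine, for example check_pxe_endpoints http://10.0.100.10, also exercises the firewall path that a PXE client would use.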
Boot Script Issues
Problem: Boot menu appears but fails to load kernel
Diagnosis:
- Check iPXE error messages on console
- Verify URLs in boot.ipxe match actual paths
- Test kernel download manually:
curl -I http://10.0.100.10/boot/nixos/bzImage
Common causes:
- NixOS boot images not deployed yet (normal for T032.S2)
- Wrong paths in boot.ipxe
- Insufficient disk space for the boot images (check with df -h)
Solution:
# Wait for T032.S3 (Image Builder) to generate boot images
# OR manually place NixOS netboot images:
sudo mkdir -p /var/lib/pxe-boot/nixos
# Copy bzImage and initrd from NixOS netboot
Serial Console Debugging
For remote debugging without physical access:
1. Enable serial console in BIOS:
   - Configure COM1/ttyS0 at 115200 baud
   - Enable console redirection
2. Connect via IPMI SOL (if available):
   ipmitool -I lanplus -H <bmc-ip> -U admin sol activate
3. Watch the boot process:
   - DHCP discovery messages
   - TFTP download progress
   - iPXE boot menu
   - Kernel boot messages
4. The kernel parameters already include the serial console:
   console=tty0 console=ttyS0,115200n8
Common Error Messages
| Error | Cause | Solution |
|---|---|---|
| PXE-E51: No DHCP or proxyDHCP offers were received | DHCP server not responding | Check DHCP server running, network connectivity |
| PXE-E53: No boot filename received | DHCP not providing filename | Check dhcpd.conf has filename option |
| PXE-E32: TFTP open timeout | TFTP server not responding | Check TFTP server running, firewall rules |
| Not found: /boot/ipxe/boot.ipxe | HTTP 404 error | Check file exists, nginx config, permissions |
| Could not boot: Exec format error | Corrupted boot file | Re-download/rebuild bootloader |
Advanced Topics
Building iPXE from Source
For production deployments, building iPXE from source provides:
- Custom branding
- Embedded certificates for HTTPS
- Optimized size
- Security hardening
Build instructions:
sudo ./setup.sh --build-ipxe
Or manually:
git clone https://github.com/ipxe/ipxe.git
cd ipxe/src
# BIOS bootloader
make bin/undionly.kpxe
# UEFI bootloader
make bin-x86_64-efi/ipxe.efi
# Copy to PXE server
sudo cp bin/undionly.kpxe /var/lib/pxe-boot/ipxe/
sudo cp bin-x86_64-efi/ipxe.efi /var/lib/pxe-boot/ipxe/
HTTPS Boot
For enhanced security, serve boot images over HTTPS:
1. Generate an SSL certificate:
   sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
     -keyout /etc/ssl/private/pxe-server.key \
     -out /etc/ssl/certs/pxe-server.crt
2. Configure nginx for HTTPS (uncomment the HTTPS block in http/nginx.conf)
3. Update boot.ipxe to use https:// URLs
4. Rebuild iPXE with the certificate embedded as a trusted root, for example make bin/undionly.kpxe bin-x86_64-efi/ipxe.efi TRUST=/etc/ssl/certs/pxe-server.crt in ipxe/src. A stock iPXE build does not trust a self-signed certificate, so HTTPS downloads fail without this step.
Multiple NixOS Versions
To support multiple NixOS versions for testing/rollback:
/var/lib/pxe-boot/nixos/
├── 24.05/
│ ├── bzImage
│ └── initrd
├── 24.11/
│ ├── bzImage
│ └── initrd
└── latest -> 24.11/ # Symlink to current version
Update boot.ipxe to use /boot/nixos/latest/bzImage or add menu items for version selection.
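Deleting and recreating the latest symlink leaves a short window in which the path does not exist. A sketch that avoids this by renaming a fresh link over the old one; set_latest_nixos is a hypothetical helper, and mv -T requires GNU coreutils:

```shell
#!/usr/bin/env bash
# Sketch: point nixos/latest at a version directory.
# Usage: set_latest_nixos <nixos-root> <version>
#   e.g. set_latest_nixos /var/lib/pxe-boot/nixos 24.11
# A new link is created under a temporary name and renamed over the old one;
# rename(2) replaces the link atomically within one filesystem.
set_latest_nixos() {
  local root="$1" version="$2"
  # Refuse to switch to a version missing the files iPXE will request.
  if [[ ! -f "$root/$version/bzImage" || ! -f "$root/$version/initrd" ]]; then
    echo "missing bzImage or initrd in $root/$version" >&2
    return 1
  fi
  # Relative target keeps the tree relocatable; -T stops mv from
  # moving the new link into the directory the old link points at.
  ln -sfn "$version" "$root/.latest.tmp" &&
    mv -Tf "$root/.latest.tmp" "$root/latest"
}
```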
Integration with BMC/IPMI
For fully automated provisioning:
- Discover new hardware via IPMI/Redfish API
- Configure PXE boot via IPMI:
  ipmitool -I lanplus -H <bmc-ip> -U admin chassis bootdev pxe options=persistent
- Power on the server:
  ipmitool -I lanplus -H <bmc-ip> -U admin power on
- Monitor via SOL (serial-over-LAN)
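For more than a handful of machines, the two ipmitool calls can be looped over a list of BMC addresses. A sketch, assuming every BMC uses the same admin account and that the password is supplied through the IPMI_PASSWORD environment variable (which ipmitool -E reads); ipmi_pxe_boot and DRY_RUN are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: set PXE as the persistent boot device, then power on each BMC's host.
# Usage: IPMI_PASSWORD=... ipmi_pxe_boot <bmc-ip>...
# With DRY_RUN=1 the ipmitool commands are printed instead of executed.
ipmi_pxe_boot() {
  local bmc
  local -a run=()
  [[ "${DRY_RUN:-0}" == 1 ]] && run=(echo)
  for bmc in "$@"; do
    # -E: take the password from the environment instead of the command line
    "${run[@]}" ipmitool -I lanplus -H "$bmc" -U admin -E chassis bootdev pxe options=persistent || return 1
    "${run[@]}" ipmitool -I lanplus -H "$bmc" -U admin -E power on || return 1
  done
}
```

Running DRY_RUN=1 ipmi_pxe_boot 10.0.0.21 10.0.0.22 first shows exactly which commands would be sent.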
Monitoring and Metrics
Track PXE boot activity:
1. DHCP leases:
   cat /var/lib/dhcp/dhcpd.leases
2. HTTP access logs:
   sudo tail -f /var/log/nginx/access.log | grep -E "boot.ipxe|bzImage|initrd"
3. Prometheus metrics (if nginx-module-vts is installed):
   - Boot file download counts
   - Bandwidth usage
   - Response times
4. Custom metrics endpoint:
   - Parse nginx access logs
   - Count boots per profile
   - Alert on failed boots
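As a starting point for custom metrics, the access log can be summarised per client and boot file. A sketch assuming nginx's default combined log format (boot_downloads is a hypothetical helper); it counts requests for boot.ipxe, bzImage and initrd and splits them by whether nginx answered with a 2xx status:

```shell
#!/usr/bin/env bash
# Sketch: summarise boot-file downloads per client from an nginx access log.
# Assumes the default "combined" format, where (split on spaces) field 1 is
# the client IP, field 7 the request path and field 9 the HTTP status.
# Usage: boot_downloads [access-log]
boot_downloads() {
  local log="${1:-/var/log/nginx/access.log}"
  awk '
    $7 ~ /(boot\.ipxe|bzImage|initrd)$/ {
      file = $7; sub(/.*\//, "", file)          # keep only the basename
      key = $1 " " file " " ($9 ~ /^2/ ? "ok" : "failed")
      count[key]++
    }
    END { for (k in count) print k, count[k] }
  ' "$log" | sort
}
```

Each output line reads "client-ip file ok|failed count"; a client with bzImage ok but initrd failed, for instance, points at a missing or unreadable initrd.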
Files and Directory Structure
baremetal/pxe-server/
├── README.md # This file
├── setup.sh # Setup and management script
├── nixos-module.nix # NixOS service module
│
├── dhcp/
│ └── dhcpd.conf # DHCP server configuration
│
├── ipxe/
│ ├── boot.ipxe # Main boot menu script
│ └── mac-mappings.txt # MAC address documentation
│
├── http/
│ ├── nginx.conf # HTTP server configuration
│ └── directory-structure.txt # Directory layout documentation
│
└── assets/ # (Created at runtime)
└── /var/lib/pxe-boot/
├── ipxe/
│ ├── undionly.kpxe
│ ├── ipxe.efi
│ └── boot.ipxe
└── nixos/
├── bzImage
└── initrd
Next Steps
After completing the PXE server setup:
1. T032.S3 - Image Builder: Automated NixOS image generation with profile-specific configurations
2. T032.S4 - Provisioning Orchestrator: API-driven provisioning workflow and node lifecycle management
3. Integration with IAM: Authentication for the provisioning API
4. Integration with FlareDB: Node inventory and state management
Support
For issues or questions:
- Check the Troubleshooting section
- Review logs:
  sudo journalctl -u dhcpd4 -u atftpd -u nginx -f
- Run diagnostics:
  sudo ./setup.sh --test
License
Part of Centra Cloud infrastructure - see project root for license information.