# Centra Cloud PXE Boot Server

This directory contains the PXE (Preboot eXecution Environment) boot infrastructure for bare-metal provisioning of Centra Cloud nodes. It enables network-based installation of NixOS on physical servers with automated profile selection.

## Table of Contents

- [Architecture Overview](#architecture-overview)
- [Components](#components)
- [Quick Start](#quick-start)
- [Detailed Setup](#detailed-setup)
- [Configuration](#configuration)
- [Boot Profiles](#boot-profiles)
- [Network Requirements](#network-requirements)
- [Troubleshooting](#troubleshooting)
- [Advanced Topics](#advanced-topics)

## Architecture Overview

The PXE boot infrastructure consists of three main services:

```
┌─────────────────────────────────────────────────────────────────┐
│                          PXE Boot Flow                          │
└─────────────────────────────────────────────────────────────────┘

Bare-Metal Server                    PXE Boot Server
─────────────────                    ───────────────

1. Power on
   │
   ├─► DHCP Request ──────────────► DHCP Server
   │                                 (ISC DHCP)
   │                                    │
   │                                    ├─ Assigns IP
   │                                    ├─ Detects BIOS/UEFI
   │                                    └─ Provides bootloader path
   │
   ├◄─ DHCP Response ───────────────┤
   │   (IP, next-server, filename)
   │
   ├─► TFTP Get bootloader ─────────► TFTP Server
   │   (undionly.kpxe or ipxe.efi)     (atftpd)
   │
   ├◄─ Bootloader file ─────────────┤
   │
   ├─► Execute iPXE bootloader
   │   │
   │   ├─► HTTP Get boot.ipxe ──────► HTTP Server
   │   │                               (nginx)
   │   │
   │   ├◄─ boot.ipxe script ─────────┤
   │   │
   │   ├─► Display menu / Auto-select profile
   │   │
   │   ├─► HTTP Get kernel ──────────► HTTP Server
   │   │
   │   ├◄─ bzImage ───────────────────┤
   │   │
   │   ├─► HTTP Get initrd ───────────► HTTP Server
   │   │
   │   ├◄─ initrd ────────────────────┤
   │   │
   │   └─► Boot NixOS
   │
   └─► NixOS Installer
       └─ Provisions node based on profile
```

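The HTTP stage of the flow above can be probed from any machine on the provisioning network. The following is a minimal sketch, not part of `setup.sh`: the function name `pxe_boot_urls` is hypothetical, and the paths are the nginx endpoints this README lists for the boot script, kernel, and initrd.

```bash
#!/usr/bin/env bash
# Print the URLs an iPXE client fetches over HTTP, in boot order.
# pxe_boot_urls is a hypothetical helper; the paths follow this README.
pxe_boot_urls() {
  local server="$1"
  printf 'http://%s/boot/ipxe/boot.ipxe\n' "$server"
  printf 'http://%s/boot/nixos/bzImage\n' "$server"
  printf 'http://%s/boot/nixos/initrd\n' "$server"
}

# Example: send a HEAD request for each stage and report failures.
# pxe_boot_urls 10.0.100.10 | while read -r url; do
#   curl -fsI "$url" >/dev/null || echo "unreachable: $url"
# done
```

Checking the URLs in boot order helps narrow a failure to a single stage: if `boot.ipxe` is reachable but `bzImage` is not, the boot images are probably missing rather than nginx being down.
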
## Components

### 1. DHCP Server (ISC DHCP)

- **Purpose**: Assigns IP addresses and directs PXE clients to bootloader
- **Config**: `dhcp/dhcpd.conf`
- **Features**:
  - BIOS/UEFI detection via option 93 (architecture type)
  - Per-host configuration for fixed IP assignment
  - Automatic next-server and filename configuration

### 2. TFTP Server (atftpd)

- **Purpose**: Serves iPXE bootloader files to PXE clients
- **Files served**:
  - `undionly.kpxe` - BIOS bootloader
  - `ipxe.efi` - UEFI x86-64 bootloader
  - `ipxe-i386.efi` - UEFI x86 32-bit bootloader (optional)

### 3. HTTP Server (nginx)

- **Purpose**: Serves iPXE scripts and NixOS boot images
- **Config**: `http/nginx.conf`
- **Endpoints**:
  - `/boot/ipxe/boot.ipxe` - Main boot menu script
  - `/boot/nixos/bzImage` - NixOS kernel
  - `/boot/nixos/initrd` - NixOS initial ramdisk
  - `/health` - Health check endpoint

### 4. iPXE Boot Scripts

- **Main script**: `ipxe/boot.ipxe`
- **Features**:
  - Interactive boot menu with 3 profiles
  - MAC-based automatic profile selection
  - Serial console support for remote management
  - Detailed error messages and debugging options

### 5. NixOS Service Module

- **File**: `nixos-module.nix`
- **Purpose**: Declarative NixOS configuration for all services
- **Features**:
  - Single configuration file for entire stack
  - Firewall rules auto-configured
  - Systemd service dependencies managed
  - Directory structure auto-created

## Quick Start

### Prerequisites

- NixOS server with network connectivity
- Network interface on the same subnet as bare-metal servers
- Sufficient disk space (5-10 GB for boot images)

### Installation Steps

1. **Clone this repository** (or copy `baremetal/pxe-server/` to your NixOS system)

2. **Run the setup script**:
   ```bash
   sudo ./setup.sh --install --download --validate
   ```

   This will:
   - Create directory structure at `/var/lib/pxe-boot`
   - Download iPXE bootloaders from boot.ipxe.org
   - Install boot scripts
   - Validate configurations

3. **Configure network settings**:

   Edit `nixos-module.nix` or create a NixOS configuration:

   ```nix
   # /etc/nixos/configuration.nix

   imports = [
     /path/to/baremetal/pxe-server/nixos-module.nix
   ];

   services.centra-pxe-server = {
     enable = true;
     interface = "eth0";             # Your network interface
     serverAddress = "10.0.100.10";  # PXE server IP

     dhcp = {
       subnet = "10.0.100.0";
       netmask = "255.255.255.0";
       broadcast = "10.0.100.255";
       range = {
         start = "10.0.100.100";
         end = "10.0.100.200";
       };
       router = "10.0.100.1";
     };

     # Optional: Define known nodes with MAC addresses
     nodes = {
       "52:54:00:12:34:56" = {
         profile = "control-plane";
         hostname = "control-plane-01";
         ipAddress = "10.0.100.50";
       };
     };
   };
   ```

4. **Deploy NixOS configuration**:
   ```bash
   sudo nixos-rebuild switch
   ```

5. **Verify services are running**:
   ```bash
   sudo ./setup.sh --test
   ```

6. **Add NixOS boot images** (will be provided by T032.S3):
   ```bash
   # Placeholder - actual images will be built by image builder
   # For testing, you can use any NixOS netboot image
   sudo mkdir -p /var/lib/pxe-boot/nixos
   # Copy bzImage and initrd to /var/lib/pxe-boot/nixos/
   ```

7. **Boot a bare-metal server**:
   - Configure server BIOS to boot from network (PXE)
   - Connect to same network segment
   - Power on server
   - Watch for DHCP discovery and iPXE boot menu

## Detailed Setup

### Option 1: NixOS Module (Recommended)

The NixOS module provides a declarative way to configure the entire PXE server stack.

**Advantages**:
- Single configuration file
- Automatic service dependencies
- Rollback capability
- Integration with NixOS firewall

**Configuration Example**:

See the NixOS configuration example in [Quick Start](#quick-start).

### Option 2: Manual Installation

For non-NixOS systems or manual setup:

1. **Install required packages**:
   ```bash
   # Debian/Ubuntu
   apt-get install isc-dhcp-server atftpd nginx curl

   # RHEL/CentOS
   yum install dhcp tftp-server nginx curl
   ```

2. **Run setup script**:
   ```bash
   sudo ./setup.sh --install --download
   ```

3. **Copy configuration files**:
   ```bash
   # DHCP configuration
   sudo cp dhcp/dhcpd.conf /etc/dhcp/dhcpd.conf

   # Edit to match your network
   sudo vim /etc/dhcp/dhcpd.conf

   # Nginx configuration
   sudo cp http/nginx.conf /etc/nginx/sites-available/pxe-boot
   sudo ln -s /etc/nginx/sites-available/pxe-boot /etc/nginx/sites-enabled/
   ```

4. **Start services**:
   ```bash
   # Unit names as packaged on Debian/Ubuntu; on RHEL/CentOS the DHCP unit is dhcpd
   sudo systemctl enable --now isc-dhcp-server
   sudo systemctl enable --now atftpd
   sudo systemctl enable --now nginx
   ```

5. **Configure firewall**:
   ```bash
   # UFW (Ubuntu)
   sudo ufw allow 67/udp  # DHCP (server port)
   sudo ufw allow 68/udp  # DHCP (client port)
   sudo ufw allow 69/udp  # TFTP
   sudo ufw allow 80/tcp  # HTTP

   # firewalld (RHEL)
   sudo firewall-cmd --permanent --add-service=dhcp
   sudo firewall-cmd --permanent --add-service=tftp
   sudo firewall-cmd --permanent --add-service=http
   sudo firewall-cmd --reload
   ```

## Configuration

### DHCP Configuration

The DHCP server configuration is in `dhcp/dhcpd.conf`. Key sections:

**Network Settings**:
```conf
subnet 10.0.100.0 netmask 255.255.255.0 {
  range 10.0.100.100 10.0.100.200;
  option routers 10.0.100.1;
  option domain-name-servers 10.0.100.1, 8.8.8.8;
  next-server 10.0.100.10;  # PXE server IP
  # ...
}
```

**Boot File Selection** (automatic BIOS/UEFI detection):
```conf
if exists user-class and option user-class = "iPXE" {
  filename "http://10.0.100.10/boot/ipxe/boot.ipxe";
} elsif option architecture-type = 00:00 {
  filename "undionly.kpxe";  # BIOS
} elsif option architecture-type = 00:07 {
  filename "ipxe.efi";       # UEFI x86-64
}
```

**Host-Specific Configuration**:
```conf
host control-plane-01 {
  hardware ethernet 52:54:00:12:34:56;
  fixed-address 10.0.100.50;
  option host-name "control-plane-01";
}
```

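The boot file selection rules above are evaluated twice per boot: once for the firmware's PXE client, which receives an iPXE binary over TFTP, and once for iPXE itself, which identifies with user class `iPXE` and receives the HTTP script URL instead. The sketch below restates that decision as a shell function so it is easy to reason about. `bootfile_for` is a hypothetical name, and the server address is the example value used throughout this README.

```bash
# Mirror of the dhcpd.conf boot file selection (illustrative only).
# bootfile_for is a hypothetical helper, not part of this repository.
#   $1: DHCP user class ("iPXE" once the iPXE bootloader is running, else "")
#   $2: DHCP option 93 architecture type, as a decimal number
bootfile_for() {
  local user_class="$1"
  local arch="$2"
  if [ "$user_class" = "iPXE" ]; then
    echo "http://10.0.100.10/boot/ipxe/boot.ipxe"  # second pass: fetch script over HTTP
  elif [ "$arch" -eq 0 ]; then
    echo "undionly.kpxe"                           # BIOS
  elif [ "$arch" -eq 7 ]; then
    echo "ipxe.efi"                                # UEFI x86-64
  else
    return 1                                       # no filename is offered
  fi
}
```

Testing the user class first is what prevents a loop: without it, iPXE would be handed the iPXE binary again on its own DHCP request. A client whose architecture type matches none of the branches receives no filename, which typically surfaces as `PXE-E53`.
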
### iPXE Boot Script

The main boot script is `ipxe/boot.ipxe`. It provides:

1. **MAC-based automatic selection**:
   ```ipxe
   iseq ${mac} 52:54:00:12:34:56 && set profile control-plane && goto boot ||
   ```

2. **Interactive menu** (if no MAC match):
   ```ipxe
   :menu
   menu Centra Cloud - Bare-Metal Provisioning
   item control-plane  1. Control Plane Node (All Services)
   item worker         2. Worker Node (Compute Services)
   item all-in-one     3. All-in-One Node (Testing/Homelab)
   ```

3. **Kernel parameters**:
   ```ipxe
   set kernel-params centra.profile=${profile}
   set kernel-params ${kernel-params} centra.hostname=${hostname}
   set kernel-params ${kernel-params} console=tty0 console=ttyS0,115200n8
   ```

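The kernel parameters set by `boot.ipxe` end up on the booted system's kernel command line, where a first-boot script can read them back. How the Centra Cloud installer actually consumes them is not described in this README, so the following is only a sketch under that assumption; `cmdline_param` is a hypothetical helper.

```bash
# Extract the value of a key=value kernel parameter.
# cmdline_param is a hypothetical helper. The second argument is optional
# and defaults to the running kernel's command line.
cmdline_param() {
  local key="$1"
  local cmdline="${2:-$(cat /proc/cmdline)}"
  local word
  # Unquoted on purpose: split the command line on whitespace.
  for word in $cmdline; do
    case "$word" in
      "$key"=*) printf '%s\n' "${word#*=}"; return 0 ;;
    esac
  done
  return 1
}

# Example on a booted node:
# profile="$(cmdline_param centra.profile)" || profile="unknown"
```

The function returns the first match, so for a parameter that appears more than once (such as `console=`) only the first value is reported.
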
### Adding New Nodes

To add a new node to the infrastructure:

1. **Get the MAC address** from the server (check BIOS or network card label)

2. **Add to MAC mappings** (`ipxe/mac-mappings.txt`):
   ```
   52:54:00:12:34:5d  worker  worker-04
   ```

3. **Update boot script** (`ipxe/boot.ipxe`):
   ```ipxe
   iseq ${mac} 52:54:00:12:34:5d && set profile worker && set hostname worker-04 && goto boot ||
   ```

4. **Add DHCP host entry** (`dhcp/dhcpd.conf`):
   ```conf
   host worker-04 {
     hardware ethernet 52:54:00:12:34:5d;
     fixed-address 10.0.100.64;
     option host-name "worker-04";
   }
   ```

5. **Restart DHCP service**:
   ```bash
   # dhcpd4 is the NixOS unit name; on Debian/Ubuntu use isc-dhcp-server
   sudo systemctl restart dhcpd4
   ```

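Steps 3 and 4 above repeat the same MAC address and hostname in two files, which invites typos. A small generator can print both snippets from one set of arguments. This is a sketch only: `node_entries` is a hypothetical helper that writes to stdout and does not edit `boot.ipxe` or `dhcpd.conf`.

```bash
# Print the boot.ipxe line and the dhcpd.conf host block for one node.
# node_entries is a hypothetical helper; it only prints text.
#   $1: MAC address   $2: profile   $3: hostname   $4: fixed IP address
node_entries() {
  local mac="$1" profile="$2" hostname="$3" ip="$4"
  # Single quotes keep ${mac} literal so iPXE, not the shell, expands it.
  printf 'iseq ${mac} %s && set profile %s && set hostname %s && goto boot ||\n' \
    "$mac" "$profile" "$hostname"
  printf 'host %s {\n' "$hostname"
  printf '  hardware ethernet %s;\n' "$mac"
  printf '  fixed-address %s;\n' "$ip"
  printf '  option host-name "%s";\n' "$hostname"
  printf '}\n'
}

# Example:
# node_entries 52:54:00:12:34:5d worker worker-04 10.0.100.64
```

The first output line belongs in `ipxe/boot.ipxe`, and the remaining lines form the host block for `dhcp/dhcpd.conf`.
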
## Boot Profiles

### 1. Control Plane Profile

**Purpose**: Nodes that run core infrastructure services

**Services included**:
- FlareDB (PD, Store, TiKV-compatible database)
- IAM (Identity and Access Management)
- PlasmaVMC (Virtual Machine Controller)
- K8sHost (Kubernetes node agent)
- FlashDNS (High-performance DNS)
- ChainFire (Firewall/networking)
- Object Storage (S3-compatible)
- Monitoring (Prometheus, Grafana)

**Resource requirements**:
- CPU: 8+ cores recommended
- RAM: 32+ GB recommended
- Disk: 500+ GB SSD

**Use case**: Production control plane nodes in a cluster

### 2. Worker Profile

**Purpose**: Nodes that run customer workloads

**Services included**:
- K8sHost (Kubernetes node agent) - primary service
- PlasmaVMC (Virtual Machine Controller) - VM workloads
- ChainFire (Network policy enforcement)
- FlashDNS (Local DNS caching)
- Basic monitoring agents

**Resource requirements**:
- CPU: 16+ cores recommended
- RAM: 64+ GB recommended
- Disk: 1+ TB SSD

**Use case**: Worker nodes for running customer applications

### 3. All-in-One Profile

**Purpose**: Single-node deployment for testing and development

**Services included**:
- Complete Centra Cloud stack on one node
- All services from control-plane profile
- Suitable for testing, development, homelab

**Resource requirements**:
- CPU: 16+ cores recommended
- RAM: 64+ GB recommended
- Disk: 1+ TB SSD

**Use case**: Development, testing, homelab deployments

**Warning**: Not recommended for production use (no HA, resource intensive)

## Network Requirements

### Network Topology

The PXE server must be on the same network segment as the bare-metal servers, or you must configure DHCP relay.

**Same Segment** (recommended for initial setup):
```
┌──────────────┐         ┌──────────────────┐
│  PXE Server  │         │  Bare-Metal Srv  │
│  10.0.100.10 │◄────────┤  (DHCP client)   │
└──────────────┘  L2 SW  └──────────────────┘
```

**Different Segments** (requires DHCP relay):
```
┌──────────────┐         ┌──────────┐         ┌──────────────────┐
│  PXE Server  │         │  Router  │         │  Bare-Metal Srv  │
│  10.0.100.10 │◄────────┤  (relay) │◄────────┤  (DHCP client)   │
└──────────────┘         └──────────┘         └──────────────────┘
   Segment A              ip helper               Segment B
```

### DHCP Relay Configuration

If your PXE server is on a different network segment:

**Cisco IOS**:
```
interface vlan 100
 ip helper-address 10.0.100.10
```

**Linux (dhcp-helper)**:
```bash
apt-get install dhcp-helper
# In /etc/default/dhcp-helper, set:
#   DHCPHELPER_OPTS="-s 10.0.100.10"
systemctl restart dhcp-helper
```

**Linux (dhcrelay)**:
```bash
apt-get install isc-dhcp-relay
dhcrelay -i eth0 -i eth1 10.0.100.10
```

### Firewall Rules

The following ports must be open on the PXE server:

| Port | Protocol | Service | Direction | Description |
|------|----------|---------|-----------|-------------|
| 67 | UDP | DHCP | Inbound | DHCP server |
| 68 | UDP | DHCP | Outbound | Replies to DHCP clients (client port) |
| 69 | UDP | TFTP | Inbound | TFTP bootloader downloads |
| 80 | TCP | HTTP | Inbound | iPXE scripts and boot images |
| 443 | TCP | HTTPS | Inbound | Optional: boot images over HTTPS |

### Network Bandwidth

Estimated bandwidth requirements:

- Per-node boot: ~500 MB download (kernel + initrd)
- Concurrent boots: Multiply by number of simultaneous boots
- Recommended: 1 Gbps link for PXE server

Example: Booting 10 nodes simultaneously transfers ~5 GB (about 40 Gbit), which takes roughly 40 seconds on a saturated 1 Gbps link. If that is too slow, stagger boots or use a 10 Gbps link.

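The bandwidth estimate is a unit conversion: total megabytes times 8 gives megabits, and dividing by the link speed in Mbit/s gives seconds. The sketch below makes that explicit. `boot_transfer_seconds` is a hypothetical helper; it assumes the server's link is the only bottleneck and ignores protocol overhead, so real transfers will take somewhat longer.

```bash
# Seconds needed to serve boot images when the server link is the bottleneck.
# boot_transfer_seconds is a hypothetical helper using integer arithmetic:
#   nodes * image size (MB) * 8 bits per byte / link speed (Mbit/s)
boot_transfer_seconds() {
  local nodes="$1" image_mb="$2" link_mbps="$3"
  echo $(( nodes * image_mb * 8 / link_mbps ))
}

boot_transfer_seconds 10 500 1000    # 10 nodes, 500 MB each, 1 Gbps link
boot_transfer_seconds 10 500 10000   # the same boots on a 10 Gbps link
```

With the figures from this section, the two calls print 40 and 4, so a 1 Gbps link is usable for small batches as long as a boot delay of under a minute is acceptable.
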
## Troubleshooting

### DHCP Issues

**Problem**: Server doesn't get IP address

**Diagnosis**:
```bash
# On PXE server, monitor DHCP requests
sudo tcpdump -i eth0 -n port 67 or port 68

# Check DHCP server logs
sudo journalctl -u dhcpd4 -f

# Verify DHCP server is running
sudo systemctl status dhcpd4
```

**Common causes**:
- DHCP server not running on correct interface
- Firewall blocking UDP 67/68
- Network cable/switch issue
- DHCP range exhausted

**Solution**:
```bash
# Check interface configuration
ip addr show

# Verify DHCP config syntax
sudo dhcpd -t -cf /etc/dhcp/dhcpd.conf

# Check firewall
sudo iptables -L -n | grep -E "67|68"

# Restart DHCP server
sudo systemctl restart dhcpd4
```

### TFTP Issues

**Problem**: PXE client gets IP but fails to download bootloader

**Diagnosis**:
```bash
# Monitor TFTP requests
sudo tcpdump -i eth0 -n port 69

# Check TFTP server logs
sudo journalctl -u atftpd -f

# Test TFTP locally
tftp localhost -c get undionly.kpxe /tmp/test.kpxe
```

**Common causes**:
- TFTP server not running
- Bootloader files missing
- Permissions incorrect
- Firewall blocking UDP 69

**Solution**:
```bash
# Check files exist
ls -la /var/lib/tftpboot/

# Fix permissions
sudo chmod 644 /var/lib/tftpboot/*.{kpxe,efi}

# Restart TFTP server
sudo systemctl restart atftpd

# Check firewall
sudo iptables -L -n | grep 69
```

### HTTP Issues

**Problem**: iPXE loads but can't download boot script or kernel

**Diagnosis**:
```bash
# Monitor HTTP requests
sudo tail -f /var/log/nginx/access.log

# Test HTTP locally
curl -v http://localhost/boot/ipxe/boot.ipxe
curl -v http://localhost/health

# Check nginx status
sudo systemctl status nginx
```

**Common causes**:
- Nginx not running
- Boot files missing
- Permissions incorrect
- Firewall blocking TCP 80
- Wrong server IP in boot.ipxe

**Solution**:
```bash
# Check nginx config
sudo nginx -t

# Verify files exist
ls -la /var/lib/pxe-boot/ipxe/
ls -la /var/lib/pxe-boot/nixos/

# Fix permissions
sudo chown -R nginx:nginx /var/lib/pxe-boot
sudo chmod -R 755 /var/lib/pxe-boot

# Restart nginx
sudo systemctl restart nginx
```

### Boot Script Issues

**Problem**: Boot menu appears but fails to load kernel

**Diagnosis**:
- Check iPXE error messages on console
- Verify URLs in boot.ipxe match actual paths
- Test kernel download manually:
  ```bash
  curl -I http://10.0.100.10/boot/nixos/bzImage
  ```

**Common causes**:
- NixOS boot images not deployed yet (normal for T032.S2)
- Wrong paths in boot.ipxe
- Files too large (check disk space)

**Solution**:
```bash
# Wait for T032.S3 (Image Builder) to generate boot images
# OR manually place NixOS netboot images:
sudo mkdir -p /var/lib/pxe-boot/nixos
# Copy bzImage and initrd from NixOS netboot
```

### Serial Console Debugging

For remote debugging without physical access:

1. **Enable serial console in BIOS**:
   - Configure COM1/ttyS0 at 115200 baud
   - Enable console redirection

2. **Connect via IPMI SOL** (if available):
   ```bash
   ipmitool -I lanplus -H <bmc-ip> -U admin sol activate
   ```

3. **Watch boot process**:
   - DHCP discovery messages
   - TFTP download progress
   - iPXE boot menu
   - Kernel boot messages

4. **Kernel parameters include serial console**:
   ```
   console=tty0 console=ttyS0,115200n8
   ```

### Common Error Messages

| Error | Cause | Solution |
|-------|-------|----------|
| `PXE-E51: No DHCP or proxyDHCP offers were received` | DHCP server not responding | Check DHCP server running, network connectivity |
| `PXE-E53: No boot filename received` | DHCP not providing filename | Check dhcpd.conf has `filename` option |
| `PXE-E32: TFTP open timeout` | TFTP server not responding | Check TFTP server running, firewall rules |
| `Not found: /boot/ipxe/boot.ipxe` | HTTP 404 error | Check file exists, nginx config, permissions |
| `Could not boot: Exec format error` | Corrupted boot file | Re-download/rebuild bootloader |

## Advanced Topics

### Building iPXE from Source

For production deployments, building iPXE from source provides:
- Custom branding
- Embedded certificates for HTTPS
- Optimized size
- Security hardening

**Build instructions**:
```bash
sudo ./setup.sh --build-ipxe
```

Or manually:
```bash
git clone https://github.com/ipxe/ipxe.git
cd ipxe/src

# BIOS bootloader
make bin/undionly.kpxe

# UEFI bootloader
make bin-x86_64-efi/ipxe.efi

# Copy to PXE server
sudo cp bin/undionly.kpxe /var/lib/pxe-boot/ipxe/
sudo cp bin-x86_64-efi/ipxe.efi /var/lib/pxe-boot/ipxe/
```

### HTTPS Boot (Secure Boot)

For enhanced security, serve boot images over HTTPS. This protects the download in transit; it is separate from UEFI Secure Boot, which verifies bootloader signatures in firmware.

1. **Generate SSL certificate**:
   ```bash
   sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
     -keyout /etc/ssl/private/pxe-server.key \
     -out /etc/ssl/certs/pxe-server.crt
   ```

2. **Configure nginx for HTTPS** (uncomment HTTPS block in `http/nginx.conf`)

3. **Update boot.ipxe** to use `https://` URLs

4. **Rebuild iPXE with embedded certificate** (so that iPXE trusts the self-signed server certificate)

### Multiple NixOS Versions

To support multiple NixOS versions for testing/rollback:

```
/var/lib/pxe-boot/nixos/
├── 24.05/
│   ├── bzImage
│   └── initrd
├── 24.11/
│   ├── bzImage
│   └── initrd
└── latest -> 24.11/      # Symlink to current version
```

Update `boot.ipxe` to use `/boot/nixos/latest/bzImage` or add menu items for version selection.

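With the layout above, rolling forward or back amounts to repointing the `latest` symlink. A minimal sketch, assuming the version directories already contain `bzImage` and `initrd`; `set_latest_version` is a hypothetical helper and is not provided by `setup.sh`.

```bash
# Point the `latest` symlink at an existing version directory.
# set_latest_version is a hypothetical helper. The optional second argument
# overrides the image root, which defaults to the path used in this README.
set_latest_version() {
  local version="$1"
  local root="${2:-/var/lib/pxe-boot/nixos}"
  if [ ! -d "$root/$version" ]; then
    echo "no such version directory: $root/$version" >&2
    return 1
  fi
  # -n replaces the symlink itself instead of descending into its target.
  ln -sfn "$version" "$root/latest"
}

# Example (run as root): roll back to the previous release.
# set_latest_version 24.05
```

The link target is relative, so the tree stays valid if `/var/lib/pxe-boot` is moved or mounted elsewhere. Nodes that have already downloaded their kernel are unaffected; only subsequent boots pick up the change.
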
### Integration with BMC/IPMI

For fully automated provisioning:

1. **Discover new hardware** via IPMI/Redfish API
2. **Configure PXE boot** via IPMI:
   ```bash
   ipmitool -I lanplus -H <bmc-ip> -U admin chassis bootdev pxe options=persistent
   ```
3. **Power on server**:
   ```bash
   ipmitool -I lanplus -H <bmc-ip> -U admin power on
   ```
4. **Monitor via SOL** (serial-over-LAN)

### Monitoring and Metrics

Track PXE boot activity:

1. **DHCP leases**:
   ```bash
   cat /var/lib/dhcp/dhcpd.leases
   ```

2. **HTTP access logs**:
   ```bash
   sudo tail -f /var/log/nginx/access.log | grep -E "boot.ipxe|bzImage|initrd"
   ```

3. **Prometheus metrics** (if nginx-module-vts installed):
   - Boot file download counts
   - Bandwidth usage
   - Response times

4. **Custom metrics endpoint**:
   - Parse nginx access logs
   - Count boots per profile
   - Alert on failed boots

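As a starting point for the log-parsing idea above, the sketch below counts successful kernel downloads per client from nginx access-log lines. It assumes nginx's default `combined` log format; if `http/nginx.conf` defines a custom `log_format`, the field numbers will differ. `count_kernel_fetches` is a hypothetical helper. The request path does not carry the profile, so this counts boots per client address rather than per profile.

```bash
# Count successful kernel downloads per client IP from access-log lines
# read on stdin. Assumes the default "combined" format, where field 1 is
# the client address, field 7 the request path and field 9 the status code.
# count_kernel_fetches is a hypothetical helper.
count_kernel_fetches() {
  awk '$7 ~ /\/bzImage$/ && $9 == 200 { n[$1]++ }
       END { for (ip in n) print ip, n[ip] }' | sort
}

# Example:
# count_kernel_fetches < /var/log/nginx/access.log
```

Each output line is a client address followed by its number of completed kernel fetches. Combined with the fixed addresses in `dhcpd.conf`, that is enough to map counts back to hostnames.
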
## Files and Directory Structure

```
baremetal/pxe-server/
├── README.md                    # This file
├── setup.sh                     # Setup and management script
├── nixos-module.nix             # NixOS service module
│
├── dhcp/
│   └── dhcpd.conf               # DHCP server configuration
│
├── ipxe/
│   ├── boot.ipxe                # Main boot menu script
│   └── mac-mappings.txt         # MAC address documentation
│
├── http/
│   ├── nginx.conf               # HTTP server configuration
│   └── directory-structure.txt  # Directory layout documentation
│
└── assets/                      # (Created at runtime)
    └── /var/lib/pxe-boot/
        ├── ipxe/
        │   ├── undionly.kpxe
        │   ├── ipxe.efi
        │   └── boot.ipxe
        └── nixos/
            ├── bzImage
            └── initrd
```

## Next Steps

After completing the PXE server setup:

1. **T032.S3 - Image Builder**: Automated NixOS image generation with profile-specific configurations

2. **T032.S4 - Provisioning Orchestrator**: API-driven provisioning workflow and node lifecycle management

3. **Integration with IAM**: Authentication for provisioning API

4. **Integration with FlareDB**: Node inventory and state management

## References

- [iPXE Documentation](https://ipxe.org/)
- [ISC DHCP Documentation](https://www.isc.org/dhcp/)
- [NixOS Manual - Netboot](https://nixos.org/manual/nixos/stable/index.html#sec-building-netboot)
- [PXE Specification](https://www.intel.com/content/www/us/en/architecture-and-technology/intel-boot-executive.html)

## Support

For issues or questions:
- Check [Troubleshooting](#troubleshooting) section
- Review logs: `sudo journalctl -u dhcpd4 -u atftpd -u nginx -f`
- Run diagnostic: `sudo ./setup.sh --test`

## License

Part of Centra Cloud infrastructure - see project root for license information.