Centra Cloud PXE Boot Server

This directory contains the PXE (Preboot eXecution Environment) boot infrastructure for bare-metal provisioning of Centra Cloud nodes. It enables network-based installation of NixOS on physical servers with automated profile selection.

Architecture Overview

The PXE boot infrastructure consists of three network services (DHCP, TFTP, and HTTP), driven by iPXE boot scripts and packaged as a NixOS module:

┌─────────────────────────────────────────────────────────────────┐
│                        PXE Boot Flow                             │
└─────────────────────────────────────────────────────────────────┘

  Bare-Metal Server                  PXE Boot Server
  ─────────────────                  ───────────────

  1. Power on
     │
     ├─► DHCP Request ──────────────► DHCP Server
     │                                (ISC DHCP)
     │                                │
     │                                ├─ Assigns IP
     │                                ├─ Detects BIOS/UEFI
     │                                └─ Provides bootloader path
     │
     ├◄─ DHCP Response ───────────────┤
     │   (IP, next-server, filename)
     │
     ├─► TFTP Get bootloader ─────────► TFTP Server
     │   (undionly.kpxe or ipxe.efi)   (atftpd)
     │
     ├◄─ Bootloader file ─────────────┤
     │
     ├─► Execute iPXE bootloader
     │   │
     │   ├─► HTTP Get boot.ipxe ──────► HTTP Server
     │   │                              (nginx)
     │   │
     │   ├◄─ boot.ipxe script ─────────┤
     │   │
     │   ├─► Display menu / Auto-select profile
     │   │
     │   ├─► HTTP Get kernel ──────────► HTTP Server
     │   │
     │   ├◄─ bzImage ───────────────────┤
     │   │
     │   ├─► HTTP Get initrd ───────────► HTTP Server
     │   │
     │   ├◄─ initrd ────────────────────┤
     │   │
     │   └─► Boot NixOS
     │
     └─► NixOS Installer
         └─ Provisions node based on profile

Components

1. DHCP Server (ISC DHCP)

  • Purpose: Assigns IP addresses and directs PXE clients to bootloader
  • Config: dhcp/dhcpd.conf
  • Features:
    • BIOS/UEFI detection via option 93 (architecture type)
    • Per-host configuration for fixed IP assignment
    • Automatic next-server and filename configuration

2. TFTP Server (atftpd)

  • Purpose: Serves iPXE bootloader files to PXE clients
  • Files served:
    • undionly.kpxe - BIOS bootloader
    • ipxe.efi - UEFI x86-64 bootloader
    • ipxe-i386.efi - UEFI x86 32-bit bootloader (optional)

3. HTTP Server (nginx)

  • Purpose: Serves iPXE scripts and NixOS boot images
  • Config: http/nginx.conf
  • Endpoints:
    • /boot/ipxe/boot.ipxe - Main boot menu script
    • /boot/nixos/bzImage - NixOS kernel
    • /boot/nixos/initrd - NixOS initial ramdisk
    • /health - Health check endpoint

4. iPXE Boot Scripts

  • Main script: ipxe/boot.ipxe
  • Features:
    • Interactive boot menu with 3 profiles
    • MAC-based automatic profile selection
    • Serial console support for remote management
    • Detailed error messages and debugging options

5. NixOS Service Module

  • File: nixos-module.nix
  • Purpose: Declarative NixOS configuration for all services
  • Features:
    • Single configuration file for entire stack
    • Firewall rules auto-configured
    • Systemd service dependencies managed
    • Directory structure auto-created

Quick Start

Prerequisites

  • NixOS server with network connectivity
  • Network interface on the same subnet as bare-metal servers
  • Sufficient disk space (5-10 GB for boot images)

Installation Steps

  1. Clone this repository (or copy baremetal/pxe-server/ to your NixOS system)

  2. Run the setup script:

    sudo ./setup.sh --install --download --validate
    

    This will:

    • Create directory structure at /var/lib/pxe-boot
    • Download iPXE bootloaders from boot.ipxe.org
    • Install boot scripts
    • Validate configurations
  3. Configure network settings:

    Edit nixos-module.nix or create a NixOS configuration:

    # /etc/nixos/configuration.nix
    
    imports = [
      /path/to/baremetal/pxe-server/nixos-module.nix
    ];
    
    services.centra-pxe-server = {
      enable = true;
      interface = "eth0";  # Your network interface
      serverAddress = "10.0.100.10";  # PXE server IP
    
      dhcp = {
        subnet = "10.0.100.0";
        netmask = "255.255.255.0";
        broadcast = "10.0.100.255";
        range = {
          start = "10.0.100.100";
          end = "10.0.100.200";
        };
        router = "10.0.100.1";
      };
    
      # Optional: Define known nodes with MAC addresses
      nodes = {
        "52:54:00:12:34:56" = {
          profile = "control-plane";
          hostname = "control-plane-01";
          ipAddress = "10.0.100.50";
        };
      };
    };
    
  4. Deploy NixOS configuration:

    sudo nixos-rebuild switch
    
  5. Verify services are running:

    sudo ./setup.sh --test
    
  6. Add NixOS boot images (will be provided by T032.S3):

    # Placeholder - actual images will be built by image builder
    # For testing, you can use any NixOS netboot image
    sudo mkdir -p /var/lib/pxe-boot/nixos
    # Copy bzImage and initrd to /var/lib/pxe-boot/nixos/
    
  7. Boot a bare-metal server:

    • Configure server BIOS to boot from network (PXE)
    • Connect to same network segment
    • Power on server
    • Watch for DHCP discovery and iPXE boot menu

Detailed Setup

Option 1: NixOS Module

The NixOS module provides a declarative way to configure the entire PXE server stack.

Advantages:

  • Single configuration file
  • Automatic service dependencies
  • Rollback capability
  • Integration with NixOS firewall

Configuration Example:

See the NixOS configuration example in Quick Start.

Option 2: Manual Installation

For non-NixOS systems or manual setup:

  1. Install required packages:

    # Debian/Ubuntu
    apt-get install isc-dhcp-server atftpd nginx curl
    
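    # RHEL/CentOS (the DHCP package is named dhcp-server on RHEL/CentOS 8+;
    # tftp-server provides in.tftpd rather than atftpd, so adjust the
    # service names in the later steps accordingly)
    yum install dhcp tftp-server nginx curl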
    
  2. Run setup script:

    sudo ./setup.sh --install --download
    
  3. Copy configuration files:

    # DHCP configuration
    sudo cp dhcp/dhcpd.conf /etc/dhcp/dhcpd.conf
    
    # Edit to match your network
    sudo vim /etc/dhcp/dhcpd.conf
    
    # Nginx configuration
    sudo cp http/nginx.conf /etc/nginx/sites-available/pxe-boot
    sudo ln -s /etc/nginx/sites-available/pxe-boot /etc/nginx/sites-enabled/
    
  4. Start services:

    sudo systemctl enable --now isc-dhcp-server
    sudo systemctl enable --now atftpd
    sudo systemctl enable --now nginx
    
  5. Configure firewall:

    # UFW (Ubuntu)
    sudo ufw allow 67/udp    # DHCP
    sudo ufw allow 68/udp    # DHCP
    sudo ufw allow 69/udp    # TFTP
    sudo ufw allow 80/tcp    # HTTP
    
    # firewalld (RHEL)
    sudo firewall-cmd --permanent --add-service=dhcp
    sudo firewall-cmd --permanent --add-service=tftp
    sudo firewall-cmd --permanent --add-service=http
    sudo firewall-cmd --reload
    

Configuration

DHCP Configuration

The DHCP server configuration is in dhcp/dhcpd.conf. Key sections:

Network Settings:

subnet 10.0.100.0 netmask 255.255.255.0 {
    range 10.0.100.100 10.0.100.200;
    option routers 10.0.100.1;
    option domain-name-servers 10.0.100.1, 8.8.8.8;
    next-server 10.0.100.10;  # PXE server IP
    # ...
}

Boot File Selection (automatic BIOS/UEFI detection):

if exists user-class and option user-class = "iPXE" {
    filename "http://10.0.100.10/boot/ipxe/boot.ipxe";
} elsif option architecture-type = 00:00 {
    filename "undionly.kpxe";  # BIOS
} elsif option architecture-type = 00:07 {
    filename "ipxe.efi";  # UEFI x86-64
}
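
To reason about which filename a given client would be offered, the same decision can be mirrored in a few lines of shell. This is only an illustrative sketch of the logic above, not something dhcpd itself runs; the names select_boot_file and PXE_SERVER are made up for this example.

```shell
# Mirror of the dhcpd.conf decision: which "filename" a client would be offered.
# $1 = DHCP user-class sent by the client ("" if none)
# $2 = option 93 (architecture type) value, written as in dhcpd.conf
select_boot_file() {
    user_class=$1
    arch=$2
    server=${PXE_SERVER:-10.0.100.10}   # PXE server IP used in the examples

    if [ "$user_class" = "iPXE" ]; then
        # Second DHCP pass: iPXE is already running, so hand it the boot script
        echo "http://${server}/boot/ipxe/boot.ipxe"
        return 0
    fi

    case "$arch" in
        00:00) echo "undionly.kpxe" ;;   # BIOS
        00:07) echo "ipxe.efi" ;;        # UEFI x86-64
        *)     return 1 ;;               # no branch matches, no filename offered
    esac
}
```

For example, `select_boot_file "" 00:07` prints `ipxe.efi`, while an architecture value that matches no branch returns a non-zero status, which corresponds to the client receiving no boot filename.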

Host-Specific Configuration:

host control-plane-01 {
    hardware ethernet 52:54:00:12:34:56;
    fixed-address 10.0.100.50;
    option host-name "control-plane-01";
}

iPXE Boot Script

The main boot script is ipxe/boot.ipxe. It provides:

  1. MAC-based automatic selection:

    iseq ${mac} 52:54:00:12:34:56 && set profile control-plane && goto boot ||
    
  2. Interactive menu (if no MAC match):

    :menu
    menu Centra Cloud - Bare-Metal Provisioning
    item control-plane    1. Control Plane Node (All Services)
    item worker           2. Worker Node (Compute Services)
    item all-in-one       3. All-in-One Node (Testing/Homelab)
    
  3. Kernel parameters:

    set kernel-params centra.profile=${profile}
    set kernel-params ${kernel-params} centra.hostname=${hostname}
    set kernel-params ${kernel-params} console=tty0 console=ttyS0,115200n8
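
How the installer consumes these parameters is not shown in this README. Purely as an illustration, a first-boot script could read them back from the kernel command line roughly as follows; cmdline_value is a hypothetical helper, not part of this repository.

```shell
# Print the value of KEY from a kernel command line.
# $1 = key (e.g. centra.profile)
# $2 = command line to search (defaults to the running kernel's /proc/cmdline)
# Relies on shell word splitting, so values containing spaces are not supported.
cmdline_value() {
    key=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    for arg in $cmdline; do
        case "$arg" in
            "$key"=*)
                # Strip everything up to and including the first "="
                echo "${arg#*=}"
                return 0
                ;;
        esac
    done
    return 1   # key not present
}
```

With the parameters above, `cmdline_value centra.profile` would print the selected profile. When a key appears more than once (as `console` does), the first occurrence is returned.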
    

Adding New Nodes

To add a new node to the infrastructure:

  1. Get the MAC address from the server (check BIOS or network card label)

  2. Add to MAC mappings (ipxe/mac-mappings.txt):

    52:54:00:12:34:5d    worker    worker-04
    
  3. Update boot script (ipxe/boot.ipxe):

    iseq ${mac} 52:54:00:12:34:5d && set profile worker && set hostname worker-04 && goto boot ||
    
  4. Add DHCP host entry (dhcp/dhcpd.conf):

    host worker-04 {
        hardware ethernet 52:54:00:12:34:5d;
        fixed-address 10.0.100.64;
        option host-name "worker-04";
    }
    
  5. Restart DHCP service (the unit is dhcpd4 on NixOS; on Debian/Ubuntu it is isc-dhcp-server):

    sudo systemctl restart dhcpd4
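
Steps 2 to 4 repeat the same values, so the three snippets can be generated together. The sketch below is a hypothetical helper (render_node is not part of setup.sh) that prints the entries for review; it does not edit any files.

```shell
# Print the snippets needed to register a node.
# Usage: render_node MAC PROFILE HOSTNAME IP
render_node() {
    mac=$1 profile=$2 host=$3 ip=$4

    # Reject anything that is not six colon-separated hex octets
    echo "$mac" | grep -Eq '^([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$' || {
        echo "invalid MAC address: $mac" >&2
        return 1
    }

    echo "# ipxe/mac-mappings.txt"
    printf '%s    %s    %s\n' "$mac" "$profile" "$host"
    echo
    echo "# ipxe/boot.ipxe"
    # Single quotes keep ${mac} literal so that iPXE, not the shell, expands it
    printf 'iseq ${mac} %s && set profile %s && set hostname %s && goto boot ||\n' \
        "$mac" "$profile" "$host"
    echo
    echo "# dhcp/dhcpd.conf"
    printf 'host %s {\n    hardware ethernet %s;\n    fixed-address %s;\n    option host-name "%s";\n}\n' \
        "$host" "$mac" "$ip" "$host"
}
```

Running `render_node 52:54:00:12:34:5d worker worker-04 10.0.100.64` reproduces the worker-04 entries shown in the steps above.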
    

Boot Profiles

1. Control Plane Profile

Purpose: Nodes that run core infrastructure services

Services included:

  • FlareDB (PD, Store, TiKV-compatible database)
  • IAM (Identity and Access Management)
  • PlasmaVMC (Virtual Machine Controller)
  • K8sHost (Kubernetes node agent)
  • FlashDNS (High-performance DNS)
  • ChainFire (Firewall/networking)
  • Object Storage (S3-compatible)
  • Monitoring (Prometheus, Grafana)

Resource requirements:

  • CPU: 8+ cores recommended
  • RAM: 32+ GB recommended
  • Disk: 500+ GB SSD

Use case: Production control plane nodes in a cluster

2. Worker Profile

Purpose: Nodes that run customer workloads

Services included:

  • K8sHost (Kubernetes node agent) - primary service
  • PlasmaVMC (Virtual Machine Controller) - VM workloads
  • ChainFire (Network policy enforcement)
  • FlashDNS (Local DNS caching)
  • Basic monitoring agents

Resource requirements:

  • CPU: 16+ cores recommended
  • RAM: 64+ GB recommended
  • Disk: 1+ TB SSD

Use case: Worker nodes for running customer applications

3. All-in-One Profile

Purpose: Single-node deployment for testing and development

Services included:

  • Complete Centra Cloud stack on one node
  • All services from control-plane profile
  • Suitable for testing, development, homelab

Resource requirements:

  • CPU: 16+ cores recommended
  • RAM: 64+ GB recommended
  • Disk: 1+ TB SSD

Use case: Development, testing, homelab deployments

Warning: Not recommended for production use (no HA, resource intensive)

Network Requirements

Network Topology

The PXE server must be on the same network segment as the bare-metal servers, or you must configure DHCP relay.

Same Segment (recommended for initial setup):

┌──────────────┐         ┌──────────────────┐
│  PXE Server  │         │  Bare-Metal Srv  │
│ 10.0.100.10  │◄────────┤  (DHCP client)   │
└──────────────┘  L2 SW  └──────────────────┘

Different Segments (requires DHCP relay):

┌──────────────┐         ┌──────────┐         ┌──────────────────┐
│  PXE Server  │         │  Router  │         │  Bare-Metal Srv  │
│ 10.0.100.10  │◄────────┤  (relay) │◄────────┤  (DHCP client)   │
└──────────────┘         └──────────┘         └──────────────────┘
   Segment A              ip helper           Segment B

DHCP Relay Configuration

If your PXE server is on a different network segment:

Cisco IOS:

interface vlan 100
  ip helper-address 10.0.100.10

Linux (dhcp-helper):

apt-get install dhcp-helper
# Edit /etc/default/dhcp-helper
DHCPHELPER_OPTS="-s 10.0.100.10"
systemctl restart dhcp-helper

Linux (dhcrelay):

apt-get install isc-dhcp-relay
dhcrelay -i eth0 -i eth1 10.0.100.10

Firewall Rules

The following ports must be open on the PXE server:

Port  Protocol  Service  Direction  Description
----  --------  -------  ---------  ----------------------------
67    UDP       DHCP     Inbound    DHCP server
68    UDP       DHCP     Outbound   DHCP client responses
69    UDP       TFTP     Inbound    TFTP bootloader downloads
80    TCP       HTTP     Inbound    iPXE scripts and boot images
443   TCP       HTTPS    Inbound    Optional: secure boot images

Network Bandwidth

Estimated bandwidth requirements:

  • Per-node boot: ~500 MB download (kernel + initrd)
  • Concurrent boots: Multiply by number of simultaneous boots
  • Recommended: 1 Gbps link for PXE server

Example: Booting 10 nodes simultaneously transfers ~5 GB (~40 Gbit) in total. A saturated 1 Gbps link needs at least ~40 seconds for that, so stagger boots or use a 10 Gbps link if boot time matters.
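
As a rough check on these numbers, the transfer time can be worked out with integer shell arithmetic. This sketch assumes decimal units (1 MB = 8 Mbit, 1 Gbps = 1000 Mbps), a fully saturated link, and no protocol overhead, so the results are lower bounds; boot_transfer_seconds is an illustrative name.

```shell
# Best-case seconds needed to serve boot images to several nodes at once.
# Usage: boot_transfer_seconds NODES MB_PER_NODE LINK_MBPS
boot_transfer_seconds() {
    nodes=$1 mb_per_node=$2 link_mbps=$3
    total_mbit=$((nodes * mb_per_node * 8))
    # Integer division, rounded up to whole seconds
    echo $(( (total_mbit + link_mbps - 1) / link_mbps ))
}

boot_transfer_seconds 10 500 1000     # 1 Gbps link:  prints 40
boot_transfer_seconds 10 500 10000    # 10 Gbps link: prints 4
```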

Troubleshooting

DHCP Issues

Problem: Server doesn't get IP address

Diagnosis:

# On PXE server, monitor DHCP requests
sudo tcpdump -i eth0 -n port 67 or port 68

# Check DHCP server logs
sudo journalctl -u dhcpd4 -f

# Verify DHCP server is running
sudo systemctl status dhcpd4

Common causes:

  • DHCP server not running on correct interface
  • Firewall blocking UDP 67/68
  • Network cable/switch issue
  • DHCP range exhausted

Solution:

# Check interface configuration
ip addr show

# Verify DHCP config syntax
sudo dhcpd -t -cf /etc/dhcp/dhcpd.conf

# Check firewall
sudo iptables -L -n | grep -E "67|68"

# Restart DHCP server
sudo systemctl restart dhcpd4

TFTP Issues

Problem: PXE client gets IP but fails to download bootloader

Diagnosis:

# Monitor TFTP requests
sudo tcpdump -i eth0 -n port 69

# Check TFTP server logs
sudo journalctl -u atftpd -f

# Test TFTP locally
tftp localhost -c get undionly.kpxe /tmp/test.kpxe

Common causes:

  • TFTP server not running
  • Bootloader files missing
  • Permissions incorrect
  • Firewall blocking UDP 69

Solution:

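# Check files exist (adjust the path to your TFTP root; the directory layout
# in this README places bootloaders under /var/lib/pxe-boot/ipxe/)
ls -la /var/lib/tftpboot/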

# Fix permissions
sudo chmod 644 /var/lib/tftpboot/*.{kpxe,efi}

# Restart TFTP server
sudo systemctl restart atftpd

# Check firewall
sudo iptables -L -n | grep 69

HTTP Issues

Problem: iPXE loads but can't download boot script or kernel

Diagnosis:

# Monitor HTTP requests
sudo tail -f /var/log/nginx/access.log

# Test HTTP locally
curl -v http://localhost/boot/ipxe/boot.ipxe
curl -v http://localhost/health

# Check nginx status
sudo systemctl status nginx

Common causes:

  • Nginx not running
  • Boot files missing
  • Permissions incorrect
  • Firewall blocking TCP 80
  • Wrong server IP in boot.ipxe

Solution:

# Check nginx config
sudo nginx -t

# Verify files exist
ls -la /var/lib/pxe-boot/ipxe/
ls -la /var/lib/pxe-boot/nixos/

# Fix permissions (on Debian/Ubuntu the nginx user is www-data, not nginx)
sudo chown -R nginx:nginx /var/lib/pxe-boot
sudo chmod -R 755 /var/lib/pxe-boot

# Restart nginx
sudo systemctl restart nginx

Boot Script Issues

Problem: Boot menu appears but fails to load kernel

Diagnosis:

  • Check iPXE error messages on console
  • Verify URLs in boot.ipxe match actual paths
  • Test kernel download manually:
    curl -I http://10.0.100.10/boot/nixos/bzImage
    

Common causes:

  • NixOS boot images not deployed yet (normal for T032.S2)
  • Wrong paths in boot.ipxe
  • Files too large (check disk space)

Solution:

# Wait for T032.S3 (Image Builder) to generate boot images
# OR manually place NixOS netboot images:
sudo mkdir -p /var/lib/pxe-boot/nixos
# Copy bzImage and initrd from NixOS netboot

Serial Console Debugging

For remote debugging without physical access:

  1. Enable serial console in BIOS:

    • Configure COM1/ttyS0 at 115200 baud
    • Enable console redirection
  2. Connect via IPMI SOL (if available):

    ipmitool -I lanplus -H <bmc-ip> -U admin sol activate
    
  3. Watch boot process:

    • DHCP discovery messages
    • TFTP download progress
    • iPXE boot menu
    • Kernel boot messages
  4. Kernel parameters include serial console:

    console=tty0 console=ttyS0,115200n8
    

Common Error Messages

Error                                               Cause                        Solution
--------------------------------------------------  ---------------------------  -----------------------------------------------
PXE-E51: No DHCP or proxyDHCP offers were received  DHCP server not responding   Check DHCP server running, network connectivity
PXE-E53: No boot filename received                  DHCP not providing filename  Check dhcpd.conf has filename option
PXE-E32: TFTP open timeout                          TFTP server not responding   Check TFTP server running, firewall rules
Not found: /boot/ipxe/boot.ipxe                     HTTP 404 error               Check file exists, nginx config, permissions
Could not boot: Exec format error                   Corrupted boot file          Re-download/rebuild bootloader

Advanced Topics

Building iPXE from Source

For production deployments, building iPXE from source provides:

  • Custom branding
  • Embedded certificates for HTTPS
  • Optimized size
  • Security hardening

Build instructions:

sudo ./setup.sh --build-ipxe

Or manually:

git clone https://github.com/ipxe/ipxe.git
cd ipxe/src

# BIOS bootloader
make bin/undionly.kpxe

# UEFI bootloader
make bin-x86_64-efi/ipxe.efi

# Copy to PXE server
sudo cp bin/undionly.kpxe /var/lib/pxe-boot/ipxe/
sudo cp bin-x86_64-efi/ipxe.efi /var/lib/pxe-boot/ipxe/

HTTPS Boot

For enhanced security, serve boot images over HTTPS:

  1. Generate SSL certificate:

    sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
      -keyout /etc/ssl/private/pxe-server.key \
      -out /etc/ssl/certs/pxe-server.crt
    
  2. Configure nginx for HTTPS (uncomment HTTPS block in http/nginx.conf)

  3. Update boot.ipxe to use https:// URLs

  4. Rebuild iPXE with the certificate embedded as a trusted root (iPXE will not trust a self-signed certificate otherwise)

Multiple NixOS Versions

To support multiple NixOS versions for testing/rollback:

/var/lib/pxe-boot/nixos/
├── 24.05/
│   ├── bzImage
│   └── initrd
├── 24.11/
│   ├── bzImage
│   └── initrd
└── latest -> 24.11/  # Symlink to current version

Update boot.ipxe to use /boot/nixos/latest/bzImage or add menu items for version selection.
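
Switching the latest symlink can be wrapped in a small helper, sketched here. switch_nixos_version is a hypothetical name, and the check for bzImage and initrd simply follows the layout shown above. The link replacement is not atomic, so a client requesting files at that exact moment could see a brief failure.

```shell
# Point ROOT/latest at VERSION, refusing versions with missing boot files.
# Usage: switch_nixos_version VERSION [ROOT]
switch_nixos_version() {
    version=$1
    root=${2:-/var/lib/pxe-boot/nixos}

    for f in bzImage initrd; do
        [ -f "$root/$version/$f" ] || {
            echo "missing $root/$version/$f" >&2
            return 1
        }
    done

    # -n: replace the existing link itself instead of following it
    ln -sfn "$version" "$root/latest"
}
```

For a rollback, `sudo sh -c '. ./helpers.sh; switch_nixos_version 24.05'` (with the function saved in a file of your choosing) would repoint latest at the older image set.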

Integration with BMC/IPMI

For fully automated provisioning:

  1. Discover new hardware via IPMI/Redfish API
  2. Configure PXE boot via IPMI:
    ipmitool -I lanplus -H <bmc-ip> -U admin chassis bootdev pxe options=persistent
    
  3. Power on server:
    ipmitool -I lanplus -H <bmc-ip> -U admin power on
    
  4. Monitor via SOL (serial-over-LAN)

Monitoring and Metrics

Track PXE boot activity:

  1. DHCP leases:

    cat /var/lib/dhcp/dhcpd.leases
    
  2. HTTP access logs:

    sudo tail -f /var/log/nginx/access.log | grep -E "boot.ipxe|bzImage|initrd"
    
  3. Prometheus metrics (if nginx-module-vts installed):

    • Boot file download counts
    • Bandwidth usage
    • Response times
  4. Custom metrics endpoint:

    • Parse nginx access logs
    • Count boots per profile
    • Alert on failed boots
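
The log-parsing idea in item 4 could start from something like the following sketch. It assumes nginx's default combined log format and counts kernel (bzImage) downloads per client IP. The request paths used in this README do not carry the boot profile, so grouping by profile would additionally need the IP-to-profile mapping from your node definitions. The name boot_summary is illustrative.

```shell
# Summarise kernel downloads per client from an nginx access log on stdin.
# Assumes the default "combined" format, where whitespace-separated
# field 1 = client IP, field 7 = request path, field 9 = HTTP status.
boot_summary() {
    awk '$7 ~ /\/bzImage$/ {
             if ($9 == 200) ok[$1]++; else failed[$1]++
             seen[$1] = 1
         }
         END {
             for (ip in seen)
                 printf "%s ok=%d failed=%d\n", ip, ok[ip], failed[ip]
         }' | sort
}
```

Typical use would be `boot_summary < /var/log/nginx/access.log`; any line with a non-zero failed count is a candidate for an alert.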

Files and Directory Structure

baremetal/pxe-server/
├── README.md                    # This file
├── setup.sh                     # Setup and management script
├── nixos-module.nix            # NixOS service module
│
├── dhcp/
│   └── dhcpd.conf              # DHCP server configuration
│
├── ipxe/
│   ├── boot.ipxe               # Main boot menu script
│   └── mac-mappings.txt        # MAC address documentation
│
├── http/
│   ├── nginx.conf              # HTTP server configuration
│   └── directory-structure.txt # Directory layout documentation
│
└── assets/                      # (Created at runtime)
    └── /var/lib/pxe-boot/
        ├── ipxe/
        │   ├── undionly.kpxe
        │   ├── ipxe.efi
        │   └── boot.ipxe
        └── nixos/
            ├── bzImage
            └── initrd
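
A quick way to compare a server against the runtime layout above is a small check like this sketch. The function name check_pxe_tree is hypothetical and not part of setup.sh; the file list is taken directly from the tree above.

```shell
# List expected boot files that are missing or empty under ROOT.
# Prints nothing and returns 0 when the tree is complete.
# Usage: check_pxe_tree [ROOT]
check_pxe_tree() {
    root=${1:-/var/lib/pxe-boot}
    missing=0
    for f in ipxe/undionly.kpxe ipxe/ipxe.efi ipxe/boot.ipxe \
             nixos/bzImage nixos/initrd; do
        # -s: file exists and has a size greater than zero
        if [ ! -s "$root/$f" ]; then
            echo "missing or empty: $root/$f"
            missing=1
        fi
    done
    return "$missing"
}
```

Before the NixOS boot images have been deployed, this is expected to report nixos/bzImage and nixos/initrd as missing.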

Next Steps

After completing the PXE server setup:

  1. T032.S3 - Image Builder: Automated NixOS image generation with profile-specific configurations

  2. T032.S4 - Provisioning Orchestrator: API-driven provisioning workflow and node lifecycle management

  3. Integration with IAM: Authentication for provisioning API

  4. Integration with FlareDB: Node inventory and state management

Support

For issues or questions:

  • Check Troubleshooting section
  • Review logs: sudo journalctl -u dhcpd4 -u atftpd -u nginx -f
  • Run diagnostic: sudo ./setup.sh --test

License

Part of Centra Cloud infrastructure - see project root for license information.