# PlasmaCloud Netboot Image Builder - Technical Overview

## Introduction

This document provides a technical overview of the PlasmaCloud NixOS Image Builder, which generates bootable netboot images for bare-metal provisioning. This is part of T032 (Bare-Metal Provisioning) and specifically implements deliverable S3 (NixOS Image Builder).

## System Architecture

### High-Level Flow

```
┌─────────────────────┐
│      Nix Flake      │
│     (flake.nix)     │
└──────────┬──────────┘
           │
           ├─── nixosConfigurations
           │    ├── netboot-control-plane
           │    ├── netboot-worker
           │    └── netboot-all-in-one
           │
           ├─── packages (T024)
           │    ├── chainfire-server
           │    ├── flaredb-server
           │    └── ... (8 services)
           │
           └─── modules (T024)
                ├── chainfire.nix
                ├── flaredb.nix
                └── ... (8 modules)

     Build Process
           ↓

┌─────────────────────┐
│   build-images.sh   │
└──────────┬──────────┘
           │
           ├─── nix build netbootRamdisk
           ├─── nix build kernel
           └─── copy to artifacts/

        Output
           ↓

┌─────────────────────┐
│  Netboot Artifacts  │
├─────────────────────┤
│ bzImage (kernel)    │
│ initrd (ramdisk)    │
│ netboot.ipxe        │
└──────────┬──────────┘
           │
           ├─── PXE Server
           │    (HTTP/TFTP)
           │
           └─── Target Machine
                (PXE Boot)
```

## Component Breakdown

### 1. Netboot Configurations

Located in `nix/images/`, these NixOS configurations define the netboot environment:

#### `netboot-base.nix`

**Purpose**: Common base configuration for all profiles

**Key Features**:

- Extends `netboot-minimal.nix` from nixpkgs
- SSH server with root login (key-based only)
- Generic kernel with broad hardware support
- Disk management tools (disko, parted, cryptsetup, lvm2)
- Network configuration (DHCP, predictable interface names)
- Serial console support (ttyS0, tty0)
- Minimal system (no docs, no sound)

**Package Inclusions**:

```nix
environment.systemPackages = with pkgs; [
  disko parted gptfdisk    # Disk management
  cryptsetup lvm2          # Encryption and LVM
  e2fsprogs xfsprogs       # Filesystem tools
  iproute2 curl tcpdump    # Network tools
  vim tmux htop            # System tools
];
```

**Kernel Configuration**:

```nix
boot.kernelPackages = pkgs.linuxPackages_latest;
boot.kernelParams = [
  "console=ttyS0,115200"
  "console=tty0"
  "loglevel=4"
];
```

#### `netboot-control-plane.nix`

**Purpose**: Full control plane deployment

**Imports**:

- `netboot-base.nix` (base configuration)
- `../modules` (PlasmaCloud service modules)

**Service Inclusions**:

- Chainfire (ports 2379, 2380, 2381)
- FlareDB (ports 2479, 2480)
- IAM (port 8080)
- PlasmaVMC (port 8081)
- PrismNET (port 8082)
- FlashDNS (port 53)
- FiberLB (port 8083)
- LightningStor (port 8084)
- K8sHost (port 8085)

**Service State**: All services are **disabled** by default via `lib.mkDefault false`.

**Resource Limits** (for the netboot environment; systemd resource-control directives, set in a unit's `serviceConfig`):

```nix
serviceConfig = {
  MemoryMax = "512M";
  CPUQuota = "50%";
};
```

#### `netboot-worker.nix`

**Purpose**: Compute-focused worker nodes

**Imports**:

- `netboot-base.nix`
- `../modules`

**Service Inclusions**:

- PlasmaVMC (VM management)
- PrismNET (SDN)

**Additional Features**:

- KVM virtualization support
- Open vSwitch for SDN
- QEMU and libvirt tools
- Optimized sysctl for VM workloads

**Performance Tuning**:

```nix
boot.kernel.sysctl = {
  "fs.file-max" = 1000000;
  "net.ipv4.ip_forward" = 1;
  "net.core.netdev_max_backlog" = 5000;
};
```

#### `netboot-all-in-one.nix`

**Purpose**: Single-node deployment with all services

**Imports**:

- `netboot-base.nix`
- `../modules`

**Combines**: All features from control-plane + worker

**Use Cases**:

- Development environments
- Small deployments
- Edge locations
- POC installations

### 2. Flake Integration

The main `flake.nix` exposes the netboot configurations:

```nix
nixosConfigurations = {
  netboot-control-plane = nixpkgs.lib.nixosSystem {
    system = "x86_64-linux";
    modules = [ ./nix/images/netboot-control-plane.nix ];
  };

  netboot-worker = nixpkgs.lib.nixosSystem {
    system = "x86_64-linux";
    modules = [ ./nix/images/netboot-worker.nix ];
  };

  netboot-all-in-one = nixpkgs.lib.nixosSystem {
    system = "x86_64-linux";
    modules = [ ./nix/images/netboot-all-in-one.nix ];
  };
};
```

### 3. Build Script

`build-images.sh` orchestrates the build process:

**Workflow**:

1. Parse command-line arguments (`--profile`, `--output-dir`)
2. Create output directories
3. For each profile:
   - Build netboot ramdisk: `nix build ...netbootRamdisk`
   - Build kernel: `nix build ...kernel`
   - Copy artifacts (bzImage, initrd)
   - Generate iPXE boot script
   - Calculate and display sizes
4. Verify outputs (file existence, size sanity checks)
5. Copy to PXE server (if available)
6. Print summary

**Build Commands**:

```bash
nix build .#nixosConfigurations.netboot-$profile.config.system.build.netbootRamdisk
nix build .#nixosConfigurations.netboot-$profile.config.system.build.kernel
```
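
Steps 1 to 3 of the workflow can be sketched in bash. This is a minimal, hypothetical illustration, not the real `build-images.sh`: the function names (`parse_args`, `select_profiles`, `plan_build`) and the default output directory are assumptions. It only prints the `nix build` commands it would run. It passes `--out-link` so that the result symlinks match the `initrd-link`/`kernel-link` entries of the output structure. A usage example is included as a comment at the end.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of build-images.sh steps 1-3 (dry run: prints commands only).
set -euo pipefail

ALL_PROFILES=(control-plane worker all-in-one)

# Step 1: parse --profile and --output-dir into the globals PROFILE and OUTPUT_DIR.
parse_args() {
  PROFILE="all"
  OUTPUT_DIR="artifacts"
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --profile)    PROFILE="$2"; shift 2 ;;
      --output-dir) OUTPUT_DIR="$2"; shift 2 ;;
      *) echo "unknown argument: $1" >&2; return 1 ;;
    esac
  done
}

# Print the profiles to build, one per line; "all" expands to every known profile.
select_profiles() {
  local p
  if [[ "$PROFILE" == "all" ]]; then
    printf '%s\n' "${ALL_PROFILES[@]}"
    return 0
  fi
  for p in "${ALL_PROFILES[@]}"; do
    if [[ "$p" == "$PROFILE" ]]; then
      echo "$p"
      return 0
    fi
  done
  echo "unknown profile: $PROFILE" >&2
  return 1
}

# Step 3: print the two nix build commands for one profile; --out-link names
# the result symlinks after the initrd-link/kernel-link entries.
plan_build() {
  local profile="$1"
  local attr=".#nixosConfigurations.netboot-${profile}.config.system.build"
  echo "nix build ${attr}.netbootRamdisk --out-link ${OUTPUT_DIR}/${profile}/initrd-link"
  echo "nix build ${attr}.kernel --out-link ${OUTPUT_DIR}/${profile}/kernel-link"
}

# Example:
#   parse_args --profile worker --output-dir out
#   for p in $(select_profiles); do mkdir -p "out/$p"; plan_build "$p"; done
```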
**Output Structure**:

```
artifacts/
├── control-plane/
│   ├── bzImage          # ~10-30 MB
│   ├── initrd           # ~100-300 MB
│   ├── netboot.ipxe     # iPXE script
│   ├── build.log        # Build log
│   ├── initrd-link      # Nix result symlink
│   └── kernel-link      # Nix result symlink
├── worker/
│   └── ... (same structure)
└── all-in-one/
    └── ... (same structure)
```

## Integration Points

### T024 NixOS Modules

The netboot configurations use the T024 service modules:

**Module Structure** (example: chainfire.nix):

```nix
{ config, lib, pkgs, ... }:

let
  cfg = config.services.chainfire;
in
{
  options.services.chainfire = {
    enable = lib.mkEnableOption "chainfire service";
    port = lib.mkOption { ... };
    raftPort = lib.mkOption { ... };
    package = lib.mkOption { ... };
  };

  config = lib.mkIf cfg.enable {
    users.users.chainfire = { ... };
    systemd.services.chainfire = { ... };
  };
}
```

**Package Availability**:

```nix
# In netboot-control-plane.nix
environment.systemPackages = with pkgs; [
  chainfire-server  # From flake overlay
  flaredb-server    # From flake overlay
  # ...
];
```

### T032.S2 PXE Infrastructure

The build script integrates with the PXE server.

**Copy Workflow**: the build script copies each profile's kernel and initrd to:

```
chainfire/baremetal/pxe-server/assets/nixos/
├── control-plane/
│   ├── bzImage
│   └── initrd
├── worker/
│   ├── bzImage
│   └── initrd
└── all-in-one/
    ├── bzImage
    └── initrd
```
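
A minimal sketch of the copy step (workflow step 5), assuming the layout shown above. `copy_to_pxe` is a hypothetical helper, not part of the existing script. It uses `install -m 0644` so that files originating from the read-only Nix store become ordinary owner-writable copies that a later run can overwrite. It skips the copy when the PXE asset directory is absent, matching the "if available" wording.

```bash
#!/usr/bin/env bash
# Hypothetical helper for workflow step 5: copy one profile's kernel and initrd
# from <artifacts_dir>/<profile>/ into the PXE server's asset tree.
set -euo pipefail

copy_to_pxe() {
  local artifacts_dir="$1" pxe_assets="$2" profile="$3"
  local src="$artifacts_dir/$profile" dest="$pxe_assets/$profile"
  # "If available": skip quietly when the PXE asset tree is absent (build-only host).
  if [[ ! -d "$pxe_assets" ]]; then
    echo "PXE assets dir $pxe_assets not found; skipping $profile" >&2
    return 0
  fi
  mkdir -p "$dest"
  # Mode 0644 keeps the copies writable by their owner, so re-copies succeed
  # even though the originals came from the read-only Nix store.
  install -m 0644 "$src/bzImage" "$src/initrd" "$dest/"
}

# Example:
#   copy_to_pxe artifacts chainfire/baremetal/pxe-server/assets/nixos worker
```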
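**iPXE Boot Script** (generated):

```ipxe
#!ipxe
# init= must be the concrete <toplevel>/init store path of the profile's system;
# the "*" below is only a placeholder, because the kernel does not expand globs.
kernel ${boot-server}/control-plane/bzImage init=/nix/store/*/init console=ttyS0,115200
initrd ${boot-server}/control-plane/initrd
boot
```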
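
The following bash sketch shows one way to generate that script. It rests on a few assumptions. `generate_ipxe` is a hypothetical helper. `${boot-server}` is written out literally so that iPXE, not the shell, expands it at boot time. The caller supplies the concrete init path. The helper refuses a path containing `*`, since the kernel receives `init=` verbatim.

```bash
#!/usr/bin/env bash
# Hypothetical generator for artifacts/<profile>/netboot.ipxe.
set -euo pipefail

# Arguments: profile name and the concrete store path of the system's init.
generate_ipxe() {
  local profile="$1" init_path="$2"
  # The kernel passes init= through literally, so a glob would never resolve.
  if [[ "$init_path" == *'*'* ]]; then
    echo "init path must be a concrete store path, got: $init_path" >&2
    return 1
  fi
  # \${boot-server} is escaped so iPXE, not the shell, expands it at boot time.
  cat <<EOF
#!ipxe
kernel \${boot-server}/${profile}/bzImage init=${init_path} console=ttyS0,115200
initrd \${boot-server}/${profile}/initrd
boot
EOF
}

# Example (evaluates, but does not build, the system's store path):
#   toplevel=$(nix eval --raw ".#nixosConfigurations.netboot-worker.config.system.build.toplevel.outPath")
#   generate_ipxe worker "$toplevel/init" > artifacts/worker/netboot.ipxe
```

As an alternative, nixpkgs' netboot module exposes `config.system.build.netbootIpxeScript`, which already embeds the init path. Reusing it may be simpler than maintaining a custom generator.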
## Build Process Deep Dive

### NixOS Netboot Build Internals

1. **netboot-minimal.nix** (from nixpkgs):
   - Provides base netboot functionality
   - Configures initrd with kexec support
   - Sets up squashfs for Nix store

2. **Our Extensions**:
   - Add PlasmaCloud service packages
   - Configure SSH for nixos-anywhere
   - Include provisioning tools (disko, etc.)
   - Customize kernel and modules

3. **Build Outputs**:
   - **bzImage**: Compressed Linux kernel
   - **initrd**: Compressed initial ramdisk (cpio archive, with the Nix store embedded as a squashfs image) containing:
     - Minimal NixOS system
     - Nix store with service packages
     - Init scripts for booting

### Size Optimization Strategies

**Current Optimizations** (savings are approximate):

```nix
documentation.enable = false;                       # ~ -50 MB
documentation.nixos.enable = false;                 # ~ -20 MB
i18n.supportedLocales = [ "en_US.UTF-8/UTF-8" ];    # ~ -100 MB
```

**Additional Strategies** (if needed):

- Use a smaller kernel configuration (note that `linuxPackages_hardened` is a security-hardened kernel, not a size-optimized one, so measure before adopting it)
- Remove unused kernel modules
- Switch the initrd compressor to a higher-ratio one such as xz (`boot.initrd.compressor`), at the cost of slower builds and decompression
- On-demand package fetching from HTTP substituter

**Expected Sizes**:

- **Control Plane**: ~250-350 MB (initrd)
- **Worker**: ~150-250 MB (initrd)
- **All-in-One**: ~300-400 MB (initrd)

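
The expected sizes can double as a regression guard in the build. The bash check below is hypothetical. The function names and the rounding down to whole MiB are assumptions. The ranges are copied from the list above and are estimates that should be revised once real builds are measured.

```bash
#!/usr/bin/env bash
# Hypothetical size guard for an initrd, based on the "Expected Sizes" estimates.
set -euo pipefail

# Print "min max" (in MB) for a profile; ranges are estimates, not measurements.
expected_range_mb() {
  case "$1" in
    control-plane) echo "250 350" ;;
    worker)        echo "150 250" ;;
    all-in-one)    echo "300 400" ;;
    *) echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

# Succeed when the file size (MiB, rounded down) falls inside the profile's range.
check_initrd_size() {
  local profile="$1" file="$2" range min max size_mb
  range=$(expected_range_mb "$profile") || return 1
  read -r min max <<< "$range"
  size_mb=$(( $(wc -c < "$file") / 1024 / 1024 ))
  if (( size_mb < min || size_mb > max )); then
    echo "$profile initrd is ${size_mb} MB, expected ${min}-${max} MB" >&2
    return 1
  fi
  echo "$profile initrd: ${size_mb} MB (ok)"
}

# Example:
#   check_initrd_size worker artifacts/worker/initrd
```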
## Boot Flow

### From PXE to Running System

```
1. PXE Boot
   ├─ DHCP discovers boot server
   ├─ TFTP loads iPXE binary
   └─ iPXE executes boot script

2. Netboot Download
   ├─ HTTP downloads bzImage (~20MB)
   ├─ HTTP downloads initrd (~200MB)
   └─ iPXE boots the kernel with the initrd (NixOS installer)

3. NixOS Installer (in RAM)
   ├─ Init system starts
   ├─ Network configuration (DHCP)
   ├─ SSH server starts
   └─ Ready for nixos-anywhere

4. Installation (nixos-anywhere)
   ├─ SSH connection established
   ├─ Disk partitioning (disko)
   ├─ NixOS system installation
   ├─ Secret injection
   └─ Bootloader installation

5. First Boot (from disk)
   ├─ GRUB/systemd-boot loads
   ├─ Services start (enabled)
   ├─ Cluster join (if configured)
   └─ Running PlasmaCloud node
```
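
Between stages 3 and 4, the deployer has to wait for the installer's SSH server to come up. The minimal sketch below uses bash's `/dev/tcp` pseudo-device and coreutils `timeout`. `wait_for_ssh`, the host address, and the flake attribute in the usage comment are hypothetical. The nixos-anywhere invocation follows its README's `--flake <flake>#<config> <ssh-host>` form, and it should be checked against the installed version.

```bash
#!/usr/bin/env bash
# Hypothetical helper: block until the netbooted installer accepts TCP connections
# on its SSH port (end of stage 3), or give up after a timeout.
set -euo pipefail

wait_for_ssh() {
  local host="$1" port="${2:-22}" timeout_s="${3:-300}"
  local deadline=$(( SECONDS + timeout_s ))
  while (( SECONDS < deadline )); do
    # bash's /dev/tcp/<host>/<port> redirection opens a TCP connection; `timeout`
    # bounds each attempt so an unresponsive host cannot hang the loop.
    if timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  echo "timed out waiting for $host:$port" >&2
  return 1
}

# Example (address and flake attribute are placeholders):
#   wait_for_ssh 10.0.0.42 22 600 &&
#     nix run github:nix-community/nixos-anywhere -- --flake .#my-node root@10.0.0.42
```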
## Customization Guide

### Adding a New Service

**Step 1**: Create NixOS module

```nix
# nix/modules/myservice.nix
{ config, lib, pkgs, ... }:

let
  cfg = config.services.myservice;
in
{
  options.services.myservice = {
    enable = lib.mkEnableOption "myservice";
  };

  config = lib.mkIf cfg.enable {
    systemd.services.myservice = { ... };
  };
}
```

**Step 2**: Add to flake packages

```nix
# flake.nix
packages.myservice-server = buildRustWorkspace { ... };
```

**Step 3**: Include in netboot profile

```nix
# nix/images/netboot-control-plane.nix
environment.systemPackages = with pkgs; [
  myservice-server
];

services.myservice = {
  enable = lib.mkDefault false;
};
```

### Creating a Custom Profile

**Step 1**: Create new netboot configuration

```nix
# nix/images/netboot-custom.nix
{ config, pkgs, lib, ... }:
{
  imports = [
    ./netboot-base.nix
    ../modules
  ];

  # Your customizations
  environment.systemPackages = [ ... ];
}
```

**Step 2**: Add to flake

```nix
# flake.nix
nixosConfigurations.netboot-custom = nixpkgs.lib.nixosSystem {
  system = "x86_64-linux";
  modules = [ ./nix/images/netboot-custom.nix ];
};
```

**Step 3**: Update build script

```bash
# build-images.sh
profiles_to_build=("control-plane" "worker" "all-in-one" "custom")
```
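
A profile has to be added in three places, so a pre-build consistency check can catch the common mistake of listing a profile in `build-images.sh` without creating its Nix file. The sketch below is hypothetical: `check_profiles` is not part of the existing script. The flake attribute can be checked in a similar way, for example by comparing against `nix eval .#nixosConfigurations --apply builtins.attrNames --json`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-build check: each profile passed in must have a matching
# <images_dir>/netboot-<profile>.nix file.
set -euo pipefail

check_profiles() {
  local images_dir="$1"; shift
  local missing=0 p
  for p in "$@"; do
    if [[ ! -f "$images_dir/netboot-$p.nix" ]]; then
      echo "missing $images_dir/netboot-$p.nix for profile '$p'" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example, next to the profiles_to_build array in build-images.sh:
#   check_profiles nix/images "${profiles_to_build[@]}"
```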
## Security Model

### Netboot Phase

**Risk**: Netboot image has root SSH access enabled

**Mitigations**:

1. **Key-based authentication only** (no passwords)
2. **Isolated provisioning VLAN**
3. **MAC address whitelist in DHCP**
4. **Firewall disabled only during install**

### Post-Installation

Services remain disabled until the final configuration enables them:

```nix
# In installed system configuration
services.chainfire.enable = true;  # Overrides lib.mkDefault false
```

### Secret Management

Secrets are **NOT** embedded in netboot images:

```nix
# During nixos-anywhere installation (shell command run from the deployer):
#   scp secrets/* root@target:/tmp/secrets/
# /tmp on the netbooted installer lives in RAM and is not part of the installed
# system, so the files must end up at the path referenced below.

# Installed system references:
services.chainfire.settings.tls = {
  cert_path = "/etc/nixos/secrets/tls-cert.pem";
};
```
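
The `scp` to `/tmp/secrets/` places files in the installer's RAM, but the installed system reads `/etc/nixos/secrets/`. One way to bridge the two is nixos-anywhere's `--extra-files <dir>` option, which copies the contents of a local directory into the root of the new installation. The staging helper below is hypothetical and assumes the secrets live in a local `secrets/` directory. The target address and flake attribute in the usage comment are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper: lay out secrets under <stage_root>/etc/nixos/secrets so that
# `nixos-anywhere --extra-files <stage_root>` copies them to /etc/nixos/secrets
# on the installed system.
set -euo pipefail

stage_secrets() {
  local secrets_src="$1" stage_root="$2"
  local dest="$stage_root/etc/nixos/secrets" f
  mkdir -p "$dest"
  chmod 0700 "$dest"
  for f in "$secrets_src"/*; do
    [[ -f "$f" ]] || continue        # skips subdirectories and an unmatched glob
    install -m 0600 "$f" "$dest/"    # owner-only read/write
  done
}

# Example:
#   stage=$(mktemp -d)
#   stage_secrets secrets "$stage"
#   nix run github:nix-community/nixos-anywhere -- --extra-files "$stage" --flake .#my-node root@target
```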
## Performance Characteristics

### Build Times

- **First build**: 30-60 minutes (downloads all dependencies)
- **Incremental builds**: 5-15 minutes (reuses cached artifacts)
- **With local cache**: 2-5 minutes

### Network Requirements

- **Initial download**: ~2GB (nixpkgs + dependencies)
- **Netboot download**: ~200-400MB per node
- **Installation**: ~500MB-2GB (depending on services)

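
The per-node figures translate into a simple traffic estimate for a batch of nodes. The bash calculator below is hypothetical. It uses integer MB, counts 1 MB as 8 Mbit, and ignores protocol overhead. For example, 20 nodes at the upper bounds (400 MB netboot plus 2000 MB installation) move 48000 MB, which is about 384 s on a fully used 1 Gbit/s link.

```bash
#!/usr/bin/env bash
# Hypothetical back-of-the-envelope estimate of provisioning traffic.
set -euo pipefail

# Total MB for a batch: nodes * (netboot MB + installation MB).
provisioning_traffic_mb() {
  local nodes="$1" netboot_mb="$2" install_mb="$3"
  echo $(( nodes * (netboot_mb + install_mb) ))
}

# Seconds to transfer <mb> megabytes over a <mbps> Mbit/s link
# (1 MB = 8 Mbit; ignores protocol overhead and contention).
transfer_seconds() {
  local mb="$1" mbps="$2"
  echo $(( mb * 8 / mbps ))
}

# Example: 20 nodes at the upper bounds above.
#   total=$(provisioning_traffic_mb 20 400 2000)   # 48000
#   transfer_seconds "$total" 1000                  # 384
```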
### Hardware Requirements

**Build Machine**:

- CPU: 4+ cores recommended
- RAM: 8GB minimum, 16GB recommended
- Disk: 50GB free space
- Network: Broadband connection

**Target Machine**:

- RAM: 4GB minimum for netboot (8GB+ for production)
- Network: PXE boot support, DHCP
- Disk: Depends on disko configuration

## Testing Strategy

### Verification Steps

1. **Syntax Validation**:

   ```bash
   nix flake check
   ```

2. **Build Test**:

   ```bash
   ./build-images.sh --profile control-plane
   ```

3. **Artifact Verification**:

   ```bash
   file artifacts/control-plane/bzImage  # Should be Linux kernel
   file artifacts/control-plane/initrd   # Should be compressed data
   ```

4. **PXE Boot Test**:
   - Boot VM from netboot image
   - Verify SSH access
   - Check available tools (disko, parted, etc.)

5. **Installation Test**:
   - Run nixos-anywhere on test target
   - Verify successful installation
   - Check service availability

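
The `file` checks in step 3 require the `file` utility on the build host. A dependency-free alternative is to check header magic directly. The x86 Linux boot protocol places the setup-header signature `HdrS` at offset 0x202 of a bzImage, and gzip, xz, and zstd streams start with fixed magic bytes. The sketch below is hypothetical: the function names are assumptions, and an initrd compressed in any other format (for example lz4) is reported as `unknown`.

```bash
#!/usr/bin/env bash
# Hypothetical header-magic checks for artifact verification (no `file` needed).
set -euo pipefail

# x86 Linux boot protocol: the setup header signature "HdrS" sits at offset 0x202 (514).
is_bzimage() {
  [[ "$(dd if="$1" bs=1 skip=514 count=4 2>/dev/null | tr -d '\0')" == "HdrS" ]]
}

# Print the initrd's compression format, judged from its leading magic bytes.
initrd_compression() {
  local magic
  magic=$(od -An -tx1 -N6 "$1" | tr -d ' \n')
  case "$magic" in
    1f8b*)         echo gzip ;;
    fd377a585a00*) echo xz ;;
    28b52ffd*)     echo zstd ;;
    *)             echo unknown; return 1 ;;
  esac
}

# Example:
#   is_bzimage artifacts/control-plane/bzImage && echo "bzImage ok"
#   initrd_compression artifacts/control-plane/initrd
```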
## Troubleshooting Matrix

| Symptom | Possible Cause | Solution |
|---------|----------------|----------|
| Build fails | Flakes not enabled | Add `experimental-features = nix-command flakes` to `nix.conf` |
| Large initrd | Too many packages | Remove unused packages |
| SSH fails | Wrong SSH key | Update authorized_keys |
| Boot hangs | Wrong kernel params | Check `console=` settings |
| No network | DHCP issues | Verify `networking.useDHCP = true` |
| Service missing | Package not built | Check flake overlay |

## Future Enhancements

### Planned Improvements

1. **Image Variants**:
   - Minimal installer (no services)
   - Debug variant (with extra tools)
   - Rescue mode (for recovery)

2. **Build Optimizations**:
   - Parallel profile builds
   - Incremental rebuild detection
   - Binary cache integration

3. **Security Enhancements**:
   - Per-node SSH keys
   - TPM-based secrets
   - Measured boot support

4. **Monitoring**:
   - Build metrics collection
   - Size trend tracking
   - Performance benchmarking

## References

- **NixOS Netboot**: https://nixos.wiki/wiki/Netboot
- **nixos-anywhere**: https://github.com/nix-community/nixos-anywhere
- **disko**: https://github.com/nix-community/disko
- **T032 Design**: `docs/por/T032-baremetal-provisioning/design.md`
- **T024 Modules**: `nix/modules/`

## Revision History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2025-12-10 | T032.S3 | Initial implementation |