# PlasmaCloud NixOS Image Builder

This directory contains tools and configurations for building bootable NixOS netboot images for bare-metal provisioning of PlasmaCloud infrastructure.

## Overview

The NixOS Image Builder generates netboot images (kernel + initrd) that can be served via PXE/iPXE to provision bare-metal servers with PlasmaCloud services. These images integrate with the T024 NixOS service modules and the T032.S2 PXE boot infrastructure.

## Architecture

The image builder produces three deployment profiles:

### 1. Control Plane (`netboot-control-plane`)

Full control plane deployment with all 9 PlasmaCloud services:

- **Chainfire**: Distributed configuration and coordination
- **FlareDB**: Time-series metrics and events database
- **IAM**: Identity and access management
- **PlasmaVMC**: Virtual machine control plane
- **PrismNET**: Software-defined networking controller
- **FlashDNS**: High-performance DNS server
- **FiberLB**: Layer 4/7 load balancer
- **LightningStor**: Distributed block storage
- **K8sHost**: Kubernetes hosting component

**Use Cases**:

- Multi-node production clusters (3+ control plane nodes)
- High-availability deployments
- Separation of control and data planes

### 2. Worker (`netboot-worker`)

Compute-focused deployment for running tenant workloads:

- **PlasmaVMC**: Virtual machine control plane
- **PrismNET**: Software-defined networking

**Use Cases**:

- Worker nodes in multi-node clusters
- Dedicated compute capacity
- Scalable VM hosting

### 3. All-in-One (`netboot-all-in-one`)

Single-node deployment with all 9 services:

- All services from Control Plane profile
- Optimized for single-node operation

**Use Cases**:

- Development/testing environments
- Small deployments (1-3 nodes)
- Edge locations
- Proof-of-concept installations

## Prerequisites

### Build Environment

- **NixOS** or **Nix package manager** installed
- **Flakes** enabled in Nix configuration
- **Git** access to PlasmaCloud repository
- **Sufficient disk space**: ~10GB for build artifacts

### Enable Nix Flakes

If not already enabled, add to `/etc/nix/nix.conf` or `~/.config/nix/nix.conf`:

```
experimental-features = nix-command flakes
```
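
To confirm the setting before starting a long build, a small check along these lines may help. This is a minimal sketch: the function name `flakes_enabled` is hypothetical, and it only inspects the `experimental-features` line shown above, so other ways of enabling the features are not detected.

```bash
# Sketch (hypothetical helper): succeed if the given nix.conf enables both
# `nix-command` and `flakes` on an uncommented experimental-features line.
flakes_enabled() {
  local conf="$1"
  local features
  [ -r "$conf" ] || return 1
  # Use the last matching line; commented-out lines do not match the pattern.
  features="$(sed -n 's/^[[:space:]]*experimental-features[[:space:]]*=[[:space:]]*//p' "$conf" \
    | tail -n 1 | tr -s '[:space:]' ' ')"
  case " $features " in
    *" flakes "*) ;;
    *) return 1 ;;
  esac
  case " $features " in
    *" nix-command "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `flakes_enabled ~/.config/nix/nix.conf && echo "flakes enabled"`.
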

### Build Dependencies

The build process automatically handles all dependencies, but ensure you have:

- Working internet connection (for Nix binary cache)
- ~4GB RAM minimum
- ~10GB free disk space

## Build Instructions

### Quick Start

Build all profiles:

```bash
# Adjust the path to the location of your checkout
cd /home/centra/cloud/baremetal/image-builder
./build-images.sh
```

Build a specific profile:

```bash
# Control plane only
./build-images.sh --profile control-plane

# Worker nodes only
./build-images.sh --profile worker

# All-in-one deployment
./build-images.sh --profile all-in-one
```

Custom output directory:

```bash
./build-images.sh --output-dir /srv/pxe/images
```

### Build Output

Each profile generates:

- `bzImage` - Linux kernel (~10-30 MB)
- `initrd` - Initial ramdisk (~100-300 MB)
- `netboot.ipxe` - iPXE boot script
- `build.log` - Build log for troubleshooting

Artifacts are placed in:

```
./artifacts/
├── control-plane/
│   ├── bzImage
│   ├── initrd
│   ├── netboot.ipxe
│   └── build.log
├── worker/
│   ├── bzImage
│   ├── initrd
│   ├── netboot.ipxe
│   └── build.log
└── all-in-one/
    ├── bzImage
    ├── initrd
    ├── netboot.ipxe
    └── build.log
```
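
After a build, it can be useful to confirm that a profile directory actually contains the four files listed above. The following is a minimal sketch; `check_profile_artifacts` is a hypothetical helper, not part of `build-images.sh`.

```bash
# Sketch (hypothetical helper): verify that a profile directory contains the
# four expected artifacts. Prints each missing file and returns non-zero.
check_profile_artifacts() {
  local dir="$1"
  local missing=0
  local f
  for f in bzImage initrd netboot.ipxe build.log; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_profile_artifacts ./artifacts/worker` returns 0 only when all four files are present.
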

### Manual Build Commands

You can also build images directly with Nix. Each `nix build` writes to the `result` symlink by default, so use a separate out-link per build to keep both outputs:

```bash
# Build initrd
nix build .#nixosConfigurations.netboot-control-plane.config.system.build.netbootRamdisk -o result-initrd

# Build kernel
nix build .#nixosConfigurations.netboot-control-plane.config.system.build.kernel -o result-kernel

# Access artifacts
ls -lh result-initrd/ result-kernel/
```
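
The same attribute paths can be derived for the other profiles. The sketch below assumes that `netboot-worker` and `netboot-all-in-one` are exposed under `nixosConfigurations` in the same way as `netboot-control-plane`; the helper name `netboot_attr` is hypothetical.

```bash
# Sketch (hypothetical helper): print the flake attribute for a profile and
# build output, following the naming used in the commands above.
netboot_attr() {
  local profile="$1"
  local output="$2"
  case "$profile" in
    control-plane|worker|all-in-one) ;;
    *) echo "unknown profile: $profile" >&2; return 1 ;;
  esac
  case "$output" in
    kernel|netbootRamdisk) ;;
    *) echo "unknown build output: $output" >&2; return 1 ;;
  esac
  printf '.#nixosConfigurations.netboot-%s.config.system.build.%s\n' "$profile" "$output"
}
```

For example, `nix build "$(netboot_attr worker kernel)" -o result-worker-kernel`.
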

## Deployment

### Integration with PXE Server (T032.S2)

The build script automatically copies artifacts to the PXE server directory if it exists:

```
chainfire/baremetal/pxe-server/assets/nixos/
├── control-plane/
├── worker/
├── all-in-one/
├── bzImage-control-plane -> control-plane/bzImage
├── initrd-control-plane -> control-plane/initrd
├── bzImage-worker -> worker/bzImage
└── initrd-worker -> worker/initrd
```
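
If the assets directory is populated by hand, the symlink layout shown above can be recreated with a few lines of shell. This is a sketch of the layout only, not the build script's actual code; `link_profile` is a hypothetical helper.

```bash
# Sketch (hypothetical helper): create the top-level bzImage-/initrd- symlinks
# for one profile inside the PXE assets directory.
link_profile() {
  local assets="$1"
  local profile="$2"
  if [ ! -d "$assets/$profile" ]; then
    echo "no such profile directory: $assets/$profile" >&2
    return 1
  fi
  # Relative targets keep the links valid if the assets directory is moved.
  ln -sfn "$profile/bzImage" "$assets/bzImage-$profile"
  ln -sfn "$profile/initrd" "$assets/initrd-$profile"
}
```

For example, `link_profile chainfire/baremetal/pxe-server/assets/nixos worker`.
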

### Manual Deployment

Copy artifacts to your PXE/HTTP server:

```bash
# Example: Deploy to nginx serving directory
sudo cp -r ./artifacts/control-plane /srv/pxe/nixos/
sudo cp -r ./artifacts/worker /srv/pxe/nixos/
sudo cp -r ./artifacts/all-in-one /srv/pxe/nixos/
```

### iPXE Boot Configuration

Reference the images in your iPXE boot script. The `init=/nix/store/*/init` argument below is a placeholder: the kernel does not expand globs, so replace it with the exact `init=` path for the profile, for example the one in that profile's generated `netboot.ipxe`.

```ipxe
#!ipxe

set boot-server 10.0.0.2:8080

# init=/nix/store/*/init is a placeholder; substitute the real store path.

:control-plane
kernel http://${boot-server}/nixos/control-plane/bzImage init=/nix/store/*/init console=ttyS0,115200 console=tty0 loglevel=4
initrd http://${boot-server}/nixos/control-plane/initrd
boot

:worker
kernel http://${boot-server}/nixos/worker/bzImage init=/nix/store/*/init console=ttyS0,115200 console=tty0 loglevel=4
initrd http://${boot-server}/nixos/worker/initrd
boot
```
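
Before booting hardware, the URLs referenced in the script can be checked from another machine on the provisioning network. The sketch below assumes the `/nixos/<profile>/` URL layout used above and that `curl` is installed; both function names are hypothetical.

```bash
# Sketch (hypothetical helpers): list and probe the kernel/initrd URLs that
# the iPXE script above would fetch for a profile.
artifact_urls() {
  local server="$1"
  local profile="$2"
  printf 'http://%s/nixos/%s/bzImage\n' "$server" "$profile"
  printf 'http://%s/nixos/%s/initrd\n' "$server" "$profile"
}

check_artifacts_reachable() {
  local url
  local rc=0
  for url in $(artifact_urls "$1" "$2"); do
    # -f: fail on HTTP errors, -s: silent, -I: HEAD request only
    if curl -fsI --max-time 10 "$url" >/dev/null; then
      echo "ok   $url"
    else
      echo "FAIL $url" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_artifacts_reachable 10.0.0.2:8080 control-plane`.
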

## Customization

### Adding Services

To add a service to a profile, edit the corresponding configuration:

```nix
# nix/images/netboot-control-plane.nix
environment.systemPackages = with pkgs; [
  chainfire-server
  flaredb-server
  # ... existing services ...
  my-custom-service  # Add your service
];
```

### Custom Kernel Configuration

Modify `nix/images/netboot-base.nix`:

```nix
boot.kernelPackages = pkgs.linuxPackages_6_6;  # Specific kernel version
boot.kernelModules = [ "my-driver" ];          # Additional modules
boot.kernelParams = [ "my-param=value" ];      # Additional kernel parameters
```

### Additional Packages

Add packages to the netboot environment:

```nix
# nix/images/netboot-base.nix
environment.systemPackages = with pkgs; [
  # ... existing packages ...

  # Your additions
  python3
  nodejs
  custom-tool
];
```

### Hardware-Specific Configuration

See `examples/hardware-specific.nix` for hardware-specific customizations.

## Troubleshooting

### Build Failures

**Symptom**: Build fails with Nix errors

**Solutions**:

1. Check build log: `cat artifacts/PROFILE/build.log`
2. Verify Nix flakes are enabled
3. Update nixpkgs: `nix flake update`
4. Free space in the Nix store (this also deletes old generations): `nix-collect-garbage -d`

### Missing Service Packages

**Symptom**: Error: "package not found"

**Solutions**:

1. Verify service is built: `nix build .#chainfire-server`
2. Check flake overlay: `nix flake show`
3. Rebuild all packages: `nix build .#default`

### Image Too Large

**Symptom**: Initrd > 500 MB

**Solutions**:

1. Remove unnecessary packages from `environment.systemPackages`
2. Disable documentation (already done in base config)
3. Try a different kernel package set, e.g. `boot.kernelPackages = pkgs.linuxPackages_latest_hardened`. Note that this changes the kernel's hardening options and is not guaranteed to shrink the image, so compare the initrd size before and after.
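
The 500 MB threshold above can be checked after each build with a short helper. This is a minimal sketch; `initrd_exceeds` is a hypothetical name, and the limit is interpreted as MiB.

```bash
# Sketch (hypothetical helper): succeed if FILE is larger than LIMIT_MB
# (default 500). Returns 2 if the file does not exist.
initrd_exceeds() {
  local file="$1"
  local limit_mb="${2:-500}"
  local bytes
  [ -f "$file" ] || return 2
  # Arithmetic expansion also strips any padding that wc adds to its output.
  bytes=$(( $(wc -c < "$file") ))
  [ "$bytes" -gt $((limit_mb * 1024 * 1024)) ]
}
```

For example, `initrd_exceeds artifacts/control-plane/initrd && echo "initrd is over 500 MB"`.
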

### PXE Boot Fails

**Symptom**: Server fails to boot netboot image

**Solutions**:

1. Verify artifacts are accessible via HTTP
2. Check iPXE script syntax
3. Verify kernel parameters in boot script
4. Check serial console output (ttyS0)
5. Ensure DHCP provides correct boot server IP

### SSH Access Issues

**Symptom**: Cannot SSH to netboot installer

**Solutions**:

1. Replace example SSH key in `nix/images/netboot-base.nix`
2. Verify network connectivity (DHCP, firewall)
3. Check SSH service is running: `systemctl status sshd`

## Configuration Reference

### Service Modules (T024 Integration)

All netboot profiles import PlasmaCloud service modules from `nix/modules/`:

- `chainfire.nix` - Chainfire configuration
- `flaredb.nix` - FlareDB configuration
- `iam.nix` - IAM configuration
- `plasmavmc.nix` - PlasmaVMC configuration
- `prismnet.nix` - PrismNET configuration
- `flashdns.nix` - FlashDNS configuration
- `fiberlb.nix` - FiberLB configuration
- `lightningstor.nix` - LightningStor configuration
- `k8shost.nix` - K8sHost configuration

Services are **disabled by default** in netboot images and enabled in final installed configurations.

### Netboot Base Configuration

Located at `nix/images/netboot-base.nix`, provides:

- SSH server with root access (key-based)
- Generic kernel with broad hardware support
- Disk management tools (disko, parted, cryptsetup, lvm2)
- Network tools (iproute2, curl, tcpdump)
- Serial console support (ttyS0, tty0)
- DHCP networking
- Minimal system configuration

### Profile Configurations

- `nix/images/netboot-control-plane.nix` - All 9 services
- `nix/images/netboot-worker.nix` - Compute services (PlasmaVMC, PrismNET)
- `nix/images/netboot-all-in-one.nix` - All services for single-node

## Security Considerations

### SSH Keys

**IMPORTANT**: The default SSH key in `netboot-base.nix` is an example placeholder. You MUST replace it with your actual provisioning key:

```nix
users.users.root.openssh.authorizedKeys.keys = [
  "ssh-ed25519 AAAAC3Nza... your-provisioning-key@host"
];
```

Generate a new key:

```bash
ssh-keygen -t ed25519 -C "provisioning@plasmacloud"
```

### Network Security

- Netboot images have **firewall disabled** for installation phase
- Use isolated provisioning VLAN for PXE boot
- Implement MAC address whitelist in DHCP
- Enable firewall in final installed configurations

### Secrets Management

- Do NOT embed secrets in netboot images
- Use nixos-anywhere to inject secrets during installation
- Store secrets in `/etc/nixos/secrets/` on installed systems
- Use proper file permissions (0400 for keys)
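
The permission rule above can be audited on an installed system with `find`. This is a minimal sketch; `list_loose_secrets` is a hypothetical helper, and it treats every regular file in the directory as a key that should be exactly 0400.

```bash
# Sketch (hypothetical helper): print every regular file under DIR whose
# permission bits are not exactly 0400. No output means all files comply.
list_loose_secrets() {
  local dir="$1"
  find "$dir" -type f ! -perm 400 -print
}
```

For example, `list_loose_secrets /etc/nixos/secrets` prints nothing when all files comply.
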

## Next Steps

After building images:

1. **Deploy to PXE Server**: Copy artifacts to HTTP server
2. **Configure DHCP/iPXE**: Set up boot infrastructure (see T032.S2)
3. **Prepare Node Configurations**: Create per-node configs for nixos-anywhere
4. **Test Boot Process**: Verify PXE boot on test hardware
5. **Run nixos-anywhere**: Install NixOS on target machines

## Resources

- **Design Document**: `docs/por/T032-baremetal-provisioning/design.md`
- **PXE Infrastructure**: `chainfire/baremetal/pxe-server/`
- **Service Modules**: `nix/modules/`
- **Example Configurations**: `baremetal/image-builder/examples/`

## Support

For issues or questions:

1. Check build logs: `artifacts/PROFILE/build.log`
2. Review design document: `docs/por/T032-baremetal-provisioning/design.md`
3. Examine example configurations: `examples/`
4. Verify service module configuration: `nix/modules/`

## License

Apache 2.0 - See LICENSE file for details