# PlasmaCloud NixOS Image Builder
This directory contains tools and configurations for building bootable NixOS netboot images for bare-metal provisioning of PlasmaCloud infrastructure.
## Overview
The NixOS Image Builder generates netboot images (kernel + initrd) that can be served via PXE/iPXE to provision bare-metal servers with PlasmaCloud services. These images integrate with the T024 NixOS service modules and the T032.S2 PXE boot infrastructure.
## Architecture
The image builder produces three deployment profiles:
### 1. Control Plane (`netboot-control-plane`)
Full control plane deployment with all 9 PlasmaCloud services:
- **Chainfire**: Distributed configuration and coordination
- **FlareDB**: Time-series metrics and events database
- **IAM**: Identity and access management
- **PlasmaVMC**: Virtual machine control plane
- **PrismNET**: Software-defined networking controller
- **FlashDNS**: High-performance DNS server
- **FiberLB**: Layer 4/7 load balancer
- **LightningStor**: Distributed block storage
- **K8sHost**: Kubernetes hosting component
**Use Cases**:
- Multi-node production clusters (3+ control plane nodes)
- High-availability deployments
- Separation of control and data planes
### 2. Worker (`netboot-worker`)
Compute-focused deployment for running tenant workloads:
- **PlasmaVMC**: Virtual machine control plane
- **PrismNET**: Software-defined networking
**Use Cases**:
- Worker nodes in multi-node clusters
- Dedicated compute capacity
- Scalable VM hosting
### 3. All-in-One (`netboot-all-in-one`)
Single-node deployment with all 9 services:
- All services from Control Plane profile
- Optimized for single-node operation
**Use Cases**:
- Development/testing environments
- Small deployments (1-3 nodes)
- Edge locations
- Proof-of-concept installations
## Prerequisites
### Build Environment
- **NixOS** or **Nix package manager** installed
- **Flakes** enabled in Nix configuration
- **Git** access to PlasmaCloud repository
- **Sufficient disk space**: ~10GB for build artifacts
### Enable Nix Flakes
If not already enabled, add to `/etc/nix/nix.conf` or `~/.config/nix/nix.conf`:
```
experimental-features = nix-command flakes
```
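To confirm the setting took effect, a small check along these lines may help. This is a minimal sketch: `flakes_enabled` is a hypothetical helper, it inspects a single file only, and it assumes the feature names are separated by spaces.

```bash
# Sketch: succeed if the given nix.conf enables both nix-command and flakes.
# flakes_enabled is a hypothetical helper, not part of the image builder.
flakes_enabled() {
  local conf="$1" line
  [ -r "$conf" ] || return 1
  # Use the last matching experimental-features line in the file.
  line="$(grep -E '^[[:space:]]*experimental-features[[:space:]]*=' "$conf" | tail -n 1)"
  # Pad the value with spaces so each feature can be matched as a whole word.
  case " ${line#*=} " in
    *" nix-command "*) ;;
    *) return 1 ;;
  esac
  case " ${line#*=} " in
    *" flakes "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `flakes_enabled /etc/nix/nix.conf && echo "flakes enabled"` prints a message only when both features are listed.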
### Build Dependencies
The build process automatically handles all dependencies, but ensure you have:
- Working internet connection (for Nix binary cache)
- ~4GB RAM minimum
- ~10GB free disk space
## Build Instructions
### Quick Start
Build all profiles:
```bash
cd baremetal/image-builder  # run from the repository root
./build-images.sh
```
Build a specific profile:
```bash
# Control plane only
./build-images.sh --profile control-plane
# Worker nodes only
./build-images.sh --profile worker
# All-in-one deployment
./build-images.sh --profile all-in-one
```
Custom output directory:
```bash
./build-images.sh --output-dir /srv/pxe/images
```
### Build Output
Each profile generates:
- `bzImage` - Linux kernel (~10-30 MB)
- `initrd` - Initial ramdisk (~100-300 MB)
- `netboot.ipxe` - iPXE boot script
- `build.log` - Build log for troubleshooting
Artifacts are placed in:
```
./artifacts/
├── control-plane/
│   ├── bzImage
│   ├── initrd
│   ├── netboot.ipxe
│   └── build.log
├── worker/
│   ├── bzImage
│   ├── initrd
│   ├── netboot.ipxe
│   └── build.log
└── all-in-one/
    ├── bzImage
    ├── initrd
    ├── netboot.ipxe
    └── build.log
```
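Before deploying, it can be worth confirming that each profile directory contains all four files. The following is a minimal sketch: `check_artifacts` is a hypothetical helper, and only the file names are taken from the list above.

```bash
# Sketch: verify that a profile directory holds the four expected artifacts.
# check_artifacts is a hypothetical helper; file names come from this README.
check_artifacts() {
  local dir="$1" missing=0 f
  for f in bzImage initrd netboot.ipxe build.log; do
    # -s is true only for files that exist and are not empty.
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A loop such as `for p in control-plane worker all-in-one; do check_artifacts "./artifacts/$p"; done` covers all three profiles.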
### Manual Build Commands
You can also build images directly with Nix:
```bash
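# Build initrd (use a separate out-link so the next build does not replace it)
nix build .#nixosConfigurations.netboot-control-plane.config.system.build.netbootRamdisk -o result-initrd
# Build kernel
nix build .#nixosConfigurations.netboot-control-plane.config.system.build.kernel -o result-kernel
# Access artifacts
ls -lh result-initrd/ result-kernel/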
```
## Deployment
### Integration with PXE Server (T032.S2)
The build script automatically copies artifacts to the PXE server directory if it exists:
```
chainfire/baremetal/pxe-server/assets/nixos/
├── control-plane/
├── worker/
├── all-in-one/
├── bzImage-control-plane -> control-plane/bzImage
├── initrd-control-plane -> control-plane/initrd
├── bzImage-worker -> worker/bzImage
└── initrd-worker -> worker/initrd
```
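If the symlinks need to be recreated by hand, something like the following would produce the layout above. It is a sketch under assumptions: `link_profile` is a hypothetical helper, and `build-images.sh` may create the links differently.

```bash
# Sketch: recreate the flat per-profile symlinks inside an assets directory.
# link_profile is a hypothetical helper; build-images.sh may work differently.
link_profile() {
  local assets="$1" profile="$2"
  # Relative targets keep the links valid if the assets directory is moved.
  ln -sfn "$profile/bzImage" "$assets/bzImage-$profile"
  ln -sfn "$profile/initrd" "$assets/initrd-$profile"
}
```

For example, `link_profile chainfire/baremetal/pxe-server/assets/nixos worker` creates `bzImage-worker` and `initrd-worker`.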
### Manual Deployment
Copy artifacts to your PXE/HTTP server:
```bash
# Example: Deploy to nginx serving directory
sudo cp -r ./artifacts/control-plane /srv/pxe/nixos/
sudo cp -r ./artifacts/worker /srv/pxe/nixos/
sudo cp -r ./artifacts/all-in-one /srv/pxe/nixos/
```
### iPXE Boot Configuration
Reference the images in your iPXE boot script:
```ipxe
#!ipxe
set boot-server 10.0.0.2:8080
# The kernel does not expand wildcards. Replace <system-path> with the store
# path from the init= argument in each profile's generated netboot.ipxe.
:control-plane
kernel http://${boot-server}/nixos/control-plane/bzImage init=<system-path>/init console=ttyS0,115200 console=tty0 loglevel=4
initrd http://${boot-server}/nixos/control-plane/initrd
boot
:worker
kernel http://${boot-server}/nixos/worker/bzImage init=<system-path>/init console=ttyS0,115200 console=tty0 loglevel=4
initrd http://${boot-server}/nixos/worker/initrd
boot
```
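The `init=` argument has to name one concrete store path, because the kernel command line is passed through literally. Assuming the generated `netboot.ipxe` carries an `init=` argument (stock NixOS netboot scripts do, but this has not been checked against this repository), the path could be extracted as sketched here. `extract_init_path` is a hypothetical helper.

```bash
# Sketch: print the init= store path found in a generated iPXE script.
# extract_init_path is a hypothetical helper.
extract_init_path() {
  local script="$1"
  # grep -o prints only the matching token; head keeps the first match;
  # cut drops the leading "init=".
  grep -o 'init=/nix/store/[^[:space:]]*' "$script" | head -n 1 | cut -d= -f2-
}
```

For example, `extract_init_path artifacts/worker/netboot.ipxe` prints the path to paste into a hand-written boot script.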
## Customization
### Adding Services
To add a service to a profile, edit the corresponding configuration:
```nix
# nix/images/netboot-control-plane.nix
environment.systemPackages = with pkgs; [
  chainfire-server
  flaredb-server
  # ... existing services ...
  my-custom-service # Add your service
];
```
### Custom Kernel Configuration
Modify `nix/images/netboot-base.nix`:
```nix
boot.kernelPackages = pkgs.linuxPackages_6_6; # Specific kernel version
boot.kernelModules = [ "my-driver" ]; # Additional modules
boot.kernelParams = [ "my-param=value" ]; # Additional kernel parameters
```
### Additional Packages
Add packages to the netboot environment:
```nix
# nix/images/netboot-base.nix
environment.systemPackages = with pkgs; [
  # ... existing packages ...
  # Your additions
  python3
  nodejs
  custom-tool
];
```
### Hardware-Specific Configuration
See `examples/hardware-specific.nix` for hardware-specific customizations.
## Troubleshooting
### Build Failures
**Symptom**: Build fails with Nix errors
**Solutions**:
1. Check build log: `cat artifacts/PROFILE/build.log`
2. Verify Nix flakes are enabled
3. Update nixpkgs: `nix flake update`
4. Free space in the Nix store: `nix-collect-garbage -d` (this also deletes old profile generations)
### Missing Service Packages
**Symptom**: Error: "package not found"
**Solutions**:
1. Verify service is built: `nix build .#chainfire-server`
2. Check that the package appears in the flake outputs: `nix flake show`
3. Rebuild all packages: `nix build .#default`
### Image Too Large
**Symptom**: Initrd > 500 MB
**Solutions**:
1. Remove unnecessary packages from `environment.systemPackages`
2. Disable documentation (already done in base config)
3. Trim kernel modules and firmware you do not need. Note that `boot.kernelPackages = pkgs.linuxPackages_latest_hardened` selects a hardened kernel, not a smaller one
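A size check can be scripted so that oversized images are caught right after a build. This is a minimal sketch: `initrd_too_large` is a hypothetical helper, the 500 MB default mirrors the symptom above, and 1 MB is treated as 1024 * 1024 bytes.

```bash
# Sketch: succeed if a file is larger than the given limit in MB.
# initrd_too_large is a hypothetical helper; the default limit is 500 MB.
initrd_too_large() {
  local file="$1" limit_mb="${2:-500}" bytes
  # Arithmetic expansion strips any padding some wc implementations add.
  bytes=$(( $(wc -c < "$file") ))
  [ "$bytes" -gt $(( limit_mb * 1024 * 1024 )) ]
}
```

For example, `initrd_too_large artifacts/control-plane/initrd && echo "initrd exceeds 500 MB"`.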
### PXE Boot Fails
**Symptom**: Server fails to boot netboot image
**Solutions**:
1. Verify artifacts are accessible via HTTP
2. Check iPXE script syntax
3. Verify kernel parameters in boot script
4. Check serial console output (ttyS0)
5. Ensure DHCP provides correct boot server IP
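The first check can be scripted. The sketch below assumes the `/nixos/<profile>/` URL layout used in the iPXE example in this README. Both `artifact_urls` and `check_http` are hypothetical helpers.

```bash
# Sketch: list the URLs a node fetches for one profile.
# artifact_urls and check_http are hypothetical helpers; the URL layout
# follows the iPXE example in this README.
artifact_urls() {
  local boot_server="$1" profile="$2" f
  for f in bzImage initrd; do
    printf 'http://%s/nixos/%s/%s\n' "$boot_server" "$profile" "$f"
  done
}

# Probe each URL with a HEAD request; -f makes curl fail on HTTP errors.
check_http() {
  local url
  artifact_urls "$1" "$2" | while read -r url; do
    curl -fsSI "$url" > /dev/null || echo "unreachable: $url" >&2
  done
}
```

For example, `check_http 10.0.0.2:8080 control-plane` reports any artifact the boot server does not serve.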
### SSH Access Issues
**Symptom**: Cannot SSH to netboot installer
**Solutions**:
1. Replace example SSH key in `nix/images/netboot-base.nix`
2. Verify network connectivity (DHCP, firewall)
3. Check SSH service is running: `systemctl status sshd`
## Configuration Reference
### Service Modules (T024 Integration)
All netboot profiles import PlasmaCloud service modules from `nix/modules/`:
- `chainfire.nix` - Chainfire configuration
- `flaredb.nix` - FlareDB configuration
- `iam.nix` - IAM configuration
- `plasmavmc.nix` - PlasmaVMC configuration
- `prismnet.nix` - PrismNET configuration
- `flashdns.nix` - FlashDNS configuration
- `fiberlb.nix` - FiberLB configuration
- `lightningstor.nix` - LightningStor configuration
- `k8shost.nix` - K8sHost configuration
Services are **disabled by default** in netboot images and enabled in final installed configurations.
### Netboot Base Configuration
Located at `nix/images/netboot-base.nix`, provides:
- SSH server with root access (key-based)
- Generic kernel with broad hardware support
- Disk management tools (disko, parted, cryptsetup, lvm2)
- Network tools (iproute2, curl, tcpdump)
- Serial console support (ttyS0, tty0)
- DHCP networking
- Minimal system configuration
### Profile Configurations
- `nix/images/netboot-control-plane.nix` - All 9 services
- `nix/images/netboot-worker.nix` - Compute services (PlasmaVMC, PrismNET)
- `nix/images/netboot-all-in-one.nix` - All services for single-node
## Security Considerations
### SSH Keys
**IMPORTANT**: The default SSH key in `netboot-base.nix` is an example placeholder. You MUST replace it with your actual provisioning key:
```nix
users.users.root.openssh.authorizedKeys.keys = [
  "ssh-ed25519 AAAAC3Nza... your-provisioning-key@host"
];
```
Generate a new key:
```bash
ssh-keygen -t ed25519 -C "provisioning@plasmacloud"
```
### Network Security
- Netboot images have **firewall disabled** for installation phase
- Use isolated provisioning VLAN for PXE boot
- Implement MAC address whitelist in DHCP
- Enable firewall in final installed configurations
### Secrets Management
- Do NOT embed secrets in netboot images
- Use nixos-anywhere to inject secrets during installation
- Store secrets in `/etc/nixos/secrets/` on installed systems
- Use proper file permissions (0400 for keys)
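The permission guidance can be applied when staging files for installation. This is a minimal sketch: `install_secret` is a hypothetical helper, and the destination directory is whatever staging path you choose to mirror `/etc/nixos/secrets/`.

```bash
# Sketch: copy a secret into a directory with owner-read-only permissions.
# install_secret is a hypothetical helper; the destination is caller-supplied.
install_secret() {
  local src="$1" dest_dir="$2"
  mkdir -p "$dest_dir"
  # Restrict the directory itself to its owner.
  chmod 0700 "$dest_dir"
  # install copies the file and sets mode 0400 in one step.
  install -m 0400 "$src" "$dest_dir/$(basename "$src")"
}
```

For example, `install_secret ./node01.key ./staging/etc/nixos/secrets` leaves a key that only its owner can read.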
## Next Steps
After building images:
1. **Deploy to PXE Server**: Copy artifacts to HTTP server
2. **Configure DHCP/iPXE**: Set up boot infrastructure (see T032.S2)
3. **Prepare Node Configurations**: Create per-node configs for nixos-anywhere
4. **Test Boot Process**: Verify PXE boot on test hardware
5. **Run nixos-anywhere**: Install NixOS on target machines
## Resources
- **Design Document**: `docs/por/T032-baremetal-provisioning/design.md`
- **PXE Infrastructure**: `chainfire/baremetal/pxe-server/`
- **Service Modules**: `nix/modules/`
- **Example Configurations**: `baremetal/image-builder/examples/`
## Support
For issues or questions:
1. Check build logs: `artifacts/PROFILE/build.log`
2. Review design document: `docs/por/T032-baremetal-provisioning/design.md`
3. Examine example configurations: `examples/`
4. Verify service module configuration: `nix/modules/`
## License
Apache 2.0 - See LICENSE file for details