id: T032 name: Bare-Metal Provisioning goal: Implement Nix-based bare-metal provisioning for automated deployment from bare hardware to fully operational platform. status: complete priority: P0 owner: peerB created: 2025-12-10 completed: 2025-12-10 depends_on: [T024] blocks: [] context: | PROJECT.md Item 10: "Nixによるベアメタルプロビジョニング" T024 delivered NixOS packaging (flake + modules for all 8 services). This task enables automated deployment from bare metal to running platform. Key capabilities needed: - PXE/iPXE network boot - NixOS image generation with pre-configured services - Declarative hardware configuration - Automated first-boot setup acceptance: - Boot bare metal server via PXE/iPXE to NixOS installer - Generate deployable NixOS images with all platform services - Declarative configuration for hardware (disk partitioning, networking) - First-boot automation (Chainfire/FlareDB cluster join, IAM bootstrap) - Documentation for operator workflow steps: - step: S1 name: Research & Architecture done: Design doc covering PXE flow, image generation, config injection status: complete owner: peerB priority: P0 completed: 2025-12-10 notes: | COMPLETE 2025-12-10: Comprehensive design document created (1,553 lines) - docs/por/T032-baremetal-provisioning/design.md - Researched nixos-anywhere, disko, iPXE/PXE boot, kexec - Detailed architecture, boot flow, installation process - Integration with T024/T027/T031 (NixOS modules, config, TLS) - Code examples for DHCP, iPXE scripts, disko layouts - Open questions documented for S2-S5 implementation - step: S2 name: PXE Boot Infrastructure done: iPXE server + DHCP config for network boot status: complete owner: peerB priority: P0 completed: 2025-12-10 notes: | COMPLETE 2025-12-10: Full PXE boot infrastructure (3,381+ lines, 13 files) - chainfire/baremetal/pxe-server/dhcp/dhcpd.conf (ISC DHCP with BIOS/UEFI detection) - chainfire/baremetal/pxe-server/ipxe/boot.ipxe (Boot menu with 3 profiles) - chainfire/baremetal/pxe-server/http/nginx.conf (HTTP server for boot assets) - chainfire/baremetal/pxe-server/nixos-module.nix (Declarative NixOS module) - chainfire/baremetal/pxe-server/setup.sh (Automated setup script) - Comprehensive docs: README.md, QUICKSTART.md, OVERVIEW.md, examples/ Profiles implemented: - control-plane: All 8 services (chainfire, flaredb, plasmavmc, novanet, fiberlb, flashdns, lightningstor, k8shost) - worker: Compute-focused (plasmavmc, novanet) - all-in-one: Testing/homelab (all services on one node) - step: S3 name: NixOS Image Builder done: Tool to generate bootable NixOS images with platform services status: complete owner: peerB priority: P0 completed: 2025-12-10 notes: | COMPLETE 2025-12-10: NixOS netboot image builder (2,911 lines, 9 files) - nix/images/netboot-base.nix (184L): Base config with SSH, disko, generic kernel - nix/images/netboot-control-plane.nix (177L): All 8 services - nix/images/netboot-worker.nix (133L): Compute-focused (plasmavmc, novanet) - nix/images/netboot-all-in-one.nix (267L): All services, single-node optimized - baremetal/image-builder/build-images.sh (389L, executable): Build automation - baremetal/image-builder/README.md (388L): User documentation - baremetal/image-builder/OVERVIEW.md (570L): Technical deep-dive - baremetal/image-builder/examples/custom-netboot.nix (361L): Customization examples - baremetal/image-builder/examples/hardware-specific.nix (442L): Platform-specific configs - flake.nix: Updated with nixosConfigurations for all 3 profiles Profiles: - control-plane: All 8 services, HA-ready - worker: VM compute workloads - all-in-one: Dev/test/edge deployments Integration: T024 service modules, S2 PXE infrastructure, automatic artifact deployment - step: S4 name: First-Boot Automation done: Automated cluster join and service initialization status: complete owner: peerB priority: P1 completed: 2025-12-10 notes: | COMPLETE 2025-12-10: First-boot automation (2,564 lines, 9 files) - nix/modules/first-boot-automation.nix (402L): NixOS module with systemd services - baremetal/first-boot/cluster-join.sh (167L, executable): Reusable cluster join logic - baremetal/first-boot/health-check.sh (72L, executable): Health check wrapper - baremetal/first-boot/bootstrap-detector.sh (89L, executable): Bootstrap vs join detection - baremetal/first-boot/README.md (858L): Operator guide - baremetal/first-boot/ARCHITECTURE.md (763L): Technical deep-dive - baremetal/first-boot/examples/*.json (213L): Config examples (bootstrap, join, all-in-one) Systemd Services: - chainfire-cluster-join.service: Join Chainfire cluster (bootstrap or runtime) - flaredb-cluster-join.service: Join FlareDB cluster after Chainfire - iam-initial-setup.service: IAM initial admin setup - cluster-health-check.service: Validate all services healthy Features: Bootstrap detection, retry logic (5x10s), idempotency (marker files), structured logging (JSON) - step: S5 name: Operator Documentation done: Runbook for bare-metal deployment workflow status: complete owner: peerB priority: P1 completed: 2025-12-10 notes: | COMPLETE 2025-12-10: Comprehensive operator documentation (6,792 lines, 8 files) - RUNBOOK.md (2,178L): Complete operator guide (10 sections: overview, hardware, network, pre-deployment, deployment workflow, validation, operations, troubleshooting, recovery, security) - QUICKSTART.md (529L): Condensed 5-page guide for experienced operators - HARDWARE.md (898L): Tested hardware platforms (Dell, HPE, Supermicro, Lenovo), BIOS/UEFI config, BMC/IPMI reference - NETWORK.md (919L): Complete port matrix, DHCP options, DNS zones, firewall rules, VLAN guide - COMMANDS.md (922L): All commands organized by task (PXE, images, provisioning, cluster, service, health, BMC, diagnostics) - diagrams/deployment-flow.md (492L): End-to-end flow from bare metal to running cluster - diagrams/network-topology.md (362L): Physical and logical network layout - diagrams/service-dependencies.md (492L): Service startup order and dependencies Coverage: 6 deployment scenarios (bootstrap, join, all-in-one, replacement, rolling updates, disaster recovery) Cross-references: Complete integration with S1-S4 deliverables evidence: [] notes: | **Reference implementations:** - nixos-anywhere: SSH-based remote NixOS installation - disko: Declarative disk partitioning - kexec: Fast kernel switch without full reboot **Priority rationale:** - S1-S3 P0: Core provisioning capability - S4-S5 P1: Automation and documentation **Integration with existing work:** - T024: NixOS flake + modules foundation - T027: TLS certificates and config unification - T031: Service TLS configuration