photoncloud-monorepo/docs/por/T032-baremetal-provisioning/task.yaml


id: T032
name: Bare-Metal Provisioning
goal: Implement Nix-based bare-metal provisioning for automated deployment from bare hardware to a fully operational platform.
status: complete
priority: P0
owner: peerB
created: 2025-12-10
completed: 2025-12-10
depends_on: [T024]
blocks: []
context: |
  PROJECT.md Item 10: "Bare-metal provisioning with Nix"
  T024 delivered NixOS packaging (flake + modules for all 8 services).
  This task enables automated deployment from bare metal to a running platform.
  Key capabilities needed:
  - PXE/iPXE network boot
  - NixOS image generation with pre-configured services
  - Declarative hardware configuration
  - Automated first-boot setup
acceptance:
  - Boot bare metal server via PXE/iPXE to NixOS installer
  - Generate deployable NixOS images with all platform services
  - Declarative configuration for hardware (disk partitioning, networking)
  - First-boot automation (Chainfire/FlareDB cluster join, IAM bootstrap)
  - Documentation for operator workflow
steps:
  - step: S1
    name: Research & Architecture
    done: Design doc covering PXE flow, image generation, config injection
    status: complete
    owner: peerB
    priority: P0
    completed: 2025-12-10
    notes: |
      COMPLETE 2025-12-10: Comprehensive design document created (1,553 lines)
      - docs/por/T032-baremetal-provisioning/design.md
      - Researched nixos-anywhere, disko, iPXE/PXE boot, kexec
      - Detailed architecture, boot flow, installation process
      - Integration with T024/T027/T031 (NixOS modules, config, TLS)
      - Code examples for DHCP, iPXE scripts, disko layouts
      - Open questions documented for S2-S5 implementation
  - step: S2
    name: PXE Boot Infrastructure
    done: iPXE server + DHCP config for network boot
    status: complete
    owner: peerB
    priority: P0
    completed: 2025-12-10
    notes: |
      COMPLETE 2025-12-10: Full PXE boot infrastructure (3,381+ lines, 13 files)
      - chainfire/baremetal/pxe-server/dhcp/dhcpd.conf (ISC DHCP with BIOS/UEFI detection)
      - chainfire/baremetal/pxe-server/ipxe/boot.ipxe (Boot menu with 3 profiles)
      - chainfire/baremetal/pxe-server/http/nginx.conf (HTTP server for boot assets)
      - chainfire/baremetal/pxe-server/nixos-module.nix (Declarative NixOS module)
      - chainfire/baremetal/pxe-server/setup.sh (Automated setup script)
      - Comprehensive docs: README.md, QUICKSTART.md, OVERVIEW.md, examples/
      Profiles implemented:
      - control-plane: All 8 services (chainfire, flaredb, plasmavmc, novanet, fiberlb, flashdns, lightningstor, k8shost)
      - worker: Compute-focused (plasmavmc, novanet)
      - all-in-one: Testing/homelab (all services on one node)
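
      Sketch of the BIOS/UEFI detection idea behind the DHCP config above (illustrative only;
      the shipped dhcpd.conf is not reproduced here). It assumes the split is made on DHCP
      option 93 (client system architecture, RFC 4578) and that the iPXE binaries keep their
      stock build names undionly.kpxe and ipxe.efi. boot_filename is a hypothetical helper,
      not part of the repository:

      ```python
      # Client architecture codes carried in DHCP option 93 (RFC 4578).
      BIOS_ARCH = {0}         # Intel x86PC, i.e. legacy BIOS PXE
      UEFI_X64_ARCH = {7, 9}  # EFI BC and EFI x86-64


      def boot_filename(arch_code: int) -> str:
          """Pick the iPXE binary to offer for a given option-93 code."""
          if arch_code in BIOS_ARCH:
              # Assumed name: stock iPXE build chainloaded from BIOS PXE ROMs.
              return "undionly.kpxe"
          if arch_code in UEFI_X64_ARCH:
              # Assumed name: stock iPXE build for x86-64 UEFI firmware.
              return "ipxe.efi"
          # Anything else (e.g. 32-bit EFI) is left unsupported in this sketch.
          raise ValueError(f"unsupported client architecture: {arch_code}")
      ```

      An ISC DHCP config would typically express the same branch with an if/else on the
      option value and a filename statement per branch.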
  - step: S3
    name: NixOS Image Builder
    done: Tool to generate bootable NixOS images with platform services
    status: complete
    owner: peerB
    priority: P0
    completed: 2025-12-10
    notes: |
      COMPLETE 2025-12-10: NixOS netboot image builder (2,911 lines, 9 files)
      - nix/images/netboot-base.nix (184L): Base config with SSH, disko, generic kernel
      - nix/images/netboot-control-plane.nix (177L): All 8 services
      - nix/images/netboot-worker.nix (133L): Compute-focused (plasmavmc, novanet)
      - nix/images/netboot-all-in-one.nix (267L): All services, single-node optimized
      - baremetal/image-builder/build-images.sh (389L, executable): Build automation
      - baremetal/image-builder/README.md (388L): User documentation
      - baremetal/image-builder/OVERVIEW.md (570L): Technical deep-dive
      - baremetal/image-builder/examples/custom-netboot.nix (361L): Customization examples
      - baremetal/image-builder/examples/hardware-specific.nix (442L): Platform-specific configs
      - flake.nix: Updated with nixosConfigurations for all 3 profiles
      Profiles:
      - control-plane: All 8 services, HA-ready
      - worker: VM compute workloads
      - all-in-one: Dev/test/edge deployments
      Integration: T024 service modules, S2 PXE infrastructure, automatic artifact deployment
  - step: S4
    name: First-Boot Automation
    done: Automated cluster join and service initialization
    status: complete
    owner: peerB
    priority: P1
    completed: 2025-12-10
    notes: |
      COMPLETE 2025-12-10: First-boot automation (2,564 lines, 9 files)
      - nix/modules/first-boot-automation.nix (402L): NixOS module with systemd services
      - baremetal/first-boot/cluster-join.sh (167L, executable): Reusable cluster join logic
      - baremetal/first-boot/health-check.sh (72L, executable): Health check wrapper
      - baremetal/first-boot/bootstrap-detector.sh (89L, executable): Bootstrap vs join detection
      - baremetal/first-boot/README.md (858L): Operator guide
      - baremetal/first-boot/ARCHITECTURE.md (763L): Technical deep-dive
      - baremetal/first-boot/examples/*.json (213L): Config examples (bootstrap, join, all-in-one)
      Systemd Services:
      - chainfire-cluster-join.service: Join Chainfire cluster (bootstrap or runtime)
      - flaredb-cluster-join.service: Join FlareDB cluster after Chainfire
      - iam-initial-setup.service: IAM initial admin setup
      - cluster-health-check.service: Validate all services healthy
      Features: Bootstrap detection, retry logic (5x10s), idempotency (marker files), structured logging (JSON)
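
      Sketch of how the retry and idempotency features above can fit together (illustrative
      only; the shipped logic lives in cluster-join.sh and is not reproduced here). It reads
      "5x10s" as five attempts spaced ten seconds apart, which is an assumption, and
      run_once_with_retry, its parameters, and the marker path are hypothetical names:

      ```python
      import time
      from pathlib import Path
      from typing import Callable


      def run_once_with_retry(
          action: Callable[[], bool],
          marker: Path,
          attempts: int = 5,
          delay_s: float = 10.0,
          sleep: Callable[[float], None] = time.sleep,
      ) -> bool:
          """Run `action` until it succeeds, at most `attempts` times.

          Returns True if the work is done, either now or on an earlier boot.
          """
          # Idempotency: a marker left by an earlier successful run means
          # there is nothing to do, so the action is not called again.
          if marker.exists():
              return True
          for attempt in range(1, attempts + 1):
              if action():
                  marker.parent.mkdir(parents=True, exist_ok=True)
                  marker.touch()
                  return True
              # Wait between attempts, but not after the final one.
              if attempt < attempts:
                  sleep(delay_s)
          return False
      ```

      The marker is written only after success, so a node that fails all attempts retries
      from scratch on the next boot, while a node that already joined skips the step.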
  - step: S5
    name: Operator Documentation
    done: Runbook for bare-metal deployment workflow
    status: complete
    owner: peerB
    priority: P1
    completed: 2025-12-10
    notes: |
      COMPLETE 2025-12-10: Comprehensive operator documentation (6,792 lines, 8 files)
      - RUNBOOK.md (2,178L): Complete operator guide (10 sections: overview, hardware, network, pre-deployment, deployment workflow, validation, operations, troubleshooting, recovery, security)
      - QUICKSTART.md (529L): Condensed 5-page guide for experienced operators
      - HARDWARE.md (898L): Tested hardware platforms (Dell, HPE, Supermicro, Lenovo), BIOS/UEFI config, BMC/IPMI reference
      - NETWORK.md (919L): Complete port matrix, DHCP options, DNS zones, firewall rules, VLAN guide
      - COMMANDS.md (922L): All commands organized by task (PXE, images, provisioning, cluster, service, health, BMC, diagnostics)
      - diagrams/deployment-flow.md (492L): End-to-end flow from bare metal to running cluster
      - diagrams/network-topology.md (362L): Physical and logical network layout
      - diagrams/service-dependencies.md (492L): Service startup order and dependencies
      Coverage: 6 deployment scenarios (bootstrap, join, all-in-one, replacement, rolling updates, disaster recovery)
      Cross-references: Complete integration with S1-S4 deliverables
evidence: []
notes: |
  **Reference implementations:**
  - nixos-anywhere: SSH-based remote NixOS installation
  - disko: Declarative disk partitioning
  - kexec: Fast kernel switch without full reboot
  **Priority rationale:**
  - S1-S3 P0: Core provisioning capability
  - S4-S5 P1: Automation and documentation
  **Integration with existing work:**
  - T024: NixOS flake + modules foundation
  - T027: TLS certificates and config unification
  - T031: Service TLS configuration