id: T039
name: Production Deployment (Bare-Metal)
goal: Deploy the full PlasmaCloud stack to target bare-metal environment using T032 provisioning tools and T036 learnings.
status: active
priority: P0
owner: peerA
depends_on: [T032, T036, T038]
blocks: []

context: |
  **MVP-Alpha Achieved: 12/12 components operational**

  With the application stack validated and provisioning tools proven (T032/T036), we now
  execute production deployment to bare-metal infrastructure.

  **Prerequisites:**
  - T032 (COMPLETE): PXE boot infra, NixOS image builder, first-boot automation (17,201L)
  - T036 (PARTIAL SUCCESS): VM validation proved infrastructure concepts
    - VDE networking validated L2 clustering
    - Custom netboot with SSH key auth validated zero-touch provisioning
    - Key learning: Full NixOS required (nix-copy-closure needs nix-daemon)
  - T038 (COMPLETE): Build chain working, all services compile

  **Key Insight from T036:**
  - nix-copy-closure requires nix on target → full NixOS deployment via nixos-anywhere
  - Custom netboot (minimal Linux) insufficient for nix-built services
  - T032's nixos-anywhere approach is architecturally correct

acceptance:
  - All target bare-metal nodes provisioned with NixOS
  - ChainFire + FlareDB Raft clusters formed (3-node quorum)
  - IAM service operational on all control-plane nodes
  - All 12 services deployed and healthy
  - T029/T035 integration tests passing on live cluster
  - Production deployment documented in runbook

steps:
  - step: S1
    name: Hardware Readiness Verification
    done: Target bare-metal hardware accessible and ready for provisioning (verified by T032 completion)
    status: complete
    completed: 2025-12-12 04:15 JST

  - step: S2
    name: Bootstrap Infrastructure
    done: PXE server or alternative boot mechanism operational
    status: pending
    owner: peerB
    priority: P0
    notes: |
      Options (based on T036 learnings):

      A. PXE Boot (T032 default):
         - Deploy PXE server with netboot artifacts
         - Configure DHCP for PXE boot
         - Test boot on first node

      B. Direct Boot (T036 validated):
         - Use custom netboot with SSH key baked in
         - Boot via IPMI/iLO virtual media or USB
         - Eliminates PXE server dependency

      Decision point: PeerA to select based on hardware capabilities

  - step: S3
    name: NixOS Provisioning
    done: All nodes provisioned with base NixOS via nixos-anywhere
    status: pending
    owner: peerB
    priority: P0
    notes: |
      For each node:
      1. Boot into installer environment (custom netboot or NixOS ISO)
      2. Verify SSH access
      3. Run nixos-anywhere with node-specific configuration:
         ```
         nixos-anywhere --flake .#node01 root@
         ```
      4. Wait for reboot and verify SSH access
      5. Confirm NixOS installed successfully

      Node configurations from T036 (adapt IPs for production):
      - docs/por/T036-vm-cluster-deployment/node01/
      - docs/por/T036-vm-cluster-deployment/node02/
      - docs/por/T036-vm-cluster-deployment/node03/

  - step: S4
    name: Service Deployment
    done: All 12 PlasmaCloud services deployed and running
    status: pending
    owner: peerB
    priority: P0
    notes: |
      Deploy services via NixOS modules (T024):
      - chainfire-server (cluster KVS)
      - flaredb-server (DBaaS KVS)
      - iam-server (aegis)
      - plasmavmc-server (VM infrastructure)
      - lightningstor-server (object storage)
      - flashdns-server (DNS)
      - fiberlb-server (load balancer)
      - novanet-server (overlay networking)
      - k8shost-server (K8s hosting)
      - metricstor-server (metrics)

      Service deployment is part of NixOS configuration in S3.
      This step verifies all services started successfully.

  - step: S5
    name: Cluster Formation
    done: Raft clusters operational (ChainFire + FlareDB)
    status: pending
    owner: peerB
    priority: P0
    notes: |
      Verify cluster formation:
      1. ChainFire:
         - 3 nodes joined
         - Leader elected
         - Health check passing
      2. FlareDB:
         - 3 nodes joined
         - Quorum formed
         - Read/write operations working
      3. IAM:
         - All nodes responding
         - Authentication working

  - step: S6
    name: Integration Testing
    done: T029/T035 integration tests passing on live cluster
    status: pending
    owner: peerA
    priority: P0
    notes: |
      Run existing integration tests against production cluster:
      - T029 practical application tests (VM+NovaNET, FlareDB+IAM, k8shost)
      - T035 build validation tests
      - Cross-component integration verification

      If tests fail:
      - Document failures
      - Create follow-up task for fixes
      - Do not proceed to production traffic until resolved

evidence: []

notes: |
  **T036 Learnings Applied:**
  - Use full NixOS deployment (not minimal netboot)
  - nixos-anywhere is the proven deployment path
  - Custom netboot with SSH key auth for zero-touch access
  - VDE networking concepts map to real L2 switches

  **Risk Mitigations:**
  - Hardware validation before deployment (S1)
  - Staged deployment (node-by-node)
  - Integration testing before production traffic (S6)
  - Rollback plan: Re-provision from scratch if needed
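
  **Provisioning Plan Sketch (illustrative):**
  The node-by-node rollout in S3 and the 3-node quorum checked in S5 could be planned with a small helper like the one below. This is a minimal sketch under stated assumptions, not part of the T032 tooling: the `NODES` addresses are placeholders from the 192.0.2.0/24 documentation range (the production hosts are not recorded in this file), and all function names are hypothetical. It only builds the command lines in the same shape as the S3 command and does not execute them.

  ```python
  import shlex

  # Hypothetical node -> address map. These are documentation-range
  # placeholders, NOT the production hosts (those are not given here).
  NODES = {
      "node01": "192.0.2.11",
      "node02": "192.0.2.12",
      "node03": "192.0.2.13",
  }


  def nixos_anywhere_cmd(node, host, flake="."):
      """Build the S3 provisioning command for one node as an argv list."""
      return ["nixos-anywhere", "--flake", f"{flake}#{node}", f"root@{host}"]


  def staged_plan(nodes):
      """Return one command per node, in insertion order (node-by-node)."""
      return [nixos_anywhere_cmd(name, host) for name, host in nodes.items()]


  def raft_quorum(members):
      """Majority of voters a Raft cluster of this size needs."""
      return members // 2 + 1


  if __name__ == "__main__":
      # Print the plan for review; an operator runs each command manually
      # and verifies SSH access before moving on to the next node.
      for argv in staged_plan(NODES):
          print(shlex.join(argv))
      print("voters needed for quorum:", raft_quorum(len(NODES)))
  ```

  Reviewing the printed plan and then running one command at a time keeps the staged-deployment mitigation intact: a failed node can be re-provisioned before the next one is touched.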