photoncloud-monorepo/docs/hardware-bringup.md

135 lines
5.3 KiB
Markdown

# Hardware Bring-Up
This document is the operator bridge between the canonical QEMU ISO proof and a real USB or BMC/Redfish install smoke.
## Canonical entrypoint
```bash
nix run ./nix/test-cluster#hardware-smoke -- preflight
nix run ./nix/test-cluster#hardware-smoke -- run
nix run ./nix/test-cluster#hardware-smoke -- capture
```
The wrapper always writes artifacts under `./work/hardware-smoke/<run-id>` and refreshes `./work/hardware-smoke/latest`.
## What it fixes
- kernel parameters are emitted once in `kernel-params.txt`
- expected success markers are emitted once in `expected-markers.txt`
- failure markers are emitted once in `failure-markers.txt`
- operator instructions are emitted once in `operator-handoff.md`
- missing transport inputs are emitted once in `missing-requirements.txt`
When transport is absent, `preflight` exits successfully but records `status=blocked` in `status.env`.
## Shared ISO contract
The physical-node wrapper uses the same ISO attr and the same success markers as the QEMU proof:
- ISO attr: `.#nixosConfigurations.ultracloud-iso.config.system.build.isoImage`
- QEMU proof: `nix run ./nix/test-cluster#cluster -- baremetal-iso`
- exact local-KVM proof: `nix build .#checks.x86_64-linux.baremetal-iso-e2e && ./result/bin/baremetal-iso-e2e`
The bridge is intentional: QEMU stands in for the chassis only. The install sequence stays `phone-home -> bundle download -> Disko -> reboot -> post-install boot -> desired-system active`.
## Required kernel parameters
`hardware-smoke.sh` writes the exact kernel parameter set to `kernel-params.txt`:
- `ultracloud.deployer_url=<scheme://host:port>`
- `ultracloud.bootstrap_token=<token>` or a deliberate unauthenticated lab deployer with `ULTRACLOUD_HARDWARE_ALLOW_UNAUTHENTICATED=1`
- optional `ultracloud.ca_cert_url=<https://.../ca.crt>`
- optional `ultracloud.binary_cache_url=<http://cache:8090>`
- optional `ultracloud.node_id=<node-id>`
- optional `ultracloud.hostname=<hostname>`
## Expected success markers
The wrapper records the canonical marker list in `expected-markers.txt`:
- `ULTRACLOUD_MARKER pre-install.boot.<node-id>`
- `ULTRACLOUD_MARKER pre-install.phone-home.complete.<node-id>`
- `ULTRACLOUD_MARKER install.bundle-downloaded.<node-id>`
- `ULTRACLOUD_MARKER install.disko.complete.<node-id>`
- `ULTRACLOUD_MARKER install.nixos-install.complete.<node-id>`
- `ULTRACLOUD_MARKER reboot.<node-id>`
- `ULTRACLOUD_MARKER post-install.boot.<node-id>.<role>`
- `ULTRACLOUD_MARKER desired-system-active.<node-id>`
The wrapper also expects `nix-agent.service` to be active after install, and `chainfire.service` to be active when the node role is `control-plane`.
## USB path
Provide:
- `ULTRACLOUD_HARDWARE_TRANSPORT=usb`
- `ULTRACLOUD_HARDWARE_USB_DEVICE=/dev/sdX`
- `ULTRACLOUD_HARDWARE_ALLOW_DESTRUCTIVE=YES`
- `ULTRACLOUD_HARDWARE_DEPLOYER_URL=...`
- `ULTRACLOUD_HARDWARE_BOOTSTRAP_TOKEN=...` or `ULTRACLOUD_HARDWARE_ALLOW_UNAUTHENTICATED=1`
- `ULTRACLOUD_HARDWARE_SSH_HOST=...` or `ULTRACLOUD_HARDWARE_SERIAL_LOG=...`
Example:
```bash
ULTRACLOUD_HARDWARE_TRANSPORT=usb \
ULTRACLOUD_HARDWARE_USB_DEVICE=/dev/sdX \
ULTRACLOUD_HARDWARE_ALLOW_DESTRUCTIVE=YES \
ULTRACLOUD_HARDWARE_DEPLOYER_URL=http://10.0.0.10:8088 \
ULTRACLOUD_HARDWARE_BOOTSTRAP_TOKEN=lab-bootstrap-token \
ULTRACLOUD_HARDWARE_SSH_HOST=10.0.0.21 \
nix run ./nix/test-cluster#hardware-smoke -- run
```
## BMC / Redfish virtual media path
Provide:
- `ULTRACLOUD_HARDWARE_TRANSPORT=redfish` or `bmc`
- `ULTRACLOUD_HARDWARE_REDFISH_ENDPOINT=https://bmc.example`
- `ULTRACLOUD_HARDWARE_REDFISH_USERNAME=...`
- `ULTRACLOUD_HARDWARE_REDFISH_PASSWORD=...`
- `ULTRACLOUD_HARDWARE_ISO_URL=https://http-server/ultracloud-bootstrap.iso`
- optional `ULTRACLOUD_HARDWARE_REDFISH_SYSTEM_ID=System.Embedded.1`
- optional `ULTRACLOUD_HARDWARE_REDFISH_MANAGER_ID=iDRAC.Embedded.1`
- optional `ULTRACLOUD_HARDWARE_REDFISH_VIRTUAL_MEDIA_ID=CD`
- `ULTRACLOUD_HARDWARE_DEPLOYER_URL=...`
- `ULTRACLOUD_HARDWARE_BOOTSTRAP_TOKEN=...` or `ULTRACLOUD_HARDWARE_ALLOW_UNAUTHENTICATED=1`
- `ULTRACLOUD_HARDWARE_SSH_HOST=...` or `ULTRACLOUD_HARDWARE_SERIAL_LOG=...`
Example:
```bash
ULTRACLOUD_HARDWARE_TRANSPORT=redfish \
ULTRACLOUD_HARDWARE_REDFISH_ENDPOINT=https://bmc.example \
ULTRACLOUD_HARDWARE_REDFISH_USERNAME=admin \
ULTRACLOUD_HARDWARE_REDFISH_PASSWORD=secret \
ULTRACLOUD_HARDWARE_ISO_URL=https://mirror.example/ultracloud-bootstrap.iso \
ULTRACLOUD_HARDWARE_DEPLOYER_URL=http://10.0.0.10:8088 \
ULTRACLOUD_HARDWARE_BOOTSTRAP_TOKEN=lab-bootstrap-token \
ULTRACLOUD_HARDWARE_SSH_HOST=10.0.0.21 \
nix run ./nix/test-cluster#hardware-smoke -- run
```
## Capture-only mode
If the transport action is manual, keep the same proof root and collect the success evidence later:
```bash
ULTRACLOUD_HARDWARE_PROOF_ROOT=./work/hardware-smoke/latest \
ULTRACLOUD_HARDWARE_SSH_HOST=10.0.0.21 \
nix run ./nix/test-cluster#hardware-smoke -- capture
```
## Failure and blocked behavior
`preflight` records `status=blocked` when any of these are missing:
- transport device or BMC/Redfish endpoint
- deployer URL
- bootstrap token or explicit unauthenticated acknowledgement
- USB destructive acknowledgement
- BMC/Redfish ISO URL
- capture channel for `desired-system active`
That blocked state is intentional. It means the repo is ready for a physical-node run, but the local session still lacks the external transport or credentials needed to execute it.