Hardware Bring-Up

This document is the operator's bridge between the canonical QEMU ISO proof and a real install smoke test over USB or BMC/Redfish.

Canonical entrypoint

nix run ./nix/test-cluster#hardware-smoke -- preflight
nix run ./nix/test-cluster#hardware-smoke -- run
nix run ./nix/test-cluster#hardware-smoke -- capture

The wrapper always writes artifacts under ./work/hardware-smoke/<run-id> and refreshes ./work/hardware-smoke/latest.

What it fixes

The wrapper pins each of the following to a single artifact file:

  • kernel parameters are emitted once in kernel-params.txt
  • expected success markers are emitted once in expected-markers.txt
  • failure markers are emitted once in failure-markers.txt
  • operator instructions are emitted once in operator-handoff.md
  • missing transport inputs are emitted once in missing-requirements.txt

When transport is absent, preflight exits successfully but records status=blocked in status.env.
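Because preflight exits successfully even when blocked, a script that gates on it has to read status.env rather than the exit code. The following is a minimal sketch, assuming status.env holds unquoted KEY=VALUE lines and sits in the run directory (for example ./work/hardware-smoke/latest/status.env). The helper names read_status and is_blocked are hypothetical and not part of the wrapper.

```shell
# Sketch only: read_status and is_blocked are hypothetical helpers.
# Assumption: status.env contains unquoted KEY=VALUE lines such as "status=blocked".

# Print the value of the last "status=" line in the given file (empty if none).
read_status() {
  sed -n 's/^status=//p' "$1" | tail -n 1
}

# Succeed only when the recorded status is "blocked".
is_blocked() {
  [ "$(read_status "$1")" = "blocked" ]
}
```

A caller could then run preflight and treat is_blocked ./work/hardware-smoke/latest/status.env as "skip the physical run" instead of as a failure.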

Shared ISO contract

The physical-node wrapper uses the same ISO attr and the same success markers as the QEMU proof:

  • ISO attr: .#nixosConfigurations.ultracloud-iso.config.system.build.isoImage
  • QEMU proof: nix run ./nix/test-cluster#cluster -- baremetal-iso
  • exact local-KVM proof: nix build .#checks.x86_64-linux.baremetal-iso-e2e && ./result/bin/baremetal-iso-e2e

The bridge is intentional: QEMU stands in for the chassis only. The install sequence is identical on both paths: phone-home -> bundle download -> Disko -> reboot -> post-install boot -> desired-system active.

Required kernel parameters

hardware-smoke.sh writes the exact kernel parameter set to kernel-params.txt:

  • ultracloud.deployer_url=<scheme://host:port>
  • ultracloud.bootstrap_token=<token>, or no token when the target is a deliberately unauthenticated lab deployer and ULTRACLOUD_HARDWARE_ALLOW_UNAUTHENTICATED=1 is set
  • optional ultracloud.ca_cert_url=<https://.../ca.crt>
  • optional ultracloud.binary_cache_url=<http://cache:8090>
  • optional ultracloud.node_id=<node-id>
  • optional ultracloud.hostname=<hostname>
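To illustrate how these parameters combine on a kernel command line, here is a small sketch. It assumes the parameters are joined with single spaces, as the Linux kernel command line expects; the exact layout of kernel-params.txt is not specified here and may differ. build_kernel_params is a hypothetical helper, not a function of hardware-smoke.sh, and it covers only three of the parameters for brevity.

```shell
# Sketch only: build_kernel_params is a hypothetical helper.
# Usage: build_kernel_params DEPLOYER_URL [BOOTSTRAP_TOKEN] [NODE_ID]
# Prints one space-separated line of ultracloud.* kernel parameters.
build_kernel_params() {
  local deployer_url="$1" token="${2:-}" node_id="${3:-}"
  local params="ultracloud.deployer_url=${deployer_url}"
  # The token is omitted for an unauthenticated lab deployer.
  if [ -n "$token" ]; then
    params="${params} ultracloud.bootstrap_token=${token}"
  fi
  # node_id is optional.
  if [ -n "$node_id" ]; then
    params="${params} ultracloud.node_id=${node_id}"
  fi
  printf '%s\n' "$params"
}
```

In practice the values in kernel-params.txt remain the source of truth; the sketch only shows the shape of the resulting line.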

Expected success markers

The wrapper records the canonical marker list in expected-markers.txt:

  • ULTRACLOUD_MARKER pre-install.boot.<node-id>
  • ULTRACLOUD_MARKER pre-install.phone-home.complete.<node-id>
  • ULTRACLOUD_MARKER install.bundle-downloaded.<node-id>
  • ULTRACLOUD_MARKER install.disko.complete.<node-id>
  • ULTRACLOUD_MARKER install.nixos-install.complete.<node-id>
  • ULTRACLOUD_MARKER reboot.<node-id>
  • ULTRACLOUD_MARKER post-install.boot.<node-id>.<role>
  • ULTRACLOUD_MARKER desired-system-active.<node-id>

The wrapper also expects nix-agent.service to be active after install, and chainfire.service to be active when the node role is control-plane.
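When reviewing a serial log by hand, the marker list above can be checked with a short script. This is a sketch under stated assumptions: each marker appears verbatim in the log as "ULTRACLOUD_MARKER <name>", and missing_markers is a hypothetical helper rather than the wrapper's own capture logic.

```shell
# Sketch only: missing_markers is a hypothetical helper, not part of the wrapper.
# Usage: missing_markers LOG NODE_ID ROLE
# Prints every expected marker that does not appear in LOG.
# Returns 0 when all markers are present, 1 otherwise.
missing_markers() {
  local log="$1" node="$2" role="$3"
  local marker missing=0
  for marker in \
    "pre-install.boot.${node}" \
    "pre-install.phone-home.complete.${node}" \
    "install.bundle-downloaded.${node}" \
    "install.disko.complete.${node}" \
    "install.nixos-install.complete.${node}" \
    "reboot.${node}" \
    "post-install.boot.${node}.${role}" \
    "desired-system-active.${node}"
  do
    # -F matches a fixed string, so the dots are not treated as regex wildcards.
    if ! grep -qF -- "ULTRACLOUD_MARKER ${marker}" "$log"; then
      printf '%s\n' "$marker"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, missing_markers serial.log node01 control-plane prints nothing and returns 0 on a complete install. The match is a plain substring search, so a node ID that is a prefix of another node ID in the same log could produce a false positive.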

USB path

Provide:

  • ULTRACLOUD_HARDWARE_TRANSPORT=usb
  • ULTRACLOUD_HARDWARE_USB_DEVICE=/dev/sdX
  • ULTRACLOUD_HARDWARE_ALLOW_DESTRUCTIVE=YES
  • ULTRACLOUD_HARDWARE_DEPLOYER_URL=...
  • ULTRACLOUD_HARDWARE_BOOTSTRAP_TOKEN=... or ULTRACLOUD_HARDWARE_ALLOW_UNAUTHENTICATED=1
  • ULTRACLOUD_HARDWARE_SSH_HOST=... or ULTRACLOUD_HARDWARE_SERIAL_LOG=...

Example:

ULTRACLOUD_HARDWARE_TRANSPORT=usb \
ULTRACLOUD_HARDWARE_USB_DEVICE=/dev/sdX \
ULTRACLOUD_HARDWARE_ALLOW_DESTRUCTIVE=YES \
ULTRACLOUD_HARDWARE_DEPLOYER_URL=http://10.0.0.10:8088 \
ULTRACLOUD_HARDWARE_BOOTSTRAP_TOKEN=lab-bootstrap-token \
ULTRACLOUD_HARDWARE_SSH_HOST=10.0.0.21 \
nix run ./nix/test-cluster#hardware-smoke -- run

BMC / Redfish virtual media path

Provide:

  • ULTRACLOUD_HARDWARE_TRANSPORT=redfish or bmc
  • ULTRACLOUD_HARDWARE_REDFISH_ENDPOINT=https://bmc.example
  • ULTRACLOUD_HARDWARE_REDFISH_USERNAME=...
  • ULTRACLOUD_HARDWARE_REDFISH_PASSWORD=...
  • ULTRACLOUD_HARDWARE_ISO_URL=https://http-server/ultracloud-bootstrap.iso
  • optional ULTRACLOUD_HARDWARE_REDFISH_SYSTEM_ID=System.Embedded.1
  • optional ULTRACLOUD_HARDWARE_REDFISH_MANAGER_ID=iDRAC.Embedded.1
  • optional ULTRACLOUD_HARDWARE_REDFISH_VIRTUAL_MEDIA_ID=CD
  • ULTRACLOUD_HARDWARE_DEPLOYER_URL=...
  • ULTRACLOUD_HARDWARE_BOOTSTRAP_TOKEN=... or ULTRACLOUD_HARDWARE_ALLOW_UNAUTHENTICATED=1
  • ULTRACLOUD_HARDWARE_SSH_HOST=... or ULTRACLOUD_HARDWARE_SERIAL_LOG=...

Example:

ULTRACLOUD_HARDWARE_TRANSPORT=redfish \
ULTRACLOUD_HARDWARE_REDFISH_ENDPOINT=https://bmc.example \
ULTRACLOUD_HARDWARE_REDFISH_USERNAME=admin \
ULTRACLOUD_HARDWARE_REDFISH_PASSWORD=secret \
ULTRACLOUD_HARDWARE_ISO_URL=https://mirror.example/ultracloud-bootstrap.iso \
ULTRACLOUD_HARDWARE_DEPLOYER_URL=http://10.0.0.10:8088 \
ULTRACLOUD_HARDWARE_BOOTSTRAP_TOKEN=lab-bootstrap-token \
ULTRACLOUD_HARDWARE_SSH_HOST=10.0.0.21 \
nix run ./nix/test-cluster#hardware-smoke -- run
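For orientation, the manager and virtual media IDs above correspond to path segments in the standard DMTF Redfish InsertMedia action. The sketch below only builds the request URL and JSON body. It assumes the wrapper targets the standard Managers/<id>/VirtualMedia/<id> resource, which this document does not confirm; both function names are hypothetical.

```shell
# Sketch only: hypothetical helpers showing the standard Redfish InsertMedia request shape.
# Assumption: the BMC exposes virtual media under Managers/<manager-id>/VirtualMedia/<media-id>.

# Usage: redfish_insert_media_url ENDPOINT MANAGER_ID VIRTUAL_MEDIA_ID
redfish_insert_media_url() {
  local endpoint="${1%/}" manager_id="$2" media_id="$3"  # drop one trailing slash
  printf '%s/redfish/v1/Managers/%s/VirtualMedia/%s/Actions/VirtualMedia.InsertMedia\n' \
    "$endpoint" "$manager_id" "$media_id"
}

# Usage: redfish_insert_media_body ISO_URL
# No JSON escaping is done, so ISO_URL must not contain quotes or backslashes.
redfish_insert_media_body() {
  printf '{"Image": "%s"}\n' "$1"
}

# A manual request could then look like this (not executed here):
#   curl -u "$ULTRACLOUD_HARDWARE_REDFISH_USERNAME:$ULTRACLOUD_HARDWARE_REDFISH_PASSWORD" \
#     -H 'Content-Type: application/json' -X POST \
#     -d "$(redfish_insert_media_body "$ULTRACLOUD_HARDWARE_ISO_URL")" \
#     "$(redfish_insert_media_url "$ULTRACLOUD_HARDWARE_REDFISH_ENDPOINT" iDRAC.Embedded.1 CD)"
```

Vendors differ in where they expose virtual media, so the BMC's own Redfish tree should be checked before relying on this path.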

Capture-only mode

If the transport step (writing the USB stick or attaching the virtual media) is performed by hand, reuse the same proof root and collect the success evidence afterwards:

ULTRACLOUD_HARDWARE_PROOF_ROOT=./work/hardware-smoke/latest \
ULTRACLOUD_HARDWARE_SSH_HOST=10.0.0.21 \
nix run ./nix/test-cluster#hardware-smoke -- capture

Failure and blocked behavior

preflight records status=blocked when any of these are missing:

  • transport device or BMC/Redfish endpoint
  • deployer URL
  • bootstrap token or explicit unauthenticated acknowledgement
  • USB destructive acknowledgement
  • BMC/Redfish ISO URL
  • capture channel (SSH host or serial log) for observing desired-system-active

That blocked state is intentional. It means the repo is ready for a physical-node run, but the local session still lacks the external transport or credentials needed to execute it.
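The blocked conditions for the USB path can be approximated before invoking the wrapper. The sketch below re-implements them from the variable lists in this document; list_missing_usb_inputs is a hypothetical helper, and the wrapper's real checks and the contents of missing-requirements.txt may differ.

```shell
# Sketch only: list_missing_usb_inputs is a hypothetical helper mirroring the
# USB-path requirements listed in this document. Prints one line per missing input.
list_missing_usb_inputs() {
  [ -n "${ULTRACLOUD_HARDWARE_USB_DEVICE:-}" ] \
    || echo "ULTRACLOUD_HARDWARE_USB_DEVICE"
  [ "${ULTRACLOUD_HARDWARE_ALLOW_DESTRUCTIVE:-}" = "YES" ] \
    || echo "ULTRACLOUD_HARDWARE_ALLOW_DESTRUCTIVE=YES"
  [ -n "${ULTRACLOUD_HARDWARE_DEPLOYER_URL:-}" ] \
    || echo "ULTRACLOUD_HARDWARE_DEPLOYER_URL"
  # Either a bootstrap token or the explicit unauthenticated acknowledgement.
  if [ -z "${ULTRACLOUD_HARDWARE_BOOTSTRAP_TOKEN:-}" ] \
    && [ "${ULTRACLOUD_HARDWARE_ALLOW_UNAUTHENTICATED:-}" != "1" ]; then
    echo "ULTRACLOUD_HARDWARE_BOOTSTRAP_TOKEN or ULTRACLOUD_HARDWARE_ALLOW_UNAUTHENTICATED=1"
  fi
  # Either an SSH host or a serial log as the capture channel.
  if [ -z "${ULTRACLOUD_HARDWARE_SSH_HOST:-}" ] \
    && [ -z "${ULTRACLOUD_HARDWARE_SERIAL_LOG:-}" ]; then
    echo "ULTRACLOUD_HARDWARE_SSH_HOST or ULTRACLOUD_HARDWARE_SERIAL_LOG"
  fi
}
```

Empty output means the USB inputs named in this document are all present; any printed line names an input that would likely leave preflight in the blocked state.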