# Hardware Compatibility Guide

**Document Version:** 1.0
**Last Updated:** 2025-12-10

## Table of Contents

- [Tested Hardware Platforms](#tested-hardware-platforms)
- [BIOS/UEFI Configuration by Vendor](#biosuefi-configuration-by-vendor)
- [Known Issues and Workarounds](#known-issues-and-workarounds)
- [Hardware-Specific NixOS Modules](#hardware-specific-nixos-modules)
- [BMC/IPMI Command Reference](#bmcipmi-command-reference)
- [Hardware Recommendations](#hardware-recommendations)

## Tested Hardware Platforms

### Dell PowerEdge Servers

| Model | BIOS | UEFI | PXE Boot | Notes                             | Status       |
|-------|------|------|----------|-----------------------------------|--------------|
| R640  | Yes  | Yes  | Yes      | Requires BIOS version A19+        | Fully Tested |
| R650  | Yes  | Yes  | Yes      | Best PXE compatibility            | Fully Tested |
| R740  | Yes  | Yes  | Yes      | Disable Secure Boot for PXE       | Fully Tested |
| R750  | Yes  | Yes  | Yes      | Latest iDRAC firmware recommended | Tested       |
| R6515 | Yes  | Yes  | Yes      | AMD EPYC, requires BIOS 2.10+     | Tested       |
| R6525 | Yes  | Yes  | Yes      | Dual AMD EPYC                     | Tested       |

**Recommended Dell Configuration:**
- iDRAC Enterprise license (for virtual console)
- Latest BIOS and iDRAC firmware
- Redundant power supplies
- Dual 10GbE or 25GbE NICs

### HPE ProLiant Servers

| Model        | BIOS | UEFI | PXE Boot | Notes                           | Status       |
|--------------|------|------|----------|---------------------------------|--------------|
| DL360 Gen10  | Yes  | Yes  | Yes      | Requires iLO 5 firmware 2.40+   | Fully Tested |
| DL380 Gen10  | Yes  | Yes  | Yes      | Excellent PXE support           | Fully Tested |
| DL360 Gen10+ | Yes  | Yes  | Yes      | Latest generation, best support | Tested       |
| DL385 Gen10+ | Yes  | Yes  | Yes      | AMD EPYC, good NixOS support    | Tested       |

**Recommended HPE Configuration:**
- iLO Advanced license (for remote console)
- Smart Array controller in AHCI/HBA mode (for direct disk access)
- Latest System ROM and iLO firmware
- HPE Ethernet 10/25Gb 2-port adapters

### Supermicro Servers

| Model     | BIOS | UEFI | PXE Boot | Notes                           | Status       |
|-----------|------|------|----------|---------------------------------|--------------|
| SYS-2029U | Yes  | Yes  | Yes      | Requires BMC firmware 1.73+     | Fully Tested |
| SYS-1029P | Yes  | Yes  | Yes      | Good PXE support                | Tested       |
| SYS-6029U | Yes  | Yes  | Yes      | 2U 4-node, tested per-node      | Tested       |
| X11DPH-T  | Yes  | Yes  | Yes      | Motherboard, good compatibility | Tested       |

**Recommended Supermicro Configuration:**
- Latest BMC firmware (IPMI 2.0)
- Dedicated BMC port (not shared with NIC)
- Intel or Mellanox NICs (better Linux support than Broadcom)

### Lenovo ThinkSystem

| Model | BIOS | UEFI | PXE Boot | Notes                              | Status  |
|-------|------|------|----------|------------------------------------|---------|
| SR630 | Yes  | Yes  | Yes      | Requires XCC firmware 4.10+        | Tested  |
| SR650 | Yes  | Yes  | Yes      | Good PXE support                   | Tested  |
| SR670 | Yes  | Yes  | Partial  | Some NIC driver issues (see below) | Partial |

**Known Issue:** Older Lenovo models (pre-2019) may have NIC driver issues with the NixOS netboot kernel. Update to the latest NIC firmware.

### Generic Whitebox x86 Servers

| Type        | BIOS | UEFI  | PXE Boot | Notes                           | Status |
|-------------|------|-------|----------|---------------------------------|--------|
| Intel-based | Yes  | Maybe | Yes      | Depends on motherboard firmware | Varies |
| AMD-based   | Yes  | Maybe | Yes      | UEFI support varies by board    | Varies |

**Recommendations:**
- Use server-grade motherboards (Supermicro, ASRock Rack, Gigabyte)
- Verify UEFI PXE support in motherboard specs
- Test PXE boot before purchasing large quantities
- Intel NICs preferred (i350, X520, X540) over Realtek

## BIOS/UEFI Configuration by Vendor

### Dell PowerEdge (iDRAC)

**Access BIOS:**
1. Power on server
2. Press F2 during POST
3. Navigate with arrow keys, Enter to select

**PXE Boot Configuration:**

```
System BIOS → Boot Settings:
  Boot Mode: UEFI (or BIOS for legacy systems)
  Boot Sequence Retry: Enabled
  UEFI Network Stack: Enabled

System BIOS → Network Settings:
  PXE Device 1: Embedded NIC 1 Port 1
  PXE Device Settings: UEFI

System BIOS → Boot Sequence:
  Boot Mode: UEFI
  Boot Sequence:
    1. UEFI: Network - Embedded NIC 1 Port 1
    2. UEFI Hard Drive
    3. ...
```

**Performance Settings:**
```
System BIOS → System Profile Settings:
  System Profile: Performance

System BIOS → Processor Settings:
  Logical Processor: Enabled
  Virtualization Technology: Enabled
  Execute Disable: Enabled

System BIOS → Memory Settings:
  Node Interleaving: Disabled (for NUMA awareness)
  Memory Operating Mode: Optimizer Mode
```

**Disable Secure Boot (required for PXE with an unsigned iPXE bootloader):**
```
System BIOS → Secure Boot:
  Secure Boot: Disabled
```

**iDRAC Configuration (via iDRAC web interface):**
```
iDRAC Settings → Network:
  Enable NIC: Enabled
  NIC Selection: Dedicated
  IPv4 Settings:
    Enable IPv4: Enabled
    DHCP or Static IP: Static
    IP Address: 10.0.10.50
    Subnet Mask: 255.255.255.0
    Gateway: 10.0.10.1

iDRAC Settings → User Authentication:
  Change default password: <strong password>

Console/Media → Virtual Console:
  Enabled: Yes
  Plug-in Type: HTML5

Services → Virtual Console:
  Enable Virtual Console: Enabled
```

**CLI Commands (racadm):**
```bash
# Configure network boot
racadm set BIOS.BiosBootSettings.BootMode Uefi
racadm set BIOS.PxeDev1Settings.PxeDev1Interface NIC.Embedded.1-1-1

# Set boot order (network first)
# BootSeq is the legacy BIOS boot order; in UEFI mode the matching attribute is UefiBootSeq
racadm set BIOS.BiosBootSettings.BootSeq Nic.Embedded.1-1-1,HardDisk.List.1-1

# Enable virtualization
racadm set BIOS.ProcSettings.LogicalProc Enabled
racadm set BIOS.ProcSettings.ProcVirtualization Enabled

# Queue one BIOS job for all pending changes above; it is applied at the next reboot
racadm jobqueue create BIOS.Setup.1-1
```

### HPE ProLiant (iLO)

**Access BIOS:**
1. Power on server
2. Press F9 during POST
3. Navigate with arrow keys, F10 to save

**PXE Boot Configuration:**

```
System Configuration → BIOS/Platform Configuration (RBSU):

  Boot Options → Boot Mode:
    Boot Mode: UEFI Mode

  Boot Options → UEFI Optimized Boot:
    UEFI Optimized Boot: Enabled

  Network Options → Network Boot:
    Network Boot: Enabled
    PXE Support: UEFI Only

  Network Options → Pre-Boot Network Environment:
    Pre-Boot Network Environment: Auto

  Boot Options → UEFI Boot Order:
    1. Embedded FlexibleLOM 1 Port 1 : HPE Ethernet...
    2. Generic USB Boot
    3. Embedded SATA
```

**Performance Settings:**
```
System Configuration → BIOS/Platform Configuration (RBSU):

  Processor Options:
    Intel Hyperthreading Options: Enabled
    Intel Virtualization Technology: Enabled

  Memory Options:
    Node Interleaving: Disabled
    Memory Patrol Scrubbing: Enabled

  Power and Performance Options:
    Power Regulator: Static High Performance Mode
    Collaborative Power Control: Disabled
```

**Disable Secure Boot:**
```
System Configuration → BIOS/Platform Configuration (RBSU):

  Server Security → Secure Boot Settings:
    Secure Boot Enforcement: Disabled
```

**iLO Configuration (via iLO web interface):**
```
Network → iLO Dedicated Network Port:
  Enable iLO Dedicated Network Port: Enabled

  Network Settings:
    DHCP Enable: Disabled
    IP Address: 10.0.10.50
    Subnet Mask: 255.255.255.0
    Gateway: 10.0.10.1

Administration → Access Settings:
  Change default password: <strong password>

Remote Console → Remote Console Settings:
  Remote Console Enabled: Yes
  .NET IRC or Java IRC: HTML5
```

**CLI Commands (iLO SSH):**
```bash
# Enable network boot (via iLO SSH)
set /system1/bootconfig1/bootsource5 bootorder=1

# Enable virtualization
set /system1/cpu1 ProcessorEnableIntelVT=Yes
```

### Supermicro (IPMI)

**Access BIOS:**
1. Power on server
2. Press Delete during POST
3. Navigate with arrow keys, F10 to save

**PXE Boot Configuration:**

```
BIOS Setup → Boot:
  Boot mode select: UEFI
  UEFI Network Stack: Enabled
  IPv4 PXE Support: Enabled
  IPv6 PXE Support: Disabled (unless needed)

BIOS Setup → Boot Priority:
  Boot Option #1: UEFI Network : ...
  Boot Option #2: UEFI Hard Disk

BIOS Setup → Advanced → Network Stack Configuration:
  Network Stack: Enabled
  Ipv4 PXE Support: Enabled
```

**Performance Settings:**
```
BIOS Setup → Advanced → CPU Configuration:
  Hyper-Threading: Enabled
  Intel Virtualization Technology: Enabled
  Execute Disable Bit: Enabled

BIOS Setup → Advanced → Chipset Configuration → North Bridge:
  NUMA: Enabled

BIOS Setup → Advanced → Power & Performance:
  Power Technology: Performance
```

**Disable Secure Boot:**
```
BIOS Setup → Boot → Secure Boot:
  Secure Boot: Disabled
```

**IPMI Configuration (via web interface or ipmitool):**

Web Interface:
```
Configuration → Network:
  IP Assignment: Static
  IP Address: 10.0.10.50
  Subnet Mask: 255.255.255.0
  Gateway: 10.0.10.1

Configuration → Users:
  User 2 (ADMIN): <strong password>

Remote Control → Console Redirection:
  Enable Remote Console: Yes
```

**CLI Commands (ipmitool):**
```bash
# Set static IP
ipmitool lan set 1 ipsrc static
ipmitool lan set 1 ipaddr 10.0.10.50
ipmitool lan set 1 netmask 255.255.255.0
ipmitool lan set 1 defgw ipaddr 10.0.10.1

# Change admin password
ipmitool user set password 2 <new_password>

# Enable SOL (Serial-over-LAN)
ipmitool sol set enabled true 1
ipmitool sol set volatile-bit-rate 115.2 1
```
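
When the same `ipmitool lan set` sequence has to be repeated for many BMCs, it can help to generate the commands from a small function and review them before running anything. The following is a minimal sketch, not part of the original tooling: the function names `is_ipv4` and `bmc_lan_commands` are hypothetical, and LAN channel 1 is assumed, as in the commands above. It only prints commands; it never calls `ipmitool` itself.

```bash
#!/bin/bash
# Hypothetical helpers: print (not run) the ipmitool commands for a static BMC address.

# Return 0 if $1 is a dotted-quad IPv4 address with every octet in 0-255.
is_ipv4() {
  echo "$1" | awk -F. '
    NF != 4 { exit 1 }
    { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i > 255) exit 1 }'
}

# $1 = BMC IP address, $2 = netmask, $3 = default gateway.
# Assumes LAN channel 1, matching the commands in this guide.
bmc_lan_commands() {
  ip=$1 netmask=$2 gateway=$3
  for addr in "$ip" "$netmask" "$gateway"; do
    if ! is_ipv4 "$addr"; then
      echo "invalid IPv4 address: $addr" >&2
      return 1
    fi
  done
  echo "ipmitool lan set 1 ipsrc static"
  echo "ipmitool lan set 1 ipaddr $ip"
  echo "ipmitool lan set 1 netmask $netmask"
  echo "ipmitool lan set 1 defgw ipaddr $gateway"
}

# Example: review the output, then pipe it to a shell on the node.
#   bmc_lan_commands 10.0.10.51 255.255.255.0 10.0.10.1
```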

### Lenovo ThinkSystem (XCC)

**Access BIOS:**
1. Power on server
2. Press F1 during POST
3. Navigate with arrow keys, F10 to save

**PXE Boot Configuration:**

```
System Settings → Operating Modes:
  Boot Mode: UEFI Mode

System Settings → Devices and I/O Ports → Network:
  Network 1 Boot Agent: Enabled

Startup → Primary Boot Sequence:
  1. Network 1 (UEFI)
  2. SATA Hard Drive
```

**Performance Settings:**
```
System Settings → Processors:
  Intel Hyper-Threading Technology: Enabled
  Intel Virtualization Technology: Enabled

System Settings → Power:
  Power Performance Bias: Maximum Performance
```

**Disable Secure Boot:**
```
Security → Secure Boot:
  Secure Boot: Disabled
```

**XCC Configuration (via XCC web interface):**
```
BMC Configuration → Network:
  Interface: Dedicated
  IP Configuration: Static
  IP Address: 10.0.10.50
  Subnet Mask: 255.255.255.0
  Gateway: 10.0.10.1

BMC Configuration → User/LDAP:
  Change USERID password: <strong password>

Remote Control → Remote Console & Media:
  Remote Console: Enabled
  Console Type: HTML5
```

## Known Issues and Workarounds

### Issue 1: Dell R640 - PXE Boot Loops After Installation

**Symptom:** After a successful installation, the server continues to boot from the network instead of the disk.

**Cause:** The boot order still lists the network device first because nothing updates it after installation.

**Workaround:**
1. Via iDRAC, set boot order: Disk → Network
2. Or via racadm:
   ```bash
   racadm set BIOS.BiosBootSettings.BootSeq HardDisk.List.1-1,Nic.Embedded.1-1-1
   racadm jobqueue create BIOS.Setup.1-1
   ```
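
The same reset can be made the last step of a provisioning run so that nodes stop network-booting once the install has finished. The sketch below is illustrative only: `disk_first_commands` is a hypothetical helper, it reuses the racadm and ipmitool commands that appear elsewhere in this guide, and it prints the commands rather than running them.

```bash
#!/bin/bash
# Hypothetical helper: print the commands that put the disk first in the boot order.
# $1 = vendor (dell|hpe|supermicro|lenovo), $2 = BMC address, $3 = BMC user.
disk_first_commands() {
  vendor=$1 bmc=$2 user=$3
  case "$vendor" in
    dell)
      # Run inside a racadm session on the iDRAC (same commands as the workaround above).
      echo "racadm set BIOS.BiosBootSettings.BootSeq HardDisk.List.1-1,Nic.Embedded.1-1-1"
      echo "racadm jobqueue create BIOS.Setup.1-1"
      ;;
    hpe|supermicro|lenovo)
      # Standard IPMI; ipmitool prompts for the password because -P is omitted.
      echo "ipmitool -I lanplus -H $bmc -U $user chassis bootdev disk options=persistent"
      ;;
    *)
      echo "unknown vendor: $vendor" >&2
      return 1
      ;;
  esac
}

# Example:
#   disk_first_commands supermicro 10.0.10.50 ADMIN
```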

### Issue 2: HPE DL360 - Slow TFTP Downloads

**Symptom:** Downloading the iPXE bootloader over TFTP takes more than 5 minutes.

**Cause:** The HPE UEFI firmware has a slow TFTP implementation.

**Workaround:**
1. Use HTTP Boot instead of TFTP (requires UEFI 2.5+):
   - DHCP Option 67: `http://10.0.100.10:8080/boot/ipxe/ipxe.efi`
2. Or enable chainloading: TFTP → iPXE → HTTP for the rest
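
For the chainloading option, only the small iPXE binary travels over TFTP; a first-stage iPXE script then fetches everything else over HTTP. The sketch below generates such a script. It is an assumption-laden example: `write_chain_script` is a hypothetical helper, and `boot.ipxe` is a placeholder name for whatever second-stage script the HTTP server actually provides. The `dhcp`, `chain`, and `shell` commands are standard iPXE commands.

```bash
#!/bin/bash
# Hypothetical generator for a first-stage iPXE script used when chainloading.
# $1 = base URL of the HTTP boot server.
write_chain_script() {
  base_url=$1
  cat <<EOF
#!ipxe
# Stage 1 arrives over TFTP; everything after this line is fetched over HTTP.
dhcp
chain ${base_url}/boot.ipxe || shell
EOF
}

# Example (boot.ipxe is a placeholder file name):
#   write_chain_script "http://10.0.100.10:8080/boot/ipxe" > stage1.ipxe
```

If the `chain` step fails, the script drops to the iPXE shell instead of rebooting, which makes the failure visible on the remote console.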

### Issue 3: Supermicro - BMC Not Accessible After Install

**Symptom:** Cannot access the IPMI web interface after NixOS installation.

**Cause:** The NixOS default firewall blocks traffic from the BMC network.

**Workaround:**
Add a firewall rule that accepts the BMC subnet. Appending to the `nixos-fw` chain keeps the rule inside the chain that NixOS rebuilds on every firewall reload:
```nix
networking.firewall.extraCommands = ''
  iptables -A nixos-fw -s 10.0.10.0/24 -j nixos-fw-accept
'';
```

### Issue 4: Lenovo ThinkSystem - NIC Not Recognized in Installer

**Symptom:** Network interface not detected during PXE boot (models 2018-2019).

**Cause:** The Broadcom NIC driver is not loaded by the default netboot image.

**Workaround:**
1. Update NIC firmware to the latest version
2. Or use an Intel NIC add-on card (X540-T2)
3. Or include the Broadcom driver in the netboot image:
   ```nix
   boot.kernelModules = [ "bnxt_en" ];
   ```

### Issue 5: Secure Boot Prevents PXE Boot

**Symptom:** Server shows "Secure Boot Violation" and refuses to boot.

**Cause:** Secure Boot is enabled, but the iPXE bootloader is not signed.

**Workaround:**
1. Disable Secure Boot in BIOS/UEFI (see vendor sections above)
2. Or sign the iPXE bootloader with your own key (advanced)
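
To confirm the firmware setting from a running Linux environment (for example a rescue image), the Secure Boot state can be read from the standard `SecureBoot` EFI variable. This is a sketch under stated assumptions: `secure_boot_state` is a hypothetical helper, and it assumes efivarfs is mounted at the usual `/sys/firmware/efi/efivars` path. The optional path argument exists only to make the function testable.

```bash
#!/bin/bash
# Report the Secure Boot state of a running Linux system.
# The efivarfs file holds 4 attribute bytes followed by 1 data byte (1 = enabled).
# $1 (optional) overrides the variable path.
secure_boot_state() {
  var=${1:-/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c}
  if [ ! -r "$var" ]; then
    echo "unknown"   # legacy BIOS boot, or efivarfs not mounted
    return 0
  fi
  # Skip the 4 attribute bytes and print the data byte as an unsigned decimal.
  value=$(od -An -tu1 -j4 -N1 "$var" | tr -d ' \n')
  if [ "$value" = "1" ]; then
    echo "enabled"
  else
    echo "disabled"
  fi
}

# Example:
#   secure_boot_state
```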

### Issue 6: Missing Disk After Boot

**Symptom:** NixOS installer cannot find disk (`/dev/sda` not found).

**Cause:** NVMe disk has different device name (`/dev/nvme0n1`).

**Workaround:**
Update disko configuration:
```nix
{ disks ? [ "/dev/nvme0n1" ], ... }:  # Changed from /dev/sda
{
  disko.devices = {
    disk.main.device = builtins.head disks;
    # ...
  };
}
```
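
Rather than hard-coding the device per node, the install target can be chosen at provisioning time from the disks the kernel actually reports. The sketch below is one possible approach, not the project's method: `pick_install_disk` is a hypothetical helper that prefers NVMe over SATA/SAS and virtio disks, and the preference order is an assumption.

```bash
#!/bin/bash
# Hypothetical helper: choose the install target from a list of whole-disk names,
# preferring NVMe, then SATA/SAS (sdX), then virtio (vdX).
# Reads one device name per line on stdin and prints the chosen /dev path.
pick_install_disk() {
  disks=$(cat)
  for pattern in '^nvme[0-9]+n[0-9]+$' '^sd[a-z]+$' '^vd[a-z]+$'; do
    match=$(printf '%s\n' "$disks" | grep -E "$pattern" | sort | head -n 1)
    if [ -n "$match" ]; then
      echo "/dev/$match"
      return 0
    fi
  done
  echo "no supported disk found" >&2
  return 1
}

# On a real node, feed it the whole disks reported by lsblk:
#   lsblk -dn -o NAME | pick_install_disk
```

The printed path can then be supplied as the `disks` argument of the disko configuration above.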

### Issue 7: RAID Controller Hides Disks

**Symptom:** Disks not visible to OS, only RAID volumes shown.

**Cause:** RAID controller in RAID mode, not HBA/AHCI mode.

**Workaround:**
1. Enter RAID controller BIOS (Ctrl+R for Dell PERC, Ctrl+P for HPE Smart Array)
2. Switch to HBA mode or AHCI mode
3. Or configure RAID0 volumes for each disk (not recommended)

### Issue 8: Network Speed Limited to 100 Mbps

**Symptom:** PXE boot and installation are extremely slow.

**Cause:** Auto-negotiation failed and the NIC linked at 100 Mbps instead of 1 Gbps.

**Workaround:**
1. Check network cable (must be Cat5e or better)
2. Update NIC firmware
3. Force 1 Gbps in BIOS network settings
4. Or configure switch port to force 1 Gbps
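
Once a node is running Linux, the negotiated speed can be checked directly: the kernel exposes it in Mb/s in `/sys/class/net/<interface>/speed`. The function below is a minimal sketch; `check_link_speed` is a hypothetical name, and the optional third argument only exists so the check can be exercised against a fake sysfs tree.

```bash
#!/bin/bash
# Hypothetical check: report when an interface negotiated less than the expected speed.
# $1 = interface, $2 = minimum speed in Mb/s, $3 (optional) = sysfs root.
# Returns 0 if fast enough, 1 if too slow, 2 if the speed cannot be determined.
check_link_speed() {
  iface=$1 min_mbps=$2 sysfs=${3:-/sys/class/net}
  speed=$(cat "$sysfs/$iface/speed" 2>/dev/null)
  case "$speed" in
    ''|-*)
      # Missing interface, link down, or a driver that does not report speed.
      echo "$iface: link down or speed unknown" >&2
      return 2
      ;;
  esac
  if [ "$speed" -lt "$min_mbps" ]; then
    echo "$iface: negotiated $speed Mb/s, expected at least $min_mbps Mb/s"
    return 1
  fi
  echo "$iface: $speed Mb/s OK"
}

# Example:
#   check_link_speed eno1 1000
```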

## Hardware-Specific NixOS Modules

### Dell PowerEdge Module

```nix
# nix/modules/hardware/dell-poweredge.nix
{ config, lib, pkgs, modulesPath, ... }:

{
  imports = [ (modulesPath + "/installer/scan/not-detected.nix") ];

  # Dell-specific kernel modules
  boot.initrd.availableKernelModules = [
    "ahci" "xhci_pci" "nvme" "usbhid" "usb_storage" "sd_mod" "sr_mod"
    "megaraid_sas"  # Dell PERC RAID controller
  ];

  # One list only: defining boot.kernelModules twice in the same
  # attribute set is a Nix evaluation error.
  boot.kernelModules = [
    "kvm-intel"                  # or "kvm-amd" for AMD
    "coretemp" "dell_smm_hwmon"  # sensors for monitoring
  ];

  # Dell OMSA (OpenManage Server Administrator) - optional
  services.opensmtpd.enable = false;  # Disable if using OMSA alerts

  # Redistributable firmware for NICs and storage controllers
  hardware.enableRedistributableFirmware = true;

  # iDRAC serial console
  boot.kernelParams = [ "console=tty0" "console=ttyS1,115200n8" ];

  # Predictable network interface names (Dell uses eno1, eno2)
  networking.usePredictableInterfaceNames = true;

  # CPU microcode updates
  hardware.cpu.intel.updateMicrocode = true;

  nixpkgs.hostPlatform = "x86_64-linux";
}
```

### HPE ProLiant Module

```nix
# nix/modules/hardware/hpe-proliant.nix
{ config, lib, pkgs, modulesPath, ... }:

{
  imports = [ (modulesPath + "/installer/scan/not-detected.nix") ];

  # HPE-specific kernel modules
  boot.initrd.availableKernelModules = [
    "ahci" "xhci_pci" "nvme" "usbhid" "usb_storage" "sd_mod"
    "hpsa"  # HPE Smart Array controller
  ];

  # Virtualization and HPE health monitoring (single definition)
  boot.kernelModules = [
    "kvm-intel"
    "hpilo"  # iLO channel interface
  ];

  # iLO serial console
  boot.kernelParams = [ "console=tty0" "console=ttyS0,115200n8" ];

  # HPE NICs (often use hpenet driver)
  networking.usePredictableInterfaceNames = true;

  # CPU microcode
  hardware.cpu.intel.updateMicrocode = true;

  nixpkgs.hostPlatform = "x86_64-linux";
}
```

### Supermicro Module

```nix
# nix/modules/hardware/supermicro.nix
{ config, lib, pkgs, modulesPath, ... }:

{
  imports = [ (modulesPath + "/installer/scan/not-detected.nix") ];

  # Supermicro-specific kernel modules
  boot.initrd.availableKernelModules = [
    "ahci" "xhci_pci" "nvme" "usbhid" "usb_storage" "sd_mod"
    "mpt3sas"  # LSI/Broadcom HBA (common in Supermicro)
  ];

  # Virtualization and IPMI modules (single definition)
  boot.kernelModules = [
    "kvm-intel"
    "ipmi_devintf" "ipmi_si"
    "ipmi_watchdog"  # optional, for automatic recovery
  ];

  # Serial console for IPMI SOL
  boot.kernelParams = [ "console=tty0" "console=ttyS1,115200n8" ];

  # Supermicro often uses Intel NICs
  networking.usePredictableInterfaceNames = true;

  # CPU microcode
  hardware.cpu.intel.updateMicrocode = true;

  nixpkgs.hostPlatform = "x86_64-linux";
}
```

### Usage Example

```nix
# In node configuration
{ config, pkgs, lib, ... }:

{
  imports = [
    ../../profiles/control-plane.nix
    ../../common/base.nix
    ../../hardware/dell-poweredge.nix  # Import hardware-specific module
    ./disko.nix
  ];

  # Rest of configuration...
}
```

## BMC/IPMI Command Reference

### Dell iDRAC Commands

**Power Control:**
```bash
# Power on
racadm serveraction powerup

# Power off (graceful OS shutdown)
racadm serveraction graceshutdown

# Power cycle
racadm serveraction powercycle

# Force power off
racadm serveraction powerdown

# Hard reset (reboot without shutting down the OS)
racadm serveraction hardreset

# Get power status
racadm serveraction powerstatus
```

**Boot Device:**
```bash
# Set next boot to PXE
racadm set iDRAC.ServerBoot.FirstBootDevice PXE

# Set next boot to disk
racadm set iDRAC.ServerBoot.FirstBootDevice HDD

# Set boot order permanently
racadm set BIOS.BiosBootSettings.BootSeq Nic.Embedded.1-1-1,HardDisk.List.1-1
```

**Remote Console:**
```bash
# Via web: https://<idrac-ip>/console
# Via racadm: Not directly supported, use web interface
```

**System Information:**
```bash
# Get system info
racadm getsysinfo

# Get sensor readings
racadm getsensorinfo

# Get event log
racadm getsel
```

### HPE iLO Commands (via hponcfg or SSH)

**Power Control:**
```bash
# Via SSH to iLO
power on
power off
power reset

# Via ipmitool
ipmitool -I lanplus -H <ilo-ip> -U admin -P password chassis power on
ipmitool -I lanplus -H <ilo-ip> -U admin -P password chassis power off
ipmitool -I lanplus -H <ilo-ip> -U admin -P password chassis power cycle
```

**Boot Device:**
```bash
# Via SSH to iLO
set /system1/bootconfig1/bootsource5 bootorder=1  # Network
set /system1/bootconfig1/bootsource1 bootorder=1  # Disk

# Via ipmitool
ipmitool -I lanplus -H <ilo-ip> -U admin chassis bootdev pxe
ipmitool -I lanplus -H <ilo-ip> -U admin chassis bootdev disk
```

**Remote Console:**
```bash
# Via web: https://<ilo-ip>/html5console
# Via SSH: Not directly supported, use web interface
```

**System Information:**
```bash
# Via SSH to iLO
show /system1
show /system1/oemhp_powerreg1
show /map1/elog1

# Via ipmitool
ipmitool -I lanplus -H <ilo-ip> -U admin sdr list
ipmitool -I lanplus -H <ilo-ip> -U admin sel list
```

### Supermicro IPMI Commands

**Power Control:**
```bash
# Power on
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN -P ADMIN chassis power on

# Power off (graceful)
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN chassis power soft

# Power off (force)
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN chassis power off

# Power cycle
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN chassis power cycle

# Get power status
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN chassis power status
```

**Boot Device:**
```bash
# Set next boot to PXE
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN chassis bootdev pxe

# Set next boot to disk
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN chassis bootdev disk

# Set persistent (apply to all future boots)
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN chassis bootdev pxe options=persistent
```

**Remote Console:**
```bash
# Web-based KVM: https://<ipmi-ip> (requires Java or HTML5)

# Serial-over-LAN (SOL)
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN sol activate
# Press ~. to exit SOL session
```

**System Information:**
```bash
# Get sensor readings
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN sdr list

# Get system event log
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN sel list

# Get FRU information
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN fru print

# Get BMC info
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN bmc info
```

### Lenovo XCC Commands (via ipmitool or web)

**Power Control:**
```bash
# Power on/off/cycle (same as standard IPMI)
ipmitool -I lanplus -H <xcc-ip> -U USERID -P PASSW0RD chassis power on
ipmitool -I lanplus -H <xcc-ip> -U USERID chassis power off
ipmitool -I lanplus -H <xcc-ip> -U USERID chassis power cycle
```

**Boot Device:**
```bash
# Set boot device (same as standard IPMI)
ipmitool -I lanplus -H <xcc-ip> -U USERID chassis bootdev pxe
ipmitool -I lanplus -H <xcc-ip> -U USERID chassis bootdev disk
```

**Remote Console:**
```bash
# Web-based: https://<xcc-ip>/console
# SOL: Same as standard IPMI
ipmitool -I lanplus -H <xcc-ip> -U USERID sol activate
```

### Batch Operations

**Power on all nodes:**
```bash
#!/bin/bash
# /srv/provisioning/scripts/power-on-all.sh

BMC_IPS=("10.0.10.50" "10.0.10.51" "10.0.10.52")
BMC_USER="admin"
BMC_PASS="password"

for ip in "${BMC_IPS[@]}"; do
  echo "Powering on $ip..."
  # PXE stays the boot device on every later boot until it is changed again
  ipmitool -I lanplus -H "$ip" -U "$BMC_USER" -P "$BMC_PASS" \
    chassis bootdev pxe options=persistent
  ipmitool -I lanplus -H "$ip" -U "$BMC_USER" -P "$BMC_PASS" \
    chassis power on
done
```

**Check power status of all nodes:**
```bash
#!/bin/bash
for ip in 10.0.10.{50..52}; do
  echo -n "$ip: "
  ipmitool -I lanplus -H "$ip" -U admin -P password \
    chassis power status
done
```
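
The scripts above put the BMC password on the command line and keep going silently when a node fails. A possible variant, sketched below, reads the password from the `IPMI_PASSWORD` environment variable through ipmitool's `-E` option and reports which BMCs failed. The function name `bmc_batch` and the `IPMITOOL` override are hypothetical additions for illustration; setting `IPMITOOL=echo` gives a dry run.

```bash
#!/bin/bash
# Sketch: run one ipmitool chassis command against several BMCs and report failures.
# The password comes from the IPMI_PASSWORD environment variable (ipmitool -E),
# so it is not stored in the script or passed as an argument.
# IPMITOOL can be overridden (for example IPMITOOL=echo) for a dry run.
IPMITOOL=${IPMITOOL:-ipmitool}

# $1 = BMC user, $2 = chassis sub-command (quoted), remaining args = BMC addresses.
# Returns 0 only if every BMC accepted the command.
bmc_batch() {
  user=$1 action=$2
  shift 2
  failed=0
  for ip in "$@"; do
    # $action is deliberately unquoted so "power status" splits into two words.
    if ! "$IPMITOOL" -I lanplus -H "$ip" -U "$user" -E chassis $action; then
      echo "FAILED: $ip" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}

# Example:
#   export IPMI_PASSWORD='...'
#   bmc_batch admin "power status" 10.0.10.50 10.0.10.51 10.0.10.52
```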

## Hardware Recommendations

### Minimum Production Hardware (Per Node)

**Control Plane:**
- CPU: Intel Xeon Silver 4208 (8C/16T) or AMD EPYC 7252 (8C/16T)
- RAM: 32 GB DDR4 ECC (4x 8GB, 2666 MHz)
- Storage: 500 GB NVMe SSD (Intel P4510 or Samsung PM983)
- Network: Intel X540-T2 (2x 10GbE)
- PSU: Dual redundant 550W
- Form Factor: 1U or 2U

**Worker:**
- CPU: Intel Xeon Silver 4214 (12C/24T) or AMD EPYC 7302 (16C/32T)
- RAM: 64 GB DDR4 ECC (4x 16GB, 2666 MHz)
- Storage: 1 TB NVMe SSD (Intel P4610 or Samsung PM983)
- Network: Mellanox ConnectX-5 (2x 25GbE) or Intel XXV710 (2x 25GbE)
- PSU: Dual redundant 750W
- Form Factor: 1U or 2U
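
These minimums can be turned into a quick preflight check before a node is enrolled. The sketch below is an illustration, not part of the original tooling: `meets_minimum` is a hypothetical helper, and the thresholds are read off the lists above as hardware threads and GB of RAM (16 threads and 32 GB for control plane, 24 threads and 64 GB for workers, taking the smaller of the two CPU options).

```bash
#!/bin/bash
# Hypothetical preflight check against the minimum production hardware lists.
# $1 = role (control-plane|worker), $2 = hardware threads, $3 = RAM in GB.
# Returns 0 if the node meets the minimum, 1 if not, 2 for an unknown role.
meets_minimum() {
  role=$1 threads=$2 mem_gb=$3
  case "$role" in
    control-plane) min_threads=16 min_mem_gb=32 ;;
    worker)        min_threads=24 min_mem_gb=64 ;;
    *)
      echo "unknown role: $role" >&2
      return 2
      ;;
  esac
  if [ "$threads" -ge "$min_threads" ] && [ "$mem_gb" -ge "$min_mem_gb" ]; then
    echo "$role: OK"
  else
    echo "$role: below minimum (need $min_threads threads, $min_mem_gb GB RAM)"
    return 1
  fi
}

# On a node, MemTotal is slightly below the installed size, so round up:
#   mem_gb=$(awk '/^MemTotal:/ { printf "%d\n", ($2 + 1048575) / 1048576 }' /proc/meminfo)
#   meets_minimum worker "$(nproc)" "$mem_gb"
```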

### Recommended Production Hardware (Per Node)

**Control Plane:**
- CPU: Intel Xeon Gold 5218 (16C/32T) or AMD EPYC 7402 (24C/48T)
- RAM: 128 GB DDR4 ECC (8x 16GB, 2933 MHz)
- Storage: 1 TB NVMe SSD, RAID1 (2x Intel P5510 or Samsung PM9A3)
- Network: Mellanox ConnectX-6 (2x 25GbE or 2x 100GbE)
- PSU: Dual redundant 800W Titanium
- Form Factor: 2U

**Worker:**
- CPU: Intel Xeon Gold 6226 (12C/24T) or AMD EPYC 7542 (32C/64T)
- RAM: 256 GB DDR4 ECC (8x 32GB, 2933 MHz)
- Storage: 2 TB NVMe SSD (Intel P5510 or Samsung PM9A3)
- Network: Mellanox ConnectX-6 (2x 100GbE) or Intel E810 (2x 100GbE)
- GPU: Optional (NVIDIA A40 or AMD Instinct MI50 for ML workloads)
- PSU: Dual redundant 1200W Titanium
- Form Factor: 2U or 4U (for GPU)

### Network Interface Card (NIC) Recommendations

| Vendor   | Model      | Speed     | Linux Support | Notes                  |
|----------|------------|-----------|---------------|------------------------|
| Intel    | X540-T2    | 2x 10GbE  | Excellent     | Best for copper        |
| Intel    | X710-DA2   | 2x 10GbE  | Excellent     | Best for fiber (SFP+)  |
| Intel    | XXV710-DA2 | 2x 25GbE  | Excellent     | Good price/performance |
| Intel    | E810-CQDA2 | 2x 100GbE | Excellent     | Latest generation      |
| Mellanox | ConnectX-5 | 2x 25GbE  | Excellent     | RDMA support (RoCE)    |
| Mellanox | ConnectX-6 | 2x 100GbE | Excellent     | Best performance, RDMA |
| Broadcom | BCM57810   | 2x 10GbE  | Good          | Common in OEM servers  |

**Avoid:** Realtek NICs (poor Linux support, performance issues)
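
On an existing node, the table can be applied by looking at which kernel driver each interface is bound to. The sketch below is a rough aid under stated assumptions: the function names are hypothetical, and the driver-to-rating mapping is my own reading of the table using the usual in-kernel driver names (`igb` for i350, `ixgbe` for X520/X540, `i40e` for X710/XXV710, `ice` for E810, `mlx5_core` for ConnectX-5/6, `bnx2x` and `bnxt_en` for Broadcom, `r8169` for Realtek).

```bash
#!/bin/bash
# Hypothetical mapping from Linux NIC driver name to the ratings in the table above.
nic_driver_rating() {
  case "$1" in
    igb|ixgbe|i40e|ice|mlx5_core) echo "recommended" ;;
    bnx2x|bnxt_en)                echo "good" ;;
    r8169)                        echo "avoid" ;;
    *)                            echo "unlisted" ;;
  esac
}

# Print "<interface> <driver> <rating>" for every physical NIC.
# $1 (optional) = sysfs root, overridable for testing.
rate_local_nics() {
  sysfs=${1:-/sys/class/net}
  for dev in "$sysfs"/*/device; do
    # Virtual interfaces (lo, bridges) have no device/driver link; skip them.
    [ -L "$dev/driver" ] || continue
    iface=$(basename "$(dirname "$dev")")
    driver=$(basename "$(readlink "$dev/driver")")
    echo "$iface $driver $(nic_driver_rating "$driver")"
  done
}

# Example:
#   rate_local_nics
```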

### Storage Recommendations

**NVMe SSDs (Recommended):**
- Intel P4510, P4610, P5510 series (data center grade)
- Samsung PM983, PM9A3 series (enterprise)
- Micron 7300, 7400 series (enterprise)
- Western Digital SN640, SN840 series (data center)

**SATA SSDs (Budget Option):**
- Intel S4510, S4610 series
- Samsung 883 DCT series
- Crucial MX500 (consumer, but reliable)

**Avoid:**
- Consumer-grade NVMe (Samsung 970 EVO, etc.) for production
- QLC NAND for write-heavy workloads
- Unknown brands with poor endurance ratings

---

**Document End**