Hardware Compatibility Guide

Document Version: 1.0
Last Updated: 2025-12-10

Tested Hardware Platforms

Dell PowerEdge Servers

| Model | BIOS | UEFI | PXE Boot | Notes | Status |
|-------|------|------|----------|-------|--------|
| R640 | Yes | Yes | Yes | Requires BIOS version A19+ | Fully Tested |
| R650 | Yes | Yes | Yes | Best PXE compatibility | Fully Tested |
| R740 | Yes | Yes | Yes | Disable Secure Boot for PXE | Fully Tested |
| R750 | Yes | Yes | Yes | Latest iDRAC firmware recommended | Tested |
| R6515 | Yes | Yes | Yes | AMD EPYC, requires BIOS 2.10+ | Tested |
| R6525 | Yes | Yes | Yes | Dual AMD EPYC | Tested |
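
Several entries in these compatibility tables state a minimum firmware version (for example BIOS 2.10+ on the R6515). The following is a minimal sketch of how such a minimum could be checked in a script. `version_ge` is a hypothetical helper, not part of the provisioning repository, and it assumes GNU `sort -V` is available and that versions are dotted numbers (letter-prefixed versions such as A19 are not covered here).

```shell
# version_ge INSTALLED REQUIRED
# Exits 0 when INSTALLED >= REQUIRED, comparing dotted versions with
# GNU sort -V (version sort). Hypothetical helper for illustration.
version_ge() {
  local installed="$1" required="$2" lowest
  # Version-sort both values; the first line is the lower one.
  lowest="$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n1)"
  # INSTALLED meets the minimum exactly when REQUIRED is the lower (or equal) one.
  [ "$lowest" = "$required" ]
}
```

A call such as `version_ge "$installed_bios" 2.10 || echo "BIOS update needed"` would then gate provisioning on the documented minimum; how `installed_bios` is obtained is left to the vendor tooling.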

Recommended Dell Configuration:

  • iDRAC Enterprise license (for virtual console)
  • Latest BIOS and iDRAC firmware
  • Redundant power supplies
  • Dual 10GbE or 25GbE NICs

HPE ProLiant Servers

| Model | BIOS | UEFI | PXE Boot | Notes | Status |
|-------|------|------|----------|-------|--------|
| DL360 Gen10 | Yes | Yes | Yes | Requires iLO 5 firmware 2.40+ | Fully Tested |
| DL380 Gen10 | Yes | Yes | Yes | Excellent PXE support | Fully Tested |
| DL360 Gen10+ | Yes | Yes | Yes | Latest generation, best support | Tested |
| DL385 Gen10+ | Yes | Yes | Yes | AMD EPYC, good NixOS support | Tested |

Recommended HPE Configuration:

  • iLO Advanced license (for remote console)
  • Smart Array controller in AHCI/HBA mode (for direct disk access)
  • Latest System ROM and iLO firmware
  • HPE Ethernet 10/25Gb 2-port adapters

Supermicro Servers

| Model | BIOS | UEFI | PXE Boot | Notes | Status |
|-------|------|------|----------|-------|--------|
| SYS-2029U | Yes | Yes | Yes | Requires BMC firmware 1.73+ | Fully Tested |
| SYS-1029P | Yes | Yes | Yes | Good PXE support | Tested |
| SYS-6029U | Yes | Yes | Yes | 2U 4-node, tested per-node | Tested |
| X11DPH-T | Yes | Yes | Yes | Motherboard, good compatibility | Tested |

Recommended Supermicro Configuration:

  • Latest BMC firmware (IPMI 2.0)
  • Dedicated BMC port (not shared with NIC)
  • Intel or Mellanox NICs (better Linux support than Broadcom)

Lenovo ThinkSystem

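| Model | BIOS | UEFI | PXE Boot | Notes | Status |
|-------|------|------|----------|-------|--------|
| SR630 | Yes | Yes | Yes | Requires XCC firmware 4.10+ | Tested |
| SR650 | Yes | Yes | Yes | Good PXE support | Tested |
| SR670 | Yes | Yes | Partial | Some NIC driver issues (see below) | Partial |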

Known Issue: Older Lenovo models (pre-2019) may have NIC driver issues with the NixOS netboot kernel. Update to the latest NIC firmware (see Issue 4 under Known Issues and Workarounds).

Generic Whitebox x86 Servers

| Type | BIOS | UEFI | PXE Boot | Notes | Status |
|------|------|------|----------|-------|--------|
| Intel-based | Yes | Maybe | Yes | Depends on motherboard firmware | Varies |
| AMD-based | Yes | Maybe | Yes | UEFI support varies by board | Varies |

Recommendations:

  • Use server-grade motherboards (Supermicro, ASRock Rack, Gigabyte)
  • Verify UEFI PXE support in motherboard specs
  • Test PXE boot before purchasing large quantities
  • Intel NICs preferred (i350, X520, X540) over Realtek
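
When test-booting a whitebox board over PXE, it helps to confirm which firmware mode the node actually booted in, since UEFI support varies by board. The sketch below relies on the fact that Linux exposes `/sys/firmware/efi` only when the system was booted through UEFI. `boot_mode` is a hypothetical helper name, and the optional directory argument exists only so the function can be exercised against a test directory.

```shell
# boot_mode [SYSFS_FIRMWARE_DIR]
# Prints "UEFI" when the running system was booted via UEFI firmware,
# otherwise "BIOS". Defaults to /sys/firmware on a live system.
boot_mode() {
  local sysfs_firmware="${1:-/sys/firmware}"
  if [ -d "$sysfs_firmware/efi" ]; then
    echo "UEFI"
  else
    echo "BIOS"
  fi
}
```

Running `boot_mode` from the netboot environment on a candidate board shows whether the UEFI PXE path was really taken or the board fell back to legacy boot.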

BIOS/UEFI Configuration by Vendor

Dell PowerEdge (iDRAC)

Access BIOS:

  1. Power on server
  2. Press F2 during POST
  3. Navigate with arrow keys, Enter to select

PXE Boot Configuration:

System BIOS → Boot Settings:
  Boot Mode: UEFI  (or BIOS for legacy systems)
  Boot Sequence Retry: Enabled
  UEFI Network Stack: Enabled

System BIOS → Network Settings:
  PXE Device 1: Embedded NIC 1 Port 1
  PXE Device Settings: UEFI

System BIOS → Boot Sequence:
  Boot Mode: UEFI
  Boot Sequence:
    1. UEFI: Network - Embedded NIC 1 Port 1
    2. UEFI Hard Drive
    3. ...

Performance Settings:

System BIOS → System Profile Settings:
  System Profile: Performance

System BIOS → Processor Settings:
  Logical Processor: Enabled
  Virtualization Technology: Enabled
  Execute Disable: Enabled

System BIOS → Memory Settings:
  Node Interleaving: Disabled (for NUMA awareness)
  Memory Operating Mode: Optimizer Mode

Disable Secure Boot (required for PXE):

System BIOS → Secure Boot:
  Secure Boot: Disabled

iDRAC Configuration (via iDRAC web interface):

iDRAC Settings → Network:
  Enable NIC: Enabled
  NIC Selection: Dedicated
  IPv4 Settings:
    Enable IPv4: Enabled
    DHCP or Static IP: Static
    IP Address: 10.0.10.50
    Subnet Mask: 255.255.255.0
    Gateway: 10.0.10.1

iDRAC Settings → User Authentication:
  Change default password: <strong password>

Console/Media → Virtual Console:
  Enabled: Yes
  Plug-in Type: HTML5

Services → Virtual Console:
  Enable Virtual Console: Enabled

CLI Commands (racadm):

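# Configure network boot
racadm set BIOS.BiosBootSettings.BootMode Uefi
racadm set BIOS.PxeDev1Settings.PxeDev1Interface NIC.Embedded.1-1-1

# Set boot order (network first)
# Note: BootSeq is the legacy BIOS-mode sequence; in UEFI mode the attribute
# is typically UefiBootSeq. Check with: racadm get BIOS.BiosBootSettings
racadm set BIOS.BiosBootSettings.BootSeq Nic.Embedded.1-1-1,HardDisk.List.1-1

# Enable virtualization
racadm set BIOS.ProcSettings.LogicalProc Enabled
racadm set BIOS.ProcSettings.ProcVirtualization Enabled

# Create the BIOS job last so it picks up all pending changes above
# (the changes are applied on the next reboot)
racadm jobqueue create BIOS.Setup.1-1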

HPE ProLiant (iLO)

Access BIOS:

  1. Power on server
  2. Press F9 during POST
  3. Navigate with arrow keys, F10 to save

PXE Boot Configuration:

System Configuration → BIOS/Platform Configuration (RBSU):

  Boot Options → Boot Mode:
    Boot Mode: UEFI Mode

  Boot Options → UEFI Optimized Boot:
    UEFI Optimized Boot: Enabled

  Network Options → Network Boot:
    Network Boot: Enabled
    PXE Support: UEFI Only

  Network Options → Pre-Boot Network Environment:
    Pre-Boot Network Environment: Auto

  Boot Options → UEFI Boot Order:
    1. Embedded FlexibleLOM 1 Port 1 : HPE Ethernet...
    2. Generic USB Boot
    3. Embedded SATA

Performance Settings:

System Configuration → BIOS/Platform Configuration (RBSU):

  Processor Options:
    Intel Hyperthreading Options: Enabled
    Intel Virtualization Technology: Enabled

  Memory Options:
    Node Interleaving: Disabled
    Memory Patrol Scrubbing: Enabled

  Power and Performance Options:
    Power Regulator: Static High Performance Mode
    Collaborative Power Control: Disabled

Disable Secure Boot:

System Configuration → BIOS/Platform Configuration (RBSU):

  Server Security → Secure Boot Settings:
    Secure Boot Enforcement: Disabled

iLO Configuration (via iLO web interface):

Network → iLO Dedicated Network Port:
  Enable iLO Dedicated Network Port: Enabled

  Network Settings:
    DHCP Enable: Disabled
    IP Address: 10.0.10.50
    Subnet Mask: 255.255.255.0
    Gateway: 10.0.10.1

Administration → Access Settings:
  Change default password: <strong password>

Remote Console → Remote Console Settings:
  Remote Console Enabled: Yes
  .NET IRC or Java IRC: HTML5

CLI Commands (iLO SSH):

# Enable network boot (run in the iLO SSH command line, not in hponcfg)
set /system1/bootconfig1/bootsource5 bootorder=1

# Enable virtualization
set /system1/cpu1 ProcessorEnableIntelVT=Yes

Supermicro (IPMI)

Access BIOS:

  1. Power on server
  2. Press Delete during POST
  3. Navigate with arrow keys, F10 to save

PXE Boot Configuration:

BIOS Setup → Boot:
  Boot mode select: UEFI
  UEFI Network Stack: Enabled
  IPv4 PXE Support: Enabled
  IPv6 PXE Support: Disabled (unless needed)

BIOS Setup → Boot Priority:
  Boot Option #1: UEFI Network : ...
  Boot Option #2: UEFI Hard Disk

BIOS Setup → Advanced → Network Stack Configuration:
  Network Stack: Enabled
  IPv4 PXE Support: Enabled

Performance Settings:

BIOS Setup → Advanced → CPU Configuration:
  Hyper-Threading: Enabled
  Intel Virtualization Technology: Enabled
  Execute Disable Bit: Enabled

BIOS Setup → Advanced → Chipset Configuration → North Bridge:
  NUMA: Enabled

BIOS Setup → Advanced → Power & Performance:
  Power Technology: Performance

Disable Secure Boot:

BIOS Setup → Boot → Secure Boot:
  Secure Boot: Disabled

IPMI Configuration (via web interface or ipmitool):

Web Interface:

Configuration → Network:
  IP Assignment: Static
  IP Address: 10.0.10.50
  Subnet Mask: 255.255.255.0
  Gateway: 10.0.10.1

Configuration → Users:
  User 2 (ADMIN): <strong password>

Remote Control → Console Redirection:
  Enable Remote Console: Yes

CLI Commands (ipmitool):

# Set static IP
ipmitool lan set 1 ipsrc static
ipmitool lan set 1 ipaddr 10.0.10.50
ipmitool lan set 1 netmask 255.255.255.0
ipmitool lan set 1 defgw ipaddr 10.0.10.1

# Change admin password
ipmitool user set password 2 <new_password>

# Enable SOL (Serial-over-LAN)
ipmitool sol set enabled true 1
ipmitool sol set volatile-bit-rate 115.2 1

Lenovo ThinkSystem (XCC)

Access BIOS:

  1. Power on server
  2. Press F1 during POST
  3. Navigate with arrow keys, F10 to save

PXE Boot Configuration:

System Settings → Operating Modes:
  Boot Mode: UEFI Mode

System Settings → Devices and I/O Ports → Network:
  Network 1 Boot Agent: Enabled

Startup → Primary Boot Sequence:
  1. Network 1 (UEFI)
  2. SATA Hard Drive

Performance Settings:

System Settings → Processors:
  Intel Hyper-Threading Technology: Enabled
  Intel Virtualization Technology: Enabled

System Settings → Power:
  Power Performance Bias: Maximum Performance

Disable Secure Boot:

Security → Secure Boot:
  Secure Boot: Disabled

XCC Configuration (via XCC web interface):

BMC Configuration → Network:
  Interface: Dedicated
  IP Configuration: Static
  IP Address: 10.0.10.50
  Subnet Mask: 255.255.255.0
  Gateway: 10.0.10.1

BMC Configuration → User/LDAP:
  Change USERID password: <strong password>

Remote Control → Remote Console & Media:
  Remote Console: Enabled
  Console Type: HTML5

Known Issues and Workarounds

Issue 1: Dell R640 - PXE Boot Loops After Installation

Symptom: After successful installation, server continues to boot from network instead of disk.

Cause: Boot order not updated after installation.

Workaround:

  1. Via iDRAC, set boot order: Disk → Network
  2. Or via racadm:
    racadm set BIOS.BiosBootSettings.BootSeq HardDisk.List.1-1,Nic.Embedded.1-1-1
    racadm jobqueue create BIOS.Setup.1-1
    

Issue 2: HPE DL360 - Slow TFTP Downloads

Symptom: iPXE bootloader download takes >5 minutes over TFTP.

Cause: HPE UEFI firmware has slow TFTP implementation.

Workaround:

  1. Use HTTP Boot instead of TFTP (requires UEFI 2.5+):
    • DHCP Option 67: http://10.0.100.10:8080/boot/ipxe/ipxe.efi
  2. Or enable chainloading: TFTP → iPXE → HTTP for rest

Issue 3: Supermicro - BMC Not Accessible After Install

Symptom: Cannot access IPMI web interface after NixOS installation.

Cause: NixOS default firewall blocks BMC network.

Workaround: Add firewall rule to allow BMC subnet:

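networking.firewall.extraCommands = ''
  # Use the NixOS-managed nixos-fw chain so the rule is rebuilt cleanly on firewall reload
  iptables -A nixos-fw -s 10.0.10.0/24 -j nixos-fw-accept
'';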
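
Before relying on a subnet-wide rule like this, it can be worth checking that every BMC address in use really falls inside the allowed range. The following is a rough sketch using only shell arithmetic. `ip_to_int` and `ip_in_cidr` are hypothetical helper names, and the sketch assumes plain dotted-quad IPv4 addresses without leading zeros.

```shell
# ip_to_int A.B.C.D
# Prints the IPv4 address as a single integer.
ip_to_int() {
  local a b c d rest
  a="${1%%.*}"; rest="${1#*.}"
  b="${rest%%.*}"; rest="${rest#*.}"
  c="${rest%%.*}"; d="${rest#*.}"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# ip_in_cidr IP NETWORK/PREFIX
# Exits 0 when IP lies inside the given CIDR block.
ip_in_cidr() {
  local ip="$1" net="${2%/*}" bits="${2#*/}" mask
  # Build the netmask from the prefix length, e.g. /24 -> 0xFFFFFF00.
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

For example, `ip_in_cidr 10.0.10.50 10.0.10.0/24` succeeds for the BMC address used throughout this guide, while an address on another network such as 10.0.100.10 would not match the rule.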

Issue 4: Lenovo ThinkSystem - NIC Not Recognized in Installer

Symptom: Network interface not detected during PXE boot (models 2018-2019).

Cause: The Broadcom NIC driver (bnxt_en) is not loaded by the default netboot kernel.

Workaround:

  1. Update NIC firmware to latest version
  2. Or use Intel NIC add-on card (X540-T2)
  3. Or include Broadcom driver in netboot image:
    boot.kernelModules = [ "bnxt_en" ];
    

Issue 5: Secure Boot Prevents PXE Boot

Symptom: Server shows "Secure Boot Violation" and refuses to boot.

Cause: Secure Boot is enabled, but iPXE bootloader is not signed.

Workaround:

  1. Disable Secure Boot in BIOS/UEFI (see vendor sections above)
  2. Or sign iPXE bootloader with your own key (advanced)

Issue 6: Missing Disk After Boot

Symptom: NixOS installer cannot find disk (/dev/sda not found).

Cause: NVMe disk has different device name (/dev/nvme0n1).

Workaround: Update disko configuration:

{ disks ? [ "/dev/nvme0n1" ], ... }:  # Changed from /dev/sda
{
  disko.devices = {
    disk.main.device = builtins.head disks;
    # ...
  };
}
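
Rather than editing the default by hand for each node, the device name could be detected from the installer environment. The sketch below is one possible approach under simple assumptions: it only knows the two names mentioned in this issue (`nvme0n1` and `sda`), prefers NVMe, and checks mere existence of the path. `pick_install_disk` is a hypothetical helper, and the optional directory argument is there so it can be tried against a test directory.

```shell
# pick_install_disk [DEV_DIR]
# Prints the first candidate disk that exists, preferring NVMe over SATA/SAS.
# Exits 1 with a message on stderr when neither candidate is present.
pick_install_disk() {
  local dev_dir="${1:-/dev}" candidate
  for candidate in nvme0n1 sda; do
    if [ -e "$dev_dir/$candidate" ]; then
      printf '%s\n' "$dev_dir/$candidate"
      return 0
    fi
  done
  echo "no known install disk found in $dev_dir" >&2
  return 1
}
```

The printed path can then be supplied as the `disks` argument of the disko configuration instead of a hard-coded device name.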

Issue 7: RAID Controller Hides Disks

Symptom: Disks not visible to OS, only RAID volumes shown.

Cause: RAID controller in RAID mode, not HBA/AHCI mode.

Workaround:

  1. Enter RAID controller BIOS (Ctrl+R for Dell PERC, Ctrl+P for HPE Smart Array)
  2. Switch to HBA mode or AHCI mode
  3. Or configure RAID0 volumes for each disk (not recommended)

Issue 8: Network Speed Limited to 100 Mbps

Symptom: PXE boot and installation extremely slow.

Cause: Auto-negotiation failure, NIC negotiated 100 Mbps instead of 1 Gbps.

Workaround:

  1. Check network cable (must be Cat5e or better)
  2. Update NIC firmware
  3. Force 1 Gbps in BIOS network settings
  4. Or configure switch port to force 1 Gbps
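
To confirm the diagnosis from a running node, the negotiated speed can be read from sysfs: Linux exposes it in Mbps at `/sys/class/net/<iface>/speed`. The following sketch wraps that check. `check_link_speed` is a hypothetical helper, the interface name in the usage example is an assumption, and the optional third argument only exists to point the function at a test directory.

```shell
# check_link_speed IFACE MIN_MBPS [SYSFS_NET_DIR]
# Exit codes: 0 = speed is at least MIN_MBPS, 1 = link is slower,
# 2 = speed could not be determined.
check_link_speed() {
  local iface="$1" min="$2" sysfs="${3:-/sys/class/net}" speed
  # Reading "speed" fails while the link is down; some drivers report -1.
  speed="$(cat "$sysfs/$iface/speed" 2>/dev/null)" || speed=-1
  if [ "$speed" -lt 0 ]; then
    echo "$iface: speed unavailable (no link or no such interface)" >&2
    return 2
  fi
  if [ "$speed" -lt "$min" ]; then
    echo "$iface: negotiated $speed Mbps, expected at least $min Mbps"
    return 1
  fi
  echo "$iface: $speed Mbps OK"
}
```

On a Dell node, `check_link_speed eno1 1000` would report whether the first onboard port fell back to 100 Mbps.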

Hardware-Specific NixOS Modules

Dell PowerEdge Module

# nix/modules/hardware/dell-poweredge.nix
{ config, lib, pkgs, modulesPath, ... }:

{
  imports = [ (modulesPath + "/installer/scan/not-detected.nix") ];

  # Dell-specific kernel modules
  boot.initrd.availableKernelModules = [
    "ahci" "xhci_pci" "nvme" "usbhid" "usb_storage" "sd_mod" "sr_mod"
    "megaraid_sas"  # Dell PERC RAID controller
  ];

  # KVM plus sensor modules for monitoring, in one list: Nix rejects a
  # second boot.kernelModules definition in the same attribute set
  boot.kernelModules = [
    "kvm-intel"  # or "kvm-amd" for AMD
    "coretemp" "dell_smm_hwmon"
  ];

  # Dell OMSA (OpenManage Server Administrator) - optional
  services.opensmtpd.enable = false;  # Disable if using OMSA alerts

  hardware.enableRedistributableFirmware = true;

  # iDRAC serial console
  boot.kernelParams = [ "console=tty0" "console=ttyS1,115200n8" ];

  # Predictable network interface names (Dell uses eno1, eno2)
  networking.usePredictableInterfaceNames = true;

  # CPU microcode updates
  hardware.cpu.intel.updateMicrocode = true;

  nixpkgs.hostPlatform = "x86_64-linux";
}

HPE ProLiant Module

# nix/modules/hardware/hpe-proliant.nix
{ config, lib, pkgs, modulesPath, ... }:

{
  imports = [ (modulesPath + "/installer/scan/not-detected.nix") ];

  # HPE-specific kernel modules
  boot.initrd.availableKernelModules = [
    "ahci" "xhci_pci" "nvme" "usbhid" "usb_storage" "sd_mod"
    "hpsa"  # HPE Smart Array controller
  ];

  # KVM plus HPE iLO health monitoring, in one list: Nix rejects a
  # second boot.kernelModules definition in the same attribute set
  boot.kernelModules = [ "kvm-intel" "hpilo" ];

  # iLO serial console
  boot.kernelParams = [ "console=tty0" "console=ttyS0,115200n8" ];

  # HPE NICs (often use hpenet driver)
  networking.usePredictableInterfaceNames = true;

  # CPU microcode
  hardware.cpu.intel.updateMicrocode = true;

  nixpkgs.hostPlatform = "x86_64-linux";
}

Supermicro Module

# nix/modules/hardware/supermicro.nix
{ config, lib, pkgs, modulesPath, ... }:

{
  imports = [ (modulesPath + "/installer/scan/not-detected.nix") ];

  # Supermicro-specific kernel modules
  boot.initrd.availableKernelModules = [
    "ahci" "xhci_pci" "nvme" "usbhid" "usb_storage" "sd_mod"
    "mpt3sas"  # LSI/Broadcom HBA (common in Supermicro)
  ];

  # KVM plus IPMI watchdog modules (watchdog is optional, for automatic
  # recovery), in one list: Nix rejects a second boot.kernelModules
  # definition in the same attribute set
  boot.kernelModules = [ "kvm-intel" "ipmi_devintf" "ipmi_si" "ipmi_watchdog" ];

  # Serial console for IPMI SOL
  boot.kernelParams = [ "console=tty0" "console=ttyS1,115200n8" ];

  # Supermicro often uses Intel NICs
  networking.usePredictableInterfaceNames = true;

  # CPU microcode
  hardware.cpu.intel.updateMicrocode = true;

  nixpkgs.hostPlatform = "x86_64-linux";
}

Usage Example

# In node configuration
{ config, pkgs, lib, ... }:

{
  imports = [
    ../../profiles/control-plane.nix
    ../../common/base.nix
    ../../hardware/dell-poweredge.nix  # Import hardware-specific module
    ./disko.nix
  ];

  # Rest of configuration...
}

BMC/IPMI Command Reference

Dell iDRAC Commands

Power Control:

# Power on
racadm serveraction powerup

# Power off (graceful OS shutdown)
racadm serveraction graceshutdown

# Power off (forced, immediate)
racadm serveraction powerdown

# Power cycle
racadm serveraction powercycle

# Hard reset (reboot without OS shutdown)
racadm serveraction hardreset

# Get power status
racadm serveraction powerstatus

Boot Device:

# Set next boot to PXE
racadm set iDRAC.ServerBoot.FirstBootDevice PXE

# Set next boot to disk
racadm set iDRAC.ServerBoot.FirstBootDevice HDD

# Set boot order permanently
racadm set BIOS.BiosBootSettings.BootSeq Nic.Embedded.1-1-1,HardDisk.List.1-1

Remote Console:

# Via web: https://<idrac-ip>/console
# Via racadm: Not directly supported, use web interface

System Information:

# Get system info
racadm getsysinfo

# Get sensor readings
racadm getsensorinfo

# Get event log
racadm getsel

HPE iLO Commands (via hponcfg or SSH)

Power Control:

# Via SSH to iLO
power on
power off
power reset

# Via ipmitool
ipmitool -I lanplus -H <ilo-ip> -U admin -P password chassis power on
ipmitool -I lanplus -H <ilo-ip> -U admin -P password chassis power off
ipmitool -I lanplus -H <ilo-ip> -U admin -P password chassis power cycle

Boot Device:

# Via SSH to iLO
set /system1/bootconfig1/bootsource5 bootorder=1  # Network
set /system1/bootconfig1/bootsource1 bootorder=1  # Disk

# Via ipmitool
ipmitool -I lanplus -H <ilo-ip> -U admin chassis bootdev pxe
ipmitool -I lanplus -H <ilo-ip> -U admin chassis bootdev disk

Remote Console:

# Via web: https://<ilo-ip>/html5console
# Via SSH: Not directly supported, use web interface

System Information:

# Via SSH to iLO
show /system1
show /system1/oemhp_powerreg1
show /map1/elog1

# Via ipmitool
ipmitool -I lanplus -H <ilo-ip> -U admin sdr list
ipmitool -I lanplus -H <ilo-ip> -U admin sel list

Supermicro IPMI Commands

Power Control:

# Power on
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN -P ADMIN chassis power on

# Power off (graceful)
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN chassis power soft

# Power off (force)
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN chassis power off

# Power cycle
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN chassis power cycle

# Get power status
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN chassis power status

Boot Device:

# Set next boot to PXE
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN chassis bootdev pxe

# Set next boot to disk
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN chassis bootdev disk

# Set persistent (apply to all future boots)
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN chassis bootdev pxe options=persistent

Remote Console:

# Web-based KVM: https://<ipmi-ip> (requires Java or HTML5)

# Serial-over-LAN (SOL)
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN sol activate
# Press ~. to exit SOL session

System Information:

# Get sensor readings
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN sdr list

# Get system event log
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN sel list

# Get FRU information
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN fru print

# Get BMC info
ipmitool -I lanplus -H <ipmi-ip> -U ADMIN bmc info

Lenovo XCC Commands (via ipmitool or web)

Power Control:

# Power on/off/cycle (same as standard IPMI)
ipmitool -I lanplus -H <xcc-ip> -U USERID -P PASSW0RD chassis power on
ipmitool -I lanplus -H <xcc-ip> -U USERID chassis power off
ipmitool -I lanplus -H <xcc-ip> -U USERID chassis power cycle

Boot Device:

# Set boot device (same as standard IPMI)
ipmitool -I lanplus -H <xcc-ip> -U USERID chassis bootdev pxe
ipmitool -I lanplus -H <xcc-ip> -U USERID chassis bootdev disk

Remote Console:

# Web-based: https://<xcc-ip>/console
# SOL: Same as standard IPMI
ipmitool -I lanplus -H <xcc-ip> -U USERID sol activate

Batch Operations

Power on all nodes:

#!/bin/bash
# /srv/provisioning/scripts/power-on-all.sh

BMC_IPS=("10.0.10.50" "10.0.10.51" "10.0.10.52")
BMC_USER="admin"
BMC_PASS="password"  # placeholder; replace before use

for ip in "${BMC_IPS[@]}"; do
  echo "Powering on $ip..."
  # Quote expansions so special characters in credentials are passed intact
  ipmitool -I lanplus -H "$ip" -U "$BMC_USER" -P "$BMC_PASS" \
    chassis bootdev pxe options=persistent
  ipmitool -I lanplus -H "$ip" -U "$BMC_USER" -P "$BMC_PASS" \
    chassis power on
done

Check power status of all nodes:

#!/bin/bash
for ip in 10.0.10.{50..52}; do
  echo -n "$ip: "
  ipmitool -I lanplus -H "$ip" -U admin -P password \
    chassis power status
done
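
The two scripts above hard-code addresses and pass the password on the command line, where it is visible in the process list. A possible variation, sketched here under stated assumptions, reads nodes from an inventory file and lets `ipmitool` take the password from the `IPMI_PASSWORD` environment variable through its `-E` option. `bmc_each`, the `IPMITOOL` override variable, and the inventory format (one `ip user` pair per line, `#` for comments) are all hypothetical choices made for this example.

```shell
# Command to run; can be overridden, e.g. for a dry run or in tests.
IPMITOOL="${IPMITOOL:-ipmitool}"

# bmc_each INVENTORY_FILE IPMITOOL_ARGS...
# Runs the given ipmitool subcommand against every BMC in the inventory.
# The password is read by ipmitool from $IPMI_PASSWORD (-E).
# Exits 1 if the command failed on at least one node.
bmc_each() {
  local inventory="$1"; shift
  local ip user failed=0
  while read -r ip user; do
    # Skip blank lines and comments.
    case "$ip" in ''|'#'*) continue ;; esac
    printf '%s: ' "$ip"
    # Redirect stdin so the command cannot consume the inventory lines.
    "$IPMITOOL" -I lanplus -H "$ip" -U "$user" -E "$@" </dev/null || failed=1
  done < "$inventory"
  return "$failed"
}
```

With `IPMI_PASSWORD` exported, `bmc_each nodes.txt chassis power status` would print one status line per node, and the same function covers `chassis power on` or `chassis bootdev pxe`. It assumes all nodes share one password; per-node passwords would need a different mechanism.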

Hardware Recommendations

Minimum Production Hardware (Per Node)

Control Plane:

  • CPU: Intel Xeon Silver 4208 (8C/16T) or AMD EPYC 7252 (8C/16T)
  • RAM: 32 GB DDR4 ECC (4x 8GB, 2666 MHz)
  • Storage: 500 GB NVMe SSD (Intel P4510 or Samsung PM983)
  • Network: Intel X540-T2 (2x 10GbE)
  • PSU: Dual redundant 550W
  • Form Factor: 1U or 2U

Worker:

  • CPU: Intel Xeon Silver 4214 (12C/24T) or AMD EPYC 7302 (16C/32T)
  • RAM: 64 GB DDR4 ECC (4x 16GB, 2666 MHz)
  • Storage: 1 TB NVMe SSD (Intel P4610 or Samsung PM983)
  • Network: Mellanox ConnectX-5 (2x 25GbE) or Intel XXV710 (2x 25GbE)
  • PSU: Dual redundant 750W
  • Form Factor: 1U or 2U

Recommended Production Hardware (Per Node)

Control Plane:

  • CPU: Intel Xeon Gold 5218 (16C/32T) or AMD EPYC 7402 (24C/48T)
  • RAM: 128 GB DDR4 ECC (8x 16GB, 2933 MHz)
  • Storage: 1 TB NVMe SSD, RAID1 (2x Intel P5510 or Samsung PM9A3)
  • Network: Mellanox ConnectX-6 (2x 25GbE or 2x 100GbE)
  • PSU: Dual redundant 800W Titanium
  • Form Factor: 2U

Worker:

  • CPU: Intel Xeon Gold 6226 (12C/24T) or AMD EPYC 7542 (32C/64T)
  • RAM: 256 GB DDR4 ECC (8x 32GB, 2933 MHz)
  • Storage: 2 TB NVMe SSD (Intel P5510 or Samsung PM9A3)
  • Network: Mellanox ConnectX-6 (2x 100GbE) or Intel E810 (2x 100GbE)
  • GPU: Optional (NVIDIA A40 or AMD Instinct MI50 for ML workloads)
  • PSU: Dual redundant 1200W Titanium
  • Form Factor: 2U or 4U (for GPU)

Network Interface Card (NIC) Recommendations

| Vendor | Model | Speed | Linux Support | Notes |
|--------|-------|-------|---------------|-------|
| Intel | X540-T2 | 2x 10GbE | Excellent | Best for copper |
| Intel | X710-DA2 | 2x 10GbE | Excellent | Best for fiber (SFP+) |
| Intel | XXV710-DA2 | 2x 25GbE | Excellent | Good price/performance |
| Intel | E810-CQDA2 | 2x 100GbE | Excellent | Latest generation |
| Mellanox | ConnectX-5 | 2x 25GbE | Excellent | RDMA support (RoCE) |
| Mellanox | ConnectX-6 | 2x 100GbE | Excellent | Best performance, RDMA |
| Broadcom | BCM57810 | 2x 10GbE | Good | Common in OEM servers |

Avoid: Realtek NICs (poor Linux support, performance issues)
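
To see which NICs a node actually carries, the kernel driver bound to each interface can be read from sysfs, where `/sys/class/net/<iface>/device/driver` is a symlink to the driver directory. The sketch below lists interface and driver pairs. `nic_drivers` is a hypothetical helper, and the optional directory argument is only there so it can be run against a test tree.

```shell
# nic_drivers [SYSFS_NET_DIR]
# Prints "<interface> <driver>" for every interface backed by a device
# driver. Virtual interfaces such as lo have no device link and are skipped.
nic_drivers() {
  local sysfs="${1:-/sys/class/net}" path iface driver
  for path in "$sysfs"/*/device/driver; do
    [ -e "$path" ] || continue
    # Strip the sysfs prefix, then keep only the interface name.
    iface="${path#"$sysfs"/}"
    iface="${iface%%/*}"
    # The symlink target's last path component is the driver name.
    driver="$(basename "$(readlink -f "$path")")"
    printf '%s %s\n' "$iface" "$driver"
  done
}
```

Realtek gigabit ports normally appear with the `r8169` driver, so `nic_drivers | grep -w r8169` is a quick way to flag them. The Intel cards in the table typically show up as `ixgbe` (X540), `i40e` (X710, XXV710) or `ice` (E810).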

Storage Recommendations

NVMe SSDs (Recommended):

  • Intel P4510, P4610, P5510 series (data center grade)
  • Samsung PM983, PM9A3 series (enterprise)
  • Micron 7300, 7400 series (enterprise)
  • Western Digital SN640, SN840 series (data center)

SATA SSDs (Budget Option):

  • Intel S4510, S4610 series
  • Samsung 883 DCT series
  • Crucial MX500 (consumer, but reliable)

Avoid:

  • Consumer-grade NVMe (Samsung 970 EVO, etc.) for production
  • QLC NAND for write-heavy workloads
  • Unknown brands with poor endurance ratings

Document End