
First-Boot Automation for Bare-Metal Provisioning

Automated cluster joining and service initialization for bare-metal-provisioned NixOS nodes.

Overview

The first-boot automation system handles cluster joining for distributed services (Chainfire, FlareDB, IAM) on the first boot of bare-metal-provisioned nodes. It supports two modes:

  • Bootstrap Mode: Initialize a new Raft cluster (first 3 nodes)
  • Join Mode: Join an existing cluster (additional nodes)
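The choice between the two modes is driven by the bootstrap flag in the cluster configuration. A minimal sketch of that decision (illustrative only; the real scripts may parse the file with jq rather than a plain string match):

```shell
#!/bin/sh
# Illustrative sketch: derive the join mode from cluster-config.json.
# A plain string match stands in for proper JSON parsing.
select_mode() {
  config="$1"
  if grep -q '"bootstrap": *true' "$config"; then
    echo "bootstrap"
  else
    echo "join"
  fi
}
```

Usage: `select_mode /etc/nixos/secrets/cluster-config.json` prints either `bootstrap` or `join`.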

Features

  • Automated health checking with retries
  • Idempotent operations (safe to run multiple times)
  • Structured JSON logging to journald
  • Graceful failure handling with configurable retries
  • Integration with TLS certificates (T031)
  • Support for both bootstrap and runtime join scenarios
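The idempotency guarantee rests on per-service marker files (described under Bootstrap vs Join). A condensed sketch of that pattern, with the marker directory parameterized so the behavior is easy to exercise:

```shell
#!/bin/sh
# Sketch of the marker-file idempotency pattern used by the join services.
# MARKER_DIR defaults to the documented path; the join step itself is elided.
MARKER_DIR="${MARKER_DIR:-/var/lib/first-boot-automation}"

join_once() {
  marker="$MARKER_DIR/.chainfire-joined"
  if [ -f "$marker" ]; then
    # Marker exists: a previous boot already joined, so do nothing.
    echo "already joined at $(cat "$marker")"
    return 0
  fi
  # ... perform the actual cluster join here ...
  mkdir -p "$MARKER_DIR"
  date -Iseconds > "$marker"
  echo "joined"
}
```

Running `join_once` a second time is a no-op, which is what makes reboots safe.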

Architecture

See ARCHITECTURE.md for detailed design documentation.

Quick Start

Prerequisites

  1. Node provisioned via T032.S1-S3 (PXE boot and installation)
  2. Cluster configuration file at /etc/nixos/secrets/cluster-config.json
  3. TLS certificates at /etc/nixos/secrets/ (T031)
  4. Network connectivity to cluster leader (for join mode)

Enable First-Boot Automation

In your NixOS configuration:

# /etc/nixos/configuration.nix
{
  imports = [
    ./nix/modules/first-boot-automation.nix
  ];

  services.first-boot-automation = {
    enable = true;
    configFile = "/etc/nixos/secrets/cluster-config.json";

    # Optional: disable specific services
    enableChainfire = true;
    enableFlareDB = true;
    enableIAM = true;
    enableHealthCheck = true;
  };
}

First Boot

After provisioning and reboot:

  1. Node boots from disk
  2. systemd starts services
  3. First-boot automation runs automatically
  4. Cluster join completes within 30-60 seconds
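Step 3 includes a bounded wait for each local service to report healthy before any join request is sent. A condensed sketch of that wait loop (check_health stands in for the real curl against the service's /health endpoint):

```shell
#!/bin/sh
# Sketch of a bounded wait-for-healthy loop (120s timeout, 5s poll by default).
# check_health is a placeholder for the real health-endpoint probe.
wait_healthy() {
  timeout="${1:-120}"
  interval="${2:-5}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if check_health; then
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "Health check timeout after ${timeout}s" >&2
  return 1
}
```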

Check status:

systemctl status chainfire-cluster-join.service
systemctl status flaredb-cluster-join.service
systemctl status iam-initial-setup.service
systemctl status cluster-health-check.service

Configuration

cluster-config.json Format

{
  "node_id": "node01",
  "node_role": "control-plane",
  "bootstrap": true,
  "cluster_name": "prod-cluster",
  "leader_url": "https://node01.prod.example.com:2379",
  "raft_addr": "10.0.1.10:2380",
  "initial_peers": [
    "node01:2380",
    "node02:2380",
    "node03:2380"
  ],
  "flaredb_peers": [
    "node01:2480",
    "node02:2480",
    "node03:2480"
  ]
}

Required Fields

Field          Type     Description
node_id        string   Unique identifier for this node
node_role      string   Node role: control-plane, worker, or all-in-one
bootstrap      boolean  true for first 3 nodes, false for additional nodes
cluster_name   string   Cluster identifier
leader_url     string   HTTPS URL of cluster leader (used for join)
raft_addr      string   This node's Raft address (IP:port)
initial_peers  array    List of bootstrap peer addresses
flaredb_peers  array    List of FlareDB peer addresses
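A quick sanity check that every required field is present might look like the following (a naive string match for illustration; jq, used elsewhere in this README, gives a stricter check):

```shell
#!/bin/sh
# Naive required-field check for cluster-config.json (sketch, not the real validator).
check_required() {
  config="$1"
  missing=0
  for key in node_id node_role bootstrap cluster_name leader_url raft_addr initial_peers flaredb_peers; do
    if ! grep -q "\"$key\"" "$config"; then
      echo "missing required field: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```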

Optional Fields

Field       Type    Description
node_ip     string  Node's primary IP address
node_fqdn   string  Fully qualified domain name
datacenter  string  Datacenter identifier
rack        string  Rack identifier
services    object  Per-service configuration
tls         object  TLS certificate paths
network     object  Network CIDR ranges

Example Configurations

See examples/ directory:

  • cluster-config-bootstrap.json - Bootstrap node (first 3)
  • cluster-config-join.json - Join node (additional)
  • cluster-config-all-in-one.json - Single-node deployment

Bootstrap vs Join

Bootstrap Mode (bootstrap: true)

When to use:

  • First 3 nodes in a new cluster
  • Nodes configured with matching initial_peers
  • No existing cluster to join

Behavior:

  1. Services start with --initial-cluster configuration
  2. Raft consensus automatically elects leader
  3. Cluster join service detects bootstrap mode and exits immediately
  4. Marker file created: /var/lib/first-boot-automation/.chainfire-initialized

Example:

{
  "node_id": "node01",
  "bootstrap": true,
  "initial_peers": ["node01:2380", "node02:2380", "node03:2380"]
}

Join Mode (bootstrap: false)

When to use:

  • Nodes joining an existing cluster
  • Expansion or replacement nodes
  • Leader is known and reachable

Behavior:

  1. Service starts with no initial cluster config
  2. Waits for local service to be healthy (max 120s)
  3. POST to leader's /admin/member/add endpoint
  4. Retries up to 5 times with 10s delay
  5. Marker file created: /var/lib/first-boot-automation/.chainfire-joined

Example:

{
  "node_id": "node04",
  "bootstrap": false,
  "leader_url": "https://node01.prod.example.com:2379",
  "raft_addr": "10.0.1.13:2380"
}
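The retry behavior in steps 2-4 above amounts to a bounded loop. A sketch, where attempt_join is a placeholder for the real health wait plus POST to the leader's /admin/member/add endpoint:

```shell
#!/bin/sh
# Sketch of the join retry loop: up to MAX_ATTEMPTS tries, RETRY_DELAY seconds apart.
# attempt_join is a placeholder for the real join request to the leader.
MAX_ATTEMPTS="${MAX_ATTEMPTS:-5}"
RETRY_DELAY="${RETRY_DELAY:-10}"

join_with_retries() {
  attempt=1
  while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
    if attempt_join; then
      echo "joined on attempt $attempt"
      return 0
    fi
    echo "attempt $attempt failed, retrying in ${RETRY_DELAY}s" >&2
    sleep "$RETRY_DELAY"
    attempt=$((attempt + 1))
  done
  echo "failed to join after $MAX_ATTEMPTS attempts" >&2
  return 1
}
```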

Decision Matrix

Scenario        bootstrap  initial_peers  leader_url
Node 1 (first)  true       all 3 nodes    self
Node 2 (first)  true       all 3 nodes    self
Node 3 (first)  true       all 3 nodes    self
Node 4+ (join)  false      all 3 nodes    node 1

Systemd Services

chainfire-cluster-join.service

Description: Joins Chainfire cluster on first boot

Dependencies:

  • After: network-online.target, chainfire.service
  • Before: flaredb-cluster-join.service

Configuration:

  • Type: oneshot
  • RemainAfterExit: true
  • Restart: on-failure

Logs:

journalctl -u chainfire-cluster-join.service
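Taken together, the options above correspond to a unit roughly like the following. This is a hypothetical rendering for orientation only: the actual unit is generated by the Nix module, and the ExecStart path shown here is a placeholder.

```ini
[Unit]
Description=Joins Chainfire cluster on first boot
After=network-online.target chainfire.service
Before=flaredb-cluster-join.service
; Wants= is assumed; After=network-online.target alone does not pull the target in
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=true
Restart=on-failure
; Placeholder: the module supplies the real join script path
ExecStart=/run/current-system/sw/bin/cluster-join
```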

flaredb-cluster-join.service

Description: Joins FlareDB cluster after Chainfire

Dependencies:

  • After: chainfire-cluster-join.service, flaredb.service
  • Requires: chainfire-cluster-join.service

Configuration:

  • Type: oneshot
  • RemainAfterExit: true
  • Restart: on-failure

Logs:

journalctl -u flaredb-cluster-join.service

iam-initial-setup.service

Description: IAM initial setup and admin user creation

Dependencies:

  • After: flaredb-cluster-join.service, iam.service

Configuration:

  • Type: oneshot
  • RemainAfterExit: true

Logs:

journalctl -u iam-initial-setup.service

cluster-health-check.service

Description: Validates cluster health on first boot

Dependencies:

  • After: all cluster-join services

Configuration:

  • Type: oneshot
  • RemainAfterExit: false

Logs:

journalctl -u cluster-health-check.service

Troubleshooting

Check Service Status

# Overall status
systemctl status chainfire-cluster-join.service
systemctl status flaredb-cluster-join.service

# Detailed logs with JSON output
journalctl -u chainfire-cluster-join.service -o json-pretty

# Follow logs in real-time
journalctl -u chainfire-cluster-join.service -f

Common Issues

1. Health Check Timeout

Symptom:

{"level":"ERROR","message":"Health check timeout after 120s"}

Causes:

  • Service not starting (check main service logs)
  • Port conflict
  • TLS certificate issues

Solutions:

# Check main service
systemctl status chainfire.service
journalctl -u chainfire.service

# Test health endpoint manually
curl -k https://localhost:2379/health

# Restart services
systemctl restart chainfire.service
systemctl restart chainfire-cluster-join.service

2. Leader Unreachable

Symptom:

{"level":"ERROR","message":"Join request failed: connection error"}

Causes:

  • Network connectivity issues
  • Firewall blocking ports
  • Leader not running
  • Wrong leader URL in config

Solutions:

# Test network connectivity
ping node01.prod.example.com
curl -k https://node01.prod.example.com:2379/health

# Check firewall
iptables -L -n | grep 2379

# Verify configuration
jq '.leader_url' /etc/nixos/secrets/cluster-config.json

# Try manual join (see below)

3. Invalid Configuration

Symptom:

{"level":"ERROR","message":"Configuration file not found"}

Causes:

  • Missing configuration file
  • Wrong file path
  • Invalid JSON syntax
  • Missing required fields

Solutions:

# Check file exists
ls -la /etc/nixos/secrets/cluster-config.json

# Validate JSON syntax
jq . /etc/nixos/secrets/cluster-config.json

# Check required fields
jq '.node_id, .bootstrap, .leader_url' /etc/nixos/secrets/cluster-config.json

# Fix and restart
systemctl restart chainfire-cluster-join.service

4. Already Member (Reboot)

Symptom:

{"level":"WARN","message":"Already member of cluster (HTTP 409)"}

Explanation:

  • This is normal on reboots
  • Marker file prevents duplicate joins
  • No action needed

Verify:

# Check marker file
cat /var/lib/first-boot-automation/.chainfire-joined

# Should show timestamp: 2025-12-10T10:30:45+00:00

5. Join Retry Exhausted

Symptom:

{"level":"ERROR","message":"Failed to join cluster after 5 attempts"}

Causes:

  • Persistent network issues
  • Leader down or overloaded
  • Invalid node configuration
  • Cluster at capacity

Solutions:

# Check cluster status on leader
curl -k https://node01.prod.example.com:2379/admin/cluster/members | jq

# Verify this node's configuration
jq '.node_id, .raft_addr' /etc/nixos/secrets/cluster-config.json

# Increase retry attempts (edit NixOS config)
# Or perform manual join (see below)

Verify Cluster Membership

On leader node:

# Chainfire members
curl -k https://localhost:2379/admin/cluster/members | jq

# FlareDB members
curl -k https://localhost:2479/admin/cluster/members | jq

Expected output:

{
  "members": [
    {"id": "node01", "raft_addr": "10.0.1.10:2380", "status": "healthy"},
    {"id": "node02", "raft_addr": "10.0.1.11:2380", "status": "healthy"},
    {"id": "node03", "raft_addr": "10.0.1.12:2380", "status": "healthy"}
  ]
}

Check Marker Files

# List all marker files
ls -la /var/lib/first-boot-automation/

# View timestamps
cat /var/lib/first-boot-automation/.chainfire-joined
cat /var/lib/first-boot-automation/.flaredb-joined

Reset and Re-join

Warning: This will remove the node from the cluster and rejoin.

# Stop services
systemctl stop chainfire.service flaredb.service

# Remove data and markers
rm -rf /var/lib/chainfire/*
rm -rf /var/lib/flaredb/*
rm /var/lib/first-boot-automation/.chainfire-*
rm /var/lib/first-boot-automation/.flaredb-*

# Restart (will auto-join)
systemctl start chainfire.service flaredb.service
systemctl restart chainfire-cluster-join.service flaredb-cluster-join.service

Manual Operations

Manual Cluster Join

If automation fails, perform manual join:

Chainfire:

# On joining node, ensure service is running and healthy
curl -k https://localhost:2379/health

# From any node, add member to cluster
curl -k -X POST https://node01.prod.example.com:2379/admin/member/add \
  -H "Content-Type: application/json" \
  -d '{
    "id": "node04",
    "raft_addr": "10.0.1.13:2380"
  }'

# Create marker to prevent auto-retry
mkdir -p /var/lib/first-boot-automation
date -Iseconds > /var/lib/first-boot-automation/.chainfire-joined

FlareDB:

curl -k -X POST https://node01.prod.example.com:2479/admin/member/add \
  -H "Content-Type: application/json" \
  -d '{
    "id": "node04",
    "raft_addr": "10.0.1.13:2480"
  }'

date -Iseconds > /var/lib/first-boot-automation/.flaredb-joined

Remove Node from Cluster

On leader:

# Chainfire
curl -k -X DELETE https://node01.prod.example.com:2379/admin/member/node04

# FlareDB
curl -k -X DELETE https://node01.prod.example.com:2479/admin/member/node04

On removed node:

# Stop services
systemctl stop chainfire.service flaredb.service

# Clean up data
rm -rf /var/lib/chainfire/*
rm -rf /var/lib/flaredb/*
rm /var/lib/first-boot-automation/.chainfire-*
rm /var/lib/first-boot-automation/.flaredb-*

Disable First-Boot Automation

If you need to disable automation:

# In NixOS configuration
services.first-boot-automation.enable = false;

Or stop services temporarily:

systemctl stop chainfire-cluster-join.service
systemctl disable chainfire-cluster-join.service

Re-enable After Manual Operations

After manual cluster operations:

# Create marker files to indicate join complete
mkdir -p /var/lib/first-boot-automation
date -Iseconds > /var/lib/first-boot-automation/.chainfire-joined
date -Iseconds > /var/lib/first-boot-automation/.flaredb-joined

# Or re-enable automation (will skip if markers exist)
systemctl enable --now chainfire-cluster-join.service

Security

TLS Certificates

Requirements:

  • All cluster communication uses TLS
  • Certificates must exist before first boot
  • Generated by T031 TLS automation

Certificate Paths:

/etc/nixos/secrets/
├── ca.crt              # CA certificate
├── node01.crt          # Node certificate
└── node01.key          # Node private key (mode 0600)

Permissions:

chmod 600 /etc/nixos/secrets/node01.key
chmod 644 /etc/nixos/secrets/node01.crt
chmod 644 /etc/nixos/secrets/ca.crt

Configuration File Security

Cluster configuration contains sensitive data:

  • IP addresses and network topology
  • Service URLs
  • Node identifiers

Recommended permissions:

chmod 600 /etc/nixos/secrets/cluster-config.json
chown root:root /etc/nixos/secrets/cluster-config.json

Network Security

Required firewall rules:

# Chainfire
iptables -A INPUT -p tcp --dport 2379 -s 10.0.1.0/24 -j ACCEPT  # API
iptables -A INPUT -p tcp --dport 2380 -s 10.0.1.0/24 -j ACCEPT  # Raft
iptables -A INPUT -p tcp --dport 2381 -s 10.0.1.0/24 -j ACCEPT  # Gossip

# FlareDB
iptables -A INPUT -p tcp --dport 2479 -s 10.0.1.0/24 -j ACCEPT  # API
iptables -A INPUT -p tcp --dport 2480 -s 10.0.1.0/24 -j ACCEPT  # Raft

# IAM
iptables -A INPUT -p tcp --dport 8080 -s 10.0.1.0/24 -j ACCEPT  # API

Production Considerations

For production deployments:

  1. Remove -k flag from curl (validate TLS certificates)
  2. Implement mTLS for client authentication
  3. Rotate credentials regularly
  4. Audit logs with structured logging
  5. Monitor health endpoints continuously
  6. Backup cluster state before changes
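For point 1, a small helper that swaps -k for explicit certificate validation might look like this (the certificate layout follows the Security section; the helper name is ours):

```shell
#!/bin/sh
# Build curl TLS arguments from the documented secrets layout instead of using -k.
# SECRETS and the per-node file names follow the Certificate Paths section.
SECRETS="${SECRETS:-/etc/nixos/secrets}"

tls_curl_args() {
  node="$1"
  echo "--cacert $SECRETS/ca.crt --cert $SECRETS/$node.crt --key $SECRETS/$node.key"
}

# Usage:
#   curl $(tls_curl_args node01) https://node01.prod.example.com:2379/health
```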

Examples

Example 1: 3-Node Bootstrap Cluster

Node 1:

{
  "node_id": "node01",
  "bootstrap": true,
  "raft_addr": "10.0.1.10:2380",
  "initial_peers": ["node01:2380", "node02:2380", "node03:2380"]
}

Node 2:

{
  "node_id": "node02",
  "bootstrap": true,
  "raft_addr": "10.0.1.11:2380",
  "initial_peers": ["node01:2380", "node02:2380", "node03:2380"]
}

Node 3:

{
  "node_id": "node03",
  "bootstrap": true,
  "raft_addr": "10.0.1.12:2380",
  "initial_peers": ["node01:2380", "node02:2380", "node03:2380"]
}

Provisioning:

# Provision all 3 nodes simultaneously
for i in {1..3}; do
  nixos-anywhere --flake .#node0$i root@node0$i.example.com &
done
wait

# Nodes will bootstrap automatically on first boot

Example 2: Join Existing Cluster

Node 4 (joining):

{
  "node_id": "node04",
  "bootstrap": false,
  "leader_url": "https://node01.prod.example.com:2379",
  "raft_addr": "10.0.1.13:2380"
}

Provisioning:

nixos-anywhere --flake .#node04 root@node04.example.com

# Node will automatically join on first boot

Example 3: Single-Node All-in-One

For development/testing:

{
  "node_id": "aio01",
  "bootstrap": true,
  "raft_addr": "10.0.2.10:2380",
  "initial_peers": ["aio01:2380"],
  "flaredb_peers": ["aio01:2480"]
}

Provisioning:

nixos-anywhere --flake .#aio01 root@aio01.example.com

Integration with Other Systems

T024 NixOS Modules

First-boot automation integrates with service modules:

{
  imports = [
    ./nix/modules/chainfire.nix
    ./nix/modules/flaredb.nix
    ./nix/modules/first-boot-automation.nix
  ];

  services.chainfire.enable = true;
  services.flaredb.enable = true;
  services.first-boot-automation.enable = true;
}

T025 Observability

Health checks integrate with Prometheus:

# prometheus.yml
scrape_configs:
  - job_name: 'cluster-health'
    static_configs:
      - targets: ['node01:2379', 'node02:2379', 'node03:2379']
    metrics_path: '/health'

T031 TLS Certificates

Certificates generated by T031 are used automatically:

# On provisioning server
./tls/generate-node-cert.sh node01.example.com 10.0.1.10

# Copied during nixos-anywhere
# First-boot automation reads from /etc/nixos/secrets/

Logs and Debugging

Structured Logging

All logs are JSON-formatted:

{
  "timestamp": "2025-12-10T10:30:45+00:00",
  "level": "INFO",
  "service": "chainfire",
  "operation": "cluster-join",
  "message": "Successfully joined cluster"
}
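A helper that emits lines in this shape could be as small as the following sketch (the real scripts may hard-code the service and operation fields; here they are parameters):

```shell
#!/bin/sh
# Emit one structured log line in the format shown above.
log_json() {
  level="$1"; service="$2"; operation="$3"; shift 3
  printf '{"timestamp":"%s","level":"%s","service":"%s","operation":"%s","message":"%s"}\n' \
    "$(date -Iseconds)" "$level" "$service" "$operation" "$*"
}

# Usage:
#   log_json INFO chainfire cluster-join "Successfully joined cluster"
```

Because the output goes to stdout, journald captures it and the JSON can be recovered with the journalctl queries below.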

Query Examples

All first-boot logs:

journalctl -u "*cluster-join*" -u "*initial-setup*" -u "*health-check*"

Errors only:

journalctl -u chainfire-cluster-join.service | grep '"level":"ERROR"'

Last boot only:

journalctl -b -u chainfire-cluster-join.service

JSON output for parsing:

journalctl -u chainfire-cluster-join.service -o json | jq '.MESSAGE'

Performance Tuning

Port Configuration

Override the default service ports in the NixOS module if your deployment uses others:

services.first-boot-automation = {
  enable = true;

  # Override default ports if needed
  chainfirePort = 2379;
  flaredbPort = 2479;
};

Retry Configuration

Modify retry logic in scripts:

# baremetal/first-boot/cluster-join.sh
MAX_ATTEMPTS=10      # Increase from 5
RETRY_DELAY=15       # Increase from 10s

Health Check Interval

Adjust polling interval:

# In service scripts
sleep 10  # Increase from 5s for less aggressive polling

Support and Contributing

Getting Help

  1. Check logs: journalctl -u chainfire-cluster-join.service
  2. Review troubleshooting section above
  3. Consult ARCHITECTURE.md for design details
  4. Check cluster status on leader node

Reporting Issues

Include in bug reports:

# Gather diagnostic information
journalctl -u chainfire-cluster-join.service > cluster-join.log
systemctl status chainfire-cluster-join.service > service-status.txt
cat /etc/nixos/secrets/cluster-config.json > config.json  # Redact sensitive data!
ls -la /var/lib/first-boot-automation/ > markers.txt

Development

See ARCHITECTURE.md for contributing guidelines.

References

  • ARCHITECTURE.md: Detailed design documentation
  • T024: NixOS service modules
  • T025: Observability and monitoring
  • T031: TLS certificate automation
  • T032.S1-S3: PXE boot and provisioning
  • Design Document: /home/centra/cloud/docs/por/T032-baremetal-provisioning/design.md

License

Internal use only - Centra Cloud Platform