First-Boot Automation Architecture
Overview
The first-boot automation system provides automated cluster joining and service initialization for bare-metal provisioned nodes. It handles two critical scenarios:
- Bootstrap Mode: First 3 nodes initialize a new Raft cluster
- Join Mode: Additional nodes join an existing cluster
This document describes the architecture, design decisions, and implementation details.
System Architecture
Component Hierarchy
┌─────────────────────────────────────────────────────────────┐
│ NixOS Boot Process │
└────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ systemd.target: multi-user.target │
└────────────────────┬────────────────────────────────────────┘
│
┌───────────────┼───────────────┐
│ │ │
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│chainfire │ │ flaredb │ │ iam │
│.service │ │.service │ │.service │
└────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │
▼ ▼ ▼
┌──────────────────────────────────────────┐
│ chainfire-cluster-join.service │
│ - Waits for local chainfire health │
│ - Checks bootstrap flag │
│ - Joins cluster if bootstrap=false │
└────────────────┬─────────────────────────┘
│
▼
┌──────────────────────────────────────────┐
│ flaredb-cluster-join.service │
│ - Requires chainfire-cluster-join │
│ - Waits for local flaredb health │
│ - Joins FlareDB cluster │
└────────────────┬─────────────────────────┘
│
▼
┌──────────────────────────────────────────┐
│ iam-initial-setup.service │
│ - Waits for IAM health │
│ - Creates admin user if needed │
│ - Generates initial tokens │
└────────────────┬─────────────────────────┘
│
▼
┌──────────────────────────────────────────┐
│ cluster-health-check.service │
│ - Polls all service health endpoints │
│ - Verifies cluster membership │
│ - Reports to journald │
└──────────────────────────────────────────┘
Configuration Flow
┌─────────────────────────────────────────┐
│ Provisioning Server │
│ - Generates cluster-config.json │
│ - Copies to /etc/nixos/secrets/ │
└────────────────┬────────────────────────┘
│
│ nixos-anywhere
│
▼
┌─────────────────────────────────────────┐
│ Target Node │
│ /etc/nixos/secrets/cluster-config.json │
└────────────────┬────────────────────────┘
│
│ Read by NixOS module
│
▼
┌─────────────────────────────────────────┐
│ first-boot-automation.nix │
│ - Parses JSON config │
│ - Creates systemd services │
│ - Sets up dependencies │
└────────────────┬────────────────────────┘
│
│ systemd activation
│
▼
┌─────────────────────────────────────────┐
│ Cluster Join Services │
│ - Execute join logic │
│ - Create marker files │
│ - Log to journald │
└─────────────────────────────────────────┘
Bootstrap vs Join Decision Logic
Decision Tree
┌─────────────────┐
│ Node Boots │
└────────┬────────┘
│
┌────────▼────────┐
│ Read cluster- │
│ config.json │
└────────┬────────┘
│
┌────────▼────────┐
│ bootstrap=true? │
└────────┬────────┘
│
┌────────────┴────────────┐
│ │
YES ▼ ▼ NO
┌─────────────────┐ ┌─────────────────┐
│ Bootstrap Mode │ │ Join Mode │
│ │ │ │
│ - Skip cluster │ │ - Wait for │
│ join API │ │ local health │
│ - Raft cluster │ │ - Contact │
│ initializes │ │ leader │
│ internally │ │ - POST to │
│ - Create marker │ │ /member/add │
│ - Exit success │ │ - Retry 5x │
└─────────────────┘ └─────────────────┘
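The decision tree above can be sketched in shell. jq and the config path appear elsewhere in this document; the `detect_mode` helper itself is hypothetical, and its return codes mirror the Bootstrap Detector error codes listed later in this document.

```shell
# Hypothetical sketch of the bootstrap/join decision. Return codes follow
# the Bootstrap Detector convention: 0 = bootstrap, 1 = join, 2 = config error.
detect_mode() {
  local config="$1" flag
  flag=$(jq -r '.bootstrap' "$config" 2>/dev/null) || flag=""
  case "$flag" in
    true)  echo "bootstrap"; return 0 ;;   # initialize a new Raft cluster
    false) echo "join";      return 1 ;;   # join via the leader's member API
    *)     echo "config-error" >&2; return 2 ;;
  esac
}
```

Invoked as, e.g., `detect_mode /etc/nixos/secrets/cluster-config.json`.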
Bootstrap Mode (bootstrap: true)
When to use:
- First 3 nodes in a new cluster
- Nodes configured with matching initial_peers
- No existing cluster to join
Behavior:
- Service starts with --initial-cluster parameter containing all bootstrap peers
- Raft consensus protocol automatically elects leader
- Cluster join service detects bootstrap mode and exits immediately
- No API calls to leader (cluster doesn't exist yet)
Configuration:
{
  "bootstrap": true,
  "initial_peers": ["node01:2380", "node02:2380", "node03:2380"]
}
Marker file: /var/lib/first-boot-automation/.chainfire-initialized
Join Mode (bootstrap: false)
When to use:
- Nodes joining an existing cluster
- Expansion or replacement nodes
- Leader URL is known and reachable
Behavior:
- Service starts with no initial cluster configuration
- Cluster join service waits for local service health
- POST to leader's /admin/member/add with node info
- Leader adds member to Raft configuration
- Node joins cluster and synchronizes state
Configuration:
{
  "bootstrap": false,
  "leader_url": "https://node01.example.com:2379",
  "raft_addr": "10.0.1.13:2380"
}
Marker file: /var/lib/first-boot-automation/.chainfire-joined
Idempotency and State Management
Marker Files
The system uses marker files to track initialization state:
/var/lib/first-boot-automation/
├── .chainfire-initialized # Bootstrap node initialized
├── .chainfire-joined # Node joined cluster
├── .flaredb-initialized # FlareDB bootstrap
├── .flaredb-joined # FlareDB joined
└── .iam-initialized # IAM setup complete
Purpose:
- Prevent duplicate join attempts on reboot
- Support idempotent operations
- Enable troubleshooting (check timestamps)
Format: ISO8601 timestamp of initialization
2025-12-10T10:30:45+00:00
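A minimal sketch of this pattern, assuming the directory layout above; the helper names `marker_guard` and `mark_done` are illustrative, not from the source.

```shell
# Sketch of the marker-file idempotency pattern. STATE_DIR matches the
# layout above; the helper names are illustrative.
STATE_DIR="${STATE_DIR:-/var/lib/first-boot-automation}"

marker_guard() {
  # Returns 1 (skip) if the marker already exists, 0 (proceed) otherwise.
  local marker="$STATE_DIR/$1"
  if [ -f "$marker" ]; then
    echo "already done at $(cat "$marker"); skipping"
    return 1
  fi
  return 0
}

mark_done() {
  mkdir -p "$STATE_DIR"
  date -Iseconds > "$STATE_DIR/$1"   # ISO8601 timestamp, matching the format above
}
```

Used as, e.g., `marker_guard .chainfire-joined && { join_cluster; mark_done .chainfire-joined; }`, where `join_cluster` stands in for the real join logic.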
State Transitions
┌──────────────┐
│ First Boot │
│ (no marker) │
└──────┬───────┘
│
▼
┌──────────────┐
│ Check Config │
│ bootstrap=? │
└──────┬───────┘
│
├─(true)──▶ Bootstrap ──▶ Create .initialized ──▶ Done
│
└─(false)─▶ Join ──▶ Create .joined ──▶ Done
│
│ (reboot)
▼
┌──────────────┐
│ Marker Exists│
│ Skip Join │
└──────────────┘
Retry Logic and Error Handling
Health Check Retry
Parameters:
- Timeout: 120 seconds (configurable)
- Retry Interval: 5 seconds
- Max Elapsed: 300 seconds
Logic:
# TIMEOUT and HEALTH_URL are supplied by the service configuration
START_TIME=$(date +%s)
while true; do
  ELAPSED=$(($(date +%s) - START_TIME))
  if [[ $ELAPSED -ge $TIMEOUT ]]; then
    exit 1  # Timeout
  fi
  HTTP_CODE=$(curl -k -s -o /dev/null -w "%{http_code}" "$HEALTH_URL")
  if [[ "$HTTP_CODE" == "200" ]]; then
    exit 0  # Success
  fi
  sleep 5
done
Cluster Join Retry
Parameters:
- Max Attempts: 5 (configurable)
- Retry Delay: 10 seconds
- Exponential Backoff: Optional (not implemented)
Logic:
# MAX_ATTEMPTS, LEADER_URL, PAYLOAD, and RETRY_DELAY come from the service configuration
for ATTEMPT in $(seq 1 "$MAX_ATTEMPTS"); do
  HTTP_CODE=$(curl -k -s -o /dev/null -w "%{http_code}" \
    -X POST "$LEADER_URL/admin/member/add" -d "$PAYLOAD")
  if [[ "$HTTP_CODE" == "200" || "$HTTP_CODE" == "201" ]]; then
    exit 0  # Success
  elif [[ "$HTTP_CODE" == "409" ]]; then
    exit 2  # Already member
  fi
  sleep "$RETRY_DELAY"
done
exit 1  # Max attempts exhausted
Error Codes
Health Check:
- 0: Service healthy
- 1: Timeout or unhealthy
Cluster Join:
- 0: Successfully joined
- 1: Failed after max attempts
- 2: Already joined (idempotent)
- 3: Invalid arguments
Bootstrap Detector:
- 0: Should bootstrap
- 1: Should join existing
- 2: Configuration error
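A wrapper consuming the Cluster Join codes above might look like this sketch; `cluster_join` is a stub standing in for the real join script.

```shell
# Illustrative consumer of the Cluster Join exit codes above.
cluster_join() { return 2; }   # stub: pretend the node is already a member

handle_join() {
  local rc=0
  cluster_join || rc=$?
  case "$rc" in
    0) echo "joined cluster" ;;
    2) echo "already a member (idempotent, ok)" ;;   # treat the 409 path as success
    3) echo "invalid arguments" >&2; return "$rc" ;;
    *) echo "failed after max attempts" >&2; return "$rc" ;;
  esac
}
```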
Security Considerations
TLS Certificate Handling
Requirements:
- All inter-node communication uses TLS
- Self-signed certificates supported via the -k flag to curl
- Certificate validation in production (remove -k)
Certificate Paths:
{
  "tls": {
    "enabled": true,
    "ca_cert_path": "/etc/nixos/secrets/ca.crt",
    "node_cert_path": "/etc/nixos/secrets/node01.crt",
    "node_key_path": "/etc/nixos/secrets/node01.key"
  }
}
Integration with T031:
- Certificates generated by T031 TLS automation
- Copied to target during provisioning
- Read by services at startup
Secrets Management
Cluster Configuration:
- Stored in /etc/nixos/secrets/cluster-config.json
- Permissions: 0600 root:root (recommended)
- Contains sensitive data: URLs, IPs, topology
API Credentials:
- IAM admin credentials (future implementation)
- Stored in separate file: /etc/nixos/secrets/iam-admin.json
- Never logged to journald
Attack Surface
Mitigations:
- Network-level: Firewall rules restrict cluster API ports
- Application-level: mTLS for authenticated requests
- Access control: systemd service isolation
- Audit: All operations logged to journald with structured JSON
Integration Points
T024 NixOS Modules
The first-boot automation module imports and extends service modules:
# Example: netboot-control-plane.nix
{
  imports = [
    ../modules/chainfire.nix
    ../modules/flaredb.nix
    ../modules/iam.nix
    ../modules/first-boot-automation.nix
  ];

  services.first-boot-automation.enable = true;
}
T031 TLS Certificates
Dependencies:
- TLS certificates must exist before first boot
- Provisioning script copies certificates to /etc/nixos/secrets/
- Services read certificates at startup
Certificate Generation:
# On provisioning server (T031)
./tls/generate-node-cert.sh node01.example.com 10.0.1.10
# Copied to target
scp ca.crt node01.crt node01.key root@10.0.1.10:/etc/nixos/secrets/
T032.S1-S3 PXE/Netboot
Boot Flow:
- PXE boot loads iPXE firmware
- iPXE chainloads NixOS kernel/initrd
- NixOS installer runs (nixos-anywhere)
- System installed to disk with first-boot automation
- Reboot into installed system
- First-boot automation executes
Configuration Injection:
# During nixos-anywhere provisioning
mkdir -p /mnt/etc/nixos/secrets
cp cluster-config.json /mnt/etc/nixos/secrets/
chmod 600 /mnt/etc/nixos/secrets/cluster-config.json
Service Dependencies
Systemd Ordering
Chainfire:
After: network-online.target, chainfire.service
Before: flaredb-cluster-join.service
Wants: network-online.target
FlareDB:
After: chainfire-cluster-join.service, flaredb.service
Requires: chainfire-cluster-join.service
Before: iam-initial-setup.service
IAM:
After: flaredb-cluster-join.service, iam.service
Before: cluster-health-check.service
Health Check:
After: chainfire-cluster-join, flaredb-cluster-join, iam-initial-setup
Type: oneshot (no RemainAfterExit)
Dependency Graph
network-online.target
│
├──▶ chainfire.service
│ │
│ ▼
│ chainfire-cluster-join.service
│ │
├──▶ flaredb.service
│ │
│ ▼
└────▶ flaredb-cluster-join.service
│
┌────┴────┐
│ │
iam.service │
│ │
▼ │
iam-initial-setup.service
│ │
└────┬────┘
│
▼
cluster-health-check.service
Logging and Observability
Structured Logging
All scripts output JSON-formatted logs:
{
  "timestamp": "2025-12-10T10:30:45+00:00",
  "level": "INFO",
  "service": "chainfire",
  "operation": "cluster-join",
  "message": "Successfully joined cluster"
}
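A helper producing these lines could be as small as the following sketch (`log_json` is an illustrative name; the naive printf assumes messages contain no double quotes):

```shell
# Sketch of a structured-log helper emitting the JSON shape shown above.
# Naive quoting: assumes fields contain no double quotes or backslashes.
log_json() {
  local level="$1" service="$2" operation="$3" message="$4"
  printf '{"timestamp":"%s","level":"%s","service":"%s","operation":"%s","message":"%s"}\n' \
    "$(date -Iseconds)" "$level" "$service" "$operation" "$message"
}

log_json INFO chainfire cluster-join "Successfully joined cluster"
```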
Benefits:
- Machine-readable for log aggregation (T025)
- Easy filtering with journalctl -o json
- Includes context (service, operation, timestamp)
Querying Logs
View all first-boot automation logs:
journalctl -u chainfire-cluster-join.service -u flaredb-cluster-join.service \
-u iam-initial-setup.service -u cluster-health-check.service
Filter by log level:
journalctl -u chainfire-cluster-join.service | grep '"level":"ERROR"'
Follow live:
journalctl -u chainfire-cluster-join.service -f
Health Check Integration
T025 Observability:
- Health check service can POST to metrics endpoint
- Prometheus scraping of /health endpoints
- Alerts on cluster join failures
Future:
- Webhook to provisioning server on completion
- Slack/email notifications on errors
- Dashboard showing cluster join status
Performance Characteristics
Boot Time Analysis
Typical Timeline (3-node cluster):
T+0s : systemd starts
T+5s : network-online.target reached
T+10s : chainfire.service starts
T+15s : chainfire healthy
T+15s : chainfire-cluster-join runs (bootstrap, immediate exit)
T+20s : flaredb.service starts
T+25s : flaredb healthy
T+25s : flaredb-cluster-join runs (bootstrap, immediate exit)
T+30s : iam.service starts
T+35s : iam healthy
T+35s : iam-initial-setup runs
T+40s : cluster-health-check runs
T+40s : Node fully operational
Join Mode (node joining existing cluster):
T+0s : systemd starts
T+5s : network-online.target reached
T+10s : chainfire.service starts
T+15s : chainfire healthy
T+15s : chainfire-cluster-join runs
T+20s : POST to leader, wait for response
T+25s : Successfully joined chainfire cluster
T+25s : flaredb.service starts
T+30s : flaredb healthy
T+30s : flaredb-cluster-join runs
T+35s : Successfully joined flaredb cluster
T+40s : iam-initial-setup (skips, already initialized)
T+45s : cluster-health-check runs
T+45s : Node fully operational
Bottlenecks
Health Check Polling:
- 5-second intervals may be too aggressive
- Recommendation: Exponential backoff
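The recommended backoff could follow the schedule sketched below, with delays doubling from 1s instead of the fixed 5-second poll; the loop only prints the schedule, while a real implementation would `sleep "$DELAY"` between attempts.

```shell
# Sketch of an exponential backoff schedule: delays double from 1s to
# 16s across five attempts. Printing only; a real retry loop would sleep.
DELAY=1
for ATTEMPT in 1 2 3 4 5; do
  echo "attempt ${ATTEMPT}: retry after ${DELAY}s"
  DELAY=$((DELAY * 2))
done
```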
Network Latency:
- Join requests block on network RTT
- Mitigation: Ensure low-latency cluster network
Raft Synchronization:
- New member must catch up on Raft log
- Time depends on log size (seconds to minutes)
Failure Modes and Recovery
Common Failures
1. Leader Unreachable
Symptom:
{"level":"ERROR","message":"Join request failed: connection error"}
Diagnosis:
- Check network connectivity: ping node01.example.com
- Verify firewall rules: iptables -L
- Check leader service status: systemctl status chainfire.service
Recovery:
# Fix network/firewall, then restart join service
systemctl restart chainfire-cluster-join.service
2. Invalid Configuration
Symptom:
{"level":"ERROR","message":"Configuration file not found"}
Diagnosis:
- Verify file exists: ls -la /etc/nixos/secrets/cluster-config.json
- Check JSON syntax: jq . /etc/nixos/secrets/cluster-config.json
Recovery:
# Fix configuration, then restart
systemctl restart chainfire-cluster-join.service
3. Service Not Healthy
Symptom:
{"level":"ERROR","message":"Health check timeout"}
Diagnosis:
- Check service logs: journalctl -u chainfire.service
- Verify service is running: systemctl status chainfire.service
- Test health endpoint: curl -k https://localhost:2379/health
Recovery:
# Restart the main service
systemctl restart chainfire.service
# Join service will auto-retry after RestartSec
4. Already Member
Symptom:
{"level":"WARN","message":"Node already member of cluster (HTTP 409)"}
Diagnosis:
- This is normal on reboots
- Marker file created to prevent future attempts
Recovery:
- No action needed (idempotent behavior)
Manual Cluster Join
If automation fails, manual join:
Chainfire:
curl -k -X POST https://node01.example.com:2379/admin/member/add \
-H "Content-Type: application/json" \
-d '{"id":"node04","raft_addr":"10.0.1.13:2380"}'
# Create marker to prevent auto-retry
mkdir -p /var/lib/first-boot-automation
date -Iseconds > /var/lib/first-boot-automation/.chainfire-joined
FlareDB:
curl -k -X POST https://node01.example.com:2479/admin/member/add \
-H "Content-Type: application/json" \
-d '{"id":"node04","raft_addr":"10.0.1.13:2480"}'
date -Iseconds > /var/lib/first-boot-automation/.flaredb-joined
Rollback Procedure
Remove from cluster:
# On leader
curl -k -X DELETE https://node01.example.com:2379/admin/member/node04
# On node being removed
systemctl stop chainfire.service
rm -rf /var/lib/chainfire/*
rm /var/lib/first-boot-automation/.chainfire-joined
# Re-enable automation
systemctl restart chainfire-cluster-join.service
Future Enhancements
Planned Improvements
1. Exponential Backoff
- Current: Fixed 10-second delay
- Future: 1s, 2s, 4s, 8s, 16s exponential backoff
2. Leader Discovery
- Current: Static leader URL in config
- Future: DNS SRV records for dynamic discovery
3. Webhook Notifications
- POST to provisioning server on completion
- Include node info, join time, cluster health
4. Pre-flight Checks
- Validate network connectivity before attempting join
- Check TLS certificate validity
- Verify disk space, memory, CPU requirements
5. Automated Testing
- Integration tests with real cluster
- Simulate failures (network partitions, leader crashes)
- Validate idempotency
6. Configuration Validation
- JSON schema validation at boot
- Fail fast on invalid configuration
- Provide clear error messages
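A fail-fast check along these lines could use jq (already used for diagnosis elsewhere in this document); the required fields follow the schema in the appendix, and `validate_config` is a hypothetical helper.

```shell
# Illustrative fail-fast validation of cluster-config.json using jq.
# Field names follow the schema in the appendix; exit status is nonzero
# when a checked field is missing or has the wrong type.
validate_config() {
  jq -e '
    (.node_id | type == "string") and
    (.bootstrap | type == "boolean") and
    (.cluster_name | type == "string")
  ' "$1" > /dev/null 2>&1
}
```

Used as, e.g., `validate_config /etc/nixos/secrets/cluster-config.json || exit 2`.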
References
- T024: NixOS service modules
- T025: Observability and monitoring
- T031: TLS certificate automation
- T032.S1-S3: PXE boot, netboot images, provisioning
- Design Document: /home/centra/cloud/docs/por/T032-baremetal-provisioning/design.md
Appendix: Configuration Schema
cluster-config.json Schema
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["node_id", "node_role", "bootstrap", "cluster_name", "leader_url", "raft_addr"],
  "properties": {
    "node_id": {
      "type": "string",
      "description": "Unique node identifier"
    },
    "node_role": {
      "type": "string",
      "enum": ["control-plane", "worker", "all-in-one"]
    },
    "bootstrap": {
      "type": "boolean",
      "description": "True for first 3 nodes, false for join"
    },
    "cluster_name": {
      "type": "string"
    },
    "leader_url": {
      "type": "string",
      "format": "uri"
    },
    "raft_addr": {
      "type": "string",
      "pattern": "^[0-9.]+:[0-9]+$"
    },
    "initial_peers": {
      "type": "array",
      "items": {"type": "string"}
    },
    "flaredb_peers": {
      "type": "array",
      "items": {"type": "string"}
    }
  }
}