# First-Boot Automation for Bare-Metal Provisioning
Automated cluster joining and service initialization for bare-metal provisioned NixOS nodes.
## Table of Contents
- [Overview](#overview)
- [Quick Start](#quick-start)
- [Configuration](#configuration)
- [Bootstrap vs Join](#bootstrap-vs-join)
- [Systemd Services](#systemd-services)
- [Troubleshooting](#troubleshooting)
- [Manual Operations](#manual-operations)
- [Security](#security)
- [Examples](#examples)
## Overview
The first-boot automation system handles automated cluster joining for distributed services (Chainfire, FlareDB, IAM) on first boot of bare-metal provisioned nodes. It supports two modes:
- **Bootstrap Mode**: Initialize a new Raft cluster (first 3 nodes)
- **Join Mode**: Join an existing cluster (additional nodes)
### Features
- Automated health checking with retries
- Idempotent operations (safe to run multiple times)
- Structured JSON logging to journald
- Graceful failure handling with configurable retries
- Integration with TLS certificates (T031)
- Support for both bootstrap and runtime join scenarios
### Architecture
See [ARCHITECTURE.md](ARCHITECTURE.md) for detailed design documentation.
## Quick Start
### Prerequisites
1. Node provisioned via T032.S1-S3 (PXE boot and installation)
2. Cluster configuration file at `/etc/nixos/secrets/cluster-config.json`
3. TLS certificates at `/etc/nixos/secrets/` (T031)
4. Network connectivity to cluster leader (for join mode)
### Enable First-Boot Automation
In your NixOS configuration:
```nix
# /etc/nixos/configuration.nix
{
  imports = [
    ./nix/modules/first-boot-automation.nix
  ];

  services.first-boot-automation = {
    enable = true;
    configFile = "/etc/nixos/secrets/cluster-config.json";

    # Optional: disable specific services
    enableChainfire = true;
    enableFlareDB = true;
    enableIAM = true;
    enableHealthCheck = true;
  };
}
```
### First Boot
After provisioning and reboot:

1. Node boots from disk
2. systemd starts services
3. First-boot automation runs automatically
4. Cluster join completes within 30-60 seconds

Check status:
```bash
systemctl status chainfire-cluster-join.service
systemctl status flaredb-cluster-join.service
systemctl status iam-initial-setup.service
systemctl status cluster-health-check.service
```
## Configuration
### cluster-config.json Format
```json
{
  "node_id": "node01",
  "node_role": "control-plane",
  "bootstrap": true,
  "cluster_name": "prod-cluster",
  "leader_url": "https://node01.prod.example.com:2379",
  "raft_addr": "10.0.1.10:2380",
  "initial_peers": [
    "node01:2380",
    "node02:2380",
    "node03:2380"
  ],
  "flaredb_peers": [
    "node01:2480",
    "node02:2480",
    "node03:2480"
  ]
}
```
### Required Fields
| Field | Type | Description |
|-------|------|-------------|
| `node_id` | string | Unique identifier for this node |
| `node_role` | string | Node role: `control-plane`, `worker`, or `all-in-one` |
| `bootstrap` | boolean | `true` for first 3 nodes, `false` for additional nodes |
| `cluster_name` | string | Cluster identifier |
| `leader_url` | string | HTTPS URL of cluster leader (used for join) |
| `raft_addr` | string | This node's Raft address (IP:port) |
| `initial_peers` | array | List of bootstrap peer addresses |
| `flaredb_peers` | array | List of FlareDB peer addresses |
### Optional Fields
| Field | Type | Description |
|-------|------|-------------|
| `node_ip` | string | Node's primary IP address |
| `node_fqdn` | string | Fully qualified domain name |
| `datacenter` | string | Datacenter identifier |
| `rack` | string | Rack identifier |
| `services` | object | Per-service configuration |
| `tls` | object | TLS certificate paths |
| `network` | object | Network CIDR ranges |
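A pre-flight check against the required fields can be sketched in shell. Field names are taken from the table above; the actual provisioning tooling may validate differently.

```bash
# validate_cluster_config FILE
# Fail fast if any required field is absent from the config.
validate_cluster_config() {
  local config="$1" field
  for field in node_id node_role bootstrap cluster_name \
               leader_url raft_addr initial_peers flaredb_peers; do
    if ! jq -e --arg f "$field" 'has($f)' "$config" >/dev/null; then
      echo "missing required field: $field" >&2
      return 1
    fi
  done
  echo "all required fields present"
}
```

Run it before first boot, e.g. `validate_cluster_config /etc/nixos/secrets/cluster-config.json`.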
### Example Configurations
See [examples/](examples/) directory:
- `cluster-config-bootstrap.json` - Bootstrap node (first 3)
- `cluster-config-join.json` - Join node (additional)
- `cluster-config-all-in-one.json` - Single-node deployment
## Bootstrap vs Join
### Bootstrap Mode (bootstrap: true)
**When to use:**

- First 3 nodes in a new cluster
- Nodes configured with matching `initial_peers`
- No existing cluster to join

**Behavior:**

1. Services start with `--initial-cluster` configuration
2. Raft consensus automatically elects a leader
3. The cluster join service detects bootstrap mode and exits immediately
4. Marker file created: `/var/lib/first-boot-automation/.chainfire-initialized`

**Example:**
```json
{
  "node_id": "node01",
  "bootstrap": true,
  "initial_peers": ["node01:2380", "node02:2380", "node03:2380"]
}
```
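The idempotency guard implied by the marker file can be sketched as follows. Paths come from this README; the real service script may differ in detail.

```bash
MARKER_DIR="${MARKER_DIR:-/var/lib/first-boot-automation}"

# already_done: true if a previous boot already initialized or joined,
# so the join service can exit immediately (safe across reboots).
already_done() {
  [ -f "$MARKER_DIR/.chainfire-initialized" ] || \
  [ -f "$MARKER_DIR/.chainfire-joined" ]
}

# mark_initialized: record completion with a timestamp, matching the
# marker format shown in the Troubleshooting section.
mark_initialized() {
  mkdir -p "$MARKER_DIR"
  date -Iseconds > "$MARKER_DIR/.chainfire-initialized"
}
```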
### Join Mode (bootstrap: false)
**When to use:**

- Nodes joining an existing cluster
- Expansion or replacement nodes
- Leader is known and reachable

**Behavior:**

1. Service starts with no initial cluster config
2. Waits for the local service to become healthy (max 120s)
3. POSTs to the leader's `/admin/member/add` endpoint
4. Retries up to 5 times with a 10s delay
5. Marker file created: `/var/lib/first-boot-automation/.chainfire-joined`

**Example:**
```json
{
  "node_id": "node04",
  "bootstrap": false,
  "leader_url": "https://node01.prod.example.com:2379",
  "raft_addr": "10.0.1.13:2380"
}
```
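The retry behavior described above (5 attempts, 10s apart) can be sketched as a wrapper around the join request. `join_once` here stands in for the actual POST and is an illustrative name, not part of the shipped scripts.

```bash
# retry_join: call join_once until it succeeds or MAX_ATTEMPTS is
# exhausted, sleeping RETRY_DELAY seconds between tries.
retry_join() {
  local max="${MAX_ATTEMPTS:-5}" delay="${RETRY_DELAY:-10}" attempt
  for attempt in $(seq 1 "$max"); do
    if join_once; then
      echo "joined on attempt $attempt"
      return 0
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
  done
  echo "Failed to join cluster after $max attempts" >&2
  return 1
}

# One possible join_once, using the endpoint documented under
# Manual Operations (LEADER_URL, NODE_ID, RAFT_ADDR from the config):
join_once() {
  curl -fsk -X POST "$LEADER_URL/admin/member/add" \
    -H "Content-Type: application/json" \
    -d "{\"id\": \"$NODE_ID\", \"raft_addr\": \"$RAFT_ADDR\"}" >/dev/null
}
```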
### Decision Matrix
| Scenario | bootstrap | initial_peers | leader_url |
|----------|-----------|---------------|------------|
| Node 1 (first) | `true` | all 3 nodes | self |
| Node 2 (first) | `true` | all 3 nodes | self |
| Node 3 (first) | `true` | all 3 nodes | self |
| Node 4+ (join) | `false` | all 3 nodes | node 1 |
## Systemd Services
### chainfire-cluster-join.service
**Description:** Joins Chainfire cluster on first boot

**Dependencies:**

- After: `network-online.target`, `chainfire.service`
- Before: `flaredb-cluster-join.service`

**Configuration:**

- Type: `oneshot`
- RemainAfterExit: `true`
- Restart: `on-failure`

**Logs:**

```bash
journalctl -u chainfire-cluster-join.service
```
### flaredb-cluster-join.service
**Description:** Joins FlareDB cluster after Chainfire

**Dependencies:**

- After: `chainfire-cluster-join.service`, `flaredb.service`
- Requires: `chainfire-cluster-join.service`

**Configuration:**

- Type: `oneshot`
- RemainAfterExit: `true`
- Restart: `on-failure`

**Logs:**

```bash
journalctl -u flaredb-cluster-join.service
```
### iam-initial-setup.service
**Description:** IAM initial setup and admin user creation

**Dependencies:**

- After: `flaredb-cluster-join.service`, `iam.service`

**Configuration:**

- Type: `oneshot`
- RemainAfterExit: `true`

**Logs:**

```bash
journalctl -u iam-initial-setup.service
```
### cluster-health-check.service
**Description:** Validates cluster health on first boot

**Dependencies:**

- After: all cluster-join services

**Configuration:**

- Type: `oneshot`
- RemainAfterExit: `false`

**Logs:**

```bash
journalctl -u cluster-health-check.service
```
## Troubleshooting
### Check Service Status
```bash
# Overall status
systemctl status chainfire-cluster-join.service
systemctl status flaredb-cluster-join.service
# Detailed logs with JSON output
journalctl -u chainfire-cluster-join.service -o json-pretty
# Follow logs in real-time
journalctl -u chainfire-cluster-join.service -f
```
### Common Issues
#### 1. Health Check Timeout
**Symptom:**
```json
{"level":"ERROR","message":"Health check timeout after 120s"}
```
**Causes:**
- Service not starting (check main service logs)
- Port conflict
- TLS certificate issues

**Solutions:**
```bash
# Check main service
systemctl status chainfire.service
journalctl -u chainfire.service
# Test health endpoint manually
curl -k https://localhost:2379/health
# Restart services
systemctl restart chainfire.service
systemctl restart chainfire-cluster-join.service
```
#### 2. Leader Unreachable
**Symptom:**
```json
{"level":"ERROR","message":"Join request failed: connection error"}
```
**Causes:**
- Network connectivity issues
- Firewall blocking ports
- Leader not running
- Wrong leader URL in config

**Solutions:**
```bash
# Test network connectivity
ping node01.prod.example.com
curl -k https://node01.prod.example.com:2379/health
# Check firewall
iptables -L -n | grep 2379
# Verify configuration
jq '.leader_url' /etc/nixos/secrets/cluster-config.json
# Try manual join (see below)
```
#### 3. Invalid Configuration
**Symptom:**
```json
{"level":"ERROR","message":"Configuration file not found"}
```
**Causes:**
- Missing configuration file
- Wrong file path
- Invalid JSON syntax
- Missing required fields

**Solutions:**
```bash
# Check file exists
ls -la /etc/nixos/secrets/cluster-config.json
# Validate JSON syntax
jq . /etc/nixos/secrets/cluster-config.json
# Check required fields
jq '.node_id, .bootstrap, .leader_url' /etc/nixos/secrets/cluster-config.json
# Fix and restart
systemctl restart chainfire-cluster-join.service
```
#### 4. Already Member (Reboot)
**Symptom:**
```json
{"level":"WARN","message":"Already member of cluster (HTTP 409)"}
```
**Explanation:**
- This is **normal** on reboots
- Marker file prevents duplicate joins
- No action needed

**Verify:**
```bash
# Check marker file
cat /var/lib/first-boot-automation/.chainfire-joined
# Should show timestamp: 2025-12-10T10:30:45+00:00
```
#### 5. Join Retry Exhausted
**Symptom:**
```json
{"level":"ERROR","message":"Failed to join cluster after 5 attempts"}
```
**Causes:**
- Persistent network issues
- Leader down or overloaded
- Invalid node configuration
- Cluster at capacity

**Solutions:**
```bash
# Check cluster status on leader
curl -k https://node01.prod.example.com:2379/admin/cluster/members | jq
# Verify this node's configuration
jq '.node_id, .raft_addr' /etc/nixos/secrets/cluster-config.json
# Increase retry attempts (edit NixOS config)
# Or perform manual join (see below)
```
### Verify Cluster Membership
**On leader node:**
```bash
# Chainfire members
curl -k https://localhost:2379/admin/cluster/members | jq
# FlareDB members
curl -k https://localhost:2479/admin/cluster/members | jq
```
**Expected output:**
```json
{
  "members": [
    {"id": "node01", "raft_addr": "10.0.1.10:2380", "status": "healthy"},
    {"id": "node02", "raft_addr": "10.0.1.11:2380", "status": "healthy"},
    {"id": "node03", "raft_addr": "10.0.1.12:2380", "status": "healthy"}
  ]
}
```
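To turn the membership listing into a pass/fail check, the expected shape above can be asserted with `jq`. This is a convenience sketch, not part of the shipped tooling.

```bash
# all_healthy: read a members document on stdin and succeed only if
# every member reports status "healthy".
all_healthy() {
  jq -e 'all(.members[]; .status == "healthy")' >/dev/null
}

# Usage on the leader:
#   curl -k https://localhost:2379/admin/cluster/members | all_healthy \
#     && echo "cluster healthy"
```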
### Check Marker Files
```bash
# List all marker files
ls -la /var/lib/first-boot-automation/
# View timestamps
cat /var/lib/first-boot-automation/.chainfire-joined
cat /var/lib/first-boot-automation/.flaredb-joined
```
### Reset and Re-join
**Warning:** This wipes the node's local service data and markers, then re-runs the join from scratch.
```bash
# Stop services
systemctl stop chainfire.service flaredb.service
# Remove data and markers
rm -rf /var/lib/chainfire/*
rm -rf /var/lib/flaredb/*
rm /var/lib/first-boot-automation/.chainfire-*
rm /var/lib/first-boot-automation/.flaredb-*
# Restart (will auto-join)
systemctl start chainfire.service
systemctl restart chainfire-cluster-join.service
```
## Manual Operations
### Manual Cluster Join
If automation fails, perform manual join:
**Chainfire:**
```bash
# On joining node, ensure service is running and healthy
curl -k https://localhost:2379/health
# From any node, add member to cluster
curl -k -X POST https://node01.prod.example.com:2379/admin/member/add \
  -H "Content-Type: application/json" \
  -d '{
    "id": "node04",
    "raft_addr": "10.0.1.13:2380"
  }'
# Create marker to prevent auto-retry
mkdir -p /var/lib/first-boot-automation
date -Iseconds > /var/lib/first-boot-automation/.chainfire-joined
```
**FlareDB:**
```bash
curl -k -X POST https://node01.prod.example.com:2479/admin/member/add \
  -H "Content-Type: application/json" \
  -d '{
    "id": "node04",
    "raft_addr": "10.0.1.13:2480"
  }'
date -Iseconds > /var/lib/first-boot-automation/.flaredb-joined
```
### Remove Node from Cluster
**On leader:**
```bash
# Chainfire
curl -k -X DELETE https://node01.prod.example.com:2379/admin/member/node04
# FlareDB
curl -k -X DELETE https://node01.prod.example.com:2479/admin/member/node04
```
**On removed node:**
```bash
# Stop services
systemctl stop chainfire.service flaredb.service
# Clean up data
rm -rf /var/lib/chainfire/*
rm -rf /var/lib/flaredb/*
rm /var/lib/first-boot-automation/.chainfire-*
rm /var/lib/first-boot-automation/.flaredb-*
```
### Disable First-Boot Automation
If you need to disable automation:
```nix
# In NixOS configuration
services.first-boot-automation.enable = false;
```
Or stop services temporarily:
```bash
systemctl stop chainfire-cluster-join.service
systemctl disable chainfire-cluster-join.service
```
### Re-enable After Manual Operations
After manual cluster operations:
```bash
# Create marker files to indicate join complete
mkdir -p /var/lib/first-boot-automation
date -Iseconds > /var/lib/first-boot-automation/.chainfire-joined
date -Iseconds > /var/lib/first-boot-automation/.flaredb-joined
# Or re-enable automation (will skip if markers exist)
systemctl enable --now chainfire-cluster-join.service
```
## Security
### TLS Certificates
**Requirements:**
- All cluster communication uses TLS
- Certificates must exist before first boot
- Generated by T031 TLS automation

**Certificate Paths:**
```
/etc/nixos/secrets/
├── ca.crt # CA certificate
├── node01.crt # Node certificate
└── node01.key # Node private key (mode 0600)
```
**Permissions:**
```bash
chmod 600 /etc/nixos/secrets/node01.key
chmod 644 /etc/nixos/secrets/node01.crt
chmod 644 /etc/nixos/secrets/ca.crt
```
### Configuration File Security
**Cluster configuration contains sensitive data:**
- IP addresses and network topology
- Service URLs
- Node identifiers

**Recommended permissions:**
```bash
chmod 600 /etc/nixos/secrets/cluster-config.json
chown root:root /etc/nixos/secrets/cluster-config.json
```
### Network Security
**Required firewall rules:**
```bash
# Chainfire
iptables -A INPUT -p tcp --dport 2379 -s 10.0.1.0/24 -j ACCEPT # API
iptables -A INPUT -p tcp --dport 2380 -s 10.0.1.0/24 -j ACCEPT # Raft
iptables -A INPUT -p tcp --dport 2381 -s 10.0.1.0/24 -j ACCEPT # Gossip
# FlareDB
iptables -A INPUT -p tcp --dport 2479 -s 10.0.1.0/24 -j ACCEPT # API
iptables -A INPUT -p tcp --dport 2480 -s 10.0.1.0/24 -j ACCEPT # Raft
# IAM
iptables -A INPUT -p tcp --dport 8080 -s 10.0.1.0/24 -j ACCEPT # API
```
### Production Considerations
**For production deployments:**
1. **Remove `-k` flag from curl** (validate TLS certificates)
2. **Implement mTLS** for client authentication
3. **Rotate credentials** regularly
4. **Audit logs** with structured logging
5. **Monitor health endpoints** continuously
6. **Backup cluster state** before changes
## Examples
### Example 1: 3-Node Bootstrap Cluster
**Node 1:**
```json
{
  "node_id": "node01",
  "bootstrap": true,
  "raft_addr": "10.0.1.10:2380",
  "initial_peers": ["node01:2380", "node02:2380", "node03:2380"]
}
```
**Node 2:**
```json
{
  "node_id": "node02",
  "bootstrap": true,
  "raft_addr": "10.0.1.11:2380",
  "initial_peers": ["node01:2380", "node02:2380", "node03:2380"]
}
```
**Node 3:**
```json
{
  "node_id": "node03",
  "bootstrap": true,
  "raft_addr": "10.0.1.12:2380",
  "initial_peers": ["node01:2380", "node02:2380", "node03:2380"]
}
```
**Provisioning:**
```bash
# Provision all 3 nodes simultaneously
for i in {1..3}; do
  nixos-anywhere --flake .#node0$i root@node0$i.example.com &
done
wait
# Nodes will bootstrap automatically on first boot
```
### Example 2: Join Existing Cluster
**Node 4 (joining):**
```json
{
  "node_id": "node04",
  "bootstrap": false,
  "leader_url": "https://node01.prod.example.com:2379",
  "raft_addr": "10.0.1.13:2380"
}
```
**Provisioning:**
```bash
nixos-anywhere --flake .#node04 root@node04.example.com
# Node will automatically join on first boot
```
### Example 3: Single-Node All-in-One
**For development/testing:**
```json
{
  "node_id": "aio01",
  "bootstrap": true,
  "raft_addr": "10.0.2.10:2380",
  "initial_peers": ["aio01:2380"],
  "flaredb_peers": ["aio01:2480"]
}
```
**Provisioning:**
```bash
nixos-anywhere --flake .#aio01 root@aio01.example.com
```
## Integration with Other Systems
### T024 NixOS Modules
First-boot automation integrates with service modules:
```nix
{
  imports = [
    ./nix/modules/chainfire.nix
    ./nix/modules/flaredb.nix
    ./nix/modules/first-boot-automation.nix
  ];

  services.chainfire.enable = true;
  services.flaredb.enable = true;
  services.first-boot-automation.enable = true;
}
```
### T025 Observability
Health checks integrate with Prometheus:
```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'cluster-health'
    static_configs:
      - targets: ['node01:2379', 'node02:2379', 'node03:2379']
    metrics_path: '/health'
```
### T031 TLS Certificates
Certificates generated by T031 are used automatically:
```bash
# On provisioning server
./tls/generate-node-cert.sh node01.example.com 10.0.1.10
# Copied during nixos-anywhere
# First-boot automation reads from /etc/nixos/secrets/
```
## Logs and Debugging
### Structured Logging
All logs are JSON-formatted:
```json
{
  "timestamp": "2025-12-10T10:30:45+00:00",
  "level": "INFO",
  "service": "chainfire",
  "operation": "cluster-join",
  "message": "Successfully joined cluster"
}
```
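An emitter for that shape can be sketched with `jq`. Field names mirror the sample entry above; the shipped scripts may log additional fields.

```bash
# log_json LEVEL SERVICE OPERATION MESSAGE
# Emits one compact JSON log line matching the format above.
log_json() {
  jq -cn \
    --arg ts "$(date -Iseconds)" \
    --arg level "$1" --arg service "$2" \
    --arg op "$3" --arg msg "$4" \
    '{timestamp: $ts, level: $level, service: $service,
      operation: $op, message: $msg}'
}

# Example:
#   log_json INFO chainfire cluster-join "Successfully joined cluster"
```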
### Query Examples
**All first-boot logs:**
```bash
journalctl -u "*cluster-join*" -u "*initial-setup*" -u "*health-check*"
```
**Errors only:**
```bash
journalctl -u chainfire-cluster-join.service | grep '"level":"ERROR"'
```
**Last boot only:**
```bash
journalctl -b -u chainfire-cluster-join.service
```
**JSON output for parsing:**
```bash
journalctl -u chainfire-cluster-join.service -o json | jq '.MESSAGE'
```
## Performance Tuning
### Port Configuration
Override the default service ports in the NixOS module:
```nix
services.first-boot-automation = {
  enable = true;

  # Override default ports if needed
  chainfirePort = 2379;
  flaredbPort = 2479;
};
```
### Retry Configuration
Modify retry logic in scripts:
```bash
# baremetal/first-boot/cluster-join.sh
MAX_ATTEMPTS=10 # Increase from 5
RETRY_DELAY=15 # Increase from 10s
```
### Health Check Interval
Adjust polling interval:
```bash
# In service scripts
sleep 10 # Increase from 5s for less aggressive polling
```
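Put together, the polling loop that interval controls might look like this. The 120s ceiling comes from the Join Mode section; `wait_for_health` is an illustrative helper, not a shipped script.

```bash
# wait_for_health URL [TIMEOUT] [INTERVAL]
# Polls URL until it responds, up to TIMEOUT seconds total.
wait_for_health() {
  local url="$1" timeout="${2:-120}" interval="${3:-5}" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if curl -fsk "$url" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "Health check timeout after ${timeout}s" >&2
  return 1
}
```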
## Support and Contributing
### Getting Help
1. Check logs: `journalctl -u chainfire-cluster-join.service`
2. Review troubleshooting section above
3. Consult [ARCHITECTURE.md](ARCHITECTURE.md) for design details
4. Check cluster status on leader node
### Reporting Issues
Include in bug reports:
```bash
# Gather diagnostic information
journalctl -u chainfire-cluster-join.service > cluster-join.log
systemctl status chainfire-cluster-join.service > service-status.txt
cat /etc/nixos/secrets/cluster-config.json > config.json # Redact sensitive data!
ls -la /var/lib/first-boot-automation/ > markers.txt
```
### Development
See [ARCHITECTURE.md](ARCHITECTURE.md) for contributing guidelines.
## References
- **ARCHITECTURE.md**: Detailed design documentation
- **T024**: NixOS service modules
- **T025**: Observability and monitoring
- **T031**: TLS certificate automation
- **T032.S1-S3**: PXE boot and provisioning
- **Design Document**: `/home/centra/cloud/docs/por/T032-baremetal-provisioning/design.md`
## License
Internal use only - Centra Cloud Platform