# First-Boot Automation for Bare-Metal Provisioning
Automated cluster joining and service initialization for bare-metal provisioned NixOS nodes.
## Table of Contents
- [Overview](#overview)
- [Quick Start](#quick-start)
- [Configuration](#configuration)
- [Bootstrap vs Join](#bootstrap-vs-join)
- [Systemd Services](#systemd-services)
- [Troubleshooting](#troubleshooting)
- [Manual Operations](#manual-operations)
- [Security](#security)
- [Examples](#examples)
## Overview
The first-boot automation system handles automated cluster joining for distributed services (Chainfire, FlareDB, IAM) on first boot of bare-metal provisioned nodes. It supports two modes:
- **Bootstrap Mode**: Initialize a new Raft cluster (first 3 nodes)
- **Join Mode**: Join an existing cluster (additional nodes)
### Features
- Automated health checking with retries
- Idempotent operations (safe to run multiple times)
- Structured JSON logging to journald
- Graceful failure handling with configurable retries
- Integration with TLS certificates (T031)
- Support for both bootstrap and runtime join scenarios
### Architecture
See [ARCHITECTURE.md](ARCHITECTURE.md) for detailed design documentation.
## Quick Start
### Prerequisites
1. Node provisioned via T032.S1-S3 (PXE boot and installation)
2. Cluster configuration file at `/etc/nixos/secrets/cluster-config.json`
3. TLS certificates at `/etc/nixos/secrets/` (T031)
4. Network connectivity to cluster leader (for join mode)
### Enable First-Boot Automation
In your NixOS configuration:
```nix
# /etc/nixos/configuration.nix
{
  imports = [
    ./nix/modules/first-boot-automation.nix
  ];

  services.first-boot-automation = {
    enable = true;
    configFile = "/etc/nixos/secrets/cluster-config.json";

    # Optional: disable specific services
    enableChainfire = true;
    enableFlareDB = true;
    enableIAM = true;
    enableHealthCheck = true;
  };
}
```
### First Boot
After provisioning and reboot:

1. Node boots from disk
2. systemd starts services
3. First-boot automation runs automatically
4. Cluster join completes within 30-60 seconds

Check status:
```bash
systemctl status chainfire-cluster-join.service
systemctl status flaredb-cluster-join.service
systemctl status iam-initial-setup.service
systemctl status cluster-health-check.service
```
## Configuration
### cluster-config.json Format
```json
{
  "node_id": "node01",
  "node_role": "control-plane",
  "bootstrap": true,
  "cluster_name": "prod-cluster",
  "leader_url": "https://node01.prod.example.com:2379",
  "raft_addr": "10.0.1.10:2380",
  "initial_peers": [
    "node01:2380",
    "node02:2380",
    "node03:2380"
  ],
  "flaredb_peers": [
    "node01:2480",
    "node02:2480",
    "node03:2480"
  ]
}
```
### Required Fields
| Field | Type | Description |
|-------|------|-------------|
| `node_id` | string | Unique identifier for this node |
| `node_role` | string | Node role: `control-plane`, `worker`, or `all-in-one` |
| `bootstrap` | boolean | `true` for first 3 nodes, `false` for additional nodes |
| `cluster_name` | string | Cluster identifier |
| `leader_url` | string | HTTPS URL of cluster leader (used for join) |
| `raft_addr` | string | This node's Raft address (IP:port) |
| `initial_peers` | array | List of bootstrap peer addresses |
| `flaredb_peers` | array | List of FlareDB peer addresses |
### Optional Fields
| Field | Type | Description |
|-------|------|-------------|
| `node_ip` | string | Node's primary IP address |
| `node_fqdn` | string | Fully qualified domain name |
| `datacenter` | string | Datacenter identifier |
| `rack` | string | Rack identifier |
| `services` | object | Per-service configuration |
| `tls` | object | TLS certificate paths |
| `network` | object | Network CIDR ranges |
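A pre-flight check against the required fields can be sketched in shell. Field names are taken from the table above; the actual provisioning tooling may validate differently.

```bash
# validate_cluster_config FILE
# Fail fast if any required field is absent from the config.
validate_cluster_config() {
  local config="$1" field
  for field in node_id node_role bootstrap cluster_name \
               leader_url raft_addr initial_peers flaredb_peers; do
    if ! jq -e --arg f "$field" 'has($f)' "$config" >/dev/null; then
      echo "missing required field: $field" >&2
      return 1
    fi
  done
  echo "all required fields present"
}
```

Run it before first boot, e.g. `validate_cluster_config /etc/nixos/secrets/cluster-config.json`.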
### Example Configurations
See [examples/](examples/) directory:
- `cluster-config-bootstrap.json` - Bootstrap node (first 3)
- `cluster-config-join.json` - Join node (additional)
- `cluster-config-all-in-one.json` - Single-node deployment
## Bootstrap vs Join
### Bootstrap Mode (bootstrap: true)
**When to use:**

- First 3 nodes in a new cluster
- Nodes configured with matching `initial_peers`
- No existing cluster to join

**Behavior:**

1. Services start with `--initial-cluster` configuration
2. Raft consensus automatically elects a leader
3. The cluster join service detects bootstrap mode and exits immediately
4. Marker file created: `/var/lib/first-boot-automation/.chainfire-initialized`

**Example:**
```json
{
  "node_id": "node01",
  "bootstrap": true,
  "initial_peers": ["node01:2380", "node02:2380", "node03:2380"]
}
```
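The idempotency guard implied by the marker file can be sketched as follows. Paths come from this README; the real service script may differ in detail.

```bash
MARKER_DIR="${MARKER_DIR:-/var/lib/first-boot-automation}"

# already_done: true if a previous boot already initialized or joined,
# so the join service can exit immediately (safe across reboots).
already_done() {
  [ -f "$MARKER_DIR/.chainfire-initialized" ] || \
  [ -f "$MARKER_DIR/.chainfire-joined" ]
}

# mark_initialized: record completion with a timestamp, matching the
# marker format shown in the Troubleshooting section.
mark_initialized() {
  mkdir -p "$MARKER_DIR"
  date -Iseconds > "$MARKER_DIR/.chainfire-initialized"
}
```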
### Join Mode (bootstrap: false)
**When to use:**

- Nodes joining an existing cluster
- Expansion or replacement nodes
- Leader is known and reachable

**Behavior:**

1. Service starts with no initial cluster config
2. Waits for the local service to become healthy (max 120s)
3. POSTs to the leader's `/admin/member/add` endpoint
4. Retries up to 5 times with a 10s delay
5. Marker file created: `/var/lib/first-boot-automation/.chainfire-joined`

**Example:**
```json
{
  "node_id": "node04",
  "bootstrap": false,
  "leader_url": "https://node01.prod.example.com:2379",
  "raft_addr": "10.0.1.13:2380"
}
```
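The retry behavior described above (5 attempts, 10s apart) can be sketched as a wrapper around the join request. `join_once` here stands in for the actual POST and is an illustrative name, not part of the shipped scripts.

```bash
# retry_join: call join_once until it succeeds or MAX_ATTEMPTS is
# exhausted, sleeping RETRY_DELAY seconds between tries.
retry_join() {
  local max="${MAX_ATTEMPTS:-5}" delay="${RETRY_DELAY:-10}" attempt
  for attempt in $(seq 1 "$max"); do
    if join_once; then
      echo "joined on attempt $attempt"
      return 0
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
  done
  echo "Failed to join cluster after $max attempts" >&2
  return 1
}

# One possible join_once, using the endpoint documented under
# Manual Operations (LEADER_URL, NODE_ID, RAFT_ADDR from the config):
join_once() {
  curl -fsk -X POST "$LEADER_URL/admin/member/add" \
    -H "Content-Type: application/json" \
    -d "{\"id\": \"$NODE_ID\", \"raft_addr\": \"$RAFT_ADDR\"}" >/dev/null
}
```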
### Decision Matrix
| Scenario | bootstrap | initial_peers | leader_url |
|----------|-----------|---------------|------------|
| Node 1 (first) | `true` | all 3 nodes | self |
| Node 2 (first) | `true` | all 3 nodes | self |
| Node 3 (first) | `true` | all 3 nodes | self |
| Node 4+ (join) | `false` | all 3 nodes | node 1 |
## Systemd Services
### chainfire-cluster-join.service
**Description:** Joins Chainfire cluster on first boot

**Dependencies:**

- After: `network-online.target`, `chainfire.service`
- Before: `flaredb-cluster-join.service`

**Configuration:**

- Type: `oneshot`
- RemainAfterExit: `true`
- Restart: `on-failure`

**Logs:**

```bash
journalctl -u chainfire-cluster-join.service
```
### flaredb-cluster-join.service
**Description:** Joins FlareDB cluster after Chainfire

**Dependencies:**

- After: `chainfire-cluster-join.service`, `flaredb.service`
- Requires: `chainfire-cluster-join.service`

**Configuration:**

- Type: `oneshot`
- RemainAfterExit: `true`
- Restart: `on-failure`

**Logs:**

```bash
journalctl -u flaredb-cluster-join.service
```
### iam-initial-setup.service
**Description:** IAM initial setup and admin user creation

**Dependencies:**

- After: `flaredb-cluster-join.service`, `iam.service`

**Configuration:**

- Type: `oneshot`
- RemainAfterExit: `true`

**Logs:**

```bash
journalctl -u iam-initial-setup.service
```
### cluster-health-check.service
**Description:** Validates cluster health on first boot

**Dependencies:**

- After: all cluster-join services

**Configuration:**

- Type: `oneshot`
- RemainAfterExit: `false`

**Logs:**

```bash
journalctl -u cluster-health-check.service
```
## Troubleshooting
### Check Service Status
```bash
# Overall status
systemctl status chainfire-cluster-join.service
systemctl status flaredb-cluster-join.service
# Detailed logs with JSON output
journalctl -u chainfire-cluster-join.service -o json-pretty
# Follow logs in real-time
journalctl -u chainfire-cluster-join.service -f
```
### Common Issues
#### 1. Health Check Timeout
**Symptom:**
```json
{"level":"ERROR","message":"Health check timeout after 120s"}
```
**Causes:**
- Service not starting (check main service logs)
- Port conflict
- TLS certificate issues

**Solutions:**
```bash
# Check main service
systemctl status chainfire.service
journalctl -u chainfire.service
# Test health endpoint manually
curl -k https://localhost:2379/health
# Restart services
systemctl restart chainfire.service
systemctl restart chainfire-cluster-join.service
```
#### 2. Leader Unreachable
**Symptom:**
```json
{"level":"ERROR","message":"Join request failed: connection error"}
```
**Causes:**
- Network connectivity issues
- Firewall blocking ports
- Leader not running
- Wrong leader URL in config

**Solutions:**
```bash
# Test network connectivity
ping node01.prod.example.com
curl -k https://node01.prod.example.com:2379/health
# Check firewall
iptables -L -n | grep 2379
# Verify configuration
jq '.leader_url' /etc/nixos/secrets/cluster-config.json
# Try manual join (see below)
```
#### 3. Invalid Configuration
**Symptom:**
```json
{"level":"ERROR","message":"Configuration file not found"}
```
**Causes:**
- Missing configuration file
- Wrong file path
- Invalid JSON syntax
- Missing required fields

**Solutions:**
```bash
# Check file exists
ls -la /etc/nixos/secrets/cluster-config.json
# Validate JSON syntax
jq . /etc/nixos/secrets/cluster-config.json
# Check required fields
jq '.node_id, .bootstrap, .leader_url' /etc/nixos/secrets/cluster-config.json
# Fix and restart
systemctl restart chainfire-cluster-join.service
```
#### 4. Already Member (Reboot)
**Symptom:**
```json
{"level":"WARN","message":"Already member of cluster (HTTP 409)"}
```
**Explanation:**
- This is **normal** on reboots
- Marker file prevents duplicate joins
- No action needed

**Verify:**
```bash
# Check marker file
cat /var/lib/first-boot-automation/.chainfire-joined
# Should show timestamp: 2025-12-10T10:30:45+00:00
```
#### 5. Join Retry Exhausted
**Symptom:**
```json
{"level":"ERROR","message":"Failed to join cluster after 5 attempts"}
```
**Causes:**
- Persistent network issues
- Leader down or overloaded
- Invalid node configuration
- Cluster at capacity

**Solutions:**
```bash
# Check cluster status on leader
curl -k https://node01.prod.example.com:2379/admin/cluster/members | jq
# Verify this node's configuration
jq '.node_id, .raft_addr' /etc/nixos/secrets/cluster-config.json
# Increase retry attempts (edit NixOS config)
# Or perform manual join (see below)
```
### Verify Cluster Membership
**On leader node:**
```bash
# Chainfire members
curl -k https://localhost:2379/admin/cluster/members | jq
# FlareDB members
curl -k https://localhost:2479/admin/cluster/members | jq
```
**Expected output:**
```json
{
  "members": [
    {"id": "node01", "raft_addr": "10.0.1.10:2380", "status": "healthy"},
    {"id": "node02", "raft_addr": "10.0.1.11:2380", "status": "healthy"},
    {"id": "node03", "raft_addr": "10.0.1.12:2380", "status": "healthy"}
  ]
}
```
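To turn the membership listing into a pass/fail check, the expected shape above can be asserted with `jq`. This is a convenience sketch, not part of the shipped tooling.

```bash
# all_healthy: read a members document on stdin and succeed only if
# every member reports status "healthy".
all_healthy() {
  jq -e 'all(.members[]; .status == "healthy")' >/dev/null
}

# Usage on the leader:
#   curl -k https://localhost:2379/admin/cluster/members | all_healthy \
#     && echo "cluster healthy"
```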
### Check Marker Files
```bash
# List all marker files
ls -la /var/lib/first-boot-automation/
# View timestamps
cat /var/lib/first-boot-automation/.chainfire-joined
cat /var/lib/first-boot-automation/.flaredb-joined
```
### Reset and Re-join
**Warning:** This wipes the node's local service data and markers, then re-runs the join from scratch.
```bash
# Stop services
systemctl stop chainfire.service flaredb.service
# Remove data and markers
rm -rf /var/lib/chainfire/*
rm -rf /var/lib/flaredb/*
rm /var/lib/first-boot-automation/.chainfire-*
rm /var/lib/first-boot-automation/.flaredb-*
# Restart (will auto-join)
systemctl start chainfire.service
systemctl restart chainfire-cluster-join.service
```
## Manual Operations
### Manual Cluster Join
If automation fails, perform manual join:
**Chainfire:**
```bash
# On joining node, ensure service is running and healthy
curl -k https://localhost:2379/health
# From any node, add member to cluster
curl -k -X POST https://node01.prod.example.com:2379/admin/member/add \
  -H "Content-Type: application/json" \
  -d '{
    "id": "node04",
    "raft_addr": "10.0.1.13:2380"
  }'
# Create marker to prevent auto-retry
mkdir -p /var/lib/first-boot-automation
date -Iseconds > /var/lib/first-boot-automation/.chainfire-joined
```
**FlareDB:**
```bash
curl -k -X POST https://node01.prod.example.com:2479/admin/member/add \
  -H "Content-Type: application/json" \
  -d '{
    "id": "node04",
    "raft_addr": "10.0.1.13:2480"
  }'
date -Iseconds > /var/lib/first-boot-automation/.flaredb-joined
```
### Remove Node from Cluster
**On leader:**
```bash
# Chainfire
curl -k -X DELETE https://node01.prod.example.com:2379/admin/member/node04
# FlareDB
curl -k -X DELETE https://node01.prod.example.com:2479/admin/member/node04
```
**On removed node:**
```bash
# Stop services
systemctl stop chainfire.service flaredb.service
# Clean up data
rm -rf /var/lib/chainfire/*
rm -rf /var/lib/flaredb/*
rm /var/lib/first-boot-automation/.chainfire-*
rm /var/lib/first-boot-automation/.flaredb-*
```
### Disable First-Boot Automation
If you need to disable automation:
```nix
# In NixOS configuration
services.first-boot-automation.enable = false;
```
Or stop services temporarily:
```bash
systemctl stop chainfire-cluster-join.service
systemctl disable chainfire-cluster-join.service
```
### Re-enable After Manual Operations
After manual cluster operations:
```bash
# Create marker files to indicate join complete
mkdir -p /var/lib/first-boot-automation
date -Iseconds > /var/lib/first-boot-automation/.chainfire-joined
date -Iseconds > /var/lib/first-boot-automation/.flaredb-joined
# Or re-enable automation (will skip if markers exist)
systemctl enable --now chainfire-cluster-join.service
```
## Security
### TLS Certificates
**Requirements:**
- All cluster communication uses TLS
- Certificates must exist before first boot
- Generated by T031 TLS automation

**Certificate Paths:**
```
/etc/nixos/secrets/
├── ca.crt # CA certificate
├── node01.crt # Node certificate
└── node01.key # Node private key (mode 0600)
```
**Permissions:**
```bash
chmod 600 /etc/nixos/secrets/node01.key
chmod 644 /etc/nixos/secrets/node01.crt
chmod 644 /etc/nixos/secrets/ca.crt
```
### Configuration File Security
**Cluster configuration contains sensitive data:**
- IP addresses and network topology
- Service URLs
- Node identifiers

**Recommended permissions:**
```bash
chmod 600 /etc/nixos/secrets/cluster-config.json
chown root:root /etc/nixos/secrets/cluster-config.json
```
### Network Security
**Required firewall rules:**
```bash
# Chainfire
iptables -A INPUT -p tcp --dport 2379 -s 10.0.1.0/24 -j ACCEPT # API
iptables -A INPUT -p tcp --dport 2380 -s 10.0.1.0/24 -j ACCEPT # Raft
iptables -A INPUT -p tcp --dport 2381 -s 10.0.1.0/24 -j ACCEPT # Gossip
# FlareDB
iptables -A INPUT -p tcp --dport 2479 -s 10.0.1.0/24 -j ACCEPT # API
iptables -A INPUT -p tcp --dport 2480 -s 10.0.1.0/24 -j ACCEPT # Raft
# IAM
iptables -A INPUT -p tcp --dport 8080 -s 10.0.1.0/24 -j ACCEPT # API
```
### Production Considerations
**For production deployments:**
1. **Remove `-k` flag from curl** (validate TLS certificates)
2. **Implement mTLS** for client authentication
3. **Rotate credentials** regularly
4. **Audit logs** with structured logging
5. **Monitor health endpoints** continuously
6. **Backup cluster state** before changes
## Examples
### Example 1: 3-Node Bootstrap Cluster
**Node 1:**
```json
{
  "node_id": "node01",
  "bootstrap": true,
  "raft_addr": "10.0.1.10:2380",
  "initial_peers": ["node01:2380", "node02:2380", "node03:2380"]
}
```
**Node 2:**
```json
{
  "node_id": "node02",
  "bootstrap": true,
  "raft_addr": "10.0.1.11:2380",
  "initial_peers": ["node01:2380", "node02:2380", "node03:2380"]
}
```
**Node 3:**
```json
{
  "node_id": "node03",
  "bootstrap": true,
  "raft_addr": "10.0.1.12:2380",
  "initial_peers": ["node01:2380", "node02:2380", "node03:2380"]
}
```
**Provisioning:**
```bash
# Provision all 3 nodes simultaneously
for i in {1..3}; do
  nixos-anywhere --flake .#node0$i root@node0$i.example.com &
done
wait
# Nodes will bootstrap automatically on first boot
```
### Example 2: Join Existing Cluster
**Node 4 (joining):**
```json
{
  "node_id": "node04",
  "bootstrap": false,
  "leader_url": "https://node01.prod.example.com:2379",
  "raft_addr": "10.0.1.13:2380"
}
```
**Provisioning:**
```bash
nixos-anywhere --flake .#node04 root@node04.example.com
# Node will automatically join on first boot
```
### Example 3: Single-Node All-in-One
**For development/testing:**
```json
{
  "node_id": "aio01",
  "bootstrap": true,
  "raft_addr": "10.0.2.10:2380",
  "initial_peers": ["aio01:2380"],
  "flaredb_peers": ["aio01:2480"]
}
```
**Provisioning:**
```bash
nixos-anywhere --flake .#aio01 root@aio01.example.com
```
## Integration with Other Systems
### T024 NixOS Modules
First-boot automation integrates with service modules:
```nix
{
  imports = [
    ./nix/modules/chainfire.nix
    ./nix/modules/flaredb.nix
    ./nix/modules/first-boot-automation.nix
  ];

  services.chainfire.enable = true;
  services.flaredb.enable = true;
  services.first-boot-automation.enable = true;
}
```
### T025 Observability
Health checks integrate with Prometheus:
```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'cluster-health'
    static_configs:
      - targets: ['node01:2379', 'node02:2379', 'node03:2379']
    metrics_path: '/health'
```
### T031 TLS Certificates
Certificates generated by T031 are used automatically:
```bash
# On provisioning server
./tls/generate-node-cert.sh node01.example.com 10.0.1.10
# Copied during nixos-anywhere
# First-boot automation reads from /etc/nixos/secrets/
```
## Logs and Debugging
### Structured Logging
All logs are JSON-formatted:
```json
{
  "timestamp": "2025-12-10T10:30:45+00:00",
  "level": "INFO",
  "service": "chainfire",
  "operation": "cluster-join",
  "message": "Successfully joined cluster"
}
```
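An emitter for that shape can be sketched with `jq`. Field names mirror the sample entry above; the shipped scripts may log additional fields.

```bash
# log_json LEVEL SERVICE OPERATION MESSAGE
# Emits one compact JSON log line matching the format above.
log_json() {
  jq -cn \
    --arg ts "$(date -Iseconds)" \
    --arg level "$1" --arg service "$2" \
    --arg op "$3" --arg msg "$4" \
    '{timestamp: $ts, level: $level, service: $service,
      operation: $op, message: $msg}'
}

# Example:
#   log_json INFO chainfire cluster-join "Successfully joined cluster"
```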
### Query Examples
**All first-boot logs:**
```bash
journalctl -u "*cluster-join*" -u "*initial-setup*" -u "*health-check*"
```
**Errors only:**
```bash
journalctl -u chainfire-cluster-join.service | grep '"level":"ERROR"'
```
**Last boot only:**
```bash
journalctl -b -u chainfire-cluster-join.service
```
**JSON output for parsing:**
```bash
journalctl -u chainfire-cluster-join.service -o json | jq '.MESSAGE'
```
## Performance Tuning
### Port Configuration
Override the default service ports in the NixOS module:
```nix
services.first-boot-automation = {
  enable = true;

  # Override default ports if needed
  chainfirePort = 2379;
  flaredbPort = 2479;
};
```
### Retry Configuration
Modify retry logic in scripts:
```bash
# baremetal/first-boot/cluster-join.sh
MAX_ATTEMPTS=10 # Increase from 5
RETRY_DELAY=15 # Increase from 10s
```
### Health Check Interval
Adjust polling interval:
```bash
# In service scripts
sleep 10 # Increase from 5s for less aggressive polling
```
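Put together, the polling loop that interval controls might look like this. The 120s ceiling comes from the Join Mode section; `wait_for_health` is an illustrative helper, not a shipped script.

```bash
# wait_for_health URL [TIMEOUT] [INTERVAL]
# Polls URL until it responds, up to TIMEOUT seconds total.
wait_for_health() {
  local url="$1" timeout="${2:-120}" interval="${3:-5}" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if curl -fsk "$url" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "Health check timeout after ${timeout}s" >&2
  return 1
}
```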
## Support and Contributing
### Getting Help
1. Check logs: `journalctl -u chainfire-cluster-join.service`
2. Review troubleshooting section above
3. Consult [ARCHITECTURE.md](ARCHITECTURE.md) for design details
4. Check cluster status on leader node
### Reporting Issues
Include in bug reports:
```bash
# Gather diagnostic information
journalctl -u chainfire-cluster-join.service > cluster-join.log
systemctl status chainfire-cluster-join.service > service-status.txt
cat /etc/nixos/secrets/cluster-config.json > config.json # Redact sensitive data!
ls -la /var/lib/first-boot-automation/ > markers.txt
```
### Development
See [ARCHITECTURE.md](ARCHITECTURE.md) for contributing guidelines.
## References
- **ARCHITECTURE.md**: Detailed design documentation
- **T024**: NixOS service modules
- **T025**: Observability and monitoring
- **T031**: TLS certificate automation
- **T032.S1-S3**: PXE boot and provisioning
- **Design Document**: `/home/centra/cloud/docs/por/T032-baremetal-provisioning/design.md`
## License
Internal use only - Centra Cloud Platform