# First-Boot Automation Architecture

## Overview

The first-boot automation system provides automated cluster joining and service initialization for bare-metal provisioned nodes. It handles two critical scenarios:

1. **Bootstrap Mode**: First 3 nodes initialize a new Raft cluster
2. **Join Mode**: Additional nodes join an existing cluster

This document describes the architecture, design decisions, and implementation details.

## System Architecture

### Component Hierarchy

```
┌─────────────────────────────────────────────────────────────┐
│                     NixOS Boot Process                      │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│              systemd.target: multi-user.target              │
└────────────────────┬────────────────────────────────────────┘
                     │
     ┌───────────────┼───────────────┐
     │               │               │
     ▼               ▼               ▼
┌──────────┐    ┌──────────┐    ┌──────────┐
│chainfire │    │ flaredb  │    │   iam    │
│.service  │    │.service  │    │.service  │
└────┬─────┘    └────┬─────┘    └────┬─────┘
     │               │               │
     ▼               ▼               ▼
┌──────────────────────────────────────────┐
│ chainfire-cluster-join.service           │
│ - Waits for local chainfire health       │
│ - Checks bootstrap flag                  │
│ - Joins cluster if bootstrap=false       │
└────────────────┬─────────────────────────┘
                 │
                 ▼
┌──────────────────────────────────────────┐
│ flaredb-cluster-join.service             │
│ - Requires chainfire-cluster-join        │
│ - Waits for local flaredb health         │
│ - Joins FlareDB cluster                  │
└────────────────┬─────────────────────────┘
                 │
                 ▼
┌──────────────────────────────────────────┐
│ iam-initial-setup.service                │
│ - Waits for IAM health                   │
│ - Creates admin user if needed           │
│ - Generates initial tokens               │
└────────────────┬─────────────────────────┘
                 │
                 ▼
┌──────────────────────────────────────────┐
│ cluster-health-check.service             │
│ - Polls all service health endpoints     │
│ - Verifies cluster membership            │
│ - Reports to journald                    │
└──────────────────────────────────────────┘
```

### Configuration Flow

```
┌─────────────────────────────────────────┐
│ Provisioning Server                     │
│ - Generates cluster-config.json         │
│ - Copies to /etc/nixos/secrets/         │
└────────────────┬────────────────────────┘
                 │
                 │ nixos-anywhere
                 │
                 ▼
┌─────────────────────────────────────────┐
│ Target Node                             │
│ /etc/nixos/secrets/cluster-config.json  │
└────────────────┬────────────────────────┘
                 │
                 │ Read by NixOS module
                 │
                 ▼
┌─────────────────────────────────────────┐
│ first-boot-automation.nix               │
│ - Parses JSON config                    │
│ - Creates systemd services              │
│ - Sets up dependencies                  │
└────────────────┬────────────────────────┘
                 │
                 │ systemd activation
                 │
                 ▼
┌─────────────────────────────────────────┐
│ Cluster Join Services                   │
│ - Execute join logic                    │
│ - Create marker files                   │
│ - Log to journald                       │
└─────────────────────────────────────────┘
```

## Bootstrap vs Join Decision Logic

### Decision Tree

```
             ┌─────────────────┐
             │   Node Boots    │
             └────────┬────────┘
                      │
             ┌────────▼────────┐
             │  Read cluster-  │
             │   config.json   │
             └────────┬────────┘
                      │
             ┌────────▼────────┐
             │ bootstrap=true? │
             └────────┬────────┘
                      │
         ┌────────────┴────────────┐
         │                         │
     YES ▼                         ▼ NO
┌─────────────────┐       ┌─────────────────┐
│ Bootstrap Mode  │       │    Join Mode    │
│                 │       │                 │
│ - Skip cluster  │       │ - Wait for      │
│   join API      │       │   local health  │
│ - Raft cluster  │       │ - Contact       │
│   initializes   │       │   leader        │
│   internally    │       │ - POST to       │
│ - Create marker │       │   /member/add   │
│ - Exit success  │       │ - Retry 5x      │
└─────────────────┘       └─────────────────┘
```

### Bootstrap Mode (bootstrap: true)

**When to use:**
- First 3 nodes in a new cluster
- Nodes configured with matching `initial_peers`
- No existing cluster to join

**Behavior:**
1. Service starts with `--initial-cluster` parameter containing all bootstrap peers
2. Raft consensus protocol automatically elects leader
3. Cluster join service detects bootstrap mode and exits immediately
4. No API calls to leader (cluster doesn't exist yet)

**Configuration:**
```json
{
  "bootstrap": true,
  "initial_peers": ["node01:2380", "node02:2380", "node03:2380"]
}
```

**Marker file:** `/var/lib/first-boot-automation/.chainfire-initialized`

### Join Mode (bootstrap: false)

**When to use:**
- Nodes joining an existing cluster
- Expansion or replacement nodes
- Leader URL is known and reachable

**Behavior:**
1. Service starts with no initial cluster configuration
2. Cluster join service waits for local service health
3. POST to leader's `/admin/member/add` with node info
4. Leader adds member to Raft configuration
5. Node joins cluster and synchronizes state

**Configuration:**
```json
{
  "bootstrap": false,
  "leader_url": "https://node01.example.com:2379",
  "raft_addr": "10.0.1.13:2380"
}
```

**Marker file:** `/var/lib/first-boot-automation/.chainfire-joined`

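The bootstrap-versus-join decision, combined with the marker files named above, can be sketched as a small shell function. This is a minimal illustration rather than the module's actual script: the function name `decide_action` and its argument order are assumptions, and only the two Chainfire marker names from this document are checked.

```bash
# Hypothetical helper: decide what the join service should do on this boot.
# Arguments: <bootstrap flag: true|false> <marker directory>
# Prints one of: skip, bootstrap, join. Returns 2 on an invalid flag.
decide_action() {
  local bootstrap="$1" marker_dir="$2"

  # A marker left by an earlier boot means the work is already done.
  if [ -e "$marker_dir/.chainfire-initialized" ] || [ -e "$marker_dir/.chainfire-joined" ]; then
    echo "skip"
    return 0
  fi

  case "$bootstrap" in
    true)  echo "bootstrap" ;;  # Raft forms the cluster internally; no join API call
    false) echo "join" ;;       # POST to the leader's /admin/member/add
    *)     echo "invalid bootstrap flag: $bootstrap" >&2; return 2 ;;
  esac
}
```

The flag itself could be read with `jq -r '.bootstrap' /etc/nixos/secrets/cluster-config.json`, which prints `true` or `false` for a JSON boolean.
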
## Idempotency and State Management

### Marker Files

The system uses marker files to track initialization state:

```
/var/lib/first-boot-automation/
├── .chainfire-initialized    # Bootstrap node initialized
├── .chainfire-joined         # Node joined cluster
├── .flaredb-initialized      # FlareDB bootstrap
├── .flaredb-joined           # FlareDB joined
└── .iam-initialized          # IAM setup complete
```

**Purpose:**
- Prevent duplicate join attempts on reboot
- Support idempotent operations
- Enable troubleshooting (check timestamps)

**Format:** ISO 8601 timestamp of initialization
```
2025-12-10T10:30:45+00:00
```

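Marker handling could look like the following sketch. The helper names `write_marker` and `marker_time` and the `MARKER_DIR` override are assumptions made for illustration; the directory default and the timestamp format follow this document.

```bash
# Hypothetical helpers for marker handling.
# MARKER_DIR defaults to the directory used throughout this document.
MARKER_DIR="${MARKER_DIR:-/var/lib/first-boot-automation}"

# Write an ISO 8601 UTC timestamp to the named marker unless it already exists.
write_marker() {
  local marker="$MARKER_DIR/$1"
  mkdir -p "$MARKER_DIR"
  if [ -e "$marker" ]; then
    return 0  # keep the original timestamp for troubleshooting
  fi
  date -u +%Y-%m-%dT%H:%M:%S+00:00 > "$marker"
}

# Print the timestamp stored in a marker; returns non-zero if it is absent.
marker_time() {
  local marker="$MARKER_DIR/$1"
  [ -f "$marker" ] && cat "$marker"
}
```

Because `write_marker` never overwrites an existing file, calling it again after a reboot leaves the first-boot timestamp intact.
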
### State Transitions

```
┌──────────────┐
│  First Boot  │
│ (no marker)  │
└──────┬───────┘
       │
       ▼
┌──────────────┐
│ Check Config │
│ bootstrap=?  │
└──────┬───────┘
       │
       ├─(true)──▶ Bootstrap ──▶ Create .initialized ──▶ Done
       │
       └─(false)─▶ Join ──▶ Create .joined ──▶ Done
                                                │
                                                │ (reboot)
                                                ▼
                                         ┌──────────────┐
                                         │ Marker Exists│
                                         │ Skip Join    │
                                         └──────────────┘
```

## Retry Logic and Error Handling

### Health Check Retry

**Parameters:**
- Timeout: 120 seconds (configurable)
- Retry Interval: 5 seconds
- Max Elapsed: 300 seconds

**Logic:**
```bash
TIMEOUT="${TIMEOUT:-120}"  # seconds; configurable
START_TIME=$(date +%s)
while true; do
  ELAPSED=$(($(date +%s) - START_TIME))
  if [[ $ELAPSED -ge $TIMEOUT ]]; then
    exit 1  # Timeout
  fi

  # -k accepts self-signed certificates; -w prints only the HTTP status code
  HTTP_CODE=$(curl -k -s -o /dev/null -w "%{http_code}" "$HEALTH_URL")
  if [[ "$HTTP_CODE" == "200" ]]; then
    exit 0  # Success
  fi

  sleep 5
done
```

### Cluster Join Retry

**Parameters:**
- Max Attempts: 5 (configurable)
- Retry Delay: 10 seconds
- Exponential Backoff: Optional (not implemented)

**Logic:**
```bash
for ATTEMPT in $(seq 1 "$MAX_ATTEMPTS"); do
  # Capture only the HTTP status code; the response body is discarded
  HTTP_CODE=$(curl -k -s -o /dev/null -w "%{http_code}" \
    -X POST "$LEADER_URL/admin/member/add" \
    -H "Content-Type: application/json" -d "$PAYLOAD")

  if [[ "$HTTP_CODE" == "200" || "$HTTP_CODE" == "201" ]]; then
    exit 0  # Success
  elif [[ "$HTTP_CODE" == "409" ]]; then
    exit 2  # Already member
  fi

  sleep "$RETRY_DELAY"
done

exit 1  # Max attempts exhausted
```

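Exponential backoff is listed as optional and not implemented. One way it might be added is sketched below; the function name `backoff_delay` and the `BASE_DELAY` and `MAX_DELAY` variables are assumptions, with defaults chosen to give a 1s, 2s, 4s, 8s, 16s schedule.

```bash
# Hypothetical helper: delay in seconds before retry number $1 (1-based).
# The delay doubles from BASE_DELAY on each attempt and is capped at MAX_DELAY.
backoff_delay() {
  local attempt="$1"
  local base="${BASE_DELAY:-1}" cap="${MAX_DELAY:-16}"
  local delay="$base" i=1
  while [ "$i" -lt "$attempt" ]; do
    delay=$((delay * 2))
    if [ "$delay" -ge "$cap" ]; then
      delay="$cap"
      break
    fi
    i=$((i + 1))
  done
  echo "$delay"
}
```

In the retry loop, `sleep "$(backoff_delay "$ATTEMPT")"` would then take the place of the fixed `RETRY_DELAY` sleep.
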
### Error Codes

**Health Check:**
- `0`: Service healthy
- `1`: Timeout or unhealthy

**Cluster Join:**
- `0`: Successfully joined
- `1`: Failed after max attempts
- `2`: Already joined (idempotent)
- `3`: Invalid arguments

**Bootstrap Detector:**
- `0`: Should bootstrap
- `1`: Should join existing
- `2`: Configuration error

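A caller might map the cluster-join exit codes to follow-up actions roughly as follows. This is a sketch under assumptions: the function name `join_exit_action` and the action labels `mark`, `retry` and `abort` are invented for illustration, and treating unknown codes as fatal is a choice made here, not documented behavior.

```bash
# Hypothetical wrapper logic: translate a cluster-join exit code into the
# follow-up action. Prints one of: mark (create the marker), retry, abort.
join_exit_action() {
  case "$1" in
    0|2) echo "mark" ;;   # joined now, or already a member (idempotent)
    1)   echo "retry" ;;  # max attempts exhausted; a later restart may succeed
    3)   echo "abort" ;;  # invalid arguments; retrying cannot help
    *)   echo "abort" ;;  # unknown code, treated as fatal in this sketch
  esac
}
```
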
## Security Considerations

### TLS Certificate Handling

**Requirements:**
- All inter-node communication uses TLS
- Self-signed certificates supported via `-k` flag to curl
- Enable certificate validation in production (remove `-k`)

**Certificate Paths:**
```json
{
  "tls": {
    "enabled": true,
    "ca_cert_path": "/etc/nixos/secrets/ca.crt",
    "node_cert_path": "/etc/nixos/secrets/node01.crt",
    "node_key_path": "/etc/nixos/secrets/node01.key"
  }
}
```

**Integration with T031:**
- Certificates generated by T031 TLS automation
- Copied to target during provisioning
- Read by services at startup

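Dropping `-k` in production means pointing curl at the cluster CA instead. The wrapper below is a sketch of how that might look; the function name `tls_curl` and the variables `TLS_INSECURE`, `CA_CERT`, `NODE_CERT` and `NODE_KEY` are assumptions, with defaults taken from the certificate paths shown above. The curl options `--cacert`, `--cert` and `--key` are standard.

```bash
# Hypothetical wrapper: call curl with certificate validation by default.
# TLS_INSECURE, CA_CERT, NODE_CERT and NODE_KEY are assumed variable names.
tls_curl() {
  if [ "${TLS_INSECURE:-false}" = "true" ]; then
    # Development only: skip verification, as the -k examples in this document do.
    curl -k "$@"
  else
    # Verify the peer against the cluster CA and present the node certificate.
    curl --cacert "${CA_CERT:-/etc/nixos/secrets/ca.crt}" \
         --cert "${NODE_CERT:-/etc/nixos/secrets/node01.crt}" \
         --key "${NODE_KEY:-/etc/nixos/secrets/node01.key}" \
         "$@"
  fi
}
```

A health probe would then read `tls_curl -s -o /dev/null -w "%{http_code}" "$HEALTH_URL"`, with `TLS_INSECURE=true` set only on test clusters.
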
### Secrets Management

**Cluster Configuration:**
- Stored in `/etc/nixos/secrets/cluster-config.json`
- Permissions: `0600 root:root` (recommended)
- Contains sensitive data: URLs, IPs, topology

**API Credentials:**
- IAM admin credentials (future implementation)
- Stored in separate file: `/etc/nixos/secrets/iam-admin.json`
- Never logged to journald

### Attack Surface

**Mitigations:**
1. **Network-level**: Firewall rules restrict cluster API ports
2. **Application-level**: mTLS for authenticated requests
3. **Access control**: systemd service isolation
4. **Audit**: All operations logged to journald with structured JSON

## Integration Points

### T024 NixOS Modules

The first-boot automation module imports and extends service modules:

```nix
# Example: netboot-control-plane.nix
{
  imports = [
    ../modules/chainfire.nix
    ../modules/flaredb.nix
    ../modules/iam.nix
    ../modules/first-boot-automation.nix
  ];

  services.first-boot-automation.enable = true;
}
```

### T031 TLS Certificates

**Dependencies:**
- TLS certificates must exist before first boot
- Provisioning script copies certificates to `/etc/nixos/secrets/`
- Services read certificates at startup

**Certificate Generation:**
```bash
# On provisioning server (T031)
./tls/generate-node-cert.sh node01.example.com 10.0.1.10

# Copied to target
scp ca.crt node01.crt node01.key root@10.0.1.10:/etc/nixos/secrets/
```

### T032.S1-S3 PXE/Netboot

**Boot Flow:**
1. PXE boot loads iPXE firmware
2. iPXE chainloads NixOS kernel/initrd
3. NixOS installer runs (nixos-anywhere)
4. System installed to disk with first-boot automation
5. Reboot into installed system
6. First-boot automation executes

**Configuration Injection:**
```bash
# During nixos-anywhere provisioning
mkdir -p /mnt/etc/nixos/secrets
cp cluster-config.json /mnt/etc/nixos/secrets/
chmod 600 /mnt/etc/nixos/secrets/cluster-config.json
```

## Service Dependencies

### Systemd Ordering

**Chainfire (`chainfire-cluster-join.service`):**
```
After:    network-online.target, chainfire.service
Before:   flaredb-cluster-join.service
Wants:    network-online.target
```

**FlareDB (`flaredb-cluster-join.service`):**
```
After:    chainfire-cluster-join.service, flaredb.service
Requires: chainfire-cluster-join.service
Before:   iam-initial-setup.service
```

**IAM (`iam-initial-setup.service`):**
```
After:    flaredb-cluster-join.service, iam.service
Before:   cluster-health-check.service
```

**Health Check (`cluster-health-check.service`):**
```
After:    chainfire-cluster-join, flaredb-cluster-join, iam-initial-setup
Type:     oneshot (no RemainAfterExit)
```

### Dependency Graph

```
network-online.target
    │
    ├──▶ chainfire.service
    │         │
    │         ▼
    │    chainfire-cluster-join.service
    │         │
    ├──▶ flaredb.service
    │         │
    │         ▼
    └────▶ flaredb-cluster-join.service
                  │
             ┌────┴────┐
             │         │
        iam.service    │
             │         │
             ▼         │
    iam-initial-setup.service
             │         │
             └────┬────┘
                  │
                  ▼
    cluster-health-check.service
```

## Logging and Observability

### Structured Logging

All scripts output JSON-formatted logs:

```json
{
  "timestamp": "2025-12-10T10:30:45+00:00",
  "level": "INFO",
  "service": "chainfire",
  "operation": "cluster-join",
  "message": "Successfully joined cluster"
}
```

**Benefits:**
- Machine-readable for log aggregation (T025)
- Easy filtering with `journalctl -o json`
- Includes context (service, operation, timestamp)

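A logging helper producing these fields could be sketched as below. The function name `log_json` and its argument order are assumptions, and the escaping only covers backslashes and double quotes, so messages containing newlines or control characters would need further handling.

```bash
# Hypothetical logger: print one compact JSON object per line to stdout,
# where systemd forwards it to journald.
# Arguments: <level> <service> <operation> <message>
log_json() {
  local level="$1" service="$2" operation="$3" message
  # Escape backslashes and double quotes so the message stays valid JSON.
  message=$(printf '%s' "$4" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"timestamp":"%s","level":"%s","service":"%s","operation":"%s","message":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%S+00:00)" "$level" "$service" "$operation" "$message"
}
```

For example, `log_json INFO chainfire cluster-join "Successfully joined cluster"` emits a single line with no spaces after the colons, which is the form a `grep '"level":"ERROR"'` style filter expects.
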
### Querying Logs

**View all first-boot automation logs:**
```bash
journalctl -u chainfire-cluster-join.service -u flaredb-cluster-join.service \
  -u iam-initial-setup.service -u cluster-health-check.service
```

**Filter by log level:**
```bash
journalctl -u chainfire-cluster-join.service | grep '"level":"ERROR"'
```

**Follow live:**
```bash
journalctl -u chainfire-cluster-join.service -f
```

### Health Check Integration

**T025 Observability:**
- Health check service can POST to metrics endpoint
- Prometheus scraping of `/health` endpoints
- Alerts on cluster join failures

**Future:**
- Webhook to provisioning server on completion
- Slack/email notifications on errors
- Dashboard showing cluster join status

## Performance Characteristics

### Boot Time Analysis

**Typical Timeline (3-node cluster):**
```
T+0s   : systemd starts
T+5s   : network-online.target reached
T+10s  : chainfire.service starts
T+15s  : chainfire healthy
T+15s  : chainfire-cluster-join runs (bootstrap, immediate exit)
T+20s  : flaredb.service starts
T+25s  : flaredb healthy
T+25s  : flaredb-cluster-join runs (bootstrap, immediate exit)
T+30s  : iam.service starts
T+35s  : iam healthy
T+35s  : iam-initial-setup runs
T+40s  : cluster-health-check runs
T+40s  : Node fully operational
```

**Join Mode (node joining existing cluster):**
```
T+0s   : systemd starts
T+5s   : network-online.target reached
T+10s  : chainfire.service starts
T+15s  : chainfire healthy
T+15s  : chainfire-cluster-join runs
T+20s  : POST to leader, wait for response
T+25s  : Successfully joined chainfire cluster
T+25s  : flaredb.service starts
T+30s  : flaredb healthy
T+30s  : flaredb-cluster-join runs
T+35s  : Successfully joined flaredb cluster
T+40s  : iam-initial-setup (skips, already initialized)
T+45s  : cluster-health-check runs
T+45s  : Node fully operational
```

### Bottlenecks

**Health Check Polling:**
- 5-second intervals may be too aggressive
- Recommendation: Exponential backoff

**Network Latency:**
- Join requests block on network RTT
- Mitigation: Ensure low-latency cluster network

**Raft Synchronization:**
- New member must catch up on Raft log
- Time depends on log size (seconds to minutes)

## Failure Modes and Recovery

### Common Failures

**1. Leader Unreachable**

**Symptom:**
```json
{"level":"ERROR","message":"Join request failed: connection error"}
```

**Diagnosis:**
- Check network connectivity: `ping node01.example.com`
- Verify firewall rules: `iptables -L`
- Check leader service status: `systemctl status chainfire.service`

**Recovery:**
```bash
# Fix network/firewall, then restart join service
systemctl restart chainfire-cluster-join.service
```

**2. Invalid Configuration**

**Symptom:**
```json
{"level":"ERROR","message":"Configuration file not found"}
```

**Diagnosis:**
- Verify file exists: `ls -la /etc/nixos/secrets/cluster-config.json`
- Check JSON syntax: `jq . /etc/nixos/secrets/cluster-config.json`

**Recovery:**
```bash
# Fix configuration, then restart
systemctl restart chainfire-cluster-join.service
```

**3. Service Not Healthy**

**Symptom:**
```json
{"level":"ERROR","message":"Health check timeout"}
```

**Diagnosis:**
- Check service logs: `journalctl -u chainfire.service`
- Verify service is running: `systemctl status chainfire.service`
- Test health endpoint: `curl -k https://localhost:2379/health`

**Recovery:**
```bash
# Restart the main service
systemctl restart chainfire.service

# Join service will auto-retry after RestartSec
```

**4. Already Member**

**Symptom:**
```json
{"level":"WARN","message":"Node already member of cluster (HTTP 409)"}
```

**Diagnosis:**
- This is normal on reboots
- Marker file created to prevent future attempts

**Recovery:**
- No action needed (idempotent behavior)

### Manual Cluster Join

If automation fails, join the node manually:

**Chainfire:**
```bash
curl -k -X POST https://node01.example.com:2379/admin/member/add \
  -H "Content-Type: application/json" \
  -d '{"id":"node04","raft_addr":"10.0.1.13:2380"}'

# Create marker to prevent auto-retry
mkdir -p /var/lib/first-boot-automation
date -Iseconds > /var/lib/first-boot-automation/.chainfire-joined
```

**FlareDB:**
```bash
curl -k -X POST https://node01.example.com:2479/admin/member/add \
  -H "Content-Type: application/json" \
  -d '{"id":"node04","raft_addr":"10.0.1.13:2480"}'

# Create marker to prevent auto-retry
date -Iseconds > /var/lib/first-boot-automation/.flaredb-joined
```

### Rollback Procedure

**Remove from cluster:**
```bash
# On leader
curl -k -X DELETE https://node01.example.com:2379/admin/member/node04

# On node being removed
systemctl stop chainfire.service
rm -rf /var/lib/chainfire/*
rm /var/lib/first-boot-automation/.chainfire-joined

# Re-enable automation (the join service waits for local chainfire health,
# so start the main service again first)
systemctl start chainfire.service
systemctl restart chainfire-cluster-join.service
```

## Future Enhancements

### Planned Improvements

**1. Exponential Backoff**
- Current: Fixed 10-second delay
- Future: 1s, 2s, 4s, 8s, 16s exponential backoff

**2. Leader Discovery**
- Current: Static leader URL in config
- Future: DNS SRV records for dynamic discovery

**3. Webhook Notifications**
- POST to provisioning server on completion
- Include node info, join time, cluster health

**4. Pre-flight Checks**
- Validate network connectivity before attempting join
- Check TLS certificate validity
- Verify disk space, memory, CPU requirements

**5. Automated Testing**
- Integration tests with real cluster
- Simulate failures (network partitions, leader crashes)
- Validate idempotency

**6. Configuration Validation**
- JSON schema validation at boot
- Fail fast on invalid configuration
- Provide clear error messages

## References

- **T024**: NixOS service modules
- **T025**: Observability and monitoring
- **T031**: TLS certificate automation
- **T032.S1-S3**: PXE boot, netboot images, provisioning
- **Design Document**: `/home/centra/cloud/docs/por/T032-baremetal-provisioning/design.md`

## Appendix: Configuration Schema

### cluster-config.json Schema

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["node_id", "node_role", "bootstrap", "cluster_name", "leader_url", "raft_addr"],
  "properties": {
    "node_id": {
      "type": "string",
      "description": "Unique node identifier"
    },
    "node_role": {
      "type": "string",
      "enum": ["control-plane", "worker", "all-in-one"]
    },
    "bootstrap": {
      "type": "boolean",
      "description": "True for first 3 nodes, false for join"
    },
    "cluster_name": {
      "type": "string"
    },
    "leader_url": {
      "type": "string",
      "format": "uri"
    },
    "raft_addr": {
      "type": "string",
      "pattern": "^[0-9.]+:[0-9]+$"
    },
    "initial_peers": {
      "type": "array",
      "items": {"type": "string"}
    },
    "flaredb_peers": {
      "type": "array",
      "items": {"type": "string"}
    }
  }
}
```
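
Two of the schema's constraints are simple enough to check in shell before a join is attempted. The sketch below mirrors the `raft_addr` pattern and the `node_role` enum; the function names `valid_raft_addr` and `valid_node_role` are assumptions, and this is not a substitute for full JSON Schema validation.

```bash
# Hypothetical pre-flight checks mirroring two constraints from the schema.

# Succeeds if $1 matches the raft_addr pattern ^[0-9.]+:[0-9]+$.
valid_raft_addr() {
  printf '%s\n' "$1" | grep -Eq '^[0-9.]+:[0-9]+$'
}

# Succeeds if $1 is one of the node_role enum values.
valid_node_role() {
  case "$1" in
    control-plane|worker|all-in-one) return 0 ;;
    *) return 1 ;;
  esac
}
```

Note that the pattern accepts only digits and dots before the port, so `10.0.1.13:2380` passes while a hostname form such as `node01:2380` does not. The `initial_peers` entries carry no pattern and may use hostnames.
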