# Service Dependencies Diagram

**Document Version:** 1.0

**Last Updated:** 2025-12-10

## Service Startup Order

```
┌─────────────────────────────────────────────────────────────────────────┐
│                  PlasmaCloud Service Dependency Graph                   │
│                       (systemd unit dependencies)                       │
└─────────────────────────────────────────────────────────────────────────┘

                               System Boot
                                    │
                                    v
                           ┌─────────────────┐
                           │ systemd (PID 1) │
                           └────────┬────────┘
                                    │
                                    v
                    ┌───────────────────────────────┐
                    │ basic.target                  │
                    │ • mounts filesystems          │
                    │ • activates swap              │
                    └───────────────┬───────────────┘
                                    │
                                    v
                    ┌───────────────────────────────┐
                    │ network.target                │
                    │ • brings up network interfaces│
                    │ • configures IP addresses     │
                    └───────────────┬───────────────┘
                                    │
                                    v
                    ┌───────────────────────────────┐
                    │ network-online.target         │
                    │ • waits for network ready     │
                    │ • (as defined by wait-online) │
                    └───────────────┬───────────────┘
                                    │
                                    v
                    ┌───────────────────────────────┐
                    │ multi-user.target             │
                    │ • pulls in the services below │
                    │ • reached after they start    │
                    └───────────────┬───────────────┘
                                    │
                 ┌──────────────────┼──────────────────┐
                 │                  │                  │
                 v                  v                  v
             [Level 1]          [Level 2]         [Level 3+]
            Foundation        Core Services  Application Services


Level 1: Foundation Services (network only; FlareDB also requires Chainfire)
═══════════════════════════════════════════════════════════════════════════

┌─────────────────────────────────────────────────────────────────────────┐
│ Chainfire                                                               │
│ ├─ After: network-online.target                                         │
│ ├─ Type: notify (systemd-aware)                                         │
│ ├─ Ports: 2379 (API), 2380 (Raft), 2381 (Gossip)                        │
│ ├─ Data: /var/lib/chainfire                                             │
│ └─ Start: ~10 seconds                                                   │
│                                                                         │
│ Purpose: Distributed configuration store, service discovery             │
│ Critical: Yes (all other services depend on this)                       │
└─────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────┐
│ FlareDB                                                                 │
│ ├─ After: network-online.target, chainfire.service                      │
│ ├─ Requires: chainfire.service                                          │
│ ├─ Type: notify                                                         │
│ ├─ Ports: 2479 (API), 2480 (Raft)                                       │
│ ├─ Data: /var/lib/flaredb                                               │
│ └─ Start: ~15 seconds (after Chainfire)                                 │
│                                                                         │
│ Purpose: Time-series database for metrics and events                    │
│ Critical: Yes (IAM and monitoring depend on this)                       │
└─────────────────────────────────────────────────────────────────────────┘


Level 2: Core Services (Depend on Chainfire + FlareDB)
═══════════════════════════════════════════════════════════════════════════

┌─────────────────────────────────────────────────────────────────────────┐
│ IAM (Identity and Access Management)                                    │
│ ├─ After: flaredb.service                                               │
│ ├─ Requires: flaredb.service                                            │
│ ├─ Type: simple                                                         │
│ ├─ Port: 8080 (API)                                                     │
│ ├─ Backend: FlareDB (stores users, roles, tokens)                       │
│ └─ Start: ~5 seconds (after FlareDB)                                    │
│                                                                         │
│ Purpose: Authentication and authorization for all APIs                  │
│ Critical: Yes (API access requires IAM tokens)                          │
└─────────────────────────────────────────────────────────────────────────┘


Level 3: Application Services (parallel startup, except K8sHost)
═══════════════════════════════════════════════════════════════════════════

┌─────────────────────────────────────────────────────────────────────────┐
│ PlasmaVMC (Virtual Machine Controller)                                  │
│ ├─ After: chainfire.service, iam.service                                │
│ ├─ Wants: chainfire.service, iam.service                                │
│ ├─ Type: notify                                                         │
│ ├─ Port: 9090 (API)                                                     │
│ └─ Start: ~10 seconds                                                   │
│                                                                         │
│ Purpose: VM lifecycle management and orchestration                      │
└─────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────┐
│ NovaNET (Software-Defined Networking)                                   │
│ ├─ After: chainfire.service, iam.service                                │
│ ├─ Wants: chainfire.service                                             │
│ ├─ Type: notify                                                         │
│ ├─ Ports: 9091 (API), 4789 (VXLAN)                                      │
│ └─ Start: ~8 seconds                                                    │
│                                                                         │
│ Purpose: Virtual networking, VXLAN overlay management                   │
└─────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────┐
│ FlashDNS (High-Performance DNS)                                         │
│ ├─ After: chainfire.service                                             │
│ ├─ Wants: chainfire.service                                             │
│ ├─ Type: forking                                                        │
│ ├─ Ports: 53 (DNS), 853 (DoT)                                           │
│ └─ Start: ~3 seconds                                                    │
│                                                                         │
│ Purpose: DNS resolution for VMs and services                            │
└─────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────┐
│ FiberLB (Layer 4/7 Load Balancer)                                       │
│ ├─ After: chainfire.service, iam.service                                │
│ ├─ Wants: chainfire.service                                             │
│ ├─ Type: notify                                                         │
│ ├─ Ports: 9092 (API), 80 (HTTP), 443 (HTTPS)                            │
│ └─ Start: ~5 seconds                                                    │
│                                                                         │
│ Purpose: Load balancing and traffic distribution                        │
└─────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────┐
│ LightningStor (Distributed Block Storage)                               │
│ ├─ After: chainfire.service, flaredb.service                            │
│ ├─ Wants: chainfire.service                                             │
│ ├─ Type: notify                                                         │
│ ├─ Ports: 9093 (API), 9094 (Replication), 3260 (iSCSI)                  │
│ └─ Start: ~12 seconds                                                   │
│                                                                         │
│ Purpose: Block storage for VMs and containers                           │
└─────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────┐
│ K8sHost (Kubernetes Node Agent)                                         │
│ ├─ After: chainfire.service, plasmavmc.service, novanet.service         │
│ ├─ Wants: chainfire.service, novanet.service                            │
│ ├─ Type: notify                                                         │
│ ├─ Ports: 10250 (Kubelet), 10256 (Health)                               │
│ └─ Start: ~15 seconds                                                   │
│                                                                         │
│ Purpose: Kubernetes node agent for container orchestration              │
└─────────────────────────────────────────────────────────────────────────┘
```
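The start levels above follow from the `After=` lines in each box. The sketch below cross-checks them. It is a hypothetical helper, not part of PlasmaCloud. It copies the `After=` edges into a Python dict and omits `network-online.target`, which every service is ordered after. It then uses the standard library's `graphlib` (Python 3.9+) to compute each unit's depth in the ordering graph. It models ordering only, not `Requires=`/`Wants=` activation.

```python
from graphlib import TopologicalSorter

# After= edges copied from the unit boxes above: unit -> units it starts after.
# network-online.target is omitted because every service is ordered after it.
AFTER = {
    "chainfire": set(),
    "flaredb": {"chainfire"},
    "iam": {"flaredb"},
    "plasmavmc": {"chainfire", "iam"},
    "novanet": {"chainfire", "iam"},
    "flashdns": {"chainfire"},
    "fiberlb": {"chainfire", "iam"},
    "lightningstor": {"chainfire", "flaredb"},
    "k8shost": {"chainfire", "plasmavmc", "novanet"},
}


def start_depths(after: dict[str, set[str]]) -> dict[str, int]:
    """Depth 0 for units with no predecessors, else 1 + deepest predecessor.

    Raises graphlib.CycleError if the ordering edges contain a cycle.
    """
    depth: dict[str, int] = {}
    # static_order() yields every unit after all of its predecessors.
    for unit in TopologicalSorter(after).static_order():
        preds = after.get(unit, set())
        depth[unit] = 1 + max((depth[p] for p in preds), default=-1)
    return depth


if __name__ == "__main__":
    depths = start_depths(AFTER)
    for unit, d in sorted(depths.items(), key=lambda kv: (kv[1], kv[0])):
        print(d, unit)
```

With these edges FlashDNS sits at depth 1, because it waits only for Chainfire. K8sHost sits at depth 4, because it comes after PlasmaVMC and NovaNET, which follow IAM. The Level labels are therefore coarser than the actual ordering.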

## Dependency Visualization (ASCII)

```
┌─────────────────────────────────────────────────────────────────────────┐
│                         Service Dependency Tree                         │
│                          (direction: top-down)                          │
└─────────────────────────────────────────────────────────────────────────┘

                          network-online.target
                                    │
                                    │ After
                                    v
                            ┌───────────────┐
                            │   Chainfire   │ (Level 1)
                            │  Port: 2379   │
                            └───────┬───────┘
                                    │
            ┌───────────────────────┼───────────────────────┐
            │ Requires              │ Wants                 │ Wants
            v                       v                       v
     ┌─────────────┐         ┌─────────────┐         ┌─────────────┐
     │   FlareDB   │         │   NovaNET   │         │  FlashDNS   │
     │ Port: 2479  │         │ Port: 9091  │         │  Port: 53   │
     └──────┬──────┘         └─────────────┘         └─────────────┘
            │
            ├────────────────┬────────────────┬────────────────┐
            │ Requires       │ Wants          │ Wants          │ Wants
            v                v                v                v
     ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
     │     IAM     │  │  PlasmaVMC  │  │   FiberLB   │  │LightningStor│
     │ Port: 8080  │  │ Port: 9090  │  │ Port: 9092  │  │ Port: 9093  │
     └─────────────┘  └──────┬──────┘  └─────────────┘  └─────────────┘
                             │
                             │ Wants
                             v
                      ┌─────────────┐
                      │   K8sHost   │ (Level 3)
                      │ Port: 10250 │
                      └─────────────┘

Legend:
  Requires: Hard dependency. With After= also set, the service is not
            started if the dependency fails to start, and it is stopped
            if the dependency is explicitly stopped.
  Wants:    Soft dependency (service starts even if dependency fails)
  After:    Ordering only (start after the dependency's start job has
            finished, whether or not it succeeded; does not pull it in)

Note: The tree is simplified. Per the unit listing under Service Startup
Order, every Level 3 service also has Wants=chainfire.service; PlasmaVMC
and FiberLB are ordered After=iam.service rather than FlareDB;
LightningStor is only ordered After=flaredb.service; and K8sHost has
Wants=novanet.service and is ordered After=plasmavmc.service.
```
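To make the legend concrete, here is a minimal model of how a start failure propagates. It follows the `Requires=`/`Wants=` lines of the unit listing rather than the simplified tree. It assumes every `Requires=` edge is paired with `After=`, which holds for FlareDB and IAM above. Under that pairing, systemd does not start a unit whose required dependency failed to start, and `Wants=` failures are ignored. Explicit stops, `BindsTo=` and runtime crashes are out of scope, and `failed_units` is an illustrative name.

```python
from graphlib import TopologicalSorter

# Requires= and Wants= lines from the unit listing (network-online.target omitted).
REQUIRES = {
    "flaredb": {"chainfire"},
    "iam": {"flaredb"},
}
WANTS = {
    "plasmavmc": {"chainfire", "iam"},
    "novanet": {"chainfire"},
    "flashdns": {"chainfire"},
    "fiberlb": {"chainfire"},
    "lightningstor": {"chainfire"},
    "k8shost": {"chainfire", "novanet"},
}
UNITS = {"chainfire"} | set(REQUIRES) | set(WANTS)


def failed_units(initially_failed: set[str]) -> set[str]:
    """Return every unit left unstarted when `initially_failed` fail to start.

    Assumes each Requires= edge is paired with After=, so a failed required
    unit blocks its dependent; Wants= dependencies never propagate failure.
    """
    graph = {u: REQUIRES.get(u, set()) | WANTS.get(u, set()) for u in UNITS}
    failed = set(initially_failed)
    # Visit dependencies before dependents so failures cascade in one pass.
    for unit in TopologicalSorter(graph).static_order():
        if REQUIRES.get(unit, set()) & failed:
            failed.add(unit)
    return failed


if __name__ == "__main__":
    print(sorted(failed_units({"chainfire"})))
```

A Chainfire start failure therefore also blocks FlareDB and IAM. PlasmaVMC, NovaNET and the other `Wants=` dependents still start, and have to cope with Chainfire being unreachable at runtime.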

## Runtime Dependencies (Data Flow)

```
┌─────────────────────────────────────────────────────────────────────────┐
│                       Service Communication Flow                        │
└─────────────────────────────────────────────────────────────────────────┘

     External Client
            │
            │ HTTPS (443)
            v
    ┌────────────────┐
    │    FiberLB     │  Load balances requests
    └───────┬────────┘
            │
            │ Forward to
            v
    ┌────────────────┐       ┌───────────────┐
    │      IAM       │──────>│    FlareDB    │  Validate token
    │  (Auth check)  │<──────│ (Token store) │
    └───────┬────────┘       └───────────────┘
            │
            │ Token valid
            v
    ┌────────────────┐       ┌───────────────┐       ┌───────────────┐
    │   PlasmaVMC    │──────>│   Chainfire   │──────>│  Worker Node  │
    │ (API handler)  │<──────│(Coordination) │<──────│   (VM host)   │
    └───────┬────────┘       └───────────────┘       └───────────────┘
            │
            │ Allocate storage
            v
    ┌────────────────┐       ┌───────────────┐
    │ LightningStor  │──────>│    FlareDB    │  Store metadata
    │ (Block device) │<──────│  (Metadata)   │
    └───────┬────────┘       └───────────────┘
            │
            │ Configure network
            v
    ┌────────────────┐       ┌───────────────┐
    │    NovaNET     │──────>│   FlashDNS    │  Register DNS
    │ (VXLAN setup)  │<──────│ (Resolution)  │
    └────────────────┘       └───────────────┘
```

## Failure Impact Analysis

```
┌─────────────────────────────────────────────────────────────────────────┐
│                          Failure Impact Matrix                          │
└─────────────────────────────────────────────────────────────────────────┘

 Service Fails    │ Impact                           │ Mitigation
──────────────────┼──────────────────────────────────┼──────────────────────
 Chainfire        │ ✗ Total cluster failure          │ Raft quorum (3 of 5)
                  │ ✗ All services lose config       │ Data replicated
                  │ ✗ New VMs cannot start           │ Existing VMs run
                  │                                  │ Auto-leader election
──────────────────┼──────────────────────────────────┼──────────────────────
 FlareDB          │ ✗ Metrics not collected          │ Raft quorum (3 of 5)
                  │ ✗ IAM auth fails                 │ Cache last tokens
                  │ ✗ New VMs blocked                │ Data replicated
                  │ ✓ Existing VMs continue          │
──────────────────┼──────────────────────────────────┼──────────────────────
 IAM              │ ✗ New API requests fail          │ Token cache (TTL)
                  │ ⚠ Existing sessions valid        │ Multiple instances
                  │ ✓ Internal services unaffected   │ Load balanced
──────────────────┼──────────────────────────────────┼──────────────────────
 PlasmaVMC        │ ✗ Cannot create/delete VMs       │ Multiple instances
                  │ ✓ Existing VMs unaffected        │ Stateless (uses DB)
                  │ ⚠ VM monitoring stops            │ Auto-restart VMs
──────────────────┼──────────────────────────────────┼──────────────────────
 NovaNET          │ ✗ Cannot create new networks     │ Multiple instances
                  │ ✓ Existing networks work         │ Distributed agents
                  │ ✓ VXLAN tunnels persist          │ Control plane HA
──────────────────┼──────────────────────────────────┼──────────────────────
 FlashDNS         │ ⚠ DNS resolution fails           │ Multiple instances
                  │ ✓ Existing connections work      │ DNS caching
                  │ ⚠ New connections affected       │ Fallback DNS
──────────────────┼──────────────────────────────────┼──────────────────────
 FiberLB          │ ⚠ Load balancing stops           │ Multiple instances
                  │ ✓ Direct API access works        │ VIP failover
                  │ ⚠ Client requests may time out   │ Health checks
──────────────────┼──────────────────────────────────┼──────────────────────
 LightningStor    │ ⚠ Storage I/O may degrade        │ Replication (3x)
                  │ ✓ Replicas on other nodes        │ Auto-rebalance
                  │ ✗ New volumes cannot be created  │ Multi-node cluster
──────────────────┼──────────────────────────────────┼──────────────────────
 K8sHost          │ ⚠ Pods on failed node evicted    │ Pod replicas
                  │ ✓ Cluster continues              │ Kubernetes HA
                  │ ⚠ Capacity reduced               │ Auto-rescheduling

Legend:
  ✗  Complete service failure
  ⚠  Partial service degradation
  ✓  No impact or minimal impact
```
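The "Raft quorum" mitigations rely on Raft's majority rule. A group of n voting members needs ⌊n/2⌋ + 1 of them to elect a leader and commit writes, so it survives n minus that many failures. The sketch below shows the arithmetic. It is generic Raft, not Chainfire or FlareDB code.

```python
def quorum(n: int) -> int:
    """Smallest majority of n voting members (Raft quorum size)."""
    if n < 1:
        raise ValueError("a Raft group needs at least one member")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many members can fail while a majority remains available."""
    return n - quorum(n)


if __name__ == "__main__":
    for n in (1, 3, 4, 5, 7):
        print(f"{n} members: quorum {quorum(n)}, tolerates {tolerated_failures(n)}")
```

A 3-member group tolerates one failure; 3 matches the `cluster_size` in the Chainfire health response. A 5-member group tolerates two. A fourth member adds no extra fault tolerance over three, which is why odd-sized groups are the usual choice.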

## Service Health Check Endpoints

```
┌─────────────────────────────────────────────────────────────────────────┐
│                     Health Check Endpoint Reference                     │
└─────────────────────────────────────────────────────────────────────────┘

 Service      │ Endpoint                         │ Expected Response
──────────────┼──────────────────────────────────┼────────────────────────
 Chainfire    │ https://host:2379/health         │ {"status":"healthy",
              │                                  │  "raft":"leader",
              │                                  │  "cluster_size":3}
──────────────┼──────────────────────────────────┼────────────────────────
 FlareDB      │ https://host:2479/health         │ {"status":"healthy",
              │                                  │  "raft":"follower",
              │                                  │  "chainfire":"connected"}
──────────────┼──────────────────────────────────┼────────────────────────
 IAM          │ https://host:8080/health         │ {"status":"healthy",
              │                                  │  "database":"connected",
              │                                  │  "version":"1.0.0"}
──────────────┼──────────────────────────────────┼────────────────────────
 PlasmaVMC    │ https://host:9090/health         │ {"status":"healthy",
              │                                  │  "vms_running":42}
──────────────┼──────────────────────────────────┼────────────────────────
 NovaNET      │ https://host:9091/health         │ {"status":"healthy",
              │                                  │  "networks":5}
──────────────┼──────────────────────────────────┼────────────────────────
 FlashDNS     │ dig @host +short health.local    │ 127.0.0.1 (A record)
              │ https://host:853/health          │ {"status":"healthy"}
──────────────┼──────────────────────────────────┼────────────────────────
 FiberLB      │ https://host:9092/health         │ {"status":"healthy",
              │                                  │  "backends":3}
──────────────┼──────────────────────────────────┼────────────────────────
 LightningStor│ https://host:9093/health         │ {"status":"healthy",
              │                                  │  "volumes":15,
              │                                  │  "total_gb":5000}
──────────────┼──────────────────────────────────┼────────────────────────
 K8sHost      │ https://host:10250/healthz       │ ok (HTTP 200)
```
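A monitoring script could poll these endpoints and interpret the responses. The sketch below is a hypothetical helper, not a PlasmaCloud tool, and uses only the standard library.

- `host` is a placeholder.
- It disables TLS verification to mirror `curl -k`, which is acceptable only with self-signed lab certificates.
- It treats a JSON body with `"status": "healthy"`, or K8sHost's plain-text `ok`, as healthy.
- It skips the FlashDNS `dig` check.

```python
import http.client
import json
import ssl
import sys
import urllib.request

# Ports and paths from the table above; "{host}" is filled in at run time.
ENDPOINTS = {
    "chainfire": "https://{host}:2379/health",
    "flaredb": "https://{host}:2479/health",
    "iam": "https://{host}:8080/health",
    "plasmavmc": "https://{host}:9090/health",
    "novanet": "https://{host}:9091/health",
    "fiberlb": "https://{host}:9092/health",
    "lightningstor": "https://{host}:9093/health",
    "k8shost": "https://{host}:10250/healthz",
}


def is_healthy(body: bytes) -> bool:
    """True for K8sHost's plain "ok" or a JSON object with status "healthy"."""
    text = body.decode("utf-8", errors="replace").strip()
    if text == "ok":
        return True
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("status") == "healthy"


def check(url: str, timeout: float = 5.0) -> bool:
    """Fetch one endpoint; any connection, TLS or HTTP error counts as unhealthy."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # like curl -k: lab/self-signed use only
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return resp.status == 200 and is_healthy(resp.read())
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError and timeouts are OSError subclasses.
        return False


if __name__ == "__main__":
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    for name, template in ENDPOINTS.items():
        status = "OK" if check(template.format(host=host)) else "FAIL"
        print(f"{name:14} {status}")
```

Saved under a name such as `healthcheck.py`, it runs as `python3 healthcheck.py <host>`. Failures are reported as `FAIL` rather than raised, so one unreachable service does not hide the state of the others.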

## First-Boot Service Dependencies

```
┌─────────────────────────────────────────────────────────────────────────┐
│                     First-Boot Automation Services                      │
│                         (T032.S4 - First-Boot)                          │
└─────────────────────────────────────────────────────────────────────────┘

                          network-online.target
                                    │
                                    v
                          ┌───────────────────┐
                          │ chainfire.service │
                          └─────────┬─────────┘
                                    │ After
                                    v
                    ┌────────────────────────────────┐
                    │ chainfire-cluster-join.service │ (First-boot)
                    │ ├─ Reads cluster-config.json   │
                    │ ├─ Detects bootstrap mode      │
                    │ └─ Joins cluster or waits      │
                    └───────────────┬────────────────┘
                                    │ After
                                    v
                          ┌───────────────────┐
                          │  flaredb.service  │
                          └─────────┬─────────┘
                                    │ After
                                    v
                    ┌────────────────────────────────┐
                    │ flaredb-cluster-join.service   │ (First-boot)
                    │ ├─ Waits for FlareDB healthy   │
                    │ └─ Joins FlareDB cluster       │
                    └───────────────┬────────────────┘
                                    │ After
                                    v
                          ┌───────────────────┐
                          │    iam.service    │
                          └─────────┬─────────┘
                                    │ After
                                    v
                    ┌────────────────────────────────┐
                    │ iam-initial-setup.service      │ (First-boot)
                    │ ├─ Creates admin user          │
                    │ └─ Initializes IAM             │
                    └───────────────┬────────────────┘
                                    │ After
                                    v
                    ┌────────────────────────────────┐
                    │ cluster-health-check.service   │ (First-boot)
                    │ ├─ Validates all services      │
                    │ ├─ Checks Raft quorum          │
                    │ └─ Reports cluster ready       │
                    └───────────────┬────────────────┘
                                    │
                                    v
                    ┌────────────────────────────────┐
                    │         Cluster Ready          │
                    │  (multi-user.target reached)   │
                    └────────────────────────────────┘
```
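`flaredb-cluster-join.service` has to wait for FlareDB to become healthy before joining, and that wait needs an upper bound. Below is a minimal sketch of the pattern in Python. It assumes the probe is any callable that returns `True` once the service is healthy, for example a request to the FlareDB `/health` endpoint. It polls until success or a deadline. It uses a monotonic clock so that a wall-clock step during first boot (e.g. from NTP) does not distort the timeout. `wait_until` and its parameters are illustrative names.

```python
import time
from typing import Callable


def wait_until(
    probe: Callable[[], bool],
    timeout: float,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns True or `timeout` seconds have elapsed.

    Returns True on success and False once the deadline has passed. `clock`
    and `sleep` are injectable so the loop can be tested without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline; the probe gets one last try at it.
        sleep(min(interval, remaining))


if __name__ == "__main__":
    # Trivial demo: a probe that succeeds on its third call.
    attempts = iter([False, False, True])
    print(wait_until(lambda: next(attempts), timeout=10, interval=0.1))
```

Keep `timeout` below any `TimeoutStartSec=` set on the calling unit, so that the script rather than systemd decides when to give up. `Type=oneshot` units have no start timeout unless one is configured.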

## Systemd Unit Configuration Examples

```ini
# Chainfire service (example)
[Unit]
Description=Chainfire Distributed Configuration Service
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
ExecStart=/nix/store/.../bin/chainfire-server --config /etc/nixos/chainfire.toml
Restart=on-failure
RestartSec=10s
TimeoutStartSec=60s

# Environment
Environment="CHAINFIRE_LOG_LEVEL=info"
EnvironmentFile=-/etc/nixos/secrets/chainfire.env

# Permissions
User=chainfire
Group=chainfire
StateDirectory=chainfire
ConfigurationDirectory=chainfire

# Security hardening
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target


# FlareDB service (example)
[Unit]
Description=FlareDB Time-Series Database
After=network-online.target chainfire.service
Requires=chainfire.service
Wants=network-online.target

[Service]
Type=notify
ExecStart=/nix/store/.../bin/flaredb-server --config /etc/nixos/flaredb.toml
Restart=on-failure
RestartSec=10s
TimeoutStartSec=90s

# Dependencies: wait until Chainfire's health endpoint answers without an
# HTTP error. -f makes curl exit non-zero on HTTP errors (plain curl exits 0
# on a 500), -s keeps the journal quiet, and --max-time bounds each probe;
# the loop as a whole is bounded by TimeoutStartSec. curl must be on the
# unit's PATH (on NixOS, for example via systemd.services.<name>.path).
ExecStartPre=/bin/sh -c 'until curl -fsk --max-time 5 https://localhost:2379/health; do sleep 5; done'

[Install]
WantedBy=multi-user.target


# First-boot cluster join (example)
[Unit]
Description=Chainfire Cluster Join (First Boot)
After=chainfire.service
Requires=chainfire.service
Before=flaredb-cluster-join.service

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/nix/store/.../bin/cluster-join.sh --service chainfire
Restart=on-failure
RestartSec=10s

[Install]
WantedBy=multi-user.target
```
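These diagrams are maintained by hand, so they can drift from the real unit files. A hypothetical checker could extract the ordering and dependency directives from each unit's `[Unit]` section and compare them with the diagram labels. The minimal parser below handles only what the examples use:

- one directive per line
- `#`/`;` comment lines
- space-separated values
- repeated keys that accumulate

It ignores line continuations, drop-in files and specifiers, so it is a sketch rather than a full systemd parser. On a live system, `systemctl show -p After,Requires,Wants <unit>` reports the effective values.

```python
DEP_KEYS = ("After", "Before", "Requires", "Wants")


def unit_dependencies(text: str) -> dict[str, list[str]]:
    """Collect After=/Before=/Requires=/Wants= values from a unit's [Unit] section."""
    deps: dict[str, list[str]] = {key: [] for key in DEP_KEYS}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Unit" or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in deps:
            # Repeated directives accumulate; values are space-separated.
            deps[key].extend(value.split())
    return deps


FLAREDB_UNIT = """\
[Unit]
Description=FlareDB Time-Series Database
After=network-online.target chainfire.service
Requires=chainfire.service
Wants=network-online.target

[Service]
Type=notify
"""

if __name__ == "__main__":
    for key, values in unit_dependencies(FLAREDB_UNIT).items():
        print(f"{key}: {' '.join(values) or '-'}")
```

Run against each unit, the output can be compared with the `Requires`/`Wants`/`After` labels in the diagrams to catch drift.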

---

**Document End**