Service Dependencies Diagram
Document Version: 1.0
Last Updated: 2025-12-10
Service Startup Order
┌─────────────────────────────────────────────────────────────────────────┐
│ PlasmaCloud Service Dependency Graph │
│ (systemd unit dependencies) │
└─────────────────────────────────────────────────────────────────────────┘
                     System Boot
                          │
                          v
                 ┌──────────────────┐
                 │ systemd (PID 1)  │
                 └────────┬─────────┘
                          │
                          v
          ┌───────────────────────────────┐
          │ basic.target                  │
          │ • mounts filesystems          │
          │ • activates swap              │
          └───────────────┬───────────────┘
                          │
                          v
          ┌───────────────────────────────┐
          │ network.target                │
          │ • brings up network interfaces│
          │ • configures IP addresses     │
          └───────────────┬───────────────┘
                          │
                          v
          ┌───────────────────────────────┐
          │ network-online.target         │
          │ • waits for network ready     │
          │ • ensures DNS resolution      │
          └───────────────┬───────────────┘
                          │
                          v
               ┌─────────────────────┐
               │  multi-user.target  │
               └──────────┬──────────┘
                          │
       ┌──────────────────┼──────────────────┐
       │                  │                  │
       v                  v                  v
   [Level 1]          [Level 2]          [Level 3+]
  Foundation        Core Services  Application Services
Level 1: Foundation Services (no dependencies outside this level)
═══════════════════════════════════════════════════════════════════════════
┌────────────────────────────────────────────────────────────────────────┐
│ Chainfire │
│ ├─ After: network-online.target │
│ ├─ Type: notify (systemd-aware) │
│ ├─ Ports: 2379 (API), 2380 (Raft), 2381 (Gossip) │
│ ├─ Data: /var/lib/chainfire │
│ └─ Start: ~10 seconds │
│ │
│ Purpose: Distributed configuration store, service discovery │
│ Critical: Yes (all other services depend on this) │
└────────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────────┐
│ FlareDB │
│ ├─ After: network-online.target, chainfire.service │
│ ├─ Requires: chainfire.service │
│ ├─ Type: notify │
│ ├─ Ports: 2479 (API), 2480 (Raft) │
│ ├─ Data: /var/lib/flaredb │
│ └─ Start: ~15 seconds (after Chainfire) │
│ │
│ Purpose: Time-series database for metrics and events │
│ Critical: Yes (IAM and monitoring depend on this) │
└────────────────────────────────────────────────────────────────────────┘
Level 2: Core Services (Depend on Chainfire + FlareDB)
═══════════════════════════════════════════════════════════════════════════
┌────────────────────────────────────────────────────────────────────────┐
│ IAM (Identity and Access Management) │
│ ├─ After: flaredb.service │
│ ├─ Requires: flaredb.service │
│ ├─ Type: simple │
│ ├─ Port: 8080 (API) │
│ ├─ Backend: FlareDB (stores users, roles, tokens) │
│ └─ Start: ~5 seconds (after FlareDB) │
│ │
│ Purpose: Authentication and authorization for all APIs │
│ Critical: Yes (API access requires IAM tokens) │
└────────────────────────────────────────────────────────────────────────┘
Level 3: Application Services (Parallel startup)
═══════════════════════════════════════════════════════════════════════════
┌────────────────────────────────────────────────────────────────────────┐
│ PlasmaVMC (Virtual Machine Controller) │
│ ├─ After: chainfire.service, iam.service │
│ ├─ Wants: chainfire.service, iam.service │
│ ├─ Type: notify │
│ ├─ Port: 9090 (API) │
│ └─ Start: ~10 seconds │
│ │
│ Purpose: VM lifecycle management and orchestration │
└────────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────────┐
│ PrismNET (Software-Defined Networking) │
│ ├─ After: chainfire.service, iam.service │
│ ├─ Wants: chainfire.service │
│ ├─ Type: notify │
│ ├─ Ports: 9091 (API), 4789 (VXLAN) │
│ └─ Start: ~8 seconds │
│ │
│ Purpose: Virtual networking, VXLAN overlay management │
└────────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────────┐
│ FlashDNS (High-Performance DNS) │
│ ├─ After: chainfire.service │
│ ├─ Wants: chainfire.service │
│ ├─ Type: forking │
│ ├─ Ports: 53 (DNS), 853 (DoT) │
│ └─ Start: ~3 seconds │
│ │
│ Purpose: DNS resolution for VMs and services │
└────────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────────┐
│ FiberLB (Layer 4/7 Load Balancer) │
│ ├─ After: chainfire.service, iam.service │
│ ├─ Wants: chainfire.service │
│ ├─ Type: notify │
│ ├─ Ports: 9092 (API), 80 (HTTP), 443 (HTTPS) │
│ └─ Start: ~5 seconds │
│ │
│ Purpose: Load balancing and traffic distribution │
└────────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────────┐
│ LightningStor (Distributed Block Storage) │
│ ├─ After: chainfire.service, flaredb.service │
│ ├─ Wants: chainfire.service │
│ ├─ Type: notify │
│ ├─ Ports: 9093 (API), 9094 (Replication), 3260 (iSCSI) │
│ └─ Start: ~12 seconds │
│ │
│ Purpose: Block storage for VMs and containers │
└────────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────────┐
│ K8sHost (Kubernetes Node Agent) │
│ ├─ After: chainfire.service, plasmavmc.service, prismnet.service │
│ ├─ Wants: chainfire.service, prismnet.service │
│ ├─ Type: notify │
│ ├─ Ports: 10250 (Kubelet), 10256 (Health) │
│ └─ Start: ~15 seconds │
│ │
│ Purpose: Kubernetes node agent for container orchestration │
└────────────────────────────────────────────────────────────────────────┘
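The After= lines in the service boxes above define a partial startup order. As a minimal sketch, the Python below derives which services could start in parallel. It assumes the dependency map is transcribed by hand from this document (the `AFTER` dictionary and the `startup_waves` helper are illustrative names, not part of PlasmaCloud) and it ignores the systemd targets.

```python
from graphlib import TopologicalSorter

# After= edges transcribed by hand from the service boxes in this document.
# Each key is ordered after every service in its value set.
AFTER = {
    "chainfire": set(),
    "flaredb": {"chainfire"},
    "iam": {"flaredb"},
    "plasmavmc": {"chainfire", "iam"},
    "prismnet": {"chainfire", "iam"},
    "flashdns": {"chainfire"},
    "fiberlb": {"chainfire", "iam"},
    "lightningstor": {"chainfire", "flaredb"},
    "k8shost": {"chainfire", "plasmavmc", "prismnet"},
}


def startup_waves(after):
    """Group services into waves; services in one wave may start in parallel."""
    sorter = TopologicalSorter(after)
    sorter.prepare()  # raises graphlib.CycleError if the ordering has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(startup_waves(AFTER), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

With the map above, Chainfire forms the first wave and K8sHost the last. The waves reflect ordering only; they say nothing about the startup times quoted in the boxes.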
Dependency Visualization (ASCII)
┌─────────────────────────────────────────────────────────────────────────┐
│ Service Dependency Tree │
│ (direction: top-down) │
└─────────────────────────────────────────────────────────────────────────┘
                       network-online.target
                                 │
                                 │ After
                                 v
                         ┌───────────────┐
                         │   Chainfire   │ (Level 1)
                         │  Port: 2379   │
                         └───────┬───────┘
                                 │
                    ┌────────────┼────────────┐
                    │ Requires   │ Wants      │ Wants
                    v            v            v
             ┌────────────┐ ┌──────────┐ ┌──────────┐
             │  FlareDB   │ │ PrismNET │ │ FlashDNS │
             │ Port: 2479 │ │Port: 9091│ │ Port: 53 │
             └──────┬─────┘ └──────────┘ └──────────┘
                    │
       ┌────────────┼───────────┬────────────┐
       │ Requires   │ Wants     │ Wants      │ Wants
       v            v           v            v
  ┌─────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
  │   IAM   │ │PlasmaVMC │ │ FiberLB  │ │Lightning │
  │Port:8080│ │Port: 9090│ │Port: 9092│ │Port: 9093│
  └─────────┘ └─────┬────┘ └──────────┘ └──────────┘
                    │
                    │ Wants
                    v
             ┌─────────────┐
             │   K8sHost   │ (Level 3)
             │ Port: 10250 │
             └─────────────┘
Legend:
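Requires: Hard dependency (the service is not started if the dependency fails to start, and is stopped when the dependency is stopped)
Wants: Soft dependency (the service starts even if the dependency fails)
After: Ordering only (the service waits for the dependency's start job to finish, whether or not it succeeded)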
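To make the Requires/Wants distinction in the legend concrete, the sketch below follows only the hard (Requires) edges to find which services would be blocked when one service fails to start. It is a simplified model, not systemd's actual job engine: `REQUIRES` and `blocked_by` are illustrative names, the edges are copied by hand from the unit descriptions in this document, and each Requires= is assumed to be paired with a matching After=.

```python
# Requires= edges copied by hand from this document (hard dependencies only).
# Services that use Wants= are deliberately absent: they start regardless.
REQUIRES = {
    "flaredb": {"chainfire"},
    "iam": {"flaredb"},
}


def blocked_by(failed, requires):
    """Return every service that cannot start because `failed` did not start."""
    blocked = set()
    frontier = {failed}
    while frontier:
        # Anything that hard-requires a service in the frontier is blocked too.
        frontier = {
            service
            for service, deps in requires.items()
            if deps & frontier and service not in blocked
        }
        blocked |= frontier
    return blocked


if __name__ == "__main__":
    print(sorted(blocked_by("chainfire", REQUIRES)))
```

Under this model a Chainfire start failure blocks FlareDB and, transitively, IAM. The Level 3 services, which only use Wants=, would still be started, although they may not function without their backends.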
Runtime Dependencies (Data Flow)
┌─────────────────────────────────────────────────────────────────────────┐
│ Service Communication Flow │
└─────────────────────────────────────────────────────────────────────────┘
 External Client
        │
        │ HTTPS (8080)
        v
┌────────────────┐
│    FiberLB     │  Load balances requests
└───────┬────────┘
        │
        │ Forward to
        v
┌────────────────┐       ┌──────────────┐
│      IAM       │──────>│   FlareDB    │  Validate token
│  (Auth check)  │<──────│ (Token store)│
└───────┬────────┘       └──────────────┘
        │
        │ Token valid
        v
┌────────────────┐       ┌──────────────┐       ┌──────────────┐
│   PlasmaVMC    │──────>│  Chainfire   │──────>│ Worker Node  │
│ (API handler)  │<──────│(Coordination)│<──────│  (VM host)   │
└────────────────┘       └──────────────┘       └──────────────┘
        │
        │ Allocate storage
        v
┌────────────────┐       ┌──────────────┐
│ LightningStor  │──────>│   FlareDB    │  Store metadata
│ (Block device) │<──────│  (Metadata)  │
└────────────────┘       └──────────────┘
        │
        │ Configure network
        v
┌────────────────┐       ┌──────────────┐
│    PrismNET    │──────>│   FlashDNS   │  Register DNS
│ (VXLAN setup)  │<──────│ (Resolution) │
└────────────────┘       └──────────────┘
Failure Impact Analysis
┌─────────────────────────────────────────────────────────────────────────┐
│ Failure Impact Matrix │
└─────────────────────────────────────────────────────────────────────────┘
Service Fails │ Impact │ Mitigation
──────────────────┼──────────────────────────────────┼────────────────────
Chainfire │ ✗ Total cluster failure │ Raft quorum (3/5)
│ ✗ All services lose config │ Data replicated
│ ✗ New VMs cannot start │ Existing VMs run
│ │ Auto-leader election
──────────────────┼──────────────────────────────────┼────────────────────
FlareDB │ ✗ Metrics not collected │ Raft quorum (3/5)
│ ✗ IAM auth fails │ Cache last tokens
│ ⚠ Existing VMs continue │ New VMs blocked
│ │ Data replicated
──────────────────┼──────────────────────────────────┼────────────────────
IAM │ ✗ New API requests fail │ Token cache (TTL)
│ ⚠ Existing sessions valid │ Multiple instances
│ ⚠ Internal services unaffected │ Load balanced
──────────────────┼──────────────────────────────────┼────────────────────
PlasmaVMC │ ✗ Cannot create/delete VMs │ Multiple instances
│ ✓ Existing VMs unaffected │ Stateless (uses DB)
│ ⚠ VM monitoring stops │ Auto-restart VMs
──────────────────┼──────────────────────────────────┼────────────────────
PrismNET │ ✗ Cannot create new networks │ Multiple instances
│ ✓ Existing networks work │ Distributed agents
│ ⚠ VXLAN tunnels persist │ Control plane HA
──────────────────┼──────────────────────────────────┼────────────────────
FlashDNS │ ⚠ DNS resolution fails │ Multiple instances
│ ✓ Existing connections work │ DNS caching
│ ⚠ New connections affected │ Fallback DNS
──────────────────┼──────────────────────────────────┼────────────────────
FiberLB │ ⚠ Load balancing stops │ Multiple instances
│ ✓ Direct API access works │ VIP failover
│ ⚠ Client requests may timeout │ Health checks
──────────────────┼──────────────────────────────────┼────────────────────
LightningStor │ ⚠ Storage I/O may degrade │ Replication (3x)
│ ✓ Replicas on other nodes │ Auto-rebalance
│ ✗ New volumes cannot be created │ Multi-node cluster
──────────────────┼──────────────────────────────────┼────────────────────
K8sHost │ ⚠ Pods on failed node evicted │ Pod replicas
│ ✓ Cluster continues │ Kubernetes HA
│ ⚠ Capacity reduced │ Auto-rescheduling
Legend:
✗ Complete service failure
⚠ Partial service degradation
✓ No impact or minimal impact
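The "Raft quorum (3/5)" mitigation in the matrix can be checked with a little arithmetic. The sketch below assumes that "(3/5)" means three of five nodes must agree, which is the standard Raft majority rule; the function names are illustrative.

```python
def quorum(cluster_size):
    """Smallest majority of a cluster with `cluster_size` voting members."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size):
    """How many members can be lost while a majority is still reachable."""
    return cluster_size - quorum(cluster_size)


if __name__ == "__main__":
    for size in (3, 5):
        print(size, quorum(size), tolerated_failures(size))
```

A five-node cluster therefore keeps working with two nodes down, while a three-node cluster tolerates only one failure.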
Service Health Check Endpoints
┌─────────────────────────────────────────────────────────────────────────┐
│ Health Check Endpoint Reference │
└─────────────────────────────────────────────────────────────────────────┘
Service │ Endpoint │ Expected Response
──────────────┼──────────────────────────────────┼────────────────────────
Chainfire │ https://host:2379/health │ {"status":"healthy",
│ │ "raft":"leader",
│ │ "cluster_size":3}
──────────────┼──────────────────────────────────┼────────────────────────
FlareDB │ https://host:2479/health │ {"status":"healthy",
│ │ "raft":"follower",
│ │ "chainfire":"connected"}
──────────────┼──────────────────────────────────┼────────────────────────
IAM │ https://host:8080/health │ {"status":"healthy",
│ │ "database":"connected",
│ │ "version":"1.0.0"}
──────────────┼──────────────────────────────────┼────────────────────────
PlasmaVMC │ https://host:9090/health │ {"status":"healthy",
│ │ "vms_running":42}
──────────────┼──────────────────────────────────┼────────────────────────
PrismNET │ https://host:9091/health │ {"status":"healthy",
│ │ "networks":5}
──────────────┼──────────────────────────────────┼────────────────────────
FlashDNS │ dig @host +short health.local │ 127.0.0.1 (A record)
│ https://host:853/health │ {"status":"healthy"}
──────────────┼──────────────────────────────────┼────────────────────────
FiberLB │ https://host:9092/health │ {"status":"healthy",
│ │ "backends":3}
──────────────┼──────────────────────────────────┼────────────────────────
LightningStor │ https://host:9093/health │ {"status":"healthy",
│ │ "volumes":15,
│ │ "total_gb":5000}
──────────────┼──────────────────────────────────┼────────────────────────
K8sHost │ https://host:10250/healthz │ ok (HTTP 200)
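The endpoints above share a simple convention: a JSON body with "status":"healthy", or a plain "ok" for K8sHost. A minimal sketch of a checker for that convention follows. The function names `is_healthy` and `check` are hypothetical, and the `verify_tls=False` option only mirrors the `curl -k` usage elsewhere in this document; it is an assumption that the endpoints use self-signed certificates.

```python
import json
import ssl
import urllib.request


def is_healthy(body):
    """True for a plain 'ok' body or a JSON object whose status is 'healthy'."""
    text = body.strip()
    if text == "ok":
        return True
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check(url, timeout=5.0, verify_tls=True):
    """Fetch `url` and report whether it returned HTTP 200 with a healthy body."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Equivalent in spirit to `curl -k`; do not use outside trusted networks.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and is_healthy(body)
    except OSError:  # covers URLError, HTTPError, timeouts, refused connections
        return False
```

For example, `check("https://host:2379/health", verify_tls=False)` would probe Chainfire, with `host` replaced by a real node name. The FlashDNS `dig` check is not covered by this sketch.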
First-Boot Service Dependencies
┌─────────────────────────────────────────────────────────────────────────┐
│ First-Boot Automation Services │
│ (T032.S4 - First-Boot) │
└─────────────────────────────────────────────────────────────────────────┘
     network-online.target
               │
               v
     ┌───────────────────┐
     │ chainfire.service │
     └─────────┬─────────┘
               │ After
               v
      ┌────────────────────────────────┐
      │ chainfire-cluster-join.service │ (First-boot)
      │ ├─ Reads cluster-config.json   │
      │ ├─ Detects bootstrap mode      │
      │ └─ Joins cluster or waits      │
      └────────┬───────────────────────┘
               │ After
               v
      ┌─────────────────┐
      │ flaredb.service │
      └────────┬────────┘
               │ After
               v
      ┌────────────────────────────────┐
      │ flaredb-cluster-join.service   │ (First-boot)
      │ ├─ Waits for FlareDB healthy   │
      │ └─ Joins FlareDB cluster       │
      └────────┬───────────────────────┘
               │ After
               v
      ┌─────────────────┐
      │   iam.service   │
      └────────┬────────┘
               │ After
               v
      ┌────────────────────────────────┐
      │ iam-initial-setup.service      │ (First-boot)
      │ ├─ Creates admin user          │
      │ └─ Initializes IAM             │
      └────────┬───────────────────────┘
               │ After
               v
      ┌────────────────────────────────┐
      │ cluster-health-check.service   │ (First-boot)
      │ ├─ Validates all services      │
      │ ├─ Checks Raft quorum          │
      │ └─ Reports cluster ready       │
      └────────┬───────────────────────┘
               │
               v
┌─────────────────────────────┐
│        Cluster Ready        │
│ (multi-user.target reached) │
└─────────────────────────────┘
Systemd Unit Configuration Examples
# Chainfire service (example)
[Unit]
Description=Chainfire Distributed Configuration Service
After=network-online.target
Wants=network-online.target
[Service]
Type=notify
ExecStart=/nix/store/.../bin/chainfire-server --config /etc/nixos/chainfire.toml
Restart=on-failure
RestartSec=10s
TimeoutStartSec=60s
# Environment
Environment="CHAINFIRE_LOG_LEVEL=info"
EnvironmentFile=-/etc/nixos/secrets/chainfire.env
# Permissions
User=chainfire
Group=chainfire
StateDirectory=chainfire
ConfigurationDirectory=chainfire
# Security hardening
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
NoNewPrivileges=true
[Install]
WantedBy=multi-user.target
# FlareDB service (example)
[Unit]
Description=FlareDB Time-Series Database
After=network-online.target chainfire.service
Requires=chainfire.service
Wants=network-online.target
[Service]
Type=notify
ExecStart=/nix/store/.../bin/flaredb-server --config /etc/nixos/flaredb.toml
Restart=on-failure
RestartSec=10s
TimeoutStartSec=90s
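# Wait for Chainfire to report healthy. -f makes curl fail on HTTP error
# responses; the loop is bounded by TimeoutStartSec above.
ExecStartPre=/bin/sh -c 'until curl -kfsS -o /dev/null https://localhost:2379/health; do sleep 5; done'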
[Install]
WantedBy=multi-user.target
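The ExecStartPre loop in the FlareDB unit polls Chainfire every 5 seconds and relies on TimeoutStartSec=90s to stop it. The same pattern with an explicit deadline can be sketched in Python as below. `wait_until` is a hypothetical helper, and the `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.

```python
import time


def wait_until(probe, timeout=90.0, interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call `probe` every `interval` seconds until it is true or time runs out.

    Returns True as soon as the probe succeeds, and False once another
    sleep would push past the deadline.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() + interval > deadline:
            return False
        sleep(interval)
```

A caller would pass a probe such as a function that requests the Chainfire health endpoint and returns a boolean. Unlike the shell loop, the helper reports the timeout to its caller instead of being killed by systemd.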
# First-boot cluster join (example)
[Unit]
Description=Chainfire Cluster Join (First Boot)
After=chainfire.service
Requires=chainfire.service
Before=flaredb-cluster-join.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/nix/store/.../bin/cluster-join.sh --service chainfire
Restart=on-failure
RestartSec=10s
[Install]
WantedBy=multi-user.target
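The dependency directives in the example units above can be extracted mechanically, which helps when checking a unit against the diagrams in this document. The sketch below uses Python's `configparser` as an approximation of the unit-file syntax. It is an assumption that this is good enough for simple units like these: the parser does not implement systemd's rules for repeated keys (a later line replaces an earlier one instead of being merged) or line continuations, and `unit_dependencies` is an illustrative name.

```python
import configparser

DEPENDENCY_KEYS = ("After", "Before", "Requires", "Wants")


def unit_dependencies(unit_text):
    """Return the [Unit] dependency directives of a unit file as lists of names."""
    parser = configparser.ConfigParser(
        delimiters=("=",),     # unit files use '=' only; ':' appears in URLs
        interpolation=None,    # leave '%' specifiers untouched
        strict=False,          # tolerate repeated keys (last one wins here)
    )
    parser.optionxform = str   # keep the case-sensitive directive names
    parser.read_string(unit_text)
    unit = parser["Unit"] if parser.has_section("Unit") else {}
    return {key: unit.get(key, "").split() for key in DEPENDENCY_KEYS}


if __name__ == "__main__":
    example = """\
[Unit]
Description=FlareDB Time-Series Database
After=network-online.target chainfire.service
Requires=chainfire.service
Wants=network-online.target
"""
    print(unit_dependencies(example))
```

For authoritative results on a live system, `systemctl show` reports the dependencies systemd actually resolved; the sketch is only for inspecting unit text offline.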
Document End