Service Dependencies Diagram

Document Version: 1.0
Last Updated: 2025-12-10

Service Startup Order

┌─────────────────────────────────────────────────────────────────────────┐
│               PlasmaCloud Service Dependency Graph                       │
│               (systemd unit dependencies)                                │
└─────────────────────────────────────────────────────────────────────────┘

                        System Boot
                            │
                            v
                 ┌──────────────────┐
                 │  systemd (PID 1) │
                 └────────┬─────────┘
                          │
                          v
          ┌───────────────────────────────┐
          │  basic.target                 │
          │  • mounts filesystems         │
          │  • activates swap             │
          └───────────────┬───────────────┘
                          │
                          v
          ┌───────────────────────────────┐
          │  network.target               │
          │  • brings up network interfaces│
          │  • configures IP addresses    │
          └───────────────┬───────────────┘
                          │
                          v
          ┌───────────────────────────────┐
          │  network-online.target        │
          │  • waits for network ready    │
          │  • ensures DNS resolution     │
          └───────────────┬───────────────┘
                          │
                          v
                ┌─────────────────────┐
                │  multi-user.target  │
                └──────────┬──────────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
        v                  v                  v
    [Level 1]          [Level 2]          [Level 3+]
    Foundation         Core Services      Application Services


Level 1: Foundation Services (no dependencies outside this level; FlareDB requires Chainfire)
═══════════════════════════════════════════════════════════════════════════

┌────────────────────────────────────────────────────────────────────────┐
│  Chainfire                                                             │
│  ├─ After: network-online.target                                      │
│  ├─ Type: notify (systemd-aware)                                      │
│  ├─ Ports: 2379 (API), 2380 (Raft), 2381 (Gossip)                    │
│  ├─ Data: /var/lib/chainfire                                          │
│  └─ Start: ~10 seconds                                                │
│                                                                        │
│  Purpose: Distributed configuration store, service discovery          │
│  Critical: Yes (all other services depend on this)                    │
└────────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────────┐
│  FlareDB                                                               │
│  ├─ After: network-online.target, chainfire.service                   │
│  ├─ Requires: chainfire.service                                       │
│  ├─ Type: notify                                                       │
│  ├─ Ports: 2479 (API), 2480 (Raft)                                    │
│  ├─ Data: /var/lib/flaredb                                            │
│  └─ Start: ~15 seconds (after Chainfire)                              │
│                                                                        │
│  Purpose: Time-series database for metrics and events                 │
│  Critical: Yes (IAM and monitoring depend on this)                    │
└────────────────────────────────────────────────────────────────────────┘


Level 2: Core Services (Depend on Chainfire + FlareDB)
═══════════════════════════════════════════════════════════════════════════

┌────────────────────────────────────────────────────────────────────────┐
│  IAM (Identity and Access Management)                                  │
│  ├─ After: flaredb.service                                            │
│  ├─ Requires: flaredb.service                                         │
│  ├─ Type: simple                                                       │
│  ├─ Port: 8080 (API)                                                  │
│  ├─ Backend: FlareDB (stores users, roles, tokens)                    │
│  └─ Start: ~5 seconds (after FlareDB)                                 │
│                                                                        │
│  Purpose: Authentication and authorization for all APIs               │
│  Critical: Yes (API access requires IAM tokens)                       │
└────────────────────────────────────────────────────────────────────────┘


Level 3: Application Services (Parallel startup)
═══════════════════════════════════════════════════════════════════════════

┌────────────────────────────────────────────────────────────────────────┐
│  PlasmaVMC (Virtual Machine Controller)                                │
│  ├─ After: chainfire.service, iam.service                             │
│  ├─ Wants: chainfire.service, iam.service                             │
│  ├─ Type: notify                                                       │
│  ├─ Port: 9090 (API)                                                  │
│  └─ Start: ~10 seconds                                                │
│                                                                        │
│  Purpose: VM lifecycle management and orchestration                   │
└────────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────────┐
│  PrismNET (Software-Defined Networking)                                │
│  ├─ After: chainfire.service, iam.service                             │
│  ├─ Wants: chainfire.service                                          │
│  ├─ Type: notify                                                       │
│  ├─ Ports: 9091 (API), 4789 (VXLAN)                                   │
│  └─ Start: ~8 seconds                                                 │
│                                                                        │
│  Purpose: Virtual networking, VXLAN overlay management                │
└────────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────────┐
│  FlashDNS (High-Performance DNS)                                       │
│  ├─ After: chainfire.service                                          │
│  ├─ Wants: chainfire.service                                          │
│  ├─ Type: forking                                                      │
│  ├─ Ports: 53 (DNS), 853 (DoT)                                        │
│  └─ Start: ~3 seconds                                                 │
│                                                                        │
│  Purpose: DNS resolution for VMs and services                         │
└────────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────────┐
│  FiberLB (Layer 4/7 Load Balancer)                                    │
│  ├─ After: chainfire.service, iam.service                             │
│  ├─ Wants: chainfire.service                                          │
│  ├─ Type: notify                                                       │
│  ├─ Ports: 9092 (API), 80 (HTTP), 443 (HTTPS)                         │
│  └─ Start: ~5 seconds                                                 │
│                                                                        │
│  Purpose: Load balancing and traffic distribution                     │
└────────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────────┐
│  LightningStor (Distributed Block Storage)                             │
│  ├─ After: chainfire.service, flaredb.service                         │
│  ├─ Wants: chainfire.service                                          │
│  ├─ Type: notify                                                       │
│  ├─ Ports: 9093 (API), 9094 (Replication), 3260 (iSCSI)               │
│  └─ Start: ~12 seconds                                                │
│                                                                        │
│  Purpose: Block storage for VMs and containers                        │
└────────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────────┐
│  K8sHost (Kubernetes Node Agent)                                      │
│  ├─ After: chainfire.service, plasmavmc.service, prismnet.service      │
│  ├─ Wants: chainfire.service, prismnet.service                         │
│  ├─ Type: notify                                                       │
│  ├─ Ports: 10250 (Kubelet), 10256 (Health)                            │
│  └─ Start: ~15 seconds                                                │
│                                                                        │
│  Purpose: Kubernetes node agent for container orchestration           │
└────────────────────────────────────────────────────────────────────────┘
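
The After= lines in the boxes above define a partial order. As a minimal sketch (not part of the provisioning tooling), the following Python uses the standard-library `graphlib` module to derive one valid start order and to estimate when each unit becomes ready. The `AFTER` and `START_SECONDS` tables are transcribed from the boxes (with network-online.target omitted); the function names are illustrative, and the model assumes each unit starts as soon as all of its After= dependencies are ready.

```python
from graphlib import TopologicalSorter

# After= edges transcribed from the unit boxes (unit -> units it waits for).
AFTER = {
    "chainfire": [],
    "flaredb": ["chainfire"],
    "iam": ["flaredb"],
    "plasmavmc": ["chainfire", "iam"],
    "prismnet": ["chainfire", "iam"],
    "flashdns": ["chainfire"],
    "fiberlb": ["chainfire", "iam"],
    "lightningstor": ["chainfire", "flaredb"],
    "k8shost": ["chainfire", "plasmavmc", "prismnet"],
}

# Approximate "Start: ~N seconds" values from the same boxes.
START_SECONDS = {
    "chainfire": 10, "flaredb": 15, "iam": 5, "plasmavmc": 10, "prismnet": 8,
    "flashdns": 3, "fiberlb": 5, "lightningstor": 12, "k8shost": 15,
}


def start_order(after):
    """Return one start order in which every unit follows its After= units."""
    return list(TopologicalSorter(after).static_order())


def ready_times(after, seconds):
    """Estimate when each unit is ready, in seconds after network-online."""
    ready = {}
    for unit in start_order(after):
        # A unit begins starting once its slowest After= dependency is ready.
        waited = max((ready[dep] for dep in after[unit]), default=0)
        ready[unit] = waited + seconds[unit]
    return ready
```

Under these figures the longest chain is Chainfire → FlareDB → IAM → PlasmaVMC → K8sHost, so `ready_times(AFTER, START_SECONDS)["k8shost"]` comes to roughly 55 seconds.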

Dependency Visualization (ASCII)

┌─────────────────────────────────────────────────────────────────────────┐
│                    Service Dependency Tree                               │
│                    (direction: top-down)                                 │
└─────────────────────────────────────────────────────────────────────────┘

                         network-online.target
                                  │
                                  │ After
                                  v
                          ┌───────────────┐
                          │  Chainfire    │  (Level 1)
                          │  Port: 2379   │
                          └───────┬───────┘
                                  │
                     ┌────────────┼────────────┐
                     │ Requires   │ Wants      │ Wants
                     v            v            v
            ┌────────────┐  ┌──────────┐  ┌──────────┐
            │  FlareDB   │  │PrismNET  │  │FlashDNS  │
            │ Port: 2479 │  │Port: 9091│  │Port: 53  │
            └──────┬─────┘  └──────────┘  └──────────┘
                   │
          ┌────────┼────────┬──────────┐
          │ Requires│ Wants  │ Wants    │ Wants
          v        v        v          v
    ┌─────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
    │  IAM    │ │PlasmaVMC │ │ FiberLB  │ │Lightning │
    │Port:8080│ │Port: 9090│ │Port: 9092│ │Port: 9093│
    └─────────┘ └─────┬────┘ └──────────┘ └──────────┘
                      │
                      │ After
                      v
               ┌─────────────┐
               │  K8sHost    │  (Level 3)
               │ Port: 10250 │
               └─────────────┘

Legend:
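  Requires: Hard dependency (with After=, the unit is not started if the dependency fails to start; it is stopped if the dependency is stopped)
  Wants: Soft dependency (the unit still starts if the dependency fails)
  After: Ordering only (start once the dependency has finished starting; does not pull it in or require success)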
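
To illustrate the difference between Requires and Wants, here is a small sketch that computes which units would not be started when one unit fails to start. It follows only Requires= edges, each of which is paired with After= in the unit boxes. The `REQUIRES` table is transcribed from this document; `blocked_by` is a hypothetical helper, not part of any tooling.

```python
# Requires= edges from the unit boxes: unit -> units it requires.
REQUIRES = {
    "flaredb": {"chainfire"},
    "iam": {"flaredb"},
}


def blocked_by(failed, requires):
    """Return the units that cannot start because `failed` did not start."""
    blocked = set()
    changed = True
    while changed:
        changed = False
        for unit, deps in requires.items():
            # A unit is blocked if any required unit failed or is itself blocked.
            if unit not in blocked and deps & (blocked | {failed}):
                blocked.add(unit)
                changed = True
    return blocked
```

`blocked_by("chainfire", REQUIRES)` yields `flaredb` and `iam`. The Level 3 units declare only Wants=, so systemd still attempts to start them, although they may fail at runtime without their backends.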

Runtime Dependencies (Data Flow)

┌─────────────────────────────────────────────────────────────────────────┐
│                       Service Communication Flow                         │
└─────────────────────────────────────────────────────────────────────────┘

External Client
    │
    │ HTTPS (443)
    v
┌────────────────┐
│  FiberLB       │  Load balances requests
└───────┬────────┘
        │
        │ Forward to
        v
┌────────────────┐        ┌──────────────┐
│  IAM           │──────>│  FlareDB     │  Validate token
│  (Auth check)  │<──────│  (Token store)│
└───────┬────────┘        └──────────────┘
        │
        │ Token valid
        v
┌────────────────┐        ┌──────────────┐        ┌──────────────┐
│  PlasmaVMC     │──────>│  Chainfire   │──────>│  Worker Node │
│  (API handler) │<──────│  (Coordination)│<──────│  (VM host)   │
└────────────────┘        └──────────────┘        └──────────────┘
        │
        │ Allocate storage
        v
┌────────────────┐        ┌──────────────┐
│ LightningStor  │──────>│  FlareDB     │  Store metadata
│  (Block device)│<──────│  (Metadata)  │
└────────────────┘        └──────────────┘
        │
        │ Configure network
        v
┌────────────────┐        ┌──────────────┐
│  PrismNET      │──────>│  FlashDNS    │  Register DNS
│  (VXLAN setup) │<──────│  (Resolution) │
└────────────────┘        └──────────────┘

Failure Impact Analysis

┌─────────────────────────────────────────────────────────────────────────┐
│                    Failure Impact Matrix                                 │
└─────────────────────────────────────────────────────────────────────────┘

Service Fails     │ Impact                           │ Mitigation
──────────────────┼──────────────────────────────────┼────────────────────
Chainfire         │ ✗ Total cluster failure          │ Raft quorum (3/5)
                  │ ✗ All services lose config       │ Data replicated
                  │ ✗ New VMs cannot start           │ Existing VMs run
                  │                                  │ Auto-leader election
──────────────────┼──────────────────────────────────┼────────────────────
FlareDB           │ ✗ Metrics not collected          │ Raft quorum (3/5)
                  │ ✗ IAM auth fails                 │ Cache last tokens
                  │ ⚠ Existing VMs continue          │ New VMs blocked
                  │                                  │ Data replicated
──────────────────┼──────────────────────────────────┼────────────────────
IAM               │ ✗ New API requests fail          │ Token cache (TTL)
                  │ ⚠ Existing sessions valid        │ Multiple instances
                  │ ⚠ Internal services unaffected   │ Load balanced
──────────────────┼──────────────────────────────────┼────────────────────
PlasmaVMC         │ ✗ Cannot create/delete VMs       │ Multiple instances
                  │ ✓ Existing VMs unaffected        │ Stateless (uses DB)
                  │ ⚠ VM monitoring stops            │ Auto-restart VMs
──────────────────┼──────────────────────────────────┼────────────────────
PrismNET          │ ✗ Cannot create new networks     │ Multiple instances
                  │ ✓ Existing networks work         │ Distributed agents
                  │ ⚠ VXLAN tunnels persist          │ Control plane HA
──────────────────┼──────────────────────────────────┼────────────────────
FlashDNS          │ ⚠ DNS resolution fails           │ Multiple instances
                  │ ✓ Existing connections work      │ DNS caching
                  │ ⚠ New connections affected       │ Fallback DNS
──────────────────┼──────────────────────────────────┼────────────────────
FiberLB           │ ⚠ Load balancing stops           │ Multiple instances
                  │ ✓ Direct API access works        │ VIP failover
                  │ ⚠ Client requests may timeout    │ Health checks
──────────────────┼──────────────────────────────────┼────────────────────
LightningStor     │ ⚠ Storage I/O may degrade        │ Replication (3x)
                  │ ✓ Replicas on other nodes        │ Auto-rebalance
                  │ ✗ New volumes cannot be created  │ Multi-node cluster
──────────────────┼──────────────────────────────────┼────────────────────
K8sHost           │ ⚠ Pods on failed node evicted    │ Pod replicas
                  │ ✓ Cluster continues              │ Kubernetes HA
                  │ ⚠ Capacity reduced               │ Auto-rescheduling

Legend:
  ✗ Complete service failure
  ⚠ Partial service degradation
  ✓ No impact or minimal impact
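
The "Raft quorum (3/5)" entries follow from majority arithmetic: a cluster of n voting members needs floor(n/2) + 1 of them to make progress, and tolerates the loss of the remainder. A short sketch, with hypothetical function names:

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority remains."""
    return members - quorum(members)
```

For a five-node cluster, `quorum(5)` is 3 and `tolerated_failures(5)` is 2; a three-node cluster tolerates one failure.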

Service Health Check Endpoints

┌─────────────────────────────────────────────────────────────────────────┐
│                    Health Check Endpoint Reference                       │
└─────────────────────────────────────────────────────────────────────────┘

Service       │ Endpoint                         │ Expected Response
──────────────┼──────────────────────────────────┼────────────────────────
Chainfire     │ https://host:2379/health         │ {"status":"healthy",
              │                                  │  "raft":"leader",
              │                                  │  "cluster_size":3}
──────────────┼──────────────────────────────────┼────────────────────────
FlareDB       │ https://host:2479/health         │ {"status":"healthy",
              │                                  │  "raft":"follower",
              │                                  │  "chainfire":"connected"}
──────────────┼──────────────────────────────────┼────────────────────────
IAM           │ https://host:8080/health         │ {"status":"healthy",
              │                                  │  "database":"connected",
              │                                  │  "version":"1.0.0"}
──────────────┼──────────────────────────────────┼────────────────────────
PlasmaVMC     │ https://host:9090/health         │ {"status":"healthy",
              │                                  │  "vms_running":42}
──────────────┼──────────────────────────────────┼────────────────────────
PrismNET      │ https://host:9091/health         │ {"status":"healthy",
              │                                  │  "networks":5}
──────────────┼──────────────────────────────────┼────────────────────────
FlashDNS      │ dig @host +short health.local    │ 127.0.0.1 (A record)
              │ https://host:853/health          │ {"status":"healthy"}
──────────────┼──────────────────────────────────┼────────────────────────
FiberLB       │ https://host:9092/health         │ {"status":"healthy",
              │                                  │  "backends":3}
──────────────┼──────────────────────────────────┼────────────────────────
LightningStor │ https://host:9093/health         │ {"status":"healthy",
              │                                  │  "volumes":15,
              │                                  │  "total_gb":5000}
──────────────┼──────────────────────────────────┼────────────────────────
K8sHost       │ https://host:10250/healthz       │ ok (HTTP 200)
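
A hedged sketch of how a monitoring script might interpret these responses. It assumes the JSON bodies have the shape shown in the table and that the K8sHost endpoint returns the plain text `ok`. The names `is_healthy` and `fetch_health` are illustrative, and TLS verification is left at the Python default, so self-signed certificates would need extra handling.

```python
import json
import urllib.request


def is_healthy(body: str) -> bool:
    """Interpret a health response body as shown in the table above."""
    text = body.strip()
    if text == "ok":  # plain-text style used by the K8sHost /healthz endpoint
        return True
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("status") == "healthy"


def fetch_health(url: str, timeout: float = 5.0) -> bool:
    """Fetch a health URL; any network or HTTP error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return resp.status == 200 and is_healthy(body)
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False
```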

First-Boot Service Dependencies

┌─────────────────────────────────────────────────────────────────────────┐
│                   First-Boot Automation Services                         │
│                   (T032.S4 - First-Boot)                                 │
└─────────────────────────────────────────────────────────────────────────┘

                      network-online.target
                              │
                              v
                    ┌─────────────────┐
                    │ chainfire.service│
                    └────────┬─────────┘
                             │ After
                             v
              ┌──────────────────────────────┐
              │ chainfire-cluster-join.service│  (First-boot)
              │ ├─ Reads cluster-config.json │
              │ ├─ Detects bootstrap mode    │
              │ └─ Joins cluster or waits    │
              └────────┬─────────────────────┘
                       │ After
                       v
              ┌───────────────┐
              │flaredb.service│
              └────────┬──────┘
                       │ After
                       v
              ┌──────────────────────────────┐
              │ flaredb-cluster-join.service │  (First-boot)
              │ ├─ Waits for FlareDB healthy │
              │ └─ Joins FlareDB cluster     │
              └────────┬─────────────────────┘
                       │ After
                       v
              ┌───────────────┐
              │  iam.service  │
              └────────┬──────┘
                       │ After
                       v
              ┌──────────────────────────────┐
              │   iam-initial-setup.service  │  (First-boot)
              │   ├─ Creates admin user      │
              │   └─ Initializes IAM         │
              └────────┬─────────────────────┘
                       │ After
                       v
              ┌──────────────────────────────┐
              │  cluster-health-check.service│  (First-boot)
              │  ├─ Validates all services   │
              │  ├─ Checks Raft quorum       │
              │  └─ Reports cluster ready    │
              └──────────────────────────────┘
                       │
                       v
              ┌──────────────────────────────┐
              │  Cluster Ready               │
              │  (multi-user.target reached) │
              └──────────────────────────────┘
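
Steps such as "Waits for FlareDB healthy" amount to polling with an upper bound. A minimal sketch, assuming a join step may give up after a fixed number of attempts; `wait_until` and its parameters are hypothetical and are not taken from the actual first-boot scripts.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], attempts: int = 30, delay: float = 5.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `check` until it returns True or `attempts` runs out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt + 1 < attempts:  # no need to sleep after the final attempt
            sleep(delay)
    return False
```

The `sleep` parameter exists only so the delay can be replaced in tests; a caller would normally pass a health probe as `check` and treat a `False` result as a failed first-boot step.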

Systemd Unit Configuration Examples

# Chainfire service (example)
[Unit]
Description=Chainfire Distributed Configuration Service
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
ExecStart=/nix/store/.../bin/chainfire-server --config /etc/nixos/chainfire.toml
Restart=on-failure
RestartSec=10s
TimeoutStartSec=60s

# Environment
Environment="CHAINFIRE_LOG_LEVEL=info"
EnvironmentFile=-/etc/nixos/secrets/chainfire.env

# Permissions
User=chainfire
Group=chainfire
StateDirectory=chainfire
ConfigurationDirectory=chainfire

# Security hardening
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target


# FlareDB service (example)
[Unit]
Description=FlareDB Time-Series Database
After=network-online.target chainfire.service
Requires=chainfire.service
Wants=network-online.target

[Service]
Type=notify
ExecStart=/nix/store/.../bin/flaredb-server --config /etc/nixos/flaredb.toml
Restart=on-failure
RestartSec=10s
TimeoutStartSec=90s

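# Dependencies: wait until Chainfire's health endpoint returns success
# (-f makes curl exit non-zero on HTTP error responses)
ExecStartPre=/bin/sh -c 'until curl -kfsS -o /dev/null https://localhost:2379/health; do sleep 5; done'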

[Install]
WantedBy=multi-user.target


# First-boot cluster join (example)
[Unit]
Description=Chainfire Cluster Join (First Boot)
After=chainfire.service
Requires=chainfire.service
Before=flaredb-cluster-join.service

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/nix/store/.../bin/cluster-join.sh --service chainfire
Restart=on-failure
RestartSec=10s

[Install]
WantedBy=multi-user.target

Document End