# PlasmaCloud Bare-Metal Deployment
Complete guide for deploying PlasmaCloud infrastructure from scratch on bare metal using NixOS.
## Table of Contents
- [Prerequisites](#prerequisites)
- [NixOS Installation](#nixos-installation)
- [Repository Setup](#repository-setup)
- [Configuration](#configuration)
- [Deployment](#deployment)
- [Verification](#verification)
- [Troubleshooting](#troubleshooting)
- [Multi-Node Scaling](#multi-node-scaling)
## Prerequisites
### Hardware Requirements
**Minimum (Development/Testing):**
- 8GB RAM
- 4 CPU cores
- 100GB disk space
- 1 Gbps network interface
**Recommended (Production):**
- 32GB RAM
- 8+ CPU cores
- 500GB SSD (NVMe preferred)
- 10 Gbps network interface
### Network Requirements
- Static IP address or DHCP reservation
- Open ports for services:
  - **Chainfire:** 2379 (API), 2380 (Raft), 2381 (Gossip)
  - **FlareDB:** 2479 (API), 2480 (Raft)
  - **IAM:** 3000
  - **PlasmaVMC:** 4000
  - **NovaNET:** 5000
  - **FlashDNS:** 6000 (API), 53 (DNS)
  - **FiberLB:** 7000
  - **LightningStor:** 8000
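Before installing, it can help to confirm that nothing else is already listening on these ports. The sketch below is one possible pre-flight check, not part of PlasmaCloud itself: `busy_ports` is a hypothetical helper name, it only looks at TCP listeners, and it assumes `ss` from iproute2 is available to produce the input.

```bash
# TCP ports taken from the service list above.
PLASMACLOUD_TCP_PORTS="2379 2380 2381 2479 2480 3000 4000 5000 6000 7000 8000 53"

# busy_ports (hypothetical helper): reads `ss -Htln` output on stdin and
# prints every PlasmaCloud port that already has a TCP listener.
busy_ports() {
  # Field 4 of `ss -Htln` is the local address, e.g. 0.0.0.0:2379 or [::]:53;
  # the port is whatever follows the last colon.
  listeners=$(awk '{ n = split($4, parts, ":"); print parts[n] }')
  for port in $PLASMACLOUD_TCP_PORTS; do
    if printf '%s\n' "$listeners" | grep -qx "$port"; then
      echo "$port"
    fi
  done
}
```

A typical call would be `ss -Htln | busy_ports`; empty output means none of the listed TCP ports is taken.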
## NixOS Installation
### 1. Download NixOS
Download NixOS 23.11 or later from [nixos.org](https://nixos.org/download.html).
```bash
# Compute the ISO checksum and compare it with the value published on nixos.org
sha256sum nixos-minimal-23.11.iso
```
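`sha256sum` on its own only prints the hash, so the comparison still has to happen somewhere. A minimal sketch of an automated comparison follows; `verify_sha256` is a hypothetical helper name, and the expected hash must be copied from the download page.

```bash
# verify_sha256 FILE EXPECTED_HASH (hypothetical helper)
# Succeeds only if FILE hashes to EXPECTED_HASH.
verify_sha256() {
  file=$1
  expected=$2
  # `sha256sum -c -` reads "HASH  FILENAME" lines from stdin and re-hashes each file.
  printf '%s  %s\n' "$expected" "$file" | sha256sum -c - >/dev/null 2>&1
}
```

For example, `verify_sha256 nixos-minimal-23.11.iso <published-hash> && echo "checksum OK"`, where `<published-hash>` stands for the value shown on nixos.org.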
### 2. Create Bootable USB
```bash
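# Linux (replace /dev/sdX with the USB device; everything on it is overwritten)
sudo dd if=nixos-minimal-23.11.iso of=/dev/sdX bs=4M status=progress && sync
# macOS (replace rdiskX; unmount the disk first with `diskutil unmountDisk /dev/diskX`)
sudo dd if=nixos-minimal-23.11.iso of=/dev/rdiskX bs=1m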
```
### 3. Boot and Partition Disk
Boot from USB and partition the disk:
```bash
# Partition layout (adjust /dev/sda to your disk)
parted /dev/sda -- mklabel gpt
parted /dev/sda -- mkpart primary 512MB -8GB
parted /dev/sda -- mkpart primary linux-swap -8GB 100%
parted /dev/sda -- mkpart ESP fat32 1MB 512MB
parted /dev/sda -- set 3 esp on
# Format partitions
mkfs.ext4 -L nixos /dev/sda1
mkswap -L swap /dev/sda2
swapon /dev/sda2
mkfs.fat -F 32 -n boot /dev/sda3
# Mount
mount /dev/disk/by-label/nixos /mnt
mkdir -p /mnt/boot
mount /dev/disk/by-label/boot /mnt/boot
```
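The mount commands above depend on the filesystem labels `nixos`, `swap`, and `boot`, so a mistyped label only shows up when a mount fails. The following sketch checks for them; `missing_labels` is a hypothetical helper name and it expects one label per line on stdin.

```bash
# missing_labels (hypothetical helper): reads one filesystem label per line
# on stdin and prints each label required by the layout above that is absent.
missing_labels() {
  found=$(cat)
  for label in nixos swap boot; do
    printf '%s\n' "$found" | grep -qx "$label" || echo "$label"
  done
}
```

With the example disk from above it could be called as `lsblk -no LABEL /dev/sda | missing_labels`; no output means all three labels were found.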
### 4. Generate Initial Configuration
```bash
nixos-generate-config --root /mnt
```
### 5. Minimal Base Configuration
Edit `/mnt/etc/nixos/configuration.nix`:
```nix
{ config, pkgs, ... }:
{
  imports = [ ./hardware-configuration.nix ];
  # Boot loader
  boot.loader.systemd-boot.enable = true;
  boot.loader.efi.canTouchEfiVariables = true;
  # Networking
  networking.hostName = "plasmacloud-01";
  networking.networkmanager.enable = true;
  # Enable flakes
  nix.settings.experimental-features = [ "nix-command" "flakes" ];
  # System packages
  environment.systemPackages = with pkgs; [
    git vim curl wget htop
  ];
  # User account
  users.users.admin = {
    isNormalUser = true;
    extraGroups = [ "wheel" "networkmanager" ];
    openssh.authorizedKeys.keys = [
      # Add your SSH public key here
      "ssh-ed25519 AAAAC3... user@host"
    ];
  };
  # SSH
  services.openssh = {
    enable = true;
    settings.PermitRootLogin = "no";
    settings.PasswordAuthentication = false;
  };
  # Firewall
  networking.firewall.enable = true;
  networking.firewall.allowedTCPPorts = [ 22 ];
  system.stateVersion = "23.11";
}
```
### 6. Install NixOS
```bash
nixos-install
reboot
```
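After the reboot, log in as the `admin` user over SSH with the key configured above. The configuration sets no password for this account, so set one as root (for example with `passwd admin`) if you need console login or password-based `sudo`.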
## Repository Setup
### 1. Clone PlasmaCloud Repository
```bash
# Clone via HTTPS
git clone https://github.com/yourorg/plasmacloud.git /opt/plasmacloud
# Or clone locally for development
git clone /path/to/local/plasmacloud /opt/plasmacloud
cd /opt/plasmacloud
```
### 2. Verify Flake Structure
```bash
# Check flake outputs
nix flake show
# Expected output:
# ├───nixosModules
# │ ├───default
# │ └───plasmacloud
# ├───overlays
# │ └───default
# └───packages
# ├───chainfire-server
# ├───flaredb-server
# ├───iam-server
# ├───plasmavmc-server
# ├───novanet-server
# ├───flashdns-server
# ├───fiberlb-server
# └───lightningstor-server
```
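Comparing the tree output by eye is error-prone, so the expected package list can also be checked mechanically. This is a sketch under assumptions: `missing_packages` is a hypothetical helper name, and it expects one package attribute name per line on stdin.

```bash
# Package names from the expected `nix flake show` output above.
EXPECTED_PACKAGES="chainfire-server flaredb-server iam-server plasmavmc-server novanet-server flashdns-server fiberlb-server lightningstor-server"

# missing_packages (hypothetical helper): reads one package name per line
# on stdin and prints every expected package that is not present.
missing_packages() {
  present=$(cat)
  for pkg in $EXPECTED_PACKAGES; do
    printf '%s\n' "$present" | grep -qx "$pkg" || echo "$pkg"
  done
}
```

One possible way to feed it, assuming an `x86_64-linux` host, is `nix eval .#packages.x86_64-linux --apply builtins.attrNames --json | tr -d '"[]' | tr ',' '\n' | missing_packages`; empty output means every expected package is exposed by the flake.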
## Configuration
### Single-Node Deployment
Create `/etc/nixos/plasmacloud.nix`:
```nix
{ config, pkgs, ... }:
{
  # Import PlasmaCloud modules
  imports = [ /opt/plasmacloud/nix/modules ];
  # Apply PlasmaCloud overlay for packages.
  # Note: `import /opt/plasmacloud` loads default.nix from the repository root,
  # not flake.nix. If the repository has no default.nix exposing the flake
  # outputs, use (builtins.getFlake "/opt/plasmacloud").overlays.default instead.
  nixpkgs.overlays = [
    (import /opt/plasmacloud).overlays.default
  ];
  # Enable all PlasmaCloud services
  services = {
    # Core distributed infrastructure
    chainfire = {
      enable = true;
      port = 2379;
      raftPort = 2380;
      gossipPort = 2381;
      dataDir = "/var/lib/chainfire";
      settings = {
        node_id = 1;
        cluster_id = 1;
        bootstrap = true;
      };
    };
    flaredb = {
      enable = true;
      port = 2479;
      raftPort = 2480;
      dataDir = "/var/lib/flaredb";
      settings = {
        chainfire_endpoint = "127.0.0.1:2379";
      };
    };
    # Identity and access management
    iam = {
      enable = true;
      port = 3000;
      dataDir = "/var/lib/iam";
      settings = {
        flaredb_endpoint = "127.0.0.1:2479";
      };
    };
    # Compute and networking
    plasmavmc = {
      enable = true;
      port = 4000;
      dataDir = "/var/lib/plasmavmc";
      settings = {
        iam_endpoint = "127.0.0.1:3000";
        flaredb_endpoint = "127.0.0.1:2479";
      };
    };
    novanet = {
      enable = true;
      port = 5000;
      dataDir = "/var/lib/novanet";
      settings = {
        iam_endpoint = "127.0.0.1:3000";
        flaredb_endpoint = "127.0.0.1:2479";
        ovn_northd_endpoint = "tcp:127.0.0.1:6641";
      };
    };
    # Edge services
    flashdns = {
      enable = true;
      port = 6000;
      dnsPort = 5353; # Non-privileged port for development
      dataDir = "/var/lib/flashdns";
      settings = {
        iam_endpoint = "127.0.0.1:3000";
        flaredb_endpoint = "127.0.0.1:2479";
      };
    };
    fiberlb = {
      enable = true;
      port = 7000;
      dataDir = "/var/lib/fiberlb";
      settings = {
        iam_endpoint = "127.0.0.1:3000";
        flaredb_endpoint = "127.0.0.1:2479";
      };
    };
    lightningstor = {
      enable = true;
      port = 8000;
      dataDir = "/var/lib/lightningstor";
      settings = {
        iam_endpoint = "127.0.0.1:3000";
        flaredb_endpoint = "127.0.0.1:2479";
      };
    };
  };
  # Open firewall ports
  networking.firewall.allowedTCPPorts = [
    2379 2380 2381 # chainfire
    2479 2480      # flaredb
    3000           # iam
    4000           # plasmavmc
    5000           # novanet
    5353 6000      # flashdns
    7000           # fiberlb
    8000           # lightningstor
  ];
  networking.firewall.allowedUDPPorts = [
    2381 # chainfire gossip
    5353 # flashdns
  ];
}
```
### Update Main Configuration
Edit `/etc/nixos/configuration.nix` to import PlasmaCloud config:
```nix
{ config, pkgs, ... }:
{
  imports = [
    ./hardware-configuration.nix
    ./plasmacloud.nix # Add this line
  ];
  # ... rest of configuration
}
```
## Deployment
### 1. Test Configuration
```bash
# Evaluate the configuration and list what would be built (nothing is built)
sudo nixos-rebuild dry-build
# Build the system without activating it
sudo nixos-rebuild build
```
### 2. Deploy Services
```bash
# Apply configuration and activate services
sudo nixos-rebuild switch
# Or use a flake-based rebuild (requires a nixosConfigurations.plasmacloud-01 output in the flake)
sudo nixos-rebuild switch --flake /opt/plasmacloud#plasmacloud-01
```
### 3. Monitor Deployment
```bash
# Watch service startup
sudo journalctl -f
# Check systemd services
systemctl list-units 'chainfire*' 'flaredb*' 'iam*' 'plasmavmc*' 'novanet*' 'flashdns*' 'fiberlb*' 'lightningstor*'
```
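Services can take a moment to come up after `nixos-rebuild switch`, so a one-off status check may report a failure that resolves itself seconds later. The sketch below polls a command until it succeeds or a timeout passes; `wait_until` is a hypothetical helper name, not a PlasmaCloud or systemd tool.

```bash
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...] (hypothetical helper)
# Re-runs COMMAND once per second until it succeeds; fails after TIMEOUT_SECONDS.
wait_until() {
  timeout=$1
  shift
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}
```

For example, `wait_until 60 systemctl is-active --quiet chainfire` waits up to a minute for the chainfire unit to become active.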
## Verification
### Service Status Checks
```bash
# Check all services are running
systemctl status chainfire
systemctl status flaredb
systemctl status iam
systemctl status plasmavmc
systemctl status novanet
systemctl status flashdns
systemctl status fiberlb
systemctl status lightningstor
# Quick check all at once (--quiet suppresses the "active"/"inactive" output)
for service in chainfire flaredb iam plasmavmc novanet flashdns fiberlb lightningstor; do
  systemctl is-active --quiet "$service" && echo "$service: ✓" || echo "$service: ✗"
done
```
### Health Checks
```bash
# Chainfire health check
curl http://localhost:2379/health
# Expected: {"status":"ok","role":"leader"}
# FlareDB health check
curl http://localhost:2479/health
# Expected: {"status":"healthy"}
# IAM health check
curl http://localhost:3000/health
# Expected: {"status":"ok","version":"0.1.0"}
# PlasmaVMC health check
curl http://localhost:4000/health
# Expected: {"status":"ok"}
# NovaNET health check
curl http://localhost:5000/health
# Expected: {"status":"healthy"}
# FlashDNS health check
curl http://localhost:6000/health
# Expected: {"status":"ok"}
# FiberLB health check
curl http://localhost:7000/health
# Expected: {"status":"running"}
# LightningStor health check
curl http://localhost:8000/health
# Expected: {"status":"healthy"}
```
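The individual checks above can be combined into a single pass. The following is a sketch only: `health_url` and `check_all_health` are hypothetical helper names, the ports are the ones used in this guide, and because the expected response bodies differ per service the loop looks only at the HTTP status.

```bash
# service:port pairs from the health checks above.
HEALTH_ENDPOINTS="chainfire:2379 flaredb:2479 iam:3000 plasmavmc:4000 novanet:5000 flashdns:6000 fiberlb:7000 lightningstor:8000"

# health_url SERVICE (hypothetical helper): prints the health URL for SERVICE,
# or fails if the service is not in the list.
health_url() {
  for pair in $HEALTH_ENDPOINTS; do
    if [ "${pair%%:*}" = "$1" ]; then
      echo "http://localhost:${pair##*:}/health"
      return 0
    fi
  done
  return 1
}

# check_all_health (hypothetical helper): queries every endpoint and returns
# non-zero if any request fails.
check_all_health() {
  failed=0
  for pair in $HEALTH_ENDPOINTS; do
    name=${pair%%:*}
    port=${pair##*:}
    # -f makes curl exit non-zero on HTTP error statuses; --max-time bounds each request.
    if curl -fsS --max-time 5 "http://localhost:${port}/health" >/dev/null 2>&1; then
      echo "$name: healthy"
    else
      echo "$name: FAILED"
      failed=1
    fi
  done
  return "$failed"
}
```

Running `check_all_health` prints one line per service, and its exit status can gate later steps in a script.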
### DNS Resolution Test
```bash
# Test DNS server (port 5353 as configured above; omit -p when using the standard port 53)
dig @localhost -p 5353 example.com
# Test PTR reverse lookup
dig @localhost -p 5353 -x 192.168.1.100
```
### Logs Inspection
```bash
# View service logs
sudo journalctl -u chainfire -f
sudo journalctl -u flaredb -f
sudo journalctl -u iam -f
# View only error-level (and more severe) messages from the last 10 minutes
sudo journalctl -u plasmavmc --since "10 minutes ago" -p err
```
## Troubleshooting
### Service Won't Start
**Check dependencies:**
```bash
# Verify chainfire is running before flaredb
systemctl status chainfire
systemctl status flaredb
# Check service ordering
systemctl list-dependencies flaredb
```
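Judging by the endpoint settings in the single-node configuration, flaredb depends on chainfire, iam on flaredb, and the remaining services on iam and flaredb. Under that assumption, the first inactive service in that order is the most useful one to investigate. A minimal sketch, with `first_failing` as a hypothetical helper name:

```bash
# Assumed start order, inferred from the *_endpoint settings in this guide.
START_ORDER="chainfire flaredb iam plasmavmc novanet flashdns fiberlb lightningstor"

# first_failing CHECK_COMMAND [ARGS...] (hypothetical helper)
# Runs CHECK_COMMAND ARGS... <service> for each service in START_ORDER and
# prints the first service whose check fails. Returns 1 if all checks pass.
first_failing() {
  for svc in $START_ORDER; do
    if ! "$@" "$svc"; then
      echo "$svc"
      return 0
    fi
  done
  return 1
}
```

For example, `first_failing systemctl is-active --quiet` prints the earliest service in the chain that is not active.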
**Check logs:**
```bash
# Full logs since boot
sudo journalctl -u <service> -b
# Last 100 lines
sudo journalctl -u <service> -n 100
```
### Permission Errors
```bash
# Verify data directories exist with correct permissions
ls -la /var/lib/chainfire
ls -la /var/lib/flaredb
# Check service user exists
id chainfire
id flaredb
```
### Port Conflicts
```bash
# Check if ports are already in use
sudo ss -tulpn | grep :2379
sudo ss -tulpn | grep :3000
# Find process using port
sudo lsof -i :2379
```
### Chainfire Cluster Issues
If chainfire fails to bootstrap:
```bash
# Check cluster state
curl http://localhost:2379/cluster/members
# Reset data directory (DESTRUCTIVE)
sudo systemctl stop chainfire
sudo rm -rf /var/lib/chainfire/*
sudo systemctl start chainfire
```
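Because the reset above is destructive, it may be worth archiving the data directory first. The sketch below is one way to do that; `backup_dir` is a hypothetical helper name and the destination directory is an arbitrary choice.

```bash
# backup_dir SRC_DIR DEST_DIR (hypothetical helper)
# Writes DEST_DIR/<name>-<timestamp>.tar.gz and prints the archive path.
backup_dir() {
  src=$1
  dest=$2
  archive="$dest/$(basename "$src")-$(date +%Y%m%d-%H%M%S).tar.gz"
  mkdir -p "$dest" || return 1
  # -C switches to the parent directory so the archive stores a relative path.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  echo "$archive"
}
```

Run as root after stopping the service, for example `backup_dir /var/lib/chainfire /var/backups`, and only then remove the directory contents.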
### Firewall Issues
```bash
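# Check firewall rules (the default NixOS firewall is iptables-based;
# use `sudo nft list ruleset` instead if networking.nftables.enable is set)
sudo iptables -L -n -v
# Temporarily disable firewall for testing
sudo systemctl stop firewall
# Re-enable after testing
sudo systemctl start firewall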
```
## Multi-Node Scaling
### Architecture Patterns
**Pattern 1: Core + Workers**
- **Node 1-3:** chainfire, flaredb, iam (HA core)
- **Node 4-N:** plasmavmc, novanet, flashdns, fiberlb, lightningstor (workers)
**Pattern 2: Service Separation**
- **Node 1-3:** chainfire, flaredb (data layer)
- **Node 4-6:** iam, plasmavmc, novanet (control plane)
- **Node 7-N:** flashdns, fiberlb, lightningstor (edge services)
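Pattern 2 can be written down as a simple lookup, which is handy when generating per-node configuration from a script. This is only an illustration of the table above; `services_for_node` is a hypothetical helper name.

```bash
# services_for_node NODE_NUMBER (hypothetical helper)
# Prints the services Pattern 2 assigns to the given node number.
services_for_node() {
  n=$1
  if [ "$n" -ge 1 ] && [ "$n" -le 3 ]; then
    echo "chainfire flaredb"               # data layer
  elif [ "$n" -ge 4 ] && [ "$n" -le 6 ]; then
    echo "iam plasmavmc novanet"           # control plane
  elif [ "$n" -ge 7 ]; then
    echo "flashdns fiberlb lightningstor"  # edge services
  else
    return 1
  fi
}
```

For instance, `services_for_node 5` prints the control-plane services.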
### Multi-Node Configuration Example
**Core Node (node01.nix):**
```nix
{
  services = {
    chainfire = {
      enable = true;
      settings = {
        node_id = 1;
        cluster_id = 1;
        initial_members = [
          { id = 1; raft_addr = "10.0.0.11:2380"; }
          { id = 2; raft_addr = "10.0.0.12:2380"; }
          { id = 3; raft_addr = "10.0.0.13:2380"; }
        ];
      };
    };
    flaredb.enable = true;
    iam.enable = true;
  };
}
```
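The three-member list above is a common choice for Raft clusters. Assuming Chainfire uses standard Raft majority quorum (this guide only states that it has a Raft port), the arithmetic behind cluster sizing can be sketched as follows; both function names are hypothetical.

```bash
# raft_quorum N: members that must agree in an N-member cluster (majority).
raft_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# raft_fault_tolerance N: members that can fail while a majority remains.
raft_fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}
```

With three members the quorum is 2 and one node can fail. A fourth member raises the quorum to 3 without tolerating any additional failure, which is why odd cluster sizes are usually preferred.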
**Worker Node (node04.nix):**
```nix
{
  services = {
    plasmavmc = {
      enable = true;
      settings = {
        iam_endpoint = "10.0.0.11:3000"; # Point to core
        flaredb_endpoint = "10.0.0.11:2479";
      };
    };
    novanet = {
      enable = true;
      settings = {
        iam_endpoint = "10.0.0.11:3000";
        flaredb_endpoint = "10.0.0.11:2479";
      };
    };
  };
}
```
### Load Balancing
Use DNS round-robin or HAProxy for distributing requests:
```nix
# Example HAProxy config for IAM service
services.haproxy = {
  enable = true;
  config = ''
    frontend iam_frontend
      bind *:3000
      default_backend iam_nodes
    backend iam_nodes
      balance roundrobin
      server node01 10.0.0.11:3000 check
      server node02 10.0.0.12:3000 check
      server node03 10.0.0.13:3000 check
  '';
};
```
### Monitoring and Observability
**Prometheus metrics:**
```nix
services.prometheus = {
  enable = true;
  scrapeConfigs = [
    {
      job_name = "plasmacloud";
      static_configs = [{
        targets = [
          "localhost:9091" # chainfire metrics
          "localhost:9092" # flaredb metrics
          # ... add all service metrics ports
        ];
      }];
    }
  ];
};
```
## Next Steps
- **[Configuration Templates](./config-templates.md)** — Pre-built configs for common scenarios
- **[High Availability Guide](./high-availability.md)** — Multi-node HA setup
- **[Monitoring Setup](./monitoring.md)** — Metrics and logging
- **[Backup and Recovery](./backup-recovery.md)** — Data protection strategies
## Additional Resources
- [NixOS Manual](https://nixos.org/manual/nixos/stable/)
- [Nix Flakes Guide](https://nixos.wiki/wiki/Flakes)
- [PlasmaCloud Architecture](../architecture/mvp-beta-tenant-path.md)
- [Service API Documentation](../api/)
---
**Deployment Complete!**
Your PlasmaCloud infrastructure is now running. Verify all services are healthy and proceed with tenant onboarding.