PlasmaCloud Bare-Metal Deployment

Complete guide for deploying PlasmaCloud infrastructure from scratch on bare metal using NixOS.

Table of Contents

  • Prerequisites
  • NixOS Installation
  • Repository Setup
  • Configuration
  • Deployment
  • Verification
  • Troubleshooting
  • Multi-Node Scaling
  • Monitoring and Observability
  • Next Steps

Prerequisites

Hardware Requirements

Minimum (Development/Testing):

  • 8GB RAM
  • 4 CPU cores
  • 100GB disk space
  • 1 Gbps network interface

Recommended (Production):

  • 32GB RAM
  • 8+ CPU cores
  • 500GB SSD (NVMe preferred)
  • 10 Gbps network interface
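The figures above can be sanity-checked before installing. A minimal preflight sketch using standard Linux tools; the `min_*` thresholds mirror the Minimum tier, and the disk check only inspects free space on `/`:

```shell
# Preflight check against the "Minimum" tier above (illustrative only).
min_ram_gb=8
min_cores=4
min_disk_gb=100

ram_gb=$(( $(awk '/MemTotal/ {print $2}' /proc/meminfo) / 1024 / 1024 ))
cores=$(nproc)
disk_gb=$(df -BG --output=avail / | tail -n 1 | tr -dc '0-9')

[ "$ram_gb"  -ge "$min_ram_gb" ]  || echo "WARN: ${ram_gb}GB RAM (< ${min_ram_gb}GB)"
[ "$cores"   -ge "$min_cores" ]   || echo "WARN: ${cores} CPU cores (< ${min_cores})"
[ "$disk_gb" -ge "$min_disk_gb" ] || echo "WARN: ${disk_gb}GB free on / (< ${min_disk_gb}GB)"
```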

Network Requirements

  • Static IP address or DHCP reservation
  • Open ports for services:
    • Chainfire: 2379 (API), 2380 (Raft), 2381 (Gossip)
    • FlareDB: 2479 (API), 2480 (Raft)
    • IAM: 3000
    • PlasmaVMC: 4000
    • NovaNET: 5000
    • FlashDNS: 6000 (API), 53 (DNS)
    • FiberLB: 7000
    • LightningStor: 8000

NixOS Installation

1. Download NixOS

Download NixOS 23.11 or later from nixos.org.

# Verify ISO checksum
sha256sum nixos-minimal-23.11.iso

2. Create Bootable USB

# Linux
dd if=nixos-minimal-23.11.iso of=/dev/sdX bs=4M status=progress && sync

# macOS
dd if=nixos-minimal-23.11.iso of=/dev/rdiskX bs=1m

3. Boot and Partition Disk

Boot from USB and partition the disk:

# Partition layout (adjust /dev/sda to your disk)
parted /dev/sda -- mklabel gpt
parted /dev/sda -- mkpart primary 512MB -8GB
parted /dev/sda -- mkpart primary linux-swap -8GB 100%
parted /dev/sda -- mkpart ESP fat32 1MB 512MB
parted /dev/sda -- set 3 esp on

# Format partitions
mkfs.ext4 -L nixos /dev/sda1
mkswap -L swap /dev/sda2
swapon /dev/sda2
mkfs.fat -F 32 -n boot /dev/sda3

# Mount
mount /dev/disk/by-label/nixos /mnt
mkdir -p /mnt/boot
mount /dev/disk/by-label/boot /mnt/boot

4. Generate Initial Configuration

nixos-generate-config --root /mnt

5. Minimal Base Configuration

Edit /mnt/etc/nixos/configuration.nix:

{ config, pkgs, ... }:

{
  imports = [ ./hardware-configuration.nix ];

  # Boot loader
  boot.loader.systemd-boot.enable = true;
  boot.loader.efi.canTouchEfiVariables = true;

  # Networking
  networking.hostName = "plasmacloud-01";
  networking.networkmanager.enable = true;

  # Enable flakes
  nix.settings.experimental-features = [ "nix-command" "flakes" ];

  # System packages
  environment.systemPackages = with pkgs; [
    git vim curl wget htop
  ];

  # User account
  users.users.admin = {
    isNormalUser = true;
    extraGroups = [ "wheel" "networkmanager" ];
    openssh.authorizedKeys.keys = [
      # Add your SSH public key here
      "ssh-ed25519 AAAAC3... user@host"
    ];
  };

  # SSH
  services.openssh = {
    enable = true;
    settings.PermitRootLogin = "no";
    settings.PasswordAuthentication = false;
  };

  # Firewall
  networking.firewall.enable = true;
  networking.firewall.allowedTCPPorts = [ 22 ];

  system.stateVersion = "23.11";
}

6. Install NixOS

nixos-install
reboot

Log in as the admin user after the reboot.

Repository Setup

1. Clone PlasmaCloud Repository

# Clone via HTTPS
git clone https://github.com/yourorg/plasmacloud.git /opt/plasmacloud

# Or clone locally for development
git clone /path/to/local/plasmacloud /opt/plasmacloud

cd /opt/plasmacloud

2. Verify Flake Structure

# Check flake outputs
nix flake show

# Expected output:
# ├───nixosModules
# │   ├───default
# │   └───plasmacloud
# ├───overlays
# │   └───default
# └───packages
#     ├───chainfire-server
#     ├───flaredb-server
#     ├───iam-server
#     ├───plasmavmc-server
#     ├───novanet-server
#     ├───flashdns-server
#     ├───fiberlb-server
#     └───lightningstor-server

Configuration

Single-Node Deployment

Create /etc/nixos/plasmacloud.nix:

{ config, pkgs, ... }:

{
  # Import PlasmaCloud modules
  imports = [ /opt/plasmacloud/nix/modules ];

  # Apply PlasmaCloud overlay for packages
  nixpkgs.overlays = [
    (import /opt/plasmacloud).overlays.default
  ];

  # Enable all PlasmaCloud services
  services = {
    # Core distributed infrastructure
    chainfire = {
      enable = true;
      port = 2379;
      raftPort = 2380;
      gossipPort = 2381;
      dataDir = "/var/lib/chainfire";
      settings = {
        node_id = 1;
        cluster_id = 1;
        bootstrap = true;
      };
    };

    flaredb = {
      enable = true;
      port = 2479;
      raftPort = 2480;
      dataDir = "/var/lib/flaredb";
      settings = {
        chainfire_endpoint = "127.0.0.1:2379";
      };
    };

    # Identity and access management
    iam = {
      enable = true;
      port = 3000;
      dataDir = "/var/lib/iam";
      settings = {
        flaredb_endpoint = "127.0.0.1:2479";
      };
    };

    # Compute and networking
    plasmavmc = {
      enable = true;
      port = 4000;
      dataDir = "/var/lib/plasmavmc";
      settings = {
        iam_endpoint = "127.0.0.1:3000";
        flaredb_endpoint = "127.0.0.1:2479";
      };
    };

    novanet = {
      enable = true;
      port = 5000;
      dataDir = "/var/lib/novanet";
      settings = {
        iam_endpoint = "127.0.0.1:3000";
        flaredb_endpoint = "127.0.0.1:2479";
        ovn_northd_endpoint = "tcp:127.0.0.1:6641";
      };
    };

    # Edge services
    flashdns = {
      enable = true;
      port = 6000;
      dnsPort = 5353;  # Non-privileged port for development
      dataDir = "/var/lib/flashdns";
      settings = {
        iam_endpoint = "127.0.0.1:3000";
        flaredb_endpoint = "127.0.0.1:2479";
      };
    };

    fiberlb = {
      enable = true;
      port = 7000;
      dataDir = "/var/lib/fiberlb";
      settings = {
        iam_endpoint = "127.0.0.1:3000";
        flaredb_endpoint = "127.0.0.1:2479";
      };
    };

    lightningstor = {
      enable = true;
      port = 8000;
      dataDir = "/var/lib/lightningstor";
      settings = {
        iam_endpoint = "127.0.0.1:3000";
        flaredb_endpoint = "127.0.0.1:2479";
      };
    };
  };

  # Open firewall ports
  networking.firewall.allowedTCPPorts = [
    2379 2380 2381  # chainfire
    2479 2480       # flaredb
    3000            # iam
    4000            # plasmavmc
    5000            # novanet
    5353 6000       # flashdns
    7000            # fiberlb
    8000            # lightningstor
  ];
  networking.firewall.allowedUDPPorts = [
    2381  # chainfire gossip
    5353  # flashdns
  ];
}

Update Main Configuration

Edit /etc/nixos/configuration.nix to import PlasmaCloud config:

{ config, pkgs, ... }:

{
  imports = [
    ./hardware-configuration.nix
    ./plasmacloud.nix  # Add this line
  ];

  # ... rest of configuration
}

Deployment

1. Test Configuration

# Validate configuration syntax
sudo nixos-rebuild dry-build

# Build without activation (test build)
sudo nixos-rebuild build

2. Deploy Services

# Apply configuration and activate services
sudo nixos-rebuild switch

# Or use flake-based rebuild
sudo nixos-rebuild switch --flake /opt/plasmacloud#plasmacloud-01

3. Monitor Deployment

# Watch service startup
sudo journalctl -f

# Check systemd services
systemctl list-units 'chainfire*' 'flaredb*' 'iam*' 'plasmavmc*' 'novanet*' 'flashdns*' 'fiberlb*' 'lightningstor*'

Verification

Service Status Checks

# Check all services are running
systemctl status chainfire
systemctl status flaredb
systemctl status iam
systemctl status plasmavmc
systemctl status novanet
systemctl status flashdns
systemctl status fiberlb
systemctl status lightningstor

# Quick check all at once
for service in chainfire flaredb iam plasmavmc novanet flashdns fiberlb lightningstor; do
  systemctl is-active --quiet "$service" && echo "$service: ✓" || echo "$service: ✗"
done

Health Checks

# Chainfire health check
curl http://localhost:2379/health
# Expected: {"status":"ok","role":"leader"}

# FlareDB health check
curl http://localhost:2479/health
# Expected: {"status":"healthy"}

# IAM health check
curl http://localhost:3000/health
# Expected: {"status":"ok","version":"0.1.0"}

# PlasmaVMC health check
curl http://localhost:4000/health
# Expected: {"status":"ok"}

# NovaNET health check
curl http://localhost:5000/health
# Expected: {"status":"healthy"}

# FlashDNS health check
curl http://localhost:6000/health
# Expected: {"status":"ok"}

# FiberLB health check
curl http://localhost:7000/health
# Expected: {"status":"running"}

# LightningStor health check
curl http://localhost:8000/health
# Expected: {"status":"healthy"}
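The individual checks above can be rolled into a single sweep. A sketch, assuming the same localhost ports and `/health` paths:

```shell
# Consolidated health sweep over all PlasmaCloud services.
endpoints=(
  "chainfire:2379"  "flaredb:2479"
  "iam:3000"        "plasmavmc:4000"
  "novanet:5000"    "flashdns:6000"
  "fiberlb:7000"    "lightningstor:8000"
)

for entry in "${endpoints[@]}"; do
  name=${entry%%:*}   # text before the colon
  port=${entry##*:}   # text after the colon
  if curl -fsS --max-time 2 "http://localhost:${port}/health" >/dev/null 2>&1; then
    echo "${name}: healthy"
  else
    echo "${name}: UNREACHABLE on port ${port}"
  fi
done
```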

DNS Resolution Test

# Test DNS resolution on the development port (5353)
dig @localhost -p 5353 example.com

# Test a reverse (PTR) lookup
dig @localhost -p 5353 -x 192.168.1.100

Logs Inspection

# View service logs
sudo journalctl -u chainfire -f
sudo journalctl -u flaredb -f
sudo journalctl -u iam -f

# View recent logs with priority
sudo journalctl -u plasmavmc --since "10 minutes ago" -p err

Troubleshooting

Service Won't Start

Check dependencies:

# Verify chainfire is running before flaredb
systemctl status chainfire
systemctl status flaredb

# Check service ordering
systemctl list-dependencies flaredb

Check logs:

# Full logs since boot
sudo journalctl -u <service> -b

# Last 100 lines
sudo journalctl -u <service> -n 100

Permission Errors

# Verify data directories exist with correct permissions
ls -la /var/lib/chainfire
ls -la /var/lib/flaredb

# Check service user exists
id chainfire
id flaredb

Port Conflicts

# Check if ports are already in use
sudo ss -tulpn | grep :2379
sudo ss -tulpn | grep :3000

# Find process using port
sudo lsof -i :2379

Chainfire Cluster Issues

If chainfire fails to bootstrap:

# Check cluster state
curl http://localhost:2379/cluster/members

# Reset data directory (DESTRUCTIVE)
sudo systemctl stop chainfire
sudo rm -rf /var/lib/chainfire/*
sudo systemctl start chainfire

Firewall Issues

# Check firewall rules
sudo nft list ruleset

# Temporarily disable firewall for testing
sudo systemctl stop firewall

# Re-enable after testing
sudo systemctl start firewall

Multi-Node Scaling

Architecture Patterns

Pattern 1: Core + Workers

  • Nodes 1-3: chainfire, flaredb, iam (HA core)
  • Nodes 4-N: plasmavmc, novanet, flashdns, fiberlb, lightningstor (workers)

Pattern 2: Service Separation

  • Nodes 1-3: chainfire, flaredb (data layer)
  • Nodes 4-6: iam, plasmavmc, novanet (control plane)
  • Nodes 7-N: flashdns, fiberlb, lightningstor (edge services)

Multi-Node Configuration Example

Core Node (node01.nix):

{
  services = {
    chainfire = {
      enable = true;
      settings = {
        node_id = 1;
        cluster_id = 1;
        initial_members = [
          { id = 1; raft_addr = "10.0.0.11:2380"; }
          { id = 2; raft_addr = "10.0.0.12:2380"; }
          { id = 3; raft_addr = "10.0.0.13:2380"; }
        ];
      };
    };
    flaredb.enable = true;
    iam.enable = true;
  };
}

Worker Node (node04.nix):

{
  services = {
    plasmavmc = {
      enable = true;
      settings = {
        iam_endpoint = "10.0.0.11:3000";  # Point to core
        flaredb_endpoint = "10.0.0.11:2479";
      };
    };
    novanet = {
      enable = true;
      settings = {
        iam_endpoint = "10.0.0.11:3000";
        flaredb_endpoint = "10.0.0.11:2479";
      };
    };
  };
}

Load Balancing

Use DNS round-robin or HAProxy to distribute requests across nodes:

# Example HAProxy config for IAM service
services.haproxy = {
  enable = true;
  config = ''
    frontend iam_frontend
      bind *:3000
      default_backend iam_nodes

    backend iam_nodes
      balance roundrobin
      server node01 10.0.0.11:3000 check
      server node02 10.0.0.12:3000 check
      server node03 10.0.0.13:3000 check
  '';
};

Monitoring and Observability

Prometheus metrics:

services.prometheus = {
  enable = true;
  scrapeConfigs = [
    {
      job_name = "plasmacloud";
      static_configs = [{
        targets = [
          "localhost:9091"  # chainfire metrics
          "localhost:9092"  # flaredb metrics
          # ... add all service metrics ports
        ];
      }];
    }
  ];
};

Next Steps

Deployment Complete!

Your PlasmaCloud infrastructure is now running. Verify all services are healthy and proceed with tenant onboarding.