PlasmaVMC Specification

Version: 1.0 | Status: Draft | Last Updated: 2025-12-08

1. Overview

1.1 Purpose

PlasmaVMC is a virtual machine control platform providing unified management across multiple hypervisor backends. It abstracts hypervisor-specific implementations behind trait-based interfaces, enabling consistent VM lifecycle management regardless of the underlying virtualization technology.

The name "Plasma" reflects its role as the energized medium that powers virtual machines, with "VMC" denoting Virtual Machine Controller.

1.2 Scope

In scope: VM lifecycle (create, start, stop, delete), hypervisor abstraction (KVM, Firecracker, mvisor), image management, resource allocation (CPU, memory, storage, network), multi-tenant isolation, console/serial access, live migration (future)
Out of scope: Container orchestration (Kubernetes), bare metal provisioning, storage backend implementation (uses LightningSTOR), network fabric (uses overlay network)

1.3 Design Goals

Hypervisor agnostic: Trait-based abstraction supporting KVM, Firecracker, mvisor
AWS EC2/GCP Compute Engine-like UX: Familiar concepts for cloud users
Multi-tenant from day one: Full org/project hierarchy with resource isolation
High density: Support thousands of VMs per node
Fast boot: Sub-second boot times with Firecracker microVMs
Observable: Rich metrics, events, and audit logging

2. Architecture

2.1 Crate Structure

plasmavmc/
├── crates/
│   ├── plasmavmc-api/        # gRPC service implementations
│   ├── plasmavmc-client/     # Rust client library
│   ├── plasmavmc-core/       # Core orchestration logic
│   ├── plasmavmc-hypervisor/ # Hypervisor trait + registry
│   ├── plasmavmc-kvm/        # KVM/QEMU backend
│   ├── plasmavmc-firecracker/ # Firecracker backend
│   ├── plasmavmc-mvisor/     # mvisor backend
│   ├── plasmavmc-server/     # Control plane server
│   ├── plasmavmc-agent/      # Node agent binary
│   ├── plasmavmc-storage/    # Image/disk management
│   └── plasmavmc-types/      # Shared types
└── proto/
    ├── plasmavmc.proto       # Public API
    └── agent.proto           # Agent internal RPCs

2.2 Component Topology

┌─────────────────────────────────────────────────────────────────┐
│                      Control Plane                               │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │  plasmavmc-api  │  │ plasmavmc-core  │  │plasmavmc-storage│  │
│  │   (gRPC svc)    │──│  (scheduler)    │──│  (image mgmt)   │  │
│  └────────┬────────┘  └────────┬────────┘  └────────┬────────┘  │
│           │                    │                    │           │
│           └────────────────────┼────────────────────┘           │
│                                │                                 │
│                         ┌──────▼──────┐                         │
│                         │  Chainfire  │                         │
│                         │  (state)    │                         │
│                         └─────────────┘                         │
└─────────────────────────────────────────────────────────────────┘
                                 │
                    ┌────────────┼────────────┐
                    ▼            ▼            ▼
┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐
│      Node 1         │ │      Node 2         │ │      Node N         │
│ ┌─────────────────┐ │ │ ┌─────────────────┐ │ │ ┌─────────────────┐ │
│ │ plasmavmc-agent │ │ │ │ plasmavmc-agent │ │ │ │ plasmavmc-agent │ │
│ └────────┬────────┘ │ │ └────────┬────────┘ │ │ └────────┬────────┘ │
│          │          │ │          │          │ │          │          │
│ ┌────────▼────────┐ │ │ ┌────────▼────────┐ │ │ ┌────────▼────────┐ │
│ │HypervisorBackend│ │ │ │HypervisorBackend│ │ │ │HypervisorBackend│ │
│ │ (KVM/FC/mvisor) │ │ │ │ (KVM/FC/mvisor) │ │ │ │ (KVM/FC/mvisor) │ │
│ └─────────────────┘ │ │ └─────────────────┘ │ │ └─────────────────┘ │
└─────────────────────┘ └─────────────────────┘ └─────────────────────┘

2.3 Data Flow

[Client gRPC] → [API Layer] → [Scheduler] → [Agent gRPC] → [Hypervisor]
                     │              │
                     ▼              ▼
               [Chainfire]    [Node Selection]
               (VM state)     (capacity, affinity)

2.4 Dependencies

Crate	Version	Purpose
tokio	1.x	Async runtime
tonic	0.12	gRPC framework
prost	0.13	Protocol buffers
uuid	1.x	VM/resource identifiers
dashmap	6.x	Concurrent state caches
nix	0.29	Linux system calls
async-trait	0.1	Async methods in trait objects (HypervisorBackend)

3. Core Concepts

3.1 Virtual Machine (VM)

The primary managed resource representing a virtual machine instance.

pub struct VirtualMachine {
    pub id: VmId,                    // UUID
    pub name: String,                // User-defined name
    pub org_id: String,              // Organization owner
    pub project_id: String,          // Project owner
    pub state: VmState,              // Current state
    pub spec: VmSpec,                // Desired configuration
    pub status: VmStatus,            // Runtime status
    pub node_id: Option<NodeId>,     // Assigned node
    pub hypervisor: HypervisorType,  // Backend type
    pub created_at: u64,
    pub updated_at: u64,
    pub created_by: String,          // Principal ID
    pub metadata: HashMap<String, String>,
    pub labels: HashMap<String, String>,
}

pub struct VmSpec {
    pub cpu: CpuSpec,
    pub memory: MemorySpec,
    pub disks: Vec<DiskSpec>,
    pub network: Vec<NetworkSpec>,
    pub boot: BootSpec,
    pub security: SecuritySpec,
}

pub struct CpuSpec {
    pub vcpus: u32,              // Number of vCPUs
    pub cores_per_socket: u32,   // Topology: cores per socket
    pub sockets: u32,            // Topology: socket count
    pub cpu_model: Option<String>, // e.g., "host-passthrough"
}

pub struct MemorySpec {
    pub size_mib: u64,           // Memory size in MiB
    pub hugepages: bool,         // Use huge pages
    pub numa_nodes: Vec<NumaNode>, // NUMA topology
}

pub struct DiskSpec {
    pub id: String,              // Disk identifier
    pub source: DiskSource,      // Image or volume
    pub size_gib: u64,           // Disk size
    pub bus: DiskBus,            // virtio, scsi, ide
    pub cache: DiskCache,        // none, writeback, writethrough
    pub boot_index: Option<u32>, // Boot order
}

pub struct NetworkSpec {
    pub id: String,              // Interface identifier
    pub network_id: String,      // Overlay network ID
    pub mac_address: Option<String>,
    pub ip_address: Option<String>,
    pub model: NicModel,         // virtio-net, e1000
    pub security_groups: Vec<String>,
}
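CpuSpec carries both a vCPU count and a sockets × cores topology. However, the client example in 5.3 sets only `vcpus`. Below is a minimal validation sketch. It assumes one thread per core, since CpuSpec has no threads field, and it treats `sockets = 0, cores_per_socket = 0` as "unspecified". The names `effective_topology` and `CpuSpecError` are hypothetical and not part of this spec:

```rust
/// Mirrors the topology fields of CpuSpec above (cpu_model omitted).
#[derive(Debug, Clone, Copy, Default)]
pub struct CpuSpec {
    pub vcpus: u32,
    pub cores_per_socket: u32,
    pub sockets: u32,
}

/// Hypothetical validation errors.
#[derive(Debug, PartialEq, Eq)]
pub enum CpuSpecError {
    ZeroVcpus,
    /// Only one of sockets / cores_per_socket was set.
    PartialTopology,
    TopologyMismatch { vcpus: u32, topology: u64 },
}

/// Returns the effective (sockets, cores_per_socket).
/// Assumptions: one thread per core, and 0/0 means "let the backend choose",
/// which this sketch resolves to a single socket holding every vCPU.
pub fn effective_topology(cpu: &CpuSpec) -> Result<(u32, u32), CpuSpecError> {
    if cpu.vcpus == 0 {
        return Err(CpuSpecError::ZeroVcpus);
    }
    match (cpu.sockets, cpu.cores_per_socket) {
        (0, 0) => Ok((1, cpu.vcpus)),
        (0, _) | (_, 0) => Err(CpuSpecError::PartialTopology),
        (sockets, cores) => {
            // Multiply in u64 so large inputs cannot overflow.
            let topology = u64::from(sockets) * u64::from(cores);
            if topology == u64::from(cpu.vcpus) {
                Ok((sockets, cores))
            } else {
                Err(CpuSpecError::TopologyMismatch { vcpus: cpu.vcpus, topology })
            }
        }
    }
}
```

Under these assumptions, `CpuSpec { vcpus: 2, ..Default::default() }` resolves to one socket with two cores. An explicit 2 × 2 topology declared for 3 vCPUs is rejected before any backend sees it.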

3.2 VM State Machine

                        ┌──────────────────────────────────────┐
                        ▼                                      │
┌─────────┐   create   ┌─────────┐   start   ┌─────────┐      │
│ PENDING │──────────►│ STOPPED │──────────►│ RUNNING │      │
└─────────┘            └────┬────┘           └────┬────┘      │
                            │ delete              │           │
                            ▼                     │ stop      │
                       ┌─────────┐                │           │
                       │ DELETED │◄───────────────┤           │
                       └─────────┘                │           │
                                                  │ reboot    │
                                                  └───────────┘

Additional states:
  CREATING  - Provisioning resources
  STARTING  - Boot in progress
  STOPPING  - Shutdown in progress
  MIGRATING - Live migration in progress
  ERROR     - Recoverable error state
  FAILED    - Terminal failure

pub enum VmState {
    Pending,     // Awaiting scheduling
    Creating,    // Resources being provisioned
    Stopped,     // Created but not running
    Starting,    // Boot in progress
    Running,     // Active and healthy
    Stopping,    // Graceful shutdown
    Migrating,   // Live migration in progress
    Error,       // Recoverable error
    Failed,      // Terminal failure
    Deleted,     // Soft-deleted, pending cleanup
}
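The diagram and the state list leave some edges implicit. The sketch below shows one possible transition table, written as a guard the control plane could run before persisting a state change. The exact edge set is an assumption, for example that deletion requires a non-running VM and that Error can recover to Stopped. `can_transition` is a hypothetical helper. Reboot is modeled as staying in Running rather than as a transition:

```rust
/// Mirrors the VmState enum above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VmState {
    Pending, Creating, Stopped, Starting, Running,
    Stopping, Migrating, Error, Failed, Deleted,
}

/// Returns true if the control plane should accept a `from -> to` transition.
/// The edge set is an assumption reconstructed from the diagram and state list.
pub fn can_transition(from: VmState, to: VmState) -> bool {
    use VmState::*;
    matches!(
        (from, to),
        // Happy path: schedule, provision, boot, shut down.
        (Pending, Creating)
            | (Creating, Stopped)
            | (Stopped, Starting)
            | (Starting, Running)
            | (Running, Stopping)
            | (Stopping, Stopped)
            // Live migration returns to Running on the target node.
            | (Running, Migrating)
            | (Migrating, Running)
            // Any in-flight operation may fail into the recoverable Error state.
            | (Creating | Starting | Running | Stopping | Migrating, Error)
            // Error either recovers to Stopped or escalates to terminal Failed.
            | (Error, Stopped)
            | (Error, Failed)
            // Deletion is only accepted once the VM is not running.
            | (Stopped | Error | Failed, Deleted)
    )
}
```

Rejected transitions map naturally onto the `INVALID_STATE` error code in Appendix A.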

3.3 Runtime Status

pub struct VmStatus {
    pub actual_state: VmState,
    pub host_pid: Option<u32>,       // Hypervisor process PID
    pub started_at: Option<u64>,     // Last boot timestamp
    pub ip_addresses: Vec<IpAddress>,
    pub resource_usage: ResourceUsage,
    pub last_error: Option<String>,
    pub conditions: Vec<Condition>,
}

pub struct ResourceUsage {
    pub cpu_percent: f64,
    pub memory_used_mib: u64,
    pub disk_read_bytes: u64,
    pub disk_write_bytes: u64,
    pub network_rx_bytes: u64,
    pub network_tx_bytes: u64,
}

3.4 Image

Bootable disk images for VM creation.

pub struct Image {
    pub id: ImageId,
    pub name: String,
    pub org_id: String,              // Owner org (or "system" for public)
    pub visibility: Visibility,      // Public, Private, Shared
    pub source: ImageSource,
    pub format: ImageFormat,
    pub size_bytes: u64,
    pub checksum: String,            // SHA256
    pub os_type: OsType,
    pub os_version: String,
    pub architecture: Architecture,
    pub min_disk_gib: u32,
    pub min_memory_mib: u32,
    pub status: ImageStatus,
    pub created_at: u64,
    pub updated_at: u64,
    pub metadata: HashMap<String, String>,
}

pub enum ImageSource {
    Url { url: String },
    Upload { storage_path: String },
    Snapshot { vm_id: VmId, disk_id: String },
}

pub enum ImageFormat {
    Raw,
    Qcow2,
    Vmdk,
    Vhd,
}

pub enum Visibility {
    Public,          // Available to all orgs
    Private,         // Only owner org
    Shared { orgs: Vec<String> },
}
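The sketch below implements the access rule implied by the Visibility comments. It assumes the owning org always keeps access and that `Shared` grants access only to the listed orgs, in addition to the owner. `can_use_image` is a hypothetical helper, and `Image` is trimmed to the two fields the check reads:

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Visibility {
    Public,                        // Available to all orgs
    Private,                       // Only owner org
    Shared { orgs: Vec<String> },  // Owner plus listed orgs (assumed)
}

/// Only the fields the access check needs.
pub struct Image {
    pub org_id: String,
    pub visibility: Visibility,
}

/// Returns true if `org_id` may use `image` to create VMs.
pub fn can_use_image(image: &Image, org_id: &str) -> bool {
    // The owner org always has access, whatever the visibility.
    if image.org_id == org_id {
        return true;
    }
    match &image.visibility {
        Visibility::Public => true,
        Visibility::Private => false,
        Visibility::Shared { orgs } => orgs.iter().any(|o| o == org_id),
    }
}
```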

3.5 Node

Physical or virtual host running the agent.

pub struct Node {
    pub id: NodeId,
    pub name: String,
    pub state: NodeState,
    pub capacity: NodeCapacity,
    pub allocatable: NodeCapacity,
    pub allocated: NodeCapacity,
    pub hypervisors: Vec<HypervisorType>,  // Supported backends
    pub labels: HashMap<String, String>,
    pub taints: Vec<Taint>,
    pub conditions: Vec<NodeCondition>,
    pub agent_version: String,
    pub last_heartbeat: u64,
}

pub struct NodeCapacity {
    pub vcpus: u32,
    pub memory_mib: u64,
    pub storage_gib: u64,
}

pub enum NodeState {
    Ready,
    NotReady,
    Cordoned,      // No new VMs scheduled
    Draining,      // Migrating VMs off
    Maintenance,
}

4. Hypervisor Abstraction

4.1 Backend Trait

#[async_trait]
pub trait HypervisorBackend: Send + Sync {
    /// Backend identifier
    fn backend_type(&self) -> HypervisorType;

    /// Check if this backend supports the given VM spec
    fn supports(&self, spec: &VmSpec) -> Result<(), UnsupportedFeature>;

    /// Create VM resources (disk, network) without starting
    async fn create(&self, vm: &VirtualMachine) -> Result<VmHandle>;

    /// Start the VM
    async fn start(&self, handle: &VmHandle) -> Result<()>;

    /// Stop the VM (graceful shutdown)
    async fn stop(&self, handle: &VmHandle, timeout: Duration) -> Result<()>;

    /// Force stop the VM
    async fn kill(&self, handle: &VmHandle) -> Result<()>;

    /// Reboot the VM
    async fn reboot(&self, handle: &VmHandle) -> Result<()>;

    /// Delete VM and cleanup resources
    async fn delete(&self, handle: &VmHandle) -> Result<()>;

    /// Get current VM status
    async fn status(&self, handle: &VmHandle) -> Result<VmStatus>;

    /// Attach a disk to running VM
    async fn attach_disk(&self, handle: &VmHandle, disk: &DiskSpec) -> Result<()>;

    /// Detach a disk from running VM
    async fn detach_disk(&self, handle: &VmHandle, disk_id: &str) -> Result<()>;

    /// Attach a network interface
    async fn attach_nic(&self, handle: &VmHandle, nic: &NetworkSpec) -> Result<()>;

    /// Get console stream (VNC/serial)
    async fn console(&self, handle: &VmHandle, console_type: ConsoleType)
        -> Result<Box<dyn AsyncReadWrite>>;

    /// Take a snapshot
    async fn snapshot(&self, handle: &VmHandle, snapshot_id: &str) -> Result<()>;
}

4.2 Hypervisor Types

pub enum HypervisorType {
    Kvm,          // QEMU/KVM - full-featured
    Firecracker,  // AWS Firecracker - microVMs
    Mvisor,       // mvisor - lightweight
}

4.3 Backend Registry

pub struct HypervisorRegistry {
    backends: HashMap<HypervisorType, Arc<dyn HypervisorBackend>>,
}

impl HypervisorRegistry {
    pub fn register(&mut self, backend: Arc<dyn HypervisorBackend>);
    pub fn get(&self, typ: HypervisorType) -> Option<Arc<dyn HypervisorBackend>>;
    pub fn available(&self) -> Vec<HypervisorType>;
}
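Below is a minimal sketch of the registry methods. It uses a trimmed synchronous stand-in for HypervisorBackend so it compiles without an async runtime. Keying each backend by its own `backend_type()` and returning `available()` in sorted order are design assumptions:

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum HypervisorType {
    Kvm,
    Firecracker,
    Mvisor,
}

/// Trimmed stand-in for HypervisorBackend (async lifecycle methods omitted).
pub trait HypervisorBackend: Send + Sync {
    fn backend_type(&self) -> HypervisorType;
}

#[derive(Default)]
pub struct HypervisorRegistry {
    backends: HashMap<HypervisorType, Arc<dyn HypervisorBackend>>,
}

impl HypervisorRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Stores the backend under its own type; re-registering a type replaces it.
    pub fn register(&mut self, backend: Arc<dyn HypervisorBackend>) {
        self.backends.insert(backend.backend_type(), backend);
    }

    /// Cloning the Arc lets callers hold a backend without borrowing the registry.
    pub fn get(&self, typ: HypervisorType) -> Option<Arc<dyn HypervisorBackend>> {
        self.backends.get(&typ).cloned()
    }

    /// Sorted so the result does not depend on HashMap iteration order.
    pub fn available(&self) -> Vec<HypervisorType> {
        let mut types: Vec<HypervisorType> = self.backends.keys().copied().collect();
        types.sort();
        types
    }
}
```

Keying by `backend_type()` means a backend cannot be registered under the wrong type.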

4.4 Backend Capabilities

pub struct BackendCapabilities {
    pub live_migration: bool,
    pub hot_plug_cpu: bool,
    pub hot_plug_memory: bool,
    pub hot_plug_disk: bool,
    pub hot_plug_nic: bool,
    pub vnc_console: bool,
    pub serial_console: bool,
    pub nested_virtualization: bool,
    pub gpu_passthrough: bool,
    pub max_vcpus: u32,
    pub max_memory_gib: u64,
    pub supported_disk_buses: Vec<DiskBus>,
    pub supported_nic_models: Vec<NicModel>,
}
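The sketch below shows one way the control plane might check a request against BackendCapabilities before choosing a backend. It uses a trimmed capability struct and hypothetical `Requirements` and `unmet_requirements` names. Note the unit boundary: MemorySpec is in MiB while `max_memory_gib` is in GiB, so the sketch converts before comparing:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DiskBus {
    Virtio,
    Scsi,
    Ide,
}

/// Trimmed BackendCapabilities: only the fields this check reads.
pub struct BackendCapabilities {
    pub vnc_console: bool,
    pub max_vcpus: u32,
    pub max_memory_gib: u64,
    pub supported_disk_buses: Vec<DiskBus>,
}

/// Hypothetical summary of what a VmSpec asks for.
pub struct Requirements {
    pub vcpus: u32,
    pub memory_mib: u64,
    pub disk_buses: Vec<DiskBus>,
    pub needs_vnc: bool,
}

/// Returns every unmet requirement (empty means the backend can run the VM).
pub fn unmet_requirements(caps: &BackendCapabilities, req: &Requirements) -> Vec<String> {
    let mut unmet = Vec::new();
    if req.vcpus > caps.max_vcpus {
        unmet.push(format!("vcpus {} > max {}", req.vcpus, caps.max_vcpus));
    }
    // Compare in MiB: the spec uses MiB, the capability uses GiB.
    let max_mib = caps.max_memory_gib.saturating_mul(1024);
    if req.memory_mib > max_mib {
        unmet.push(format!("memory {} MiB > max {} MiB", req.memory_mib, max_mib));
    }
    for bus in &req.disk_buses {
        if !caps.supported_disk_buses.contains(bus) {
            unmet.push(format!("disk bus {:?} unsupported", bus));
        }
    }
    if req.needs_vnc && !caps.vnc_console {
        unmet.push("VNC console unsupported".to_string());
    }
    unmet
}
```

Returning every unmet requirement, rather than stopping at the first, lets the API report a complete error in one response.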

4.5 KVM Backend Implementation

// plasmavmc-kvm crate
use std::path::PathBuf;
use async_trait::async_trait;

pub struct KvmBackend {
    qemu_path: PathBuf,
    runtime_dir: PathBuf,
    network_helper: NetworkHelper,
}

// The trait is declared with #[async_trait], so every impl needs it too.
#[async_trait]
impl HypervisorBackend for KvmBackend {
    fn backend_type(&self) -> HypervisorType {
        HypervisorType::Kvm
    }

    async fn create(&self, vm: &VirtualMachine) -> Result<VmHandle> {
        // Generate QEMU command line
        // Create runtime directory
        // Prepare disks and network devices
        todo!()
    }

    async fn start(&self, handle: &VmHandle) -> Result<()> {
        // Launch QEMU process
        // Wait for QMP socket
        // Configure via QMP
        todo!()
    }
    // ... other methods
}
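The first step of `create` is generating the QEMU command line. Below is a sketch of that step for the subset of VmSpec that maps directly onto QEMU options. `qemu_args` and `QemuDisk` are hypothetical names. The real backend would also add network devices and launch the process inside its runtime directory. The flags used (`-machine`, `-cpu host`, `-smp`, `-m`, `-drive`, `-qmp`) are standard QEMU options:

```rust
/// Disk inputs for the sketch; the real backend would derive these from DiskSpec.
pub struct QemuDisk {
    pub path: String,
    pub format: &'static str, // e.g. "qcow2" or "raw"
    pub cache: &'static str,  // "none", "writeback" or "writethrough"
}

/// Builds QEMU arguments (excluding the binary path) for a KVM-accelerated guest.
pub fn qemu_args(
    name: &str,
    vcpus: u32,
    sockets: u32,
    cores: u32,
    memory_mib: u64,
    disks: &[QemuDisk],
    qmp_socket: &str,
) -> Vec<String> {
    let mut args: Vec<String> = vec![
        "-name".into(), name.into(),
        "-machine".into(), "q35,accel=kvm".into(),
        // QEMU's "-cpu host" is what libvirt calls "host-passthrough".
        "-cpu".into(), "host".into(),
        "-smp".into(), format!("{vcpus},sockets={sockets},cores={cores}"),
        // A bare -m value is interpreted as MiB.
        "-m".into(), memory_mib.to_string(),
        "-nographic".into(),
        // QMP control socket, using explicit on/off booleans rather than the
        // deprecated "server,nowait" short form.
        "-qmp".into(), format!("unix:{qmp_socket},server=on,wait=off"),
    ];
    for disk in disks {
        args.push("-drive".into());
        args.push(format!(
            "file={},format={},if=virtio,cache={}",
            disk.path, disk.format, disk.cache
        ));
    }
    args
}
```

The QMP socket path is what `start` waits for before configuring the guest.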

4.6 Firecracker Backend Implementation

// plasmavmc-firecracker crate
use std::path::PathBuf;
use async_trait::async_trait;

pub struct FirecrackerBackend {
    fc_path: PathBuf,
    jailer_path: PathBuf,
    runtime_dir: PathBuf,
}

// Impls of an #[async_trait] trait must carry the attribute as well.
#[async_trait]
impl HypervisorBackend for FirecrackerBackend {
    fn backend_type(&self) -> HypervisorType {
        HypervisorType::Firecracker
    }

    fn supports(&self, spec: &VmSpec) -> Result<(), UnsupportedFeature> {
        // Firecracker limitations:
        // - No VNC, only serial
        // - No live migration
        // - Limited device models
        if spec.disks.iter().any(|d| d.bus != DiskBus::Virtio) {
            return Err(UnsupportedFeature::DiskBus);
        }
        Ok(())
    }
    // ... other methods
}
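Firecracker is configured through a REST API on a Unix socket rather than through a command line. Below is a sketch of the minimal boot sequence as (method, path, body) requests. It uses the `/machine-config`, `/boot-source`, `/drives/{drive_id}` and `/actions` endpoints of the Firecracker API. The HTTP-over-Unix-socket transport and the jailer setup are omitted. `FcRequest`, `boot_sequence` and the `rootfs` drive id are hypothetical:

```rust
/// One request against the Firecracker API socket.
#[derive(Debug, PartialEq)]
pub struct FcRequest {
    pub method: &'static str,
    pub path: String,
    pub body: String,
}

/// Minimal JSON string quoting: escapes backslashes and double quotes only.
fn json_str(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

/// Requests needed to boot a microVM, in the order they must be sent.
pub fn boot_sequence(vcpus: u32, mem_mib: u64, kernel: &str, boot_args: &str, rootfs: &str) -> Vec<FcRequest> {
    vec![
        FcRequest {
            method: "PUT",
            path: "/machine-config".into(),
            body: format!("{{\"vcpu_count\":{vcpus},\"mem_size_mib\":{mem_mib}}}"),
        },
        FcRequest {
            method: "PUT",
            path: "/boot-source".into(),
            body: format!(
                "{{\"kernel_image_path\":{},\"boot_args\":{}}}",
                json_str(kernel),
                json_str(boot_args)
            ),
        },
        FcRequest {
            method: "PUT",
            path: "/drives/rootfs".into(),
            body: format!(
                "{{\"drive_id\":\"rootfs\",\"path_on_host\":{},\"is_root_device\":true,\"is_read_only\":false}}",
                json_str(rootfs)
            ),
        },
        // Must come last: the guest boots with whatever was configured before it.
        FcRequest {
            method: "PUT",
            path: "/actions".into(),
            body: "{\"action_type\":\"InstanceStart\"}".into(),
        },
    ]
}
```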

5. API

5.1 gRPC Services

VM Service (`plasmavmc.v1.VmService`)

service VmService {
  // Lifecycle
  rpc CreateVm(CreateVmRequest) returns (VirtualMachine);
  rpc GetVm(GetVmRequest) returns (VirtualMachine);
  rpc ListVms(ListVmsRequest) returns (ListVmsResponse);
  rpc UpdateVm(UpdateVmRequest) returns (VirtualMachine);
  rpc DeleteVm(DeleteVmRequest) returns (Empty);

  // Power operations
  rpc StartVm(StartVmRequest) returns (VirtualMachine);
  rpc StopVm(StopVmRequest) returns (VirtualMachine);
  rpc RebootVm(RebootVmRequest) returns (VirtualMachine);
  rpc ResetVm(ResetVmRequest) returns (VirtualMachine);

  // Disks
  rpc AttachDisk(AttachDiskRequest) returns (VirtualMachine);
  rpc DetachDisk(DetachDiskRequest) returns (VirtualMachine);

  // Network
  rpc AttachNic(AttachNicRequest) returns (VirtualMachine);
  rpc DetachNic(DetachNicRequest) returns (VirtualMachine);

  // Console
  rpc GetConsole(GetConsoleRequest) returns (stream ConsoleData);

  // Events
  rpc WatchVm(WatchVmRequest) returns (stream VmEvent);
}

Image Service (`plasmavmc.v1.ImageService`)

service ImageService {
  rpc CreateImage(CreateImageRequest) returns (Image);
  rpc GetImage(GetImageRequest) returns (Image);
  rpc ListImages(ListImagesRequest) returns (ListImagesResponse);
  rpc UpdateImage(UpdateImageRequest) returns (Image);
  rpc DeleteImage(DeleteImageRequest) returns (Empty);

  // Upload/Download
  rpc UploadImage(stream UploadImageRequest) returns (Image);
  rpc DownloadImage(DownloadImageRequest) returns (stream DownloadImageResponse);

  // Conversion
  rpc ConvertImage(ConvertImageRequest) returns (Image);
}

Node Service (`plasmavmc.v1.NodeService`)

service NodeService {
  rpc ListNodes(ListNodesRequest) returns (ListNodesResponse);
  rpc GetNode(GetNodeRequest) returns (Node);
  rpc CordonNode(CordonNodeRequest) returns (Node);
  rpc UncordonNode(UncordonNodeRequest) returns (Node);
  rpc DrainNode(DrainNodeRequest) returns (Node);
}

5.2 Agent Internal API (`plasmavmc.agent.v1`)

service AgentService {
  // VM operations (called by control plane)
  rpc CreateVm(CreateVmRequest) returns (VmHandle);
  rpc StartVm(StartVmRequest) returns (Empty);
  rpc StopVm(StopVmRequest) returns (Empty);
  rpc DeleteVm(DeleteVmRequest) returns (Empty);
  rpc GetVmStatus(GetVmStatusRequest) returns (VmStatus);

  // Node status (reported to control plane)
  rpc Heartbeat(HeartbeatRequest) returns (HeartbeatResponse);
  rpc ReportStatus(ReportStatusRequest) returns (Empty);
}

5.3 Client Library

use futures::StreamExt; // provides .next() on the event stream
use plasmavmc_client::PlasmaClient;
// CreateVmRequest, VmSpec and related types come from the PlasmaVMC crates
// (exact import paths are not specified here).

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = PlasmaClient::connect("http://127.0.0.1:8080").await?;

    // Create VM
    let vm = client.create_vm(CreateVmRequest {
        name: "my-vm".into(),
        org_id: "org-1".into(),
        project_id: "proj-1".into(),
        spec: VmSpec {
            cpu: CpuSpec { vcpus: 2, ..Default::default() },
            memory: MemorySpec { size_mib: 2048, ..Default::default() },
            disks: vec![DiskSpec {
                source: DiskSource::Image { id: "ubuntu-22.04".into() },
                size_gib: 20,
                ..Default::default()
            }],
            network: vec![NetworkSpec {
                network_id: "default".into(),
                ..Default::default()
            }],
            ..Default::default()
        },
        hypervisor: HypervisorType::Kvm,
        ..Default::default()
    }).await?;

    // Start VM
    client.start_vm(vm.id).await?;

    // Watch events
    let mut stream = client.watch_vm(vm.id).await?;
    while let Some(event) = stream.next().await {
        println!("Event: {:?}", event);
    }
    Ok(())
}

6. Scheduling

6.1 Scheduler

pub struct Scheduler {
    node_cache: Arc<NodeCache>,
    filters: Vec<Box<dyn ScheduleFilter>>,
    scorers: Vec<Box<dyn ScheduleScorer>>,
}

impl Scheduler {
    pub async fn schedule(&self, vm: &VirtualMachine) -> Result<NodeId> {
        let candidates = self.node_cache.ready_nodes();

        // Filter phase
        let filtered: Vec<_> = candidates
            .into_iter()
            .filter(|n| self.filters.iter().all(|f| f.filter(vm, n)))
            .collect();

        if filtered.is_empty() {
            return Err(Error::NoSuitableNode);
        }

        // Score phase
        let scored: Vec<_> = filtered
            .into_iter()
            .map(|n| {
                let score: i64 = self.scorers.iter().map(|s| s.score(vm, &n)).sum();
                (n, score)
            })
            .collect();

        // Select highest score (unwrap is safe: `filtered` is non-empty here)
        let (node, _) = scored.into_iter().max_by_key(|(_, s)| *s).unwrap();
        Ok(node.id)
    }
}

6.2 Filters

pub trait ScheduleFilter: Send + Sync {
    fn name(&self) -> &'static str;
    fn filter(&self, vm: &VirtualMachine, node: &Node) -> bool;
}

// Built-in filters
struct ResourceFilter;      // CPU/memory fits
struct HypervisorFilter;    // Node supports hypervisor type
struct TaintFilter;         // Toleration matching
struct AffinityFilter;      // Node affinity rules
struct AntiAffinityFilter;  // VM anti-affinity rules

6.3 Scorers

pub trait ScheduleScorer: Send + Sync {
    fn name(&self) -> &'static str;
    fn score(&self, vm: &VirtualMachine, node: &Node) -> i64;
}

// Built-in scorers
struct LeastAllocatedScorer;   // Prefer less loaded nodes
struct BalancedResourceScorer; // Balance CPU/memory ratio
struct LocalityScorer;         // Prefer same zone/rack
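Below is a minimal sketch of two common built-ins, ResourceFilter and LeastAllocatedScorer, over a simplified node view. Two choices are assumptions: free capacity is treated as `allocatable - allocated`, and CPU and memory are weighted equally on a 0-100 scale. The names `NodeView`, `resource_fits` and `least_allocated_score` are hypothetical:

```rust
/// CPU and memory slice of NodeCapacity.
#[derive(Debug, Clone, Copy)]
pub struct Capacity {
    pub vcpus: u32,
    pub memory_mib: u64,
}

/// Simplified node view used by the sketch.
pub struct NodeView {
    pub id: &'static str,
    pub allocatable: Capacity,
    pub allocated: Capacity,
}

/// ResourceFilter: the request must fit into allocatable minus allocated.
pub fn resource_fits(node: &NodeView, req: Capacity) -> bool {
    let free_cpu = node.allocatable.vcpus.saturating_sub(node.allocated.vcpus);
    let free_mem = node.allocatable.memory_mib.saturating_sub(node.allocated.memory_mib);
    req.vcpus <= free_cpu && req.memory_mib <= free_mem
}

/// LeastAllocatedScorer: average percentage of capacity left free after
/// placing the VM, so emptier nodes score higher.
pub fn least_allocated_score(node: &NodeView, req: Capacity) -> i64 {
    fn headroom(total: u64, used: u64, want: u64) -> i64 {
        if total == 0 {
            return 0;
        }
        let after = used.saturating_add(want).min(total);
        ((total - after) * 100 / total) as i64
    }
    let cpu = headroom(
        u64::from(node.allocatable.vcpus),
        u64::from(node.allocated.vcpus),
        u64::from(req.vcpus),
    );
    let mem = headroom(node.allocatable.memory_mib, node.allocated.memory_mib, req.memory_mib);
    (cpu + mem) / 2
}
```

Because the filter runs first, the scorer only sees nodes the VM fits on. Clamping `used + want` to `total` only guards against stale capacity data.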

7. Multi-Tenancy

7.1 Resource Hierarchy

System (platform operators)
  └─ Organization (tenant boundary)
      └─ Project (workload isolation)
          └─ Resources (VMs, images, networks)

7.2 Scoped Resources

// All resources include scope identifiers
pub trait Scoped {
    fn org_id(&self) -> &str;
    fn project_id(&self) -> &str;
}

// Resource paths follow aegis pattern
// org/{org_id}/project/{project_id}/vm/{vm_id}
// org/{org_id}/project/{project_id}/image/{image_id}

7.3 Quotas

pub struct Quota {
    pub scope: Scope,                 // Org or Project
    pub limits: ResourceLimits,
    pub usage: ResourceUsage,
}

pub struct ResourceLimits {
    pub max_vms: Option<u32>,
    pub max_vcpus: Option<u32>,
    pub max_memory_gib: Option<u64>,
    pub max_storage_gib: Option<u64>,
    pub max_images: Option<u32>,
}
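Quota.usage is typed as the runtime ResourceUsage from 3.3, which tracks CPU percent and I/O bytes rather than counts. The sketch below therefore assumes a separate usage struct with the same dimensions as ResourceLimits (`QuotaUsage`, hypothetical). It checks the totals an admission would produce, and `None` means unlimited. Memory is rounded up from the MiB used in MemorySpec to the GiB used by the limit. Storage and image limits are omitted for brevity:

```rust
/// Trimmed ResourceLimits; None means unlimited.
#[derive(Debug, Default, Clone, Copy)]
pub struct ResourceLimits {
    pub max_vms: Option<u32>,
    pub max_vcpus: Option<u32>,
    pub max_memory_gib: Option<u64>,
}

/// Hypothetical current consumption, in the same dimensions as the limits.
#[derive(Debug, Default, Clone, Copy)]
pub struct QuotaUsage {
    pub vms: u32,
    pub vcpus: u32,
    pub memory_gib: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum QuotaError {
    Exceeded { resource: &'static str, limit: u64, requested_total: u64 },
}

/// Checks whether admitting one more VM keeps usage within every limit.
pub fn check_quota(limits: &ResourceLimits, usage: &QuotaUsage, vcpus: u32, memory_mib: u64) -> Result<(), QuotaError> {
    // Round up so a 1536 MiB VM counts as 2 GiB against the quota.
    let memory_gib = memory_mib.saturating_add(1023) / 1024;
    let checks: [(&'static str, Option<u64>, u64); 3] = [
        ("vms", limits.max_vms.map(u64::from), u64::from(usage.vms) + 1),
        ("vcpus", limits.max_vcpus.map(u64::from), u64::from(usage.vcpus) + u64::from(vcpus)),
        ("memory_gib", limits.max_memory_gib, usage.memory_gib.saturating_add(memory_gib)),
    ];
    for (resource, limit, requested_total) in checks {
        if let Some(limit) = limit {
            if requested_total > limit {
                return Err(QuotaError::Exceeded { resource, limit, requested_total });
            }
        }
    }
    Ok(())
}
```

A failed check corresponds to the `QUOTA_EXCEEDED` error code in Appendix A.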

7.4 Namespace Isolation

Compute: VMs scoped to project, nodes shared across orgs
Network: Overlay network provides tenant isolation
Storage: Images can be private, shared, or public
Naming: Names unique within project scope

8. Storage

8.1 State Storage (Chainfire)

# VM records
plasmavmc/vms/{org_id}/{project_id}/{vm_id}

# Image records
plasmavmc/images/{org_id}/{image_id}
plasmavmc/images/public/{image_id}

# Node records
plasmavmc/nodes/{node_id}

# Scheduling state
plasmavmc/scheduler/assignments/{vm_id}
plasmavmc/scheduler/pending/{timestamp}/{vm_id}
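Below is a sketch of key builders for the layout above; the function names are hypothetical. Two details matter for prefix scans. First, project prefixes end with `/`, so watching `proj-1` does not also match `proj-10`. Second, the pending timestamp is zero-padded to 20 digits (the width of `u64::MAX`), so lexicographic key order equals time order. The timestamp unit is not specified here and is left to the caller:

```rust
/// Key for a single VM record.
pub fn vm_key(org_id: &str, project_id: &str, vm_id: &str) -> String {
    format!("plasmavmc/vms/{org_id}/{project_id}/{vm_id}")
}

/// Prefix for listing or watching every VM in one project.
/// The trailing slash keeps "proj-1" from matching "proj-10".
pub fn project_vms_prefix(org_id: &str, project_id: &str) -> String {
    format!("plasmavmc/vms/{org_id}/{project_id}/")
}

/// Pending-queue key; zero-padding makes byte order match numeric order.
pub fn pending_key(timestamp: u64, vm_id: &str) -> String {
    format!("plasmavmc/scheduler/pending/{timestamp:020}/{vm_id}")
}
```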

8.2 Image Storage

Backend: LightningSTOR (object storage)
Format: Raw, qcow2, vmdk with automatic conversion
Caching: Node-local image cache with pull-through
Path: images/{org_id}/{image_id}/{version}

8.3 Disk Storage

Ephemeral: Local SSD/NVMe on node
Persistent: LightningSTOR volumes (via CSI)
Snapshot: Copy-on-write via backend

9. Configuration

9.1 Control Plane Config (TOML)

[server]
addr = "0.0.0.0:8080"

[server.tls]
cert_file = "/etc/plasmavmc/tls/server.crt"
key_file = "/etc/plasmavmc/tls/server.key"
ca_file = "/etc/plasmavmc/tls/ca.crt"

[store]
backend = "chainfire"
chainfire_endpoints = ["http://chainfire-1:2379", "http://chainfire-2:2379"]

[iam]
endpoint = "http://aegis:9090"
service_account = "plasmavmc-controller"
token_path = "/var/run/secrets/iam/token"

[scheduler]
default_hypervisor = "kvm"

[image_store]
backend = "lightningstor"
endpoint = "http://lightningstor:9000"
bucket = "vm-images"

[logging]
level = "info"
format = "json"

9.2 Agent Config (TOML)

[agent]
node_id = "node-001"
control_plane = "http://plasmavmc-api:8080"

[agent.tls]
cert_file = "/etc/plasmavmc/tls/agent.crt"
key_file = "/etc/plasmavmc/tls/agent.key"
ca_file = "/etc/plasmavmc/tls/ca.crt"

[hypervisors]
enabled = ["kvm", "firecracker"]

[hypervisors.kvm]
qemu_path = "/usr/bin/qemu-system-x86_64"
runtime_dir = "/var/run/plasmavmc/kvm"

[hypervisors.firecracker]
fc_path = "/usr/bin/firecracker"
jailer_path = "/usr/bin/jailer"
runtime_dir = "/var/run/plasmavmc/fc"

[storage]
image_cache_dir = "/var/lib/plasmavmc/images"
runtime_dir = "/var/lib/plasmavmc/vms"
cache_size_gib = 100

[network]
overlay_endpoint = "http://ovn-controller:6641"
bridge_name = "plasmavmc0"

[logging]
level = "info"
format = "json"

9.3 Environment Variables

Variable	Default	Description
`PLASMAVMC_CONFIG`	-	Config file path
`PLASMAVMC_ADDR`	`0.0.0.0:8080`	API listen address
`PLASMAVMC_LOG_LEVEL`	`info`	Log level
`PLASMAVMC_NODE_ID`	-	Agent node identifier
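The spec does not state how these variables interact with the config file and CLI flags. The sketch below assumes the common precedence CLI flag > environment variable > config file > built-in default, and treats empty values as unset. `resolve_setting` is a hypothetical helper:

```rust
use std::env;

/// Returns the first value that is set and non-empty, following the assumed
/// precedence: CLI flag, environment variable, config file, default.
pub fn resolve_setting(cli: Option<String>, env_var: &str, config: Option<String>, default: &str) -> String {
    // Treat blank strings (e.g. `PLASMAVMC_ADDR=""`) as if they were unset.
    let non_empty = |v: Option<String>| v.filter(|s| !s.trim().is_empty());
    non_empty(cli)
        .or_else(|| non_empty(env::var(env_var).ok()))
        .or_else(|| non_empty(config))
        .unwrap_or_else(|| default.to_string())
}
```

For the listen address this would be called as `resolve_setting(cli_addr, "PLASMAVMC_ADDR", config_addr, "0.0.0.0:8080")`.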

9.4 CLI Arguments

plasmavmc-server [OPTIONS]
  -c, --config <PATH>     Config file path
  -a, --addr <ADDR>       Listen address
  -l, --log-level <LEVEL> Log level
  -h, --help              Print help
  -V, --version           Print version

plasmavmc-agent [OPTIONS]
  -c, --config <PATH>     Config file path
  -n, --node-id <ID>      Node identifier
  --control-plane <URL>   Control plane endpoint
  -h, --help              Print help

10. Integration

10.1 Aegis (IAM)

// Authorization check before VM operations
async fn authorize_vm_action(
    iam: &IamClient,
    principal: &PrincipalRef,
    action: &str,
    vm: &VirtualMachine,
) -> Result<()> {
    let resource = ResourceRef {
        kind: "vm".into(),
        id: vm.id.to_string(),
        org_id: vm.org_id.clone(),
        project_id: vm.project_id.clone(),
        ..Default::default()
    };

    let allowed = iam.authorize(principal, action, &resource).await?;
    if !allowed {
        return Err(Error::PermissionDenied);
    }
    Ok(())
}

// Action patterns
// plasmavmc:vms:create
// plasmavmc:vms:get
// plasmavmc:vms:list
// plasmavmc:vms:update
// plasmavmc:vms:delete
// plasmavmc:vms:start
// plasmavmc:vms:stop
// plasmavmc:vms:console
// plasmavmc:images:create
// plasmavmc:images:get
// plasmavmc:images:delete

10.2 Overlay Network

// Network integration for VM NICs
#[async_trait]
pub trait NetworkProvider: Send + Sync {
    /// Allocate port for VM NIC
    async fn create_port(&self, req: CreatePortRequest) -> Result<Port>;

    /// Release port
    async fn delete_port(&self, port_id: &str) -> Result<()>;

    /// Get port details (MAC, IP, security groups)
    async fn get_port(&self, port_id: &str) -> Result<Port>;

    /// Update security groups
    async fn update_security_groups(
        &self,
        port_id: &str,
        groups: Vec<String>,
    ) -> Result<()>;
}

pub struct Port {
    pub id: String,
    pub network_id: String,
    pub mac_address: String,
    pub ip_addresses: Vec<IpAddress>,
    pub security_groups: Vec<String>,
    pub tap_device: String,  // Host tap device name
}

10.3 Chainfire (State)

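// Watch for VM changes (controller pattern)
use futures::StreamExt; // .next() on the watch stream (assumed to implement Stream)

async fn reconcile_loop(chainfire: &ChainfireClient) -> Result<()> {
    let mut watch = chainfire
        .watch_prefix("plasmavmc/vms/")
        .await?;

    while let Some(event) = watch.next().await {
        // Variants must be path-qualified: a bare `Put` in a pattern would
        // bind a new variable and match every event.
        match event.event_type {
            EventType::Put => reconcile_vm(event.kv).await?,
            EventType::Delete => cleanup_vm(event.kv).await?,
        }
    }
    Ok(())
}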
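Controllers of this kind are usually level-triggered. Each pass compares desired state with what the agent observes, instead of acting on the event type alone, so missed or duplicated watch events are harmless. Below is a sketch of that decision with a simplified power state. `Power`, `Action` and `next_action` are hypothetical names, and a real delete of a running VM would stop it first:

```rust
/// Simplified view of a VM's power state, desired or observed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Power {
    Running,
    Stopped,
    Absent,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Create,
    Start,
    Stop,
    Delete,
    Noop,
}

/// Chooses one step toward the desired state; callers re-run it until Noop.
pub fn next_action(desired: Power, observed: Power) -> Action {
    match (desired, observed) {
        (Power::Running, Power::Running)
        | (Power::Stopped, Power::Stopped)
        | (Power::Absent, Power::Absent) => Action::Noop,
        (Power::Absent, _) => Action::Delete,
        (_, Power::Absent) => Action::Create,
        (Power::Running, Power::Stopped) => Action::Start,
        (Power::Stopped, Power::Running) => Action::Stop,
    }
}
```

A VM that should be running but does not exist converges in two passes: `Create` leaves it stopped, and the next pass issues `Start`.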

11. Security

11.1 Authentication

Control Plane: mTLS + aegis tokens
Agent: mTLS with node certificate
Console: WebSocket with aegis token

11.2 Authorization

Integrated with aegis (IAM)
Action-based permissions
Scope enforcement (org/project)

11.3 VM Isolation

Process: Hypervisor process per VM
Sandboxing: seccomp, namespaces, chroot (Firecracker jailer)
Network: Overlay network tenant isolation
Resources: cgroups for CPU/memory limits

11.4 Image Security

Checksum verification on download
Signature verification (optional)
Content scanning integration point

12. Operations

12.1 Deployment

Single Node (Development)

# Start control plane
plasmavmc-server --config config.toml

# Start agent on same node
plasmavmc-agent --config agent.toml --node-id dev-node

Production Cluster

# Control plane (3 instances for HA)
plasmavmc-server --config config.toml

# Agents (each compute node)
plasmavmc-agent --config agent.toml --node-id node-$(hostname)

12.2 Monitoring

Metrics (Prometheus)

Metric	Type	Description
`plasmavmc_vms_total`	Gauge	Total VMs by state
`plasmavmc_vm_operations_total`	Counter	Operations by type
`plasmavmc_vm_boot_seconds`	Histogram	VM boot time
`plasmavmc_node_capacity_vcpus`	Gauge	Node vCPU capacity
`plasmavmc_node_allocated_vcpus`	Gauge	Allocated vCPUs
`plasmavmc_scheduler_latency_seconds`	Histogram	Scheduling latency
`plasmavmc_agent_heartbeat_age_seconds`	Gauge	Time since heartbeat

Health Endpoints

GET /health - Liveness
GET /ready - Readiness (chainfire connected, agents online)

12.3 Backup & Recovery

State: Chainfire handles via Raft snapshots
Images: LightningSTOR replication
VM Disks: Volume snapshots via storage backend

13. Compatibility

13.1 API Versioning

gRPC package: plasmavmc.v1
Semantic versioning
Backward compatible within major version

13.2 Hypervisor Versions

Backend	Minimum Version	Notes
QEMU/KVM	6.0	QMP protocol
Firecracker	1.0	API v1
mvisor	TBD

Appendix

A. Error Codes

Error	Meaning
VM_NOT_FOUND	VM does not exist
IMAGE_NOT_FOUND	Image does not exist
NODE_NOT_FOUND	Node does not exist
NO_SUITABLE_NODE	Scheduling failed
QUOTA_EXCEEDED	Resource quota exceeded
HYPERVISOR_ERROR	Backend operation failed
INVALID_STATE	Operation invalid for current state

B. Port Assignments

Port	Protocol	Purpose
8080	gRPC	Control plane API
8081	HTTP	Metrics/health
8082	gRPC	Agent internal API

C. Glossary

VM: Virtual Machine - an isolated compute instance
Hypervisor: Software that creates and runs VMs (KVM, Firecracker)
Image: Bootable disk image template
Node: Physical/virtual host running VMs
Agent: Daemon running on each node managing local VMs
Scheduler: Component selecting nodes for VM placement
Overlay Network: Virtual network providing tenant isolation

D. Backend Comparison

Feature	KVM/QEMU	Firecracker	mvisor
Boot time	~5s	<125ms	TBD
Memory overhead	Medium	Low	Low
Device support	Full	Limited	Limited
Live migration	Yes	No	No
VNC console	Yes	No	No
GPU passthrough	Yes	No	No
Nested virt	Yes	No	No
Best for	General	Serverless	TBD
