PlasmaVMC Specification

Version: 1.0 | Status: Draft | Last Updated: 2025-12-08

1. Overview

1.1 Purpose

PlasmaVMC is a virtual machine control platform providing unified management across multiple hypervisor backends. It abstracts hypervisor-specific implementations behind trait-based interfaces, enabling consistent VM lifecycle management regardless of the underlying virtualization technology.

The name "Plasma" reflects its role as the energized medium that powers virtual machines, with "VMC" denoting Virtual Machine Controller.

1.2 Scope

  • In scope: VM lifecycle (create, start, stop, delete), hypervisor abstraction (KVM, Firecracker, mvisor), image management, resource allocation (CPU, memory, storage, network), multi-tenant isolation, console/serial access, live migration (future)
  • Out of scope: Container orchestration (Kubernetes), bare metal provisioning, storage backend implementation (uses LightningSTOR), network fabric (uses overlay network)

1.3 Design Goals

  • Hypervisor agnostic: Trait-based abstraction supporting KVM, Firecracker, mvisor
  • EC2-like UX: Concepts familiar to users of AWS EC2 and GCP Compute Engine
  • Multi-tenant from day one: Full org/project hierarchy with resource isolation
  • High density: Support thousands of VMs per node
  • Fast boot: Sub-second boot times with Firecracker microVMs
  • Observable: Rich metrics, events, and audit logging

2. Architecture

2.1 Crate Structure

plasmavmc/
├── crates/
│   ├── plasmavmc-api/        # gRPC service implementations
│   ├── plasmavmc-client/     # Rust client library
│   ├── plasmavmc-core/       # Core orchestration logic
│   ├── plasmavmc-hypervisor/ # Hypervisor trait + registry
│   ├── plasmavmc-kvm/        # KVM/QEMU backend
│   ├── plasmavmc-firecracker/# Firecracker backend
│   ├── plasmavmc-mvisor/     # mvisor backend
│   ├── plasmavmc-server/     # Control plane server
│   ├── plasmavmc-agent/      # Node agent binary
│   ├── plasmavmc-storage/    # Image/disk management
│   └── plasmavmc-types/      # Shared types
└── proto/
    ├── plasmavmc.proto       # Public API
    └── agent.proto           # Agent internal RPCs

2.2 Component Topology

┌─────────────────────────────────────────────────────────────────┐
│                      Control Plane                               │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │  plasmavmc-api  │  │ plasmavmc-core  │  │plasmavmc-storage│  │
│  │   (gRPC svc)    │──│  (scheduler)    │──│  (image mgmt)   │  │
│  └────────┬────────┘  └────────┬────────┘  └────────┬────────┘  │
│           │                    │                    │           │
│           └────────────────────┼────────────────────┘           │
│                                │                                 │
│                         ┌──────▼──────┐                         │
│                         │  Chainfire  │                         │
│                         │  (state)    │                         │
│                         └─────────────┘                         │
└─────────────────────────────────────────────────────────────────┘
                                 │
                    ┌────────────┼────────────┐
                    ▼            ▼            ▼
┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐
│      Node 1         │ │      Node 2         │ │      Node N         │
│ ┌─────────────────┐ │ │ ┌─────────────────┐ │ │ ┌─────────────────┐ │
│ │ plasmavmc-agent │ │ │ │ plasmavmc-agent │ │ │ │ plasmavmc-agent │ │
│ └────────┬────────┘ │ │ └────────┬────────┘ │ │ └────────┬────────┘ │
│          │          │ │          │          │ │          │          │
│ ┌────────▼────────┐ │ │ ┌────────▼────────┐ │ │ ┌────────▼────────┐ │
│ │HypervisorBackend│ │ │ │HypervisorBackend│ │ │ │HypervisorBackend│ │
│ │ (KVM/FC/mvisor) │ │ │ │ (KVM/FC/mvisor) │ │ │ │ (KVM/FC/mvisor) │ │
│ └─────────────────┘ │ │ └─────────────────┘ │ │ └─────────────────┘ │
└─────────────────────┘ └─────────────────────┘ └─────────────────────┘

2.3 Data Flow

[Client gRPC] → [API Layer] → [Scheduler] → [Agent gRPC] → [Hypervisor]
                     │              │
                     ▼              ▼
               [Chainfire]    [Node Selection]
               (VM state)     (capacity, affinity)

2.4 Dependencies

Crate     Version   Purpose
tokio     1.x       Async runtime
tonic     0.12      gRPC framework
prost     0.13      Protocol buffers
uuid      1.x       VM/resource identifiers
dashmap   6.x       Concurrent state caches
nix       0.29      Linux system calls

3. Core Concepts

3.1 Virtual Machine (VM)

The primary managed resource representing a virtual machine instance.

pub struct VirtualMachine {
    pub id: VmId,                    // UUID
    pub name: String,                // User-defined name
    pub org_id: String,              // Organization owner
    pub project_id: String,          // Project owner
    pub state: VmState,              // Current state
    pub spec: VmSpec,                // Desired configuration
    pub status: VmStatus,            // Runtime status
    pub node_id: Option<NodeId>,     // Assigned node
    pub hypervisor: HypervisorType,  // Backend type
    pub created_at: u64,
    pub updated_at: u64,
    pub created_by: String,          // Principal ID
    pub metadata: HashMap<String, String>,
    pub labels: HashMap<String, String>,
}

pub struct VmSpec {
    pub cpu: CpuSpec,
    pub memory: MemorySpec,
    pub disks: Vec<DiskSpec>,
    pub network: Vec<NetworkSpec>,
    pub boot: BootSpec,
    pub security: SecuritySpec,
}

pub struct CpuSpec {
    pub vcpus: u32,              // Number of vCPUs
    pub cores_per_socket: u32,   // Topology: cores per socket
    pub sockets: u32,            // Topology: socket count
    pub cpu_model: Option<String>, // e.g., "host-passthrough"
}

pub struct MemorySpec {
    pub size_mib: u64,           // Memory size in MiB
    pub hugepages: bool,         // Use huge pages
    pub numa_nodes: Vec<NumaNode>, // NUMA topology
}

pub struct DiskSpec {
    pub id: String,              // Disk identifier
    pub source: DiskSource,      // Image or volume
    pub size_gib: u64,           // Disk size
    pub bus: DiskBus,            // virtio, scsi, ide
    pub cache: DiskCache,        // none, writeback, writethrough
    pub boot_index: Option<u32>, // Boot order
}

pub struct NetworkSpec {
    pub id: String,              // Interface identifier
    pub network_id: String,      // Overlay network ID
    pub mac_address: Option<String>,
    pub ip_address: Option<String>,
    pub model: NicModel,         // virtio-net, e1000
    pub security_groups: Vec<String>,
}

3.2 VM State Machine

                        ┌──────────────────────────────────────┐
                        ▼                                      │
┌─────────┐   create   ┌─────────┐   start   ┌─────────┐      │
│ PENDING │──────────►│ STOPPED │──────────►│ RUNNING │      │
└─────────┘            └────┬────┘           └────┬────┘      │
                            │ delete              │           │
                            ▼                     │ stop      │
                       ┌─────────┐                │           │
                       │ DELETED │◄───────────────┤           │
                       └─────────┘                │           │
                                                  │ reboot    │
                                                  └───────────┘

Additional states:
  CREATING  - Provisioning resources
  STARTING  - Boot in progress
  STOPPING  - Shutdown in progress
  MIGRATING - Live migration in progress
  ERROR     - Failed state (recoverable)
  FAILED    - Terminal failure

pub enum VmState {
    Pending,     // Awaiting scheduling
    Creating,    // Resources being provisioned
    Stopped,     // Created but not running
    Starting,    // Boot in progress
    Running,     // Active and healthy
    Stopping,    // Graceful shutdown
    Migrating,   // Live migration in progress
    Error,       // Recoverable error
    Failed,      // Terminal failure
    Deleted,     // Soft-deleted, pending cleanup
}
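
The transitions in the diagram can be sketched as a lookup from (state, operation) to the next state. This is a minimal sketch, not the implementation: it covers only the four boxed states, the `VmOp` enum and `next_state` function are hypothetical names, and it assumes one reading of the diagram (stop returns to STOPPED, reboot ends in RUNNING, delete is allowed only from STOPPED).

```rust
/// The four boxed states from the diagram; transitional states
/// (Creating, Starting, ...) are omitted to keep the sketch short.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VmState {
    Pending,
    Stopped,
    Running,
    Deleted,
}

/// Operations named on the diagram's edges (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VmOp {
    Create,
    Start,
    Stop,
    Reboot,
    Delete,
}

/// Returns the state an operation leads to, or `None` when the operation
/// is not valid in the current state (the INVALID_STATE case).
pub fn next_state(current: VmState, op: VmOp) -> Option<VmState> {
    use VmOp::*;
    use VmState::*;
    match (current, op) {
        (Pending, Create) => Some(Stopped),
        (Stopped, Start) => Some(Running),
        (Running, Stop) => Some(Stopped),
        // Assumption: a reboot ends with the VM running again.
        (Running, Reboot) => Some(Running),
        // Assumption: only a stopped VM can be deleted.
        (Stopped, Delete) => Some(Deleted),
        _ => None,
    }
}
```

A caller would reject the request with INVALID_STATE whenever `next_state` returns `None`, for example when asked to start a deleted VM.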

3.3 Runtime Status

pub struct VmStatus {
    pub actual_state: VmState,
    pub host_pid: Option<u32>,       // Hypervisor process PID
    pub started_at: Option<u64>,     // Last boot timestamp
    pub ip_addresses: Vec<IpAddress>,
    pub resource_usage: ResourceUsage,
    pub last_error: Option<String>,
    pub conditions: Vec<Condition>,
}

pub struct ResourceUsage {
    pub cpu_percent: f64,
    pub memory_used_mib: u64,
    pub disk_read_bytes: u64,
    pub disk_write_bytes: u64,
    pub network_rx_bytes: u64,
    pub network_tx_bytes: u64,
}

3.4 Image

Bootable disk images for VM creation.

pub struct Image {
    pub id: ImageId,
    pub name: String,
    pub org_id: String,              // Owner org (or "system" for public)
    pub visibility: Visibility,      // Public, Private, Shared
    pub source: ImageSource,
    pub format: ImageFormat,
    pub size_bytes: u64,
    pub checksum: String,            // SHA256
    pub os_type: OsType,
    pub os_version: String,
    pub architecture: Architecture,
    pub min_disk_gib: u32,
    pub min_memory_mib: u32,
    pub status: ImageStatus,
    pub created_at: u64,
    pub updated_at: u64,
    pub metadata: HashMap<String, String>,
}

pub enum ImageSource {
    Url { url: String },
    Upload { storage_path: String },
    Snapshot { vm_id: VmId, disk_id: String },
}

pub enum ImageFormat {
    Raw,
    Qcow2,
    Vmdk,
    Vhd,
}

pub enum Visibility {
    Public,          // Available to all orgs
    Private,         // Only owner org
    Shared { orgs: Vec<String> },
}
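
The visibility rules in the comments above translate into a small access check. The sketch below is illustrative only: `can_access` is a hypothetical helper, and it assumes the owning org keeps access to images it has shared.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Visibility {
    Public,
    Private,
    Shared { orgs: Vec<String> },
}

/// Returns true if `requester_org` may use an image owned by `owner_org`.
pub fn can_access(visibility: &Visibility, owner_org: &str, requester_org: &str) -> bool {
    match visibility {
        // Available to all orgs.
        Visibility::Public => true,
        // Only the owner org.
        Visibility::Private => requester_org == owner_org,
        // Assumption: the owner retains access alongside the listed orgs.
        Visibility::Shared { orgs } => {
            requester_org == owner_org || orgs.iter().any(|o| o == requester_org)
        }
    }
}
```

With an image shared to `org-2`, a request from `org-2` passes and a request from `org-3` is refused.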

3.5 Node

Physical or virtual host running the agent.

pub struct Node {
    pub id: NodeId,
    pub name: String,
    pub state: NodeState,
    pub capacity: NodeCapacity,
    pub allocatable: NodeCapacity,
    pub allocated: NodeCapacity,
    pub hypervisors: Vec<HypervisorType>,  // Supported backends
    pub labels: HashMap<String, String>,
    pub taints: Vec<Taint>,
    pub conditions: Vec<NodeCondition>,
    pub agent_version: String,
    pub last_heartbeat: u64,
}

pub struct NodeCapacity {
    pub vcpus: u32,
    pub memory_mib: u64,
    pub storage_gib: u64,
}

pub enum NodeState {
    Ready,
    NotReady,
    Cordoned,      // No new VMs scheduled
    Draining,      // Migrating VMs off
    Maintenance,
}
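
The three `NodeCapacity` fields on `Node` suggest a simple way to decide whether a request fits. The sketch below assumes that free capacity is `allocatable - allocated` (the specification does not state this explicitly); `free_capacity`, `saturating_sub`, and `fits` are hypothetical helpers.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodeCapacity {
    pub vcpus: u32,
    pub memory_mib: u64,
    pub storage_gib: u64,
}

impl NodeCapacity {
    /// Component-wise `self - other`, clamped at zero so that an
    /// over-committed node reports no free capacity instead of underflowing.
    pub fn saturating_sub(&self, other: &NodeCapacity) -> NodeCapacity {
        NodeCapacity {
            vcpus: self.vcpus.saturating_sub(other.vcpus),
            memory_mib: self.memory_mib.saturating_sub(other.memory_mib),
            storage_gib: self.storage_gib.saturating_sub(other.storage_gib),
        }
    }

    /// True if every component of `request` fits within `self`.
    pub fn fits(&self, request: &NodeCapacity) -> bool {
        request.vcpus <= self.vcpus
            && request.memory_mib <= self.memory_mib
            && request.storage_gib <= self.storage_gib
    }
}

/// Free capacity, assuming `free = allocatable - allocated`.
pub fn free_capacity(allocatable: &NodeCapacity, allocated: &NodeCapacity) -> NodeCapacity {
    allocatable.saturating_sub(allocated)
}
```

A node with 60 allocatable vCPUs and 48 allocated has 12 free, so an 8-vCPU request fits and a 16-vCPU request does not.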

4. Hypervisor Abstraction

4.1 Backend Trait

#[async_trait]
pub trait HypervisorBackend: Send + Sync {
    /// Backend identifier
    fn backend_type(&self) -> HypervisorType;

    /// Check if this backend supports the given VM spec
    fn supports(&self, spec: &VmSpec) -> Result<(), UnsupportedFeature>;

    /// Create VM resources (disk, network) without starting
    async fn create(&self, vm: &VirtualMachine) -> Result<VmHandle>;

    /// Start the VM
    async fn start(&self, handle: &VmHandle) -> Result<()>;

    /// Stop the VM (graceful shutdown)
    async fn stop(&self, handle: &VmHandle, timeout: Duration) -> Result<()>;

    /// Force stop the VM
    async fn kill(&self, handle: &VmHandle) -> Result<()>;

    /// Reboot the VM
    async fn reboot(&self, handle: &VmHandle) -> Result<()>;

    /// Delete VM and cleanup resources
    async fn delete(&self, handle: &VmHandle) -> Result<()>;

    /// Get current VM status
    async fn status(&self, handle: &VmHandle) -> Result<VmStatus>;

    /// Attach a disk to running VM
    async fn attach_disk(&self, handle: &VmHandle, disk: &DiskSpec) -> Result<()>;

    /// Detach a disk from running VM
    async fn detach_disk(&self, handle: &VmHandle, disk_id: &str) -> Result<()>;

    /// Attach a network interface
    async fn attach_nic(&self, handle: &VmHandle, nic: &NetworkSpec) -> Result<()>;

    /// Get console stream (VNC/serial)
    async fn console(&self, handle: &VmHandle, console_type: ConsoleType)
        -> Result<Box<dyn AsyncReadWrite>>;

    /// Take a snapshot
    async fn snapshot(&self, handle: &VmHandle, snapshot_id: &str) -> Result<()>;
}

4.2 Hypervisor Types

pub enum HypervisorType {
    Kvm,          // QEMU/KVM - full-featured
    Firecracker,  // AWS Firecracker - microVMs
    Mvisor,       // mvisor - lightweight
}

4.3 Backend Registry

pub struct HypervisorRegistry {
    backends: HashMap<HypervisorType, Arc<dyn HypervisorBackend>>,
}

impl HypervisorRegistry {
    pub fn register(&mut self, backend: Arc<dyn HypervisorBackend>);
    pub fn get(&self, typ: HypervisorType) -> Option<Arc<dyn HypervisorBackend>>;
    pub fn available(&self) -> Vec<HypervisorType>;
}
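
The registry above is given as signatures only. One possible implementation is sketched below under stated assumptions: the trait is reduced to `backend_type` so the example stays self-contained, a second registration of the same type replaces the first, `available` returns types in sorted order, and `FakeBackend` is a hypothetical stand-in used only to exercise the registry.

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum HypervisorType {
    Kvm,
    Firecracker,
    Mvisor,
}

/// Reduced stand-in for the full trait in 4.1: only the identifier method.
pub trait HypervisorBackend: Send + Sync {
    fn backend_type(&self) -> HypervisorType;
}

#[derive(Default)]
pub struct HypervisorRegistry {
    backends: HashMap<HypervisorType, Arc<dyn HypervisorBackend>>,
}

impl HypervisorRegistry {
    /// Registers a backend under the type it reports for itself.
    /// Assumption: registering the same type twice replaces the old entry.
    pub fn register(&mut self, backend: Arc<dyn HypervisorBackend>) {
        self.backends.insert(backend.backend_type(), backend);
    }

    /// Returns a shared handle to the backend, if one is registered.
    pub fn get(&self, typ: HypervisorType) -> Option<Arc<dyn HypervisorBackend>> {
        self.backends.get(&typ).cloned()
    }

    /// Registered types in sorted order, so the result is deterministic.
    pub fn available(&self) -> Vec<HypervisorType> {
        let mut types: Vec<_> = self.backends.keys().copied().collect();
        types.sort();
        types
    }
}

/// Hypothetical backend used only to exercise the registry.
pub struct FakeBackend(pub HypervisorType);

impl HypervisorBackend for FakeBackend {
    fn backend_type(&self) -> HypervisorType {
        self.0
    }
}
```

Keying the map by `backend_type()` means callers never pass the type separately, so a backend cannot be registered under the wrong key.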

4.4 Backend Capabilities

pub struct BackendCapabilities {
    pub live_migration: bool,
    pub hot_plug_cpu: bool,
    pub hot_plug_memory: bool,
    pub hot_plug_disk: bool,
    pub hot_plug_nic: bool,
    pub vnc_console: bool,
    pub serial_console: bool,
    pub nested_virtualization: bool,
    pub gpu_passthrough: bool,
    pub max_vcpus: u32,
    pub max_memory_gib: u64,
    pub supported_disk_buses: Vec<DiskBus>,
    pub supported_nic_models: Vec<NicModel>,
}
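
A capability record like this can be compared against what a VM request needs. The sketch below uses a subset of the fields and fills in values from the backend comparison in Appendix D; the `Requirements` type and the `missing_features` helper are hypothetical, and the serial-console value for KVM is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BackendCapabilities {
    pub live_migration: bool,
    pub vnc_console: bool,
    pub serial_console: bool,
    pub nested_virtualization: bool,
    pub gpu_passthrough: bool,
}

/// Features a VM request needs (hypothetical type).
#[derive(Debug, Clone, Copy, Default)]
pub struct Requirements {
    pub live_migration: bool,
    pub vnc_console: bool,
    pub gpu_passthrough: bool,
}

/// Values taken from the KVM/QEMU column of Appendix D.
pub fn kvm_capabilities() -> BackendCapabilities {
    BackendCapabilities {
        live_migration: true,
        vnc_console: true,
        serial_console: true, // assumption: not listed in Appendix D
        nested_virtualization: true,
        gpu_passthrough: true,
    }
}

/// Values taken from the Firecracker column of Appendix D.
pub fn firecracker_capabilities() -> BackendCapabilities {
    BackendCapabilities {
        live_migration: false,
        vnc_console: false,
        serial_console: true, // "No VNC, only serial"
        nested_virtualization: false,
        gpu_passthrough: false,
    }
}

/// Names of required features that the backend does not provide.
pub fn missing_features(caps: &BackendCapabilities, req: &Requirements) -> Vec<&'static str> {
    let checks = [
        ("live_migration", req.live_migration, caps.live_migration),
        ("vnc_console", req.vnc_console, caps.vnc_console),
        ("gpu_passthrough", req.gpu_passthrough, caps.gpu_passthrough),
    ];
    checks
        .iter()
        .filter(|(_, required, supported)| *required && !*supported)
        .map(|(name, _, _)| *name)
        .collect()
}
```

Returning the list of missing features, instead of a bare boolean, lets the API report exactly why a backend was rejected.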

4.5 KVM Backend Implementation

// plasmavmc-kvm crate
pub struct KvmBackend {
    qemu_path: PathBuf,
    runtime_dir: PathBuf,
    network_helper: NetworkHelper,
}

#[async_trait]
impl HypervisorBackend for KvmBackend {
    fn backend_type(&self) -> HypervisorType {
        HypervisorType::Kvm
    }

    async fn create(&self, vm: &VirtualMachine) -> Result<VmHandle> {
        // 1. Generate QEMU command line
        // 2. Create runtime directory
        // 3. Prepare disks and network devices
        todo!("body intentionally left unspecified")
    }

    async fn start(&self, handle: &VmHandle) -> Result<()> {
        // 1. Launch QEMU process
        // 2. Wait for QMP socket
        // 3. Configure via QMP
        todo!("body intentionally left unspecified")
    }
    // ... other methods
}

4.6 Firecracker Backend Implementation

// plasmavmc-firecracker crate
pub struct FirecrackerBackend {
    fc_path: PathBuf,
    jailer_path: PathBuf,
    runtime_dir: PathBuf,
}

#[async_trait]
impl HypervisorBackend for FirecrackerBackend {
    fn backend_type(&self) -> HypervisorType {
        HypervisorType::Firecracker
    }

    fn supports(&self, spec: &VmSpec) -> Result<(), UnsupportedFeature> {
        // Firecracker limitations:
        // - No VNC, only serial
        // - No live migration
        // - Limited device models
        if spec.disks.iter().any(|d| d.bus != DiskBus::Virtio) {
            return Err(UnsupportedFeature::DiskBus);
        }
        Ok(())
    }
    // ... other methods
}

5. API

5.1 gRPC Services

VM Service (plasmavmc.v1.VmService)

service VmService {
  // Lifecycle
  rpc CreateVm(CreateVmRequest) returns (VirtualMachine);
  rpc GetVm(GetVmRequest) returns (VirtualMachine);
  rpc ListVms(ListVmsRequest) returns (ListVmsResponse);
  rpc UpdateVm(UpdateVmRequest) returns (VirtualMachine);
  rpc DeleteVm(DeleteVmRequest) returns (Empty);

  // Power operations
  rpc StartVm(StartVmRequest) returns (VirtualMachine);
  rpc StopVm(StopVmRequest) returns (VirtualMachine);
  rpc RebootVm(RebootVmRequest) returns (VirtualMachine);
  rpc ResetVm(ResetVmRequest) returns (VirtualMachine);

  // Disks
  rpc AttachDisk(AttachDiskRequest) returns (VirtualMachine);
  rpc DetachDisk(DetachDiskRequest) returns (VirtualMachine);

  // Network
  rpc AttachNic(AttachNicRequest) returns (VirtualMachine);
  rpc DetachNic(DetachNicRequest) returns (VirtualMachine);

  // Console
  rpc GetConsole(GetConsoleRequest) returns (stream ConsoleData);

  // Events
  rpc WatchVm(WatchVmRequest) returns (stream VmEvent);
}

Image Service (plasmavmc.v1.ImageService)

service ImageService {
  rpc CreateImage(CreateImageRequest) returns (Image);
  rpc GetImage(GetImageRequest) returns (Image);
  rpc ListImages(ListImagesRequest) returns (ListImagesResponse);
  rpc UpdateImage(UpdateImageRequest) returns (Image);
  rpc DeleteImage(DeleteImageRequest) returns (Empty);

  // Upload/Download
  rpc UploadImage(stream UploadImageRequest) returns (Image);
  rpc DownloadImage(DownloadImageRequest) returns (stream DownloadImageResponse);

  // Conversion
  rpc ConvertImage(ConvertImageRequest) returns (Image);
}

Node Service (plasmavmc.v1.NodeService)

service NodeService {
  rpc ListNodes(ListNodesRequest) returns (ListNodesResponse);
  rpc GetNode(GetNodeRequest) returns (Node);
  rpc CordonNode(CordonNodeRequest) returns (Node);
  rpc UncordonNode(UncordonNodeRequest) returns (Node);
  rpc DrainNode(DrainNodeRequest) returns (Node);
}

5.2 Agent Internal API (plasmavmc.agent.v1)

service AgentService {
  // VM operations (called by control plane)
  rpc CreateVm(CreateVmRequest) returns (VmHandle);
  rpc StartVm(StartVmRequest) returns (Empty);
  rpc StopVm(StopVmRequest) returns (Empty);
  rpc DeleteVm(DeleteVmRequest) returns (Empty);
  rpc GetVmStatus(GetVmStatusRequest) returns (VmStatus);

  // Node status (reported to control plane)
  rpc Heartbeat(HeartbeatRequest) returns (HeartbeatResponse);
  rpc ReportStatus(ReportStatusRequest) returns (Empty);
}

5.3 Client Library

use futures::StreamExt; // provides stream.next(); tokio_stream::StreamExt also works
use plasmavmc_client::PlasmaClient;

let client = PlasmaClient::connect("http://127.0.0.1:8080").await?;

// Create VM
let vm = client.create_vm(CreateVmRequest {
    name: "my-vm".into(),
    org_id: "org-1".into(),
    project_id: "proj-1".into(),
    spec: VmSpec {
        cpu: CpuSpec { vcpus: 2, ..Default::default() },
        memory: MemorySpec { size_mib: 2048, ..Default::default() },
        disks: vec![DiskSpec {
            source: DiskSource::Image { id: "ubuntu-22.04".into() },
            size_gib: 20,
            ..Default::default()
        }],
        network: vec![NetworkSpec {
            network_id: "default".into(),
            ..Default::default()
        }],
        ..Default::default()
    },
    hypervisor: HypervisorType::Kvm,
    ..Default::default()
}).await?;

// Start VM
client.start_vm(vm.id).await?;

// Watch events
let mut stream = client.watch_vm(vm.id).await?;
while let Some(event) = stream.next().await {
    println!("Event: {:?}", event);
}

6. Scheduling

6.1 Scheduler

pub struct Scheduler {
    node_cache: Arc<NodeCache>,
    filters: Vec<Box<dyn ScheduleFilter>>,
    scorers: Vec<Box<dyn ScheduleScorer>>,
}

impl Scheduler {
    pub async fn schedule(&self, vm: &VirtualMachine) -> Result<NodeId> {
        let candidates = self.node_cache.ready_nodes();

        // Filter phase
        let filtered: Vec<_> = candidates
            .into_iter()
            .filter(|n| self.filters.iter().all(|f| f.filter(vm, n)))
            .collect();

        if filtered.is_empty() {
            return Err(Error::NoSuitableNode);
        }

        // Score phase
        let scored: Vec<_> = filtered
            .into_iter()
            .map(|n| {
                let score: i64 = self.scorers.iter().map(|s| s.score(vm, &n)).sum();
                (n, score)
            })
            .collect();

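        // Select highest score; `scored` is non-empty because `filtered` was,
        // but return an error instead of panicking if that ever changes
        let (node, _) = scored
            .into_iter()
            .max_by_key(|(_, s)| *s)
            .ok_or(Error::NoSuitableNode)?;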
        Ok(node.id)
    }
}

6.2 Filters

pub trait ScheduleFilter: Send + Sync {
    fn name(&self) -> &'static str;
    fn filter(&self, vm: &VirtualMachine, node: &Node) -> bool;
}

// Built-in filters
struct ResourceFilter;      // CPU/memory fits
struct HypervisorFilter;    // Node supports hypervisor type
struct TaintFilter;         // Toleration matching
struct AffinityFilter;      // Node affinity rules
struct AntiAffinityFilter;  // VM anti-affinity
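
As an illustration of the filter trait, the sketch below implements the first two built-in filters against trimmed-down stand-ins for `VirtualMachine` and `Node`. The `VmRequest` and `NodeView` types, their field names, and the `first_rejection` helper are hypothetical and exist only to keep the example self-contained.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HypervisorType {
    Kvm,
    Firecracker,
    Mvisor,
}

/// Trimmed-down stand-in for `VirtualMachine` (hypothetical).
pub struct VmRequest {
    pub vcpus: u32,
    pub memory_mib: u64,
    pub hypervisor: HypervisorType,
}

/// Trimmed-down stand-in for `Node` (hypothetical).
pub struct NodeView {
    pub free_vcpus: u32,
    pub free_memory_mib: u64,
    pub hypervisors: Vec<HypervisorType>,
}

pub trait ScheduleFilter: Send + Sync {
    fn name(&self) -> &'static str;
    fn filter(&self, vm: &VmRequest, node: &NodeView) -> bool;
}

/// Passes when the requested CPU and memory fit in the node's free capacity.
pub struct ResourceFilter;

impl ScheduleFilter for ResourceFilter {
    fn name(&self) -> &'static str {
        "resource"
    }
    fn filter(&self, vm: &VmRequest, node: &NodeView) -> bool {
        vm.vcpus <= node.free_vcpus && vm.memory_mib <= node.free_memory_mib
    }
}

/// Passes when the node supports the requested hypervisor type.
pub struct HypervisorFilter;

impl ScheduleFilter for HypervisorFilter {
    fn name(&self) -> &'static str {
        "hypervisor"
    }
    fn filter(&self, vm: &VmRequest, node: &NodeView) -> bool {
        node.hypervisors.contains(&vm.hypervisor)
    }
}

/// Name of the first filter that rejects the node, or `None` if all pass.
pub fn first_rejection(
    filters: &[Box<dyn ScheduleFilter>],
    vm: &VmRequest,
    node: &NodeView,
) -> Option<&'static str> {
    filters.iter().find(|f| !f.filter(vm, node)).map(|f| f.name())
}
```

Reporting the name of the rejecting filter is one way the `name` method could be used, for example to explain a NO_SUITABLE_NODE error.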

6.3 Scorers

pub trait ScheduleScorer: Send + Sync {
    fn name(&self) -> &'static str;
    fn score(&self, vm: &VirtualMachine, node: &Node) -> i64;
}

// Built-in scorers
struct LeastAllocatedScorer;   // Prefer less loaded nodes
struct BalancedResourceScorer; // Balance CPU/memory ratio
struct LocalityScorer;         // Prefer same zone/rack
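
The specification names the scorers without defining their formulas. As one possible reading of "prefer less loaded nodes", the sketch below scores a node by the average percentage of free CPU and memory. The formula, the `NodeLoad` type, and both function names are assumptions made for illustration.

```rust
/// Allocation figures for one node (hypothetical, trimmed-down view).
pub struct NodeLoad {
    pub allocatable_vcpus: u32,
    pub allocated_vcpus: u32,
    pub allocatable_memory_mib: u64,
    pub allocated_memory_mib: u64,
}

/// Percentage (0..=100) of `total` that is still free; 0 when `total` is 0.
fn free_percent(total: u64, used: u64) -> i64 {
    if total == 0 {
        return 0;
    }
    let free = total.saturating_sub(used);
    // Widen to u128 so the multiplication cannot overflow.
    ((free as u128) * 100 / (total as u128)) as i64
}

/// Assumed formula: mean of the free-CPU and free-memory percentages.
/// Higher scores mean a less loaded node.
pub fn least_allocated_score(node: &NodeLoad) -> i64 {
    let cpu = free_percent(node.allocatable_vcpus as u64, node.allocated_vcpus as u64);
    let mem = free_percent(node.allocatable_memory_mib, node.allocated_memory_mib);
    (cpu + mem) / 2
}

/// Index of the highest-scoring node, or `None` for an empty slice.
/// On ties, `max_by_key` returns the last of the equal elements.
pub fn pick_least_allocated(nodes: &[NodeLoad]) -> Option<usize> {
    nodes
        .iter()
        .enumerate()
        .max_by_key(|(_, n)| least_allocated_score(n))
        .map(|(i, _)| i)
}
```

Because every scorer returns an `i64` and the scheduler sums them, scorers that share a common range such as 0 to 100 contribute comparably; whether PlasmaVMC normalizes or weights scores is not specified.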

7. Multi-Tenancy

7.1 Resource Hierarchy

System (platform operators)
  └─ Organization (tenant boundary)
      └─ Project (workload isolation)
          └─ Resources (VMs, images, networks)

7.2 Scoped Resources

// All resources include scope identifiers
pub trait Scoped {
    fn org_id(&self) -> &str;
    fn project_id(&self) -> &str;
}

// Resource paths follow the Aegis pattern
// org/{org_id}/project/{project_id}/vm/{vm_id}
// org/{org_id}/project/{project_id}/image/{image_id}
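
The path pattern above can be built and parsed with a pair of small helpers. This is a sketch only; `resource_path` and `parse_resource_path` are hypothetical names, and the parser assumes that identifiers never contain `/`.

```rust
/// Builds `org/{org_id}/project/{project_id}/{kind}/{id}`.
pub fn resource_path(org_id: &str, project_id: &str, kind: &str, id: &str) -> String {
    format!("org/{org_id}/project/{project_id}/{kind}/{id}")
}

/// Splits a resource path into `(org_id, project_id, kind, id)`.
/// Returns `None` if the path does not have exactly the expected shape.
pub fn parse_resource_path(path: &str) -> Option<(&str, &str, &str, &str)> {
    let parts: Vec<&str> = path.split('/').collect();
    match parts.as_slice() {
        ["org", org, "project", project, kind, id]
            if !org.is_empty() && !project.is_empty() && !kind.is_empty() && !id.is_empty() =>
        {
            Some((*org, *project, *kind, *id))
        }
        _ => None,
    }
}
```

For example, `resource_path("org-1", "proj-1", "vm", "vm-42")` yields `org/org-1/project/proj-1/vm/vm-42`, and parsing that string returns the four components again.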

7.3 Quotas

pub struct Quota {
    pub scope: Scope,                 // Org or Project
    pub limits: ResourceLimits,
    pub usage: ResourceUsage,
}

pub struct ResourceLimits {
    pub max_vms: Option<u32>,
    pub max_vcpus: Option<u32>,
    pub max_memory_gib: Option<u64>,
    pub max_storage_gib: Option<u64>,
    pub max_images: Option<u32>,
}
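
A quota check against these optional limits might look like the sketch below. It assumes that `None` means "no limit" and that a request is admitted only if current usage plus the request stays at or below each limit. The `QuotaUsage` struct and `check_quota` function are hypothetical; the counters needed for quota accounting are not defined in this section.

```rust
#[derive(Debug, Clone, Copy, Default)]
pub struct ResourceLimits {
    pub max_vms: Option<u32>,
    pub max_vcpus: Option<u32>,
    pub max_memory_gib: Option<u64>,
}

/// Hypothetical usage counters mirroring the limit fields.
#[derive(Debug, Clone, Copy, Default)]
pub struct QuotaUsage {
    pub vms: u32,
    pub vcpus: u32,
    pub memory_gib: u64,
}

/// True if `current + requested` stays within the limit.
/// Assumption: `None` means the dimension is unlimited.
fn within(limit: Option<u64>, current: u64, requested: u64) -> bool {
    match limit {
        Some(max) => current.saturating_add(requested) <= max,
        None => true,
    }
}

/// Checks whether one more VM of the given size fits the quota.
/// Returns the name of the first exceeded limit on failure.
pub fn check_quota(
    limits: &ResourceLimits,
    usage: &QuotaUsage,
    vcpus: u32,
    memory_gib: u64,
) -> Result<(), &'static str> {
    if !within(limits.max_vms.map(u64::from), u64::from(usage.vms), 1) {
        return Err("max_vms");
    }
    if !within(limits.max_vcpus.map(u64::from), u64::from(usage.vcpus), u64::from(vcpus)) {
        return Err("max_vcpus");
    }
    if !within(limits.max_memory_gib, usage.memory_gib, memory_gib) {
        return Err("max_memory_gib");
    }
    Ok(())
}
```

An `Err` from this check would correspond to the QUOTA_EXCEEDED error code listed in the appendix.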

7.4 Namespace Isolation

  • Compute: VMs scoped to project, nodes shared across orgs
  • Network: Overlay network provides tenant isolation
  • Storage: Images can be private, shared, or public
  • Naming: Names unique within project scope

8. Storage

8.1 State Storage (Chainfire)

# VM records
plasmavmc/vms/{org_id}/{project_id}/{vm_id}

# Image records
plasmavmc/images/{org_id}/{image_id}
plasmavmc/images/public/{image_id}

# Node records
plasmavmc/nodes/{node_id}

# Scheduling state
plasmavmc/scheduler/assignments/{vm_id}
plasmavmc/scheduler/pending/{timestamp}/{vm_id}
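
The key layout can be captured in a few builder functions, sketched below with hypothetical names. The pending-queue key zero-pads the timestamp to 20 digits so that lexicographic key order matches numeric order; both the padding and the millisecond unit are assumptions, since the layout above only shows `{timestamp}`.

```rust
/// Key for a VM record.
pub fn vm_key(org_id: &str, project_id: &str, vm_id: &str) -> String {
    format!("plasmavmc/vms/{org_id}/{project_id}/{vm_id}")
}

/// Prefix covering every VM record in one project (useful for range scans).
pub fn project_vm_prefix(org_id: &str, project_id: &str) -> String {
    format!("plasmavmc/vms/{org_id}/{project_id}/")
}

/// Key for a node record.
pub fn node_key(node_id: &str) -> String {
    format!("plasmavmc/nodes/{node_id}")
}

/// Key for an entry in the pending-scheduling queue.
/// Assumption: the timestamp is zero-padded to 20 digits (the width of
/// `u64::MAX`), so sorting keys as strings also sorts them by time.
pub fn pending_key(timestamp_ms: u64, vm_id: &str) -> String {
    format!("plasmavmc/scheduler/pending/{timestamp_ms:020}/{vm_id}")
}
```

Without the padding, a key with timestamp `10` would sort before one with timestamp `9`, and a prefix scan would no longer return pending VMs oldest first.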

8.2 Image Storage

  • Backend: LightningSTOR (object storage)
  • Format: Raw, qcow2, vmdk with automatic conversion
  • Caching: Node-local image cache with pull-through
  • Path: images/{org_id}/{image_id}/{version}

8.3 Disk Storage

  • Ephemeral: Local SSD/NVMe on node
  • Persistent: LightningSTOR volumes (via CSI)
  • Snapshot: Copy-on-write via backend

9. Configuration

9.1 Control Plane Config (TOML)

[server]
addr = "0.0.0.0:8080"

[server.tls]
cert_file = "/etc/plasmavmc/tls/server.crt"
key_file = "/etc/plasmavmc/tls/server.key"
ca_file = "/etc/plasmavmc/tls/ca.crt"

[store]
backend = "chainfire"
chainfire_endpoints = ["http://chainfire-1:2379", "http://chainfire-2:2379"]

[iam]
endpoint = "http://aegis:9090"
service_account = "plasmavmc-controller"
token_path = "/var/run/secrets/iam/token"

[scheduler]
default_hypervisor = "kvm"

[image_store]
backend = "lightningstor"
endpoint = "http://lightningstor:9000"
bucket = "vm-images"

[logging]
level = "info"
format = "json"

9.2 Agent Config (TOML)

[agent]
node_id = "node-001"
control_plane = "http://plasmavmc-api:8080"

[agent.tls]
cert_file = "/etc/plasmavmc/tls/agent.crt"
key_file = "/etc/plasmavmc/tls/agent.key"
ca_file = "/etc/plasmavmc/tls/ca.crt"

[hypervisors]
enabled = ["kvm", "firecracker"]

[hypervisors.kvm]
qemu_path = "/usr/bin/qemu-system-x86_64"
runtime_dir = "/var/run/plasmavmc/kvm"

[hypervisors.firecracker]
fc_path = "/usr/bin/firecracker"
jailer_path = "/usr/bin/jailer"
runtime_dir = "/var/run/plasmavmc/fc"

[storage]
image_cache_dir = "/var/lib/plasmavmc/images"
runtime_dir = "/var/lib/plasmavmc/vms"
cache_size_gib = 100

[network]
overlay_endpoint = "http://ovn-controller:6641"
bridge_name = "plasmavmc0"

[logging]
level = "info"
format = "json"

9.3 Environment Variables

Variable              Default        Description
PLASMAVMC_CONFIG      -              Config file path
PLASMAVMC_ADDR        0.0.0.0:8080   API listen address
PLASMAVMC_LOG_LEVEL   info           Log level
PLASMAVMC_NODE_ID     -              Agent node identifier

9.4 CLI Arguments

plasmavmc-server [OPTIONS]
  -c, --config <PATH>     Config file path
  -a, --addr <ADDR>       Listen address
  -l, --log-level <LEVEL> Log level
  -h, --help              Print help
  -V, --version           Print version

plasmavmc-agent [OPTIONS]
  -c, --config <PATH>     Config file path
  -n, --node-id <ID>      Node identifier
  --control-plane <URL>   Control plane endpoint
  -h, --help              Print help

10. Integration

10.1 Aegis (IAM)

// Authorization check before VM operations
async fn authorize_vm_action(
    iam: &IamClient,
    principal: &PrincipalRef,
    action: &str,
    vm: &VirtualMachine,
) -> Result<()> {
    let resource = ResourceRef {
        kind: "vm".into(),
        id: vm.id.to_string(),
        org_id: vm.org_id.clone(),
        project_id: vm.project_id.clone(),
        ..Default::default()
    };

    let allowed = iam.authorize(principal, action, &resource).await?;
    if !allowed {
        return Err(Error::PermissionDenied);
    }
    Ok(())
}

// Action patterns
// plasmavmc:vms:create
// plasmavmc:vms:get
// plasmavmc:vms:list
// plasmavmc:vms:update
// plasmavmc:vms:delete
// plasmavmc:vms:start
// plasmavmc:vms:stop
// plasmavmc:vms:console
// plasmavmc:images:create
// plasmavmc:images:get
// plasmavmc:images:delete

10.2 Overlay Network

// Network integration for VM NICs
#[async_trait]
pub trait NetworkProvider: Send + Sync {
    /// Allocate port for VM NIC
    async fn create_port(&self, req: CreatePortRequest) -> Result<Port>;

    /// Release port
    async fn delete_port(&self, port_id: &str) -> Result<()>;

    /// Get port details (MAC, IP, security groups)
    async fn get_port(&self, port_id: &str) -> Result<Port>;

    /// Update security groups
    async fn update_security_groups(
        &self,
        port_id: &str,
        groups: Vec<String>,
    ) -> Result<()>;
}

pub struct Port {
    pub id: String,
    pub network_id: String,
    pub mac_address: String,
    pub ip_addresses: Vec<IpAddress>,
    pub security_groups: Vec<String>,
    pub tap_device: String,  // Host tap device name
}

10.3 Chainfire (State)

// Watch for VM changes (controller pattern)
async fn reconcile_loop(chainfire: &ChainfireClient) -> Result<()> {
    let mut watch = chainfire
        .watch_prefix("plasmavmc/vms/")
        .await?;

    while let Some(event) = watch.next().await {
        match event.event_type {
            Put => reconcile_vm(event.kv).await?,
            Delete => cleanup_vm(event.kv).await?,
        }
    }

    // The watch stream ended; the caller decides whether to restart it
    Ok(())
}

11. Security

11.1 Authentication

  • Control Plane: mTLS + Aegis tokens
  • Agent: mTLS with node certificate
  • Console: WebSocket with Aegis token

11.2 Authorization

  • Integrated with Aegis (IAM)
  • Action-based permissions
  • Scope enforcement (org/project)

11.3 VM Isolation

  • Process: Hypervisor process per VM
  • Sandbox: seccomp, namespaces, chroot (Firecracker jailer)
  • Network: Overlay network tenant isolation
  • Resources: cgroups for CPU/memory limits

11.4 Image Security

  • Checksum verification on download
  • Signature verification (optional)
  • Content scanning integration point

12. Operations

12.1 Deployment

Single Node (Development)

# Start control plane
plasmavmc-server --config config.toml

# Start agent on same node
plasmavmc-agent --config agent.toml --node-id dev-node

Production Cluster

# Control plane (3 instances for HA)
plasmavmc-server --config config.toml

# Agents (each compute node)
plasmavmc-agent --config agent.toml --node-id node-$(hostname)

12.2 Monitoring

Metrics (Prometheus)

Metric                                  Type        Description
plasmavmc_vms_total                     Gauge       Total VMs by state
plasmavmc_vm_operations_total           Counter     Operations by type
plasmavmc_vm_boot_seconds               Histogram   VM boot time
plasmavmc_node_capacity_vcpus           Gauge       Node vCPU capacity
plasmavmc_node_allocated_vcpus          Gauge       Allocated vCPUs
plasmavmc_scheduler_latency_seconds     Histogram   Scheduling latency
plasmavmc_agent_heartbeat_age_seconds   Gauge       Time since heartbeat

Health Endpoints

  • GET /health - Liveness
  • GET /ready - Readiness (chainfire connected, agents online)

12.3 Backup & Recovery

  • State: Chainfire handles via Raft snapshots
  • Images: LightningSTOR replication
  • VM Disks: Volume snapshots via storage backend

13. Compatibility

13.1 API Versioning

  • gRPC package: plasmavmc.v1
  • Semantic versioning
  • Backward compatible within major version

13.2 Hypervisor Versions

Backend       Minimum Version   Notes
QEMU/KVM      6.0               QMP protocol
Firecracker   1.0               API v1
mvisor        TBD               -

Appendix

A. Error Codes

Error              Meaning
VM_NOT_FOUND       VM does not exist
IMAGE_NOT_FOUND    Image does not exist
NODE_NOT_FOUND     Node does not exist
NO_SUITABLE_NODE   Scheduling failed
QUOTA_EXCEEDED     Resource quota exceeded
HYPERVISOR_ERROR   Backend operation failed
INVALID_STATE      Operation invalid for current state
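
In Rust these codes could be modeled as an enum that renders to the wire strings listed above. The sketch below is illustrative: the `ErrorCode` type and the `is_not_found` grouping are assumptions, not part of the specified API.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorCode {
    VmNotFound,
    ImageNotFound,
    NodeNotFound,
    NoSuitableNode,
    QuotaExceeded,
    HypervisorError,
    InvalidState,
}

impl ErrorCode {
    /// The code string exactly as listed in the table above.
    pub fn as_str(self) -> &'static str {
        match self {
            ErrorCode::VmNotFound => "VM_NOT_FOUND",
            ErrorCode::ImageNotFound => "IMAGE_NOT_FOUND",
            ErrorCode::NodeNotFound => "NODE_NOT_FOUND",
            ErrorCode::NoSuitableNode => "NO_SUITABLE_NODE",
            ErrorCode::QuotaExceeded => "QUOTA_EXCEEDED",
            ErrorCode::HypervisorError => "HYPERVISOR_ERROR",
            ErrorCode::InvalidState => "INVALID_STATE",
        }
    }

    /// Hypothetical grouping: true for the three "does not exist" codes.
    pub fn is_not_found(self) -> bool {
        matches!(
            self,
            ErrorCode::VmNotFound | ErrorCode::ImageNotFound | ErrorCode::NodeNotFound
        )
    }
}

impl fmt::Display for ErrorCode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}
```

The exhaustive `match` in `as_str` means that adding a variant without giving it a code string fails to compile, which keeps the enum and the table in step.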

B. Port Assignments

Port   Protocol   Purpose
8080   gRPC       Control plane API
8081   HTTP       Metrics/health
8082   gRPC       Agent internal API

C. Glossary

  • VM: Virtual Machine - an isolated compute instance
  • Hypervisor: Software that creates and runs VMs (KVM, Firecracker)
  • Image: Bootable disk image template
  • Node: Physical/virtual host running VMs
  • Agent: Daemon running on each node managing local VMs
  • Scheduler: Component selecting nodes for VM placement
  • Overlay Network: Virtual network providing tenant isolation

D. Backend Comparison

Feature           KVM/QEMU   Firecracker   mvisor
Boot time         ~5s        <125ms        TBD
Memory overhead   Medium     Low           Low
Device support    Full       Limited       Limited
Live migration    Yes        No            No
VNC console       Yes        No            No
GPU passthrough   Yes        No            No
Nested virt       Yes        No            No
Best for          General    Serverless    TBD