# PlasmaVMC Specification

> Version: 1.0 | Status: Draft | Last Updated: 2025-12-08

## 1. Overview

### 1.1 Purpose

PlasmaVMC is a virtual machine control platform providing unified management across multiple hypervisor backends. It abstracts hypervisor-specific implementations behind trait-based interfaces, enabling consistent VM lifecycle management regardless of the underlying virtualization technology.

The name "Plasma" reflects its role as the energized medium that powers virtual machines, with "VMC" denoting Virtual Machine Controller.

### 1.2 Scope

- **In scope**: VM lifecycle (create, start, stop, delete), hypervisor abstraction (KVM, Firecracker, mvisor), image management, resource allocation (CPU, memory, storage, network), multi-tenant isolation, console/serial access, live migration (future)
- **Out of scope**: Container orchestration (Kubernetes), bare metal provisioning, storage backend implementation (uses LightningSTOR), network fabric (uses overlay network)

### 1.3 Design Goals

- **Hypervisor agnostic**: Trait-based abstraction supporting KVM, Firecracker, mvisor
- **EC2/GCE-like UX**: Familiar concepts for cloud users
- **Multi-tenant from day one**: Full org/project hierarchy with resource isolation
- **High density**: Support thousands of VMs per node
- **Fast boot**: Sub-second boot times with Firecracker/microVMs
- **Observable**: Rich metrics, events, and audit logging

## 2. Architecture

### 2.1 Crate Structure

```
plasmavmc/
├── crates/
│   ├── plasmavmc-api/          # gRPC service implementations
│   ├── plasmavmc-client/       # Rust client library
│   ├── plasmavmc-core/         # Core orchestration logic
│   ├── plasmavmc-hypervisor/   # Hypervisor trait + registry
│   ├── plasmavmc-kvm/          # KVM/QEMU backend
│   ├── plasmavmc-firecracker/  # Firecracker backend
│   ├── plasmavmc-mvisor/       # mvisor backend
│   ├── plasmavmc-server/       # Control plane server
│   ├── plasmavmc-agent/        # Node agent binary
│   ├── plasmavmc-storage/      # Image/disk management
│   └── plasmavmc-types/        # Shared types
└── proto/
    ├── plasmavmc.proto         # Public API
    └── agent.proto             # Agent internal RPCs
```

### 2.2 Component Topology

```
┌─────────────────────────────────────────────────────────────────┐
│                          Control Plane                          │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │  plasmavmc-api  │  │ plasmavmc-core  │  │plasmavmc-storage│  │
│  │   (gRPC svc)    │──│   (scheduler)   │──│  (image mgmt)   │  │
│  └────────┬────────┘  └────────┬────────┘  └────────┬────────┘  │
│           │                    │                    │           │
│           └────────────────────┼────────────────────┘           │
│                                │                                │
│                         ┌──────▼──────┐                         │
│                         │  Chainfire  │                         │
│                         │   (state)   │                         │
│                         └─────────────┘                         │
└─────────────────────────────────────────────────────────────────┘
                                 │
                    ┌────────────┼────────────┐
                    ▼            ▼            ▼
┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐
│       Node 1        │ │       Node 2        │ │       Node N        │
│ ┌─────────────────┐ │ │ ┌─────────────────┐ │ │ ┌─────────────────┐ │
│ │ plasmavmc-agent │ │ │ │ plasmavmc-agent │ │ │ │ plasmavmc-agent │ │
│ └────────┬────────┘ │ │ └────────┬────────┘ │ │ └────────┬────────┘ │
│          │          │ │          │          │ │          │          │
│ ┌────────▼────────┐ │ │ ┌────────▼────────┐ │ │ ┌────────▼────────┐ │
│ │HypervisorBackend│ │ │ │HypervisorBackend│ │ │ │HypervisorBackend│ │
│ │ (KVM/FC/mvisor) │ │ │ │ (KVM/FC/mvisor) │ │ │ │ (KVM/FC/mvisor) │ │
│ └─────────────────┘ │ │ └─────────────────┘ │ │ └─────────────────┘ │
└─────────────────────┘ └─────────────────────┘ └─────────────────────┘
```

### 2.3 Data Flow

```
[Client gRPC] → [API Layer] → [Scheduler] → [Agent gRPC] → [Hypervisor]
                     │             │
                     ▼             ▼
                [Chainfire]  [Node Selection]
                (VM state)   (capacity, affinity)
```

### 2.4 Dependencies

| Crate | Version | Purpose |
|-------|---------|---------|
| tokio | 1.x | Async runtime |
| tonic | 0.12 | gRPC framework |
| prost | 0.13 | Protocol buffers |
| uuid | 1.x | VM/resource identifiers |
| dashmap | 6.x | Concurrent state caches |
| nix | 0.29 | Linux system calls |

## 3. Core Concepts

### 3.1 Virtual Machine (VM)

The primary managed resource representing a virtual machine instance.

```rust
use std::collections::HashMap;

pub struct VirtualMachine {
    pub id: VmId,                          // UUID
    pub name: String,                      // User-defined name
    pub org_id: String,                    // Organization owner
    pub project_id: String,                // Project owner
    pub state: VmState,                    // Current state
    pub spec: VmSpec,                      // Desired configuration
    pub status: VmStatus,                  // Runtime status
    pub node_id: Option<NodeId>,           // Assigned node
    pub hypervisor: HypervisorType,        // Backend type
    pub created_at: u64,
    pub updated_at: u64,
    pub created_by: String,                // Principal ID
    pub metadata: HashMap<String, String>,
    pub labels: HashMap<String, String>,
}

pub struct VmSpec {
    pub cpu: CpuSpec,
    pub memory: MemorySpec,
    pub disks: Vec<DiskSpec>,
    pub network: Vec<NetworkSpec>,
    pub boot: BootSpec,
    pub security: SecuritySpec,
}

pub struct CpuSpec {
    pub vcpus: u32,                 // Number of vCPUs
    pub cores_per_socket: u32,      // Topology: cores per socket
    pub sockets: u32,               // Topology: socket count
    pub cpu_model: Option<String>,  // e.g., "host-passthrough"
}

pub struct MemorySpec {
    pub size_mib: u64,    // Memory size in MiB
    pub hugepages: bool,  // Use huge pages
    pub numa_nodes: Vec,  // NUMA topology
}

pub struct DiskSpec {
    pub id: String,               // Disk identifier
    pub source: DiskSource,       // Image or volume
    pub size_gib: u64,            // Disk size in GiB
    pub bus: DiskBus,             // virtio, scsi, ide
    pub cache: DiskCache,         // none, writeback, writethrough
    pub boot_index: Option<u32>,  // Boot order
}

pub struct NetworkSpec {
    pub id: String,          // Interface identifier
    pub network_id: String,  // Overlay network ID
    pub mac_address: Option<String>,
    pub ip_address: Option<String>,
    pub model: NicModel,     // virtio-net, e1000
    pub security_groups: Vec<String>,
}
```

### 3.2 VM State Machine

```
                           ┌─────────────────────────────────┐
                           ▼                                 │
┌─────────┐  create   ┌─────────┐   start   ┌─────────┐      │
│ PENDING │──────────►│ STOPPED │──────────►│ RUNNING │      │
└─────────┘           └────┬────┘           └────┬────┘      │
                           │ delete              │           │
                           ▼                     │ stop      │
                      ┌─────────┐                │           │
                      │ DELETED │◄───────────────┤           │
                      └─────────┘                │           │
                                                 │ reboot    │
                                                 └───────────┘

Additional states:
  CREATING  - Provisioning resources
  STARTING  - Boot in progress
  STOPPING  - Shutdown in progress
  MIGRATING - Live migration in progress
  ERROR     - Failed state (recoverable)
  FAILED    - Terminal failure
```

```rust
pub enum VmState {
    Pending,    // Awaiting scheduling
    Creating,   // Resources being provisioned
    Stopped,    // Created but not running
    Starting,   // Boot in progress
    Running,    // Active and healthy
    Stopping,   // Graceful shutdown
    Migrating,  // Live migration in progress
    Error,      // Recoverable error
    Failed,     // Terminal failure
    Deleted,    // Soft-deleted, pending cleanup
}
```

### 3.3 Runtime Status

```rust
pub struct VmStatus {
    pub actual_state: VmState,
    pub host_pid: Option<u32>,    // Hypervisor process PID
    pub started_at: Option<u64>,  // Last boot timestamp
    pub ip_addresses: Vec<String>,
    pub resource_usage: ResourceUsage,
    pub last_error: Option<String>,
    pub conditions: Vec,
}

pub struct ResourceUsage {
    pub cpu_percent: f64,
    pub memory_used_mib: u64,
    pub disk_read_bytes: u64,
    pub disk_write_bytes: u64,
    pub network_rx_bytes: u64,
    pub network_tx_bytes: u64,
}
```

### 3.4 Image

Bootable disk images for VM creation.

```rust
pub struct Image {
    pub id: ImageId,
    pub name: String,
    pub org_id: String,          // Owner org (or "system" for public)
    pub visibility: Visibility,  // Public, Private, Shared
    pub source: ImageSource,
    pub format: ImageFormat,
    pub size_bytes: u64,
    pub checksum: String,        // SHA256
    pub os_type: OsType,
    pub os_version: String,
    pub architecture: Architecture,
    pub min_disk_gib: u32,
    pub min_memory_mib: u32,
    pub status: ImageStatus,
    pub created_at: u64,
    pub updated_at: u64,
    pub metadata: HashMap<String, String>,
}

pub enum ImageSource {
    Url { url: String },
    Upload { storage_path: String },
    Snapshot { vm_id: VmId, disk_id: String },
}

pub enum ImageFormat {
    Raw,
    Qcow2,
    Vmdk,
    Vhd,
}

pub enum Visibility {
    Public,   // Available to all orgs
    Private,  // Only owner org
    Shared { orgs: Vec<String> },
}
```

### 3.5 Node

Physical or virtual host running the agent.

```rust
pub struct Node {
    pub id: NodeId,
    pub name: String,
    pub state: NodeState,
    pub capacity: NodeCapacity,
    pub allocatable: NodeCapacity,
    pub allocated: NodeCapacity,
    pub hypervisors: Vec<HypervisorType>,  // Supported backends
    pub labels: HashMap<String, String>,
    pub taints: Vec,
    pub conditions: Vec,
    pub agent_version: String,
    pub last_heartbeat: u64,
}

pub struct NodeCapacity {
    pub vcpus: u32,
    pub memory_mib: u64,
    pub storage_gib: u64,
}

pub enum NodeState {
    Ready,
    NotReady,
    Cordoned,     // No new VMs scheduled
    Draining,     // Migrating VMs off
    Maintenance,
}
```

## 4. Hypervisor Abstraction

### 4.1 Backend Trait

```rust
use std::time::Duration;

use async_trait::async_trait;

#[async_trait]
pub trait HypervisorBackend: Send + Sync {
    /// Backend identifier
    fn backend_type(&self) -> HypervisorType;

    /// Check if this backend supports the given VM spec
    fn supports(&self, spec: &VmSpec) -> Result<(), UnsupportedFeature>;

    /// Create VM resources (disk, network) without starting
    async fn create(&self, vm: &VirtualMachine) -> Result<VmHandle>;

    /// Start the VM
    async fn start(&self, handle: &VmHandle) -> Result<()>;

    /// Stop the VM (graceful shutdown)
    async fn stop(&self, handle: &VmHandle, timeout: Duration) -> Result<()>;

    /// Force stop the VM
    async fn kill(&self, handle: &VmHandle) -> Result<()>;

    /// Reboot the VM
    async fn reboot(&self, handle: &VmHandle) -> Result<()>;

    /// Delete VM and cleanup resources
    async fn delete(&self, handle: &VmHandle) -> Result<()>;

    /// Get current VM status
    async fn status(&self, handle: &VmHandle) -> Result<VmStatus>;

    /// Attach a disk to running VM
    async fn attach_disk(&self, handle: &VmHandle, disk: &DiskSpec) -> Result<()>;

    /// Detach a disk from running VM
    async fn detach_disk(&self, handle: &VmHandle, disk_id: &str) -> Result<()>;

    /// Attach a network interface
    async fn attach_nic(&self, handle: &VmHandle, nic: &NetworkSpec) -> Result<()>;

    /// Get console stream (VNC/serial)
    async fn console(&self, handle: &VmHandle, console_type: ConsoleType) -> Result>;

    /// Take a snapshot
    async fn snapshot(&self, handle: &VmHandle, snapshot_id: &str) -> Result<()>;
}
```

### 4.2 Hypervisor Types

```rust
pub enum HypervisorType {
    Kvm,          // QEMU/KVM - full-featured
    Firecracker,  // AWS Firecracker - microVMs
    Mvisor,       // mvisor - lightweight
}
```

### 4.3 Backend Registry

```rust
use std::collections::HashMap;
use std::sync::Arc;

pub struct HypervisorRegistry {
    backends: HashMap<HypervisorType, Arc<dyn HypervisorBackend>>,
}

impl HypervisorRegistry {
    pub fn register(&mut self, backend: Arc<dyn HypervisorBackend>);
    pub fn get(&self, typ: HypervisorType) -> Option<Arc<dyn HypervisorBackend>>;
    pub fn available(&self) -> Vec<HypervisorType>;
}
```

### 4.4 Backend Capabilities

```rust
pub struct BackendCapabilities {
    pub live_migration: bool,
    pub hot_plug_cpu: bool,
    pub hot_plug_memory: bool,
    pub hot_plug_disk: bool,
    pub hot_plug_nic: bool,
    pub vnc_console: bool,
    pub serial_console: bool,
    pub nested_virtualization: bool,
    pub gpu_passthrough: bool,
    pub max_vcpus: u32,
    pub max_memory_gib: u64,
    pub supported_disk_buses: Vec<DiskBus>,
    pub supported_nic_models: Vec<NicModel>,
}
```

### 4.5 KVM Backend Implementation

```rust
// plasmavmc-kvm crate
pub struct KvmBackend {
    qemu_path: PathBuf,
    runtime_dir: PathBuf,
    network_helper: NetworkHelper,
}

#[async_trait]
impl HypervisorBackend for KvmBackend {
    fn backend_type(&self) -> HypervisorType {
        HypervisorType::Kvm
    }

    async fn create(&self, vm: &VirtualMachine) -> Result<VmHandle> {
        // Generate QEMU command line
        // Create runtime directory
        // Prepare disks and network devices
        todo!()
    }

    async fn start(&self, handle: &VmHandle) -> Result<()> {
        // Launch QEMU process
        // Wait for QMP socket
        // Configure via QMP
        todo!()
    }

    // ... other methods
}
```

### 4.6 Firecracker Backend Implementation

```rust
// plasmavmc-firecracker crate
pub struct FirecrackerBackend {
    fc_path: PathBuf,
    jailer_path: PathBuf,
    runtime_dir: PathBuf,
}

#[async_trait]
impl HypervisorBackend for FirecrackerBackend {
    fn backend_type(&self) -> HypervisorType {
        HypervisorType::Firecracker
    }

    fn supports(&self, spec: &VmSpec) -> Result<(), UnsupportedFeature> {
        // Firecracker limitations:
        // - No VNC, only serial
        // - No live migration
        // - Limited device models
        if spec.disks.iter().any(|d| d.bus != DiskBus::Virtio) {
            return Err(UnsupportedFeature::DiskBus);
        }
        Ok(())
    }

    // ... other methods
}
```

## 5. API

### 5.1 gRPC Services

#### VM Service (`plasmavmc.v1.VmService`)

```protobuf
service VmService {
  // Lifecycle
  rpc CreateVm(CreateVmRequest) returns (VirtualMachine);
  rpc GetVm(GetVmRequest) returns (VirtualMachine);
  rpc ListVms(ListVmsRequest) returns (ListVmsResponse);
  rpc UpdateVm(UpdateVmRequest) returns (VirtualMachine);
  rpc DeleteVm(DeleteVmRequest) returns (Empty);

  // Power operations
  rpc StartVm(StartVmRequest) returns (VirtualMachine);
  rpc StopVm(StopVmRequest) returns (VirtualMachine);
  rpc RebootVm(RebootVmRequest) returns (VirtualMachine);
  rpc ResetVm(ResetVmRequest) returns (VirtualMachine);

  // Disks
  rpc AttachDisk(AttachDiskRequest) returns (VirtualMachine);
  rpc DetachDisk(DetachDiskRequest) returns (VirtualMachine);

  // Network
  rpc AttachNic(AttachNicRequest) returns (VirtualMachine);
  rpc DetachNic(DetachNicRequest) returns (VirtualMachine);

  // Console
  rpc GetConsole(GetConsoleRequest) returns (stream ConsoleData);

  // Events
  rpc WatchVm(WatchVmRequest) returns (stream VmEvent);
}
```

#### Image Service (`plasmavmc.v1.ImageService`)

```protobuf
service ImageService {
  rpc CreateImage(CreateImageRequest) returns (Image);
  rpc GetImage(GetImageRequest) returns (Image);
  rpc ListImages(ListImagesRequest) returns (ListImagesResponse);
  rpc UpdateImage(UpdateImageRequest) returns (Image);
  rpc DeleteImage(DeleteImageRequest) returns (Empty);

  // Upload/Download
  rpc UploadImage(stream UploadImageRequest) returns (Image);
  rpc DownloadImage(DownloadImageRequest) returns (stream DownloadImageResponse);

  // Conversion
  rpc ConvertImage(ConvertImageRequest) returns (Image);
}
```

#### Node Service (`plasmavmc.v1.NodeService`)

```protobuf
service NodeService {
  rpc ListNodes(ListNodesRequest) returns (ListNodesResponse);
  rpc GetNode(GetNodeRequest) returns (Node);
  rpc CordonNode(CordonNodeRequest) returns (Node);
  rpc UncordonNode(UncordonNodeRequest) returns (Node);
  rpc DrainNode(DrainNodeRequest) returns (Node);
}
```

### 5.2 Agent Internal API (`plasmavmc.agent.v1`)

```protobuf
service AgentService {
  // VM operations (called by control plane)
  rpc CreateVm(CreateVmRequest) returns (VmHandle);
  rpc StartVm(StartVmRequest) returns (Empty);
  rpc StopVm(StopVmRequest) returns (Empty);
  rpc DeleteVm(DeleteVmRequest) returns (Empty);
  rpc GetVmStatus(GetVmStatusRequest) returns (VmStatus);

  // Node status (reported to control plane)
  rpc Heartbeat(HeartbeatRequest) returns (HeartbeatResponse);
  rpc ReportStatus(ReportStatusRequest) returns (Empty);
}
```

### 5.3 Client Library

```rust
use futures::StreamExt; // provides `next()` on the event stream
use plasmavmc_client::PlasmaClient;

let client = PlasmaClient::connect("http://127.0.0.1:8080").await?;

// Create VM
let vm = client.create_vm(CreateVmRequest {
    name: "my-vm".into(),
    org_id: "org-1".into(),
    project_id: "proj-1".into(),
    spec: VmSpec {
        cpu: CpuSpec { vcpus: 2, ..Default::default() },
        memory: MemorySpec { size_mib: 2048, ..Default::default() },
        disks: vec![DiskSpec {
            source: DiskSource::Image { id: "ubuntu-22.04".into() },
            size_gib: 20,
            ..Default::default()
        }],
        network: vec![NetworkSpec {
            network_id: "default".into(),
            ..Default::default()
        }],
        ..Default::default()
    },
    hypervisor: HypervisorType::Kvm,
    ..Default::default()
}).await?;

// Start VM
client.start_vm(vm.id).await?;

// Watch events
let mut stream = client.watch_vm(vm.id).await?;
while let Some(event) = stream.next().await {
    println!("Event: {:?}", event);
}
```

## 6. Scheduling

### 6.1 Scheduler

```rust
pub struct Scheduler {
    node_cache: Arc,
    filters: Vec<Box<dyn ScheduleFilter>>,
    scorers: Vec<Box<dyn ScheduleScorer>>,
}

impl Scheduler {
    pub async fn schedule(&self, vm: &VirtualMachine) -> Result<NodeId> {
        let candidates = self.node_cache.ready_nodes();

        // Filter phase: keep nodes that pass every filter
        let filtered: Vec<_> = candidates
            .into_iter()
            .filter(|n| self.filters.iter().all(|f| f.filter(vm, n)))
            .collect();

        if filtered.is_empty() {
            return Err(Error::NoSuitableNode);
        }

        // Score phase: sum the scores of all scorers per node
        let scored: Vec<_> = filtered
            .into_iter()
            .map(|n| {
                let score: i64 = self.scorers.iter().map(|s| s.score(vm, &n)).sum();
                (n, score)
            })
            .collect();

        // Select highest score; `scored` is non-empty here, so unwrap cannot panic
        let (node, _) = scored.into_iter().max_by_key(|(_, s)| *s).unwrap();
        Ok(node.id)
    }
}
```

### 6.2 Filters

```rust
pub trait ScheduleFilter: Send + Sync {
    fn name(&self) -> &'static str;
    fn filter(&self, vm: &VirtualMachine, node: &Node) -> bool;
}

// Built-in filters
struct ResourceFilter;      // CPU/memory fits
struct HypervisorFilter;    // Node supports hypervisor type
struct TaintFilter;         // Toleration matching
struct AffinityFilter;      // Node affinity rules
struct AntiAffinityFilter;  // VM anti-affinity
```

### 6.3 Scorers

```rust
pub trait ScheduleScorer: Send + Sync {
    fn name(&self) -> &'static str;
    fn score(&self, vm: &VirtualMachine, node: &Node) -> i64;
}

// Built-in scorers
struct LeastAllocatedScorer;    // Prefer less loaded nodes
struct BalancedResourceScorer;  // Balance CPU/memory ratio
struct LocalityScorer;          // Prefer same zone/rack
```

## 7. Multi-Tenancy

### 7.1 Resource Hierarchy

```
System (platform operators)
  └─ Organization (tenant boundary)
      └─ Project (workload isolation)
          └─ Resources (VMs, images, networks)
```

### 7.2 Scoped Resources

```rust
// All resources include scope identifiers
pub trait Scoped {
    fn org_id(&self) -> &str;
    fn project_id(&self) -> &str;
}

// Resource paths follow the aegis pattern
// org/{org_id}/project/{project_id}/vm/{vm_id}
// org/{org_id}/project/{project_id}/image/{image_id}
```

### 7.3 Quotas

```rust
pub struct Quota {
    pub scope: Scope,  // Org or Project
    pub limits: ResourceLimits,
    pub usage: ResourceUsage,
}

pub struct ResourceLimits {
    pub max_vms: Option<u32>,
    pub max_vcpus: Option<u32>,
    pub max_memory_gib: Option<u64>,
    pub max_storage_gib: Option<u64>,
    pub max_images: Option<u32>,
}
```

### 7.4 Namespace Isolation

- **Compute**: VMs scoped to project, nodes shared across orgs
- **Network**: Overlay network provides tenant isolation
- **Storage**: Images can be private, shared, or public
- **Naming**: Names unique within project scope

## 8. Storage

### 8.1 State Storage (Chainfire)

```
# VM records
plasmavmc/vms/{org_id}/{project_id}/{vm_id}

# Image records
plasmavmc/images/{org_id}/{image_id}
plasmavmc/images/public/{image_id}

# Node records
plasmavmc/nodes/{node_id}

# Scheduling state
plasmavmc/scheduler/assignments/{vm_id}
plasmavmc/scheduler/pending/{timestamp}/{vm_id}
```

### 8.2 Image Storage

- **Backend**: LightningSTOR (object storage)
- **Format**: Raw, qcow2, vmdk with automatic conversion
- **Caching**: Node-local image cache with pull-through
- **Path**: `images/{org_id}/{image_id}/{version}`

### 8.3 Disk Storage

- **Ephemeral**: Local SSD/NVMe on node
- **Persistent**: LightningSTOR volumes (via CSI)
- **Snapshot**: Copy-on-write via backend

## 9. Configuration

### 9.1 Control Plane Config (TOML)

```toml
[server]
addr = "0.0.0.0:8080"

[server.tls]
cert_file = "/etc/plasmavmc/tls/server.crt"
key_file = "/etc/plasmavmc/tls/server.key"
ca_file = "/etc/plasmavmc/tls/ca.crt"

[store]
backend = "chainfire"
chainfire_endpoints = ["http://chainfire-1:2379", "http://chainfire-2:2379"]

[iam]
endpoint = "http://aegis:9090"
service_account = "plasmavmc-controller"
token_path = "/var/run/secrets/iam/token"

[scheduler]
default_hypervisor = "kvm"

[image_store]
backend = "lightningstor"
endpoint = "http://lightningstor:9000"
bucket = "vm-images"

[logging]
level = "info"
format = "json"
```

### 9.2 Agent Config (TOML)

```toml
[agent]
node_id = "node-001"
control_plane = "http://plasmavmc-api:8080"

[agent.tls]
cert_file = "/etc/plasmavmc/tls/agent.crt"
key_file = "/etc/plasmavmc/tls/agent.key"
ca_file = "/etc/plasmavmc/tls/ca.crt"

[hypervisors]
enabled = ["kvm", "firecracker"]

[hypervisors.kvm]
qemu_path = "/usr/bin/qemu-system-x86_64"
runtime_dir = "/var/run/plasmavmc/kvm"

[hypervisors.firecracker]
fc_path = "/usr/bin/firecracker"
jailer_path = "/usr/bin/jailer"
runtime_dir = "/var/run/plasmavmc/fc"

[storage]
image_cache_dir = "/var/lib/plasmavmc/images"
runtime_dir = "/var/lib/plasmavmc/vms"
cache_size_gib = 100

[network]
overlay_endpoint = "http://ovn-controller:6641"
bridge_name = "plasmavmc0"

[logging]
level = "info"
format = "json"
```

### 9.3 Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `PLASMAVMC_CONFIG` | - | Config file path |
| `PLASMAVMC_ADDR` | `0.0.0.0:8080` | API listen address |
| `PLASMAVMC_LOG_LEVEL` | `info` | Log level |
| `PLASMAVMC_NODE_ID` | - | Agent node identifier |

### 9.4 CLI Arguments

```
plasmavmc-server [OPTIONS]
  -c, --config         Config file path
  -a, --addr           Listen address
  -l, --log-level      Log level
  -h, --help           Print help
  -V, --version        Print version

plasmavmc-agent [OPTIONS]
  -c, --config         Config file path
  -n, --node-id        Node identifier
      --control-plane  Control plane endpoint
  -h, --help           Print help
```

## 10. Integration

### 10.1 Aegis (IAM)

```rust
// Authorization check before VM operations
async fn authorize_vm_action(
    iam: &IamClient,
    principal: &PrincipalRef,
    action: &str,
    vm: &VirtualMachine,
) -> Result<()> {
    let resource = ResourceRef {
        kind: "vm".into(),
        id: vm.id.to_string(),
        org_id: vm.org_id.clone(),
        project_id: vm.project_id.clone(),
        ..Default::default()
    };

    let allowed = iam.authorize(principal, action, &resource).await?;
    if !allowed {
        return Err(Error::PermissionDenied);
    }
    Ok(())
}

// Action patterns
// plasmavmc:vms:create
// plasmavmc:vms:get
// plasmavmc:vms:list
// plasmavmc:vms:update
// plasmavmc:vms:delete
// plasmavmc:vms:start
// plasmavmc:vms:stop
// plasmavmc:vms:console
// plasmavmc:images:create
// plasmavmc:images:get
// plasmavmc:images:delete
```

### 10.2 Overlay Network

```rust
// Network integration for VM NICs
#[async_trait]
pub trait NetworkProvider: Send + Sync {
    /// Allocate port for VM NIC
    async fn create_port(&self, req: CreatePortRequest) -> Result<Port>;

    /// Release port
    async fn delete_port(&self, port_id: &str) -> Result<()>;

    /// Get port details (MAC, IP, security groups)
    async fn get_port(&self, port_id: &str) -> Result<Port>;

    /// Update security groups
    async fn update_security_groups(
        &self,
        port_id: &str,
        groups: Vec<String>,
    ) -> Result<()>;
}

pub struct Port {
    pub id: String,
    pub network_id: String,
    pub mac_address: String,
    pub ip_addresses: Vec<String>,
    pub security_groups: Vec<String>,
    pub tap_device: String,  // Host tap device name
}
```

### 10.3 Chainfire (State)

```rust
// Watch for VM changes (controller pattern)
async fn reconcile_loop(chainfire: &ChainfireClient) -> Result<()> {
    let mut watch = chainfire
        .watch_prefix("plasmavmc/vms/")
        .await?;

    while let Some(event) = watch.next().await {
        // `Put` and `Delete` are assumed to be imported variants of the event type enum
        match event.event_type {
            Put => reconcile_vm(event.kv).await?,
            Delete => cleanup_vm(event.kv).await?,
        }
    }
    Ok(())
}
```

## 11. Security

### 11.1 Authentication

- **Control Plane**: mTLS + aegis tokens
- **Agent**: mTLS with node certificate
- **Console**: WebSocket with aegis token

### 11.2 Authorization

- Integrated with aegis (IAM)
- Action-based permissions
- Scope enforcement (org/project)

### 11.3 VM Isolation

- **Process**: Hypervisor process per VM
- **Filesystem**: Seccomp, namespaces, chroot (Firecracker jailer)
- **Network**: Overlay network tenant isolation
- **Resources**: cgroups for CPU/memory limits

### 11.4 Image Security

- Checksum verification on download
- Signature verification (optional)
- Content scanning integration point

## 12. Operations

### 12.1 Deployment

**Single Node (Development)**

```bash
# Start control plane
plasmavmc-server --config config.toml

# Start agent on same node
plasmavmc-agent --config agent.toml --node-id dev-node
```

**Production Cluster**

```bash
# Control plane (3 instances for HA)
plasmavmc-server --config config.toml

# Agents (each compute node)
plasmavmc-agent --config agent.toml --node-id node-$(hostname)
```

### 12.2 Monitoring

**Metrics (Prometheus)**

| Metric | Type | Description |
|--------|------|-------------|
| `plasmavmc_vms_total` | Gauge | Total VMs by state |
| `plasmavmc_vm_operations_total` | Counter | Operations by type |
| `plasmavmc_vm_boot_seconds` | Histogram | VM boot time |
| `plasmavmc_node_capacity_vcpus` | Gauge | Node vCPU capacity |
| `plasmavmc_node_allocated_vcpus` | Gauge | Allocated vCPUs |
| `plasmavmc_scheduler_latency_seconds` | Histogram | Scheduling latency |
| `plasmavmc_agent_heartbeat_age_seconds` | Gauge | Time since heartbeat |

**Health Endpoints**

- `GET /health` - Liveness
- `GET /ready` - Readiness (chainfire connected, agents online)

### 12.3 Backup & Recovery

- **State**: Chainfire handles via Raft snapshots
- **Images**: LightningSTOR replication
- **VM Disks**: Volume snapshots via storage backend

## 13. Compatibility

### 13.1 API Versioning

- gRPC package: `plasmavmc.v1`
- Semantic versioning
- Backward compatible within major version

### 13.2 Hypervisor Versions

| Backend | Minimum Version | Notes |
|---------|-----------------|-------|
| QEMU/KVM | 6.0 | QMP protocol |
| Firecracker | 1.0 | API v1 |
| mvisor | TBD | |

## Appendix

### A. Error Codes

| Error | Meaning |
|-------|---------|
| VM_NOT_FOUND | VM does not exist |
| IMAGE_NOT_FOUND | Image does not exist |
| NODE_NOT_FOUND | Node does not exist |
| NO_SUITABLE_NODE | Scheduling failed |
| QUOTA_EXCEEDED | Resource quota exceeded |
| HYPERVISOR_ERROR | Backend operation failed |
| INVALID_STATE | Operation invalid for current state |

### B. Port Assignments

| Port | Protocol | Purpose |
|------|----------|---------|
| 8080 | gRPC | Control plane API |
| 8081 | HTTP | Metrics/health |
| 8082 | gRPC | Agent internal API |

### C. Glossary

- **VM**: Virtual Machine - an isolated compute instance
- **Hypervisor**: Software that creates and runs VMs (KVM, Firecracker)
- **Image**: Bootable disk image template
- **Node**: Physical/virtual host running VMs
- **Agent**: Daemon running on each node managing local VMs
- **Scheduler**: Component selecting nodes for VM placement
- **Overlay Network**: Virtual network providing tenant isolation

### D. Backend Comparison

| Feature | KVM/QEMU | Firecracker | mvisor |
|---------|----------|-------------|--------|
| Boot time | ~5s | <125ms | TBD |
| Memory overhead | Medium | Low | Low |
| Device support | Full | Limited | Limited |
| Live migration | Yes | No | No |
| VNC console | Yes | No | No |
| GPU passthrough | Yes | No | No |
| Nested virt | Yes | No | No |
| Best for | General | Serverless | TBD |
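
The yes/no rows of the comparison table above could be encoded as data, so that backend selection checks a VM's requirements against them. The following is a minimal sketch rather than the actual implementation: `Features`, `features_of`, `satisfies`, and `candidates` are hypothetical names introduced for illustration, `HypervisorType` is a trimmed-down copy of the enum in section 4.2, and the field names are borrowed from `BackendCapabilities` in section 4.4.

```rust
/// Hypothetical, trimmed-down mirror of `HypervisorType` from section 4.2.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HypervisorType {
    Kvm,
    Firecracker,
    Mvisor,
}

/// The four yes/no rows of the Appendix D table. The struct name is an
/// assumption; field names follow `BackendCapabilities` in section 4.4.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Features {
    pub live_migration: bool,
    pub vnc_console: bool,
    pub gpu_passthrough: bool,
    pub nested_virtualization: bool,
}

/// Features offered by each backend, transcribed from the table.
pub fn features_of(typ: HypervisorType) -> Features {
    match typ {
        HypervisorType::Kvm => Features {
            live_migration: true,
            vnc_console: true,
            gpu_passthrough: true,
            nested_virtualization: true,
        },
        // The table lists "No" in all four rows for both lightweight backends.
        HypervisorType::Firecracker | HypervisorType::Mvisor => Features::default(),
    }
}

/// True when every feature requested in `required` is offered by `typ`.
pub fn satisfies(typ: HypervisorType, required: Features) -> bool {
    let have = features_of(typ);
    (!required.live_migration || have.live_migration)
        && (!required.vnc_console || have.vnc_console)
        && (!required.gpu_passthrough || have.gpu_passthrough)
        && (!required.nested_virtualization || have.nested_virtualization)
}

/// Backends able to run a VM with the given requirements, in enum order.
pub fn candidates(required: Features) -> Vec<HypervisorType> {
    [
        HypervisorType::Kvm,
        HypervisorType::Firecracker,
        HypervisorType::Mvisor,
    ]
    .iter()
    .copied()
    .filter(|t| satisfies(*t, required))
    .collect()
}
```

With this encoding, a request that needs a VNC console, `Features { vnc_console: true, ..Features::default() }`, leaves only `HypervisorType::Kvm` as a candidate, while `Features::default()` (no special requirements) keeps all three backends eligible. A check of this kind would sit naturally inside `HypervisorBackend::supports` or a `HypervisorFilter`, though where it belongs is a design decision this sketch does not settle.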