PlasmaVMC Specification
Version: 1.0 | Status: Draft | Last Updated: 2025-12-08
1. Overview
1.1 Purpose
PlasmaVMC is a virtual machine control platform providing unified management across multiple hypervisor backends. It abstracts hypervisor-specific implementations behind trait-based interfaces, enabling consistent VM lifecycle management regardless of the underlying virtualization technology.
The name "Plasma" reflects its role as the energized medium that powers virtual machines, with "VMC" denoting Virtual Machine Controller.
1.2 Scope
- In scope: VM lifecycle (create, start, stop, delete), hypervisor abstraction (KVM, FireCracker, mvisor), image management, resource allocation (CPU, memory, storage, network), multi-tenant isolation, console/serial access, live migration (future)
- Out of scope: Container orchestration (Kubernetes), bare metal provisioning, storage backend implementation (uses LightningSTOR), network fabric (uses overlay network)
1.3 Design Goals
- Hypervisor agnostic: Trait-based abstraction supporting KVM, FireCracker, mvisor
- AWS EC2 / GCP Compute Engine-like UX: Familiar concepts for cloud users
- Multi-tenant from day one: Full org/project hierarchy with resource isolation
- High density: Support thousands of VMs per node
- Fast boot: Sub-second boot times with FireCracker/microVMs
- Observable: Rich metrics, events, and audit logging
2. Architecture
2.1 Crate Structure
plasmavmc/
├── crates/
│ ├── plasmavmc-api/ # gRPC service implementations
│ ├── plasmavmc-client/ # Rust client library
│ ├── plasmavmc-core/ # Core orchestration logic
│ ├── plasmavmc-hypervisor/ # Hypervisor trait + registry
│ ├── plasmavmc-kvm/ # KVM/QEMU backend
│ ├── plasmavmc-firecracker/ # FireCracker backend
│ ├── plasmavmc-mvisor/ # mvisor backend
│ ├── plasmavmc-server/ # Control plane server
│ ├── plasmavmc-agent/ # Node agent binary
│ ├── plasmavmc-storage/ # Image/disk management
│ └── plasmavmc-types/ # Shared types
└── proto/
├── plasmavmc.proto # Public API
└── agent.proto # Agent internal RPCs
2.2 Component Topology
┌─────────────────────────────────────────────────────────────────┐
│ Control Plane │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ plasmavmc-api │ │ plasmavmc-core │ │plasmavmc-storage│ │
│ │ (gRPC svc) │──│ (scheduler) │──│ (image mgmt) │ │
│ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘ │
│ │ │ │ │
│ └────────────────────┼────────────────────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ Chainfire │ │
│ │ (state) │ │
│ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
┌────────────┼────────────┐
▼ ▼ ▼
┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐
│ Node 1 │ │ Node 2 │ │ Node N │
│ ┌─────────────────┐ │ │ ┌─────────────────┐ │ │ ┌─────────────────┐ │
│ │ plasmavmc-agent │ │ │ │ plasmavmc-agent │ │ │ │ plasmavmc-agent │ │
│ └────────┬────────┘ │ │ └────────┬────────┘ │ │ └────────┬────────┘ │
│ │ │ │ │ │ │ │ │
│ ┌────────▼────────┐ │ │ ┌────────▼────────┐ │ │ ┌────────▼────────┐ │
│ │HypervisorBackend│ │ │ │HypervisorBackend│ │ │ │HypervisorBackend│ │
│ │ (KVM/FC/mvisor) │ │ │ │ (KVM/FC/mvisor) │ │ │ │ (KVM/FC/mvisor) │ │
│ └─────────────────┘ │ │ └─────────────────┘ │ │ └─────────────────┘ │
└─────────────────────┘ └─────────────────────┘ └─────────────────────┘
2.3 Data Flow
[Client gRPC] → [API Layer] → [Scheduler] → [Agent gRPC] → [Hypervisor]
│ │
▼ ▼
[Chainfire] [Node Selection]
(VM state) (capacity, affinity)
2.4 Dependencies
| Crate | Version | Purpose |
|---|---|---|
| tokio | 1.x | Async runtime |
| tonic | 0.12 | gRPC framework |
| prost | 0.13 | Protocol buffers |
| uuid | 1.x | VM/resource identifiers |
| dashmap | 6.x | Concurrent state caches |
| nix | 0.29 | Linux system calls |
3. Core Concepts
3.1 Virtual Machine (VM)
The primary managed resource representing a virtual machine instance.
pub struct VirtualMachine {
pub id: VmId, // UUID
pub name: String, // User-defined name
pub org_id: String, // Organization owner
pub project_id: String, // Project owner
pub state: VmState, // Current state
pub spec: VmSpec, // Desired configuration
pub status: VmStatus, // Runtime status
pub node_id: Option<NodeId>, // Assigned node
pub hypervisor: HypervisorType, // Backend type
pub created_at: u64,
pub updated_at: u64,
pub created_by: String, // Principal ID
pub metadata: HashMap<String, String>,
pub labels: HashMap<String, String>,
}
pub struct VmSpec {
pub cpu: CpuSpec,
pub memory: MemorySpec,
pub disks: Vec<DiskSpec>,
pub network: Vec<NetworkSpec>,
pub boot: BootSpec,
pub security: SecuritySpec,
}
pub struct CpuSpec {
pub vcpus: u32, // Number of vCPUs
pub cores_per_socket: u32, // Topology: cores per socket
pub sockets: u32, // Topology: socket count
pub cpu_model: Option<String>, // e.g., "host-passthrough"
}
pub struct MemorySpec {
pub size_mib: u64, // Memory size in MiB
pub hugepages: bool, // Use huge pages
pub numa_nodes: Vec<NumaNode>, // NUMA topology
}
pub struct DiskSpec {
pub id: String, // Disk identifier
pub source: DiskSource, // Image or volume
pub size_gib: u64, // Disk size
pub bus: DiskBus, // virtio, scsi, ide
pub cache: DiskCache, // none, writeback, writethrough
pub boot_index: Option<u32>, // Boot order
}
pub struct NetworkSpec {
pub id: String, // Interface identifier
pub network_id: String, // Overlay network ID
pub mac_address: Option<String>,
pub ip_address: Option<String>,
pub model: NicModel, // virtio-net, e1000
pub security_groups: Vec<String>,
}
3.2 VM State Machine
┌──────────────────────────────────────┐
▼ │
┌─────────┐ create ┌─────────┐ start ┌─────────┐ │
│ PENDING │──────────►│ STOPPED │──────────►│ RUNNING │ │
└─────────┘ └────┬────┘ └────┬────┘ │
│ delete │ │
▼ │ stop │
┌─────────┐ │ │
│ DELETED │◄───────────────┤ │
└─────────┘ │ │
│ reboot │
└───────────┘
Additional states:
CREATING - Provisioning resources
STARTING - Boot in progress
STOPPING - Shutdown in progress
MIGRATING - Live migration in progress
ERROR - Failed state (recoverable)
FAILED - Terminal failure
pub enum VmState {
Pending, // Awaiting scheduling
Creating, // Resources being provisioned
Stopped, // Created but not running
Starting, // Boot in progress
Running, // Active and healthy
Stopping, // Graceful shutdown
Migrating, // Live migration in progress
Error, // Recoverable error
Failed, // Terminal failure
Deleted, // Soft-deleted, pending cleanup
}
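The legal edges in the diagram can be encoded as a guard function the control plane consults before accepting an operation. A minimal sketch follows; the enum is a trimmed copy of section 3.2 and the exact transition set (in particular the error-recovery edges) is illustrative, not spec'd:

```rust
// Illustrative guard over the VM state machine from section 3.2.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum VmState {
    Pending,
    Creating,
    Stopped,
    Starting,
    Running,
    Stopping,
    Migrating,
    Error,
    Failed,
    Deleted,
}

/// Returns true if the transition `from -> to` is legal.
/// The edge set here is derived from the diagram above and is a sketch.
pub fn can_transition(from: VmState, to: VmState) -> bool {
    use VmState::*;
    matches!(
        (from, to),
        (Pending, Creating)
            | (Creating, Stopped)
            | (Stopped, Starting)
            | (Starting, Running)
            | (Running, Stopping)
            | (Stopping, Stopped)
            | (Running, Migrating)
            | (Migrating, Running)
            | (Stopped, Deleted)
            | (Error, Starting) // recoverable error: retry start
            | (Error, Deleted)
    )
}
```

Rejecting illegal transitions up front is what produces the INVALID_STATE error listed in Appendix A.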
3.3 Runtime Status
pub struct VmStatus {
pub actual_state: VmState,
pub host_pid: Option<u32>, // Hypervisor process PID
pub started_at: Option<u64>, // Last boot timestamp
pub ip_addresses: Vec<IpAddress>,
pub resource_usage: ResourceUsage,
pub last_error: Option<String>,
pub conditions: Vec<Condition>,
}
pub struct ResourceUsage {
pub cpu_percent: f64,
pub memory_used_mib: u64,
pub disk_read_bytes: u64,
pub disk_write_bytes: u64,
pub network_rx_bytes: u64,
pub network_tx_bytes: u64,
}
3.4 Image
Bootable disk images for VM creation.
pub struct Image {
pub id: ImageId,
pub name: String,
pub org_id: String, // Owner org (or "system" for public)
pub visibility: Visibility, // Public, Private, Shared
pub source: ImageSource,
pub format: ImageFormat,
pub size_bytes: u64,
pub checksum: String, // SHA256
pub os_type: OsType,
pub os_version: String,
pub architecture: Architecture,
pub min_disk_gib: u32,
pub min_memory_mib: u32,
pub status: ImageStatus,
pub created_at: u64,
pub updated_at: u64,
pub metadata: HashMap<String, String>,
}
pub enum ImageSource {
Url { url: String },
Upload { storage_path: String },
Snapshot { vm_id: VmId, disk_id: String },
}
pub enum ImageFormat {
Raw,
Qcow2,
Vmdk,
Vhd,
}
pub enum Visibility {
Public, // Available to all orgs
Private, // Only owner org
Shared { orgs: Vec<String> },
}
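A visibility check at image-lookup time might look like the following sketch; `visible_to` is an illustrative helper, not part of the spec'd API:

```rust
// Illustrative resolution of image visibility for a requesting org.
pub enum Visibility {
    Public,                        // Available to all orgs
    Private,                       // Only owner org
    Shared { orgs: Vec<String> },  // Owner plus an explicit allow-list
}

/// Can `requester_org` see an image owned by `owner_org`?
pub fn visible_to(vis: &Visibility, owner_org: &str, requester_org: &str) -> bool {
    match vis {
        Visibility::Public => true,
        Visibility::Private => owner_org == requester_org,
        Visibility::Shared { orgs } => {
            owner_org == requester_org || orgs.iter().any(|o| o == requester_org)
        }
    }
}
```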
3.5 Node
Physical or virtual host running the agent.
pub struct Node {
pub id: NodeId,
pub name: String,
pub state: NodeState,
pub capacity: NodeCapacity,
pub allocatable: NodeCapacity,
pub allocated: NodeCapacity,
pub hypervisors: Vec<HypervisorType>, // Supported backends
pub labels: HashMap<String, String>,
pub taints: Vec<Taint>,
pub conditions: Vec<NodeCondition>,
pub agent_version: String,
pub last_heartbeat: u64,
}
pub struct NodeCapacity {
pub vcpus: u32,
pub memory_mib: u64,
pub storage_gib: u64,
}
pub enum NodeState {
Ready,
NotReady,
Cordoned, // No new VMs scheduled
Draining, // Migrating VMs off
Maintenance,
}
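The scheduler's view of a node's remaining headroom is `allocatable` minus `allocated`. A minimal sketch (using saturating subtraction so transient accounting drift cannot underflow; `free_capacity` is an illustrative helper):

```rust
// Remaining schedulable headroom on a node, per section 3.5.
pub struct NodeCapacity {
    pub vcpus: u32,
    pub memory_mib: u64,
    pub storage_gib: u64,
}

/// allocatable - allocated, clamped at zero per dimension.
pub fn free_capacity(allocatable: &NodeCapacity, allocated: &NodeCapacity) -> NodeCapacity {
    NodeCapacity {
        vcpus: allocatable.vcpus.saturating_sub(allocated.vcpus),
        memory_mib: allocatable.memory_mib.saturating_sub(allocated.memory_mib),
        storage_gib: allocatable.storage_gib.saturating_sub(allocated.storage_gib),
    }
}
```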
4. Hypervisor Abstraction
4.1 Backend Trait
#[async_trait]
pub trait HypervisorBackend: Send + Sync {
/// Backend identifier
fn backend_type(&self) -> HypervisorType;
/// Check if this backend supports the given VM spec
fn supports(&self, spec: &VmSpec) -> Result<(), UnsupportedFeature>;
/// Create VM resources (disk, network) without starting
async fn create(&self, vm: &VirtualMachine) -> Result<VmHandle>;
/// Start the VM
async fn start(&self, handle: &VmHandle) -> Result<()>;
/// Stop the VM (graceful shutdown)
async fn stop(&self, handle: &VmHandle, timeout: Duration) -> Result<()>;
/// Force stop the VM
async fn kill(&self, handle: &VmHandle) -> Result<()>;
/// Reboot the VM
async fn reboot(&self, handle: &VmHandle) -> Result<()>;
/// Delete VM and cleanup resources
async fn delete(&self, handle: &VmHandle) -> Result<()>;
/// Get current VM status
async fn status(&self, handle: &VmHandle) -> Result<VmStatus>;
/// Attach a disk to running VM
async fn attach_disk(&self, handle: &VmHandle, disk: &DiskSpec) -> Result<()>;
/// Detach a disk from running VM
async fn detach_disk(&self, handle: &VmHandle, disk_id: &str) -> Result<()>;
/// Attach a network interface
async fn attach_nic(&self, handle: &VmHandle, nic: &NetworkSpec) -> Result<()>;
/// Get console stream (VNC/serial)
async fn console(&self, handle: &VmHandle, console_type: ConsoleType)
-> Result<Box<dyn AsyncReadWrite>>;
/// Take a snapshot
async fn snapshot(&self, handle: &VmHandle, snapshot_id: &str) -> Result<()>;
}
4.2 Hypervisor Types
pub enum HypervisorType {
Kvm, // QEMU/KVM - full-featured
Firecracker, // AWS Firecracker - microVMs
Mvisor, // mvisor - lightweight
}
4.3 Backend Registry
pub struct HypervisorRegistry {
backends: HashMap<HypervisorType, Arc<dyn HypervisorBackend>>,
}
impl HypervisorRegistry {
pub fn register(&mut self, backend: Arc<dyn HypervisorBackend>);
pub fn get(&self, typ: HypervisorType) -> Option<Arc<dyn HypervisorBackend>>;
pub fn available(&self) -> Vec<HypervisorType>;
}
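At VM-create time the agent resolves the requested backend through the registry. The sketch below shows the lookup flow with the trait reduced to a synchronous stub (the real trait in section 4.1 is async); `KvmStub` is a hypothetical placeholder for illustration:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Trimmed, synchronous stand-in for the async trait in section 4.1.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub enum HypervisorType { Kvm, Firecracker, Mvisor }

pub trait HypervisorBackend: Send + Sync {
    fn backend_type(&self) -> HypervisorType;
}

#[derive(Default)]
pub struct HypervisorRegistry {
    backends: HashMap<HypervisorType, Arc<dyn HypervisorBackend>>,
}

impl HypervisorRegistry {
    /// Register a backend under its self-reported type.
    pub fn register(&mut self, backend: Arc<dyn HypervisorBackend>) {
        self.backends.insert(backend.backend_type(), backend);
    }
    pub fn get(&self, typ: HypervisorType) -> Option<Arc<dyn HypervisorBackend>> {
        self.backends.get(&typ).cloned()
    }
    pub fn available(&self) -> Vec<HypervisorType> {
        self.backends.keys().copied().collect()
    }
}

// Hypothetical backend used only to exercise the registry.
pub struct KvmStub;
impl HypervisorBackend for KvmStub {
    fn backend_type(&self) -> HypervisorType { HypervisorType::Kvm }
}
```

Requesting a type that no registered backend provides yields `None`, which the agent can surface as a scheduling or validation error.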
4.4 Backend Capabilities
pub struct BackendCapabilities {
pub live_migration: bool,
pub hot_plug_cpu: bool,
pub hot_plug_memory: bool,
pub hot_plug_disk: bool,
pub hot_plug_nic: bool,
pub vnc_console: bool,
pub serial_console: bool,
pub nested_virtualization: bool,
pub gpu_passthrough: bool,
pub max_vcpus: u32,
pub max_memory_gib: u64,
pub supported_disk_buses: Vec<DiskBus>,
pub supported_nic_models: Vec<NicModel>,
}
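Capabilities let callers pre-flight a VM request before dispatching to a backend. A sketch with a trimmed `BackendCapabilities` (field names follow section 4.4; the check logic and error strings are illustrative):

```rust
// Illustrative pre-flight check of a request against backend capabilities.
pub struct BackendCapabilities {
    pub max_vcpus: u32,
    pub max_memory_gib: u64,
    pub vnc_console: bool,
}

pub fn fits_capabilities(
    caps: &BackendCapabilities,
    vcpus: u32,
    memory_mib: u64,
    wants_vnc: bool,
) -> Result<(), String> {
    if vcpus > caps.max_vcpus {
        return Err(format!("requested {} vCPUs, backend max is {}", vcpus, caps.max_vcpus));
    }
    if memory_mib > caps.max_memory_gib * 1024 {
        return Err("memory exceeds backend maximum".into());
    }
    if wants_vnc && !caps.vnc_console {
        return Err("backend has no VNC console".into());
    }
    Ok(())
}
```

For example, a FireCracker-like backend would report `vnc_console: false`, so any request wanting VNC fails this check before reaching the node.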
4.5 KVM Backend Implementation
// plasmavmc-kvm crate
pub struct KvmBackend {
qemu_path: PathBuf,
runtime_dir: PathBuf,
network_helper: NetworkHelper,
}
#[async_trait]
impl HypervisorBackend for KvmBackend {
fn backend_type(&self) -> HypervisorType {
HypervisorType::Kvm
}
async fn create(&self, vm: &VirtualMachine) -> Result<VmHandle> {
// Generate QEMU command line
// Create runtime directory
// Prepare disks and network devices
todo!()
}
async fn start(&self, handle: &VmHandle) -> Result<()> {
// Launch QEMU process
// Wait for QMP socket
// Configure via QMP
todo!()
}
// ... other methods
}
4.6 FireCracker Backend Implementation
// plasmavmc-firecracker crate
pub struct FirecrackerBackend {
fc_path: PathBuf,
jailer_path: PathBuf,
runtime_dir: PathBuf,
}
#[async_trait]
impl HypervisorBackend for FirecrackerBackend {
fn backend_type(&self) -> HypervisorType {
HypervisorType::Firecracker
}
fn supports(&self, spec: &VmSpec) -> Result<(), UnsupportedFeature> {
// FireCracker limitations:
// - No VNC, only serial
// - No live migration
// - Limited device models
if spec.disks.iter().any(|d| d.bus != DiskBus::Virtio) {
return Err(UnsupportedFeature::DiskBus);
}
Ok(())
}
// ... other methods
}
5. API
5.1 gRPC Services
VM Service (plasmavmc.v1.VmService)
service VmService {
// Lifecycle
rpc CreateVm(CreateVmRequest) returns (VirtualMachine);
rpc GetVm(GetVmRequest) returns (VirtualMachine);
rpc ListVms(ListVmsRequest) returns (ListVmsResponse);
rpc UpdateVm(UpdateVmRequest) returns (VirtualMachine);
rpc DeleteVm(DeleteVmRequest) returns (Empty);
// Power operations
rpc StartVm(StartVmRequest) returns (VirtualMachine);
rpc StopVm(StopVmRequest) returns (VirtualMachine);
rpc RebootVm(RebootVmRequest) returns (VirtualMachine);
rpc ResetVm(ResetVmRequest) returns (VirtualMachine);
// Disks
rpc AttachDisk(AttachDiskRequest) returns (VirtualMachine);
rpc DetachDisk(DetachDiskRequest) returns (VirtualMachine);
// Network
rpc AttachNic(AttachNicRequest) returns (VirtualMachine);
rpc DetachNic(DetachNicRequest) returns (VirtualMachine);
// Console
rpc GetConsole(GetConsoleRequest) returns (stream ConsoleData);
// Events
rpc WatchVm(WatchVmRequest) returns (stream VmEvent);
}
Image Service (plasmavmc.v1.ImageService)
service ImageService {
rpc CreateImage(CreateImageRequest) returns (Image);
rpc GetImage(GetImageRequest) returns (Image);
rpc ListImages(ListImagesRequest) returns (ListImagesResponse);
rpc UpdateImage(UpdateImageRequest) returns (Image);
rpc DeleteImage(DeleteImageRequest) returns (Empty);
// Upload/Download
rpc UploadImage(stream UploadImageRequest) returns (Image);
rpc DownloadImage(DownloadImageRequest) returns (stream DownloadImageResponse);
// Conversion
rpc ConvertImage(ConvertImageRequest) returns (Image);
}
Node Service (plasmavmc.v1.NodeService)
service NodeService {
rpc ListNodes(ListNodesRequest) returns (ListNodesResponse);
rpc GetNode(GetNodeRequest) returns (Node);
rpc CordonNode(CordonNodeRequest) returns (Node);
rpc UncordonNode(UncordonNodeRequest) returns (Node);
rpc DrainNode(DrainNodeRequest) returns (Node);
}
5.2 Agent Internal API (plasmavmc.agent.v1)
service AgentService {
// VM operations (called by control plane)
rpc CreateVm(CreateVmRequest) returns (VmHandle);
rpc StartVm(StartVmRequest) returns (Empty);
rpc StopVm(StopVmRequest) returns (Empty);
rpc DeleteVm(DeleteVmRequest) returns (Empty);
rpc GetVmStatus(GetVmStatusRequest) returns (VmStatus);
// Node status (reported to control plane)
rpc Heartbeat(HeartbeatRequest) returns (HeartbeatResponse);
rpc ReportStatus(ReportStatusRequest) returns (Empty);
}
5.3 Client Library
use plasmavmc_client::PlasmaClient;
let client = PlasmaClient::connect("http://127.0.0.1:8080").await?;
// Create VM
let vm = client.create_vm(CreateVmRequest {
name: "my-vm".into(),
org_id: "org-1".into(),
project_id: "proj-1".into(),
spec: VmSpec {
cpu: CpuSpec { vcpus: 2, ..Default::default() },
memory: MemorySpec { size_mib: 2048, ..Default::default() },
disks: vec![DiskSpec {
source: DiskSource::Image { id: "ubuntu-22.04".into() },
size_gib: 20,
..Default::default()
}],
network: vec![NetworkSpec {
network_id: "default".into(),
..Default::default()
}],
..Default::default()
},
hypervisor: HypervisorType::Kvm,
..Default::default()
}).await?;
// Start VM
client.start_vm(vm.id).await?;
// Watch events
let mut stream = client.watch_vm(vm.id).await?;
while let Some(event) = stream.next().await {
println!("Event: {:?}", event);
}
6. Scheduling
6.1 Scheduler
pub struct Scheduler {
node_cache: Arc<NodeCache>,
filters: Vec<Box<dyn ScheduleFilter>>,
scorers: Vec<Box<dyn ScheduleScorer>>,
}
impl Scheduler {
pub async fn schedule(&self, vm: &VirtualMachine) -> Result<NodeId> {
let candidates = self.node_cache.ready_nodes();
// Filter phase
let filtered: Vec<_> = candidates
.into_iter()
.filter(|n| self.filters.iter().all(|f| f.filter(vm, n)))
.collect();
if filtered.is_empty() {
return Err(Error::NoSuitableNode);
}
// Score phase
let scored: Vec<_> = filtered
.into_iter()
.map(|n| {
let score: i64 = self.scorers.iter().map(|s| s.score(vm, &n)).sum();
(n, score)
})
.collect();
// Select highest score
let (node, _) = scored.into_iter().max_by_key(|(_, s)| *s).unwrap();
Ok(node.id)
}
}
6.2 Filters
pub trait ScheduleFilter: Send + Sync {
fn name(&self) -> &'static str;
fn filter(&self, vm: &VirtualMachine, node: &Node) -> bool;
}
// Built-in filters
struct ResourceFilter; // CPU/memory fits
struct HypervisorFilter; // Node supports hypervisor type
struct TaintFilter; // Toleration matching
struct AffinityFilter; // Node affinity rules
struct AntiAffinityFilter; // VM anti-affinity
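The most fundamental of these, ResourceFilter, admits a node only if the VM's request fits within the node's remaining (allocatable minus allocated) capacity. A sketch with types trimmed from sections 3.1 and 3.5 (`VmRequest` and `NodeView` are illustrative simplifications):

```rust
// Illustrative ResourceFilter: does the request fit the node's free capacity?
pub struct VmRequest {
    pub vcpus: u32,
    pub memory_mib: u64,
}

pub struct NodeView {
    pub allocatable_vcpus: u32,
    pub allocated_vcpus: u32,
    pub allocatable_memory_mib: u64,
    pub allocated_memory_mib: u64,
}

pub fn resource_filter(vm: &VmRequest, node: &NodeView) -> bool {
    let free_vcpus = node.allocatable_vcpus.saturating_sub(node.allocated_vcpus);
    let free_mem = node.allocatable_memory_mib.saturating_sub(node.allocated_memory_mib);
    vm.vcpus <= free_vcpus && vm.memory_mib <= free_mem
}
```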
6.3 Scorers
pub trait ScheduleScorer: Send + Sync {
fn name(&self) -> &'static str;
fn score(&self, vm: &VirtualMachine, node: &Node) -> i64;
}
// Built-in scorers
struct LeastAllocatedScorer; // Prefer less loaded nodes
struct BalancedResourceScorer; // Balance CPU/memory ratio
struct LocalityScorer; // Prefer same zone/rack
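As one concrete example, LeastAllocatedScorer can score a node by its free fraction averaged over CPU and memory, scaled to an integer range so multiple scorers sum cleanly into an i64. The 0..=100 scale and equal weighting here are illustrative choices, not spec'd:

```rust
// Illustrative LeastAllocatedScorer: higher score for emptier nodes.
pub fn least_allocated_score(
    allocated_vcpus: u32,
    allocatable_vcpus: u32,
    allocated_mib: u64,
    allocatable_mib: u64,
) -> i64 {
    // max(1) guards against division by zero on a misreported node.
    let cpu_free = 1.0 - allocated_vcpus as f64 / allocatable_vcpus.max(1) as f64;
    let mem_free = 1.0 - allocated_mib as f64 / allocatable_mib.max(1) as f64;
    ((cpu_free + mem_free) / 2.0 * 100.0).round() as i64
}
```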
7. Multi-Tenancy
7.1 Resource Hierarchy
System (platform operators)
└─ Organization (tenant boundary)
└─ Project (workload isolation)
└─ Resources (VMs, images, networks)
7.2 Scoped Resources
// All resources include scope identifiers
pub trait Scoped {
fn org_id(&self) -> &str;
fn project_id(&self) -> &str;
}
// Resource paths follow aegis pattern
// org/{org_id}/project/{project_id}/vm/{vm_id}
// org/{org_id}/project/{project_id}/image/{image_id}
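Building these paths deterministically in one place keeps authorization checks and storage keys consistent. A sketch (these helpers are illustrative, not a spec'd API):

```rust
// Illustrative builders for aegis-pattern resource paths.
pub fn vm_path(org_id: &str, project_id: &str, vm_id: &str) -> String {
    format!("org/{org_id}/project/{project_id}/vm/{vm_id}")
}

pub fn image_path(org_id: &str, project_id: &str, image_id: &str) -> String {
    format!("org/{org_id}/project/{project_id}/image/{image_id}")
}
```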
7.3 Quotas
pub struct Quota {
pub scope: Scope, // Org or Project
pub limits: ResourceLimits,
pub usage: ResourceUsage,
}
pub struct ResourceLimits {
pub max_vms: Option<u32>,
pub max_vcpus: Option<u32>,
pub max_memory_gib: Option<u64>,
pub max_storage_gib: Option<u64>,
pub max_images: Option<u32>,
}
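An admission-time check rejects a create request if current usage plus the request would exceed any set limit, with `None` meaning unlimited. A sketch over trimmed types (`Limits`, `Usage`, and `check_quota` are illustrative; the error strings echo the QUOTA_EXCEEDED code from Appendix A):

```rust
// Illustrative quota admission check. None = unlimited.
pub struct Limits {
    pub max_vms: Option<u32>,
    pub max_vcpus: Option<u32>,
}

pub struct Usage {
    pub vms: u32,
    pub vcpus: u32,
}

pub fn check_quota(limits: &Limits, usage: &Usage, req_vcpus: u32) -> Result<(), String> {
    if let Some(max) = limits.max_vms {
        if usage.vms + 1 > max {
            return Err("QUOTA_EXCEEDED: max_vms".into());
        }
    }
    if let Some(max) = limits.max_vcpus {
        if usage.vcpus + req_vcpus > max {
            return Err("QUOTA_EXCEEDED: max_vcpus".into());
        }
    }
    Ok(())
}
```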
7.4 Namespace Isolation
- Compute: VMs scoped to project, nodes shared across orgs
- Network: Overlay network provides tenant isolation
- Storage: Images can be private, shared, or public
- Naming: Names unique within project scope
8. Storage
8.1 State Storage (Chainfire)
# VM records
plasmavmc/vms/{org_id}/{project_id}/{vm_id}
# Image records
plasmavmc/images/{org_id}/{image_id}
plasmavmc/images/public/{image_id}
# Node records
plasmavmc/nodes/{node_id}
# Scheduling state
plasmavmc/scheduler/assignments/{vm_id}
plasmavmc/scheduler/pending/{timestamp}/{vm_id}
8.2 Image Storage
- Backend: LightningSTOR (object storage)
- Format: Raw, qcow2, vmdk with automatic conversion
- Caching: Node-local image cache with pull-through
- Path: images/{org_id}/{image_id}/{version}
8.3 Disk Storage
- Ephemeral: Local SSD/NVMe on node
- Persistent: LightningSTOR volumes (via CSI)
- Snapshot: Copy-on-write via backend
9. Configuration
9.1 Control Plane Config (TOML)
[server]
addr = "0.0.0.0:8080"
[server.tls]
cert_file = "/etc/plasmavmc/tls/server.crt"
key_file = "/etc/plasmavmc/tls/server.key"
ca_file = "/etc/plasmavmc/tls/ca.crt"
[store]
backend = "chainfire"
chainfire_endpoints = ["http://chainfire-1:2379", "http://chainfire-2:2379"]
[iam]
endpoint = "http://aegis:9090"
service_account = "plasmavmc-controller"
token_path = "/var/run/secrets/iam/token"
[scheduler]
default_hypervisor = "kvm"
[image_store]
backend = "lightningstor"
endpoint = "http://lightningstor:9000"
bucket = "vm-images"
[logging]
level = "info"
format = "json"
9.2 Agent Config (TOML)
[agent]
node_id = "node-001"
control_plane = "http://plasmavmc-api:8080"
[agent.tls]
cert_file = "/etc/plasmavmc/tls/agent.crt"
key_file = "/etc/plasmavmc/tls/agent.key"
ca_file = "/etc/plasmavmc/tls/ca.crt"
[hypervisors]
enabled = ["kvm", "firecracker"]
[hypervisors.kvm]
qemu_path = "/usr/bin/qemu-system-x86_64"
runtime_dir = "/var/run/plasmavmc/kvm"
[hypervisors.firecracker]
fc_path = "/usr/bin/firecracker"
jailer_path = "/usr/bin/jailer"
runtime_dir = "/var/run/plasmavmc/fc"
[storage]
image_cache_dir = "/var/lib/plasmavmc/images"
runtime_dir = "/var/lib/plasmavmc/vms"
cache_size_gib = 100
[network]
overlay_endpoint = "http://ovn-controller:6641"
bridge_name = "plasmavmc0"
[logging]
level = "info"
format = "json"
9.3 Environment Variables
| Variable | Default | Description |
|---|---|---|
| PLASMAVMC_CONFIG | - | Config file path |
| PLASMAVMC_ADDR | 0.0.0.0:8080 | API listen address |
| PLASMAVMC_LOG_LEVEL | info | Log level |
| PLASMAVMC_NODE_ID | - | Agent node identifier |
9.4 CLI Arguments
plasmavmc-server [OPTIONS]
-c, --config <PATH> Config file path
-a, --addr <ADDR> Listen address
-l, --log-level <LEVEL> Log level
-h, --help Print help
-V, --version Print version
plasmavmc-agent [OPTIONS]
-c, --config <PATH> Config file path
-n, --node-id <ID> Node identifier
--control-plane <URL> Control plane endpoint
-h, --help Print help
10. Integration
10.1 Aegis (IAM)
// Authorization check before VM operations
async fn authorize_vm_action(
iam: &IamClient,
principal: &PrincipalRef,
action: &str,
vm: &VirtualMachine,
) -> Result<()> {
let resource = ResourceRef {
kind: "vm".into(),
id: vm.id.to_string(),
org_id: vm.org_id.clone(),
project_id: vm.project_id.clone(),
..Default::default()
};
let allowed = iam.authorize(principal, action, &resource).await?;
if !allowed {
return Err(Error::PermissionDenied);
}
Ok(())
}
// Action patterns
// plasmavmc:vms:create
// plasmavmc:vms:get
// plasmavmc:vms:list
// plasmavmc:vms:update
// plasmavmc:vms:delete
// plasmavmc:vms:start
// plasmavmc:vms:stop
// plasmavmc:vms:console
// plasmavmc:images:create
// plasmavmc:images:get
// plasmavmc:images:delete
10.2 Overlay Network
// Network integration for VM NICs
pub trait NetworkProvider: Send + Sync {
/// Allocate port for VM NIC
async fn create_port(&self, req: CreatePortRequest) -> Result<Port>;
/// Release port
async fn delete_port(&self, port_id: &str) -> Result<()>;
/// Get port details (MAC, IP, security groups)
async fn get_port(&self, port_id: &str) -> Result<Port>;
/// Update security groups
async fn update_security_groups(
&self,
port_id: &str,
groups: Vec<String>,
) -> Result<()>;
}
pub struct Port {
pub id: String,
pub network_id: String,
pub mac_address: String,
pub ip_addresses: Vec<IpAddress>,
pub security_groups: Vec<String>,
pub tap_device: String, // Host tap device name
}
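On the node, the KVM backend wires a Port's tap device and MAC into the hypervisor command line. The sketch below shows the mapping using common QEMU `-netdev`/`-device` flags; the flag layout is illustrative of typical QEMU usage, not the exact command line plasmavmc-kvm produces:

```rust
// Illustrative translation of an overlay Port into QEMU NIC arguments.
pub struct Port {
    pub id: String,
    pub mac_address: String,
    pub tap_device: String, // Host tap device name
}

pub fn qemu_nic_args(port: &Port) -> Vec<String> {
    vec![
        "-netdev".into(),
        format!(
            "tap,id=net-{},ifname={},script=no,downscript=no",
            port.id, port.tap_device
        ),
        "-device".into(),
        format!(
            "virtio-net-pci,netdev=net-{},mac={}",
            port.id, port.mac_address
        ),
    ]
}
```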
10.3 Chainfire (State)
// Watch for VM changes (controller pattern)
async fn reconcile_loop(chainfire: &ChainfireClient) -> Result<()> {
let mut watch = chainfire
.watch_prefix("plasmavmc/vms/")
.await?;
while let Some(event) = watch.next().await {
match event.event_type {
EventType::Put => reconcile_vm(event.kv).await?,
EventType::Delete => cleanup_vm(event.kv).await?,
}
}
Ok(())
}
11. Security
11.1 Authentication
- Control Plane: mTLS + aegis tokens
- Agent: mTLS with node certificate
- Console: WebSocket with aegis token
11.2 Authorization
- Integrated with aegis (IAM)
- Action-based permissions
- Scope enforcement (org/project)
11.3 VM Isolation
- Process: Hypervisor process per VM
- Filesystem: Seccomp, namespaces, chroot (FireCracker jailer)
- Network: Overlay network tenant isolation
- Resources: cgroups for CPU/memory limits
11.4 Image Security
- Checksum verification on download
- Signature verification (optional)
- Content scanning integration point
12. Operations
12.1 Deployment
Single Node (Development)
# Start control plane
plasmavmc-server --config config.toml
# Start agent on same node
plasmavmc-agent --config agent.toml --node-id dev-node
Production Cluster
# Control plane (3 instances for HA)
plasmavmc-server --config config.toml
# Agents (each compute node)
plasmavmc-agent --config agent.toml --node-id node-$(hostname)
12.2 Monitoring
Metrics (Prometheus)
| Metric | Type | Description |
|---|---|---|
| plasmavmc_vms_total | Gauge | Total VMs by state |
| plasmavmc_vm_operations_total | Counter | Operations by type |
| plasmavmc_vm_boot_seconds | Histogram | VM boot time |
| plasmavmc_node_capacity_vcpus | Gauge | Node vCPU capacity |
| plasmavmc_node_allocated_vcpus | Gauge | Allocated vCPUs |
| plasmavmc_scheduler_latency_seconds | Histogram | Scheduling latency |
| plasmavmc_agent_heartbeat_age_seconds | Gauge | Time since heartbeat |
Health Endpoints
GET /health - Liveness
GET /ready - Readiness (chainfire connected, agents online)
12.3 Backup & Recovery
- State: Chainfire handles via Raft snapshots
- Images: LightningSTOR replication
- VM Disks: Volume snapshots via storage backend
13. Compatibility
13.1 API Versioning
- gRPC package: plasmavmc.v1
- Semantic versioning
- Backward compatible within major version
13.2 Hypervisor Versions
| Backend | Minimum Version | Notes |
|---|---|---|
| QEMU/KVM | 6.0 | QMP protocol |
| FireCracker | 1.0 | API v1 |
| mvisor | TBD | - |
Appendix
A. Error Codes
| Error | Meaning |
|---|---|
| VM_NOT_FOUND | VM does not exist |
| IMAGE_NOT_FOUND | Image does not exist |
| NODE_NOT_FOUND | Node does not exist |
| NO_SUITABLE_NODE | Scheduling failed |
| QUOTA_EXCEEDED | Resource quota exceeded |
| HYPERVISOR_ERROR | Backend operation failed |
| INVALID_STATE | Operation invalid for current state |
B. Port Assignments
| Port | Protocol | Purpose |
|---|---|---|
| 8080 | gRPC | Control plane API |
| 8081 | HTTP | Metrics/health |
| 8082 | gRPC | Agent internal API |
C. Glossary
- VM: Virtual Machine - an isolated compute instance
- Hypervisor: Software that creates and runs VMs (KVM, FireCracker)
- Image: Bootable disk image template
- Node: Physical/virtual host running VMs
- Agent: Daemon running on each node managing local VMs
- Scheduler: Component selecting nodes for VM placement
- Overlay Network: Virtual network providing tenant isolation
D. Backend Comparison
| Feature | KVM/QEMU | FireCracker | mvisor |
|---|---|---|---|
| Boot time | ~5s | <125ms | TBD |
| Memory overhead | Medium | Low | Low |
| Device support | Full | Limited | Limited |
| Live migration | Yes | No | No |
| VNC console | Yes | No | No |
| GPU passthrough | Yes | No | No |
| Nested virt | Yes | No | No |
| Best for | General | Serverless | TBD |