# PlasmaVMC Specification

> Version: 1.0 | Status: Draft | Last Updated: 2025-12-08

## 1. Overview

### 1.1 Purpose

PlasmaVMC is a virtual machine control platform providing unified management across multiple hypervisor backends. It abstracts hypervisor-specific implementations behind trait-based interfaces, enabling consistent VM lifecycle management regardless of the underlying virtualization technology.

The name "Plasma" reflects its role as the energized medium that powers virtual machines, with "VMC" denoting Virtual Machine Controller.

### 1.2 Scope

- **In scope**: VM lifecycle (create, start, stop, delete), hypervisor abstraction (KVM, Firecracker, mvisor), image management, resource allocation (CPU, memory, storage, network), multi-tenant isolation, console/serial access, live migration (future)
- **Out of scope**: Container orchestration (Kubernetes), bare metal provisioning, storage backend implementation (uses LightningSTOR), network fabric (uses overlay network)

### 1.3 Design Goals

- **Hypervisor agnostic**: Trait-based abstraction supporting KVM, Firecracker, mvisor
- **EC2-like UX**: Familiar concepts for users of AWS and GCP compute services
- **Multi-tenant from day one**: Full org/project hierarchy with resource isolation
- **High density**: Support thousands of VMs per node
- **Fast boot**: Sub-second boot times with Firecracker/microVMs
- **Observable**: Rich metrics, events, and audit logging

## 2. Architecture

### 2.1 Crate Structure

```
plasmavmc/
├── crates/
│   ├── plasmavmc-api/          # gRPC service implementations
│   ├── plasmavmc-client/       # Rust client library
│   ├── plasmavmc-core/         # Core orchestration logic
│   ├── plasmavmc-hypervisor/   # Hypervisor trait + registry
│   ├── plasmavmc-kvm/          # KVM/QEMU backend
│   ├── plasmavmc-firecracker/  # Firecracker backend
│   ├── plasmavmc-mvisor/       # mvisor backend
│   ├── plasmavmc-server/       # Control plane server
│   ├── plasmavmc-agent/        # Node agent binary
│   ├── plasmavmc-storage/      # Image/disk management
│   └── plasmavmc-types/        # Shared types
└── proto/
    ├── plasmavmc.proto         # Public API
    └── agent.proto             # Agent internal RPCs
```

### 2.2 Component Topology

```
  ┌─────────────────────────────────────────────────────────────────┐
  │                          Control Plane                          │
  │  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
  │  │  plasmavmc-api  │  │ plasmavmc-core  │  │plasmavmc-storage│  │
  │  │   (gRPC svc)    │──│   (scheduler)   │──│  (image mgmt)   │  │
  │  └────────┬────────┘  └────────┬────────┘  └────────┬────────┘  │
  │           │                    │                    │           │
  │           └────────────────────┼────────────────────┘           │
  │                                │                                │
  │                         ┌──────▼──────┐                         │
  │                         │  Chainfire  │                         │
  │                         │   (state)   │                         │
  │                         └─────────────┘                         │
  └─────────────────────────────────────────────────────────────────┘
                                   │
           ┌───────────────────────┼───────────────────────┐
           ▼                       ▼                       ▼
┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐
│       Node 1        │ │       Node 2        │ │       Node N        │
│ ┌─────────────────┐ │ │ ┌─────────────────┐ │ │ ┌─────────────────┐ │
│ │ plasmavmc-agent │ │ │ │ plasmavmc-agent │ │ │ │ plasmavmc-agent │ │
│ └────────┬────────┘ │ │ └────────┬────────┘ │ │ └────────┬────────┘ │
│          │          │ │          │          │ │          │          │
│ ┌────────▼────────┐ │ │ ┌────────▼────────┐ │ │ ┌────────▼────────┐ │
│ │HypervisorBackend│ │ │ │HypervisorBackend│ │ │ │HypervisorBackend│ │
│ │ (KVM/FC/mvisor) │ │ │ │ (KVM/FC/mvisor) │ │ │ │ (KVM/FC/mvisor) │ │
│ └─────────────────┘ │ │ └─────────────────┘ │ │ └─────────────────┘ │
└─────────────────────┘ └─────────────────────┘ └─────────────────────┘
```

### 2.3 Data Flow

```
[Client gRPC] → [API Layer] → [Scheduler] → [Agent gRPC] → [Hypervisor]
                     │              │
                     ▼              ▼
                [Chainfire]  [Node Selection]
                (VM state)   (capacity, affinity)
```

### 2.4 Dependencies

| Crate | Version | Purpose |
|-------|---------|---------|
| tokio | 1.x | Async runtime |
| tonic | 0.12 | gRPC framework |
| prost | 0.13 | Protocol buffers |
| uuid | 1.x | VM/resource identifiers |
| dashmap | 6.x | Concurrent state caches |
| nix | 0.29 | Linux system calls |

## 3. Core Concepts

### 3.1 Virtual Machine (VM)

The primary managed resource representing a virtual machine instance.

```rust
pub struct VirtualMachine {
    pub id: VmId,                     // UUID
    pub name: String,                 // User-defined name
    pub org_id: String,               // Organization owner
    pub project_id: String,           // Project owner
    pub state: VmState,               // Current state
    pub spec: VmSpec,                 // Desired configuration
    pub status: VmStatus,             // Runtime status
    pub node_id: Option<NodeId>,      // Assigned node
    pub hypervisor: HypervisorType,   // Backend type
    pub created_at: u64,
    pub updated_at: u64,
    pub created_by: String,           // Principal ID
    pub metadata: HashMap<String, String>,
    pub labels: HashMap<String, String>,
}

pub struct VmSpec {
    pub cpu: CpuSpec,
    pub memory: MemorySpec,
    pub disks: Vec<DiskSpec>,
    pub network: Vec<NetworkSpec>,
    pub boot: BootSpec,
    pub security: SecuritySpec,
}

pub struct CpuSpec {
    pub vcpus: u32,                   // Number of vCPUs
    pub cores_per_socket: u32,        // Topology: cores per socket
    pub sockets: u32,                 // Topology: socket count
    pub cpu_model: Option<String>,    // e.g., "host-passthrough"
}

pub struct MemorySpec {
    pub size_mib: u64,                // Memory size in MiB
    pub hugepages: bool,              // Use huge pages
    pub numa_nodes: Vec<NumaNode>,    // NUMA topology
}

pub struct DiskSpec {
    pub id: String,                   // Disk identifier
    pub source: DiskSource,           // Image or volume
    pub size_gib: u64,                // Disk size
    pub bus: DiskBus,                 // virtio, scsi, ide
    pub cache: DiskCache,             // none, writeback, writethrough
    pub boot_index: Option<u32>,      // Boot order
}

pub struct NetworkSpec {
    pub id: String,                   // Interface identifier
    pub network_id: String,           // Overlay network ID
    pub mac_address: Option<String>,
    pub ip_address: Option<String>,
    pub model: NicModel,              // virtio-net, e1000
    pub security_groups: Vec<String>,
}
```

### 3.2 VM State Machine

```
                           ┌─────────────────────────────────┐
                           ▼                                 │
┌─────────┐  create   ┌─────────┐   start   ┌─────────┐      │
│ PENDING │──────────►│ STOPPED │──────────►│ RUNNING │      │
└─────────┘           └────┬────┘           └────┬────┘      │
                           │ delete              │           │
                           ▼                     │ stop      │
                      ┌─────────┐                │           │
                      │ DELETED │◄───────────────┤           │
                      └─────────┘                │           │
                                                 │ reboot    │
                                                 └───────────┘

Additional states:
  CREATING  - Provisioning resources
  STARTING  - Boot in progress
  STOPPING  - Shutdown in progress
  MIGRATING - Live migration in progress
  ERROR     - Failed state (recoverable)
  FAILED    - Terminal failure
```

```rust
pub enum VmState {
    Pending,    // Awaiting scheduling
    Creating,   // Resources being provisioned
    Stopped,    // Created but not running
    Starting,   // Boot in progress
    Running,    // Active and healthy
    Stopping,   // Graceful shutdown
    Migrating,  // Live migration in progress
    Error,      // Recoverable error
    Failed,     // Terminal failure
    Deleted,    // Soft-deleted, pending cleanup
}
```

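The state machine above can be enforced with a small transition table. The following is a minimal sketch, not the project's implementation: the function name `can_transition` is hypothetical, and the routing through the transitional states (`Creating`, `Starting`, `Stopping`, `Migrating`) and the error-recovery edges are assumptions layered on top of the diagram.

```rust
/// Lifecycle states, mirroring the `VmState` enum in this section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VmState {
    Pending,
    Creating,
    Stopped,
    Starting,
    Running,
    Stopping,
    Migrating,
    Error,
    Failed,
    Deleted,
}

/// Returns true if moving from `from` to `to` is allowed.
/// Hypothetical helper: the exact edge set is an assumption.
pub fn can_transition(from: VmState, to: VmState) -> bool {
    use VmState::*;
    match (from, to) {
        // Terminal states never transition again.
        (Deleted, _) | (Failed, _) => false,
        // create: PENDING -> (CREATING) -> STOPPED
        (Pending, Creating) | (Creating, Stopped) => true,
        // start: STOPPED -> (STARTING) -> RUNNING
        (Stopped, Starting) | (Starting, Running) => true,
        // stop: RUNNING -> (STOPPING) -> STOPPED
        (Running, Stopping) | (Stopping, Stopped) => true,
        // reboot: RUNNING goes through boot again
        (Running, Starting) => true,
        // live migration returns to RUNNING when done
        (Running, Migrating) | (Migrating, Running) => true,
        // delete is drawn from STOPPED and RUNNING; ERROR is an assumption
        (Stopped, Deleted) | (Running, Deleted) | (Error, Deleted) => true,
        // any in-progress operation may fail into the recoverable ERROR state
        (Creating, Error) | (Starting, Error) | (Stopping, Error) | (Migrating, Error) => true,
        // ERROR either recovers to STOPPED or becomes terminal
        (Error, Stopped) | (Error, Failed) => true,
        _ => false,
    }
}
```

A caller would check `can_transition(current, requested)` before persisting a state change and return `INVALID_STATE` when it is false.
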
### 3.3 Runtime Status

```rust
pub struct VmStatus {
    pub actual_state: VmState,
    pub host_pid: Option<u32>,        // Hypervisor process PID
    pub started_at: Option<u64>,      // Last boot timestamp
    pub ip_addresses: Vec<IpAddress>,
    pub resource_usage: ResourceUsage,
    pub last_error: Option<String>,
    pub conditions: Vec<Condition>,
}

pub struct ResourceUsage {
    pub cpu_percent: f64,
    pub memory_used_mib: u64,
    pub disk_read_bytes: u64,
    pub disk_write_bytes: u64,
    pub network_rx_bytes: u64,
    pub network_tx_bytes: u64,
}
```

### 3.4 Image

Bootable disk images for VM creation.

```rust
pub struct Image {
    pub id: ImageId,
    pub name: String,
    pub org_id: String,               // Owner org (or "system" for public)
    pub visibility: Visibility,       // Public, Private, Shared
    pub source: ImageSource,
    pub format: ImageFormat,
    pub size_bytes: u64,
    pub checksum: String,             // SHA256
    pub os_type: OsType,
    pub os_version: String,
    pub architecture: Architecture,
    pub min_disk_gib: u32,
    pub min_memory_mib: u32,
    pub status: ImageStatus,
    pub created_at: u64,
    pub updated_at: u64,
    pub metadata: HashMap<String, String>,
}

pub enum ImageSource {
    Url { url: String },
    Upload { storage_path: String },
    Snapshot { vm_id: VmId, disk_id: String },
}

pub enum ImageFormat {
    Raw,
    Qcow2,
    Vmdk,
    Vhd,
}

pub enum Visibility {
    Public,                           // Available to all orgs
    Private,                          // Only owner org
    Shared { orgs: Vec<String> },
}
```

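The `Visibility` variants imply an access rule for who may boot from an image. A minimal sketch of that rule follows; the function name `can_use_image` is hypothetical, and treating the owner org as always allowed for `Shared` images is an assumption.

```rust
/// Mirrors the `Visibility` enum in this section.
#[derive(Debug, Clone, PartialEq)]
pub enum Visibility {
    Public,
    Private,
    Shared { orgs: Vec<String> },
}

/// Returns true if `requester_org` may use an image owned by `owner_org`.
/// Hypothetical helper; the owner-always-allowed rule is an assumption.
pub fn can_use_image(visibility: &Visibility, owner_org: &str, requester_org: &str) -> bool {
    match visibility {
        // Public images are available to every org.
        Visibility::Public => true,
        // Private images are limited to the owner org.
        Visibility::Private => owner_org == requester_org,
        // Shared images add an explicit allow-list on top of the owner.
        Visibility::Shared { orgs } => {
            owner_org == requester_org || orgs.iter().any(|o| o == requester_org)
        }
    }
}
```

This check would sit alongside, not replace, the IAM authorization described in section 10.1.
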
### 3.5 Node

Physical or virtual host running the agent.

```rust
pub struct Node {
    pub id: NodeId,
    pub name: String,
    pub state: NodeState,
    pub capacity: NodeCapacity,
    pub allocatable: NodeCapacity,
    pub allocated: NodeCapacity,
    pub hypervisors: Vec<HypervisorType>,  // Supported backends
    pub labels: HashMap<String, String>,
    pub taints: Vec<Taint>,
    pub conditions: Vec<NodeCondition>,
    pub agent_version: String,
    pub last_heartbeat: u64,
}

pub struct NodeCapacity {
    pub vcpus: u32,
    pub memory_mib: u64,
    pub storage_gib: u64,
}

pub enum NodeState {
    Ready,
    NotReady,
    Cordoned,     // No new VMs scheduled
    Draining,     // Migrating VMs off
    Maintenance,
}
```

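The three `NodeCapacity` fields on `Node` suggest a simple headroom calculation: what is still free is `allocatable` minus `allocated`. The sketch below illustrates that reading; `free_capacity` and `fits` are hypothetical helper names, and the spec does not state that overcommit is disallowed, so the strict comparison is an assumption.

```rust
/// Mirrors the `NodeCapacity` struct in this section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodeCapacity {
    pub vcpus: u32,
    pub memory_mib: u64,
    pub storage_gib: u64,
}

/// Remaining headroom on a node: allocatable minus allocated.
/// Saturating subtraction keeps the result at zero if the node is overcommitted.
pub fn free_capacity(allocatable: NodeCapacity, allocated: NodeCapacity) -> NodeCapacity {
    NodeCapacity {
        vcpus: allocatable.vcpus.saturating_sub(allocated.vcpus),
        memory_mib: allocatable.memory_mib.saturating_sub(allocated.memory_mib),
        storage_gib: allocatable.storage_gib.saturating_sub(allocated.storage_gib),
    }
}

/// Returns true if `request` fits entirely within `free` (no overcommit assumed).
pub fn fits(free: NodeCapacity, request: NodeCapacity) -> bool {
    request.vcpus <= free.vcpus
        && request.memory_mib <= free.memory_mib
        && request.storage_gib <= free.storage_gib
}
```

A resource filter in the scheduler (section 6.2) could be built directly on `fits(free_capacity(node.allocatable, node.allocated), request)`.
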
## 4. Hypervisor Abstraction

### 4.1 Backend Trait

```rust
#[async_trait]
pub trait HypervisorBackend: Send + Sync {
    /// Backend identifier
    fn backend_type(&self) -> HypervisorType;

    /// Check if this backend supports the given VM spec
    fn supports(&self, spec: &VmSpec) -> Result<(), UnsupportedFeature>;

    /// Create VM resources (disk, network) without starting
    async fn create(&self, vm: &VirtualMachine) -> Result<VmHandle>;

    /// Start the VM
    async fn start(&self, handle: &VmHandle) -> Result<()>;

    /// Stop the VM (graceful shutdown)
    async fn stop(&self, handle: &VmHandle, timeout: Duration) -> Result<()>;

    /// Force stop the VM
    async fn kill(&self, handle: &VmHandle) -> Result<()>;

    /// Reboot the VM
    async fn reboot(&self, handle: &VmHandle) -> Result<()>;

    /// Delete VM and cleanup resources
    async fn delete(&self, handle: &VmHandle) -> Result<()>;

    /// Get current VM status
    async fn status(&self, handle: &VmHandle) -> Result<VmStatus>;

    /// Attach a disk to running VM
    async fn attach_disk(&self, handle: &VmHandle, disk: &DiskSpec) -> Result<()>;

    /// Detach a disk from running VM
    async fn detach_disk(&self, handle: &VmHandle, disk_id: &str) -> Result<()>;

    /// Attach a network interface
    async fn attach_nic(&self, handle: &VmHandle, nic: &NetworkSpec) -> Result<()>;

    /// Get console stream (VNC/serial)
    async fn console(&self, handle: &VmHandle, console_type: ConsoleType)
        -> Result<Box<dyn AsyncReadWrite>>;

    /// Take a snapshot
    async fn snapshot(&self, handle: &VmHandle, snapshot_id: &str) -> Result<()>;
}
```

### 4.2 Hypervisor Types

```rust
pub enum HypervisorType {
    Kvm,          // QEMU/KVM - full-featured
    Firecracker,  // AWS Firecracker - microVMs
    Mvisor,       // mvisor - lightweight
}
```

### 4.3 Backend Registry

```rust
pub struct HypervisorRegistry {
    backends: HashMap<HypervisorType, Arc<dyn HypervisorBackend>>,
}

impl HypervisorRegistry {
    pub fn register(&mut self, backend: Arc<dyn HypervisorBackend>);
    pub fn get(&self, typ: HypervisorType) -> Option<Arc<dyn HypervisorBackend>>;
    pub fn availablele(&self) -> Vec<HypervisorType>;
}
```

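The registry signatures above leave the method bodies open. One plausible way to fill them in is sketched here, keyed by `backend_type()`. To stay self-contained, the trait is reduced to a synchronous stand-in with a single method, and `FakeBackend` is a hypothetical test double; neither reflects the real crate.

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum HypervisorType {
    Kvm,
    Firecracker,
    Mvisor,
}

/// Reduced stand-in for the full async trait in section 4.1 (assumption).
pub trait HypervisorBackend: Send + Sync {
    fn backend_type(&self) -> HypervisorType;
}

#[derive(Default)]
pub struct HypervisorRegistry {
    backends: HashMap<HypervisorType, Arc<dyn HypervisorBackend>>,
}

impl HypervisorRegistry {
    /// Registers a backend under its own type; a later registration
    /// for the same type replaces the earlier one.
    pub fn register(&mut self, backend: Arc<dyn HypervisorBackend>) {
        self.backends.insert(backend.backend_type(), backend);
    }

    /// Looks up a backend, cloning the `Arc` so callers share ownership.
    pub fn get(&self, typ: HypervisorType) -> Option<Arc<dyn HypervisorBackend>> {
        self.backends.get(&typ).cloned()
    }

    /// Lists the registered backend types (in arbitrary order).
    pub fn available(&self) -> Vec<HypervisorType> {
        self.backends.keys().copied().collect()
    }
}

/// Hypothetical no-op backend used only to exercise the registry.
pub struct FakeBackend(pub HypervisorType);

impl HypervisorBackend for FakeBackend {
    fn backend_type(&self) -> HypervisorType {
        self.0
    }
}
```

An agent would typically register one backend per entry in `[hypervisors] enabled` at startup and report `available()` in its heartbeat, though the spec does not prescribe this.
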
### 4.4 Backend Capabilities

```rust
pub struct BackendCapabilities {
    pub live_migration: bool,
    pub hot_plug_cpu: bool,
    pub hot_plug_memory: bool,
    pub hot_plug_disk: bool,
    pub hot_plug_nic: bool,
    pub vnc_console: bool,
    pub serial_console: bool,
    pub nested_virtualization: bool,
    pub gpu_passthrough: bool,
    pub max_vcpus: u32,
    pub max_memory_gib: u64,
    pub supported_disk_buses: Vec<DiskBus>,
    pub supported_nic_models: Vec<NicModel>,
}
```

### 4.5 KVM Backend Implementation

```rust
// plasmavmc-kvm crate
pub struct KvmBackend {
    qemu_path: PathBuf,
    runtime_dir: PathBuf,
    network_helper: NetworkHelper,
}

#[async_trait]
impl HypervisorBackend for KvmBackend {
    fn backend_type(&self) -> HypervisorType {
        HypervisorType::Kvm
    }

    async fn create(&self, vm: &VirtualMachine) -> Result<VmHandle> {
        // Generate QEMU command line
        // Create runtime directory
        // Prepare disks and network devices
        todo!("outline only")
    }

    async fn start(&self, handle: &VmHandle) -> Result<()> {
        // Launch QEMU process
        // Wait for QMP socket
        // Configure via QMP
        todo!("outline only")
    }

    // ... other methods
}
```

### 4.6 Firecracker Backend Implementation

```rust
// plasmavmc-firecracker crate
pub struct FirecrackerBackend {
    fc_path: PathBuf,
    jailer_path: PathBuf,
    runtime_dir: PathBuf,
}

#[async_trait]
impl HypervisorBackend for FirecrackerBackend {
    fn backend_type(&self) -> HypervisorType {
        HypervisorType::Firecracker
    }

    fn supports(&self, spec: &VmSpec) -> Result<(), UnsupportedFeature> {
        // Firecracker limitations:
        // - No VNC, only serial
        // - No live migration
        // - Limited device models
        if spec.disks.iter().any(|d| d.bus != DiskBus::Virtio) {
            return Err(UnsupportedFeature::DiskBus);
        }
        Ok(())
    }

    // ... other methods
}
```

## 5. API

### 5.1 gRPC Services

#### VM Service (`plasmavmc.v1.VmService`)

```protobuf
service VmService {
  // Lifecycle
  rpc CreateVm(CreateVmRequest) returns (VirtualMachine);
  rpc GetVm(GetVmRequest) returns (VirtualMachine);
  rpc ListVms(ListVmsRequest) returns (ListVmsResponse);
  rpc UpdateVm(UpdateVmRequest) returns (VirtualMachine);
  rpc DeleteVm(DeleteVmRequest) returns (Empty);

  // Power operations
  rpc StartVm(StartVmRequest) returns (VirtualMachine);
  rpc StopVm(StopVmRequest) returns (VirtualMachine);
  rpc RebootVm(RebootVmRequest) returns (VirtualMachine);
  rpc ResetVm(ResetVmRequest) returns (VirtualMachine);

  // Disks
  rpc AttachDisk(AttachDiskRequest) returns (VirtualMachine);
  rpc DetachDisk(DetachDiskRequest) returns (VirtualMachine);

  // Network
  rpc AttachNic(AttachNicRequest) returns (VirtualMachine);
  rpc DetachNic(DetachNicRequest) returns (VirtualMachine);

  // Console
  rpc GetConsole(GetConsoleRequest) returns (stream ConsoleData);

  // Events
  rpc WatchVm(WatchVmRequest) returns (stream VmEvent);
}
```

#### Image Service (`plasmavmc.v1.ImageService`)

```protobuf
service ImageService {
  rpc CreateImage(CreateImageRequest) returns (Image);
  rpc GetImage(GetImageRequest) returns (Image);
  rpc ListImages(ListImagesRequest) returns (ListImagesResponse);
  rpc UpdateImage(UpdateImageRequest) returns (Image);
  rpc DeleteImage(DeleteImageRequest) returns (Empty);

  // Upload/Download
  rpc UploadImage(stream UploadImageRequest) returns (Image);
  rpc DownloadImage(DownloadImageRequest) returns (stream DownloadImageResponse);

  // Conversion
  rpc ConvertImage(ConvertImageRequest) returns (Image);
}
```

#### Node Service (`plasmavmc.v1.NodeService`)

```protobuf
service NodeService {
  rpc ListNodes(ListNodesRequest) returns (ListNodesResponse);
  rpc GetNode(GetNodeRequest) returns (Node);
  rpc CordonNode(CordonNodeRequest) returns (Node);
  rpc UncordonNode(UncordonNodeRequest) returns (Node);
  rpc DrainNode(DrainNodeRequest) returns (Node);
}
```

### 5.2 Agent Internal API (`plasmavmc.agent.v1`)

```protobuf
service AgentService {
  // VM operations (called by control plane)
  rpc CreateVm(CreateVmRequest) returns (VmHandle);
  rpc StartVm(StartVmRequest) returns (Empty);
  rpc StopVm(StopVmRequest) returns (Empty);
  rpc DeleteVm(DeleteVmRequest) returns (Empty);
  rpc GetVmStatus(GetVmStatusRequest) returns (VmStatus);

  // Node status (reported to control plane)
  rpc Heartbeat(HeartbeatRequest) returns (HeartbeatResponse);
  rpc ReportStatus(ReportStatusRequest) returns (Empty);
}
```

### 5.3 Client Library

```rust
use futures::StreamExt; // brings `.next()` into scope (assumes a `Stream`-based event API)
use plasmavmc_client::PlasmaClient;

let client = PlasmaClient::connect("http://127.0.0.1:8080").await?;

// Create VM
let vm = client.create_vm(CreateVmRequest {
    name: "my-vm".into(),
    org_id: "org-1".into(),
    project_id: "proj-1".into(),
    spec: VmSpec {
        cpu: CpuSpec { vcpus: 2, ..Default::default() },
        memory: MemorySpec { size_mib: 2048, ..Default::default() },
        disks: vec![DiskSpec {
            source: DiskSource::Image { id: "ubuntu-22.04".into() },
            size_gib: 20,
            ..Default::default()
        }],
        network: vec![NetworkSpec {
            network_id: "default".into(),
            ..Default::default()
        }],
        ..Default::default()
    },
    hypervisor: HypervisorType::Kvm,
    ..Default::default()
}).await?;

// Start VM
client.start_vm(vm.id).await?;

// Watch events
let mut stream = client.watch_vm(vm.id).await?;
while let Some(event) = stream.next().await {
    println!("Event: {:?}", event);
}
```

## 6. Scheduling

### 6.1 Scheduler

```rust
pub struct Scheduler {
    node_cache: Arc<NodeCache>,
    filters: Vec<Box<dyn ScheduleFilter>>,
    scorers: Vec<Box<dyn ScheduleScorer>>,
}

impl Scheduler {
    pub async fn schedule(&self, vm: &VirtualMachine) -> Result<NodeId> {
        let candidates = self.node_cache.ready_nodes();

        // Filter phase
        let filtered: Vec<_> = candidates
            .into_iter()
            .filter(|n| self.filters.iter().all(|f| f.filter(vm, n)))
            .collect();

        if filtered.is_empty() {
            return Err(Error::NoSuitableNode);
        }

        // Score phase
        let scored: Vec<_> = filtered
            .into_iter()
            .map(|n| {
                let score: i64 = self.scorers.iter().map(|s| s.score(vm, &n)).sum();
                (n, score)
            })
            .collect();

        // Select highest score
        let (node, _) = scored.into_iter().max_by_key(|(_, s)| *s).unwrap();
        Ok(node.id)
    }
}
```

### 6.2 Filters

```rust
pub trait ScheduleFilter: Send + Sync {
    fn name(&self) -> &'static str;
    fn filter(&self, vm: &VirtualMachine, node: &Node) -> bool;
}

// Built-in filters
struct ResourceFilter;      // CPU/memory fits
struct HypervisorFilter;    // Node supports hypervisor type
struct TaintFilter;         // Toleration matching
struct AffinityFilter;      // Node affinity rules
struct AntiAffinityFilter;  // VM anti-affinity
```

### 6.3 Scorers

```rust
pub trait ScheduleScorer: Send + Sync {
    fn name(&self) -> &'static str;
    fn score(&self, vm: &VirtualMachine, node: &Node) -> i64;
}

// Built-in scorers
struct LeastAllocatedScorer;    // Prefer less loaded nodes
struct BalancedResourceScorer;  // Balance CPU/memory ratio
struct LocalityScorer;          // Prefer same zone/rack
```

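To make the filter and score phases concrete, here is a minimal sketch of what `ResourceFilter` and `LeastAllocatedScorer` might compute. It uses simplified stand-in types (`Capacity`, `Node`, `Request`) and free functions instead of the traits above, and the 0 to 100 scoring scale is an assumption rather than something the spec defines.

```rust
#[derive(Debug, Clone, Copy)]
pub struct Capacity {
    pub vcpus: u32,
    pub memory_mib: u64,
}

/// Simplified stand-in for the `Node` type in section 3.5.
#[derive(Debug, Clone)]
pub struct Node {
    pub id: String,
    pub allocatable: Capacity,
    pub allocated: Capacity,
}

/// Simplified stand-in for the CPU/memory part of a `VmSpec`.
pub struct Request {
    pub vcpus: u32,
    pub memory_mib: u64,
}

/// Filter phase: keep only nodes whose free CPU and memory cover the request.
pub fn resource_filter(req: &Request, node: &Node) -> bool {
    let free_vcpus = node.allocatable.vcpus.saturating_sub(node.allocated.vcpus);
    let free_mem = node.allocatable.memory_mib.saturating_sub(node.allocated.memory_mib);
    req.vcpus <= free_vcpus && req.memory_mib <= free_mem
}

/// Percentage of `total` that is still unused, in the range 0..=100.
fn pct_free(used: u64, total: u64) -> i64 {
    if total == 0 {
        return 0;
    }
    (total.saturating_sub(used) * 100 / total) as i64
}

/// Score phase: average free percentage of CPU and memory; higher is better.
pub fn least_allocated_score(node: &Node) -> i64 {
    let cpu = pct_free(node.allocated.vcpus as u64, node.allocatable.vcpus as u64);
    let mem = pct_free(node.allocated.memory_mib, node.allocatable.memory_mib);
    (cpu + mem) / 2
}

/// Filter, then pick the highest-scoring node, as in `Scheduler::schedule`.
pub fn pick<'a>(req: &Request, nodes: &'a [Node]) -> Option<&'a Node> {
    nodes
        .iter()
        .filter(|n| resource_filter(req, n))
        .max_by_key(|n| least_allocated_score(n))
}
```

Returning `None` from `pick` corresponds to the `NO_SUITABLE_NODE` error in the scheduler.
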
## 7. Multi-Tenancy

### 7.1 Resource Hierarchy

```
System (platform operators)
└─ Organization (tenant boundary)
   └─ Project (workload isolation)
      └─ Resources (VMs, images, networks)
```

### 7.2 Scoped Resources

```rust
// All resources include scope identifiers
pub trait Scoped {
    fn org_id(&self) -> &str;
    fn project_id(&self) -> &str;
}

// Resource paths follow the Aegis pattern
// org/{org_id}/project/{project_id}/vm/{vm_id}
// org/{org_id}/project/{project_id}/image/{image_id}
```

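The resource path pattern above can be produced and checked with two small helpers. This is an illustrative sketch only: `resource_path` and `in_scope` are hypothetical names, and no escaping of identifiers is attempted.

```rust
/// Builds a resource path of the form
/// `org/{org_id}/project/{project_id}/{kind}/{id}`.
pub fn resource_path(org_id: &str, project_id: &str, kind: &str, id: &str) -> String {
    format!("org/{org_id}/project/{project_id}/{kind}/{id}")
}

/// Returns true if `path` belongs to the given org and project.
/// The trailing slash in the prefix prevents `proj-1` from matching `proj-10`.
pub fn in_scope(path: &str, org_id: &str, project_id: &str) -> bool {
    path.starts_with(&format!("org/{org_id}/project/{project_id}/"))
}
```

For example, `resource_path("org-1", "proj-1", "vm", "vm-42")` yields `org/org-1/project/proj-1/vm/vm-42`.
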
### 7.3 Quotas

```rust
pub struct Quota {
    pub scope: Scope,                 // Org or Project
    pub limits: ResourceLimits,
    pub usage: ResourceUsage,
}

pub struct ResourceLimits {
    pub max_vms: Option<u32>,
    pub max_vcpus: Option<u32>,
    pub max_memory_gib: Option<u64>,
    pub max_storage_gib: Option<u64>,
    pub max_images: Option<u32>,
}
```

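A quota check compares current consumption plus the incoming request against each limit, treating `None` as unlimited. The sketch below shows that logic for three of the limits. The `Consumption` type is hypothetical (the spec's `ResourceUsage` in section 3.3 holds runtime metrics rather than counts), as are `QuotaError` and `check_quota`.

```rust
/// Subset of the `ResourceLimits` struct in this section.
#[derive(Debug, Default, Clone, Copy)]
pub struct ResourceLimits {
    pub max_vms: Option<u32>,
    pub max_vcpus: Option<u32>,
    pub max_memory_gib: Option<u64>,
}

/// Hypothetical counter type for what a scope has consumed or is requesting.
#[derive(Debug, Default, Clone, Copy)]
pub struct Consumption {
    pub vms: u32,
    pub vcpus: u32,
    pub memory_gib: u64,
}

/// Which limit would be exceeded (hypothetical error type).
#[derive(Debug, PartialEq, Eq)]
pub enum QuotaError {
    Vms,
    Vcpus,
    MemoryGib,
}

/// True if a limit is set and `used + requested` would go past it.
fn exceeds(limit: Option<u64>, used: u64, requested: u64) -> bool {
    matches!(limit, Some(max) if used.saturating_add(requested) > max)
}

/// Checks a request against the limits; `None` limits are treated as unlimited.
pub fn check_quota(
    limits: &ResourceLimits,
    used: &Consumption,
    request: &Consumption,
) -> Result<(), QuotaError> {
    if exceeds(limits.max_vms.map(u64::from), used.vms.into(), request.vms.into()) {
        return Err(QuotaError::Vms);
    }
    if exceeds(limits.max_vcpus.map(u64::from), used.vcpus.into(), request.vcpus.into()) {
        return Err(QuotaError::Vcpus);
    }
    if exceeds(limits.max_memory_gib, used.memory_gib, request.memory_gib) {
        return Err(QuotaError::MemoryGib);
    }
    Ok(())
}
```

A failed check would surface to API callers as `QUOTA_EXCEEDED`.
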
### 7.4 Namespace Isolation

- **Compute**: VMs scoped to project, nodes shared across orgs
- **Network**: Overlay network provides tenant isolation
- **Storage**: Images can be private, shared, or public
- **Naming**: Names unique within project scope

## 8. Storage

### 8.1 State Storage (Chainfire)

```
# VM records
plasmavmc/vms/{org_id}/{project_id}/{vm_id}

# Image records
plasmavmc/images/{org_id}/{image_id}
plasmavmc/images/public/{image_id}

# Node records
plasmavmc/nodes/{node_id}

# Scheduling state
plasmavmc/scheduler/assignments/{vm_id}
plasmavmc/scheduler/pending/{timestamp}/{vm_id}
```

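The key layout above lends itself to small builder and parser helpers. The sketch below is illustrative: the function names are hypothetical, and zero-padding the timestamp in pending keys (so that lexicographic key order matches chronological order) is an assumption, since the spec does not state how `{timestamp}` is encoded.

```rust
/// Key for a VM record: `plasmavmc/vms/{org_id}/{project_id}/{vm_id}`.
pub fn vm_key(org_id: &str, project_id: &str, vm_id: &str) -> String {
    format!("plasmavmc/vms/{org_id}/{project_id}/{vm_id}")
}

/// Key for a pending scheduling entry.
/// The timestamp is zero-padded to 20 digits (assumption) so that a
/// prefix scan returns entries oldest first.
pub fn pending_key(timestamp_ms: u64, vm_id: &str) -> String {
    format!("plasmavmc/scheduler/pending/{timestamp_ms:020}/{vm_id}")
}

/// Splits a VM key back into `(org_id, project_id, vm_id)`.
/// Returns `None` for keys outside the VM prefix or with the wrong shape.
pub fn parse_vm_key(key: &str) -> Option<(&str, &str, &str)> {
    let rest = key.strip_prefix("plasmavmc/vms/")?;
    let mut parts = rest.split('/');
    let org = parts.next()?;
    let project = parts.next()?;
    let vm = parts.next()?;
    if parts.next().is_some() || org.is_empty() || project.is_empty() || vm.is_empty() {
        return None;
    }
    Some((org, project, vm))
}
```

A reconcile loop watching the `plasmavmc/vms/` prefix could use `parse_vm_key` to recover the tenant scope of each event.
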
### 8.2 Image Storage

- **Backend**: LightningSTOR (object storage)
- **Format**: Raw, qcow2, vmdk with automatic conversion
- **Caching**: Node-local image cache with pull-through
- **Path**: `images/{org_id}/{image_id}/{version}`

### 8.3 Disk Storage

- **Ephemeral**: Local SSD/NVMe on node
- **Persistent**: LightningSTOR volumes (via CSI)
- **Snapshot**: Copy-on-write via backend

## 9. Configuration

### 9.1 Control Plane Config (TOML)

```toml
[server]
addr = "0.0.0.0:8080"

[server.tls]
cert_file = "/etc/plasmavmc/tls/server.crt"
key_file = "/etc/plasmavmc/tls/server.key"
ca_file = "/etc/plasmavmc/tls/ca.crt"

[store]
backend = "chainfire"
chainfire_endpoints = ["http://chainfire-1:2379", "http://chainfire-2:2379"]

[iam]
endpoint = "http://aegis:9090"
service_account = "plasmavmc-controller"
token_path = "/var/run/secrets/iam/token"

[scheduler]
default_hypervisor = "kvm"

[image_store]
backend = "lightningstor"
endpoint = "http://lightningstor:9000"
bucket = "vm-images"

[logging]
level = "info"
format = "json"
```

### 9.2 Agent Config (TOML)

```toml
[agent]
node_id = "node-001"
control_plane = "http://plasmavmc-api:8080"

[agent.tls]
cert_file = "/etc/plasmavmc/tls/agent.crt"
key_file = "/etc/plasmavmc/tls/agent.key"
ca_file = "/etc/plasmavmc/tls/ca.crt"

[hypervisors]
enabled = ["kvm", "firecracker"]

[hypervisors.kvm]
qemu_path = "/usr/bin/qemu-system-x86_64"
runtime_dir = "/var/run/plasmavmc/kvm"

[hypervisors.firecracker]
fc_path = "/usr/bin/firecracker"
jailer_path = "/usr/bin/jailer"
runtime_dir = "/var/run/plasmavmc/fc"

[storage]
image_cache_dir = "/var/lib/plasmavmc/images"
runtime_dir = "/var/lib/plasmavmc/vms"
cache_size_gib = 100

[network]
overlay_endpoint = "http://ovn-controller:6641"
bridge_name = "plasmavmc0"

[logging]
level = "info"
format = "json"
```

### 9.3 Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `PLASMAVMC_CONFIG` | - | Config file path |
| `PLASMAVMC_ADDR` | `0.0.0.0:8080` | API listen address |
| `PLASMAVMC_LOG_LEVEL` | `info` | Log level |
| `PLASMAVMC_NODE_ID` | - | Agent node identifier |

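Since the same setting can come from a CLI flag, an environment variable, or the config file, some precedence order is needed. The spec does not define one; the sketch below assumes CLI flag first, then environment variable, then config file, then the built-in default. `resolve` and `listen_addr` are hypothetical helpers.

```rust
use std::env;

/// Picks the first value that is set, in the assumed precedence order:
/// CLI flag, environment variable, config file, built-in default.
pub fn resolve(
    cli: Option<String>,
    env_val: Option<String>,
    file: Option<String>,
    default: &str,
) -> String {
    cli.or(env_val)
        .or(file)
        .unwrap_or_else(|| default.to_string())
}

/// Resolves the API listen address using `PLASMAVMC_ADDR` and the
/// documented default of `0.0.0.0:8080`.
pub fn listen_addr(cli: Option<String>, file: Option<String>) -> String {
    resolve(cli, env::var("PLASMAVMC_ADDR").ok(), file, "0.0.0.0:8080")
}
```

With nothing set, `listen_addr(None, None)` falls back to the default from the table above unless `PLASMAVMC_ADDR` is present in the environment.
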
### 9.4 CLI Arguments

```
plasmavmc-server [OPTIONS]
  -c, --config <PATH>         Config file path
  -a, --addr <ADDR>           Listen address
  -l, --log-level <LEVEL>     Log level
  -h, --help                  Print help
  -V, --version               Print version

plasmavmc-agent [OPTIONS]
  -c, --config <PATH>         Config file path
  -n, --node-id <ID>          Node identifier
      --control-plane <URL>   Control plane endpoint
  -h, --help                  Print help
```

## 10. Integration

### 10.1 Aegis (IAM)

```rust
// Authorization check before VM operations
async fn authorize_vm_action(
    iam: &IamClient,
    principal: &PrincipalRef,
    action: &str,
    vm: &VirtualMachine,
) -> Result<()> {
    let resource = ResourceRef {
        kind: "vm".into(),
        id: vm.id.to_string(),
        org_id: vm.org_id.clone(),
        project_id: vm.project_id.clone(),
        ..Default::default()
    };

    let allowed = iam.authorize(principal, action, &resource).await?;
    if !allowed {
        return Err(Error::PermissionDenied);
    }
    Ok(())
}

// Action patterns
// plasmavmc:vms:create
// plasmavmc:vms:get
// plasmavmc:vms:list
// plasmavmc:vms:update
// plasmavmc:vms:delete
// plasmavmc:vms:start
// plasmavmc:vms:stop
// plasmavmc:vms:console
// plasmavmc:images:create
// plasmavmc:images:get
// plasmavmc:images:delete
```

### 10.2 Overlay Network

```rust
// Network integration for VM NICs
#[async_trait]
pub trait NetworkProvider: Send + Sync {
    /// Allocate port for VM NIC
    async fn create_port(&self, req: CreatePortRequest) -> Result<Port>;

    /// Release port
    async fn delete_port(&self, port_id: &str) -> Result<()>;

    /// Get port details (MAC, IP, security groups)
    async fn get_port(&self, port_id: &str) -> Result<Port>;

    /// Update security groups
    async fn update_security_groups(
        &self,
        port_id: &str,
        groups: Vec<String>,
    ) -> Result<()>;
}

pub struct Port {
    pub id: String,
    pub network_id: String,
    pub mac_address: String,
    pub ip_addresses: Vec<IpAddress>,
    pub security_groups: Vec<String>,
    pub tap_device: String,           // Host tap device name
}
```

### 10.3 Chainfire (State)

```rust
// Watch for VM changes (controller pattern)
async fn reconcile_loop(chainfire: &ChainfireClient) -> Result<()> {
    let mut watch = chainfire
        .watch_prefix("plasmavmc/vms/")
        .await?;

    while let Some(event) = watch.next().await {
        // `EventType` is the assumed name of the watch event enum; the
        // variants must be qualified (or imported), otherwise a bare `Put`
        // pattern would bind a variable and match every event.
        match event.event_type {
            EventType::Put => reconcile_vm(event.kv).await?,
            EventType::Delete => cleanup_vm(event.kv).await?,
        }
    }

    Ok(())
}
```

## 11. Security

### 11.1 Authentication

- **Control Plane**: mTLS + Aegis tokens
- **Agent**: mTLS with node certificate
- **Console**: WebSocket with Aegis token

### 11.2 Authorization

- Integrated with Aegis (IAM)
- Action-based permissions
- Scope enforcement (org/project)

### 11.3 VM Isolation

- **Process**: Hypervisor process per VM
- **Filesystem**: Seccomp, namespaces, chroot (Firecracker jailer)
- **Network**: Overlay network tenant isolation
- **Resources**: cgroups for CPU/memory limits

### 11.4 Image Security

- Checksum verification on download
- Signature verification (optional)
- Content scanning integration point

## 12. Operations

### 12.1 Deployment

**Single Node (Development)**

```bash
# Start control plane
plasmavmc-server --config config.toml

# Start agent on same node
plasmavmc-agent --config agent.toml --node-id dev-node
```

**Production Cluster**

```bash
# Control plane (3 instances for HA)
plasmavmc-server --config config.toml

# Agents (each compute node)
plasmavmc-agent --config agent.toml --node-id node-$(hostname)
```

### 12.2 Monitoring

**Metrics (Prometheus)**

| Metric | Type | Description |
|--------|------|-------------|
| `plasmavmc_vms_total` | Gauge | Total VMs by state |
| `plasmavmc_vm_operations_total` | Counter | Operations by type |
| `plasmavmc_vm_boot_seconds` | Histogram | VM boot time |
| `plasmavmc_node_capacity_vcpus` | Gauge | Node vCPU capacity |
| `plasmavmc_node_allocated_vcpus` | Gauge | Allocated vCPUs |
| `plasmavmc_scheduler_latency_seconds` | Histogram | Scheduling latency |
| `plasmavmc_agent_heartbeat_age_seconds` | Gauge | Time since last heartbeat |

**Health Endpoints**

- `GET /health` - Liveness
- `GET /ready` - Readiness (Chainfire connected, agents online)

### 12.3 Backup & Recovery

- **State**: Chainfire handles via Raft snapshots
- **Images**: LightningSTOR replication
- **VM Disks**: Volume snapshots via storage backend

## 13. Compatibility

### 13.1 API Versioning

- gRPC package: `plasmavmc.v1`
- Semantic versioning
- Backward compatible within major version

### 13.2 Hypervisor Versions

| Backend | Minimum Version | Notes |
|---------|-----------------|-------|
| QEMU/KVM | 6.0 | QMP protocol |
| Firecracker | 1.0 | API v1 |
| mvisor | TBD | |

## Appendix

### A. Error Codes

| Error | Meaning |
|-------|---------|
| VM_NOT_FOUND | VM does not exist |
| IMAGE_NOT_FOUND | Image does not exist |
| NODE_NOT_FOUND | Node does not exist |
| NO_SUITABLE_NODE | Scheduling failed |
| QUOTA_EXCEEDED | Resource quota exceeded |
| HYPERVISOR_ERROR | Backend operation failed |
| INVALID_STATE | Operation invalid for current state |

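The error table above maps naturally onto a Rust enum that carries both the wire code and its meaning. The following is a sketch under that assumption; the `ErrorCode` type and its methods are hypothetical and are not taken from the `plasmavmc-types` crate.

```rust
use std::fmt;

/// One variant per row of the error code table (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorCode {
    VmNotFound,
    ImageNotFound,
    NodeNotFound,
    NoSuitableNode,
    QuotaExceeded,
    HypervisorError,
    InvalidState,
}

impl ErrorCode {
    /// The stable string code as listed in the table.
    pub fn code(self) -> &'static str {
        match self {
            ErrorCode::VmNotFound => "VM_NOT_FOUND",
            ErrorCode::ImageNotFound => "IMAGE_NOT_FOUND",
            ErrorCode::NodeNotFound => "NODE_NOT_FOUND",
            ErrorCode::NoSuitableNode => "NO_SUITABLE_NODE",
            ErrorCode::QuotaExceeded => "QUOTA_EXCEEDED",
            ErrorCode::HypervisorError => "HYPERVISOR_ERROR",
            ErrorCode::InvalidState => "INVALID_STATE",
        }
    }

    /// The human-readable meaning as listed in the table.
    pub fn meaning(self) -> &'static str {
        match self {
            ErrorCode::VmNotFound => "VM does not exist",
            ErrorCode::ImageNotFound => "Image does not exist",
            ErrorCode::NodeNotFound => "Node does not exist",
            ErrorCode::NoSuitableNode => "Scheduling failed",
            ErrorCode::QuotaExceeded => "Resource quota exceeded",
            ErrorCode::HypervisorError => "Backend operation failed",
            ErrorCode::InvalidState => "Operation invalid for current state",
        }
    }
}

impl fmt::Display for ErrorCode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}: {}", self.code(), self.meaning())
    }
}

impl std::error::Error for ErrorCode {}
```

Keeping the string codes in one place makes it easier to keep client-visible errors stable across releases.
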
### B. Port Assignments

| Port | Protocol | Purpose |
|------|----------|---------|
| 8080 | gRPC | Control plane API |
| 8081 | HTTP | Metrics/health |
| 8082 | gRPC | Agent internal API |

### C. Glossary

- **VM**: Virtual Machine - an isolated compute instance
- **Hypervisor**: Software that creates and runs VMs (KVM, Firecracker)
- **Image**: Bootable disk image template
- **Node**: Physical/virtual host running VMs
- **Agent**: Daemon running on each node managing local VMs
- **Scheduler**: Component selecting nodes for VM placement
- **Overlay Network**: Virtual network providing tenant isolation

### D. Backend Comparison

| Feature | KVM/QEMU | Firecracker | mvisor |
|---------|----------|-------------|--------|
| Boot time | ~5s | <125ms | TBD |
| Memory overhead | Medium | Low | Low |
| Device support | Full | Limited | Limited |
| Live migration | Yes | No | No |
| VNC console | Yes | No | No |
| GPU passthrough | Yes | No | No |
| Nested virt | Yes | No | No |
| Best for | General | Serverless | TBD |