PlasmaVMC ChainFire Key Schema

Date: 2025-12-08
Task: T013 S1
Status: Design Complete

Key Layout

VM Metadata

Key: /plasmavmc/vms/{org_id}/{project_id}/{vm_id}
Value: JSON-serialized VirtualMachine (plasmavmc_types::VirtualMachine)

VM Handle

Key: /plasmavmc/handles/{org_id}/{project_id}/{vm_id}
Value: JSON-serialized VmHandle (plasmavmc_types::VmHandle)

Lock Key (for atomic operations)

Key: /plasmavmc/locks/{org_id}/{project_id}/{vm_id}
Value: JSON-serialized LockInfo { timestamp: u64, node_id: String }
TTL: 30 seconds (via ChainFire lease)

Key Structure Rationale

  1. Prefix-based organization: /plasmavmc/ namespace isolates PlasmaVMC data
  2. Tenant scoping: {org_id}/{project_id} ensures multi-tenancy
  3. Resource separation: Separate keys for VM metadata and handles enable independent updates
  4. Lock mechanism: Uses ChainFire lease TTL for distributed locking without manual cleanup
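
As an illustration of the layout above, the three key families can be built by small helpers. This is a minimal sketch: the function names (`vm_key`, `handle_key`, `lock_key`, `vm_prefix`) are hypothetical and not taken from the PlasmaVMC codebase.

```rust
/// Root namespace shared by every PlasmaVMC key.
const PREFIX: &str = "/plasmavmc";

/// Key holding the JSON-serialized VirtualMachine.
pub fn vm_key(org_id: &str, project_id: &str, vm_id: &str) -> String {
    format!("{PREFIX}/vms/{org_id}/{project_id}/{vm_id}")
}

/// Key holding the JSON-serialized VmHandle.
pub fn handle_key(org_id: &str, project_id: &str, vm_id: &str) -> String {
    format!("{PREFIX}/handles/{org_id}/{project_id}/{vm_id}")
}

/// Key holding the lease-backed lock for one VM.
pub fn lock_key(org_id: &str, project_id: &str, vm_id: &str) -> String {
    format!("{PREFIX}/locks/{org_id}/{project_id}/{vm_id}")
}

/// Scan prefix for all VMs in one project. The trailing slash keeps
/// a scan for project "proj1" from also matching "proj10".
pub fn vm_prefix(org_id: &str, project_id: &str) -> String {
    format!("{PREFIX}/vms/{org_id}/{project_id}/")
}
```

Because all three keys share the same `{org_id}/{project_id}/{vm_id}` suffix, a handle or lock key can always be derived from a VM key without a lookup.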

Serialization

  • Format: JSON (via serde_json)
  • Rationale: Human-readable, debuggable, compatible with existing PersistedState structure
  • Alternative considered: bincode (rejected because its binary output is harder to inspect and debug)

Atomic Write Strategy

Option 1: Transaction-based (Preferred)

Use ChainFire transactions to atomically update VM + handle:

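// Pseudo-code: type and field names are illustrative.
let txn = TxnRequest {
    // Guard: proceed only if the lock key has never been written.
    compare: vec![Compare {
        key: lock_key,
        result: CompareResult::Equal,
        target: CompareTarget::Version(0), // Lock doesn't exist
    }],
    // Guard passed: write VM metadata, handle, and lock in one atomic step.
    success: vec![
        RequestOp { request: Some(Request::Put(vm_put)) },
        RequestOp { request: Some(Request::Put(handle_put)) },
        RequestOp { request: Some(Request::Put(lock_put)) },
    ],
    // Guard failed (lock is held): write nothing and let the caller retry.
    failure: vec![],
};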

Option 2: Lease-based Locking (Fallback)

  1. Acquire lease (30s TTL)
  2. Put lock key with lease_id
  3. Update VM + handle
  4. Release lease (or let it expire)
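
The acquire/expire/release behaviour of the lease steps above can be sketched with an in-memory stand-in. `LeaseStore` and its methods are hypothetical and are not the ChainFire client API; the caller passes `now` explicitly so that expiry can be exercised without sleeping.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// In-memory stand-in for a lease-backed lock store (hypothetical type).
pub struct LeaseStore {
    // lock key -> (owning node_id, instant at which the lease expires)
    locks: HashMap<String, (String, Instant)>,
}

impl LeaseStore {
    pub fn new() -> Self {
        Self { locks: HashMap::new() }
    }

    /// Steps 1-2: take the lock if it is free or its lease has expired.
    pub fn try_acquire(&mut self, key: &str, node_id: &str, ttl: Duration, now: Instant) -> bool {
        let held = matches!(self.locks.get(key), Some((_, expiry)) if *expiry > now);
        if held {
            return false;
        }
        self.locks
            .insert(key.to_string(), (node_id.to_string(), now + ttl));
        true
    }

    /// Step 4: release the lock, but only if the caller still owns it.
    pub fn release(&mut self, key: &str, node_id: &str) -> bool {
        let owned = matches!(self.locks.get(key), Some((owner, _)) if owner.as_str() == node_id);
        if owned {
            self.locks.remove(key);
        }
        owned
    }
}
```

The ownership check in `release` matters once leases can expire: a node whose lease lapsed while it was still working must not delete a lock that another node has since acquired.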

Fallback Behavior

File Fallback Mode

  • Trigger: PLASMAVMC_STORAGE_BACKEND=file or PLASMAVMC_CHAINFIRE_ENDPOINT unset
  • Behavior: Use existing file-based persistence (PLASMAVMC_STATE_PATH)
  • Locking: File-based lockfile ({state_path}.lock) with flock() or atomic rename

Migration Path

  1. On startup, if ChainFire unavailable and file exists, load from file
  2. If ChainFire available, prefer ChainFire; migrate file → ChainFire on first write
  3. File fallback remains for development/testing without ChainFire cluster

Configuration

Environment Variables

  • PLASMAVMC_STORAGE_BACKEND: chainfire (default) | file
  • PLASMAVMC_CHAINFIRE_ENDPOINT: ChainFire gRPC endpoint (e.g., http://127.0.0.1:50051)
  • PLASMAVMC_STATE_PATH: File fallback path (default: /var/run/plasmavmc/state.json)
  • PLASMAVMC_LOCK_TTL_SECONDS: Lock TTL (default: 30)
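
A minimal sketch of how these variables could be resolved into a configuration, following the file-fallback trigger described earlier (backend set to `file`, or endpoint unset). The types `Backend` and `StorageConfig` and the function `load_config` are hypothetical. Falling back to the default TTL when the value does not parse is an assumption; the design does not say how invalid values are handled.

```rust
use std::path::PathBuf;
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum Backend {
    ChainFire { endpoint: String },
    File,
}

#[derive(Debug, PartialEq)]
pub struct StorageConfig {
    pub backend: Backend,
    pub state_path: PathBuf,
    pub lock_ttl: Duration,
}

/// Build the config from a variable lookup function.
/// In production the lookup would be `|k| std::env::var(k).ok()`.
pub fn load_config(get: impl Fn(&str) -> Option<String>) -> StorageConfig {
    let wants_file = get("PLASMAVMC_STORAGE_BACKEND").as_deref() == Some("file");

    // File mode is used when requested explicitly or when no endpoint is set.
    let backend = match get("PLASMAVMC_CHAINFIRE_ENDPOINT") {
        Some(endpoint) if !wants_file => Backend::ChainFire { endpoint },
        _ => Backend::File,
    };

    let state_path = get("PLASMAVMC_STATE_PATH")
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("/var/run/plasmavmc/state.json"));

    // Assumption: an unparseable TTL silently falls back to the 30 s default.
    let lock_ttl_secs = get("PLASMAVMC_LOCK_TTL_SECONDS")
        .and_then(|v| v.parse::<u64>().ok())
        .unwrap_or(30);

    StorageConfig {
        backend,
        state_path,
        lock_ttl: Duration::from_secs(lock_ttl_secs),
    }
}
```

Taking the lookup as a parameter keeps the function free of global state, so the fallback tests can cover each combination of variables without mutating the process environment.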

Config File (Future)

[storage]
backend = "chainfire"  # or "file"
chainfire_endpoint = "http://127.0.0.1:50051"
state_path = "/var/run/plasmavmc/state.json"
lock_ttl_seconds = 30

Operations

Create VM

  1. Generate vm_id (UUID)
  2. Acquire lock (transaction or lease)
  3. Put VM metadata key
  4. Put VM handle key
  5. Release lock

Update VM

  1. Acquire lock
  2. Get current VM (verify exists)
  3. Put updated VM metadata
  4. Put updated handle (if changed)
  5. Release lock

Delete VM

  1. Acquire lock
  2. Delete VM metadata key
  3. Delete VM handle key
  4. Release lock

Load on Startup

  1. Scan prefix /plasmavmc/vms/{org_id}/{project_id}/
  2. For each VM key, extract vm_id
  3. Load VM metadata
  4. Load corresponding handle
  5. Populate in-memory DashMap
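
Steps 2 and 4 above depend on recovering the identifiers from each scanned key and deriving the matching handle key. A minimal sketch, with hypothetical names (`VmKeyParts`, `parse_vm_key`, `handle_key_for`) and the assumption that identifiers never contain `/`:

```rust
/// Tenant and VM identifiers recovered from a VM metadata key.
#[derive(Debug, PartialEq)]
pub struct VmKeyParts<'a> {
    pub org_id: &'a str,
    pub project_id: &'a str,
    pub vm_id: &'a str,
}

/// Parse `/plasmavmc/vms/{org_id}/{project_id}/{vm_id}`.
/// Returns None for keys outside the VM namespace or with the wrong shape.
pub fn parse_vm_key(key: &str) -> Option<VmKeyParts<'_>> {
    let rest = key.strip_prefix("/plasmavmc/vms/")?;
    let mut parts = rest.split('/');
    let (org_id, project_id, vm_id) = (parts.next()?, parts.next()?, parts.next()?);

    // Reject trailing segments and empty identifiers.
    if parts.next().is_some() || [org_id, project_id, vm_id].iter().any(|s| s.is_empty()) {
        return None;
    }
    Some(VmKeyParts { org_id, project_id, vm_id })
}

/// Handle key that corresponds to a parsed VM key.
pub fn handle_key_for(parts: &VmKeyParts<'_>) -> String {
    format!(
        "/plasmavmc/handles/{}/{}/{}",
        parts.org_id, parts.project_id, parts.vm_id
    )
}
```

Skipping malformed keys rather than failing the whole scan is one reasonable choice for startup; whether to log or abort in that case is left open by the design.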

Error Handling

  • ChainFire unavailable: Fall back to file mode (if configured)
  • Lock contention: Retry with exponential backoff (max 3 retries)
  • Serialization error: Log and return error (should not happen)
  • Partial write: Prevented by the transaction, which applies all of its operations or none
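
The lock-contention policy above (exponential backoff, at most 3 retries) can be sketched as a generic helper. The name `retry_with_backoff` is hypothetical, and the base delay is an assumption because the design does not specify one. This sketch retries on any error; a real caller would likely retry only on contention errors.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Run `op`, retrying up to `max_retries` times after the first attempt.
/// The delay doubles after each failed attempt.
pub fn retry_with_backoff<T, E>(
    max_retries: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = base_delay;
    let mut retries_used = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Retry budget exhausted: surface the last error.
            Err(err) if retries_used >= max_retries => return Err(err),
            Err(_) => {
                sleep(delay);
                delay = delay.saturating_mul(2);
                retries_used += 1;
            }
        }
    }
}
```

With `max_retries = 3` the operation runs at most four times in total: the initial attempt plus three retries.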

Testing Considerations

  • Unit tests: Mock ChainFire client
  • Integration tests: Real ChainFire server (env-gated)
  • Fallback tests: Disable ChainFire, verify file mode works
  • Lock tests: Concurrent operations, verify atomicity