
Architectural Gap Analysis: Compute & Core

Date: 2025-12-08
Scope: Core Infrastructure (Chainfire, Aegis IAM, FlareDB) & Application Services (FlashDNS, PlasmaVMC)

Executive Summary

The platform's core infrastructure ("Data" and "Identity" pillars) is in excellent shape, with implementation matching specifications closely. However, the "Compute" pillar (PlasmaVMC) exhibits a significant architectural deviation from its specification, currently existing as a monolithic prototype rather than the specified distributed control plane/agent model.

Component Status Matrix

Component     Role          Specification Status   Implementation Status   Alignment
Chainfire     Cluster KVS   High                   High                    Strong
Aegis (IAM)   Identity      High                   High                    Strong
FlareDB       DBaaS KVS     High                   High                    Strong
FlashDNS      DNS Service   High                   High                    Strong
PlasmaVMC     VM Platform   High                   Low / Prototype         Mismatch

Detailed Findings

1. Core Infrastructure (Chainfire, Aegis, FlareDB)

  • Chainfire: Fully implemented crate structure. Detailed feature gap analysis exists (chainfire_t003_gap_analysis.md).
  • Aegis: Correctly structured with iam-server, iam-authn, iam-authz, etc. Integration with Chainfire/FlareDB backends is present in main.rs.
  • FlareDB: Correctly structured with flaredb-pd, flaredb-server (Multi-Raft), and reserved namespaces for IAM/Metrics.

2. Application Services (FlashDNS)

  • Status: Excellent.
  • Evidence: Crate structure matches spec. Integration with Chainfire (storage) and Aegis (auth) is visible in configuration and code.

3. Compute Platform (PlasmaVMC) - The Gap

  • Specification: Describes a distributed system with:
    • Control Plane: API, Scheduler, Image management.
    • Agent: Runs on compute nodes, manages local hypervisors.
    • Communication: gRPC between Control Plane and Agent.
  • Current Implementation: Monolithic plasmavmc-server.
    • The server binary directly initializes HypervisorRegistry and registers KvmBackend/FireCrackerBackend.
    • Missing Crates:
      • plasmavmc-agent (Critical)
      • plasmavmc-client
      • plasmavmc-core (Scheduler logic)
    • Implication: The current code cannot support multi-node deployment or scheduling. It effectively runs the control plane on the hypervisor node.
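The monolithic shape described above can be sketched as follows. This is an illustrative stand-in, not the actual plasmavmc API: the `Hypervisor` trait, `HypervisorRegistry` methods, and backend structs are simplified assumptions modeled on the crate names mentioned in this report.

```rust
// Hypothetical sketch of the current monolithic wiring: the server binary
// owns the hypervisor registry in-process, so control plane and node agent
// are the same process. All names and signatures here are illustrative.

use std::collections::HashMap;

// Stand-in for the trait provided by plasmavmc-hypervisor.
trait Hypervisor {
    fn name(&self) -> &'static str;
    fn start_vm(&self, vm_id: &str) -> Result<String, String>;
}

struct KvmBackend;
impl Hypervisor for KvmBackend {
    fn name(&self) -> &'static str { "kvm" }
    fn start_vm(&self, vm_id: &str) -> Result<String, String> {
        Ok(format!("kvm started {vm_id}"))
    }
}

struct FireCrackerBackend;
impl Hypervisor for FireCrackerBackend {
    fn name(&self) -> &'static str { "firecracker" }
    fn start_vm(&self, vm_id: &str) -> Result<String, String> {
        Ok(format!("firecracker started {vm_id}"))
    }
}

// In the monolith, the API server holds this registry directly. In the
// specified design, the registry lives in plasmavmc-agent on each compute
// node, and the control plane reaches it over gRPC instead.
struct HypervisorRegistry {
    backends: HashMap<&'static str, Box<dyn Hypervisor>>,
}

impl HypervisorRegistry {
    fn new() -> Self { Self { backends: HashMap::new() } }
    fn register(&mut self, hv: Box<dyn Hypervisor>) {
        self.backends.insert(hv.name(), hv);
    }
    fn start(&self, backend: &str, vm_id: &str) -> Result<String, String> {
        self.backends
            .get(backend)
            .ok_or_else(|| format!("unknown backend: {backend}"))?
            .start_vm(vm_id)
    }
}

fn main() {
    let mut registry = HypervisorRegistry::new();
    registry.register(Box::new(KvmBackend));
    registry.register(Box::new(FireCrackerBackend));
    // The API handler calls straight into the local registry: no scheduler,
    // no node selection, no network boundary.
    println!("{}", registry.start("kvm", "vm-001").unwrap());
}
```

The missing boundary is the line between the registry and its caller: today it is a function call, while the spec requires it to be a gRPC hop to a remote agent.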

Recommendations

  1. Prioritize PlasmaVMC Refactoring: The immediate engineering focus should be to split plasmavmc-server into:
    • plasmavmc-server (Control Plane, Scheduler, API)
    • plasmavmc-agent (Node status, Hypervisor control)
  2. Implement Agent Protocol: Define the gRPC interface between Server and Agent (agent.proto mentioned in spec but possibly missing or unused).
  3. Leverage Existing Foundation: The hypervisor abstraction in the plasmavmc-hypervisor crate is solid. The agent implementation can simply wrap this existing trait, keeping the refactor straightforward.
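Recommendation 3 can be illustrated with a thin agent wrapper. The `AgentService` name and the request/response structs below are assumptions for the sketch, not the contract from the spec's agent.proto; in the real refactor those types would be tonic-generated from the proto file.

```rust
// Hypothetical sketch of plasmavmc-agent delegating to the existing
// hypervisor trait. Plain structs stand in for gRPC message types so the
// sketch stays self-contained; all names here are assumptions.

// Stand-in for the trait provided by plasmavmc-hypervisor.
trait Hypervisor {
    fn start_vm(&self, vm_id: &str) -> Result<(), String>;
    fn stop_vm(&self, vm_id: &str) -> Result<(), String>;
}

struct FireCrackerBackend;
impl Hypervisor for FireCrackerBackend {
    fn start_vm(&self, _vm_id: &str) -> Result<(), String> { Ok(()) }
    fn stop_vm(&self, _vm_id: &str) -> Result<(), String> { Ok(()) }
}

// Stand-ins for the gRPC message types from agent.proto.
struct StartVmRequest { vm_id: String }
struct StartVmResponse { ok: bool, detail: String }

// The agent owns the node-local hypervisor and exposes it over the wire.
// All VM logic stays behind the trait; the agent is only transport glue.
struct AgentService<H: Hypervisor> {
    hypervisor: H,
}

impl<H: Hypervisor> AgentService<H> {
    fn start_vm(&self, req: StartVmRequest) -> StartVmResponse {
        match self.hypervisor.start_vm(&req.vm_id) {
            Ok(()) => StartVmResponse {
                ok: true,
                detail: format!("started {}", req.vm_id),
            },
            Err(e) => StartVmResponse { ok: false, detail: e },
        }
    }
}

fn main() {
    let agent = AgentService { hypervisor: FireCrackerBackend };
    let resp = agent.start_vm(StartVmRequest { vm_id: "vm-042".into() });
    println!("ok={} detail={}", resp.ok, resp.detail);
}
```

Because the agent adds no logic of its own, the control plane's scheduler can treat every node uniformly: it picks a node, then issues the same request regardless of which backend the agent wraps.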

Conclusion

The project foundation is solid. The "Data" and "Identity" layers are ready for higher-level integration. The "Compute" layer requires architectural realignment to meet the distributed design goals.