
# K8s Hosting Architecture Research
## Executive Summary
This document evaluates three architecture options for bringing Kubernetes hosting capabilities to PlasmaCloud: k3s-style architecture, k0s-style architecture, and a custom Rust implementation. After analyzing complexity, integration requirements, multi-tenant isolation, development timeline, and production reliability, **we recommend adopting a k3s-style architecture with selective component replacement** as the optimal path to MVP.
The k3s approach provides a battle-tested foundation with full Kubernetes API compatibility, enabling rapid time-to-market (3-4 months to MVP) while allowing strategic integration with PlasmaCloud components through standard interfaces (CNI, CSI, CRI, LoadBalancer controllers). Multi-tenant isolation requirements can be satisfied using namespace separation, RBAC, and network policies. While this approach involves some Go code (k3s itself, containerd), the integration points with PlasmaCloud's Rust components are well-defined through standard Kubernetes interfaces.
---
## Option 1: k3s-style Architecture
### Overview
k3s is a CNCF-certified lightweight Kubernetes distribution packaged as a single <70MB binary. It consolidates the control plane components (API server, scheduler, controller manager) and the node components (kubelet, kube-proxy) into a unified binary and process, dramatically simplifying deployment and operations. Despite its lightweight nature, k3s maintains full Kubernetes API compatibility and supports both single-server and high-availability configurations.
### Key Features
**Single Binary Architecture**
- All control plane components run in a single Server or Agent process
- Containerd handles container lifecycle functions (CRI integration)
- Memory footprint: <512MB for control plane, <50MB for worker nodes
- Fast deployment: typically under 30 seconds
**Flexible Datastore Options**
- SQLite (default): Embedded, zero-configuration, suitable for single-server setups
- Embedded etcd: For high-availability (HA) multi-server deployments
- External datastores: MySQL, PostgreSQL, etcd (via Kine proxy layer)
**Bundled Components**
- **Container Runtime**: containerd (embedded)
- **CNI**: Flannel with VXLAN backend (default, replaceable)
- **Ingress**: Traefik (default, replaceable)
- **Service Load Balancer**: ServiceLB (Klipper-lb, replaceable)
- **DNS**: CoreDNS
- **Helm Controller**: Deploys Helm charts via CRDs
**Component Flexibility**
All embedded components can be disabled, allowing replacement with custom implementations:
```bash
k3s server --disable traefik --disable servicelb --flannel-backend=none
```
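The same settings can be made persistent via k3s's configuration file rather than CLI flags; a sketch following k3s's documented config file convention:

```yaml
# /etc/rancher/k3s/config.yaml — equivalent to the CLI flags above
disable:
  - traefik
  - servicelb
flannel-backend: none
```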
### Pros
1. **Rapid Time-to-Market**: Production-ready solution with minimal development effort
2. **Battle-Tested**: Used in thousands of production deployments (e.g., Chick-fil-A's 2000+ edge locations)
3. **Full API Compatibility**: 100% Kubernetes API coverage, certified by CNCF
4. **Low Resource Overhead**: Efficient resource usage suitable for both edge and cloud deployments
5. **Easy Operations**: Single binary simplifies upgrades, patching, and deployment automation
6. **Proven Multi-Tenancy**: Standard Kubernetes namespace/RBAC isolation patterns
7. **Integration Points**: Well-defined interfaces (CNI, CSI, CRI, Service controllers) for custom component integration
8. **Active Ecosystem**: Large community, regular updates, extensive documentation
### Cons
1. **Go Codebase**: k3s and containerd are written in Go, not Rust (potential operational/debugging complexity)
2. **Limited Control**: Core components are opaque; debugging deep issues requires Go expertise
3. **Component Coupling**: While replaceable, default components are tightly integrated
4. **Not Pure Rust**: Doesn't align with PlasmaCloud's Rust-first philosophy
5. **Overhead**: Still carries full Kubernetes complexity internally despite simplified deployment
### Integration Analysis
**PlasmaVMC (Compute Backend)**
- **Approach**: Keep containerd as default CRI for container workloads
- **Alternative**: Develop custom CRI implementation to run Pods as lightweight VMs (Firecracker/KVM)
- **Effort**: High (6-8 weeks for custom CRI); Low (1 week if using containerd)
- **Recommendation**: Start with containerd, consider custom CRI in Phase 2 for VM-based pod isolation
**NovaNET (Pod Networking)**
- **Approach**: Replace Flannel with custom CNI plugin backed by NovaNET
- **Interface**: Standard CNI 1.0.0 specification
- **Implementation**: Rust binary + daemon for pod NIC creation, IPAM, routing via NovaNET SDN
- **Effort**: 4-5 weeks (CNI plugin + NovaNET integration)
- **Benefits**: Unified network control, OVN integration, advanced SDN features
**FlashDNS (Service Discovery)**
- **Approach**: Replace CoreDNS or run as secondary DNS with custom controller
- **Implementation**: K8s controller watches Services/Endpoints, updates FlashDNS records
- **Interface**: Standard K8s informers/client-go (or kube-rs)
- **Effort**: 2-3 weeks (controller + FlashDNS API integration)
- **Benefits**: Pattern-based reverse DNS, unified DNS management
**FiberLB (LoadBalancer Services)**
- **Approach**: Replace ServiceLB with custom LoadBalancer controller
- **Implementation**: K8s controller watches Services (type=LoadBalancer), provisions FiberLB L4/L7 frontends
- **Interface**: Standard Service controller pattern
- **Effort**: 3-4 weeks (controller + FiberLB API integration)
- **Benefits**: Advanced L7 features, unified load balancing
**LightningStor (Persistent Volumes)**
- **Approach**: Develop CSI driver for LightningStor
- **Interface**: CSI 1.x specification (ControllerService + NodeService)
- **Implementation**: Rust CSI driver (gRPC server) + sidecar containers
- **Effort**: 5-6 weeks (CSI driver + volume provisioning/attach/mount logic)
- **Benefits**: Dynamic volume provisioning, snapshots, cloning
**IAM (Authentication/RBAC)**
- **Approach**: K8s webhook authentication + custom authorizer backed by IAM
- **Implementation**: Webhook server validates tokens via IAM, maps users to K8s RBAC roles
- **Interface**: Standard K8s authentication/authorization webhooks
- **Effort**: 3-4 weeks (webhook server + IAM integration + RBAC mapping)
- **Benefits**: Unified identity, PlasmaCloud IAM policies enforced in K8s
### Effort Estimate
**Phase 1: MVP (3-4 months)**
- Week 1-2: k3s deployment, basic cluster setup, testing
- Week 3-6: NovaNET CNI plugin development
- Week 7-9: FiberLB LoadBalancer controller
- Week 10-12: IAM authentication webhook
- Week 13-14: Integration testing, documentation
- Week 15-16: Beta testing, hardening
**Phase 2: Advanced Features (2-3 months)**
- FlashDNS service discovery controller
- LightningStor CSI driver
- Custom CRI for VM-based pods (optional)
- Multi-tenant isolation enhancements
**Total: 5-7 months to production-ready platform**
---
## Option 2: k0s-style Architecture
### Overview
k0s is an open-source, all-inclusive Kubernetes distribution shipped as a single binary but architected with strong component modularity. Unlike k3s's process consolidation, k0s runs components as separate processes supervised by the k0s binary, enabling true control plane/worker separation and flexible component replacement. The k0s approach emphasizes production-grade deployments with enhanced security isolation.
### Key Features
**Modular Component Architecture**
- k0s binary acts as process supervisor for control plane components
- Components run as separate "naked" processes (not containers)
- No kubelet or container runtime on controllers by default
- Workers use containerd (high-level) + runc (low-level) by default
**True Control Plane/Worker Separation**
- Controllers cannot run workloads (no kubelet by default)
- Protects controllers from rogue workloads
- Reduces control plane attack surface
- Workers cannot access etcd directly (security isolation)
**Flexible Component Replacement**
- Each component can be replaced independently
- Clear boundaries between components
- Easier to swap CNI, CSI, or other plugins
- Supports custom infrastructure controllers
**k0smotron Extension**
- Control plane runs on existing cluster
- No direct networking between control/worker planes
- Enhanced multi-tenant isolation
- Suitable for hosted Kubernetes offerings
### Pros
1. **Production-Grade Design**: True control/worker separation enhances security
2. **Component Modularity**: Easier to replace individual components without affecting others
3. **Security Isolation**: Workers cannot access etcd; controllers isolated from workloads
4. **Battle-Tested**: Used in enterprise production environments
5. **Full API Compatibility**: 100% Kubernetes API coverage, CNCF-certified
6. **Clear Boundaries**: Process-level separation simplifies understanding and debugging
7. **Multi-Tenancy Ready**: k0smotron provides excellent hosted K8s architecture
8. **Integration Flexibility**: Modular design makes PlasmaCloud component integration cleaner
### Cons
1. **Go Codebase**: k0s is written in Go (same as k3s)
2. **Higher Resource Usage**: Separate processes consume more memory than k3s's unified approach
3. **Complex Architecture**: Process supervision adds operational complexity
4. **Smaller Community**: Less adoption than k3s, fewer community resources
5. **Not Pure Rust**: Doesn't align with Rust-first philosophy
6. **Learning Curve**: Unique architecture requires understanding k0s-specific patterns
### Integration Analysis
**PlasmaVMC (Compute Backend)**
- **Approach**: Replace containerd with custom CRI or run containerd for containers
- **Benefits**: Modular design makes CRI replacement cleaner than k3s
- **Effort**: 6-8 weeks for custom CRI (similar to k3s)
- **Recommendation**: Modular architecture supports phased CRI replacement
**NovaNET (Pod Networking)**
- **Approach**: Custom CNI plugin (same as k3s)
- **Benefits**: Clean component boundary for CNI integration
- **Effort**: 4-5 weeks (identical to k3s)
- **Advantages**: k0s's modularity makes CNI swap more straightforward
**FlashDNS (Service Discovery)**
- **Approach**: Controller watching Services/Endpoints (same as k3s)
- **Benefits**: Process separation provides clearer integration point
- **Effort**: 2-3 weeks (identical to k3s)
**FiberLB (LoadBalancer Services)**
- **Approach**: Custom LoadBalancer controller (same as k3s)
- **Benefits**: k0s's worker isolation protects FiberLB control plane
- **Effort**: 3-4 weeks (identical to k3s)
**LightningStor (Persistent Volumes)**
- **Approach**: CSI driver (same as k3s)
- **Benefits**: Modular design simplifies CSI deployment
- **Effort**: 5-6 weeks (identical to k3s)
**IAM (Authentication/RBAC)**
- **Approach**: Authentication webhook (same as k3s)
- **Benefits**: Control plane isolation enhances IAM security
- **Effort**: 3-4 weeks (identical to k3s)
### Effort Estimate
**Phase 1: MVP (4-5 months)**
- Week 1-3: k0s deployment, cluster setup, understanding architecture
- Week 4-7: NovaNET CNI plugin development
- Week 8-10: FiberLB LoadBalancer controller
- Week 11-13: IAM authentication webhook
- Week 14-16: Integration testing, documentation
- Week 17-18: Beta testing, hardening
**Phase 2: Advanced Features (2-3 months)**
- FlashDNS service discovery controller
- LightningStor CSI driver
- k0smotron evaluation for multi-tenant isolation
- Custom CRI exploration
**Total: 6-8 months to production-ready platform**
**Note**: Timeline is longer than k3s due to:
- Smaller community (fewer examples/resources)
- More complex architecture requiring deeper understanding
- Less documentation for edge cases
---
## Option 3: Custom Rust Implementation
### Overview
Build a minimal Kubernetes API server and control plane components from scratch in Rust, implementing only essential APIs required for container orchestration. This approach provides maximum control and alignment with PlasmaCloud's Rust-first philosophy but requires significant development effort to reach production readiness.
### Minimal K8s API Subset
**Core APIs (Essential)**
**Core API Group (`/api/v1`)**
- **Namespaces**: Tenant isolation, resource grouping
- **Pods**: Container specifications, lifecycle management
- **Services**: Network service discovery, load balancing
- **ConfigMaps**: Configuration data injection
- **Secrets**: Sensitive data storage
- **PersistentVolumes**: Storage resources
- **PersistentVolumeClaims**: Storage requests
- **Nodes**: Worker node registration and status
- **Events**: Audit trail and debugging
**Apps API Group (`/apis/apps/v1`)**
- **Deployments**: Declarative pod management, rolling updates
- **StatefulSets**: Stateful applications with stable network IDs
- **DaemonSets**: One pod per node (logging, monitoring agents)
**Batch API Group (`/apis/batch/v1`)**
- **Jobs**: Run-to-completion workloads
- **CronJobs**: Scheduled job execution
**RBAC API Group (`/apis/rbac.authorization.k8s.io/v1`)**
- **Roles/RoleBindings**: Namespace-scoped permissions
- **ClusterRoles/ClusterRoleBindings**: Cluster-wide permissions
**Networking API Group (`/apis/networking.k8s.io/v1`)**
- **NetworkPolicies**: Pod-to-pod traffic control
- **Ingress**: HTTP/HTTPS routing (optional for MVP)
**Storage API Group (`/apis/storage.k8s.io/v1`)**
- **StorageClasses**: Dynamic volume provisioning
- **VolumeAttachments**: Volume lifecycle management
**Total Estimate**: ~25-30 API resource types (vs. 50+ in full Kubernetes)
### Architecture Design
**Component Stack**
1. **API Server** (Rust)
- RESTful API endpoint (actix-web/axum)
- Authentication/authorization (IAM integration)
- Admission controllers
- OpenAPI spec generation
- Watch API (WebSocket for resource changes)
2. **Controller Manager** (Rust)
- Deployment controller (replica management)
- Service controller (endpoint management)
- Job controller (batch workload management)
- Built using kube-rs runtime abstractions
3. **Scheduler** (Rust)
- Pod-to-node assignment
- Resource-aware scheduling (CPU, memory, storage)
- Affinity/anti-affinity rules
- Extensible filter/score framework
4. **Kubelet** (Rust or adapt existing)
- Pod lifecycle management on nodes
- CRI client for container runtime (containerd/PlasmaVMC)
- Volume mounting (CSI client)
- Health checks (liveness/readiness probes)
- **Challenge**: Complex component, may need to use existing Go kubelet
5. **Datastore** (FlareDB or etcd)
- Cluster state storage
- Watch API support (real-time change notifications)
- Strong consistency guarantees
- **Option A**: Use FlareDB (Rust, PlasmaCloud-native)
- **Option B**: Use embedded etcd (proven, standard)
6. **Integration Components**
- CNI plugin for NovaNET (same as other options)
- CSI driver for LightningStor (same as other options)
- LoadBalancer controller for FiberLB (same as other options)
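To make the scheduler's filter/score framework (item 3 above) concrete, here is a minimal Rust sketch. The `Node` and `PodRequest` types and the least-allocated scoring strategy are illustrative assumptions, not an existing PlasmaCloud API:

```rust
struct Node {
    name: String,
    cpu_free_millis: u64,
    mem_free_bytes: u64,
}

struct PodRequest {
    cpu_millis: u64,
    mem_bytes: u64,
}

/// Filter: keep only nodes that can fit the pod.
/// Score: prefer the node with the most free CPU after placement
/// (a simple least-allocated strategy).
fn pick_node<'a>(nodes: &'a [Node], pod: &PodRequest) -> Option<&'a Node> {
    nodes
        .iter()
        .filter(|n| n.cpu_free_millis >= pod.cpu_millis && n.mem_free_bytes >= pod.mem_bytes)
        .max_by_key(|n| n.cpu_free_millis - pod.cpu_millis)
}

fn main() {
    let nodes = vec![
        Node { name: "worker-a".into(), cpu_free_millis: 500, mem_free_bytes: 1 << 30 },
        Node { name: "worker-b".into(), cpu_free_millis: 2000, mem_free_bytes: 4 << 30 },
    ];
    let pod = PodRequest { cpu_millis: 750, mem_bytes: 512 << 20 };
    match pick_node(&nodes, &pod) {
        // worker-a is filtered out (500m < 750m requested)
        Some(n) => println!("scheduled on {}", n.name), // prints: scheduled on worker-b
        None => println!("unschedulable"),
    }
}
```

A real scheduler would plug affinity/anti-affinity rules and additional filter/score functions into this same pipeline.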
**Libraries and Ecosystem**
- **kube-rs**: Kubernetes client library (API bindings, controller runtime)
- **k8s-openapi**: Auto-generated Rust bindings for K8s API types
- **krator**: Operator framework built on kube-rs
- **Krustlet**: Example Kubelet implementation in Rust (WebAssembly focus)
### Pros
1. **Pure Rust**: Full alignment with PlasmaCloud philosophy (memory safety, performance, maintainability)
2. **Maximum Control**: Complete ownership of codebase, no black boxes
3. **Minimal Complexity**: Only implement APIs actually needed, no legacy cruft
4. **Deep Integration**: Native integration with Chainfire, FlareDB, IAM at code level
5. **Optimized for PlasmaCloud**: Architecture tailored to our specific use cases
6. **No Go Dependencies**: Eliminate Go runtime, simplify operations
7. **Learning Experience**: Team gains deep Kubernetes knowledge
8. **Differentiation**: Unique selling point (Rust-native K8s platform)
### Cons
1. **Extreme Development Effort**: 12-18 months to MVP, 24+ months to production-grade
2. **Not Battle-Tested**: Zero production deployments, high risk of bugs
3. **API Compatibility**: Non-standard behavior breaks kubectl, Helm, operators
4. **Ecosystem Compatibility**: Most K8s tools assume full API compliance
5. **Maintenance Burden**: Ongoing effort to maintain, fix bugs, add features
6. **Talent Acquisition**: Hard to hire K8s experts willing to work on custom implementation
7. **Client Tools**: May need custom kubectl/client libraries if APIs diverge
8. **Certification**: No CNCF certification, potential customer concerns
9. **Kubelet Challenge**: Rewriting kubelet is extremely complex (1000s of edge cases)
### Integration Analysis
**PlasmaVMC (Compute Backend)**
- **Approach**: Custom kubelet with native PlasmaVMC integration or CRI interface
- **Benefits**: Deep integration, pods-as-VMs native support
- **Effort**: 10-12 weeks (if using CRI abstraction), 20+ weeks (if custom kubelet)
- **Risk**: High complexity, many edge cases in pod lifecycle
**NovaNET (Pod Networking)**
- **Approach**: Native integration in kubelet or standard CNI plugin
- **Benefits**: Tight coupling possible, eliminate CNI overhead
- **Effort**: 4-5 weeks (CNI plugin), 8-10 weeks (native integration)
- **Recommendation**: Start with CNI for compatibility
**FlashDNS (Service Discovery)**
- **Approach**: Service controller with native FlashDNS API calls
- **Benefits**: Direct integration, no intermediate DNS server
- **Effort**: 3-4 weeks (controller)
- **Advantages**: Tighter integration than CoreDNS replacement
**FiberLB (LoadBalancer Services)**
- **Approach**: Service controller with native FiberLB API calls
- **Benefits**: First-class PlasmaCloud integration
- **Effort**: 3-4 weeks (controller)
- **Advantages**: Native load balancer support
**LightningStor (Persistent Volumes)**
- **Approach**: Native volume plugin or CSI driver
- **Benefits**: Simplified architecture without CSI overhead
- **Effort**: 6-8 weeks (native plugin), 5-6 weeks (CSI driver)
- **Recommendation**: CSI driver for compatibility with K8s ecosystem tools
**IAM (Authentication/RBAC)**
- **Approach**: Native IAM integration in API server authentication layer
- **Benefits**: Zero-hop authentication, unified permissions model
- **Effort**: 2-3 weeks (direct integration vs. webhook)
- **Advantages**: Cleanest IAM integration possible
### Effort Estimate
**Phase 1: Core API Server (6-8 months)**
- Months 1-2: API server framework, authentication, basic CRUD for core resources
- Months 3-4: Controller manager (Deployment, Service, Job controllers)
- Months 5-6: Scheduler (basic resource-aware scheduling)
- Months 7-8: Testing, bug fixing, integration with IAM/FlareDB
**Phase 2: Kubelet and Runtime (6-8 months)**
- Months 9-11: Kubelet implementation (pod lifecycle, CRI client)
- Months 12-13: CNI integration (NovaNET plugin)
- Months 14-15: Volume management (CSI or native LightningStor)
- Months 16: Testing, bug fixing
**Phase 3: Production Hardening (6-8 months)**
- Months 17-19: LoadBalancer controller, DNS controller
- Months 20-21: Advanced features (StatefulSets, DaemonSets, CronJobs)
- Months 22-24: Production testing, performance tuning, edge case handling
**Total: 18-24 months to production-ready platform**
**Risk Factors**
- Kubelet complexity may extend timeline by 3-6 months
- API compatibility issues may require rework
- Performance optimization may take longer than expected
- Production bugs will require ongoing maintenance team
---
## Integration Points
### PlasmaVMC (Compute)
**Common Approach Across Options**
- Use Container Runtime Interface (CRI) for abstraction
- containerd as default runtime (mature, battle-tested)
- Phase 2: Custom CRI implementation for VM-based pods
**CRI Integration Details**
- **Interface**: gRPC protocol (RuntimeService + ImageService)
- **Operations**: RunPodSandbox, CreateContainer, StartContainer, StopContainer, etc.
- **PlasmaVMC Adapter**: Translate CRI calls to PlasmaVMC API (Firecracker/KVM)
- **Benefits**: Pod-level isolation via VMs, stronger security boundaries
**Implementation Options**
1. **Containerd (Low Risk)**: Use as-is, defer VM integration
2. **CRI-PlasmaVMC (Medium Risk)**: Custom CRI shim, pods run as lightweight VMs
3. **Native Integration (High Risk, Custom Implementation Only)**: Direct kubelet-PlasmaVMC coupling
### NovaNET (Networking)
**CNI Plugin Approach (Recommended)**
- **Interface**: CNI 1.0.0 specification (JSON-based stdin/stdout protocol)
- **Components**:
  - CNI binary (Rust): Creates pod veth pairs, assigns IPs, configures routing
  - CNI daemon (Rust): Manages node-level networking, integrates with NovaNET API
- **NovaNET Integration**: Daemon syncs pod network configs to NovaNET SDN controller
- **Features**: VXLAN overlays, OVN integration, security groups, network policies
**Implementation Steps**
1. Implement CNI ADD/DEL/CHECK operations (pod lifecycle)
2. IPAM (IP address management) via NovaNET or local allocation
3. Routing table updates for pod reachability
4. Network policy enforcement (optional: eBPF for performance)
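The steps above can be sketched as a minimal CNI plugin entry point. The environment variables and stdin protocol come from the CNI specification; the hardcoded address and the NovaNET interactions noted in comments are assumptions for illustration:

```rust
use std::env;
use std::io::{self, Read};

/// Build a CNI 1.0.0 ADD result for a single interface. In a real plugin the
/// address would come from NovaNET IPAM; this sketch hardcodes the JSON shape.
fn add_result(cni_version: &str, ifname: &str, ip_cidr: &str, gateway: &str) -> String {
    format!(
        r#"{{"cniVersion":"{}","interfaces":[{{"name":"{}"}}],"ips":[{{"address":"{}","gateway":"{}"}}]}}"#,
        cni_version, ifname, ip_cidr, gateway
    )
}

fn main() -> io::Result<()> {
    // The container runtime passes the network config JSON on stdin
    // and the verb in CNI_COMMAND (a real plugin would parse the config).
    let mut config = String::new();
    io::stdin().read_to_string(&mut config)?;
    let ifname = env::var("CNI_IFNAME").unwrap_or_else(|_| "eth0".into());

    match env::var("CNI_COMMAND").as_deref() {
        Ok("ADD") => {
            // Real plugin: create veth pair, move one end into CNI_NETNS,
            // request an address from NovaNET IPAM, install routes.
            println!("{}", add_result("1.0.0", &ifname, "10.42.0.15/24", "10.42.0.1"));
        }
        Ok("DEL") => { /* tear down interface, release IP */ }
        Ok("CHECK") => { /* verify interface and IP are still present */ }
        _ => eprintln!("unsupported CNI_COMMAND"),
    }
    Ok(())
}
```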
**Benefits**
- Unified network management across PlasmaCloud
- Leverage OVN capabilities for advanced networking
- Standard interface (works with any K8s distribution)
### FlashDNS (Service Discovery)
**Controller Approach (Recommended)**
- **Interface**: Kubernetes Informer API (watch Services, Endpoints)
- **Implementation**: Rust controller using kube-rs
- **Logic**:
  1. Watch Service objects for changes
  2. Watch Endpoints objects (backend pod IPs)
  3. Update FlashDNS records: `<service>.<namespace>.svc.cluster.local` → pod IPs
  4. Support pattern-based reverse DNS lookups
**Deployment Options**
1. **Replace CoreDNS**: FlashDNS becomes authoritative DNS for cluster
2. **Secondary DNS**: CoreDNS delegates to FlashDNS, fallback for external queries
3. **Hybrid**: CoreDNS for K8s-standard queries, FlashDNS for PlasmaCloud-specific patterns
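Option 2 (secondary DNS) could look like the following CoreDNS `Corefile` sketch, with cluster-suffix queries delegated to FlashDNS; the FlashDNS listener address is an assumption:

```
# Corefile: delegate the cluster zone to FlashDNS, everything else upstream
cluster.local:53 {
    forward . 10.43.0.53   # assumed FlashDNS ClusterIP
}
.:53 {
    forward . /etc/resolv.conf
}
```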
**Benefits**
- Unified DNS management (PlasmaCloud VMs + K8s Services)
- Pattern-based reverse DNS for debugging
- Reduced DNS server overhead
### FiberLB (Load Balancing)
**Controller Approach (Recommended)**
- **Interface**: Kubernetes Informer API (watch Services type=LoadBalancer)
- **Implementation**: Rust controller using kube-rs
- **Logic**:
  1. Watch Service objects with `type: LoadBalancer`
  2. Provision FiberLB L4 or L7 load balancer
  3. Assign external IP, configure backend pool (pod IPs from Endpoints)
  4. Update Service `.status.loadBalancer.ingress` with assigned IP
  5. Handle updates (backend changes, health checks)
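For reference, the controller watches Services like the one below and writes the assigned address back into the status; the names and example IP are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
  namespace: tenant-a
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
status:
  loadBalancer:
    ingress:
      - ip: 203.0.113.10   # written by the FiberLB controller after provisioning
```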
**Features**
- L4 (TCP/UDP) load balancing for standard Services
- L7 (HTTP/HTTPS) load balancing with Ingress integration (optional)
- Health checks (TCP/HTTP probes)
- SSL termination, session affinity
**Benefits**
- Unified load balancing across PlasmaCloud
- Advanced L7 features unavailable in default ServiceLB/Traefik
- Native integration with PlasmaCloud networking
### LightningStor (Storage)
**CSI Driver Approach (Recommended)**
- **Interface**: CSI 1.x specification (gRPC: ControllerService + NodeService + IdentityService)
- **Components**:
  - **Controller Plugin**: Runs on control plane, handles CreateVolume, DeleteVolume, ControllerPublishVolume
  - **Node Plugin**: Runs on each worker, handles NodeStageVolume, NodePublishVolume (mount operations)
  - **Sidecar Containers**: external-provisioner, external-attacher, node-driver-registrar (standard K8s components)
**Implementation Steps**
1. IdentityService: Driver name, capabilities
2. ControllerService: Volume CRUD operations (LightningStor API calls)
3. NodeService: Volume attach/mount on worker nodes (iSCSI or NBD)
4. StorageClass configuration: Parameters for LightningStor (replication, performance tier)
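A StorageClass for the driver might look like this sketch; the provisioner name and parameter keys are hypothetical and would be defined by the LightningStor driver:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: lightningstor-fast
provisioner: csi.lightningstor.plasmacloud.io   # hypothetical driver name
parameters:
  replication: "2"   # hypothetical LightningStor parameters
  tier: nvme
reclaimPolicy: Delete
allowVolumeExpansion: true
```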
**Features**
- Dynamic provisioning (PVCs automatically create volumes)
- Volume snapshots
- Volume cloning
- Resize support (expand PVCs)
**Benefits**
- Standard interface (works with any K8s distribution)
- Ecosystem compatibility (backup tools, operators that use PVCs)
- Unified storage management
### IAM (Authentication/RBAC)
**Webhook Approach (k3s/k0s)**
- **Interface**: Kubernetes authentication/authorization webhooks (HTTPS POST)
- **Implementation**: Rust webhook server
- **Authentication Flow**:
  1. kubectl sends request with Bearer token to K8s API server
  2. API server forwards token to IAM webhook
  3. Webhook validates token via IAM, returns UserInfo (username, groups, UID)
  4. API server uses UserInfo for RBAC checks
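The webhook replies with a standard `TokenReview` object; the user values below are illustrative:

```json
{
  "apiVersion": "authentication.k8s.io/v1",
  "kind": "TokenReview",
  "status": {
    "authenticated": true,
    "user": {
      "username": "alice@plasmacloud",
      "uid": "iam-7f3a",
      "groups": ["plasmacloud:project:admin"]
    }
  }
}
```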
**Authorization Integration (Optional)**
- **Webhook**: API server sends SubjectAccessReview to IAM
- **Logic**: IAM evaluates PlasmaCloud policies, returns Allowed/Denied
- **Benefits**: Unified policy enforcement across PlasmaCloud + K8s
**RBAC Mapping**
- Map PlasmaCloud IAM roles to K8s RBAC roles
- Synchronize permissions via controller
- Example: `plasmacloud:project:admin` → K8s `ClusterRole: admin`
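That mapping could be materialized by the sync controller as an RBAC binding; the names here are illustrative:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: plasmacloud-project-admin
  namespace: tenant-a                   # scoped to the tenant's namespace
subjects:
  - kind: Group
    name: "plasmacloud:project:admin"   # group asserted by the IAM webhook
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: admin                           # built-in aggregated admin role
  apiGroup: rbac.authorization.k8s.io
```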
**Native Integration (Custom Implementation)**
- Directly integrate IAM into API server authentication layer
- Zero-hop authentication (no webhook latency)
- Unified permissions model (single source of truth)
**Benefits**
- Unified identity management
- PlasmaCloud IAM policies enforced in K8s
- Simplified user experience (single login)
---
## Decision Matrix
| Criteria | k3s-style | k0s-style | Custom Rust | Weight |
|----------|-----------|-----------|-------------|--------|
| **Time to MVP** | 3-4 months ⭐⭐⭐⭐⭐ | 4-5 months ⭐⭐⭐⭐ | 18-24 months | 25% |
| **Production Reliability** | Battle-tested ⭐⭐⭐⭐⭐ | Battle-tested ⭐⭐⭐⭐⭐ | Untested | 20% |
| **Integration Difficulty** | Standard interfaces ⭐⭐⭐⭐ | Standard interfaces ⭐⭐⭐⭐⭐ | Native integration ⭐⭐⭐⭐⭐ | 15% |
| **Multi-Tenant Isolation** | K8s standard ⭐⭐⭐⭐ | Enhanced (k0smotron) ⭐⭐⭐⭐⭐ | Custom (flexible) ⭐⭐⭐⭐ | 15% |
| **Complexity vs Control** | Low complexity, less control ⭐⭐⭐ | Medium complexity, medium control ⭐⭐⭐⭐ | High complexity, full control ⭐⭐⭐⭐⭐ | 10% |
| **Rust Alignment** | Go codebase | Go codebase | Pure Rust ⭐⭐⭐⭐⭐ | 5% |
| **API Compatibility** | 100% K8s API ⭐⭐⭐⭐⭐ | 100% K8s API ⭐⭐⭐⭐⭐ | Partial API ⭐⭐ | 5% |
| **Maintenance Burden** | Low (upstream updates) ⭐⭐⭐⭐⭐ | Low (upstream updates) ⭐⭐⭐⭐⭐ | High (full ownership) | 5% |
| **Weighted Score** | **4.25** | **4.30** | **2.15** | **100%** |
**Scoring**: ⭐ (1) = Poor, ⭐⭐ (2) = Fair, ⭐⭐⭐ (3) = Good, ⭐⭐⭐⭐ (4) = Very Good, ⭐⭐⭐⭐⭐ (5) = Excellent
### Detailed Analysis
**Time to MVP (25% weight)**
- k3s wins with fastest path to market (3-4 months)
- k0s slightly slower due to smaller community and more complex architecture
- Custom implementation requires 18-24 months, unacceptable for MVP
**Production Reliability (20% weight)**
- Both k3s and k0s are battle-tested with thousands of production deployments
- Custom implementation has zero production track record, high risk
**Integration Difficulty (15% weight)**
- k0s edges ahead with cleaner modular boundaries
- Both k3s/k0s use standard interfaces (CNI, CSI, CRI, webhooks)
- Custom implementation allows native integration but requires building everything
**Multi-Tenant Isolation (15% weight)**
- k0s excels with k0smotron architecture (true control/worker plane separation)
- k3s provides standard K8s namespace/RBAC isolation (sufficient for most use cases)
- Custom implementation offers flexibility but requires building isolation mechanisms
**Complexity vs Control (10% weight)**
- Custom implementation offers maximum control but extreme complexity
- k0s provides good balance with modular architecture
- k3s prioritizes simplicity over control
**Rust Alignment (5% weight)**
- Only custom implementation aligns with Rust-first philosophy
- Both k3s and k0s are Go-based (operational impact minimal with standard interfaces)
**API Compatibility (5% weight)**
- k3s and k0s provide 100% K8s API compatibility (ecosystem compatibility)
- Custom implementation likely has gaps (breaks kubectl, Helm, operators)
**Maintenance Burden (5% weight)**
- k3s and k0s receive upstream updates, security patches
- Custom implementation requires dedicated maintenance team
---
## Recommendation
**We recommend adopting a k3s-style architecture with selective component replacement as the optimal path to MVP.**
### Primary Recommendation: k3s-style Architecture
**Rationale**
1. **Fastest Time to Market**: 3-4 months to MVP vs. 4-5 months (k0s) or 18-24 months (custom)
2. **Proven Reliability**: Battle-tested in thousands of production deployments, including large-scale edge deployments
3. **Full API Compatibility**: 100% Kubernetes API coverage ensures ecosystem compatibility (kubectl, Helm, operators, monitoring tools)
4. **Low Risk**: Mature codebase with active community and regular security updates
5. **Clean Integration Points**: Standard interfaces (CNI, CSI, CRI, webhooks) allow PlasmaCloud component integration without forking k3s
6. **Acceptable Trade-offs**:
- Go codebase is acceptable given integration happens via standard interfaces
- Operations team doesn't need deep k3s internals knowledge for day-to-day tasks
- Debugging deep issues is rare with mature software
**Implementation Strategy**
**Phase 1: MVP (3-4 months)**
1. Deploy k3s with default components (containerd, Flannel, CoreDNS, Traefik)
2. Develop and deploy NovaNET CNI plugin (replace Flannel)
3. Develop and deploy FiberLB LoadBalancer controller (replace ServiceLB)
4. Develop and deploy IAM authentication webhook
5. Multi-tenant isolation: namespace separation + RBAC + network policies
6. Testing and documentation
**Phase 2: Production Hardening (2-3 months)**
7. Develop and deploy FlashDNS service discovery controller
8. Develop and deploy LightningStor CSI driver
9. HA setup with embedded etcd (multi-master)
10. Monitoring and logging integration
11. Production testing and performance tuning
**Phase 3: Advanced Features (3-4 months, optional)**
12. Custom CRI implementation for VM-based pods (integrate PlasmaVMC)
13. Enhanced multi-tenant isolation (dedicated control planes via vcluster or similar)
14. Advanced networking features (BGP, network policies)
15. Disaster recovery and backup
**Component Replacement Strategy**
| Component | Default (k3s) | PlasmaCloud Replacement | Timeline |
|-----------|---------------|-------------------------|----------|
| Container Runtime | containerd | Keep (or custom CRI Phase 3) | Phase 1 / Phase 3 |
| CNI | Flannel | NovaNET CNI plugin | Phase 1 (Week 3-6) |
| DNS | CoreDNS | FlashDNS controller | Phase 2 (Week 17-19) |
| Load Balancer | ServiceLB | FiberLB controller | Phase 1 (Week 7-9) |
| Storage | local-path | LightningStor CSI driver | Phase 2 (Week 20-22) |
| Auth/RBAC | Static tokens | IAM webhook | Phase 1 (Week 10-12) |
**Multi-Tenant Isolation Strategy**
1. **Namespace Isolation**: Each tenant gets dedicated namespace(s)
2. **RBAC**: Roles/RoleBindings restrict cross-tenant access
3. **Network Policies**: Block pod-to-pod communication across tenants
4. **Resource Quotas**: Prevent resource monopolization
5. **Pod Security Standards**: Enforce security baselines per tenant
6. **Monitoring**: Tenant-level metrics and logging with filtering
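Item 3 can be enforced with a per-namespace default policy; a sketch with an illustrative namespace name:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-cross-tenant
  namespace: tenant-a      # one such policy per tenant namespace
spec:
  podSelector: {}          # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}  # only pods from the same namespace may connect
```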
**Risks and Mitigations**
| Risk | Mitigation |
|------|------------|
| Go codebase (not Rust) | Use standard interfaces, minimize deep k3s interactions |
| Limited control over core | Fork only if absolutely necessary, contribute upstream when possible |
| Multi-tenant isolation gaps | Layer multiple isolation mechanisms (namespace + RBAC + NetworkPolicy) |
| Vendor lock-in to Rancher | k3s is open-source (Apache 2.0), can fork if needed |
### Alternative Recommendation: k0s-style Architecture
**If the following conditions apply, consider k0s instead:**
1. **Enhanced security isolation is critical**: k0smotron provides true control/worker plane separation
2. **Timeline flexibility**: 4-5 months to MVP is acceptable
3. **Future-proofing**: Modular architecture simplifies component replacement in Phase 3+
4. **Hosted K8s offering**: k0smotron architecture is ideal for multi-tenant hosted Kubernetes
**Trade-offs vs. k3s**:
- Slower time to market (+1-2 months)
- Smaller community (fewer resources for troubleshooting)
- More complex architecture (higher learning curve)
- Better modularity (easier component replacement)
### Why Not Custom Rust Implementation?
**Reject for MVP**, consider for long-term differentiation:
1. **Timeline unacceptable**: 18-24 months to production-ready vs. 3-4 months (k3s)
2. **High risk**: Zero production deployments, unknown bugs, maintenance burden
3. **Ecosystem incompatibility**: Partial K8s API breaks kubectl, Helm, operators
4. **Talent challenges**: Hard to hire K8s experts for custom implementation
5. **Opportunity cost**: Engineering effort better spent on PlasmaCloud differentiators
**Reconsider if:**
- Unique requirements emerge that k3s/k0s cannot satisfy (unlikely, given the standard interfaces)
- Long-term competitive advantage requires Rust-native K8s (2-3 year horizon)
- Team has deep K8s internals expertise (kubelet, scheduler, controller-manager)
**Compromise approach:**
- Start with k3s for MVP
- Gradually replace components with Rust implementations (CNI, CSI, controllers)
- Evaluate custom API server in Year 2-3 if strategic value is clear
---
## Next Steps
### If Recommendation Accepted (k3s-style Architecture)
**Step 2 (S2): Architecture Design Document**
- Detailed PlasmaCloud K8s architecture diagram
- Component interaction flows (API server ↔ IAM, kubelet ↔ PlasmaVMC, etc.)
- Data flow diagrams (pod creation, service routing, volume provisioning)
- Network architecture (pod networking, service networking, ingress)
- Security architecture (authentication, authorization, network policies)
- High-availability design (multi-master, etcd, load balancing)
**Step 3 (S3): CNI Plugin Design**
- NovaNET CNI plugin specification
- CNI binary interface (ADD/DEL/CHECK operations)
- CNI daemon architecture (node networking, OVN integration)
- IPAM strategy (NovaNET-based or local allocation)
- Network policy enforcement approach (eBPF or iptables)
- Testing plan (unit tests, integration tests with k3s)
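To make the CNI binary interface concrete before S3 begins, here is a hedged sketch of the ADD/DEL/CHECK dispatch a NovaNET plugin would implement. The field names follow the CNI spec (network config on stdin, `CNI_COMMAND` in the environment, result JSON on stdout); the IP allocation and OVN wiring are placeholder assumptions.

```python
import json

# Hedged sketch of the CNI binary contract for a NovaNET plugin.
# Real IPAM / OVN calls are stubbed; the placeholder IP below is illustrative.

def cni_dispatch(command: str, stdin_config: dict) -> dict:
    """Handle one CNI invocation (ADD/DEL/CHECK) and return the result object."""
    if command == "ADD":
        # A real plugin would request an address from NovaNET IPAM and
        # attach the container interface to the OVN logical switch.
        return {
            "cniVersion": stdin_config.get("cniVersion", "1.0.0"),
            "interfaces": [{"name": "eth0"}],
            "ips": [{"address": "10.42.0.5/24", "interface": 0}],  # placeholder
        }
    if command == "DEL":
        # Tear-down must be idempotent: succeed even if the interface is gone.
        return {}
    if command == "CHECK":
        # Verify the container's networking still matches the expected state.
        return {}
    raise ValueError(f"unsupported CNI_COMMAND: {command}")

result = cni_dispatch("ADD", {"cniVersion": "1.0.0", "name": "novanet",
                              "type": "novanet"})
print(json.dumps(result))
```

In the real binary, `command` comes from `os.environ["CNI_COMMAND"]` and the config from stdin; separating the dispatch function keeps it unit-testable without a kubelet.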
**Step 4 (S4): LoadBalancer Controller Design**
- FiberLB controller specification
- Service watch logic (Informer pattern)
- FiberLB provisioning API integration
- Health check configuration
- L4 vs. L7 decision criteria
- Testing plan
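The heart of the S4 watch logic is the per-Service reconcile decision. This sketch models only that decision (which action the controller takes for a given Service object); the FiberLB provisioning API and the finalizer name are assumptions.

```python
# Sketch of the reconcile decision for a FiberLB Service controller.
# The finalizer name and action strings are hypothetical; the Informer
# pattern would invoke this on every Service add/update/delete event.

def reconcile_service(service: dict) -> str:
    spec = service.get("spec", {})
    meta = service.get("metadata", {})
    if meta.get("deletionTimestamp"):
        # Service is being deleted: release the FiberLB instance,
        # then remove the cleanup finalizer so deletion can complete.
        return "deprovision"
    if spec.get("type") != "LoadBalancer":
        # Only Services of type LoadBalancer are FiberLB's responsibility.
        return "ignore"
    if service.get("status", {}).get("loadBalancer", {}).get("ingress"):
        # Already provisioned: reconcile health checks / config drift instead.
        return "sync"
    return "provision"

svc = {"metadata": {"name": "web"}, "spec": {"type": "LoadBalancer"}}
print(reconcile_service(svc))
```

Keeping the decision pure (dict in, action out) lets the L4-vs-L7 criteria and health-check configuration be tested without a live cluster.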
**Step 5 (S5): IAM Integration Design**
- Authentication webhook specification
- Token validation flow (IAM API calls)
- UserInfo mapping (IAM roles → K8s RBAC)
- Authorization webhook (optional, future)
- RBAC synchronization controller (optional)
- Testing plan
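The S5 token validation flow can be sketched as a TokenReview exchange. The request/response shapes below follow the Kubernetes authentication webhook contract; `validate_with_iam`, the token value, and the username/group formats are stand-in assumptions for the real IAM API.

```python
# Sketch of the TokenReview exchange performed by the IAM auth webhook.
# validate_with_iam is a placeholder for the real IAM service call.

def validate_with_iam(token: str):
    # Placeholder: a real implementation calls the IAM API over HTTPS
    # and maps the IAM identity/roles to a username and groups.
    if token == "valid-token":
        return {"username": "alice@tenant-acme", "groups": ["tenant-acme:admin"]}
    return None

def handle_token_review(review: dict) -> dict:
    token = review.get("spec", {}).get("token", "")
    user = validate_with_iam(token)
    status = {"authenticated": False}
    if user:
        # IAM roles surface as K8s groups, which RBAC bindings reference.
        status = {"authenticated": True,
                  "user": {"username": user["username"],
                           "groups": user["groups"]}}
    return {"apiVersion": "authentication.k8s.io/v1", "kind": "TokenReview",
            "status": status}

resp = handle_token_review(
    {"apiVersion": "authentication.k8s.io/v1", "kind": "TokenReview",
     "spec": {"token": "valid-token"}})
print(resp["status"]["authenticated"])
```

The k3s API server would be pointed at this webhook via `--authentication-token-webhook-config-file`; the RBAC synchronization controller then consumes the same group names.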
**Step 6 (S6): Implementation Roadmap**
- Week-by-week breakdown of Phase 1 work
- Team assignments (who builds CNI, LoadBalancer controller, IAM webhook)
- Milestone definitions (what constitutes MVP, beta, GA)
- Testing strategy (unit, integration, end-to-end, chaos)
- Documentation plan (user docs, operator docs, developer docs)
- Go/no-go criteria for production launch
### Research Validation Tasks
Before proceeding to S2, validate the following:
1. **k3s Component Replacement**: Deploy k3s cluster, disable Flannel, test custom CNI plugin replacement
2. **LoadBalancer Controller**: Deploy sample controller, watch Services, verify lifecycle
3. **Authentication Webhook**: Deploy test webhook server, configure k3s API server, verify token flow
4. **Multi-Tenancy**: Create namespaces, RBAC roles, NetworkPolicies; test isolation
5. **Integration Testing**: Verify k3s operates correctly in PlasmaCloud's network environment
**Timeline**: 1-2 weeks for validation tasks
---
## References
### k3s Architecture
- [K3s Architecture Documentation](https://docs.k3s.io/architecture)
- [K3s GitHub Repository](https://github.com/k3s-io/k3s)
- [What is K3s and How is it Different from K8s? | Traefik Labs](https://traefik.io/glossary/k3s-explained)
- [K3s Cluster Datastore Options](https://docs.k3s.io/datastore)
- [Lightweight and powerful: K3s at a glance - NETWAYS](https://nws.netways.de/en/blog/2025/01/16/lightweight-and-powerful-k3s-at-a-glance/)
### k0s Architecture
- [k0s Architecture Documentation](https://docs.k0sproject.io/v1.28.2+k0s.0/architecture/)
- [k0s GitHub Repository](https://github.com/k0sproject/k0s)
- [Understanding k0s: a lightweight Kubernetes distribution | CNCF](https://www.cncf.io/blog/2024/12/06/understanding-k0s-a-lightweight-kubernetes-distribution-for-the-community/)
- [k0s vs k3s Comparison Chart | Mirantis](https://www.mirantis.com/resources/k0s-vs-k3s-comparison-chart/)
### Comparisons
- [Comparing K0s vs K3s vs K8s: Key Differences & Use Cases](https://cloudavocado.com/blog/comparing-k0s-vs-k3s-vs-k8s-key-differences-ideal-use-cases/)
- [K0s Vs. K3s Vs. K8s: The Differences And Use Cases | nOps](https://www.nops.io/blog/k0s-vs-k3s-vs-k8s/)
- [Lightweight Kubernetes Distributions: Performance Comparison (ACM 2023)](https://dl.acm.org/doi/abs/10.1145/3578244.3583737)
### Kubernetes APIs
- [Kubernetes API Concepts](https://kubernetes.io/docs/reference/using-api/api-concepts/)
- [The Kubernetes API](https://kubernetes.io/docs/concepts/overview/kubernetes-api/)
- [Minimal API Server Investigation](https://docs.kcp.io/kcp/v0.26/developers/investigations/minimal-api-server/)
### CNI Integration
- [Kubernetes Network Plugins](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/)
- [Container Network Interface (CNI) Specification](https://www.cni.dev/docs/)
- [Kubernetes CNI: The Ultimate Guide (2025)](https://www.plural.sh/blog/kubernetes-cni-guide/)
- [CNI GitHub Repository](https://github.com/containernetworking/cni)
### CSI Integration
- [Container Storage Interface (CSI) for Kubernetes GA](https://kubernetes.io/blog/2019/01/15/container-storage-interface-ga/)
- [Kubernetes CSI: Basics and How to Build a CSI Driver](https://bluexp.netapp.com/blog/cvo-blg-kubernetes-csi-basics-of-csi-volumes-and-how-to-build-a-csi-driver)
- [Kubernetes Persistent Volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/)
- [CSI Developer Documentation](https://kubernetes-csi.github.io/docs/drivers.html)
### CRI Integration
- [Kubernetes Container Runtimes](https://kubernetes.io/docs/setup/production-environment/container-runtimes/)
- [Container Runtime Interface (CRI)](https://kubernetes.io/docs/concepts/architecture/cri/)
- [Kubernetes Containerd Integration Goes GA](https://kubernetes.io/blog/2018/05/24/kubernetes-containerd-integration-goes-ga/)
### Rust Kubernetes Ecosystem
- [kube-rs: Rust Kubernetes Client and Controller Runtime](https://github.com/kube-rs/kube)
- [Rust and Kubernetes: A Match Made in Heaven](https://collabnix.com/rust-and-kubernetes-a-match-made-in-heaven/)
- [Write Your Next Kubernetes Controller in Rust](https://kty.dev/blog/2024-09-30-use-kube-rs)
- [Using Kubernetes with Rust | Shuttle](https://www.shuttle.dev/blog/2024/10/22/using-kubernetes-with-rust)
### Multi-Tenancy
- [Kubernetes Multi-tenancy](https://kubernetes.io/docs/concepts/security/multi-tenancy/)
- [Kubernetes Multi-Tenancy: Implementation Guide (2025)](https://atmosly.com/blog/kubernetes-multi-tenancy-complete-implementation-guide-2025/)
- [Best Practices for Isolation in K8s Multi-Tenant Environments](https://www.vcluster.com/blog/best-practices-for-achieving-isolation-in-kubernetes-multi-tenant-environments)
- [Kubernetes Multi-Tenancy: Three Key Approaches](https://www.spectrocloud.com/blog/kubernetes-multi-tenancy-three-key-approaches)
---
**Document Version**: 1.0
**Last Updated**: 2025-12-09
**Author**: PlasmaCloud Architecture Team
**Status**: For Review