photoncloud-monorepo/docs/por/T025-k8s-hosting/spec.md
# K8s Hosting Specification
## Overview
PlasmaCloud's K8s Hosting service provides managed Kubernetes clusters for multi-tenant container orchestration. This specification defines a k3s-based architecture that integrates deeply with existing PlasmaCloud infrastructure components: PrismNET for networking, FiberLB for load balancing, IAM for authentication/authorization, FlashDNS for service discovery, and LightningStor for persistent storage.
### Purpose
Enable customers to deploy and manage containerized workloads using standard Kubernetes APIs while benefiting from PlasmaCloud's integrated infrastructure services. The system provides:
- **Standard K8s API compatibility**: Use kubectl, Helm, and existing K8s tooling
- **Multi-tenant isolation**: Project-based namespaces with IAM-backed RBAC
- **Deep integration**: Leverage PrismNET SDN, FiberLB load balancing, LightningStor block storage
- **Production-ready**: HA control plane, automated failover, comprehensive monitoring
### Scope
**Phase 1 (MVP, 3-4 months):**
- Core K8s APIs (Pods, Services, Deployments, ReplicaSets, Namespaces, ConfigMaps, Secrets)
- LoadBalancer services via FiberLB
- Persistent storage via LightningStor CSI
- IAM authentication and RBAC
- PrismNET CNI for pod networking
- FlashDNS service discovery
**Future Phases:**
- PlasmaVMC integration for VM-backed pods (enhanced isolation)
- StatefulSets, DaemonSets, Jobs/CronJobs
- Network policies with PrismNET enforcement
- Horizontal Pod Autoscaler
- FlareDB as k3s datastore
### Architecture Decision Summary
**Base Technology: k3s**
- Lightweight K8s distribution (single binary, minimal dependencies)
- Production-proven (CNCF certified, widely deployed)
- Flexible architecture allowing component replacement
- Embedded SQLite (single-server) or etcd (HA cluster)
- 3-4 month timeline achievable
**Component Replacement Strategy:**
- **Disable**: servicelb (replaced by FiberLB), traefik (use FiberLB), flannel (replaced by PrismNET)
- **Keep**: kube-apiserver, kube-scheduler, kube-controller-manager, kubelet, containerd
- **Add**: Custom controllers for FiberLB, FlashDNS, IAM webhook, LightningStor CSI, PrismNET CNI
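The disable list above maps directly onto k3s server configuration. A sketch of `/etc/rancher/k3s/config.yaml` under these assumptions (key names follow upstream k3s flags; verify against the pinned k3s release):

```yaml
# Disable bundled components that PlasmaCloud replaces
disable:
  - servicelb        # replaced by FiberLB controller
  - traefik          # replaced by FiberLB L7
  - local-storage    # replaced by LightningStor CSI
# Hand pod networking to the PrismNET CNI plugin
flannel-backend: "none"
# k3s's own NetworkPolicy controller is also disabled;
# enforcement moves to PrismNET OVN ACLs in Phase 2
disable-network-policy: true
```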
## Architecture
### Base: k3s with Selective Component Replacement
**k3s Core (Keep):**
- **kube-apiserver**: K8s REST API server with IAM webhook authentication
- **kube-scheduler**: Pod scheduling with resource awareness
- **kube-controller-manager**: Core controllers (replication, endpoints, service accounts, etc.)
- **kubelet**: Node agent managing pod lifecycle via containerd CRI
- **containerd**: Container runtime (Phase 1), later replaceable by PlasmaVMC CRI
- **kube-proxy**: Service networking (iptables/ipvs mode)
**k3s Components (Disable):**
- **servicelb**: Default LoadBalancer implementation → Replaced by FiberLB controller
- **traefik**: Ingress controller → Replaced by FiberLB L7 capabilities
- **flannel**: CNI plugin → Replaced by PrismNET CNI
- **local-path-provisioner**: Storage provisioner → Replaced by LightningStor CSI
**PlasmaCloud Custom Components (Add):**
- **PrismNET CNI Plugin**: Pod networking via OVN logical switches
- **FiberLB Controller**: LoadBalancer service reconciliation
- **IAM Webhook Server**: Token validation and user mapping
- **FlashDNS Controller**: Service DNS record synchronization
- **LightningStor CSI Driver**: PersistentVolume provisioning and attachment
### Component Topology
```
┌──────────────────────────────────────────────────────────────┐
│                       k3s Control Plane                      │
│  ┌────────────────┐   ┌─────────────┐   ┌──────────────────┐ │
│  │ kube-apiserver │◄──┤ IAM Webhook ├───┤   IAM Service    │ │
│  │                │   │             │   │ (Authentication) │ │
│  └───────┬────────┘   └─────────────┘   └──────────────────┘ │
│          │                                                   │
│  ┌───────▼────────┐  ┌─────────────────┐  ┌───────────────┐  │
│  │ kube-scheduler │  │ kube-controller │  │  etcd/SQLite  │  │
│  │                │  │    -manager     │  │  (Datastore)  │  │
│  └────────────────┘  └─────────────────┘  └───────────────┘  │
└──────────────────────────────┬───────────────────────────────┘
                               │
            ┌──────────────────┼──────────────────┐
            │                  │                  │
    ┌───────▼───────┐  ┌───────▼───────┐  ┌───────▼───────┐
    │    FiberLB    │  │   FlashDNS    │  │ LightningStor │
    │  Controller   │  │  Controller   │  │  CSI Plugin   │
    │ (Watch Svcs)  │  │  (Sync DNS)   │  │  (Provision)  │
    └───────┬───────┘  └───────┬───────┘  └───────┬───────┘
            │                  │                  │
            ▼                  ▼                  ▼
    ┌───────────────┐  ┌───────────────┐  ┌───────────────┐
    │    FiberLB    │  │   FlashDNS    │  │ LightningStor │
    │   gRPC API    │  │   gRPC API    │  │   gRPC API    │
    └───────────────┘  └───────────────┘  └───────────────┘

┌──────────────────────────────────────────────────────────────┐
│                       k3s Worker Nodes                       │
│  ┌──────────────┐   ┌────────────┐   ┌───────────────────┐   │
│  │   kubelet    │◄──┤ containerd ├───┤ Pods (containers) │   │
│  │              │   │    CRI     │   │                   │   │
│  └──────┬───────┘   └────────────┘   └───────────────────┘   │
│         │                                                    │
│  ┌──────▼───────┐   ┌───────────────┐                        │
│  │ PrismNET CNI │◄──┤  kube-proxy   │                        │
│  │ (Pod Network)│   │ (Service Net) │                        │
│  └──────┬───────┘   └───────────────┘                        │
│         │                                                    │
│  ┌──────▼────────┐                                           │
│  │ PrismNET OVN  │                                           │
│  │(ovs-vswitchd) │                                           │
│  └───────────────┘                                           │
└──────────────────────────────────────────────────────────────┘
```
### Data Flow Examples
**1. Pod Creation:**
```
kubectl create pod → kube-apiserver (IAM auth) → scheduler → kubelet
  → containerd → PrismNET CNI → OVN logical port
```
**2. LoadBalancer Service:**
```
kubectl expose → kube-apiserver → Service created → FiberLB controller (watch)
  → FiberLB gRPC API → external IP + L4 forwarding
```
**3. PersistentVolume:**
```
PVC created → kube-apiserver → CSI controller → LightningStor CSI driver
  → LightningStor gRPC → volume created
kubelet → CSI node plugin → mount to pod
```
## K8s API Subset
### Phase 1: Core APIs (Essential)
**Pods (v1):**
- Full CRUD operations (create, get, list, update, delete, patch)
- Watch API for real-time updates
- Logs streaming (`kubectl logs -f`)
- Exec into containers (`kubectl exec`)
- Port forwarding (`kubectl port-forward`)
- Status: Phase (Pending, Running, Succeeded, Failed), conditions, container states
**Services (v1):**
- **ClusterIP**: Internal cluster networking (default)
- **LoadBalancer**: External access via FiberLB
- **Headless**: StatefulSet support (clusterIP: None)
- Service discovery via FlashDNS
- Endpoint slices for large service backends
**Deployments (apps/v1):**
- Declarative desired state (replicas, pod template)
- Rolling updates with configurable strategy (maxSurge, maxUnavailable)
- Rollback to previous revision
- Pause/resume for canary deployments
- Scaling (manual in Phase 1)
**ReplicaSets (apps/v1):**
- Pod replication with label selectors
- Owned by Deployments (rarely created directly)
- Orphan/adopt pod ownership
**Namespaces (v1):**
- Tenant isolation (one namespace per project)
- Resource quota enforcement
- Network policy scope (Phase 2)
- RBAC scope
**ConfigMaps (v1):**
- Non-sensitive configuration data
- Mount as volumes or environment variables
- Updates can trigger pod restarts (e.g., via a pod-template checksum annotation)
**Secrets (v1):**
- Sensitive data (passwords, tokens, certificates)
- Base64 encoded in etcd (at-rest encryption in future phase)
- Mount as volumes or environment variables
- Service account tokens
**Nodes (v1):**
- Node registration via kubelet
- Heartbeat and status reporting
- Capacity and allocatable resources
- Labels and taints for scheduling
**Events (v1):**
- Audit trail of cluster activities
- Retention policy (1 hour in-memory, longer in etcd)
- Debugging and troubleshooting
### Phase 2: Storage & Config (Required for MVP)
**PersistentVolumes (v1):**
- Volume lifecycle independent of pods
- Access modes: ReadWriteOnce, ReadOnlyMany, ReadWriteMany (LightningStor support)
- Reclaim policy: Retain, Delete
- Status: Available, Bound, Released, Failed
**PersistentVolumeClaims (v1):**
- User request for storage
- Binding to PVs by storage class, capacity, access mode
- Volume expansion (if storage class allows)
**StorageClasses (storage.k8s.io/v1):**
- Dynamic provisioning via LightningStor CSI
- Parameters: volume type (ssd, hdd), replication factor, org_id, project_id
- Volume binding mode: Immediate or WaitForFirstConsumer
### Phase 3: Advanced (Post-MVP)
**StatefulSets (apps/v1):**
- Ordered pod creation/deletion
- Stable network identities (pod-0, pod-1, ...)
- Persistent storage per pod via volumeClaimTemplates
- Use case: Databases, distributed systems
**DaemonSets (apps/v1):**
- One pod per node (e.g., log collectors, monitoring agents)
- Node selector and tolerations
**Jobs (batch/v1):**
- Run-to-completion workloads
- Parallelism and completions
- Retry policy
**CronJobs (batch/v1):**
- Scheduled jobs (cron syntax)
- Concurrency policy
**NetworkPolicies (networking.k8s.io/v1):**
- Ingress and egress rules
- Label-based pod selection
- Namespace selectors
- Requires PrismNET CNI support for OVN ACL translation
**Ingress (networking.k8s.io/v1):**
- HTTP/HTTPS routing via FiberLB L7
- Host-based and path-based routing
- TLS termination
### Deferred APIs (Not in MVP)
- HorizontalPodAutoscaler (autoscaling/v2): Requires metrics-server
- VerticalPodAutoscaler: Complex, low priority
- PodDisruptionBudget: Useful for HA, but post-MVP
- LimitRange: Resource limits per namespace (future)
- ResourceQuota: Supported in Phase 1, but advanced features deferred
- CustomResourceDefinitions (CRDs): Framework exists, but no custom resources in Phase 1
- APIService: Aggregation layer not needed initially
## Integration Specifications
### 1. PrismNET CNI Plugin
**Purpose:** Provide pod networking using PrismNET's OVN-based SDN.
**Interface:** CNI 1.0.0 specification (https://github.com/containernetworking/cni/blob/main/SPEC.md)
**Components:**
- **CNI binary**: `/opt/cni/bin/prismnet`
- **Configuration**: `/etc/cni/net.d/10-prismnet.conflist`
- **IPAM plugin**: `/opt/cni/bin/prismnet-ipam` (or integrated)
**Responsibilities:**
- Create network interface for pod (veth pair)
- Allocate IP address from namespace-specific subnet
- Connect pod to OVN logical switch
- Configure routing for pod egress
- Enforce network policies (Phase 2)
**Configuration Schema:**
```json
{
  "cniVersion": "1.0.0",
  "name": "prismnet",
  "type": "prismnet",
  "ipam": {
    "type": "prismnet-ipam",
    "subnet": "10.244.0.0/16",
    "rangeStart": "10.244.0.10",
    "rangeEnd": "10.244.255.254",
    "routes": [
      {"dst": "0.0.0.0/0"}
    ],
    "gateway": "10.244.0.1"
  },
  "ovn": {
    "northbound": "tcp:prismnet-server:6641",
    "southbound": "tcp:prismnet-server:6642",
    "encapType": "geneve"
  },
  "mtu": 1400,
  "prismnetEndpoint": "prismnet-server:5000"
}
```
**CNI Plugin Workflow:**
1. **ADD Command** (pod creation):
```
Input: Container ID, network namespace path, interface name
Process:
  - Call PrismNET gRPC API: AllocateIP(namespace, pod_name)
  - Create veth pair: one end in pod netns, one in host
  - Add host veth to OVN logical switch port
  - Configure pod veth: IP address, routes, MTU
  - Return: IP config, routes, DNS settings
```
2. **DEL Command** (pod deletion):
```
Input: Container ID, network namespace path
Process:
  - Call PrismNET gRPC API: ReleaseIP(namespace, pod_name)
  - Delete OVN logical switch port
  - Delete veth pair
```
3. **CHECK Command** (health check):
```
Verify the interface exists and has the expected configuration
```
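A successful ADD returns a result object in the CNI 1.0.0 result schema; an illustrative sketch (addresses match the IPAM config above, sandbox path left as a placeholder):

```json
{
  "cniVersion": "1.0.0",
  "interfaces": [
    {"name": "eth0", "sandbox": "/var/run/netns/cni-<id>"}
  ],
  "ips": [
    {"address": "10.244.1.5/24", "gateway": "10.244.0.1", "interface": 0}
  ],
  "routes": [
    {"dst": "0.0.0.0/0"}
  ],
  "dns": {"nameservers": ["10.96.0.10"]}
}
```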
**API Integration (PrismNET gRPC):**
```protobuf
service NetworkService {
  rpc AllocateIP(AllocateIPRequest) returns (AllocateIPResponse);
  rpc ReleaseIP(ReleaseIPRequest) returns (ReleaseIPResponse);
  rpc CreateLogicalSwitch(CreateLogicalSwitchRequest) returns (CreateLogicalSwitchResponse);
}

message AllocateIPRequest {
  string namespace = 1;
  string pod_name = 2;
  string container_id = 3;
}

message AllocateIPResponse {
  string ip_address = 1;  // e.g., "10.244.1.5/24"
  string gateway = 2;
  repeated string dns_servers = 3;
}
```
**OVN Topology:**
- **Logical Switch per Namespace**: `k8s-<namespace>` (e.g., `k8s-project-123`)
- **Logical Router**: `k8s-cluster-router` for inter-namespace routing
- **Logical Switch Ports**: One per pod (`<pod-name>-<container-id>`)
- **ACLs**: NetworkPolicy enforcement (Phase 2)
**Network Policy Translation (Phase 2):**
```
K8s NetworkPolicy:
  podSelector: app=web
  ingress:
  - from:
    - podSelector: app=frontend
    ports:
    - protocol: TCP
      port: 80

→ OVN ACL:
  direction: to-lport
  match: "ip4.src == $frontend_pods && tcp.dst == 80"
  action: allow-related
  priority: 1000
```
**Address Sets:**
- Dynamic updates as pods are added/removed
- Efficient ACL matching for large pod groups
### 2. FiberLB LoadBalancer Controller
**Purpose:** Reconcile K8s Services of type LoadBalancer with FiberLB resources.
**Architecture:**
- **Controller Process**: Runs as a pod in `kube-system` namespace or embedded in k3s server
- **Watch Resources**: Services (type=LoadBalancer), Endpoints
- **Manage Resources**: FiberLB LoadBalancers, Listeners, Pools, Members
**Controller Logic:**
**1. Service Watch Loop:**
```go
for event := range serviceWatcher {
    if event.Type == Created || event.Type == Updated {
        if service.Spec.Type == "LoadBalancer" {
            reconcileLoadBalancer(service)
        }
    } else if event.Type == Deleted {
        deleteLoadBalancer(service)
    }
}
```
**2. Reconcile Logic:**
```
Input: Service object
Process:
  1. Check if FiberLB LoadBalancer exists (by annotation or name mapping)
  2. If not exists:
     a. Allocate external IP from pool
     b. Create FiberLB LoadBalancer resource (gRPC CreateLoadBalancer)
     c. Store LoadBalancer ID in service annotation
  3. For each service.Spec.Ports:
     a. Create/update FiberLB Listener (protocol, port, algorithm)
  4. Get service endpoints:
     a. Create/update FiberLB Pool with backend members (pod IPs, ports)
  5. Update service.Status.LoadBalancer.Ingress with external IP
  6. If service spec changed:
     a. Update FiberLB resources accordingly
```
**3. Endpoint Watch Loop:**
```go
for event := range endpointWatcher {
    service := getServiceForEndpoint(event.Object)
    if service.Spec.Type == "LoadBalancer" {
        updateLoadBalancerPool(service, event.Object)
    }
}
```
**Configuration:**
- **External IP Pool**: `--external-ip-pool=192.168.100.0/24` (CIDR or IP range)
- **FiberLB Endpoint**: `--fiberlb-endpoint=fiberlb-server:7000` (gRPC address)
- **IP Allocation**: First-available or integration with IPAM service
**Service Annotations:**
```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-service
  annotations:
    fiberlb.plasmacloud.io/load-balancer-id: "lb-abc123"
    fiberlb.plasmacloud.io/algorithm: "round-robin"  # round-robin | least-conn | ip-hash
    fiberlb.plasmacloud.io/health-check-path: "/health"
    fiberlb.plasmacloud.io/health-check-interval: "10s"
    fiberlb.plasmacloud.io/health-check-timeout: "5s"
    fiberlb.plasmacloud.io/health-check-retries: "3"
    fiberlb.plasmacloud.io/session-affinity: "client-ip"  # For sticky sessions
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
status:
  loadBalancer:
    ingress:
    - ip: 192.168.100.50
```
**FiberLB gRPC API Integration:**
```protobuf
service LoadBalancerService {
  rpc CreateLoadBalancer(CreateLoadBalancerRequest) returns (LoadBalancer);
  rpc UpdateLoadBalancer(UpdateLoadBalancerRequest) returns (LoadBalancer);
  rpc DeleteLoadBalancer(DeleteLoadBalancerRequest) returns (Empty);
  rpc CreateListener(CreateListenerRequest) returns (Listener);
  rpc UpdatePool(UpdatePoolRequest) returns (Pool);
}

message CreateLoadBalancerRequest {
  string name = 1;
  string description = 2;
  string external_ip = 3;  // If empty, allocate from pool
  string org_id = 4;
  string project_id = 5;
}

message CreateListenerRequest {
  string load_balancer_id = 1;
  string protocol = 2;  // TCP, UDP, HTTP, HTTPS
  int32 port = 3;
  string default_pool_id = 4;
  HealthCheck health_check = 5;
}

message UpdatePoolRequest {
  string pool_id = 1;
  repeated PoolMember members = 2;
  string algorithm = 3;
}

message PoolMember {
  string address = 1;  // Pod IP
  int32 port = 2;
  int32 weight = 3;
}
```
**Health Checks:**
- HTTP health checks: Use annotation `health-check-path`
- TCP health checks: Connection-based for non-HTTP services
- Health check failures remove pod from pool (auto-healing)
**Edge Cases:**
- **Service deletion**: Controller must clean up FiberLB resources and release external IP
- **Endpoint churn**: Debounce pool updates to avoid excessive FiberLB API calls
- **IP exhaustion**: Return error event on service, set status condition
### 3. IAM Authentication Webhook
**Purpose:** Authenticate K8s API requests using PlasmaCloud IAM tokens.
**Architecture:**
- **Webhook Server**: HTTPS endpoint (can be part of IAM service or standalone)
- **Integration Point**: kube-apiserver `--authentication-token-webhook-config-file`
- **Protocol**: K8s TokenReview API
**Webhook Endpoint:** `POST /apis/iam.plasmacloud.io/v1/authenticate`
**Request Flow:**
```
kubectl --token=<IAM-token> get pods
  ↓
kube-apiserver extracts Bearer token
  ↓
POST /apis/iam.plasmacloud.io/v1/authenticate
  (body: TokenReview with token)
  ↓
IAM webhook validates token
  ↓
Response: authenticated=true, user info, groups
  ↓
kube-apiserver proceeds with RBAC authorization
```
**Request Schema (from kube-apiserver):**
```json
{
  "apiVersion": "authentication.k8s.io/v1",
  "kind": "TokenReview",
  "spec": {
    "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
  }
}
```
**Response Schema (from IAM webhook):**
```json
{
  "apiVersion": "authentication.k8s.io/v1",
  "kind": "TokenReview",
  "status": {
    "authenticated": true,
    "user": {
      "username": "user@example.com",
      "uid": "user-550e8400-e29b-41d4-a716-446655440000",
      "groups": [
        "org:org-123",
        "project:proj-456",
        "system:authenticated"
      ],
      "extra": {
        "org_id": ["org-123"],
        "project_id": ["proj-456"],
        "roles": ["org_admin"]
      }
    }
  }
}
```
**Error Response (invalid token):**
```json
{
  "apiVersion": "authentication.k8s.io/v1",
  "kind": "TokenReview",
  "status": {
    "authenticated": false,
    "error": "Invalid or expired token"
  }
}
```
**IAM Token Format:**
- **JWT**: Signed by IAM service with shared secret or public/private key
- **Claims**: sub (user ID), email, org_id, project_id, roles, exp (expiration)
- **Example**:
```json
{
  "sub": "user-550e8400-e29b-41d4-a716-446655440000",
  "email": "user@example.com",
  "org_id": "org-123",
  "project_id": "proj-456",
  "roles": ["org_admin", "project_member"],
  "exp": 1672531200
}
```
**User/Group Mapping:**
| IAM Principal | K8s Username | K8s Groups |
|---------------|--------------|------------|
| User (email) | user@example.com | org:<org_id>, project:<project_id>, system:authenticated |
| User (ID) | user-<uuid> | org:<org_id>, project:<project_id>, system:authenticated |
| Service Account | sa-<name>@<project> | org:<org_id>, project:<project_id>, system:serviceaccounts |
| Org Admin | admin@example.com | org:<org_id>, project:<all_projects>, k8s:org-admin |
**RBAC Integration:**
- Groups are used in RoleBindings and ClusterRoleBindings
- Example: `org:org-123` group gets admin access to all `project-*` namespaces for that org
**Webhook Configuration File (`/etc/k8shost/iam-webhook.yaml`):**
```yaml
apiVersion: v1
kind: Config
clusters:
- name: iam-webhook
  cluster:
    server: https://iam-server:3000/apis/iam.plasmacloud.io/v1/authenticate
    certificate-authority: /etc/k8shost/ca.crt
users:
- name: k8s-apiserver
  user:
    client-certificate: /etc/k8shost/apiserver-client.crt
    client-key: /etc/k8shost/apiserver-client.key
current-context: webhook
contexts:
- context:
    cluster: iam-webhook
    user: k8s-apiserver
  name: webhook
```
**Performance Considerations:**
- **Caching**: kube-apiserver caches successful authentications (--authentication-token-webhook-cache-ttl=2m)
- **Timeouts**: Webhook must respond within 10s (configurable)
- **Rate Limiting**: IAM webhook should handle high request volume (100s of req/s)
### 4. FlashDNS Service Discovery Controller
**Purpose:** Synchronize K8s Services and Pods to FlashDNS for cluster DNS resolution.
**Architecture:**
- **Controller Process**: Runs as pod in `kube-system` or embedded in k3s server
- **Watch Resources**: Services, Endpoints, Pods
- **Manage Resources**: FlashDNS A/AAAA/SRV records
**DNS Hierarchy:**
- **Pod A Records**: `<pod-ip-dashed>.pod.cluster.local` → Pod IP
  - Example: `10-244-1-5.pod.cluster.local` → `10.244.1.5`
- **Service A Records**: `<service>.<namespace>.svc.cluster.local` → ClusterIP or external IP
  - Example: `web.default.svc.cluster.local` → `10.96.0.100`
- **Headless Service**: `<endpoint>.<service>.<namespace>.svc.cluster.local` → Endpoint IPs
  - Example: `web-0.web.default.svc.cluster.local` → `10.244.1.10`
- **SRV Records**: `_<port>._<protocol>.<service>.<namespace>.svc.cluster.local`
  - Example: `_http._tcp.web.default.svc.cluster.local` → `0 50 80 web.default.svc.cluster.local`
**Controller Logic:**
**1. Service Watch:**
```
for event := range serviceWatcher {
  service := event.Object
  switch event.Type {
  case Created, Updated:
    if service.Spec.ClusterIP != "None":
      // Regular service
      createOrUpdateDNSRecord(
        name: service.Name + "." + service.Namespace + ".svc.cluster.local",
        type: "A",
        value: service.Spec.ClusterIP
      )
      if len(service.Status.LoadBalancer.Ingress) > 0:
        // LoadBalancer service - also add external IP
        createOrUpdateDNSRecord(
          name: service.Name + "." + service.Namespace + ".svc.cluster.local",
          type: "A",
          value: service.Status.LoadBalancer.Ingress[0].IP
        )
    else:
      // Headless service - add endpoint records
      endpoints := getEndpoints(service)
      for _, ep := range endpoints:
        createOrUpdateDNSRecord(
          name: ep.Hostname + "." + service.Name + "." + service.Namespace + ".svc.cluster.local",
          type: "A",
          value: ep.IP
        )
    // Create SRV records for each port
    for _, port := range service.Spec.Ports:
      createSRVRecord(service, port)
  case Deleted:
    deleteDNSRecords(service)
  }
}
```
**2. Pod Watch (for pod DNS):**
```
for event := range podWatcher {
  pod := event.Object
  switch event.Type {
  case Created, Updated:
    if pod.Status.PodIP != "":
      dashedIP := strings.ReplaceAll(pod.Status.PodIP, ".", "-")
      createOrUpdateDNSRecord(
        name: dashedIP + ".pod.cluster.local",
        type: "A",
        value: pod.Status.PodIP
      )
  case Deleted:
    deleteDNSRecord(pod)
  }
}
```
**FlashDNS gRPC API Integration:**
```protobuf
service DNSService {
  rpc CreateRecord(CreateRecordRequest) returns (DNSRecord);
  rpc UpdateRecord(UpdateRecordRequest) returns (DNSRecord);
  rpc DeleteRecord(DeleteRecordRequest) returns (Empty);
  rpc ListRecords(ListRecordsRequest) returns (ListRecordsResponse);
}

message CreateRecordRequest {
  string zone = 1;                 // "cluster.local"
  string name = 2;                 // "web.default.svc"
  string type = 3;                 // "A", "AAAA", "SRV", "CNAME"
  string value = 4;                // "10.96.0.100"
  int32 ttl = 5;                   // 30 (seconds)
  map<string, string> labels = 6;  // k8s metadata
}

message DNSRecord {
  string id = 1;
  string zone = 2;
  string name = 3;
  string type = 4;
  string value = 5;
  int32 ttl = 6;
}
```
**Configuration:**
- **FlashDNS Endpoint**: `--flashdns-endpoint=flashdns-server:6000`
- **Cluster Domain**: `--cluster-domain=cluster.local` (default)
- **Record TTL**: `--dns-ttl=30` (seconds, low for fast updates)
**Example DNS Records:**
```
# Regular service
web.default.svc.cluster.local. 30 IN A 10.96.0.100
# Headless service with 3 pods
web.default.svc.cluster.local. 30 IN A 10.244.1.10
web.default.svc.cluster.local. 30 IN A 10.244.1.11
web.default.svc.cluster.local. 30 IN A 10.244.1.12
# StatefulSet pods (Phase 3)
web-0.web.default.svc.cluster.local. 30 IN A 10.244.1.10
web-1.web.default.svc.cluster.local. 30 IN A 10.244.1.11
# SRV record for service port
_http._tcp.web.default.svc.cluster.local. 30 IN SRV 0 50 80 web.default.svc.cluster.local.
# Pod DNS
10-244-1-10.pod.cluster.local. 30 IN A 10.244.1.10
```
**Integration with kubelet:**
- kubelet configures pod DNS via `/etc/resolv.conf`
- `nameserver`: FlashDNS service IP (a reserved ClusterIP in the service CIDR, conventionally `10.96.0.10`)
- `search`: `<namespace>.svc.cluster.local svc.cluster.local cluster.local`
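For a pod in namespace `default`, the resulting `/etc/resolv.conf` would look like this (the `ndots:5` option is the upstream kubelet convention, assumed here):

```
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
```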
**Edge Cases:**
- **Service IP change**: Update DNS record atomically
- **Endpoint churn**: Debounce updates for headless services with many endpoints
- **DNS caching**: Low TTL (30s) for fast convergence
### 5. LightningStor CSI Driver
**Purpose:** Provide dynamic PersistentVolume provisioning and lifecycle management.
**CSI Driver Name:** `stor.plasmacloud.io`
**Architecture:**
- **Controller Plugin**: Runs as StatefulSet or Deployment in `kube-system`
- Provisioning, deletion, attaching, detaching, snapshots
- **Node Plugin**: Runs as DaemonSet on every node
- Staging, publishing (mounting), unpublishing, unstaging
**CSI Components:**
**1. Controller Service (Identity, Controller RPCs):**
- `CreateVolume`: Provision new volume via LightningStor
- `DeleteVolume`: Delete volume
- `ControllerPublishVolume`: Attach volume to node
- `ControllerUnpublishVolume`: Detach volume from node
- `ValidateVolumeCapabilities`: Check if volume supports requested capabilities
- `ListVolumes`: List all volumes
- `GetCapacity`: Query available storage capacity
- `CreateSnapshot`, `DeleteSnapshot`: Volume snapshots (Phase 2)
**2. Node Service (Node RPCs):**
- `NodeStageVolume`: Mount volume to global staging path on node
- `NodeUnstageVolume`: Unmount from staging path
- `NodePublishVolume`: Bind mount from staging to pod path
- `NodeUnpublishVolume`: Unmount from pod path
- `NodeGetInfo`: Return node ID and topology
- `NodeGetCapabilities`: Return node capabilities
**CSI Driver Workflow:**
**Volume Provisioning:**
```
1. User creates PVC:
     apiVersion: v1
     kind: PersistentVolumeClaim
     metadata:
       name: my-pvc
     spec:
       accessModes: [ReadWriteOnce]
       resources:
         requests:
           storage: 10Gi
       storageClassName: lightningstor-ssd

2. CSI Controller watches PVC, calls CreateVolume:
     CreateVolumeRequest {
       name: "pvc-550e8400-e29b-41d4-a716-446655440000"
       capacity_range: { required_bytes: 10737418240 }
       volume_capabilities: [{ access_mode: SINGLE_NODE_WRITER }]
       parameters: {
         "type": "ssd",
         "replication": "3",
         "org_id": "org-123",
         "project_id": "proj-456"
       }
     }

3. CSI Controller calls LightningStor gRPC CreateVolume:
     LightningStor creates volume, returns volume_id

4. CSI Controller creates PV:
     apiVersion: v1
     kind: PersistentVolume
     metadata:
       name: pvc-550e8400-e29b-41d4-a716-446655440000
     spec:
       capacity:
         storage: 10Gi
       accessModes: [ReadWriteOnce]
       persistentVolumeReclaimPolicy: Delete
       storageClassName: lightningstor-ssd
       csi:
         driver: stor.plasmacloud.io
         volumeHandle: vol-abc123
         fsType: ext4

5. K8s binds PVC to PV
```
**Volume Attachment (when pod is scheduled):**
```
1. kube-controller-manager creates VolumeAttachment:
     apiVersion: storage.k8s.io/v1
     kind: VolumeAttachment
     metadata:
       name: csi-<hash>
     spec:
       attacher: stor.plasmacloud.io
       nodeName: worker-1
       source:
         persistentVolumeName: pvc-550e8400-e29b-41d4-a716-446655440000

2. CSI Controller watches VolumeAttachment, calls ControllerPublishVolume:
     ControllerPublishVolumeRequest {
       volume_id: "vol-abc123"
       node_id: "worker-1"
       volume_capability: { access_mode: SINGLE_NODE_WRITER }
     }

3. CSI Controller calls LightningStor gRPC AttachVolume:
     LightningStor attaches volume to node (e.g., iSCSI target, NBD)

4. CSI Controller updates VolumeAttachment status: attached=true
```
**Volume Mounting (on node):**
```
1. kubelet calls CSI Node plugin: NodeStageVolume
     NodeStageVolumeRequest {
       volume_id: "vol-abc123"
       staging_target_path: "/var/lib/kubelet/plugins/kubernetes.io/csi/stor.plasmacloud.io/<hash>/globalmount"
       volume_capability: { mount: { fs_type: "ext4" } }
     }

2. CSI Node plugin:
   - Discovers block device (e.g., /dev/nbd0) via LightningStor
   - Formats if needed: mkfs.ext4 /dev/nbd0
   - Mounts to staging path: mount /dev/nbd0 <staging_target_path>

3. kubelet calls CSI Node plugin: NodePublishVolume
     NodePublishVolumeRequest {
       volume_id: "vol-abc123"
       staging_target_path: "/var/lib/kubelet/plugins/kubernetes.io/csi/stor.plasmacloud.io/<hash>/globalmount"
       target_path: "/var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/pvc-<hash>/mount"
     }

4. CSI Node plugin:
   - Bind mounts staging path to target path
   - Pod can now read/write to volume
```
**LightningStor gRPC API Integration:**
```protobuf
service VolumeService {
  rpc CreateVolume(CreateVolumeRequest) returns (Volume);
  rpc DeleteVolume(DeleteVolumeRequest) returns (Empty);
  rpc AttachVolume(AttachVolumeRequest) returns (VolumeAttachment);
  rpc DetachVolume(DetachVolumeRequest) returns (Empty);
  rpc GetVolume(GetVolumeRequest) returns (Volume);
  rpc ListVolumes(ListVolumesRequest) returns (ListVolumesResponse);
}

message CreateVolumeRequest {
  string name = 1;
  int64 size_bytes = 2;
  string volume_type = 3;        // "ssd", "hdd"
  int32 replication_factor = 4;
  string org_id = 5;
  string project_id = 6;
}

message Volume {
  string id = 1;
  string name = 2;
  int64 size_bytes = 3;
  string status = 4;             // "available", "in-use", "error"
  string volume_type = 5;
}

message AttachVolumeRequest {
  string volume_id = 1;
  string node_id = 2;
  string attach_mode = 3;        // "read-write", "read-only"
}

message VolumeAttachment {
  string id = 1;
  string volume_id = 2;
  string node_id = 3;
  string device_path = 4;        // e.g., "/dev/nbd0"
  string connection_info = 5;    // JSON with iSCSI target, NBD socket, etc.
}
```
**StorageClass Examples:**
```yaml
# SSD storage with 3x replication
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: lightningstor-ssd
provisioner: stor.plasmacloud.io
parameters:
  type: "ssd"
  replication: "3"
volumeBindingMode: WaitForFirstConsumer  # Topology-aware scheduling
allowVolumeExpansion: true
reclaimPolicy: Delete
---
# HDD storage with 2x replication
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: lightningstor-hdd
provisioner: stor.plasmacloud.io
parameters:
  type: "hdd"
  replication: "2"
volumeBindingMode: Immediate
allowVolumeExpansion: true
reclaimPolicy: Retain  # Keep volume after PVC deletion
```
**Access Modes:**
- **ReadWriteOnce (RWO)**: Single node read-write (most common)
- **ReadOnlyMany (ROX)**: Multiple nodes read-only
- **ReadWriteMany (RWX)**: Multiple nodes read-write (requires shared filesystem like NFS, Phase 2)
**Volume Expansion (if allowVolumeExpansion: true):**
```
1. User edits PVC: spec.resources.requests.storage: 20Gi (was 10Gi)
2. CSI Controller calls ControllerExpandVolume
3. LightningStor expands volume backend
4. CSI Node plugin calls NodeExpandVolume
5. Filesystem resize: resize2fs /dev/nbd0
```
### 6. PlasmaVMC Integration
**Phase 1 (MVP):** Use containerd as default CRI
- k3s ships with containerd embedded
- Standard OCI container runtime
- No changes needed for Phase 1
**Phase 3 (Future):** Custom CRI for VM-backed pods
**Motivation:**
- **Enhanced Isolation**: Stronger security boundary than containers
- **Multi-Tenant Security**: Prevent container escape attacks
- **Consistent Runtime**: Unify VM and container workloads on PlasmaVMC
**Architecture:**
- PlasmaVMC implements CRI (Container Runtime Interface)
- Each pod runs as a lightweight VM (Firecracker microVM)
- Pod containers run inside VM (still using containerd within VM)
- kubelet communicates with PlasmaVMC CRI endpoint instead of containerd
**CRI Interface Implementation:**
**RuntimeService:**
- `RunPodSandbox`: Create Firecracker microVM for pod
- `StopPodSandbox`: Stop microVM
- `RemovePodSandbox`: Delete microVM
- `PodSandboxStatus`: Query microVM status
- `ListPodSandbox`: List all pod microVMs
- `CreateContainer`: Create container inside microVM
- `StartContainer`, `StopContainer`, `RemoveContainer`: Container lifecycle
- `ExecSync`, `Exec`: Execute commands in container
- `Attach`: Attach to container stdio
**ImageService:**
- `PullImage`: Download container image (delegate to internal containerd)
- `RemoveImage`: Delete image
- `ListImages`: List cached images
- `ImageStatus`: Query image metadata
**Implementation Strategy:**
```
┌─────────────────────────────────────────┐
│ kubelet (k3s agent) │
└─────────────┬───────────────────────────┘
│ CRI gRPC
┌─────────────────────────────────────────┐
│ PlasmaVMC CRI Server (Rust) │
│ - RunPodSandbox → Create microVM │
│ - CreateContainer → Run in VM │
└─────────────┬───────────────────────────┘
┌─────────────────────────────────────────┐
│ Firecracker VMM (per pod) │
│ ┌───────────────────────────────────┐ │
│ │ Pod VM (minimal Linux kernel) │ │
│ │ ┌──────────────────────────────┐ │ │
│ │ │ containerd (in-VM) │ │ │
│ │ │ - Container 1 │ │ │
│ │ │ - Container 2 │ │ │
│ │ └──────────────────────────────┘ │ │
│ └───────────────────────────────────┘ │
└─────────────────────────────────────────┘
```
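The sandbox-to-microVM mapping in the diagram can be sketched as follows. The spec targets a Rust implementation; Go is used here only for brevity, and the trimmed `runtimeService` interface stands in for the real `k8s.io/cri-api` definitions, so everything below is illustrative:

```go
package main

import "fmt"

// Hypothetical, trimmed-down view of the CRI RuntimeService calls the
// PlasmaVMC server must handle (the real interface is k8s.io/cri-api).
type runtimeService interface {
	RunPodSandbox(podName string) (sandboxID string, err error)
	StopPodSandbox(sandboxID string) error
}

// vmcRuntime maps each pod sandbox to one Firecracker microVM.
type vmcRuntime struct {
	vms map[string]string // sandboxID -> VM state
}

func newVMCRuntime() *vmcRuntime { return &vmcRuntime{vms: map[string]string{}} }

func (r *vmcRuntime) RunPodSandbox(podName string) (string, error) {
	id := "vm-" + podName // real IDs would be generated, not derived
	r.vms[id] = "running" // here: launch Firecracker with vmKernel/vmRootfs
	return id, nil
}

func (r *vmcRuntime) StopPodSandbox(id string) error {
	if _, ok := r.vms[id]; !ok {
		return fmt.Errorf("unknown sandbox %s", id)
	}
	r.vms[id] = "stopped" // here: send shutdown to the microVM
	return nil
}

func main() {
	var rt runtimeService = newVMCRuntime()
	id, _ := rt.RunPodSandbox("web-0")
	fmt.Println(id)
	rt.StopPodSandbox(id)
}
```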
**Configuration (Phase 3):**
```nix
services.k8shost = {
enable = true;
cri = "plasmavmc"; # Instead of "containerd"
plasmavmc = {
endpoint = "unix:///var/run/plasmavmc/cri.sock";
vmKernel = "/var/lib/plasmavmc/vmlinux.bin";
vmRootfs = "/var/lib/plasmavmc/rootfs.ext4";
};
};
```
**Benefits:**
- Stronger isolation for untrusted workloads
- Leverage existing PlasmaVMC infrastructure
- Consistent management across VM and K8s workloads
**Challenges:**
- Performance overhead (microVM startup time, memory overhead)
- Image caching complexity (need containerd inside VM)
- Networking integration (CNI must configure VM network)
**Decision:** Defer to Phase 3, focus on standard containerd for MVP.
## Multi-Tenant Model
### Namespace Strategy
**Principle:** One K8s namespace per PlasmaCloud project.
**Namespace Naming:**
- **Project namespaces**: `project-<project_id>` (e.g., `project-550e8400-e29b-41d4-a716-446655440000`)
- **Org shared namespaces** (optional): `org-<org_id>-shared` (for shared resources like monitoring)
- **System namespaces**: `kube-system`, `kube-public`, `kube-node-lease`, `default`
**Namespace Lifecycle:**
- Created automatically when project provisions K8s cluster
- Labeled with `org_id`, `project_id` for RBAC and billing
- Deleted when project is deleted (with grace period)
**Namespace Metadata:**
```yaml
apiVersion: v1
kind: Namespace
metadata:
name: project-550e8400-e29b-41d4-a716-446655440000
labels:
plasmacloud.io/org-id: "org-123"
plasmacloud.io/project-id: "proj-456"
plasmacloud.io/tenant-type: "project"
annotations:
plasmacloud.io/project-name: "my-web-app"
plasmacloud.io/created-by: "user@example.com"
```
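Provisioning code must also verify that the derived name is a valid K8s namespace name (an RFC 1123 DNS label, max 63 characters); a project UUID under the `project-` prefix fits comfortably. A hypothetical helper:

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123Label matches the RFC 1123 label rules K8s enforces on
// namespace names (lowercase alphanumerics and '-').
var dns1123Label = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// namespaceForProject derives the K8s namespace name from a
// PlasmaCloud project ID, per the naming scheme above.
func namespaceForProject(projectID string) (string, error) {
	name := "project-" + projectID
	if len(name) > 63 || !dns1123Label.MatchString(name) {
		return "", fmt.Errorf("invalid namespace name %q", name)
	}
	return name, nil
}

func main() {
	ns, err := namespaceForProject("550e8400-e29b-41d4-a716-446655440000")
	fmt.Println(ns, err)
}
```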
### RBAC Templates
**Org Admin Role (full access to all project namespaces):**
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: org-admin
namespace: project-550e8400-e29b-41d4-a716-446655440000
rules:
- apiGroups: ["*"]
resources: ["*"]
verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: org-admin-binding
namespace: project-550e8400-e29b-41d4-a716-446655440000
subjects:
- kind: Group
name: org:org-123
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: Role
name: org-admin
apiGroup: rbac.authorization.k8s.io
```
**Project Admin Role (full access to specific project namespace):**
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: project-admin
namespace: project-550e8400-e29b-41d4-a716-446655440000
rules:
- apiGroups: ["", "apps", "batch", "networking.k8s.io", "storage.k8s.io"]
resources: ["*"]
verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: project-admin-binding
namespace: project-550e8400-e29b-41d4-a716-446655440000
subjects:
- kind: Group
name: project:proj-456
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: Role
name: project-admin
apiGroup: rbac.authorization.k8s.io
```
**Project Viewer Role (read-only access):**
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: project-viewer
namespace: project-550e8400-e29b-41d4-a716-446655440000
rules:
- apiGroups: ["", "apps", "batch", "networking.k8s.io"]
resources: ["pods", "services", "deployments", "replicasets", "configmaps", "secrets"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["pods/log"]
verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: project-viewer-binding
namespace: project-550e8400-e29b-41d4-a716-446655440000
subjects:
- kind: Group
name: project:proj-456:viewer
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: Role
name: project-viewer
apiGroup: rbac.authorization.k8s.io
```
**ClusterRole for Node Access (for cluster admins):**
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: plasmacloud-cluster-admin
rules:
- apiGroups: [""]
resources: ["nodes", "persistentvolumes"]
verbs: ["*"]
- apiGroups: ["storage.k8s.io"]
resources: ["storageclasses"]
verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: plasmacloud-cluster-admin-binding
subjects:
- kind: Group
name: system:plasmacloud-admins
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: ClusterRole
name: plasmacloud-cluster-admin
apiGroup: rbac.authorization.k8s.io
```
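The group names in the bindings above imply a straightforward mapping from a token's IAM groups to the role it receives in a given project namespace. `roleFor` is a hypothetical helper illustrating that precedence; the `org:`/`project:` string formats mirror the RoleBinding subjects in this spec:

```go
package main

import "fmt"

// roleFor maps the IAM groups in a validated token to the highest role
// the bindings above grant in one project namespace.
func roleFor(groups []string, orgID, projID string) string {
	has := map[string]bool{}
	for _, g := range groups {
		has[g] = true
	}
	switch {
	case has["org:"+orgID]: // org admins win over project roles
		return "org-admin"
	case has["project:"+projID]:
		return "project-admin"
	case has["project:"+projID+":viewer"]:
		return "project-viewer"
	}
	return "" // no access in this namespace
}

func main() {
	groups := []string{"project:proj-456:viewer"}
	fmt.Println(roleFor(groups, "org-123", "proj-456")) // project-viewer
}
```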
### Network Isolation
**Default NetworkPolicy (deny all, except DNS):**
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-all
namespace: project-550e8400-e29b-41d4-a716-446655440000
spec:
podSelector: {} # Apply to all pods
policyTypes:
- Ingress
- Egress
egress:
- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: kube-system
ports:
- protocol: UDP
port: 53 # DNS
```
**Allow Ingress from LoadBalancer:**
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-loadbalancer
namespace: project-550e8400-e29b-41d4-a716-446655440000
spec:
podSelector:
matchLabels:
app: web
policyTypes:
- Ingress
ingress:
- from:
- ipBlock:
cidr: 0.0.0.0/0 # Allow from anywhere (LoadBalancer external traffic)
ports:
- protocol: TCP
port: 8080
```
**Allow Inter-Namespace Communication (optional, for org-shared services):**
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-org-shared
namespace: project-550e8400-e29b-41d4-a716-446655440000
spec:
podSelector: {}
policyTypes:
- Egress
egress:
- to:
- namespaceSelector:
matchLabels:
plasmacloud.io/org-id: "org-123"
plasmacloud.io/tenant-type: "org-shared"
```
**PrismNET Enforcement:**
- NetworkPolicies are translated to OVN ACLs by PrismNET CNI controller
- Enforced at OVN logical switch level (low-level packet filtering)
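The translation step can be illustrated for a single egress port rule, such as the DNS exception in the default-deny policy above. `aclForEgressPort` and the `@pods_<ns>` port-group naming are assumptions of this sketch, not the actual PrismNET implementation; the `ip4`/`udp.dst` field names follow OVN match syntax:

```go
package main

import (
	"fmt"
	"strings"
)

// aclForEgressPort builds the OVN ACL match expression a hypothetical
// PrismNET translation step could emit for one egress port rule.
// OVN set/port-group names cannot contain '-', so the namespace is
// sanitized before use.
func aclForEgressPort(namespace, proto string, port int) string {
	group := "pods_" + strings.ReplaceAll(namespace, "-", "_")
	return fmt.Sprintf("inport == @%s && ip4 && %s.dst == %d", group, proto, port)
}

func main() {
	// The DNS exception in the default-deny policy above:
	fmt.Println(aclForEgressPort("project-550e8400", "udp", 53))
}
```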
### Resource Quotas
**CPU and Memory Quotas:**
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
name: project-compute-quota
namespace: project-550e8400-e29b-41d4-a716-446655440000
spec:
hard:
requests.cpu: "10" # 10 CPU cores
requests.memory: "20Gi" # 20 GB RAM
limits.cpu: "20" # Allow bursting to 20 cores
limits.memory: "40Gi" # Allow bursting to 40 GB RAM
```
**Storage Quotas:**
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
name: project-storage-quota
namespace: project-550e8400-e29b-41d4-a716-446655440000
spec:
hard:
persistentvolumeclaims: "10" # Max 10 PVCs
requests.storage: "100Gi" # Total storage requests
```
**Object Count Quotas:**
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
name: project-object-quota
namespace: project-550e8400-e29b-41d4-a716-446655440000
spec:
hard:
pods: "50"
services: "20"
services.loadbalancers: "5" # Max 5 LoadBalancer services (limit external IPs)
configmaps: "50"
secrets: "50"
```
**Quota Enforcement:**
- K8s admission controller rejects resource creation exceeding quota
- User receives clear error message
- Quota usage visible in `kubectl describe quota`
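The admission check amounts to comparing current usage plus the new request against the hard limit. A simplified sketch for `requests.cpu` in millicores, with an error message in the style of the real quota admission controller:

```go
package main

import "fmt"

// fitsQuota mimics the admission decision: a new pod's CPU request is
// admitted only if current usage plus the request stays within the
// namespace's hard quota. All values are millicores.
func fitsQuota(usedMilliCPU, requestMilliCPU, hardMilliCPU int64) error {
	if usedMilliCPU+requestMilliCPU > hardMilliCPU {
		return fmt.Errorf("exceeded quota: requests.cpu, requested: %dm, used: %dm, limited: %dm",
			requestMilliCPU, usedMilliCPU, hardMilliCPU)
	}
	return nil
}

func main() {
	// 10-core quota from the example above (requests.cpu: "10").
	fmt.Println(fitsQuota(9500, 1000, 10000)) // rejected with an error
	fmt.Println(fitsQuota(8000, 1000, 10000)) // admitted (nil)
}
```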
## Deployment Model
### Single-Server (Development/Small)
**Target Use Case:**
- Development and testing environments
- Small production workloads (<10 nodes)
- Cost-sensitive deployments
**Architecture:**
- Single k3s server node with embedded SQLite datastore
- Control plane and worker colocated
- No HA guarantees
**k3s Server Command:**
```bash
k3s server \
--data-dir=/var/lib/k8shost \
--disable=servicelb,traefik,flannel \
--flannel-backend=none \
--disable-network-policy \
--cluster-domain=cluster.local \
--service-cidr=10.96.0.0/12 \
--cluster-cidr=10.244.0.0/16 \
--authentication-token-webhook-config-file=/etc/k8shost/iam-webhook.yaml \
--bind-address=0.0.0.0 \
--advertise-address=192.168.1.100 \
--tls-san=k8s-api.example.com
```
**NixOS Configuration:**
```nix
{ config, lib, pkgs, ... }:
{
services.k8shost = {
enable = true;
mode = "server";
datastore = "sqlite"; # Embedded SQLite
disableComponents = ["servicelb" "traefik" "flannel"];
networking = {
serviceCIDR = "10.96.0.0/12";
clusterCIDR = "10.244.0.0/16";
clusterDomain = "cluster.local";
};
prismnet = {
enable = true;
endpoint = "prismnet-server:5000";
ovnNorthbound = "tcp:prismnet-server:6641";
ovnSouthbound = "tcp:prismnet-server:6642";
};
fiberlb = {
enable = true;
endpoint = "fiberlb-server:7000";
externalIpPool = "192.168.100.0/24";
};
iam = {
enable = true;
webhookEndpoint = "https://iam-server:3000/apis/iam.plasmacloud.io/v1/authenticate";
caCertFile = "/etc/k8shost/ca.crt";
clientCertFile = "/etc/k8shost/client.crt";
clientKeyFile = "/etc/k8shost/client.key";
};
flashdns = {
enable = true;
endpoint = "flashdns-server:6000";
clusterDomain = "cluster.local";
recordTTL = 30;
};
lightningstor = {
enable = true;
endpoint = "lightningstor-server:8000";
csiNodeDaemonSet = true; # Deploy CSI node plugin as DaemonSet
};
};
# Open firewall for K8s API
networking.firewall.allowedTCPPorts = [ 6443 ];
}
```
**Limitations:**
- No HA (single point of failure)
- SQLite has limited concurrency
- Control plane downtime affects entire cluster
### HA Cluster (Production)
**Target Use Case:**
- Production workloads requiring high availability
- Large clusters (>10 nodes)
- Mission-critical applications
**Architecture:**
- 3 or 5 k3s server nodes (odd number for quorum)
- Embedded etcd (Raft consensus, HA datastore)
- Load balancer in front of API servers
- Agent nodes for workload scheduling
**k3s Server Commands:**
```bash
# Flags shared by every server node:
COMMON_FLAGS="--data-dir=/var/lib/k8shost \
  --disable=servicelb,traefik,flannel \
  --flannel-backend=none \
  --disable-network-policy \
  --cluster-domain=cluster.local \
  --service-cidr=10.96.0.0/12 \
  --cluster-cidr=10.244.0.0/16 \
  --authentication-token-webhook-config-file=/etc/k8shost/iam-webhook.yaml \
  --tls-san=k8s-api-lb.example.com \
  --tls-san=k8s-api.example.com"

# First server: bootstrap the embedded etcd cluster
k3s server $COMMON_FLAGS --cluster-init

# Additional servers: join the existing cluster (mutually exclusive with --cluster-init)
k3s server $COMMON_FLAGS --server https://k8s-api-lb.internal:6443
```
**k3s Agent Command (worker nodes):**
```bash
k3s agent \
--server https://k8s-api-lb.internal:6443 \
--token <join-token>
```
**NixOS Configuration (Server Node):**
```nix
{ config, lib, pkgs, ... }:
{
services.k8shost = {
enable = true;
mode = "server";
datastore = "etcd"; # Embedded etcd for HA
clusterInit = true; # Set to false for joining servers
serverUrl = "https://k8s-api-lb.internal:6443"; # For joining servers
# ... same integrations as single-server ...
};
# High availability settings
systemd.services.k8shost = {
serviceConfig = {
Restart = "always";
RestartSec = "10s";
};
};
}
```
**Load Balancer Configuration (FiberLB):**
```yaml
# External LoadBalancer for API access
apiVersion: v1
kind: LoadBalancer
metadata:
name: k8s-api-lb
spec:
listeners:
- protocol: TCP
port: 6443
backend_pool: k8s-api-servers
pools:
- name: k8s-api-servers
algorithm: round-robin
members:
- address: 192.168.1.101 # server-1
port: 6443
- address: 192.168.1.102 # server-2
port: 6443
- address: 192.168.1.103 # server-3
port: 6443
health_check:
type: tcp
interval: 10s
timeout: 5s
retries: 3
```
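The `round-robin` algorithm combined with health checking named in the pool config above can be sketched as follows. This is a simplification; the real selection logic is internal to FiberLB, and the health state would come from the 10-second TCP checks:

```go
package main

import "fmt"

type member struct {
	addr    string
	healthy bool
}

// nextBackend returns the index of the next healthy pool member in
// round-robin order, skipping members that failed health checks.
func nextBackend(pool []member, last int) (int, error) {
	for i := 1; i <= len(pool); i++ {
		idx := (last + i) % len(pool)
		if pool[idx].healthy {
			return idx, nil
		}
	}
	return -1, fmt.Errorf("no healthy backends")
}

func main() {
	pool := []member{
		{"192.168.1.101:6443", true},
		{"192.168.1.102:6443", false}, // failed health check
		{"192.168.1.103:6443", true},
	}
	idx, _ := nextBackend(pool, 0)
	fmt.Println(pool[idx].addr) // skips the unhealthy member
}
```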
**Datastore Options:**
#### Option 1: Embedded etcd (Recommended for MVP)
**Pros:**
- Built-in to k3s, no external dependencies
- Proven, battle-tested (CNCF etcd project)
- Automatic HA with Raft consensus
- Easy setup (just `--cluster-init`)
**Cons:**
- Another distributed datastore (in addition to Chainfire/FlareDB)
- etcd-specific operations (backup, restore, defragmentation)
#### Option 2: FlareDB as External Datastore
**Pros:**
- Unified storage layer for PlasmaCloud
- Leverage existing FlareDB deployment
- Simplified infrastructure (one less system to manage)
**Cons:**
- k3s requires etcd API compatibility
- FlareDB would need to implement etcd v3 API (significant effort)
- Untested for K8s workloads
**Recommendation for MVP:** Use embedded etcd for HA mode. Investigate FlareDB etcd compatibility in Phase 2 or 3.
**Backup and Disaster Recovery:**
```bash
# etcd snapshot (on any server node)
k3s etcd-snapshot save --name backup-$(date +%Y%m%d-%H%M%S)
# List snapshots
k3s etcd-snapshot ls
# Restore from snapshot
k3s server --cluster-reset --cluster-reset-restore-path=/var/lib/k8shost/server/db/snapshots/backup-20250101-120000
```
### NixOS Module Integration
**Module Structure:**
```
nix/modules/
├── k8shost.nix # Main module
├── k8shost/
│ ├── controller.nix # FiberLB, FlashDNS controllers
│ ├── csi.nix # LightningStor CSI driver
│ └── cni.nix # PrismNET CNI plugin
```
**Main Module (`nix/modules/k8shost.nix`):**
```nix
{ config, lib, pkgs, ... }:
with lib;
let
cfg = config.services.k8shost;
in
{
options.services.k8shost = {
enable = mkEnableOption "PlasmaCloud K8s Hosting Service";
mode = mkOption {
type = types.enum ["server" "agent"];
default = "server";
description = "Run as server (control plane) or agent (worker)";
};
datastore = mkOption {
type = types.enum ["sqlite" "etcd"];
default = "sqlite";
description = "Datastore backend (sqlite for single-server, etcd for HA)";
};
disableComponents = mkOption {
type = types.listOf types.str;
default = ["servicelb" "traefik" "flannel"];
description = "k3s components to disable";
};
networking = {
serviceCIDR = mkOption {
type = types.str;
default = "10.96.0.0/12";
description = "CIDR for service ClusterIPs";
};
clusterCIDR = mkOption {
type = types.str;
default = "10.244.0.0/16";
description = "CIDR for pod IPs";
};
clusterDomain = mkOption {
type = types.str;
default = "cluster.local";
description = "Cluster DNS domain";
};
};
# Integration options (prismnet, fiberlb, iam, flashdns, lightningstor)
# ...
};
config = mkIf cfg.enable {
# Install k3s package
environment.systemPackages = [ pkgs.k3s ];
# Create systemd service
systemd.services.k8shost = {
description = "PlasmaCloud K8s Hosting Service (k3s)";
after = [ "network.target" "iam.service" "prismnet.service" ];
requires = [ "iam.service" "prismnet.service" ];
wantedBy = [ "multi-user.target" ];
serviceConfig = {
Type = "notify";
ExecStart = "${pkgs.k3s}/bin/k3s ${cfg.mode} ${concatStringsSep " " (buildServerArgs cfg)}";
KillMode = "process";
Delegate = "yes";
LimitNOFILE = 1048576;
LimitNPROC = "infinity";
LimitCORE = "infinity";
TasksMax = "infinity";
Restart = "always";
RestartSec = "5s";
};
};
# Create configuration files
environment.etc."k8shost/iam-webhook.yaml" = {
text = generateIAMWebhookConfig cfg.iam;
mode = "0600";
};
# Deploy controllers (FiberLB, FlashDNS, etc.)
# ... (as separate systemd services or in-cluster deployments)
};
}
```
## API Server Configuration
### k3s Server Flags (Complete)
Annotated flag reference. The grouping comments are for readability only and must be stripped before running, since a `#` line terminates a backslash-continued command; note also that `--cluster-init` (first server) and `--server`/`--token` (joining servers) are mutually exclusive.
```bash
k3s server \
# Data and cluster configuration
--data-dir=/var/lib/k8shost \
--cluster-init \ # For first server in HA cluster
--server https://k8s-api-lb.internal:6443 \ # Join existing HA cluster
--token <cluster-token> \ # Secure join token
# Disable default components
--disable=servicelb,traefik,flannel,local-storage \
--flannel-backend=none \
--disable-network-policy \
# Network configuration
--cluster-domain=cluster.local \
--service-cidr=10.96.0.0/12 \
--cluster-cidr=10.244.0.0/16 \
--service-node-port-range=30000-32767 \
# API server configuration
--bind-address=0.0.0.0 \
--advertise-address=192.168.1.100 \
--tls-san=k8s-api.example.com \
--tls-san=k8s-api-lb.example.com \
# Authentication
--authentication-token-webhook-config-file=/etc/k8shost/iam-webhook.yaml \
--authentication-token-webhook-cache-ttl=2m \
# Authorization (RBAC enabled by default)
# --authorization-mode=Node,RBAC \ # Default, no need to specify
# Audit logging
--kube-apiserver-arg=audit-log-path=/var/log/k8shost/audit.log \
--kube-apiserver-arg=audit-log-maxage=30 \
--kube-apiserver-arg=audit-log-maxbackup=10 \
--kube-apiserver-arg=audit-log-maxsize=100 \
# Feature gates (if needed)
# --kube-apiserver-arg=feature-gates=SomeFeature=true
```
### Authentication Webhook Configuration
**File: `/etc/k8shost/iam-webhook.yaml`**
```yaml
apiVersion: v1
kind: Config
clusters:
- name: iam-webhook
cluster:
server: https://iam-server:3000/apis/iam.plasmacloud.io/v1/authenticate
certificate-authority: /etc/k8shost/ca.crt
users:
- name: k8s-apiserver
user:
client-certificate: /etc/k8shost/apiserver-client.crt
client-key: /etc/k8shost/apiserver-client.key
current-context: webhook
contexts:
- context:
cluster: iam-webhook
user: k8s-apiserver
name: webhook
```
**Certificate Management:**
- CA certificate: Issued by PlasmaCloud IAM PKI
- Client certificate: For kube-apiserver to authenticate to IAM webhook
- Rotation: Certificates expire after 1 year, auto-renewed by IAM
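The IAM webhook server on the other end of this kubeconfig speaks the TokenReview protocol: the API server POSTs a TokenReview, and the webhook fills in `status`. A self-contained sketch with hand-rolled structs (the real types live in authentication.k8s.io/v1) and a hypothetical token lookup; the group names follow the `org:`/`project:` conventions used by the RBAC templates:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// tokenReview models only the TokenReview fields this sketch needs.
type tokenReview struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Spec       struct {
		Token string `json:"token"`
	} `json:"spec"`
	Status struct {
		Authenticated bool `json:"authenticated"`
		User          struct {
			Username string   `json:"username"`
			Groups   []string `json:"groups"`
		} `json:"user"`
	} `json:"status"`
}

// authenticate is a stand-in for the real IAM token validation call.
func authenticate(token string) (string, []string, bool) {
	if token == "valid-iam-token" { // hypothetical token for the sketch
		return "user@example.com", []string{"org:org-123", "project:proj-456"}, true
	}
	return "", nil, false
}

// handler implements the webhook: decode the TokenReview, validate the
// token, and echo the review back with status populated.
func handler(w http.ResponseWriter, r *http.Request) {
	var review tokenReview
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	user, groups, ok := authenticate(review.Spec.Token)
	review.Status.Authenticated = ok
	review.Status.User.Username = user
	review.Status.User.Groups = groups
	json.NewEncoder(w).Encode(review)
}

func main() {
	srv := httptest.NewServer(http.HandlerFunc(handler))
	defer srv.Close()
	body := []byte(`{"apiVersion":"authentication.k8s.io/v1","kind":"TokenReview","spec":{"token":"valid-iam-token"}}`)
	resp, _ := http.Post(srv.URL, "application/json", bytes.NewReader(body))
	var out tokenReview
	json.NewDecoder(resp.Body).Decode(&out)
	fmt.Println(out.Status.Authenticated, out.Status.User.Username)
}
```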
## Security
### TLS/mTLS
**Component Communication:**
| Source | Destination | Protocol | Auth Method |
|--------|-------------|----------|-------------|
| kube-apiserver | IAM webhook | HTTPS + mTLS | Client cert |
| FiberLB controller | FiberLB gRPC | gRPC + TLS | IAM token |
| FlashDNS controller | FlashDNS gRPC | gRPC + TLS | IAM token |
| LightningStor CSI | LightningStor gRPC | gRPC + TLS | IAM token |
| PrismNET CNI | PrismNET gRPC | gRPC + TLS | IAM token |
| kubectl | kube-apiserver | HTTPS | IAM token (Bearer) |
**Certificate Issuance:**
- All certificates issued by IAM service (centralized PKI)
- Automatic renewal before expiration
- Certificate revocation via IAM CRL
### Pod Security
**Pod Security Standards (PSS):**
- **Baseline Profile**: Enforced on all namespaces by default
- Deny privileged containers
- Deny host network/PID/IPC
- Deny hostPath volumes
- Deny privilege escalation
- **Restricted Profile**: Optional, for highly sensitive workloads
**Example PodSecurityPolicy (deprecated since K8s 1.21 and removed in 1.25; shown for reference only, use PSS instead):**
```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
name: restricted
spec:
privileged: false
allowPrivilegeEscalation: false
requiredDropCapabilities:
- ALL
volumes:
- configMap
- emptyDir
- projected
- secret
- downwardAPI
- persistentVolumeClaim
runAsUser:
rule: MustRunAsNonRoot
seLinux:
rule: RunAsAny
fsGroup:
rule: RunAsAny
```
**Security Contexts (enforced):**
```yaml
apiVersion: v1
kind: Pod
metadata:
name: secure-pod
spec:
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 2000
containers:
- name: app
image: myapp:latest
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
```
**Service Account Permissions:**
- Minimal RBAC permissions by default
- Principle of least privilege
- No cluster-admin access for user workloads
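As an illustration of least privilege, a workload ServiceAccount might be granted read access only to its own ConfigMap; the `web-app` names below are hypothetical:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: web-app
  namespace: project-550e8400-e29b-41d4-a716-446655440000
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: web-app-config-reader
  namespace: project-550e8400-e29b-41d4-a716-446655440000
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["web-app-config"]  # only its own config
  verbs: ["get", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: web-app-config-reader-binding
  namespace: project-550e8400-e29b-41d4-a716-446655440000
subjects:
- kind: ServiceAccount
  name: web-app
  namespace: project-550e8400-e29b-41d4-a716-446655440000
roleRef:
  kind: Role
  name: web-app-config-reader
  apiGroup: rbac.authorization.k8s.io
```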
## Testing Strategy
### Unit Tests
**Controllers (Go):**
```go
// fiberlb_controller_test.go
func TestReconcileLoadBalancer(t *testing.T) {
// Mock K8s client
client := fake.NewSimpleClientset()
// Mock FiberLB gRPC client
mockFiberLB := &mockFiberLBClient{}
controller := NewFiberLBController(client, mockFiberLB)
// Create test service
svc := &corev1.Service{
ObjectMeta: metav1.ObjectMeta{Name: "test-svc", Namespace: "default"},
Spec: corev1.ServiceSpec{Type: corev1.ServiceTypeLoadBalancer},
}
// Reconcile
err := controller.Reconcile(svc)
assert.NoError(t, err)
// Verify FiberLB API called
assert.Equal(t, 1, mockFiberLB.createLoadBalancerCalls)
}
```
**CNI Plugin (Rust):**
```rust
#[test]
fn test_cni_add() {
let mut mock_ovn = MockOVNClient::new();
    mock_ovn.expect_allocate_ip()
        .returning(|_ns, _pod| Ok("10.244.1.5/24".to_string()));
let plugin = PrismNETPlugin::new(mock_ovn);
let result = plugin.handle_add(/* ... */);
assert!(result.is_ok());
assert_eq!(result.unwrap().ip, "10.244.1.5");
}
```
**CSI Driver (Go):**
```go
func TestCreateVolume(t *testing.T) {
mockLightningStor := &mockLightningStorClient{}
mockLightningStor.On("CreateVolume", mock.Anything).Return(&Volume{ID: "vol-123"}, nil)
driver := NewCSIDriver(mockLightningStor)
req := &csi.CreateVolumeRequest{
Name: "test-vol",
CapacityRange: &csi.CapacityRange{RequiredBytes: 10 * 1024 * 1024 * 1024},
}
resp, err := driver.CreateVolume(context.Background(), req)
assert.NoError(t, err)
assert.Equal(t, "vol-123", resp.Volume.VolumeId)
}
```
### Integration Tests
**Test Environment:**
- Single-node k3s cluster (kind or k3s in Docker)
- Mock or real PlasmaCloud services (PrismNET, FiberLB, etc.)
- Automated setup and teardown
**Test Cases:**
**1. Single-Pod Deployment:**
```bash
#!/bin/bash
set -e
# Deploy nginx pod
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
containers:
- name: nginx
image: nginx:latest
ports:
- containerPort: 80
EOF
# Wait for pod to be running
kubectl wait --for=condition=Ready pod/nginx --timeout=60s
# Verify pod IP allocated
POD_IP=$(kubectl get pod nginx -o jsonpath='{.status.podIP}')
[ -n "$POD_IP" ] || exit 1
# Cleanup
kubectl delete pod nginx
```
**2. Service Exposure (LoadBalancer):**
```bash
#!/bin/bash
set -e
# Create deployment
kubectl create deployment web --image=nginx:latest --replicas=2
# Expose as LoadBalancer
kubectl expose deployment web --type=LoadBalancer --port=80
# Wait for external IP
for i in {1..30}; do
EXTERNAL_IP=$(kubectl get svc web -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
[ -n "$EXTERNAL_IP" ] && break
sleep 2
done
[ -n "$EXTERNAL_IP" ] || exit 1
# Verify HTTP access
curl -f http://$EXTERNAL_IP || exit 1
# Cleanup
kubectl delete svc web
kubectl delete deployment web
```
**3. PersistentVolume Provisioning:**
```bash
#!/bin/bash
set -e
# Create PVC
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: test-pvc
spec:
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 1Gi
storageClassName: lightningstor-ssd
EOF
# Wait for PVC to be bound (PVCs have no Ready condition, so wait on phase)
kubectl wait --for=jsonpath='{.status.phase}'=Bound pvc/test-pvc --timeout=60s
# Create pod using PVC
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: test-pod
spec:
containers:
- name: app
image: busybox
command: ["sh", "-c", "echo hello > /data/test.txt && sleep 3600"]
volumeMounts:
- name: data
mountPath: /data
volumes:
- name: data
persistentVolumeClaim:
claimName: test-pvc
EOF
kubectl wait --for=condition=Ready pod/test-pod --timeout=60s
# Verify file written
kubectl exec test-pod -- cat /data/test.txt | grep hello || exit 1
# Cleanup
kubectl delete pod test-pod
kubectl delete pvc test-pvc
```
**4. Multi-Tenant Isolation:**
```bash
#!/bin/bash
set -e
# Create two namespaces
kubectl create namespace project-a
kubectl create namespace project-b
# Deploy pod in each
kubectl run pod-a --image=nginx -n project-a
kubectl run pod-b --image=nginx -n project-b
# Verify network isolation (if NetworkPolicies enabled)
# Pod A should NOT be able to reach Pod B
POD_B_IP=$(kubectl get pod pod-b -n project-b -o jsonpath='{.status.podIP}')
kubectl exec pod-a -n project-a -- curl --max-time 5 http://$POD_B_IP && exit 1 || true
# Cleanup
kubectl delete ns project-a project-b
```
### E2E Test Scenario
**End-to-End Test: Deploy Multi-Tier Application**
```bash
#!/bin/bash
set -ex
NAMESPACE="project-123"
# 1. Create namespace
kubectl create namespace $NAMESPACE
# 2. Deploy PostgreSQL with PVC
kubectl apply -n $NAMESPACE -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgres-pvc
spec:
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 5Gi
storageClassName: lightningstor-ssd
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: postgres
spec:
replicas: 1
selector:
matchLabels:
app: postgres
template:
metadata:
labels:
app: postgres
spec:
containers:
- name: postgres
image: postgres:15
env:
- name: POSTGRES_PASSWORD
value: testpass
ports:
- containerPort: 5432
volumeMounts:
- name: data
mountPath: /var/lib/postgresql/data
volumes:
- name: data
persistentVolumeClaim:
claimName: postgres-pvc
---
apiVersion: v1
kind: Service
metadata:
name: postgres
spec:
selector:
app: postgres
ports:
- port: 5432
EOF
# 3. Deploy web application
kubectl apply -n $NAMESPACE -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: web
spec:
replicas: 3
selector:
matchLabels:
app: web
template:
metadata:
labels:
app: web
spec:
containers:
- name: web
image: myapp:latest
env:
- name: DATABASE_URL
value: postgres://postgres:testpass@postgres:5432/mydb
ports:
- containerPort: 8080
EOF
# 4. Expose web via LoadBalancer
kubectl expose deployment web -n $NAMESPACE --type=LoadBalancer --port=80 --target-port=8080
# 5. Wait for resources
kubectl wait -n $NAMESPACE --for=condition=Ready pod -l app=postgres --timeout=120s
kubectl wait -n $NAMESPACE --for=condition=Ready pod -l app=web --timeout=120s
# 6. Verify LoadBalancer external IP
for i in {1..60}; do
EXTERNAL_IP=$(kubectl get svc web -n $NAMESPACE -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
[ -n "$EXTERNAL_IP" ] && break
sleep 2
done
[ -n "$EXTERNAL_IP" ] || { echo "No external IP assigned"; exit 1; }
# 7. Verify DNS resolution
kubectl run -n $NAMESPACE --rm -i --restart=Never test-dns --image=busybox -- nslookup postgres.${NAMESPACE}.svc.cluster.local
# 8. Verify HTTP access
curl -f http://$EXTERNAL_IP/health || { echo "Health check failed"; exit 1; }
# 9. Verify PVC mounted
kubectl exec -n $NAMESPACE deployment/postgres -- ls /var/lib/postgresql/data | grep pg_wal
# 10. Verify network isolation (optional, if NetworkPolicies enabled)
# ...
# Cleanup
kubectl delete namespace $NAMESPACE
echo "E2E test passed!"
```
## Implementation Phases
### Phase 1: Foundation (4-5 weeks)
**Week 1-2: k3s Setup and IAM Integration**
- [ ] Install and configure k3s with disabled components
- [ ] Implement IAM authentication webhook server
- [ ] Configure kube-apiserver to use IAM webhook
- [ ] Create RBAC templates (org admin, project admin, viewer)
- [ ] Test: Authenticate with IAM token, verify RBAC enforcement
**Week 3: PrismNET CNI Plugin**
- [ ] Implement CNI binary (ADD, DEL, CHECK commands)
- [ ] Integrate with PrismNET gRPC API (AllocateIP, ReleaseIP)
- [ ] Configure OVN logical switches per namespace
- [ ] Test: Create pod, verify network interface and IP allocation
**Week 4: FiberLB Controller**
- [ ] Implement controller watch loop (Services, Endpoints)
- [ ] Integrate with FiberLB gRPC API (CreateLoadBalancer, UpdatePool)
- [ ] Implement external IP allocation from pool
- [ ] Test: Expose service as LoadBalancer, verify external IP and routing
**Week 5: Basic RBAC and Multi-Tenancy**
- [ ] Implement namespace-per-project provisioning
- [ ] Deploy default RBAC roles and bindings
- [ ] Test: Create multiple projects, verify isolation
**Deliverables:**
- Functional k3s cluster with IAM authentication
- Pod networking via PrismNET
- LoadBalancer services via FiberLB
- Multi-tenant namespaces with RBAC
### Phase 2: Storage & DNS (5-6 weeks)
**Week 6-7: LightningStor CSI Driver**
- [ ] Implement CSI Controller Service (CreateVolume, DeleteVolume, ControllerPublishVolume)
- [ ] Implement CSI Node Service (NodeStageVolume, NodePublishVolume)
- [ ] Integrate with LightningStor gRPC API
- [ ] Deploy CSI driver as pods (controller + node DaemonSet)
- [ ] Create StorageClasses for SSD and HDD
- [ ] Test: Create PVC, attach to pod, write/read data
**Week 8: FlashDNS Controller**
- [ ] Implement controller watch loop (Services, Pods)
- [ ] Integrate with FlashDNS gRPC API (CreateRecord, UpdateRecord)
- [ ] Generate DNS records (A, SRV) for services and pods
- [ ] Configure kubelet DNS settings
- [ ] Test: Resolve service DNS from pod, verify DNS updates
**Week 9: Network Policy Support**
- [ ] Extend PrismNET CNI with NetworkPolicy controller
- [ ] Translate K8s NetworkPolicy to OVN ACLs
- [ ] Implement address sets for pod label selectors
- [ ] Test: Create NetworkPolicy, verify ingress/egress enforcement
**Week 10-11: Integration Testing**
- [ ] Write integration test suite (pod, service, PVC, DNS)
- [ ] Test multi-tier application deployment
- [ ] Performance testing (pod creation time, network throughput)
- [ ] Fix bugs and optimize
**Deliverables:**
- Persistent storage via LightningStor CSI
- Service discovery via FlashDNS
- Network policies enforced by PrismNET
- Comprehensive integration tests
### Phase 3: Advanced Features (Post-MVP, 6-8 weeks)
**StatefulSets:**
- [ ] Verify StatefulSet controller functionality (built-in to k3s)
- [ ] Test with headless services and volumeClaimTemplates
- [ ] Example: Deploy Cassandra or Kafka cluster
**PlasmaVMC CRI Integration:**
- [ ] Implement CRI server in PlasmaVMC (Rust)
- [ ] Create Firecracker microVM per pod
- [ ] Test pod lifecycle (create, start, stop, delete)
- [ ] Performance benchmarking (startup time, resource overhead)
**FlareDB as k3s Datastore:**
- [ ] Investigate etcd API compatibility layer for FlareDB
- [ ] Implement etcd v3 gRPC API shim
- [ ] Test k3s with FlareDB backend
- [ ] Benchmarking and stability testing
**Autoscaling:**
- [ ] Deploy metrics-server
- [ ] Implement HorizontalPodAutoscaler
- [ ] Test autoscaling based on CPU/memory metrics
**Ingress (L7 LoadBalancer):**
- [ ] Implement Ingress controller using FiberLB L7 capabilities
- [ ] Support host-based and path-based routing
- [ ] TLS termination
## Success Criteria
**Functional Requirements:**
1. ✅ Deploy pods, services, deployments using kubectl
2. ✅ LoadBalancer services receive external IPs from FiberLB
3. ✅ PersistentVolumes provisioned from LightningStor and mounted to pods
4. ✅ DNS resolution works for services and pods (via FlashDNS)
5. ✅ Multi-tenant isolation enforced (namespaces, RBAC, network policies)
6. ✅ IAM authentication and RBAC functional (token validation, user/group mapping)
7. ✅ E2E test passes (multi-tier application deployment)
**Performance Requirements:**
1. Pod creation time: <10 seconds (from API call to running state)
2. Service LoadBalancer IP allocation: <5 seconds
3. PersistentVolume provisioning: <30 seconds
4. DNS record updates: <10 seconds (after service creation)
5. Support 100+ pods per cluster
6. Support 10+ concurrent namespaces
**Operational Requirements:**
1. NixOS module for declarative deployment
2. Cluster upgrade path (k3s version upgrades)
3. Backup and restore procedures (etcd snapshots)
4. Monitoring and alerting integration (Prometheus, Grafana)
5. Logging aggregation (FluentBit → centralized log storage)
## Next Steps (S3-S6)
### S3: Workspace Scaffold
- Create `k8shost/` workspace directory structure
- Set up Go module for controllers (FiberLB, FlashDNS)
- Set up Rust workspace for CNI plugin
- Set up Go module for CSI driver
- Create NixOS module skeleton
**Directory Structure:**
```
k8shost/
├── controllers/ # Go: FiberLB, FlashDNS, IAM webhook
│ ├── fiberlb/
│ ├── flashdns/
│ ├── iamwebhook/
│ └── main.go
├── cni/ # Rust: PrismNET CNI plugin
│ ├── src/
│ └── Cargo.toml
├── csi/ # Go: LightningStor CSI driver
│ ├── controller/
│ ├── node/
│ └── main.go
├── nix/
│ └── modules/
│ └── k8shost.nix
└── tests/
├── integration/
└── e2e/
```
### S4: Controllers Implementation
- Implement FiberLB controller (Service watch, gRPC integration)
- Implement FlashDNS controller (Service/Pod watch, DNS record sync)
- Implement IAM webhook server (TokenReview API, IAM validation)
- Unit tests for each controller
### S5: CNI + CSI Implementation
- Implement PrismNET CNI plugin (ADD/DEL/CHECK, OVN integration)
- Implement LightningStor CSI driver (Controller and Node services)
- Deploy CSI driver as pods (Deployment + DaemonSet)
- Unit tests for CNI and CSI
### S6: Integration Testing
- Set up integration test environment (k3s cluster + mock services)
- Write integration tests (pod, service, PVC, DNS, multi-tenant)
- Write E2E test (multi-tier application)
- CI/CD pipeline for automated testing
## References
- **k3s Architecture**: https://docs.k3s.io/architecture
- **k3s Installation**: https://docs.k3s.io/installation
- **k3s HA Setup**: https://docs.k3s.io/datastore/ha-embedded
- **CNI Specification**: https://github.com/containernetworking/cni/blob/main/SPEC.md
- **CSI Specification**: https://github.com/container-storage-interface/spec/blob/master/spec.md
- **K8s Authentication Webhooks**: https://kubernetes.io/docs/reference/access-authn-authz/authentication/#webhook-token-authentication
- **K8s Authorization (RBAC)**: https://kubernetes.io/docs/reference/access-authn-authz/rbac/
- **K8s Network Policies**: https://kubernetes.io/docs/concepts/services-networking/network-policies/
- **OVN Architecture**: https://www.ovn.org/support/dist-docs/ovn-architecture.7.html
- **Kubernetes API Reference**: https://kubernetes.io/docs/reference/kubernetes-api/
---
**Document Version:** 1.0
**Last Updated:** 2025-12-09
**Authors:** PlasmaCloud Platform Team
**Status:** Draft for Review