# T057.S1: IPAM System Design Specification

**Author:** PeerA
**Date:** 2025-12-12
**Status:** DRAFT

## 1. Executive Summary

This document specifies the IPAM (IP Address Management) system for k8shost integration with PrismNET. The design extends PrismNET's existing IPAM capabilities to support Kubernetes Service ClusterIP and LoadBalancer IP allocation.

## 2. Current State Analysis

### 2.1 k8shost Service IP Allocation (Current)

**File:** `k8shost/crates/k8shost-server/src/services/service.rs:28-37`

```rust
pub fn allocate_cluster_ip() -> String {
    // Simple counter-based allocation in 10.96.0.0/16
    static COUNTER: AtomicU32 = AtomicU32::new(100);
    let counter = COUNTER.fetch_add(1, Ordering::SeqCst);
    format!("10.96.{}.{}", (counter >> 8) & 0xff, counter & 0xff)
}
```

**Issues:**
- No persistence (the counter resets to 100 on restart, so previously issued IPs are handed out again)
- No collision detection (the two octets wrap after 65,536 counter values and nothing checks for reuse)
- No integration with the network layer
- Hard-coded CIDR range

### 2.2 PrismNET IPAM (Current)

**File:** `prismnet/crates/prismnet-server/src/metadata.rs:577-662`

**Capabilities:**
- CIDR parsing and IP enumeration
- Allocated IP tracking via Port resources
- Gateway IP avoidance
- Subnet-scoped allocation
- ChainFire persistence

**Limitations:**
- Designed for VM/container ports, not K8s Services
- No dedicated Service IP subnet concept

## 3. Architecture Design

### 3.1 Conceptual Model

```
┌────────────────────────────────────────────────────────────┐
│                        Tenant Scope                        │
│                                                            │
│  ┌────────────────┐      ┌────────────────┐                │
│  │ VPC            │      │ Service Subnet │                │
│  │ (10.0.0.0/16)  │      │ (10.96.0.0/16) │                │
│  └───────┬────────┘      └───────┬────────┘                │
│          │                       │                         │
│  ┌───────┴────────┐      ┌───────┴────────┐                │
│  │ Subnet         │      │ Service IPs    │                │
│  │ (10.0.1.0/24)  │      │ ClusterIP      │                │
│  └───────┬────────┘      │ LoadBalancerIP │                │
│          │               └────────────────┘                │
│  ┌───────┴────────┐                                        │
│  │ Ports (VMs)    │                                        │
│  └────────────────┘                                        │
└────────────────────────────────────────────────────────────┘
```

### 3.2 New Resource: ServiceIPPool

A dedicated IP pool for Kubernetes Services within a tenant.

```rust
/// Service IP Pool for k8shost Service allocation
pub struct ServiceIPPool {
    pub id: ServiceIPPoolId,
    pub org_id: String,
    pub project_id: String,
    pub name: String,
    pub cidr_block: String, // e.g., "10.96.0.0/16"
    pub pool_type: ServiceIPPoolType,
    // Element type assumed to be String, matching IPAllocation::ip_address
    pub allocated_ips: HashSet<String>,
    pub created_at: u64,
    pub updated_at: u64,
}

pub enum ServiceIPPoolType {
    ClusterIP,    // For ClusterIP services
    LoadBalancer, // For LoadBalancer services (VIPs)
    NodePort,     // Reserved NodePort range
}
```

### 3.3 Integration Architecture

```
┌────────────────────────────────────────────────────────────────┐
│                         k8shost Server                         │
│                                                                │
│  ┌─────────────────────┐      ┌──────────────────────┐         │
│  │ ServiceService      │─────>│ IpamClient           │         │
│  │ create_service()    │      │ allocate_ip()        │         │
│  │ delete_service()    │      │ release_ip()         │         │
│  └─────────────────────┘      └──────────┬───────────┘         │
└──────────────────────────────────────────┼─────────────────────┘
                                           │ gRPC
┌──────────────────────────────────────────┼─────────────────────┐
│                        PrismNET Server   │                     │
│                                          ▼                     │
│  ┌─────────────────────┐      ┌──────────────────────┐         │
│  │ IpamService (new)   │<─────│ NetworkMetadataStore │         │
│  │ AllocateServiceIP   │      │ service_ip_pools     │         │
│  │ ReleaseServiceIP    │      │ allocated_ips        │         │
│  └─────────────────────┘      └──────────────────────┘         │
└────────────────────────────────────────────────────────────────┘
```

## 4. API Design

### 4.1 PrismNET IPAM gRPC Service

```protobuf
service IpamService {
  // Create a Service IP Pool
  rpc CreateServiceIPPool(CreateServiceIPPoolRequest) returns (CreateServiceIPPoolResponse);

  // Get Service IP Pool
  rpc GetServiceIPPool(GetServiceIPPoolRequest) returns (GetServiceIPPoolResponse);

  // List Service IP Pools
  rpc ListServiceIPPools(ListServiceIPPoolsRequest) returns (ListServiceIPPoolsResponse);

  // Allocate IP from pool
  rpc AllocateServiceIP(AllocateServiceIPRequest) returns (AllocateServiceIPResponse);

  // Release IP back to pool
  rpc ReleaseServiceIP(ReleaseServiceIPRequest) returns (ReleaseServiceIPResponse);

  // Get IP allocation status
  rpc GetIPAllocation(GetIPAllocationRequest) returns (GetIPAllocationResponse);
}

message AllocateServiceIPRequest {
  string org_id = 1;
  string project_id = 2;
  string pool_id = 3;              // Optional: specific pool
  ServiceIPPoolType pool_type = 4; // Required: ClusterIP or LoadBalancer
  string service_uid = 5;          // K8s service UID for tracking
  string requested_ip = 6;         // Optional: specific IP request
}

message AllocateServiceIPResponse {
  string ip_address = 1;
  string pool_id = 2;
}
```

### 4.2 k8shost IpamClient

```rust
/// IPAM client for k8shost (signatures only; method bodies are omitted in this spec)
pub struct IpamClient {
    // Type parameter assumed to be the gRPC transport channel
    client: IpamServiceClient<Channel>,
}

impl IpamClient {
    /// Allocate ClusterIP for a Service
    /// (return payload assumed to be the allocated IP as a String)
    pub async fn allocate_cluster_ip(
        &mut self,
        org_id: &str,
        project_id: &str,
        service_uid: &str,
    ) -> Result<String>;

    /// Allocate LoadBalancer IP for a Service
    pub async fn allocate_loadbalancer_ip(
        &mut self,
        org_id: &str,
        project_id: &str,
        service_uid: &str,
    ) -> Result<String>;

    /// Release an allocated IP
    pub async fn release_ip(
        &mut self,
        org_id: &str,
        project_id: &str,
        ip_address: &str,
    ) -> Result<()>;
}
```

## 5. Storage Schema

### 5.1 ChainFire Key Structure

```
/prismnet/ipam/pools/{org_id}/{project_id}/{pool_id}
/prismnet/ipam/allocations/{org_id}/{project_id}/{ip_address}
```

### 5.2 Allocation Record

```rust
pub struct IPAllocation {
    pub ip_address: String,
    pub pool_id: ServiceIPPoolId,
    pub org_id: String,
    pub project_id: String,
    pub resource_type: String, // "k8s-service", "vm-port", etc.
    pub resource_id: String,   // Service UID, Port ID, etc.
    pub allocated_at: u64,
}
```

## 6. Implementation Plan

### Phase 1: PrismNET IPAM Service (S1 deliverable)

1. Add `ServiceIPPool` type to prismnet-types
2. Add `IpamService` gRPC service to prismnet-api
3. Implement `IpamServiceImpl` in prismnet-server
4. Storage: pools and allocations in ChainFire

### Phase 2: k8shost Integration (S2)

1. Create `IpamClient` in k8shost
2. Replace `allocate_cluster_ip()` with PrismNET call
3. Add IP release on Service deletion
4. Configuration: PrismNET endpoint env var

### Phase 3: Default Pool Provisioning

1. Auto-create default ClusterIP pool per tenant
2. Default CIDR: `10.96.{tenant_hash}.0/20` (4096 IPs). A `/20` spans 16 values of the third octet, so `{tenant_hash}` must resolve to a multiple of 16 for the block to be aligned.
3. LoadBalancer pool: `192.168.{tenant_hash}.0/24` (256 IPs)

## 7. Tenant Isolation

### 7.1 Pool Isolation

Each tenant (org_id + project_id) has:
- Separate ClusterIP pool
- Separate LoadBalancer pool
- Non-overlapping IP ranges

### 7.2 IP Collision Prevention

- IP uniqueness enforced at pool level
- CAS (Compare-And-Swap) for concurrent allocation
- ChainFire transactions for atomicity

## 8. Default Configuration

```yaml
# k8shost config
ipam:
  enabled: true
  prismnet_endpoint: "http://prismnet:9090"

  # Default pools (auto-created if missing)
  default_cluster_ip_cidr: "10.96.0.0/12"      # 1M IPs shared
  default_loadbalancer_cidr: "192.168.0.0/16"  # 64K IPs shared

  # Per-tenant allocation
  cluster_ip_pool_size: "/20"    # 4096 IPs per tenant
  loadbalancer_pool_size: "/24"  # 256 IPs per tenant
```

## 9. Backward Compatibility

### 9.1 Migration Path

1. Deploy new IPAM service in PrismNET
2. k8shost checks for IPAM availability on startup
3. If IPAM is unavailable, fall back to the local counter
4. Log a warning while in fallback mode

### 9.2 Existing Services

- Existing Services retain their IPs
- On the next restart, k8shost syncs existing allocations with IPAM
- Conflict resolution: IPAM is the source of truth

## 10. Observability

### 10.1 Metrics

```
# Pool utilization
prismnet_ipam_pool_total{org_id, project_id, pool_type}
prismnet_ipam_pool_allocated{org_id, project_id, pool_type}
prismnet_ipam_pool_available{org_id, project_id, pool_type}

# Allocation rate
prismnet_ipam_allocations_total{org_id, project_id, pool_type}
prismnet_ipam_releases_total{org_id, project_id, pool_type}
```

### 10.2 Alerts

- Pool exhaustion warning at 80% utilization
- Allocation failure alerts
- Pool not found errors

## 11. References

- [Kubernetes Service IP allocation](https://kubernetes.io/docs/concepts/services-networking/cluster-ip-allocation/)
- [OpenStack Neutron IPAM](https://docs.openstack.org/neutron/latest/admin/intro-os-networking.html)
- PrismNET IPAM implementation (`prismnet/crates/prismnet-server/src/metadata.rs`)

## 12. Decision Summary

| Aspect | Decision | Rationale |
|--------|----------|-----------|
| IPAM Location | PrismNET | Network layer owns IP management |
| Storage | ChainFire | Consistency with existing PrismNET storage |
| Pool Type | Per-tenant | Tenant isolation, quota enforcement |
| Integration | gRPC client | Consistent with other PlasmaCloud services |
| Fallback | Local counter | Backward compatibility |
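
To make the pool semantics from Sections 3.2, 7.2 and 8 more concrete, the following is a minimal in-memory sketch, not the PrismNET implementation. It shows two things: carving the n-th per-tenant `/20` out of the shared `/12`, and allocating from a pool with a uniqueness check and an optional requested IP. The names `PoolSketch`, `AllocError` and `tenant_block` are hypothetical, the choice to skip the network and broadcast addresses is an assumption, and ChainFire persistence and CAS are deliberately left out.

```rust
use std::collections::HashSet;
use std::net::Ipv4Addr;

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
pub enum AllocError {
    InvalidCidr,
    OutOfRange,
    AlreadyAllocated,
    Exhausted,
}

/// Hypothetical in-memory stand-in for a ServiceIPPool.
/// A real pool would live in ChainFire and be updated with CAS.
pub struct PoolSketch {
    base: u32, // network address as an integer
    size: u32, // number of addresses in the block
    allocated: HashSet<Ipv4Addr>,
}

impl PoolSketch {
    /// Parse an IPv4 "a.b.c.d/len" block (len 8..=30) and reject unaligned bases.
    pub fn new(cidr: &str) -> Result<Self, AllocError> {
        let (addr, len) = cidr.split_once('/').ok_or(AllocError::InvalidCidr)?;
        let addr: Ipv4Addr = addr.parse().map_err(|_| AllocError::InvalidCidr)?;
        let len: u32 = len.parse().map_err(|_| AllocError::InvalidCidr)?;
        if !(8..=30).contains(&len) {
            return Err(AllocError::InvalidCidr);
        }
        let size = 1u32 << (32 - len);
        let base = u32::from(addr);
        if base & (size - 1) != 0 {
            return Err(AllocError::InvalidCidr); // e.g. 10.96.1.0/20 is not aligned
        }
        Ok(Self { base, size, allocated: HashSet::new() })
    }

    /// Allocate `requested` if given, otherwise the lowest free host address.
    /// Assumption: network and broadcast addresses are never handed out.
    pub fn allocate(&mut self, requested: Option<Ipv4Addr>) -> Result<Ipv4Addr, AllocError> {
        if let Some(ip) = requested {
            let offset = u32::from(ip).wrapping_sub(self.base);
            if offset == 0 || offset >= self.size - 1 {
                return Err(AllocError::OutOfRange);
            }
            if !self.allocated.insert(ip) {
                return Err(AllocError::AlreadyAllocated);
            }
            return Ok(ip);
        }
        for offset in 1..self.size - 1 {
            let ip = Ipv4Addr::from(self.base + offset);
            if self.allocated.insert(ip) {
                return Ok(ip);
            }
        }
        Err(AllocError::Exhausted)
    }

    /// Return an IP to the pool; false if it was not allocated.
    pub fn release(&mut self, ip: Ipv4Addr) -> bool {
        self.allocated.remove(&ip)
    }
}

/// Carve the `index`-th block of prefix length `child_len` out of a parent block.
/// Returns None when the index does not fit or the prefix lengths are unsupported.
pub fn tenant_block(
    parent: Ipv4Addr,
    parent_len: u32,
    child_len: u32,
    index: u32,
) -> Option<(Ipv4Addr, u32)> {
    if parent_len == 0 || parent_len > child_len || child_len > 30 {
        return None;
    }
    let blocks = 1u32 << (child_len - parent_len);
    if index >= blocks {
        return None;
    }
    let block_size = 1u32 << (32 - child_len);
    let base = u32::from(parent).checked_add(index * block_size)?;
    Some((Ipv4Addr::from(base), child_len))
}
```

With the defaults from Section 8, `tenant_block` yields 256 tenant blocks from `10.96.0.0/12`, and consecutive blocks start 16 apart in the third octet (`10.96.0.0/20`, `10.96.16.0/20`, and so on up to `10.111.240.0/20`). In a real deployment the check-and-insert in `allocate` would need to be a single ChainFire transaction so that concurrent callers cannot both claim the same address.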