# T039.S6 Integration Test Plan

**Owner**: peerA
**Prerequisites**: S3-S5 complete (NixOS provisioned, services deployed, clusters formed)

## Test Categories

### 1. Service Health Checks

Verify that all 11 services respond on all 3 nodes.

```bash
# Node IPs (from T036 config)
NODES=(192.168.100.11 192.168.100.12 192.168.100.13)

# Service ports (from nix/modules/*.nix - verified 2025-12-12)
declare -A SERVICES=(
  ["chainfire"]=2379
  ["flaredb"]=2479
  ["iam"]=3000
  ["plasmavmc"]=4000
  ["lightningstor"]=8000
  ["flashdns"]=6000
  ["fiberlb"]=7000
  ["prismnet"]=5000
  ["k8shost"]=6443
  ["nightlight"]=9101
  ["creditservice"]=3010
)

# Health-check each service on each node
for node in "${NODES[@]}"; do
  for svc in "${!SERVICES[@]}"; do
    grpcurl -plaintext "$node:${SERVICES[$svc]}" list || echo "FAIL: $svc on $node"
  done
done
```

**Expected**: All services respond to gRPC reflection.

### 2. Cluster Formation Validation

#### 2.1 ChainFire Cluster

```bash
# Check cluster status on each node
for node in "${NODES[@]}"; do
  grpcurl -plaintext "$node:2379" chainfire.ClusterService/GetStatus
done
```

**Expected**:
- 3 nodes in cluster
- Leader elected
- All nodes healthy

#### 2.2 FlareDB Cluster

```bash
# Check FlareDB cluster health
for node in "${NODES[@]}"; do
  grpcurl -plaintext "$node:2479" flaredb.AdminService/GetClusterStatus
done
```

**Expected**:
- 3 nodes joined
- Quorum formed (2/3 minimum)
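The quorum checks above can be scripted into a pass/fail verdict rather than eyeballed. A minimal sketch: count healthy members in the cluster-status JSON and compare against the quorum threshold. The `members[].healthy` field name is an assumption about the `GetClusterStatus` response shape; adjust to the real schema, and pipe live `grpcurl` output in place of the canned example.

```bash
#!/usr/bin/env bash
# Sketch: derive a quorum verdict from cluster-status JSON (requires jq).
# NOTE: the "members[].healthy" field name is assumed, not verified
# against the real GetClusterStatus response.
quorum_ok() {
  local json=$1 need=$2
  local healthy
  healthy=$(jq '[.members[] | select(.healthy == true)] | length' <<<"$json")
  [ "$healthy" -ge "$need" ]
}

# Canned example; a live check would pipe grpcurl output here instead
status='{"members":[{"id":"n1","healthy":true},{"id":"n2","healthy":true},{"id":"n3","healthy":false}]}'
if quorum_ok "$status" 2; then
  echo "quorum: OK"      # prints "quorum: OK" for 2/3 healthy
else
  echo "quorum: LOST"
fi
```

The same helper works for both ChainFire and FlareDB if their status responses share a member-health field; otherwise duplicate it per service.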
### 3. Cross-Component Integration (T029 Scenarios)

#### 3.1 IAM Authentication Flow

```bash
# Create test organization
grpcurl -plaintext -d '{"name":"test-org","display_name":"Test Organization"}' \
  "${NODES[0]}:3000" iam.OrgService/CreateOrg

# Create test user
grpcurl -plaintext -d '{"org_id":"test-org","username":"testuser","password":"testpass"}' \
  "${NODES[0]}:3000" iam.UserService/CreateUser

# Authenticate and capture the token
TOKEN=$(grpcurl -plaintext -d '{"username":"testuser","password":"testpass"}' \
  "${NODES[0]}:3000" iam.AuthService/Authenticate | jq -r '.token')

# Validate the token
grpcurl -plaintext -d "{\"token\":\"$TOKEN\"}" \
  "${NODES[0]}:3000" iam.AuthService/ValidateToken
```

**Expected**: Token issued and validated successfully.

#### 3.2 FlareDB Storage

```bash
# Write data (value is base64-encoded)
grpcurl -plaintext -d '{"key":"test-key","value":"dGVzdC12YWx1ZQ=="}' \
  "${NODES[0]}:2479" flaredb.KVService/Put

# Read from a different node (replication test)
grpcurl -plaintext -d '{"key":"test-key"}' \
  "${NODES[1]}:2479" flaredb.KVService/Get
```

**Expected**: Data replicated across nodes.

#### 3.3 LightningSTOR S3 Operations

```bash
# Create bucket via the S3 API
curl -X PUT "http://${NODES[0]}:9100/test-bucket"

# Upload object
curl -X PUT "http://${NODES[0]}:9100/test-bucket/test-object" -d "test content"

# Download the object from a different node
curl "http://${NODES[1]}:9100/test-bucket/test-object"
```

**Expected**: Object storage working and accessible from multiple nodes.

#### 3.4 FlashDNS Resolution

```bash
# Add a DNS record
grpcurl -plaintext -d '{"zone":"test.cloud","name":"test","type":"A","value":"192.168.100.100"}' \
  "${NODES[0]}:6000" flashdns.RecordService/CreateRecord

# Query DNS from a different node
dig "@${NODES[1]}" test.test.cloud A +short
```

**Expected**: DNS record created and resolvable.
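The FlareDB `Put` call in 3.2 sends its `value` field base64-encoded, as grpcurl requires for protobuf `bytes` fields (`dGVzdC12YWx1ZQ==` decodes to `test-value`). A small helper pair keeps payload construction and response decoding consistent; the key/value names here are just the test fixtures from above.

```bash
# Helpers for FlareDB KV payloads: grpcurl represents bytes fields as base64.
kv_encode() { printf '%s' "$1" | base64; }
kv_decode() { printf '%s' "$1" | base64 -d; }

# Build a Put request body for an arbitrary value
payload=$(printf '{"key":"test-key","value":"%s"}' "$(kv_encode "test-value")")
echo "$payload"   # {"key":"test-key","value":"dGVzdC12YWx1ZQ=="}

# Decode a value field returned by Get
kv_decode "dGVzdC12YWx1ZQ=="   # test-value
```

This also makes the replication check in 3.2 assertable: decode the `Get` response from node 2 and compare it byte-for-byte with what was written to node 1.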
### 4. Nightlight Metrics Collection

```bash
# Check the Prometheus endpoint on each node
for node in "${NODES[@]}"; do
  curl -s "http://$node:9090/api/v1/targets" | jq '.data.activeTargets | length'
done

# Query metrics
curl -s "http://${NODES[0]}:9090/api/v1/query?query=up" | jq '.data.result'
```

**Expected**: All targets up, metrics being collected.

### 5. FiberLB Load Balancing (T051 Validation)

```bash
# Create a load balancer for the test service
grpcurl -plaintext -d '{"name":"test-lb","org_id":"test-org"}' \
  "${NODES[0]}:7000" fiberlb.LBService/CreateLoadBalancer

# Create a round-robin pool (use the lb_id returned above)
grpcurl -plaintext -d '{"lb_id":"...","algorithm":"ROUND_ROBIN","protocol":"TCP"}' \
  "${NODES[0]}:7000" fiberlb.PoolService/CreatePool

# Add backends (use the pool_id returned above)
for i in 1 2 3; do
  grpcurl -plaintext -d "{\"pool_id\":\"...\",\"address\":\"192.168.100.1$i\",\"port\":8080}" \
    "${NODES[0]}:7000" fiberlb.BackendService/CreateBackend
done

# Verify distribution (requires test backend servers;
# the LB address is elided in the source - fill it in before running)
for i in {1..10}; do
  curl -s http://:80 | head -1
done | sort | uniq -c
```

**Expected**: Requests distributed across backends.

### 6. PrismNET Overlay Networking

```bash
# Create a VPC
grpcurl -plaintext -d '{"name":"test-vpc","cidr":"10.0.0.0/16"}' \
  "${NODES[0]}:5000" prismnet.VPCService/CreateVPC

# Create a subnet (use the vpc_id returned above)
grpcurl -plaintext -d '{"vpc_id":"...","name":"test-subnet","cidr":"10.0.1.0/24"}' \
  "${NODES[0]}:5000" prismnet.SubnetService/CreateSubnet

# Create a port (use the subnet_id returned above)
grpcurl -plaintext -d '{"subnet_id":"...","name":"test-port"}' \
  "${NODES[0]}:5000" prismnet.PortService/CreatePort
```

**Expected**: VPC, subnet, and port created successfully.

### 7. CreditService Quota (If Implemented)

```bash
# Check wallet balance
grpcurl -plaintext -d '{"org_id":"test-org","project_id":"test-project"}' \
  "${NODES[0]}:3010" creditservice.WalletService/GetBalance
```

**Expected**: Quota system responding.
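The `sort | uniq -c` output in the FiberLB check still needs a human to judge whether the spread looks round-robin. One way to automate it: fail if any single backend served more than half the requests. A sketch, using canned backend names in place of real response bodies:

```bash
# Sketch: turn `sort | uniq -c` output into a pass/fail verdict.
# Fails if one backend handled more than half of all requests.
balanced() {
  local total=$1
  local max=0 count rest
  while read -r count rest; do
    [ "$count" -gt "$max" ] && max=$count
  done
  [ $((max * 2)) -le "$total" ]
}

# Canned stand-in for 6 responses from 3 backends (perfect round-robin)
responses=$(printf 'backend-1\nbackend-2\nbackend-3\nbackend-1\nbackend-2\nbackend-3\n')
if sort <<<"$responses" | uniq -c | balanced 6; then
  echo "distribution: OK"
else
  echo "distribution: SKEWED"
fi
```

The half-of-total threshold is an arbitrary choice for a 3-backend pool; tighten it (e.g. max within 2x of the mean) if the test needs to distinguish round-robin from merely "not pinned to one backend".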
### 8. Node Failure Resilience

```bash
# Shut down services on node03
ssh "root@${NODES[2]}" "systemctl stop chainfire flaredb"

# Verify the cluster is still operational (quorum: 2/3)
grpcurl -plaintext "${NODES[0]}:2379" chainfire.ClusterService/GetStatus

# Write data
grpcurl -plaintext -d '{"key":"failover-test","value":"..."}' \
  "${NODES[0]}:2479" flaredb.KVService/Put

# Read data
grpcurl -plaintext -d '{"key":"failover-test"}' \
  "${NODES[1]}:2479" flaredb.KVService/Get

# Restart node03
ssh "root@${NODES[2]}" "systemctl start chainfire flaredb"

# Verify rejoin
sleep 30
grpcurl -plaintext "${NODES[2]}:2379" chainfire.ClusterService/GetStatus
```

**Expected**: Cluster survives a single node failure; the node rejoins.

## Test Execution Order

1. Service Health (basic connectivity)
2. Cluster Formation (Raft quorum)
3. IAM Auth (foundation for other tests)
4. FlareDB Storage (data layer)
5. Nightlight Metrics (observability)
6. LightningSTOR S3 (object storage)
7. FlashDNS (name resolution)
8. FiberLB (load balancing)
9. PrismNET (networking)
10. CreditService (quota)
11. Node Failure (resilience)

## Success Criteria

- All services respond on all nodes
- ChainFire cluster: 3 nodes, leader elected
- FlareDB cluster: quorum formed, replication working
- IAM: auth tokens issued/validated
- Data: read/write across nodes
- Metrics: targets up, queries working
- LB: traffic distributed
- Failover: survives loss of 1 node

## Failure Handling

If tests fail:

1. Capture service logs: `journalctl -u <service> --no-pager`
2. Document the failure in the evidence section
3. Create a follow-up task if the issue is systemic
4. Do not proceed to production traffic
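The fixed `sleep 30` before the rejoin check in section 8 can race a slow restart and fail spuriously. A bounded poll is more robust: retry the status check until it succeeds or a deadline passes. A minimal sketch; the 60s/5s values are illustrative, not measured rejoin times.

```bash
# Sketch: poll a check command until success or timeout, instead of
# a fixed sleep. Timeout/interval values are illustrative.
wait_for() {
  local timeout=$1 interval=$2; shift 2
  local elapsed=0
  until "$@"; do
    elapsed=$((elapsed + interval))
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep "$interval"
  done
}

# Usage against the live cluster (replace the demo check below):
#   wait_for 60 5 grpcurl -plaintext "${NODES[2]}:2379" chainfire.ClusterService/GetStatus

# Demo with a trivially-passing check command
wait_for 10 1 true && echo "rejoined"
```

Note that a bare `GetStatus` succeeding only proves the service is answering; for a strict rejoin check, wrap it in a predicate that also verifies the restarted node appears healthy in the membership list.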