photoncloud-monorepo/docs/por/T051-fiberlb-integration/task.yaml

id: T051
name: FiberLB Integration Testing
goal: Validate FiberLB works correctly and integrates with other services for endpoint discovery
status: planned
priority: P1
owner: peerA
created: 2025-12-12
depends_on: []
blocks: [T039]
context: |
**User Direction (2025-12-12, translated from Japanese):**
"We also have to make sure the LB actually works. This is an important issue: we need to run integration tests between the LB and the other services."
"If the LB doesn't work properly in the first place, clients have no way to know which endpoint to access."
**Rationale:**
- LB is critical for service discovery
- Without working LB, clients don't know which endpoint to access
- Multiple instances of services need load balancing
PROJECT.md Item 7:
- L4 load balancing via Maglev
- L3 load balancing via BGP Anycast
- L7 load balancing
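For reference, the Maglev requirement above amounts to consistent hashing over a prime-sized lookup table: each backend derives a pseudo-random permutation of table slots from two hashes, and backends take turns claiming their next preferred empty slot. A minimal sketch of that construction, assuming an illustrative table size and hash seeds (this is not FiberLB code):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Table size must be prime and much larger than the backend count.
const TABLE_SIZE: u64 = 65537;

fn hash_with_seed<T: Hash>(value: &T, seed: u64) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    value.hash(&mut h);
    h.finish()
}

/// Build a Maglev lookup table mapping slot index -> backend index.
fn build_maglev_table(backends: &[&str]) -> Vec<usize> {
    assert!(!backends.is_empty());
    let n = backends.len();
    // (offset, skip) define each backend's preference permutation over slots;
    // skip is in [1, TABLE_SIZE-1] so the permutation covers every slot.
    let params: Vec<(u64, u64)> = backends
        .iter()
        .map(|b| {
            let offset = hash_with_seed(b, 0xA) % TABLE_SIZE;
            let skip = hash_with_seed(b, 0xB) % (TABLE_SIZE - 1) + 1;
            (offset, skip)
        })
        .collect();
    let mut next = vec![0u64; n]; // position of each backend in its permutation
    let mut table = vec![usize::MAX; TABLE_SIZE as usize];
    let mut filled = 0usize;
    while filled < TABLE_SIZE as usize {
        for i in 0..n {
            // Find this backend's next preferred empty slot and claim it.
            let (offset, skip) = params[i];
            loop {
                let slot = ((offset + next[i] * skip) % TABLE_SIZE) as usize;
                next[i] += 1;
                if table[slot] == usize::MAX {
                    table[slot] = i;
                    filled += 1;
                    break;
                }
            }
            if filled == TABLE_SIZE as usize {
                break;
            }
        }
    }
    table
}

/// Pick a backend for a connection, given a hash of its 5-tuple.
fn pick<'a>(table: &[usize], backends: &'a [&str], flow_hash: u64) -> &'a str {
    backends[table[(flow_hash % table.len() as u64) as usize]]
}
```

Because every flow hash routes through the table, adding or removing one backend disturbs only a small fraction of slots, unlike plain modulo hashing.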
acceptance:
- FiberLB basic health check passes
- L4 load balancing works (round-robin or Maglev)
- Service registration/discovery works
- Integration with k8shost Service objects
- Integration with PlasmaVMC (VM endpoints)
steps:
- step: S1
name: FiberLB Current State Assessment
done: Understand existing FiberLB implementation
status: complete
completed: 2025-12-12 01:50 JST
owner: peerB
priority: P0
notes: |
**Architecture:** ~3,100 lines of Rust across 3 crates
- Control Plane: 5 gRPC services (LB, Pool, Backend, Listener, HealthCheck)
- Data Plane: L4 TCP proxy (tokio bidirectional copy)
- Metadata: ChainFire/FlareDB/InMemory backends
- Integration: k8shost FiberLB controller (T028, 226 lines)
**✓ IMPLEMENTED:**
- L4 TCP load balancing (round-robin)
- Health checks (TCP, HTTP, configurable intervals)
- VIP allocation (203.0.113.0/24 TEST-NET-3)
- Multi-tenant scoping (org_id/project_id)
- k8shost Service integration (controller reconciles every 10s)
- Graceful backend exclusion on health failure
- NixOS packaging (systemd service)
**✗ GAPS (Blocking Production):**
CRITICAL:
1. Single Algorithm - Only round-robin works
- Missing: Maglev (PROJECT.md requirement)
- Missing: LeastConnections, IpHash, WeightedRR
- No session persistence/affinity
2. No L7 HTTP Load Balancing
- Only L4 TCP proxying
- No path/host routing
- No HTTP header inspection
- No TLS termination
3. No BGP Anycast (PROJECT.md requirement)
- Single-node data plane
- No VIP advertisement
- No ECMP support
4. Backend Discovery Gap
- k8shost controller creates LB but doesn't register Pod endpoints
- Need: Automatic backend registration from Service Endpoints
HIGH:
5. MVP VIP Management - Sequential allocation, no reclamation
6. No HA/Failover - Single FiberLB instance
7. No Metrics - Missing request rate, latency, error metrics
8. No UDP Support - TCP only
**Test Coverage:**
- Control plane: 12 unit tests, 4 integration tests ✓
- Data plane: 1 ignored E2E test (requires real server)
- k8shost integration: NO tests
**Production Readiness:** LOW-MEDIUM
- Works for basic L4 TCP
- Needs: endpoint discovery, Maglev/IpHash, BGP, HA, metrics
**Recommendation:**
S2 Focus: E2E L4 test with 3 backends
S3 Focus: Fix endpoint discovery gap, validate k8shost flow
S4 Focus: Health check failover validation
- step: S2
name: Basic LB Functionality Test
done: Round-robin or Maglev L4 LB working
status: pending
owner: peerB
priority: P0
notes: |
Test:
- Start multiple backend servers
- Configure FiberLB
- Verify requests are distributed
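The distribution check above can be expressed in miniature against a round-robin selector (the `RoundRobin` type here is a hypothetical stand-in, not FiberLB's):

```rust
use std::collections::HashMap;

/// Round-robin selection as S1 describes the data plane using:
/// each new connection goes to the next backend in order.
struct RoundRobin {
    backends: Vec<String>,
    next: usize,
}

impl RoundRobin {
    fn new(backends: Vec<String>) -> Self {
        Self { backends, next: 0 }
    }
    fn pick(&mut self) -> &str {
        let i = self.next % self.backends.len();
        self.next += 1;
        &self.backends[i]
    }
}

/// The S2 acceptance check, in miniature: after `n` picks over `k` backends,
/// each backend should have received n / k connections (for n divisible by k).
fn distribution(rr: &mut RoundRobin, n: usize) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for _ in 0..n {
        *counts.entry(rr.pick().to_string()).or_insert(0) += 1;
    }
    counts
}
```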
- step: S3
name: k8shost Service Integration
done: FiberLB provides VIP for k8shost Services with endpoint discovery
status: complete
completed: 2025-12-12 02:05 JST
owner: peerB
priority: P0
notes: |
**Implementation (k8shost/crates/k8shost-server/src/fiberlb_controller.rs):**
Enhanced FiberLB controller with complete endpoint discovery workflow:
1. Create LoadBalancer → receive VIP (existing)
2. Create Pool (RoundRobin, TCP) → NEW
3. Create Listener for each Service port → VIP:port → Pool → NEW
4. Query Pods matching Service.spec.selector → NEW
5. Create Backend for each Pod IP:targetPort → NEW
**Changes:**
- Added client connections: PoolService, ListenerService, BackendService
- Store pool_id in Service annotations
- Create Listener for each Service.spec.ports[] entry
- Use storage.list_pods() with label_selector for endpoint discovery
- Create Backend for each Pod with status.pod_ip
- Handle target_port mapping (Service port → Container port)
**Result:**
- ✓ Compilation successful
- ✓ Complete Service→VIP→Pool→Listener→Backend flow
- ✓ Automatic Pod endpoint registration
- ✓ Addresses user concern (translated): "clients don't know which endpoint to access"
**Next Steps:**
- E2E validation: Deploy Service + Pods, verify VIP connectivity
- S4: Health check failover validation
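Steps 3-5 of the workflow above reduce to a pure mapping from Service spec plus Pod list to Listener/Backend specs. A sketch with simplified, hypothetical stand-in types (the real controller issues gRPC calls to PoolService/ListenerService/BackendService instead of returning values):

```rust
use std::collections::HashMap;

// Simplified stand-ins for the k8shost/FiberLB objects, not the real types.
struct ServicePort { port: u16, target_port: u16 }
struct Service { selector: HashMap<String, String>, ports: Vec<ServicePort> }
struct Pod { labels: HashMap<String, String>, pod_ip: Option<String> }

#[derive(Debug, PartialEq)]
struct Listener { vip: String, port: u16 }
#[derive(Debug, PartialEq)]
struct Backend { addr: String }

/// One Listener per Service port on the allocated VIP (step 3); one Backend
/// per label-matched Pod that already has an IP (steps 4-5), mapped to the
/// port's target_port.
fn reconcile(vip: &str, svc: &Service, pods: &[Pod]) -> (Vec<Listener>, Vec<Backend>) {
    let listeners = svc.ports.iter()
        .map(|p| Listener { vip: vip.to_string(), port: p.port })
        .collect();
    let backends = pods.iter()
        // Step 4: every selector key/value must be present on the Pod.
        .filter(|pod| svc.selector.iter().all(|(k, v)| pod.labels.get(k) == Some(v)))
        // Step 5: only Pods with status.pod_ip, one Backend per port mapping.
        .filter_map(|pod| pod.pod_ip.as_ref())
        .flat_map(|ip| svc.ports.iter()
            .map(move |p| Backend { addr: format!("{ip}:{}", p.target_port) }))
        .collect();
    (listeners, backends)
}
```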
- step: S4
name: Health Check and Failover
done: Unhealthy backends removed from pool
status: pending
owner: peerB
priority: P1
notes: |
Test:
- Active health checks
- Remove failed backend
- Recovery when backend returns
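A minimal sketch of the intended behavior: an active TCP probe plus a fall/rise state machine that excludes a backend after consecutive probe failures and readmits it after consecutive successes (the thresholds are illustrative, not FiberLB's actual defaults):

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Active TCP probe: the backend counts as up if a connect completes in time.
fn tcp_probe(addr: SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}

/// Per-backend health state: leaves the pool after `fall` consecutive failed
/// probes, rejoins after `rise` consecutive successful ones.
struct HealthState {
    healthy: bool,
    streak: u32,
    fall: u32,
    rise: u32,
}

impl HealthState {
    fn new(fall: u32, rise: u32) -> Self {
        Self { healthy: true, streak: 0, fall, rise }
    }
    /// Feed one probe result; returns current pool membership.
    fn observe(&mut self, probe_ok: bool) -> bool {
        if probe_ok == self.healthy {
            self.streak = 0; // current state confirmed; reset opposing streak
        } else {
            self.streak += 1;
            let threshold = if self.healthy { self.fall } else { self.rise };
            if self.streak >= threshold {
                self.healthy = !self.healthy;
                self.streak = 0;
            }
        }
        self.healthy
    }
}
```

The fall/rise hysteresis prevents a single dropped probe from flapping a backend out of the pool.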
evidence: []
notes: |
**Strategic Value:**
- LB is foundational for production deployment
- Without working LB, multi-instance deployments are impossible
- Critical for T039 production readiness
**Related Work:**
- T028: k8shost FiberLB Controller (already implemented)
- T050.S6: k8shost REST API (includes Service endpoints)