id: T004
name: P0 Critical Fixes - Production Blockers
status: complete
created: 2025-12-08
completed: 2025-12-08
owner: peerB
goal: Resolve all 6 P0 blockers identified in T003 gap analysis

description: |
  Fix critical gaps that block production deployment.
  Priority order: FlareDB persistence (data loss) > Chainfire (etcd compat) > IAM (K8s deploy)

acceptance:
  - All 6 P0 fixes implemented and tested
  - No regressions in existing tests
  - R4 risk (FlareDB data loss) closed

steps:
  - step: S1
    action: FlareDB persistent Raft storage
    priority: P0-CRITICAL
    status: complete
    complexity: large
    estimate: 1-2 weeks
    location: flaredb-raft/src/persistent_storage.rs, raft_node.rs, store.rs
    completed: 2025-12-08
    notes: |
      Implemented persistent Raft storage with:
      - New `new_persistent()` constructor uses RocksDB via PersistentFlareStore
      - Snapshot persistence to RocksDB (data + metadata)
      - Startup recovery: loads snapshot, restores state machine
      - Fixed state machine serialization (bincode for tuple map keys)
      - FlareDB server now uses persistent storage by default
      - Added test: test_snapshot_persistence_and_recovery

  - step: S2
    action: Chainfire lease service
    priority: P0
    status: complete
    complexity: medium
    estimate: 3-5 days
    location: chainfire.proto, lease.rs, lease_store.rs, lease_service.rs
    completed: 2025-12-08
    notes: |
      Implemented full Lease service for etcd compatibility:
      - Proto: LeaseGrant, LeaseRevoke, LeaseKeepAlive, LeaseTimeToLive, LeaseLeases RPCs
      - Types: Lease, LeaseData, LeaseId in chainfire-types
      - Storage: LeaseStore with grant/revoke/refresh/attach_key/detach_key/export/import
      - State machine: Handles LeaseGrant/Revoke/Refresh commands, key attachment
      - Service: LeaseServiceImpl in chainfire-api with streaming keep-alive
      - Integration: Put/Delete auto-attach/detach keys to/from leases

  - step: S3
    action: Chainfire read consistency
    priority: P0
    status: complete
    complexity: small
    estimate: 1-2 days
    location: kv_service.rs, chainfire.proto
    completed: 2025-12-08
    notes: |
      Implemented linearizable/serializable read modes:
      - Added `serializable` field to RangeRequest in chainfire.proto
      - When serializable=false (default), calls linearizable_read() before reading
      - linearizable_read() uses OpenRaft's ensure_linearizable() for consistency
      - Updated all client RangeRequest usages with explicit serializable flags

  - step: S4
    action: Chainfire range in transactions
    priority: P0
    status: complete
    complexity: small
    estimate: 1-2 days
    location: kv_service.rs, command.rs, state_machine.rs
    completed: 2025-12-08
    notes: |
      Fixed Range operations in transactions:
      - Added TxnOp::Range variant to chainfire-types/command.rs
      - Updated state_machine.rs to handle Range ops (read-only, no state change)
      - Fixed convert_ops in kv_service.rs to convert RequestRange properly
      - Removed dummy Delete op workaround

  - step: S5
    action: IAM health endpoints
    priority: P0
    status: complete
    complexity: small
    estimate: 1 day
    completed: 2025-12-08
    notes: |
      Added gRPC health service (grpc.health.v1.Health) using tonic-health.
      K8s can use gRPC health probes for liveness/readiness.
      Services: IamAuthz, IamToken, IamAdmin all report SERVING status.

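    # Illustrative sketch only (not part of this task's tracked data): one way a
    # Kubernetes pod spec could use the gRPC health service described in the S5
    # notes. The container name, image, and port 50051 are assumptions, and the
    # built-in `grpc` probe type must be available in the target cluster version.
    #
    #   containers:
    #     - name: iam-server              # hypothetical name
    #       image: example/iam:latest     # hypothetical image
    #       ports:
    #         - containerPort: 50051      # assumed gRPC port
    #       livenessProbe:
    #         grpc:
    #           port: 50051
    #         initialDelaySeconds: 5
    #       readinessProbe:
    #         grpc:
    #           port: 50051
    #         periodSeconds: 10
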
  - step: S6
    action: IAM metrics
    priority: P0
    status: complete
    complexity: small
    estimate: 1-2 days
    completed: 2025-12-08
    notes: |
      Added Prometheus metrics using metrics-exporter-prometheus.
      Serves metrics at http://0.0.0.0:{metrics_port}/metrics (default 9090).
      Pre-defined counters: authz_requests, allowed, denied, token_issued.
      Pre-defined histogram: request_duration_seconds.

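    # Illustrative sketch only (not part of this task's tracked data): a minimal
    # Prometheus scrape job that could collect the metrics endpoint described in
    # the S6 notes. The job name and target host are assumptions; 9090 is the
    # default metrics port stated above.
    #
    #   scrape_configs:
    #     - job_name: iam                 # hypothetical job name
    #       metrics_path: /metrics
    #       static_configs:
    #         - targets: ["iam.example.internal:9090"]   # hypothetical host
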
parallel_track: |
  After S5+S6 complete (IAM P0s, ~3 days), PlasmaVMC spec design can begin
  while S1 (FlareDB persistence) continues.

notes: |
  Strategic decision: Modified (B) Parallel approach.
  FlareDB persistence is critical path - start immediately.
  Small fixes (S3-S6) can be done in parallel by multiple developers.