photoncloud-monorepo/docs/por/T027-production-hardening/task.yaml
centra df80d00696 feat(T027): Unify configuration for flaredb-server and plasmavmc-server
Refactored flaredb-server and plasmavmc-server to use a unified configuration
approach, supporting TOML files, environment variables, and CLI overrides.

This completes T027.S0 Config Unification.

Changes include:
- Created dedicated  modules for both flaredb-server and plasmavmc-server
  to define  structs.
- Implemented  for  in both components.
- Modified  in flaredb-server to use  instead of .
- Modified  in plasmavmc-server to add  dependency.
- Refactored  in both components to load config from TOML/env and apply
  CLI overrides.
- Extended  in plasmavmc-server/src/config.rs to include all
  relevant Firecracker backend parameters.
- Implemented  in
  plasmavmc/crates/plasmavmc-firecracker/src/lib.rs to construct backend
  from the unified configuration.
- Updated docs/por/T027-production-hardening/task.yaml to mark S0 as complete
  and the overall task status as active.
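
The layered precedence this commit describes (defaults, then TOML file, then environment variables, then CLI overrides) can be sketched as below. This is a minimal, dependency-free illustration, not the actual implementation: the struct name `Settings`, the `FLAREDB_*` variable prefix, and the field names are assumptions, and the TOML layer is stubbed out to keep the sketch runnable with the standard library only.

```rust
use std::env;

// Illustrative settings struct; field names are assumptions, not the
// real flaredb-server/plasmavmc-server config schema.
#[derive(Debug, Clone, PartialEq)]
struct Settings {
    listen_addr: String,
    data_dir: String,
}

impl Default for Settings {
    fn default() -> Self {
        Settings {
            listen_addr: "127.0.0.1:8080".to_string(),
            data_dir: "/var/lib/flaredb".to_string(),
        }
    }
}

impl Settings {
    /// Layer 2: environment variables override file/default values.
    /// (Hypothetical FLAREDB_* prefix.)
    fn apply_env(mut self) -> Self {
        if let Ok(v) = env::var("FLAREDB_LISTEN_ADDR") {
            self.listen_addr = v;
        }
        if let Ok(v) = env::var("FLAREDB_DATA_DIR") {
            self.data_dir = v;
        }
        self
    }

    /// Layer 3: CLI flags are applied last, so they win over
    /// both the config file and the environment.
    fn apply_cli(mut self, listen_addr: Option<String>) -> Self {
        if let Some(v) = listen_addr {
            self.listen_addr = v;
        }
        self
    }
}

fn main() {
    // In the real servers a TOML-file layer would sit between the
    // defaults and the environment layer; it is omitted here so the
    // sketch needs no external crates.
    let settings = Settings::default()
        .apply_env()
        .apply_cli(Some("0.0.0.0:9000".to_string()));
    println!("{:?}", settings);
}
```

In practice this layering is what the `config` crate's source stack plus clap-parsed optional flags give you: each flag is an `Option<T>`, and only `Some` values overwrite the loaded configuration.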
2025-12-10 05:11:04 +09:00



id: T027
name: Production Hardening
goal: Transform MVP stack into a production-grade, observable, and highly available platform.
status: active
priority: P1
owner: peerB
created: 2025-12-10
depends_on: [T026]
blocks: []
context: |
  With MVP functionality verified (T026), the platform must be hardened for
  production usage. This involves ensuring high availability (HA), comprehensive
  observability (metrics/logs), and security (TLS).
  This task focuses on Non-Functional Requirements (NFRs). Functional gaps
  (deferred P1s) will be handled in T028.
acceptance:
  - All components use a unified configuration approach (clap + config file or env)
  - Full observability stack (Prometheus/Grafana/Loki) operational via NixOS
  - All services exporting metrics and logs to the stack
  - Chainfire and FlareDB verified in 3-node HA cluster
  - TLS enabled for inter-service communication (optional for internal, required for external)
  - Chaos testing (kill node, verify recovery) passed
  - Ops documentation (Backup/Restore, Upgrade) created
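
A unified TOML file satisfying the first acceptance criterion might look like the fragment below. Every section and key here is illustrative only; the actual schemas live in the servers' config modules.

```toml
# Hypothetical flaredb-server config; keys are not taken from the codebase.
[server]
listen_addr = "0.0.0.0:8080"
data_dir = "/var/lib/flaredb"

[telemetry]
metrics_addr = "0.0.0.0:9090"   # scraped by Prometheus (S1/S2)

[tls]
enabled = false                  # required for external traffic (S4)
```

The same values would be overridable via prefixed environment variables and, last in precedence, CLI flags.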
steps:
  - step: S0
    name: Config Unification
    done: All components use unified configuration (clap + config file/env)
    status: complete
    owner: peerB
    priority: P0
  - step: S1
    name: Observability Stack
    done: Prometheus, Grafana, and Loki deployed and scraping targets
    status: pending
    owner: peerB
    priority: P0
  - step: S2
    name: Service Telemetry Integration
    done: All components (Chainfire, FlareDB, IAM, k8shost) dashboards functional
    status: pending
    owner: peerB
    priority: P0
  - step: S3
    name: HA Clustering Verification
    done: 3-node Chainfire/FlareDB cluster survives single node failure
    status: pending
    owner: peerB
    priority: P0
  - step: S4
    name: Security Hardening
    done: mTLS/TLS enabled where appropriate, secrets management verified
    status: pending
    owner: peerB
    priority: P1
  - step: S5
    name: Ops Documentation
    done: Runbooks for common operations (Scale out, Restore, Upgrade)
    status: pending
    owner: peerB
    priority: P1
evidence: []
notes: |
  Separated from functional feature work (T028).