UltraCloud Baseline TODO (2026-04-10)

  • Task: 0fe10731-bdbc-4f8f-8bcc-5f5a16903200
  • Working branch: task/0fe10731-baseline-todo
  • Base: origin/main at b8ebd24d4e9b2dbe71e34ba09b77092dfa7dd43c
  • Handover policy: the dirty worktree from task/343c8c57-main-reaggregate was carried onto the new branch as-is, without reset/revert.
  • Purpose of this ticket: fix each component's responsibilities, canonical entrypoints, current evidence, unproven items, prioritized issue tickets, and dependencies on one sheet, and use it as the baseline ticket for subsequent autonomous implementation.
  • Survey inputs: README.md, docs/component-matrix.md, docs/testing.md, nix/test-cluster/README.md, plans/cluster-investigation-2026-03-02/*, the current nix/modules/*, nix/single-node/*, nix/nodes/baremetal-qemu/*, nix/test-cluster/*, and each component's src/main.rs / API definitions.

Canonical Boundary Snapshot

  • There are three canonical profiles: single-node dev, 3-node HA control plane, and bare-metal bootstrap.
  • The minimal core is chainfire + flaredb + iam + prismnet + plasmavmc.
  • The network provider bundle is prismnet + flashdns + fiberlb.
  • The VM hosting bundle is plasmavmc + prismnet + coronafs + lightningstor.
  • The edge/tenant bundle is apigateway + nightlight + creditservice.
  • The rollout bundle is deployer + nix-agent + fleet-scheduler + node-agent.
  • As of the 2026-04-10 current branch, QEMU/KVM is the canonical local proof, and the bare-metal proof is also treated as QEMU-as-hardware under the same ISO contract.

2026-03-02 Failure Split

Failures from 2026-03-02 that are resolved at the file level on the 2026-04-10 current branch

  • ARCH-001: the issue of flake.nix referencing a missing docs/.../configuration.nix is resolved. The canonical sources are now nix/nodes/vm-cluster/node01, node02, and node03, covered by canonical-profile-eval-guards.
  • ARCH-002: the missing disko.nix reference in the ISO install is resolved. nix/nodes/baremetal-qemu/control-plane/disko.nix and .../worker/disko.nix now exist and are used directly by verify-baremetal-iso.sh.
  • ARCH-003: the missing Nix wiring for deployer is resolved. nix/modules/deployer.nix, the package/app/check definitions in flake.nix, and /api/v1/phone-home in deployer-server all exist.
  • TC-001: the joinAddr mismatch is resolved. The current chainfire / flaredb modules are aligned on the initialPeers contract.
  • TC-002: the node06 creditservice evaluation failure is resolved. The current nix/test-cluster/node06.nix imports creditservice.nix and also supplies flaredbAddr.
  • COMP-001 through COMP-004: the IAM endpoint injection mismatches are resolved. prismnet, plasmavmc, fiberlb, lightningstor, flashdns, and creditservice now translate module options into the config keys the binaries actually read.
  • ARCH-004: the first-boot leader_url contract mismatch is resolved. nix/modules/first-boot-automation.nix now assumes /admin/member/add on http://localhost:8081 / 8082 (see the sketch after this list).
  • ARCH-005: the missing first-boot join API in FlareDB is resolved. flaredb/crates/flaredb-server/src/rest.rs now has POST /admin/member/add.
  • 3.1 NightLight grpcPort mismatch: resolved. nightlight-server now binds both HTTP and gRPC.
  • ARCH-006 / duplicate cluster-config implementations: the nix-nos/topology.nix-rooted duplication that existed on 2026-03-02 is no longer present in the current tree; the canonical sources have converged on nix/lib/cluster-schema.nix and nix/modules/ultracloud-cluster.nix.
  • QLT-001: the large group of doCheck = false entries in flake.nix no longer remains, at least at the current file level.
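As a minimal illustration of the first-boot join contract referenced in ARCH-004/ARCH-005 above, the sketch below shows the shape of the member-add call. The leader URL default, the choice between the two ports, and the request-body field name are assumptions for illustration, not the verified contract; check first-boot-automation.nix and rest.rs before relying on them.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the first-boot member-add call described above.
# Assumptions: which of 8081/8082 is used and the "peer_addr" body field
# are illustrative only.
set -euo pipefail

leader_url="${LEADER_URL:-http://localhost:8082}"   # value normally injected by first-boot-automation.nix
peer_addr="${PEER_ADDR:-10.0.0.12:2380}"            # joining node's peer address (illustrative)

curl --fail --silent --show-error \
  -X POST "${leader_url}/admin/member/add" \
  -H 'Content-Type: application/json' \
  -d "{\"peer_addr\": \"${peer_addr}\"}"
```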

Failures from 2026-03-02 where, as of 2026-04-10, a structural fix exists but the runtime re-proof is still incomplete, kept separate from the list above

  • VERIFY-001: on the 2026-04-10 local AMD/KVM host, supported-surface-guard, single-node-trial-vm, single-node-quickstart, fresh-smoke, fresh-demo-vm-webapp, fresh-matrix, ./nix/test-cluster/run-publishable-kvm-suite.sh ./work/publishable-kvm-suite, canonical-profile-eval-guards, portable-control-plane-regressions, deployer-bootstrap-e2e, host-lifecycle-e2e, fleet-scheduler-e2e, baremetal-iso, nix build .#checks.x86_64-linux.baremetal-iso-e2e, and the built ./result/bin/baremetal-iso-e2e exact runner have all been rerun and pass. The only thing not re-proven is the real bare-metal hardware smoke.
  • VERIFY-002: the bare-metal bootstrap is closed through the QEMU ISO proof, but there is no re-proof of the same contract on USB/BMC/real hardware yet. However, nix run ./nix/test-cluster#hardware-smoke -- preflight was added on 2026-04-10, so when the transport is absent the blocked state can now be recorded mechanically in ./work/hardware-smoke/latest/status.env and missing-requirements.txt.
  • VERIFY-003: the config-contract fixes have been re-verified via run-publishable-kvm-suite.sh up to the profile with all add-ons enabled. baremetal-iso-e2e has also migrated to the materialized host-KVM runner, so the remaining item narrows to hardware bring-up.

First Tranche Backlog

  • TRANCHE-01: done. The optional-bundle health gating for single-node dev was fixed on 2026-04-10, resolving the coronafs port mismatch and the unmonitored health of flashdns / fiberlb / lightningstor.
  • TRANCHE-02: baremetal-iso and the baremetal-iso-e2e exact runner were rerun on the 2026-04-10 local AMD/KVM host. The next step adds a smoke run on one real USB/BMC machine.
  • TRANCHE-03: done. nix run ./nix/test-cluster#cluster -- durability-proof was added on 2026-04-10, fixing logical backup/restore for chainfire / flaredb and the deployer admin pre-register request replay plus restart-persistence proof into the product docs and the harness.
  • TRANCHE-04: done. The local chainfire default endpoints of fleet-scheduler, nix-agent, node-agent, and deployer-ctl were normalized to the canonical http://127.0.0.1:2379 on 2026-04-10.
  • TRANCHE-05: done. The supported scope of fiberlb's HTTPS health check was made explicit on 2026-04-10: for now the product contract is TCP reachability plus HTTP status only, with no backend TLS certificate verification, fixed in docs, guard, and source comments.
  • TRANCHE-06: done. On 2026-04-10, k8shost was fixed as an API/control-plane product, and docs, guard, and this TODO were aligned on the runtime dataplane helpers being archived non-product.
  • TRANCHE-07: done. The 2026-04-10 durability-proof preserves lightningstor distributed-backend node-loss / repair and the coronafs controller/node split outage as canonical failure-injection proofs.
  • TRANCHE-08: done. A hardware-smoke preflight/handoff wrapper was added on 2026-04-10 so that real-hardware bring-up of deployer -> ISO -> first-boot -> nix-agent can be prepared through a common USB/BMC/Redfish entrypoint. The blocked artifact for the no-transport case is also pinned under ./work/hardware-smoke.
  • TRANCHE-09: done. docs/rollout-bundle.md was added on 2026-04-10, fixing the product contract and proof commands for deployer single-writer DR, nix-agent health-check/rollback, node-agent logs/secrets/volume/upgrade, and fleet-scheduler drain/maintenance/failover.
  • TRANCHE-10: done. nix run ./nix/test-cluster#cluster -- rollout-soak was fixed on 2026-04-10 as the longer-run KVM operator lane, saving draining maintenance, worker power loss, deployer / fleet-scheduler / node-agent restarts, and fixed-membership chainfire / flaredb restarts under a single artifact root. The fact that the steady-state test-cluster does not ship nix-agent.service is also made explicit with a scope-marker artifact.
  • TRANCHE-11: done. On 2026-04-10, DEPLOYER-P1-01 and FLEET-P1-01 were updated to their scope-fixed final state, and rollout-soak now saves scope-fixed-contract.json, deployer-scope-fixed.txt, and fleet-scheduler-scope-fixed.txt under /mnt/d2/centra/photoncloud-monorepo/work/rollout-soak/20260410T164549+0900. The release boundary is fixed to one active deployer writer plus optional cold-standby restore, and, for fleet-scheduler, to one drain plus one fail-stop cycle with a 30-second hold on two native-runtime workers.
  • TRANCHE-12: done. FDB-P1-01, IAM-P1-01, and HARNESS-P2-01 were handled as the next step on 2026-04-10. run-core-control-plane-ops-proof.sh saves scope-fixed-contract.json, iam-credential-rotation-tests.log, iam-mtls-rotation-tests.log, and result.json under /mnt/d2/centra/photoncloud-monorepo/work/core-control-plane-ops-proof/20260410T172148+09:00; FlareDB destructive DDL / fully automated online migration is scope-fixed as unsupported; IAM supports signing-key, credential, and mTLS overlap rotation as its lifecycle, with multi-node failover fixed as unsupported. work-root-budget.sh gained enforce and prune-proof-logs, moving it from a disk-budget advisory to a stronger local gate and a safer cleanup workflow.

2026-04-10 Physical Hardware Bring-Up Pack

  • Task: 3dba03d3-525b-4079-8c93-90af6a89d32b
  • Canonical entrypoint: nix run ./nix/test-cluster#hardware-smoke -- preflight, then run or capture
  • Current preflight artifact root: ./work/hardware-smoke/latest
  • Artifact contract: status.env, missing-requirements.txt, kernel-params.txt, expected-markers.txt, failure-markers.txt, operator-handoff.md, environment.txt
  • Bridge to QEMU proof: hardware wrapper reuses nixosConfigurations.ultracloud-iso and the same ULTRACLOUD_MARKER pre-install.boot.*, pre-install.phone-home.complete.*, install.disko.complete.*, reboot.*, post-install.boot.*, desired-system-active.* markers that verify-baremetal-iso.sh enforces in the QEMU harness.
  • Blocked-state recording: when USB device or BMC/Redfish transport is missing, preflight records status=blocked and the missing transport, kernel-parameter, and capture inputs in missing-requirements.txt without pretending the hardware proof ran.
  • Still open: an actual physical-node execution remains pending until a removable USB target or BMC/Redfish endpoint plus credentials are supplied.
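A minimal operator sketch for consuming the preflight artifacts listed above. The only detail assumed beyond the documented contract is that status.env is a flat shell-sourceable KEY=value file exposing a status key (e.g. status=blocked).

```bash
#!/usr/bin/env bash
# Sketch: decide whether the hardware smoke can proceed from the
# preflight artifact root described above. Assumes status.env is a
# flat KEY=value file with a "status" key.
set -euo pipefail

root="${1:-./work/hardware-smoke/latest}"

# shellcheck disable=SC1091
source "${root}/status.env"

if [ "${status:-}" = "blocked" ]; then
  echo "hardware smoke blocked; missing requirements:"
  cat "${root}/missing-requirements.txt"
  exit 1
fi

echo "preflight status: ${status:-unknown}; see ${root}/operator-handoff.md for next steps"
```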

2026-04-10 Long-Run Control Plane And Rollout Soak

  • Task: 07d6137e-6e4c-4158-9142-8920f4f70a76
  • Canonical entrypoint: nix run ./nix/test-cluster#cluster -- rollout-soak
  • Artifact root: /mnt/d2/centra/photoncloud-monorepo/work/rollout-soak/20260410T164549+0900
  • Scenario proof: one planned node04 -> draining -> active cycle, one node05 power-loss and recovery cycle, restart of deployer.service, fleet-scheduler.service, node-agent.service on both worker nodes, and fixed-membership restart of chainfire.service plus flaredb.service on node02.
  • Saved evidence: maintenance-during.json, maintenance-held.json, maintenance-restored.json, power-loss-during.json, power-loss-held.json, power-loss-restored.json, deployer-post-restart-nodes.json, fleet-scheduler-post-restart.json, node04-node-agent-post-restart.json, node05-node-agent-post-restart.json, chainfire-post-restart-put.json, flaredb-post-restart.json, post-control-plane-restarts.json, scope-fixed-contract.json, deployer-scope-fixed.txt, fleet-scheduler-scope-fixed.txt, result.json.
  • Long-run nix-agent boundary: steady-state nix/test-cluster nodes do not ship nix-agent.service, so this soak records node01-nix-agent-scope.txt and node04-nix-agent-scope.txt instead of pretending a live-cluster nix-agent restart happened. The executable nix-agent proofs remain deployer-vm-rollback, baremetal-iso, and baremetal-iso-e2e.
  • Result: PASS on the local AMD/KVM host. result.json records success=true, fleet_supported_native_runtime_nodes=2, validated_maintenance_cycles=1, validated_power_loss_cycles=1, soak_hold_secs=30, and the summary validated one planned drain cycle and one fail-stop worker-loss cycle on the two-node native-runtime lab, held each degraded state for the configured soak window, restarted deployer or scheduler or agent services, and revalidated fixed-membership control-plane restarts while keeping deployer HA scope-fixed to single-writer recovery.
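A small sketch of rechecking the result.json summary fields listed above after a rollout-soak run. The jq paths assume flat top-level keys, which is an assumption about the artifact layout rather than a documented schema.

```bash
#!/usr/bin/env bash
# Sketch: validate the rollout-soak summary fields recorded above.
# Assumes the keys are flat top-level fields of result.json.
set -euo pipefail

root="/mnt/d2/centra/photoncloud-monorepo/work/rollout-soak/20260410T164549+0900"

if jq -e '
  .success == true
  and .fleet_supported_native_runtime_nodes == 2
  and .validated_maintenance_cycles == 1
  and .validated_power_loss_cycles == 1
  and .soak_hold_secs == 30
' "${root}/result.json" >/dev/null; then
  echo "rollout-soak summary matches the recorded PASS"
else
  echo "rollout-soak summary drifted from the recorded PASS" >&2
  exit 1
fi
```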

2026-04-10 Local Executable Baseline

  • Task: b1e811fb-158f-415c-a011-64c724e84c5c
  • Runner: nix/test-cluster/run-local-baseline.sh
  • Log root: /mnt/d2/centra/photoncloud-monorepo/work/baselines/b1e811fb-158f-415c-a011-64c724e84c5c
  • Local execution policy: ULTRACLOUD_WORK_ROOT=/mnt/d2/centra/photoncloud-monorepo/work, TMPDIR=/mnt/d2/centra/photoncloud-monorepo/work/tmp, XDG_CACHE_HOME=/mnt/d2/centra/photoncloud-monorepo/work/xdg-cache, PHOTON_CLUSTER_WORK_ROOT=/mnt/d2/centra/photoncloud-monorepo/work/test-cluster, PHOTON_VM_DIR=/mnt/d2/centra/photoncloud-monorepo/work/test-cluster/state, PHOTON_CLUSTER_VDE_SWITCH_DIR=/mnt/d2/centra/photoncloud-monorepo/work/test-cluster/vde-switch, and NIX_CONFIG builders = (empty) to forbid remote builders (see the sketch at the end of this section).
  • Host evidence: environment.txt records host_cpu_count=12, ultracloud_local_nix_max_jobs=6, ultracloud_local_nix_build_cores=2, photon_cluster_nix_max_jobs=6, photon_cluster_nix_build_cores=2, nix_builders= (empty), kvm_access=rw, nested_param_value=1.
  • Guard/build checks:
    • canonical-profile-eval-guards: PASS. command nix build .#checks.x86_64-linux.canonical-profile-eval-guards --no-link; meta /mnt/d2/centra/photoncloud-monorepo/work/baselines/b1e811fb-158f-415c-a011-64c724e84c5c/canonical-profile-eval-guards.meta; log /mnt/d2/centra/photoncloud-monorepo/work/baselines/b1e811fb-158f-415c-a011-64c724e84c5c/canonical-profile-eval-guards.log.
    • supported-surface-guard: PASS. command nix build .#checks.x86_64-linux.supported-surface-guard --no-link; meta /mnt/d2/centra/photoncloud-monorepo/work/baselines/b1e811fb-158f-415c-a011-64c724e84c5c/supported-surface-guard.meta; log /mnt/d2/centra/photoncloud-monorepo/work/baselines/b1e811fb-158f-415c-a011-64c724e84c5c/supported-surface-guard.log.
    • portable-control-plane-regressions: PASS. command nix build .#checks.x86_64-linux.portable-control-plane-regressions; meta /mnt/d2/centra/photoncloud-monorepo/work/baselines/b1e811fb-158f-415c-a011-64c724e84c5c/portable-control-plane-regressions.meta; log /mnt/d2/centra/photoncloud-monorepo/work/baselines/b1e811fb-158f-415c-a011-64c724e84c5c/portable-control-plane-regressions.log.
    • deployer-bootstrap-e2e: PASS. command nix build .#checks.x86_64-linux.deployer-bootstrap-e2e; meta /mnt/d2/centra/photoncloud-monorepo/work/baselines/b1e811fb-158f-415c-a011-64c724e84c5c/deployer-bootstrap-e2e.meta; log /mnt/d2/centra/photoncloud-monorepo/work/baselines/b1e811fb-158f-415c-a011-64c724e84c5c/deployer-bootstrap-e2e.log.
    • host-lifecycle-e2e: PASS. command nix build .#checks.x86_64-linux.host-lifecycle-e2e; meta /mnt/d2/centra/photoncloud-monorepo/work/baselines/b1e811fb-158f-415c-a011-64c724e84c5c/host-lifecycle-e2e.meta; log /mnt/d2/centra/photoncloud-monorepo/work/baselines/b1e811fb-158f-415c-a011-64c724e84c5c/host-lifecycle-e2e.log.
    • fleet-scheduler-e2e: PASS. command nix build .#checks.x86_64-linux.fleet-scheduler-e2e; meta /mnt/d2/centra/photoncloud-monorepo/work/baselines/b1e811fb-158f-415c-a011-64c724e84c5c/fleet-scheduler-e2e.meta; log /mnt/d2/centra/photoncloud-monorepo/work/baselines/b1e811fb-158f-415c-a011-64c724e84c5c/fleet-scheduler-e2e.log.
  • Runtime path checks:
    • single-node-quickstart: PASS. command nix run .#single-node-quickstart; meta /mnt/d2/centra/photoncloud-monorepo/work/baselines/b1e811fb-158f-415c-a011-64c724e84c5c/single-node-quickstart.meta; log /mnt/d2/centra/photoncloud-monorepo/work/baselines/b1e811fb-158f-415c-a011-64c724e84c5c/single-node-quickstart.log; success marker single-node quickstart smoke passed.
    • baremetal-iso: PASS. command nix run ./nix/test-cluster#cluster -- baremetal-iso; meta /mnt/d2/centra/photoncloud-monorepo/work/baselines/b1e811fb-158f-415c-a011-64c724e84c5c/baremetal-iso.meta; log /mnt/d2/centra/photoncloud-monorepo/work/baselines/b1e811fb-158f-415c-a011-64c724e84c5c/baremetal-iso.log; success markers ULTRACLOUD_MARKER desired-system-active.iso-control-plane-01, ULTRACLOUD_MARKER desired-system-active.iso-worker-01, Canonical ISO bare-metal QEMU verification succeeded.
    • fresh-smoke: PASS. command nix run ./nix/test-cluster#cluster -- fresh-smoke; meta /mnt/d2/centra/photoncloud-monorepo/work/baselines/b1e811fb-158f-415c-a011-64c724e84c5c/fresh-smoke.meta; log /mnt/d2/centra/photoncloud-monorepo/work/baselines/b1e811fb-158f-415c-a011-64c724e84c5c/fresh-smoke.log; success marker Cluster validation succeeded.
  • 2026-04-10 execution failures: none. The historical 2026-03-02 failure split stands as in the section above; on this local AMD/KVM baseline none of the required commands reproduced as failures.
  • 2026-04-10 observed non-failure risk:
    • HARNESS-OBS-20260410-01: resolved on 2026-04-10. The stale-VM cleanup in nix/test-cluster/run-cluster.sh now only collects PIDs whose cmdline confirms the current vm_dir / vde_switch_dir, and the path-independent hostfwd=tcp::${port}-:22 fallback was removed.
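A minimal sketch of the local execution policy referenced at the top of this section. The environment values are copied from the baseline record above; wrapping them in a script and invoking the runner without arguments are assumptions about usage, not the runner's documented interface.

```bash
#!/usr/bin/env bash
# Sketch: pin the local-only execution policy recorded above before
# rerunning the baseline. All values come from the baseline entry; the
# wrapper script itself and the argument-free invocation are illustrative.
set -euo pipefail

repo=/mnt/d2/centra/photoncloud-monorepo
export ULTRACLOUD_WORK_ROOT="${repo}/work"
export TMPDIR="${repo}/work/tmp"
export XDG_CACHE_HOME="${repo}/work/xdg-cache"
export PHOTON_CLUSTER_WORK_ROOT="${repo}/work/test-cluster"
export PHOTON_VM_DIR="${repo}/work/test-cluster/state"
export PHOTON_CLUSTER_VDE_SWITCH_DIR="${repo}/work/test-cluster/vde-switch"
export NIX_CONFIG="builders ="   # empty builders list: forbid remote builders

mkdir -p "$TMPDIR" "$XDG_CACHE_HOME"
exec ./nix/test-cluster/run-local-baseline.sh
```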

2026-04-10 Bare-Metal Canonical Path

  • Task: 6d9f45e4-1954-4a0b-b886-c61482db6c3c
  • QEMU-as-hardware runtime proof: PASS. command nix run ./nix/test-cluster#cluster -- baremetal-iso; log root /mnt/d2/centra/photoncloud-monorepo/work/baremetal-iso; evidence files environment.txt, deployer.log, chainfire.log, control-plane.serial.log, worker.serial.log.
  • Runtime PASS markers: ULTRACLOUD_MARKER desired-system-active.iso-control-plane-01, ULTRACLOUD_MARKER desired-system-active.iso-worker-01, Canonical ISO bare-metal QEMU verification succeeded.
  • Runtime contract now proven:
    • reusable node classes own install_plan.nixos_configuration, install_plan.disko_config_path, and stable install_plan.target_disk_by_id
    • nodes carry identity plus desired-system overrides only; when a cache-backed prebuilt closure is available they now publish desired_system.target_system to converge to the exact shipped system instead of a dirty local rebuild
    • installed nodes now keep nix-agent alive across their own switch-to-configuration transaction long enough for activation to finish, which restored post-install chainfire and nix-agent convergence
  • Historical blocker (resolved on 2026-04-10): direct build-time execution of nix build .#checks.x86_64-linux.baremetal-iso-e2e ran under sandboxed nixbld1 and fell back to TCG. The exact lane is now a materialized runner: the check build succeeds quickly and emits ./result/bin/baremetal-iso-e2e, and that runner executes the same verify-baremetal-iso.sh harness with host KVM and logs under ./work.

2026-04-10 Responsibility And Minimal-Surface Alignment

  • Task: 65a13e46-1376-4f37-a5c1-e520b5b376ec
  • Authoring source decision: ultracloud.cluster backed by nix/lib/cluster-schema.nix is now documented in README.md, docs/README.md, and docs/testing.md as the only supported cluster authoring source. nix-nos is explicitly reduced to legacy compatibility plus low-level network primitives.
  • Module boundary alignment: services.deployer, services.fleet-scheduler, services.nix-agent, and services.node-agent descriptions now agree on the canonical layering ultracloud.cluster -> deployer -> (nix-agent | fleet-scheduler -> node-agent).
  • Minimal-surface friction reduction: services.plasmavmc and services.k8shost now wait only for local backing services that they actually use. When explicit remote endpoints are configured, they no longer hard-wire unrelated local control-plane units into startup ordering, which preserves a lighter standalone story for the VM-platform core and remote-provider deployments.
  • Validation alignment: supported-surface-guard now requires contract markers for the supported authoring source, the constrained nix-nos role, and the standalone VM-platform story so docs drift becomes a failing regression.
  • Still open: the rollout-stack default port mismatch has since been resolved; what remains is hardware bring-up and a longer-duration durability proof.

2026-04-10 Supported Surface Final Proof

  • Task: 32f64c10-1b74-4d8a-8d7d-b2cc6bf6b4f0
  • Guard + minimal-trial proof root: /mnt/d2/centra/photoncloud-monorepo/work/final-proofs/32f64c10-1b74-4d8a-8d7d-b2cc6bf6b4f0-final
    • supported-surface-guard: PASS. command nix build .#checks.x86_64-linux.supported-surface-guard --no-link; meta /mnt/d2/centra/photoncloud-monorepo/work/final-proofs/32f64c10-1b74-4d8a-8d7d-b2cc6bf6b4f0-final/supported-surface-guard.meta; log /mnt/d2/centra/photoncloud-monorepo/work/final-proofs/32f64c10-1b74-4d8a-8d7d-b2cc6bf6b4f0-final/supported-surface-guard.log.
    • single-node-trial-vm: PASS. command nix build .#single-node-trial-vm --no-link --print-out-paths; meta /mnt/d2/centra/photoncloud-monorepo/work/final-proofs/32f64c10-1b74-4d8a-8d7d-b2cc6bf6b4f0-final/single-node-trial-vm.meta; log /mnt/d2/centra/photoncloud-monorepo/work/final-proofs/32f64c10-1b74-4d8a-8d7d-b2cc6bf6b4f0-final/single-node-trial-vm.log; output path /nix/store/1nq4pkadm3lbxmhkr54iz7lgjd6vm7z3-nixos-vm.
    • single-node-quickstart: PASS. command nix run .#single-node-quickstart; meta /mnt/d2/centra/photoncloud-monorepo/work/final-proofs/32f64c10-1b74-4d8a-8d7d-b2cc6bf6b4f0-final/single-node-quickstart.meta; log /mnt/d2/centra/photoncloud-monorepo/work/final-proofs/32f64c10-1b74-4d8a-8d7d-b2cc6bf6b4f0-final/single-node-quickstart.log; success marker single-node quickstart smoke passed.
  • Publishable KVM suite root: /mnt/d2/centra/photoncloud-monorepo/work/publishable-kvm-suite
    • environment.txt captures host_cpu_count=12, local_nix_max_jobs=6, local_nix_build_cores=2, photon_cluster_nix_max_jobs=6, photon_cluster_nix_build_cores=2, kvm_present=yes, kvm_access=rw, kvm_amd_nested=1, nix_builders=, finished_at=2026-04-10T09:36:09+09:00, exit_status=0.
    • fresh-smoke: PASS. command nix run ./nix/test-cluster#cluster -- fresh-smoke; meta /mnt/d2/centra/photoncloud-monorepo/work/publishable-kvm-suite/fresh-smoke.meta; log /mnt/d2/centra/photoncloud-monorepo/work/publishable-kvm-suite/fresh-smoke.log; success marker Cluster validation succeeded.
    • fresh-demo-vm-webapp: PASS. command nix run ./nix/test-cluster#cluster -- fresh-demo-vm-webapp; meta /mnt/d2/centra/photoncloud-monorepo/work/publishable-kvm-suite/fresh-demo-vm-webapp.meta; log /mnt/d2/centra/photoncloud-monorepo/work/publishable-kvm-suite/fresh-demo-vm-webapp.log; success markers include PHOTON_VM_DEMO_WEB_READY and the guest web health check on http://10.62.10.10:8080/health.
    • fresh-matrix: PASS. command nix run ./nix/test-cluster#cluster -- fresh-matrix; meta /mnt/d2/centra/photoncloud-monorepo/work/publishable-kvm-suite/fresh-matrix.meta; log /mnt/d2/centra/photoncloud-monorepo/work/publishable-kvm-suite/fresh-matrix.log; success marker Component matrix validation succeeded.
    • run-publishable-kvm-suite: PASS. command ./nix/test-cluster/run-publishable-kvm-suite.sh ./work/publishable-kvm-suite; environment /mnt/d2/centra/photoncloud-monorepo/work/publishable-kvm-suite/environment.txt; final stdout marker publishable KVM suite passed; logs in ./work/publishable-kvm-suite.
  • Fixed while proving the surface:
    • NODEAGENT-FIX-20260410-01: reboot-time PID reuse could make node-agent treat native-daemon as the resurrected native-web instance after worker reboot, stalling fresh-smoke at native runtime recovery. deployer/crates/node-agent/src/process.rs now persists argv + boot-id metadata, validates the live /proc/<pid>/cmdline, and refuses to signal or reuse mismatched processes from stale pidfiles.
    • HARNESS-FIX-20260410-01: run-publishable-kvm-suite exposed a control-plane LightningStor bootstrap race that was not consistently hit by ad-hoc reruns. nix/test-cluster/node01.nix now holds lightningstor.service behind explicit local control-plane and worker-replica TCP readiness with a longer start timeout, and nix/test-cluster/run-cluster.sh now waits the worker storage agents before gating the control-plane LightningStor unit.
  • Still open after the final supported-surface proof: real hardware baremetal-iso smoke.
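To make the NODEAGENT-FIX-20260410-01 pidfile check above concrete, here is a conceptual shell sketch of the same validation idea. The real implementation lives in deployer/crates/node-agent/src/process.rs (in Rust); the meta.json field names used here (pid, boot_id, argv) are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Conceptual sketch of the stale-pidfile validation described in
# NODEAGENT-FIX-20260410-01: only trust a recorded PID if both the boot
# id and the live /proc/<pid>/cmdline still match what was persisted.
set -euo pipefail

meta="$1"                                  # e.g. ${stateDir}/pids/native-web.meta.json (fields assumed)
pid=$(jq -r '.pid' "$meta")
recorded_boot=$(jq -r '.boot_id' "$meta")
recorded_argv=$(jq -r '.argv | join(" ")' "$meta")

current_boot=$(cat /proc/sys/kernel/random/boot_id)
live_argv=$(tr '\0' ' ' < "/proc/${pid}/cmdline" 2>/dev/null | sed 's/ $//') || live_argv=""

if [ "$recorded_boot" != "$current_boot" ] || [ "$recorded_argv" != "$live_argv" ]; then
  echo "stale pidfile: refusing to signal or reuse pid ${pid}"
  exit 1
fi
echo "pid ${pid} still matches its recorded identity"
```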

2026-04-10 baremetal-iso-e2e Local-KVM Exact Lane

  • Task: 0de75570-dabd-471b-95fe-5898c54e2e8c
  • Check build output: nix build .#checks.x86_64-linux.baremetal-iso-e2e now materializes ./result/bin/baremetal-iso-e2e instead of trying to execute QEMU inside the daemon sandbox.
  • Exact proof root: /mnt/d2/centra/photoncloud-monorepo/work/baremetal-iso-e2e/0de75570-dabd-471b-95fe-5898c54e2e8c
  • Outer runner evidence: environment.txt records execution_model=materialized-check-runner, nix_builders= (empty), kvm_present=yes, kvm_access=rw, and the local CPU-derived Nix parallelism.
  • Exact check build: PASS. command nix build .#checks.x86_64-linux.baremetal-iso-e2e; output path is a runner package that ships bin/baremetal-iso-e2e plus share/ultracloud/README.txt documenting the sandbox/TCG reason for the materialized execution model.
  • Exact runner: PASS. command ./result/bin/baremetal-iso-e2e ./work/baremetal-iso-e2e/0de75570-dabd-471b-95fe-5898c54e2e8c; meta /mnt/d2/centra/photoncloud-monorepo/work/baremetal-iso-e2e/0de75570-dabd-471b-95fe-5898c54e2e8c/baremetal-iso-e2e.meta; log /mnt/d2/centra/photoncloud-monorepo/work/baremetal-iso-e2e/0de75570-dabd-471b-95fe-5898c54e2e8c/baremetal-iso-e2e.log.
  • Inner runtime evidence: state dir /mnt/d2/centra/photoncloud-monorepo/work/baremetal-iso-e2e/0de75570-dabd-471b-95fe-5898c54e2e8c/state; state/environment.txt records vm_accelerator_mode=kvm; success markers in baremetal-iso-e2e.log include ULTRACLOUD_MARKER desired-system-active.iso-control-plane-01, ULTRACLOUD_MARKER desired-system-active.iso-worker-01, and Canonical ISO bare-metal QEMU verification succeeded.
  • Remaining delta vs direct runtime proof: the harness is now identical because both nix run ./nix/test-cluster#cluster -- baremetal-iso and ./result/bin/baremetal-iso-e2e call nix/test-cluster/verify-baremetal-iso.sh. The only intentional difference is execution entrypoint: nix build materializes the runner because daemon-sandboxed nixbld builds would otherwise lose host KVM and degrade to TCG.
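For reference, the exact-lane commands above chain together as follows. The grep targets are the success markers recorded in this section; the dated output directory name is the operator's choice.

```bash
#!/usr/bin/env bash
# Sketch: rebuild the materialized runner, replay the exact lane described
# above, then confirm the recorded success markers in the log.
set -euo pipefail

out=./work/baremetal-iso-e2e/rerun-$(date +%Y%m%dT%H%M%S)

nix build .#checks.x86_64-linux.baremetal-iso-e2e
./result/bin/baremetal-iso-e2e "$out"

grep -F 'ULTRACLOUD_MARKER desired-system-active.iso-control-plane-01' "$out"/baremetal-iso-e2e.log
grep -F 'ULTRACLOUD_MARKER desired-system-active.iso-worker-01' "$out"/baremetal-iso-e2e.log
grep -F 'Canonical ISO bare-metal QEMU verification succeeded' "$out"/baremetal-iso-e2e.log
```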

2026-04-10 Durability And Product-Boundary Hardening

  • Task: 541356be-b289-4583-ba40-cbf46b0f9680
  • Guard rerun: PASS. command nix build .#checks.x86_64-linux.supported-surface-guard --no-link.
  • Runtime rerun: PASS. command nix run ./nix/test-cluster#cluster -- fresh-matrix; success marker Component matrix validation succeeded.
  • Durability proof: PASS. command nix run ./nix/test-cluster#cluster -- durability-proof; artifact root /mnt/d2/centra/photoncloud-monorepo/work/durability-proof/20260410T120618+0900; convenience symlink /mnt/d2/centra/photoncloud-monorepo/work/durability-proof/latest.
  • ChainFire proof: chainfire-backup-response.json and chainfire-restored-response.json return the same logical payload, and chainfire-after-delete.out returns 404 after the DELETE.
  • FlareDB proof: flaredb-backup.json and flaredb-restored.json return the same SQL row, and flaredb-after-delete.json returns an empty set.
  • Deployer proof: deployer-pre-register-request.json serves as the backup artifact, the pre-registered node is observed in deployer-backup-list.json, it is confirmed to survive a deployer.service restart in deployer-post-restart-list.json, and after replaying the same request the summary in deployer-replayed-list.json is unchanged. deployer_restore_mode in result.json is admin pre-register request replay with pre/post-restart list verification.
  • CoronaFS failure injection: coronafs-node04-local-state.json keeps node_local=true and the materialized path while the controller is stopped, and coronafs-node04-capabilities.json maintains the node-only capability split (supports_controller_api=false, supports_node_api=true).
  • LightningStor failure injection: lightningstor-put-during-node05-outage.json, lightningstor-head-during-node05-outage.json, lightningstor-object-during-node05-outage.txt, and lightningstor-object-after-repair.txt preserve the write during the node05 outage and the read-back after repair.
  • FiberLB supported limitation: fiberlb/crates/fiberlb-server/src/healthcheck.rs, README.md, docs/testing.md, docs/component-matrix.md, and flake.nix fix HTTPS backend health as a limited contract without TLS certificate verification.
  • k8shost boundary: README.md, docs/testing.md, docs/component-matrix.md, k8shost/README.md, nix/test-cluster/README.md, and flake.nix fix k8shost to the API/control-plane product surface only, with k8shost-cni, k8shost-controllers, and lightningstor-csi aligned as archived non-product.
  • Proof-lane hardening done during this tranche: the first durability-proof run failed on an unsupported DROP TABLE in the FlareDB cleanup tail, so the lane was restructured around unique namespaces; it then failed on an unbound local in the cleanup trap, so trap cleanup was fixed with ${var:-} and a guarded tunnel shutdown. The current lane exits zero and leaves its artifacts (see the sketch below).
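A small sketch of re-checking the ChainFire and FlareDB durability artifacts listed above from the dated root. Byte-for-byte comparison via jq normalization is a simplification; the proof's guarantee is only that the backup and restored responses carry the same logical payload, so field-level checks may be more appropriate.

```bash
#!/usr/bin/env bash
# Sketch: spot-check the durability-proof artifacts described above.
set -euo pipefail

root=/mnt/d2/centra/photoncloud-monorepo/work/durability-proof/latest

diff <(jq -S . "${root}/chainfire-backup-response.json") \
     <(jq -S . "${root}/chainfire-restored-response.json")

diff <(jq -S . "${root}/flaredb-backup.json") \
     <(jq -S . "${root}/flaredb-restored.json")

grep -q '404' "${root}/chainfire-after-delete.out"
echo "durability artifacts match the recorded proof"
```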

2026-04-10 Rollout Bundle HA And DR Hardening

  • Task: a41343c5-116e-4313-8751-b333472f931c
  • Operator doc: docs/rollout-bundle.md
  • Verification reruns: nix build .#checks.x86_64-linux.portable-control-plane-regressions, nix build .#checks.x86_64-linux.fleet-scheduler-e2e, and nix build .#checks.x86_64-linux.deployer-vm-rollback all passed on 2026-04-10 with local-only Nix settings.
  • Durability rerun: nix run ./nix/test-cluster#cluster -- durability-proof passed again from a clean KVM cluster and wrote artifacts under /mnt/d2/centra/photoncloud-monorepo/work/durability-proof/20260410T123535+0900.
  • Supported deployer boundary: single-writer deployer with restart-in-place or cold-standby restore. ChainFire-backed multi-instance failover is explicitly unsupported for now and the restore runbook is fixed to cluster-state apply + preserved pre-register request replay + admin verification.
  • Nix-agent proof: nix build .#checks.x86_64-linux.deployer-vm-rollback passed on 2026-04-10 and is now the canonical reproducible proof for health_check_command, rollback, and rolled-back partial failure recovery semantics.
  • Fleet-scheduler semantics: fresh-smoke and fleet-scheduler-e2e remain the release proofs for short-lived draining maintenance, fail-stop worker loss, and replica restoration. Long-duration maintenance and large-cluster drain choreography stay scope-limited rather than silently implied.
  • Node-agent contract: product docs now fix ${stateDir}/pids/*.log as the per-instance log location, ${stateDir}/pids/*.meta.json as stale-pid metadata, secret delivery as caller-provided env or mounted files only, host-path volumes as pass-through only, and upgrades as replace-and-reconcile rather than in-place patching.

2026-04-10 Core Control Plane Operator Lifecycle Proofs

  • Task: dcdc961a-0aa6-47c3-aeba-a1c67bca27b7
  • Operator doc: docs/control-plane-ops.md
  • Focused proof: ./nix/test-cluster/run-core-control-plane-ops-proof.sh /mnt/d2/centra/photoncloud-monorepo/work/core-control-plane-ops-proof/20260410T172148+09:00
  • Focused proof result: passed on 2026-04-10 and wrote result.json, scope-fixed-contract.json, iam-key-rotation-tests.log, iam-credential-rotation-tests.log, iam-mtls-rotation-tests.log, and the contract-marker logs under /mnt/d2/centra/photoncloud-monorepo/work/core-control-plane-ops-proof/20260410T172148+09:00.
  • Supported-surface guard: rerun after the doc and proof updates so the public lifecycle contract is now guarded alongside the existing supported-surface wording.
  • ChainFire boundary: dynamic membership, replace-node, and scale-out are now explicit non-supported actions on the product surface. The supported path is fixed-membership restore or whole-cluster replacement anchored by the existing durability-proof backup/restore lane.
  • FlareDB boundary: online migration and schema evolution are now fixed to an additive-first, backup/restore-gated operator contract. Destructive DDL and fully automated online migration are explicit non-supported boundaries for this release rather than implied future promises.
  • IAM boundary: bootstrap hardening now requires explicit admin token, signing key, and 32-byte IAM_CRED_MASTER_KEY inputs in docs. The standalone proof reruns signing-key rotation, credential overlap-and-revoke rotation, and mTLS overlap-and-cutover rotation tests while checking the hardening markers in iam-server; multi-node IAM failover remains unsupported.

2026-04-10 Edge And Trial-Surface Productization

  • Task: cc24ac5a-b940-4a32-9136-d706ecadf875
  • Operator doc: docs/edge-trial-surface.md
  • Component docs: apigateway/README.md, nightlight/README.md, and creditservice/README.md
  • Helper: ./nix/test-cluster/work-root-budget.sh status now reports ./work disk usage, soft budgets, and cleanup plus nix store gc guidance without mutating state by default.
  • Edge bundle boundary: APIGateway is now documented as stateless replicated behind external L4 or VIP distribution, but restart-based rollout remains the only supported config distribution or reload model proven on this branch. NightLight is fixed to a single-node WAL/snapshot product shape with process-wide retention, and CreditService export plus migration is fixed to offline export/import or backend-native snapshots instead of live mixed-writer migration.
  • Trial boundary: single-node-trial-vm and single-node-quickstart remain the only supported lightweight trial surface. OCI/Docker remains intentionally unsupported because it would not prove the same guest-kernel, KVM, /dev/net/tun, and OVS/libvirt contract.
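The helper referenced above (and extended in TRANCHE-12) exposes the three documented subcommands; the sketch below strings them together without any extra flags, since only the subcommand names and their rough roles are recorded in this TODO.

```bash
#!/usr/bin/env bash
# Sketch: the documented work-root-budget.sh subcommands. status reports
# ./work usage and budgets without mutating state; enforce and
# prune-proof-logs are the stronger gate and cleanup paths added later.
set -euo pipefail

./nix/test-cluster/work-root-budget.sh status

# Run the mutating subcommands deliberately, e.g. before a release rerun:
./nix/test-cluster/work-root-budget.sh enforce
./nix/test-cluster/work-root-budget.sh prune-proof-logs
```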

2026-04-10 Provider And VM-Hosting Reality Proof

  • Task: 41a074a3-dc5c-42fc-979e-c8ebf9919d55
  • Focused proof lane: nix run ./nix/test-cluster#cluster -- provider-vm-reality-proof
  • Focused proof result: passed on 2026-04-10 and wrote result.json, meta.json, journals, and provider or VM-hosting artifacts under /mnt/d2/centra/photoncloud-monorepo/work/provider-vm-reality-proof/20260410T135827+0900.
  • Provider artifacts: network-provider/prismnet-port-create.json, network-provider/prismnet-security-group-after-add.json, network-provider/flashdns-workload-authoritative-answer.txt, network-provider/flashdns-service-authoritative-answer.txt, network-provider/fiberlb-drain-summary.txt, network-provider/fiberlb-tcp-health-before-drain.txt, and network-provider/fiberlb-tcp-health-after-restore.txt fix the current local-KVM proof to tenant network lifecycle, authoritative DNS answers, and listener drain or re-convergence.
  • VM-hosting artifacts: vm-hosting/vm-create-response.json, vm-hosting/root-volume-before-migration.json, vm-hosting/root-volume-after-migration.json, vm-hosting/data-volume-after-migration.json, vm-hosting/migration-summary.json, vm-hosting/prismnet-port-after-migration.json, and vm-hosting/demo-state-after-post-migration-restart.json fix the current release proof to KVM shared-storage migration, CoronaFS handoff, and post-migration restart on the worker pair.
  • Scope-fixed gaps: real OVS/OVN dataplane validation, native BGP or BFD peer interop with hardware VIP ownership, and real-hardware VM migration or storage handoff remain outside the supported local-KVM surface and are now explicit docs or guard limits rather than implied release claims.

chainfire

  • Responsibility: the replicated coordination store for all of UltraCloud. Holds KV, leases, watch, the cluster membership view, and the state anchor for the rollout stack.
  • Canonical entrypoint: nix/modules/chainfire.nix; chainfire/crates/chainfire-server/src/main.rs; the supported API is chainfire/proto/chainfire.proto.
  • Current evidence: README.md declares MemberList / Status as the supported surface; chainfire/crates/chainfire-server/src/rest.rs has health and member add; docs/testing.md defines the quickstart and HA proofs; nix/single-node/base.nix and nix/nodes/vm-cluster/* are the canonical wiring; the 2026-04-10 durability-proof preserves logical KV backup/restore in chainfire-backup-response.json / chainfire-restored-response.json, and rollout-soak preserves the live proof after a fixed-membership restart in /mnt/d2/centra/photoncloud-monorepo/work/rollout-soak/20260410T164549+0900/chainfire-post-restart-put.json and post-control-plane-restarts.json.
  • Unproven: the rolling-upgrade procedure; membership changes on three real hardware nodes; the recovery runbook after power loss.
  • P0: no fatal file-level breakage detected in the current static survey.
  • P1: CF-P1-01 progressed on 2026-04-10 from a scope freeze to having a live restart proof. Dynamic membership / scale-out / replace-node remain explicitly unsupported on the supported surface, but the fixed-membership restart itself was upgraded to a live KVM proof by rollout-soak. The only remaining follow-up is a dedicated KVM proof if live membership mutation is ever to be productized.
  • P2: CF-P2-01 internal pruning of chainfire-core is in progress on the current branch, so the final split between the public boundary and the workspace-internal boundary still needs to be settled.
  • Dependencies: local disk; host networking; referenced by flaredb, iam, deployer, fleet-scheduler, nix-agent, node-agent, and coronafs.

flaredb

  • Responsibility: replicated KV/SQL metadata store. Receives each service's metadata, quota state, object metadata, and tenant network state.
  • Canonical entrypoint: nix/modules/flaredb.nix; flaredb/crates/flaredb-server/src/main.rs; REST is flaredb/crates/flaredb-server/src/rest.rs.
  • Current evidence: README.md documents POST /api/v1/sql and GET /api/v1/tables as supported; flaredb/crates/flaredb-server/src/rest.rs has SQL/KV/scan/member add; docs/testing.md explains the control-plane proof and the fresh-matrix dependency; nix/modules/flaredb.nix generates pdAddr and the namespace mode; the 2026-04-10 rollout-soak preserves additive SQL after a member restart in /mnt/d2/centra/photoncloud-monorepo/work/rollout-soak/20260410T164549+0900/flaredb-post-restart-create.json, flaredb-post-restart-insert.json, and flaredb-post-restart.json; run-core-control-plane-ops-proof.sh fixes destructive DDL / fully automated online migration as outside the supported surface in /mnt/d2/centra/photoncloud-monorepo/work/core-control-plane-ops-proof/20260410T172148+09:00/scope-fixed-contract.json and flaredb-migration-contract.log. A hedged sketch of the supported SQL surface follows this component's bullets.
  • Unproven: storage pressure and multi-node repair on real hardware. Fully automated online migration and destructive DDL online cutover are intentionally unsupported in this release.
  • P0: no fatal file-level breakage detected in the current static survey.
  • P1: FDB-P1-01 reached its scope-fixed final state on 2026-04-10. Logical backup/restore of the supported SQL/KV surface is fixed by durability-proof and the docs, and online migration / schema evolution is settled as additive-first with a backup/restore baseline. rollout-soak preserves additive SQL after a member restart as a live KVM artifact, and run-core-control-plane-ops-proof.sh fixes destructive DDL and fully automated online migration as outside the supported surface in /mnt/d2/centra/photoncloud-monorepo/work/core-control-plane-ops-proof/20260410T172148+09:00/scope-fixed-contract.json. Any future destructive online migration proof would be a scope expansion handled in a separate tranche.
  • P2: FDB-P2-01 the per-namespace strong / eventual policy is buried in module defaults and is still weak as an operator-facing contract.
  • Dependencies: uses chainfire for placement/coordination; local disk; referenced by iam, prismnet, flashdns, fiberlb, plasmavmc, lightningstor, creditservice, and k8shost.
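As referenced above, a minimal sketch of the documented SQL surface. The endpoint paths come from the evidence bullet, while the base address, port, and the "sql" request-body field name are assumptions to verify against rest.rs.

```bash
#!/usr/bin/env bash
# Sketch of the supported FlareDB REST surface (POST /api/v1/sql,
# GET /api/v1/tables). Base URL and the "sql" body field are assumptions.
set -euo pipefail

flaredb="${FLAREDB_ADDR:-http://127.0.0.1:8082}"

curl --fail -s -X POST "${flaredb}/api/v1/sql" \
  -H 'Content-Type: application/json' \
  -d '{"sql": "SELECT 1"}'

curl --fail -s "${flaredb}/api/v1/tables"
```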

iam

  • Responsibility: identity, token issuance, authn, authz, and tenant principal management.
  • Canonical entrypoint: nix/modules/iam.nix; iam/crates/iam-server/src/main.rs; the API package is iam/crates/iam-api/src/lib.rs.
  • Current evidence: README.md and docs/component-matrix.md treat it as a core component; nix/modules/iam.nix canonically generates the chainfire / flaredb connections; the iam-authn, iam-authz, and iam-store crates are split out; fresh-matrix and the gateway path assume IAM via credit/k8shost/plasmavmc; run-core-control-plane-ops-proof.sh saves iam-key-rotation-tests.log, iam-credential-rotation-tests.log, iam-mtls-rotation-tests.log, scope-fixed-contract.json, and result.json under /mnt/d2/centra/photoncloud-monorepo/work/core-control-plane-ops-proof/20260410T172148+09:00, fixing bootstrap hardening, signing-key rotation, credential overlap rotation, and mTLS overlap rotation as a standalone proof.
  • Unproven: multi-node IAM failover; a same-lane lifecycle proof across the whole backend matrix.
  • P0: no fatal file-level breakage detected in the current static survey.
  • P1: IAM-P1-01 reached its scope-fixed final state on 2026-04-10. Bootstrap hardening and token/signing-key rotation are fixed standalone in docs/control-plane-ops.md and run-core-control-plane-ops-proof.sh, and the same proof root now also saves credential overlap-and-revoke rotation and mTLS overlap-and-cutover rotation. Multi-node IAM failover was explicitly moved outside the supported surface. A clustered IAM failover proof would be a scope expansion handled in a separate tranche.
  • P2: IAM-P2-01 the full flaredb / postgres / sqlite / memory backend matrix is not yet covered by the harness.
  • Dependencies: flaredb is the primary storage; optional chainfire; prismnet, flashdns, fiberlb, plasmavmc, lightningstor, creditservice, k8shost, and apigateway are consumers.

prismnet

  • Responsibility: the tenant network control plane. Handles VPCs, subnets, ports, routers, security groups, and service IP pools.
  • Canonical entrypoint: nix/modules/prismnet.nix; prismnet/crates/prismnet-server/src/main.rs; the API is prismnet/crates/prismnet-api/proto/prismnet.proto.
  • Current evidence: docs/testing.md and README.md declare VPC/subnet/port and security-group ACL add/remove in fresh-matrix as the canonical proof; the service implementations live in prismnet/crates/prismnet-server/src/services/*; prismnet/crates/prismnet-server/src/ovn/client.rs holds the OVN client; nix/modules/prismnet.nix generates the binary-consumed config.
  • Unproven: a real OVS/OVN dataplane; a real-hardware proof of the DHCP/metadata service; multi-rack network integration.
  • P0: no fatal file-level breakage detected in the current static survey.
  • P1: PRISMNET-P1-01 was narrowed on 2026-04-10. provider-vm-reality-proof now saves VPC/subnet/port lifecycle, security-group ACL add/remove, and attached-VM networking artifacts to a dated root in the local KVM lab. The unresolved next step is promoting the real OVS/OVN dataplane and hardware-switch integration to a release proof.
  • P2: PRISMNET-P2-01 ovn/mock.rs remains close by, so the boundary between the supported path and the archived/test path needs ongoing monitoring.
  • Dependencies: iam, flaredb, optional chainfire; consumers are flashdns, fiberlb, plasmavmc, and k8shost.

flashdns

  • Responsibility: authoritative DNS publication. Holds tenant records, reverse zones, and the DNS handlers.
  • Canonical entrypoint: nix/modules/flashdns.nix; flashdns/crates/flashdns-server/src/main.rs; flashdns/crates/flashdns-server/src/dns/*
  • Current evidence: docs/testing.md and README.md treat record publication in fresh-matrix as the canonical proof; the flashdns server has record/zone/reverse-zone services; nix/modules/flashdns.nix generates the binary-consumed config.
  • Unproven: real port 53 exposure; upstream/secondary integration; failover with real network gear.
  • P0: no fatal file-level breakage detected in the current static survey.
  • P1: FLASHDNS-P1-01 was narrowed on 2026-04-10. provider-vm-reality-proof now saves authoritative workload/service answers to a dated root, so local-KVM publication evidence is in the release lane. The unresolved next step is extending real port 53 exposure and upstream/secondary interop to a hardware or external-network proof.
  • P2: FLASHDNS-P2-01 resolved on 2026-04-10. The single-node dev optional bundle now has TCP health gating in nix/single-node/surface.nix.
  • Dependencies: iam, flaredb, optional chainfire; publication sources are k8shost and fleet-scheduler.

fiberlb

  • Responsibility: service publication / VIP / L4-L7 load balancing / native BGP advertisement.
  • Canonical entrypoint: nix/modules/fiberlb.nix; fiberlb/crates/fiberlb-server/src/main.rs; the dataplane is dataplane.rs, l7_dataplane.rs, vip_manager.rs, bgp_client.rs.
  • Current evidence: README.md and docs/testing.md treat TCP and TLS-terminated Https / TerminatedHttps listeners in fresh-matrix as the canonical proof; the server code has native BGP/BFD, VIP ownership, the TLS store, and the L7 dataplane; the L4 algorithms have in-tree tests.
  • Unproven: interop with real BGP peers; a hardware proof of L2/VIP ownership; IPv6 and mixed peer topologies.
  • P0: no fatal file-level breakage detected in the current static survey.
  • P1: FIBERLB-P1-01 was scope-fixed on 2026-04-10. The HTTPS health check in fiberlb/crates/fiberlb-server/src/healthcheck.rs still does not verify backend TLS certificates, but the reason and the supported range (TCP reachability + HTTP status) are fixed in docs, guard, and source comments. Future CA-aware verification is a separate tranche.
  • P1: FIBERLB-P1-02 was narrowed on 2026-04-10. provider-vm-reality-proof now saves listener publication, backend disable, drain, restore, and re-convergence artifacts to a dated root. The unresolved next step is extending native BGP/BFD peer interop and hardware VIP ownership to a real-network proof.
  • P2: FIBERLB-P2-01 resolved on 2026-04-10. The single-node dev optional bundle now has TCP health gating in nix/single-node/surface.nix.
  • Dependencies: iam, flaredb, optional chainfire; publication consumers are k8shost and fleet-scheduler; real network peers are required.

plasmavmc

  • Responsibility: the tenant VM control plane and worker agent. Holds VM lifecycle, image/materialization, worker registration, and hypervisor integration.
  • Canonical entrypoint: nix/modules/plasmavmc.nix; plasmavmc/crates/plasmavmc-server/src/main.rs; the supported public backend is plasmavmc-kvm.
  • Current evidence: README.md states the KVM-only public contract; docs/testing.md treats HYPERVISOR_TYPE_KVM in single-node-quickstart, fresh-smoke, and fresh-matrix as the canonical proof; vm_service.rs keeps everything other than HYPERVISOR_TYPE_KVM outside the public surface; volume_manager.rs has the coronafs / lightningstor integration.
  • Unproven: migration / storage handoff on real hardware; long-running guest upgrades; recovery under combined network + storage faults.
  • P0: no fatal file-level breakage detected in the current static survey.
  • P1: PLASMAVMC-P1-01 was narrowed on 2026-04-10. provider-vm-reality-proof now saves shared-storage migration, PrismNet-attached post-migration networking, CoronaFS handoff, and post-migration restart state to a dated root. The unresolved next step is adding a release proof for real-hardware migration and storage handoff.
  • P2: PLASMAVMC-P2-01 archived Firecracker / mvisor code remains in-tree, so backflow into the supported surface needs continued guarding.
  • Dependencies: iam, flaredb, prismnet, optional chainfire, lightningstor, coronafs, host KVM/QEMU.

coronafs

  • Responsibility: the mutable VM volume layer. Manages raw volumes and exports them to workers via qemu-nbd.
  • Canonical entrypoint: nix/modules/coronafs.nix; coronafs/crates/coronafs-server/src/main.rs; the product description is coronafs/README.md.
  • Current evidence: coronafs/README.md states the split as the mutable VM-volume layer; coronafs-server has /healthz and the volume/export APIs; docs/testing.md covers plasmavmc + coronafs + lightningstor as fresh-matrix proof targets; plasmavmc/volume_manager.rs has deep integration.
  • Unproven: long-duration endurance of recovery after export interruption; the latency budget on real disks / real networks.
  • P0: no fatal file-level breakage detected in the current static survey.
  • P1: CORONAFS-P1-01 resolved on 2026-04-10. The quickstart health URL in nix/single-node/surface.nix was fixed to http://127.0.0.1:50088/healthz.
  • P1: CORONAFS-P1-02 resolved on 2026-04-10. durability-proof now has a canonical failure-injection lane that verifies reads of node-local materialized volumes and the node-only capability split while the controller is down.
  • P2: CORONAFS-P2-01 a storage benchmark exists, but the recovery path still carries too little weight in the canonical publish gate.
  • Dependencies: qemu-nbd, qemu-img, local disk; optional chainfire metadata backend; the primary consumer is plasmavmc.

lightningstor

  • Responsibility: object storage and VM image backing. Has a metadata plane and a data-node plane.
  • Canonical entrypoint: nix/modules/lightningstor.nix; lightningstor/crates/lightningstor-server/src/main.rs; lightningstor/crates/lightningstor-node/src/main.rs; the S3 path is src/s3/*.
  • Current evidence: README.md documents bucket versioning / policy / tagging / object version listing as the supported surface; docs/testing.md covers bucket metadata and object-version APIs in fresh-matrix; the server has S3 auth, the distributed backend, and the repair queue; the module has metadata/data/all-in-one modes.
  • Unproven: real-hardware failover of the distributed backend; S3 compatibility breadth; cold-start image distribution on hardware.
  • P0: no fatal file-level breakage detected in the current static survey.
  • P1: LIGHTNINGSTOR-P1-01 resolved on 2026-04-10. durability-proof preserves write/head/read during the node05 outage and repair/read-back after service restoration as canonical failure-injection artifacts.
  • P2: LIGHTNINGSTOR-P2-01 resolved on 2026-04-10. The single-node dev optional bundle now has TCP health gating in nix/single-node/surface.nix.
  • Dependencies: iam, flaredb, optional chainfire; optional lightningstor-node; consumers are plasmavmc and tenant object clients.

k8shost

  • Responsibility: the tenant workload API surface. Handles pods/deployments/services and projects them onto prismnet, flashdns, fiberlb, and optionally creditservice.
  • Canonical entrypoint: nix/modules/k8shost.nix; k8shost/crates/k8shost-server/src/main.rs; the API protobuf is k8shost/crates/k8shost-proto/proto/k8s.proto.
  • Current evidence: k8shost/README.md defines the supported scope; README.md documents WatchPods as a bounded snapshot stream; k8shost-server/src/services/pod.rs implements WatchPods on a ReceiverStream; docs/testing.md covers the API contract in fresh-smoke / fresh-matrix; on 2026-04-10 it was fixed to the API/control-plane product surface only in docs, guard, and this TODO.
  • Unproven: a real workload runtime; a tenant networking dataplane with real CNI/CSI; node-level execution semantics.
  • P0: K8SHOST-P0-01 resolved on 2026-04-10. The real workload dataplane (k8shost-cni, k8shost-controllers, lightningstor-csi) was fixed as archived non-product, and the product narrative was aligned to the API/control-plane scope only.
  • P1: K8SHOST-P1-01 was scope-resolved on 2026-04-10. The fact that the canonical proofs center on the API contract is itself documented as the product boundary, and a real pod runtime was removed from the product claims.
  • P2: K8SHOST-P2-01 resolved on 2026-04-10. The non-canonical status of the archived scaffolds is continuously monitored by the supported-surface-guard contract markers.
  • Dependencies: iam, flaredb, chainfire, prismnet, flashdns, fiberlb, optional creditservice.

apigateway

  • Responsibility: the external API/proxy surface. Holds routes, auth providers, credit providers, and request mediation.
  • Canonical entrypoint: nix/modules/apigateway.nix; apigateway/crates/apigateway-server/src/main.rs
  • Current evidence: node06 starts apigateway as the canonical gateway node; docs/testing.md and nix/test-cluster/README.md include API-gateway-mediated flows in fresh-matrix; the server code has routes, auth, the credit provider, upstream timeouts, and request IDs.
  • Unproven: multi-node HA; config distribution / reload; the TLS termination strategy; gateway-as-product docs.
  • P0: no fatal file-level breakage detected in the current static survey.
  • P1: APIGW-P1-01 was scope-fixed on 2026-04-10. APIGateway is supported as stateless replicated behind external L4/VIP, config distribution is rendered config plus restart-based rollout, and live in-process reload is fixed as unsupported in the docs. What remains next is adding a dedicated multi-gateway HA proof.
  • P2: APIGW-P2-01 the release proofs mostly depend indirectly on node06 and fresh-matrix; there is no dedicated smoke gate.
  • Dependencies: upstream services; optional iam / creditservice providers; external clients.

nightlight

  • Responsibility: metrics ingestion and query. Has Prometheus remote_write / query APIs and gRPC query/admin.
  • Canonical entrypoint: nix/modules/nightlight.nix; nightlight/crates/nightlight-server/src/main.rs; the API proto is nightlight/crates/nightlight-api/proto/*.
  • Current evidence: nightlight-server binds both HTTP and gRPC; it starts on node06, the gateway node; docs/testing.md and nix/test-cluster/README.md describe the host-forward proof of the NightLight HTTP surface; there is a local WAL/snapshot/retention loop.
  • Unproven: a replicated metrics topology; large retention; sustained remote_write load; tenant isolation.
  • P0: no fatal file-level breakage detected in the current static survey.
  • P1: NIGHTLIGHT-P1-01 was scope-fixed on 2026-04-10. NightLight's product shape is fixed as a single-node WAL/snapshot service, and the unsupported status of the replicated / HA metrics path is reflected in docs and guard.
  • P2: NIGHTLIGHT-P2-01 was narrowed on 2026-04-10. The docs fix the tenant boundary as deployment-scoped or upstream-auth-scoped, with in-process hard multi-tenant auth and per-tenant retention excluded from the current product contract. The next step is adding an auth- or quota-aware multi-tenant proof.
  • Dependencies: local disk; optional apigateway; external metric writers/readers.

creditservice

  • Responsibility: quota, wallets, reservations, and admission control.
  • Canonical entrypoint: nix/modules/creditservice.nix; creditservice/crates/creditservice-server/src/main.rs; the product scope is creditservice/README.md.
  • Current evidence: creditservice/README.md states the supported scope and non-goals; docs/testing.md covers the quota/wallet/reservation/API-gateway path in fresh-matrix; the module has iamAddr, flaredbAddr, and an optional SQL backend; it starts on node06, the canonical gateway node.
  • Unproven: backend migration; operating it separately from a finance system; the export/reporting path.
  • P0: no fatal file-level breakage detected in the current static survey.
  • P1: CREDIT-P1-01 the product narrative needs boundary maintenance so it does not grow into a finance ledger beyond the README's non-goals.
  • P2: CREDIT-P2-01 was narrowed on 2026-04-10. Export and backend migration are fixed in the README as offline export/import or backend-native snapshot workflows, with live mixed-writer migration explicitly unsupported. The next step is adding a dedicated export proof.
  • Dependencies: iam, flaredb, optional chainfire; apigateway, k8shost, and the tenant admission flow are consumers.

deployer

  • Responsibility: the bootstrap and rollout-intent authority. Holds /api/v1/phone-home, install plans, desired-system references, and the cluster inventory.
  • Canonical entrypoint: nix/modules/deployer.nix; deployer/crates/deployer-server/src/main.rs; route wiring is deployer/crates/deployer-server/src/lib.rs.
  • Current evidence: /api/v1/phone-home exists in the server routes; nix/modules/deployer.nix has the package/service/cluster-state seed; docs/testing.md, docs/rollout-bundle.md, and nix/test-cluster/README.md treat baremetal-iso, baremetal-iso-e2e, deployer-vm-smoke, deployer-bootstrap-e2e, durability-proof, and rollout-soak as the canonical proofs; verify-baremetal-iso.sh walks the install path end-to-end; the 2026-04-10 durability-proof saves deployer-pre-register-request.json, deployer-backup-list.json, deployer-post-restart-list.json, and deployer-replayed-list.json, and rollout-soak saves the longer-run live restart and the release boundary markers in /mnt/d2/centra/photoncloud-monorepo/work/rollout-soak/20260410T164549+0900/deployer-post-restart-nodes.json, scope-fixed-contract.json, deployer-scope-fixed.txt, and deployer-journal.log.
  • Unproven: a real USB/BMC install on hardware; true HA for the deployer itself; an implementation of ChainFire-backed multi-instance active failover; operator disaster recovery verified on real hardware.
  • P0: DEPLOYER-P0-01 the current canonical bare-metal proof stops at QEMU-as-hardware; there is no real-hardware regression lane yet.
  • P1: DEPLOYER-P1-01 reached its scope-fixed final state on 2026-04-10. The release contract is fixed to one active writer plus optional cold-standby restore with ultracloud.cluster state re-apply and preserved admin request replay, with automatic ChainFire-backed multi-instance failover explicitly moved outside the supported surface. rollout-soak saves the live restart proof and boundary markers in /mnt/d2/centra/photoncloud-monorepo/work/rollout-soak/20260410T164549+0900/deployer-post-restart-nodes.json, scope-fixed-contract.json, and deployer-scope-fixed.txt. Any future true HA implementation would be a scope expansion handled in a separate ticket.
  • P2: DEPLOYER-P2-01 the standard operating shape for supplying bootstrapFlakeBundle and the optional binary cache in production is still under-documented.
  • Dependencies: chainfire; nix-agent; the install target; the ISO/first-boot path; an optional binary cache.

fleet-scheduler

  • Responsibility: the non-Kubernetes native service scheduler. Handles cluster-native service placement, failover, and publication reconciliation.
  • Canonical entrypoint: nix/modules/fleet-scheduler.nix; deployer/crates/fleet-scheduler/src/main.rs; the publication code is publish.rs.
  • Current evidence: docs/testing.md, docs/rollout-bundle.md, and nix/test-cluster/README.md treat fresh-smoke, fresh-matrix, fleet-scheduler-e2e, and rollout-soak as the proofs for this boundary; the module has iamEndpoint, fiberlbEndpoint, flashdnsEndpoint, and heartbeatTimeoutSecs; the scheduler code has the chainfire watch, dependency summaries, and publication reconciliation; fresh-smoke runs node04 -> draining, node05 fail-stop, and replica restore after the worker returns, and rollout-soak saves the scope-fixed longer-run proof in /mnt/d2/centra/photoncloud-monorepo/work/rollout-soak/20260410T164549+0900/maintenance-held.json, power-loss-held.json, fleet-scheduler-post-restart.json, scope-fixed-contract.json, and fleet-scheduler-scope-fixed.txt.
  • Unproven: large clusters; multi-hour maintenance windows; drain choreography with an operator approval workflow.
  • P0: no fatal file-level breakage detected in the current static survey.
  • P1: FLEET-P1-01 reached its scope-fixed final state on 2026-04-10. The release contract is fixed to one planned drain cycle plus one fail-stop worker-loss cycle with 30-second held degraded states on two native-runtime workers, and rollout-soak preserves that upper bound as a live KVM artifact in /mnt/d2/centra/photoncloud-monorepo/work/rollout-soak/20260410T164549+0900/maintenance-held.json, power-loss-held.json, fleet-scheduler-post-restart.json, scope-fixed-contract.json, and fleet-scheduler-scope-fixed.txt. Multi-hour maintenance windows, pinned singleton policies, operator approval workflows, and larger-cluster drain storms are explicitly outside the supported surface.
  • P2: FLEET-P2-01 resolved on 2026-04-10. The module/binary default chainfireEndpoint is aligned to the canonical http://127.0.0.1:2379.
  • Dependencies: chainfire; node-agent; optional iam, fiberlb, flashdns.

nix-agent

  • Responsibility: host-local NixOS convergence only. Builds/applies the desired system and owns health checks and rollback.
  • Canonical entrypoint: nix/modules/nix-agent.nix; deployer/crates/nix-agent/src/main.rs
  • Current evidence: docs/testing.md, docs/rollout-bundle.md, and nix/test-cluster/README.md treat baremetal-iso, baremetal-iso-e2e, deployer-vm-smoke, deployer-vm-rollback, and portable-control-plane-regressions as proofs; the code has desired-system, observed-system, rollback-on-failure, and health-check-command; nix/modules/nix-agent.nix canonically generates that CLI contract; the 2026-04-10 rollout-soak saves node01-nix-agent-scope.txt and node04-nix-agent-scope.txt under /mnt/d2/centra/photoncloud-monorepo/work/rollout-soak/20260410T154744+0900, fixing in artifacts and docs the boundary that the steady-state test-cluster does not pretend a live nix-agent.service restart happened.
  • Unproven: rollback under kernel/network failure; multi-node wave rollout; real-hardware recovery after a partial switch.
  • P0: no fatal file-level breakage detected in the current static survey.
  • P1: NIXAGENT-P1-01 resolved on 2026-04-10. The healthCheckCommand argv contract, the rolled-back semantics of rollbackOnFailure, the deployer-vm-rollback proof, and the partial-failure recovery procedure are fixed in docs/rollout-bundle.md and docs/testing.md.
  • P2: NIXAGENT-P2-01 resolved on 2026-04-10. The module/binary default chainfireEndpoint is aligned to the canonical http://127.0.0.1:2379.
  • Dependencies: chainfire; the desired system published by deployer; the local NixOS flake / switch-to-configuration.

node-agent

  • Responsibility: host-local runtime reconcile only. Owns native service instance heartbeats, process/container execution, and local observed state.
  • Canonical entrypoint: nix/modules/node-agent.nix; deployer/crates/node-agent/src/main.rs
  • Current evidence: docs/testing.md, docs/rollout-bundle.md, and nix/test-cluster/README.md treat fresh-smoke, fresh-matrix, fleet-scheduler-e2e, and portable-control-plane-regressions as proofs; the code has the watcher, agent, and process modules; the module has Podman enable, stateDir, pidDir, and allowLocalInstanceUpsert; process.rs implements the ${stateDir}/pids/*.log and ${stateDir}/pids/*.meta.json contract.
  • Unproven: heterogeneous runtime support; fine-grained SLOs for crash-looping host services; the secret-rotation workflow itself.
  • P0: no fatal file-level breakage detected in the current static survey.
  • P1: NODEAGENT-P1-01 resolved on 2026-04-10. The logs / secrets / volume / upgrade contract is fixed in docs/rollout-bundle.md and the module description.
  • P2: NODEAGENT-P2-01 resolved on 2026-04-10. The module/binary default chainfireEndpoint is aligned to the canonical http://127.0.0.1:2379.
  • Dependencies: chainfire; fleet-scheduler; optional Podman; the host systemd/process model.

Nix/bootstrap/harness

  • Responsibility: defines the product surface and canonicalizes the NixOS outputs and the VM/QEMU harness for single-node dev, the 3-node HA control plane, and bare-metal bootstrap.
  • Canonical entrypoint: flake.nix; nix/modules/default.nix; nix/single-node/base.nix; nix/test-cluster/run-publishable-kvm-suite.sh; nix/test-cluster/run-local-baseline.sh; nix/test-cluster/verify-baremetal-iso.sh; nix/nodes/baremetal-qemu/*
  • Current evidence: flake.nix has single-node-quickstart, single-node-trial-vm, canonical-profile-eval-guards, portable-control-plane-regressions, and baremetal-iso-e2e; nix/modules/default.nix imports the current module surface in one place; nix/single-node/base.nix assembles the minimal VM platform core and the optional bundle; run-publishable-kvm-suite.sh and run-local-baseline.sh pin local CPU parallelism and the local builder; verify-baremetal-iso.sh walks ISO -> phone-home -> bundle fetch -> Disko -> reboot -> nix-agent active; run-cluster.sh gained durability-proof and rollout-soak, which save chainfire, flaredb, deployer, coronafs, and lightningstor backup/restore and failure-injection artifacts under /work/durability-proof and the longer-run rollout / control-plane maintenance artifacts under /work/rollout-soak; on the 2026-04-10 local AMD/KVM baseline the six required checks plus single-node-quickstart, baremetal-iso, and fresh-smoke all passed.
  • Unproven: a real USB/BMC install on hardware; an automatic guard for /nix/store capacity; a release proof for the all-optional-bundles quickstart; a non-Nix easy-trial artifact.
  • P0: HARNESS-P0-01 there is still no real-hardware regression lane, and the canonical bare-metal proof remains the QEMU stand-in.
  • P1: HARNESS-P1-01 resolved on 2026-04-10. The quickstart optional-bundle health gating is aligned to TCP probes for lightningstor, flashdns, and fiberlb and to /healthz on 50088 for coronafs.
  • P1: HARNESS-P1-02 was scope-fixed on 2026-04-10. The easy trial is served by the single-node-trial-vm Nix VM appliance, and the reasons for not supporting a lighter Docker/OCI-style trial path are aligned across docs/edge-trial-surface.md, README.md, docs/testing.md, docs/component-matrix.md, nix/single-node/surface.nix, and supported-surface-guard.
  • P1: HARNESS-P1-03 resolved on 2026-04-10. The fresh-smoke stale-VM cleanup is limited to PIDs under the current profile's vm_dir / vde_switch_dir, so it no longer sweeps up same-named cluster VMs from other checkouts.
  • P2: HARNESS-P2-01 resolved on 2026-04-10. In addition to ./work and local builder parallelism, ./nix/test-cluster/work-root-budget.sh now has enforce and prune-proof-logs alongside status, providing not just a disk-budget advisory but a stronger local budget gate and a safer dated-proof cleanup workflow.
  • Dependencies: nix, nixpkgs, QEMU/KVM, host disk under ./work, local CPU parallelism, and all the component modules.
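The commands this section names as the local proof path can be chained as below. The ordering is one reasonable choice rather than a prescribed sequence; each command is quoted exactly as it appears earlier in this TODO, and each runner writes its own logs under ./work.

```bash
#!/usr/bin/env bash
# Sketch: one possible local rerun order using the canonical commands
# quoted in this TODO. Drop or reorder lanes as needed.
set -euo pipefail

./nix/test-cluster/run-local-baseline.sh
./nix/test-cluster/run-publishable-kvm-suite.sh ./work/publishable-kvm-suite
nix run ./nix/test-cluster#cluster -- durability-proof
nix run ./nix/test-cluster#cluster -- rollout-soak
nix run ./nix/test-cluster#hardware-smoke -- preflight
```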

Notes For The Next Implementation Agent

  • Handling DEPLOYER-P0-01 / HARNESS-P0-01 first reduces the remaining hardware-proof and real-machine operator-path work at low cost.
  • For baseline reproduction, nix/test-cluster/run-local-baseline.sh reruns the same path while keeping the local-only builder and the logs under ./work pinned.
  • From there, advancing DEPLOYER-P0-01 / HARNESS-P0-01 into an actual-hardware smoke moves the proof path from QEMU-only to real hardware.
  • DEPLOYER-P1-01 and FLEET-P1-01 are now scope-fixed final. If they are reopened later, treat true deployer HA or a larger-cluster scheduler maintenance proof as a separate tranche that expands the current release boundary.
  • FIBERLB-P1-01 is scope-fixed, but productizing backend certificate verification in the future would require rewriting the limited contract in docs and guard.