# Core Control Plane Operations

This document fixes the supported operator lifecycle for the core control-plane services: `chainfire`, `flaredb`, and `iam`.

## ChainFire Membership And Node Replacement

ChainFire dynamic membership, replace-node, and scale-out are outside the supported surface. The supported public surface is the fixed-membership cluster API already documented in `chainfire-api`: `MemberList` and `Status` report the membership that the node booted with, and operators should treat that membership as immutable for a release branch.

Supported operator actions today:

1. Keep the canonical control plane at the documented fixed membership for the branch.
2. Use the canonical `durability-proof` backup/restore lane before disruptive maintenance.
3. Use `nix run ./nix/test-cluster#cluster -- rollout-soak` when you need a longer-running fixed-membership restart proof after maintenance or rollout work.
4. Recover failed nodes by restoring the same fixed-membership cluster shape, or by rebuilding the whole cluster with a freshly published static membership and then restoring data.

Unsupported operator actions today:

1. Live `replace-node` through a public ChainFire API.
2. Live `scale-out` by adding new voters on the supported surface.
3. Relying on internal membership helpers as a published product contract.

The focused boundary proof is `./nix/test-cluster/run-core-control-plane-ops-proof.sh`, which records the fixed-membership source marker from `chainfire-api` and the public docs markers under `./work/core-control-plane-ops-proof`. The live-operations companion is `nix run ./nix/test-cluster#cluster -- rollout-soak`, which on 2026-04-10 recorded `chainfire-post-restart-put.json`, `chainfire-post-restart.json`, and `post-control-plane-restarts.json` under `./work/rollout-soak/20260410T164549+0900` after repeated maintenance and worker power-loss, without promoting dynamic membership to supported scope.
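The backup-before-maintenance flow above can be sketched as a small wrapper that plans (and optionally runs) the two documented `nix run` lanes in order. The `nix run` targets are from this runbook; the wrapper itself, its function name, and the `--execute` flag are illustrative assumptions rather than a shipped tool.

```python
#!/usr/bin/env python3
"""Sketch of the fixed-membership maintenance flow: durability-proof
before disruptive work, rollout-soak after it. Membership is never
changed; only the documented proof lanes are invoked."""
import subprocess
import sys

# Ordered steps from the runbook. The operator performs the actual
# disruptive maintenance between the two proof lanes.
MAINTENANCE_STEPS = [
    ["nix", "run", "./nix/test-cluster#cluster", "--", "durability-proof"],
    ["nix", "run", "./nix/test-cluster#cluster", "--", "rollout-soak"],
]


def run_maintenance(execute: bool = False) -> list[str]:
    """Print each step as PLAN (dry run) or RUN; return the rendered commands."""
    rendered = []
    for step in MAINTENANCE_STEPS:
        cmd = " ".join(step)
        rendered.append(cmd)
        print(("RUN  " if execute else "PLAN ") + cmd)
        if execute:
            # check=True aborts the flow if a proof lane fails.
            subprocess.run(step, check=True)
    return rendered


if __name__ == "__main__":
    run_maintenance(execute="--execute" in sys.argv)
```

The dry-run default keeps the wrapper safe to inspect; an operator would pass `--execute` only once the fixed-membership shape and backup target are confirmed.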
## FlareDB Online Migration And Schema Evolution

FlareDB online migration and schema evolution must start from the durability-proof backup/restore baseline. The supported operator contract is additive-first schema evolution:

1. Run `nix run ./nix/test-cluster#cluster -- durability-proof` or keep an equivalent logical backup artifact before changing schema.
2. Apply additive changes first: new tables, new nullable columns, new indexes, and code paths that tolerate both old and new shapes.
3. Backfill data and cut read traffic over to the new schema before deleting or rewriting old state.
4. Treat destructive cleanup, `DROP TABLE`, and incompatible column rewrites as a later maintenance step, taken only after a fresh backup.

This keeps the migration runbook consistent with the current product proof: the durability lane proves logical SQL backup/restore, and the 2026-04-10 `rollout-soak` artifact root `./work/rollout-soak/20260410T164549+0900` rechecks additive SQL operations through `flaredb-post-restart-create.json`, `flaredb-post-restart-insert.json`, and `flaredb-post-restart.json` after a FlareDB member restart. The operator contract for live changes stays additive schema evolution rather than destructive in-place rewrites.

FlareDB destructive DDL and fully automated online migration remain outside the supported product contract for this release. When you need `DROP TABLE`, incompatible column rewrites, or automated destructive cutover, stop at the additive-first boundary above, take a fresh logical backup, and treat the destructive step as an explicit offline maintenance action rather than a release-proven online behavior. Internal raft membership helpers in `flaredb-raft` exist for implementation work, but they are not the published operator API for schema migration.
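The additive-first boundary above can be enforced mechanically before a migration runs. The sketch below is an illustrative gate, not FlareDB's parser: the marker list and the nullable-column heuristic are assumptions about what "additive" means in this contract, and a real gate would parse DDL rather than match substrings.

```python
"""Illustrative additive-first DDL gate for the online migration lane.

Destructive statements are blocked and deferred to the offline
maintenance step described in the runbook; the marker strings are
assumptions, not FlareDB's actual SQL grammar."""

# Statements the contract treats as destructive in-place rewrites.
DESTRUCTIVE_MARKERS = ("DROP TABLE", "DROP COLUMN", "ALTER COLUMN", "RENAME COLUMN")


def is_additive(stmt: str) -> bool:
    """Heuristic check that a DDL statement is additive-safe."""
    s = " ".join(stmt.upper().split())
    if any(marker in s for marker in DESTRUCTIVE_MARKERS):
        return False
    # New columns must be nullable or defaulted so old writers keep working.
    if "ADD COLUMN" in s and "NOT NULL" in s and "DEFAULT" not in s:
        return False
    return True


def gate_online_migration(statements):
    """Return the statements allowed in the online lane; raise on
    destructive DDL, which belongs to a later offline step after a
    fresh logical backup."""
    for stmt in statements:
        if not is_additive(stmt):
            raise ValueError(f"destructive DDL blocked in online lane: {stmt!r}")
    return list(statements)
```

A migration runner would call `gate_online_migration` on the planned statement list before touching the cluster, so a `DROP TABLE` never slips into the release-proven online path.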
## IAM Bootstrap Hardening And Rotation

IAM bootstrap hardening requires an explicit admin token, an explicit signing key, and a 32-byte `IAM_CRED_MASTER_KEY`; signing-key rotation, credential rotation, and mTLS overlap-and-cutover rotation are the supported recovery paths.

Production bootstrap contract:

1. Set `IAM_ADMIN_TOKEN` or `PHOTON_IAM_ADMIN_TOKEN`.
2. Set `authn.internal_token.signing_key` in config or provide the equivalent environment-backed configuration.
3. Set `IAM_CRED_MASTER_KEY` to a 32-byte value before enabling credential issuance.
4. Keep `admin.allow_unauthenticated=true`, `IAM_ALLOW_UNAUTHENTICATED_ADMIN=true`, and random signing keys limited to local development or lab proof environments.

Supported token and key rotation flow:

1. Add the new signing key and keep the old key available for verification during the overlap window.
2. Issue new tokens from the new active key.
3. Wait for the maximum supported token TTL or explicitly revoke the old token population before retiring the old key.
4. Purge retired keys only after the overlap and retirement windows are complete.

Supported credential rotation flow:

1. Keep `IAM_CRED_MASTER_KEY` explicit and stable across the overlap window.
2. Mint a new credential for the same principal before revoking the old one.
3. Move clients to the new access key and verify each client can still read back its secret material.
4. Revoke the old credential only after cutover is complete.

Supported mTLS overlap-and-cutover rotation flow:

1. Configure IAM to trust both the old and new service identity mapping or trust roots during the overlap window.
2. Issue or install the new client certificate and cut traffic over to it.
3. Remove the old mapping or trust root only after the new certificate is serving traffic successfully.
4. Verify the old certificate is rejected once the overlap window closes.

Multi-node IAM failover remains outside the supported product contract for this release.
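The overlap-and-retire signing-key flow above can be sketched as a keyring where only the active key signs but every retained key verifies. The `Keyring` class, its method names, and the HMAC construction are illustrative assumptions for the sketch; they are not IAM's actual key store or token format.

```python
"""Minimal sketch of overlap-and-retire signing-key rotation:
new key signs immediately, old key stays verify-only for the
maximum token TTL, then is purged. Illustrative only."""
import hashlib
import hmac
import time


class Keyring:
    def __init__(self, kid: str, secret: bytes):
        self.active = kid
        self.keys = {kid: secret}        # every retained key can verify
        self.retire_after = {}           # kid -> unix time when purge is allowed

    def sign(self, payload: bytes):
        """Sign with the active key only; returns (kid, hex mac)."""
        mac = hmac.new(self.keys[self.active], payload, hashlib.sha256)
        return self.active, mac.hexdigest()

    def verify(self, kid: str, payload: bytes, mac: str) -> bool:
        secret = self.keys.get(kid)
        if secret is None:
            return False
        want = hmac.new(secret, payload, hashlib.sha256).hexdigest()
        return hmac.compare_digest(want, mac)

    def rotate(self, new_kid: str, new_secret: bytes, max_ttl_s: float):
        """Steps 1-3: new key becomes the signer; the old key stays
        available for verification until the maximum token TTL elapses."""
        old = self.active
        self.keys[new_kid] = new_secret
        self.active = new_kid
        self.retire_after[old] = time.time() + max_ttl_s

    def purge_retired(self, now=None):
        """Step 4: drop keys whose overlap window has closed."""
        now = time.time() if now is None else now
        for kid, deadline in list(self.retire_after.items()):
            if now >= deadline:
                self.keys.pop(kid, None)
                del self.retire_after[kid]
```

The same overlap shape applies to the credential and mTLS flows: introduce the new secret or trust root, cut traffic over, and only then revoke the old one.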
The current release proof is lifecycle-oriented rather than HA-oriented: bootstrap hardening, signing-key rotation, credential overlap-and-revoke rotation, and mTLS overlap-and-cutover rotation are supported; clustered IAM failover is future scope expansion.

The standalone proof is `./nix/test-cluster/run-core-control-plane-ops-proof.sh`. It runs the `iam-authn` signing-key and mTLS rotation tests plus the `iam-api` credential rotation test, records the bootstrap hardening source markers from `iam-server`, and persists logs plus `result.json` and `scope-fixed-contract.json` under `./work/core-control-plane-ops-proof`. The dated 2026-04-10 artifact root is `./work/core-control-plane-ops-proof/20260410T172148+09:00`.