Consolidated commit

centra 2026-05-05 22:49:03 +09:00
parent c1d4178a52
commit 72a68e8fc4
35 changed files with 3604 additions and 349 deletions

View file

@ -25,7 +25,7 @@ The canonical bare-metal bootstrap proof is the ISO-on-QEMU path under [`nix/tes
## Core API Notes
- `chainfire` ships a fixed-membership cluster API on the supported surface. Public cluster management is `MemberList` plus `Status`, and the internal Raft transport surface is `Vote` plus `AppendEntries`. `chainfire-core` is workspace-internal only; the old embeddable builder and distributed-KV scaffold are not part of the supported product contract.
- `chainfire` ships a live cluster-management API on the supported surface. Public cluster management is `MemberAdd`, `MemberRemove`, `MemberList`, `Status`, and `LeaderTransfer`, and the internal Raft transport surface is `Vote`, `AppendEntries`, plus `TimeoutNow`. `chainfire-core` is workspace-internal only; the old embeddable builder and distributed-KV scaffold are not part of the supported product contract. A client-side usage sketch of this surface follows the list.
- `flaredb` ships SQL on both gRPC and REST. The supported REST SQL surface is `POST /api/v1/sql` for statement execution and `GET /api/v1/tables` for table discovery, alongside the existing KV and scan endpoints.
- `plasmavmc` ships a KVM-only public VM backend contract. The supported create and recovery surface is the KVM path exercised in `single-node-quickstart`, `fresh-smoke`, and `fresh-matrix`; Firecracker and mvisor remain archived non-product backends outside the supported surface until they have real tenant-network coverage.
- `lightningstor` keeps its optional gRPC surface live: bucket versioning, bucket policy, bucket tagging, and explicit object version listing are part of the supported contract for the canonical optional bundle.
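A minimal client-side sketch of the public cluster-management surface listed above, using the `member_list` accessor added in this commit. The crate path and the re-exported `Result` alias are assumptions for illustration; the method signature and the `ClusterMemberInfo` fields are the ones introduced in this diff.

```rust
use chainfire_client::{Client, ClusterMemberInfo}; // hypothetical crate path; the types are the ones extended in this commit

/// Print the committed membership as seen by the public Cluster API.
async fn print_members(client: &mut Client) -> chainfire_client::Result<()> {
    let members: Vec<ClusterMemberInfo> = client.member_list().await?;
    for m in &members {
        println!(
            "id={} name={} learner={} peer_urls={:?} client_urls={:?}",
            m.id, m.name, m.is_learner, m.peer_urls, m.client_urls
        );
    }
    Ok(())
}
```

The same `Client` also exposes `member_add`, `member_remove`, and `leader_transfer`; the operator sketch after the control-plane list below walks through those.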
@ -38,7 +38,7 @@ The canonical bare-metal bootstrap proof is the ISO-on-QEMU path under [`nix/tes
The control-plane operator contract is fixed in [docs/control-plane-ops.md](docs/control-plane-ops.md).
- ChainFire dynamic membership, replace-node, and scale-out are unsupported on the supported surface; the supported operator path is fixed-membership restore or whole-cluster replacement backed by the `durability-proof` backup/restore baseline.
- ChainFire supports live membership add, remove, promotion, endpoint replacement, and leader transfer for voters and learners on the public surface, including current-leader removal followed by election on the remaining voters. The supported reconfiguration boundary is sequential one-voter transitions until joint consensus lands. The fallback operator path remains backup plus restore through `durability-proof`, and the dedicated KVM proof lane is `nix run ./nix/test-cluster#cluster -- chainfire-live-membership-proof`. A hedged client sketch of the sequential reconfiguration flow follows this list.
- FlareDB online migration and schema evolution must start from the durability-proof backup/restore baseline and stay additive-first until a later destructive cleanup window. FlareDB destructive DDL and fully automated online migration remain outside the supported product contract for this release.
- IAM bootstrap hardening requires an explicit admin token, an explicit signing key, and a 32-byte IAM_CRED_MASTER_KEY. Signing-key rotation, credential overlap-and-revoke rotation, and mTLS overlap-and-cutover rotation are part of the supported operator contract; multi-node IAM failover remains outside the supported product contract. The standalone proof is `./nix/test-cluster/run-core-control-plane-ops-proof.sh`.
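A hedged sketch of the sequential one-voter reconfiguration flow described above, built on the `member_add`, `leader_transfer`, and `member_remove` client methods added in this commit. The crate path, URLs, and IDs are illustrative assumptions; promotion is expressed as a second `member_add` with `is_learner = false`, relying on that method's documented add-or-update semantics, and a real rollout would wait for the learner to catch up before promoting.

```rust
use chainfire_client::Client; // hypothetical crate path for the client extended in this commit

/// One sequential voter replacement: add a learner, promote it, move leadership, remove the old voter.
async fn replace_voter(
    client: &mut Client,
    new_id: u64,
    new_peer_url: &str,
    new_client_url: &str,
    old_voter_id: u64,
) -> chainfire_client::Result<()> {
    // 1. Join the new node as a learner so it replicates without changing quorum.
    client
        .member_add(
            new_id,
            format!("node-{new_id}"),
            vec![new_peer_url.to_string()],
            vec![new_client_url.to_string()],
            true, // is_learner
        )
        .await?;

    // (In practice: wait here until the learner has caught up.)

    // 2. Promote the learner; member_add is "add or update", so re-adding with
    //    is_learner = false flips it to a voter.
    client
        .member_add(
            new_id,
            format!("node-{new_id}"),
            vec![new_peer_url.to_string()],
            vec![new_client_url.to_string()],
            false,
        )
        .await?;

    // 3. Hand leadership to the new voter before touching the old one.
    let observed_leader = client.leader_transfer(new_id).await?;
    println!("leader after transfer: {observed_leader}");

    // 4. Remove the old voter as its own one-voter transition.
    client.member_remove(old_voter_id).await?;
    Ok(())
}
```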
@ -93,6 +93,7 @@ nix develop
nix run ./nix/test-cluster#cluster -- fresh-smoke
nix run ./nix/test-cluster#cluster -- fresh-demo-vm-webapp
nix run ./nix/test-cluster#cluster -- fresh-matrix
nix run ./nix/test-cluster#cluster -- chainfire-live-membership-proof
./nix/test-cluster/run-publishable-kvm-suite.sh ./work/publishable-kvm-suite
```
@ -100,6 +101,7 @@ The checked-in entrypoint for the publishable nested-KVM suite is the local wrap
For the full supported-surface proof on a local AMD/KVM host, use `./nix/test-cluster/run-supported-surface-final-proof.sh ./work/final-proofs/latest`; it keeps builders local, builds `single-node-trial-vm`, runs `single-node-quickstart`, and captures the publishable KVM suite logs in one place.
`nix run ./nix/test-cluster#cluster -- durability-proof` is the canonical chainfire flaredb deployer backup/restore lane. It persists artifacts under `./work/durability-proof/latest`, proves logical backup/restore for ChainFire keys and FlareDB SQL rows, uses the canonical Deployer admin pre-register request itself as the backup artifact, verifies that the pre-registered node survives a `deployer.service` restart, replays the same request idempotently, and injects CoronaFS plus LightningStor failures against the same live KVM cluster.
`nix run ./nix/test-cluster#cluster -- rollout-soak` is the longer-running control-plane and rollout companion lane. It rebuilds from clean local KVM runtime state, persists artifacts under `./work/rollout-soak/latest`, validates exactly one planned `draining` maintenance cycle and one fail-stop worker-loss cycle on the two native-runtime workers, holds each degraded state for the configured soak window, then restarts `deployer`, `fleet-scheduler`, `node-agent`, `chainfire`, and `flaredb` before revalidating the cluster. The soak root also carries explicit scope markers so the supported boundary is encoded in the proof artifacts rather than only in docs. The steady-state KVM nodes do not run `nix-agent.service`, so the soak lane records explicit `nix-agent` scope markers instead of pretending a live-cluster `nix-agent` restart happened.
`nix run ./nix/test-cluster#cluster -- chainfire-live-membership-proof` is the focused local-KVM live-reconfiguration lane for ChainFire. It rebuilds from clean local runtime state, starts a temporary ChainFire replica on `node04`, proves learner add plus local replication, voter promotion, live leader transfer, temporary-voter restart and rejoin, current-leader removal followed by re-election, removed-leader re-add, and final scale-in back to the canonical 3-node control-plane shape, and stores the resulting membership or local-read artifacts under `./work/chainfire-live-membership-proof/latest`.
`nix run ./nix/test-cluster#cluster -- provider-vm-reality-proof` is the focused local-KVM reality lane for the provider and VM-hosting bundles. It stores artifacts under `./work/provider-vm-reality-proof/latest`, captures authoritative FlashDNS answers, FiberLB backend drain and restore evidence, and PlasmaVMC KVM shared-storage migration plus post-migration restart state.
The 2026-04-10 local AMD/KVM proof logs are in `./work/final-proofs/32f64c10-1b74-4d8a-8d7d-b2cc6bf6b4f0-final` for `supported-surface-guard`, `single-node-trial-vm`, and `single-node-quickstart`, and in `./work/publishable-kvm-suite` for the final passing `fresh-smoke`, `fresh-demo-vm-webapp`, and `fresh-matrix` run through `./nix/test-cluster/run-publishable-kvm-suite.sh`.
The exact bare-metal check-runner proof from `2026-04-10` is in `./work/baremetal-iso-e2e/0de75570-dabd-471b-95fe-5898c54e2e8c`; its outer `environment.txt` records `execution_model=materialized-check-runner`, and `state/environment.txt` records `vm_accelerator_mode=kvm`.
@ -108,13 +110,13 @@ The 2026-04-10 longer-running rollout and control-plane soak is in `./work/rollo
The 2026-04-10 provider and VM-hosting reality proof logs are in `./work/provider-vm-reality-proof/20260410T135827+0900`; `result.json` records `success=true`, and the artifact set includes `network-provider/fiberlb-drain-summary.txt`, `network-provider/flashdns-service-authoritative-answer.txt`, `vm-hosting/migration-summary.json`, and `vm-hosting/root-volume-after-post-migration-restart.json`.
Physical-node bring-up now has a canonical preflight wrapper as well: `nix run ./nix/test-cluster#hardware-smoke -- preflight`. It writes `kernel-params.txt`, expected markers, failure markers, and a machine-readable blocked or ready state under `./work/hardware-smoke/latest`, and the same entrypoint can later be rerun as `run` or `capture` when USB or BMC/Redfish transport is actually present.
Within that suite, `fresh-matrix` is the public provider-bundle proof: it exercises PrismNet VPC/subnet/port flows plus security-group ACL add/remove, FlashDNS record publication, and FiberLB TCP plus TLS-terminated `Https` / `TerminatedHttps` listeners in one tenant-scoped composition run. The published FiberLB L4 algorithms are kept honest with targeted server unit tests in-tree. `provider-vm-reality-proof` is the artifact-producing companion lane for the same bundle and for the VM-hosting path.
Within that suite, `fresh-matrix` is the public provider-bundle proof: it exercises PrismNet VPC/subnet/port flows plus security-group ACL add/remove, FlashDNS record publication, and FiberLB TCP plus TLS-terminated `Https` / `TerminatedHttps` listeners in one tenant-scoped composition run. The published FiberLB L4 algorithms are kept honest with targeted server unit tests in-tree. `provider-vm-reality-proof` is the artifact-producing companion lane for the same bundle and for the VM-hosting path, and `chainfire-live-membership-proof` is the dedicated control-plane live-reconfiguration companion for ChainFire.
PrismNet real OVS/OVN dataplane validation remains outside the supported local KVM surface. FiberLB native BGP or BFD peer interop plus hardware VIP ownership also remain outside the supported local KVM surface. PlasmaVMC real-hardware migration or storage handoff remains a later hardware proof; the current local-KVM proof fixes the release surface to KVM shared-storage migration on the worker pair.
Project-done release proof now requires both halves of the public validation surface to be green:
- `baremetal-iso` and `baremetal-iso-e2e` for the canonical `deployer -> installer -> nix-agent` bare-metal bootstrap path
- the KVM publishable suite (`fresh-smoke`, `fresh-demo-vm-webapp`, `fresh-matrix`) for the nested-KVM multi-node VM-hosting path
- the KVM publishable suite (`fresh-smoke`, `fresh-demo-vm-webapp`, `fresh-matrix`, `chainfire-live-membership-proof`) for the nested-KVM multi-node VM-hosting and live-control-plane path
Canonical bare-metal bootstrap proof:

View file

@ -4,7 +4,8 @@ use crate::error::{ClientError, Result};
use crate::watch::WatchHandle;
use chainfire_proto::proto::{
cluster_client::ClusterClient, compare, kv_client::KvClient, request_op, response_op,
watch_client::WatchClient, Compare, DeleteRangeRequest, PutRequest, RangeRequest, RequestOp,
watch_client::WatchClient, Compare, DeleteRangeRequest, LeaderTransferRequest, Member,
MemberAddRequest, MemberListRequest, MemberRemoveRequest, PutRequest, RangeRequest, RequestOp,
StatusRequest, TxnRequest,
};
use std::time::Duration;
@ -616,6 +617,89 @@ impl Client {
raft_term: resp.raft_term,
})
}
/// List current cluster members.
pub async fn member_list(&mut self) -> Result<Vec<ClusterMemberInfo>> {
let resp = self
.with_cluster_retry(|mut cluster| async move {
cluster
.member_list(MemberListRequest {})
.await
.map(|resp| resp.into_inner())
})
.await?;
Ok(resp
.members
.into_iter()
.map(ClusterMemberInfo::from)
.collect())
}
/// Add or update a cluster member.
pub async fn member_add(
&mut self,
id: u64,
name: impl Into<String>,
peer_urls: Vec<String>,
client_urls: Vec<String>,
is_learner: bool,
) -> Result<ClusterMemberInfo> {
let name = name.into();
let resp = self
.with_cluster_retry(|mut cluster| {
let request = MemberAddRequest {
id,
name: name.clone(),
peer_urls: peer_urls.clone(),
client_urls: client_urls.clone(),
is_learner,
};
async move {
cluster
.member_add(request)
.await
.map(|resp| resp.into_inner())
}
})
.await?;
resp.member
.map(ClusterMemberInfo::from)
.ok_or_else(|| ClientError::Internal("member_add response missing member".to_string()))
}
/// Remove a cluster member.
pub async fn member_remove(&mut self, id: u64) -> Result<Vec<ClusterMemberInfo>> {
let resp = self
.with_cluster_retry(|mut cluster| async move {
cluster
.member_remove(MemberRemoveRequest { id })
.await
.map(|resp| resp.into_inner())
})
.await?;
Ok(resp
.members
.into_iter()
.map(ClusterMemberInfo::from)
.collect())
}
/// Transfer leadership to a specific voting member.
pub async fn leader_transfer(&mut self, target_id: u64) -> Result<u64> {
let resp = self
.with_cluster_retry(|mut cluster| async move {
cluster
.leader_transfer(LeaderTransferRequest { target_id })
.await
.map(|resp| resp.into_inner())
})
.await?;
Ok(resp.leader)
}
}
/// Cluster status
@ -629,6 +713,33 @@ pub struct ClusterStatus {
pub raft_term: u64,
}
/// Cluster member returned by cluster-management RPCs.
#[derive(Debug, Clone)]
pub struct ClusterMemberInfo {
/// Unique member ID.
pub id: u64,
/// Human-readable node name.
pub name: String,
/// Peer URLs used for Raft replication.
pub peer_urls: Vec<String>,
/// Client URLs exposed by the node.
pub client_urls: Vec<String>,
/// Whether this member is configured as a learner.
pub is_learner: bool,
}
impl From<Member> for ClusterMemberInfo {
fn from(member: Member) -> Self {
Self {
id: member.id,
name: member.name,
peer_urls: member.peer_urls,
client_urls: member.client_urls,
is_learner: member.is_learner,
}
}
}
/// CAS outcome returned by compare_and_swap
#[derive(Debug, Clone)]
pub struct CasOutcome {

View file

@ -1,76 +1,153 @@
//! Cluster management service implementation
//! Cluster management service implementation.
//!
//! This service handles cluster operations and status queries.
//! The supported surface reports the fixed membership that the node booted with.
//! This service exposes live member add/remove/list/status operations backed by
//! the replicated membership state in `RaftCore`.
use crate::conversions::make_header;
use crate::proto::{
cluster_server::Cluster, Member, MemberListRequest, MemberListResponse, StatusRequest,
StatusResponse,
cluster_server::Cluster, LeaderTransferRequest, LeaderTransferResponse, Member,
MemberAddRequest, MemberAddResponse, MemberListRequest, MemberListResponse,
MemberRemoveRequest, MemberRemoveResponse, StatusRequest, StatusResponse,
};
use chainfire_raft::core::RaftCore;
use chainfire_raft::core::{ClusterMember as CoreClusterMember, ClusterMembership, RaftCore};
use std::sync::Arc;
use tonic::{Request, Response, Status};
use tracing::debug;
/// Cluster service implementation
/// Cluster service implementation.
pub struct ClusterServiceImpl {
/// Raft core
/// Raft core.
raft: Arc<RaftCore>,
/// Cluster ID
/// Cluster ID.
cluster_id: u64,
/// Configured members with client and peer URLs
members: Vec<Member>,
/// Server version
/// Server version.
version: String,
}
impl ClusterServiceImpl {
/// Create a new cluster service
pub fn new(
raft: Arc<RaftCore>,
cluster_id: u64,
members: Vec<Member>,
) -> Self {
/// Create a new cluster service.
pub fn new(raft: Arc<RaftCore>, cluster_id: u64) -> Self {
Self {
raft,
cluster_id,
members,
version: env!("CARGO_PKG_VERSION").to_string(),
}
}
fn make_header(&self, revision: u64) -> crate::proto::ResponseHeader {
make_header(self.cluster_id, self.raft.node_id(), revision, 0)
async fn make_header(&self, revision: u64) -> crate::proto::ResponseHeader {
let term = self.raft.current_term().await;
make_header(self.cluster_id, self.raft.node_id(), revision, term)
}
/// Get current members as proto Member list
/// Return the configured static membership that the server was booted with.
async fn get_member_list(&self) -> Vec<Member> {
if self.members.is_empty() {
return vec![Member {
id: self.raft.node_id(),
name: format!("node-{}", self.raft.node_id()),
peer_urls: vec![],
client_urls: vec![],
is_learner: false,
}];
fn proto_member(member: &CoreClusterMember) -> Member {
Member {
id: member.id,
name: member.name.clone(),
peer_urls: member.peer_urls.clone(),
client_urls: member.client_urls.clone(),
is_learner: member.is_learner,
}
}
fn proto_members(membership: &ClusterMembership) -> Vec<Member> {
membership.members.iter().map(Self::proto_member).collect()
}
}
fn map_raft_error(error: chainfire_raft::core::RaftError) -> Status {
match error {
chainfire_raft::core::RaftError::NotLeader { leader_id } => {
Status::failed_precondition(format!("not leader; current leader is {:?}", leader_id))
}
chainfire_raft::core::RaftError::Rejected(message) => Status::failed_precondition(message),
chainfire_raft::core::RaftError::StorageError(message)
| chainfire_raft::core::RaftError::NetworkError(message) => Status::internal(message),
chainfire_raft::core::RaftError::Timeout => {
Status::deadline_exceeded("cluster operation timed out")
}
self.members.clone()
}
}
#[tonic::async_trait]
impl Cluster for ClusterServiceImpl {
async fn member_add(
&self,
request: Request<MemberAddRequest>,
) -> Result<Response<MemberAddResponse>, Status> {
let req = request.into_inner();
debug!(member_id = req.id, "Member add request");
if req.id == 0 {
return Err(Status::invalid_argument("member id must be non-zero"));
}
if req.peer_urls.is_empty() {
return Err(Status::invalid_argument(
"member add requires at least one peer URL",
));
}
let member = CoreClusterMember {
id: req.id,
name: if req.name.trim().is_empty() {
format!("node-{}", req.id)
} else {
req.name
},
peer_urls: req.peer_urls,
client_urls: req.client_urls,
is_learner: req.is_learner,
};
let membership = self
.raft
.add_member(member.clone())
.await
.map_err(map_raft_error)?;
let revision = self.raft.last_applied().await;
let applied_member = membership.member(member.id).cloned().unwrap_or(member);
Ok(Response::new(MemberAddResponse {
header: Some(self.make_header(revision).await),
member: Some(Self::proto_member(&applied_member)),
members: Self::proto_members(&membership),
}))
}
async fn member_remove(
&self,
request: Request<MemberRemoveRequest>,
) -> Result<Response<MemberRemoveResponse>, Status> {
let req = request.into_inner();
debug!(member_id = req.id, "Member remove request");
if req.id == 0 {
return Err(Status::invalid_argument("member id must be non-zero"));
}
let membership = self
.raft
.remove_member(req.id)
.await
.map_err(map_raft_error)?;
let revision = self.raft.last_applied().await;
Ok(Response::new(MemberRemoveResponse {
header: Some(self.make_header(revision).await),
members: Self::proto_members(&membership),
}))
}
async fn member_list(
&self,
_request: Request<MemberListRequest>,
) -> Result<Response<MemberListResponse>, Status> {
debug!("Member list request");
let revision = self.raft.last_applied().await;
let membership = self.raft.cluster_membership().await;
Ok(Response::new(MemberListResponse {
header: Some(self.make_header(0)),
members: self.get_member_list().await,
header: Some(self.make_header(revision).await),
members: Self::proto_members(&membership),
}))
}
@ -86,7 +163,7 @@ impl Cluster for ClusterServiceImpl {
let last_applied = self.raft.last_applied().await;
Ok(Response::new(StatusResponse {
header: Some(self.make_header(last_applied)),
header: Some(self.make_header(last_applied).await),
version: self.version.clone(),
db_size: 0,
leader: leader.unwrap_or(0),
@ -95,4 +172,30 @@ impl Cluster for ClusterServiceImpl {
raft_applied_index: last_applied,
}))
}
async fn leader_transfer(
&self,
request: Request<LeaderTransferRequest>,
) -> Result<Response<LeaderTransferResponse>, Status> {
let req = request.into_inner();
debug!(target_id = req.target_id, "Leader transfer request");
if req.target_id == 0 {
return Err(Status::invalid_argument(
"leader transfer target must be non-zero",
));
}
let leader = self
.raft
.transfer_leader(req.target_id)
.await
.map_err(map_raft_error)?;
let revision = self.raft.last_applied().await;
Ok(Response::new(LeaderTransferResponse {
header: Some(self.make_header(revision).await),
leader,
}))
}
}

View file

@ -5,8 +5,9 @@
use crate::internal_proto::{
raft_service_server::RaftService, AppendEntriesRequest as ProtoAppendEntriesRequest,
AppendEntriesResponse as ProtoAppendEntriesResponse, VoteRequest as ProtoVoteRequest,
VoteResponse as ProtoVoteResponse,
AppendEntriesResponse as ProtoAppendEntriesResponse, EntryType as ProtoEntryType,
TimeoutNowRequest as ProtoTimeoutNowRequest, TimeoutNowResponse as ProtoTimeoutNowResponse,
VoteRequest as ProtoVoteRequest, VoteResponse as ProtoVoteResponse,
};
use chainfire_raft::core::{AppendEntriesRequest, RaftCore, VoteRequest};
use chainfire_storage::{EntryPayload, LogEntry as RaftLogEntry, LogId};
@ -31,6 +32,32 @@ impl RaftServiceImpl {
}
}
fn decode_log_entry(
entry: crate::internal_proto::LogEntry,
) -> Result<RaftLogEntry<RaftCommand>, Status> {
let payload = match ProtoEntryType::try_from(entry.entry_type).unwrap_or(ProtoEntryType::Blank)
{
ProtoEntryType::Blank => EntryPayload::Blank,
ProtoEntryType::Normal => {
let command = bincode::deserialize::<RaftCommand>(&entry.data).map_err(|err| {
Status::invalid_argument(format!(
"failed to decode normal raft entry payload: {err}"
))
})?;
EntryPayload::Normal(command)
}
ProtoEntryType::Membership => EntryPayload::Membership(entry.data),
};
Ok(RaftLogEntry {
log_id: LogId {
term: entry.term,
index: entry.index,
},
payload,
})
}
#[tonic::async_trait]
impl RaftService for RaftServiceImpl {
async fn vote(
@ -91,26 +118,8 @@ impl RaftService for RaftServiceImpl {
let entries: Vec<RaftLogEntry<RaftCommand>> = req
.entries
.into_iter()
.map(|e| {
let payload = if e.data.is_empty() {
EntryPayload::Blank
} else {
// Deserialize the command from the entry data
match bincode::deserialize::<RaftCommand>(&e.data) {
Ok(cmd) => EntryPayload::Normal(cmd),
Err(_) => EntryPayload::Blank,
}
};
RaftLogEntry {
log_id: LogId {
term: e.term,
index: e.index,
},
payload,
}
})
.collect();
.map(decode_log_entry)
.collect::<Result<Vec<_>, _>>()?;
let append_req = AppendEntriesRequest {
term: req.term,
@ -140,4 +149,48 @@ impl RaftService for RaftServiceImpl {
}))
}
async fn timeout_now(
&self,
_request: Request<ProtoTimeoutNowRequest>,
) -> Result<Response<ProtoTimeoutNowResponse>, Status> {
let (resp_tx, resp_rx) = oneshot::channel();
self.raft.timeout_now_rpc(resp_tx).await;
let result = resp_rx.await.map_err(|e| {
warn!(error = %e, "TimeoutNow request channel closed");
Status::internal("TimeoutNow request failed: channel closed")
})?;
let term = self.raft.current_term().await;
match result {
Ok(()) => Ok(Response::new(ProtoTimeoutNowResponse {
accepted: true,
term,
})),
Err(err) => Err(Status::failed_precondition(err.to_string())),
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use chainfire_storage::EntryPayload;
#[test]
fn decode_log_entry_preserves_membership_payloads() {
let expected = vec![1, 2, 3, 4];
let decoded = decode_log_entry(crate::internal_proto::LogEntry {
index: 7,
term: 3,
data: expected.clone(),
entry_type: ProtoEntryType::Membership as i32,
})
.expect("decode membership entry");
match decoded.payload {
EntryPayload::Membership(bytes) => assert_eq!(bytes, expected),
other => panic!("expected membership payload, got {other:?}"),
}
}
}

View file

@ -5,7 +5,8 @@
use crate::internal_proto::{
raft_service_client::RaftServiceClient, AppendEntriesRequest as ProtoAppendEntriesRequest,
LogEntry as ProtoLogEntry, VoteRequest as ProtoVoteRequest,
EntryType as ProtoEntryType, LogEntry as ProtoLogEntry,
TimeoutNowRequest as ProtoTimeoutNowRequest, VoteRequest as ProtoVoteRequest,
};
use chainfire_raft::network::{RaftNetworkError, RaftRpcClient};
use chainfire_types::NodeId;
@ -241,6 +242,30 @@ impl Default for GrpcRaftClient {
#[async_trait::async_trait]
impl RaftRpcClient for GrpcRaftClient {
async fn add_node(&self, target: NodeId, addr: String) -> Result<(), RaftNetworkError> {
GrpcRaftClient::add_node(self, target, addr).await;
Ok(())
}
async fn remove_node(&self, target: NodeId) -> Result<(), RaftNetworkError> {
GrpcRaftClient::remove_node(self, target).await;
Ok(())
}
async fn timeout_now(&self, target: NodeId) -> Result<(), RaftNetworkError> {
trace!(target = target, "Sending timeout-now request");
self.with_retry(target, "timeout_now", || async {
let mut client = self.get_client(target).await?;
client
.timeout_now(ProtoTimeoutNowRequest {})
.await
.map_err(|e| RaftNetworkError::RpcFailed(e.to_string()))?;
Ok(())
})
.await
}
async fn vote(
&self,
target: NodeId,
@ -286,17 +311,22 @@ impl RaftRpcClient for GrpcRaftClient {
);
// Clone entries once for potential retries
let entries_data: Vec<(u64, u64, Vec<u8>)> = req
let entries_data: Vec<(u64, u64, Vec<u8>, i32)> = req
.entries
.iter()
.map(|e| {
use chainfire_storage::EntryPayload;
let data = match &e.payload {
EntryPayload::Blank => vec![],
EntryPayload::Normal(cmd) => bincode::serialize(cmd).unwrap_or_default(),
EntryPayload::Membership(_) => vec![],
let (data, entry_type) = match &e.payload {
EntryPayload::Blank => (vec![], ProtoEntryType::Blank as i32),
EntryPayload::Normal(cmd) => (
bincode::serialize(cmd).unwrap_or_default(),
ProtoEntryType::Normal as i32,
),
EntryPayload::Membership(bytes) => {
(bytes.clone(), ProtoEntryType::Membership as i32)
}
};
(e.log_id.index, e.log_id.term, data)
(e.log_id.index, e.log_id.term, data, entry_type)
})
.collect();
@ -313,7 +343,12 @@ impl RaftRpcClient for GrpcRaftClient {
let entries: Vec<ProtoLogEntry> = entries_data
.into_iter()
.map(|(index, term, data)| ProtoLogEntry { index, term, data })
.map(|(index, term, data, entry_type)| ProtoLogEntry {
index,
term,
data,
entry_type,
})
.collect();
let proto_req = ProtoAppendEntriesRequest {

View file

@ -1,6 +1,6 @@
//! Internal compatibility crate for workspace-local ChainFire types.
//!
//! The supported ChainFire product surface is the fixed-membership
//! The supported ChainFire product surface is the live-membership
//! `chainfire-server` / `chainfire-api` contract documented in the repository
//! root. This crate intentionally does not export an embeddable cluster,
//! membership-mutation, or distributed-KV API.

File diff suppressed because it is too large

View file

@ -27,6 +27,12 @@ pub enum RaftNetworkError {
/// Trait for sending Raft RPCs
#[async_trait::async_trait]
pub trait RaftRpcClient: Send + Sync + 'static {
async fn add_node(&self, target: NodeId, addr: String) -> Result<(), RaftNetworkError>;
async fn remove_node(&self, target: NodeId) -> Result<(), RaftNetworkError>;
async fn timeout_now(&self, target: NodeId) -> Result<(), RaftNetworkError>;
async fn vote(
&self,
target: NodeId,
@ -59,6 +65,7 @@ pub mod test_client {
AppendEntriesRequest,
tokio::sync::oneshot::Sender<AppendEntriesResponse>,
),
TimeoutNow(tokio::sync::oneshot::Sender<Result<(), String>>),
}
impl Default for InMemoryRpcClient {
@ -81,15 +88,46 @@ pub mod test_client {
#[async_trait::async_trait]
impl RaftRpcClient for InMemoryRpcClient {
async fn add_node(&self, _target: NodeId, _addr: String) -> Result<(), RaftNetworkError> {
Ok(())
}
async fn remove_node(&self, target: NodeId) -> Result<(), RaftNetworkError> {
self.channels.write().await.remove(&target);
Ok(())
}
async fn timeout_now(&self, target: NodeId) -> Result<(), RaftNetworkError> {
let tx = {
let channels = self.channels.read().await;
channels
.get(&target)
.cloned()
.ok_or(RaftNetworkError::NodeNotFound(target))?
};
let (resp_tx, resp_rx) = tokio::sync::oneshot::channel();
tx.send(RpcMessage::TimeoutNow(resp_tx))
.map_err(|_| RaftNetworkError::RpcFailed("Channel closed".into()))?;
resp_rx
.await
.map_err(|_| RaftNetworkError::RpcFailed("Response channel closed".into()))?
.map_err(RaftNetworkError::RpcFailed)
}
async fn vote(
&self,
target: NodeId,
req: VoteRequest,
) -> Result<VoteResponse, RaftNetworkError> {
let channels = self.channels.read().await;
let tx = channels
.get(&target)
.ok_or(RaftNetworkError::NodeNotFound(target))?;
let tx = {
let channels = self.channels.read().await;
channels
.get(&target)
.cloned()
.ok_or(RaftNetworkError::NodeNotFound(target))?
};
let (resp_tx, resp_rx) = tokio::sync::oneshot::channel();
tx.send(RpcMessage::Vote(req, resp_tx))
@ -105,20 +143,22 @@ pub mod test_client {
target: NodeId,
req: AppendEntriesRequest,
) -> Result<AppendEntriesResponse, RaftNetworkError> {
let channels = self.channels.read().await;
let tx = channels.get(&target).ok_or_else(|| {
eprintln!(
"[RPC] NodeNotFound: target={}, registered={:?}",
target,
channels.keys().collect::<Vec<_>>()
);
RaftNetworkError::NodeNotFound(target)
})?;
let tx = {
let channels = self.channels.read().await;
channels.get(&target).cloned().ok_or_else(|| {
eprintln!(
"[RPC] NodeNotFound: target={}, registered={:?}",
target,
channels.keys().collect::<Vec<_>>()
);
RaftNetworkError::NodeNotFound(target)
})?
};
let (resp_tx, resp_rx) = tokio::sync::oneshot::channel();
let send_result = tx.send(RpcMessage::AppendEntries(req.clone(), resp_tx));
if let Err(e) = send_result {
if let Err(_e) = send_result {
eprintln!("[RPC] Send failed to node {}: channel closed", target);
return Err(RaftNetworkError::RpcFailed("Channel closed".into()));
}

View file

@ -6,7 +6,7 @@ use crate::config::ServerConfig;
use anyhow::Result;
use chainfire_api::GrpcRaftClient;
use chainfire_gossip::{GossipAgent, GossipId};
use chainfire_raft::core::{RaftConfig, RaftCore};
use chainfire_raft::core::{ClusterMember, ClusterMembership, RaftConfig, RaftCore};
use chainfire_raft::network::RaftRpcClient;
use chainfire_storage::{LogStorage, RocksStore, StateMachine};
use chainfire_types::node::NodeRole;
@ -43,6 +43,7 @@ impl Node {
// Create Raft core only if role participates in Raft
let (raft, rpc_client) = if config.raft.role.participates_in_raft() {
let membership = initial_membership(&config);
// Create RocksDB store
let store = RocksStore::new(&config.storage.data_dir)?;
info!(data_dir = ?config.storage.data_dir, "Opened storage");
@ -57,26 +58,22 @@ impl Node {
// Create gRPC Raft client and register peer addresses
let rpc_client = Arc::new(GrpcRaftClient::new());
for member in &config.cluster.initial_members {
rpc_client
.add_node(member.id, member.raft_addr.clone())
.await;
info!(node_id = member.id, addr = %member.raft_addr, "Registered peer");
for member in &membership.members {
if let Some(peer_url) = member.peer_urls.first() {
let addr = peer_url
.strip_prefix("http://")
.or_else(|| peer_url.strip_prefix("https://"))
.unwrap_or(peer_url)
.to_string();
rpc_client.add_node(member.id, addr.clone()).await;
info!(node_id = member.id, addr = %addr, "Registered peer");
}
}
// Extract peer node IDs (excluding self)
let peers: Vec<u64> = config
.cluster
.initial_members
.iter()
.map(|m| m.id)
.filter(|&id| id != config.node.id)
.collect();
// Create RaftCore with default config
let raft_core = Arc::new(RaftCore::new(
config.node.id,
peers,
membership,
log_storage,
state_machine,
Arc::clone(&rpc_client) as Arc<dyn RaftRpcClient>,
@ -179,7 +176,7 @@ impl Node {
/// NOTE: Custom RaftCore handles multi-node initialization via the peers parameter
/// in the constructor. All nodes start with the same peer list and will elect a leader.
pub async fn maybe_bootstrap(&self) -> Result<()> {
let Some(raft) = &self.raft else {
let Some(_raft) = &self.raft else {
info!("No Raft core to bootstrap (role=none)");
return Ok(());
};
@ -231,3 +228,53 @@ impl Node {
let _ = self.shutdown_tx.send(());
}
}
fn initial_membership(config: &ServerConfig) -> ClusterMembership {
let api_port = config.network.api_addr.port();
let mut members: Vec<ClusterMember> = config
.cluster
.initial_members
.iter()
.map(|member| ClusterMember {
id: member.id,
name: format!("node-{}", member.id),
peer_urls: vec![normalize_peer_url(&member.raft_addr)],
client_urls: grpc_endpoint_from_raft_addr(&member.raft_addr, api_port)
.into_iter()
.collect(),
is_learner: member.id == config.node.id && config.raft.role == RaftRole::Learner,
})
.collect();
if members.is_empty() {
members.push(ClusterMember {
id: config.node.id,
name: config.node.name.clone(),
peer_urls: vec![normalize_peer_url(&config.network.raft_addr.to_string())],
client_urls: vec![format!(
"http://{}:{}",
config.network.api_addr.ip(),
config.network.api_addr.port()
)],
is_learner: config.raft.role == RaftRole::Learner,
});
}
ClusterMembership { members }.normalized()
}
fn grpc_endpoint_from_raft_addr(raft_addr: &str, api_port: u16) -> Option<String> {
if let Ok(addr) = raft_addr.parse::<std::net::SocketAddr>() {
return Some(format!("http://{}:{}", addr.ip(), api_port));
}
let (host, _) = raft_addr.rsplit_once(':')?;
Some(format!("http://{}:{}", host, api_port))
}
fn normalize_peer_url(raft_addr: &str) -> String {
if raft_addr.contains("://") {
raft_addr.to_string()
} else {
format!("http://{raft_addr}")
}
}

View file

@ -7,18 +7,20 @@
//! - GET /api/v1/kv?prefix={prefix} - Range scan
//! - GET /api/v1/cluster/status - Cluster health
//! - POST /api/v1/cluster/members - Add member
//! - POST /api/v1/cluster/leader/transfer - Transfer cluster leadership
use axum::{
extract::{Path, Query, State},
http::StatusCode,
routing::{get, post},
routing::{delete, get, post},
Json, Router,
};
use chainfire_api::GrpcRaftClient;
use chainfire_raft::{core::RaftError, RaftCore};
use chainfire_raft::{
core::{ClusterMember, RaftError},
RaftCore,
};
use chainfire_types::command::RaftCommand;
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::sync::Arc;
/// REST API state
@ -26,9 +28,8 @@ use std::sync::Arc;
pub struct RestApiState {
pub raft: Arc<RaftCore>,
pub cluster_id: u64,
pub rpc_client: Option<Arc<GrpcRaftClient>>,
pub http_client: reqwest::Client,
pub peer_http_addrs: Arc<HashMap<u64, String>>,
pub http_port: u16,
}
/// Standard REST error response
@ -113,21 +114,39 @@ pub struct ClusterStatusResponse {
}
/// Add member request
#[derive(Debug, Deserialize)]
#[derive(Debug, Deserialize, Serialize)]
pub struct AddMemberRequest {
pub node_id: u64,
pub raft_addr: String,
#[serde(default)]
pub client_url: Option<String>,
#[serde(default)]
pub name: Option<String>,
#[serde(default)]
pub is_learner: bool,
}
/// Add member request (legacy format from first-boot-automation)
/// Accepts string id and converts to numeric node_id
#[derive(Debug, Deserialize)]
#[derive(Debug, Deserialize, Serialize)]
pub struct AddMemberRequestLegacy {
/// Node ID as string (e.g., "node01", "node02")
pub id: String,
pub raft_addr: String,
}
/// Remove member request body.
#[derive(Debug, Deserialize)]
pub struct RemoveMemberRequest {
pub node_id: u64,
}
/// Leader-transfer request body.
#[derive(Debug, Deserialize, Serialize)]
pub struct LeaderTransferRequest {
pub target_id: u64,
}
/// Query parameters for prefix scan
#[derive(Debug, Deserialize)]
pub struct PrefixQuery {
@ -154,6 +173,8 @@ pub fn build_router(state: RestApiState) -> Router {
.route("/api/v1/kv", get(list_kv))
.route("/api/v1/cluster/status", get(cluster_status))
.route("/api/v1/cluster/members", post(add_member))
.route("/api/v1/cluster/leader/transfer", post(transfer_leader))
.route("/api/v1/cluster/members/:node_id", delete(remove_member))
// Legacy endpoint for first-boot-automation compatibility
.route("/admin/member/add", post(add_member_legacy))
.route("/health", get(health_check))
@ -342,38 +363,77 @@ fn string_to_node_id(s: &str) -> u64 {
hasher.finish()
}
fn cluster_operation_error(err: &RaftError) -> (StatusCode, &'static str, String) {
match err {
RaftError::Rejected(message) => (
StatusCode::PRECONDITION_FAILED,
"PRECONDITION_FAILED",
message.clone(),
),
RaftError::Timeout => (
StatusCode::REQUEST_TIMEOUT,
"TIMEOUT",
"cluster operation timed out".to_string(),
),
_ => (
StatusCode::INTERNAL_SERVER_ERROR,
"INTERNAL_ERROR",
err.to_string(),
),
}
}
/// POST /api/v1/cluster/members - Add member
async fn add_member(
State(state): State<RestApiState>,
Json(req): Json<AddMemberRequest>,
) -> Result<(StatusCode, Json<SuccessResponse<serde_json::Value>>), (StatusCode, Json<ErrorResponse>)>
{
let rpc_client = state.rpc_client.as_ref().ok_or_else(|| {
error_response(
StatusCode::SERVICE_UNAVAILABLE,
"SERVICE_UNAVAILABLE",
"RPC client not available",
)
})?;
let member = ClusterMember {
id: req.node_id,
name: req
.name
.clone()
.filter(|value| !value.trim().is_empty())
.unwrap_or_else(|| format!("node-{}", req.node_id)),
peer_urls: vec![normalize_peer_url(&req.raft_addr)],
client_urls: req.client_url.clone().into_iter().collect(),
is_learner: req.is_learner,
};
// Add node to RPC client's routing table
rpc_client
.add_node(req.node_id, req.raft_addr.clone())
.await;
// Note: RaftCore doesn't have add_peer() - members are managed via configuration
// For now, we just register the node in the RPC client
// In a full implementation, this would trigger a Raft configuration change
Ok((
StatusCode::CREATED,
Json(SuccessResponse::new(serde_json::json!({
"node_id": req.node_id,
"raft_addr": req.raft_addr,
"success": true,
"note": "Node registered in RPC client routing table"
}))),
))
match state.raft.add_member(member).await {
Ok(membership) => {
return Ok((
StatusCode::CREATED,
Json(SuccessResponse::new(serde_json::json!({
"node_id": req.node_id,
"raft_addr": req.raft_addr,
"members": membership.members.len(),
"success": true
}))),
));
}
Err(RaftError::NotLeader { leader_id }) => {
return proxy_cluster_write_to_leader(
&state,
leader_id,
"/api/v1/cluster/members",
reqwest::Method::POST,
Some(serde_json::to_value(&req).map_err(|err| {
error_response(
StatusCode::INTERNAL_SERVER_ERROR,
"INTERNAL_ERROR",
&format!("failed to encode add-member request: {err}"),
)
})?),
)
.await;
}
Err(err) => {
let (status, code, message) = cluster_operation_error(&err);
return Err(error_response(status, code, &message));
}
}
}
/// POST /admin/member/add - Add member (legacy format for first-boot-automation)
@ -383,28 +443,94 @@ async fn add_member_legacy(
) -> Result<(StatusCode, Json<SuccessResponse<serde_json::Value>>), (StatusCode, Json<ErrorResponse>)>
{
let node_id = string_to_node_id(&req.id);
add_member(
State(state),
Json(AddMemberRequest {
node_id,
raft_addr: req.raft_addr,
client_url: None,
name: Some(req.id),
is_learner: false,
}),
)
.await
}
let rpc_client = state.rpc_client.as_ref().ok_or_else(|| {
error_response(
StatusCode::SERVICE_UNAVAILABLE,
"SERVICE_UNAVAILABLE",
"RPC client not available",
)
})?;
/// DELETE /api/v1/cluster/members/:node_id - Remove member.
async fn remove_member(
State(state): State<RestApiState>,
Path(node_id): Path<u64>,
) -> Result<(StatusCode, Json<SuccessResponse<serde_json::Value>>), (StatusCode, Json<ErrorResponse>)>
{
match state.raft.remove_member(node_id).await {
Ok(membership) => Ok((
StatusCode::OK,
Json(SuccessResponse::new(serde_json::json!({
"node_id": node_id,
"members": membership.members.len(),
"success": true
}))),
)),
Err(RaftError::NotLeader { leader_id }) => {
proxy_cluster_write_to_leader(
&state,
leader_id,
&format!("/api/v1/cluster/members/{node_id}"),
reqwest::Method::DELETE,
None,
)
.await
}
Err(err) => {
let (status, code, message) = cluster_operation_error(&err);
Err(error_response(status, code, &message))
}
}
}
// Add node to RPC client's routing table
rpc_client.add_node(node_id, req.raft_addr.clone()).await;
/// POST /api/v1/cluster/leader/transfer - Transfer cluster leadership.
async fn transfer_leader(
State(state): State<RestApiState>,
Json(req): Json<LeaderTransferRequest>,
) -> Result<(StatusCode, Json<SuccessResponse<serde_json::Value>>), (StatusCode, Json<ErrorResponse>)>
{
if req.target_id == 0 {
return Err(error_response(
StatusCode::BAD_REQUEST,
"INVALID_ARGUMENT",
"leader transfer target must be non-zero",
));
}
Ok((
StatusCode::CREATED,
Json(SuccessResponse::new(serde_json::json!({
"id": req.id,
"node_id": node_id,
"raft_addr": req.raft_addr,
"success": true,
"note": "Node registered in RPC client routing table (legacy API)"
}))),
))
match state.raft.transfer_leader(req.target_id).await {
Ok(leader) => Ok((
StatusCode::OK,
Json(SuccessResponse::new(serde_json::json!({
"leader": leader,
"success": true
}))),
)),
Err(RaftError::NotLeader { leader_id }) => {
proxy_cluster_write_to_leader(
&state,
leader_id,
"/api/v1/cluster/leader/transfer",
reqwest::Method::POST,
Some(serde_json::to_value(&req).map_err(|err| {
error_response(
StatusCode::INTERNAL_SERVER_ERROR,
"INTERNAL_ERROR",
&format!("failed to encode leader-transfer request: {err}"),
)
})?),
)
.await
}
Err(err) => {
let (status, code, message) = cluster_operation_error(&err);
Err(error_response(status, code, &message))
}
}
}
/// Helper to create error response
@ -426,6 +552,54 @@ fn error_response(
)
}
fn normalize_peer_url(raft_addr: &str) -> String {
if raft_addr.contains("://") {
raft_addr.to_string()
} else {
format!("http://{raft_addr}")
}
}
fn http_endpoint_from_peer_url(peer_url: &str, http_port: u16) -> Option<String> {
let trimmed = peer_url
.strip_prefix("http://")
.or_else(|| peer_url.strip_prefix("https://"))
.unwrap_or(peer_url);
if let Ok(addr) = trimmed.parse::<std::net::SocketAddr>() {
return Some(format!("http://{}:{}", addr.ip(), http_port));
}
let (host, _) = trimmed.rsplit_once(':')?;
Some(format!("http://{}:{}", host, http_port))
}
async fn leader_http_addr(
state: &RestApiState,
leader_id: u64,
) -> Result<String, (StatusCode, Json<ErrorResponse>)> {
let membership = state.raft.cluster_membership().await;
let leader = membership.member(leader_id).ok_or_else(|| {
error_response(
StatusCode::SERVICE_UNAVAILABLE,
"NOT_LEADER",
&format!("leader {leader_id} is known but has no membership record"),
)
})?;
let peer_url = leader.peer_urls.first().ok_or_else(|| {
error_response(
StatusCode::SERVICE_UNAVAILABLE,
"NOT_LEADER",
&format!("leader {leader_id} is known but has no peer URL"),
)
})?;
http_endpoint_from_peer_url(peer_url, state.http_port).ok_or_else(|| {
error_response(
StatusCode::SERVICE_UNAVAILABLE,
"NOT_LEADER",
&format!("leader {leader_id} peer URL {peer_url} cannot be mapped to HTTP"),
)
})
}
async fn submit_rest_write(
state: &RestApiState,
command: RaftCommand,
@ -464,13 +638,7 @@ async fn proxy_write_to_leader(
"current node is not the leader and no leader is known yet",
)
})?;
let leader_http_addr = state.peer_http_addrs.get(&leader_id).ok_or_else(|| {
error_response(
StatusCode::SERVICE_UNAVAILABLE,
"NOT_LEADER",
&format!("leader {leader_id} is known but has no HTTP endpoint mapping"),
)
})?;
let leader_http_addr = leader_http_addr(state, leader_id).await?;
let url = format!(
"{}/api/v1/kv/{}",
leader_http_addr.trim_end_matches('/'),
@ -506,6 +674,64 @@ async fn proxy_write_to_leader(
Err((status, Json(payload)))
}
async fn proxy_cluster_write_to_leader(
state: &RestApiState,
leader_id: Option<u64>,
path: &str,
method: reqwest::Method,
body: Option<serde_json::Value>,
) -> Result<(StatusCode, Json<SuccessResponse<serde_json::Value>>), (StatusCode, Json<ErrorResponse>)>
{
let leader_id = leader_id.ok_or_else(|| {
error_response(
StatusCode::SERVICE_UNAVAILABLE,
"NOT_LEADER",
"current node is not the leader and no leader is known yet",
)
})?;
let leader_http_addr = leader_http_addr(state, leader_id).await?;
let url = format!("{}{}", leader_http_addr.trim_end_matches('/'), path);
let mut request = state.http_client.request(method, &url);
if let Some(body) = body {
request = request.json(&body);
}
let response = request.send().await.map_err(|err| {
error_response(
StatusCode::BAD_GATEWAY,
"LEADER_PROXY_FAILED",
&format!("failed to forward cluster write to leader {leader_id}: {err}"),
)
})?;
if response.status().is_success() {
let status = StatusCode::from_u16(response.status().as_u16()).unwrap_or(StatusCode::OK);
let payload = response
.json::<SuccessResponse<serde_json::Value>>()
.await
.map_err(|err| {
error_response(
StatusCode::BAD_GATEWAY,
"LEADER_PROXY_FAILED",
&format!("failed to decode leader {leader_id} response: {err}"),
)
})?;
return Ok((status, Json(payload)));
}
let status =
StatusCode::from_u16(response.status().as_u16()).unwrap_or(StatusCode::BAD_GATEWAY);
let payload = response
.json::<ErrorResponse>()
.await
.unwrap_or_else(|err| ErrorResponse {
error: ErrorDetail {
code: "LEADER_PROXY_FAILED".to_string(),
message: format!("leader {leader_id} returned {status}: {err}"),
details: None,
},
meta: ResponseMeta::new(),
});
Err((status, Json(payload)))
}
async fn should_proxy_read(consistency: Option<&str>, state: &RestApiState) -> bool {
let node_id = state.raft.node_id();
let leader_id = state.raft.leader().await;
@ -517,7 +743,12 @@ fn read_requires_leader_proxy(
node_id: u64,
leader_id: Option<u64>,
) -> bool {
if matches!(consistency, Some(mode) if mode.eq_ignore_ascii_case("local")) {
if matches!(
consistency,
Some(mode)
if mode.eq_ignore_ascii_case("local")
|| mode.eq_ignore_ascii_case("serializable")
) {
return false;
}
matches!(leader_id, Some(leader_id) if leader_id != node_id)
@ -538,13 +769,7 @@ where
"current node is not the leader and no leader is known yet",
)
})?;
let leader_http_addr = state.peer_http_addrs.get(&leader_id).ok_or_else(|| {
error_response(
StatusCode::SERVICE_UNAVAILABLE,
"NOT_LEADER",
&format!("leader {leader_id} is known but has no HTTP endpoint mapping"),
)
})?;
let leader_http_addr = leader_http_addr(state, leader_id).await?;
let url = format!("{}{}", leader_http_addr.trim_end_matches('/'), path);
let mut request = state.http_client.get(&url);
if let Some(query) = query {
@ -591,7 +816,26 @@ mod tests {
fn read_requires_leader_proxy_defaults_to_leader_consistency() {
assert!(read_requires_leader_proxy(None, 2, Some(1)));
assert!(!read_requires_leader_proxy(Some("local"), 2, Some(1)));
assert!(!read_requires_leader_proxy(
Some("serializable"),
2,
Some(1)
));
assert!(!read_requires_leader_proxy(
Some("SERIALIZABLE"),
2,
Some(1)
));
assert!(!read_requires_leader_proxy(None, 2, Some(2)));
assert!(!read_requires_leader_proxy(None, 2, None));
}
#[test]
fn cluster_operation_error_maps_rejected_to_precondition_failed() {
let (status, code, message) =
cluster_operation_error(&RaftError::Rejected("needs sequential reconfigure".into()));
assert_eq!(status, StatusCode::PRECONDITION_FAILED);
assert_eq!(code, "PRECONDITION_FAILED");
assert_eq!(message, "needs sequential reconfigure");
}
}

View file

@ -11,11 +11,10 @@ use crate::rest::{build_router, RestApiState};
use anyhow::Result;
use chainfire_api::internal_proto::raft_service_server::RaftServiceServer;
use chainfire_api::proto::{
cluster_server::ClusterServer, kv_server::KvServer, watch_server::WatchServer, Member,
cluster_server::ClusterServer, kv_server::KvServer, watch_server::WatchServer,
};
use chainfire_api::{ClusterServiceImpl, KvServiceImpl, RaftServiceImpl, WatchServiceImpl};
use chainfire_types::RaftRole;
use std::collections::HashMap;
use std::sync::Arc;
use tokio::signal;
use tonic::transport::{Certificate, Identity, Server as TonicServer, ServerTlsConfig};
@ -94,11 +93,7 @@ impl Server {
raft.node_id(),
);
let cluster_service = ClusterServiceImpl::new(
Arc::clone(&raft),
self.node.cluster_id(),
configured_members(&self.config),
);
let cluster_service = ClusterServiceImpl::new(Arc::clone(&raft), self.node.cluster_id());
// Internal Raft service for inter-node communication
let raft_service = RaftServiceImpl::new(Arc::clone(&raft));
@ -155,24 +150,11 @@ impl Server {
// HTTP REST API server
let http_addr = self.config.network.http_addr;
let http_port = self.config.network.http_addr.port();
let peer_http_addrs = Arc::new(
self.config
.cluster
.initial_members
.iter()
.filter_map(|member| {
http_endpoint_from_raft_addr(&member.raft_addr, http_port)
.map(|http_addr| (member.id, http_addr))
})
.collect::<HashMap<_, _>>(),
);
let rest_state = RestApiState {
raft: Arc::clone(&raft),
cluster_id: self.node.cluster_id(),
rpc_client: self.node.rpc_client().cloned(),
http_client: reqwest::Client::new(),
peer_http_addrs,
http_port: self.config.network.http_addr.port(),
};
let rest_app = build_router(rest_state);
let http_listener = tokio::net::TcpListener::bind(&http_addr).await?;
@ -289,45 +271,3 @@ impl Server {
Ok(())
}
}
fn http_endpoint_from_raft_addr(raft_addr: &str, http_port: u16) -> Option<String> {
if let Ok(addr) = raft_addr.parse::<std::net::SocketAddr>() {
return Some(format!("http://{}:{}", addr.ip(), http_port));
}
let (host, _) = raft_addr.rsplit_once(':')?;
Some(format!("http://{}:{}", host, http_port))
}
fn grpc_endpoint_from_raft_addr(raft_addr: &str, api_port: u16) -> Option<String> {
if let Ok(addr) = raft_addr.parse::<std::net::SocketAddr>() {
return Some(format!("http://{}:{}", addr.ip(), api_port));
}
let (host, _) = raft_addr.rsplit_once(':')?;
Some(format!("http://{}:{}", host, api_port))
}
fn normalize_peer_url(raft_addr: &str) -> String {
if raft_addr.contains("://") {
raft_addr.to_string()
} else {
format!("http://{raft_addr}")
}
}
fn configured_members(config: &ServerConfig) -> Vec<Member> {
let api_port = config.network.api_addr.port();
config
.cluster
.initial_members
.iter()
.map(|member| Member {
id: member.id,
name: format!("node-{}", member.id),
peer_urls: vec![normalize_peer_url(&member.raft_addr)],
client_urls: grpc_endpoint_from_raft_addr(&member.raft_addr, api_port)
.into_iter()
.collect(),
is_learner: false,
})
.collect()
}

View file

@ -44,8 +44,8 @@ pub enum EntryPayload<D> {
Blank,
/// A normal data entry
Normal(D),
/// Membership change entry
Membership(Vec<u64>), // Just node IDs for simplicity
/// Membership change entry encoded as a serialized membership payload.
Membership(Vec<u8>),
}
impl<D> LogEntry<D> {
@ -189,6 +189,35 @@ impl LogStorage {
}
}
/// Save the current serialized membership payload.
pub fn save_membership(&self, membership: &[u8]) -> Result<(), StorageError> {
let cf = self
.store
.cf_handle(cf::META)
.ok_or_else(|| StorageError::RocksDb("META cf not found".into()))?;
self.store
.db()
.put_cf(&cf, crate::meta_keys::MEMBERSHIP, membership)
.map_err(|e| StorageError::RocksDb(e.to_string()))?;
debug!(bytes = membership.len(), "Saved membership payload");
Ok(())
}
/// Read the current serialized membership payload from storage.
pub fn read_membership(&self) -> Result<Option<Vec<u8>>, StorageError> {
let cf = self
.store
.cf_handle(cf::META)
.ok_or_else(|| StorageError::RocksDb("META cf not found".into()))?;
self.store
.db()
.get_cf(&cf, crate::meta_keys::MEMBERSHIP)
.map_err(|e| StorageError::RocksDb(e.to_string()))
}
/// Append log entries
pub fn append<D: Serialize>(&self, entries: &[LogEntry<D>]) -> Result<(), StorageError> {
if entries.is_empty() {

View file

@ -23,14 +23,23 @@ service Watch {
rpc Watch(stream WatchRequest) returns (stream WatchResponse);
}
// Cluster management service for fixed-membership clusters.
// Cluster management service for live membership changes.
service Cluster {
// MemberList lists the members configured at cluster bootstrap time
// MemberAdd adds a member into the cluster.
rpc MemberAdd(MemberAddRequest) returns (MemberAddResponse);
// MemberRemove removes an existing member from the cluster.
rpc MemberRemove(MemberRemoveRequest) returns (MemberRemoveResponse);
// MemberList lists the current committed cluster membership
rpc MemberList(MemberListRequest) returns (MemberListResponse);
// Status gets the status of the cluster
rpc Status(StatusRequest) returns (StatusResponse);
// LeaderTransfer requests a live leadership handoff to a specific voting member.
rpc LeaderTransfer(LeaderTransferRequest) returns (LeaderTransferResponse);
}
// Lease service for TTL-based key expiration
@ -283,6 +292,38 @@ message Member {
bool is_learner = 5;
}
message MemberAddRequest {
// ID is the member ID to add or update
uint64 id = 1;
// name is the human-readable name
string name = 2;
// peer_urls are URLs for Raft communication
repeated string peer_urls = 3;
// client_urls are URLs for client communication
repeated string client_urls = 4;
// is_learner indicates if the member is a learner
bool is_learner = 5;
}
message MemberAddResponse {
ResponseHeader header = 1;
// member is the member information for the added member
Member member = 2;
// members is the list of all members after adding
repeated Member members = 3;
}
message MemberRemoveRequest {
// ID is the member ID to remove
uint64 id = 1;
}
message MemberRemoveResponse {
ResponseHeader header = 1;
// members is the list of all members after removing
repeated Member members = 2;
}
message MemberListRequest {}
message MemberListResponse {
@ -309,6 +350,17 @@ message StatusResponse {
uint64 raft_applied_index = 7;
}
message LeaderTransferRequest {
// target_id is the voting member that should become leader.
uint64 target_id = 1;
}
message LeaderTransferResponse {
ResponseHeader header = 1;
// leader is the member ID of the observed leader after transfer.
uint64 leader = 2;
}
// ========== Lease ==========
message LeaseGrantRequest {

View file

@ -9,6 +9,9 @@ service RaftService {
// AppendEntries sends log entries to followers
rpc AppendEntries(AppendEntriesRequest) returns (AppendEntriesResponse);
// TimeoutNow requests an immediate election on the target voting peer.
rpc TimeoutNow(TimeoutNowRequest) returns (TimeoutNowResponse);
}
message VoteRequest {
@ -47,6 +50,12 @@ message AppendEntriesRequest {
uint64 leader_commit = 6;
}
enum EntryType {
ENTRY_TYPE_BLANK = 0;
ENTRY_TYPE_NORMAL = 1;
ENTRY_TYPE_MEMBERSHIP = 2;
}
message LogEntry {
// index is the log entry index
uint64 index = 1;
@ -54,6 +63,8 @@ message LogEntry {
uint64 term = 2;
// data is the command data
bytes data = 3;
// entry_type identifies how data should be decoded
EntryType entry_type = 4;
}
message AppendEntriesResponse {
@ -66,3 +77,12 @@ message AppendEntriesResponse {
// conflict_term is the term of the conflicting entry
uint64 conflict_term = 4;
}
message TimeoutNowRequest {}
message TimeoutNowResponse {
// accepted is true if the target accepted the immediate-election request.
bool accepted = 1;
// term is the target node's current term after processing the request.
uint64 term = 2;
}

deployer/Cargo.lock generated
View file

@ -364,6 +364,17 @@ dependencies = [
"generic-array",
]
[[package]]
name = "bootstrap-agent"
version = "0.1.0"
dependencies = [
"anyhow",
"clap",
"deployer-types",
"serde",
"serde_json",
]
[[package]]
name = "bumpalo"
version = "3.20.2"

View file

@ -1,6 +1,7 @@
[workspace]
resolver = "2"
members = [
"crates/bootstrap-agent",
"crates/deployer-types",
"crates/deployer-server",
"crates/node-agent",

View file

@ -0,0 +1,16 @@
[package]
name = "bootstrap-agent"
version.workspace = true
edition.workspace = true
rust-version.workspace = true
authors.workspace = true
license.workspace = true
repository.workspace = true
[dependencies]
anyhow.workspace = true
clap.workspace = true
serde.workspace = true
serde_json.workspace = true
deployer-types.workspace = true

View file

@ -0,0 +1,203 @@
use std::collections::HashMap;
use std::fmt::Write as _;
use std::fs;
use std::path::{Path, PathBuf};
use anyhow::{Context, Result};
use clap::{Parser, Subcommand, ValueEnum};
use deployer_types::{DiskSelectorSource, NodeConfig, ResolvedInstallPlan};
#[derive(Parser, Debug)]
#[command(author, version, about)]
struct Cli {
#[command(subcommand)]
command: Command,
}
#[derive(Subcommand, Debug)]
enum Command {
ResolveInstallContext(ResolveInstallContextArgs),
}
#[derive(Parser, Debug)]
struct ResolveInstallContextArgs {
#[arg(long, default_value = "/etc/ultracloud/node-config.json")]
node_config: PathBuf,
#[arg(long, default_value = "/etc/ultracloud/disko-script-paths.json")]
disko_script_paths: PathBuf,
#[arg(long, default_value = "/etc/ultracloud/system-paths.json")]
system_paths: PathBuf,
#[arg(long, value_enum, default_value_t = OutputFormat::Json)]
format: OutputFormat,
#[arg(long)]
write: Option<PathBuf>,
}
#[derive(Clone, Copy, Debug, Eq, PartialEq, ValueEnum)]
enum OutputFormat {
Json,
Env,
}
fn main() -> Result<()> {
let cli = Cli::parse();
match cli.command {
Command::ResolveInstallContext(args) => resolve_install_context(args),
}
}
fn resolve_install_context(args: ResolveInstallContextArgs) -> Result<()> {
let node_config = read_json::<NodeConfig>(&args.node_config)
.with_context(|| format!("failed to read node config from {}", args.node_config.display()))?;
let disko_script_paths = read_optional_path_map(&args.disko_script_paths)?;
let system_paths = read_optional_path_map(&args.system_paths)?;
let resolved = node_config.resolve_install_plan(
disko_script_paths.as_ref(),
system_paths.as_ref(),
)?;
let rendered = match args.format {
OutputFormat::Json => serde_json::to_string_pretty(&resolved)?,
OutputFormat::Env => render_env_file(&resolved),
};
if let Some(path) = args.write {
if let Some(parent) = path.parent() {
fs::create_dir_all(parent).with_context(|| {
format!("failed to create parent directory for {}", path.display())
})?;
}
fs::write(&path, rendered)
.with_context(|| format!("failed to write {}", path.display()))?;
} else {
print!("{rendered}");
if !rendered.ends_with('\n') {
println!();
}
}
Ok(())
}
fn read_json<T>(path: &Path) -> Result<T>
where
T: serde::de::DeserializeOwned,
{
let raw = fs::read_to_string(path)
.with_context(|| format!("failed to read {}", path.display()))?;
serde_json::from_str(&raw)
.with_context(|| format!("failed to parse {}", path.display()))
}
fn read_optional_path_map(path: &Path) -> Result<Option<HashMap<String, String>>> {
if !path.exists() {
return Ok(None);
}
let map = read_json::<HashMap<String, String>>(path)?;
let sanitized = map
.into_iter()
.filter_map(|(key, value)| {
let trimmed_key = key.trim();
let trimmed_value = value.trim();
if trimmed_key.is_empty() || trimmed_value.is_empty() {
None
} else {
Some((trimmed_key.to_string(), trimmed_value.to_string()))
}
})
.collect::<HashMap<_, _>>();
Ok(Some(sanitized))
}
fn render_env_file(resolved: &ResolvedInstallPlan) -> String {
let mut rendered = String::new();
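    // installer_node_name() returns the hostname, so the rendered NODE_ID carries the
    // installer-facing node name rather than assignment.node_id (see the test below).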
let node_marker_id = resolved.installer_node_name();
let display_target_disk = resolved.display_target_disk().unwrap_or_default();
let disk_selector_source = match resolved.disk_selector_source {
DiskSelectorSource::AutoDiscovery => "auto-discovery",
DiskSelectorSource::InstallPlanTargetDisk => "install_plan.target_disk",
DiskSelectorSource::InstallPlanTargetDiskById => "install_plan.target_disk_by_id",
};
for (key, value) in [
("NODE_ID", node_marker_id),
("NODE_IP", resolved.ip.as_str()),
("NIXOS_CONFIGURATION", resolved.nixos_configuration.as_str()),
(
"INSTALL_PLAN_DISKO_CONFIG_PATH",
resolved.disko_config_path.as_deref().unwrap_or(""),
),
(
"DISKO_SCRIPT_PATH",
resolved.disko_script_path.as_deref().unwrap_or(""),
),
(
"TARGET_SYSTEM_PATH",
resolved.target_system_path.as_deref().unwrap_or(""),
),
("TARGET_DISK", resolved.target_disk.as_deref().unwrap_or("")),
(
"TARGET_DISK_BY_ID",
resolved.target_disk_by_id.as_deref().unwrap_or(""),
),
("DISPLAY_TARGET_DISK", display_target_disk),
("DISK_SELECTOR_SOURCE", disk_selector_source),
] {
writeln!(
rendered,
"{key}={}",
systemd_environment_quote(value)
)
.expect("writing to String should never fail");
}
rendered
}
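// Quote a value for EnvironmentFile= consumption: wrap it in double quotes and escape
// backslashes, quotes, newlines, and tabs so multi-word values survive systemd parsing.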
fn systemd_environment_quote(value: &str) -> String {
let mut quoted = String::with_capacity(value.len() + 2);
quoted.push('"');
for ch in value.chars() {
match ch {
'\\' => quoted.push_str("\\\\"),
'"' => quoted.push_str("\\\""),
'\n' => quoted.push_str("\\n"),
'\t' => quoted.push_str("\\t"),
_ => quoted.push(ch),
}
}
quoted.push('"');
quoted
}
#[cfg(test)]
mod tests {
use super::render_env_file;
use deployer_types::{DiskSelectorSource, ResolvedInstallPlan};
#[test]
fn env_rendering_quotes_values_for_environment_file_consumers() {
let rendered = render_env_file(&ResolvedInstallPlan {
node_id: "node-01".to_string(),
hostname: "node 01".to_string(),
ip: "10.0.0.10".to_string(),
nixos_configuration: "profile with spaces".to_string(),
disko_config_path: Some("profiles/worker/disko.nix".to_string()),
disko_script_path: Some("/nix/store/example script".to_string()),
target_system_path: Some("/nix/store/example-system".to_string()),
target_disk: Some("/dev/vda".to_string()),
target_disk_by_id: None,
disk_selector_source: DiskSelectorSource::InstallPlanTargetDisk,
});
assert!(rendered.contains("NODE_ID=\"node 01\""));
assert!(rendered.contains("NIXOS_CONFIGURATION=\"profile with spaces\""));
assert!(rendered.contains("DISK_SELECTOR_SOURCE=\"install_plan.target_disk\""));
assert!(rendered.contains("DISPLAY_TARGET_DISK=\"/dev/vda\""));
}
}
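A hedged usage sketch for this CLI, assuming it is run by hand on the ISO: the flags and default paths come from the argument definitions above, and the rendered keys come from `render_env_file`, but the concrete values shown are illustrative only.
bootstrap-agent resolve-install-context \
  --node-config /etc/ultracloud/node-config.json \
  --disko-script-paths /etc/ultracloud/disko-script-paths.json \
  --system-paths /etc/ultracloud/system-paths.json \
  --format env \
  --write /run/ultracloud/install-contract.env
# Illustrative shape of the written file (subset of keys):
# NODE_ID="node01"
# NODE_IP="10.0.0.10"
# NIXOS_CONFIGURATION="worker-profile"
# DISKO_SCRIPT_PATH="/nix/store/example-disko-script"
# TARGET_SYSTEM_PATH="/nix/store/example-system"
# TARGET_DISK_BY_ID="/dev/disk/by-id/worker-root"
# DISK_SELECTOR_SOURCE="install_plan.target_disk_by_id"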

View file

@ -1550,6 +1550,8 @@ mod tests {
install_plan: Some(InstallPlan {
nixos_configuration: Some("worker-golden".to_string()),
disko_config_path: Some("profiles/worker-linux/disko.nix".to_string()),
disko_script_path: None,
target_system_path: None,
target_disk: Some("/dev/disk/by-id/worker-golden".to_string()),
target_disk_by_id: None,
}),

View file

@ -139,6 +139,8 @@ mod tests {
install_plan: Some(InstallPlan {
nixos_configuration: Some("worker-golden".to_string()),
disko_config_path: Some("profiles/worker/disko.nix".to_string()),
disko_script_path: None,
target_system_path: None,
target_disk: Some("/dev/vda".to_string()),
target_disk_by_id: None,
}),

View file

@ -1064,6 +1064,8 @@ mod tests {
install_plan: Some(InstallPlan {
nixos_configuration: Some("gpu-worker".to_string()),
disko_config_path: Some("profiles/gpu-worker/disko.nix".to_string()),
disko_script_path: None,
target_system_path: None,
target_disk: Some("/dev/disk/by-id/nvme-gpu-worker".to_string()),
target_disk_by_id: None,
}),

View file

@ -111,6 +111,12 @@ pub struct InstallPlan {
/// Repository-relative Disko file used during installation.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub disko_config_path: Option<String>,
/// Pre-built Disko formatter closure used instead of evaluating a flake on the ISO.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub disko_script_path: Option<String>,
/// Pre-built NixOS system closure installed directly by `nixos-install --system`.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub target_system_path: Option<String>,
/// Explicit disk device path used by bootstrap installers.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub target_disk: Option<String>,
@ -128,6 +134,12 @@ impl InstallPlan {
if self.disko_config_path.is_some() {
merged.disko_config_path = self.disko_config_path.clone();
}
if self.disko_script_path.is_some() {
merged.disko_script_path = self.disko_script_path.clone();
}
if self.target_system_path.is_some() {
merged.target_system_path = self.target_system_path.clone();
}
if self.target_disk.is_some() {
merged.target_disk = self.target_disk.clone();
}
@ -149,6 +161,66 @@ impl InstallPlan {
}
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum DiskSelectorSource {
AutoDiscovery,
InstallPlanTargetDisk,
InstallPlanTargetDiskById,
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct ResolvedInstallPlan {
pub node_id: String,
pub hostname: String,
pub ip: String,
pub nixos_configuration: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub disko_config_path: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub disko_script_path: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub target_system_path: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub target_disk: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub target_disk_by_id: Option<String>,
pub disk_selector_source: DiskSelectorSource,
}
impl ResolvedInstallPlan {
pub fn installer_node_name(&self) -> &str {
&self.hostname
}
pub fn display_target_disk(&self) -> Option<&str> {
self.target_disk_by_id
.as_deref()
.or(self.target_disk.as_deref())
}
}
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum InstallPlanResolveError {
MissingNodeId,
MissingNodeIp,
}
impl std::fmt::Display for InstallPlanResolveError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
InstallPlanResolveError::MissingNodeId => {
write!(f, "node_config assignment is missing node_id/hostname")
}
InstallPlanResolveError::MissingNodeIp => {
write!(f, "node_config assignment is missing ip")
}
}
}
}
impl std::error::Error for InstallPlanResolveError {}
/// Stable node assignment returned by bootstrap enrollment.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq, Default)]
pub struct NodeAssignment {
@ -212,6 +284,91 @@ impl NodeConfig {
bootstrap_secrets,
}
}
pub fn resolve_install_plan(
&self,
disko_script_paths: Option<&HashMap<String, String>>,
system_paths: Option<&HashMap<String, String>>,
) -> Result<ResolvedInstallPlan, InstallPlanResolveError> {
let node_id = non_empty(self.assignment.node_id.as_str())
.map(str::to_string)
.or_else(|| non_empty(self.assignment.hostname.as_str()).map(str::to_string))
.ok_or(InstallPlanResolveError::MissingNodeId)?;
let hostname = non_empty(self.assignment.hostname.as_str())
.unwrap_or(node_id.as_str())
.to_string();
let ip = non_empty(self.assignment.ip.as_str())
.map(str::to_string)
.ok_or(InstallPlanResolveError::MissingNodeIp)?;
let install_plan = self.bootstrap_plan.install_plan.as_ref();
let nixos_configuration = install_plan
.and_then(|plan| plan.nixos_configuration.as_deref())
.and_then(non_empty)
.unwrap_or(hostname.as_str())
.to_string();
let disko_config_path = install_plan
.and_then(|plan| plan.disko_config_path.as_deref())
.and_then(non_empty)
.map(str::to_string);
let disko_script_path = install_plan
.and_then(|plan| plan.disko_script_path.as_deref())
.and_then(non_empty)
.map(str::to_string)
.or_else(|| lookup_path_map(disko_script_paths, &nixos_configuration));
let target_system_path = install_plan
.and_then(|plan| plan.target_system_path.as_deref())
.and_then(non_empty)
.map(str::to_string)
.or_else(|| lookup_path_map(system_paths, &nixos_configuration));
let target_disk = install_plan
.and_then(|plan| plan.target_disk.as_deref())
.and_then(non_empty)
.map(str::to_string);
let target_disk_by_id = install_plan
.and_then(|plan| plan.target_disk_by_id.as_deref())
.and_then(non_empty)
.map(str::to_string);
let disk_selector_source = if target_disk_by_id.is_some() {
DiskSelectorSource::InstallPlanTargetDiskById
} else if target_disk.is_some() {
DiskSelectorSource::InstallPlanTargetDisk
} else {
DiskSelectorSource::AutoDiscovery
};
Ok(ResolvedInstallPlan {
node_id,
hostname,
ip,
nixos_configuration,
disko_config_path,
disko_script_path,
target_system_path,
target_disk,
target_disk_by_id,
disk_selector_source,
})
}
}
fn non_empty(value: &str) -> Option<&str> {
let trimmed = value.trim();
if trimmed.is_empty() {
None
} else {
Some(trimmed)
}
}
fn lookup_path_map(
paths: Option<&HashMap<String, String>>,
nixos_configuration: &str,
) -> Option<String> {
paths
.and_then(|entries| entries.get(nixos_configuration))
.map(String::as_str)
.and_then(non_empty)
.map(str::to_string)
}
/// Basic inventory record for a physical disk observed during commissioning.
@ -1512,6 +1669,8 @@ mod tests {
install_plan: Some(InstallPlan {
nixos_configuration: Some("node01".to_string()),
disko_config_path: Some("nix/nodes/vm-cluster/node01/disko.nix".to_string()),
disko_script_path: None,
target_system_path: None,
target_disk: Some("/dev/vda".to_string()),
target_disk_by_id: None,
}),
@ -1572,6 +1731,8 @@ mod tests {
install_plan: Some(InstallPlan {
nixos_configuration: Some("worker-linux".to_string()),
disko_config_path: Some("profiles/worker-linux/disko.nix".to_string()),
disko_script_path: None,
target_system_path: None,
target_disk: None,
target_disk_by_id: Some("/dev/disk/by-id/worker-default".to_string()),
}),
@ -2064,22 +2225,176 @@ mod tests {
let fallback = InstallPlan {
nixos_configuration: Some("fallback".to_string()),
disko_config_path: Some("fallback/disko.nix".to_string()),
disko_script_path: Some("/nix/store/fallback-disko".to_string()),
target_system_path: Some("/nix/store/fallback-system".to_string()),
target_disk: Some("/dev/sda".to_string()),
target_disk_by_id: None,
};
let preferred = InstallPlan {
nixos_configuration: None,
disko_config_path: None,
disko_script_path: None,
target_system_path: None,
target_disk: None,
target_disk_by_id: Some("/dev/disk/by-id/nvme-example".to_string()),
};
let merged = preferred.merged_with(Some(&fallback));
assert_eq!(merged.nixos_configuration.as_deref(), Some("fallback"));
assert_eq!(
merged.disko_script_path.as_deref(),
Some("/nix/store/fallback-disko")
);
assert_eq!(
merged.target_system_path.as_deref(),
Some("/nix/store/fallback-system")
);
assert_eq!(merged.target_disk.as_deref(), Some("/dev/sda"));
assert_eq!(
merged.target_disk_by_id.as_deref(),
Some("/dev/disk/by-id/nvme-example")
);
}
#[test]
fn test_node_config_resolves_install_plan_from_profile_maps() {
let config = NodeConfig::from_parts(
NodeAssignment {
node_id: "node01".to_string(),
hostname: "node01.example".to_string(),
role: "worker".to_string(),
ip: "10.0.0.10".to_string(),
labels: HashMap::new(),
pool: None,
node_class: None,
failure_domain: None,
},
BootstrapPlan {
services: vec![],
nix_profile: None,
install_plan: Some(InstallPlan {
nixos_configuration: Some("worker-profile".to_string()),
disko_config_path: Some("profiles/worker/disko.nix".to_string()),
disko_script_path: None,
target_system_path: None,
target_disk: None,
target_disk_by_id: Some("/dev/disk/by-id/worker-root".to_string()),
}),
},
BootstrapSecrets::default(),
);
let resolved = config
.resolve_install_plan(
Some(&HashMap::from([(
"worker-profile".to_string(),
"/nix/store/worker-disko".to_string(),
)])),
Some(&HashMap::from([(
"worker-profile".to_string(),
"/nix/store/worker-system".to_string(),
)])),
)
.expect("install contract should resolve");
assert_eq!(resolved.node_id, "node01");
assert_eq!(resolved.hostname, "node01.example");
assert_eq!(resolved.nixos_configuration, "worker-profile");
assert_eq!(
resolved.disko_script_path.as_deref(),
Some("/nix/store/worker-disko")
);
assert_eq!(
resolved.target_system_path.as_deref(),
Some("/nix/store/worker-system")
);
assert_eq!(
resolved.target_disk_by_id.as_deref(),
Some("/dev/disk/by-id/worker-root")
);
assert_eq!(
resolved.disk_selector_source,
DiskSelectorSource::InstallPlanTargetDiskById
);
}
#[test]
fn test_node_config_prefers_direct_install_artifacts() {
let config = NodeConfig::from_parts(
NodeAssignment {
node_id: "node02".to_string(),
hostname: "node02".to_string(),
role: "control-plane".to_string(),
ip: "10.0.0.11".to_string(),
labels: HashMap::new(),
pool: None,
node_class: None,
failure_domain: None,
},
BootstrapPlan {
services: vec![],
nix_profile: None,
install_plan: Some(InstallPlan {
nixos_configuration: None,
disko_config_path: None,
disko_script_path: Some("/nix/store/direct-disko".to_string()),
target_system_path: Some("/nix/store/direct-system".to_string()),
target_disk: Some("/dev/vda".to_string()),
target_disk_by_id: None,
}),
},
BootstrapSecrets::default(),
);
let resolved = config
.resolve_install_plan(
Some(&HashMap::from([(
"node02".to_string(),
"/nix/store/fallback-disko".to_string(),
)])),
Some(&HashMap::from([(
"node02".to_string(),
"/nix/store/fallback-system".to_string(),
)])),
)
.expect("install contract should resolve");
assert_eq!(resolved.nixos_configuration, "node02");
assert_eq!(
resolved.disko_script_path.as_deref(),
Some("/nix/store/direct-disko")
);
assert_eq!(
resolved.target_system_path.as_deref(),
Some("/nix/store/direct-system")
);
assert_eq!(resolved.target_disk.as_deref(), Some("/dev/vda"));
assert_eq!(
resolved.disk_selector_source,
DiskSelectorSource::InstallPlanTargetDisk
);
}
#[test]
fn test_node_config_resolve_install_plan_requires_ip() {
let config = NodeConfig::from_parts(
NodeAssignment {
node_id: "node03".to_string(),
hostname: "node03".to_string(),
role: "worker".to_string(),
ip: "".to_string(),
labels: HashMap::new(),
pool: None,
node_class: None,
failure_domain: None,
},
BootstrapPlan::default(),
BootstrapSecrets::default(),
);
let error = config
.resolve_install_plan(None, None)
.expect_err("missing ip should fail resolution");
assert_eq!(error, InstallPlanResolveError::MissingNodeIp);
}
}
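A minimal sketch of the precedence `resolve_install_plan` encodes, using an abridged node-config fragment and illustrative store paths; the field names follow the serde derivations above, but the exact on-disk shape of a full `node-config.json` is assumed here.
# Abridged input (only the fields that drive resolution):
# {
#   "assignment": { "node_id": "node01", "hostname": "node01.example", "ip": "10.0.0.10" },
#   "bootstrap_plan": { "install_plan": {
#       "nixos_configuration": "worker-profile",
#       "target_disk_by_id": "/dev/disk/by-id/worker-root" } }
# }
bootstrap-agent resolve-install-context --format json --node-config ./node-config.json
# => disko_script_path / target_system_path come from install_plan when set there,
#    otherwise from the ISO profile maps keyed by nixos_configuration;
#    target_disk_by_id beats target_disk, so disk_selector_source becomes
#    "install_plan_target_disk_by_id"; with neither disk field set it falls back
#    to "auto_discovery".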

View file

@ -776,6 +776,8 @@ mod tests {
install_plan: Some(InstallPlan {
nixos_configuration: Some("node01".to_string()),
disko_config_path: Some("nix/nodes/vm-cluster/node01/disko.nix".to_string()),
disko_script_path: None,
target_system_path: None,
target_disk: Some("/dev/vda".to_string()),
target_disk_by_id: None,
}),

View file

@ -44,7 +44,7 @@ This directory is the public documentation entrypoint for UltraCloud.
## Core API Notes
- `chainfire` supports fixed-membership cluster introspection on the public surface: `MemberList`, `Status`, and the internal `Vote` plus `AppendEntries` Raft transport. `chainfire-core` remains a workspace-internal compatibility crate rather than a supported embeddable API.
- `chainfire` supports live cluster membership management on the public surface: `MemberAdd`, `MemberRemove`, `MemberList`, `Status`, `LeaderTransfer`, and the internal `Vote`, `AppendEntries`, plus `TimeoutNow` Raft transport. The supported operator flow now includes learner add or promote, live leader transfer, temporary-voter restart and rejoin, and current-leader removal followed by election on the remaining voters. The supported reconfiguration boundary is sequential one-voter transitions until joint consensus exists. `chainfire-core` remains a workspace-internal compatibility crate rather than a supported embeddable API.
- `flaredb` supports SQL over both gRPC and REST. The public REST endpoints are `POST /api/v1/sql` and `GET /api/v1/tables`.
- `lightningstor` keeps bucket versioning, bucket policy, bucket tagging, and explicit object version listing on the supported optional surface.
- `k8shost` keeps `WatchPods` on the supported surface as a bounded snapshot stream of the current matching pods.

View file

@ -4,24 +4,28 @@ This document fixes the supported operator lifecycle for the core control-plane
## ChainFire Membership And Node Replacement
ChainFire dynamic membership, replace-node, and scale-out are unsupported on the supported surface.
ChainFire supports live membership add, remove, learner promotion, endpoint replacement, and leader transfer on the supported surface.
The supported reconfiguration boundary is sequential one-voter transitions; arbitrary multi-voter swaps still require future joint-consensus work.
The supported public surface is the fixed-membership cluster API already documented in `chainfire-api`: `MemberList` and `Status` report the membership that the node booted with, and operators should treat that membership as immutable for a release branch.
The supported public surface is the replicated cluster API documented in `chainfire-api`: `MemberAdd`, `MemberRemove`, `MemberList`, `Status`, and `LeaderTransfer` operate on the current committed membership rather than only the bootstrap shape.
Supported operator actions today:
1. Keep the canonical control plane at the documented fixed membership for the branch.
2. Use the canonical `durability-proof` backup/restore lane before disruptive maintenance.
3. Use `nix run ./nix/test-cluster#cluster -- rollout-soak` when you need a longer-running fixed-membership restart proof after maintenance or rollout work.
4. Recover failed nodes by restoring the same fixed-membership cluster shape or by rebuilding the whole cluster with a freshly published static membership and then restoring data.
1. Scale out by adding a learner or voter with `MemberAdd`.
2. Promote a learner to voter by re-adding the same member ID with `is_learner=false`.
3. Replace a learner, follower, voter, or current-leader endpoint in place by re-adding the same member ID with updated peer or client URLs.
4. Hand leadership to another live voting member with `LeaderTransfer` before maintenance that should avoid the current leader taking the election hit.
5. Scale in or retire a learner, follower, voter, or current leader with `MemberRemove`; when the current leader is removed, the remaining voters elect the replacement leader.
6. Use the canonical `durability-proof` backup/restore lane before disruptive maintenance or before a membership change you cannot quickly roll back.
7. Use `nix run ./nix/test-cluster#cluster -- chainfire-live-membership-proof` when you need the dedicated KVM proof for scale-out, learner promotion, leader transfer, temporary-voter restart, current-leader removal, re-add, and scale-in on the canonical control-plane shape.
8. Use `nix run ./nix/test-cluster#cluster -- rollout-soak` when you need the longer-running restart and degraded-service proof for the canonical control-plane shape after maintenance or rollout work.
Unsupported operator actions today:
1. Live `replace-node` through a public ChainFire API.
2. Live `scale-out` by adding new voters on the supported surface.
3. Relying on internal membership helpers as a published product contract.
1. Treating internal Raft helpers outside `chainfire-api` and `chainfire-server` as the supported operator contract.
2. Treating larger-cluster, hardware, or arbitrary-topology live reconfiguration beyond the canonical KVM proof lane as release-proven. The current proof is fixed to the canonical 3-node control plane plus one temporary `node04` replica.
The focused boundary proof is `./nix/test-cluster/run-core-control-plane-ops-proof.sh`, which records the fixed-membership source marker from `chainfire-api` and the public docs markers under `./work/core-control-plane-ops-proof`. The live-operations companion is `nix run ./nix/test-cluster#cluster -- rollout-soak`, which on 2026-04-10 recorded `chainfire-post-restart-put.json`, `chainfire-post-restart.json`, and `post-control-plane-restarts.json` under `./work/rollout-soak/20260410T164549+0900` after repeated maintenance and worker power-loss, without promoting dynamic membership to supported scope.
The focused boundary proof is `./nix/test-cluster/run-core-control-plane-ops-proof.sh`, which records the published ChainFire API surface and the public docs markers under `./work/core-control-plane-ops-proof`. The dedicated live-membership KVM proof is `nix run ./nix/test-cluster#cluster -- chainfire-live-membership-proof`, which records learner add, voter promotion, live leader transfer, temporary-voter restart, current-leader removal, removed-leader re-add, and final scale-in artifacts under `./work/chainfire-live-membership-proof`. The live-operations restart companion remains `nix run ./nix/test-cluster#cluster -- rollout-soak`, which on 2026-04-10 recorded `chainfire-post-restart-put.json`, `chainfire-post-restart.json`, and `post-control-plane-restarts.json` under `./work/rollout-soak/20260410T164549+0900` after repeated maintenance and worker power-loss for the canonical 3-node control-plane shape.
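A hedged grpcurl sketch for exercising this surface against a tunneled member endpoint (the ports and proto paths match the KVM proof harness): `MemberList` and `Status` on `chainfire.v1.Cluster` appear verbatim in the harness, while the `MemberAdd` and `LeaderTransfer` method names and request fields below are assumptions inferred from the documented API rather than confirmed proto signatures.
grpcurl -plaintext -import-path chainfire/proto -proto chainfire/proto/chainfire.proto \
  127.0.0.1:12379 chainfire.v1.Cluster/MemberList
grpcurl -plaintext -import-path chainfire/proto -proto chainfire/proto/chainfire.proto \
  127.0.0.1:12379 chainfire.v1.Cluster/Status
# Assumed request shapes for the reconfiguration calls (field names illustrative):
grpcurl -plaintext -import-path chainfire/proto -proto chainfire/proto/chainfire.proto \
  -d '{"id": 4, "peer_urls": ["http://10.100.0.21:2380"], "is_learner": true}' \
  127.0.0.1:12379 chainfire.v1.Cluster/MemberAdd
grpcurl -plaintext -import-path chainfire/proto -proto chainfire/proto/chainfire.proto \
  -d '{"target_id": 2}' 127.0.0.1:12379 chainfire.v1.Cluster/LeaderTransfer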
## FlareDB Online Migration And Schema Evolution

View file

@ -27,7 +27,7 @@ The supported layering is still `deployer -> nix-agent` for host OS rollout and
- `nix run ./nix/test-cluster#cluster -- rollout-soak`
- `nix run ./nix/test-cluster#cluster -- durability-proof`
`deployer-vm-rollback` is the smallest reproducible proof for the `nix-agent` health-check and rollback path. `fresh-smoke` and `fleet-scheduler-e2e` keep the short regression semantics green. `rollout-soak` is the longer-running KVM operator lane for one planned drain cycle, one fail-stop worker-loss cycle, and service-restart behavior across `deployer`, `fleet-scheduler`, `node-agent`, and the fixed-membership control plane. It writes `scope-fixed-contract.json`, `deployer-scope-fixed.txt`, and `fleet-scheduler-scope-fixed.txt` so the release boundary is captured in the proof root instead of being implied only by docs. The steady-state `nix/test-cluster` nodes record explicit `nix-agent` scope markers instead of pretending they run `nix-agent.service`. `durability-proof` remains the canonical persisted artifact lane for `deployer` backup, restart, replay, and storage-side failure injection.
`deployer-vm-rollback` is the smallest reproducible proof for the `nix-agent` health-check and rollback path. `fresh-smoke` and `fleet-scheduler-e2e` keep the short regression semantics green. `rollout-soak` is the longer-running KVM operator lane for one planned drain cycle, one fail-stop worker-loss cycle, and service-restart behavior across `deployer`, `fleet-scheduler`, `node-agent`, and the canonical 3-node control plane. It writes `scope-fixed-contract.json`, `deployer-scope-fixed.txt`, and `fleet-scheduler-scope-fixed.txt` so the release boundary is captured in the proof root instead of being implied only by docs. The steady-state `nix/test-cluster` nodes record explicit `nix-agent` scope markers instead of pretending they run `nix-agent.service`. `durability-proof` remains the canonical persisted artifact lane for `deployer` backup, restart, replay, and storage-side failure injection.
## Deployer HA And DR

View file

@ -145,6 +145,7 @@ nix run ./nix/test-cluster#cluster -- baremetal-iso
nix run ./nix/test-cluster#cluster -- fresh-smoke
nix run ./nix/test-cluster#cluster -- fresh-demo-vm-webapp
nix run ./nix/test-cluster#cluster -- fresh-matrix
nix run ./nix/test-cluster#cluster -- chainfire-live-membership-proof
nix run ./nix/test-cluster#cluster -- provider-vm-reality-proof
nix run ./nix/test-cluster#cluster -- rollout-soak
./nix/test-cluster/run-publishable-kvm-suite.sh ./work/publishable-kvm-suite
@ -163,10 +164,11 @@ Use these commands as the release-facing local proof set:
- `fresh-smoke` also proves the supported PlasmaVMC backend contract by requiring both worker registrations to advertise `HYPERVISOR_TYPE_KVM` and nothing broader on the public surface
- `fresh-demo-vm-webapp`: optional VM-hosting bundle proof for `plasmavmc + prismnet` with state persisted through `lightningstor`
- `fresh-matrix`: optional composition proof for provider bundles such as `prismnet + flashdns + fiberlb` and `plasmavmc + coronafs + lightningstor`, including PrismNet security-group ACL add/remove, FiberLB TCP plus TLS-terminated `Https` / `TerminatedHttps` listeners, LightningStor bucket metadata plus object-version APIs, the published `k8shost` pod-watch surface, and the KVM-only PlasmaVMC worker contract
- `chainfire-live-membership-proof`: focused local-KVM ChainFire lane that starts from the canonical 3-node control plane, adds a temporary learner on `node04`, promotes it to voter, transfers leadership to another live voter, restarts the temporary voter, removes the current leader, re-adds the removed leader, and scales back into the canonical 3-node shape while proving local serializable reads through each transition
- `provider-vm-reality-proof`: focused local-KVM provider and VM-hosting lane that writes dated artifacts under `./work/provider-vm-reality-proof/latest`, captures authoritative FlashDNS answers, FiberLB backend drain and re-convergence, and PlasmaVMC KVM shared-storage migration plus post-migration restart state
- `rollout-soak`: focused longer-run control-plane and rollout lane that rebuilds from clean local runtime state, writes dated artifacts under `./work/rollout-soak/latest`, repeats `draining` maintenance and worker power-loss, then restarts `deployer`, `fleet-scheduler`, `node-agent`, `chainfire`, and `flaredb` while recording explicit `nix-agent` scope markers for the steady-state KVM nodes
- `durability-proof`: canonical chainfire flaredb deployer backup/restore lane. It stores artifacts under `./work/durability-proof/latest`, proves logical backup/restore for ChainFire keys and FlareDB SQL rows, uses the canonical Deployer admin pre-register request itself as the backup artifact, verifies that the pre-registered node survives a `deployer.service` restart, replays the same request idempotently, and injects CoronaFS plus LightningStor failures on the live KVM cluster
- `run-publishable-kvm-suite.sh`: reproducible wrapper that captures the KVM environment, requires real `/dev/kvm` access, keeps runtime state under `./work` by default, and runs the full publishable nested-KVM trio in a single command
- `run-publishable-kvm-suite.sh`: reproducible wrapper that captures the KVM environment, requires real `/dev/kvm` access, keeps runtime state under `./work` by default, and runs the publishable nested-KVM application lanes plus the focused ChainFire live-membership proof in a single command
- `run-supported-surface-final-proof.sh`: one-shot local wrapper that keeps builders local, records environment metadata, builds `single-node-trial-vm`, runs `supported-surface-guard`, `single-node-quickstart`, and then the publishable nested-KVM suite into one dated log root
- `baremetal-iso-e2e`: materialized exact proof runner for the same canonical ISO harness; the build output keeps the attr stable, and `./result/bin/baremetal-iso-e2e` runs the real host-KVM proof with persisted log/meta
- `deployer-vm-smoke`: lightweight regression proving that `nix-agent` can activate a host-built target closure without guest-side compilation
@ -186,8 +188,9 @@ The 2026-04-10 exact bare-metal check-runner proof is recorded under `./work/bar
- `portable-control-plane-regressions` keeps the main non-KVM-safe boundaries under continuous coverage by composing `deployer-bootstrap-e2e`, `host-lifecycle-e2e`, `deployer-vm-smoke`, and `fleet-scheduler-e2e` behind the canonical profile eval guard.
- `fresh-smoke` and `fresh-matrix` are the canonical proof for `deployer -> fleet-scheduler -> node-agent`. They cover native service placement, heartbeats, failover, and runtime reconciliation.
- `fresh-smoke` proves the supported `fleet-scheduler` maintenance semantics: short-lived `active -> draining -> active` transitions, fail-stop worker loss, and replica restoration after the node returns.
- `chainfire-live-membership-proof` is the canonical KVM proof for ChainFire live reconfiguration on the supported surface. It covers learner add, local replica catch-up, voter promotion, live leader transfer, temporary-voter restart and rejoin, current-leader removal, removed-leader re-add, and final scale-in on the canonical control-plane shape.
- `rollout-soak` is the longer-running companion lane for the same bundle. It validates exactly one planned drain cycle and one fail-stop worker-loss cycle on the two native-runtime workers, holds each degraded state for 30 seconds, restarts `deployer`, `fleet-scheduler`, `node-agent`, `chainfire`, and `flaredb`, and then revalidates the live cluster. It also writes `scope-fixed-contract.json`, `deployer-scope-fixed.txt`, and `fleet-scheduler-scope-fixed.txt` so the supported release boundary is captured in the proof root. The steady-state KVM nodes do not ship `nix-agent.service`, so the lane records scope markers there and leaves executable `nix-agent` proof to `deployer-vm-rollback`, `baremetal-iso`, and `baremetal-iso-e2e`.
- Multi-hour maintenance windows, pinned singleton relocation rules, dynamic ChainFire membership changes, destructive FlareDB schema rewrites, fully automated online migration, and large-cluster drain storms remain outside the release-proven scope and are called out explicitly in [rollout-bundle.md](rollout-bundle.md) and [control-plane-ops.md](control-plane-ops.md).
- Multi-hour maintenance windows, arbitrary multi-voter ChainFire swaps that still need joint consensus, larger-cluster or hardware ChainFire live membership reconfiguration beyond the canonical KVM proof lane, destructive FlareDB schema rewrites, fully automated online migration, and large-cluster drain storms remain outside the release-proven scope and are called out explicitly in [rollout-bundle.md](rollout-bundle.md) and [control-plane-ops.md](control-plane-ops.md).
- `fresh-smoke` also covers `k8shost` separately from `fleet-scheduler`: `k8shost` exposes tenant pod and service semantics, while `fleet-scheduler` handles bare-metal host services. `k8shost` is fixed as an API/control-plane product surface; runtime dataplane helpers stay archived non-product.
- `fresh-matrix` keeps the shipped add-on surface honest: it exercises the supported `creditservice` quota, wallet, reservation, and API-gateway flows, the published `k8shost-server` API contract, the supported LightningStor bucket metadata plus object-version APIs, and the network-provider bundle contract for PrismNet ACL lifecycle plus FiberLB TCP and TLS-terminated listeners.
- `provider-vm-reality-proof` is the artifact-producing companion lane for that same provider or VM-hosting bundle. It records PrismNet port and ACL state, authoritative FlashDNS answers, FiberLB listener drain or restore artifacts, and PlasmaVMC migration or storage-handoff state in one dated proof root.
@ -200,18 +203,18 @@ The 2026-04-10 exact bare-metal check-runner proof is recorded under `./work/bar
- FiberLB HTTPS health checks currently do not verify backend TLS certificates. Supported scope is limited to TCP reachability plus HTTP status for the backend endpoint until CA-aware verification is wired through config, server code, and the canonical harness.
- `durability-proof` is the canonical backup, restore, and failure-injection companion lane for the publishable KVM suite. Use it after `fresh-matrix` when you need persisted artifacts for `chainfire`, `flaredb`, `deployer`, `coronafs`, and `lightningstor`.
- `rollout-soak` is the longer-running maintenance and DR companion lane for the same control-plane and rollout bundle. Use it when a change is supposed to survive the current release boundary of one planned drain cycle, one fail-stop worker-loss cycle, and service-restart churn on the live KVM lab instead of only the short `fresh-smoke` window.
- `run-core-control-plane-ops-proof.sh` is the focused operator lifecycle proof for the core control plane. It records the fixed-membership ChainFire boundary, the FlareDB additive-first migration and destructive-DDL boundary, and the standalone IAM bootstrap hardening plus signing-key, credential, and mTLS rotation proof under `./work/core-control-plane-ops-proof`.
- `run-core-control-plane-ops-proof.sh` is the focused operator lifecycle proof for the core control plane. It records the published ChainFire API boundary, the FlareDB additive-first migration and destructive-DDL boundary, and the standalone IAM bootstrap hardening plus signing-key, credential, and mTLS rotation proof under `./work/core-control-plane-ops-proof`.
- The supported `deployer` HA and DR boundary is scope-fixed to one active writer plus optional cold-standby restore, not automatic multi-instance failover. The canonical runbook is to recover one writer, re-apply `ultracloud.cluster` generated state with `deployer-ctl apply`, replay preserved admin pre-register requests, and then verify state through the admin API or `deployer-ctl node inspect`; the unsupported multi-instance boundary is fixed in [rollout-bundle.md](rollout-bundle.md).
- The supported `node-agent` product contract is also fixed in [rollout-bundle.md](rollout-bundle.md): per-instance logs and pid metadata live under `${stateDir}/pids`, secrets must already exist in the rendered spec or mounted host files, host-path volumes are passed through but not provisioned, and upgrades are replace-and-reconcile operations rather than in-place patching.
- The dated 2026-04-10 proof root for that lane is `./work/durability-proof/20260410T120618+0900`; `result.json` records `success=true`, and the artifact set includes `deployer-post-restart-list.json`, `coronafs-node04-local-state.json`, and `lightningstor-head-during-node05-outage.json`.
- `single-node-quickstart` intentionally excludes `deployer`, `nix-agent`, `node-agent`, and `fleet-scheduler`, so the smallest trial surface stays focused on the VM-platform core instead of mixing rollout and scheduling responsibilities.
The three `fresh-*` VM-cluster commands are the publishable nested-KVM suite. They require a Linux host with `/dev/kvm` and nested virtualization, and the harness stops at preflight by design when that device is absent. `single-node-quickstart` and `baremetal-iso` can still fall back to `TCG` for debugging, but the release-facing `baremetal-iso-e2e` runner now requires host KVM so the exact proof lane matches the shipped hardware proxy route. `deployer-vm-smoke` and `portable-control-plane-regressions` remain the supported non-KVM developer lanes.
The three `fresh-*` VM-cluster commands plus `chainfire-live-membership-proof` make up the publishable nested-KVM suite. They require a Linux host with `/dev/kvm` and nested virtualization, and the harness stops at preflight by design when that device is absent. `single-node-quickstart` and `baremetal-iso` can still fall back to `TCG` for debugging, but the release-facing `baremetal-iso-e2e` runner now requires host KVM so the exact proof lane matches the shipped hardware proxy route. `deployer-vm-smoke` and `portable-control-plane-regressions` remain the supported non-KVM developer lanes.
Release-facing completion now requires both of these to be green on the same branch:
- the canonical bare-metal proof: `nix run ./nix/test-cluster#cluster -- baremetal-iso` plus `nix build .#checks.x86_64-linux.baremetal-iso-e2e` and `./result/bin/baremetal-iso-e2e`
- the publishable nested-KVM suite: `fresh-smoke`, `fresh-demo-vm-webapp`, and `fresh-matrix`, preferably through `./nix/test-cluster/run-publishable-kvm-suite.sh`
- the publishable nested-KVM suite: `fresh-smoke`, `fresh-demo-vm-webapp`, `fresh-matrix`, and `chainfire-live-membership-proof`, preferably through `./nix/test-cluster/run-publishable-kvm-suite.sh`
Focused operator lifecycle proof for the core control plane:

View file

@ -861,6 +861,13 @@
description = "Node bootstrap and phone-home orchestration service";
};
bootstrap-agent = buildRustWorkspace {
name = "bootstrap-agent";
workspaceSubdir = "deployer";
mainCrate = "bootstrap-agent";
description = "Typed bootstrap helper for installer contract resolution";
};
deployer-ctl = buildRustWorkspace {
name = "deployer-ctl";
workspaceSubdir = "deployer";
@ -926,6 +933,7 @@
name = "deployer-workspace";
workspaceSubdir = "deployer";
crates = [
"bootstrap-agent"
"deployer-server"
"deployer-ctl"
"node-agent"
@ -2415,6 +2423,7 @@ EOF
k8shost-server = self.packages.${final.system}.k8shost-server;
deployer-workspace = self.packages.${final.system}.deployer-workspace;
deployer-server = self.packages.${final.system}.deployer-server;
bootstrap-agent = self.packages.${final.system}.bootstrap-agent;
deployer-ctl = self.packages.${final.system}.deployer-ctl;
ultracloud-reconciler = self.packages.${final.system}.ultracloud-reconciler;
ultracloudFlakeBundle = self.packages.${final.system}.ultracloudFlakeBundle;

View file

@ -140,6 +140,7 @@
"deployer/**"
],
"build_packages": [
"bootstrap-agent",
"deployer-server",
"deployer-ctl",
"node-agent",

View file

@ -310,9 +310,8 @@
'';
};
# Auto-install service - partitions disk and runs nixos-install
systemd.services.ultracloud-install = {
description = "UltraCloud Auto-Install to Disk";
systemd.services.ultracloud-install-contract = {
description = "UltraCloud Install Contract Resolution";
wantedBy = [ "multi-user.target" ];
after = [ "ultracloud-bootstrap.service" ];
requires = [ "ultracloud-bootstrap.service" ];
@ -324,6 +323,33 @@
StandardError = "journal+console";
};
script = ''
set -euo pipefail
${pkgs.bootstrap-agent}/bin/bootstrap-agent resolve-install-context \
--node-config /etc/ultracloud/node-config.json \
--disko-script-paths /etc/ultracloud/disko-script-paths.json \
--system-paths /etc/ultracloud/system-paths.json \
--format env \
--write /run/ultracloud/install-contract.env
'';
};
# Auto-install service - partitions disk and runs nixos-install
systemd.services.ultracloud-install = {
description = "UltraCloud Auto-Install to Disk";
wantedBy = [ "multi-user.target" ];
after = [ "ultracloud-install-contract.service" ];
requires = [ "ultracloud-install-contract.service" ];
serviceConfig = {
Type = "oneshot";
RemainAfterExit = true;
StandardOutput = "journal+console";
StandardError = "journal+console";
EnvironmentFile = "/run/ultracloud/install-contract.env";
};
script = ''
set -euo pipefail
export PATH="${pkgs.nix}/bin:${config.system.build.nixos-install}/bin:$PATH"
@ -376,41 +402,16 @@
return 1
}
if [ ! -s /etc/ultracloud/node-config.json ]; then
echo "ERROR: node-config.json missing (bootstrap not complete?)"
exit 1
fi
NODE_ID=$(${pkgs.jq}/bin/jq -r '.assignment.hostname // .assignment.node_id // empty' /etc/ultracloud/node-config.json)
NODE_IP=$(${pkgs.jq}/bin/jq -r '.assignment.ip // empty' /etc/ultracloud/node-config.json)
NIXOS_CONFIGURATION=$(${pkgs.jq}/bin/jq -r '.bootstrap_plan.install_plan.nixos_configuration // .assignment.hostname // empty' /etc/ultracloud/node-config.json)
INSTALL_PLAN_DISKO_CONFIG_PATH=$(${pkgs.jq}/bin/jq -r '.bootstrap_plan.install_plan.disko_config_path // empty' /etc/ultracloud/node-config.json)
DISKO_SCRIPT_PATH=$(${pkgs.jq}/bin/jq -r '.bootstrap_plan.install_plan.disko_script_path // empty' /etc/ultracloud/node-config.json)
if [ -z "$DISKO_SCRIPT_PATH" ] && [ -r /etc/ultracloud/disko-script-paths.json ]; then
DISKO_SCRIPT_PATH=$(${pkgs.jq}/bin/jq -r --arg cfg "$NIXOS_CONFIGURATION" '.[$cfg] // empty' /etc/ultracloud/disko-script-paths.json)
if [ -n "$DISKO_SCRIPT_PATH" ]; then
echo "Resolved pre-built Disko script for install profile $NIXOS_CONFIGURATION from the ISO profile map"
fi
fi
TARGET_SYSTEM_PATH=$(${pkgs.jq}/bin/jq -r '.bootstrap_plan.install_plan.target_system_path // empty' /etc/ultracloud/node-config.json)
if [ -z "$TARGET_SYSTEM_PATH" ] && [ -r /etc/ultracloud/system-paths.json ]; then
TARGET_SYSTEM_PATH=$(${pkgs.jq}/bin/jq -r --arg cfg "$NIXOS_CONFIGURATION" '.[$cfg] // empty' /etc/ultracloud/system-paths.json)
if [ -n "$TARGET_SYSTEM_PATH" ]; then
echo "Resolved pre-built target system for install profile $NIXOS_CONFIGURATION from the ISO profile map"
fi
fi
TARGET_DISK=$(${pkgs.jq}/bin/jq -r '.bootstrap_plan.install_plan.target_disk // empty' /etc/ultracloud/node-config.json)
TARGET_DISK_BY_ID=$(${pkgs.jq}/bin/jq -r '.bootstrap_plan.install_plan.target_disk_by_id // empty' /etc/ultracloud/node-config.json)
DEPLOYER_URL="$(resolve_deployer_url)"
SRC_ROOT="/opt/ultracloud-src"
if [ -z "$NODE_ID" ] || [ -z "$NODE_IP" ]; then
echo "ERROR: node-config.json missing hostname/ip"
echo "ERROR: install contract missing hostname/ip"
exit 1
fi
if [ -z "$NIXOS_CONFIGURATION" ]; then
echo "ERROR: node-config.json missing install_plan.nixos_configuration"
echo "ERROR: install contract missing nixos configuration"
exit 1
fi
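A hedged sketch of what the rewiring above means for the rest of the installer script: the variables now arrive through `EnvironmentFile=/run/ultracloud/install-contract.env` rather than ad-hoc jq parsing, so later steps can branch on them directly. The branch below is illustrative only, not the actual install logic.
# Illustrative consumption of the resolved contract:
if [ -n "$DISKO_SCRIPT_PATH" ]; then
  echo "using pre-built Disko formatter closure: $DISKO_SCRIPT_PATH"
else
  echo "evaluating Disko config from the flake: $INSTALL_PLAN_DISKO_CONFIG_PATH"
fi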
@ -594,6 +595,7 @@
vim
htop
nix
bootstrap-agent
gawk
gnugrep
util-linux

View file

@ -12,11 +12,12 @@ The hardware bridge now has its own canonical wrapper: `nix run ./nix/test-clust
The harness keeps the install contract reusable by pushing install details into classes and pools. `verify-baremetal-iso.sh` now publishes node classes whose `install_plan` owns the install profile and stable disk selector, while node records carry only identity plus any desired-system override that is genuinely host-specific. In the canonical QEMU proof that means the node record carries the prebuilt `desired_system.target_system` plus the health check, and the class carries the install plan. The chassis emulates the preferred hardware-style disk selection by attaching explicit virtio serials and installing against `/dev/disk/by-id/virtio-uc-control-root` and `/dev/disk/by-id/virtio-uc-worker-root`.
When `/dev/kvm` is absent, the portable fallback is not another harness subcommand. Use the root-flake non-KVM lane instead: `nix build .#checks.x86_64-linux.portable-control-plane-regressions`.
When `/dev/kvm` and nested virtualization are available, the reproducible publishable lane is `./nix/test-cluster/run-publishable-kvm-suite.sh`, which records environment metadata and then runs `fresh-smoke`, `fresh-demo-vm-webapp`, and `fresh-matrix` in order.
When `/dev/kvm` and nested virtualization are available, the reproducible publishable lane is `./nix/test-cluster/run-publishable-kvm-suite.sh`, which records environment metadata and then runs `fresh-smoke`, `fresh-demo-vm-webapp`, `fresh-matrix`, and `chainfire-live-membership-proof` in order.
`nix run ./nix/test-cluster#cluster -- durability-proof` is the canonical chainfire flaredb deployer backup/restore lane. It persists artifacts under `./work/durability-proof/latest`, proves logical backup/restore for ChainFire keys and FlareDB SQL rows, uses the canonical Deployer admin pre-register request itself as the backup artifact, verifies that the pre-registered node survives a `deployer.service` restart, replays the same request idempotently, and injects CoronaFS plus LightningStor failures against the live KVM cluster.
`nix run ./nix/test-cluster#cluster -- rollout-soak` is the longer-running KVM companion lane for the rollout bundle and fixed-membership control plane. It rebuilds from clean local runtime state, writes dated artifacts under `./work/rollout-soak/latest`, validates exactly one planned `draining` maintenance cycle and one fail-stop worker-loss cycle on the two native-runtime workers, holds each degraded state for 30 seconds, then restarts `deployer`, `fleet-scheduler`, `node-agent`, `chainfire`, and `flaredb` before revalidating the live cluster. The same proof root includes `scope-fixed-contract.json`, `deployer-scope-fixed.txt`, and `fleet-scheduler-scope-fixed.txt` so the supported release boundary is recorded with the runtime evidence. The steady-state KVM nodes do not run `nix-agent.service`, so the lane records `nix-agent` scope markers instead of pretending a live-cluster `nix-agent` restart happened.
`nix run ./nix/test-cluster#cluster -- rollout-soak` is the longer-running KVM companion lane for the rollout bundle and canonical control plane. It rebuilds from clean local runtime state, writes dated artifacts under `./work/rollout-soak/latest`, validates exactly one planned `draining` maintenance cycle and one fail-stop worker-loss cycle on the two native-runtime workers, holds each degraded state for 30 seconds, then restarts `deployer`, `fleet-scheduler`, `node-agent`, `chainfire`, and `flaredb` before revalidating the live cluster. The same proof root includes `scope-fixed-contract.json`, `deployer-scope-fixed.txt`, and `fleet-scheduler-scope-fixed.txt` so the supported release boundary is recorded with the runtime evidence. The steady-state KVM nodes do not run `nix-agent.service`, so the lane records `nix-agent` scope markers instead of pretending a live-cluster `nix-agent` restart happened.
`nix run ./nix/test-cluster#cluster -- chainfire-live-membership-proof` is the focused local-KVM live-reconfiguration lane for the ChainFire control plane. It rebuilds from clean local runtime state, launches a temporary ChainFire replica on `node04`, proves learner add plus local replication, voter promotion, live leader transfer to another voting member, temporary-voter restart and rejoin, current-leader removal followed by re-election, removed-leader re-add, and final scale-in back to the canonical 3-node control-plane shape, and stores artifacts under `./work/chainfire-live-membership-proof/latest`.
`nix run ./nix/test-cluster#cluster -- provider-vm-reality-proof` is the focused local-KVM reality lane for `prismnet`, `flashdns`, `fiberlb`, and `plasmavmc`. It writes authoritative DNS answers, FiberLB backend drain or restore artifacts, and PlasmaVMC migration or storage-handoff state under `./work/provider-vm-reality-proof/latest`.
`./nix/test-cluster/run-core-control-plane-ops-proof.sh` is the focused operator lifecycle proof for `chainfire`, `flaredb`, and `iam`. It records the ChainFire fixed-membership boundary, the FlareDB additive-first migration and destructive-DDL boundary, and the standalone IAM bootstrap hardening plus signing-key, credential, and mTLS rotation proof under `./work/core-control-plane-ops-proof`.
`./nix/test-cluster/run-core-control-plane-ops-proof.sh` is the focused operator lifecycle proof for `chainfire`, `flaredb`, and `iam`. It records the published ChainFire live-membership API boundary, the FlareDB additive-first migration and destructive-DDL boundary, and the standalone IAM bootstrap hardening plus signing-key, credential, and mTLS rotation proof under `./work/core-control-plane-ops-proof`.
`./nix/test-cluster/work-root-budget.sh` is the checked helper for local disk budget reporting, stronger local enforcement, and safer cleanup guidance under `./work`.
The dated 2026-04-10 artifact root for the focused control-plane proof is `./work/core-control-plane-ops-proof/20260410T172148+09:00`.
Runner-specific workflow wiring from `task/f5c70db0-baseline-profiles` is intentionally excluded from this re-aggregated baseline; the checked-in artifact here is the local wrapper.
@ -32,6 +33,7 @@ Runner-specific workflow wiring from `task/f5c70db0-baseline-profiles` is intent
- gateway-node `apigateway`, `nightlight`, and `creditservice` quota, wallet, reservation, and admission flows
- host-forwarded access to the API gateway and NightLight HTTP surfaces
- cross-node data replication smoke tests for `chainfire` and `flaredb`
- live ChainFire scale-out, learner promotion, leader transfer, temporary-voter restart, current-leader removal, re-add, and scale-in on the canonical control-plane shape
- deployer-seeded native runtime scheduling from declarative Nix service definitions, including drain/failover recovery
- ISO-based bare-metal bootstrap from `nixosConfigurations.ultracloud-iso` through phone-home, flake bundle fetch, Disko install, reboot, and desired-system activation
- durability and restore coverage for `chainfire`, `flaredb`, `deployer`, `coronafs`, and `lightningstor`
@ -78,6 +80,7 @@ nix run ./nix/test-cluster#cluster -- serve-vm-webapp
nix run ./nix/test-cluster#cluster -- fresh-serve-vm-webapp
nix run ./nix/test-cluster#cluster -- matrix
nix run ./nix/test-cluster#cluster -- fresh-matrix
nix run ./nix/test-cluster#cluster -- chainfire-live-membership-proof
nix run ./nix/test-cluster#cluster -- provider-vm-reality-proof
nix run ./nix/test-cluster#cluster -- rollout-soak
nix run ./nix/test-cluster#cluster -- durability-proof
@ -121,6 +124,8 @@ Preferred entrypoint for safer dated-proof cleanup dry-runs: `./nix/test-cluster
Preferred entrypoint for publishable matrix verification: `nix run ./nix/test-cluster#cluster -- fresh-matrix`
Preferred entrypoint for focused ChainFire live membership verification: `nix run ./nix/test-cluster#cluster -- chainfire-live-membership-proof`
Preferred entrypoint for focused provider and VM-hosting reality verification: `nix run ./nix/test-cluster#cluster -- provider-vm-reality-proof`
Preferred entrypoint for longer-running rollout maintenance and DR verification: `nix run ./nix/test-cluster#cluster -- rollout-soak`
@ -137,7 +142,7 @@ The supported operator contract for `deployer`, `fleet-scheduler`, `nix-agent`,
- `deployer` is supported as one active writer with restart or cold-standby restore. Automatic ChainFire-backed multi-instance failover is outside the supported product contract for this release.
- `nix-agent` health-check and rollback behavior is proven by `nix build .#checks.x86_64-linux.deployer-vm-rollback`, while `baremetal-iso` and `baremetal-iso-e2e` prove the same desired-system handoff with the installer in front.
- `fresh-smoke` is the canonical KVM proof for `fleet-scheduler` drain, maintenance, and failover semantics. It drains `node04`, checks relocation to `node05`, restores `node04`, then stops `node05` and verifies failover plus replica restoration when the worker returns.
- `rollout-soak` is the longer-running companion for that same contract. It proves the current release boundary of one planned drain cycle, one fail-stop worker-loss cycle, and 30-second held degraded states on the two native-runtime workers, then restarts the rollout services and the fixed-membership control-plane services before rechecking the live runtime state. The dated 2026-04-10 release-grade artifact root is `./work/rollout-soak/20260410T164549+0900`.
- `rollout-soak` is the longer-running companion for that same contract. It proves the current release boundary of one planned drain cycle, one fail-stop worker-loss cycle, and 30-second held degraded states on the two native-runtime workers, then restarts the rollout services and the canonical control-plane services before rechecking the live runtime state. The dated 2026-04-10 release-grade artifact root is `./work/rollout-soak/20260410T164549+0900`.
- `node-agent` product scope is host-local runtime reconcile only. Logs and pid metadata live under `${stateDir}/pids`, secrets must already exist in the rendered spec or mounted files, host-path volumes are pass-through only, and upgrades are replace-and-reconcile operations.
`nix run ./nix/test-cluster#cluster -- bench-storage` benchmarks CoronaFS controller-export vs node-local-export I/O, worker-side materialization latency, and LightningStor large/small-object S3 throughput, then writes a report to `docs/storage-benchmarks.md`.

View file

@ -130,6 +130,7 @@ in
environment.systemPackages = with pkgs; [
awscli2
chainfire-server
curl
dnsutils
ethtool

View file

@ -141,6 +141,8 @@ NIGHTLIGHT_QUERY_PROTO="${NIGHTLIGHT_PROTO_DIR}/query.proto"
NIGHTLIGHT_ADMIN_PROTO="${NIGHTLIGHT_PROTO_DIR}/admin.proto"
PLASMAVMC_PROTO_DIR="${REPO_ROOT}/plasmavmc/proto"
PLASMAVMC_PROTO="${PLASMAVMC_PROTO_DIR}/plasmavmc.proto"
CHAINFIRE_PROTO_DIR="${REPO_ROOT}/chainfire/proto"
CHAINFIRE_PROTO="${CHAINFIRE_PROTO_DIR}/chainfire.proto"
FLAREDB_PROTO_DIR="${REPO_ROOT}/flaredb/crates/flaredb-proto/src"
FLAREDB_PROTO="${FLAREDB_PROTO_DIR}/kvrpc.proto"
FLAREDB_SQL_PROTO="${FLAREDB_PROTO_DIR}/sqlrpc.proto"
@ -554,6 +556,20 @@ prepare_rollout_soak_dir() {
printf '%s\n' "${proof_dir}"
}
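# Dated proof-root helpers for the ChainFire live-membership lane: each run gets a
# timestamped directory under ${ULTRACLOUD_WORK_ROOT}/chainfire-live-membership-proof
# with the latest symlink repointed at it.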
chainfire_live_membership_proof_root() {
printf '%s/%s\n' "${ULTRACLOUD_WORK_ROOT}" "chainfire-live-membership-proof"
}
prepare_chainfire_live_membership_proof_dir() {
local proof_root proof_dir timestamp
proof_root="$(chainfire_live_membership_proof_root)"
timestamp="$(date '+%Y%m%dT%H%M%S%z')"
proof_dir="${proof_root}/${timestamp}"
mkdir -p "${proof_dir}"
ln -sfn "${proof_dir}" "${proof_root}/latest"
printf '%s\n' "${proof_dir}"
}
provider_vm_reality_proof_root() {
printf '%s/%s\n' "${ULTRACLOUD_WORK_ROOT}" "provider-vm-reality-proof"
}
@ -9943,7 +9959,7 @@ run_rollout_soak() {
ssh_node node05 "journalctl -u node-agent -b --since '${started_at}' --no-pager" \
>"${proof_dir}/node05-node-agent-journal.log"
log "Rollout soak: restarting fixed-membership ChainFire and FlareDB members"
log "Rollout soak: restarting canonical ChainFire and FlareDB members"
ssh_node node02 "systemctl restart chainfire.service"
wait_for_unit node02 chainfire
wait_for_http node02 "http://127.0.0.1:8081/health"
@ -9960,7 +9976,7 @@ run_rollout_soak() {
>"${proof_dir}/chainfire-post-restart.json"
jq -e --arg expected "${chainfire_value}" '.data.value == $expected' \
"${proof_dir}/chainfire-post-restart.json" >/dev/null \
|| die "ChainFire fixed-membership restart proof did not reproduce the expected value"
|| die "ChainFire restart proof did not reproduce the expected value"
ssh_node node02 "journalctl -u chainfire -b --since '${started_at}' --no-pager" \
>"${proof_dir}/chainfire-node02-journal.log"
@ -10010,13 +10026,705 @@ run_rollout_soak() {
--argjson validated_maintenance_cycles "${validated_maintenance_cycles}" \
--argjson validated_power_loss_cycles "${validated_power_loss_cycles}" \
--argjson soak_hold_secs "${soak_hold_secs}" \
--arg summary "validated one planned drain cycle and one fail-stop worker-loss cycle on the two-node native-runtime lab, held each degraded state for the configured soak window, restarted deployer or scheduler or agent services, and revalidated fixed-membership control-plane restarts while keeping deployer HA scope-fixed to single-writer recovery" \
--arg summary "validated one planned drain cycle and one fail-stop worker-loss cycle on the two-node native-runtime lab, held each degraded state for the configured soak window, restarted deployer or scheduler or agent services, and revalidated canonical control-plane restarts while keeping deployer HA scope-fixed to single-writer recovery" \
'{started_at:$started_at, finished_at:$finished_at, artifact_root:$artifact_root, deployer_supported_writer_mode:$deployer_supported_writer_mode, fleet_supported_native_runtime_nodes:$fleet_supported_native_runtime_nodes, validated_maintenance_cycles:$validated_maintenance_cycles, validated_power_loss_cycles:$validated_power_loss_cycles, soak_hold_secs:$soak_hold_secs, summary:$summary, success:true}' \
>"${proof_dir}/result.json"
log "Long-run rollout soak succeeded; artifacts are in ${proof_dir}"
}
run_chainfire_live_membership_proof() {
local proof_dir started_at finished_at
local chainfire_tunnel_node01="" chainfire_tunnel_node02="" chainfire_tunnel_node03="" chainfire_tunnel_node04=""
local chainfire_rest_tunnel_node01="" chainfire_rest_tunnel_node02="" chainfire_rest_tunnel_node03="" chainfire_rest_tunnel_node04=""
local learner_key learner_value promoted_key promoted_value transfer_key transfer_value restart_key restart_value removed_key removed_value final_key final_value
local leader_before_transfer_id="" transfer_target_id="" removed_leader_id="" new_leader_id=""
proof_dir="$(prepare_chainfire_live_membership_proof_dir)"
started_at="$(date -Iseconds)"
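  # Best-effort teardown: stop the temporary node04 ChainFire replica (by pidfile, then
  # by process pattern) and close every gRPC/REST SSH tunnel, tolerating partial setup.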
cleanup_chainfire_live_membership_proof() {
set +e
set +u
ssh_node_script node04 <<'EOS' >/dev/null 2>&1 || true
set +e
runtime_dir="/run/chainfire-live-membership-proof"
pid_path="${runtime_dir}/chainfire.pid"
if [[ -f "${pid_path}" ]]; then
kill "$(cat "${pid_path}")" >/dev/null 2>&1 || true
rm -f "${pid_path}"
fi
pkill -f '/run/current-system/sw/bin/chainfire --config /run/chainfire-live-membership-proof/config.toml' >/dev/null 2>&1 || true
EOS
stop_ssh_tunnel node04 "${chainfire_rest_tunnel_node04}" >/dev/null 2>&1 || true
stop_ssh_tunnel node03 "${chainfire_rest_tunnel_node03}" >/dev/null 2>&1 || true
stop_ssh_tunnel node02 "${chainfire_rest_tunnel_node02}" >/dev/null 2>&1 || true
stop_ssh_tunnel node01 "${chainfire_rest_tunnel_node01}" >/dev/null 2>&1 || true
stop_ssh_tunnel node04 "${chainfire_tunnel_node04}" >/dev/null 2>&1 || true
stop_ssh_tunnel node03 "${chainfire_tunnel_node03}" >/dev/null 2>&1 || true
stop_ssh_tunnel node02 "${chainfire_tunnel_node02}" >/dev/null 2>&1 || true
stop_ssh_tunnel node01 "${chainfire_tunnel_node01}" >/dev/null 2>&1 || true
}
trap cleanup_chainfire_live_membership_proof RETURN
jq -n \
--arg command "nix run ./nix/test-cluster#cluster -- chainfire-live-membership-proof" \
--arg proof_dir "${proof_dir}" \
--arg started_at "${started_at}" \
--arg ultracloud_work_root "${ULTRACLOUD_WORK_ROOT}" \
--arg photon_cluster_work_root "${WORK_ROOT}" \
--arg build_profile "${BUILD_PROFILE}" \
'{command:$command, proof_dir:$proof_dir, started_at:$started_at, ultracloud_work_root:$ultracloud_work_root, photon_cluster_work_root:$photon_cluster_work_root, build_profile:$build_profile}' \
>"${proof_dir}/meta.json"
chainfire_cluster_rpc() {
local endpoint="$1"
local method="$2"
local payload="${3-}"
if [[ -n "${payload}" ]]; then
grpcurl -plaintext \
-import-path "${CHAINFIRE_PROTO_DIR}" \
-proto "${CHAINFIRE_PROTO}" \
-d "${payload}" \
"${endpoint}" \
"${method}"
return
fi
grpcurl -plaintext \
-import-path "${CHAINFIRE_PROTO_DIR}" \
-proto "${CHAINFIRE_PROTO}" \
"${endpoint}" \
"${method}"
}
chainfire_member_list_json() {
local endpoint="${1:-127.0.0.1:12379}"
chainfire_cluster_rpc "${endpoint}" "chainfire.v1.Cluster/MemberList"
}
chainfire_status_json() {
local endpoint="${1:-127.0.0.1:12379}"
chainfire_cluster_rpc "${endpoint}" "chainfire.v1.Cluster/Status"
}
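# Node-ID lookup tables: the 1808x/1238x addresses are the local SSH-tunnel endpoints for each
# node's REST (8081) and gRPC (2379) listeners, while the 10.100.0.x addresses are the in-cluster
# raft and client endpoints passed along with membership requests.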
chainfire_rest_url_for_id() {
case "$1" in
1) printf '%s\n' "http://127.0.0.1:18081" ;;
2) printf '%s\n' "http://127.0.0.1:18082" ;;
3) printf '%s\n' "http://127.0.0.1:18083" ;;
4) printf '%s\n' "http://127.0.0.1:18084" ;;
*) return 1 ;;
esac
}
chainfire_grpc_endpoint_for_id() {
case "$1" in
1) printf '%s\n' "127.0.0.1:12379" ;;
2) printf '%s\n' "127.0.0.1:12380" ;;
3) printf '%s\n' "127.0.0.1:12381" ;;
4) printf '%s\n' "127.0.0.1:12382" ;;
*) return 1 ;;
esac
}
chainfire_node_name_for_id() {
case "$1" in
1) printf '%s\n' "node01" ;;
2) printf '%s\n' "node02" ;;
3) printf '%s\n' "node03" ;;
4) printf '%s\n' "node04" ;;
*) return 1 ;;
esac
}
chainfire_raft_addr_for_id() {
case "$1" in
1) printf '%s\n' "10.100.0.11:2380" ;;
2) printf '%s\n' "10.100.0.12:2380" ;;
3) printf '%s\n' "10.100.0.13:2380" ;;
4) printf '%s\n' "10.100.0.21:2380" ;;
*) return 1 ;;
esac
}
chainfire_client_url_for_id() {
case "$1" in
1) printf '%s\n' "http://10.100.0.11:2379" ;;
2) printf '%s\n' "http://10.100.0.12:2379" ;;
3) printf '%s\n' "http://10.100.0.13:2379" ;;
4) printf '%s\n' "http://10.100.0.21:2379" ;;
*) return 1 ;;
esac
}
chainfire_status_from_any_endpoint() {
local endpoint output leader
for endpoint in 127.0.0.1:12379 127.0.0.1:12380 127.0.0.1:12381 127.0.0.1:12382; do
output="$(chainfire_status_json "${endpoint}" 2>/dev/null || true)"
leader="$(printf '%s' "${output}" | jq -r '.leader // 0' 2>/dev/null || printf '0')"
if [[ -n "${output}" ]] && [[ "${leader}" =~ ^[0-9]+$ ]] && (( leader > 0 )); then
printf '%s' "${output}"
return 0
fi
done
return 1
}
chainfire_current_leader_id() {
local output leader
output="$(chainfire_status_from_any_endpoint 2>/dev/null || true)"
leader="$(printf '%s' "${output}" | jq -r '.leader // 0' 2>/dev/null || printf '0')"
if [[ "${leader}" =~ ^[0-9]+$ ]] && (( leader > 0 )); then
printf '%s\n' "${leader}"
return 0
fi
return 1
}
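# Poll-until-deadline helpers: retry every 2s until the jq predicate or leader condition holds,
# and die once the timeout (default 180s) expires so a stuck transition fails the proof loudly.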
chainfire_wait_membership() {
local jq_expr="$1"
local timeout="${2:-180}"
local endpoint="${3:-127.0.0.1:12379}"
local deadline=$((SECONDS + timeout))
local output=""
while true; do
output="$(chainfire_member_list_json "${endpoint}" 2>/dev/null || true)"
if [[ -n "${output}" ]] && printf '%s' "${output}" | jq -e "${jq_expr}" >/dev/null 2>&1; then
printf '%s' "${output}"
return 0
fi
if (( SECONDS >= deadline )); then
die "timed out waiting for ChainFire membership to satisfy ${jq_expr}"
fi
sleep 2
done
}
chainfire_wait_for_new_leader() {
local old_leader="$1"
local timeout="${2:-180}"
local deadline=$((SECONDS + timeout))
local leader=""
while true; do
leader="$(chainfire_current_leader_id 2>/dev/null || true)"
if [[ "${leader}" =~ ^[0-9]+$ ]] && (( leader > 0 )) && [[ "${leader}" != "${old_leader}" ]]; then
printf '%s' "${leader}"
return 0
fi
if (( SECONDS >= deadline )); then
die "timed out waiting for ChainFire leader change away from ${old_leader}"
fi
sleep 2
done
}
chainfire_wait_for_specific_leader() {
local expected_leader="$1"
local timeout="${2:-180}"
local deadline=$((SECONDS + timeout))
local leader=""
while true; do
leader="$(chainfire_current_leader_id 2>/dev/null || true)"
if [[ "${leader}" =~ ^[0-9]+$ ]] && (( leader > 0 )) && [[ "${leader}" == "${expected_leader}" ]]; then
printf '%s' "${leader}"
return 0
fi
if (( SECONDS >= deadline )); then
die "timed out waiting for ChainFire leader ${expected_leader}"
fi
sleep 2
done
}
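# Writes always go through the current leader: resolve the leader ID, map it to its local REST
# tunnel, and PUT {"value": ...} to /api/v1/kv/<key>, retrying across leader changes until the
# deadline.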
chainfire_put_key() {
local key="$1"
local value="$2"
local output_path="$3"
local timeout="${4:-180}"
local deadline=$((SECONDS + timeout))
local leader_id="" rest_url=""
while true; do
leader_id="$(chainfire_current_leader_id 2>/dev/null || true)"
if [[ "${leader_id}" =~ ^[0-9]+$ ]] && (( leader_id > 0 )); then
rest_url="$(chainfire_rest_url_for_id "${leader_id}" 2>/dev/null || true)"
if [[ -n "${rest_url}" ]] && curl -fsS \
-X PUT \
-H 'content-type: application/json' \
-d "$(jq -cn --arg value "${value}" '{value:$value}')" \
"${rest_url}/api/v1/kv/${key}" \
>"${output_path}" 2>/dev/null; then
return 0
fi
fi
if (( SECONDS >= deadline )); then
die "timed out writing ChainFire key ${key} through the current leader"
fi
sleep 2
done
}
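# Per-node read probes hit each node's own REST endpoint with ?consistency=serializable, then
# wait for a 200 with the expected value (or a 404 after removal) to show the key is, or is no
# longer, served from that node's local state.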
chainfire_serializable_get_status() {
local node_id="$1"
local key="$2"
local output_path="$3"
local rest_url
rest_url="$(chainfire_rest_url_for_id "${node_id}")"
curl -sS -o "${output_path}" -w '%{http_code}' \
"${rest_url}/api/v1/kv/${key}?consistency=serializable" || true
}
chainfire_wait_local_value() {
local node_id="$1"
local key="$2"
local expected_value="$3"
local output_path="$4"
local timeout="${5:-180}"
local deadline=$((SECONDS + timeout))
local status=""
while true; do
status="$(chainfire_serializable_get_status "${node_id}" "${key}" "${output_path}")"
if [[ "${status}" == "200" ]] && jq -e --arg expected "${expected_value}" '.data.value == $expected' "${output_path}" >/dev/null 2>&1; then
return 0
fi
if (( SECONDS >= deadline )); then
die "timed out waiting for ChainFire node${node_id} to serve ${key} locally"
fi
sleep 2
done
}
chainfire_wait_local_absent() {
local node_id="$1"
local key="$2"
local output_path="$3"
local timeout="${4:-180}"
local deadline=$((SECONDS + timeout))
local status=""
while true; do
status="$(chainfire_serializable_get_status "${node_id}" "${key}" "${output_path}")"
if [[ "${status}" == "404" ]]; then
return 0
fi
if (( SECONDS >= deadline )); then
die "timed out waiting for ChainFire node${node_id} to stop serving ${key} locally"
fi
sleep 2
done
}
chainfire_wait_member_visible_locally() {
local endpoint="$1"
local member_id="$2"
local expected_is_learner="$3"
local timeout="${4:-180}"
chainfire_wait_membership "any(.members[]; (.id | tostring) == \"${member_id}\" and ((.isLearner // .is_learner // false) == ${expected_is_learner}))" "${timeout}" "${endpoint}" >/dev/null
}
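# Membership changes are driven through whichever node is currently leader, via its REST surface:
#   POST   /api/v1/cluster/members          add or promote a member (expects 201), e.g.
#          {"node_id":4,"raft_addr":"10.100.0.21:2380","client_url":"http://10.100.0.21:2379","name":"node04","is_learner":true}
#   DELETE /api/v1/cluster/members/<id>     remove a member (expects 200)
#   POST   /api/v1/cluster/leader/transfer  move leadership to {"target_id": N} (expects 200)
# Each request is retried against the current leader until the deadline.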
chainfire_post_member_request() {
local request_json="$1"
local output_path="$2"
local timeout="${3:-180}"
local deadline=$((SECONDS + timeout))
local status="" leader_id="" rest_url=""
while true; do
leader_id="$(chainfire_current_leader_id 2>/dev/null || true)"
rest_url="$(chainfire_rest_url_for_id "${leader_id}" 2>/dev/null || true)"
status="$(curl -sS -o "${output_path}" -w '%{http_code}' \
-X POST \
-H 'content-type: application/json' \
-d "${request_json}" \
"${rest_url}/api/v1/cluster/members" || true)"
if [[ "${status}" == "201" ]]; then
return 0
fi
if (( SECONDS >= deadline )); then
cat "${output_path}" >&2 || true
die "ChainFire member add request did not succeed (status ${status})"
fi
sleep 2
done
}
chainfire_delete_member_request() {
local member_id="$1"
local output_path="$2"
local timeout="${3:-180}"
local deadline=$((SECONDS + timeout))
local status="" leader_id="" rest_url=""
while true; do
leader_id="$(chainfire_current_leader_id 2>/dev/null || true)"
rest_url="$(chainfire_rest_url_for_id "${leader_id}" 2>/dev/null || true)"
status="$(curl -sS -o "${output_path}" -w '%{http_code}' \
-X DELETE \
"${rest_url}/api/v1/cluster/members/${member_id}" || true)"
if [[ "${status}" == "200" ]]; then
return 0
fi
if (( SECONDS >= deadline )); then
cat "${output_path}" >&2 || true
die "ChainFire member remove request for ${member_id} did not succeed (status ${status})"
fi
sleep 2
done
}
chainfire_transfer_leader_request() {
local target_id="$1"
local output_path="$2"
local timeout="${3:-180}"
local deadline=$((SECONDS + timeout))
local status="" leader_id="" rest_url=""
while true; do
leader_id="$(chainfire_current_leader_id 2>/dev/null || true)"
rest_url="$(chainfire_rest_url_for_id "${leader_id}" 2>/dev/null || true)"
status="$(curl -sS -o "${output_path}" -w '%{http_code}' \
-X POST \
-H 'content-type: application/json' \
-d "$(jq -cn --argjson target_id "${target_id}" '{target_id:$target_id}')" \
"${rest_url}/api/v1/cluster/leader/transfer" || true)"
if [[ "${status}" == "200" ]]; then
return 0
fi
if (( SECONDS >= deadline )); then
cat "${output_path}" >&2 || true
die "ChainFire leader transfer request to ${target_id} did not succeed (status ${status})"
fi
sleep 2
done
}
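# The final readiness checks below run from node01 over the in-cluster 10.100.0.x network rather
# than the SSH tunnels: plain /health reachability on all three canonical nodes, then a fresh key
# written through whichever node accepts the PUT and read back from every node.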
chainfire_wait_internal_http_from_node01() {
local timeout="${1:-120}"
local deadline=$((SECONDS + timeout))
while true; do
if ssh_node_script node01 <<'EOS' >/dev/null 2>&1
set -euo pipefail
for ip in 10.100.0.11 10.100.0.12 10.100.0.13; do
curl -fsS "http://${ip}:8081/health" >/dev/null
done
EOS
then
return 0
fi
if (( SECONDS >= deadline )); then
die "timed out waiting for internal ChainFire HTTP reachability from node01"
fi
sleep 2
done
}
chainfire_wait_internal_replication_from_node01() {
local timeout="${1:-120}"
local deadline=$((SECONDS + timeout))
while true; do
if ssh_node_script node01 <<'EOS' >/tmp/chainfire-internal-ready.out 2>/tmp/chainfire-internal-ready.err
set -euo pipefail
key="validation-chainfire-final-$(date +%s)-$RANDOM"
value="ok-$RANDOM"
nodes=(10.100.0.11 10.100.0.12 10.100.0.13)
leader=""
for ip in "${nodes[@]}"; do
code="$(curl -sS -o /tmp/chainfire-final-put.out -w '%{http_code}' \
-X PUT "http://${ip}:8081/api/v1/kv/${key}" \
-H 'Content-Type: application/json' \
-d "{\"value\":\"${value}\"}" || true)"
if [[ "${code}" == "200" ]]; then
leader="${ip}"
break
fi
done
[[ -n "${leader}" ]]
for ip in "${nodes[@]}"; do
actual="$(curl -fsS "http://${ip}:8081/api/v1/kv/${key}" | jq -r '.data.value')"
[[ "${actual}" == "${value}" ]]
done
printf '{"key":"%s","value":"%s","leader":"%s"}\n' "${key}" "${value}" "${leader}"
EOS
then
cat /tmp/chainfire-internal-ready.out
return 0
fi
if (( SECONDS >= deadline )); then
cat /tmp/chainfire-internal-ready.err >&2 || true
die "timed out waiting for internal ChainFire replication to stabilize from node01"
fi
sleep 2
done
}
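# The temporary node04 voter is a nohup'd process driven from /run/chainfire-live-membership-proof
# rather than a systemd unit, so "restart" means kill the recorded PID, relaunch with the same
# config, and wait for its /health endpoint to come back.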
restart_temporary_chainfire_node04() {
ssh_node_script node04 <<'EOS'
set -euo pipefail
runtime_dir="/run/chainfire-live-membership-proof"
pid_path="${runtime_dir}/chainfire.pid"
log_path="${runtime_dir}/chainfire.log"
config_path="${runtime_dir}/config.toml"
mkdir -p "${runtime_dir}"
if [[ ! -f "${config_path}" ]]; then
echo "temporary ChainFire config missing at ${config_path}" >&2
exit 1
fi
if [[ -f "${pid_path}" ]]; then
kill "$(cat "${pid_path}")" >/dev/null 2>&1 || true
rm -f "${pid_path}"
fi
pkill -f '/run/current-system/sw/bin/chainfire --config /run/chainfire-live-membership-proof/config.toml' >/dev/null 2>&1 || true
printf '\n[chainfire-live-membership-proof] restarting temporary voter at %s\n' "$(date -Is)" >>"${log_path}"
nohup /run/current-system/sw/bin/chainfire --config "${config_path}" --metrics-port 9194 >>"${log_path}" 2>&1 &
echo $! >"${pid_path}"
for _ in $(seq 1 60); do
if curl -fsS http://10.100.0.21:8081/health >/dev/null 2>&1; then
exit 0
fi
sleep 1
done
echo "restarted temporary ChainFire on node04 did not become healthy" >&2
exit 1
EOS
}
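# Main proof body: validate the cluster, open gRPC and REST tunnels to the three canonical voters,
# snapshot the baseline three-voter membership, then walk the membership transitions in order.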
log "Running ChainFire live membership proof; artifacts will be written to ${proof_dir}"
validate_control_plane
validate_workers
chainfire_tunnel_node01="$(start_ssh_tunnel node01 12379 2379 "${NODE_IPS[node01]}")"
chainfire_tunnel_node02="$(start_ssh_tunnel node02 12380 2379 "${NODE_IPS[node02]}")"
chainfire_tunnel_node03="$(start_ssh_tunnel node03 12381 2379 "${NODE_IPS[node03]}")"
chainfire_rest_tunnel_node01="$(start_ssh_tunnel node01 18081 8081 "${NODE_IPS[node01]}")"
chainfire_rest_tunnel_node02="$(start_ssh_tunnel node02 18082 8081 "${NODE_IPS[node02]}")"
chainfire_rest_tunnel_node03="$(start_ssh_tunnel node03 18083 8081 "${NODE_IPS[node03]}")"
chainfire_member_list_json "127.0.0.1:12379" >"${proof_dir}/baseline-membership.json"
chainfire_status_json "127.0.0.1:12379" >"${proof_dir}/baseline-status.json"
curl -fsS http://127.0.0.1:18081/api/v1/cluster/status >"${proof_dir}/baseline-node01-rest-status.json"
curl -fsS http://127.0.0.1:18082/api/v1/cluster/status >"${proof_dir}/baseline-node02-rest-status.json"
curl -fsS http://127.0.0.1:18083/api/v1/cluster/status >"${proof_dir}/baseline-node03-rest-status.json"
jq -e '(.members | length) == 3 and all(.members[]; (.isLearner // .is_learner // false) == false)' "${proof_dir}/baseline-membership.json" >/dev/null \
|| die "expected baseline ChainFire membership to contain exactly three voters"
log "ChainFire live membership proof: starting temporary learner on node04"
ssh_node_script node04 <<'EOS'
set -euo pipefail
runtime_dir="/run/chainfire-live-membership-proof"
data_dir="/var/lib/chainfire-live-membership-proof"
pid_path="${runtime_dir}/chainfire.pid"
log_path="${runtime_dir}/chainfire.log"
config_path="${runtime_dir}/config.toml"
mkdir -p "${runtime_dir}"
if [[ -f "${pid_path}" ]]; then
kill "$(cat "${pid_path}")" >/dev/null 2>&1 || true
rm -f "${pid_path}"
fi
pkill -f '/run/current-system/sw/bin/chainfire --config /run/chainfire-live-membership-proof/config.toml' >/dev/null 2>&1 || true
rm -rf "${data_dir}"
mkdir -p "${data_dir}"
cat >"${config_path}" <<'EOF'
[node]
id = 4
name = "node04"
role = "control_plane"
[storage]
data_dir = "/var/lib/chainfire-live-membership-proof"
[network]
api_addr = "10.100.0.21:2379"
http_addr = "10.100.0.21:8081"
raft_addr = "10.100.0.21:2380"
gossip_addr = "10.100.0.21:2381"
[cluster]
id = 1
initial_members = [
{ id = 1, raft_addr = "10.100.0.11:2380" },
{ id = 2, raft_addr = "10.100.0.12:2380" },
{ id = 3, raft_addr = "10.100.0.13:2380" },
]
bootstrap = false
[raft]
role = "learner"
EOF
nohup /run/current-system/sw/bin/chainfire --config "${config_path}" --metrics-port 9194 >"${log_path}" 2>&1 &
echo $! >"${pid_path}"
for _ in $(seq 1 60); do
if curl -fsS http://10.100.0.21:8081/health >/dev/null 2>&1; then
exit 0
fi
sleep 1
done
echo "temporary ChainFire on node04 did not become healthy" >&2
exit 1
EOS
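# node04 now runs a clean learner-configured instance (bootstrap = false) pointed at the three
# canonical voters; open its tunnels and capture its pre-join status before adding it as a learner.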
chainfire_tunnel_node04="$(start_ssh_tunnel node04 12382 2379 "${NODE_IPS[node04]}")"
chainfire_rest_tunnel_node04="$(start_ssh_tunnel node04 18084 8081 "${NODE_IPS[node04]}")"
chainfire_status_json "127.0.0.1:12382" >"${proof_dir}/node04-temporary-status.json"
curl -fsS http://127.0.0.1:18084/api/v1/cluster/status >"${proof_dir}/node04-temporary-rest-status.json"
log "ChainFire live membership proof: adding node04 as learner"
chainfire_post_member_request \
"$(jq -cn --argjson node_id 4 --arg raft_addr "10.100.0.21:2380" --arg client_url "http://10.100.0.21:2379" --arg name "node04" '{node_id:$node_id, raft_addr:$raft_addr, client_url:$client_url, name:$name, is_learner:true}')" \
"${proof_dir}/member-add-node04-learner.json"
chainfire_wait_membership '(.members | length) == 4 and any(.members[]; (.id | tostring) == "4" and ((.isLearner // .is_learner // false) == true))' 180 >"${proof_dir}/membership-after-node04-learner.json"
chainfire_wait_member_visible_locally "127.0.0.1:12382" "4" "true" 180
learner_key="chainfire-live-proof-learner-$(date +%s)-$RANDOM"
learner_value="learner-${RANDOM}"
chainfire_put_key "${learner_key}" "${learner_value}" "${proof_dir}/learner-put.json"
chainfire_wait_local_value 4 "${learner_key}" "${learner_value}" "${proof_dir}/learner-node04-local-read.json" 180
log "ChainFire live membership proof: promoting node04 to voter"
chainfire_post_member_request \
"$(jq -cn --argjson node_id 4 --arg raft_addr "10.100.0.21:2380" --arg client_url "http://10.100.0.21:2379" --arg name "node04" '{node_id:$node_id, raft_addr:$raft_addr, client_url:$client_url, name:$name, is_learner:false}')" \
"${proof_dir}/member-promote-node04.json"
chainfire_wait_membership '(.members | length) == 4 and any(.members[]; (.id | tostring) == "4" and ((.isLearner // .is_learner // false) == false))' 180 >"${proof_dir}/membership-after-node04-promotion.json"
chainfire_wait_member_visible_locally "127.0.0.1:12382" "4" "false" 180
promoted_key="chainfire-live-proof-voter-$(date +%s)-$RANDOM"
promoted_value="voter-${RANDOM}"
chainfire_put_key "${promoted_key}" "${promoted_value}" "${proof_dir}/promoted-put.json"
chainfire_wait_local_value 1 "${promoted_key}" "${promoted_value}" "${proof_dir}/promoted-node01-local-read.json" 180
chainfire_wait_local_value 2 "${promoted_key}" "${promoted_value}" "${proof_dir}/promoted-node02-local-read.json" 180
chainfire_wait_local_value 3 "${promoted_key}" "${promoted_value}" "${proof_dir}/promoted-node03-local-read.json" 180
chainfire_wait_local_value 4 "${promoted_key}" "${promoted_value}" "${proof_dir}/promoted-node04-local-read.json" 180
chainfire_status_from_any_endpoint >"${proof_dir}/status-after-node04-promotion.json"
leader_before_transfer_id="$(jq -r '.leader' "${proof_dir}/status-after-node04-promotion.json")"
[[ "${leader_before_transfer_id}" =~ ^[0-9]+$ ]] && (( leader_before_transfer_id > 0 )) \
|| die "could not determine current ChainFire leader before live transfer"
printf '%s\n' "${leader_before_transfer_id}" >"${proof_dir}/leader-before-transfer.txt"
if [[ "${leader_before_transfer_id}" != "2" ]]; then
transfer_target_id="2"
else
transfer_target_id="3"
fi
printf '%s\n' "${transfer_target_id}" >"${proof_dir}/leader-transfer-target.txt"
log "ChainFire live membership proof: transferring leader ${leader_before_transfer_id} to ${transfer_target_id}"
chainfire_transfer_leader_request "${transfer_target_id}" "${proof_dir}/leader-transfer.json"
chainfire_wait_for_specific_leader "${transfer_target_id}" 180 >"${proof_dir}/leader-after-transfer.txt"
chainfire_status_from_any_endpoint >"${proof_dir}/status-after-leader-transfer.json"
jq -e --arg target "${transfer_target_id}" '(.leader | tostring) == $target' "${proof_dir}/status-after-leader-transfer.json" >/dev/null \
|| die "expected ChainFire leader transfer to settle on ${transfer_target_id}"
transfer_key="chainfire-live-proof-transfer-$(date +%s)-$RANDOM"
transfer_value="transfer-${RANDOM}"
chainfire_put_key "${transfer_key}" "${transfer_value}" "${proof_dir}/transfer-put.json"
chainfire_wait_local_value 1 "${transfer_key}" "${transfer_value}" "${proof_dir}/transfer-node01-local-read.json" 180
chainfire_wait_local_value 2 "${transfer_key}" "${transfer_value}" "${proof_dir}/transfer-node02-local-read.json" 180
chainfire_wait_local_value 3 "${transfer_key}" "${transfer_value}" "${proof_dir}/transfer-node03-local-read.json" 180
chainfire_wait_local_value 4 "${transfer_key}" "${transfer_value}" "${proof_dir}/transfer-node04-local-read.json" 180
log "ChainFire live membership proof: restarting temporary voter on node04"
restart_temporary_chainfire_node04
chainfire_wait_membership '(.members | length) == 4 and any(.members[]; (.id | tostring) == "4" and ((.isLearner // .is_learner // false) == false))' 180 >"${proof_dir}/membership-after-node04-restart.json"
chainfire_wait_member_visible_locally "127.0.0.1:12382" "4" "false" 180
chainfire_status_json "127.0.0.1:12382" >"${proof_dir}/node04-status-after-restart.json"
chainfire_status_from_any_endpoint >"${proof_dir}/status-after-node04-restart.json"
restart_key="chainfire-live-proof-restart-$(date +%s)-$RANDOM"
restart_value="restart-${RANDOM}"
chainfire_put_key "${restart_key}" "${restart_value}" "${proof_dir}/restart-put.json"
chainfire_wait_local_value 1 "${restart_key}" "${restart_value}" "${proof_dir}/restart-node01-local-read.json" 180
chainfire_wait_local_value 2 "${restart_key}" "${restart_value}" "${proof_dir}/restart-node02-local-read.json" 180
chainfire_wait_local_value 3 "${restart_key}" "${restart_value}" "${proof_dir}/restart-node03-local-read.json" 180
chainfire_wait_local_value 4 "${restart_key}" "${restart_value}" "${proof_dir}/restart-node04-local-read.json" 180
removed_leader_id="$(jq -r '.leader' "${proof_dir}/status-after-node04-restart.json")"
[[ "${removed_leader_id}" =~ ^[0-9]+$ ]] && (( removed_leader_id > 0 )) \
|| die "could not determine current ChainFire leader before live removal"
printf '%s\n' "${removed_leader_id}" >"${proof_dir}/leader-before-removal.txt"
log "ChainFire live membership proof: removing current leader ${removed_leader_id}"
chainfire_delete_member_request "${removed_leader_id}" "${proof_dir}/member-remove-leader.json"
chainfire_wait_membership "(.members | length) == 3 and (all(.members[]; (.id | tostring) != \"${removed_leader_id}\"))" 180 >"${proof_dir}/membership-after-leader-removal.json"
new_leader_id="$(chainfire_wait_for_new_leader "${removed_leader_id}" 180)"
printf '%s\n' "${new_leader_id}" >"${proof_dir}/leader-after-removal.txt"
chainfire_status_json "127.0.0.1:12379" >"${proof_dir}/status-after-leader-removal.json"
removed_key="chainfire-live-proof-removed-$(date +%s)-$RANDOM"
removed_value="removed-${RANDOM}"
chainfire_put_key "${removed_key}" "${removed_value}" "${proof_dir}/post-removal-put.json"
if [[ "${removed_leader_id}" != "1" ]]; then
chainfire_wait_local_value 1 "${removed_key}" "${removed_value}" "${proof_dir}/post-removal-node01-local-read.json" 180
fi
if [[ "${removed_leader_id}" != "2" ]]; then
chainfire_wait_local_value 2 "${removed_key}" "${removed_value}" "${proof_dir}/post-removal-node02-local-read.json" 180
fi
if [[ "${removed_leader_id}" != "3" ]]; then
chainfire_wait_local_value 3 "${removed_key}" "${removed_value}" "${proof_dir}/post-removal-node03-local-read.json" 180
fi
if [[ "${removed_leader_id}" != "4" ]]; then
chainfire_wait_local_value 4 "${removed_key}" "${removed_value}" "${proof_dir}/post-removal-node04-local-read.json" 180
fi
chainfire_wait_local_absent "${removed_leader_id}" "${removed_key}" "${proof_dir}/post-removal-removed-leader-local-read.out" 180
log "ChainFire live membership proof: re-adding removed leader ${removed_leader_id}"
chainfire_post_member_request \
"$(jq -cn \
--argjson node_id "${removed_leader_id}" \
--arg raft_addr "$(chainfire_raft_addr_for_id "${removed_leader_id}")" \
--arg name "$(chainfire_node_name_for_id "${removed_leader_id}")" \
--arg client_url "$(chainfire_client_url_for_id "${removed_leader_id}")" \
'{node_id:$node_id, raft_addr:$raft_addr, client_url:$client_url, name:$name, is_learner:false}')" \
"${proof_dir}/member-readd-leader.json"
chainfire_wait_membership "(.members | length) == 4 and any(.members[]; (.id | tostring) == \"${removed_leader_id}\" and ((.isLearner // .is_learner // false) == false))" 180 >"${proof_dir}/membership-after-leader-readd.json"
chainfire_wait_local_value "${removed_leader_id}" "${removed_key}" "${removed_value}" "${proof_dir}/post-readd-restored-leader-local-read.json" 180
log "ChainFire live membership proof: removing temporary node04 and restoring canonical shape"
chainfire_delete_member_request "4" "${proof_dir}/member-remove-node04.json"
chainfire_wait_membership '(.members | length) == 3 and (all(.members[]; (.id | tostring) != "4"))' 180 >"${proof_dir}/final-membership.json"
final_key="chainfire-live-proof-final-$(date +%s)-$RANDOM"
final_value="final-${RANDOM}"
chainfire_put_key "${final_key}" "${final_value}" "${proof_dir}/final-put.json"
chainfire_wait_local_value 1 "${final_key}" "${final_value}" "${proof_dir}/final-node01-local-read.json" 180
chainfire_wait_local_value 2 "${final_key}" "${final_value}" "${proof_dir}/final-node02-local-read.json" 180
chainfire_wait_local_value 3 "${final_key}" "${final_value}" "${proof_dir}/final-node03-local-read.json" 180
chainfire_wait_local_absent 4 "${final_key}" "${proof_dir}/final-node04-local-read.out" 180
chainfire_wait_internal_http_from_node01 120
chainfire_wait_internal_replication_from_node01 120 >"${proof_dir}/final-internal-replication.json"
ssh_node node04 "cat /run/chainfire-live-membership-proof/chainfire.log" >"${proof_dir}/node04-temporary-chainfire.log" || true
curl -fsS http://127.0.0.1:18081/api/v1/cluster/status >"${proof_dir}/final-node01-rest-status.json"
curl -fsS http://127.0.0.1:18082/api/v1/cluster/status >"${proof_dir}/final-node02-rest-status.json"
curl -fsS http://127.0.0.1:18083/api/v1/cluster/status >"${proof_dir}/final-node03-rest-status.json"
validate_control_plane
finished_at="$(date -Iseconds)"
jq -n \
--arg started_at "${started_at}" \
--arg finished_at "${finished_at}" \
--arg artifact_root "${proof_dir}" \
--arg leader_before_transfer_id "${leader_before_transfer_id}" \
--arg transfer_target_id "${transfer_target_id}" \
--arg removed_leader_id "${removed_leader_id}" \
--arg new_leader_id "${new_leader_id}" \
--arg summary "started from the canonical 3-node ChainFire control plane, scaled out by adding node04 as a learner then voter, transferred leadership to another live voter, restarted the temporary voter and revalidated local reads, removed the live leader and waited for re-election, re-added the removed leader, then scaled back in to the canonical 3-node shape while proving local serializable reads on every membership transition" \
'{started_at:$started_at, finished_at:$finished_at, artifact_root:$artifact_root, leader_before_transfer_id:$leader_before_transfer_id, transfer_target_id:$transfer_target_id, removed_leader_id:$removed_leader_id, new_leader_id:$new_leader_id, summary:$summary, success:true}' \
>"${proof_dir}/result.json"
log "ChainFire live membership proof succeeded; artifacts are in ${proof_dir}"
}
validate_cluster() {
preflight
wait_requested
@@ -10157,6 +10865,12 @@ rollout_soak_requested() {
run_rollout_soak
}
chainfire_live_membership_proof_requested() {
clean_requested "$@"
start_requested "$@"
run_chainfire_live_membership_proof
}
durability_proof_requested() {
start_requested "$@"
run_durability_proof
@@ -10448,6 +11162,7 @@ Commands:
fresh-matrix clean local runtime state, rebuild on the host, start, and validate composed service configurations
provider-vm-reality-proof start the cluster if needed, then persist provider and VM-hosting interop artifacts under ./work/provider-vm-reality-proof
rollout-soak clean local runtime state, rebuild on the host, start, and persist a longer-run control-plane and rollout soak under ./work/rollout-soak
chainfire-live-membership-proof clean local runtime state, rebuild on the host, start, and persist live ChainFire scale-out/replace artifacts under ./work/chainfire-live-membership-proof
durability-proof start the cluster if needed, then persist durability and restore artifacts under ./work/durability-proof
bench-storage start the cluster and benchmark CoronaFS plus LightningStor against the current running VMs
fresh-bench-storage clean local runtime state, rebuild on the host, start, and benchmark CoronaFS plus LightningStor
@@ -10482,6 +11197,7 @@ Examples:
$0 fresh-matrix
$0 provider-vm-reality-proof
$0 rollout-soak
$0 chainfire-live-membership-proof
$0 durability-proof
$0 bench-storage
$0 fresh-bench-storage
@@ -10526,6 +11242,7 @@ main() {
fresh-matrix) fresh_matrix_requested "$@" ;;
provider-vm-reality-proof) provider_vm_reality_proof_requested "$@" ;;
rollout-soak) rollout_soak_requested "$@" ;;
chainfire-live-membership-proof) chainfire_live_membership_proof_requested "$@" ;;
durability-proof) durability_proof_requested "$@" ;;
bench-storage) bench_storage_requested "$@" ;;
fresh-bench-storage) fresh_bench_storage_requested "$@" ;;

View file

@@ -95,8 +95,8 @@ main() {
fi
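# The membership-contract case greps the docs, proto, and cluster service for the live-membership
# vocabulary (MemberAdd/MemberRemove/LeaderTransfer/TimeoutNow, the proof lane, leader transfer,
# joint consensus); if those terms disappear, rg exits non-zero and rc records the failure.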
if (( rc == 0 )); then
run_case chainfire-membership-contract \
rg -n 'MemberAdd|MemberRemove|MemberList|LeaderTransfer|TimeoutNow|chainfire-live-membership-proof|current-leader removal|leader transfer|temporary-voter restart|one-voter transitions|joint consensus|live membership' \
README.md docs/control-plane-ops.md docs/testing.md nix/test-cluster/README.md chainfire/proto/chainfire.proto chainfire/crates/chainfire-api/src/cluster_service.rs || rc=$?
fi
if (( rc == 0 )); then
run_case flaredb-migration-contract \

View file

@@ -224,6 +224,7 @@ main() {
run_case fresh-smoke nix run ./nix/test-cluster#cluster -- fresh-smoke
run_case fresh-demo-vm-webapp nix run ./nix/test-cluster#cluster -- fresh-demo-vm-webapp
run_case fresh-matrix nix run ./nix/test-cluster#cluster -- fresh-matrix
run_case chainfire-live-membership-proof nix run ./nix/test-cluster#cluster -- chainfire-live-membership-proof
log "publishable KVM suite passed; logs in ${LOG_DIR}"
}