Chainfire Specification
Version: 1.0 | Status: Draft | Last Updated: 2025-12-08
1. Overview
1.1 Purpose
Chainfire is a distributed key-value store designed for cluster management with etcd-compatible semantics. It provides strongly consistent storage with MVCC (Multi-Version Concurrency Control), watch notifications, and transaction support.
1.2 Scope
- In scope: Distributed KV storage, consensus (Raft), watch/subscribe, transactions, cluster membership
- Out of scope: SQL queries, secondary indexes, full-text search
1.3 Design Goals
- etcd API compatibility for ecosystem tooling
- High availability via Raft consensus
- Low latency for configuration management workloads
- Simple deployment (single binary)
2. Architecture
2.1 Crate Structure
chainfire/
├── crates/
│ ├── chainfire-api/ # gRPC service implementations
│ ├── chainfire-core/ # Embeddable cluster library, config, callbacks
│ ├── chainfire-gossip/ # SWIM gossip protocol (foca)
│ ├── chainfire-raft/ # OpenRaft integration
│ ├── chainfire-server/ # Server binary, config
│ ├── chainfire-storage/ # RocksDB state machine
│ ├── chainfire-types/ # Shared types (KV, Watch, Command)
│ └── chainfire-watch/ # Watch registry
├── chainfire-client/ # Rust client library
└── proto/
├── chainfire.proto # Public API (KV, Watch, Cluster)
└── internal.proto # Raft internal RPCs (Vote, AppendEntries)
2.2 Data Flow
[Client gRPC] → [API Layer] → [Raft Node] → [State Machine] → [RocksDB]
                                   ↓                ↓
                            [Watch Registry] ← [Events]
2.3 Dependencies
| Crate | Version | Purpose |
|---|---|---|
| tokio | 1.40 | Async runtime |
| tonic | 0.12 | gRPC framework |
| openraft | 0.9 | Raft consensus |
| rocksdb | 0.24 | Storage engine |
| foca | 1.0 | SWIM gossip protocol |
| prost | 0.13 | Protocol buffers |
| dashmap | 6 | Concurrent hash maps |
3. API
3.1 gRPC Services
KV Service (chainfire.v1.KV)
service KV {
rpc Range(RangeRequest) returns (RangeResponse);
rpc Put(PutRequest) returns (PutResponse);
rpc Delete(DeleteRangeRequest) returns (DeleteRangeResponse);
rpc Txn(TxnRequest) returns (TxnResponse);
}
Range (Get/Scan)
- Single key lookup: key set, range_end empty
- Range scan: key = start, range_end = end (exclusive)
- Prefix scan: range_end = prefix + 1
- Options: limit, revision (point-in-time), keys_only, count_only
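The "prefix + 1" rule above can be made concrete in Rust. The helper below is hypothetical (not part of the published API); it follows the usual etcd convention of incrementing the last byte below 0xff and truncating after it:

```rust
/// Compute the exclusive range_end for a prefix scan: increment the last
/// byte that is below 0xff and drop everything after it. A prefix of all
/// 0xff bytes has no upper bound, so None means "scan to end of keyspace".
fn prefix_range_end(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut end = prefix.to_vec();
    while let Some(last) = end.last_mut() {
        if *last < 0xff {
            *last += 1;
            return Some(end);
        }
        end.pop(); // carry past trailing 0xff bytes
    }
    None
}

fn main() {
    // "prefix/" scans keys in ["prefix/", "prefix0"), since '/' + 1 == '0'.
    assert_eq!(prefix_range_end(b"prefix/"), Some(b"prefix0".to_vec()));
}
```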
Put
- Writes a key-value pair
- Optional: lease (TTL), prev_kv (return previous value)
Delete
- Single key or range delete
- Optional: prev_kv (return deleted values)
Transaction (Txn)
- Atomic compare-and-swap operations
- compare: Conditions to check
- success: Operations if all conditions pass
- failure: Operations if any condition fails
Watch Service (chainfire.v1.Watch)
service Watch {
rpc Watch(stream WatchRequest) returns (stream WatchResponse);
}
- Bidirectional streaming
- Supports: single key, prefix, range watches
- Historical replay via start_revision
- Progress notifications
Cluster Service (chainfire.v1.Cluster)
service Cluster {
rpc MemberAdd(MemberAddRequest) returns (MemberAddResponse);
rpc MemberRemove(MemberRemoveRequest) returns (MemberRemoveResponse);
rpc MemberList(MemberListRequest) returns (MemberListResponse);
rpc Status(StatusRequest) returns (StatusResponse);
}
3.2 Client Library
use chainfire_client::Client;
let mut client = Client::connect("http://127.0.0.1:2379").await?;
// Put
let revision = client.put("key", "value").await?;
// Get
let value = client.get("key").await?; // Option<Vec<u8>>
// Get with string convenience
let value = client.get_str("key").await?; // Option<String>
// Prefix scan
let kvs = client.get_prefix("prefix/").await?; // Vec<(key, value, revision)>
// Delete
let deleted = client.delete("key").await?; // bool
// Status
let status = client.status().await?;
println!("Leader: {}, Term: {}", status.leader, status.raft_term);
3.3 Public Traits (chainfire-core)
ClusterEventHandler
#[async_trait]
pub trait ClusterEventHandler: Send + Sync {
async fn on_node_joined(&self, node: &NodeInfo) {}
async fn on_node_left(&self, node_id: u64, reason: LeaveReason) {}
async fn on_leader_changed(&self, old: Option<u64>, new: u64) {}
async fn on_became_leader(&self) {}
async fn on_lost_leadership(&self) {}
async fn on_membership_changed(&self, members: &[NodeInfo]) {}
async fn on_partition_detected(&self, reachable: &[u64], unreachable: &[u64]) {}
async fn on_cluster_ready(&self) {}
}
KvEventHandler
#[async_trait]
pub trait KvEventHandler: Send + Sync {
async fn on_key_changed(&self, namespace: &str, key: &[u8], value: &[u8], revision: u64) {}
async fn on_key_deleted(&self, namespace: &str, key: &[u8], revision: u64) {}
async fn on_prefix_changed(&self, namespace: &str, prefix: &[u8], entries: &[KvEntry]) {}
}
StorageBackend
#[async_trait]
pub trait StorageBackend: Send + Sync {
async fn get(&self, key: &[u8]) -> io::Result<Option<Vec<u8>>>;
async fn put(&self, key: &[u8], value: &[u8]) -> io::Result<()>;
async fn delete(&self, key: &[u8]) -> io::Result<bool>;
}
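To illustrate the get/put/delete contract, here is a minimal in-memory analogue of StorageBackend. It is a synchronous sketch (the async_trait plumbing is omitted for brevity), and MemoryBackend is a hypothetical type, not part of chainfire-core:

```rust
use std::collections::HashMap;
use std::io;
use std::sync::Mutex;

/// In-memory analogue of StorageBackend. The real trait is async; only the
/// get/put/delete semantics are mirrored here.
struct MemoryBackend {
    map: Mutex<HashMap<Vec<u8>, Vec<u8>>>,
}

impl MemoryBackend {
    fn new() -> Self {
        Self { map: Mutex::new(HashMap::new()) }
    }

    fn get(&self, key: &[u8]) -> io::Result<Option<Vec<u8>>> {
        Ok(self.map.lock().unwrap().get(key).cloned())
    }

    fn put(&self, key: &[u8], value: &[u8]) -> io::Result<()> {
        self.map.lock().unwrap().insert(key.to_vec(), value.to_vec());
        Ok(())
    }

    /// Returns true if the key existed, matching the trait's bool result.
    fn delete(&self, key: &[u8]) -> io::Result<bool> {
        Ok(self.map.lock().unwrap().remove(key).is_some())
    }
}

fn main() -> io::Result<()> {
    let backend = MemoryBackend::new();
    backend.put(b"k", b"v")?;
    assert_eq!(backend.get(b"k")?, Some(b"v".to_vec()));
    assert!(backend.delete(b"k")?);   // existed
    assert!(!backend.delete(b"k")?);  // already gone
    Ok(())
}
```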
3.4 Embeddable Library (chainfire-core)
use chainfire_core::{ClusterBuilder, ClusterEventHandler};
let cluster = ClusterBuilder::new(node_id)
.name("node-1")
.gossip_addr("0.0.0.0:7946".parse()?)
.raft_addr("0.0.0.0:2380".parse()?)
.on_cluster_event(MyHandler)
.build()
.await?;
// Use the KVS
cluster.kv().put("key", b"value").await?;
4. Data Models
4.1 Core Types
KeyValue Entry
pub struct KvEntry {
pub key: Vec<u8>,
pub value: Vec<u8>,
pub create_revision: u64, // Revision when created (immutable)
pub mod_revision: u64, // Last modification revision
pub version: u64, // Update count (1, 2, 3, ...)
pub lease_id: Option<i64>, // Lease ID for TTL expiration
}
Read Consistency Levels
pub enum ReadConsistency {
Local, // Read from local storage (may be stale)
Serializable, // Verify with leader's committed index
Linearizable, // Read only from leader (default, strongest)
}
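The practical difference between the three levels is where a read may be served. The routing function below is a hypothetical sketch of that decision, not the actual server logic:

```rust
/// Mirrors the spec's ReadConsistency enum.
enum ReadConsistency {
    Local,
    Serializable,
    Linearizable,
}

/// Hypothetical routing decision: must this read be forwarded to the leader?
/// Local reads are served from local storage (possibly stale); Serializable
/// reads are served locally after verifying the leader's committed index;
/// Linearizable reads are served only by the leader itself.
fn must_forward_to_leader(consistency: &ReadConsistency, is_leader: bool) -> bool {
    match consistency {
        ReadConsistency::Local => false,
        ReadConsistency::Serializable => false,
        ReadConsistency::Linearizable => !is_leader,
    }
}

fn main() {
    // A follower serves Local reads itself but forwards Linearizable ones.
    assert!(!must_forward_to_leader(&ReadConsistency::Local, false));
    assert!(must_forward_to_leader(&ReadConsistency::Linearizable, false));
    // The leader serves every level locally.
    assert!(!must_forward_to_leader(&ReadConsistency::Linearizable, true));
}
```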
Watch Event
pub enum WatchEventType {
Put,
Delete,
}
pub struct WatchEvent {
pub event_type: WatchEventType,
pub kv: KvEntry,
pub prev_kv: Option<KvEntry>,
}
Response Header
pub struct ResponseHeader {
pub cluster_id: u64,
pub member_id: u64,
pub revision: u64, // Current store revision
pub raft_term: u64,
}
4.2 Transaction Types
pub struct Compare {
pub key: Vec<u8>,
pub target: CompareTarget,
pub result: CompareResult,
}
pub enum CompareTarget {
Version(u64),
CreateRevision(u64),
ModRevision(u64),
Value(Vec<u8>),
}
pub enum CompareResult {
Equal,
NotEqual,
Greater,
Less,
}
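A sketch of how one Compare condition might be evaluated against a KvEntry follows. The types mirror the spec (with Compare's key field omitted, since the entry is assumed already looked up); the check function itself is hypothetical:

```rust
use std::cmp::Ordering;

/// Minimal mirror of the entry fields a Compare can target.
struct KvEntry {
    version: u64,
    create_revision: u64,
    mod_revision: u64,
    value: Vec<u8>,
}

enum CompareTarget {
    Version(u64),
    CreateRevision(u64),
    ModRevision(u64),
    Value(Vec<u8>),
}

enum CompareResult {
    Equal,
    NotEqual,
    Greater,
    Less,
}

struct Compare {
    target: CompareTarget,
    result: CompareResult,
}

/// Does `entry` satisfy one Txn condition? Compare the targeted field,
/// then interpret the ordering according to the requested result.
fn check(entry: &KvEntry, cmp: &Compare) -> bool {
    let ord = match &cmp.target {
        CompareTarget::Version(v) => entry.version.cmp(v),
        CompareTarget::CreateRevision(r) => entry.create_revision.cmp(r),
        CompareTarget::ModRevision(r) => entry.mod_revision.cmp(r),
        CompareTarget::Value(v) => entry.value.cmp(v),
    };
    match cmp.result {
        CompareResult::Equal => ord == Ordering::Equal,
        CompareResult::NotEqual => ord != Ordering::Equal,
        CompareResult::Greater => ord == Ordering::Greater,
        CompareResult::Less => ord == Ordering::Less,
    }
}

fn main() {
    let entry = KvEntry { version: 3, create_revision: 5, mod_revision: 9, value: b"v".to_vec() };
    // Classic CAS guard: proceed only if the key is unchanged since revision 9.
    let guard = Compare { target: CompareTarget::ModRevision(9), result: CompareResult::Equal };
    assert!(check(&entry, &guard));
}
```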
4.3 Storage Format
- Engine: RocksDB
- Column Families:
  - raft_logs: Raft log entries
  - raft_meta: Raft metadata (vote, term, membership)
  - key_value: KV data (key bytes → serialized KvEntry)
  - snapshot: Snapshot metadata
- Metadata Keys: vote, last_applied, membership, revision, last_snapshot
- Serialization: bincode for Raft, Protocol Buffers for gRPC
- MVCC: Global revision counter, per-key create/mod revisions
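The MVCC bookkeeping above can be sketched as follows. Store is a hypothetical toy (a HashMap standing in for the key_value column family): one global revision counter is bumped per committed write, create_revision is fixed at first insert, mod_revision tracks every update, and version counts updates:

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
struct KvEntry {
    value: Vec<u8>,
    create_revision: u64, // revision when first created (immutable)
    mod_revision: u64,    // revision of the last update
    version: u64,         // update count: 1, 2, 3, ...
}

/// Toy stand-in for the state machine's KV column family.
struct Store {
    revision: u64, // global store revision
    map: HashMap<Vec<u8>, KvEntry>,
}

impl Store {
    fn new() -> Self {
        Self { revision: 0, map: HashMap::new() }
    }

    /// Apply a Put: bump the global revision and update per-key revisions.
    fn put(&mut self, key: &[u8], value: &[u8]) -> u64 {
        self.revision += 1;
        let rev = self.revision;
        self.map
            .entry(key.to_vec())
            .and_modify(|e| {
                e.value = value.to_vec();
                e.mod_revision = rev;
                e.version += 1;
            })
            .or_insert(KvEntry {
                value: value.to_vec(),
                create_revision: rev,
                mod_revision: rev,
                version: 1,
            });
        rev
    }
}

fn main() {
    let mut store = Store::new();
    store.put(b"k", b"v1");
    let rev = store.put(b"k", b"v2");
    let e = &store.map[b"k".as_slice()];
    // Created at revision 1, last modified at revision 2, updated twice.
    assert_eq!((e.create_revision, e.mod_revision, e.version, rev), (1, 2, 2, 2));
}
```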
5. Configuration
5.1 Config File Format (TOML)
[node]
id = 1
name = "chainfire-1"
role = "control_plane" # or "worker"
[storage]
data_dir = "/var/lib/chainfire"
[network]
api_addr = "0.0.0.0:2379"
raft_addr = "0.0.0.0:2380"
gossip_addr = "0.0.0.0:2381"
[cluster]
id = 1
bootstrap = true
initial_members = []
[raft]
role = "voter" # "voter", "learner", or "none"
5.2 Environment Variables
| Variable | Default | Description |
|---|---|---|
| CHAINFIRE_DATA_DIR | ./data | Data directory |
| CHAINFIRE_API_ADDR | 127.0.0.1:2379 | Client API address |
| CHAINFIRE_RAFT_ADDR | 127.0.0.1:2380 | Raft peer address |
5.3 Raft Tuning
heartbeat_interval: 150ms // Leader heartbeat
election_timeout_min: 300ms // Min election timeout
election_timeout_max: 600ms // Max election timeout
snapshot_policy: LogsSinceLast(5000)
snapshot_max_chunk_size: 3MB
max_payload_entries: 300
6. Security
6.1 Authentication
- Current: None (development mode)
- Planned: mTLS for peer communication, token-based client auth
6.2 Authorization
- Current: All operations permitted
- Planned: RBAC integration with IAM (aegis)
6.3 Multi-tenancy
- Namespace isolation: Key prefix per tenant
- Planned: Per-namespace quotas, ACLs via IAM
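A possible encoding of "key prefix per tenant" is sketched below. Both the helper and the '/' separator are assumptions for illustration; the spec does not fix a wire format for namespaced keys:

```rust
/// Hypothetical tenant-key encoding: "<namespace>/<key>" as raw bytes.
/// Every tenant's operations are confined to its prefix, so a prefix scan
/// on "tenant-a/" sees only tenant-a's data.
fn tenant_key(namespace: &str, key: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(namespace.len() + 1 + key.len());
    out.extend_from_slice(namespace.as_bytes());
    out.push(b'/');
    out.extend_from_slice(key);
    out
}

fn main() {
    assert_eq!(tenant_key("tenant-a", b"config"), b"tenant-a/config".to_vec());
}
```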
7. Operations
7.1 Deployment
Single Node (Bootstrap)
chainfire-server --config config.toml
# With bootstrap = true in config
Cluster (3-node)
# Node 1 (bootstrap)
chainfire-server --config node1.toml
# Node 2, 3 (join)
# Set bootstrap = false, add node1 to initial_members
chainfire-server --config node2.toml
7.2 Monitoring
- Health: gRPC health check service
- Metrics: Prometheus endpoint (planned)
  - chainfire_kv_operations_total
  - chainfire_raft_term
  - chainfire_storage_bytes
  - chainfire_watch_active
7.3 Backup & Recovery
- Snapshot: Automatic via Raft (every 5000 log entries)
- Manual backup: Copy data_dir while stopped
- Point-in-time: Use revision parameter in Range requests
8. Compatibility
8.1 API Versioning
- gRPC package: chainfire.v1
- Breaking changes: New major version (v2, v3)
- Backward compatible: Add fields, new RPCs
8.2 Wire Protocol
- Protocol Buffers 3
- tonic/prost for Rust
- Compatible with any gRPC client
8.3 etcd Compatibility
- Compatible: KV operations, Watch, basic transactions
- Different: gRPC package names, some field names
- Not implemented: Lease service, Auth service (planned)
Appendix
A. Error Codes
| Error | Meaning |
|---|---|
| NOT_LEADER | Node is not the Raft leader |
| KEY_NOT_FOUND | Key does not exist |
| REVISION_COMPACTED | Requested revision no longer available |
| TXN_FAILED | Transaction condition not met |
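Of these, only NOT_LEADER is transient: the request can succeed if retried against the current leader, while the others describe the state of the data itself. A hypothetical client-side retry policy (the enum mirrors the table; the policy is an assumption, not the published client behavior):

```rust
/// Mirrors the error table above.
#[derive(Debug, PartialEq)]
enum ChainfireError {
    NotLeader,
    KeyNotFound,
    RevisionCompacted,
    TxnFailed,
}

/// Only NOT_LEADER is worth retrying: re-resolve the leader and resend.
/// KEY_NOT_FOUND, REVISION_COMPACTED, and TXN_FAILED are terminal for
/// the request as issued.
fn is_retryable(err: &ChainfireError) -> bool {
    matches!(err, ChainfireError::NotLeader)
}

fn main() {
    assert!(is_retryable(&ChainfireError::NotLeader));
    assert!(!is_retryable(&ChainfireError::TxnFailed));
}
```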
B. Raft Commands
pub enum RaftCommand {
Put { key, value, lease_id, prev_kv },
Delete { key, prev_kv },
DeleteRange { start, end, prev_kv },
Txn { compare, success, failure },
Noop, // Leadership establishment
}
C. Port Assignments
| Port | Protocol | Purpose |
|---|---|---|
| 2379 | gRPC | Client API |
| 2380 | gRPC | Raft peer |
| 2381 | UDP | SWIM gossip |
D. Node Roles
/// Role in cluster gossip
pub enum NodeRole {
ControlPlane, // Participates in Raft consensus
Worker, // Gossip only, watches Control Plane
}
/// Role in Raft consensus
pub enum RaftRole {
Voter, // Full voting member
Learner, // Non-voting replica (receives log replication)
None, // No Raft participation (agent/proxy only)
}
E. Internal Raft RPCs (internal.proto)
service RaftService {
rpc Vote(VoteRequest) returns (VoteResponse);
rpc AppendEntries(AppendEntriesRequest) returns (AppendEntriesResponse);
rpc InstallSnapshot(stream InstallSnapshotRequest) returns (InstallSnapshotResponse);
}