# Chainfire Specification

> Version: 1.0 | Status: Draft | Last Updated: 2025-12-08

## 1. Overview

### 1.1 Purpose

Chainfire is a distributed key-value store designed for cluster management with etcd-compatible semantics. It provides strongly consistent storage with MVCC (Multi-Version Concurrency Control), watch notifications, and transaction support.

### 1.2 Scope

- **In scope**: Distributed KV storage, consensus (Raft), watch/subscribe, transactions, cluster membership
- **Out of scope**: SQL queries, secondary indexes, full-text search

### 1.3 Design Goals

- etcd API compatibility for ecosystem tooling
- High availability via Raft consensus
- Low latency for configuration management workloads
- Simple deployment (single binary)

## 2. Architecture

### 2.1 Crate Structure

```
chainfire/
├── crates/
│   ├── chainfire-api/      # gRPC service implementations
│   ├── chainfire-core/     # Embeddable cluster library, config, callbacks
│   ├── chainfire-gossip/   # SWIM gossip protocol (foca)
│   ├── chainfire-raft/     # OpenRaft integration
│   ├── chainfire-server/   # Server binary, config
│   ├── chainfire-storage/  # RocksDB state machine
│   ├── chainfire-types/    # Shared types (KV, Watch, Command)
│   └── chainfire-watch/    # Watch registry
├── chainfire-client/       # Rust client library
└── proto/
    ├── chainfire.proto     # Public API (KV, Watch, Cluster)
    └── internal.proto      # Raft internal RPCs (Vote, AppendEntries)
```

### 2.2 Data Flow

```
[Client gRPC] → [API Layer] → [Raft Node] → [State Machine] → [RocksDB]
                                   ↓               ↓
                            [Watch Registry] ← [Events]
```

### 2.3 Dependencies

| Crate | Version | Purpose |
|-------|---------|---------|
| tokio | 1.40 | Async runtime |
| tonic | 0.12 | gRPC framework |
| openraft | 0.9 | Raft consensus |
| rocksdb | 0.24 | Storage engine |
| foca | 1.0 | SWIM gossip protocol |
| prost | 0.13 | Protocol buffers |
| dashmap | 6 | Concurrent hash maps |

## 3. API

### 3.1 gRPC Services

#### KV Service (`chainfire.v1.KV`)

```protobuf
service KV {
  rpc Range(RangeRequest) returns (RangeResponse);
  rpc Put(PutRequest) returns (PutResponse);
  rpc Delete(DeleteRangeRequest) returns (DeleteRangeResponse);
  rpc Txn(TxnRequest) returns (TxnResponse);
}
```

**Range (Get/Scan)**

- Single key lookup: `key` set, `range_end` empty
- Range scan: `key` = start, `range_end` = end (exclusive)
- Prefix scan: `range_end` = prefix with its last byte incremented
- Options: `limit`, `revision` (point-in-time), `keys_only`, `count_only`

**Put**

- Writes a key-value pair
- Optional: `lease` (TTL), `prev_kv` (return previous value)

**Delete**

- Single-key or range delete
- Optional: `prev_kv` (return deleted values)

**Transaction (Txn)**

- Atomic compare-and-swap operations
- `compare`: Conditions to check
- `success`: Operations if all conditions pass
- `failure`: Operations if any condition fails

#### Watch Service (`chainfire.v1.Watch`)

```protobuf
service Watch {
  rpc Watch(stream WatchRequest) returns (stream WatchResponse);
}
```

- Bidirectional streaming
- Supports: single key, prefix, range watches
- Historical replay via `start_revision`
- Progress notifications

#### Cluster Service (`chainfire.v1.Cluster`)

```protobuf
service Cluster {
  rpc MemberAdd(MemberAddRequest) returns (MemberAddResponse);
  rpc MemberRemove(MemberRemoveRequest) returns (MemberRemoveResponse);
  rpc MemberList(MemberListRequest) returns (MemberListResponse);
  rpc Status(StatusRequest) returns (StatusResponse);
}
```

### 3.2 Client Library

```rust
use chainfire_client::Client;

let mut client = Client::connect("http://127.0.0.1:2379").await?;

// Put
let revision = client.put("key", "value").await?;

// Get
let value = client.get("key").await?; // Option<Vec<u8>>

// Get with string convenience
let value = client.get_str("key").await?; // Option<String>

// Prefix scan
let kvs = client.get_prefix("prefix/").await?; // Vec<(key, value, revision)>

// Delete
let deleted = client.delete("key").await?; // bool

// Status
let status = client.status().await?;
println!("Leader: {}, Term: {}", status.leader, status.raft_term);
```

### 3.3 Public Traits (chainfire-core)

#### ClusterEventHandler

```rust
#[async_trait]
pub trait ClusterEventHandler: Send + Sync {
    async fn on_node_joined(&self, node: &NodeInfo) {}
    async fn on_node_left(&self, node_id: u64, reason: LeaveReason) {}
    async fn on_leader_changed(&self, old: Option<u64>, new: u64) {}
    async fn on_became_leader(&self) {}
    async fn on_lost_leadership(&self) {}
    async fn on_membership_changed(&self, members: &[NodeInfo]) {}
    async fn on_partition_detected(&self, reachable: &[u64], unreachable: &[u64]) {}
    async fn on_cluster_ready(&self) {}
}
```

#### KvEventHandler

```rust
#[async_trait]
pub trait KvEventHandler: Send + Sync {
    async fn on_key_changed(&self, namespace: &str, key: &[u8], value: &[u8], revision: u64) {}
    async fn on_key_deleted(&self, namespace: &str, key: &[u8], revision: u64) {}
    async fn on_prefix_changed(&self, namespace: &str, prefix: &[u8], entries: &[KvEntry]) {}
}
```

#### StorageBackend

```rust
#[async_trait]
pub trait StorageBackend: Send + Sync {
    async fn get(&self, key: &[u8]) -> io::Result<Option<Vec<u8>>>;
    async fn put(&self, key: &[u8], value: &[u8]) -> io::Result<()>;
    async fn delete(&self, key: &[u8]) -> io::Result<bool>;
}
```

### 3.4 Embeddable Library (chainfire-core)

```rust
use chainfire_core::{ClusterBuilder, ClusterEventHandler};

let cluster = ClusterBuilder::new(node_id)
    .name("node-1")
    .gossip_addr("0.0.0.0:7946".parse()?)
    .raft_addr("0.0.0.0:2380".parse()?)
    .on_cluster_event(MyHandler)
    .build()
    .await?;

// Use the KV store
cluster.kv().put("key", b"value").await?;
```

## 4. Data Models

### 4.1 Core Types

#### KeyValue Entry

```rust
pub struct KvEntry {
    pub key: Vec<u8>,
    pub value: Vec<u8>,
    pub create_revision: u64,  // Revision when created (immutable)
    pub mod_revision: u64,     // Last modification revision
    pub version: u64,          // Update count (1, 2, 3, ...)
    pub lease_id: Option<u64>, // Lease ID for TTL expiration
}
```

#### Read Consistency Levels

```rust
pub enum ReadConsistency {
    Local,        // Read from local storage (may be stale)
    Serializable, // Verify with leader's committed index
    Linearizable, // Read only from leader (default, strongest)
}
```

#### Watch Event

```rust
pub enum WatchEventType {
    Put,
    Delete,
}

pub struct WatchEvent {
    pub event_type: WatchEventType,
    pub kv: KvEntry,
    pub prev_kv: Option<KvEntry>,
}
```

#### Response Header

```rust
pub struct ResponseHeader {
    pub cluster_id: u64,
    pub member_id: u64,
    pub revision: u64, // Current store revision
    pub raft_term: u64,
}
```

### 4.2 Transaction Types

```rust
pub struct Compare {
    pub key: Vec<u8>,
    pub target: CompareTarget,
    pub result: CompareResult,
}

pub enum CompareTarget {
    Version(u64),
    CreateRevision(u64),
    ModRevision(u64),
    Value(Vec<u8>),
}

pub enum CompareResult {
    Equal,
    NotEqual,
    Greater,
    Less,
}
```

### 4.3 Storage Format

- **Engine**: RocksDB
- **Column Families**:
  - `raft_logs`: Raft log entries
  - `raft_meta`: Raft metadata (vote, term, membership)
  - `key_value`: KV data (key bytes → serialized KvEntry)
  - `snapshot`: Snapshot metadata
- **Metadata Keys**: `vote`, `last_applied`, `membership`, `revision`, `last_snapshot`
- **Serialization**: bincode for Raft, Protocol Buffers for gRPC
- **MVCC**: Global revision counter, per-key create/mod revisions

## 5. Configuration

### 5.1 Config File Format (TOML)

```toml
[node]
id = 1
name = "chainfire-1"
role = "control_plane" # or "worker"

[storage]
data_dir = "/var/lib/chainfire"

[network]
api_addr = "0.0.0.0:2379"
raft_addr = "0.0.0.0:2380"
gossip_addr = "0.0.0.0:2381"

[cluster]
id = 1
bootstrap = true
initial_members = []

[raft]
role = "voter" # "voter", "learner", or "none"
```

### 5.2 Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| CHAINFIRE_DATA_DIR | ./data | Data directory |
| CHAINFIRE_API_ADDR | 127.0.0.1:2379 | Client API address |
| CHAINFIRE_RAFT_ADDR | 127.0.0.1:2380 | Raft peer address |

### 5.3 Raft Tuning

```rust
heartbeat_interval: 150ms    // Leader heartbeat
election_timeout_min: 300ms  // Min election timeout
election_timeout_max: 600ms  // Max election timeout
snapshot_policy: LogsSinceLast(5000)
snapshot_max_chunk_size: 3MB
max_payload_entries: 300
```

## 6. Security

### 6.1 Authentication

- **Current**: None (development mode)
- **Planned**: mTLS for peer communication, token-based client auth

### 6.2 Authorization

- **Current**: All operations permitted
- **Planned**: RBAC integration with IAM (aegis)

### 6.3 Multi-tenancy

- **Namespace isolation**: Key prefix per tenant
- **Planned**: Per-namespace quotas, ACLs via IAM

## 7. Operations

### 7.1 Deployment

**Single Node (Bootstrap)**

```bash
chainfire-server --config config.toml
# With bootstrap = true in config
```

**Cluster (3-node)**

```bash
# Node 1 (bootstrap)
chainfire-server --config node1.toml

# Node 2, 3 (join)
# Set bootstrap = false, add node1 to initial_members
chainfire-server --config node2.toml
```

### 7.2 Monitoring

- **Health**: gRPC health check service
- **Metrics**: Prometheus endpoint (planned)
  - `chainfire_kv_operations_total`
  - `chainfire_raft_term`
  - `chainfire_storage_bytes`
  - `chainfire_watch_active`

### 7.3 Backup & Recovery

- **Snapshot**: Automatic via Raft (every 5000 log entries)
- **Manual backup**: Copy `data_dir` while the server is stopped
- **Point-in-time**: Use the `revision` parameter in Range requests

## 8. Compatibility

### 8.1 API Versioning

- gRPC package: `chainfire.v1`
- Breaking changes: New major version (v2, v3)
- Backward compatible: Add fields, new RPCs

### 8.2 Wire Protocol

- Protocol Buffers 3
- tonic/prost for Rust
- Compatible with any gRPC client

### 8.3 etcd Compatibility

- **Compatible**: KV operations, Watch, basic transactions
- **Different**: gRPC package names, some field names
- **Not implemented**: Lease service, Auth service (planned)

## Appendix

### A. Error Codes

| Error | Meaning |
|-------|---------|
| NOT_LEADER | Node is not the Raft leader |
| KEY_NOT_FOUND | Key does not exist |
| REVISION_COMPACTED | Requested revision no longer available |
| TXN_FAILED | Transaction condition not met |

### B. Raft Commands

```rust
pub enum RaftCommand {
    Put { key, value, lease_id, prev_kv },
    Delete { key, prev_kv },
    DeleteRange { start, end, prev_kv },
    Txn { compare, success, failure },
    Noop, // Leadership establishment
}
```

### C. Port Assignments

| Port | Protocol | Purpose |
|------|----------|---------|
| 2379 | gRPC | Client API |
| 2380 | gRPC | Raft peer |
| 2381 | UDP | SWIM gossip |

### D. Node Roles

```rust
/// Role in cluster gossip
pub enum NodeRole {
    ControlPlane, // Participates in Raft consensus
    Worker,       // Gossip only, watches control plane
}

/// Role in Raft consensus
pub enum RaftRole {
    Voter,   // Full voting member
    Learner, // Non-voting replica (receives log replication)
    None,    // No Raft participation (agent/proxy only)
}
```

### E. Internal Raft RPCs (internal.proto)

```protobuf
service RaftService {
  rpc Vote(VoteRequest) returns (VoteResponse);
  rpc AppendEntries(AppendEntriesRequest) returns (AppendEntriesResponse);
  rpc InstallSnapshot(stream InstallSnapshotRequest) returns (InstallSnapshotResponse);
}
```
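
### F. Worked Example: Prefix-Scan End Key (non-normative)

The prefix-scan rule in §3.1 (`range_end` = prefix with its last byte incremented) has an edge case: a trailing `0xff` byte cannot be incremented. The sketch below assumes etcd-style handling — drop trailing `0xff` bytes before incrementing, and fall back to a scan-to-end sentinel for an all-`0xff` prefix. The function name and the `[0]` sentinel are illustrative, not part of the public API.

```rust
/// Compute the exclusive `range_end` for a prefix scan: the prefix with its
/// last byte incremented. Trailing 0xff bytes cannot be incremented, so they
/// are dropped first (a carry into the previous byte); an all-0xff prefix
/// scans to the end of the keyspace, signalled here by the sentinel [0].
fn prefix_range_end(prefix: &[u8]) -> Vec<u8> {
    let mut end = prefix.to_vec();
    while let Some(&last) = end.last() {
        if last < 0xff {
            *end.last_mut().unwrap() = last + 1;
            return end;
        }
        end.pop(); // 0xff + 1 would overflow: carry into the previous byte
    }
    vec![0] // every byte was 0xff: scan to end of keyspace
}

fn main() {
    assert_eq!(prefix_range_end(b"prefix/"), b"prefix0"); // '/' + 1 == '0'
    assert_eq!(prefix_range_end(&[0x61, 0xff]), vec![0x62u8]); // carry past 0xff
    assert_eq!(prefix_range_end(&[0xff, 0xff]), vec![0u8]); // scan-to-end sentinel
}
```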
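
### G. Worked Example: Evaluating a Txn Compare (non-normative)

As an illustration of the transaction guards in §4.2, the sketch below evaluates a single `Compare` against the current entry for its key. `eval_compare` is hypothetical — the spec defines only the types — and it assumes a missing key fails every comparison; the types are trimmed to the fields the check needs.

```rust
use std::cmp::Ordering;

// Trimmed versions of the §4.1 / §4.2 types.
pub struct KvEntry {
    pub version: u64,
    pub create_revision: u64,
    pub mod_revision: u64,
    pub value: Vec<u8>,
}

pub enum CompareTarget {
    Version(u64),
    CreateRevision(u64),
    ModRevision(u64),
    Value(Vec<u8>),
}

pub enum CompareResult {
    Equal,
    NotEqual,
    Greater,
    Less,
}

/// Evaluate one transaction guard against the current entry for its key.
/// `None` (key absent) fails every comparison in this sketch.
fn eval_compare(entry: Option<&KvEntry>, target: &CompareTarget, result: &CompareResult) -> bool {
    let Some(e) = entry else { return false };
    let ord = match target {
        CompareTarget::Version(v) => e.version.cmp(v),
        CompareTarget::CreateRevision(r) => e.create_revision.cmp(r),
        CompareTarget::ModRevision(r) => e.mod_revision.cmp(r),
        CompareTarget::Value(v) => e.value.cmp(v), // lexicographic byte order
    };
    match result {
        CompareResult::Equal => ord == Ordering::Equal,
        CompareResult::NotEqual => ord != Ordering::Equal,
        CompareResult::Greater => ord == Ordering::Greater,
        CompareResult::Less => ord == Ordering::Less,
    }
}

fn main() {
    let e = KvEntry { version: 3, create_revision: 10, mod_revision: 42, value: b"v".to_vec() };
    // Classic CAS guard: only run the `success` ops if the key is unchanged since revision 42.
    assert!(eval_compare(Some(&e), &CompareTarget::ModRevision(42), &CompareResult::Equal));
    // A missing key fails the condition, routing the Txn to its `failure` branch.
    assert!(!eval_compare(None, &CompareTarget::Version(0), &CompareResult::Equal));
}
```

On the server, a `Txn` would run every `Compare` this way and execute the `success` operations only if all of them return `true`.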
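
### H. Worked Example: MVCC Revision Bookkeeping (non-normative)

The revision fields in §4.1 follow from the MVCC model in §4.3: one global revision counter bumped per applied write, plus per-key create/mod revisions and a version count. The toy in-memory store below illustrates only that bookkeeping; the names are illustrative, and the real state machine applies writes through Raft and persists to RocksDB.

```rust
use std::collections::HashMap;

struct KvMeta {
    create_revision: u64, // revision of the first put (immutable)
    mod_revision: u64,    // revision of the most recent put
    version: u64,         // per-key update count: 1, 2, 3, ...
    value: Vec<u8>,
}

/// Toy MVCC bookkeeping: a single global revision counter, bumped once
/// per applied write, as in §4.3.
struct Store {
    revision: u64,
    data: HashMap<Vec<u8>, KvMeta>,
}

impl Store {
    fn new() -> Self {
        Store { revision: 0, data: HashMap::new() }
    }

    /// Apply a put and return the revision it was assigned.
    fn put(&mut self, key: &[u8], value: &[u8]) -> u64 {
        self.revision += 1; // every write gets a fresh global revision
        let rev = self.revision;
        self.data
            .entry(key.to_vec())
            .and_modify(|m| {
                m.mod_revision = rev; // moves on every update
                m.version += 1;       // counts updates to this key
                m.value = value.to_vec();
            })
            .or_insert(KvMeta { create_revision: rev, mod_revision: rev, version: 1, value: value.to_vec() });
        rev
    }
}

fn main() {
    let mut s = Store::new();
    s.put(b"a", b"1"); // rev 1: a has create=1, mod=1, version=1
    s.put(b"b", b"x"); // rev 2: an unrelated write still bumps the global counter
    s.put(b"a", b"2"); // rev 3: a keeps create=1, but mod=3 and version=2
    let a = &s.data[b"a".as_slice()];
    assert_eq!((a.create_revision, a.mod_revision, a.version), (1, 3, 2));
}
```

This is also why a point-in-time Range at `revision = 2` (see §7.3) would still observe `a = "1"` in a full MVCC store: the second put to `a` only exists at revision 3 and later.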