Fix R8: Convert submodule gitlinks to regular directories

- Remove gitlinks (160000 mode) for chainfire, flaredb, iam
- Add workspace contents as regular tracked files
- Update flake.nix to use simple paths instead of builtins.fetchGit

This resolves the nix build failure where submodule directories
appeared empty in the nix store.
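For the record, the gitlink-to-directory conversion can be reproduced on a throwaway repo roughly as follows. The repo layout and file names here are illustrative; only the submodule SHA is taken from the diff below:

```shell
set -eu
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.email dev@example.com && git config user.name dev
# Simulate the pre-fix state: working files on disk plus a gitlink (mode 160000) in the index
mkdir chainfire && echo 'fn main() {}' > chainfire/main.rs
git update-index --add --cacheinfo 160000,0d970d80331e2a0d74a6c806f3576095bd083923,chainfire
git commit -qm 'tree with submodule gitlink'
git ls-files -s chainfire   # mode 160000: the directory contents are NOT in the tree
# The conversion: drop the gitlink, then track the contents as regular files
git rm -q --cached chainfire
git add chainfire/
git commit -qm 'convert gitlink to regular directory'
git ls-files -s chainfire   # mode 100644: chainfire/main.rs is now tracked directly
```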

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
centra 2025-12-09 16:51:20 +09:00
parent e4de4e8c66
commit 8f94aee1fa
253 changed files with 45639 additions and 19 deletions

@@ -1 +0,0 @@
Subproject commit 0d970d80331e2a0d74a6c806f3576095bd083923

chainfire/.gitignore vendored Normal file

@@ -0,0 +1,22 @@
# Generated files
/target/
**/*.rs.bk
Cargo.lock
# IDE
.idea/
.vscode/
*.swp
*.swo
# OS
.DS_Store
Thumbs.db
# Test data
/tmp/
*.db/
# Environment
.env
.env.local

chainfire/Cargo.toml Normal file

@@ -0,0 +1,89 @@
[workspace]
resolver = "2"
members = [
"crates/chainfire-proto",
"crates/chainfire-types",
"crates/chainfire-storage",
"crates/chainfire-raft",
"crates/chainfire-gossip",
"crates/chainfire-watch",
"crates/chainfire-api",
"crates/chainfire-core",
"crates/chainfire-server",
"chainfire-client",
]
[workspace.package]
version = "0.1.0"
edition = "2021"
license = "MIT OR Apache-2.0"
rust-version = "1.75"
authors = ["Chainfire Contributors"]
repository = "https://github.com/chainfire/chainfire"
[workspace.dependencies]
# Internal crates
chainfire-types = { path = "crates/chainfire-types" }
chainfire-storage = { path = "crates/chainfire-storage" }
chainfire-raft = { path = "crates/chainfire-raft" }
chainfire-gossip = { path = "crates/chainfire-gossip" }
chainfire-watch = { path = "crates/chainfire-watch" }
chainfire-api = { path = "crates/chainfire-api" }
chainfire-client = { path = "chainfire-client" }
chainfire-core = { path = "crates/chainfire-core" }
chainfire-server = { path = "crates/chainfire-server" }
chainfire-proto = { path = "crates/chainfire-proto" }
# Async runtime
tokio = { version = "1.40", features = ["full"] }
tokio-stream = "0.1"
futures = "0.3"
async-trait = "0.1"
# Raft
openraft = { version = "0.9", features = ["serde", "storage-v2"] }
# Gossip (SWIM protocol)
foca = { version = "1.0", features = ["std", "tracing", "serde", "postcard-codec"] }
# Storage
rocksdb = { version = "0.24", default-features = false, features = ["multi-threaded-cf", "zstd", "lz4", "snappy"] }
# gRPC
tonic = "0.12"
tonic-build = "0.12"
tonic-health = "0.12"
prost = "0.13"
prost-types = "0.13"
# Serialization
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
bincode = "1.3"
# Utilities
thiserror = "1.0"
anyhow = "1.0"
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
bytes = "1.5"
parking_lot = "0.12"
dashmap = "6"
# Metrics
metrics = "0.23"
metrics-exporter-prometheus = "0.15"
# Configuration
toml = "0.8"
clap = { version = "4", features = ["derive"] }
# Testing
tempfile = "3.10"
proptest = "1.4"
[workspace.lints.rust]
unsafe_code = "deny"
[workspace.lints.clippy]
all = "warn"

chainfire/advice.md Normal file

@@ -0,0 +1,87 @@
I'd like you to build a Key-Value Store for cluster management, scaling to tens of thousands of nodes, using Raft and the Gossip protocol.
- Programming language: Rust
- Writing tests as you go is recommended.
- Handles cluster join, removal, and failure detection.

Below is a concrete walkthrough of **how data actually flows, and how nodes spring into action, when "Raft consensus" is combined with "Gossip dissemination"**.

-----

### Premise: how this system divides its roles

* **Control Plane (CP):** 3-7 servers running Raft (a size at which the Raft algorithm reaches consensus comfortably). The authoritative owner of the data. If one goes away, a Worker Node is automatically promoted to replace it.
* **Worker Nodes (VM/DB hosts):** Hundreds to thousands of machines doing the actual work. Clients of the CP.
### 1. How does data get written? (Write)

Writes always go **to the Raft leader of the Control Plane**. Writes via Gossip are never done, because Gossip provides no ordering guarantees.

Example: "I want to start VM-A on Node-10."

1. **API call:** An administrator or the CLI sends a request to the CP's API server.
2. **Raft log:** The CP leader appends the change to the Raft log as `Put(Key="/nodes/node-10/tasks/vm-a", Value="START")`.
3. **Commit:** Once a majority of CP nodes have persisted the log entry, the write is considered complete.

Up to this point it behaves like an ordinary database.
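The commit rule in step 3 is just Raft's majority condition. A toy sketch (the function name and shape are mine, not from the codebase):

```rust
// Step 3's commit rule: an entry is committed once a strict majority of
// control-plane nodes (leader included) have persisted it to their logs.
fn is_committed(acks: usize, cluster_size: usize) -> bool {
    acks > cluster_size / 2
}

fn main() {
    assert!(is_committed(2, 3)); // 3-node CP: leader + 1 follower suffice
    assert!(!is_committed(1, 3)); // the leader alone is not a majority
    assert!(is_committed(3, 5)); // 5-node CP needs 3 acks
    println!("majority rule ok");
}
```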
### 2. How does each node fetch data and get notified? (Read & Notify)

This is the crux. If thousands of nodes polled the CP every second asking "any orders for me?", the CP would collapse as if it were under a DDoS attack.

This is where the **Watch (long polling)** mechanism comes in.

#### A. Notification and retrieval via Watch (the main path)

This is the approach Kubernetes and etcd use.

1. **Hold a connection open:** At startup, Node-10 sends the CP a `Watch("/nodes/node-10/")` request.
2. **Wait:** The CP keeps the connection open (blocked), returning no response until a key under Node-10 changes.
3. **Event fires:** The instant the earlier write (the VM start command) lands, the CP pushes an update event (Event: PUT, Key: ...vm-a, Value: START) over the waiting connection to Node-10.
4. **Action:** On receiving the notification, Node-10 starts the VM.

**Bottom line:** a "notify after write" path is absolutely required, and the **Watch API** is the efficient way to provide it.
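The long-polling shape above can be mimicked with a plain channel. In this toy model (all names invented for illustration), a blocked `recv()` stands in for the held-open gRPC stream:

```rust
use std::sync::mpsc;
use std::thread;

// Toy model of the Watch flow: Node-10 blocks on a channel (standing in for
// the held-open gRPC stream) and the control plane pushes the committed
// write the moment it lands, instead of the node polling.
fn deliver(event: (String, String)) -> String {
    let (cp_tx, node_rx) = mpsc::channel::<(String, String)>();
    let node = thread::spawn(move || {
        // recv() blocks: no response until something under the watched prefix changes
        let (key, value) = node_rx.recv().expect("watch stream closed");
        format!("node-10 sees {key} = {value}")
    });
    cp_tx.send(event).unwrap(); // the write from section 1 fires the event
    node.join().unwrap()
}

fn main() {
    let msg = deliver(("/nodes/node-10/tasks/vm-a".into(), "START".into()));
    println!("{msg}");
}
```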
-----
### 3. So where does Gossip come in?

"Raft + Watch" looks like it could be the whole story, but at a scale of ten thousand nodes or more, especially in a dynamic environment like a VM platform, **Gossip is indispensable for plugging the following three holes**.

#### ① Node liveness and inventory management (downstream)

When the CP decides "I want to place a VM on Node-10", it first needs to know: is Node-10 even alive? What is its IP? How much memory is free?

* **Gossip's role:** Each Worker Node keeps telling its peers (and the CP) via Gossip (the SWIM protocol), "I'm alive, and this is my IP."
* The CP listens to this Gossip and keeps an up-to-date node list (Memberlist) in memory.

#### ② Service discovery (lateral)

On a DB platform, "DB node A" may want to talk to "DB node B". Asking the CP every time would be slow.

* **Gossip's role:** Nodes gossip "here I am" to one another, so they can find each other directly without going through the CP.

#### ③ Reporting the "Actual State" (upstream)

Once a VM has started, how does the status "it's up" get back to the CP?

* **Write it to Raft:** Reliable, but pushing every frequently changing status (CPU usage and the like) through Raft would overwhelm the CP.
* **Send it via Gossip:** Information like "VM-A is running at 50% load" rides on Gossip, with the attitude that **it only has to reach the CP eventually**.
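The "only has to reach the CP eventually" property works because each node keeps only the newest report it has heard per peer, so reports arriving late or via different paths converge to one view. A minimal sketch (types invented for illustration):

```rust
use std::collections::HashMap;

// Eventually-consistent actual state: keep the newest (by version) report
// heard for each peer; stale or duplicated gossip is simply discarded.
#[derive(Clone, Debug)]
struct Report {
    version: u64,
    status: String,
}

fn merge(view: &mut HashMap<u64, Report>, node_id: u64, incoming: Report) {
    match view.get(&node_id) {
        Some(cur) if cur.version >= incoming.version => {} // stale: ignore
        _ => {
            view.insert(node_id, incoming);
        }
    }
}

fn main() {
    let mut view = HashMap::new();
    merge(&mut view, 10, Report { version: 2, status: "vm-a: RUNNING".into() });
    // A delayed copy of the older report arrives later and is ignored
    merge(&mut view, 10, Report { version: 1, status: "vm-a: STARTING".into() });
    println!("converged: {}", view[&10].status);
}
```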
-----
### Design summary: when to use Raft vs. Gossip

The "general-purpose management DB (external-dependency type)" we are about to build will expose roughly the following interface.

| Action | Transport | Example data |
| :--- | :--- | :--- |
| **Commands (Desired State)**<br>"make it so" | **Raft + Watch**<br>(strong consistency) | - VM start/stop commands<br>- Changes to DB data placement<br>- Password and config changes |
| **Current state (Actual State)**<br>"this is how things are" | **Gossip**<br>(eventual consistency) | - Node liveness (heartbeat)<br>- Resource usage (CPU/mem)<br>- Status such as "VM started" |
| **Notification** | **Watch (HTTP/gRPC stream)** | - The "a new command has arrived!" trigger |

#### Implementation advice

If you are going to build "something like etcd" yourself, the shape is: **embed "Serf (Gossip)" and "Raft" as libraries, and layer a "gRPC KVS API with Watch" on top**.

Once that exists, the VM platform only needs an agent that "watches, starts VMs, and reports status back via Gossip", and a DB platform can be built the same way. It is a very scalable, clean design.


@@ -0,0 +1,31 @@
[package]
name = "chainfire-client"
version.workspace = true
edition.workspace = true
license.workspace = true
rust-version.workspace = true
description = "Chainfire distributed KVS client library"
[dependencies]
chainfire-types = { workspace = true }
chainfire-proto = { workspace = true }
# gRPC
tonic = { workspace = true }
# Async
tokio = { workspace = true }
tokio-stream = { workspace = true }
futures = { workspace = true }
# Utilities
tracing = { workspace = true }
thiserror = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
[dev-dependencies]
tokio = { workspace = true, features = ["rt-multi-thread", "macros"] }
[lints]
workspace = true


@@ -0,0 +1,389 @@
//! Chainfire client implementation
use crate::error::{ClientError, Result};
use crate::watch::WatchHandle;
use chainfire_proto::proto::{
cluster_client::ClusterClient,
compare,
kv_client::KvClient,
request_op,
response_op,
watch_client::WatchClient,
Compare,
DeleteRangeRequest,
PutRequest,
RangeRequest,
RequestOp,
StatusRequest,
TxnRequest,
};
use tonic::transport::Channel;
use tracing::debug;
/// Chainfire client
pub struct Client {
/// gRPC channel
channel: Channel,
/// KV client
kv: KvClient<Channel>,
/// Cluster client
cluster: ClusterClient<Channel>,
}
impl Client {
/// Connect to a Chainfire server
pub async fn connect(addr: impl AsRef<str>) -> Result<Self> {
let addr = addr.as_ref().to_string();
debug!(addr = %addr, "Connecting to Chainfire");
let channel = Channel::from_shared(addr)
.map_err(|e| ClientError::Connection(e.to_string()))?
.connect()
.await?;
let kv = KvClient::new(channel.clone());
let cluster = ClusterClient::new(channel.clone());
Ok(Self {
channel,
kv,
cluster,
})
}
/// Put a key-value pair
pub async fn put(&mut self, key: impl AsRef<[u8]>, value: impl AsRef<[u8]>) -> Result<u64> {
let resp = self
.kv
.put(PutRequest {
key: key.as_ref().to_vec(),
value: value.as_ref().to_vec(),
lease: 0,
prev_kv: false,
})
.await?
.into_inner();
Ok(resp.header.map(|h| h.revision as u64).unwrap_or(0))
}
/// Put a key-value pair with string values
pub async fn put_str(&mut self, key: &str, value: &str) -> Result<u64> {
self.put(key.as_bytes(), value.as_bytes()).await
}
/// Get a value by key
pub async fn get(&mut self, key: impl AsRef<[u8]>) -> Result<Option<Vec<u8>>> {
Ok(self
.get_with_revision(key)
.await?
.map(|(value, _)| value))
}
/// Get a value by key along with its current revision
pub async fn get_with_revision(
&mut self,
key: impl AsRef<[u8]>,
) -> Result<Option<(Vec<u8>, u64)>> {
let resp = self
.kv
.range(RangeRequest {
key: key.as_ref().to_vec(),
range_end: vec![],
limit: 1,
revision: 0,
keys_only: false,
count_only: false,
serializable: false, // default: linearizable read
})
.await?
.into_inner();
Ok(resp.kvs.into_iter().next().map(|kv| (kv.value, kv.mod_revision as u64)))
}
/// Get a value as string
pub async fn get_str(&mut self, key: &str) -> Result<Option<String>> {
let value = self.get(key.as_bytes()).await?;
Ok(value.map(|v| String::from_utf8_lossy(&v).to_string()))
}
/// Delete a key
pub async fn delete(&mut self, key: impl AsRef<[u8]>) -> Result<bool> {
let resp = self
.kv
.delete(DeleteRangeRequest {
key: key.as_ref().to_vec(),
range_end: vec![],
prev_kv: false,
})
.await?
.into_inner();
Ok(resp.deleted > 0)
}
/// Get all keys with a prefix
pub async fn get_prefix(&mut self, prefix: impl AsRef<[u8]>) -> Result<Vec<(Vec<u8>, Vec<u8>)>> {
let prefix = prefix.as_ref();
let range_end = prefix_end(prefix);
let resp = self
.kv
.range(RangeRequest {
key: prefix.to_vec(),
range_end,
limit: 0,
revision: 0,
keys_only: false,
count_only: false,
serializable: false,
})
.await?
.into_inner();
Ok(resp.kvs.into_iter().map(|kv| (kv.key, kv.value)).collect())
}
/// Scan a prefix returning keys, values, and revisions
pub async fn scan_prefix(
&mut self,
prefix: impl AsRef<[u8]>,
limit: i64,
) -> Result<(Vec<(Vec<u8>, Vec<u8>, u64)>, Option<Vec<u8>>)> {
let prefix = prefix.as_ref();
let range_end = prefix_end(prefix);
let resp = self
.kv
.range(RangeRequest {
key: prefix.to_vec(),
range_end,
limit,
revision: 0,
keys_only: false,
count_only: false,
serializable: false,
})
.await?
.into_inner();
let more = resp.more;
let kvs: Vec<(Vec<u8>, Vec<u8>, u64)> = resp
.kvs
.into_iter()
.map(|kv| (kv.key, kv.value, kv.mod_revision as u64))
.collect();
let next_key = if more {
kvs.last()
.map(|(k, _, _)| {
let mut nk = k.clone();
nk.push(0);
nk
})
} else {
None
};
Ok((kvs, next_key))
}
/// Scan an arbitrary range [start, end)
pub async fn scan_range(
&mut self,
start: impl AsRef<[u8]>,
end: impl AsRef<[u8]>,
limit: i64,
) -> Result<(Vec<(Vec<u8>, Vec<u8>, u64)>, Option<Vec<u8>>)> {
let resp = self
.kv
.range(RangeRequest {
key: start.as_ref().to_vec(),
range_end: end.as_ref().to_vec(),
limit,
revision: 0,
keys_only: false,
count_only: false,
serializable: false,
})
.await?
.into_inner();
let more = resp.more;
let kvs: Vec<(Vec<u8>, Vec<u8>, u64)> = resp
.kvs
.into_iter()
.map(|kv| (kv.key, kv.value, kv.mod_revision as u64))
.collect();
let next_key = if more {
kvs.last()
.map(|(k, _, _)| {
let mut nk = k.clone();
nk.push(0);
nk
})
} else {
None
};
Ok((kvs, next_key))
}
/// Compare-and-swap based on key version
pub async fn compare_and_swap(
&mut self,
key: impl AsRef<[u8]>,
expected_version: u64,
value: impl AsRef<[u8]>,
) -> Result<CasOutcome> {
let key_bytes = key.as_ref().to_vec();
let put_op = RequestOp {
request: Some(request_op::Request::RequestPut(PutRequest {
key: key_bytes.clone(),
value: value.as_ref().to_vec(),
lease: 0,
prev_kv: false,
})),
};
// Fetch current value on failure to surface the actual version
let read_on_fail = RequestOp {
request: Some(request_op::Request::RequestRange(RangeRequest {
key: key_bytes.clone(),
range_end: vec![],
limit: 1,
revision: 0,
keys_only: false,
count_only: false,
serializable: true, // within txn, use serializable read
})),
};
let compare = Compare {
result: compare::CompareResult::Equal as i32,
target: compare::CompareTarget::Version as i32,
key: key_bytes.clone(),
target_union: Some(compare::TargetUnion::Version(expected_version as i64)),
};
let resp = self
.kv
.txn(TxnRequest {
compare: vec![compare],
success: vec![put_op],
failure: vec![read_on_fail],
})
.await?
.into_inner();
if resp.succeeded {
let new_version = resp
.header
.as_ref()
.map(|h| h.revision as u64)
.unwrap_or(0);
return Ok(CasOutcome {
success: true,
current_version: new_version,
new_version,
});
}
// On failure try to extract the current version from the range response
let current_version = resp
.responses
.into_iter()
.filter_map(|op| match op.response {
Some(response_op::Response::ResponseRange(r)) => r
.kvs
.into_iter()
.next()
.map(|kv| kv.mod_revision as u64),
_ => None,
})
.next()
.unwrap_or(0);
Ok(CasOutcome {
success: false,
current_version,
new_version: 0,
})
}
/// Watch a key or prefix for changes
pub async fn watch(&mut self, key: impl AsRef<[u8]>) -> Result<WatchHandle> {
let key = key.as_ref().to_vec();
let watch_client = WatchClient::new(self.channel.clone());
WatchHandle::new(watch_client, key, None).await
}
/// Watch all keys with a prefix
pub async fn watch_prefix(&mut self, prefix: impl AsRef<[u8]>) -> Result<WatchHandle> {
let prefix = prefix.as_ref().to_vec();
let range_end = prefix_end(&prefix);
let watch_client = WatchClient::new(self.channel.clone());
WatchHandle::new(watch_client, prefix, Some(range_end)).await
}
/// Get cluster status
pub async fn status(&mut self) -> Result<ClusterStatus> {
let resp = self
.cluster
.status(StatusRequest {})
.await?
.into_inner();
Ok(ClusterStatus {
version: resp.version,
leader: resp.leader,
raft_term: resp.raft_term,
})
}
}
/// Cluster status
#[derive(Debug, Clone)]
pub struct ClusterStatus {
/// Server version
pub version: String,
/// Current leader ID
pub leader: u64,
/// Current Raft term
pub raft_term: u64,
}
/// CAS outcome returned by compare_and_swap
#[derive(Debug, Clone)]
pub struct CasOutcome {
/// Whether CAS succeeded
pub success: bool,
/// Observed/current version
pub current_version: u64,
/// New version when succeeded
pub new_version: u64,
}
/// Calculate prefix end for range queries
fn prefix_end(prefix: &[u8]) -> Vec<u8> {
let mut end = prefix.to_vec();
for i in (0..end.len()).rev() {
if end[i] < 0xff {
end[i] += 1;
end.truncate(i + 1);
return end;
}
}
vec![]
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_prefix_end() {
assert_eq!(prefix_end(b"abc"), b"abd");
assert_eq!(prefix_end(b"/nodes/"), b"/nodes0");
}
}


@@ -0,0 +1,34 @@
//! Client error types
use thiserror::Error;
/// Result type for client operations
pub type Result<T> = std::result::Result<T, ClientError>;
/// Client error
#[derive(Error, Debug)]
pub enum ClientError {
/// Connection error
#[error("Connection error: {0}")]
Connection(String),
/// RPC error
#[error("RPC error: {0}")]
Rpc(#[from] tonic::Status),
/// Transport error
#[error("Transport error: {0}")]
Transport(#[from] tonic::transport::Error),
/// Key not found
#[error("Key not found: {0}")]
KeyNotFound(String),
/// Watch error
#[error("Watch error: {0}")]
Watch(String),
/// Internal error
#[error("Internal error: {0}")]
Internal(String),
}


@@ -0,0 +1,34 @@
//! Chainfire distributed KVS client library
//!
//! This crate provides a client for interacting with Chainfire clusters.
//!
//! # Example
//!
//! ```no_run
//! use chainfire_client::Client;
//!
//! #[tokio::main]
//! async fn main() -> Result<(), Box<dyn std::error::Error>> {
//! let mut client = Client::connect("http://127.0.0.1:2379").await?;
//!
//! // Put a value
//! client.put("/my/key", "my value").await?;
//!
//! // Get a value
//! if let Some(value) = client.get("/my/key").await? {
//! println!("Got: {}", String::from_utf8_lossy(&value));
//! }
//!
//! Ok(())
//! }
//! ```
mod client;
mod error;
pub mod node;
mod watch;
pub use client::{CasOutcome, Client};
pub use error::{ClientError, Result};
pub use node::{NodeCapacity, NodeFilter, NodeMetadata};
pub use watch::WatchHandle;


@@ -0,0 +1,333 @@
//! Node metadata helpers for Chainfire KVS
//!
//! This module provides helpers for storing and retrieving node metadata
//! in the Chainfire distributed KVS.
//!
//! # KVS Key Schema
//!
//! Node metadata is stored with the following key structure:
//! - `/nodes/<id>/info` - JSON-encoded NodeMetadata
//! - `/nodes/<id>/roles` - JSON-encoded roles (raft_role, gossip_role)
//! - `/nodes/<id>/capacity/cpu` - CPU cores (u32)
//! - `/nodes/<id>/capacity/memory_gb` - Memory in GB (u32)
//! - `/nodes/<id>/labels/<key>` - Custom labels (string)
//! - `/nodes/<id>/api_addr` - API address (string)
use crate::error::Result;
use crate::Client;
use chainfire_types::node::NodeRole;
use chainfire_types::RaftRole;
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
/// Node metadata stored in KVS
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct NodeMetadata {
/// Unique node identifier
pub id: u64,
/// Human-readable node name
pub name: String,
/// Raft participation role
pub raft_role: RaftRole,
/// Gossip/cluster role
pub gossip_role: NodeRole,
/// API address for client connections
pub api_addr: String,
/// Raft address for inter-node communication (optional for workers)
pub raft_addr: Option<String>,
/// Gossip address for membership protocol
pub gossip_addr: String,
/// Node capacity information
pub capacity: NodeCapacity,
/// Custom labels for node selection
pub labels: HashMap<String, String>,
}
/// Node capacity information
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
pub struct NodeCapacity {
/// Number of CPU cores
pub cpu_cores: u32,
/// Memory in gigabytes
pub memory_gb: u32,
/// Disk space in gigabytes (optional)
pub disk_gb: Option<u32>,
}
/// Filter for listing nodes
#[derive(Debug, Clone, Default)]
pub struct NodeFilter {
/// Filter by Raft role
pub raft_role: Option<RaftRole>,
/// Filter by gossip role
pub gossip_role: Option<NodeRole>,
/// Filter by labels (all must match)
pub labels: HashMap<String, String>,
}
impl NodeMetadata {
/// Create a new NodeMetadata for a control-plane node
pub fn control_plane(
id: u64,
name: impl Into<String>,
api_addr: impl Into<String>,
raft_addr: impl Into<String>,
gossip_addr: impl Into<String>,
) -> Self {
Self {
id,
name: name.into(),
raft_role: RaftRole::Voter,
gossip_role: NodeRole::ControlPlane,
api_addr: api_addr.into(),
raft_addr: Some(raft_addr.into()),
gossip_addr: gossip_addr.into(),
capacity: NodeCapacity::default(),
labels: HashMap::new(),
}
}
/// Create a new NodeMetadata for a worker node
pub fn worker(
id: u64,
name: impl Into<String>,
api_addr: impl Into<String>,
gossip_addr: impl Into<String>,
) -> Self {
Self {
id,
name: name.into(),
raft_role: RaftRole::None,
gossip_role: NodeRole::Worker,
api_addr: api_addr.into(),
raft_addr: None,
gossip_addr: gossip_addr.into(),
capacity: NodeCapacity::default(),
labels: HashMap::new(),
}
}
/// Set capacity information
pub fn with_capacity(mut self, cpu_cores: u32, memory_gb: u32) -> Self {
self.capacity.cpu_cores = cpu_cores;
self.capacity.memory_gb = memory_gb;
self
}
/// Add a label
pub fn with_label(mut self, key: impl Into<String>, value: impl Into<String>) -> Self {
self.labels.insert(key.into(), value.into());
self
}
}
/// Key prefix for all node metadata
const NODE_PREFIX: &str = "/nodes/";
/// Generate the key for node info
fn node_info_key(id: u64) -> String {
format!("{}{}/info", NODE_PREFIX, id)
}
/// Generate the key for a node label
fn node_label_key(id: u64, label: &str) -> String {
format!("{}{}/labels/{}", NODE_PREFIX, id, label)
}
/// Register a node in the cluster by storing its metadata in KVS
///
/// # Arguments
///
/// * `client` - The Chainfire client
/// * `meta` - Node metadata to register
///
/// # Returns
///
/// The revision number of the write operation
pub async fn register_node(client: &mut Client, meta: &NodeMetadata) -> Result<u64> {
let key = node_info_key(meta.id);
let value = serde_json::to_string(meta)
.map_err(|e| crate::error::ClientError::Internal(e.to_string()))?;
client.put_str(&key, &value).await
}
/// Update a specific node attribute
pub async fn update_node_label(
client: &mut Client,
node_id: u64,
label: &str,
value: &str,
) -> Result<u64> {
let key = node_label_key(node_id, label);
client.put_str(&key, value).await
}
/// Get a node's metadata by ID
///
/// # Arguments
///
/// * `client` - The Chainfire client
/// * `node_id` - The node ID to look up
///
/// # Returns
///
/// The node metadata if found, None otherwise
pub async fn get_node(client: &mut Client, node_id: u64) -> Result<Option<NodeMetadata>> {
let key = node_info_key(node_id);
let value = client.get_str(&key).await?;
match value {
Some(json) => {
let meta: NodeMetadata = serde_json::from_str(&json)
.map_err(|e| crate::error::ClientError::Internal(e.to_string()))?;
Ok(Some(meta))
}
None => Ok(None),
}
}
/// List all registered nodes
///
/// # Arguments
///
/// * `client` - The Chainfire client
/// * `filter` - Optional filter criteria
///
/// # Returns
///
/// A list of node metadata matching the filter
pub async fn list_nodes(client: &mut Client, filter: &NodeFilter) -> Result<Vec<NodeMetadata>> {
let entries = client.get_prefix(NODE_PREFIX).await?;
let mut nodes = Vec::new();
for (key, value) in entries {
let key_str = String::from_utf8_lossy(&key);
// Only process /nodes/<id>/info keys
if !key_str.ends_with("/info") {
continue;
}
let json = String::from_utf8_lossy(&value);
if let Ok(meta) = serde_json::from_str::<NodeMetadata>(&json) {
// Apply filters
if let Some(ref raft_role) = filter.raft_role {
if meta.raft_role != *raft_role {
continue;
}
}
if let Some(ref gossip_role) = filter.gossip_role {
if meta.gossip_role != *gossip_role {
continue;
}
}
// Check label filters
let mut labels_match = true;
for (k, v) in &filter.labels {
match meta.labels.get(k) {
Some(node_v) if node_v == v => {}
_ => {
labels_match = false;
break;
}
}
}
if labels_match {
nodes.push(meta);
}
}
}
// Sort by node ID for consistent ordering
nodes.sort_by_key(|n| n.id);
Ok(nodes)
}
/// Unregister a node from the cluster
///
/// # Arguments
///
/// * `client` - The Chainfire client
/// * `node_id` - The node ID to unregister
///
/// # Returns
///
/// True if the node was found and deleted
pub async fn unregister_node(client: &mut Client, node_id: u64) -> Result<bool> {
let key = node_info_key(node_id);
client.delete(&key).await
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_node_info_key() {
assert_eq!(node_info_key(1), "/nodes/1/info");
assert_eq!(node_info_key(123), "/nodes/123/info");
}
#[test]
fn test_node_label_key() {
assert_eq!(node_label_key(1, "zone"), "/nodes/1/labels/zone");
}
#[test]
fn test_control_plane_metadata() {
let meta = NodeMetadata::control_plane(
1,
"cp-1",
"127.0.0.1:2379",
"127.0.0.1:2380",
"127.0.0.1:2381",
);
assert_eq!(meta.id, 1);
assert_eq!(meta.raft_role, RaftRole::Voter);
assert_eq!(meta.gossip_role, NodeRole::ControlPlane);
assert!(meta.raft_addr.is_some());
}
#[test]
fn test_worker_metadata() {
let meta = NodeMetadata::worker(100, "worker-1", "127.0.0.1:3379", "127.0.0.1:3381");
assert_eq!(meta.id, 100);
assert_eq!(meta.raft_role, RaftRole::None);
assert_eq!(meta.gossip_role, NodeRole::Worker);
assert!(meta.raft_addr.is_none());
}
#[test]
fn test_metadata_with_capacity() {
let meta = NodeMetadata::worker(1, "worker", "addr", "gossip")
.with_capacity(8, 32)
.with_label("zone", "us-west-1");
assert_eq!(meta.capacity.cpu_cores, 8);
assert_eq!(meta.capacity.memory_gb, 32);
assert_eq!(meta.labels.get("zone"), Some(&"us-west-1".to_string()));
}
#[test]
fn test_metadata_serialization() {
let meta = NodeMetadata::control_plane(1, "test", "api", "raft", "gossip")
.with_capacity(4, 16)
.with_label("env", "prod");
let json = serde_json::to_string(&meta).unwrap();
let deserialized: NodeMetadata = serde_json::from_str(&json).unwrap();
assert_eq!(meta.id, deserialized.id);
assert_eq!(meta.raft_role, deserialized.raft_role);
assert_eq!(meta.capacity.cpu_cores, deserialized.capacity.cpu_cores);
}
}


@@ -0,0 +1,143 @@
//! Watch functionality
use crate::error::{ClientError, Result};
use chainfire_proto::proto::{
watch_client::WatchClient, watch_request, Event, WatchCreateRequest, WatchRequest,
};
use futures::StreamExt;
use tokio::sync::mpsc;
use tonic::transport::Channel;
use tracing::{debug, warn};
/// Event received from a watch
#[derive(Debug, Clone)]
pub struct WatchEvent {
/// Event type (Put or Delete)
pub event_type: EventType,
/// Key that changed
pub key: Vec<u8>,
/// New value (for Put events)
pub value: Vec<u8>,
/// Revision of the change
pub revision: u64,
}
/// Type of watch event
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EventType {
Put,
Delete,
}
/// Handle to a watch stream
pub struct WatchHandle {
/// Watch ID
watch_id: i64,
/// Event receiver
rx: mpsc::Receiver<WatchEvent>,
}
impl WatchHandle {
/// Create a new watch
pub(crate) async fn new(
mut client: WatchClient<Channel>,
key: Vec<u8>,
range_end: Option<Vec<u8>>,
) -> Result<Self> {
let (tx, rx) = mpsc::channel(64);
let (req_tx, req_rx) = mpsc::channel(16);
// Send initial create request
let create_req = WatchRequest {
request_union: Some(watch_request::RequestUnion::CreateRequest(
WatchCreateRequest {
key,
range_end: range_end.unwrap_or_default(),
start_revision: 0,
progress_notify: false,
prev_kv: false,
watch_id: 0,
},
)),
};
req_tx
.send(create_req)
.await
.map_err(|_| ClientError::Watch("Failed to send create request".into()))?;
// Create bidirectional stream
let req_stream = tokio_stream::wrappers::ReceiverStream::new(req_rx);
let mut resp_stream = client.watch(req_stream).await?.into_inner();
// Wait for creation confirmation
let first_resp = resp_stream
.next()
.await
.ok_or_else(|| ClientError::Watch("No response from server".into()))?
.map_err(ClientError::Rpc)?;
if !first_resp.created {
return Err(ClientError::Watch("Watch creation failed".into()));
}
let watch_id = first_resp.watch_id;
debug!(watch_id, "Watch created");
// Spawn task to process events
tokio::spawn(async move {
while let Some(result) = resp_stream.next().await {
match result {
Ok(resp) => {
if resp.canceled {
debug!(watch_id = resp.watch_id, "Watch canceled");
break;
}
for event in resp.events {
let watch_event = convert_event(event);
if tx.send(watch_event).await.is_err() {
break;
}
}
}
Err(e) => {
warn!(error = %e, "Watch stream error");
break;
}
}
}
});
Ok(Self { watch_id, rx })
}
/// Get the watch ID
pub fn id(&self) -> i64 {
self.watch_id
}
/// Receive the next event
pub async fn recv(&mut self) -> Option<WatchEvent> {
self.rx.recv().await
}
}
fn convert_event(event: Event) -> WatchEvent {
let event_type = if event.r#type == 0 {
EventType::Put
} else {
EventType::Delete
};
let (key, value, revision) = event.kv.map(|kv| {
(kv.key, kv.value, kv.mod_revision as u64)
}).unwrap_or_default();
WatchEvent {
event_type,
key,
value,
revision,
}
}


@@ -0,0 +1,42 @@
[package]
name = "chainfire-api"
version.workspace = true
edition.workspace = true
license.workspace = true
rust-version.workspace = true
description = "gRPC API layer for Chainfire distributed KVS"
[dependencies]
chainfire-types = { workspace = true }
chainfire-storage = { workspace = true }
chainfire-raft = { workspace = true }
chainfire-watch = { workspace = true }
# gRPC
tonic = { workspace = true }
prost = { workspace = true }
prost-types = { workspace = true }
# Async
tokio = { workspace = true }
tokio-stream = { workspace = true }
futures = { workspace = true }
async-trait = { workspace = true }
# Raft
openraft = { workspace = true }
# Serialization
bincode = { workspace = true }
# Utilities
tracing = { workspace = true }
[build-dependencies]
tonic-build = { workspace = true }
[dev-dependencies]
tokio = { workspace = true, features = ["rt-multi-thread", "macros"] }
[lints]
workspace = true


@@ -0,0 +1,19 @@
fn main() -> Result<(), Box<dyn std::error::Error>> {
// Compile the protobuf files to OUT_DIR (default location for include_proto!)
tonic_build::configure()
.build_server(true)
.build_client(true)
.compile_protos(
&[
"../../proto/chainfire.proto",
"../../proto/internal.proto",
],
&["../../proto"],
)?;
// Tell cargo to rerun if proto files change
println!("cargo:rerun-if-changed=../../proto/chainfire.proto");
println!("cargo:rerun-if-changed=../../proto/internal.proto");
Ok(())
}


@@ -0,0 +1,216 @@
//! Cluster management service implementation
//!
//! This service handles cluster membership operations including adding,
//! removing, and listing members.
use crate::conversions::make_header;
use crate::proto::{
cluster_server::Cluster, Member, MemberAddRequest, MemberAddResponse, MemberListRequest,
MemberListResponse, MemberRemoveRequest, MemberRemoveResponse, StatusRequest, StatusResponse,
};
use chainfire_raft::RaftNode;
use openraft::BasicNode;
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::{SystemTime, UNIX_EPOCH};
use tonic::{Request, Response, Status};
use tracing::{debug, info, warn};
/// Generate a unique member ID based on timestamp and counter
fn generate_member_id() -> u64 {
static COUNTER: AtomicU64 = AtomicU64::new(0);
let counter = COUNTER.fetch_add(1, Ordering::Relaxed);
let timestamp = SystemTime::now()
.duration_since(UNIX_EPOCH)
.unwrap_or_default()
.as_nanos() as u64;
let mut hasher = DefaultHasher::new();
(timestamp, counter, std::process::id()).hash(&mut hasher);
hasher.finish()
}
/// Cluster service implementation
pub struct ClusterServiceImpl {
/// Raft node
raft: Arc<RaftNode>,
/// Cluster ID
cluster_id: u64,
/// Server version
version: String,
}
impl ClusterServiceImpl {
/// Create a new cluster service
pub fn new(raft: Arc<RaftNode>, cluster_id: u64) -> Self {
Self {
raft,
cluster_id,
version: env!("CARGO_PKG_VERSION").to_string(),
}
}
fn make_header(&self, revision: u64) -> crate::proto::ResponseHeader {
make_header(self.cluster_id, self.raft.id(), revision, 0)
}
/// Get current members as proto Member list
async fn get_member_list(&self) -> Vec<Member> {
self.raft
.membership()
.await
.iter()
.map(|&id| Member {
id,
name: format!("node-{}", id),
peer_urls: vec![],
client_urls: vec![],
is_learner: false,
})
.collect()
}
}
#[tonic::async_trait]
impl Cluster for ClusterServiceImpl {
async fn member_add(
&self,
request: Request<MemberAddRequest>,
) -> Result<Response<MemberAddResponse>, Status> {
let req = request.into_inner();
debug!(peer_urls = ?req.peer_urls, is_learner = req.is_learner, "Member add request");
// Generate new member ID
let member_id = generate_member_id();
// Create BasicNode for the new member
let node = BasicNode::default();
// Add as learner first (safer for cluster stability)
match self.raft.add_learner(member_id, node, true).await {
Ok(()) => {
info!(member_id, "Added learner node");
// If not explicitly a learner, promote to voter
if !req.is_learner {
// Get current membership and add new member
let mut members: BTreeMap<u64, BasicNode> = self
.raft
.membership()
.await
.iter()
.map(|&id| (id, BasicNode::default()))
.collect();
members.insert(member_id, BasicNode::default());
if let Err(e) = self.raft.change_membership(members, false).await {
warn!(error = %e, member_id, "Failed to promote learner to voter");
// Still return success for the learner add
} else {
info!(member_id, "Promoted learner to voter");
}
}
let new_member = Member {
id: member_id,
name: String::new(),
peer_urls: req.peer_urls,
client_urls: vec![],
is_learner: req.is_learner,
};
Ok(Response::new(MemberAddResponse {
header: Some(self.make_header(0)),
member: Some(new_member),
members: self.get_member_list().await,
}))
}
Err(e) => {
warn!(error = %e, "Failed to add member");
Err(Status::internal(format!("Failed to add member: {}", e)))
}
}
}
async fn member_remove(
&self,
request: Request<MemberRemoveRequest>,
) -> Result<Response<MemberRemoveResponse>, Status> {
let req = request.into_inner();
debug!(member_id = req.id, "Member remove request");
// Get current membership and remove the member
let mut members: BTreeMap<u64, BasicNode> = self
.raft
.membership()
.await
.iter()
.map(|&id| (id, BasicNode::default()))
.collect();
if !members.contains_key(&req.id) {
return Err(Status::not_found(format!(
"Member {} not found in cluster",
req.id
)));
}
members.remove(&req.id);
match self.raft.change_membership(members, false).await {
Ok(()) => {
info!(member_id = req.id, "Removed member from cluster");
Ok(Response::new(MemberRemoveResponse {
header: Some(self.make_header(0)),
members: self.get_member_list().await,
}))
}
Err(e) => {
warn!(error = %e, member_id = req.id, "Failed to remove member");
Err(Status::internal(format!("Failed to remove member: {}", e)))
}
}
}
async fn member_list(
&self,
_request: Request<MemberListRequest>,
) -> Result<Response<MemberListResponse>, Status> {
debug!("Member list request");
Ok(Response::new(MemberListResponse {
header: Some(self.make_header(0)),
members: self.get_member_list().await,
}))
}
async fn status(
&self,
_request: Request<StatusRequest>,
) -> Result<Response<StatusResponse>, Status> {
debug!("Status request");
let leader = self.raft.leader().await;
let term = self.raft.current_term().await;
let is_leader = self.raft.is_leader().await;
// Get storage info from Raft node
let storage = self.raft.storage();
let storage_guard = storage.read().await;
let sm = storage_guard.state_machine().read().await;
let revision = sm.current_revision();
Ok(Response::new(StatusResponse {
header: Some(self.make_header(revision)),
version: self.version.clone(),
db_size: 0, // TODO: get actual RocksDB size
leader: leader.unwrap_or(0),
raft_index: revision,
raft_term: term,
raft_applied_index: revision,
}))
}
}


@@ -0,0 +1,113 @@
//! Conversions between protobuf types and internal types
use crate::proto;
use chainfire_types::kv::KvEntry;
use chainfire_types::watch::{WatchEvent, WatchEventType, WatchRequest as InternalWatchRequest};
use chainfire_types::Revision;
/// Convert internal KvEntry to proto KeyValue
impl From<KvEntry> for proto::KeyValue {
fn from(entry: KvEntry) -> Self {
Self {
key: entry.key,
value: entry.value,
create_revision: entry.create_revision as i64,
mod_revision: entry.mod_revision as i64,
version: entry.version as i64,
lease: entry.lease_id.unwrap_or(0),
}
}
}
/// Convert proto KeyValue to internal KvEntry
impl From<proto::KeyValue> for KvEntry {
fn from(kv: proto::KeyValue) -> Self {
Self {
key: kv.key,
value: kv.value,
create_revision: kv.create_revision as u64,
mod_revision: kv.mod_revision as u64,
version: kv.version as u64,
lease_id: if kv.lease != 0 { Some(kv.lease) } else { None },
}
}
}
/// Convert internal WatchEvent to proto Event
impl From<WatchEvent> for proto::Event {
fn from(event: WatchEvent) -> Self {
Self {
r#type: match event.event_type {
WatchEventType::Put => proto::event::EventType::Put as i32,
WatchEventType::Delete => proto::event::EventType::Delete as i32,
},
kv: Some(event.kv.into()),
prev_kv: event.prev_kv.map(Into::into),
}
}
}
/// Convert proto WatchCreateRequest to internal WatchRequest
impl From<proto::WatchCreateRequest> for InternalWatchRequest {
fn from(req: proto::WatchCreateRequest) -> Self {
Self {
watch_id: req.watch_id,
key: req.key,
range_end: if req.range_end.is_empty() {
None
} else {
Some(req.range_end)
},
start_revision: if req.start_revision > 0 {
Some(req.start_revision as Revision)
} else {
None
},
prev_kv: req.prev_kv,
progress_notify: req.progress_notify,
}
}
}
/// Create a response header
pub fn make_header(
cluster_id: u64,
member_id: u64,
revision: Revision,
raft_term: u64,
) -> proto::ResponseHeader {
proto::ResponseHeader {
cluster_id,
member_id,
revision: revision as i64,
raft_term,
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_kv_entry_conversion() {
let entry = KvEntry::new(b"key".to_vec(), b"value".to_vec(), 1);
let proto_kv: proto::KeyValue = entry.clone().into();
assert_eq!(proto_kv.key, b"key");
assert_eq!(proto_kv.value, b"value");
assert_eq!(proto_kv.create_revision, 1);
let back: KvEntry = proto_kv.into();
assert_eq!(back.key, entry.key);
assert_eq!(back.value, entry.value);
}
#[test]
fn test_watch_event_conversion() {
let kv = KvEntry::new(b"key".to_vec(), b"value".to_vec(), 1);
let event = WatchEvent::put(kv, None);
let proto_event: proto::Event = event.into();
assert_eq!(proto_event.r#type, proto::event::EventType::Put as i32);
}
}
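The `lease` field in `proto::KeyValue` uses `0` as a wire-level sentinel for "no lease", which the conversions above map to and from `Option<i64>`. A minimal, self-contained sketch of that convention (free functions with simplified types, not the generated structs):

```rust
// Simplified stand-ins for the proto::KeyValue.lease <-> KvEntry.lease_id
// mapping used in the From impls above; names here are illustrative.
fn lease_to_option(lease: i64) -> Option<i64> {
    // 0 on the wire means "no lease attached"
    if lease != 0 { Some(lease) } else { None }
}

fn option_to_lease(lease_id: Option<i64>) -> i64 {
    lease_id.unwrap_or(0)
}

fn main() {
    assert_eq!(lease_to_option(0), None);
    assert_eq!(lease_to_option(42), Some(42));
    // Round-trip preserves both the sentinel and real lease IDs
    assert_eq!(option_to_lease(lease_to_option(7)), 7);
    assert_eq!(option_to_lease(None), 0);
    println!("ok");
}
```

The same sentinel convention appears again in the KV service below, where `PutRequest.lease != 0` decides whether a `lease_id` is attached to the command.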


@@ -0,0 +1,566 @@
// This file is @generated by prost-build.
#[derive(Clone, Copy, PartialEq, ::prost::Message)]
pub struct VoteRequest {
/// term is the candidate's term
#[prost(uint64, tag = "1")]
pub term: u64,
/// candidate_id is the candidate requesting the vote
#[prost(uint64, tag = "2")]
pub candidate_id: u64,
/// last_log_index is index of candidate's last log entry
#[prost(uint64, tag = "3")]
pub last_log_index: u64,
/// last_log_term is term of candidate's last log entry
#[prost(uint64, tag = "4")]
pub last_log_term: u64,
}
#[derive(Clone, Copy, PartialEq, ::prost::Message)]
pub struct VoteResponse {
/// term is the current term for the voter
#[prost(uint64, tag = "1")]
pub term: u64,
/// vote_granted is true if the candidate received the vote
#[prost(bool, tag = "2")]
pub vote_granted: bool,
/// last_log_index is the voter's last log index
#[prost(uint64, tag = "3")]
pub last_log_index: u64,
/// last_log_term is the term of the voter's last log entry
#[prost(uint64, tag = "4")]
pub last_log_term: u64,
}
#[derive(Clone, PartialEq, ::prost::Message)]
pub struct AppendEntriesRequest {
/// term is the leader's term
#[prost(uint64, tag = "1")]
pub term: u64,
/// leader_id is the leader's ID
#[prost(uint64, tag = "2")]
pub leader_id: u64,
/// prev_log_index is index of log entry immediately preceding new ones
#[prost(uint64, tag = "3")]
pub prev_log_index: u64,
/// prev_log_term is term of prev_log_index entry
#[prost(uint64, tag = "4")]
pub prev_log_term: u64,
/// entries are log entries to append
#[prost(message, repeated, tag = "5")]
pub entries: ::prost::alloc::vec::Vec<LogEntry>,
/// leader_commit is leader's commit index
#[prost(uint64, tag = "6")]
pub leader_commit: u64,
}
#[derive(Clone, PartialEq, ::prost::Message)]
pub struct LogEntry {
/// index is the log entry index
#[prost(uint64, tag = "1")]
pub index: u64,
/// term is the term when entry was received
#[prost(uint64, tag = "2")]
pub term: u64,
/// data is the command data
#[prost(bytes = "vec", tag = "3")]
pub data: ::prost::alloc::vec::Vec<u8>,
}
#[derive(Clone, Copy, PartialEq, ::prost::Message)]
pub struct AppendEntriesResponse {
/// term is the current term
#[prost(uint64, tag = "1")]
pub term: u64,
/// success is true if follower contained entry matching prevLogIndex
#[prost(bool, tag = "2")]
pub success: bool,
/// conflict_index is the first conflicting index (for optimization)
#[prost(uint64, tag = "3")]
pub conflict_index: u64,
/// conflict_term is the term of the conflicting entry
#[prost(uint64, tag = "4")]
pub conflict_term: u64,
}
#[derive(Clone, PartialEq, ::prost::Message)]
pub struct InstallSnapshotRequest {
/// term is the leader's term
#[prost(uint64, tag = "1")]
pub term: u64,
/// leader_id is the leader's ID
#[prost(uint64, tag = "2")]
pub leader_id: u64,
/// last_included_index: the snapshot replaces all entries up through and including this index
#[prost(uint64, tag = "3")]
pub last_included_index: u64,
/// last_included_term is term of last_included_index
#[prost(uint64, tag = "4")]
pub last_included_term: u64,
/// offset is byte offset where chunk is positioned in the snapshot file
#[prost(uint64, tag = "5")]
pub offset: u64,
/// data is raw bytes of the snapshot chunk
#[prost(bytes = "vec", tag = "6")]
pub data: ::prost::alloc::vec::Vec<u8>,
/// done is true if this is the last chunk
#[prost(bool, tag = "7")]
pub done: bool,
}
#[derive(Clone, Copy, PartialEq, ::prost::Message)]
pub struct InstallSnapshotResponse {
/// term is the current term
#[prost(uint64, tag = "1")]
pub term: u64,
}
/// Generated client implementations.
pub mod raft_service_client {
#![allow(
unused_variables,
dead_code,
missing_docs,
clippy::wildcard_imports,
clippy::let_unit_value,
)]
use tonic::codegen::*;
use tonic::codegen::http::Uri;
/// Internal Raft RPC service for node-to-node communication
#[derive(Debug, Clone)]
pub struct RaftServiceClient<T> {
inner: tonic::client::Grpc<T>,
}
impl RaftServiceClient<tonic::transport::Channel> {
/// Attempt to create a new client by connecting to a given endpoint.
pub async fn connect<D>(dst: D) -> Result<Self, tonic::transport::Error>
where
D: TryInto<tonic::transport::Endpoint>,
D::Error: Into<StdError>,
{
let conn = tonic::transport::Endpoint::new(dst)?.connect().await?;
Ok(Self::new(conn))
}
}
impl<T> RaftServiceClient<T>
where
T: tonic::client::GrpcService<tonic::body::BoxBody>,
T::Error: Into<StdError>,
T::ResponseBody: Body<Data = Bytes> + std::marker::Send + 'static,
<T::ResponseBody as Body>::Error: Into<StdError> + std::marker::Send,
{
pub fn new(inner: T) -> Self {
let inner = tonic::client::Grpc::new(inner);
Self { inner }
}
pub fn with_origin(inner: T, origin: Uri) -> Self {
let inner = tonic::client::Grpc::with_origin(inner, origin);
Self { inner }
}
pub fn with_interceptor<F>(
inner: T,
interceptor: F,
) -> RaftServiceClient<InterceptedService<T, F>>
where
F: tonic::service::Interceptor,
T::ResponseBody: Default,
T: tonic::codegen::Service<
http::Request<tonic::body::BoxBody>,
Response = http::Response<
<T as tonic::client::GrpcService<tonic::body::BoxBody>>::ResponseBody,
>,
>,
<T as tonic::codegen::Service<
http::Request<tonic::body::BoxBody>,
>>::Error: Into<StdError> + std::marker::Send + std::marker::Sync,
{
RaftServiceClient::new(InterceptedService::new(inner, interceptor))
}
/// Compress requests with the given encoding.
///
/// This requires the server to support it otherwise it might respond with an
/// error.
#[must_use]
pub fn send_compressed(mut self, encoding: CompressionEncoding) -> Self {
self.inner = self.inner.send_compressed(encoding);
self
}
/// Enable decompressing responses.
#[must_use]
pub fn accept_compressed(mut self, encoding: CompressionEncoding) -> Self {
self.inner = self.inner.accept_compressed(encoding);
self
}
/// Limits the maximum size of a decoded message.
///
/// Default: `4MB`
#[must_use]
pub fn max_decoding_message_size(mut self, limit: usize) -> Self {
self.inner = self.inner.max_decoding_message_size(limit);
self
}
/// Limits the maximum size of an encoded message.
///
/// Default: `usize::MAX`
#[must_use]
pub fn max_encoding_message_size(mut self, limit: usize) -> Self {
self.inner = self.inner.max_encoding_message_size(limit);
self
}
/// Vote requests a vote from a peer
pub async fn vote(
&mut self,
request: impl tonic::IntoRequest<super::VoteRequest>,
) -> std::result::Result<tonic::Response<super::VoteResponse>, tonic::Status> {
self.inner
.ready()
.await
.map_err(|e| {
tonic::Status::unknown(
format!("Service was not ready: {}", e.into()),
)
})?;
let codec = tonic::codec::ProstCodec::default();
let path = http::uri::PathAndQuery::from_static(
"/chainfire.internal.RaftService/Vote",
);
let mut req = request.into_request();
req.extensions_mut()
.insert(GrpcMethod::new("chainfire.internal.RaftService", "Vote"));
self.inner.unary(req, path, codec).await
}
/// AppendEntries sends log entries to followers
pub async fn append_entries(
&mut self,
request: impl tonic::IntoRequest<super::AppendEntriesRequest>,
) -> std::result::Result<
tonic::Response<super::AppendEntriesResponse>,
tonic::Status,
> {
self.inner
.ready()
.await
.map_err(|e| {
tonic::Status::unknown(
format!("Service was not ready: {}", e.into()),
)
})?;
let codec = tonic::codec::ProstCodec::default();
let path = http::uri::PathAndQuery::from_static(
"/chainfire.internal.RaftService/AppendEntries",
);
let mut req = request.into_request();
req.extensions_mut()
.insert(
GrpcMethod::new("chainfire.internal.RaftService", "AppendEntries"),
);
self.inner.unary(req, path, codec).await
}
/// InstallSnapshot sends a snapshot to a follower
pub async fn install_snapshot(
&mut self,
request: impl tonic::IntoStreamingRequest<
Message = super::InstallSnapshotRequest,
>,
) -> std::result::Result<
tonic::Response<super::InstallSnapshotResponse>,
tonic::Status,
> {
self.inner
.ready()
.await
.map_err(|e| {
tonic::Status::unknown(
format!("Service was not ready: {}", e.into()),
)
})?;
let codec = tonic::codec::ProstCodec::default();
let path = http::uri::PathAndQuery::from_static(
"/chainfire.internal.RaftService/InstallSnapshot",
);
let mut req = request.into_streaming_request();
req.extensions_mut()
.insert(
GrpcMethod::new("chainfire.internal.RaftService", "InstallSnapshot"),
);
self.inner.client_streaming(req, path, codec).await
}
}
}
/// Generated server implementations.
pub mod raft_service_server {
#![allow(
unused_variables,
dead_code,
missing_docs,
clippy::wildcard_imports,
clippy::let_unit_value,
)]
use tonic::codegen::*;
/// Generated trait containing gRPC methods that should be implemented for use with RaftServiceServer.
#[async_trait]
pub trait RaftService: std::marker::Send + std::marker::Sync + 'static {
/// Vote requests a vote from a peer
async fn vote(
&self,
request: tonic::Request<super::VoteRequest>,
) -> std::result::Result<tonic::Response<super::VoteResponse>, tonic::Status>;
/// AppendEntries sends log entries to followers
async fn append_entries(
&self,
request: tonic::Request<super::AppendEntriesRequest>,
) -> std::result::Result<
tonic::Response<super::AppendEntriesResponse>,
tonic::Status,
>;
/// InstallSnapshot sends a snapshot to a follower
async fn install_snapshot(
&self,
request: tonic::Request<tonic::Streaming<super::InstallSnapshotRequest>>,
) -> std::result::Result<
tonic::Response<super::InstallSnapshotResponse>,
tonic::Status,
>;
}
/// Internal Raft RPC service for node-to-node communication
#[derive(Debug)]
pub struct RaftServiceServer<T> {
inner: Arc<T>,
accept_compression_encodings: EnabledCompressionEncodings,
send_compression_encodings: EnabledCompressionEncodings,
max_decoding_message_size: Option<usize>,
max_encoding_message_size: Option<usize>,
}
impl<T> RaftServiceServer<T> {
pub fn new(inner: T) -> Self {
Self::from_arc(Arc::new(inner))
}
pub fn from_arc(inner: Arc<T>) -> Self {
Self {
inner,
accept_compression_encodings: Default::default(),
send_compression_encodings: Default::default(),
max_decoding_message_size: None,
max_encoding_message_size: None,
}
}
pub fn with_interceptor<F>(
inner: T,
interceptor: F,
) -> InterceptedService<Self, F>
where
F: tonic::service::Interceptor,
{
InterceptedService::new(Self::new(inner), interceptor)
}
/// Enable decompressing requests with the given encoding.
#[must_use]
pub fn accept_compressed(mut self, encoding: CompressionEncoding) -> Self {
self.accept_compression_encodings.enable(encoding);
self
}
/// Compress responses with the given encoding, if the client supports it.
#[must_use]
pub fn send_compressed(mut self, encoding: CompressionEncoding) -> Self {
self.send_compression_encodings.enable(encoding);
self
}
/// Limits the maximum size of a decoded message.
///
/// Default: `4MB`
#[must_use]
pub fn max_decoding_message_size(mut self, limit: usize) -> Self {
self.max_decoding_message_size = Some(limit);
self
}
/// Limits the maximum size of an encoded message.
///
/// Default: `usize::MAX`
#[must_use]
pub fn max_encoding_message_size(mut self, limit: usize) -> Self {
self.max_encoding_message_size = Some(limit);
self
}
}
impl<T, B> tonic::codegen::Service<http::Request<B>> for RaftServiceServer<T>
where
T: RaftService,
B: Body + std::marker::Send + 'static,
B::Error: Into<StdError> + std::marker::Send + 'static,
{
type Response = http::Response<tonic::body::BoxBody>;
type Error = std::convert::Infallible;
type Future = BoxFuture<Self::Response, Self::Error>;
fn poll_ready(
&mut self,
_cx: &mut Context<'_>,
) -> Poll<std::result::Result<(), Self::Error>> {
Poll::Ready(Ok(()))
}
fn call(&mut self, req: http::Request<B>) -> Self::Future {
match req.uri().path() {
"/chainfire.internal.RaftService/Vote" => {
#[allow(non_camel_case_types)]
struct VoteSvc<T: RaftService>(pub Arc<T>);
impl<T: RaftService> tonic::server::UnaryService<super::VoteRequest>
for VoteSvc<T> {
type Response = super::VoteResponse;
type Future = BoxFuture<
tonic::Response<Self::Response>,
tonic::Status,
>;
fn call(
&mut self,
request: tonic::Request<super::VoteRequest>,
) -> Self::Future {
let inner = Arc::clone(&self.0);
let fut = async move {
<T as RaftService>::vote(&inner, request).await
};
Box::pin(fut)
}
}
let accept_compression_encodings = self.accept_compression_encodings;
let send_compression_encodings = self.send_compression_encodings;
let max_decoding_message_size = self.max_decoding_message_size;
let max_encoding_message_size = self.max_encoding_message_size;
let inner = self.inner.clone();
let fut = async move {
let method = VoteSvc(inner);
let codec = tonic::codec::ProstCodec::default();
let mut grpc = tonic::server::Grpc::new(codec)
.apply_compression_config(
accept_compression_encodings,
send_compression_encodings,
)
.apply_max_message_size_config(
max_decoding_message_size,
max_encoding_message_size,
);
let res = grpc.unary(method, req).await;
Ok(res)
};
Box::pin(fut)
}
"/chainfire.internal.RaftService/AppendEntries" => {
#[allow(non_camel_case_types)]
struct AppendEntriesSvc<T: RaftService>(pub Arc<T>);
impl<
T: RaftService,
> tonic::server::UnaryService<super::AppendEntriesRequest>
for AppendEntriesSvc<T> {
type Response = super::AppendEntriesResponse;
type Future = BoxFuture<
tonic::Response<Self::Response>,
tonic::Status,
>;
fn call(
&mut self,
request: tonic::Request<super::AppendEntriesRequest>,
) -> Self::Future {
let inner = Arc::clone(&self.0);
let fut = async move {
<T as RaftService>::append_entries(&inner, request).await
};
Box::pin(fut)
}
}
let accept_compression_encodings = self.accept_compression_encodings;
let send_compression_encodings = self.send_compression_encodings;
let max_decoding_message_size = self.max_decoding_message_size;
let max_encoding_message_size = self.max_encoding_message_size;
let inner = self.inner.clone();
let fut = async move {
let method = AppendEntriesSvc(inner);
let codec = tonic::codec::ProstCodec::default();
let mut grpc = tonic::server::Grpc::new(codec)
.apply_compression_config(
accept_compression_encodings,
send_compression_encodings,
)
.apply_max_message_size_config(
max_decoding_message_size,
max_encoding_message_size,
);
let res = grpc.unary(method, req).await;
Ok(res)
};
Box::pin(fut)
}
"/chainfire.internal.RaftService/InstallSnapshot" => {
#[allow(non_camel_case_types)]
struct InstallSnapshotSvc<T: RaftService>(pub Arc<T>);
impl<
T: RaftService,
> tonic::server::ClientStreamingService<
super::InstallSnapshotRequest,
> for InstallSnapshotSvc<T> {
type Response = super::InstallSnapshotResponse;
type Future = BoxFuture<
tonic::Response<Self::Response>,
tonic::Status,
>;
fn call(
&mut self,
request: tonic::Request<
tonic::Streaming<super::InstallSnapshotRequest>,
>,
) -> Self::Future {
let inner = Arc::clone(&self.0);
let fut = async move {
<T as RaftService>::install_snapshot(&inner, request).await
};
Box::pin(fut)
}
}
let accept_compression_encodings = self.accept_compression_encodings;
let send_compression_encodings = self.send_compression_encodings;
let max_decoding_message_size = self.max_decoding_message_size;
let max_encoding_message_size = self.max_encoding_message_size;
let inner = self.inner.clone();
let fut = async move {
let method = InstallSnapshotSvc(inner);
let codec = tonic::codec::ProstCodec::default();
let mut grpc = tonic::server::Grpc::new(codec)
.apply_compression_config(
accept_compression_encodings,
send_compression_encodings,
)
.apply_max_message_size_config(
max_decoding_message_size,
max_encoding_message_size,
);
let res = grpc.client_streaming(method, req).await;
Ok(res)
};
Box::pin(fut)
}
_ => {
Box::pin(async move {
let mut response = http::Response::new(empty_body());
let headers = response.headers_mut();
headers
.insert(
tonic::Status::GRPC_STATUS,
(tonic::Code::Unimplemented as i32).into(),
);
headers
.insert(
http::header::CONTENT_TYPE,
tonic::metadata::GRPC_CONTENT_TYPE,
);
Ok(response)
})
}
}
}
}
impl<T> Clone for RaftServiceServer<T> {
fn clone(&self) -> Self {
let inner = self.inner.clone();
Self {
inner,
accept_compression_encodings: self.accept_compression_encodings,
send_compression_encodings: self.send_compression_encodings,
max_decoding_message_size: self.max_decoding_message_size,
max_encoding_message_size: self.max_encoding_message_size,
}
}
}
/// Generated gRPC service name
pub const SERVICE_NAME: &str = "chainfire.internal.RaftService";
impl<T> tonic::server::NamedService for RaftServiceServer<T> {
const NAME: &'static str = SERVICE_NAME;
}
}

File diff suppressed because it is too large


@@ -0,0 +1,13 @@
//! Generated protobuf code
//!
//! This module contains the code generated by tonic-build from the proto files.
pub mod chainfire {
pub mod v1 {
tonic::include_proto!("chainfire.v1");
}
pub mod internal {
tonic::include_proto!("chainfire.internal");
}
}


@@ -0,0 +1,242 @@
//! Internal Raft RPC service implementation
//!
//! This service handles Raft protocol messages between nodes in the cluster.
//! It bridges the gRPC layer with the OpenRaft implementation.
use crate::internal_proto::{
raft_service_server::RaftService, AppendEntriesRequest, AppendEntriesResponse,
InstallSnapshotRequest, InstallSnapshotResponse, VoteRequest, VoteResponse,
};
use chainfire_raft::{Raft, TypeConfig};
use chainfire_types::NodeId;
use openraft::BasicNode;
use std::sync::Arc;
use tonic::{Request, Response, Status, Streaming};
use tracing::{debug, trace, warn};
/// Internal Raft RPC service implementation
///
/// This service handles Raft protocol messages between nodes.
pub struct RaftServiceImpl {
/// Reference to the Raft instance
raft: Arc<Raft>,
}
impl RaftServiceImpl {
/// Create a new Raft service with a Raft instance
pub fn new(raft: Arc<Raft>) -> Self {
Self { raft }
}
}
#[tonic::async_trait]
impl RaftService for RaftServiceImpl {
async fn vote(
&self,
request: Request<VoteRequest>,
) -> Result<Response<VoteResponse>, Status> {
let req = request.into_inner();
trace!(
term = req.term,
candidate = req.candidate_id,
"Vote request received"
);
// Convert proto request to openraft request
let vote_req = openraft::raft::VoteRequest {
vote: openraft::Vote::new(req.term, req.candidate_id),
last_log_id: if req.last_log_index > 0 {
Some(openraft::LogId::new(
openraft::CommittedLeaderId::new(req.last_log_term, 0),
req.last_log_index,
))
} else {
None
},
};
// Forward to Raft node
let result = self.raft.vote(vote_req).await;
match result {
Ok(resp) => {
trace!(term = resp.vote.leader_id().term, granted = resp.vote_granted, "Vote response");
Ok(Response::new(VoteResponse {
term: resp.vote.leader_id().term,
vote_granted: resp.vote_granted,
last_log_index: resp.last_log_id.map(|id| id.index).unwrap_or(0),
last_log_term: resp.last_log_id.map(|id| id.leader_id.term).unwrap_or(0),
}))
}
Err(e) => {
warn!(error = %e, "Vote request failed");
Err(Status::internal(e.to_string()))
}
}
}
async fn append_entries(
&self,
request: Request<AppendEntriesRequest>,
) -> Result<Response<AppendEntriesResponse>, Status> {
let req = request.into_inner();
trace!(
term = req.term,
leader = req.leader_id,
entries = req.entries.len(),
"AppendEntries request received"
);
// Convert proto entries to openraft entries
let entries: Vec<openraft::Entry<TypeConfig>> = req
.entries
.into_iter()
.map(|e| {
let payload = if e.data.is_empty() {
openraft::EntryPayload::Blank
} else {
// Deserialize the command from the entry data;
// entries that fail to decode are treated as blank
match bincode::deserialize(&e.data) {
Ok(cmd) => openraft::EntryPayload::Normal(cmd),
Err(_) => openraft::EntryPayload::Blank,
}
};
openraft::Entry {
log_id: openraft::LogId::new(
openraft::CommittedLeaderId::new(e.term, 0),
e.index,
),
payload,
}
})
.collect();
let prev_log_id = if req.prev_log_index > 0 {
Some(openraft::LogId::new(
openraft::CommittedLeaderId::new(req.prev_log_term, 0),
req.prev_log_index,
))
} else {
None
};
let leader_commit = if req.leader_commit > 0 {
Some(openraft::LogId::new(
openraft::CommittedLeaderId::new(req.term, 0),
req.leader_commit,
))
} else {
None
};
let append_req = openraft::raft::AppendEntriesRequest {
vote: openraft::Vote::new_committed(req.term, req.leader_id),
prev_log_id,
entries,
leader_commit,
};
let result = self.raft.append_entries(append_req).await;
match result {
Ok(resp) => {
let (success, conflict_index, conflict_term) = match resp {
openraft::raft::AppendEntriesResponse::Success => (true, 0, 0),
openraft::raft::AppendEntriesResponse::PartialSuccess(log_id) => {
// Partial success - some entries were accepted
let index = log_id.map(|l| l.index).unwrap_or(0);
(true, index, 0)
}
openraft::raft::AppendEntriesResponse::HigherVote(vote) => {
(false, 0, vote.leader_id().term)
}
openraft::raft::AppendEntriesResponse::Conflict => (false, 0, 0),
};
trace!(success, "AppendEntries response");
Ok(Response::new(AppendEntriesResponse {
term: req.term,
success,
conflict_index,
conflict_term,
}))
}
Err(e) => {
warn!(error = %e, "AppendEntries request failed");
Err(Status::internal(e.to_string()))
}
}
}
async fn install_snapshot(
&self,
request: Request<Streaming<InstallSnapshotRequest>>,
) -> Result<Response<InstallSnapshotResponse>, Status> {
let mut stream = request.into_inner();
debug!("InstallSnapshot stream started");
// Collect all chunks
let mut term = 0;
let mut leader_id = 0;
let mut last_log_index = 0;
let mut last_log_term = 0;
let mut data = Vec::new();
while let Some(chunk) = stream.message().await? {
term = chunk.term;
leader_id = chunk.leader_id;
last_log_index = chunk.last_included_index;
last_log_term = chunk.last_included_term;
data.extend_from_slice(&chunk.data);
if chunk.done {
break;
}
}
debug!(term, size = data.len(), "InstallSnapshot completed");
// Create snapshot metadata
let last_log_id = if last_log_index > 0 {
Some(openraft::LogId::new(
openraft::CommittedLeaderId::new(last_log_term, 0),
last_log_index,
))
} else {
None
};
let meta = openraft::SnapshotMeta {
last_log_id,
last_membership: openraft::StoredMembership::new(
None,
openraft::Membership::<NodeId, BasicNode>::new(vec![], None),
),
snapshot_id: format!("{}-{}", term, last_log_index),
};
let snapshot_req = openraft::raft::InstallSnapshotRequest {
vote: openraft::Vote::new_committed(term, leader_id),
meta,
offset: 0,
data,
done: true,
};
let result = self.raft.install_snapshot(snapshot_req).await;
match result {
Ok(resp) => {
debug!(term = resp.vote.leader_id().term, "InstallSnapshot response");
Ok(Response::new(InstallSnapshotResponse {
term: resp.vote.leader_id().term,
}))
}
Err(e) => {
warn!(error = %e, "InstallSnapshot request failed");
Err(Status::internal(e.to_string()))
}
}
}
}
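The `install_snapshot` handler above reassembles a snapshot from a stream of chunks by appending each chunk's `data` until it sees `done`. The sender-side framing can be sketched as a pure function (a hypothetical helper for illustration; the real client streams generated `InstallSnapshotRequest` messages, and the tuple fields here stand in for `offset`, `data`, and `done`):

```rust
// Split a snapshot payload into (offset, bytes, done) chunks matching the
// framing the server loop reassembles. `chunk_size` is an assumed parameter,
// not a value taken from the generated proto.
fn chunk_snapshot(data: &[u8], chunk_size: usize) -> Vec<(u64, Vec<u8>, bool)> {
    let mut chunks = Vec::new();
    if data.is_empty() {
        // An empty snapshot is still sent as one final (empty) chunk,
        // so the receiver always observes `done = true`.
        chunks.push((0, Vec::new(), true));
        return chunks;
    }
    let mut offset = 0usize;
    while offset < data.len() {
        let end = (offset + chunk_size).min(data.len());
        chunks.push((offset as u64, data[offset..end].to_vec(), end == data.len()));
        offset = end;
    }
    chunks
}

fn main() {
    let data: Vec<u8> = (0u8..=9).collect();
    let chunks = chunk_snapshot(&data, 4);
    // Reassemble exactly as the server loop does: append each chunk's data.
    let mut rebuilt = Vec::new();
    for (offset, bytes, _done) in &chunks {
        // Each chunk's offset equals the number of bytes received so far.
        assert_eq!(*offset as usize, rebuilt.len());
        rebuilt.extend_from_slice(bytes);
    }
    assert!(chunks.last().unwrap().2);
    assert_eq!(rebuilt, data);
    println!("chunks: {}", chunks.len());
}
```

Note the server loop above tolerates out-of-order metadata only because every chunk repeats `term` and `last_included_*`; the offsets themselves are not validated there.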


@@ -0,0 +1,285 @@
//! KV service implementation
use crate::conversions::make_header;
use crate::proto::{
compare, kv_server::Kv, DeleteRangeRequest, DeleteRangeResponse, PutRequest, PutResponse,
RangeRequest, RangeResponse, ResponseOp, TxnRequest, TxnResponse,
};
use chainfire_raft::RaftNode;
use chainfire_types::command::RaftCommand;
use std::sync::Arc;
use tonic::{Request, Response, Status};
use tracing::{debug, trace};
/// KV service implementation
pub struct KvServiceImpl {
/// Raft node for consensus
raft: Arc<RaftNode>,
/// Cluster ID
cluster_id: u64,
}
impl KvServiceImpl {
/// Create a new KV service
pub fn new(raft: Arc<RaftNode>, cluster_id: u64) -> Self {
Self { raft, cluster_id }
}
/// Create a response header
fn make_header(&self, revision: u64) -> crate::proto::ResponseHeader {
make_header(
self.cluster_id,
self.raft.id(),
revision,
0, // TODO: get actual term
)
}
}
#[tonic::async_trait]
impl Kv for KvServiceImpl {
async fn range(
&self,
request: Request<RangeRequest>,
) -> Result<Response<RangeResponse>, Status> {
let req = request.into_inner();
trace!(key = ?String::from_utf8_lossy(&req.key), serializable = req.serializable, "Range request");
// For linearizable reads (serializable=false), ensure we're reading consistent state
// by verifying leadership/log commit status through Raft
if !req.serializable {
self.raft
.linearizable_read()
.await
.map_err(|e| Status::unavailable(format!("linearizable read failed: {}", e)))?;
}
// Get storage from Raft node
let storage = self.raft.storage();
let storage_guard = storage.read().await;
let sm = storage_guard.state_machine().read().await;
let entries = if req.range_end.is_empty() {
// Single key lookup
sm.kv()
.get(&req.key)
.map_err(|e| Status::internal(e.to_string()))?
.into_iter()
.collect()
} else {
// Range scan
sm.kv()
.range(&req.key, Some(&req.range_end))
.map_err(|e| Status::internal(e.to_string()))?
};
let revision = sm.current_revision();
let kvs: Vec<_> = entries.into_iter().map(Into::into).collect();
let count = kvs.len() as i64;
Ok(Response::new(RangeResponse {
header: Some(self.make_header(revision)),
kvs,
more: false,
count,
}))
}
async fn put(&self, request: Request<PutRequest>) -> Result<Response<PutResponse>, Status> {
let req = request.into_inner();
debug!(key = ?String::from_utf8_lossy(&req.key), "Put request");
let command = RaftCommand::Put {
key: req.key,
value: req.value,
lease_id: if req.lease != 0 { Some(req.lease) } else { None },
prev_kv: req.prev_kv,
};
let response = self
.raft
.write(command)
.await
.map_err(|e| Status::internal(e.to_string()))?;
Ok(Response::new(PutResponse {
header: Some(self.make_header(response.revision)),
prev_kv: response.prev_kv.map(Into::into),
}))
}
async fn delete(
&self,
request: Request<DeleteRangeRequest>,
) -> Result<Response<DeleteRangeResponse>, Status> {
let req = request.into_inner();
debug!(key = ?String::from_utf8_lossy(&req.key), "Delete request");
let command = if req.range_end.is_empty() {
RaftCommand::Delete {
key: req.key,
prev_kv: req.prev_kv,
}
} else {
RaftCommand::DeleteRange {
start: req.key,
end: req.range_end,
prev_kv: req.prev_kv,
}
};
let response = self
.raft
.write(command)
.await
.map_err(|e| Status::internal(e.to_string()))?;
Ok(Response::new(DeleteRangeResponse {
header: Some(self.make_header(response.revision)),
deleted: response.deleted as i64,
prev_kvs: response.prev_kvs.into_iter().map(Into::into).collect(),
}))
}
async fn txn(&self, request: Request<TxnRequest>) -> Result<Response<TxnResponse>, Status> {
let req = request.into_inner();
debug!("Txn request with {} comparisons", req.compare.len());
// Convert protobuf types to internal types
let compare: Vec<_> = req
.compare
.into_iter()
.map(|c| {
use chainfire_types::command::{
Compare, CompareResult as InternalResult, CompareTarget as InternalTarget,
};
let result = match compare::CompareResult::try_from(c.result) {
Ok(compare::CompareResult::Equal) => InternalResult::Equal,
Ok(compare::CompareResult::NotEqual) => InternalResult::NotEqual,
Ok(compare::CompareResult::Greater) => InternalResult::Greater,
Ok(compare::CompareResult::Less) => InternalResult::Less,
// Unknown enum value from the wire; fall back to Equal
Err(_) => InternalResult::Equal,
};
let target = match c.target_union {
Some(compare::TargetUnion::Version(v)) => InternalTarget::Version(v as u64),
Some(compare::TargetUnion::CreateRevision(v)) => {
InternalTarget::CreateRevision(v as u64)
}
Some(compare::TargetUnion::ModRevision(v)) => {
InternalTarget::ModRevision(v as u64)
}
Some(compare::TargetUnion::Value(v)) => InternalTarget::Value(v),
// No target supplied; default to comparing version against 0
None => InternalTarget::Version(0),
};
Compare {
key: c.key,
target,
result,
}
})
.collect();
let success = convert_ops(&req.success);
let failure = convert_ops(&req.failure);
let command = RaftCommand::Txn {
compare,
success,
failure,
};
let response = self
.raft
.write(command)
.await
.map_err(|e| Status::internal(e.to_string()))?;
// Convert txn_responses to proto ResponseOp
let responses = convert_txn_responses(&response.txn_responses, response.revision);
Ok(Response::new(TxnResponse {
header: Some(self.make_header(response.revision)),
succeeded: response.succeeded,
responses,
}))
}
}
/// Convert internal TxnOpResponse to proto ResponseOp
fn convert_txn_responses(
responses: &[chainfire_types::command::TxnOpResponse],
revision: u64,
) -> Vec<ResponseOp> {
use crate::proto::response_op::Response as ProtoResponse;
use chainfire_types::command::TxnOpResponse;
responses
.iter()
.map(|resp| {
            // Inner txn response headers carry only the revision; cluster and member IDs are zeroed here.
let response = match resp {
TxnOpResponse::Put { prev_kv } => ProtoResponse::ResponsePut(PutResponse {
header: Some(make_header(0, 0, revision, 0)),
prev_kv: prev_kv.clone().map(Into::into),
}),
TxnOpResponse::Delete { deleted, prev_kvs } => {
ProtoResponse::ResponseDeleteRange(DeleteRangeResponse {
header: Some(make_header(0, 0, revision, 0)),
deleted: *deleted as i64,
prev_kvs: prev_kvs.iter().cloned().map(Into::into).collect(),
})
}
TxnOpResponse::Range { kvs, count, more } => {
ProtoResponse::ResponseRange(RangeResponse {
header: Some(make_header(0, 0, revision, 0)),
kvs: kvs.iter().cloned().map(Into::into).collect(),
count: *count as i64,
more: *more,
})
}
};
ResponseOp {
response: Some(response),
}
})
.collect()
}
fn convert_ops(
ops: &[crate::proto::RequestOp],
) -> Vec<chainfire_types::command::TxnOp> {
use chainfire_types::command::TxnOp;
ops.iter()
.filter_map(|op| {
op.request.as_ref().map(|req| match req {
crate::proto::request_op::Request::RequestPut(put) => TxnOp::Put {
key: put.key.clone(),
value: put.value.clone(),
lease_id: if put.lease != 0 { Some(put.lease) } else { None },
},
crate::proto::request_op::Request::RequestDeleteRange(del) => {
if del.range_end.is_empty() {
TxnOp::Delete {
key: del.key.clone(),
}
} else {
TxnOp::DeleteRange {
start: del.key.clone(),
end: del.range_end.clone(),
}
}
}
crate::proto::request_op::Request::RequestRange(range) => TxnOp::Range {
key: range.key.clone(),
range_end: range.range_end.clone(),
limit: range.limit,
keys_only: range.keys_only,
count_only: range.count_only,
}
})
})
.collect()
}
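
As an illustrative, self-contained sketch of how comparisons like the ones converted above evaluate against a key's metadata (the types and `eval` helper here are hypothetical stand-ins, not the crate's API; the `Value` target variant is omitted for brevity):

```rust
// Hypothetical mirror of the internal Compare types, for illustration only.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum CompareResult { Equal, NotEqual, Greater, Less }

#[derive(Clone, Copy, Debug)]
enum CompareTarget { Version(u64), CreateRevision(u64), ModRevision(u64) }

// Evaluate one comparison against the stored metadata of a key.
fn eval(
    target: CompareTarget,
    result: CompareResult,
    version: u64,
    create_rev: u64,
    mod_rev: u64,
) -> bool {
    let (actual, expected) = match target {
        CompareTarget::Version(v) => (version, v),
        CompareTarget::CreateRevision(v) => (create_rev, v),
        CompareTarget::ModRevision(v) => (mod_rev, v),
    };
    match result {
        CompareResult::Equal => actual == expected,
        CompareResult::NotEqual => actual != expected,
        CompareResult::Greater => actual > expected,
        CompareResult::Less => actual < expected,
    }
}

fn main() {
    // A key at version 3, created at revision 10, last modified at revision 12.
    assert!(eval(CompareTarget::Version(3), CompareResult::Equal, 3, 10, 12));
    assert!(eval(CompareTarget::ModRevision(12), CompareResult::Greater, 3, 10, 13));
    assert!(!eval(CompareTarget::CreateRevision(10), CompareResult::Less, 3, 10, 12));
    println!("compare sketch ok");
}
```

Whether the txn takes the `success` or `failure` branch is then just the conjunction of these per-compare results.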


@ -0,0 +1,194 @@
//! Lease service implementation
use crate::conversions::make_header;
use crate::proto::{
lease_server::Lease, LeaseGrantRequest, LeaseGrantResponse, LeaseKeepAliveRequest,
LeaseKeepAliveResponse, LeaseLeasesRequest, LeaseLeasesResponse, LeaseRevokeRequest,
LeaseRevokeResponse, LeaseStatus, LeaseTimeToLiveRequest, LeaseTimeToLiveResponse,
};
use chainfire_raft::RaftNode;
use chainfire_types::command::RaftCommand;
use std::pin::Pin;
use std::sync::Arc;
use tokio::sync::mpsc;
use tokio_stream::{wrappers::ReceiverStream, Stream, StreamExt};
use tonic::{Request, Response, Status, Streaming};
use tracing::{debug, warn};
/// Lease service implementation
pub struct LeaseServiceImpl {
/// Raft node for consensus
raft: Arc<RaftNode>,
/// Cluster ID
cluster_id: u64,
}
impl LeaseServiceImpl {
/// Create a new Lease service
pub fn new(raft: Arc<RaftNode>, cluster_id: u64) -> Self {
Self { raft, cluster_id }
}
/// Create a response header
fn make_header(&self, revision: u64) -> crate::proto::ResponseHeader {
make_header(self.cluster_id, self.raft.id(), revision, 0)
}
}
#[tonic::async_trait]
impl Lease for LeaseServiceImpl {
async fn lease_grant(
&self,
request: Request<LeaseGrantRequest>,
) -> Result<Response<LeaseGrantResponse>, Status> {
let req = request.into_inner();
debug!(id = req.id, ttl = req.ttl, "LeaseGrant request");
let command = RaftCommand::LeaseGrant {
id: req.id,
ttl: req.ttl,
};
let response = self
.raft
.write(command)
.await
.map_err(|e| Status::internal(e.to_string()))?;
Ok(Response::new(LeaseGrantResponse {
header: Some(self.make_header(response.revision)),
id: response.lease_id.unwrap_or(0),
ttl: response.lease_ttl.unwrap_or(0),
error: String::new(),
}))
}
async fn lease_revoke(
&self,
request: Request<LeaseRevokeRequest>,
) -> Result<Response<LeaseRevokeResponse>, Status> {
let req = request.into_inner();
debug!(id = req.id, "LeaseRevoke request");
let command = RaftCommand::LeaseRevoke { id: req.id };
let response = self
.raft
.write(command)
.await
.map_err(|e| Status::internal(e.to_string()))?;
Ok(Response::new(LeaseRevokeResponse {
header: Some(self.make_header(response.revision)),
}))
}
type LeaseKeepAliveStream =
Pin<Box<dyn Stream<Item = Result<LeaseKeepAliveResponse, Status>> + Send>>;
async fn lease_keep_alive(
&self,
request: Request<Streaming<LeaseKeepAliveRequest>>,
) -> Result<Response<Self::LeaseKeepAliveStream>, Status> {
let mut stream = request.into_inner();
let raft = Arc::clone(&self.raft);
let cluster_id = self.cluster_id;
let (tx, rx) = mpsc::channel(16);
tokio::spawn(async move {
while let Some(result) = stream.next().await {
match result {
Ok(req) => {
debug!(id = req.id, "LeaseKeepAlive request");
let command = RaftCommand::LeaseRefresh { id: req.id };
match raft.write(command).await {
Ok(response) => {
let resp = LeaseKeepAliveResponse {
header: Some(make_header(
cluster_id,
raft.id(),
response.revision,
0,
)),
id: response.lease_id.unwrap_or(req.id),
ttl: response.lease_ttl.unwrap_or(0),
};
if tx.send(Ok(resp)).await.is_err() {
break;
}
}
Err(e) => {
warn!("LeaseKeepAlive failed: {}", e);
if tx.send(Err(Status::internal(e.to_string()))).await.is_err() {
break;
}
}
}
}
Err(e) => {
warn!("LeaseKeepAlive stream error: {}", e);
break;
}
}
}
});
Ok(Response::new(Box::pin(ReceiverStream::new(rx))))
}
async fn lease_time_to_live(
&self,
request: Request<LeaseTimeToLiveRequest>,
) -> Result<Response<LeaseTimeToLiveResponse>, Status> {
let req = request.into_inner();
debug!(id = req.id, "LeaseTimeToLive request");
// Read directly from state machine (this is a read operation)
let storage = self.raft.storage();
let storage_guard = storage.read().await;
let sm = storage_guard.state_machine().read().await;
let leases = sm.leases();
match leases.time_to_live(req.id) {
Some((ttl, granted_ttl, keys)) => Ok(Response::new(LeaseTimeToLiveResponse {
header: Some(self.make_header(sm.current_revision())),
id: req.id,
ttl,
granted_ttl,
keys: if req.keys { keys } else { vec![] },
})),
None => Ok(Response::new(LeaseTimeToLiveResponse {
header: Some(self.make_header(sm.current_revision())),
id: req.id,
ttl: -1,
granted_ttl: 0,
keys: vec![],
})),
}
}
async fn lease_leases(
&self,
_request: Request<LeaseLeasesRequest>,
) -> Result<Response<LeaseLeasesResponse>, Status> {
debug!("LeaseLeases request");
// Read directly from state machine
let storage = self.raft.storage();
let storage_guard = storage.read().await;
let sm = storage_guard.state_machine().read().await;
let leases = sm.leases();
let lease_ids = leases.list();
        let statuses: Vec<LeaseStatus> =
            lease_ids.into_iter().map(|id| LeaseStatus { id }).collect();
Ok(Response::new(LeaseLeasesResponse {
header: Some(self.make_header(sm.current_revision())),
leases: statuses,
}))
}
}
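
The `lease_time_to_live` handler above returns `ttl: -1` with `granted_ttl: 0` when a lease is unknown, matching etcd's sentinel convention. A minimal standalone sketch of that lookup (the `LeaseTable` type here is illustrative, not the crate's):

```rust
use std::collections::HashMap;

// Illustrative lease table: lease id -> (remaining ttl, granted ttl).
struct LeaseTable {
    leases: HashMap<i64, (i64, i64)>,
}

impl LeaseTable {
    fn time_to_live(&self, id: i64) -> (i64, i64) {
        // Unknown lease: ttl = -1, granted_ttl = 0 (etcd-style sentinel).
        self.leases.get(&id).copied().unwrap_or((-1, 0))
    }
}

fn main() {
    let mut leases = HashMap::new();
    leases.insert(7, (25, 30));
    let table = LeaseTable { leases };
    assert_eq!(table.time_to_live(7), (25, 30));
    assert_eq!(table.time_to_live(8), (-1, 0));
    println!("lease ttl sketch ok");
}
```

Returning the sentinel instead of an error keeps the RPC infallible for clients that poll leases they may have already let expire.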


@ -0,0 +1,29 @@
//! gRPC API layer for Chainfire distributed KVS
//!
//! This crate provides:
//! - Generated protobuf types
//! - gRPC service implementations
//! - Client and server components
pub mod generated;
pub mod kv_service;
pub mod lease_service;
pub mod watch_service;
pub mod cluster_service;
pub mod internal_service;
pub mod raft_client;
pub mod conversions;
// Re-export generated types
pub use generated::chainfire::v1 as proto;
pub use generated::chainfire::internal as internal_proto;
// Re-export services
pub use kv_service::KvServiceImpl;
pub use lease_service::LeaseServiceImpl;
pub use watch_service::WatchServiceImpl;
pub use cluster_service::ClusterServiceImpl;
pub use internal_service::RaftServiceImpl;
// Re-export Raft client and config
pub use raft_client::{GrpcRaftClient, RetryConfig};


@ -0,0 +1,428 @@
//! gRPC client for Raft RPC
//!
//! This module provides a gRPC-based implementation of RaftRpcClient
//! for node-to-node Raft communication with retry and backoff support.
use crate::internal_proto::{
raft_service_client::RaftServiceClient, AppendEntriesRequest as ProtoAppendEntriesRequest,
InstallSnapshotRequest as ProtoInstallSnapshotRequest, LogEntry as ProtoLogEntry,
VoteRequest as ProtoVoteRequest,
};
use chainfire_raft::network::{RaftNetworkError, RaftRpcClient};
use chainfire_raft::TypeConfig;
use chainfire_types::NodeId;
use openraft::raft::{
AppendEntriesRequest, AppendEntriesResponse, InstallSnapshotRequest, InstallSnapshotResponse,
VoteRequest, VoteResponse,
};
use openraft::{CommittedLeaderId, LogId, Vote};
use std::collections::HashMap;
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::RwLock;
use tonic::transport::Channel;
use tracing::{debug, error, trace, warn};
/// Configuration for RPC retry behavior with exponential backoff.
#[derive(Debug, Clone)]
pub struct RetryConfig {
/// Initial timeout for RPC calls (default: 500ms)
pub initial_timeout: Duration,
/// Maximum timeout after backoff (default: 30s)
pub max_timeout: Duration,
/// Maximum number of retry attempts (default: 3)
pub max_retries: u32,
/// Backoff multiplier between retries (default: 2.0)
pub backoff_multiplier: f64,
}
impl Default for RetryConfig {
fn default() -> Self {
Self {
initial_timeout: Duration::from_millis(500),
max_timeout: Duration::from_secs(30),
max_retries: 3,
backoff_multiplier: 2.0,
}
}
}
impl RetryConfig {
/// Create a new RetryConfig with custom values
pub fn new(
initial_timeout: Duration,
max_timeout: Duration,
max_retries: u32,
backoff_multiplier: f64,
) -> Self {
Self {
initial_timeout,
max_timeout,
max_retries,
backoff_multiplier,
}
}
/// Calculate timeout for a given retry attempt (0-indexed)
fn timeout_for_attempt(&self, attempt: u32) -> Duration {
let multiplier = self.backoff_multiplier.powi(attempt as i32);
let timeout_millis = (self.initial_timeout.as_millis() as f64 * multiplier) as u64;
let timeout = Duration::from_millis(timeout_millis);
timeout.min(self.max_timeout)
}
}
/// gRPC-based Raft RPC client with retry support
pub struct GrpcRaftClient {
/// Cached gRPC clients per node
clients: Arc<RwLock<HashMap<NodeId, RaftServiceClient<Channel>>>>,
/// Node address mapping
node_addrs: Arc<RwLock<HashMap<NodeId, String>>>,
/// Retry configuration
retry_config: RetryConfig,
}
impl GrpcRaftClient {
/// Create a new gRPC Raft client with default retry config
pub fn new() -> Self {
Self {
clients: Arc::new(RwLock::new(HashMap::new())),
node_addrs: Arc::new(RwLock::new(HashMap::new())),
retry_config: RetryConfig::default(),
}
}
/// Create a new gRPC Raft client with custom retry config
pub fn new_with_retry(retry_config: RetryConfig) -> Self {
Self {
clients: Arc::new(RwLock::new(HashMap::new())),
node_addrs: Arc::new(RwLock::new(HashMap::new())),
retry_config,
}
}
/// Add or update a node's address
pub async fn add_node(&self, id: NodeId, addr: String) {
debug!(node_id = id, addr = %addr, "Adding node address");
self.node_addrs.write().await.insert(id, addr);
}
/// Remove a node
pub async fn remove_node(&self, id: NodeId) {
self.node_addrs.write().await.remove(&id);
self.clients.write().await.remove(&id);
}
/// Get or create a gRPC client for the target node
    async fn get_client(
        &self,
        target: NodeId,
    ) -> Result<RaftServiceClient<Channel>, RaftNetworkError> {
// Check cache first
{
let clients = self.clients.read().await;
if let Some(client) = clients.get(&target) {
return Ok(client.clone());
}
}
// Get address
let addr = {
let addrs = self.node_addrs.read().await;
addrs.get(&target).cloned()
};
let addr = addr.ok_or(RaftNetworkError::NodeNotFound(target))?;
// Create new connection
let endpoint = format!("http://{}", addr);
trace!(target = target, endpoint = %endpoint, "Connecting to node");
let channel = Channel::from_shared(endpoint.clone())
.map_err(|e| RaftNetworkError::ConnectionFailed {
node_id: target,
reason: e.to_string(),
})?
.connect()
.await
.map_err(|e| RaftNetworkError::ConnectionFailed {
node_id: target,
reason: e.to_string(),
})?;
let client = RaftServiceClient::new(channel);
// Cache the client
self.clients.write().await.insert(target, client.clone());
Ok(client)
}
/// Invalidate cached client for a node (e.g., on connection failure)
async fn invalidate_client(&self, target: NodeId) {
self.clients.write().await.remove(&target);
}
/// Execute an async operation with retry and exponential backoff
async fn with_retry<T, F, Fut>(
&self,
target: NodeId,
rpc_name: &str,
mut operation: F,
) -> Result<T, RaftNetworkError>
where
F: FnMut() -> Fut,
Fut: std::future::Future<Output = Result<T, RaftNetworkError>>,
{
let mut last_error = None;
for attempt in 0..=self.retry_config.max_retries {
let timeout = self.retry_config.timeout_for_attempt(attempt);
trace!(
target = target,
rpc = rpc_name,
attempt = attempt,
timeout_ms = timeout.as_millis(),
"Attempting RPC"
);
match tokio::time::timeout(timeout, operation()).await {
Ok(Ok(result)) => return Ok(result),
Ok(Err(e)) => {
warn!(
target = target,
rpc = rpc_name,
attempt = attempt,
error = %e,
"RPC failed"
);
// Invalidate cached client on failure
self.invalidate_client(target).await;
last_error = Some(e);
}
Err(_) => {
warn!(
target = target,
rpc = rpc_name,
attempt = attempt,
timeout_ms = timeout.as_millis(),
"RPC timed out"
);
// Invalidate cached client on timeout
self.invalidate_client(target).await;
last_error = Some(RaftNetworkError::RpcFailed(format!(
"{} timed out after {}ms",
rpc_name,
timeout.as_millis()
)));
}
}
            // Wait before retrying; the delay grows on the same exponential schedule as the timeout
            if attempt < self.retry_config.max_retries {
                let backoff_delay = self.retry_config.timeout_for_attempt(attempt);
                tokio::time::sleep(backoff_delay).await;
}
}
Err(last_error.unwrap_or_else(|| {
RaftNetworkError::RpcFailed(format!(
"{} failed after {} retries",
rpc_name, self.retry_config.max_retries
))
}))
}
}
impl Default for GrpcRaftClient {
fn default() -> Self {
Self::new()
}
}
#[async_trait::async_trait]
impl RaftRpcClient for GrpcRaftClient {
async fn vote(
&self,
target: NodeId,
req: VoteRequest<NodeId>,
) -> Result<VoteResponse<NodeId>, RaftNetworkError> {
trace!(target = target, term = req.vote.leader_id().term, "Sending vote request");
self.with_retry(target, "vote", || async {
let mut client = self.get_client(target).await?;
// Convert to proto request
let proto_req = ProtoVoteRequest {
term: req.vote.leader_id().term,
candidate_id: req.vote.leader_id().node_id,
last_log_index: req.last_log_id.map(|id| id.index).unwrap_or(0),
last_log_term: req.last_log_id.map(|id| id.leader_id.term).unwrap_or(0),
};
let response = client
.vote(proto_req)
.await
.map_err(|e| RaftNetworkError::RpcFailed(e.to_string()))?;
let resp = response.into_inner();
// Convert from proto response
let last_log_id = if resp.last_log_index > 0 {
Some(LogId::new(
CommittedLeaderId::new(resp.last_log_term, 0),
resp.last_log_index,
))
} else {
None
};
Ok(VoteResponse {
vote: Vote::new(resp.term, target),
vote_granted: resp.vote_granted,
last_log_id,
})
})
.await
}
async fn append_entries(
&self,
target: NodeId,
req: AppendEntriesRequest<TypeConfig>,
) -> Result<AppendEntriesResponse<NodeId>, RaftNetworkError> {
trace!(
target = target,
entries = req.entries.len(),
"Sending append entries"
);
// Clone entries once for potential retries
let entries_data: Vec<(u64, u64, Vec<u8>)> = req
.entries
.iter()
.map(|e| {
let data = match &e.payload {
openraft::EntryPayload::Blank => vec![],
openraft::EntryPayload::Normal(cmd) => {
bincode::serialize(cmd).unwrap_or_default()
}
openraft::EntryPayload::Membership(_) => vec![],
};
(e.log_id.index, e.log_id.leader_id.term, data)
})
.collect();
let term = req.vote.leader_id().term;
let leader_id = req.vote.leader_id().node_id;
let prev_log_index = req.prev_log_id.map(|id| id.index).unwrap_or(0);
let prev_log_term = req.prev_log_id.map(|id| id.leader_id.term).unwrap_or(0);
let leader_commit = req.leader_commit.map(|id| id.index).unwrap_or(0);
self.with_retry(target, "append_entries", || {
let entries_data = entries_data.clone();
async move {
let mut client = self.get_client(target).await?;
let entries: Vec<ProtoLogEntry> = entries_data
.into_iter()
.map(|(index, term, data)| ProtoLogEntry { index, term, data })
.collect();
let proto_req = ProtoAppendEntriesRequest {
term,
leader_id,
prev_log_index,
prev_log_term,
entries,
leader_commit,
};
let response = client
.append_entries(proto_req)
.await
.map_err(|e| RaftNetworkError::RpcFailed(e.to_string()))?;
let resp = response.into_inner();
// Convert response
if resp.success {
Ok(AppendEntriesResponse::Success)
} else if resp.conflict_term > 0 {
Ok(AppendEntriesResponse::HigherVote(Vote::new(
resp.conflict_term,
target,
)))
} else {
Ok(AppendEntriesResponse::Conflict)
}
}
})
.await
}
async fn install_snapshot(
&self,
target: NodeId,
req: InstallSnapshotRequest<TypeConfig>,
) -> Result<InstallSnapshotResponse<NodeId>, RaftNetworkError> {
debug!(
target = target,
last_log_id = ?req.meta.last_log_id,
data_len = req.data.len(),
"Sending install snapshot"
);
let term = req.vote.leader_id().term;
let leader_id = req.vote.leader_id().node_id;
let last_included_index = req.meta.last_log_id.map(|id| id.index).unwrap_or(0);
let last_included_term = req.meta.last_log_id.map(|id| id.leader_id.term).unwrap_or(0);
let offset = req.offset;
let data = req.data.clone();
let done = req.done;
let result = self
.with_retry(target, "install_snapshot", || {
let data = data.clone();
async move {
let mut client = self.get_client(target).await?;
let proto_req = ProtoInstallSnapshotRequest {
term,
leader_id,
last_included_index,
last_included_term,
offset,
data,
done,
};
// Send as stream (single item)
let stream = tokio_stream::once(proto_req);
let response = client
.install_snapshot(stream)
.await
.map_err(|e| RaftNetworkError::RpcFailed(e.to_string()))?;
let resp = response.into_inner();
Ok(InstallSnapshotResponse {
vote: Vote::new(resp.term, target),
})
}
})
.await;
// Log error for install_snapshot failures
if let Err(ref e) = result {
error!(
target = target,
last_log_id = ?req.meta.last_log_id,
data_len = req.data.len(),
error = %e,
"install_snapshot failed after retries"
);
}
result
}
}
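
The `timeout_for_attempt` schedule used throughout this client can be checked in isolation; this standalone sketch reproduces the same arithmetic with the documented defaults (initial 500ms, multiplier 2.0, capped at 30s):

```rust
use std::time::Duration;

// Same arithmetic as RetryConfig::timeout_for_attempt: initial * multiplier^attempt, capped at max.
fn timeout_for_attempt(initial: Duration, max: Duration, multiplier: f64, attempt: u32) -> Duration {
    let scaled = initial.as_millis() as f64 * multiplier.powi(attempt as i32);
    Duration::from_millis(scaled as u64).min(max)
}

fn main() {
    let initial = Duration::from_millis(500);
    let max = Duration::from_secs(30);
    // 500ms, 1s, 2s, ... doubling on each attempt.
    assert_eq!(timeout_for_attempt(initial, max, 2.0, 0), Duration::from_millis(500));
    assert_eq!(timeout_for_attempt(initial, max, 2.0, 2), Duration::from_millis(2000));
    // After enough attempts the cap takes over (500ms * 2^7 = 64s > 30s).
    assert_eq!(timeout_for_attempt(initial, max, 2.0, 7), Duration::from_secs(30));
    println!("backoff sketch ok");
}
```

With `max_retries: 3` this means a vote RPC sees per-attempt timeouts of 500ms, 1s, 2s, and 4s before the client gives up.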


@ -0,0 +1,157 @@
//! Watch service implementation
use crate::conversions::make_header;
use crate::proto::{
watch_server::Watch, WatchRequest, WatchResponse,
};
use chainfire_watch::{WatchRegistry, WatchStream};
use std::pin::Pin;
use std::sync::Arc;
use tokio::sync::mpsc;
use tokio_stream::{wrappers::ReceiverStream, StreamExt};
use tonic::{Request, Response, Status, Streaming};
use tracing::{debug, warn};
/// Watch service implementation
pub struct WatchServiceImpl {
/// Watch registry
registry: Arc<WatchRegistry>,
/// Cluster ID
cluster_id: u64,
/// Member ID
member_id: u64,
}
impl WatchServiceImpl {
/// Create a new watch service
pub fn new(registry: Arc<WatchRegistry>, cluster_id: u64, member_id: u64) -> Self {
Self {
registry,
cluster_id,
member_id,
}
}
fn make_header(&self, revision: u64) -> crate::proto::ResponseHeader {
make_header(self.cluster_id, self.member_id, revision, 0)
}
}
#[tonic::async_trait]
impl Watch for WatchServiceImpl {
    type WatchStream =
        Pin<Box<dyn tokio_stream::Stream<Item = Result<WatchResponse, Status>> + Send>>;
async fn watch(
&self,
request: Request<Streaming<WatchRequest>>,
) -> Result<Response<Self::WatchStream>, Status> {
let mut in_stream = request.into_inner();
let registry = Arc::clone(&self.registry);
let cluster_id = self.cluster_id;
let member_id = self.member_id;
// Channel for sending responses back to client
let (tx, rx) = mpsc::channel(128);
let tx_for_events = tx.clone();
// Channel for watch events
let (event_tx, mut event_rx) = mpsc::channel::<crate::proto::WatchResponse>(128);
// Spawn task to handle the bidirectional stream
tokio::spawn(async move {
let mut stream = WatchStream::new(Arc::clone(&registry), {
let event_tx = event_tx.clone();
let (watch_tx, mut watch_rx) = mpsc::channel(64);
// Forward internal watch responses to proto responses
tokio::spawn(async move {
while let Some(resp) = watch_rx.recv().await {
let proto_resp = internal_to_proto_response(resp, cluster_id, member_id);
if event_tx.send(proto_resp).await.is_err() {
break;
}
}
});
watch_tx
});
while let Some(result) = in_stream.next().await {
match result {
Ok(req) => {
if let Some(request_union) = req.request_union {
let response = match request_union {
crate::proto::watch_request::RequestUnion::CreateRequest(create) => {
let internal_req: chainfire_types::watch::WatchRequest =
create.into();
let resp = stream.create_watch(internal_req);
internal_to_proto_response(resp, cluster_id, member_id)
}
crate::proto::watch_request::RequestUnion::CancelRequest(cancel) => {
let resp = stream.cancel_watch(cancel.watch_id);
internal_to_proto_response(resp, cluster_id, member_id)
}
crate::proto::watch_request::RequestUnion::ProgressRequest(_) => {
// Send progress notification
WatchResponse {
header: Some(make_header(
cluster_id,
member_id,
registry.current_revision(),
0,
)),
watch_id: 0,
created: false,
canceled: false,
compact_revision: 0,
cancel_reason: String::new(),
events: vec![],
}
}
};
if tx.send(Ok(response)).await.is_err() {
break;
}
}
}
Err(e) => {
warn!(error = %e, "Watch stream error");
break;
}
}
}
debug!(watches = stream.watch_count(), "Watch stream closed");
// Stream cleanup happens in WatchStream::drop
});
// Spawn task to forward watch events
tokio::spawn(async move {
while let Some(response) = event_rx.recv().await {
if tx_for_events.send(Ok(response)).await.is_err() {
break;
}
}
});
let output_stream = ReceiverStream::new(rx);
Ok(Response::new(Box::pin(output_stream)))
}
}
fn internal_to_proto_response(
resp: chainfire_types::watch::WatchResponse,
cluster_id: u64,
member_id: u64,
) -> WatchResponse {
WatchResponse {
header: Some(make_header(cluster_id, member_id, resp.compact_revision, 0)),
watch_id: resp.watch_id,
created: resp.created,
canceled: resp.canceled,
compact_revision: resp.compact_revision as i64,
cancel_reason: String::new(),
events: resp.events.into_iter().map(Into::into).collect(),
}
}
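
The watch service above relies on small "pump" tasks that forward events between channels until either side closes. Stripped of tokio and gRPC, the pattern looks like this (std threads and `std::sync::mpsc` stand in for the spawned tasks; `pump_all` is an illustrative helper, not the crate's API):

```rust
use std::sync::mpsc;
use std::thread;

// Forward every event through a pump thread and collect what the "client" receives.
fn pump_all(events: Vec<String>) -> Vec<String> {
    let (event_tx, event_rx) = mpsc::channel::<String>();
    let (out_tx, out_rx) = mpsc::channel::<String>();

    // Pump: forward internal events outward until either side closes.
    let pump = thread::spawn(move || {
        while let Ok(event) = event_rx.recv() {
            if out_tx.send(event).is_err() {
                break; // client went away: stop forwarding
            }
        }
    });

    for e in events {
        event_tx.send(e).unwrap();
    }
    drop(event_tx); // closing the source ends the pump
    let received = out_rx.iter().collect();
    pump.join().unwrap();
    received
}

fn main() {
    let got = pump_all(vec!["PUT /a".into(), "DELETE /a".into()]);
    assert_eq!(got, vec!["PUT /a", "DELETE /a"]);
    println!("pump sketch ok");
}
```

Breaking out of the loop on a failed send is what lets a dropped client stream tear down the whole chain without an explicit cancellation signal.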


@ -0,0 +1,37 @@
[package]
name = "chainfire-core"
version.workspace = true
edition.workspace = true
license.workspace = true
description = "Embeddable distributed cluster library with Raft consensus and SWIM gossip"
rust-version.workspace = true
[dependencies]
# Internal crates
chainfire-types = { workspace = true }
# Note: chainfire-storage, chainfire-raft, chainfire-gossip, chainfire-watch
# will be added as implementation progresses
# chainfire-storage = { workspace = true }
# chainfire-raft = { workspace = true }
# chainfire-gossip = { workspace = true }
# chainfire-watch = { workspace = true }
# Async runtime
tokio = { workspace = true }
tokio-stream = { workspace = true }
futures = { workspace = true }
async-trait = { workspace = true }
# Utilities
thiserror = { workspace = true }
tracing = { workspace = true }
bytes = { workspace = true }
parking_lot = { workspace = true }
dashmap = { workspace = true }
[dev-dependencies]
tokio = { workspace = true, features = ["test-util"] }
tempfile = { workspace = true }
[lints]
workspace = true


@ -0,0 +1,221 @@
//! Builder pattern for cluster creation
use std::net::SocketAddr;
use std::path::PathBuf;
use std::sync::Arc;
use chainfire_types::node::NodeRole;
use chainfire_types::RaftRole;
use crate::callbacks::{ClusterEventHandler, KvEventHandler};
use crate::cluster::Cluster;
use crate::config::{ClusterConfig, MemberConfig, StorageBackendConfig, TimeoutConfig};
use crate::error::{ClusterError, Result};
use crate::events::EventDispatcher;
/// Builder for creating a Chainfire cluster instance
///
/// # Example
///
/// ```ignore
/// use chainfire_core::ClusterBuilder;
///
/// let cluster = ClusterBuilder::new(1)
/// .name("node-1")
/// .gossip_addr("0.0.0.0:7946".parse()?)
/// .raft_addr("0.0.0.0:2380".parse()?)
/// .bootstrap(true)
/// .build()
/// .await?;
/// ```
pub struct ClusterBuilder {
config: ClusterConfig,
cluster_handlers: Vec<Arc<dyn ClusterEventHandler>>,
kv_handlers: Vec<Arc<dyn KvEventHandler>>,
}
impl ClusterBuilder {
/// Create a new cluster builder with the given node ID
pub fn new(node_id: u64) -> Self {
Self {
config: ClusterConfig {
node_id,
..Default::default()
},
cluster_handlers: Vec::new(),
kv_handlers: Vec::new(),
}
}
/// Set the node name
pub fn name(mut self, name: impl Into<String>) -> Self {
self.config.node_name = name.into();
self
}
/// Set the node role (ControlPlane or Worker)
pub fn role(mut self, role: NodeRole) -> Self {
self.config.node_role = role;
self
}
/// Set the Raft participation role (Voter, Learner, or None)
pub fn raft_role(mut self, role: RaftRole) -> Self {
self.config.raft_role = role;
self
}
/// Set the API listen address
pub fn api_addr(mut self, addr: SocketAddr) -> Self {
self.config.api_addr = Some(addr);
self
}
/// Set the Raft listen address (for control plane nodes)
pub fn raft_addr(mut self, addr: SocketAddr) -> Self {
self.config.raft_addr = Some(addr);
self
}
/// Set the gossip listen address
pub fn gossip_addr(mut self, addr: SocketAddr) -> Self {
self.config.gossip_addr = addr;
self
}
/// Set the storage backend
pub fn storage(mut self, backend: StorageBackendConfig) -> Self {
self.config.storage = backend;
self
}
/// Set the data directory (convenience method for RocksDB storage)
pub fn data_dir(mut self, path: impl Into<PathBuf>) -> Self {
self.config.storage = StorageBackendConfig::RocksDb { path: path.into() };
self
}
/// Use in-memory storage
pub fn memory_storage(mut self) -> Self {
self.config.storage = StorageBackendConfig::Memory;
self
}
/// Add initial cluster members (for bootstrap)
pub fn initial_members(mut self, members: Vec<MemberConfig>) -> Self {
self.config.initial_members = members;
self
}
/// Add a single initial member
pub fn add_member(mut self, member: MemberConfig) -> Self {
self.config.initial_members.push(member);
self
}
/// Enable cluster bootstrap (first node)
pub fn bootstrap(mut self, bootstrap: bool) -> Self {
self.config.bootstrap = bootstrap;
self
}
/// Set the cluster ID
pub fn cluster_id(mut self, id: u64) -> Self {
self.config.cluster_id = id;
self
}
/// Enable gRPC API server
pub fn with_grpc_api(mut self, enabled: bool) -> Self {
self.config.enable_grpc_api = enabled;
self
}
/// Set timeout configuration
pub fn timeouts(mut self, timeouts: TimeoutConfig) -> Self {
self.config.timeouts = timeouts;
self
}
/// Register a cluster event handler
///
/// Multiple handlers can be registered. They will all be called
/// when cluster events occur.
pub fn on_cluster_event<H>(mut self, handler: H) -> Self
where
H: ClusterEventHandler + 'static,
{
self.cluster_handlers.push(Arc::new(handler));
self
}
/// Register a cluster event handler (Arc version)
pub fn on_cluster_event_arc(mut self, handler: Arc<dyn ClusterEventHandler>) -> Self {
self.cluster_handlers.push(handler);
self
}
/// Register a KV event handler
///
/// Multiple handlers can be registered. They will all be called
/// when KV events occur.
pub fn on_kv_event<H>(mut self, handler: H) -> Self
where
H: KvEventHandler + 'static,
{
self.kv_handlers.push(Arc::new(handler));
self
}
/// Register a KV event handler (Arc version)
pub fn on_kv_event_arc(mut self, handler: Arc<dyn KvEventHandler>) -> Self {
self.kv_handlers.push(handler);
self
}
/// Validate the configuration
fn validate(&self) -> Result<()> {
if self.config.node_id == 0 {
return Err(ClusterError::Config("node_id must be non-zero".into()));
}
if self.config.node_name.is_empty() {
return Err(ClusterError::Config("node_name is required".into()));
}
// Raft-participating nodes need a Raft address
if self.config.raft_role.participates_in_raft() && self.config.raft_addr.is_none() {
return Err(ClusterError::Config(
"raft_addr is required for Raft-participating nodes".into(),
));
}
Ok(())
}
/// Build the cluster instance
///
/// This initializes the storage backend, Raft (if applicable), and gossip.
pub async fn build(self) -> Result<Cluster> {
self.validate()?;
// Create event dispatcher with registered handlers
let mut event_dispatcher = EventDispatcher::new();
for handler in self.cluster_handlers {
event_dispatcher.add_cluster_handler(handler);
}
for handler in self.kv_handlers {
event_dispatcher.add_kv_handler(handler);
}
// Create the cluster
let cluster = Cluster::new(self.config, event_dispatcher);
// TODO: Initialize storage backend
// TODO: Initialize Raft if role participates
// TODO: Initialize gossip
// TODO: Start background tasks
Ok(cluster)
}
}
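
`validate()` above rejects a zero node ID and an empty name before any storage, Raft, or gossip resources are touched. A minimal standalone sketch of this fail-fast builder shape (hypothetical types, not the crate's API):

```rust
// Hypothetical miniature of the builder's fail-fast validation.
#[derive(Default)]
struct MiniBuilder {
    node_id: u64,
    node_name: String,
}

impl MiniBuilder {
    fn new(node_id: u64) -> Self {
        Self { node_id, ..Default::default() }
    }
    fn name(mut self, name: impl Into<String>) -> Self {
        self.node_name = name.into();
        self
    }
    // Validation runs before any resources would be initialized.
    fn build(self) -> Result<(u64, String), String> {
        if self.node_id == 0 {
            return Err("node_id must be non-zero".into());
        }
        if self.node_name.is_empty() {
            return Err("node_name is required".into());
        }
        Ok((self.node_id, self.node_name))
    }
}

fn main() {
    assert!(MiniBuilder::new(0).name("n").build().is_err());
    assert!(MiniBuilder::new(1).build().is_err());
    let built = MiniBuilder::new(1).name("node-1").build().unwrap();
    assert_eq!(built, (1, "node-1".to_string()));
    println!("builder sketch ok");
}
```

Because each setter consumes and returns `self`, misconfiguration surfaces at the single `build()` call site rather than as a half-initialized cluster.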


@ -0,0 +1,103 @@
//! Callback traits for cluster events
use async_trait::async_trait;
use chainfire_types::node::NodeInfo;
use crate::kvs::KvEntry;
/// Handler for cluster lifecycle events
///
/// Implement this trait to receive notifications about cluster membership
/// and leadership changes.
#[async_trait]
pub trait ClusterEventHandler: Send + Sync {
/// Called when a node joins the cluster
async fn on_node_joined(&self, _node: &NodeInfo) {}
/// Called when a node leaves the cluster
async fn on_node_left(&self, _node_id: u64, _reason: LeaveReason) {}
/// Called when leadership changes
async fn on_leader_changed(&self, _old_leader: Option<u64>, _new_leader: u64) {}
/// Called when this node becomes leader
async fn on_became_leader(&self) {}
/// Called when this node loses leadership
async fn on_lost_leadership(&self) {}
/// Called when cluster membership changes
async fn on_membership_changed(&self, _members: &[NodeInfo]) {}
/// Called when a network partition is detected
async fn on_partition_detected(&self, _reachable: &[u64], _unreachable: &[u64]) {}
/// Called when cluster is ready (initial leader elected, etc.)
async fn on_cluster_ready(&self) {}
}
/// Handler for KV store events
///
/// Implement this trait to receive notifications about key-value changes.
#[async_trait]
pub trait KvEventHandler: Send + Sync {
/// Called when a key is created or updated
async fn on_key_changed(
&self,
_namespace: &str,
_key: &[u8],
_value: &[u8],
_revision: u64,
) {
}
/// Called when a key is deleted
async fn on_key_deleted(&self, _namespace: &str, _key: &[u8], _revision: u64) {}
/// Called when multiple keys with a prefix are changed
async fn on_prefix_changed(&self, _namespace: &str, _prefix: &[u8], _entries: &[KvEntry]) {}
}
/// Reason for node departure from the cluster
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LeaveReason {
/// Node left gracefully
Graceful,
/// Node timed out (failed to respond)
Timeout,
/// Network partition detected
NetworkPartition,
/// Node was explicitly evicted
Evicted,
/// Unknown reason
Unknown,
}
impl std::fmt::Display for LeaveReason {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
LeaveReason::Graceful => write!(f, "graceful"),
LeaveReason::Timeout => write!(f, "timeout"),
LeaveReason::NetworkPartition => write!(f, "network_partition"),
LeaveReason::Evicted => write!(f, "evicted"),
LeaveReason::Unknown => write!(f, "unknown"),
}
}
}
/// A no-op event handler for when callbacks are not needed
pub struct NoOpClusterEventHandler;
#[async_trait]
impl ClusterEventHandler for NoOpClusterEventHandler {}
/// A no-op KV event handler
pub struct NoOpKvEventHandler;
#[async_trait]
impl KvEventHandler for NoOpKvEventHandler {}
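
The handler traits above give every callback an empty default body, which is why `NoOpClusterEventHandler` can implement them with no methods at all while real handlers override only the hooks they care about. A synchronous miniature of the same pattern:

```rust
use std::cell::Cell;

// All callbacks have empty defaults, so implementors override only what they need.
trait EventHandler {
    fn on_joined(&self, _id: u64) {}
    fn on_left(&self, _id: u64) {}
}

struct NoOp;
impl EventHandler for NoOp {} // compiles with zero methods

struct CountJoins(Cell<u32>);
impl EventHandler for CountJoins {
    fn on_joined(&self, _id: u64) {
        self.0.set(self.0.get() + 1); // override just one hook
    }
}

fn main() {
    let noop = NoOp;
    noop.on_joined(1); // does nothing

    let counter = CountJoins(Cell::new(0));
    counter.on_joined(1);
    counter.on_joined(2);
    counter.on_left(1); // falls through to the default no-op
    assert_eq!(counter.0.get(), 2);
    println!("handler sketch ok");
}
```

The async versions in this crate get the same ergonomics from `#[async_trait]`, which allows default bodies on async trait methods.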


@ -0,0 +1,282 @@
//! Cluster management
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use parking_lot::RwLock;
use tokio::sync::broadcast;
use chainfire_types::node::NodeInfo;
use crate::config::ClusterConfig;
use crate::error::{ClusterError, Result};
use crate::events::EventDispatcher;
use crate::kvs::{Kv, KvHandle};
/// Current state of the cluster
#[derive(Debug, Clone)]
pub struct ClusterState {
/// Whether this node is the leader
pub is_leader: bool,
/// Current leader's node ID
pub leader_id: Option<u64>,
/// Current term (Raft)
pub term: u64,
/// All known cluster members
pub members: Vec<NodeInfo>,
/// Whether the cluster is ready (initial leader elected)
pub ready: bool,
}
impl Default for ClusterState {
fn default() -> Self {
Self {
is_leader: false,
leader_id: None,
term: 0,
members: Vec::new(),
ready: false,
}
}
}
/// Main cluster instance
///
/// This is the primary interface for interacting with a Chainfire cluster.
/// It manages Raft consensus, gossip membership, and the distributed KV store.
pub struct Cluster {
/// Node configuration
config: ClusterConfig,
/// Current cluster state
state: Arc<RwLock<ClusterState>>,
/// KV store
kv: Arc<Kv>,
/// Event dispatcher
event_dispatcher: Arc<EventDispatcher>,
/// Shutdown flag
shutdown: AtomicBool,
/// Shutdown signal sender
shutdown_tx: broadcast::Sender<()>,
}
impl Cluster {
/// Create a new cluster instance
pub(crate) fn new(
config: ClusterConfig,
event_dispatcher: EventDispatcher,
) -> Self {
let (shutdown_tx, _) = broadcast::channel(1);
Self {
config,
state: Arc::new(RwLock::new(ClusterState::default())),
kv: Arc::new(Kv::new()),
event_dispatcher: Arc::new(event_dispatcher),
shutdown: AtomicBool::new(false),
shutdown_tx,
}
}
/// Get this node's ID
pub fn node_id(&self) -> u64 {
self.config.node_id
}
/// Get this node's name
pub fn node_name(&self) -> &str {
&self.config.node_name
}
/// Get a handle for interacting with the cluster
///
/// Handles are lightweight and can be cloned freely.
pub fn handle(&self) -> ClusterHandle {
ClusterHandle {
node_id: self.config.node_id,
state: self.state.clone(),
kv: self.kv.clone(),
shutdown_tx: self.shutdown_tx.clone(),
}
}
/// Get the KV store interface
pub fn kv(&self) -> &Arc<Kv> {
&self.kv
}
/// Get current cluster state
pub fn state(&self) -> ClusterState {
self.state.read().clone()
}
/// Check if this node is the leader
pub fn is_leader(&self) -> bool {
self.state.read().is_leader
}
/// Get current leader ID
pub fn leader(&self) -> Option<u64> {
self.state.read().leader_id
}
/// Get all cluster members
pub fn members(&self) -> Vec<NodeInfo> {
self.state.read().members.clone()
}
/// Check if the cluster is ready
pub fn is_ready(&self) -> bool {
self.state.read().ready
}
/// Join an existing cluster
///
/// Connects to seed nodes and joins the cluster.
pub async fn join(&self, _seed_addrs: &[std::net::SocketAddr]) -> Result<()> {
// TODO: Implement cluster joining via gossip
Ok(())
}
/// Leave the cluster gracefully
pub async fn leave(&self) -> Result<()> {
// TODO: Implement graceful leave
self.shutdown();
Ok(())
}
/// Add a new node to the cluster (leader only)
pub async fn add_node(&self, _node: NodeInfo, _as_learner: bool) -> Result<()> {
if !self.is_leader() {
return Err(ClusterError::NotLeader {
leader_id: self.leader(),
});
}
// TODO: Implement node addition via Raft
Ok(())
}
/// Remove a node from the cluster (leader only)
pub async fn remove_node(&self, _node_id: u64) -> Result<()> {
if !self.is_leader() {
return Err(ClusterError::NotLeader {
leader_id: self.leader(),
});
}
// TODO: Implement node removal via Raft
Ok(())
}
/// Promote a learner to voter (leader only)
pub async fn promote_learner(&self, _node_id: u64) -> Result<()> {
if !self.is_leader() {
return Err(ClusterError::NotLeader {
leader_id: self.leader(),
});
}
// TODO: Implement learner promotion via Raft
Ok(())
}
/// Run the cluster (blocks until shutdown)
pub async fn run(self) -> Result<()> {
self.run_until_shutdown(std::future::pending()).await
}
/// Run with graceful shutdown signal
pub async fn run_until_shutdown<F>(self, shutdown_signal: F) -> Result<()>
where
F: std::future::Future<Output = ()>,
{
let mut shutdown_rx = self.shutdown_tx.subscribe();
tokio::select! {
_ = shutdown_signal => {
tracing::info!("Received shutdown signal");
}
_ = shutdown_rx.recv() => {
tracing::info!("Received internal shutdown");
}
}
// TODO: Cleanup resources
Ok(())
}
/// Trigger shutdown
pub fn shutdown(&self) {
self.shutdown.store(true, Ordering::SeqCst);
let _ = self.shutdown_tx.send(());
}
/// Check if shutdown was requested
pub fn is_shutting_down(&self) -> bool {
self.shutdown.load(Ordering::SeqCst)
}
/// Get the event dispatcher
pub(crate) fn event_dispatcher(&self) -> &Arc<EventDispatcher> {
&self.event_dispatcher
}
}
/// Lightweight handle for cluster operations
///
/// This handle can be cloned and passed around cheaply. It provides
/// access to cluster state and the KV store without owning the cluster.
#[derive(Clone)]
pub struct ClusterHandle {
node_id: u64,
state: Arc<RwLock<ClusterState>>,
kv: Arc<Kv>,
shutdown_tx: broadcast::Sender<()>,
}
impl ClusterHandle {
/// Get this node's ID
pub fn node_id(&self) -> u64 {
self.node_id
}
/// Get a KV handle
pub fn kv(&self) -> KvHandle {
KvHandle::new(self.kv.clone())
}
/// Check if this node is the leader
pub fn is_leader(&self) -> bool {
self.state.read().is_leader
}
/// Get current leader ID
pub fn leader(&self) -> Option<u64> {
self.state.read().leader_id
}
/// Get all cluster members
pub fn members(&self) -> Vec<NodeInfo> {
self.state.read().members.clone()
}
/// Get current cluster state
pub fn state(&self) -> ClusterState {
self.state.read().clone()
}
/// Trigger cluster shutdown
pub fn shutdown(&self) {
let _ = self.shutdown_tx.send(());
}
}
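The `ClusterHandle` above is cheap to clone because it only holds `Arc` pointers into state owned by the `Cluster`. A minimal std-only sketch of that pattern (the real code uses `parking_lot::RwLock`, which has no `unwrap()` on lock acquisition; the `State`/`Handle` names here are illustrative stand-ins):

```rust
use std::sync::{Arc, RwLock};

// Minimal stand-in for ClusterState, to show the handle pattern in isolation.
#[derive(Default)]
struct State {
    is_leader: bool,
}

// A cloneable handle: cloning copies Arc pointers, not the state itself.
#[derive(Clone)]
struct Handle {
    state: Arc<RwLock<State>>,
}

impl Handle {
    fn is_leader(&self) -> bool {
        self.state.read().unwrap().is_leader
    }
}

fn leader_visible_to_clones() -> bool {
    let state = Arc::new(RwLock::new(State::default()));
    let h1 = Handle { state: state.clone() };
    let h2 = h1.clone(); // cheap: a refcount bump, no state is copied

    // A change made through the owning side is visible to every handle.
    state.write().unwrap().is_leader = true;
    h1.is_leader() && h2.is_leader()
}

fn main() {
    assert!(leader_visible_to_clones());
}
```

This is why `handle()` can be called many times and the results passed to different tasks without coordination.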

View file

@ -0,0 +1,162 @@
//! Configuration types for chainfire-core
use std::net::SocketAddr;
use std::path::PathBuf;
use std::sync::Arc;
use std::time::Duration;
use chainfire_types::node::NodeRole;
use chainfire_types::RaftRole;
// Placeholder trait - the real storage backend will be implemented in
// chainfire-storage; this keeps chainfire-core decoupled for now
use async_trait::async_trait;
/// Storage backend trait for pluggable storage
#[async_trait]
pub trait StorageBackend: Send + Sync {
/// Get a value by key
async fn get(&self, key: &[u8]) -> std::io::Result<Option<Vec<u8>>>;
/// Put a value
async fn put(&self, key: &[u8], value: &[u8]) -> std::io::Result<()>;
/// Delete a key
async fn delete(&self, key: &[u8]) -> std::io::Result<bool>;
}
/// Configuration for a cluster node
#[derive(Debug, Clone)]
pub struct ClusterConfig {
/// Unique node ID
pub node_id: u64,
/// Human-readable node name
pub node_name: String,
/// Node role (ControlPlane or Worker)
pub node_role: NodeRole,
/// Raft participation role (Voter, Learner, or None)
pub raft_role: RaftRole,
/// API listen address for client connections
pub api_addr: Option<SocketAddr>,
/// Raft listen address for peer-to-peer Raft communication
pub raft_addr: Option<SocketAddr>,
/// Gossip listen address for membership discovery
pub gossip_addr: SocketAddr,
/// Storage backend configuration
pub storage: StorageBackendConfig,
/// Initial cluster members for bootstrap
pub initial_members: Vec<MemberConfig>,
/// Whether to bootstrap the cluster (first node)
pub bootstrap: bool,
/// Cluster ID
pub cluster_id: u64,
/// Enable gRPC API server
pub enable_grpc_api: bool,
/// Timeouts
pub timeouts: TimeoutConfig,
}
impl Default for ClusterConfig {
fn default() -> Self {
Self {
node_id: 0,
node_name: String::new(),
node_role: NodeRole::ControlPlane,
raft_role: RaftRole::Voter,
api_addr: None,
raft_addr: None,
gossip_addr: "0.0.0.0:7946".parse().unwrap(),
storage: StorageBackendConfig::Memory,
initial_members: Vec::new(),
bootstrap: false,
cluster_id: 1,
enable_grpc_api: false,
timeouts: TimeoutConfig::default(),
}
}
}
/// Storage backend configuration
#[derive(Clone)]
pub enum StorageBackendConfig {
/// In-memory storage (for testing/simple deployments)
Memory,
/// RocksDB storage
RocksDb {
/// Data directory path
path: PathBuf,
},
/// Custom storage backend
Custom(Arc<dyn StorageBackend>),
}
impl std::fmt::Debug for StorageBackendConfig {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
StorageBackendConfig::Memory => write!(f, "Memory"),
StorageBackendConfig::RocksDb { path } => {
f.debug_struct("RocksDb").field("path", path).finish()
}
StorageBackendConfig::Custom(_) => write!(f, "Custom(...)"),
}
}
}
/// Configuration for a cluster member
#[derive(Debug, Clone)]
pub struct MemberConfig {
/// Node ID
pub id: u64,
/// Node name
pub name: String,
/// Raft address
pub raft_addr: String,
/// Client API address
pub client_addr: String,
}
/// Timeout configuration
#[derive(Debug, Clone)]
pub struct TimeoutConfig {
/// Raft heartbeat interval
pub heartbeat_interval: Duration,
/// Raft election timeout range (min)
pub election_timeout_min: Duration,
/// Raft election timeout range (max)
pub election_timeout_max: Duration,
/// Connection timeout
pub connection_timeout: Duration,
/// Request timeout
pub request_timeout: Duration,
}
impl Default for TimeoutConfig {
fn default() -> Self {
Self {
heartbeat_interval: Duration::from_millis(150),
election_timeout_min: Duration::from_millis(300),
election_timeout_max: Duration::from_millis(600),
connection_timeout: Duration::from_secs(5),
request_timeout: Duration::from_secs(10),
}
}
}
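The defaults in `TimeoutConfig` follow the usual Raft sizing rule: a follower should only start an election after missing several heartbeats, so the election timeout range must sit well above the heartbeat interval. A small sketch checking that invariant (the constants mirror the defaults above; `timeouts_sane` is an illustrative helper, not part of the crate):

```rust
use std::time::Duration;

// Values copied from TimeoutConfig::default() above.
const HEARTBEAT: Duration = Duration::from_millis(150);
const ELECTION_MIN: Duration = Duration::from_millis(300);
const ELECTION_MAX: Duration = Duration::from_millis(600);

// The minimum election timeout should cover at least two missed heartbeats,
// and the range must be non-empty so nodes randomize their timeouts apart.
fn timeouts_sane(heartbeat: Duration, min: Duration, max: Duration) -> bool {
    min >= heartbeat * 2 && max > min
}

fn main() {
    assert!(timeouts_sane(HEARTBEAT, ELECTION_MIN, ELECTION_MAX));
    // A min below 2x the heartbeat would risk spurious elections.
    assert!(!timeouts_sane(HEARTBEAT, Duration::from_millis(200), ELECTION_MAX));
}
```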

View file

@ -0,0 +1,78 @@
//! Error types for chainfire-core
use thiserror::Error;
/// Result type for chainfire-core operations
pub type Result<T> = std::result::Result<T, ClusterError>;
/// Errors that can occur in cluster operations
#[derive(Debug, Error)]
pub enum ClusterError {
/// Storage operation failed
#[error("storage error: {0}")]
Storage(String),
/// Raft consensus error
#[error("raft error: {0}")]
Raft(String),
/// Gossip protocol error
#[error("gossip error: {0}")]
Gossip(String),
/// Network error
#[error("network error: {0}")]
Network(String),
/// Configuration error
#[error("configuration error: {0}")]
Config(String),
/// Not the leader - write operations must go to leader
#[error("not the leader, current leader is: {leader_id:?}")]
NotLeader {
/// Current leader's node ID, if known
leader_id: Option<u64>,
},
/// Key not found
#[error("key not found")]
KeyNotFound,
/// Compare-and-swap version mismatch
#[error("version mismatch: expected {expected}, got {actual}")]
VersionMismatch {
/// Expected version
expected: u64,
/// Actual version
actual: u64,
},
/// Cluster not initialized
#[error("cluster not initialized")]
NotInitialized,
/// Node already exists in cluster
#[error("node {0} already exists in cluster")]
NodeExists(u64),
/// Node not found in cluster
#[error("node {0} not found in cluster")]
NodeNotFound(u64),
/// Operation timed out
#[error("operation timed out")]
Timeout,
/// Cluster is shutting down
#[error("cluster is shutting down")]
ShuttingDown,
/// Internal error
#[error("internal error: {0}")]
Internal(String),
/// IO error
#[error("io error: {0}")]
Io(#[from] std::io::Error),
}
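`ClusterError::NotLeader` carries the current leader's ID precisely so a client can redirect its write instead of failing. A minimal sketch of that redirect pattern with a hand-rolled `Display` (the real type uses `thiserror`; `redirect_target` is an illustrative helper, not part of the crate):

```rust
use std::fmt;

// Minimal stand-in for ClusterError, showing the NotLeader redirect hint.
#[derive(Debug)]
enum ClusterError {
    NotLeader { leader_id: Option<u64> },
    Timeout,
}

impl fmt::Display for ClusterError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ClusterError::NotLeader { leader_id } => {
                write!(f, "not the leader, current leader is: {:?}", leader_id)
            }
            ClusterError::Timeout => write!(f, "operation timed out"),
        }
    }
}

// A client can use the embedded hint to retry against the leader directly.
fn redirect_target(err: &ClusterError) -> Option<u64> {
    match err {
        ClusterError::NotLeader { leader_id } => *leader_id,
        _ => None,
    }
}

fn main() {
    let err = ClusterError::NotLeader { leader_id: Some(3) };
    assert_eq!(redirect_target(&err), Some(3));
    assert_eq!(redirect_target(&ClusterError::Timeout), None);
}
```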

View file

@ -0,0 +1,198 @@
//! Event types and dispatcher
use std::sync::Arc;
use tokio::sync::broadcast;
use chainfire_types::node::NodeInfo;
use crate::callbacks::{ClusterEventHandler, KvEventHandler, LeaveReason};
/// Cluster-level events
#[derive(Debug, Clone)]
pub enum ClusterEvent {
/// A node joined the cluster
NodeJoined(NodeInfo),
/// A node left the cluster
NodeLeft {
/// The node ID that left
node_id: u64,
/// Why the node left
reason: LeaveReason,
},
/// Leadership changed
LeaderChanged {
/// Previous leader (None if no previous leader)
old: Option<u64>,
/// New leader
new: u64,
},
/// This node became the leader
BecameLeader,
/// This node lost leadership
LostLeadership,
/// Cluster membership changed
MembershipChanged(Vec<NodeInfo>),
/// Network partition detected
PartitionDetected {
/// Nodes that are reachable
reachable: Vec<u64>,
/// Nodes that are unreachable
unreachable: Vec<u64>,
},
/// Cluster is ready
ClusterReady,
}
/// KV store events
#[derive(Debug, Clone)]
pub enum KvEvent {
/// A key was created or updated
KeyChanged {
/// Namespace of the key
namespace: String,
/// The key that changed
key: Vec<u8>,
/// New value
value: Vec<u8>,
/// Revision number
revision: u64,
},
/// A key was deleted
KeyDeleted {
/// Namespace of the key
namespace: String,
/// The key that was deleted
key: Vec<u8>,
/// Revision number
revision: u64,
},
}
/// Event dispatcher that manages callbacks and event broadcasting
pub struct EventDispatcher {
cluster_handlers: Vec<Arc<dyn ClusterEventHandler>>,
kv_handlers: Vec<Arc<dyn KvEventHandler>>,
event_tx: broadcast::Sender<ClusterEvent>,
}
impl EventDispatcher {
/// Create a new event dispatcher
pub fn new() -> Self {
let (event_tx, _) = broadcast::channel(1024);
Self {
cluster_handlers: Vec::new(),
kv_handlers: Vec::new(),
event_tx,
}
}
/// Add a cluster event handler
pub fn add_cluster_handler(&mut self, handler: Arc<dyn ClusterEventHandler>) {
self.cluster_handlers.push(handler);
}
/// Add a KV event handler
pub fn add_kv_handler(&mut self, handler: Arc<dyn KvEventHandler>) {
self.kv_handlers.push(handler);
}
/// Get a subscriber for cluster events
pub fn subscribe(&self) -> broadcast::Receiver<ClusterEvent> {
self.event_tx.subscribe()
}
/// Dispatch a cluster event to all handlers
pub async fn dispatch_cluster_event(&self, event: ClusterEvent) {
// Broadcast to channel subscribers
let _ = self.event_tx.send(event.clone());
// Call registered handlers
match &event {
ClusterEvent::NodeJoined(node) => {
for handler in &self.cluster_handlers {
handler.on_node_joined(node).await;
}
}
ClusterEvent::NodeLeft { node_id, reason } => {
for handler in &self.cluster_handlers {
handler.on_node_left(*node_id, *reason).await;
}
}
ClusterEvent::LeaderChanged { old, new } => {
for handler in &self.cluster_handlers {
handler.on_leader_changed(*old, *new).await;
}
}
ClusterEvent::BecameLeader => {
for handler in &self.cluster_handlers {
handler.on_became_leader().await;
}
}
ClusterEvent::LostLeadership => {
for handler in &self.cluster_handlers {
handler.on_lost_leadership().await;
}
}
ClusterEvent::MembershipChanged(members) => {
for handler in &self.cluster_handlers {
handler.on_membership_changed(members).await;
}
}
ClusterEvent::PartitionDetected {
reachable,
unreachable,
} => {
for handler in &self.cluster_handlers {
handler.on_partition_detected(reachable, unreachable).await;
}
}
ClusterEvent::ClusterReady => {
for handler in &self.cluster_handlers {
handler.on_cluster_ready().await;
}
}
}
}
/// Dispatch a KV event to all handlers
pub async fn dispatch_kv_event(&self, event: KvEvent) {
match &event {
KvEvent::KeyChanged {
namespace,
key,
value,
revision,
} => {
for handler in &self.kv_handlers {
handler
.on_key_changed(namespace, key, value, *revision)
.await;
}
}
KvEvent::KeyDeleted {
namespace,
key,
revision,
} => {
for handler in &self.kv_handlers {
handler.on_key_deleted(namespace, key, *revision).await;
}
}
}
}
}
impl Default for EventDispatcher {
fn default() -> Self {
Self::new()
}
}
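`EventDispatcher` delivers every event twice: once to channel subscribers and once to registered handler callbacks. A synchronous std-only sketch of that dual delivery (the real dispatcher is async, using `tokio::sync::broadcast` and `async_trait` handlers; the `Event`/`Dispatcher`/`Counting` names here are illustrative):

```rust
use std::cell::Cell;
use std::rc::Rc;
use std::sync::mpsc;

#[derive(Debug, Clone, PartialEq)]
enum Event {
    BecameLeader,
}

trait Handler {
    fn on_event(&self, event: &Event);
}

struct Dispatcher {
    handlers: Vec<Box<dyn Handler>>,
    subscribers: Vec<mpsc::Sender<Event>>,
}

impl Dispatcher {
    fn dispatch(&self, event: Event) {
        // Fan out to channel subscribers first; a closed receiver is ignored,
        // mirroring the `let _ = self.event_tx.send(...)` above ...
        for tx in &self.subscribers {
            let _ = tx.send(event.clone());
        }
        // ... then invoke every registered callback.
        for handler in &self.handlers {
            handler.on_event(&event);
        }
    }
}

struct Counting(Rc<Cell<u32>>);

impl Handler for Counting {
    fn on_event(&self, _event: &Event) {
        self.0.set(self.0.get() + 1);
    }
}

// Dispatch one event; report (what the subscriber saw, handler call count).
fn demo() -> (Event, u32) {
    let calls = Rc::new(Cell::new(0));
    let (tx, rx) = mpsc::channel();
    let dispatcher = Dispatcher {
        handlers: vec![Box::new(Counting(calls.clone()))],
        subscribers: vec![tx],
    };
    dispatcher.dispatch(Event::BecameLeader);
    (rx.try_recv().unwrap(), calls.get())
}

fn main() {
    assert_eq!(demo(), (Event::BecameLeader, 1));
}
```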

View file

@ -0,0 +1,290 @@
//! Key-Value store abstraction
use std::sync::Arc;
use std::time::Duration;
use dashmap::DashMap;
use crate::error::{ClusterError, Result};
/// KV store interface
///
/// Provides access to distributed key-value storage with namespace isolation.
pub struct Kv {
namespaces: DashMap<String, Arc<KvNamespace>>,
default_namespace: Arc<KvNamespace>,
}
impl Kv {
/// Create a new KV store
pub(crate) fn new() -> Self {
let default_namespace = Arc::new(KvNamespace::new("default".to_string()));
Self {
namespaces: DashMap::new(),
default_namespace,
}
}
/// Get or create a namespace
pub fn namespace(&self, name: &str) -> Arc<KvNamespace> {
if name == "default" {
return self.default_namespace.clone();
}
self.namespaces
.entry(name.to_string())
.or_insert_with(|| Arc::new(KvNamespace::new(name.to_string())))
.clone()
}
/// Get the default namespace
pub fn default_namespace(&self) -> &Arc<KvNamespace> {
&self.default_namespace
}
// Convenience methods on default namespace
/// Get a value by key from the default namespace
pub async fn get(&self, key: impl AsRef<[u8]>) -> Result<Option<Vec<u8>>> {
self.default_namespace.get(key).await
}
/// Put a value in the default namespace
pub async fn put(&self, key: impl AsRef<[u8]>, value: impl AsRef<[u8]>) -> Result<u64> {
self.default_namespace.put(key, value).await
}
/// Delete a key from the default namespace
pub async fn delete(&self, key: impl AsRef<[u8]>) -> Result<bool> {
self.default_namespace.delete(key).await
}
/// Compare-and-swap in the default namespace
pub async fn compare_and_swap(
&self,
key: impl AsRef<[u8]>,
expected_version: u64,
value: impl AsRef<[u8]>,
) -> Result<CasResult> {
self.default_namespace
.compare_and_swap(key, expected_version, value)
.await
}
}
/// KV namespace for data isolation
pub struct KvNamespace {
name: String,
// TODO: Add storage backend and raft reference
}
impl KvNamespace {
pub(crate) fn new(name: String) -> Self {
Self { name }
}
/// Get the namespace name
pub fn name(&self) -> &str {
&self.name
}
/// Get a value by key
pub async fn get(&self, _key: impl AsRef<[u8]>) -> Result<Option<Vec<u8>>> {
// TODO: Implement with storage backend
Ok(None)
}
/// Get with revision
pub async fn get_with_revision(
&self,
_key: impl AsRef<[u8]>,
) -> Result<Option<(Vec<u8>, u64)>> {
// TODO: Implement with storage backend
Ok(None)
}
/// Put a value (goes through Raft if available)
pub async fn put(&self, _key: impl AsRef<[u8]>, _value: impl AsRef<[u8]>) -> Result<u64> {
// TODO: Implement with Raft
Ok(0)
}
/// Put with options
pub async fn put_with_options(
&self,
_key: impl AsRef<[u8]>,
_value: impl AsRef<[u8]>,
_options: KvOptions,
) -> Result<KvPutResult> {
// TODO: Implement with Raft
Ok(KvPutResult {
revision: 0,
prev_value: None,
})
}
/// Delete a key
pub async fn delete(&self, _key: impl AsRef<[u8]>) -> Result<bool> {
// TODO: Implement with Raft
Ok(false)
}
/// Compare-and-swap
pub async fn compare_and_swap(
&self,
_key: impl AsRef<[u8]>,
expected_version: u64,
_value: impl AsRef<[u8]>,
) -> Result<CasResult> {
// TODO: Implement with storage backend
Err(ClusterError::VersionMismatch {
expected: expected_version,
actual: 0,
})
}
/// Scan keys with prefix
pub async fn scan_prefix(
&self,
_prefix: impl AsRef<[u8]>,
_limit: u32,
) -> Result<Vec<KvEntry>> {
// TODO: Implement with storage backend
Ok(Vec::new())
}
/// Scan keys in a range
pub async fn scan_range(
&self,
_start: impl AsRef<[u8]>,
_end: impl AsRef<[u8]>,
_limit: u32,
) -> Result<Vec<KvEntry>> {
// TODO: Implement with storage backend
Ok(Vec::new())
}
/// Get with specified consistency level
pub async fn get_with_consistency(
&self,
_key: impl AsRef<[u8]>,
_consistency: ReadConsistency,
) -> Result<Option<Vec<u8>>> {
// TODO: Implement with consistency options
Ok(None)
}
}
/// Options for KV operations
#[derive(Debug, Clone, Default)]
pub struct KvOptions {
/// Lease ID for TTL-based expiration
pub lease_id: Option<u64>,
/// Return previous value
pub prev_kv: bool,
/// Time-to-live for the key
pub ttl: Option<Duration>,
}
/// Result of a put operation
#[derive(Debug, Clone)]
pub struct KvPutResult {
/// New revision after the put
pub revision: u64,
/// Previous value, if requested and existed
pub prev_value: Option<Vec<u8>>,
}
/// A key-value entry with metadata
#[derive(Debug, Clone)]
pub struct KvEntry {
/// The key
pub key: Vec<u8>,
/// The value
pub value: Vec<u8>,
/// Revision when the key was created
pub create_revision: u64,
/// Revision when the key was last modified
pub mod_revision: u64,
/// Version number (increments on each update)
pub version: u64,
/// Lease ID if the key is attached to a lease
pub lease_id: Option<u64>,
}
/// Result of a compare-and-swap operation
#[derive(Debug, Clone)]
pub enum CasResult {
/// CAS succeeded, contains new revision
Success(u64),
/// CAS failed due to version mismatch
Conflict {
/// Expected version
expected: u64,
/// Actual version found
actual: u64,
},
/// Key did not exist
NotFound,
}
/// Read consistency level
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub enum ReadConsistency {
/// Read from local storage (may be stale)
Local,
/// Read from any node, but verify with leader's committed index
Serializable,
/// Read only from leader (linearizable, strongest guarantee)
#[default]
Linearizable,
}
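Note the `#[default]` variant attribute above (stable since Rust 1.62): callers that don't specify a consistency level get the strongest guarantee rather than the cheapest read. A minimal mirror of that derive:

```rust
// Illustrative mirror of ReadConsistency, showing `#[default]` on an enum
// variant together with `derive(Default)`.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
enum ReadConsistency {
    Local,
    Serializable,
    #[default]
    Linearizable,
}

fn main() {
    assert_eq!(ReadConsistency::default(), ReadConsistency::Linearizable);
}
```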
/// Lightweight handle for KV operations
#[derive(Clone)]
pub struct KvHandle {
kv: Arc<Kv>,
}
impl KvHandle {
pub(crate) fn new(kv: Arc<Kv>) -> Self {
Self { kv }
}
/// Get the underlying KV store
pub fn inner(&self) -> &Arc<Kv> {
&self.kv
}
/// Get a value by key
pub async fn get(&self, key: impl AsRef<[u8]>) -> Result<Option<Vec<u8>>> {
self.kv.get(key).await
}
/// Put a value
pub async fn put(&self, key: impl AsRef<[u8]>, value: impl AsRef<[u8]>) -> Result<u64> {
self.kv.put(key, value).await
}
/// Delete a key
pub async fn delete(&self, key: impl AsRef<[u8]>) -> Result<bool> {
self.kv.delete(key).await
}
/// Get a namespace
pub fn namespace(&self, name: &str) -> Arc<KvNamespace> {
self.kv.namespace(name)
}
}
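The `compare_and_swap` stubs above encode their contract in `CasResult`. An in-memory sketch of that contract — the real namespace routes writes through Raft and the storage backend, so this `Store` type is purely illustrative of the version check a caller should expect:

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum CasResult {
    Success(u64),
    Conflict { expected: u64, actual: u64 },
    NotFound,
}

#[derive(Default)]
struct Store {
    // key -> (value, per-key version)
    map: HashMap<Vec<u8>, (Vec<u8>, u64)>,
    // store-wide revision, bumped on every successful write
    revision: u64,
}

impl Store {
    fn put(&mut self, key: &[u8], value: &[u8]) -> u64 {
        self.revision += 1;
        let version = self.map.get(key).map_or(1, |(_, v)| v + 1);
        self.map.insert(key.to_vec(), (value.to_vec(), version));
        self.revision
    }

    fn compare_and_swap(&mut self, key: &[u8], expected: u64, value: &[u8]) -> CasResult {
        match self.map.get(key) {
            None => CasResult::NotFound,
            Some((_, actual)) if *actual != expected => CasResult::Conflict {
                expected,
                actual: *actual,
            },
            Some(_) => {
                self.revision += 1;
                self.map.insert(key.to_vec(), (value.to_vec(), expected + 1));
                CasResult::Success(self.revision)
            }
        }
    }
}

// Exercise the three outcomes: version conflict, success, missing key.
fn demo() -> (CasResult, bool, CasResult) {
    let mut store = Store::default();
    store.put(b"k", b"v1"); // key "k" is now at version 1
    let conflict = store.compare_and_swap(b"k", 9, b"v2");
    let success = matches!(store.compare_and_swap(b"k", 1, b"v2"), CasResult::Success(_));
    let missing = store.compare_and_swap(b"absent", 1, b"x");
    (conflict, success, missing)
}

fn main() {
    let (conflict, success, missing) = demo();
    assert_eq!(conflict, CasResult::Conflict { expected: 9, actual: 1 });
    assert!(success);
    assert_eq!(missing, CasResult::NotFound);
}
```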

View file

@ -0,0 +1,58 @@
//! Chainfire Core - Embeddable distributed cluster library
//!
//! This crate provides cluster management, distributed KVS, and event callbacks
//! for embedding Raft consensus and SWIM gossip into applications.
//!
//! # Example
//!
//! ```ignore
//! use chainfire_core::{ClusterBuilder, ClusterEventHandler};
//! use std::net::SocketAddr;
//!
//! struct MyHandler;
//!
//! impl ClusterEventHandler for MyHandler {
//! #[async_trait::async_trait]
//! async fn on_leader_changed(&self, old: Option<u64>, new: u64) {
//! println!("Leader changed: {:?} -> {}", old, new);
//! }
//! }
//!
//! #[tokio::main]
//! async fn main() -> Result<(), Box<dyn std::error::Error>> {
//! let cluster = ClusterBuilder::new(1)
//! .name("node-1")
//! .gossip_addr("0.0.0.0:7946".parse()?)
//! .raft_addr("0.0.0.0:2380".parse()?)
//! .on_cluster_event(MyHandler)
//! .build()
//! .await?;
//!
//! // Use the KVS
//! cluster.kv().put("key", b"value").await?;
//!
//! Ok(())
//! }
//! ```
pub mod builder;
pub mod callbacks;
pub mod cluster;
pub mod config;
pub mod error;
pub mod events;
pub mod kvs;
// Re-exports from chainfire-types
pub use chainfire_types::{
node::{NodeId, NodeInfo, NodeRole},
RaftRole,
};
// Re-exports from this crate
pub use builder::ClusterBuilder;
pub use callbacks::{ClusterEventHandler, KvEventHandler, LeaveReason};
pub use cluster::{Cluster, ClusterHandle, ClusterState};
pub use config::{ClusterConfig, StorageBackend, StorageBackendConfig};
pub use error::{ClusterError, Result};
pub use events::{ClusterEvent, EventDispatcher, KvEvent};
pub use kvs::{CasResult, Kv, KvEntry, KvHandle, KvNamespace, KvOptions, ReadConsistency};

View file

@ -0,0 +1,35 @@
[package]
name = "chainfire-gossip"
version.workspace = true
edition.workspace = true
license.workspace = true
rust-version.workspace = true
description = "Gossip/SWIM protocol integration for Chainfire distributed KVS"
[dependencies]
chainfire-types = { workspace = true }
# Gossip (SWIM protocol)
foca = { workspace = true }
# Async
tokio = { workspace = true }
futures = { workspace = true }
# Serialization
serde = { workspace = true }
bincode = { workspace = true }
# Utilities
tracing = { workspace = true }
parking_lot = { workspace = true }
dashmap = { workspace = true }
bytes = { workspace = true }
thiserror = { workspace = true }
rand = "0.9"
[dev-dependencies]
tokio = { workspace = true, features = ["rt-multi-thread", "macros", "time"] }
[lints]
workspace = true

View file

@ -0,0 +1,214 @@
//! Gossip agent with UDP transport
use crate::broadcast::ActualStateBroadcast;
use crate::identity::GossipId;
use crate::membership::{MembershipChange, MembershipState};
use crate::runtime::GossipRuntime;
use crate::GossipError;
use foca::{Config as FocaConfig, Foca, NoCustomBroadcast, PostcardCodec, Timer};
use futures::stream::FuturesUnordered;
use futures::StreamExt;
use rand::rngs::SmallRng;
use rand::SeedableRng;
use std::net::SocketAddr;
use std::sync::Arc;
use std::time::Duration;
use tokio::net::UdpSocket;
use tokio::sync::mpsc;
use tracing::{error, info, trace, warn};
/// Default gossip configuration
pub fn default_config() -> FocaConfig {
FocaConfig::simple()
}
/// Gossip agent managing the SWIM protocol
pub struct GossipAgent {
/// Our identity
identity: GossipId,
/// UDP socket for gossip
socket: Arc<UdpSocket>,
/// Membership state
membership: Arc<MembershipState>,
/// Actual state broadcast handler
broadcast: Arc<ActualStateBroadcast>,
/// Channel for receiving membership changes
membership_rx: mpsc::Receiver<MembershipChange>,
/// Channel for receiving outgoing packets
outgoing_rx: mpsc::Receiver<(SocketAddr, Vec<u8>)>,
/// Channel for receiving timer events
timer_rx: mpsc::Receiver<(Timer<GossipId>, Duration)>,
/// Foca instance
foca: Foca<GossipId, PostcardCodec, SmallRng, NoCustomBroadcast>,
/// Runtime for callbacks
runtime: GossipRuntime,
}
impl GossipAgent {
/// Create a new gossip agent
pub async fn new(identity: GossipId, config: FocaConfig) -> Result<Self, GossipError> {
let socket = UdpSocket::bind(identity.addr)
.await
.map_err(|e| GossipError::BindFailed(e.to_string()))?;
info!(addr = %identity.addr, node_id = identity.node_id, "Gossip agent bound");
let (outgoing_tx, outgoing_rx) = mpsc::channel(1024);
let (timer_tx, timer_rx) = mpsc::channel(256);
let (membership_tx, membership_rx) = mpsc::channel(256);
let runtime = GossipRuntime::new(outgoing_tx, timer_tx, membership_tx);
let rng = SmallRng::from_os_rng();
let foca = Foca::new(identity.clone(), config, rng, PostcardCodec);
Ok(Self {
identity,
socket: Arc::new(socket),
membership: Arc::new(MembershipState::new()),
broadcast: Arc::new(ActualStateBroadcast::new()),
membership_rx,
outgoing_rx,
timer_rx,
foca,
runtime,
})
}
/// Get the identity
pub fn identity(&self) -> &GossipId {
&self.identity
}
/// Get the membership state
pub fn membership(&self) -> &Arc<MembershipState> {
&self.membership
}
/// Get the broadcast handler
pub fn broadcast(&self) -> &Arc<ActualStateBroadcast> {
&self.broadcast
}
/// Announce to a known cluster member to join
pub fn announce(&mut self, addr: SocketAddr) -> Result<(), GossipError> {
// Build a placeholder identity (id 0, Worker role) just to address the
// target; the peer's real identity is learned through the SWIM exchange
let probe = GossipId::worker(0, addr);
self.foca
.announce(probe, &mut self.runtime)
.map_err(|e| GossipError::JoinFailed(format!("{:?}", e)))?;
info!(addr = %addr, "Announced to cluster");
Ok(())
}
/// Get current members
pub fn members(&self) -> Vec<GossipId> {
self.foca.iter_members().map(|m| m.id().clone()).collect()
}
/// Run the gossip agent
pub async fn run(&mut self) -> Result<(), GossipError> {
let mut buf = vec![0u8; 65536];
let mut timer_handles = FuturesUnordered::new();
info!(identity = %self.identity, "Starting gossip agent");
loop {
tokio::select! {
// Handle incoming UDP packets
result = self.socket.recv_from(&mut buf) => {
match result {
Ok((len, addr)) => {
trace!(from = %addr, len, "Received gossip packet");
if let Err(e) = self.foca.handle_data(&buf[..len], &mut self.runtime) {
warn!(error = ?e, "Failed to handle gossip data");
}
}
Err(e) => {
error!(error = %e, "Failed to receive UDP packet");
}
}
}
// Send outgoing packets
Some((addr, data)) = self.outgoing_rx.recv() => {
trace!(to = %addr, len = data.len(), "Sending gossip packet");
if let Err(e) = self.socket.send_to(&data, addr).await {
warn!(error = %e, to = %addr, "Failed to send UDP packet");
}
}
// Schedule timers
Some((timer, duration)) = self.timer_rx.recv() => {
let timer_clone = timer.clone();
timer_handles.push(async move {
tokio::time::sleep(duration).await;
timer_clone
});
}
// Fire timers
Some(timer) = timer_handles.next() => {
if let Err(e) = self.foca.handle_timer(timer, &mut self.runtime) {
warn!(error = ?e, "Failed to handle timer");
}
}
// Handle membership changes
Some(change) = self.membership_rx.recv() => {
// Drop the departed node's broadcast state before recording the change
if let MembershipChange::MemberDown(ref id) = change {
self.broadcast.remove_state(id.node_id);
}
self.membership.handle_change(change);
}
}
}
}
/// Run the agent with graceful shutdown
pub async fn run_until_shutdown(
mut self,
mut shutdown: tokio::sync::broadcast::Receiver<()>,
) -> Result<(), GossipError> {
tokio::select! {
result = self.run() => result,
_ = shutdown.recv() => {
info!("Gossip agent shutting down");
Ok(())
}
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use chainfire_types::node::NodeRole;
async fn create_test_agent(port: u16) -> GossipAgent {
let id = GossipId::new(
port as u64,
format!("127.0.0.1:{}", port).parse().unwrap(),
NodeRole::Worker,
);
GossipAgent::new(id, default_config()).await.unwrap()
}
#[tokio::test]
async fn test_agent_creation() {
let agent = create_test_agent(15000).await;
assert_eq!(agent.identity().node_id, 15000);
}
#[tokio::test]
async fn test_membership_empty() {
let agent = create_test_agent(15001).await;
assert_eq!(agent.membership().count(), 0);
}
// Note: Full gossip tests require multiple agents communicating
// which is complex to set up in unit tests. Integration tests
// would be more appropriate for testing actual gossip behavior.
}
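The `run` loop above defers timers by pushing sleep-wrapped futures into a `FuturesUnordered` and firing whichever completes first. The same "earliest deadline fires first, regardless of scheduling order" behavior can be modeled without an async runtime as a min-heap keyed by deadline — a sketch, with illustrative timer names:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Collect (name, deadline_ms) pairs into a min-heap and drain it,
// yielding the order in which the timers would fire.
fn fire_order(timers: &[(&'static str, u64)]) -> Vec<&'static str> {
    let mut heap: BinaryHeap<Reverse<(u64, &'static str)>> = timers
        .iter()
        .map(|&(name, deadline_ms)| Reverse((deadline_ms, name)))
        .collect();
    let mut fired = Vec::new();
    while let Some(Reverse((_, name))) = heap.pop() {
        fired.push(name);
    }
    fired
}

fn main() {
    // Scheduled out of order, fired by deadline.
    let order = fire_order(&[("probe", 500), ("ping", 100), ("suspect", 300)]);
    assert_eq!(order, vec!["ping", "suspect", "probe"]);
}
```

`FuturesUnordered` avoids the explicit heap by letting the runtime's timer wheel do the ordering, at the cost of one task-sized allocation per scheduled timer.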

View file

@ -0,0 +1,210 @@
//! Custom broadcast handler for actual state propagation
use chainfire_types::NodeId;
use dashmap::DashMap;
use parking_lot::RwLock;
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::time::SystemTime;
use tracing::debug;
/// Actual state data broadcast via gossip
///
/// This is the "Actual State" mentioned in the design - things like
/// CPU usage, memory, running tasks, etc. that are eventually consistent.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ActualState {
/// Node ID this state is from
pub node_id: NodeId,
/// Timestamp when this state was generated
pub timestamp: u64,
/// CPU usage percentage (0-100)
pub cpu_usage: f32,
/// Memory usage percentage (0-100)
pub memory_usage: f32,
/// Disk usage percentage (0-100)
pub disk_usage: f32,
/// Custom status fields (e.g., "vm-a" -> "running")
pub status: HashMap<String, String>,
}
impl ActualState {
/// Create a new actual state
pub fn new(node_id: NodeId) -> Self {
let timestamp = SystemTime::now()
.duration_since(SystemTime::UNIX_EPOCH)
.unwrap()
.as_secs();
Self {
node_id,
timestamp,
cpu_usage: 0.0,
memory_usage: 0.0,
disk_usage: 0.0,
status: HashMap::new(),
}
}
/// Set CPU usage
pub fn with_cpu(mut self, usage: f32) -> Self {
self.cpu_usage = usage;
self
}
/// Set memory usage
pub fn with_memory(mut self, usage: f32) -> Self {
self.memory_usage = usage;
self
}
/// Set disk usage
pub fn with_disk(mut self, usage: f32) -> Self {
self.disk_usage = usage;
self
}
/// Add a status entry
pub fn with_status(mut self, key: impl Into<String>, value: impl Into<String>) -> Self {
self.status.insert(key.into(), value.into());
self
}
/// Update timestamp to now
pub fn touch(&mut self) {
self.timestamp = SystemTime::now()
.duration_since(SystemTime::UNIX_EPOCH)
.unwrap()
.as_secs();
}
}
/// Broadcast handler for actual state propagation
pub struct ActualStateBroadcast {
/// Current node's actual state
local_state: RwLock<Option<ActualState>>,
/// Collected states from other nodes
cluster_state: DashMap<NodeId, ActualState>,
}
impl ActualStateBroadcast {
/// Create a new broadcast handler
pub fn new() -> Self {
Self {
local_state: RwLock::new(None),
cluster_state: DashMap::new(),
}
}
/// Set the local node's state
pub fn set_local_state(&self, state: ActualState) {
*self.local_state.write() = Some(state);
}
/// Get the local node's state
pub fn local_state(&self) -> Option<ActualState> {
self.local_state.read().clone()
}
/// Get a node's state
pub fn get_state(&self, node_id: NodeId) -> Option<ActualState> {
self.cluster_state.get(&node_id).map(|r| r.clone())
}
/// Get all cluster states
pub fn all_states(&self) -> Vec<ActualState> {
self.cluster_state.iter().map(|r| r.clone()).collect()
}
/// Remove a node's state (on member down)
pub fn remove_state(&self, node_id: NodeId) {
self.cluster_state.remove(&node_id);
}
}
impl Default for ActualStateBroadcast {
fn default() -> Self {
Self::new()
}
}
impl ActualStateBroadcast {
/// Receive and process state from another node
/// Returns true if the state was newer and accepted
pub fn receive_state(&self, state: ActualState) -> bool {
let node_id = state.node_id;
// Check if we should update
if let Some(existing) = self.cluster_state.get(&node_id) {
if existing.timestamp >= state.timestamp {
return false; // Stale data
}
}
debug!(node_id, timestamp = state.timestamp, "Received actual state");
self.cluster_state.insert(node_id, state);
true
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_actual_state_creation() {
let state = ActualState::new(1)
.with_cpu(50.0)
.with_memory(75.0)
.with_status("vm-1", "running");
assert_eq!(state.node_id, 1);
assert_eq!(state.cpu_usage, 50.0);
assert_eq!(state.memory_usage, 75.0);
assert_eq!(state.status.get("vm-1"), Some(&"running".to_string()));
}
#[test]
fn test_receive_state() {
let handler = ActualStateBroadcast::new();
// Receive first state
let state1 = ActualState::new(1).with_cpu(50.0);
let result = handler.receive_state(state1.clone());
assert!(result); // Should accept
// Receive newer state
let mut state2 = ActualState::new(1).with_cpu(60.0);
state2.timestamp = state1.timestamp + 1;
let result = handler.receive_state(state2.clone());
assert!(result); // Should accept
// Receive older state
let mut state3 = ActualState::new(1).with_cpu(40.0);
state3.timestamp = state1.timestamp - 1;
let result = handler.receive_state(state3);
assert!(!result); // Should reject stale data
// Verify final state
let stored = handler.get_state(1).unwrap();
assert_eq!(stored.cpu_usage, 60.0);
}
#[test]
fn test_cluster_state_collection() {
let handler = ActualStateBroadcast::new();
handler
.cluster_state
.insert(1, ActualState::new(1).with_cpu(50.0));
handler
.cluster_state
.insert(2, ActualState::new(2).with_cpu(60.0));
let states = handler.all_states();
assert_eq!(states.len(), 2);
handler.remove_state(1);
assert_eq!(handler.all_states().len(), 1);
}
}
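The stale-data check in `receive_state` is a last-writer-wins merge keyed on the sender's timestamp. The same rule can be sketched with plain std types (the `NodeState` struct here is a hypothetical stand-in for `ActualState`, not the crate's type):

```rust
use std::collections::HashMap;

// Hypothetical stand-in for ActualState: an id, a timestamp, one metric.
#[derive(Clone, Debug)]
struct NodeState {
    node_id: u64,
    timestamp: u64,
    cpu: f64,
}

// Last-writer-wins: accept the incoming state only if it is strictly
// newer than what we already hold for that node.
fn merge(states: &mut HashMap<u64, NodeState>, incoming: NodeState) -> bool {
    if let Some(existing) = states.get(&incoming.node_id) {
        if existing.timestamp >= incoming.timestamp {
            return false; // stale or duplicate: keep the current entry
        }
    }
    states.insert(incoming.node_id, incoming);
    true
}
```

Because ties are rejected (`>=`), replaying the same broadcast twice is a no-op, which keeps gossip delivery idempotent.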


@ -0,0 +1,147 @@
//! Node identity for the gossip protocol
use chainfire_types::node::{NodeId, NodeRole};
use foca::Identity;
use serde::{Deserialize, Serialize};
use std::net::SocketAddr;
/// Node identity for the SWIM gossip protocol
#[derive(Debug, Clone, PartialEq, Eq, Hash, Serialize, Deserialize)]
pub struct GossipId {
/// Unique node identifier
pub node_id: NodeId,
/// UDP address for gossip
pub addr: SocketAddr,
/// Incarnation number - bumped on rejoin to distinguish old/new instances
pub incarnation: u64,
/// Node role
pub role: NodeRole,
}
impl GossipId {
/// Create a new gossip identity
pub fn new(node_id: NodeId, addr: SocketAddr, role: NodeRole) -> Self {
Self {
node_id,
addr,
incarnation: 0,
role,
}
}
/// Create a Control Plane node identity
pub fn control_plane(node_id: NodeId, addr: SocketAddr) -> Self {
Self::new(node_id, addr, NodeRole::ControlPlane)
}
/// Create a Worker node identity
pub fn worker(node_id: NodeId, addr: SocketAddr) -> Self {
Self::new(node_id, addr, NodeRole::Worker)
}
/// Check if this node is a Control Plane node
pub fn is_control_plane(&self) -> bool {
self.role == NodeRole::ControlPlane
}
/// Check if this node is a Worker node
pub fn is_worker(&self) -> bool {
self.role == NodeRole::Worker
}
}
impl Identity for GossipId {
type Addr = SocketAddr;
fn addr(&self) -> SocketAddr {
self.addr
}
fn renew(&self) -> Option<Self> {
// Create new identity with bumped incarnation
Some(Self {
incarnation: self.incarnation + 1,
..self.clone()
})
}
fn win_addr_conflict(&self, other: &Self) -> bool {
// Higher incarnation wins, tie-break by node_id
match self.incarnation.cmp(&other.incarnation) {
std::cmp::Ordering::Greater => true,
std::cmp::Ordering::Less => false,
std::cmp::Ordering::Equal => self.node_id > other.node_id,
}
}
}
impl std::fmt::Display for GossipId {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(
f,
"{}@{}:{}",
self.node_id,
self.addr,
self.incarnation
)
}
}
impl PartialOrd for GossipId {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
Some(self.cmp(other))
}
}
impl Ord for GossipId {
fn cmp(&self, other: &Self) -> std::cmp::Ordering {
// First compare by node_id, then by incarnation
match self.node_id.cmp(&other.node_id) {
std::cmp::Ordering::Equal => self.incarnation.cmp(&other.incarnation),
other => other,
}
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_identity_creation() {
let id = GossipId::control_plane(1, "127.0.0.1:5000".parse().unwrap());
assert_eq!(id.node_id, 1);
assert_eq!(id.incarnation, 0);
assert!(id.is_control_plane());
}
#[test]
fn test_identity_renew() {
let id = GossipId::worker(1, "127.0.0.1:5000".parse().unwrap());
let renewed = id.renew().unwrap();
assert_eq!(renewed.node_id, id.node_id);
assert_eq!(renewed.addr, id.addr);
assert_eq!(renewed.incarnation, 1);
}
#[test]
fn test_identity_ordering() {
let id1 = GossipId::new(1, "127.0.0.1:5000".parse().unwrap(), NodeRole::Worker);
let id2 = GossipId::new(2, "127.0.0.1:5001".parse().unwrap(), NodeRole::Worker);
let id1_renewed = id1.renew().unwrap();
assert!(id1 < id2);
assert!(id1 < id1_renewed);
}
#[test]
fn test_serialization() {
let id = GossipId::control_plane(42, "192.168.1.1:5000".parse().unwrap());
let bytes = bincode::serialize(&id).unwrap();
let restored: GossipId = bincode::deserialize(&bytes).unwrap();
assert_eq!(id, restored);
}
}
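The `win_addr_conflict` rule above (higher incarnation wins, ties break on `node_id`) is a lexicographic comparison; a std-only sketch with bare tuples standing in for two `GossipId`s that claim the same address:

```rust
use std::cmp::Ordering;

// (incarnation, node_id) pairs standing in for conflicting identities.
// Higher incarnation wins; ties break on node_id.
fn wins_conflict(mine: (u64, u64), theirs: (u64, u64)) -> bool {
    match mine.0.cmp(&theirs.0) {
        Ordering::Greater => true,
        Ordering::Less => false,
        Ordering::Equal => mine.1 > theirs.1,
    }
}
```

Since Rust tuples already order lexicographically, `mine > theirs` would compute the same result; the explicit match mirrors the impl above. Because `renew()` bumps the incarnation, a restarted instance always beats its own stale identity in a conflict.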


@ -0,0 +1,40 @@
//! Gossip/SWIM protocol integration for Chainfire distributed KVS
//!
//! This crate provides:
//! - Node identity for SWIM protocol
//! - Gossip agent with UDP transport
//! - Membership management
//! - Actual state broadcast
pub mod agent;
pub mod broadcast;
pub mod identity;
pub mod membership;
pub mod runtime;
pub use agent::GossipAgent;
pub use broadcast::ActualState;
pub use identity::GossipId;
pub use membership::MembershipChange;
pub use runtime::GossipRuntime;
use thiserror::Error;
/// Gossip protocol errors
#[derive(Error, Debug)]
pub enum GossipError {
#[error("Failed to bind to address: {0}")]
BindFailed(String),
#[error("IO error: {0}")]
Io(#[from] std::io::Error),
#[error("Serialization error: {0}")]
Serialization(String),
#[error("Invalid identity: {0}")]
InvalidIdentity(String),
#[error("Join failed: {0}")]
JoinFailed(String),
}


@ -0,0 +1,141 @@
//! Membership state management
use crate::identity::GossipId;
use chainfire_types::NodeId;
use dashmap::DashMap;
use std::net::SocketAddr;
use tracing::debug;
/// Membership change event
#[derive(Debug, Clone)]
pub enum MembershipChange {
/// A member joined or became reachable
MemberUp(GossipId),
/// A member left or became unreachable
MemberDown(GossipId),
}
/// Manages cluster membership state
pub struct MembershipState {
/// Known members
members: DashMap<NodeId, GossipId>,
}
impl MembershipState {
/// Create a new membership state
pub fn new() -> Self {
Self {
members: DashMap::new(),
}
}
/// Handle a membership change
pub fn handle_change(&self, change: MembershipChange) {
match change {
MembershipChange::MemberUp(id) => {
debug!(node_id = id.node_id, addr = %id.addr, "Adding member");
self.members.insert(id.node_id, id);
}
MembershipChange::MemberDown(id) => {
debug!(node_id = id.node_id, "Removing member");
self.members.remove(&id.node_id);
}
}
}
/// Get a member by node ID
pub fn get(&self, node_id: NodeId) -> Option<GossipId> {
self.members.get(&node_id).map(|r| r.clone())
}
/// Get all members
pub fn all(&self) -> Vec<GossipId> {
self.members.iter().map(|r| r.clone()).collect()
}
/// Get member count
pub fn count(&self) -> usize {
self.members.len()
}
/// Check if a node is a member
pub fn contains(&self, node_id: NodeId) -> bool {
self.members.contains_key(&node_id)
}
/// Get all member addresses
pub fn addresses(&self) -> Vec<SocketAddr> {
self.members.iter().map(|r| r.addr).collect()
}
/// Get all control plane members
pub fn control_plane_members(&self) -> Vec<GossipId> {
self.members
.iter()
.filter(|r| r.is_control_plane())
.map(|r| r.clone())
.collect()
}
/// Get all worker members
pub fn worker_members(&self) -> Vec<GossipId> {
self.members
.iter()
.filter(|r| r.is_worker())
.map(|r| r.clone())
.collect()
}
}
impl Default for MembershipState {
fn default() -> Self {
Self::new()
}
}
#[cfg(test)]
mod tests {
use super::*;
use chainfire_types::node::NodeRole;
fn create_id(node_id: NodeId, role: NodeRole) -> GossipId {
GossipId::new(
node_id,
format!("127.0.0.1:{}", 5000 + node_id).parse().unwrap(),
role,
)
}
#[test]
fn test_membership_changes() {
let state = MembershipState::new();
let id1 = create_id(1, NodeRole::ControlPlane);
let id2 = create_id(2, NodeRole::Worker);
state.handle_change(MembershipChange::MemberUp(id1.clone()));
state.handle_change(MembershipChange::MemberUp(id2.clone()));
assert_eq!(state.count(), 2);
assert!(state.contains(1));
assert!(state.contains(2));
state.handle_change(MembershipChange::MemberDown(id1));
assert_eq!(state.count(), 1);
assert!(!state.contains(1));
}
#[test]
fn test_role_filtering() {
let state = MembershipState::new();
state.handle_change(MembershipChange::MemberUp(create_id(1, NodeRole::ControlPlane)));
state.handle_change(MembershipChange::MemberUp(create_id(2, NodeRole::ControlPlane)));
state.handle_change(MembershipChange::MemberUp(create_id(3, NodeRole::Worker)));
state.handle_change(MembershipChange::MemberUp(create_id(4, NodeRole::Worker)));
state.handle_change(MembershipChange::MemberUp(create_id(5, NodeRole::Worker)));
assert_eq!(state.control_plane_members().len(), 2);
assert_eq!(state.worker_members().len(), 3);
}
}


@ -0,0 +1,131 @@
//! Foca runtime implementation
use crate::identity::GossipId;
use crate::membership::MembershipChange;
use foca::{Notification, Runtime, Timer};
use std::net::SocketAddr;
use std::time::Duration;
use tokio::sync::mpsc;
use tracing::{debug, trace};
/// Foca runtime implementation for async operation
pub struct GossipRuntime {
/// Channel for outgoing UDP packets
outgoing_tx: mpsc::Sender<(SocketAddr, Vec<u8>)>,
/// Channel for timer scheduling
timer_tx: mpsc::Sender<(Timer<GossipId>, Duration)>,
/// Channel for membership updates
membership_tx: mpsc::Sender<MembershipChange>,
}
impl GossipRuntime {
/// Create a new gossip runtime
pub fn new(
outgoing_tx: mpsc::Sender<(SocketAddr, Vec<u8>)>,
timer_tx: mpsc::Sender<(Timer<GossipId>, Duration)>,
membership_tx: mpsc::Sender<MembershipChange>,
) -> Self {
Self {
outgoing_tx,
timer_tx,
membership_tx,
}
}
}
impl Runtime<GossipId> for GossipRuntime {
fn notify(&mut self, notification: Notification<GossipId>) {
match notification {
Notification::MemberUp(id) => {
debug!(node_id = id.node_id, addr = %id.addr, "Member up");
let _ = self
.membership_tx
.try_send(MembershipChange::MemberUp(id.clone()));
}
Notification::MemberDown(id) => {
debug!(node_id = id.node_id, addr = %id.addr, "Member down");
let _ = self
.membership_tx
.try_send(MembershipChange::MemberDown(id.clone()));
}
Notification::Idle => {
trace!("Gossip idle");
}
Notification::Rejoin(id) => {
debug!(node_id = id.node_id, "Member rejoined");
let _ = self
.membership_tx
.try_send(MembershipChange::MemberUp(id.clone()));
}
Notification::Active => {
trace!("Gossip active");
}
Notification::Defunct => {
trace!("Member defunct");
}
Notification::Rename(old, new) => {
debug!(old = old.node_id, new = new.node_id, "Member renamed");
// Treat as down/up sequence
let _ = self
.membership_tx
.try_send(MembershipChange::MemberDown(old.clone()));
let _ = self
.membership_tx
.try_send(MembershipChange::MemberUp(new.clone()));
}
}
}
fn send_to(&mut self, to: GossipId, data: &[u8]) {
trace!(to = %to.addr, len = data.len(), "Sending gossip packet");
let _ = self.outgoing_tx.try_send((to.addr, data.to_vec()));
}
fn submit_after(&mut self, event: Timer<GossipId>, after: Duration) {
trace!(?event, ?after, "Scheduling timer");
let _ = self.timer_tx.try_send((event, after));
}
}
#[cfg(test)]
mod tests {
use super::*;
use chainfire_types::node::NodeRole;
#[tokio::test]
async fn test_runtime_notifications() {
let (outgoing_tx, _) = mpsc::channel(10);
let (timer_tx, _) = mpsc::channel(10);
let (membership_tx, mut membership_rx) = mpsc::channel(10);
let mut runtime = GossipRuntime::new(outgoing_tx, timer_tx, membership_tx);
let id = GossipId::new(1, "127.0.0.1:5000".parse().unwrap(), NodeRole::Worker);
runtime.notify(Notification::MemberUp(&id));
let change = membership_rx.try_recv().unwrap();
assert!(matches!(change, MembershipChange::MemberUp(_)));
runtime.notify(Notification::MemberDown(&id));
let change = membership_rx.try_recv().unwrap();
assert!(matches!(change, MembershipChange::MemberDown(_)));
}
#[tokio::test]
async fn test_runtime_send() {
let (outgoing_tx, mut outgoing_rx) = mpsc::channel(10);
let (timer_tx, _) = mpsc::channel(10);
let (membership_tx, _) = mpsc::channel(10);
let mut runtime = GossipRuntime::new(outgoing_tx, timer_tx, membership_tx);
let id = GossipId::new(1, "127.0.0.1:5000".parse().unwrap(), NodeRole::Worker);
let data = b"test data";
runtime.send_to(id.clone(), data);
let (recv_addr, recv_data) = outgoing_rx.try_recv().unwrap();
assert_eq!(recv_addr, id.addr);
assert_eq!(recv_data, data);
}
}
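The `try_send` calls above are deliberate: `Runtime` callbacks execute inside foca's protocol loop, so they must never block. With a bounded channel, a lagging consumer means the event is dropped rather than stalling the loop; a std-only sketch of that trade-off:

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

// Non-blocking offer into a bounded channel: returns false (dropping
// the item) instead of blocking when the consumer has fallen behind.
fn offer<T>(tx: &SyncSender<T>, item: T) -> bool {
    match tx.try_send(item) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
    }
}
```

Dropping a membership notification is a real trade-off; the assumption in this design is that gossip re-delivers cluster state shortly afterwards, so a lost event self-heals.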


@ -0,0 +1,21 @@
[package]
name = "chainfire-proto"
version.workspace = true
edition.workspace = true
license.workspace = true
rust-version.workspace = true
description = "Protocol buffer definitions for Chainfire (client-safe, no storage deps)"
[dependencies]
tonic = { workspace = true }
prost = { workspace = true }
prost-types = { workspace = true }
tokio = { workspace = true }
tokio-stream = { workspace = true }
[build-dependencies]
tonic-build = { workspace = true }
protoc-bin-vendored = "3"
[lints]
workspace = true


@ -0,0 +1,12 @@
fn main() -> Result<(), Box<dyn std::error::Error>> {
let protoc_path = protoc_bin_vendored::protoc_bin_path()?;
std::env::set_var("PROTOC", protoc_path);
tonic_build::configure()
.build_server(false)
.build_client(true)
.compile_protos(&["../../proto/chainfire.proto"], &["../../proto"])?;
println!("cargo:rerun-if-changed=../../proto/chainfire.proto");
Ok(())
}


@ -0,0 +1,7 @@
//! Lightweight protocol buffer definitions for Chainfire (client-side)
//! Generates client stubs only (no storage/backend dependencies).
// Generated client stubs live under the `proto` module to mirror chainfire-api's re-exports.
pub mod proto {
include!(concat!(env!("OUT_DIR"), "/chainfire.v1.rs"));
}


@ -0,0 +1,38 @@
[package]
name = "chainfire-raft"
version.workspace = true
edition.workspace = true
license.workspace = true
rust-version.workspace = true
description = "OpenRaft integration for Chainfire distributed KVS"
[dependencies]
chainfire-types = { workspace = true }
chainfire-storage = { workspace = true }
# Raft
openraft = { workspace = true }
# Async
tokio = { workspace = true }
async-trait = { workspace = true }
futures = { workspace = true }
# Serialization
serde = { workspace = true }
bincode = { workspace = true }
# Utilities
tracing = { workspace = true }
parking_lot = { workspace = true }
dashmap = { workspace = true }
bytes = { workspace = true }
thiserror = { workspace = true }
anyhow = { workspace = true }
[dev-dependencies]
tempfile = { workspace = true }
tokio = { workspace = true, features = ["rt-multi-thread", "macros"] }
[lints]
workspace = true


@ -0,0 +1,79 @@
//! OpenRaft type configuration for Chainfire
use chainfire_types::command::{RaftCommand, RaftResponse};
use chainfire_types::NodeId;
use openraft::BasicNode;
use std::io::Cursor;
// Use the declare_raft_types macro for OpenRaft 0.9
// NodeId defaults to u64, which matches our chainfire_types::NodeId
openraft::declare_raft_types!(
/// OpenRaft type configuration for Chainfire
pub TypeConfig:
D = RaftCommand,
R = RaftResponse,
Node = BasicNode,
);
/// Request data type - commands submitted to Raft
pub type Request = RaftCommand;
/// Response data type - responses from state machine
pub type Response = RaftResponse;
/// Log ID type
pub type LogId = openraft::LogId<NodeId>;
/// Vote type
pub type Vote = openraft::Vote<NodeId>;
/// Snapshot meta type (uses NodeId and Node separately)
pub type SnapshotMeta = openraft::SnapshotMeta<NodeId, BasicNode>;
/// Membership type (uses NodeId and Node separately)
pub type Membership = openraft::Membership<NodeId, BasicNode>;
/// Stored membership type
pub type StoredMembership = openraft::StoredMembership<NodeId, BasicNode>;
/// Entry type
pub type Entry = openraft::Entry<TypeConfig>;
/// Leader ID type
pub type LeaderId = openraft::LeaderId<NodeId>;
/// Committed Leader ID type
pub type CommittedLeaderId = openraft::CommittedLeaderId<NodeId>;
/// Raft configuration builder
pub fn default_config() -> openraft::Config {
openraft::Config {
cluster_name: "chainfire".into(),
heartbeat_interval: 150,
election_timeout_min: 300,
election_timeout_max: 600,
install_snapshot_timeout: 400,
max_payload_entries: 300,
replication_lag_threshold: 1000,
snapshot_policy: openraft::SnapshotPolicy::LogsSinceLast(5000),
snapshot_max_chunk_size: 3 * 1024 * 1024, // 3MB
max_in_snapshot_log_to_keep: 1000,
purge_batch_size: 256,
enable_tick: true,
enable_heartbeat: true,
enable_elect: true,
..Default::default()
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_default_config() {
let config = default_config();
assert_eq!(config.cluster_name, "chainfire");
assert!(config.heartbeat_interval < config.election_timeout_min);
}
}
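The test above only spot-checks one invariant, but the Raft timing parameters have a required ordering: heartbeats must land well inside the election-timeout window, and the window itself must be non-empty. A tiny validity check capturing that (hypothetical helper, not part of the crate):

```rust
// Raft timing sanity: a follower should observe several heartbeats
// before it could ever time out and start an election.
fn timing_is_sane(heartbeat_ms: u64, elect_min_ms: u64, elect_max_ms: u64) -> bool {
    heartbeat_ms > 0 && heartbeat_ms < elect_min_ms && elect_min_ms < elect_max_ms
}
```

With the defaults above, `timing_is_sane(150, 300, 600)` holds: a follower misses two heartbeats before the earliest possible election.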


@ -0,0 +1,20 @@
//! OpenRaft integration for Chainfire distributed KVS
//!
//! This crate provides:
//! - TypeConfig for OpenRaft
//! - Network implementation for Raft RPC
//! - Storage adapters
//! - Raft node management
pub mod config;
pub mod network;
pub mod node;
pub mod storage;
pub use config::TypeConfig;
pub use network::{NetworkFactory, RaftNetworkError};
pub use node::RaftNode;
pub use storage::RaftStorage;
/// Raft type alias with our configuration
pub type Raft = openraft::Raft<TypeConfig>;


@ -0,0 +1,316 @@
//! Network implementation for Raft RPC
//!
//! This module provides network adapters for OpenRaft to communicate between nodes.
use crate::config::TypeConfig;
use chainfire_types::NodeId;
use openraft::error::{InstallSnapshotError, NetworkError, RaftError, RPCError, StreamingError, Fatal};
use openraft::network::{RPCOption, RaftNetwork, RaftNetworkFactory};
use openraft::raft::{
AppendEntriesRequest, AppendEntriesResponse, InstallSnapshotRequest, InstallSnapshotResponse,
SnapshotResponse, VoteRequest, VoteResponse,
};
use openraft::BasicNode;
use std::collections::HashMap;
use std::sync::Arc;
use thiserror::Error;
use tokio::sync::RwLock;
use tracing::{debug, trace};
/// Network error type
#[derive(Error, Debug)]
pub enum RaftNetworkError {
#[error("Connection failed to node {node_id}: {reason}")]
ConnectionFailed { node_id: NodeId, reason: String },
#[error("RPC failed: {0}")]
RpcFailed(String),
#[error("Timeout")]
Timeout,
#[error("Node {0} not found")]
NodeNotFound(NodeId),
}
/// Trait for sending Raft RPCs
/// This will be implemented by the gRPC client in chainfire-api
#[async_trait::async_trait]
pub trait RaftRpcClient: Send + Sync + 'static {
async fn vote(
&self,
target: NodeId,
req: VoteRequest<NodeId>,
) -> Result<VoteResponse<NodeId>, RaftNetworkError>;
async fn append_entries(
&self,
target: NodeId,
req: AppendEntriesRequest<TypeConfig>,
) -> Result<AppendEntriesResponse<NodeId>, RaftNetworkError>;
async fn install_snapshot(
&self,
target: NodeId,
req: InstallSnapshotRequest<TypeConfig>,
) -> Result<InstallSnapshotResponse<NodeId>, RaftNetworkError>;
}
/// Factory for creating network connections to Raft peers
pub struct NetworkFactory {
/// RPC client for sending requests
client: Arc<dyn RaftRpcClient>,
/// Node address mapping
nodes: Arc<RwLock<HashMap<NodeId, BasicNode>>>,
}
impl NetworkFactory {
/// Create a new network factory
pub fn new(client: Arc<dyn RaftRpcClient>) -> Self {
Self {
client,
nodes: Arc::new(RwLock::new(HashMap::new())),
}
}
/// Add or update a node's address
pub async fn add_node(&self, id: NodeId, node: BasicNode) {
let mut nodes = self.nodes.write().await;
nodes.insert(id, node);
}
/// Remove a node
pub async fn remove_node(&self, id: NodeId) {
let mut nodes = self.nodes.write().await;
nodes.remove(&id);
}
}
impl RaftNetworkFactory<TypeConfig> for NetworkFactory {
type Network = NetworkConnection;
async fn new_client(&mut self, target: NodeId, node: &BasicNode) -> Self::Network {
// Update our node map
self.nodes.write().await.insert(target, node.clone());
NetworkConnection {
target,
node: node.clone(),
client: Arc::clone(&self.client),
}
}
}
/// A connection to a single Raft peer
pub struct NetworkConnection {
target: NodeId,
node: BasicNode,
client: Arc<dyn RaftRpcClient>,
}
/// Convert our network error to OpenRaft's RPCError
fn to_rpc_error<E: std::error::Error>(e: RaftNetworkError) -> RPCError<NodeId, BasicNode, RaftError<NodeId, E>> {
RPCError::Network(NetworkError::new(&e))
}
/// Convert our network error to OpenRaft's RPCError with InstallSnapshotError
fn to_snapshot_rpc_error(e: RaftNetworkError) -> RPCError<NodeId, BasicNode, RaftError<NodeId, InstallSnapshotError>> {
RPCError::Network(NetworkError::new(&e))
}
impl RaftNetwork<TypeConfig> for NetworkConnection {
async fn vote(
&mut self,
req: VoteRequest<NodeId>,
_option: RPCOption,
) -> Result<
VoteResponse<NodeId>,
RPCError<NodeId, BasicNode, RaftError<NodeId>>,
> {
trace!(target = self.target, "Sending vote request");
self.client
.vote(self.target, req)
.await
.map_err(to_rpc_error)
}
async fn append_entries(
&mut self,
req: AppendEntriesRequest<TypeConfig>,
_option: RPCOption,
) -> Result<
AppendEntriesResponse<NodeId>,
RPCError<NodeId, BasicNode, RaftError<NodeId>>,
> {
trace!(
target = self.target,
entries = req.entries.len(),
"Sending append entries"
);
self.client
.append_entries(self.target, req)
.await
.map_err(to_rpc_error)
}
async fn install_snapshot(
&mut self,
req: InstallSnapshotRequest<TypeConfig>,
_option: RPCOption,
) -> Result<
InstallSnapshotResponse<NodeId>,
RPCError<NodeId, BasicNode, RaftError<NodeId, InstallSnapshotError>>,
> {
debug!(
target = self.target,
last_log_id = ?req.meta.last_log_id,
"Sending install snapshot"
);
self.client
.install_snapshot(self.target, req)
.await
.map_err(to_snapshot_rpc_error)
}
async fn full_snapshot(
&mut self,
vote: openraft::Vote<NodeId>,
snapshot: openraft::Snapshot<TypeConfig>,
_cancel: impl std::future::Future<Output = openraft::error::ReplicationClosed> + Send + 'static,
_option: RPCOption,
) -> Result<
SnapshotResponse<NodeId>,
StreamingError<TypeConfig, Fatal<NodeId>>,
> {
// For simplicity, send snapshot in one chunk
// In production, you'd want to chunk large snapshots
let req = InstallSnapshotRequest {
vote,
meta: snapshot.meta.clone(),
offset: 0,
data: snapshot.snapshot.into_inner(),
done: true,
};
debug!(
target = self.target,
last_log_id = ?snapshot.meta.last_log_id,
"Sending full snapshot"
);
let resp = self
.client
.install_snapshot(self.target, req)
.await
.map_err(|e| StreamingError::Network(NetworkError::new(&e)))?;
Ok(SnapshotResponse { vote: resp.vote })
}
}
/// In-memory RPC client for testing
#[cfg(test)]
pub mod test_client {
use super::*;
use std::collections::HashMap;
use tokio::sync::mpsc;
/// A simple in-memory RPC client for testing
pub struct InMemoryRpcClient {
/// Channel senders to each node
channels: Arc<RwLock<HashMap<NodeId, mpsc::Sender<RpcMessage>>>>,
}
pub enum RpcMessage {
Vote(
VoteRequest<NodeId>,
tokio::sync::oneshot::Sender<VoteResponse<NodeId>>,
),
AppendEntries(
AppendEntriesRequest<TypeConfig>,
tokio::sync::oneshot::Sender<AppendEntriesResponse<NodeId>>,
),
InstallSnapshot(
InstallSnapshotRequest<TypeConfig>,
tokio::sync::oneshot::Sender<InstallSnapshotResponse<NodeId>>,
),
}
impl InMemoryRpcClient {
pub fn new() -> Self {
Self {
channels: Arc::new(RwLock::new(HashMap::new())),
}
}
pub async fn register(&self, id: NodeId, tx: mpsc::Sender<RpcMessage>) {
self.channels.write().await.insert(id, tx);
}
}
#[async_trait::async_trait]
impl RaftRpcClient for InMemoryRpcClient {
async fn vote(
&self,
target: NodeId,
req: VoteRequest<NodeId>,
) -> Result<VoteResponse<NodeId>, RaftNetworkError> {
let channels = self.channels.read().await;
let tx = channels
.get(&target)
.ok_or(RaftNetworkError::NodeNotFound(target))?;
let (resp_tx, resp_rx) = tokio::sync::oneshot::channel();
tx.send(RpcMessage::Vote(req, resp_tx))
.await
.map_err(|_| RaftNetworkError::RpcFailed("Channel closed".into()))?;
resp_rx
.await
.map_err(|_| RaftNetworkError::RpcFailed("Response channel closed".into()))
}
async fn append_entries(
&self,
target: NodeId,
req: AppendEntriesRequest<TypeConfig>,
) -> Result<AppendEntriesResponse<NodeId>, RaftNetworkError> {
let channels = self.channels.read().await;
let tx = channels
.get(&target)
.ok_or(RaftNetworkError::NodeNotFound(target))?;
let (resp_tx, resp_rx) = tokio::sync::oneshot::channel();
tx.send(RpcMessage::AppendEntries(req, resp_tx))
.await
.map_err(|_| RaftNetworkError::RpcFailed("Channel closed".into()))?;
resp_rx
.await
.map_err(|_| RaftNetworkError::RpcFailed("Response channel closed".into()))
}
async fn install_snapshot(
&self,
target: NodeId,
req: InstallSnapshotRequest<TypeConfig>,
) -> Result<InstallSnapshotResponse<NodeId>, RaftNetworkError> {
let channels = self.channels.read().await;
let tx = channels
.get(&target)
.ok_or(RaftNetworkError::NodeNotFound(target))?;
let (resp_tx, resp_rx) = tokio::sync::oneshot::channel();
tx.send(RpcMessage::InstallSnapshot(req, resp_tx))
.await
.map_err(|_| RaftNetworkError::RpcFailed("Channel closed".into()))?;
resp_rx
.await
.map_err(|_| RaftNetworkError::RpcFailed("Response channel closed".into()))
}
}
}
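The in-memory client above pairs every request with its own one-shot reply channel, so a response can never be attributed to the wrong call even when requests interleave. The same pattern can be sketched tokio-free with std threads and channels:

```rust
use std::sync::mpsc;
use std::thread;

// Each request carries the sender half of a fresh reply channel.
enum Rpc {
    Ping(u64, mpsc::Sender<u64>),
}

fn spawn_server(rx: mpsc::Receiver<Rpc>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for msg in rx {
            match msg {
                // Reply on the per-request channel; ignore a hung-up caller.
                Rpc::Ping(n, reply) => {
                    let _ = reply.send(n + 1);
                }
            }
        }
    })
}

fn call(tx: &mpsc::Sender<Rpc>, n: u64) -> u64 {
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Rpc::Ping(n, reply_tx)).unwrap();
    reply_rx.recv().unwrap()
}
```

Dropping all `Rpc` senders closes the channel, which ends the server's `for` loop cleanly: the same shutdown behavior the tokio version gets from its mpsc channels.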


@ -0,0 +1,326 @@
//! Raft node management
//!
//! This module provides the high-level API for managing a Raft node.
use crate::config::{default_config, TypeConfig};
use crate::network::{NetworkFactory, RaftRpcClient};
use crate::storage::RaftStorage;
use crate::Raft;
use chainfire_storage::RocksStore;
use chainfire_types::command::{RaftCommand, RaftResponse};
use chainfire_types::error::RaftError;
use chainfire_types::NodeId;
use openraft::{BasicNode, Config};
use std::collections::BTreeMap;
use std::sync::Arc;
use tokio::sync::RwLock;
use tracing::{debug, info};
/// A Raft node instance
pub struct RaftNode {
/// Node ID
id: NodeId,
/// OpenRaft instance (wrapped in Arc for sharing)
raft: Arc<Raft>,
/// Storage
storage: Arc<RwLock<RaftStorage>>,
/// Network factory
network: Arc<RwLock<NetworkFactory>>,
/// Configuration
config: Arc<Config>,
}
impl RaftNode {
/// Create a new Raft node
pub async fn new(
id: NodeId,
store: RocksStore,
rpc_client: Arc<dyn RaftRpcClient>,
) -> Result<Self, RaftError> {
let config = Arc::new(default_config());
// Create storage wrapper for local access
let storage =
RaftStorage::new(store.clone()).map_err(|e| RaftError::Internal(e.to_string()))?;
let storage = Arc::new(RwLock::new(storage));
let network = NetworkFactory::new(Arc::clone(&rpc_client));
// Create log storage and state machine (they share the same underlying store)
let log_storage = RaftStorage::new(store.clone())
.map_err(|e| RaftError::Internal(e.to_string()))?;
let state_machine = RaftStorage::new(store)
.map_err(|e| RaftError::Internal(e.to_string()))?;
// Create Raft instance with separate log storage and state machine
let raft = Arc::new(
Raft::new(
id,
config.clone(),
network,
log_storage,
state_machine,
)
.await
.map_err(|e| RaftError::Internal(e.to_string()))?,
);
info!(node_id = id, "Created Raft node");
Ok(Self {
id,
raft,
storage,
network: Arc::new(RwLock::new(NetworkFactory::new(rpc_client))),
config,
})
}
/// Get the node ID
pub fn id(&self) -> NodeId {
self.id
}
/// Get the Raft instance (reference)
pub fn raft(&self) -> &Raft {
&self.raft
}
/// Get the Raft instance (Arc clone for sharing)
pub fn raft_arc(&self) -> Arc<Raft> {
Arc::clone(&self.raft)
}
/// Get the storage
pub fn storage(&self) -> &Arc<RwLock<RaftStorage>> {
&self.storage
}
/// Initialize a single-node cluster
pub async fn initialize(&self) -> Result<(), RaftError> {
let mut nodes = BTreeMap::new();
nodes.insert(self.id, BasicNode::default());
self.raft
.initialize(nodes)
.await
.map_err(|e| RaftError::Internal(e.to_string()))?;
info!(node_id = self.id, "Initialized single-node cluster");
Ok(())
}
/// Initialize a multi-node cluster
pub async fn initialize_cluster(
&self,
members: BTreeMap<NodeId, BasicNode>,
) -> Result<(), RaftError> {
self.raft
.initialize(members)
.await
.map_err(|e| RaftError::Internal(e.to_string()))?;
info!(node_id = self.id, "Initialized multi-node cluster");
Ok(())
}
/// Add a learner node
pub async fn add_learner(
&self,
id: NodeId,
node: BasicNode,
blocking: bool,
) -> Result<(), RaftError> {
self.raft
.add_learner(id, node, blocking)
.await
.map_err(|e| RaftError::Internal(e.to_string()))?;
info!(node_id = id, "Added learner");
Ok(())
}
/// Change cluster membership
pub async fn change_membership(
&self,
members: BTreeMap<NodeId, BasicNode>,
retain: bool,
) -> Result<(), RaftError> {
let member_ids: std::collections::BTreeSet<_> = members.keys().cloned().collect();
self.raft
.change_membership(member_ids, retain)
.await
.map_err(|e| RaftError::Internal(e.to_string()))?;
info!(?members, "Changed membership");
Ok(())
}
/// Submit a write request (goes through Raft consensus)
pub async fn write(&self, cmd: RaftCommand) -> Result<RaftResponse, RaftError> {
let response = self
.raft
.client_write(cmd)
.await
.map_err(|e| match e {
openraft::error::RaftError::APIError(
openraft::error::ClientWriteError::ForwardToLeader(fwd)
) => RaftError::NotLeader {
leader_id: fwd.leader_id,
},
_ => RaftError::ProposalFailed(e.to_string()),
})?;
Ok(response.data)
}
/// Read from the state machine (linearizable read)
pub async fn linearizable_read(&self) -> Result<(), RaftError> {
self.raft
.ensure_linearizable()
.await
.map_err(|e| RaftError::Internal(e.to_string()))?;
Ok(())
}
/// Get current leader ID
pub async fn leader(&self) -> Option<NodeId> {
let metrics = self.raft.metrics().borrow().clone();
metrics.current_leader
}
/// Check if this node is the leader
pub async fn is_leader(&self) -> bool {
self.leader().await == Some(self.id)
}
/// Get current term
pub async fn current_term(&self) -> u64 {
let metrics = self.raft.metrics().borrow().clone();
metrics.current_term
}
/// Get cluster membership
pub async fn membership(&self) -> Vec<NodeId> {
let metrics = self.raft.metrics().borrow().clone();
metrics
.membership_config
.membership()
.voter_ids()
.collect()
}
/// Shutdown the node
pub async fn shutdown(&self) -> Result<(), RaftError> {
self.raft
.shutdown()
.await
.map_err(|e| RaftError::Internal(e.to_string()))?;
info!(node_id = self.id, "Raft node shutdown");
Ok(())
}
/// Trigger a snapshot
pub async fn trigger_snapshot(&self) -> Result<(), RaftError> {
self.raft
.trigger()
.snapshot()
.await
.map_err(|e| RaftError::Internal(e.to_string()))?;
debug!(node_id = self.id, "Triggered snapshot");
Ok(())
}
}
/// Dummy RPC client used only by the unit tests below
#[cfg(test)]
struct DummyRpcClient;
#[cfg(test)]
#[async_trait::async_trait]
impl RaftRpcClient for DummyRpcClient {
async fn vote(
&self,
_target: NodeId,
_req: openraft::raft::VoteRequest<NodeId>,
) -> Result<openraft::raft::VoteResponse<NodeId>, crate::network::RaftNetworkError> {
Err(crate::network::RaftNetworkError::RpcFailed(
"Dummy client".into(),
))
}
async fn append_entries(
&self,
_target: NodeId,
_req: openraft::raft::AppendEntriesRequest<TypeConfig>,
) -> Result<openraft::raft::AppendEntriesResponse<NodeId>, crate::network::RaftNetworkError>
{
Err(crate::network::RaftNetworkError::RpcFailed(
"Dummy client".into(),
))
}
async fn install_snapshot(
&self,
_target: NodeId,
_req: openraft::raft::InstallSnapshotRequest<TypeConfig>,
) -> Result<openraft::raft::InstallSnapshotResponse<NodeId>, crate::network::RaftNetworkError>
{
Err(crate::network::RaftNetworkError::RpcFailed(
"Dummy client".into(),
))
}
}
#[cfg(test)]
mod tests {
use super::*;
use tempfile::tempdir;
async fn create_test_node(id: NodeId) -> RaftNode {
// Persist the temp directory for the life of the test process:
// dropping the TempDir guard here would delete the directory while
// the RocksDB store is still using it.
let dir = tempdir().unwrap().into_path();
let store = RocksStore::new(&dir).unwrap();
RaftNode::new(id, store, Arc::new(DummyRpcClient))
.await
.unwrap()
}
#[tokio::test]
async fn test_node_creation() {
let node = create_test_node(1).await;
assert_eq!(node.id(), 1);
}
#[tokio::test]
async fn test_single_node_initialization() {
let node = create_test_node(1).await;
node.initialize().await.unwrap();
// Should be leader of single-node cluster
tokio::time::sleep(std::time::Duration::from_millis(500)).await;
let leader = node.leader().await;
assert_eq!(leader, Some(1));
}
#[tokio::test]
async fn test_single_node_write() {
let node = create_test_node(1).await;
node.initialize().await.unwrap();
// Wait for leader election
tokio::time::sleep(std::time::Duration::from_millis(500)).await;
let cmd = RaftCommand::Put {
key: b"test".to_vec(),
value: b"data".to_vec(),
lease_id: None,
prev_kv: false,
};
let response = node.write(cmd).await.unwrap();
assert_eq!(response.revision, 1);
}
}


@ -0,0 +1,475 @@
//! Storage adapters for OpenRaft
//!
//! This module implements OpenRaft's storage traits on top of our RocksDB-based storage.
use crate::config::{CommittedLeaderId, LogId, Membership, StoredMembership, TypeConfig};
use chainfire_storage::{
log_storage::{EntryPayload, LogEntry, LogId as InternalLogId, Vote as InternalVote},
snapshot::{Snapshot, SnapshotBuilder},
LogStorage, RocksStore, StateMachine,
};
use chainfire_types::command::{RaftCommand, RaftResponse};
use chainfire_types::error::StorageError as ChainfireStorageError;
use chainfire_types::NodeId;
use openraft::storage::{LogFlushed, LogState as OpenRaftLogState, RaftLogStorage, RaftStateMachine};
use openraft::{
AnyError, BasicNode, Entry, EntryPayload as OpenRaftEntryPayload,
ErrorSubject, ErrorVerb, SnapshotMeta as OpenRaftSnapshotMeta,
StorageError as OpenRaftStorageError, StorageIOError,
Vote as OpenRaftVote,
};
use std::fmt::Debug;
use std::io::Cursor;
use std::sync::Arc;
use tokio::sync::{mpsc, RwLock};
use tracing::{debug, info, trace};
/// Combined Raft storage implementing OpenRaft traits
pub struct RaftStorage {
/// Underlying RocksDB store
store: RocksStore,
/// Log storage
log: LogStorage,
/// State machine
state_machine: Arc<RwLock<StateMachine>>,
/// Snapshot builder
snapshot_builder: SnapshotBuilder,
/// Current membership
membership: RwLock<Option<StoredMembership>>,
/// Last applied log ID
last_applied: RwLock<Option<LogId>>,
}
/// Convert our storage error to OpenRaft StorageError
fn to_storage_error(e: ChainfireStorageError) -> OpenRaftStorageError<NodeId> {
let io_err = StorageIOError::new(
ErrorSubject::Store,
ErrorVerb::Read,
AnyError::new(&e),
);
OpenRaftStorageError::IO { source: io_err }
}
impl RaftStorage {
/// Create new Raft storage
pub fn new(store: RocksStore) -> Result<Self, ChainfireStorageError> {
let log = LogStorage::new(store.clone());
let state_machine = Arc::new(RwLock::new(StateMachine::new(store.clone())?));
let snapshot_builder = SnapshotBuilder::new(store.clone());
Ok(Self {
store,
log,
state_machine,
snapshot_builder,
membership: RwLock::new(None),
last_applied: RwLock::new(None),
})
}
/// Set the watch event sender
pub async fn set_watch_sender(&self, tx: mpsc::UnboundedSender<chainfire_types::WatchEvent>) {
let mut sm = self.state_machine.write().await;
sm.set_watch_sender(tx);
}
/// Get the state machine
pub fn state_machine(&self) -> &Arc<RwLock<StateMachine>> {
&self.state_machine
}
/// Convert internal LogId to OpenRaft LogId
fn to_openraft_log_id(id: InternalLogId) -> LogId {
// Create CommittedLeaderId from term (node_id is ignored in std implementation)
let committed_leader_id = CommittedLeaderId::new(id.term, 0);
openraft::LogId::new(committed_leader_id, id.index)
}
/// Convert OpenRaft LogId to internal LogId
fn from_openraft_log_id(id: &LogId) -> InternalLogId {
InternalLogId::new(id.leader_id.term, id.index)
}
/// Convert internal Vote to OpenRaft Vote
    fn to_openraft_vote(vote: InternalVote) -> OpenRaftVote<NodeId> {
        let node_id = vote.node_id.unwrap_or(0);
        if vote.committed {
            // Preserve the committed flag; dropping it would let a committed
            // vote be read back as uncommitted after restart.
            OpenRaftVote::new_committed(vote.term, node_id)
        } else {
            OpenRaftVote::new(vote.term, node_id)
        }
    }
/// Convert OpenRaft Vote to internal Vote
fn from_openraft_vote(vote: &OpenRaftVote<NodeId>) -> InternalVote {
InternalVote {
term: vote.leader_id().term,
node_id: Some(vote.leader_id().node_id),
committed: vote.is_committed(),
}
}
/// Convert internal entry to OpenRaft entry
fn to_openraft_entry(entry: LogEntry<RaftCommand>) -> Entry<TypeConfig> {
let payload = match entry.payload {
EntryPayload::Blank => OpenRaftEntryPayload::Blank,
EntryPayload::Normal(data) => OpenRaftEntryPayload::Normal(data),
EntryPayload::Membership(members) => {
// Create membership from node IDs
let nodes: std::collections::BTreeMap<NodeId, BasicNode> = members
.into_iter()
.map(|id| (id, BasicNode::default()))
.collect();
let membership = Membership::new(vec![nodes.keys().cloned().collect()], None);
OpenRaftEntryPayload::Membership(membership)
}
};
Entry {
log_id: Self::to_openraft_log_id(entry.log_id),
payload,
}
}
/// Convert OpenRaft entry to internal entry
fn from_openraft_entry(entry: &Entry<TypeConfig>) -> LogEntry<RaftCommand> {
let payload = match &entry.payload {
OpenRaftEntryPayload::Blank => EntryPayload::Blank,
OpenRaftEntryPayload::Normal(data) => EntryPayload::Normal(data.clone()),
OpenRaftEntryPayload::Membership(m) => {
let members: Vec<NodeId> = m.voter_ids().collect();
EntryPayload::Membership(members)
}
};
LogEntry {
log_id: Self::from_openraft_log_id(&entry.log_id),
payload,
}
}
}
impl RaftLogStorage<TypeConfig> for RaftStorage {
type LogReader = Self;
async fn get_log_state(
&mut self,
) -> Result<OpenRaftLogState<TypeConfig>, OpenRaftStorageError<NodeId>> {
let state = self
.log
.get_log_state()
.map_err(to_storage_error)?;
Ok(OpenRaftLogState {
last_purged_log_id: state.last_purged_log_id.map(Self::to_openraft_log_id),
last_log_id: state.last_log_id.map(Self::to_openraft_log_id),
})
}
async fn save_vote(
&mut self,
vote: &OpenRaftVote<NodeId>,
) -> Result<(), OpenRaftStorageError<NodeId>> {
let internal_vote = Self::from_openraft_vote(vote);
self.log
.save_vote(internal_vote)
.map_err(to_storage_error)
}
async fn read_vote(
&mut self,
) -> Result<Option<OpenRaftVote<NodeId>>, OpenRaftStorageError<NodeId>> {
match self.log.read_vote() {
Ok(Some(vote)) => Ok(Some(Self::to_openraft_vote(vote))),
Ok(None) => Ok(None),
Err(e) => Err(to_storage_error(e)),
}
}
async fn save_committed(
&mut self,
committed: Option<LogId>,
) -> Result<(), OpenRaftStorageError<NodeId>> {
        // The committed index is not persisted separately; on restart it is
        // approximated by the last applied log id (see `read_committed`).
        debug!(?committed, "Committed log id not persisted separately");
        Ok(())
}
async fn read_committed(
&mut self,
) -> Result<Option<LogId>, OpenRaftStorageError<NodeId>> {
// Return the last applied as committed
let last_applied = self.last_applied.read().await;
Ok(last_applied.clone())
}
async fn append<I: IntoIterator<Item = Entry<TypeConfig>> + Send>(
&mut self,
entries: I,
callback: LogFlushed<TypeConfig>,
) -> Result<(), OpenRaftStorageError<NodeId>> {
let entries: Vec<_> = entries.into_iter().collect();
if entries.is_empty() {
callback.log_io_completed(Ok(()));
return Ok(());
}
let internal_entries: Vec<_> = entries.iter().map(Self::from_openraft_entry).collect();
match self.log.append(&internal_entries) {
Ok(()) => {
callback.log_io_completed(Ok(()));
Ok(())
}
Err(e) => {
let io_err = std::io::Error::new(std::io::ErrorKind::Other, e.to_string());
callback.log_io_completed(Err(io_err));
Err(to_storage_error(e))
}
}
}
async fn truncate(
&mut self,
log_id: LogId,
) -> Result<(), OpenRaftStorageError<NodeId>> {
self.log
.truncate(log_id.index)
.map_err(to_storage_error)
}
async fn purge(
&mut self,
log_id: LogId,
) -> Result<(), OpenRaftStorageError<NodeId>> {
self.log
.purge(log_id.index)
.map_err(to_storage_error)
}
async fn get_log_reader(&mut self) -> Self::LogReader {
        // Build a fresh reader over the same underlying store; only the log is used for reads
RaftStorage {
store: self.store.clone(),
log: LogStorage::new(self.store.clone()),
state_machine: Arc::clone(&self.state_machine),
snapshot_builder: SnapshotBuilder::new(self.store.clone()),
membership: RwLock::new(None),
last_applied: RwLock::new(None),
}
}
}
impl openraft::storage::RaftLogReader<TypeConfig> for RaftStorage {
async fn try_get_log_entries<RB: std::ops::RangeBounds<u64> + Clone + Debug + Send>(
&mut self,
range: RB,
) -> Result<Vec<Entry<TypeConfig>>, OpenRaftStorageError<NodeId>> {
let entries: Vec<LogEntry<RaftCommand>> =
self.log.get_log_entries(range).map_err(to_storage_error)?;
Ok(entries.into_iter().map(Self::to_openraft_entry).collect())
}
}
impl RaftStateMachine<TypeConfig> for RaftStorage {
type SnapshotBuilder = Self;
async fn applied_state(
&mut self,
) -> Result<(Option<LogId>, StoredMembership), OpenRaftStorageError<NodeId>> {
let last_applied = self.last_applied.read().await.clone();
let membership = self
.membership
.read()
.await
.clone()
.unwrap_or_else(|| StoredMembership::new(None, Membership::new(vec![], None)));
Ok((last_applied, membership))
}
async fn apply<I: IntoIterator<Item = Entry<TypeConfig>> + Send>(
&mut self,
entries: I,
) -> Result<Vec<RaftResponse>, OpenRaftStorageError<NodeId>> {
let mut responses = Vec::new();
let sm = self.state_machine.write().await;
for entry in entries {
trace!(log_id = ?entry.log_id, "Applying entry");
let response = match &entry.payload {
OpenRaftEntryPayload::Blank => RaftResponse::new(sm.current_revision()),
OpenRaftEntryPayload::Normal(cmd) => {
sm.apply(cmd.clone()).map_err(to_storage_error)?
}
OpenRaftEntryPayload::Membership(m) => {
// Update stored membership
let stored = StoredMembership::new(Some(entry.log_id.clone()), m.clone());
*self.membership.write().await = Some(stored);
RaftResponse::new(sm.current_revision())
}
};
responses.push(response);
// Update last applied
*self.last_applied.write().await = Some(entry.log_id.clone());
}
Ok(responses)
}
async fn get_snapshot_builder(&mut self) -> Self::SnapshotBuilder {
RaftStorage {
store: self.store.clone(),
log: LogStorage::new(self.store.clone()),
state_machine: Arc::clone(&self.state_machine),
snapshot_builder: SnapshotBuilder::new(self.store.clone()),
membership: RwLock::new(None),
last_applied: RwLock::new(None),
}
}
async fn begin_receiving_snapshot(
&mut self,
) -> Result<Box<Cursor<Vec<u8>>>, OpenRaftStorageError<NodeId>> {
Ok(Box::new(Cursor::new(Vec::new())))
}
async fn install_snapshot(
&mut self,
meta: &OpenRaftSnapshotMeta<NodeId, BasicNode>,
snapshot: Box<Cursor<Vec<u8>>>,
) -> Result<(), OpenRaftStorageError<NodeId>> {
let data = snapshot.into_inner();
// Parse and apply snapshot
let snapshot = Snapshot::from_bytes(&data).map_err(to_storage_error)?;
self.snapshot_builder
.apply(&snapshot)
.map_err(to_storage_error)?;
// Update state
*self.last_applied.write().await = meta.last_log_id.clone();
*self.membership.write().await = Some(meta.last_membership.clone());
info!(last_log_id = ?meta.last_log_id, "Installed snapshot");
Ok(())
}
async fn get_current_snapshot(
&mut self,
) -> Result<Option<openraft::Snapshot<TypeConfig>>, OpenRaftStorageError<NodeId>> {
let last_applied = self.last_applied.read().await.clone();
let membership = self.membership.read().await.clone();
let Some(log_id) = last_applied else {
return Ok(None);
};
let membership_ids: Vec<NodeId> = membership
.as_ref()
.map(|m| m.membership().voter_ids().collect())
.unwrap_or_default();
let snapshot = self
.snapshot_builder
.build(log_id.index, log_id.leader_id.term, membership_ids)
.map_err(to_storage_error)?;
let data = snapshot.to_bytes().map_err(to_storage_error)?;
let last_membership = membership
.unwrap_or_else(|| StoredMembership::new(None, Membership::new(vec![], None)));
        let snapshot_id = format!("{}-{}", log_id.leader_id.term, log_id.index);
        let meta = OpenRaftSnapshotMeta {
            last_log_id: Some(log_id),
            last_membership,
            snapshot_id,
        };
Ok(Some(openraft::Snapshot {
meta,
snapshot: Box::new(Cursor::new(data)),
}))
}
}
impl openraft::storage::RaftSnapshotBuilder<TypeConfig> for RaftStorage {
async fn build_snapshot(
&mut self,
) -> Result<openraft::Snapshot<TypeConfig>, OpenRaftStorageError<NodeId>> {
self.get_current_snapshot()
.await?
.ok_or_else(|| {
let io_err = StorageIOError::new(
ErrorSubject::Snapshot(None),
ErrorVerb::Read,
AnyError::error("No snapshot available"),
);
OpenRaftStorageError::IO { source: io_err }
})
}
}
#[cfg(test)]
mod tests {
use super::*;
use openraft::RaftLogReader;
use tempfile::tempdir;
fn create_test_storage() -> RaftStorage {
let dir = tempdir().unwrap();
let store = RocksStore::new(dir.path()).unwrap();
RaftStorage::new(store).unwrap()
}
#[tokio::test]
async fn test_vote_persistence() {
let mut storage = create_test_storage();
let vote = OpenRaftVote::new(5, 1);
storage.save_vote(&vote).await.unwrap();
let loaded = storage.read_vote().await.unwrap().unwrap();
assert_eq!(loaded.leader_id().term, 5);
assert_eq!(loaded.leader_id().node_id, 1);
}
#[tokio::test]
async fn test_log_state_initial() {
let mut storage = create_test_storage();
// Initially, log should be empty
let state = storage.get_log_state().await.unwrap();
assert!(state.last_log_id.is_none());
assert!(state.last_purged_log_id.is_none());
}
#[tokio::test]
async fn test_apply_entries() {
let mut storage = create_test_storage();
let entries = vec![Entry {
log_id: openraft::LogId::new(CommittedLeaderId::new(1, 0), 1),
payload: OpenRaftEntryPayload::Normal(RaftCommand::Put {
key: b"test".to_vec(),
value: b"data".to_vec(),
lease_id: None,
prev_kv: false,
}),
}];
let responses = storage.apply(entries).await.unwrap();
assert_eq!(responses.len(), 1);
assert_eq!(responses[0].revision, 1);
// Verify in state machine
let sm = storage.state_machine.read().await;
let entry = sm.kv().get(b"test").unwrap().unwrap();
assert_eq!(entry.value, b"data");
}
}
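The `to_openraft_log_id`/`from_openraft_log_id` helpers above are lossy in one direction: the internal log id stores only `(term, index)`, so the leader node id is synthesized as 0 on the way out. A minimal sketch of that round trip, using hypothetical stand-in structs rather than the real openraft types:

```rust
// Simplified stand-ins for the conversion helpers above; these structs are
// illustrative models, not the actual chainfire/openraft types.
#[derive(Debug, Clone, Copy, PartialEq)]
struct InternalLogId {
    term: u64,
    index: u64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct ModelLogId {
    leader_term: u64,
    leader_node: u64,
    index: u64,
}

fn to_openraft(id: InternalLogId) -> ModelLogId {
    // node id is synthesized as 0, mirroring CommittedLeaderId::new(term, 0)
    ModelLogId { leader_term: id.term, leader_node: 0, index: id.index }
}

fn from_openraft(id: &ModelLogId) -> InternalLogId {
    InternalLogId { term: id.leader_term, index: id.index }
}

fn main() {
    let original = InternalLogId { term: 7, index: 42 };
    let round_trip = from_openraft(&to_openraft(original));
    // term and index survive the round trip; the leader node id does not
    assert_eq!(round_trip, original);
    println!("round-trip ok: {:?}", round_trip);
}
```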


@ -0,0 +1,59 @@
[package]
name = "chainfire-server"
version.workspace = true
edition.workspace = true
license.workspace = true
rust-version.workspace = true
description = "Chainfire distributed KVS server"
[lib]
name = "chainfire_server"
path = "src/lib.rs"
[[bin]]
name = "chainfire"
path = "src/main.rs"
[dependencies]
chainfire-types = { workspace = true }
chainfire-storage = { workspace = true }
chainfire-raft = { workspace = true }
chainfire-gossip = { workspace = true }
chainfire-watch = { workspace = true }
chainfire-api = { workspace = true }
# Async
tokio = { workspace = true }
futures = { workspace = true }
async-trait = { workspace = true }
# Raft (for RPC types)
openraft = { workspace = true }
# gRPC
tonic = { workspace = true }
tonic-health = { workspace = true }
# Configuration
clap = { workspace = true }
toml = { workspace = true }
serde = { workspace = true }
# Logging
tracing = { workspace = true }
tracing-subscriber = { workspace = true }
# Metrics
metrics = { workspace = true }
metrics-exporter-prometheus = { workspace = true }
# Utilities
anyhow = { workspace = true }
[dev-dependencies]
tempfile = { workspace = true }
chainfire-client = { workspace = true }
tokio = { workspace = true, features = ["rt-multi-thread", "macros", "time"] }
[lints]
workspace = true


@ -0,0 +1,160 @@
//! Server configuration
use anyhow::Result;
use chainfire_types::RaftRole;
use serde::{Deserialize, Serialize};
use std::net::SocketAddr;
use std::path::{Path, PathBuf};
/// Server configuration
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ServerConfig {
/// Node configuration
pub node: NodeConfig,
/// Storage configuration
pub storage: StorageConfig,
/// Network configuration
pub network: NetworkConfig,
/// Cluster configuration
pub cluster: ClusterConfig,
/// Raft configuration
#[serde(default)]
pub raft: RaftConfig,
}
/// Node-specific configuration
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct NodeConfig {
/// Unique node ID
pub id: u64,
/// Human-readable name
pub name: String,
/// Node role (control_plane or worker)
pub role: String,
}
/// Storage configuration
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct StorageConfig {
/// Data directory
pub data_dir: PathBuf,
}
/// Network configuration
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct NetworkConfig {
/// API listen address (gRPC)
pub api_addr: SocketAddr,
/// Raft listen address
pub raft_addr: SocketAddr,
/// Gossip listen address (UDP)
pub gossip_addr: SocketAddr,
}
/// Cluster configuration
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ClusterConfig {
/// Cluster ID
pub id: u64,
/// Initial cluster members
pub initial_members: Vec<MemberConfig>,
/// Whether to bootstrap a new cluster
pub bootstrap: bool,
}
/// Cluster member configuration
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MemberConfig {
/// Node ID
pub id: u64,
/// Raft address
pub raft_addr: String,
}
/// Raft-specific configuration
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RaftConfig {
/// Raft participation role: "voter", "learner", or "none"
///
/// - `voter`: Full voting member in Raft consensus
/// - `learner`: Non-voting replica that receives log replication
/// - `none`: No Raft participation, node acts as agent/proxy only
#[serde(default)]
pub role: RaftRole,
}
impl Default for RaftConfig {
fn default() -> Self {
Self {
role: RaftRole::Voter,
}
}
}
impl Default for ServerConfig {
fn default() -> Self {
Self {
node: NodeConfig {
id: 1,
name: "chainfire-1".into(),
role: "control_plane".into(),
},
storage: StorageConfig {
data_dir: PathBuf::from("./data"),
},
network: NetworkConfig {
api_addr: "127.0.0.1:2379".parse().unwrap(),
raft_addr: "127.0.0.1:2380".parse().unwrap(),
gossip_addr: "127.0.0.1:2381".parse().unwrap(),
},
cluster: ClusterConfig {
id: 1,
initial_members: vec![],
bootstrap: true,
},
raft: RaftConfig::default(),
}
}
}
impl ServerConfig {
/// Load configuration from a file
pub fn load(path: &Path) -> Result<Self> {
let contents = std::fs::read_to_string(path)?;
let config: ServerConfig = toml::from_str(&contents)?;
Ok(config)
}
/// Save configuration to a file
pub fn save(&self, path: &Path) -> Result<()> {
let contents = toml::to_string_pretty(self)?;
std::fs::write(path, contents)?;
Ok(())
}
}
#[cfg(test)]
mod tests {
use super::*;
use tempfile::tempdir;
#[test]
fn test_default_config() {
let config = ServerConfig::default();
assert_eq!(config.node.id, 1);
assert!(config.cluster.bootstrap);
}
#[test]
fn test_config_roundtrip() {
let dir = tempdir().unwrap();
let path = dir.path().join("config.toml");
let config = ServerConfig::default();
config.save(&path).unwrap();
let loaded = ServerConfig::load(&path).unwrap();
assert_eq!(loaded.node.id, config.node.id);
assert_eq!(loaded.network.api_addr, config.network.api_addr);
}
}
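For reference, a `chainfire.toml` matching these structs might look like the following. Field names come from the serde derives above; the values are illustrative, and the `role` strings assume `RaftRole` serializes as the lowercase names given in its doc comment:

```toml
[node]
id = 1
name = "chainfire-1"
role = "control_plane"

[storage]
data_dir = "./data"

[network]
api_addr = "127.0.0.1:2379"
raft_addr = "127.0.0.1:2380"
gossip_addr = "127.0.0.1:2381"

[cluster]
id = 1
bootstrap = true

[[cluster.initial_members]]
id = 1
raft_addr = "127.0.0.1:2380"

# "voter" | "learner" | "none"
[raft]
role = "voter"
```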


@ -0,0 +1,10 @@
//! Chainfire distributed KVS server library
//!
//! This crate provides the server implementation for Chainfire, including:
//! - Server configuration
//! - Node management
//! - gRPC service hosting
pub mod config;
pub mod node;
pub mod server;


@ -0,0 +1,148 @@
//! Chainfire distributed KVS server
use anyhow::Result;
use chainfire_server::config::ServerConfig;
use clap::Parser;
use metrics_exporter_prometheus::PrometheusBuilder;
use std::path::PathBuf;
use tracing::info;
/// Chainfire distributed Key-Value Store
#[derive(Parser, Debug)]
#[command(name = "chainfire")]
#[command(author, version, about, long_about = None)]
struct Args {
/// Configuration file path
#[arg(short, long, default_value = "chainfire.toml")]
config: PathBuf,
/// Node ID (overrides config)
#[arg(long)]
node_id: Option<u64>,
/// Data directory (overrides config)
#[arg(long)]
data_dir: Option<PathBuf>,
/// API listen address (overrides config)
#[arg(long)]
api_addr: Option<String>,
/// Raft listen address (overrides config)
#[arg(long)]
raft_addr: Option<String>,
/// Gossip listen address (overrides config)
#[arg(long)]
gossip_addr: Option<String>,
/// Initial cluster members for bootstrap (comma-separated node_id=addr pairs)
#[arg(long)]
initial_cluster: Option<String>,
/// Enable verbose logging
#[arg(short, long)]
verbose: bool,
/// Metrics port for Prometheus scraping
#[arg(long, default_value = "9091")]
metrics_port: u16,
}
#[tokio::main]
async fn main() -> Result<()> {
let args = Args::parse();
// Initialize logging
let filter = if args.verbose {
"chainfire=debug,tower_http=debug"
} else {
"chainfire=info"
};
tracing_subscriber::fmt()
.with_env_filter(filter)
.with_target(true)
.init();
info!("Chainfire v{}", env!("CARGO_PKG_VERSION"));
// Initialize Prometheus metrics exporter
let metrics_addr = format!("0.0.0.0:{}", args.metrics_port);
let builder = PrometheusBuilder::new();
builder
.with_http_listener(metrics_addr.parse::<std::net::SocketAddr>()?)
.install()
.expect("Failed to install Prometheus metrics exporter");
info!(
"Prometheus metrics available at http://{}/metrics",
metrics_addr
);
// Register chainfire metrics
metrics::describe_counter!(
"chainfire_kv_requests_total",
"Total number of KV requests by operation type"
);
metrics::describe_counter!(
"chainfire_kv_bytes_read",
"Total bytes read from KV store"
);
metrics::describe_counter!(
"chainfire_kv_bytes_written",
"Total bytes written to KV store"
);
metrics::describe_histogram!(
"chainfire_kv_request_duration_seconds",
"KV request duration in seconds"
);
metrics::describe_gauge!(
"chainfire_raft_term",
"Current Raft term"
);
metrics::describe_gauge!(
"chainfire_raft_is_leader",
"Whether this node is the Raft leader (1=yes, 0=no)"
);
metrics::describe_counter!(
"chainfire_watch_events_total",
"Total number of watch events emitted"
);
// Load or create configuration
let mut config = if args.config.exists() {
ServerConfig::load(&args.config)?
} else {
info!("Config file not found, using defaults");
ServerConfig::default()
};
// Apply command line overrides
if let Some(node_id) = args.node_id {
config.node.id = node_id;
}
if let Some(data_dir) = args.data_dir {
config.storage.data_dir = data_dir;
}
if let Some(api_addr) = args.api_addr {
config.network.api_addr = api_addr.parse()?;
}
if let Some(raft_addr) = args.raft_addr {
config.network.raft_addr = raft_addr.parse()?;
}
if let Some(gossip_addr) = args.gossip_addr {
config.network.gossip_addr = gossip_addr.parse()?;
}
info!(node_id = config.node.id, "Starting node");
info!(api_addr = %config.network.api_addr, "API address");
info!(raft_addr = %config.network.raft_addr, "Raft address");
info!(gossip_addr = %config.network.gossip_addr, "Gossip address");
// Start the server
let server = chainfire_server::server::Server::new(config).await?;
server.run().await?;
Ok(())
}


@ -0,0 +1,201 @@
//! Node orchestration
//!
//! This module manages the lifecycle of all components in a Chainfire node.
use crate::config::ServerConfig;
use anyhow::Result;
use chainfire_api::GrpcRaftClient;
use chainfire_gossip::{GossipAgent, GossipId};
use chainfire_raft::{Raft, RaftNode};
use chainfire_storage::RocksStore;
use chainfire_types::node::NodeRole;
use chainfire_types::RaftRole;
use chainfire_watch::WatchRegistry;
use std::sync::Arc;
use tokio::sync::broadcast;
use tracing::info;
/// Node instance managing all components
pub struct Node {
/// Server configuration
config: ServerConfig,
/// Raft node (None if role is RaftRole::None)
raft: Option<Arc<RaftNode>>,
/// Watch registry
watch_registry: Arc<WatchRegistry>,
/// Gossip agent (runs on all nodes)
gossip: Option<GossipAgent>,
/// Shutdown signal
shutdown_tx: broadcast::Sender<()>,
}
impl Node {
/// Create a new node
pub async fn new(config: ServerConfig) -> Result<Self> {
// Ensure data directory exists
std::fs::create_dir_all(&config.storage.data_dir)?;
// Create watch registry
let watch_registry = Arc::new(WatchRegistry::new());
// Create Raft node only if role participates in Raft
let raft = if config.raft.role.participates_in_raft() {
// Create RocksDB store
let store = RocksStore::new(&config.storage.data_dir)?;
info!(data_dir = ?config.storage.data_dir, "Opened storage");
// Create gRPC Raft client and register peer addresses
let rpc_client = Arc::new(GrpcRaftClient::new());
for member in &config.cluster.initial_members {
rpc_client.add_node(member.id, member.raft_addr.clone()).await;
info!(node_id = member.id, addr = %member.raft_addr, "Registered peer");
}
// Create Raft node
let raft_node = Arc::new(
RaftNode::new(config.node.id, store, rpc_client).await?,
);
info!(
node_id = config.node.id,
raft_role = %config.raft.role,
"Created Raft node"
);
Some(raft_node)
} else {
info!(
node_id = config.node.id,
raft_role = %config.raft.role,
"Skipping Raft node (role=none)"
);
None
};
// Gossip runs on ALL nodes regardless of Raft role
let gossip_role = match config.node.role.as_str() {
"control_plane" => NodeRole::ControlPlane,
_ => NodeRole::Worker,
};
let gossip_id = GossipId::new(config.node.id, config.network.gossip_addr, gossip_role);
let gossip = Some(
GossipAgent::new(gossip_id, chainfire_gossip::agent::default_config())
.await?,
);
info!(
addr = %config.network.gossip_addr,
gossip_role = ?gossip_role,
"Created gossip agent"
);
let (shutdown_tx, _) = broadcast::channel(1);
Ok(Self {
config,
raft,
watch_registry,
gossip,
shutdown_tx,
})
}
/// Get the Raft node (None if role is RaftRole::None)
pub fn raft(&self) -> Option<&Arc<RaftNode>> {
self.raft.as_ref()
}
/// Get the underlying Raft instance for internal service (None if role is RaftRole::None)
pub fn raft_instance(&self) -> Option<Arc<Raft>> {
self.raft.as_ref().map(|r| r.raft_arc())
}
/// Check if this node has Raft enabled
pub fn has_raft(&self) -> bool {
self.raft.is_some()
}
/// Get the Raft role configuration
pub fn raft_role(&self) -> RaftRole {
self.config.raft.role
}
/// Get the watch registry
pub fn watch_registry(&self) -> &Arc<WatchRegistry> {
&self.watch_registry
}
/// Get the cluster ID
pub fn cluster_id(&self) -> u64 {
self.config.cluster.id
}
/// Initialize the cluster if bootstrapping
///
/// This handles different behaviors based on RaftRole:
/// - Voter with bootstrap=true: Initialize cluster (single or multi-node)
/// - Learner: Wait to be added by the leader via add_learner
/// - None: No Raft, nothing to do
pub async fn maybe_bootstrap(&self) -> Result<()> {
let Some(raft) = &self.raft else {
info!("No Raft node to bootstrap (role=none)");
return Ok(());
};
match self.config.raft.role {
RaftRole::Voter if self.config.cluster.bootstrap => {
if self.config.cluster.initial_members.is_empty() {
// Single-node bootstrap
info!("Bootstrapping single-node cluster");
raft.initialize().await?;
} else {
// Multi-node bootstrap with initial_members
use openraft::BasicNode;
use std::collections::BTreeMap;
info!(
members = self.config.cluster.initial_members.len(),
"Bootstrapping multi-node cluster"
);
let members: BTreeMap<u64, BasicNode> = self
.config
.cluster
.initial_members
.iter()
.map(|m| (m.id, BasicNode::default()))
.collect();
raft.initialize_cluster(members).await?;
}
}
RaftRole::Learner => {
info!(
node_id = self.config.node.id,
"Learner node ready, waiting to be added to cluster"
);
// Learners don't bootstrap; they wait to be added via add_learner
}
_ => {
// Voter without bootstrap flag or other cases
info!(
node_id = self.config.node.id,
raft_role = %self.config.raft.role,
bootstrap = self.config.cluster.bootstrap,
"Not bootstrapping"
);
}
}
Ok(())
}
/// Get shutdown receiver
pub fn shutdown_receiver(&self) -> broadcast::Receiver<()> {
self.shutdown_tx.subscribe()
}
/// Trigger shutdown
pub fn shutdown(&self) {
let _ = self.shutdown_tx.send(());
}
}
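The decision table inside `maybe_bootstrap` (role × bootstrap flag × member count) can be distilled into a pure function. This is a hypothetical sketch with illustrative enum names, not the actual chainfire types:

```rust
// Distillation of the `maybe_bootstrap` branches above into a testable table.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    Voter,
    Learner,
    None,
}

#[derive(Debug, PartialEq)]
enum BootstrapAction {
    InitSingleNode, // voter, bootstrap=true, no initial members
    InitMultiNode,  // voter, bootstrap=true, with initial members
    WaitForLeader,  // learner: added later via add_learner
    Skip,           // role=none, or voter without the bootstrap flag
}

fn bootstrap_action(role: Role, bootstrap: bool, initial_members: usize) -> BootstrapAction {
    match role {
        Role::None => BootstrapAction::Skip,
        Role::Voter if bootstrap && initial_members == 0 => BootstrapAction::InitSingleNode,
        Role::Voter if bootstrap => BootstrapAction::InitMultiNode,
        Role::Voter => BootstrapAction::Skip,
        Role::Learner => BootstrapAction::WaitForLeader,
    }
}

fn main() {
    assert_eq!(bootstrap_action(Role::Voter, true, 0), BootstrapAction::InitSingleNode);
    assert_eq!(bootstrap_action(Role::Voter, true, 3), BootstrapAction::InitMultiNode);
    assert_eq!(bootstrap_action(Role::Learner, true, 3), BootstrapAction::WaitForLeader);
    assert_eq!(bootstrap_action(Role::None, true, 0), BootstrapAction::Skip);
    println!("bootstrap decision table ok");
}
```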


@ -0,0 +1,207 @@
//! gRPC server
//!
//! This module provides the main server implementation that hosts all gRPC services.
//! Supports two modes:
//! - Full server mode (voter/learner): Runs Raft consensus and all services
//! - Agent mode (role=none): Runs gossip only, proxies requests to control-plane
use crate::config::ServerConfig;
use crate::node::Node;
use anyhow::Result;
use chainfire_api::internal_proto::raft_service_server::RaftServiceServer;
use chainfire_api::proto::{
cluster_server::ClusterServer, kv_server::KvServer, watch_server::WatchServer,
};
use chainfire_api::{ClusterServiceImpl, KvServiceImpl, RaftServiceImpl, WatchServiceImpl};
use chainfire_types::RaftRole;
use std::sync::Arc;
use tokio::signal;
use tonic::transport::Server as TonicServer;
use tonic_health::server::health_reporter;
use tracing::info;
/// Main server instance
pub struct Server {
node: Arc<Node>,
config: ServerConfig,
}
impl Server {
/// Create a new server
pub async fn new(config: ServerConfig) -> Result<Self> {
let node = Arc::new(Node::new(config.clone()).await?);
Ok(Self { node, config })
}
/// Run the server in the appropriate mode based on Raft role
pub async fn run(self) -> Result<()> {
match self.node.raft_role() {
RaftRole::None => self.run_agent_mode().await,
_ => self.run_full_mode().await,
}
}
/// Run in full server mode (voter/learner with Raft consensus)
async fn run_full_mode(self) -> Result<()> {
let raft = self
.node
.raft()
.expect("raft node should exist in full mode")
.clone();
let raft_instance = self.node.raft_instance().expect("raft instance should exist");
// Bootstrap cluster if needed
self.node.maybe_bootstrap().await?;
// Create gRPC services for client API
let kv_service = KvServiceImpl::new(Arc::clone(&raft), self.node.cluster_id());
let watch_service = WatchServiceImpl::new(
Arc::clone(self.node.watch_registry()),
self.node.cluster_id(),
raft.id(),
);
let cluster_service = ClusterServiceImpl::new(Arc::clone(&raft), self.node.cluster_id());
// Internal Raft service for inter-node communication
let raft_service = RaftServiceImpl::new(raft_instance);
// Health check service for K8s liveness/readiness probes
let (mut health_reporter, health_service) = health_reporter();
health_reporter
.set_serving::<KvServer<KvServiceImpl>>()
.await;
health_reporter
.set_serving::<WatchServer<WatchServiceImpl>>()
.await;
health_reporter
.set_serving::<ClusterServer<ClusterServiceImpl>>()
.await;
info!(
api_addr = %self.config.network.api_addr,
raft_addr = %self.config.network.raft_addr,
"Starting gRPC servers"
);
// Shutdown signal channel
let (shutdown_tx, _) = tokio::sync::broadcast::channel::<()>(1);
let mut shutdown_rx1 = shutdown_tx.subscribe();
let mut shutdown_rx2 = shutdown_tx.subscribe();
// Client API server (KV, Watch, Cluster, Health)
let api_addr = self.config.network.api_addr;
let api_server = TonicServer::builder()
.add_service(health_service)
.add_service(KvServer::new(kv_service))
.add_service(WatchServer::new(watch_service))
.add_service(ClusterServer::new(cluster_service))
.serve_with_shutdown(api_addr, async move {
let _ = shutdown_rx1.recv().await;
});
// Internal Raft server (peer-to-peer communication)
let raft_addr = self.config.network.raft_addr;
let raft_server = TonicServer::builder()
.add_service(RaftServiceServer::new(raft_service))
.serve_with_shutdown(raft_addr, async move {
let _ = shutdown_rx2.recv().await;
});
info!(api_addr = %api_addr, "Client API server starting");
info!(raft_addr = %raft_addr, "Raft server starting");
// Run both servers concurrently
tokio::select! {
result = api_server => {
if let Err(e) = result {
tracing::error!(error = %e, "API server error");
}
}
result = raft_server => {
if let Err(e) = result {
tracing::error!(error = %e, "Raft server error");
}
}
_ = signal::ctrl_c() => {
info!("Received shutdown signal");
let _ = shutdown_tx.send(());
}
}
info!("Server stopped");
Ok(())
}
/// Run in agent mode (role=none, gossip only, no Raft)
///
/// Agent mode runs a lightweight server that:
/// - Participates in gossip protocol for cluster discovery
/// - Can subscribe to watch events (if connected to control-plane)
/// - Does not run Raft consensus
/// - Suitable for worker nodes that only need cluster membership
async fn run_agent_mode(self) -> Result<()> {
info!(
node_id = self.config.node.id,
api_addr = %self.config.network.api_addr,
"Starting agent mode (no Raft)"
);
// Get control-plane Raft addresses from initial_members
// These can be used to derive API addresses or discover them via gossip
let control_plane_addrs: Vec<&str> = self
.config
.cluster
.initial_members
.iter()
.map(|m| m.raft_addr.as_str())
.collect();
if !control_plane_addrs.is_empty() {
info!(
control_plane_nodes = ?control_plane_addrs,
"Agent mode: control-plane Raft endpoints (use gossip for API discovery)"
);
}
// Health check service for K8s liveness/readiness probes
let (mut health_reporter, health_service) = health_reporter();
// In agent mode, we report the agent service as serving (gossip is running)
health_reporter
.set_service_status("chainfire.Agent", tonic_health::ServingStatus::Serving)
.await;
// Shutdown signal channel
let (shutdown_tx, _) = tokio::sync::broadcast::channel::<()>(1);
let mut shutdown_rx = shutdown_tx.subscribe();
// Run health check server for K8s probes
let api_addr = self.config.network.api_addr;
let health_server = TonicServer::builder()
.add_service(health_service)
.serve_with_shutdown(api_addr, async move {
let _ = shutdown_rx.recv().await;
});
info!(api_addr = %api_addr, "Agent health server starting");
info!("Agent running. Press Ctrl+C to stop.");
tokio::select! {
result = health_server => {
if let Err(e) = result {
tracing::error!(error = %e, "Agent health server error");
}
}
_ = signal::ctrl_c() => {
info!("Received shutdown signal");
let _ = shutdown_tx.send(());
}
}
self.node.shutdown();
info!("Agent stopped");
Ok(())
}
}
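By way of illustration, an agent-mode node could be configured like the following sketch. The section layout mirrors the `ServerConfig` fields exercised by the integration tests below (`node`, `cluster`, `network`, `storage`), and `role = "none"` follows the doc comment on `run_agent_mode`; the on-disk format being TOML, the concrete addresses, and the `id` field inside `initial_members` are assumptions for illustration only.

```toml
[node]
id = 10                       # hypothetical worker node id
name = "worker-10"
role = "none"                 # agent mode: gossip only, no Raft

[cluster]
id = 1
bootstrap = false             # agents never bootstrap a cluster

# Control-plane members; their raft_addr values are logged at startup,
# and API addresses are discovered via gossip.
[[cluster.initial_members]]
id = 1                        # assumed field name
raft_addr = "10.0.0.1:26000"  # field name taken from run_agent_mode above

[network]
api_addr = "0.0.0.0:23790"    # serves only the gRPC health service in agent mode
raft_addr = "0.0.0.0:26000"
gossip_addr = "0.0.0.0:27000"

[storage]
data_dir = "/var/lib/chainfire"
```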


@@ -0,0 +1,159 @@
//! Integration tests for Chainfire
//!
//! These tests verify that the server, client, and all components work together correctly.
use chainfire_client::Client;
use chainfire_server::{
config::{ClusterConfig, NetworkConfig, NodeConfig, RaftConfig, ServerConfig, StorageConfig},
server::Server,
};
use std::time::Duration;
use tokio::time::sleep;
/// Create a test server configuration
fn test_config(port: u16) -> (ServerConfig, tempfile::TempDir) {
use std::net::SocketAddr;
let api_addr: SocketAddr = format!("127.0.0.1:{}", port).parse().unwrap();
let raft_addr: SocketAddr = format!("127.0.0.1:{}", port + 100).parse().unwrap();
let gossip_addr: SocketAddr = format!("127.0.0.1:{}", port + 200).parse().unwrap();
let temp_dir = tempfile::tempdir().unwrap();
let config = ServerConfig {
node: NodeConfig {
id: 1,
name: format!("test-node-{}", port),
role: "control_plane".to_string(),
},
cluster: ClusterConfig {
id: 1,
bootstrap: true,
initial_members: vec![],
},
network: NetworkConfig {
api_addr,
raft_addr,
gossip_addr,
},
storage: StorageConfig {
data_dir: temp_dir.path().to_path_buf(),
},
raft: RaftConfig::default(),
};
(config, temp_dir)
}
#[tokio::test]
async fn test_single_node_kv_operations() {
// Start server
let (config, _temp_dir) = test_config(23790);
let api_addr = config.network.api_addr;
let server = Server::new(config).await.unwrap();
// Run server in background
let server_handle = tokio::spawn(async move {
let _ = server.run().await;
});
// Wait for server to start
sleep(Duration::from_millis(500)).await;
// Connect client
let mut client = Client::connect(format!("http://{}", api_addr))
.await
.unwrap();
// Test put
let rev = client.put("test/key1", "value1").await.unwrap();
assert!(rev > 0);
// Test get
let value = client.get("test/key1").await.unwrap();
assert_eq!(value, Some(b"value1".to_vec()));
// Test put with different value
let rev2 = client.put("test/key1", "value2").await.unwrap();
assert!(rev2 > rev);
// Test get updated value
let value = client.get("test/key1").await.unwrap();
assert_eq!(value, Some(b"value2".to_vec()));
// Test get non-existent key
let value = client.get("test/nonexistent").await.unwrap();
assert!(value.is_none());
// Test delete
let deleted = client.delete("test/key1").await.unwrap();
assert!(deleted);
// Verify deletion
let value = client.get("test/key1").await.unwrap();
assert!(value.is_none());
// Test delete non-existent key
let deleted = client.delete("test/nonexistent").await.unwrap();
assert!(!deleted);
// Test prefix operations
client.put("prefix/a", "1").await.unwrap();
client.put("prefix/b", "2").await.unwrap();
client.put("prefix/c", "3").await.unwrap();
client.put("other/key", "other").await.unwrap();
let prefix_values = client.get_prefix("prefix/").await.unwrap();
assert_eq!(prefix_values.len(), 3);
// Cleanup
server_handle.abort();
}
#[tokio::test]
async fn test_cluster_status() {
let (config, _temp_dir) = test_config(23800);
let api_addr = config.network.api_addr;
let server = Server::new(config).await.unwrap();
let server_handle = tokio::spawn(async move {
let _ = server.run().await;
});
sleep(Duration::from_millis(500)).await;
let mut client = Client::connect(format!("http://{}", api_addr))
.await
.unwrap();
let status = client.status().await.unwrap();
assert_eq!(status.leader, 1);
assert!(status.raft_term > 0);
server_handle.abort();
}
#[tokio::test]
async fn test_string_convenience_methods() {
let (config, _temp_dir) = test_config(23810);
let api_addr = config.network.api_addr;
let server = Server::new(config).await.unwrap();
let server_handle = tokio::spawn(async move {
let _ = server.run().await;
});
sleep(Duration::from_millis(500)).await;
let mut client = Client::connect(format!("http://{}", api_addr))
.await
.unwrap();
// Test string methods
client.put_str("/config/name", "chainfire").await.unwrap();
let value = client.get_str("/config/name").await.unwrap();
assert_eq!(value, Some("chainfire".to_string()));
server_handle.abort();
}


@@ -0,0 +1,34 @@
[package]
name = "chainfire-storage"
version.workspace = true
edition.workspace = true
license.workspace = true
rust-version.workspace = true
description = "RocksDB storage layer for Chainfire distributed KVS"
[dependencies]
chainfire-types = { workspace = true }
# Storage
rocksdb = { workspace = true }
# Async
tokio = { workspace = true }
async-trait = { workspace = true }
# Serialization
serde = { workspace = true }
bincode = { workspace = true }
# Utilities
tracing = { workspace = true }
parking_lot = { workspace = true }
bytes = { workspace = true }
dashmap = { workspace = true }
[dev-dependencies]
tempfile = { workspace = true }
tokio = { workspace = true, features = ["rt-multi-thread", "macros"] }
[lints]
workspace = true


@@ -0,0 +1,435 @@
//! Key-Value store operations
use crate::{cf, meta_keys, RocksStore};
use chainfire_types::error::StorageError;
use chainfire_types::kv::{KeyRange, KvEntry, Revision};
use rocksdb::WriteBatch;
use std::sync::atomic::{AtomicU64, Ordering};
use tracing::{debug, trace};
/// KV store built on RocksDB
pub struct KvStore {
store: RocksStore,
/// Current revision counter
revision: AtomicU64,
}
impl KvStore {
/// Create a new KV store
pub fn new(store: RocksStore) -> Result<Self, StorageError> {
let revision = Self::load_revision(&store)?;
Ok(Self {
store,
revision: AtomicU64::new(revision),
})
}
/// Load the current revision from storage
fn load_revision(store: &RocksStore) -> Result<Revision, StorageError> {
let cf = store
.cf_handle(cf::META)
.ok_or_else(|| StorageError::RocksDb("META cf not found".into()))?;
match store
.db()
.get_cf(&cf, meta_keys::REVISION)
.map_err(|e| StorageError::RocksDb(e.to_string()))?
{
Some(bytes) => {
let revision: Revision = bincode::deserialize(&bytes)
.map_err(|e| StorageError::Serialization(e.to_string()))?;
Ok(revision)
}
None => Ok(0),
}
}
/// Get current revision
pub fn current_revision(&self) -> Revision {
self.revision.load(Ordering::SeqCst)
}
/// Increment and return new revision
fn next_revision(&self) -> Revision {
self.revision.fetch_add(1, Ordering::SeqCst) + 1
}
/// Persist current revision
fn save_revision(&self, revision: Revision) -> Result<(), StorageError> {
let cf = self
.store
.cf_handle(cf::META)
.ok_or_else(|| StorageError::RocksDb("META cf not found".into()))?;
let bytes =
bincode::serialize(&revision).map_err(|e| StorageError::Serialization(e.to_string()))?;
self.store
.db()
.put_cf(&cf, meta_keys::REVISION, bytes)
.map_err(|e| StorageError::RocksDb(e.to_string()))?;
Ok(())
}
/// Get a single key
pub fn get(&self, key: &[u8]) -> Result<Option<KvEntry>, StorageError> {
let cf = self
.store
.cf_handle(cf::KV)
.ok_or_else(|| StorageError::RocksDb("KV cf not found".into()))?;
match self
.store
.db()
.get_cf(&cf, key)
.map_err(|e| StorageError::RocksDb(e.to_string()))?
{
Some(bytes) => {
let entry: KvEntry = bincode::deserialize(&bytes)
.map_err(|e| StorageError::Serialization(e.to_string()))?;
Ok(Some(entry))
}
None => Ok(None),
}
}
/// Put a key-value pair
pub fn put(
&self,
key: Vec<u8>,
value: Vec<u8>,
lease_id: Option<i64>,
) -> Result<(Revision, Option<KvEntry>), StorageError> {
let cf = self
.store
.cf_handle(cf::KV)
.ok_or_else(|| StorageError::RocksDb("KV cf not found".into()))?;
// Get previous entry
let prev = self.get(&key)?;
let revision = self.next_revision();
// Create new entry
let entry = match &prev {
Some(old) => old.update(value, revision),
None => {
if let Some(lease) = lease_id {
KvEntry::with_lease(key.clone(), value, revision, lease)
} else {
KvEntry::new(key.clone(), value, revision)
}
}
};
// Write to RocksDB
let bytes =
bincode::serialize(&entry).map_err(|e| StorageError::Serialization(e.to_string()))?;
let mut batch = WriteBatch::default();
batch.put_cf(&cf, &key, &bytes);
// Also persist revision
let meta_cf = self
.store
.cf_handle(cf::META)
.ok_or_else(|| StorageError::RocksDb("META cf not found".into()))?;
let rev_bytes = bincode::serialize(&revision)
.map_err(|e| StorageError::Serialization(e.to_string()))?;
batch.put_cf(&meta_cf, meta_keys::REVISION, &rev_bytes);
self.store
.db()
.write(batch)
.map_err(|e| StorageError::RocksDb(e.to_string()))?;
debug!(key = ?String::from_utf8_lossy(&key), revision, "Put key");
Ok((revision, prev))
}
/// Delete a single key
pub fn delete(&self, key: &[u8]) -> Result<(Revision, Option<KvEntry>), StorageError> {
let cf = self
.store
.cf_handle(cf::KV)
.ok_or_else(|| StorageError::RocksDb("KV cf not found".into()))?;
// Get previous entry
let prev = self.get(key)?;
if prev.is_none() {
return Ok((self.current_revision(), None));
}
let revision = self.next_revision();
// Delete from RocksDB
let mut batch = WriteBatch::default();
batch.delete_cf(&cf, key);
// Persist revision
let meta_cf = self
.store
.cf_handle(cf::META)
.ok_or_else(|| StorageError::RocksDb("META cf not found".into()))?;
let rev_bytes = bincode::serialize(&revision)
.map_err(|e| StorageError::Serialization(e.to_string()))?;
batch.put_cf(&meta_cf, meta_keys::REVISION, &rev_bytes);
self.store
.db()
.write(batch)
.map_err(|e| StorageError::RocksDb(e.to_string()))?;
debug!(key = ?String::from_utf8_lossy(key), revision, "Deleted key");
Ok((revision, prev))
}
/// Delete a range of keys
pub fn delete_range(
&self,
start: &[u8],
end: &[u8],
) -> Result<(Revision, Vec<KvEntry>), StorageError> {
let cf = self
.store
.cf_handle(cf::KV)
.ok_or_else(|| StorageError::RocksDb("KV cf not found".into()))?;
// First, collect all keys to delete
let entries = self.range(start, Some(end))?;
if entries.is_empty() {
return Ok((self.current_revision(), Vec::new()));
}
let revision = self.next_revision();
// Delete all keys
let mut batch = WriteBatch::default();
for entry in &entries {
batch.delete_cf(&cf, &entry.key);
}
// Persist revision
let meta_cf = self
.store
.cf_handle(cf::META)
.ok_or_else(|| StorageError::RocksDb("META cf not found".into()))?;
let rev_bytes = bincode::serialize(&revision)
.map_err(|e| StorageError::Serialization(e.to_string()))?;
batch.put_cf(&meta_cf, meta_keys::REVISION, &rev_bytes);
self.store
.db()
.write(batch)
.map_err(|e| StorageError::RocksDb(e.to_string()))?;
debug!(
start = ?String::from_utf8_lossy(start),
end = ?String::from_utf8_lossy(end),
deleted = entries.len(),
revision,
"Deleted range"
);
Ok((revision, entries))
}
/// Scan a range of keys
pub fn range(&self, start: &[u8], end: Option<&[u8]>) -> Result<Vec<KvEntry>, StorageError> {
let cf = self
.store
.cf_handle(cf::KV)
.ok_or_else(|| StorageError::RocksDb("KV cf not found".into()))?;
let mut entries = Vec::new();
let iter = self.store.db().iterator_cf(
&cf,
rocksdb::IteratorMode::From(start, rocksdb::Direction::Forward),
);
for item in iter {
let (key, value) = item.map_err(|e| StorageError::RocksDb(e.to_string()))?;
// Check if we've passed the end
if let Some(end_key) = end {
if key.as_ref() >= end_key {
break;
}
}
let entry: KvEntry = bincode::deserialize(&value)
.map_err(|e| StorageError::Serialization(e.to_string()))?;
entries.push(entry);
}
trace!(
start = ?String::from_utf8_lossy(start),
count = entries.len(),
"Range scan"
);
Ok(entries)
}
/// Scan keys with a prefix
pub fn prefix(&self, prefix: &[u8]) -> Result<Vec<KvEntry>, StorageError> {
let range = KeyRange::prefix(prefix);
self.range(&range.start, range.end.as_deref())
}
/// Get the underlying store
pub fn store(&self) -> &RocksStore {
&self.store
}
}
#[cfg(test)]
mod tests {
use super::*;
use tempfile::tempdir;
fn create_test_store() -> KvStore {
let dir = tempdir().unwrap();
let store = RocksStore::new(dir.path()).unwrap();
KvStore::new(store).unwrap()
}
#[test]
fn test_put_and_get() {
let kv = create_test_store();
let (rev, prev) = kv.put(b"key1".to_vec(), b"value1".to_vec(), None).unwrap();
assert_eq!(rev, 1);
assert!(prev.is_none());
let entry = kv.get(b"key1").unwrap().unwrap();
assert_eq!(entry.key, b"key1");
assert_eq!(entry.value, b"value1");
assert_eq!(entry.version, 1);
}
#[test]
fn test_update() {
let kv = create_test_store();
kv.put(b"key1".to_vec(), b"value1".to_vec(), None).unwrap();
let (rev, prev) = kv.put(b"key1".to_vec(), b"value2".to_vec(), None).unwrap();
assert_eq!(rev, 2);
assert!(prev.is_some());
assert_eq!(prev.unwrap().value, b"value1");
let entry = kv.get(b"key1").unwrap().unwrap();
assert_eq!(entry.value, b"value2");
assert_eq!(entry.version, 2);
assert_eq!(entry.create_revision, 1); // Unchanged
assert_eq!(entry.mod_revision, 2);
}
#[test]
fn test_delete() {
let kv = create_test_store();
kv.put(b"key1".to_vec(), b"value1".to_vec(), None).unwrap();
let (rev, prev) = kv.delete(b"key1").unwrap();
assert_eq!(rev, 2);
assert!(prev.is_some());
assert_eq!(prev.unwrap().value, b"value1");
let entry = kv.get(b"key1").unwrap();
assert!(entry.is_none());
}
#[test]
fn test_delete_nonexistent() {
let kv = create_test_store();
let (rev, prev) = kv.delete(b"nonexistent").unwrap();
assert_eq!(rev, 0);
assert!(prev.is_none());
}
#[test]
fn test_range() {
let kv = create_test_store();
kv.put(b"a".to_vec(), b"1".to_vec(), None).unwrap();
kv.put(b"b".to_vec(), b"2".to_vec(), None).unwrap();
kv.put(b"c".to_vec(), b"3".to_vec(), None).unwrap();
kv.put(b"d".to_vec(), b"4".to_vec(), None).unwrap();
let entries = kv.range(b"b", Some(b"d")).unwrap();
assert_eq!(entries.len(), 2);
assert_eq!(entries[0].key, b"b");
assert_eq!(entries[1].key, b"c");
}
#[test]
fn test_prefix() {
let kv = create_test_store();
kv.put(b"/nodes/1".to_vec(), b"node1".to_vec(), None)
.unwrap();
kv.put(b"/nodes/2".to_vec(), b"node2".to_vec(), None)
.unwrap();
kv.put(b"/tasks/1".to_vec(), b"task1".to_vec(), None)
.unwrap();
let entries = kv.prefix(b"/nodes/").unwrap();
assert_eq!(entries.len(), 2);
}
#[test]
fn test_delete_range() {
let kv = create_test_store();
kv.put(b"/nodes/1".to_vec(), b"node1".to_vec(), None)
.unwrap();
kv.put(b"/nodes/2".to_vec(), b"node2".to_vec(), None)
.unwrap();
kv.put(b"/tasks/1".to_vec(), b"task1".to_vec(), None)
.unwrap();
        let (_rev, deleted) = kv.delete_range(b"/nodes/", b"/nodes0").unwrap();
assert_eq!(deleted.len(), 2);
// Verify nodes are gone
assert!(kv.get(b"/nodes/1").unwrap().is_none());
assert!(kv.get(b"/nodes/2").unwrap().is_none());
// Verify task still exists
assert!(kv.get(b"/tasks/1").unwrap().is_some());
}
#[test]
fn test_revision_persistence() {
let dir = tempdir().unwrap();
// Create store and write some data
{
let store = RocksStore::new(dir.path()).unwrap();
let kv = KvStore::new(store).unwrap();
kv.put(b"key1".to_vec(), b"value1".to_vec(), None).unwrap();
kv.put(b"key2".to_vec(), b"value2".to_vec(), None).unwrap();
assert_eq!(kv.current_revision(), 2);
}
// Reopen and verify revision is restored
{
let store = RocksStore::new(dir.path()).unwrap();
let kv = KvStore::new(store).unwrap();
assert_eq!(kv.current_revision(), 2);
// Next write should continue from 3
let (rev, _) = kv.put(b"key3".to_vec(), b"value3".to_vec(), None).unwrap();
assert_eq!(rev, 3);
}
}
}
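The `test_delete_range` case above passes `b"/nodes0"` as the exclusive end bound: `'0'` (0x30) is the byte after `'/'` (0x2f), so the half-open range `["/nodes/", "/nodes0")` covers every key under the prefix. A minimal self-contained sketch of that end-key derivation, assuming `KeyRange::prefix` follows the usual increment-last-byte convention (which the `/nodes0` bound suggests):

```rust
/// Exclusive end key for a prefix scan: increment the last byte,
/// dropping trailing 0xff bytes first; an all-0xff prefix has no
/// upper bound and means "scan to the end of the keyspace".
fn prefix_end(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut end = prefix.to_vec();
    while let Some(last) = end.last_mut() {
        if *last < 0xff {
            *last += 1;
            return Some(end);
        }
        end.pop();
    }
    None
}

fn main() {
    // '/' is 0x2f; incrementing yields 0x30, i.e. '0'.
    assert_eq!(prefix_end(b"/nodes/"), Some(b"/nodes0".to_vec()));
    // Trailing 0xff bytes are dropped before incrementing.
    assert_eq!(prefix_end(b"a\xff"), Some(b"b".to_vec()));
    // All-0xff prefix: unbounded scan.
    assert_eq!(prefix_end(&[0xff]), None);
}
```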


@@ -0,0 +1,280 @@
//! Lease storage for TTL-based key expiration
//!
//! Manages lease lifecycle: grant, revoke, refresh, expiration.
use chainfire_types::error::StorageError;
use chainfire_types::lease::{Lease, LeaseData, LeaseId};
use dashmap::DashMap;
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::mpsc;
use tracing::{debug, info};
/// Store for managing leases
pub struct LeaseStore {
/// Active leases: lease_id -> Lease
leases: DashMap<LeaseId, Lease>,
/// ID generator for new leases
next_id: AtomicI64,
/// Channel to notify of expired leases (lease_id, keys_to_delete)
expiration_tx: Option<mpsc::UnboundedSender<(LeaseId, Vec<Vec<u8>>)>>,
}
impl LeaseStore {
/// Create a new lease store
pub fn new() -> Self {
Self {
leases: DashMap::new(),
next_id: AtomicI64::new(1),
expiration_tx: None,
}
}
/// Set the expiration notification channel
pub fn set_expiration_sender(&mut self, tx: mpsc::UnboundedSender<(LeaseId, Vec<Vec<u8>>)>) {
self.expiration_tx = Some(tx);
}
/// Grant a new lease
pub fn grant(&self, id: LeaseId, ttl: i64) -> Result<Lease, StorageError> {
let lease_id = if id == 0 {
self.next_id.fetch_add(1, Ordering::SeqCst)
} else {
// Check if ID is already in use
if self.leases.contains_key(&id) {
return Err(StorageError::LeaseError(format!("Lease {} already exists", id)));
}
// Update next_id if necessary
let _ = self.next_id.fetch_max(id + 1, Ordering::SeqCst);
id
};
let lease = Lease::new(lease_id, ttl);
self.leases.insert(lease_id, lease.clone());
debug!(lease_id, ttl, "Lease granted");
Ok(lease)
}
/// Revoke a lease and return keys to delete
pub fn revoke(&self, id: LeaseId) -> Result<Vec<Vec<u8>>, StorageError> {
match self.leases.remove(&id) {
Some((_, lease)) => {
info!(lease_id = id, keys_count = lease.keys.len(), "Lease revoked");
Ok(lease.keys)
}
None => Err(StorageError::LeaseError(format!("Lease {} not found", id))),
}
}
/// Refresh a lease (keep-alive)
pub fn refresh(&self, id: LeaseId) -> Result<i64, StorageError> {
match self.leases.get_mut(&id) {
Some(mut lease) => {
lease.refresh();
let ttl = lease.ttl;
debug!(lease_id = id, ttl, "Lease refreshed");
Ok(ttl)
}
None => Err(StorageError::LeaseError(format!("Lease {} not found", id))),
}
}
/// Get a lease by ID
pub fn get(&self, id: LeaseId) -> Option<Lease> {
self.leases.get(&id).map(|l| l.clone())
}
/// Get remaining TTL for a lease
pub fn time_to_live(&self, id: LeaseId) -> Option<(i64, i64, Vec<Vec<u8>>)> {
self.leases.get(&id).map(|lease| {
(lease.remaining(), lease.ttl, lease.keys.clone())
})
}
/// List all lease IDs
pub fn list(&self) -> Vec<LeaseId> {
self.leases.iter().map(|entry| *entry.key()).collect()
}
/// Attach a key to a lease
pub fn attach_key(&self, lease_id: LeaseId, key: Vec<u8>) -> Result<(), StorageError> {
match self.leases.get_mut(&lease_id) {
Some(mut lease) => {
lease.attach_key(key);
Ok(())
}
None => Err(StorageError::LeaseError(format!("Lease {} not found", lease_id))),
}
}
/// Detach a key from a lease
pub fn detach_key(&self, lease_id: LeaseId, key: &[u8]) {
if let Some(mut lease) = self.leases.get_mut(&lease_id) {
lease.detach_key(key);
}
}
/// Check for expired leases and return their IDs and keys
pub fn collect_expired(&self) -> Vec<(LeaseId, Vec<Vec<u8>>)> {
let mut expired = Vec::new();
for entry in self.leases.iter() {
if entry.is_expired() {
expired.push((*entry.key(), entry.keys.clone()));
}
}
// Remove expired leases
for (id, _) in &expired {
self.leases.remove(id);
}
expired
}
/// Export all leases for snapshot
pub fn export(&self) -> Vec<LeaseData> {
self.leases
.iter()
.map(|entry| LeaseData::from_lease(&entry))
.collect()
}
/// Import leases from snapshot
pub fn import(&self, leases: Vec<LeaseData>) {
self.leases.clear();
for data in leases {
let id = data.id;
let lease = data.to_lease();
self.leases.insert(id, lease);
// Update next_id
let _ = self.next_id.fetch_max(id + 1, Ordering::SeqCst);
}
}
}
impl Default for LeaseStore {
fn default() -> Self {
Self::new()
}
}
/// Background worker that checks for expired leases
pub struct LeaseExpirationWorker {
store: Arc<LeaseStore>,
interval: Duration,
shutdown_rx: mpsc::Receiver<()>,
}
impl LeaseExpirationWorker {
/// Create a new expiration worker
pub fn new(
store: Arc<LeaseStore>,
interval: Duration,
shutdown_rx: mpsc::Receiver<()>,
) -> Self {
Self {
store,
interval,
shutdown_rx,
}
}
/// Run the expiration worker
pub async fn run(mut self, expire_callback: impl Fn(LeaseId, Vec<Vec<u8>>) + Send + 'static) {
let mut interval = tokio::time::interval(self.interval);
loop {
tokio::select! {
_ = interval.tick() => {
let expired = self.store.collect_expired();
for (lease_id, keys) in expired {
info!(lease_id, keys_count = keys.len(), "Lease expired");
expire_callback(lease_id, keys);
}
}
_ = self.shutdown_rx.recv() => {
info!("Lease expiration worker shutting down");
break;
}
}
}
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_lease_grant() {
let store = LeaseStore::new();
let lease = store.grant(0, 10).unwrap();
assert!(lease.id > 0);
assert_eq!(lease.ttl, 10);
}
#[test]
fn test_lease_grant_with_id() {
let store = LeaseStore::new();
let lease = store.grant(42, 10).unwrap();
assert_eq!(lease.id, 42);
}
#[test]
fn test_lease_revoke() {
let store = LeaseStore::new();
let lease = store.grant(0, 10).unwrap();
let id = lease.id;
// Attach some keys
store.attach_key(id, b"key1".to_vec()).unwrap();
store.attach_key(id, b"key2".to_vec()).unwrap();
let keys = store.revoke(id).unwrap();
assert_eq!(keys.len(), 2);
// Lease should be gone
assert!(store.get(id).is_none());
}
#[test]
fn test_lease_refresh() {
let store = LeaseStore::new();
let lease = store.grant(0, 10).unwrap();
let id = lease.id;
let ttl = store.refresh(id).unwrap();
assert_eq!(ttl, 10);
}
#[test]
fn test_lease_list() {
let store = LeaseStore::new();
store.grant(1, 10).unwrap();
store.grant(2, 10).unwrap();
store.grant(3, 10).unwrap();
let ids = store.list();
assert_eq!(ids.len(), 3);
}
#[test]
fn test_lease_attach_detach() {
let store = LeaseStore::new();
let lease = store.grant(0, 10).unwrap();
let id = lease.id;
store.attach_key(id, b"key1".to_vec()).unwrap();
store.attach_key(id, b"key2".to_vec()).unwrap();
let lease = store.get(id).unwrap();
assert_eq!(lease.keys.len(), 2);
store.detach_key(id, b"key1");
let lease = store.get(id).unwrap();
assert_eq!(lease.keys.len(), 1);
}
}
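`grant` and `import` keep `next_id` ahead of any explicitly supplied lease id via `fetch_max(id + 1, ...)`, so auto-generated ids can never collide with imported or caller-chosen ones. A small self-contained sketch of that invariant:

```rust
use std::sync::atomic::{AtomicI64, Ordering};

fn main() {
    let next_id = AtomicI64::new(1);

    // Importing or granting an explicit lease id (say 42) bumps the
    // generator past it, mirroring `fetch_max(id + 1, ...)` above.
    next_id.fetch_max(42 + 1, Ordering::SeqCst);

    // The next auto-granted id (fetch_add returns the previous value)
    // therefore starts after the highest explicit id.
    assert_eq!(next_id.fetch_add(1, Ordering::SeqCst), 43);

    // A smaller explicit id leaves the counter untouched.
    next_id.fetch_max(10 + 1, Ordering::SeqCst);
    assert_eq!(next_id.fetch_add(1, Ordering::SeqCst), 44);
}
```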


@@ -0,0 +1,51 @@
//! RocksDB storage layer for Chainfire distributed KVS
//!
//! This crate provides:
//! - RocksDB-backed persistent storage
//! - Key-Value operations (Put, Get, Delete, Scan)
//! - Lease management for TTL-based key expiration
//! - Log storage for Raft
//! - State machine for Raft
//! - Snapshot management
pub mod kv_store;
pub mod lease_store;
pub mod log_storage;
pub mod snapshot;
pub mod state_machine;
pub mod store;
pub use kv_store::KvStore;
pub use lease_store::{LeaseExpirationWorker, LeaseStore};
pub use log_storage::LogStorage;
pub use snapshot::{Snapshot, SnapshotBuilder};
pub use state_machine::StateMachine;
pub use store::RocksStore;
/// Column family names for RocksDB
pub mod cf {
/// Raft log entries
pub const LOGS: &str = "raft_logs";
/// Raft metadata (vote, term, etc.)
pub const META: &str = "raft_meta";
/// Key-value data
pub const KV: &str = "key_value";
/// Snapshot metadata
pub const SNAPSHOT: &str = "snapshot";
/// Lease data
pub const LEASES: &str = "leases";
}
/// Metadata keys
pub mod meta_keys {
/// Current term and vote
pub const VOTE: &[u8] = b"vote";
/// Last applied log ID
pub const LAST_APPLIED: &[u8] = b"last_applied";
/// Current membership
pub const MEMBERSHIP: &[u8] = b"membership";
/// Current revision
pub const REVISION: &[u8] = b"revision";
/// Last snapshot ID
pub const LAST_SNAPSHOT: &[u8] = b"last_snapshot";
}


@@ -0,0 +1,478 @@
//! Raft log storage implementation
//!
//! This module provides persistent storage for Raft log entries using RocksDB.
use crate::{cf, meta_keys, RocksStore};
use chainfire_types::error::StorageError;
use rocksdb::WriteBatch;
use serde::{Deserialize, Serialize};
use std::ops::RangeBounds;
use tracing::{debug, trace};
/// Log entry index type
pub type LogIndex = u64;
/// Raft term type
pub type Term = u64;
/// Log ID combining term and index
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Serialize, Deserialize)]
pub struct LogId {
pub term: Term,
pub index: LogIndex,
}
impl LogId {
pub fn new(term: Term, index: LogIndex) -> Self {
Self { term, index }
}
}
impl Default for LogId {
fn default() -> Self {
Self { term: 0, index: 0 }
}
}
/// A log entry stored in the Raft log
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct LogEntry<D> {
pub log_id: LogId,
pub payload: EntryPayload<D>,
}
/// Payload of a log entry
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum EntryPayload<D> {
/// A blank entry for leader establishment
Blank,
/// A normal data entry
Normal(D),
/// Membership change entry
Membership(Vec<u64>), // Just node IDs for simplicity
}
impl<D> LogEntry<D> {
pub fn blank(log_id: LogId) -> Self {
Self {
log_id,
payload: EntryPayload::Blank,
}
}
pub fn normal(log_id: LogId, data: D) -> Self {
Self {
log_id,
payload: EntryPayload::Normal(data),
}
}
}
/// Persisted vote information
#[derive(Debug, Clone, Copy, Serialize, Deserialize, Default)]
pub struct Vote {
pub term: Term,
pub node_id: Option<u64>,
pub committed: bool,
}
/// Log storage state
#[derive(Debug, Clone, Default)]
pub struct LogState {
/// Last purged log ID
pub last_purged_log_id: Option<LogId>,
/// Last log ID in storage
pub last_log_id: Option<LogId>,
}
/// Raft log storage backed by RocksDB
pub struct LogStorage {
store: RocksStore,
}
impl LogStorage {
/// Create a new log storage
pub fn new(store: RocksStore) -> Self {
Self { store }
}
/// Encode log index as bytes for storage
fn encode_index(index: LogIndex) -> [u8; 8] {
index.to_be_bytes()
}
/// Decode log index from bytes
fn decode_index(bytes: &[u8]) -> LogIndex {
let arr: [u8; 8] = bytes.try_into().unwrap_or_default();
LogIndex::from_be_bytes(arr)
}
/// Get log state (first and last log IDs)
pub fn get_log_state(&self) -> Result<LogState, StorageError> {
let cf = self
.store
.cf_handle(cf::LOGS)
.ok_or_else(|| StorageError::RocksDb("LOGS cf not found".into()))?;
let last_purged_log_id = self.get_last_purged_log_id()?;
// Get last log ID
let mut last_iter = self
.store
.db()
.iterator_cf(&cf, rocksdb::IteratorMode::End);
let last_log_id = if let Some(Ok((_, value))) = last_iter.next() {
let entry: LogEntry<Vec<u8>> = bincode::deserialize(&value)
.map_err(|e| StorageError::Serialization(e.to_string()))?;
Some(entry.log_id)
} else {
last_purged_log_id
};
Ok(LogState {
last_purged_log_id,
last_log_id,
})
}
/// Save vote to persistent storage
pub fn save_vote(&self, vote: Vote) -> Result<(), StorageError> {
let cf = self
.store
.cf_handle(cf::META)
.ok_or_else(|| StorageError::RocksDb("META cf not found".into()))?;
let bytes =
bincode::serialize(&vote).map_err(|e| StorageError::Serialization(e.to_string()))?;
self.store
.db()
.put_cf(&cf, meta_keys::VOTE, bytes)
.map_err(|e| StorageError::RocksDb(e.to_string()))?;
debug!(?vote, "Saved vote");
Ok(())
}
/// Read vote from persistent storage
pub fn read_vote(&self) -> Result<Option<Vote>, StorageError> {
let cf = self
.store
.cf_handle(cf::META)
.ok_or_else(|| StorageError::RocksDb("META cf not found".into()))?;
match self
.store
.db()
.get_cf(&cf, meta_keys::VOTE)
.map_err(|e| StorageError::RocksDb(e.to_string()))?
{
Some(bytes) => {
let vote: Vote = bincode::deserialize(&bytes)
.map_err(|e| StorageError::Serialization(e.to_string()))?;
Ok(Some(vote))
}
None => Ok(None),
}
}
/// Append log entries
pub fn append<D: Serialize>(&self, entries: &[LogEntry<D>]) -> Result<(), StorageError> {
if entries.is_empty() {
return Ok(());
}
let cf = self
.store
.cf_handle(cf::LOGS)
.ok_or_else(|| StorageError::RocksDb("LOGS cf not found".into()))?;
let mut batch = WriteBatch::default();
for entry in entries {
let key = Self::encode_index(entry.log_id.index);
let value = bincode::serialize(entry)
.map_err(|e| StorageError::Serialization(e.to_string()))?;
batch.put_cf(&cf, key, value);
}
self.store
.db()
.write(batch)
.map_err(|e| StorageError::RocksDb(e.to_string()))?;
debug!(
first = entries.first().map(|e| e.log_id.index),
last = entries.last().map(|e| e.log_id.index),
count = entries.len(),
"Appended log entries"
);
Ok(())
}
/// Get log entries in a range
pub fn get_log_entries<D: for<'de> Deserialize<'de>>(
&self,
range: impl RangeBounds<LogIndex>,
) -> Result<Vec<LogEntry<D>>, StorageError> {
let cf = self
.store
.cf_handle(cf::LOGS)
.ok_or_else(|| StorageError::RocksDb("LOGS cf not found".into()))?;
let start = match range.start_bound() {
std::ops::Bound::Included(&idx) => idx,
std::ops::Bound::Excluded(&idx) => idx + 1,
std::ops::Bound::Unbounded => 0,
};
let end = match range.end_bound() {
std::ops::Bound::Included(&idx) => Some(idx + 1),
std::ops::Bound::Excluded(&idx) => Some(idx),
std::ops::Bound::Unbounded => None,
};
let mut entries = Vec::new();
let iter = self.store.db().iterator_cf(
&cf,
rocksdb::IteratorMode::From(&Self::encode_index(start), rocksdb::Direction::Forward),
);
for item in iter {
let (key, value) = item.map_err(|e| StorageError::RocksDb(e.to_string()))?;
let idx = Self::decode_index(&key);
if let Some(end_idx) = end {
if idx >= end_idx {
break;
}
}
let entry: LogEntry<D> = bincode::deserialize(&value)
.map_err(|e| StorageError::Serialization(e.to_string()))?;
entries.push(entry);
}
trace!(start, ?end, count = entries.len(), "Get log entries");
Ok(entries)
}
/// Truncate log from the given index (inclusive)
pub fn truncate(&self, from_index: LogIndex) -> Result<(), StorageError> {
let cf = self
.store
.cf_handle(cf::LOGS)
.ok_or_else(|| StorageError::RocksDb("LOGS cf not found".into()))?;
let mut batch = WriteBatch::default();
let iter = self.store.db().iterator_cf(
&cf,
rocksdb::IteratorMode::From(
&Self::encode_index(from_index),
rocksdb::Direction::Forward,
),
);
for item in iter {
let (key, _) = item.map_err(|e| StorageError::RocksDb(e.to_string()))?;
batch.delete_cf(&cf, key);
}
self.store
.db()
.write(batch)
.map_err(|e| StorageError::RocksDb(e.to_string()))?;
debug!(from_index, "Truncated log");
Ok(())
}
/// Purge log entries up to the given index (inclusive)
pub fn purge(&self, up_to_index: LogIndex) -> Result<(), StorageError> {
let cf = self
.store
.cf_handle(cf::LOGS)
.ok_or_else(|| StorageError::RocksDb("LOGS cf not found".into()))?;
// First, get the log ID of the entry we're purging to
let entries: Vec<LogEntry<Vec<u8>>> = self.get_log_entries(up_to_index..=up_to_index)?;
let last_purged = entries.first().map(|e| e.log_id);
let mut batch = WriteBatch::default();
let iter = self
.store
.db()
.iterator_cf(&cf, rocksdb::IteratorMode::Start);
for item in iter {
let (key, _) = item.map_err(|e| StorageError::RocksDb(e.to_string()))?;
let idx = Self::decode_index(&key);
if idx > up_to_index {
break;
}
batch.delete_cf(&cf, key);
}
// Save last purged log ID
if let Some(log_id) = last_purged {
let meta_cf = self
.store
.cf_handle(cf::META)
.ok_or_else(|| StorageError::RocksDb("META cf not found".into()))?;
let bytes = bincode::serialize(&log_id)
.map_err(|e| StorageError::Serialization(e.to_string()))?;
batch.put_cf(&meta_cf, b"last_purged", bytes);
}
self.store
.db()
.write(batch)
.map_err(|e| StorageError::RocksDb(e.to_string()))?;
debug!(up_to_index, "Purged log");
Ok(())
}
/// Get last purged log ID
fn get_last_purged_log_id(&self) -> Result<Option<LogId>, StorageError> {
let cf = self
.store
.cf_handle(cf::META)
.ok_or_else(|| StorageError::RocksDb("META cf not found".into()))?;
match self
.store
.db()
.get_cf(&cf, b"last_purged")
.map_err(|e| StorageError::RocksDb(e.to_string()))?
{
Some(bytes) => {
let log_id: LogId = bincode::deserialize(&bytes)
.map_err(|e| StorageError::Serialization(e.to_string()))?;
Ok(Some(log_id))
}
None => Ok(None),
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use tempfile::tempdir;
fn create_test_storage() -> LogStorage {
let dir = tempdir().unwrap();
let store = RocksStore::new(dir.path()).unwrap();
LogStorage::new(store)
}
#[test]
fn test_vote_persistence() {
let storage = create_test_storage();
let vote = Vote {
term: 5,
node_id: Some(1),
committed: true,
};
storage.save_vote(vote).unwrap();
let loaded = storage.read_vote().unwrap().unwrap();
assert_eq!(loaded.term, 5);
assert_eq!(loaded.node_id, Some(1));
assert!(loaded.committed);
}
#[test]
fn test_append_and_get_entries() {
let storage = create_test_storage();
let entries = vec![
LogEntry::<Vec<u8>>::blank(LogId::new(1, 1)),
LogEntry::normal(LogId::new(1, 2), b"data1".to_vec()),
LogEntry::normal(LogId::new(1, 3), b"data2".to_vec()),
];
storage.append(&entries).unwrap();
let loaded: Vec<LogEntry<Vec<u8>>> = storage.get_log_entries(1..=3).unwrap();
assert_eq!(loaded.len(), 3);
assert_eq!(loaded[0].log_id.index, 1);
assert_eq!(loaded[2].log_id.index, 3);
}
#[test]
fn test_log_state() {
let storage = create_test_storage();
// Initially empty
let state = storage.get_log_state().unwrap();
assert!(state.last_log_id.is_none());
// Add entries
let entries = vec![
LogEntry::<Vec<u8>>::blank(LogId::new(1, 1)),
LogEntry::normal(LogId::new(1, 2), b"data".to_vec()),
];
storage.append(&entries).unwrap();
let state = storage.get_log_state().unwrap();
assert_eq!(state.last_log_id, Some(LogId::new(1, 2)));
}
#[test]
fn test_truncate() {
let storage = create_test_storage();
let entries = vec![
LogEntry::<Vec<u8>>::blank(LogId::new(1, 1)),
LogEntry::normal(LogId::new(1, 2), b"data1".to_vec()),
LogEntry::normal(LogId::new(1, 3), b"data2".to_vec()),
LogEntry::normal(LogId::new(1, 4), b"data3".to_vec()),
];
storage.append(&entries).unwrap();
// Truncate from index 3
storage.truncate(3).unwrap();
let loaded: Vec<LogEntry<Vec<u8>>> = storage.get_log_entries(1..=4).unwrap();
assert_eq!(loaded.len(), 2);
assert_eq!(loaded.last().unwrap().log_id.index, 2);
}
#[test]
fn test_purge() {
let storage = create_test_storage();
let entries = vec![
LogEntry::<Vec<u8>>::blank(LogId::new(1, 1)),
LogEntry::normal(LogId::new(1, 2), b"data1".to_vec()),
LogEntry::normal(LogId::new(1, 3), b"data2".to_vec()),
LogEntry::normal(LogId::new(1, 4), b"data3".to_vec()),
];
storage.append(&entries).unwrap();
// Purge up to index 2
storage.purge(2).unwrap();
let loaded: Vec<LogEntry<Vec<u8>>> = storage.get_log_entries(1..=4).unwrap();
assert_eq!(loaded.len(), 2);
assert_eq!(loaded.first().unwrap().log_id.index, 3);
let state = storage.get_log_state().unwrap();
assert_eq!(state.last_purged_log_id, Some(LogId::new(1, 2)));
}
}


@ -0,0 +1,316 @@
//! Snapshot management for Raft state
//!
//! Snapshots allow compacting the Raft log while preserving the state machine state.
use crate::{cf, RocksStore};
use chainfire_types::error::StorageError;
use chainfire_types::kv::KvEntry;
use serde::{Deserialize, Serialize};
use std::io::{Read, Write};
use tracing::info;
/// Snapshot metadata
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SnapshotMeta {
/// Last log index included in snapshot
pub last_log_index: u64,
/// Term of last log entry included
pub last_log_term: u64,
/// Cluster membership at snapshot time
pub membership: Vec<u64>,
/// Size of snapshot data in bytes
pub size: u64,
}
/// A complete snapshot
#[derive(Debug)]
pub struct Snapshot {
pub meta: SnapshotMeta,
pub data: Vec<u8>,
}
impl Snapshot {
/// Create snapshot from raw data
pub fn new(meta: SnapshotMeta, data: Vec<u8>) -> Self {
Self { meta, data }
}
/// Serialize snapshot to bytes
pub fn to_bytes(&self) -> Result<Vec<u8>, StorageError> {
// Format: [meta_len: u32][meta][data]
let meta_bytes =
bincode::serialize(&self.meta).map_err(|e| StorageError::Serialization(e.to_string()))?;
let mut result = Vec::with_capacity(4 + meta_bytes.len() + self.data.len());
result.extend_from_slice(&(meta_bytes.len() as u32).to_le_bytes());
result.extend_from_slice(&meta_bytes);
result.extend_from_slice(&self.data);
Ok(result)
}
/// Deserialize snapshot from bytes
pub fn from_bytes(bytes: &[u8]) -> Result<Self, StorageError> {
if bytes.len() < 4 {
return Err(StorageError::Snapshot("Invalid snapshot: too short".into()));
}
let meta_len = u32::from_le_bytes(bytes[..4].try_into().unwrap()) as usize;
if bytes.len() < 4 + meta_len {
return Err(StorageError::Snapshot(
"Invalid snapshot: meta truncated".into(),
));
}
let meta: SnapshotMeta = bincode::deserialize(&bytes[4..4 + meta_len])
.map_err(|e| StorageError::Serialization(e.to_string()))?;
let data = bytes[4 + meta_len..].to_vec();
Ok(Self { meta, data })
}
}
/// Builder for creating snapshots from KV store state
pub struct SnapshotBuilder {
store: RocksStore,
}
impl SnapshotBuilder {
pub fn new(store: RocksStore) -> Self {
Self { store }
}
/// Build a snapshot of the current KV state
pub fn build(
&self,
last_log_index: u64,
last_log_term: u64,
membership: Vec<u64>,
) -> Result<Snapshot, StorageError> {
let cf = self
.store
.cf_handle(cf::KV)
.ok_or_else(|| StorageError::RocksDb("KV cf not found".into()))?;
// Collect all KV entries
let mut entries: Vec<KvEntry> = Vec::new();
let iter = self
.store
.db()
.iterator_cf(&cf, rocksdb::IteratorMode::Start);
for item in iter {
let (_, value) = item.map_err(|e| StorageError::RocksDb(e.to_string()))?;
let entry: KvEntry = bincode::deserialize(&value)
.map_err(|e| StorageError::Serialization(e.to_string()))?;
entries.push(entry);
}
// Serialize entries
let data = bincode::serialize(&entries)
.map_err(|e| StorageError::Serialization(e.to_string()))?;
let meta = SnapshotMeta {
last_log_index,
last_log_term,
membership,
size: data.len() as u64,
};
info!(
last_log_index,
entries = entries.len(),
size = data.len(),
"Built snapshot"
);
Ok(Snapshot::new(meta, data))
}
/// Apply a snapshot to restore state
pub fn apply(&self, snapshot: &Snapshot) -> Result<(), StorageError> {
let cf = self
.store
.cf_handle(cf::KV)
.ok_or_else(|| StorageError::RocksDb("KV cf not found".into()))?;
// Deserialize entries
let entries: Vec<KvEntry> = bincode::deserialize(&snapshot.data)
.map_err(|e| StorageError::Serialization(e.to_string()))?;
// Clear existing KV data
let mut batch = rocksdb::WriteBatch::default();
let iter = self
.store
.db()
.iterator_cf(&cf, rocksdb::IteratorMode::Start);
for item in iter {
let (key, _) = item.map_err(|e| StorageError::RocksDb(e.to_string()))?;
batch.delete_cf(&cf, key);
}
// Write new entries
for entry in &entries {
let value = bincode::serialize(entry)
.map_err(|e| StorageError::Serialization(e.to_string()))?;
batch.put_cf(&cf, &entry.key, value);
}
self.store
.db()
.write(batch)
.map_err(|e| StorageError::RocksDb(e.to_string()))?;
info!(
last_log_index = snapshot.meta.last_log_index,
entries = entries.len(),
"Applied snapshot"
);
Ok(())
}
}
/// Streaming snapshot reader for large snapshots
pub struct SnapshotReader {
data: Vec<u8>,
position: usize,
}
impl SnapshotReader {
pub fn new(data: Vec<u8>) -> Self {
Self { data, position: 0 }
}
pub fn remaining(&self) -> usize {
self.data.len() - self.position
}
}
impl Read for SnapshotReader {
fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
let remaining = self.remaining();
if remaining == 0 {
return Ok(0);
}
let to_read = std::cmp::min(buf.len(), remaining);
buf[..to_read].copy_from_slice(&self.data[self.position..self.position + to_read]);
self.position += to_read;
Ok(to_read)
}
}
/// Streaming snapshot writer for building large snapshots
pub struct SnapshotWriter {
data: Vec<u8>,
}
impl SnapshotWriter {
pub fn new() -> Self {
Self { data: Vec::new() }
}
pub fn into_inner(self) -> Vec<u8> {
self.data
}
}
impl Default for SnapshotWriter {
fn default() -> Self {
Self::new()
}
}
impl Write for SnapshotWriter {
fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
self.data.extend_from_slice(buf);
Ok(buf.len())
}
fn flush(&mut self) -> std::io::Result<()> {
Ok(())
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::KvStore;
use tempfile::tempdir;
fn create_test_store() -> RocksStore {
let dir = tempdir().unwrap();
RocksStore::new(dir.path()).unwrap()
}
#[test]
fn test_snapshot_roundtrip() {
let store = create_test_store();
// Add some data
let kv = KvStore::new(store.clone()).unwrap();
kv.put(b"key1".to_vec(), b"value1".to_vec(), None).unwrap();
kv.put(b"key2".to_vec(), b"value2".to_vec(), None).unwrap();
// Build snapshot
let builder = SnapshotBuilder::new(store.clone());
let snapshot = builder.build(10, 1, vec![1, 2, 3]).unwrap();
assert_eq!(snapshot.meta.last_log_index, 10);
assert_eq!(snapshot.meta.last_log_term, 1);
assert_eq!(snapshot.meta.membership, vec![1, 2, 3]);
// Serialize and deserialize
let bytes = snapshot.to_bytes().unwrap();
let restored = Snapshot::from_bytes(&bytes).unwrap();
assert_eq!(restored.meta.last_log_index, snapshot.meta.last_log_index);
assert_eq!(restored.data.len(), snapshot.data.len());
}
#[test]
fn test_snapshot_apply() {
let store1 = create_test_store();
let store2 = create_test_store();
// Add data to store1
let kv1 = KvStore::new(store1.clone()).unwrap();
kv1.put(b"key1".to_vec(), b"value1".to_vec(), None)
.unwrap();
kv1.put(b"key2".to_vec(), b"value2".to_vec(), None)
.unwrap();
// Build snapshot from store1
let builder1 = SnapshotBuilder::new(store1.clone());
let snapshot = builder1.build(10, 1, vec![1]).unwrap();
// Apply to store2
let builder2 = SnapshotBuilder::new(store2.clone());
builder2.apply(&snapshot).unwrap();
// Verify data in store2
let kv2 = KvStore::new(store2).unwrap();
let entry1 = kv2.get(b"key1").unwrap().unwrap();
let entry2 = kv2.get(b"key2").unwrap().unwrap();
assert_eq!(entry1.value, b"value1");
assert_eq!(entry2.value, b"value2");
}
#[test]
fn test_snapshot_reader() {
let data = vec![1, 2, 3, 4, 5];
let mut reader = SnapshotReader::new(data.clone());
let mut buf = [0u8; 3];
assert_eq!(reader.read(&mut buf).unwrap(), 3);
assert_eq!(&buf, &[1, 2, 3]);
assert_eq!(reader.read(&mut buf).unwrap(), 2);
assert_eq!(&buf[..2], &[4, 5]);
assert_eq!(reader.read(&mut buf).unwrap(), 0);
}
}


@ -0,0 +1,587 @@
//! Raft state machine implementation
//!
//! The state machine applies committed Raft log entries to the KV store.
use crate::{KvStore, LeaseStore, RocksStore};
use chainfire_types::command::{Compare, CompareResult, CompareTarget, RaftCommand, RaftResponse};
use chainfire_types::error::StorageError;
use chainfire_types::watch::WatchEvent;
use chainfire_types::Revision;
use std::sync::Arc;
use tokio::sync::mpsc;
use tracing::warn;
/// State machine that applies Raft commands to the KV store
pub struct StateMachine {
/// Underlying KV store
kv: KvStore,
/// Lease store for TTL management
leases: Arc<LeaseStore>,
/// Channel to send watch events
watch_tx: Option<mpsc::UnboundedSender<WatchEvent>>,
}
impl StateMachine {
/// Create a new state machine
pub fn new(store: RocksStore) -> Result<Self, StorageError> {
let kv = KvStore::new(store)?;
Ok(Self {
kv,
leases: Arc::new(LeaseStore::new()),
watch_tx: None,
})
}
/// Set the watch event sender
pub fn set_watch_sender(&mut self, tx: mpsc::UnboundedSender<WatchEvent>) {
self.watch_tx = Some(tx);
}
/// Get the underlying KV store
pub fn kv(&self) -> &KvStore {
&self.kv
}
/// Get the lease store
pub fn leases(&self) -> &Arc<LeaseStore> {
&self.leases
}
/// Get current revision
pub fn current_revision(&self) -> Revision {
self.kv.current_revision()
}
/// Apply a Raft command and return the response
pub fn apply(&self, command: RaftCommand) -> Result<RaftResponse, StorageError> {
match command {
RaftCommand::Put {
key,
value,
lease_id,
prev_kv,
} => self.apply_put(key, value, lease_id, prev_kv),
RaftCommand::Delete { key, prev_kv } => self.apply_delete(key, prev_kv),
RaftCommand::DeleteRange {
start,
end,
prev_kv,
} => self.apply_delete_range(start, end, prev_kv),
RaftCommand::Txn {
compare,
success,
failure,
} => self.apply_txn(compare, success, failure),
RaftCommand::LeaseGrant { id, ttl } => self.apply_lease_grant(id, ttl),
RaftCommand::LeaseRevoke { id } => self.apply_lease_revoke(id),
RaftCommand::LeaseRefresh { id } => self.apply_lease_refresh(id),
RaftCommand::Noop => Ok(RaftResponse::new(self.current_revision())),
}
}
/// Apply a Put command
fn apply_put(
&self,
key: Vec<u8>,
value: Vec<u8>,
lease_id: Option<i64>,
return_prev: bool,
) -> Result<RaftResponse, StorageError> {
// If key previously had a lease, detach it
if let Some(ref prev_entry) = self.kv.get(&key)? {
if let Some(old_lease_id) = prev_entry.lease_id {
self.leases.detach_key(old_lease_id, &key);
}
}
let (revision, prev) = self.kv.put(key.clone(), value.clone(), lease_id)?;
// Attach key to new lease if specified
if let Some(lid) = lease_id {
if let Err(e) = self.leases.attach_key(lid, key.clone()) {
warn!("Failed to attach key to lease {}: {}", lid, e);
}
}
// Emit watch event
if let Some(tx) = &self.watch_tx {
// The key was written just above, so this read should succeed; avoid
// unwrap() so a concurrent storage failure cannot panic the state machine
if let Some(entry) = self.kv.get(&key)? {
let event = WatchEvent::put(entry, if return_prev { prev.clone() } else { None });
if tx.send(event).is_err() {
warn!("Watch event channel closed");
}
}
}
Ok(RaftResponse::with_prev_kv(
revision,
if return_prev { prev } else { None },
))
}
/// Apply a Delete command
fn apply_delete(&self, key: Vec<u8>, return_prev: bool) -> Result<RaftResponse, StorageError> {
// Detach from lease if attached
if let Some(ref entry) = self.kv.get(&key)? {
if let Some(lease_id) = entry.lease_id {
self.leases.detach_key(lease_id, &key);
}
}
let (revision, prev) = self.kv.delete(&key)?;
// Emit watch event if key existed
if let (Some(tx), Some(ref deleted)) = (&self.watch_tx, &prev) {
let event = WatchEvent::delete(
deleted.clone(),
if return_prev { prev.clone() } else { None },
);
if tx.send(event).is_err() {
warn!("Watch event channel closed");
}
}
let deleted = if prev.is_some() { 1 } else { 0 };
Ok(RaftResponse {
revision,
prev_kv: if return_prev { prev } else { None },
deleted,
..Default::default()
})
}
/// Apply a DeleteRange command
fn apply_delete_range(
&self,
start: Vec<u8>,
end: Vec<u8>,
return_prev: bool,
) -> Result<RaftResponse, StorageError> {
let (revision, deleted_entries) = self.kv.delete_range(&start, &end)?;
// Emit watch events for each deleted key
if let Some(tx) = &self.watch_tx {
for entry in &deleted_entries {
let event = WatchEvent::delete(entry.clone(), None);
if tx.send(event).is_err() {
warn!("Watch event channel closed");
break;
}
}
}
Ok(RaftResponse::deleted(
revision,
deleted_entries.len() as u64,
if return_prev { deleted_entries } else { vec![] },
))
}
/// Apply a transaction
fn apply_txn(
&self,
compare: Vec<Compare>,
success: Vec<chainfire_types::command::TxnOp>,
failure: Vec<chainfire_types::command::TxnOp>,
) -> Result<RaftResponse, StorageError> {
use chainfire_types::command::TxnOpResponse;
// Evaluate all comparisons
let all_match = compare.iter().all(|c| self.evaluate_compare(c));
let ops = if all_match { &success } else { &failure };
// Apply operations and collect responses
let mut txn_responses = Vec::with_capacity(ops.len());
for op in ops {
match op {
chainfire_types::command::TxnOp::Put {
key,
value,
lease_id,
} => {
let resp = self.apply_put(key.clone(), value.clone(), *lease_id, true)?;
txn_responses.push(TxnOpResponse::Put {
prev_kv: resp.prev_kv,
});
}
chainfire_types::command::TxnOp::Delete { key } => {
let resp = self.apply_delete(key.clone(), true)?;
txn_responses.push(TxnOpResponse::Delete {
deleted: resp.deleted,
prev_kvs: resp.prev_kvs,
});
}
chainfire_types::command::TxnOp::DeleteRange { start, end } => {
let resp = self.apply_delete_range(start.clone(), end.clone(), true)?;
txn_responses.push(TxnOpResponse::Delete {
deleted: resp.deleted,
prev_kvs: resp.prev_kvs,
});
}
chainfire_types::command::TxnOp::Range {
key,
range_end,
limit,
keys_only,
count_only,
} => {
// Range operations are read-only; perform the read inline
let entries = if range_end.is_empty() {
// Single-key lookup
match self.kv.get(key)? {
Some(entry) => vec![entry],
None => vec![],
}
} else {
// Range query; range_end is known to be non-empty in this branch
let mut results = self.kv.range(key, Some(range_end.as_slice()))?;
// Apply limit (0 means unlimited)
if *limit > 0 {
results.truncate(*limit as usize);
}
results
};
let count = entries.len() as u64;
let kvs = if *count_only {
vec![]
} else if *keys_only {
entries
.into_iter()
.map(|e| chainfire_types::kv::KvEntry {
key: e.key,
value: vec![],
version: e.version,
create_revision: e.create_revision,
mod_revision: e.mod_revision,
lease_id: e.lease_id,
})
.collect()
} else {
entries
};
txn_responses.push(TxnOpResponse::Range {
kvs,
count,
more: false, // TODO: handle pagination
});
}
}
}
Ok(RaftResponse::txn(
self.current_revision(),
all_match,
txn_responses,
))
}
/// Evaluate a single comparison
fn evaluate_compare(&self, compare: &Compare) -> bool {
let entry = match self.kv.get(&compare.key) {
Ok(Some(e)) => e,
Ok(None) => {
// Key doesn't exist - special handling
return match &compare.target {
CompareTarget::Version(v) => match compare.result {
CompareResult::Equal => *v == 0,
CompareResult::NotEqual => *v != 0,
CompareResult::Greater => false,
CompareResult::Less => *v > 0,
},
_ => false,
};
}
Err(_) => return false,
};
match &compare.target {
CompareTarget::Version(expected) => {
self.compare_values(entry.version, *expected, compare.result)
}
CompareTarget::CreateRevision(expected) => {
self.compare_values(entry.create_revision, *expected, compare.result)
}
CompareTarget::ModRevision(expected) => {
self.compare_values(entry.mod_revision, *expected, compare.result)
}
CompareTarget::Value(expected) => match compare.result {
CompareResult::Equal => entry.value == *expected,
CompareResult::NotEqual => entry.value != *expected,
CompareResult::Greater => entry.value.as_slice() > expected.as_slice(),
CompareResult::Less => entry.value.as_slice() < expected.as_slice(),
},
}
}
/// Compare two numeric values
fn compare_values(&self, actual: u64, expected: u64, result: CompareResult) -> bool {
match result {
CompareResult::Equal => actual == expected,
CompareResult::NotEqual => actual != expected,
CompareResult::Greater => actual > expected,
CompareResult::Less => actual < expected,
}
}
/// Apply a lease grant command
fn apply_lease_grant(&self, id: i64, ttl: i64) -> Result<RaftResponse, StorageError> {
let lease = self.leases.grant(id, ttl)?;
Ok(RaftResponse::lease(self.current_revision(), lease.id, lease.ttl))
}
/// Apply a lease revoke command
fn apply_lease_revoke(&self, id: i64) -> Result<RaftResponse, StorageError> {
let keys = self.leases.revoke(id)?;
// Delete all keys attached to the lease
let mut deleted = 0u64;
for key in keys {
let (_, prev) = self.kv.delete(&key)?;
if prev.is_some() {
deleted += 1;
// Emit watch event
if let (Some(tx), Some(ref entry)) = (&self.watch_tx, &prev) {
let event = WatchEvent::delete(entry.clone(), None);
if tx.send(event).is_err() {
warn!("Watch event channel closed");
}
}
}
}
Ok(RaftResponse {
revision: self.current_revision(),
deleted,
..Default::default()
})
}
/// Apply a lease refresh command
fn apply_lease_refresh(&self, id: i64) -> Result<RaftResponse, StorageError> {
let ttl = self.leases.refresh(id)?;
Ok(RaftResponse::lease(self.current_revision(), id, ttl))
}
/// Delete keys by lease ID (called when lease expires)
pub fn delete_keys_by_lease(&self, lease_id: i64) -> Result<u64, StorageError> {
if let Some(lease) = self.leases.get(lease_id) {
let keys = lease.keys.clone();
// Revoke will also return the keys, but we already have them
let _ = self.leases.revoke(lease_id);
let mut deleted = 0u64;
for key in keys {
let (_, prev) = self.kv.delete(&key)?;
if prev.is_some() {
deleted += 1;
// Emit watch event
if let (Some(tx), Some(ref entry)) = (&self.watch_tx, &prev) {
let event = WatchEvent::delete(entry.clone(), None);
if tx.send(event).is_err() {
warn!("Watch event channel closed");
}
}
}
}
Ok(deleted)
} else {
Ok(0)
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use tempfile::tempdir;
fn create_test_state_machine() -> StateMachine {
let dir = tempdir().unwrap();
let store = RocksStore::new(dir.path()).unwrap();
StateMachine::new(store).unwrap()
}
#[test]
fn test_apply_put() {
let sm = create_test_state_machine();
let cmd = RaftCommand::Put {
key: b"key1".to_vec(),
value: b"value1".to_vec(),
lease_id: None,
prev_kv: false,
};
let response = sm.apply(cmd).unwrap();
assert_eq!(response.revision, 1);
assert!(response.prev_kv.is_none());
let entry = sm.kv().get(b"key1").unwrap().unwrap();
assert_eq!(entry.value, b"value1");
}
#[test]
fn test_apply_put_with_prev() {
let sm = create_test_state_machine();
sm.apply(RaftCommand::Put {
key: b"key1".to_vec(),
value: b"value1".to_vec(),
lease_id: None,
prev_kv: false,
})
.unwrap();
let response = sm
.apply(RaftCommand::Put {
key: b"key1".to_vec(),
value: b"value2".to_vec(),
lease_id: None,
prev_kv: true,
})
.unwrap();
assert_eq!(response.revision, 2);
assert!(response.prev_kv.is_some());
assert_eq!(response.prev_kv.unwrap().value, b"value1");
}
#[test]
fn test_apply_delete() {
let sm = create_test_state_machine();
sm.apply(RaftCommand::Put {
key: b"key1".to_vec(),
value: b"value1".to_vec(),
lease_id: None,
prev_kv: false,
})
.unwrap();
let response = sm
.apply(RaftCommand::Delete {
key: b"key1".to_vec(),
prev_kv: true,
})
.unwrap();
assert_eq!(response.deleted, 1);
assert!(response.prev_kv.is_some());
assert!(sm.kv().get(b"key1").unwrap().is_none());
}
#[test]
fn test_apply_txn_success() {
let sm = create_test_state_machine();
// Create initial key
sm.apply(RaftCommand::Put {
key: b"counter".to_vec(),
value: b"1".to_vec(),
lease_id: None,
prev_kv: false,
})
.unwrap();
// Transaction: if version == 1, increment
let cmd = RaftCommand::Txn {
compare: vec![Compare {
key: b"counter".to_vec(),
target: CompareTarget::Version(1),
result: CompareResult::Equal,
}],
success: vec![chainfire_types::command::TxnOp::Put {
key: b"counter".to_vec(),
value: b"2".to_vec(),
lease_id: None,
}],
failure: vec![],
};
let response = sm.apply(cmd).unwrap();
assert!(response.succeeded);
let entry = sm.kv().get(b"counter").unwrap().unwrap();
assert_eq!(entry.value, b"2");
}
#[test]
fn test_apply_txn_failure() {
let sm = create_test_state_machine();
// Create initial key
sm.apply(RaftCommand::Put {
key: b"counter".to_vec(),
value: b"1".to_vec(),
lease_id: None,
prev_kv: false,
})
.unwrap();
// Transaction: if version == 5, increment (should fail)
let cmd = RaftCommand::Txn {
compare: vec![Compare {
key: b"counter".to_vec(),
target: CompareTarget::Version(5),
result: CompareResult::Equal,
}],
success: vec![chainfire_types::command::TxnOp::Put {
key: b"counter".to_vec(),
value: b"2".to_vec(),
lease_id: None,
}],
failure: vec![chainfire_types::command::TxnOp::Put {
key: b"counter".to_vec(),
value: b"failed".to_vec(),
lease_id: None,
}],
};
let response = sm.apply(cmd).unwrap();
assert!(!response.succeeded);
let entry = sm.kv().get(b"counter").unwrap().unwrap();
assert_eq!(entry.value, b"failed");
}
#[tokio::test]
async fn test_watch_events() {
let mut sm = create_test_state_machine();
let (tx, mut rx) = mpsc::unbounded_channel();
sm.set_watch_sender(tx);
// Apply a put
sm.apply(RaftCommand::Put {
key: b"key1".to_vec(),
value: b"value1".to_vec(),
lease_id: None,
prev_kv: false,
})
.unwrap();
// Check event was sent
let event = rx.recv().await.unwrap();
assert!(event.is_put());
assert_eq!(event.kv.key, b"key1");
assert_eq!(event.kv.value, b"value1");
}
}


@ -0,0 +1,132 @@
//! RocksDB store management
use crate::cf;
use chainfire_types::error::StorageError;
use rocksdb::{BoundColumnFamily, ColumnFamilyDescriptor, Options, DB};
use std::path::Path;
use std::sync::Arc;
/// RocksDB store wrapper with column families
pub struct RocksStore {
db: Arc<DB>,
}
impl RocksStore {
/// Open or create a RocksDB database at the given path
pub fn new(path: impl AsRef<Path>) -> Result<Self, StorageError> {
let path = path.as_ref();
let mut db_opts = Options::default();
db_opts.create_if_missing(true);
db_opts.create_missing_column_families(true);
db_opts.set_max_background_jobs(4);
db_opts.set_bytes_per_sync(1024 * 1024); // 1MB
// Define column families
let cf_descriptors = vec![
ColumnFamilyDescriptor::new(cf::LOGS, Self::logs_cf_options()),
ColumnFamilyDescriptor::new(cf::META, Self::meta_cf_options()),
ColumnFamilyDescriptor::new(cf::KV, Self::kv_cf_options()),
ColumnFamilyDescriptor::new(cf::SNAPSHOT, Self::snapshot_cf_options()),
];
let db = DB::open_cf_descriptors(&db_opts, path, cf_descriptors)
.map_err(|e| StorageError::RocksDb(e.to_string()))?;
Ok(Self { db: Arc::new(db) })
}
/// Get the underlying DB handle
pub fn db(&self) -> &Arc<DB> {
&self.db
}
/// Get a column family handle
pub fn cf_handle(&self, name: &str) -> Option<Arc<BoundColumnFamily<'_>>> {
self.db.cf_handle(name)
}
/// Options for the logs column family
fn logs_cf_options() -> Options {
let mut opts = Options::default();
// Optimize for sequential reads/writes
opts.set_write_buffer_size(64 * 1024 * 1024); // 64MB
opts.set_max_write_buffer_number(3);
opts
}
/// Options for the metadata column family
fn meta_cf_options() -> Options {
let mut opts = Options::default();
// Small, frequently updated
opts.set_write_buffer_size(16 * 1024 * 1024); // 16MB
opts
}
/// Options for the KV column family
fn kv_cf_options() -> Options {
let mut opts = Options::default();
// Optimize for point lookups and range scans
opts.set_write_buffer_size(128 * 1024 * 1024); // 128MB
opts.set_max_write_buffer_number(4);
// Enable bloom filters for faster lookups
opts.set_prefix_extractor(rocksdb::SliceTransform::create_fixed_prefix(8));
opts
}
/// Options for the snapshot column family
fn snapshot_cf_options() -> Options {
let mut opts = Options::default();
opts.set_write_buffer_size(32 * 1024 * 1024); // 32MB
opts
}
}
impl Clone for RocksStore {
fn clone(&self) -> Self {
Self {
db: Arc::clone(&self.db),
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use tempfile::tempdir;
#[test]
fn test_create_store() {
let dir = tempdir().unwrap();
let store = RocksStore::new(dir.path()).unwrap();
// Verify all column families exist
assert!(store.cf_handle(cf::LOGS).is_some());
assert!(store.cf_handle(cf::META).is_some());
assert!(store.cf_handle(cf::KV).is_some());
assert!(store.cf_handle(cf::SNAPSHOT).is_some());
}
#[test]
fn test_reopen_store() {
let dir = tempdir().unwrap();
// Create and close
{
let store = RocksStore::new(dir.path()).unwrap();
let cf = store.cf_handle(cf::META).unwrap();
store
.db()
.put_cf(&cf, b"test_key", b"test_value")
.unwrap();
}
// Reopen and verify data persisted
{
let store = RocksStore::new(dir.path()).unwrap();
let cf = store.cf_handle(cf::META).unwrap();
let value = store.db().get_cf(&cf, b"test_key").unwrap();
assert_eq!(value, Some(b"test_value".to_vec()));
}
}
}


@ -0,0 +1,18 @@
[package]
name = "chainfire-types"
version.workspace = true
edition.workspace = true
license.workspace = true
rust-version.workspace = true
description = "Core types for Chainfire distributed KVS"
[dependencies]
serde = { workspace = true }
thiserror = { workspace = true }
bytes = { workspace = true }
[dev-dependencies]
bincode = { workspace = true }
[lints]
workspace = true


@ -0,0 +1,270 @@
//! Raft commands and responses
//!
//! These types are submitted to Raft for consensus and applied to the state machine.
use crate::kv::KvEntry;
use crate::Revision;
use serde::{Deserialize, Serialize};
/// Commands submitted to Raft consensus
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub enum RaftCommand {
/// Put a key-value pair
Put {
key: Vec<u8>,
value: Vec<u8>,
lease_id: Option<i64>,
/// If true, return the previous value
prev_kv: bool,
},
/// Delete a single key
Delete {
key: Vec<u8>,
/// If true, return the deleted value
prev_kv: bool,
},
/// Delete a range of keys
DeleteRange {
start: Vec<u8>,
end: Vec<u8>,
/// If true, return deleted values
prev_kv: bool,
},
/// Transaction with multiple operations
Txn {
/// Comparison conditions
compare: Vec<Compare>,
/// Operations to execute if all comparisons succeed
success: Vec<TxnOp>,
/// Operations to execute if any comparison fails
failure: Vec<TxnOp>,
},
/// Grant a new lease
LeaseGrant {
/// Requested lease ID (0 for server-assigned)
id: i64,
/// TTL in seconds
ttl: i64,
},
/// Revoke a lease (deletes all attached keys)
LeaseRevoke {
/// Lease ID to revoke
id: i64,
},
/// Refresh a lease TTL (keep-alive)
LeaseRefresh {
/// Lease ID to refresh
id: i64,
},
/// No-op command for Raft leadership establishment
Noop,
}
impl Default for RaftCommand {
fn default() -> Self {
Self::Noop
}
}
/// Comparison for transaction conditions
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct Compare {
pub key: Vec<u8>,
pub target: CompareTarget,
pub result: CompareResult,
}
/// What to compare against
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub enum CompareTarget {
/// Compare the version number
Version(u64),
/// Compare the creation revision
CreateRevision(Revision),
/// Compare the modification revision
ModRevision(Revision),
/// Compare the value
Value(Vec<u8>),
}
/// Comparison operator
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
pub enum CompareResult {
Equal,
NotEqual,
Greater,
Less,
}
/// Operation in a transaction
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub enum TxnOp {
Put {
key: Vec<u8>,
value: Vec<u8>,
lease_id: Option<i64>,
},
Delete {
key: Vec<u8>,
},
DeleteRange {
start: Vec<u8>,
end: Vec<u8>,
},
/// Range query within a transaction
Range {
key: Vec<u8>,
range_end: Vec<u8>,
limit: i64,
keys_only: bool,
count_only: bool,
},
}
/// Response from a single operation in a transaction
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub enum TxnOpResponse {
/// Response from a Put operation
Put {
prev_kv: Option<KvEntry>,
},
/// Response from a Delete/DeleteRange operation
Delete {
deleted: u64,
prev_kvs: Vec<KvEntry>,
},
/// Response from a Range operation
Range {
kvs: Vec<KvEntry>,
count: u64,
more: bool,
},
}
/// Response from applying a Raft command
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Default)]
pub struct RaftResponse {
/// Current revision after this operation
pub revision: Revision,
/// Previous key-value (if requested and existed)
pub prev_kv: Option<KvEntry>,
/// Number of keys deleted (for delete operations)
pub deleted: u64,
/// Whether transaction succeeded (for Txn)
pub succeeded: bool,
/// Previous key-values for batch deletes
pub prev_kvs: Vec<KvEntry>,
/// Lease ID (for lease operations)
pub lease_id: Option<i64>,
/// Lease TTL (for lease operations)
pub lease_ttl: Option<i64>,
/// Individual operation responses (for Txn)
pub txn_responses: Vec<TxnOpResponse>,
}
impl RaftResponse {
/// Create a simple response with just revision
pub fn new(revision: Revision) -> Self {
Self {
revision,
..Default::default()
}
}
/// Create a response with previous key-value
pub fn with_prev_kv(revision: Revision, prev_kv: Option<KvEntry>) -> Self {
Self {
revision,
prev_kv,
..Default::default()
}
}
/// Create a response for delete operations
pub fn deleted(revision: Revision, deleted: u64, prev_kvs: Vec<KvEntry>) -> Self {
Self {
revision,
deleted,
prev_kvs,
..Default::default()
}
}
/// Create a response for transaction
pub fn txn(revision: Revision, succeeded: bool, txn_responses: Vec<TxnOpResponse>) -> Self {
Self {
revision,
succeeded,
txn_responses,
..Default::default()
}
}
/// Create a response for lease operations
pub fn lease(revision: Revision, lease_id: i64, ttl: i64) -> Self {
Self {
revision,
lease_id: Some(lease_id),
lease_ttl: Some(ttl),
..Default::default()
}
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_put_command() {
let cmd = RaftCommand::Put {
key: b"key".to_vec(),
value: b"value".to_vec(),
lease_id: None,
prev_kv: false,
};
let serialized = bincode::serialize(&cmd).unwrap();
let deserialized: RaftCommand = bincode::deserialize(&serialized).unwrap();
assert_eq!(cmd, deserialized);
}
#[test]
fn test_txn_command() {
let cmd = RaftCommand::Txn {
compare: vec![Compare {
key: b"key".to_vec(),
target: CompareTarget::Version(1),
result: CompareResult::Equal,
}],
success: vec![TxnOp::Put {
key: b"key".to_vec(),
value: b"new_value".to_vec(),
lease_id: None,
}],
failure: vec![],
};
let serialized = bincode::serialize(&cmd).unwrap();
let deserialized: RaftCommand = bincode::deserialize(&serialized).unwrap();
assert_eq!(cmd, deserialized);
}
#[test]
fn test_response() {
let entry = KvEntry::new(b"key".to_vec(), b"old".to_vec(), 1);
let response = RaftResponse::with_prev_kv(5, Some(entry.clone()));
assert_eq!(response.revision, 5);
assert_eq!(response.prev_kv, Some(entry));
}
}


@ -0,0 +1,164 @@
//! Error types for Chainfire
use thiserror::Error;
/// Result type alias using Chainfire's Error
pub type Result<T> = std::result::Result<T, Error>;
/// Main error type for Chainfire operations
#[derive(Error, Debug)]
pub enum Error {
/// Storage layer errors
#[error("Storage error: {0}")]
Storage(#[from] StorageError),
/// Raft consensus errors
#[error("Raft error: {0}")]
Raft(#[from] RaftError),
/// Network/RPC errors
#[error("Network error: {0}")]
Network(#[from] NetworkError),
/// Watch errors
#[error("Watch error: {0}")]
Watch(#[from] WatchError),
/// Gossip protocol errors
#[error("Gossip error: {0}")]
Gossip(#[from] GossipError),
/// Configuration errors
#[error("Configuration error: {0}")]
Config(String),
/// Serialization errors
#[error("Serialization error: {0}")]
Serialization(String),
/// Generic internal error
#[error("Internal error: {0}")]
Internal(String),
}
/// Storage layer errors
#[derive(Error, Debug)]
pub enum StorageError {
#[error("Key not found: {0:?}")]
KeyNotFound(Vec<u8>),
#[error("IO error: {0}")]
Io(#[from] std::io::Error),
#[error("RocksDB error: {0}")]
RocksDb(String),
#[error("Serialization error: {0}")]
Serialization(String),
#[error("Snapshot error: {0}")]
Snapshot(String),
#[error("Log compacted: requested {requested}, compacted to {compacted}")]
LogCompacted { requested: u64, compacted: u64 },
#[error("Lease error: {0}")]
LeaseError(String),
}
/// Raft consensus errors
#[derive(Error, Debug)]
pub enum RaftError {
#[error("Not leader, leader is node {leader_id:?}")]
NotLeader { leader_id: Option<u64> },
#[error("Node {0} not found")]
NodeNotFound(u64),
#[error("Proposal failed: {0}")]
ProposalFailed(String),
#[error("Timeout waiting for consensus")]
Timeout,
#[error("Cluster not initialized")]
NotInitialized,
#[error("Already initialized")]
AlreadyInitialized,
#[error("Internal Raft error: {0}")]
Internal(String),
}
/// Network/RPC errors
#[derive(Error, Debug)]
pub enum NetworkError {
#[error("Connection failed to {addr}: {reason}")]
ConnectionFailed { addr: String, reason: String },
#[error("RPC failed: {0}")]
RpcFailed(String),
#[error("Node {0} unreachable")]
Unreachable(u64),
#[error("Timeout")]
Timeout,
#[error("Invalid address: {0}")]
InvalidAddress(String),
}
/// Watch errors
#[derive(Error, Debug)]
pub enum WatchError {
#[error("Watch {0} not found")]
NotFound(i64),
#[error("Watch {0} already exists")]
AlreadyExists(i64),
#[error("Compacted: requested revision {requested}, compacted to {compacted}")]
Compacted { requested: u64, compacted: u64 },
#[error("Stream closed")]
StreamClosed,
}
/// Gossip protocol errors
#[derive(Error, Debug)]
pub enum GossipError {
#[error("Failed to join cluster: {0}")]
JoinFailed(String),
#[error("Broadcast failed: {0}")]
BroadcastFailed(String),
#[error("Invalid identity: {0}")]
InvalidIdentity(String),
#[error("UDP error: {0}")]
Udp(String),
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_error_display() {
let err = Error::Storage(StorageError::KeyNotFound(b"test".to_vec()));
assert!(err.to_string().contains("Key not found"));
let err = Error::Raft(RaftError::NotLeader { leader_id: Some(1) });
assert!(err.to_string().contains("Not leader"));
}
#[test]
fn test_error_conversion() {
let storage_err = StorageError::KeyNotFound(b"key".to_vec());
let err: Error = storage_err.into();
assert!(matches!(err, Error::Storage(_)));
}
}
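The `#[from]` attributes above are what make `storage_err.into()` and the `?` operator promote sub-errors into `Error`. A minimal sketch of the conversion thiserror generates, using simplified stand-in enums (not the real Chainfire types):

```rust
// Simplified stand-ins for StorageError and Error; the real types carry
// more variants and use thiserror's derive instead of a manual impl.
#[derive(Debug)]
enum StorageError {
    KeyNotFound(Vec<u8>),
}

#[derive(Debug)]
enum Error {
    Storage(StorageError),
}

// thiserror's `#[from] StorageError` expands to roughly this impl,
// which is what `.into()` and `?` rely on.
impl From<StorageError> for Error {
    fn from(e: StorageError) -> Self {
        Error::Storage(e)
    }
}

fn lookup(found: bool) -> Result<(), Error> {
    if !found {
        // With `?`, the From conversion is applied implicitly;
        // shown explicitly here for clarity.
        return Err(StorageError::KeyNotFound(b"missing".to_vec()).into());
    }
    Ok(())
}
```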


@ -0,0 +1,201 @@
//! Key-Value entry types with MVCC versioning
use serde::{Deserialize, Serialize};
/// Revision number for MVCC-style versioning
/// Each write operation increments the global revision counter
pub type Revision = u64;
/// A key-value entry with metadata
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct KvEntry {
/// The key
pub key: Vec<u8>,
/// The value
pub value: Vec<u8>,
/// Revision when this key was created
pub create_revision: Revision,
/// Revision of the last modification
pub mod_revision: Revision,
/// Number of modifications since creation
pub version: u64,
/// Optional lease ID for TTL-based expiration
pub lease_id: Option<i64>,
}
impl KvEntry {
/// Create a new KV entry for initial insertion
pub fn new(key: Vec<u8>, value: Vec<u8>, revision: Revision) -> Self {
Self {
key,
value,
create_revision: revision,
mod_revision: revision,
version: 1,
lease_id: None,
}
}
/// Create a new KV entry with lease
pub fn with_lease(key: Vec<u8>, value: Vec<u8>, revision: Revision, lease_id: i64) -> Self {
Self {
key,
value,
create_revision: revision,
mod_revision: revision,
version: 1,
lease_id: Some(lease_id),
}
}
/// Update the entry with a new value and revision
pub fn update(&self, value: Vec<u8>, revision: Revision) -> Self {
Self {
key: self.key.clone(),
value,
create_revision: self.create_revision,
mod_revision: revision,
version: self.version + 1,
lease_id: self.lease_id,
}
}
/// Get the key as a string (lossy conversion)
pub fn key_str(&self) -> String {
String::from_utf8_lossy(&self.key).to_string()
}
/// Get the value as a string (lossy conversion)
pub fn value_str(&self) -> String {
String::from_utf8_lossy(&self.value).to_string()
}
/// Check if this entry has a lease
pub fn has_lease(&self) -> bool {
self.lease_id.is_some()
}
}
impl Default for KvEntry {
fn default() -> Self {
Self {
key: Vec::new(),
value: Vec::new(),
create_revision: 0,
mod_revision: 0,
version: 0,
lease_id: None,
}
}
}
/// Range of keys for scan operations
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct KeyRange {
/// Start key (inclusive)
pub start: Vec<u8>,
/// End key (exclusive). If None, scan single key or to end
pub end: Option<Vec<u8>>,
}
impl KeyRange {
/// Create a range for a single key
pub fn key(key: impl Into<Vec<u8>>) -> Self {
Self {
start: key.into(),
end: None,
}
}
/// Create a range from start to end (exclusive)
pub fn range(start: impl Into<Vec<u8>>, end: impl Into<Vec<u8>>) -> Self {
Self {
start: start.into(),
end: Some(end.into()),
}
}
/// Create a prefix range (all keys with given prefix)
pub fn prefix(prefix: impl Into<Vec<u8>>) -> Self {
let prefix = prefix.into();
let end = prefix_end(&prefix);
Self {
start: prefix,
end: Some(end),
}
}
/// Check if this range matches a single key
pub fn is_single_key(&self) -> bool {
self.end.is_none()
}
}
/// Calculate the end key for a prefix scan
/// For prefix "abc", returns "abd" (increment last byte)
fn prefix_end(prefix: &[u8]) -> Vec<u8> {
let mut end = prefix.to_vec();
for i in (0..end.len()).rev() {
if end[i] < 0xff {
end[i] += 1;
end.truncate(i + 1);
return end;
}
}
// All bytes are 0xff, return empty to indicate no upper bound
Vec::new()
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_kv_entry_new() {
let entry = KvEntry::new(b"key".to_vec(), b"value".to_vec(), 1);
assert_eq!(entry.key, b"key");
assert_eq!(entry.value, b"value");
assert_eq!(entry.create_revision, 1);
assert_eq!(entry.mod_revision, 1);
assert_eq!(entry.version, 1);
assert!(entry.lease_id.is_none());
}
#[test]
fn test_kv_entry_update() {
let entry = KvEntry::new(b"key".to_vec(), b"value1".to_vec(), 1);
let updated = entry.update(b"value2".to_vec(), 5);
assert_eq!(updated.key, b"key");
assert_eq!(updated.value, b"value2");
assert_eq!(updated.create_revision, 1); // Unchanged
assert_eq!(updated.mod_revision, 5);
assert_eq!(updated.version, 2);
}
#[test]
fn test_prefix_end() {
assert_eq!(prefix_end(b"abc"), b"abd");
assert_eq!(prefix_end(b"ab\xff"), b"ac");
assert_eq!(prefix_end(b"\xff\xff"), Vec::<u8>::new());
}
#[test]
fn test_key_range_prefix() {
let range = KeyRange::prefix("/nodes/");
assert_eq!(range.start, b"/nodes/");
assert_eq!(range.end, Some(b"/nodes0".to_vec())); // '/' + 1 = '0'
}
#[test]
fn test_kv_serialization() {
let entry = KvEntry::new(b"test".to_vec(), b"data".to_vec(), 42);
let serialized = bincode::serialize(&entry).unwrap();
let deserialized: KvEntry = bincode::deserialize(&serialized).unwrap();
assert_eq!(entry, deserialized);
}
}
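The MVCC bookkeeping that `KvEntry::new` and `KvEntry::update` perform can be sketched in isolation: `create_revision` is pinned at the first write, `mod_revision` follows the latest write, and `version` counts writes to the key. `Entry` below is a simplified stand-in, not the real `KvEntry`:

```rust
// Minimal stand-in for KvEntry's revision fields.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    create_revision: u64,
    mod_revision: u64,
    version: u64,
}

// A put either creates the entry (version 1, both revisions equal)
// or updates it (create_revision preserved, version incremented).
fn put(prev: Option<&Entry>, revision: u64) -> Entry {
    match prev {
        None => Entry {
            create_revision: revision,
            mod_revision: revision,
            version: 1,
        },
        Some(e) => Entry {
            create_revision: e.create_revision,
            mod_revision: revision,
            version: e.version + 1,
        },
    }
}
```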


@ -0,0 +1,187 @@
//! Lease types for TTL-based key expiration
//!
//! Leases provide time-to-live (TTL) functionality for keys. When a lease expires
//! or is revoked, all keys attached to it are automatically deleted.
use serde::{Deserialize, Serialize};
use std::time::{Duration, Instant};
/// Unique identifier for a lease
pub type LeaseId = i64;
/// A lease with TTL-based expiration
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Lease {
/// Unique ID of the lease
pub id: LeaseId,
/// Time-to-live in seconds (as originally granted)
pub ttl: i64,
/// Remaining TTL in seconds (decremented over time)
#[serde(skip)]
pub remaining_ttl: i64,
/// Keys attached to this lease
pub keys: Vec<Vec<u8>>,
/// When the lease was created (for TTL calculation)
#[serde(skip)]
pub granted_at: Option<Instant>,
}
impl Lease {
/// Create a new lease with the given ID and TTL
pub fn new(id: LeaseId, ttl: i64) -> Self {
Self {
id,
ttl,
remaining_ttl: ttl,
keys: Vec::new(),
granted_at: Some(Instant::now()),
}
}
/// Check if the lease has expired
pub fn is_expired(&self) -> bool {
if let Some(granted_at) = self.granted_at {
let elapsed = granted_at.elapsed();
elapsed >= Duration::from_secs(self.ttl as u64)
} else {
// If no granted_at, use remaining_ttl
self.remaining_ttl <= 0
}
}
/// Get the remaining TTL in seconds
pub fn remaining(&self) -> i64 {
if let Some(granted_at) = self.granted_at {
let elapsed = granted_at.elapsed().as_secs() as i64;
(self.ttl - elapsed).max(0)
} else {
self.remaining_ttl.max(0)
}
}
/// Refresh the lease TTL (for keep-alive)
pub fn refresh(&mut self) {
self.granted_at = Some(Instant::now());
self.remaining_ttl = self.ttl;
}
/// Attach a key to this lease
pub fn attach_key(&mut self, key: Vec<u8>) {
if !self.keys.contains(&key) {
self.keys.push(key);
}
}
/// Detach a key from this lease
pub fn detach_key(&mut self, key: &[u8]) {
self.keys.retain(|k| k != key);
}
}
/// Persistent lease data (for serialization without Instant)
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct LeaseData {
/// Unique ID of the lease
pub id: LeaseId,
/// Time-to-live in seconds
pub ttl: i64,
/// Keys attached to this lease
pub keys: Vec<Vec<u8>>,
/// Unix timestamp when granted (for persistence)
pub granted_at_unix: u64,
}
impl LeaseData {
/// Create lease data from a lease
pub fn from_lease(lease: &Lease) -> Self {
use std::time::SystemTime;
let now = SystemTime::now()
.duration_since(SystemTime::UNIX_EPOCH)
.unwrap()
.as_secs();
Self {
id: lease.id,
ttl: lease.ttl,
keys: lease.keys.clone(),
granted_at_unix: now,
}
}
/// Convert to a lease (sets granted_at to now)
pub fn to_lease(&self) -> Lease {
use std::time::SystemTime;
let now = SystemTime::now()
.duration_since(SystemTime::UNIX_EPOCH)
.unwrap()
.as_secs();
let elapsed = now.saturating_sub(self.granted_at_unix) as i64;
let remaining = (self.ttl - elapsed).max(0);
Lease {
id: self.id,
ttl: self.ttl,
remaining_ttl: remaining,
keys: self.keys.clone(),
granted_at: Some(Instant::now() - Duration::from_secs(elapsed as u64)),
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use std::thread;
use std::time::Duration;
#[test]
fn test_lease_creation() {
let lease = Lease::new(1, 10);
assert_eq!(lease.id, 1);
assert_eq!(lease.ttl, 10);
assert!(!lease.is_expired());
}
#[test]
fn test_lease_remaining() {
let lease = Lease::new(1, 10);
let remaining = lease.remaining();
assert!(remaining >= 9 && remaining <= 10);
}
#[test]
fn test_lease_attach_key() {
let mut lease = Lease::new(1, 10);
lease.attach_key(b"key1".to_vec());
lease.attach_key(b"key2".to_vec());
assert_eq!(lease.keys.len(), 2);
// Duplicate should not add
lease.attach_key(b"key1".to_vec());
assert_eq!(lease.keys.len(), 2);
}
#[test]
fn test_lease_detach_key() {
let mut lease = Lease::new(1, 10);
lease.attach_key(b"key1".to_vec());
lease.attach_key(b"key2".to_vec());
lease.detach_key(b"key1");
assert_eq!(lease.keys.len(), 1);
assert_eq!(lease.keys[0], b"key2".to_vec());
}
#[test]
fn test_lease_refresh() {
let mut lease = Lease::new(1, 1);
// Sleep briefly to ensure some time passes
thread::sleep(Duration::from_millis(100));
let remaining_before = lease.remaining();
lease.refresh();
let remaining_after = lease.remaining();
// After refresh, remaining should be back to full TTL
assert!(remaining_after >= remaining_before);
}
}


@ -0,0 +1,23 @@
//! Core types for Chainfire distributed Key-Value Store
//!
//! This crate contains all shared type definitions used across the Chainfire system:
//! - Node identification and metadata
//! - Key-Value entry representation with MVCC versioning
//! - Raft commands and responses
//! - Lease types for TTL-based key expiration
//! - Watch event types
//! - Error types
pub mod command;
pub mod error;
pub mod kv;
pub mod lease;
pub mod node;
pub mod watch;
pub use command::{RaftCommand, RaftResponse};
pub use error::{Error, Result};
pub use kv::{KvEntry, Revision};
pub use lease::{Lease, LeaseData, LeaseId};
pub use node::{NodeId, NodeInfo, NodeRole, RaftRole};
pub use watch::{WatchEvent, WatchEventType, WatchRequest};


@ -0,0 +1,255 @@
//! Node identification and metadata types
use serde::{Deserialize, Serialize};
use std::net::SocketAddr;
/// Unique identifier for each node in the cluster
pub type NodeId = u64;
/// Role of a node in the cluster
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
pub enum NodeRole {
/// Control Plane node - participates in Raft consensus
ControlPlane,
/// Worker node - only participates in gossip, watches Control Plane
Worker,
}
impl Default for NodeRole {
fn default() -> Self {
Self::Worker
}
}
/// Raft participation role for a node.
///
/// This determines whether and how a node participates in the Raft consensus protocol.
/// The RaftRole is separate from NodeRole (gossip role) - a node can be a ControlPlane
/// gossip participant without being a Raft voter.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize, Default)]
#[serde(rename_all = "lowercase")]
pub enum RaftRole {
/// Full voting member in Raft consensus.
/// Participates in leader election and log replication.
#[default]
Voter,
/// Non-voting replica that receives log replication.
/// Can be promoted to Voter via cluster membership change.
Learner,
/// No Raft participation.
/// Node only uses gossip and acts as a client proxy.
None,
}
impl RaftRole {
/// Check if this role participates in Raft at all.
///
/// Returns `true` for Voter and Learner, `false` for None.
pub fn participates_in_raft(&self) -> bool {
!matches!(self, RaftRole::None)
}
/// Check if this role is a voting member.
pub fn is_voter(&self) -> bool {
matches!(self, RaftRole::Voter)
}
/// Check if this role is a learner (non-voting replica).
pub fn is_learner(&self) -> bool {
matches!(self, RaftRole::Learner)
}
/// Convert to string representation.
pub fn as_str(&self) -> &'static str {
match self {
RaftRole::Voter => "voter",
RaftRole::Learner => "learner",
RaftRole::None => "none",
}
}
}
impl std::fmt::Display for RaftRole {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{}", self.as_str())
}
}
impl std::str::FromStr for RaftRole {
type Err = String;
fn from_str(s: &str) -> Result<Self, Self::Err> {
match s.to_lowercase().as_str() {
"voter" => Ok(RaftRole::Voter),
"learner" => Ok(RaftRole::Learner),
"none" => Ok(RaftRole::None),
_ => Err(format!(
"invalid raft role '{}', expected 'voter', 'learner', or 'none'",
s
)),
}
}
}
/// Node metadata stored in cluster membership
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct NodeInfo {
/// Unique node identifier
pub id: NodeId,
/// Human-readable node name
pub name: String,
/// Address for Raft RPCs (Control Plane nodes only)
pub raft_addr: Option<SocketAddr>,
/// Address for client API (gRPC)
pub api_addr: SocketAddr,
/// Address for gossip protocol (UDP)
pub gossip_addr: SocketAddr,
/// Node role in the cluster
pub role: NodeRole,
}
impl NodeInfo {
/// Create a new Control Plane node info
pub fn control_plane(
id: NodeId,
name: impl Into<String>,
raft_addr: SocketAddr,
api_addr: SocketAddr,
gossip_addr: SocketAddr,
) -> Self {
Self {
id,
name: name.into(),
raft_addr: Some(raft_addr),
api_addr,
gossip_addr,
role: NodeRole::ControlPlane,
}
}
/// Create a new Worker node info
pub fn worker(
id: NodeId,
name: impl Into<String>,
api_addr: SocketAddr,
gossip_addr: SocketAddr,
) -> Self {
Self {
id,
name: name.into(),
raft_addr: None,
api_addr,
gossip_addr,
role: NodeRole::Worker,
}
}
/// Check if this node is a Control Plane node
pub fn is_control_plane(&self) -> bool {
self.role == NodeRole::ControlPlane
}
/// Check if this node is a Worker node
pub fn is_worker(&self) -> bool {
self.role == NodeRole::Worker
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_control_plane_node() {
let node = NodeInfo::control_plane(
1,
"cp-1",
"127.0.0.1:5000".parse().unwrap(),
"127.0.0.1:5001".parse().unwrap(),
"127.0.0.1:5002".parse().unwrap(),
);
assert_eq!(node.id, 1);
assert_eq!(node.name, "cp-1");
assert!(node.is_control_plane());
assert!(!node.is_worker());
assert!(node.raft_addr.is_some());
}
#[test]
fn test_worker_node() {
let node = NodeInfo::worker(
100,
"worker-1",
"127.0.0.1:6001".parse().unwrap(),
"127.0.0.1:6002".parse().unwrap(),
);
assert_eq!(node.id, 100);
assert!(node.is_worker());
assert!(!node.is_control_plane());
assert!(node.raft_addr.is_none());
}
#[test]
fn test_node_serialization() {
let node = NodeInfo::control_plane(
1,
"test",
"127.0.0.1:5000".parse().unwrap(),
"127.0.0.1:5001".parse().unwrap(),
"127.0.0.1:5002".parse().unwrap(),
);
let serialized = bincode::serialize(&node).unwrap();
let deserialized: NodeInfo = bincode::deserialize(&serialized).unwrap();
assert_eq!(node, deserialized);
}
#[test]
fn test_raft_role_default() {
let role = RaftRole::default();
assert_eq!(role, RaftRole::Voter);
assert!(role.participates_in_raft());
assert!(role.is_voter());
}
#[test]
fn test_raft_role_participates() {
assert!(RaftRole::Voter.participates_in_raft());
assert!(RaftRole::Learner.participates_in_raft());
assert!(!RaftRole::None.participates_in_raft());
}
#[test]
fn test_raft_role_from_str() {
assert_eq!("voter".parse::<RaftRole>().unwrap(), RaftRole::Voter);
assert_eq!("learner".parse::<RaftRole>().unwrap(), RaftRole::Learner);
assert_eq!("none".parse::<RaftRole>().unwrap(), RaftRole::None);
assert_eq!("VOTER".parse::<RaftRole>().unwrap(), RaftRole::Voter);
assert!("invalid".parse::<RaftRole>().is_err());
}
#[test]
fn test_raft_role_display() {
assert_eq!(RaftRole::Voter.to_string(), "voter");
assert_eq!(RaftRole::Learner.to_string(), "learner");
assert_eq!(RaftRole::None.to_string(), "none");
}
#[test]
fn test_raft_role_serialization() {
// Test binary serialization
let serialized = bincode::serialize(&RaftRole::Voter).unwrap();
let deserialized: RaftRole = bincode::deserialize(&serialized).unwrap();
assert_eq!(deserialized, RaftRole::Voter);
// Test all variants
for role in [RaftRole::Voter, RaftRole::Learner, RaftRole::None] {
let serialized = bincode::serialize(&role).unwrap();
let deserialized: RaftRole = bincode::deserialize(&serialized).unwrap();
assert_eq!(deserialized, role);
}
}
}


@ -0,0 +1,266 @@
//! Watch event types for notifications
use crate::kv::KvEntry;
use crate::Revision;
use serde::{Deserialize, Serialize};
/// Event type for watch notifications
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
pub enum WatchEventType {
/// Key was created or updated
Put,
/// Key was deleted
Delete,
}
/// A single watch event
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct WatchEvent {
/// Type of event (Put or Delete)
pub event_type: WatchEventType,
/// Current key-value (for Put, contains new value; for Delete, contains deleted value)
pub kv: KvEntry,
/// Previous key-value (if requested and existed)
pub prev_kv: Option<KvEntry>,
}
impl WatchEvent {
/// Create a Put event
pub fn put(kv: KvEntry, prev_kv: Option<KvEntry>) -> Self {
Self {
event_type: WatchEventType::Put,
kv,
prev_kv,
}
}
/// Create a Delete event
pub fn delete(kv: KvEntry, prev_kv: Option<KvEntry>) -> Self {
Self {
event_type: WatchEventType::Delete,
kv,
prev_kv,
}
}
/// Check if this is a Put event
pub fn is_put(&self) -> bool {
self.event_type == WatchEventType::Put
}
/// Check if this is a Delete event
pub fn is_delete(&self) -> bool {
self.event_type == WatchEventType::Delete
}
}
/// Watch subscription request
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct WatchRequest {
/// Unique identifier for this watch
pub watch_id: i64,
/// Key to watch
pub key: Vec<u8>,
/// Range end for prefix/range watches. None = single key
pub range_end: Option<Vec<u8>>,
/// Start watching from this revision. None = from current
pub start_revision: Option<Revision>,
/// Include previous value in events
pub prev_kv: bool,
/// Send periodic progress notifications
pub progress_notify: bool,
}
impl WatchRequest {
/// Create a watch for a single key
pub fn key(watch_id: i64, key: impl Into<Vec<u8>>) -> Self {
Self {
watch_id,
key: key.into(),
range_end: None,
start_revision: None,
prev_kv: false,
progress_notify: false,
}
}
/// Create a watch for all keys with a prefix
pub fn prefix(watch_id: i64, prefix: impl Into<Vec<u8>>) -> Self {
let prefix = prefix.into();
let range_end = crate::kv::KeyRange::prefix(prefix.clone())
.end
.unwrap_or_default();
Self {
watch_id,
key: prefix,
range_end: Some(range_end),
start_revision: None,
prev_kv: false,
progress_notify: false,
}
}
/// Create a watch for a range of keys
pub fn range(watch_id: i64, start: impl Into<Vec<u8>>, end: impl Into<Vec<u8>>) -> Self {
Self {
watch_id,
key: start.into(),
range_end: Some(end.into()),
start_revision: None,
prev_kv: false,
progress_notify: false,
}
}
/// Set start revision
pub fn from_revision(mut self, revision: Revision) -> Self {
self.start_revision = Some(revision);
self
}
/// Request previous values in events
pub fn with_prev_kv(mut self) -> Self {
self.prev_kv = true;
self
}
/// Request progress notifications
pub fn with_progress_notify(mut self) -> Self {
self.progress_notify = true;
self
}
/// Check if this watch matches a key
pub fn matches(&self, key: &[u8]) -> bool {
match &self.range_end {
None => self.key == key,
Some(end) => {
if end.is_empty() {
// Empty end means all keys >= start
key >= self.key.as_slice()
} else {
key >= self.key.as_slice() && key < end.as_slice()
}
}
}
}
}
/// Response for a watch stream
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct WatchResponse {
/// Watch ID this response is for
pub watch_id: i64,
/// True if this is a watch creation confirmation
pub created: bool,
/// True if the watch was canceled
pub canceled: bool,
/// Current revision (for progress notifications)
pub compact_revision: Revision,
/// Events in this response
pub events: Vec<WatchEvent>,
}
impl WatchResponse {
/// Create a creation confirmation response
pub fn created(watch_id: i64) -> Self {
Self {
watch_id,
created: true,
canceled: false,
compact_revision: 0,
events: Vec::new(),
}
}
/// Create a cancellation response
pub fn canceled(watch_id: i64) -> Self {
Self {
watch_id,
created: false,
canceled: true,
compact_revision: 0,
events: Vec::new(),
}
}
/// Create an events response
pub fn events(watch_id: i64, events: Vec<WatchEvent>) -> Self {
Self {
watch_id,
created: false,
canceled: false,
compact_revision: 0,
events,
}
}
/// Create a progress notification
pub fn progress(watch_id: i64, revision: Revision) -> Self {
Self {
watch_id,
created: false,
canceled: false,
compact_revision: revision,
events: Vec::new(),
}
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_watch_event_put() {
let kv = KvEntry::new(b"key".to_vec(), b"value".to_vec(), 1);
let event = WatchEvent::put(kv.clone(), None);
assert!(event.is_put());
assert!(!event.is_delete());
assert_eq!(event.kv, kv);
}
#[test]
fn test_watch_request_single_key() {
let req = WatchRequest::key(1, "/test/key");
assert!(req.matches(b"/test/key"));
assert!(!req.matches(b"/test/key2"));
assert!(!req.matches(b"/test"));
}
#[test]
fn test_watch_request_prefix() {
let req = WatchRequest::prefix(1, "/nodes/");
assert!(req.matches(b"/nodes/node1"));
assert!(req.matches(b"/nodes/node2/tasks"));
assert!(!req.matches(b"/nodes")); // No trailing slash
assert!(!req.matches(b"/other/path"));
}
#[test]
fn test_watch_request_range() {
let req = WatchRequest::range(1, "a", "d");
assert!(req.matches(b"a"));
assert!(req.matches(b"b"));
assert!(req.matches(b"c"));
assert!(!req.matches(b"d")); // End is exclusive
assert!(!req.matches(b"e"));
}
#[test]
fn test_watch_serialization() {
let req = WatchRequest::prefix(42, "/test/")
.from_revision(100)
.with_prev_kv();
let serialized = bincode::serialize(&req).unwrap();
let deserialized: WatchRequest = bincode::deserialize(&serialized).unwrap();
assert_eq!(req, deserialized);
}
}
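The half-open matching rule shared by `WatchRequest::matches` (and by `KeyMatcher` in chainfire-watch) can be distilled into a single predicate: no end key means exact match, an empty end key means everything from `start` onward, otherwise the range is `[start, end)`. A standalone sketch:

```rust
// Distilled form of the range_end match in WatchRequest::matches.
fn key_in_range(key: &[u8], start: &[u8], end: Option<&[u8]>) -> bool {
    match end {
        None => key == start,           // single-key watch
        Some([]) => key >= start,       // unbounded: all keys >= start
        Some(end) => key >= start && key < end, // half-open [start, end)
    }
}
```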


@ -0,0 +1,26 @@
[package]
name = "chainfire-watch"
version.workspace = true
edition.workspace = true
license.workspace = true
rust-version.workspace = true
description = "Watch/notification system for Chainfire distributed KVS"
[dependencies]
chainfire-types = { workspace = true }
# Async
tokio = { workspace = true }
tokio-stream = { workspace = true }
futures = { workspace = true }
# Utilities
tracing = { workspace = true }
parking_lot = { workspace = true }
dashmap = { workspace = true }
[dev-dependencies]
tokio = { workspace = true, features = ["rt-multi-thread", "macros"] }
[lints]
workspace = true


@ -0,0 +1,25 @@
//! Watch/notification system for Chainfire distributed KVS
//!
//! This crate provides:
//! - Watch subscription registry
//! - Prefix/key matching
//! - Event dispatch to subscribers
//! - Watch stream management
pub mod matcher;
pub mod registry;
pub mod stream;
pub use matcher::KeyMatcher;
pub use registry::WatchRegistry;
pub use stream::WatchStream;
use std::sync::atomic::{AtomicI64, Ordering};
/// Global watch ID counter
static WATCH_ID_COUNTER: AtomicI64 = AtomicI64::new(1);
/// Generate a new unique watch ID
pub fn next_watch_id() -> i64 {
WATCH_ID_COUNTER.fetch_add(1, Ordering::SeqCst)
}


@ -0,0 +1,150 @@
//! Key matching utilities for watch subscriptions
/// Key matcher for watch subscriptions
#[derive(Debug, Clone)]
pub struct KeyMatcher {
/// Start key
key: Vec<u8>,
/// Range end (exclusive). None = single key match
range_end: Option<Vec<u8>>,
}
impl KeyMatcher {
/// Create a matcher for a single key
pub fn key(key: impl Into<Vec<u8>>) -> Self {
Self {
key: key.into(),
range_end: None,
}
}
/// Create a matcher for a key range
pub fn range(key: impl Into<Vec<u8>>, range_end: impl Into<Vec<u8>>) -> Self {
Self {
key: key.into(),
range_end: Some(range_end.into()),
}
}
/// Create a matcher for all keys with a given prefix
pub fn prefix(prefix: impl Into<Vec<u8>>) -> Self {
let prefix = prefix.into();
let range_end = prefix_end(&prefix);
Self {
key: prefix,
range_end: Some(range_end),
}
}
/// Create a matcher for all keys
pub fn all() -> Self {
Self {
key: vec![0],
range_end: Some(vec![]),
}
}
/// Check if a key matches this matcher
pub fn matches(&self, target: &[u8]) -> bool {
match &self.range_end {
None => self.key == target,
Some(end) => {
if end.is_empty() {
// Empty end means all keys >= start
target >= self.key.as_slice()
} else {
target >= self.key.as_slice() && target < end.as_slice()
}
}
}
}
/// Get the start key
pub fn start_key(&self) -> &[u8] {
&self.key
}
/// Get the range end
pub fn range_end(&self) -> Option<&[u8]> {
self.range_end.as_deref()
}
/// Check if this is a single key match
pub fn is_single_key(&self) -> bool {
self.range_end.is_none()
}
/// Check if this is a prefix match
pub fn is_prefix(&self) -> bool {
self.range_end.is_some()
}
}
/// Calculate the end key for a prefix scan
/// For prefix "abc", returns "abd" (increment last byte)
fn prefix_end(prefix: &[u8]) -> Vec<u8> {
let mut end = prefix.to_vec();
for i in (0..end.len()).rev() {
if end[i] < 0xff {
end[i] += 1;
end.truncate(i + 1);
return end;
}
}
// All bytes are 0xff, return empty to indicate no upper bound
Vec::new()
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_single_key_match() {
let matcher = KeyMatcher::key(b"/nodes/1");
assert!(matcher.matches(b"/nodes/1"));
assert!(!matcher.matches(b"/nodes/2"));
assert!(!matcher.matches(b"/nodes/10"));
assert!(!matcher.matches(b"/nodes"));
}
#[test]
fn test_prefix_match() {
let matcher = KeyMatcher::prefix(b"/nodes/");
assert!(matcher.matches(b"/nodes/1"));
assert!(matcher.matches(b"/nodes/abc"));
assert!(matcher.matches(b"/nodes/"));
assert!(!matcher.matches(b"/nodes"));
assert!(!matcher.matches(b"/tasks/1"));
}
#[test]
fn test_range_match() {
let matcher = KeyMatcher::range(b"a", b"d");
assert!(matcher.matches(b"a"));
assert!(matcher.matches(b"b"));
assert!(matcher.matches(b"c"));
assert!(!matcher.matches(b"d")); // End is exclusive
assert!(!matcher.matches(b"e"));
}
#[test]
fn test_all_match() {
let matcher = KeyMatcher::all();
assert!(matcher.matches(b"any"));
assert!(matcher.matches(b"/path/to/key"));
assert!(matcher.matches(b"\xff\xff\xff"));
}
#[test]
fn test_prefix_end() {
assert_eq!(prefix_end(b"abc"), b"abd");
assert_eq!(prefix_end(b"ab\xff"), b"ac");
assert_eq!(prefix_end(b"a\xff\xff"), b"b");
assert_eq!(prefix_end(b"\xff\xff"), Vec::<u8>::new());
}
}


@ -0,0 +1,353 @@
//! Watch subscription registry
use crate::matcher::KeyMatcher;
use crate::next_watch_id;
use chainfire_types::watch::{WatchEvent, WatchRequest, WatchResponse};
use chainfire_types::Revision;
use dashmap::DashMap;
use parking_lot::RwLock;
use std::collections::{BTreeMap, HashSet};
use tokio::sync::mpsc;
use tracing::{debug, trace, warn};
/// A registered watch subscription
struct WatchSubscription {
watch_id: i64,
matcher: KeyMatcher,
prev_kv: bool,
created_revision: Revision,
sender: mpsc::Sender<WatchResponse>,
}
/// Registry for all active watch subscriptions
pub struct WatchRegistry {
/// Map of watch_id -> subscription
watches: DashMap<i64, WatchSubscription>,
/// Index: key prefix -> watch_ids for efficient dispatch
/// Uses BTreeMap for prefix range queries
prefix_index: RwLock<BTreeMap<Vec<u8>, HashSet<i64>>>,
/// Current revision for progress notifications
current_revision: RwLock<Revision>,
}
impl WatchRegistry {
/// Create a new watch registry
pub fn new() -> Self {
Self {
watches: DashMap::new(),
prefix_index: RwLock::new(BTreeMap::new()),
current_revision: RwLock::new(0),
}
}
/// Update current revision
pub fn set_revision(&self, revision: Revision) {
*self.current_revision.write() = revision;
}
/// Get current revision
pub fn current_revision(&self) -> Revision {
*self.current_revision.read()
}
/// Create a new watch subscription
pub fn create_watch(
&self,
req: WatchRequest,
sender: mpsc::Sender<WatchResponse>,
) -> i64 {
let watch_id = if req.watch_id != 0 {
req.watch_id
} else {
next_watch_id()
};
let matcher = if let Some(ref end) = req.range_end {
KeyMatcher::range(req.key.clone(), end.clone())
} else {
KeyMatcher::key(req.key.clone())
};
let subscription = WatchSubscription {
watch_id,
matcher,
prev_kv: req.prev_kv,
created_revision: req.start_revision.unwrap_or_else(|| self.current_revision()),
sender,
};
// Add to watches
self.watches.insert(watch_id, subscription);
// Add to prefix index
{
let mut index = self.prefix_index.write();
index
.entry(req.key.clone())
.or_insert_with(HashSet::new)
.insert(watch_id);
}
debug!(watch_id, key = ?String::from_utf8_lossy(&req.key), "Created watch");
watch_id
}
/// Cancel a watch
pub fn cancel_watch(&self, watch_id: i64) -> bool {
if let Some((_, sub)) = self.watches.remove(&watch_id) {
// Remove from prefix index
let mut index = self.prefix_index.write();
if let Some(ids) = index.get_mut(sub.matcher.start_key()) {
ids.remove(&watch_id);
if ids.is_empty() {
index.remove(sub.matcher.start_key());
}
}
debug!(watch_id, "Canceled watch");
true
} else {
false
}
}
/// Get watch count
pub fn watch_count(&self) -> usize {
self.watches.len()
}
/// Dispatch an event to matching watches
pub async fn dispatch_event(&self, event: WatchEvent) {
let key = &event.kv.key;
let revision = event.kv.mod_revision;
// Update current revision
{
let mut current = self.current_revision.write();
if revision > *current {
*current = revision;
}
}
// Find all matching watches
let matching_ids = self.find_matching_watches(key);
trace!(
key = ?String::from_utf8_lossy(key),
matches = matching_ids.len(),
"Dispatching event"
);
for watch_id in matching_ids {
if let Some(sub) = self.watches.get(&watch_id) {
// Check if event revision is after watch creation
if revision > sub.created_revision {
let response = WatchResponse::events(
watch_id,
vec![if sub.prev_kv {
event.clone()
} else {
WatchEvent {
event_type: event.event_type,
kv: event.kv.clone(),
prev_kv: None,
}
}],
);
// Non-blocking send
if sub.sender.try_send(response).is_err() {
warn!(watch_id, "Watch channel full or closed");
}
}
}
}
}
/// Find watches that match a key
fn find_matching_watches(&self, key: &[u8]) -> Vec<i64> {
let mut result = Vec::new();
// Linear scan over every subscription; the prefix_index maintained
// in create_watch could be used to narrow candidates if this becomes hot
for entry in self.watches.iter() {
if entry.matcher.matches(key) {
result.push(*entry.key());
}
}
result
}
/// Send progress notification to all watches
pub async fn send_progress(&self) {
let revision = self.current_revision();
for entry in self.watches.iter() {
let response = WatchResponse::progress(entry.watch_id, revision);
if entry.sender.try_send(response).is_err() {
trace!(watch_id = entry.watch_id, "Progress notification dropped");
}
}
}
/// Remove watches with closed channels
pub fn cleanup_closed(&self) {
let closed_ids: Vec<i64> = self
.watches
.iter()
.filter(|entry| entry.sender.is_closed())
.map(|entry| *entry.key())
.collect();
for id in closed_ids {
self.cancel_watch(id);
}
}
}
impl Default for WatchRegistry {
fn default() -> Self {
Self::new()
}
}
#[cfg(test)]
mod tests {
use super::*;
use chainfire_types::kv::KvEntry;
use chainfire_types::watch::WatchEventType;
fn create_test_event(key: &[u8], value: &[u8], revision: u64) -> WatchEvent {
WatchEvent {
event_type: WatchEventType::Put,
kv: KvEntry::new(key.to_vec(), value.to_vec(), revision),
prev_kv: None,
}
}
#[tokio::test]
async fn test_create_and_cancel_watch() {
let registry = WatchRegistry::new();
let (tx, _rx) = mpsc::channel(10);
let req = WatchRequest::key(1, b"/test/key");
let watch_id = registry.create_watch(req, tx);
assert_eq!(watch_id, 1);
assert_eq!(registry.watch_count(), 1);
assert!(registry.cancel_watch(watch_id));
assert_eq!(registry.watch_count(), 0);
}
#[tokio::test]
async fn test_dispatch_to_single_key_watch() {
let registry = WatchRegistry::new();
let (tx, mut rx) = mpsc::channel(10);
let req = WatchRequest::key(1, b"/test/key");
registry.create_watch(req, tx);
// Dispatch matching event
let event = create_test_event(b"/test/key", b"value", 1);
registry.dispatch_event(event).await;
// Should receive event
let response = rx.try_recv().unwrap();
assert_eq!(response.watch_id, 1);
assert_eq!(response.events.len(), 1);
assert_eq!(response.events[0].kv.key, b"/test/key");
}
#[tokio::test]
async fn test_dispatch_to_prefix_watch() {
let registry = WatchRegistry::new();
let (tx, mut rx) = mpsc::channel(10);
let req = WatchRequest::prefix(1, b"/nodes/");
registry.create_watch(req, tx);
// Dispatch matching events
registry
.dispatch_event(create_test_event(b"/nodes/1", b"data1", 1))
.await;
registry
.dispatch_event(create_test_event(b"/nodes/2", b"data2", 2))
.await;
registry
.dispatch_event(create_test_event(b"/tasks/1", b"other", 3))
.await;
// Should receive 2 events (not /tasks/1)
let resp1 = rx.try_recv().unwrap();
let resp2 = rx.try_recv().unwrap();
assert!(rx.try_recv().is_err());
assert_eq!(resp1.events[0].kv.key, b"/nodes/1");
assert_eq!(resp2.events[0].kv.key, b"/nodes/2");
}
#[tokio::test]
async fn test_revision_filtering() {
let registry = WatchRegistry::new();
registry.set_revision(5);
let (tx, mut rx) = mpsc::channel(10);
// Watch starting from revision 10
let req = WatchRequest::key(1, b"/key").from_revision(10);
registry.create_watch(req, tx);
// Event at revision 8 (before watch start)
registry
.dispatch_event(create_test_event(b"/key", b"old", 8))
.await;
// Event at revision 12 (after watch start)
registry
.dispatch_event(create_test_event(b"/key", b"new", 12))
.await;
// Should only receive the second event
let response = rx.try_recv().unwrap();
assert_eq!(response.events[0].kv.mod_revision, 12);
assert!(rx.try_recv().is_err());
}
#[tokio::test]
async fn test_multiple_watches() {
let registry = WatchRegistry::new();
let (tx1, mut rx1) = mpsc::channel(10);
let (tx2, mut rx2) = mpsc::channel(10);
registry.create_watch(WatchRequest::prefix(1, b"/a/"), tx1);
registry.create_watch(WatchRequest::prefix(2, b"/a/b/"), tx2);
// Event matching both watches
registry
.dispatch_event(create_test_event(b"/a/b/c", b"value", 1))
.await;
// Both should receive the event
assert!(rx1.try_recv().is_ok());
assert!(rx2.try_recv().is_ok());
}
#[tokio::test]
async fn test_cleanup_closed() {
let registry = WatchRegistry::new();
let (tx, rx) = mpsc::channel(10);
registry.create_watch(WatchRequest::key(1, b"/test"), tx);
assert_eq!(registry.watch_count(), 1);
// Drop the receiver to close the channel
drop(rx);
// Cleanup should remove the watch
registry.cleanup_closed();
assert_eq!(registry.watch_count(), 0);
}
}
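The `KeyMatcher` used above distinguishes single-key watches from prefix/range watches. A minimal sketch of the etcd-style prefix-to-range conversion such a matcher typically relies on; the `Matcher` type and `prefix_range_end` here are illustrative stand-ins, not the crate's actual API:

```rust
/// Exclusive upper bound for a prefix scan: the prefix with its last
/// non-0xFF byte incremented (trailing 0xFF bytes are dropped). An empty
/// result means "no upper bound".
fn prefix_range_end(prefix: &[u8]) -> Vec<u8> {
    let mut end = prefix.to_vec();
    while let Some(last) = end.pop() {
        if last < 0xFF {
            end.push(last + 1);
            return end;
        }
    }
    Vec::new()
}

struct Matcher {
    start: Vec<u8>,
    end: Option<Vec<u8>>, // None = exact single-key match
}

impl Matcher {
    fn key(k: &[u8]) -> Self {
        Self { start: k.to_vec(), end: None }
    }
    fn prefix(p: &[u8]) -> Self {
        Self { start: p.to_vec(), end: Some(prefix_range_end(p)) }
    }
    fn matches(&self, key: &[u8]) -> bool {
        match &self.end {
            None => key == self.start.as_slice(),
            Some(end) if end.is_empty() => key >= self.start.as_slice(),
            Some(end) => key >= self.start.as_slice() && key < end.as_slice(),
        }
    }
}

fn main() {
    let m = Matcher::prefix(b"/nodes/");
    assert!(m.matches(b"/nodes/1"));
    assert!(!m.matches(b"/tasks/1"));
    // '/' is 0x2F, so the range end for "/nodes/" is "/nodes0"
    assert_eq!(prefix_range_end(b"/nodes/"), b"/nodes0".to_vec());
    println!("ok");
}
```

This mirrors why the `test_dispatch_to_prefix_watch` test above sees `/nodes/1` and `/nodes/2` but not `/tasks/1`.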


@ -0,0 +1,190 @@
//! Watch stream management
use crate::WatchRegistry;
use chainfire_types::watch::{WatchRequest, WatchResponse};
use std::collections::HashSet;
use std::sync::Arc;
use tokio::sync::mpsc;
use tracing::{debug, trace};
/// Manages watch subscriptions for a single client stream
pub struct WatchStream {
/// Reference to the global registry
registry: Arc<WatchRegistry>,
/// Watch IDs owned by this stream
active_watches: HashSet<i64>,
/// Channel for sending events to the client
event_tx: mpsc::Sender<WatchResponse>,
}
impl WatchStream {
/// Create a new watch stream
pub fn new(registry: Arc<WatchRegistry>, event_tx: mpsc::Sender<WatchResponse>) -> Self {
Self {
registry,
active_watches: HashSet::new(),
event_tx,
}
}
/// Handle a create watch request
pub fn create_watch(&mut self, req: WatchRequest) -> WatchResponse {
let watch_id = self.registry.create_watch(req, self.event_tx.clone());
self.active_watches.insert(watch_id);
debug!(watch_id, "Stream created watch");
WatchResponse::created(watch_id)
}
/// Handle a cancel watch request
pub fn cancel_watch(&mut self, watch_id: i64) -> WatchResponse {
let canceled = if self.active_watches.remove(&watch_id) {
self.registry.cancel_watch(watch_id)
} else {
false
};
debug!(watch_id, canceled, "Stream canceled watch");
WatchResponse::canceled(watch_id)
}
/// Get the number of active watches in this stream
pub fn watch_count(&self) -> usize {
self.active_watches.len()
}
/// Get active watch IDs
pub fn watch_ids(&self) -> impl Iterator<Item = i64> + '_ {
self.active_watches.iter().copied()
}
}
impl Drop for WatchStream {
fn drop(&mut self) {
// Clean up all watches when stream closes
for watch_id in self.active_watches.drain() {
self.registry.cancel_watch(watch_id);
trace!(watch_id, "Cleaned up watch on stream close");
}
}
}
/// Handle for spawning watch event processor
pub struct WatchEventHandler {
registry: Arc<WatchRegistry>,
}
impl WatchEventHandler {
/// Create a new event handler
pub fn new(registry: Arc<WatchRegistry>) -> Self {
Self { registry }
}
/// Spawn a background task that processes watch events
pub fn spawn_dispatcher(
self,
mut event_rx: mpsc::UnboundedReceiver<chainfire_types::watch::WatchEvent>,
) -> tokio::task::JoinHandle<()> {
tokio::spawn(async move {
while let Some(event) = event_rx.recv().await {
self.registry.dispatch_event(event).await;
}
debug!("Watch event dispatcher stopped");
})
}
/// Spawn a background task for progress notifications
pub fn spawn_progress_notifier(
registry: Arc<WatchRegistry>,
interval: std::time::Duration,
) -> tokio::task::JoinHandle<()> {
tokio::spawn(async move {
let mut ticker = tokio::time::interval(interval);
loop {
ticker.tick().await;
registry.send_progress().await;
}
})
}
}
#[cfg(test)]
mod tests {
use super::*;
use chainfire_types::kv::KvEntry;
use chainfire_types::watch::{WatchEvent, WatchEventType};
#[tokio::test]
async fn test_watch_stream_lifecycle() {
let registry = Arc::new(WatchRegistry::new());
let (tx, mut rx) = mpsc::channel(10);
let mut stream = WatchStream::new(Arc::clone(&registry), tx);
// Create watch
let req = WatchRequest::key(0, b"/test");
let response = stream.create_watch(req);
assert!(response.created);
let watch_id = response.watch_id;
assert_eq!(stream.watch_count(), 1);
assert_eq!(registry.watch_count(), 1);
// Cancel watch
let response = stream.cancel_watch(watch_id);
assert!(response.canceled);
assert_eq!(stream.watch_count(), 0);
assert_eq!(registry.watch_count(), 0);
}
#[tokio::test]
async fn test_watch_stream_cleanup_on_drop() {
let registry = Arc::new(WatchRegistry::new());
let (tx, _rx) = mpsc::channel(10);
{
let mut stream = WatchStream::new(Arc::clone(&registry), tx);
stream.create_watch(WatchRequest::key(0, b"/a"));
stream.create_watch(WatchRequest::key(0, b"/b"));
stream.create_watch(WatchRequest::key(0, b"/c"));
assert_eq!(registry.watch_count(), 3);
}
// Stream dropped here
// Registry should be cleaned up
assert_eq!(registry.watch_count(), 0);
}
#[tokio::test]
async fn test_event_handler() {
let registry = Arc::new(WatchRegistry::new());
let (event_tx, event_rx) = mpsc::unbounded_channel();
let (watch_tx, mut watch_rx) = mpsc::channel(10);
// Create a watch
let req = WatchRequest::key(1, b"/test");
registry.create_watch(req, watch_tx);
// Start event handler
let handler = WatchEventHandler::new(Arc::clone(&registry));
let handle = handler.spawn_dispatcher(event_rx);
// Send an event
event_tx
.send(WatchEvent {
event_type: WatchEventType::Put,
kv: KvEntry::new(b"/test".to_vec(), b"value".to_vec(), 1),
prev_kv: None,
})
.unwrap();
// Should receive the event
let response = watch_rx.recv().await.unwrap();
assert_eq!(response.events.len(), 1);
// Cleanup
drop(event_tx);
handle.await.unwrap();
}
}
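The `Drop` impl on `WatchStream` is what guarantees registry cleanup when a client disconnects without canceling. The pattern in isolation, with hypothetical `Registry`/`Stream` types standing in for the real ones:

```rust
use std::cell::RefCell;
use std::collections::HashSet;
use std::rc::Rc;

struct Registry {
    active: RefCell<HashSet<i64>>,
}

struct Stream {
    registry: Rc<Registry>,
    owned: HashSet<i64>,
}

impl Stream {
    fn watch(&mut self, id: i64) {
        self.registry.active.borrow_mut().insert(id);
        self.owned.insert(id);
    }
}

impl Drop for Stream {
    fn drop(&mut self) {
        // Deregister everything this stream owns when it goes out of scope
        for id in self.owned.drain() {
            self.registry.active.borrow_mut().remove(&id);
        }
    }
}

fn main() {
    let registry = Rc::new(Registry { active: RefCell::new(HashSet::new()) });
    {
        let mut s = Stream { registry: Rc::clone(&registry), owned: HashSet::new() };
        s.watch(1);
        s.watch(2);
        assert_eq!(registry.active.borrow().len(), 2);
    } // stream dropped here, watches cleaned up
    assert_eq!(registry.active.borrow().len(), 0);
    println!("ok");
}
```

The same RAII shape is what `test_watch_stream_cleanup_on_drop` above exercises with the real registry.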

chainfire/flake.lock generated Normal file

@ -0,0 +1,96 @@
{
"nodes": {
"flake-utils": {
"inputs": {
"systems": "systems"
},
"locked": {
"lastModified": 1731533236,
"narHash": "sha256-l0KFg5HjrsfsO/JpG+r7fRrqm12kzFHyUHqHCVpMMbI=",
"owner": "numtide",
"repo": "flake-utils",
"rev": "11707dc2f618dd54ca8739b309ec4fc024de578b",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "flake-utils",
"type": "github"
}
},
"nixpkgs": {
"locked": {
"lastModified": 1764517877,
"narHash": "sha256-pp3uT4hHijIC8JUK5MEqeAWmParJrgBVzHLNfJDZxg4=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "2d293cbfa5a793b4c50d17c05ef9e385b90edf6c",
"type": "github"
},
"original": {
"owner": "NixOS",
"ref": "nixos-unstable",
"repo": "nixpkgs",
"type": "github"
}
},
"nixpkgs_2": {
"locked": {
"lastModified": 1744536153,
"narHash": "sha256-awS2zRgF4uTwrOKwwiJcByDzDOdo3Q1rPZbiHQg/N38=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "18dd725c29603f582cf1900e0d25f9f1063dbf11",
"type": "github"
},
"original": {
"owner": "NixOS",
"ref": "nixpkgs-unstable",
"repo": "nixpkgs",
"type": "github"
}
},
"root": {
"inputs": {
"flake-utils": "flake-utils",
"nixpkgs": "nixpkgs",
"rust-overlay": "rust-overlay"
}
},
"rust-overlay": {
"inputs": {
"nixpkgs": "nixpkgs_2"
},
"locked": {
"lastModified": 1764729618,
"narHash": "sha256-z4RA80HCWv2los1KD346c+PwNPzMl79qgl7bCVgz8X0=",
"owner": "oxalica",
"repo": "rust-overlay",
"rev": "52764074a85145d5001bf0aa30cb71936e9ad5b8",
"type": "github"
},
"original": {
"owner": "oxalica",
"repo": "rust-overlay",
"type": "github"
}
},
"systems": {
"locked": {
"lastModified": 1681028828,
"narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
"owner": "nix-systems",
"repo": "default",
"rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
"type": "github"
},
"original": {
"owner": "nix-systems",
"repo": "default",
"type": "github"
}
}
},
"root": "root",
"version": 7
}

chainfire/flake.nix Normal file

@ -0,0 +1,79 @@
{
description = "Chainfire - Distributed Key-Value Store with Raft and Gossip";
inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
rust-overlay.url = "github:oxalica/rust-overlay";
flake-utils.url = "github:numtide/flake-utils";
};
outputs = { self, nixpkgs, rust-overlay, flake-utils }:
flake-utils.lib.eachDefaultSystem (system:
let
overlays = [ (import rust-overlay) ];
pkgs = import nixpkgs {
inherit system overlays;
};
rustToolchain = pkgs.rust-bin.stable.latest.default.override {
extensions = [ "rust-src" "rust-analyzer" ];
};
nativeBuildInputs = with pkgs; [
rustToolchain
pkg-config
protobuf
cmake
];
buildInputs = with pkgs; [
# For RocksDB bindgen
llvmPackages.libclang
llvmPackages.clang
# RocksDB build dependencies (let cargo build rocksdb from source)
snappy
lz4
zstd
zlib
bzip2
# OpenSSL for potential TLS support
openssl
];
# Environment variables for build
shellHook = ''
export LIBCLANG_PATH="${pkgs.llvmPackages.libclang.lib}/lib"
export PROTOC="${pkgs.protobuf}/bin/protoc"
'';
in
{
devShells.default = pkgs.mkShell {
inherit nativeBuildInputs buildInputs shellHook;
LIBCLANG_PATH = "${pkgs.llvmPackages.libclang.lib}/lib";
PROTOC = "${pkgs.protobuf}/bin/protoc";
};
packages.default = pkgs.rustPlatform.buildRustPackage {
pname = "chainfire";
version = "0.1.0";
src = ./.;
cargoLock = {
lockFile = ./Cargo.lock;
};
inherit nativeBuildInputs buildInputs;
LIBCLANG_PATH = "${pkgs.llvmPackages.libclang.lib}/lib";
PROTOC = "${pkgs.protobuf}/bin/protoc";
# Skip tests during nix build (run separately)
doCheck = false;
};
}
);
}


@ -0,0 +1,414 @@
syntax = "proto3";
package chainfire.v1;
// Key-Value service
service KV {
// Range gets the keys in the range from the key-value store
rpc Range(RangeRequest) returns (RangeResponse);
// Put puts the given key into the key-value store
rpc Put(PutRequest) returns (PutResponse);
// Delete deletes the given range from the key-value store
rpc Delete(DeleteRangeRequest) returns (DeleteRangeResponse);
// Txn processes multiple requests in a single transaction
rpc Txn(TxnRequest) returns (TxnResponse);
}
// Watch service
service Watch {
// Watch watches for events happening or that have happened
rpc Watch(stream WatchRequest) returns (stream WatchResponse);
}
// Cluster management service
service Cluster {
// MemberAdd adds a member into the cluster
rpc MemberAdd(MemberAddRequest) returns (MemberAddResponse);
// MemberRemove removes an existing member from the cluster
rpc MemberRemove(MemberRemoveRequest) returns (MemberRemoveResponse);
// MemberList lists all the members in the cluster
rpc MemberList(MemberListRequest) returns (MemberListResponse);
// Status gets the status of the cluster
rpc Status(StatusRequest) returns (StatusResponse);
}
// Lease service for TTL-based key expiration
service Lease {
// LeaseGrant creates a new lease with a given TTL
rpc LeaseGrant(LeaseGrantRequest) returns (LeaseGrantResponse);
// LeaseRevoke revokes a lease, deleting all keys attached to it
rpc LeaseRevoke(LeaseRevokeRequest) returns (LeaseRevokeResponse);
// LeaseKeepAlive keeps a lease alive by refreshing its TTL
rpc LeaseKeepAlive(stream LeaseKeepAliveRequest) returns (stream LeaseKeepAliveResponse);
// LeaseTimeToLive retrieves lease information
rpc LeaseTimeToLive(LeaseTimeToLiveRequest) returns (LeaseTimeToLiveResponse);
// LeaseLeases lists all existing leases
rpc LeaseLeases(LeaseLeasesRequest) returns (LeaseLeasesResponse);
}
// Response header included in all responses
message ResponseHeader {
// cluster_id is the ID of the cluster
uint64 cluster_id = 1;
// member_id is the ID of the responding member
uint64 member_id = 2;
// revision is the key-value store revision
int64 revision = 3;
// raft_term is the current Raft term
uint64 raft_term = 4;
}
// Key-value pair
message KeyValue {
// key is the key in bytes
bytes key = 1;
// create_revision is the revision of last creation
int64 create_revision = 2;
// mod_revision is the revision of last modification
int64 mod_revision = 3;
// version is the version of the key
int64 version = 4;
// value is the value held by the key
bytes value = 5;
// lease is the ID of the lease attached to the key
int64 lease = 6;
}
// ========== Range ==========
message RangeRequest {
// key is the first key for the range
bytes key = 1;
// range_end is the upper bound on the requested range
bytes range_end = 2;
// limit is a limit on the number of keys returned
int64 limit = 3;
// revision is the point-in-time of the store to use
int64 revision = 4;
// keys_only when set returns only the keys and not the values
bool keys_only = 5;
// count_only when set returns only the count of the keys
bool count_only = 6;
// serializable sets the range request to use serializable (local) reads.
// When true, reads from local state (faster, but may be stale).
// When false (default), uses linearizable reads through Raft (consistent).
bool serializable = 7;
}
message RangeResponse {
ResponseHeader header = 1;
// kvs is the list of key-value pairs matched by the range request
repeated KeyValue kvs = 2;
// more indicates if there are more keys to return
bool more = 3;
// count is set to the number of keys within the range
int64 count = 4;
}
// ========== Put ==========
message PutRequest {
// key is the key to put
bytes key = 1;
// value is the value to put
bytes value = 2;
// lease is the lease ID to attach to the key
int64 lease = 3;
// prev_kv when set returns the previous key-value pair
bool prev_kv = 4;
}
message PutResponse {
ResponseHeader header = 1;
// prev_kv is the key-value pair before the put
KeyValue prev_kv = 2;
}
// ========== Delete ==========
message DeleteRangeRequest {
// key is the first key to delete
bytes key = 1;
// range_end is the key following the last key to delete
bytes range_end = 2;
// prev_kv when set returns deleted key-value pairs
bool prev_kv = 3;
}
message DeleteRangeResponse {
ResponseHeader header = 1;
// deleted is the number of keys deleted
int64 deleted = 2;
// prev_kvs holds the deleted key-value pairs
repeated KeyValue prev_kvs = 3;
}
// ========== Transaction ==========
message TxnRequest {
// compare is a list of predicates
repeated Compare compare = 1;
// success is a list of operations to apply if all comparisons succeed
repeated RequestOp success = 2;
// failure is a list of operations to apply if any comparison fails
repeated RequestOp failure = 3;
}
message TxnResponse {
ResponseHeader header = 1;
// succeeded is set to true if all comparisons evaluated to true
bool succeeded = 2;
// responses is a list of responses corresponding to the results
repeated ResponseOp responses = 3;
}
message Compare {
enum CompareResult {
EQUAL = 0;
GREATER = 1;
LESS = 2;
NOT_EQUAL = 3;
}
enum CompareTarget {
VERSION = 0;
CREATE = 1;
MOD = 2;
VALUE = 3;
}
CompareResult result = 1;
CompareTarget target = 2;
bytes key = 3;
oneof target_union {
int64 version = 4;
int64 create_revision = 5;
int64 mod_revision = 6;
bytes value = 7;
}
}
message RequestOp {
oneof request {
RangeRequest request_range = 1;
PutRequest request_put = 2;
DeleteRangeRequest request_delete_range = 3;
}
}
message ResponseOp {
oneof response {
RangeResponse response_range = 1;
PutResponse response_put = 2;
DeleteRangeResponse response_delete_range = 3;
}
}
// ========== Watch ==========
message WatchRequest {
oneof request_union {
WatchCreateRequest create_request = 1;
WatchCancelRequest cancel_request = 2;
WatchProgressRequest progress_request = 3;
}
}
message WatchCreateRequest {
// key is the key to watch
bytes key = 1;
// range_end is the end of the range to watch
bytes range_end = 2;
// start_revision is an optional revision to start watching from
int64 start_revision = 3;
// progress_notify is set to true to enable progress notifications
bool progress_notify = 4;
// prev_kv when set includes previous key-value in events
bool prev_kv = 5;
// watch_id is the user-provided watch ID (0 for server-assigned)
int64 watch_id = 6;
}
message WatchCancelRequest {
// watch_id is the watch ID to cancel
int64 watch_id = 1;
}
message WatchProgressRequest {}
message WatchResponse {
ResponseHeader header = 1;
// watch_id is the watch ID for this response
int64 watch_id = 2;
// created is set to true if this response is for a create request
bool created = 3;
// canceled is set to true if the watch was canceled
bool canceled = 4;
// compact_revision is the minimum revision the watcher may receive
int64 compact_revision = 5;
// cancel_reason indicates the reason for cancellation
string cancel_reason = 6;
// events is the list of events in this response
repeated Event events = 11;
}
message Event {
enum EventType {
PUT = 0;
DELETE = 1;
}
// type is the kind of event
EventType type = 1;
// kv is the KeyValue affected by the event
KeyValue kv = 2;
// prev_kv is the KeyValue prior to the event
KeyValue prev_kv = 3;
}
// ========== Cluster Management ==========
message Member {
// ID is the member ID
uint64 id = 1;
// name is the human-readable name
string name = 2;
// peer_urls are URLs for Raft communication
repeated string peer_urls = 3;
// client_urls are URLs for client communication
repeated string client_urls = 4;
// is_learner indicates if member is a learner
bool is_learner = 5;
}
message MemberAddRequest {
// peer_urls are the URLs to reach the new member
repeated string peer_urls = 1;
// is_learner indicates if the member is a learner
bool is_learner = 2;
}
message MemberAddResponse {
ResponseHeader header = 1;
// member is the member information for the added member
Member member = 2;
// members is the list of all members after adding
repeated Member members = 3;
}
message MemberRemoveRequest {
// ID is the member ID to remove
uint64 id = 1;
}
message MemberRemoveResponse {
ResponseHeader header = 1;
// members is the list of all members after removing
repeated Member members = 2;
}
message MemberListRequest {}
message MemberListResponse {
ResponseHeader header = 1;
// members is the list of all members
repeated Member members = 2;
}
message StatusRequest {}
message StatusResponse {
ResponseHeader header = 1;
// version is the version of the server
string version = 2;
// db_size is the size of the database
int64 db_size = 3;
// leader is the member ID of the current leader
uint64 leader = 4;
// raft_index is the current Raft committed index
uint64 raft_index = 5;
// raft_term is the current Raft term
uint64 raft_term = 6;
// raft_applied_index is the current Raft applied index
uint64 raft_applied_index = 7;
}
// ========== Lease ==========
message LeaseGrantRequest {
// TTL is the advisory time-to-live in seconds
int64 ttl = 1;
// ID is the requested lease ID. If 0, the server will choose an ID.
int64 id = 2;
}
message LeaseGrantResponse {
ResponseHeader header = 1;
// ID is the lease ID for the granted lease
int64 id = 2;
// TTL is the actual TTL granted by the server
int64 ttl = 3;
// error is any error that occurred
string error = 4;
}
message LeaseRevokeRequest {
// ID is the lease ID to revoke
int64 id = 1;
}
message LeaseRevokeResponse {
ResponseHeader header = 1;
}
message LeaseKeepAliveRequest {
// ID is the lease ID to keep alive
int64 id = 1;
}
message LeaseKeepAliveResponse {
ResponseHeader header = 1;
// ID is the lease ID from the keep-alive request
int64 id = 2;
// TTL is the new TTL for the lease
int64 ttl = 3;
}
message LeaseTimeToLiveRequest {
// ID is the lease ID to query
int64 id = 1;
// keys is true to query all keys attached to this lease
bool keys = 2;
}
message LeaseTimeToLiveResponse {
ResponseHeader header = 1;
// ID is the lease ID
int64 id = 2;
// TTL is the remaining TTL in seconds; -1 if lease doesn't exist
int64 ttl = 3;
// grantedTTL is the initial TTL granted
int64 granted_ttl = 4;
// keys is the list of keys attached to this lease
repeated bytes keys = 5;
}
message LeaseLeasesRequest {}
message LeaseLeasesResponse {
ResponseHeader header = 1;
// leases is the list of all leases
repeated LeaseStatus leases = 2;
}
message LeaseStatus {
// ID is the lease ID
int64 id = 1;
}
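A `Txn` takes the `success` branch only when every `Compare` predicate evaluates to true. A hedged sketch of evaluating the `CompareResult` arm against an integer target (names here are illustrative, not the generated prost types):

```rust
#[derive(Clone, Copy)]
enum CompareResult {
    Equal,
    Greater,
    Less,
    NotEqual,
}

/// Evaluate one comparison of an observed value against a target,
/// as a txn would for VERSION/CREATE/MOD targets.
fn compare_i64(result: CompareResult, actual: i64, target: i64) -> bool {
    match result {
        CompareResult::Equal => actual == target,
        CompareResult::Greater => actual > target,
        CompareResult::Less => actual < target,
        CompareResult::NotEqual => actual != target,
    }
}

fn main() {
    // A "mod_revision == 5" guard for an optimistic update
    assert!(compare_i64(CompareResult::Equal, 5, 5));
    assert!(!compare_i64(CompareResult::Equal, 6, 5));
    // The success branch runs only if *all* compares hold
    let compares = [
        (CompareResult::Greater, 7, 5),
        (CompareResult::NotEqual, 1, 2),
    ];
    let succeeded = compares.iter().all(|&(r, a, t)| compare_i64(r, a, t));
    assert!(succeeded);
    println!("ok");
}
```

This is the compare-and-swap building block: a client reads a key, remembers its `mod_revision`, and issues a txn whose compare pins that revision so concurrent writers lose cleanly.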


@ -0,0 +1,93 @@
syntax = "proto3";
package chainfire.internal;
// Internal Raft RPC service for node-to-node communication
service RaftService {
// Vote requests a vote from a peer
rpc Vote(VoteRequest) returns (VoteResponse);
// AppendEntries sends log entries to followers
rpc AppendEntries(AppendEntriesRequest) returns (AppendEntriesResponse);
// InstallSnapshot sends a snapshot to a follower
rpc InstallSnapshot(stream InstallSnapshotRequest) returns (InstallSnapshotResponse);
}
message VoteRequest {
// term is the candidate's term
uint64 term = 1;
// candidate_id is the candidate requesting the vote
uint64 candidate_id = 2;
// last_log_index is index of candidate's last log entry
uint64 last_log_index = 3;
// last_log_term is term of candidate's last log entry
uint64 last_log_term = 4;
}
message VoteResponse {
// term is the current term for the voter
uint64 term = 1;
// vote_granted is true if the candidate received the vote
bool vote_granted = 2;
// last_log_index is the index of the voter's last log entry
uint64 last_log_index = 3;
// last_log_term is the term of the voter's last log entry
uint64 last_log_term = 4;
}
message AppendEntriesRequest {
// term is the leader's term
uint64 term = 1;
// leader_id is the leader's ID
uint64 leader_id = 2;
// prev_log_index is index of log entry immediately preceding new ones
uint64 prev_log_index = 3;
// prev_log_term is term of prev_log_index entry
uint64 prev_log_term = 4;
// entries are log entries to append
repeated LogEntry entries = 5;
// leader_commit is leader's commit index
uint64 leader_commit = 6;
}
message LogEntry {
// index is the log entry index
uint64 index = 1;
// term is the term when entry was received
uint64 term = 2;
// data is the command data
bytes data = 3;
}
message AppendEntriesResponse {
// term is the current term
uint64 term = 1;
// success is true if follower contained entry matching prevLogIndex
bool success = 2;
// conflict_index is the first conflicting index (for optimization)
uint64 conflict_index = 3;
// conflict_term is the term of the conflicting entry
uint64 conflict_term = 4;
}
message InstallSnapshotRequest {
// term is the leader's term
uint64 term = 1;
// leader_id is the leader's ID
uint64 leader_id = 2;
// last_included_index is the index up through which the snapshot replaces all log entries (inclusive)
uint64 last_included_index = 3;
// last_included_term is term of last_included_index
uint64 last_included_term = 4;
// offset is byte offset where chunk is positioned in the snapshot file
uint64 offset = 5;
// data is raw bytes of the snapshot chunk
bytes data = 6;
// done is true if this is the last chunk
bool done = 7;
}
message InstallSnapshotResponse {
// term is the current term
uint64 term = 1;
}
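`AppendEntriesResponse` carries `conflict_index`/`conflict_term` so a rejected append can tell the leader where to back off instead of decrementing one entry at a time. One way the follower-side consistency check could look, as a sketch under the proto's semantics rather than the crate's actual implementation:

```rust
struct Entry {
    index: u64,
    term: u64,
}

/// Ok(()) if the follower's log has an entry at prev_log_index with
/// prev_log_term; otherwise Err(conflict_index) as a back-off hint.
fn check_prev(log: &[Entry], prev_log_index: u64, prev_log_term: u64) -> Result<(), u64> {
    if prev_log_index == 0 {
        return Ok(()); // appending from the very start of the log
    }
    match log.iter().find(|e| e.index == prev_log_index) {
        Some(e) if e.term == prev_log_term => Ok(()),
        Some(e) => {
            // Term conflict: hint at the first index of the conflicting term
            let first = log
                .iter()
                .find(|x| x.term == e.term)
                .map(|x| x.index)
                .unwrap_or(1);
            Err(first)
        }
        // Log too short: hint at the next index the follower would accept
        None => Err(log.last().map(|e| e.index + 1).unwrap_or(1)),
    }
}

fn main() {
    let log = vec![
        Entry { index: 1, term: 1 },
        Entry { index: 2, term: 2 },
        Entry { index: 3, term: 2 },
    ];
    assert!(check_prev(&log, 3, 2).is_ok());
    assert_eq!(check_prev(&log, 2, 1), Err(2)); // term mismatch → first index of term 2
    assert_eq!(check_prev(&log, 5, 2), Err(4)); // follower log too short
    println!("ok");
}
```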


@ -31,22 +31,10 @@
inherit system overlays;
};
# Fetch submodule sources with their .git directories included
# This is necessary because chainfire, flaredb, and iam are git submodules
chainfireSrc = builtins.fetchGit {
url = ./chainfire;
submodules = true;
};
flaredbSrc = builtins.fetchGit {
url = ./flaredb;
submodules = true;
};
iamSrc = builtins.fetchGit {
url = ./iam;
submodules = true;
};
# Local workspace sources (regular directories, not submodules)
chainfireSrc = ./chainfire;
flaredbSrc = ./flaredb;
iamSrc = ./iam;
# Rust toolchain configuration
# Using stable channel with rust-src (for rust-analyzer) and rust-analyzer
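The hunk above swaps `builtins.fetchGit` for plain paths, which only works once the gitlink entries are gone from the index. A quick way to check could look like this; the `ls-files` listing below is a made-up sample, not real repository output:

```shell
# Gitlink (submodule) entries appear in `git ls-files -s` with mode 160000;
# regular tracked files use 100644/100755. Sample listing for illustration:
listing='100644 1111111 0	chainfire/Cargo.toml
160000 2222222 0	chainfire'
# Print the path of any remaining gitlink; empty output means none remain.
gitlinks=$(printf '%s\n' "$listing" | awk '$1 == "160000" { print $4 }')
echo "$gitlinks"   # chainfire
```

Against the real repo, piping `git ls-files -s` through the same `awk` filter should print nothing after this commit.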

@ -1 +0,0 @@
Subproject commit 69908ec0d2fcfda290719ce129e84b4c56afc91c

flaredb/.gitignore vendored Normal file

@ -0,0 +1,18 @@
# Generated by speckit
target/
debug/
release/
.codex/
.cursor/
AGENTS.md
**/*.rs.bk
*.rlib
*.prof*
.idea/
*.log
.env*
.DS_Store
Thumbs.db
*.tmp
*.swp
.vscode/


@ -0,0 +1,41 @@
# FlareDB Feature Constitution
## Core Principles
### I. Test-First (NON-NEGOTIABLE)
- Write tests before implementation for new functionality.
- Follow Red-Green-Refactor; do not merge untested code.
- All critical paths require unit tests; integration tests required when services/protocols change.
### II. Reliability & Coverage
- CI must run `cargo test` (or equivalent) for all touched crates.
- Integration verification must cover cross-service interactions when contracts change.
- Regressions on previously passing tests are not acceptable.
### III. Simplicity & Readability
- Prefer standard crates over bespoke solutions; avoid unnecessary complexity (YAGNI).
- Code must be self-explanatory; add concise comments only for non-obvious logic.
- Keep APIs minimal and coherent; avoid naming drift.
### IV. Observability
- Services must log structured, human-readable errors; fatal errors exit non-zero.
- gRPC/CLI surfaces should emit actionable diagnostics on failure.
### V. Versioning & Compatibility
- Protocol and API changes must call out compatibility impact; breaking changes require explicit agreement.
- Generated artifacts must be reproducible (lockfiles or pinned versions where applicable).
## Additional Constraints
- Technology stack: Rust stable, gRPC via tonic/prost, RocksDB for storage, tokio runtime.
- Nix flake is the canonical dev environment; commands should respect it when present.
## Development Workflow
- Tests before code; integration tests when touching contracts or cross-service logic.
- Code review (human or designated process) must confirm constitution compliance.
- Complexity must be justified; large changes should be broken down into tasks aligned with user stories.
## Governance
- This constitution supersedes other practices for this feature; conflicts must be resolved by adjusting spec/plan/tasks, not by ignoring principles.
- Amendments require an explicit update to this document with rationale and date.
**Version**: 1.0.0 | **Ratified**: 2025-11-30 | **Last Amended**: 2025-11-30


@ -0,0 +1,166 @@
#!/usr/bin/env bash
# Consolidated prerequisite checking script
#
# This script provides unified prerequisite checking for Spec-Driven Development workflow.
# It replaces the functionality previously spread across multiple scripts.
#
# Usage: ./check-prerequisites.sh [OPTIONS]
#
# OPTIONS:
# --json Output in JSON format
# --require-tasks Require tasks.md to exist (for implementation phase)
# --include-tasks Include tasks.md in AVAILABLE_DOCS list
# --paths-only Only output path variables (no validation)
# --help, -h Show help message
#
# OUTPUTS:
# JSON mode: {"FEATURE_DIR":"...", "AVAILABLE_DOCS":["..."]}
# Text mode: FEATURE_DIR:... \n AVAILABLE_DOCS: \n ✓/✗ file.md
# Paths only: REPO_ROOT: ... \n BRANCH: ... \n FEATURE_DIR: ... etc.
set -e
# Parse command line arguments
JSON_MODE=false
REQUIRE_TASKS=false
INCLUDE_TASKS=false
PATHS_ONLY=false
for arg in "$@"; do
case "$arg" in
--json)
JSON_MODE=true
;;
--require-tasks)
REQUIRE_TASKS=true
;;
--include-tasks)
INCLUDE_TASKS=true
;;
--paths-only)
PATHS_ONLY=true
;;
--help|-h)
cat << 'EOF'
Usage: check-prerequisites.sh [OPTIONS]
Consolidated prerequisite checking for Spec-Driven Development workflow.
OPTIONS:
--json Output in JSON format
--require-tasks Require tasks.md to exist (for implementation phase)
--include-tasks Include tasks.md in AVAILABLE_DOCS list
--paths-only Only output path variables (no prerequisite validation)
--help, -h Show this help message
EXAMPLES:
# Check task prerequisites (plan.md required)
./check-prerequisites.sh --json
# Check implementation prerequisites (plan.md + tasks.md required)
./check-prerequisites.sh --json --require-tasks --include-tasks
# Get feature paths only (no validation)
./check-prerequisites.sh --paths-only
EOF
exit 0
;;
*)
echo "ERROR: Unknown option '$arg'. Use --help for usage information." >&2
exit 1
;;
esac
done
# Source common functions
SCRIPT_DIR="$(CDPATH="" cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/common.sh"
# Get feature paths and validate branch
eval $(get_feature_paths)
check_feature_branch "$CURRENT_BRANCH" "$HAS_GIT" || exit 1
# If paths-only mode, output paths and exit (support JSON + paths-only combined)
if $PATHS_ONLY; then
if $JSON_MODE; then
# Minimal JSON paths payload (no validation performed)
printf '{"REPO_ROOT":"%s","BRANCH":"%s","FEATURE_DIR":"%s","FEATURE_SPEC":"%s","IMPL_PLAN":"%s","TASKS":"%s"}\n' \
"$REPO_ROOT" "$CURRENT_BRANCH" "$FEATURE_DIR" "$FEATURE_SPEC" "$IMPL_PLAN" "$TASKS"
else
echo "REPO_ROOT: $REPO_ROOT"
echo "BRANCH: $CURRENT_BRANCH"
echo "FEATURE_DIR: $FEATURE_DIR"
echo "FEATURE_SPEC: $FEATURE_SPEC"
echo "IMPL_PLAN: $IMPL_PLAN"
echo "TASKS: $TASKS"
fi
exit 0
fi
# Validate required directories and files
if [[ ! -d "$FEATURE_DIR" ]]; then
echo "ERROR: Feature directory not found: $FEATURE_DIR" >&2
echo "Run /speckit.specify first to create the feature structure." >&2
exit 1
fi
if [[ ! -f "$IMPL_PLAN" ]]; then
echo "ERROR: plan.md not found in $FEATURE_DIR" >&2
echo "Run /speckit.plan first to create the implementation plan." >&2
exit 1
fi
# Check for tasks.md if required
if $REQUIRE_TASKS && [[ ! -f "$TASKS" ]]; then
echo "ERROR: tasks.md not found in $FEATURE_DIR" >&2
echo "Run /speckit.tasks first to create the task list." >&2
exit 1
fi
# Build list of available documents
docs=()
# Always check these optional docs
[[ -f "$RESEARCH" ]] && docs+=("research.md")
[[ -f "$DATA_MODEL" ]] && docs+=("data-model.md")
# Check contracts directory (only if it exists and has files)
if [[ -d "$CONTRACTS_DIR" ]] && [[ -n "$(ls -A "$CONTRACTS_DIR" 2>/dev/null)" ]]; then
docs+=("contracts/")
fi
[[ -f "$QUICKSTART" ]] && docs+=("quickstart.md")
# Include tasks.md if requested and it exists
if $INCLUDE_TASKS && [[ -f "$TASKS" ]]; then
docs+=("tasks.md")
fi
# Output results
if $JSON_MODE; then
# Build JSON array of documents
if [[ ${#docs[@]} -eq 0 ]]; then
json_docs="[]"
else
json_docs=$(printf '"%s",' "${docs[@]}")
json_docs="[${json_docs%,}]"
fi
printf '{"FEATURE_DIR":"%s","AVAILABLE_DOCS":%s}\n' "$FEATURE_DIR" "$json_docs"
else
# Text output
echo "FEATURE_DIR:$FEATURE_DIR"
echo "AVAILABLE_DOCS:"
# Show status of each potential document
check_file "$RESEARCH" "research.md"
check_file "$DATA_MODEL" "data-model.md"
check_dir "$CONTRACTS_DIR" "contracts/"
check_file "$QUICKSTART" "quickstart.md"
if $INCLUDE_TASKS; then
check_file "$TASKS" "tasks.md"
fi
fi
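
The `--json` payload emitted above is a single flat object, so it can be consumed even without `jq`. A minimal sketch — the literal payload below is a stand-in for real script output, not something the script is guaranteed to emit verbatim:

```shell
# Stand-in for: json=$(./check-prerequisites.sh --json)
json='{"FEATURE_DIR":"specs/001-demo","AVAILABLE_DOCS":["research.md","tasks.md"]}'
# Extract a single string field with sed (good enough for this flat payload)
feature_dir=$(printf '%s' "$json" | sed 's/.*"FEATURE_DIR":"\([^"]*\)".*/\1/')
echo "$feature_dir"
```

In practice `jq -r .FEATURE_DIR` is more robust when jq is available; the sed form only works because the payload is flat and the paths contain no escaped quotes.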


@ -0,0 +1,156 @@
#!/usr/bin/env bash
# Common functions and variables for all scripts
# Get repository root, with fallback for non-git repositories
get_repo_root() {
if git rev-parse --show-toplevel >/dev/null 2>&1; then
git rev-parse --show-toplevel
else
# Fall back to script location for non-git repos
local script_dir="$(CDPATH="" cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
(cd "$script_dir/../../.." && pwd)
fi
}
# Get current branch, with fallback for non-git repositories
get_current_branch() {
# First check if SPECIFY_FEATURE environment variable is set
if [[ -n "${SPECIFY_FEATURE:-}" ]]; then
echo "$SPECIFY_FEATURE"
return
fi
# Then check git if available
if git rev-parse --abbrev-ref HEAD >/dev/null 2>&1; then
git rev-parse --abbrev-ref HEAD
return
fi
# For non-git repos, try to find the latest feature directory
local repo_root=$(get_repo_root)
local specs_dir="$repo_root/specs"
if [[ -d "$specs_dir" ]]; then
local latest_feature=""
local highest=0
for dir in "$specs_dir"/*; do
if [[ -d "$dir" ]]; then
local dirname=$(basename "$dir")
if [[ "$dirname" =~ ^([0-9]{3})- ]]; then
local number=${BASH_REMATCH[1]}
number=$((10#$number))
if [[ "$number" -gt "$highest" ]]; then
highest=$number
latest_feature=$dirname
fi
fi
fi
done
if [[ -n "$latest_feature" ]]; then
echo "$latest_feature"
return
fi
fi
echo "main" # Final fallback
}
# Check if we have git available
has_git() {
git rev-parse --show-toplevel >/dev/null 2>&1
}
check_feature_branch() {
local branch="$1"
local has_git_repo="$2"
# For non-git repos, we can't enforce branch naming but still provide output
if [[ "$has_git_repo" != "true" ]]; then
echo "[specify] Warning: Git repository not detected; skipped branch validation" >&2
return 0
fi
if [[ ! "$branch" =~ ^[0-9]{3}- ]]; then
echo "ERROR: Not on a feature branch. Current branch: $branch" >&2
echo "Feature branches should be named like: 001-feature-name" >&2
return 1
fi
return 0
}
get_feature_dir() { echo "$1/specs/$2"; }
# Find feature directory by numeric prefix instead of exact branch match
# This allows multiple branches to work on the same spec (e.g., 004-fix-bug, 004-add-feature)
find_feature_dir_by_prefix() {
local repo_root="$1"
local branch_name="$2"
local specs_dir="$repo_root/specs"
# Extract numeric prefix from branch (e.g., "004" from "004-whatever")
if [[ ! "$branch_name" =~ ^([0-9]{3})- ]]; then
# If branch doesn't have numeric prefix, fall back to exact match
echo "$specs_dir/$branch_name"
return
fi
local prefix="${BASH_REMATCH[1]}"
# Search for directories in specs/ that start with this prefix
local matches=()
if [[ -d "$specs_dir" ]]; then
for dir in "$specs_dir"/"$prefix"-*; do
if [[ -d "$dir" ]]; then
matches+=("$(basename "$dir")")
fi
done
fi
# Handle results
if [[ ${#matches[@]} -eq 0 ]]; then
# No match found - return the branch name path (will fail later with clear error)
echo "$specs_dir/$branch_name"
elif [[ ${#matches[@]} -eq 1 ]]; then
# Exactly one match - perfect!
echo "$specs_dir/${matches[0]}"
else
# Multiple matches - this shouldn't happen with proper naming convention
echo "ERROR: Multiple spec directories found with prefix '$prefix': ${matches[*]}" >&2
echo "Please ensure only one spec directory exists per numeric prefix." >&2
echo "$specs_dir/$branch_name" # Return something to avoid breaking the script
fi
}
get_feature_paths() {
local repo_root=$(get_repo_root)
local current_branch=$(get_current_branch)
local has_git_repo="false"
if has_git; then
has_git_repo="true"
fi
# Use prefix-based lookup to support multiple branches per spec
local feature_dir=$(find_feature_dir_by_prefix "$repo_root" "$current_branch")
cat <<EOF
REPO_ROOT='$repo_root'
CURRENT_BRANCH='$current_branch'
HAS_GIT='$has_git_repo'
FEATURE_DIR='$feature_dir'
FEATURE_SPEC='$feature_dir/spec.md'
IMPL_PLAN='$feature_dir/plan.md'
TASKS='$feature_dir/tasks.md'
RESEARCH='$feature_dir/research.md'
DATA_MODEL='$feature_dir/data-model.md'
QUICKSTART='$feature_dir/quickstart.md'
CONTRACTS_DIR='$feature_dir/contracts'
EOF
}
check_file() { [[ -f "$1" ]] && echo "  ✓ $2" || echo "  ✗ $2"; }
check_dir() { [[ -d "$1" && -n $(ls -A "$1" 2>/dev/null) ]] && echo "  ✓ $2" || echo "  ✗ $2"; }
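
The three-digit prefix convention that `find_feature_dir_by_prefix` relies on can be exercised in isolation. A small sketch of the extraction step, using `grep -o` (GNU/BSD) instead of the bash `=~` regex; the branch names are illustrative:

```shell
# Extract the numeric feature prefix from a branch name, as common.sh does
branch="004-fix-bug"
prefix=$(printf '%s\n' "$branch" | grep -o '^[0-9]\{3\}' || true)
echo "$prefix"
# A branch without the prefix yields an empty string, triggering the
# exact-match fallback path in find_feature_dir_by_prefix
other=$(printf '%s\n' "my-branch" | grep -o '^[0-9]\{3\}' || true)
echo "empty:$other"
```

The `|| true` keeps the pipeline safe under `set -e`, since `grep` exits non-zero on no match.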


@ -0,0 +1,305 @@
#!/usr/bin/env bash
set -e
JSON_MODE=false
SHORT_NAME=""
BRANCH_NUMBER=""
ARGS=()
i=1
while [ $i -le $# ]; do
arg="${!i}"
case "$arg" in
--json)
JSON_MODE=true
;;
--short-name)
if [ $((i + 1)) -gt $# ]; then
echo 'Error: --short-name requires a value' >&2
exit 1
fi
i=$((i + 1))
next_arg="${!i}"
# Check if the next argument is another option (starts with --)
if [[ "$next_arg" == --* ]]; then
echo 'Error: --short-name requires a value' >&2
exit 1
fi
SHORT_NAME="$next_arg"
;;
--number)
if [ $((i + 1)) -gt $# ]; then
echo 'Error: --number requires a value' >&2
exit 1
fi
i=$((i + 1))
next_arg="${!i}"
if [[ "$next_arg" == --* ]]; then
echo 'Error: --number requires a value' >&2
exit 1
fi
BRANCH_NUMBER="$next_arg"
;;
--help|-h)
echo "Usage: $0 [--json] [--short-name <name>] [--number N] <feature_description>"
echo ""
echo "Options:"
echo " --json Output in JSON format"
echo " --short-name <name> Provide a custom short name (2-4 words) for the branch"
echo " --number N Specify branch number manually (overrides auto-detection)"
echo " --help, -h Show this help message"
echo ""
echo "Examples:"
echo " $0 'Add user authentication system' --short-name 'user-auth'"
echo " $0 'Implement OAuth2 integration for API' --number 5"
exit 0
;;
*)
ARGS+=("$arg")
;;
esac
i=$((i + 1))
done
FEATURE_DESCRIPTION="${ARGS[*]}"
if [ -z "$FEATURE_DESCRIPTION" ]; then
echo "Usage: $0 [--json] [--short-name <name>] [--number N] <feature_description>" >&2
exit 1
fi
# Function to find the repository root by searching for existing project markers
find_repo_root() {
local dir="$1"
while [ "$dir" != "/" ]; do
if [ -d "$dir/.git" ] || [ -d "$dir/.specify" ]; then
echo "$dir"
return 0
fi
dir="$(dirname "$dir")"
done
return 1
}
# Function to get highest number from specs directory
get_highest_from_specs() {
local specs_dir="$1"
local highest=0
if [ -d "$specs_dir" ]; then
for dir in "$specs_dir"/*; do
[ -d "$dir" ] || continue
dirname=$(basename "$dir")
number=$(echo "$dirname" | grep -o '^[0-9]\+' || echo "0")
number=$((10#$number))
if [ "$number" -gt "$highest" ]; then
highest=$number
fi
done
fi
echo "$highest"
}
# Function to get highest number from git branches
get_highest_from_branches() {
local highest=0
# Get all branches (local and remote)
branches=$(git branch -a 2>/dev/null || echo "")
if [ -n "$branches" ]; then
while IFS= read -r branch; do
# Clean branch name: remove leading markers and remote prefixes
clean_branch=$(echo "$branch" | sed 's/^[* ]*//; s|^remotes/[^/]*/||')
# Extract feature number if branch matches pattern ###-*
if echo "$clean_branch" | grep -q '^[0-9]\{3\}-'; then
number=$(echo "$clean_branch" | grep -o '^[0-9]\{3\}' || echo "0")
number=$((10#$number))
if [ "$number" -gt "$highest" ]; then
highest=$number
fi
fi
done <<< "$branches"
fi
echo "$highest"
}
# Function to check existing branches (local and remote) and return next available number
check_existing_branches() {
local short_name="$1"
local specs_dir="$2"
# Fetch all remotes to get latest branch info (suppress errors if no remotes)
git fetch --all --prune 2>/dev/null || true
# Find all branches matching the pattern using git ls-remote (more reliable)
local remote_branches=$(git ls-remote --heads origin 2>/dev/null | grep -E "refs/heads/[0-9]+-${short_name}$" | sed 's/.*\/\([0-9]*\)-.*/\1/' | sort -n)
# Also check local branches
local local_branches=$(git branch 2>/dev/null | grep -E "^[* ]*[0-9]+-${short_name}$" | sed 's/^[* ]*//' | sed 's/-.*//' | sort -n)
# Check specs directory as well
local spec_dirs=""
if [ -d "$specs_dir" ]; then
spec_dirs=$(find "$specs_dir" -maxdepth 1 -type d -name "[0-9]*-${short_name}" 2>/dev/null | xargs -n1 basename 2>/dev/null | sed 's/-.*//' | sort -n)
fi
# Combine all sources and get the highest number
local max_num=0
for num in $remote_branches $local_branches $spec_dirs; do
if [ "$num" -gt "$max_num" ]; then
max_num=$num
fi
done
# Return next number
echo $((max_num + 1))
}
# Function to clean and format a branch name
clean_branch_name() {
local name="$1"
echo "$name" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g' | sed 's/-\+/-/g' | sed 's/^-//' | sed 's/-$//'
}
# Resolve repository root. Prefer git information when available, but fall back
# to searching for repository markers so the workflow still functions in repositories that
# were initialised with --no-git.
SCRIPT_DIR="$(CDPATH="" cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
if git rev-parse --show-toplevel >/dev/null 2>&1; then
REPO_ROOT=$(git rev-parse --show-toplevel)
HAS_GIT=true
else
REPO_ROOT="$(find_repo_root "$SCRIPT_DIR")"
if [ -z "$REPO_ROOT" ]; then
echo "Error: Could not determine repository root. Please run this script from within the repository." >&2
exit 1
fi
HAS_GIT=false
fi
cd "$REPO_ROOT"
SPECS_DIR="$REPO_ROOT/specs"
mkdir -p "$SPECS_DIR"
# Function to generate branch name with stop word filtering and length filtering
generate_branch_name() {
local description="$1"
# Common stop words to filter out
local stop_words="^(i|a|an|the|to|for|of|in|on|at|by|with|from|is|are|was|were|be|been|being|have|has|had|do|does|did|will|would|should|could|can|may|might|must|shall|this|that|these|those|my|your|our|their|want|need|add|get|set)$"
# Convert to lowercase and split into words
local clean_name=$(echo "$description" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/ /g')
# Filter words: remove stop words and words shorter than 3 chars (unless they're uppercase acronyms in original)
local meaningful_words=()
for word in $clean_name; do
# Skip empty words
[ -z "$word" ] && continue
# Keep words that are NOT stop words AND (length >= 3 OR are potential acronyms)
if ! echo "$word" | grep -qiE "$stop_words"; then
if [ ${#word} -ge 3 ]; then
meaningful_words+=("$word")
elif echo "$description" | grep -q "\b${word^^}\b"; then
# Keep short words if they appear as uppercase in original (likely acronyms)
meaningful_words+=("$word")
fi
fi
done
# If we have meaningful words, use first 3-4 of them
if [ ${#meaningful_words[@]} -gt 0 ]; then
local max_words=3
if [ ${#meaningful_words[@]} -eq 4 ]; then max_words=4; fi
local result=""
local count=0
for word in "${meaningful_words[@]}"; do
if [ $count -ge $max_words ]; then break; fi
if [ -n "$result" ]; then result="$result-"; fi
result="$result$word"
count=$((count + 1))
done
echo "$result"
else
# Fallback to original logic if no meaningful words found
local cleaned=$(clean_branch_name "$description")
echo "$cleaned" | tr '-' '\n' | grep -v '^$' | head -3 | tr '\n' '-' | sed 's/-$//'
fi
}
# Generate branch name
if [ -n "$SHORT_NAME" ]; then
# Use provided short name, just clean it up
BRANCH_SUFFIX=$(clean_branch_name "$SHORT_NAME")
else
# Generate from description with smart filtering
BRANCH_SUFFIX=$(generate_branch_name "$FEATURE_DESCRIPTION")
fi
# Determine branch number
if [ -z "$BRANCH_NUMBER" ]; then
if [ "$HAS_GIT" = true ]; then
# Check existing branches on remotes
BRANCH_NUMBER=$(check_existing_branches "$BRANCH_SUFFIX" "$SPECS_DIR")
else
# Fall back to local directory check
HIGHEST=$(get_highest_from_specs "$SPECS_DIR")
BRANCH_NUMBER=$((HIGHEST + 1))
fi
fi
FEATURE_NUM=$(printf "%03d" "$BRANCH_NUMBER")
BRANCH_NAME="${FEATURE_NUM}-${BRANCH_SUFFIX}"
# GitHub enforces a 244-byte limit on branch names
# Validate and truncate if necessary
MAX_BRANCH_LENGTH=244
if [ ${#BRANCH_NAME} -gt $MAX_BRANCH_LENGTH ]; then
# Calculate how much we need to trim from suffix
# Account for: feature number (3) + hyphen (1) = 4 chars
MAX_SUFFIX_LENGTH=$((MAX_BRANCH_LENGTH - 4))
# Truncate suffix at word boundary if possible
TRUNCATED_SUFFIX=$(echo "$BRANCH_SUFFIX" | cut -c1-$MAX_SUFFIX_LENGTH)
# Remove trailing hyphen if truncation created one
TRUNCATED_SUFFIX=$(echo "$TRUNCATED_SUFFIX" | sed 's/-$//')
ORIGINAL_BRANCH_NAME="$BRANCH_NAME"
BRANCH_NAME="${FEATURE_NUM}-${TRUNCATED_SUFFIX}"
>&2 echo "[specify] Warning: Branch name exceeded GitHub's 244-byte limit"
>&2 echo "[specify] Original: $ORIGINAL_BRANCH_NAME (${#ORIGINAL_BRANCH_NAME} bytes)"
>&2 echo "[specify] Truncated to: $BRANCH_NAME (${#BRANCH_NAME} bytes)"
fi
if [ "$HAS_GIT" = true ]; then
git checkout -b "$BRANCH_NAME"
else
>&2 echo "[specify] Warning: Git repository not detected; skipped branch creation for $BRANCH_NAME"
fi
FEATURE_DIR="$SPECS_DIR/$BRANCH_NAME"
mkdir -p "$FEATURE_DIR"
TEMPLATE="$REPO_ROOT/.specify/templates/spec-template.md"
SPEC_FILE="$FEATURE_DIR/spec.md"
if [ -f "$TEMPLATE" ]; then cp "$TEMPLATE" "$SPEC_FILE"; else touch "$SPEC_FILE"; fi
# Set the SPECIFY_FEATURE environment variable for the current session
export SPECIFY_FEATURE="$BRANCH_NAME"
if $JSON_MODE; then
printf '{"BRANCH_NAME":"%s","SPEC_FILE":"%s","FEATURE_NUM":"%s"}\n' "$BRANCH_NAME" "$SPEC_FILE" "$FEATURE_NUM"
else
echo "BRANCH_NAME: $BRANCH_NAME"
echo "SPEC_FILE: $SPEC_FILE"
echo "FEATURE_NUM: $FEATURE_NUM"
echo "SPECIFY_FEATURE environment variable set to: $BRANCH_NAME"
fi
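
The slug logic above can be checked standalone. Here is `clean_branch_name` reproduced from the script (note that `-\+` in the collapse step assumes GNU sed), applied to a sample description:

```shell
# clean_branch_name as defined in the script above
clean_branch_name() {
    echo "$1" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g' | sed 's/-\+/-/g' | sed 's/^-//' | sed 's/-$//'
}
# Sample input: punctuation is squashed to single hyphens, case is lowered
slug=$(clean_branch_name "Add User Authentication!!")
echo "$slug"
```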


@ -0,0 +1,61 @@
#!/usr/bin/env bash
set -e
# Parse command line arguments
JSON_MODE=false
ARGS=()
for arg in "$@"; do
case "$arg" in
--json)
JSON_MODE=true
;;
--help|-h)
echo "Usage: $0 [--json]"
echo " --json Output results in JSON format"
echo " --help Show this help message"
exit 0
;;
*)
ARGS+=("$arg")
;;
esac
done
# Get script directory and load common functions
SCRIPT_DIR="$(CDPATH="" cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/common.sh"
# Get all paths and variables from common functions
eval $(get_feature_paths)
# Check if we're on a proper feature branch (only for git repos)
check_feature_branch "$CURRENT_BRANCH" "$HAS_GIT" || exit 1
# Ensure the feature directory exists
mkdir -p "$FEATURE_DIR"
# Copy plan template if it exists
TEMPLATE="$REPO_ROOT/.specify/templates/plan-template.md"
if [[ -f "$TEMPLATE" ]]; then
cp "$TEMPLATE" "$IMPL_PLAN"
echo "Copied plan template to $IMPL_PLAN"
else
echo "Warning: Plan template not found at $TEMPLATE"
# Create a basic plan file if template doesn't exist
touch "$IMPL_PLAN"
fi
# Output results
if $JSON_MODE; then
printf '{"FEATURE_SPEC":"%s","IMPL_PLAN":"%s","SPECS_DIR":"%s","BRANCH":"%s","HAS_GIT":"%s"}\n' \
"$FEATURE_SPEC" "$IMPL_PLAN" "$FEATURE_DIR" "$CURRENT_BRANCH" "$HAS_GIT"
else
echo "FEATURE_SPEC: $FEATURE_SPEC"
echo "IMPL_PLAN: $IMPL_PLAN"
echo "SPECS_DIR: $FEATURE_DIR"
echo "BRANCH: $CURRENT_BRANCH"
echo "HAS_GIT: $HAS_GIT"
fi
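
The `eval $(get_feature_paths)` idiom used here (and in check-prerequisites.sh) imports variables printed by a heredoc into the caller's shell. A self-contained sketch with stand-in values — `get_paths_demo` is a hypothetical miniature of `get_feature_paths`:

```shell
# Minimal stand-in for get_feature_paths: print VAR='value' assignments
get_paths_demo() {
    cat <<EOF
REPO_ROOT='/tmp/repo'
CURRENT_BRANCH='001-demo'
HAS_GIT='true'
EOF
}
# eval executes each printed assignment in the current shell
eval "$(get_paths_demo)"
echo "$CURRENT_BRANCH"
```

Quoting the command substitution (`eval "$(...)"`) preserves the newlines between assignments; the unquoted form in the scripts also works because each `VAR='value'` remains a single word after splitting.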


@ -0,0 +1,790 @@
#!/usr/bin/env bash
# Update agent context files with information from plan.md
#
# This script maintains AI agent context files by parsing feature specifications
# and updating agent-specific configuration files with project information.
#
# MAIN FUNCTIONS:
# 1. Environment Validation
# - Verifies git repository structure and branch information
# - Checks for required plan.md files and templates
# - Validates file permissions and accessibility
#
# 2. Plan Data Extraction
# - Parses plan.md files to extract project metadata
# - Identifies language/version, frameworks, databases, and project types
# - Handles missing or incomplete specification data gracefully
#
# 3. Agent File Management
# - Creates new agent context files from templates when needed
# - Updates existing agent files with new project information
# - Preserves manual additions and custom configurations
# - Supports multiple AI agent formats and directory structures
#
# 4. Content Generation
# - Generates language-specific build/test commands
# - Creates appropriate project directory structures
# - Updates technology stacks and recent changes sections
# - Maintains consistent formatting and timestamps
#
# 5. Multi-Agent Support
# - Handles agent-specific file paths and naming conventions
# - Supports: Claude, Gemini, Copilot, Cursor, Qwen, opencode, Codex, Windsurf, Kilo Code, Auggie CLI, Roo Code, CodeBuddy CLI, Amp, SHAI, Amazon Q Developer CLI, or IBM Bob
# - Can update single agents or all existing agent files
# - Creates default Claude file if no agent files exist
#
# Usage: ./update-agent-context.sh [agent_type]
# Agent types: claude|gemini|copilot|cursor-agent|qwen|opencode|codex|windsurf|kilocode|auggie|roo|codebuddy|amp|shai|q|bob
# Leave empty to update all existing agent files
set -e
# Enable strict error handling
set -u
set -o pipefail
#==============================================================================
# Configuration and Global Variables
#==============================================================================
# Get script directory and load common functions
SCRIPT_DIR="$(CDPATH="" cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/common.sh"
# Get all paths and variables from common functions
eval $(get_feature_paths)
NEW_PLAN="$IMPL_PLAN" # Alias for compatibility with existing code
AGENT_TYPE="${1:-}"
# Agent-specific file paths
CLAUDE_FILE="$REPO_ROOT/CLAUDE.md"
GEMINI_FILE="$REPO_ROOT/GEMINI.md"
COPILOT_FILE="$REPO_ROOT/.github/agents/copilot-instructions.md"
CURSOR_FILE="$REPO_ROOT/.cursor/rules/specify-rules.mdc"
QWEN_FILE="$REPO_ROOT/QWEN.md"
AGENTS_FILE="$REPO_ROOT/AGENTS.md"
WINDSURF_FILE="$REPO_ROOT/.windsurf/rules/specify-rules.md"
KILOCODE_FILE="$REPO_ROOT/.kilocode/rules/specify-rules.md"
AUGGIE_FILE="$REPO_ROOT/.augment/rules/specify-rules.md"
ROO_FILE="$REPO_ROOT/.roo/rules/specify-rules.md"
CODEBUDDY_FILE="$REPO_ROOT/CODEBUDDY.md"
AMP_FILE="$REPO_ROOT/AGENTS.md"
SHAI_FILE="$REPO_ROOT/SHAI.md"
Q_FILE="$REPO_ROOT/AGENTS.md"
BOB_FILE="$REPO_ROOT/AGENTS.md"
# Template file
TEMPLATE_FILE="$REPO_ROOT/.specify/templates/agent-file-template.md"
# Global variables for parsed plan data
NEW_LANG=""
NEW_FRAMEWORK=""
NEW_DB=""
NEW_PROJECT_TYPE=""
#==============================================================================
# Utility Functions
#==============================================================================
log_info() {
echo "INFO: $1"
}
log_success() {
echo "$1"
}
log_error() {
echo "ERROR: $1" >&2
}
log_warning() {
echo "WARNING: $1" >&2
}
# Cleanup function for temporary files
cleanup() {
local exit_code=$?
rm -f /tmp/agent_update_*_$$
rm -f /tmp/manual_additions_$$
exit $exit_code
}
# Set up cleanup trap
trap cleanup EXIT INT TERM
#==============================================================================
# Validation Functions
#==============================================================================
validate_environment() {
# Check if we have a current branch/feature (git or non-git)
if [[ -z "$CURRENT_BRANCH" ]]; then
log_error "Unable to determine current feature"
if [[ "$HAS_GIT" == "true" ]]; then
log_info "Make sure you're on a feature branch"
else
log_info "Set SPECIFY_FEATURE environment variable or create a feature first"
fi
exit 1
fi
# Check if plan.md exists
if [[ ! -f "$NEW_PLAN" ]]; then
log_error "No plan.md found at $NEW_PLAN"
log_info "Make sure you're working on a feature with a corresponding spec directory"
if [[ "$HAS_GIT" != "true" ]]; then
log_info "Use: export SPECIFY_FEATURE=your-feature-name or create a new feature first"
fi
exit 1
fi
# Check if template exists (needed for new files)
if [[ ! -f "$TEMPLATE_FILE" ]]; then
log_warning "Template file not found at $TEMPLATE_FILE"
log_warning "Creating new agent files will fail"
fi
}
#==============================================================================
# Plan Parsing Functions
#==============================================================================
extract_plan_field() {
local field_pattern="$1"
local plan_file="$2"
grep "^\*\*${field_pattern}\*\*: " "$plan_file" 2>/dev/null | \
head -1 | \
sed "s|^\*\*${field_pattern}\*\*: ||" | \
sed 's/^[ \t]*//;s/[ \t]*$//' | \
grep -v "NEEDS CLARIFICATION" | \
grep -v "^N/A$" || echo ""
}
parse_plan_data() {
local plan_file="$1"
if [[ ! -f "$plan_file" ]]; then
log_error "Plan file not found: $plan_file"
return 1
fi
if [[ ! -r "$plan_file" ]]; then
log_error "Plan file is not readable: $plan_file"
return 1
fi
log_info "Parsing plan data from $plan_file"
NEW_LANG=$(extract_plan_field "Language/Version" "$plan_file")
NEW_FRAMEWORK=$(extract_plan_field "Primary Dependencies" "$plan_file")
NEW_DB=$(extract_plan_field "Storage" "$plan_file")
NEW_PROJECT_TYPE=$(extract_plan_field "Project Type" "$plan_file")
# Log what we found
if [[ -n "$NEW_LANG" ]]; then
log_info "Found language: $NEW_LANG"
else
log_warning "No language information found in plan"
fi
if [[ -n "$NEW_FRAMEWORK" ]]; then
log_info "Found framework: $NEW_FRAMEWORK"
fi
if [[ -n "$NEW_DB" ]] && [[ "$NEW_DB" != "N/A" ]]; then
log_info "Found database: $NEW_DB"
fi
if [[ -n "$NEW_PROJECT_TYPE" ]]; then
log_info "Found project type: $NEW_PROJECT_TYPE"
fi
}
format_technology_stack() {
local lang="$1"
local framework="$2"
local parts=()
# Add non-empty parts
[[ -n "$lang" && "$lang" != "NEEDS CLARIFICATION" ]] && parts+=("$lang")
[[ -n "$framework" && "$framework" != "NEEDS CLARIFICATION" && "$framework" != "N/A" ]] && parts+=("$framework")
# Join with proper formatting
if [[ ${#parts[@]} -eq 0 ]]; then
echo ""
elif [[ ${#parts[@]} -eq 1 ]]; then
echo "${parts[0]}"
else
# Join multiple parts with " + "
local result="${parts[0]}"
for ((i=1; i<${#parts[@]}; i++)); do
result="$result + ${parts[i]}"
done
echo "$result"
fi
}
#==============================================================================
# Template and Content Generation Functions
#==============================================================================
get_project_structure() {
local project_type="$1"
if [[ "$project_type" == *"web"* ]]; then
echo "backend/\\nfrontend/\\ntests/"
else
echo "src/\\ntests/"
fi
}
get_commands_for_language() {
local lang="$1"
case "$lang" in
*"Python"*)
# Escape & so it survives the later sed replacement (& is special there)
echo "cd src \\&\\& pytest \\&\\& ruff check ."
;;
*"Rust"*)
echo "cargo test \\&\\& cargo clippy"
;;
*"JavaScript"*|*"TypeScript"*)
echo "npm test \\&\\& npm run lint"
;;
*)
echo "# Add commands for $lang"
;;
esac
}
get_language_conventions() {
local lang="$1"
echo "$lang: Follow standard conventions"
}
create_new_agent_file() {
local target_file="$1"
local temp_file="$2"
local project_name="$3"
local current_date="$4"
if [[ ! -f "$TEMPLATE_FILE" ]]; then
log_error "Template not found at $TEMPLATE_FILE"
return 1
fi
if [[ ! -r "$TEMPLATE_FILE" ]]; then
log_error "Template file is not readable: $TEMPLATE_FILE"
return 1
fi
log_info "Creating new agent context file from template..."
if ! cp "$TEMPLATE_FILE" "$temp_file"; then
log_error "Failed to copy template file"
return 1
fi
# Replace template placeholders
local project_structure
project_structure=$(get_project_structure "$NEW_PROJECT_TYPE")
local commands
commands=$(get_commands_for_language "$NEW_LANG")
local language_conventions
language_conventions=$(get_language_conventions "$NEW_LANG")
# Perform substitutions with error checking using safer approach
# Escape special characters for sed by using a different delimiter or escaping
local escaped_lang=$(printf '%s\n' "$NEW_LANG" | sed 's/[\[\.*^$()+{}|]/\\&/g')
local escaped_framework=$(printf '%s\n' "$NEW_FRAMEWORK" | sed 's/[\[\.*^$()+{}|]/\\&/g')
local escaped_branch=$(printf '%s\n' "$CURRENT_BRANCH" | sed 's/[\[\.*^$()+{}|]/\\&/g')
# Build technology stack and recent change strings conditionally
local tech_stack
if [[ -n "$escaped_lang" && -n "$escaped_framework" ]]; then
tech_stack="- $escaped_lang + $escaped_framework ($escaped_branch)"
elif [[ -n "$escaped_lang" ]]; then
tech_stack="- $escaped_lang ($escaped_branch)"
elif [[ -n "$escaped_framework" ]]; then
tech_stack="- $escaped_framework ($escaped_branch)"
else
tech_stack="- ($escaped_branch)"
fi
local recent_change
if [[ -n "$escaped_lang" && -n "$escaped_framework" ]]; then
recent_change="- $escaped_branch: Added $escaped_lang + $escaped_framework"
elif [[ -n "$escaped_lang" ]]; then
recent_change="- $escaped_branch: Added $escaped_lang"
elif [[ -n "$escaped_framework" ]]; then
recent_change="- $escaped_branch: Added $escaped_framework"
else
recent_change="- $escaped_branch: Added"
fi
local substitutions=(
"s|\[PROJECT NAME\]|$project_name|"
"s|\[DATE\]|$current_date|"
"s|\[EXTRACTED FROM ALL PLAN.MD FILES\]|$tech_stack|"
"s|\[ACTUAL STRUCTURE FROM PLANS\]|$project_structure|g"
"s|\[ONLY COMMANDS FOR ACTIVE TECHNOLOGIES\]|$commands|"
"s|\[LANGUAGE-SPECIFIC, ONLY FOR LANGUAGES IN USE\]|$language_conventions|"
"s|\[LAST 3 FEATURES AND WHAT THEY ADDED\]|$recent_change|"
)
for substitution in "${substitutions[@]}"; do
if ! sed -i.bak -e "$substitution" "$temp_file"; then
log_error "Failed to perform substitution: $substitution"
rm -f "$temp_file" "$temp_file.bak"
return 1
fi
done
# Convert \n sequences to actual newlines. Note: $(printf '\n') would be
# empty because command substitution strips trailing newlines, so use $'\n'
# and backslash-escape it inside the sed replacement text.
newline=$'\n'
sed -i.bak2 "s/\\\\n/\\${newline}/g" "$temp_file"
# Clean up backup files
rm -f "$temp_file.bak" "$temp_file.bak2"
return 0
}
update_existing_agent_file() {
local target_file="$1"
local current_date="$2"
log_info "Updating existing agent context file..."
# Use a single temporary file for atomic update
local temp_file
temp_file=$(mktemp) || {
log_error "Failed to create temporary file"
return 1
}
# Process the file in one pass
local tech_stack=$(format_technology_stack "$NEW_LANG" "$NEW_FRAMEWORK")
local new_tech_entries=()
local new_change_entry=""
# Prepare new technology entries
if [[ -n "$tech_stack" ]] && ! grep -q "$tech_stack" "$target_file"; then
new_tech_entries+=("- $tech_stack ($CURRENT_BRANCH)")
fi
if [[ -n "$NEW_DB" ]] && [[ "$NEW_DB" != "N/A" ]] && [[ "$NEW_DB" != "NEEDS CLARIFICATION" ]] && ! grep -q "$NEW_DB" "$target_file"; then
new_tech_entries+=("- $NEW_DB ($CURRENT_BRANCH)")
fi
# Prepare new change entry
if [[ -n "$tech_stack" ]]; then
new_change_entry="- $CURRENT_BRANCH: Added $tech_stack"
elif [[ -n "$NEW_DB" ]] && [[ "$NEW_DB" != "N/A" ]] && [[ "$NEW_DB" != "NEEDS CLARIFICATION" ]]; then
new_change_entry="- $CURRENT_BRANCH: Added $NEW_DB"
fi
# Check if sections exist in the file
local has_active_technologies=0
local has_recent_changes=0
if grep -q "^## Active Technologies" "$target_file" 2>/dev/null; then
has_active_technologies=1
fi
if grep -q "^## Recent Changes" "$target_file" 2>/dev/null; then
has_recent_changes=1
fi
# Process file line by line
local in_tech_section=false
local in_changes_section=false
local tech_entries_added=false
local changes_entries_added=false
local existing_changes_count=0
local file_ended=false
while IFS= read -r line || [[ -n "$line" ]]; do
# Handle Active Technologies section
if [[ "$line" == "## Active Technologies" ]]; then
echo "$line" >> "$temp_file"
in_tech_section=true
continue
elif [[ $in_tech_section == true ]] && [[ "$line" =~ ^##[[:space:]] ]]; then
# Add new tech entries before closing the section
if [[ $tech_entries_added == false ]] && [[ ${#new_tech_entries[@]} -gt 0 ]]; then
printf '%s\n' "${new_tech_entries[@]}" >> "$temp_file"
tech_entries_added=true
fi
echo "$line" >> "$temp_file"
in_tech_section=false
continue
elif [[ $in_tech_section == true ]] && [[ -z "$line" ]]; then
# Add new tech entries before empty line in tech section
if [[ $tech_entries_added == false ]] && [[ ${#new_tech_entries[@]} -gt 0 ]]; then
printf '%s\n' "${new_tech_entries[@]}" >> "$temp_file"
tech_entries_added=true
fi
echo "$line" >> "$temp_file"
continue
fi
# Handle Recent Changes section
if [[ "$line" == "## Recent Changes" ]]; then
echo "$line" >> "$temp_file"
# Add new change entry right after the heading
if [[ -n "$new_change_entry" ]]; then
echo "$new_change_entry" >> "$temp_file"
fi
in_changes_section=true
changes_entries_added=true
continue
elif [[ $in_changes_section == true ]] && [[ "$line" =~ ^##[[:space:]] ]]; then
echo "$line" >> "$temp_file"
in_changes_section=false
continue
elif [[ $in_changes_section == true ]] && [[ "$line" == "- "* ]]; then
# Keep only first 2 existing changes
if [[ $existing_changes_count -lt 2 ]]; then
echo "$line" >> "$temp_file"
((existing_changes_count++))
fi
continue
fi
# Update timestamp
if [[ "$line" =~ \*\*Last\ updated\*\*:.*[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] ]]; then
echo "$line" | sed "s/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/$current_date/" >> "$temp_file"
else
echo "$line" >> "$temp_file"
fi
done < "$target_file"
# Post-loop check: if we're still in the Active Technologies section and haven't added new entries
if [[ $in_tech_section == true ]] && [[ $tech_entries_added == false ]] && [[ ${#new_tech_entries[@]} -gt 0 ]]; then
printf '%s\n' "${new_tech_entries[@]}" >> "$temp_file"
tech_entries_added=true
fi
# If sections don't exist, add them at the end of the file
if [[ $has_active_technologies -eq 0 ]] && [[ ${#new_tech_entries[@]} -gt 0 ]]; then
echo "" >> "$temp_file"
echo "## Active Technologies" >> "$temp_file"
printf '%s\n' "${new_tech_entries[@]}" >> "$temp_file"
tech_entries_added=true
fi
if [[ $has_recent_changes -eq 0 ]] && [[ -n "$new_change_entry" ]]; then
echo "" >> "$temp_file"
echo "## Recent Changes" >> "$temp_file"
echo "$new_change_entry" >> "$temp_file"
changes_entries_added=true
fi
# Move temp file to target atomically
if ! mv "$temp_file" "$target_file"; then
log_error "Failed to update target file"
rm -f "$temp_file"
return 1
fi
return 0
}
#==============================================================================
# Main Agent File Update Function
#==============================================================================
update_agent_file() {
local target_file="$1"
local agent_name="$2"
if [[ -z "$target_file" ]] || [[ -z "$agent_name" ]]; then
log_error "update_agent_file requires target_file and agent_name parameters"
return 1
fi
log_info "Updating $agent_name context file: $target_file"
local project_name
project_name=$(basename "$REPO_ROOT")
local current_date
current_date=$(date +%Y-%m-%d)
# Create directory if it doesn't exist
local target_dir
target_dir=$(dirname "$target_file")
if [[ ! -d "$target_dir" ]]; then
if ! mkdir -p "$target_dir"; then
log_error "Failed to create directory: $target_dir"
return 1
fi
fi
if [[ ! -f "$target_file" ]]; then
# Create new file from template
local temp_file
temp_file=$(mktemp) || {
log_error "Failed to create temporary file"
return 1
}
if create_new_agent_file "$target_file" "$temp_file" "$project_name" "$current_date"; then
if mv "$temp_file" "$target_file"; then
log_success "Created new $agent_name context file"
else
log_error "Failed to move temporary file to $target_file"
rm -f "$temp_file"
return 1
fi
else
log_error "Failed to create new agent file"
rm -f "$temp_file"
return 1
fi
else
# Update existing file
if [[ ! -r "$target_file" ]]; then
log_error "Cannot read existing file: $target_file"
return 1
fi
if [[ ! -w "$target_file" ]]; then
log_error "Cannot write to existing file: $target_file"
return 1
fi
if update_existing_agent_file "$target_file" "$current_date"; then
log_success "Updated existing $agent_name context file"
else
log_error "Failed to update existing agent file"
return 1
fi
fi
return 0
}
#==============================================================================
# Agent Selection and Processing
#==============================================================================
update_specific_agent() {
local agent_type="$1"
case "$agent_type" in
claude)
update_agent_file "$CLAUDE_FILE" "Claude Code"
;;
gemini)
update_agent_file "$GEMINI_FILE" "Gemini CLI"
;;
copilot)
update_agent_file "$COPILOT_FILE" "GitHub Copilot"
;;
cursor-agent)
update_agent_file "$CURSOR_FILE" "Cursor IDE"
;;
qwen)
update_agent_file "$QWEN_FILE" "Qwen Code"
;;
opencode)
update_agent_file "$AGENTS_FILE" "opencode"
;;
codex)
update_agent_file "$AGENTS_FILE" "Codex CLI"
;;
windsurf)
update_agent_file "$WINDSURF_FILE" "Windsurf"
;;
kilocode)
update_agent_file "$KILOCODE_FILE" "Kilo Code"
;;
auggie)
update_agent_file "$AUGGIE_FILE" "Auggie CLI"
;;
roo)
update_agent_file "$ROO_FILE" "Roo Code"
;;
codebuddy)
update_agent_file "$CODEBUDDY_FILE" "CodeBuddy CLI"
;;
amp)
update_agent_file "$AMP_FILE" "Amp"
;;
shai)
update_agent_file "$SHAI_FILE" "SHAI"
;;
q)
update_agent_file "$Q_FILE" "Amazon Q Developer CLI"
;;
bob)
update_agent_file "$BOB_FILE" "IBM Bob"
;;
*)
log_error "Unknown agent type '$agent_type'"
        log_error "Expected: claude|gemini|copilot|cursor-agent|qwen|opencode|codex|windsurf|kilocode|auggie|roo|codebuddy|amp|shai|q|bob"
exit 1
;;
esac
}
update_all_existing_agents() {
local found_agent=false
# Check each possible agent file and update if it exists
if [[ -f "$CLAUDE_FILE" ]]; then
update_agent_file "$CLAUDE_FILE" "Claude Code"
found_agent=true
fi
if [[ -f "$GEMINI_FILE" ]]; then
update_agent_file "$GEMINI_FILE" "Gemini CLI"
found_agent=true
fi
if [[ -f "$COPILOT_FILE" ]]; then
update_agent_file "$COPILOT_FILE" "GitHub Copilot"
found_agent=true
fi
if [[ -f "$CURSOR_FILE" ]]; then
update_agent_file "$CURSOR_FILE" "Cursor IDE"
found_agent=true
fi
if [[ -f "$QWEN_FILE" ]]; then
update_agent_file "$QWEN_FILE" "Qwen Code"
found_agent=true
fi
if [[ -f "$AGENTS_FILE" ]]; then
update_agent_file "$AGENTS_FILE" "Codex/opencode"
found_agent=true
fi
if [[ -f "$WINDSURF_FILE" ]]; then
update_agent_file "$WINDSURF_FILE" "Windsurf"
found_agent=true
fi
if [[ -f "$KILOCODE_FILE" ]]; then
update_agent_file "$KILOCODE_FILE" "Kilo Code"
found_agent=true
fi
if [[ -f "$AUGGIE_FILE" ]]; then
update_agent_file "$AUGGIE_FILE" "Auggie CLI"
found_agent=true
fi
if [[ -f "$ROO_FILE" ]]; then
update_agent_file "$ROO_FILE" "Roo Code"
found_agent=true
fi
    if [[ -f "$CODEBUDDY_FILE" ]]; then
        update_agent_file "$CODEBUDDY_FILE" "CodeBuddy CLI"
        found_agent=true
    fi
    if [[ -f "$AMP_FILE" ]]; then
        update_agent_file "$AMP_FILE" "Amp"
        found_agent=true
    fi
if [[ -f "$SHAI_FILE" ]]; then
update_agent_file "$SHAI_FILE" "SHAI"
found_agent=true
fi
if [[ -f "$Q_FILE" ]]; then
update_agent_file "$Q_FILE" "Amazon Q Developer CLI"
found_agent=true
fi
if [[ -f "$BOB_FILE" ]]; then
update_agent_file "$BOB_FILE" "IBM Bob"
found_agent=true
fi
# If no agent files exist, create a default Claude file
if [[ "$found_agent" == false ]]; then
log_info "No existing agent files found, creating default Claude file..."
update_agent_file "$CLAUDE_FILE" "Claude Code"
fi
}
print_summary() {
echo
log_info "Summary of changes:"
if [[ -n "$NEW_LANG" ]]; then
echo " - Added language: $NEW_LANG"
fi
if [[ -n "$NEW_FRAMEWORK" ]]; then
echo " - Added framework: $NEW_FRAMEWORK"
fi
if [[ -n "$NEW_DB" ]] && [[ "$NEW_DB" != "N/A" ]]; then
echo " - Added database: $NEW_DB"
fi
echo
    log_info "Usage: $0 [claude|gemini|copilot|cursor-agent|qwen|opencode|codex|windsurf|kilocode|auggie|roo|codebuddy|amp|shai|q|bob]"
}
#==============================================================================
# Main Execution
#==============================================================================
main() {
# Validate environment before proceeding
validate_environment
log_info "=== Updating agent context files for feature $CURRENT_BRANCH ==="
# Parse the plan file to extract project information
if ! parse_plan_data "$NEW_PLAN"; then
log_error "Failed to parse plan data"
exit 1
fi
# Process based on agent type argument
local success=true
if [[ -z "$AGENT_TYPE" ]]; then
# No specific agent provided - update all existing agent files
log_info "No agent specified, updating all existing agent files..."
if ! update_all_existing_agents; then
success=false
fi
else
# Specific agent provided - update only that agent
log_info "Updating specific agent: $AGENT_TYPE"
if ! update_specific_agent "$AGENT_TYPE"; then
success=false
fi
fi
# Print summary
print_summary
if [[ "$success" == true ]]; then
log_success "Agent context update completed successfully"
exit 0
else
log_error "Agent context update completed with errors"
exit 1
fi
}
# Execute main function if script is run directly
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
main "$@"
fi

View file

@ -0,0 +1,28 @@
# [PROJECT NAME] Development Guidelines
Auto-generated from all feature plans. Last updated: [DATE]
## Active Technologies
[EXTRACTED FROM ALL PLAN.MD FILES]
## Project Structure
```text
[ACTUAL STRUCTURE FROM PLANS]
```
## Commands
[ONLY COMMANDS FOR ACTIVE TECHNOLOGIES]
## Code Style
[LANGUAGE-SPECIFIC, ONLY FOR LANGUAGES IN USE]
## Recent Changes
[LAST 3 FEATURES AND WHAT THEY ADDED]
<!-- MANUAL ADDITIONS START -->
<!-- MANUAL ADDITIONS END -->

View file

@ -0,0 +1,40 @@
# [CHECKLIST TYPE] Checklist: [FEATURE NAME]
**Purpose**: [Brief description of what this checklist covers]
**Created**: [DATE]
**Feature**: [Link to spec.md or relevant documentation]
**Note**: This checklist is generated by the `/speckit.checklist` command based on feature context and requirements.
<!--
============================================================================
IMPORTANT: The checklist items below are SAMPLE ITEMS for illustration only.
The /speckit.checklist command MUST replace these with actual items based on:
- User's specific checklist request
- Feature requirements from spec.md
- Technical context from plan.md
- Implementation details from tasks.md
DO NOT keep these sample items in the generated checklist file.
============================================================================
-->
## [Category 1]
- [ ] CHK001 First checklist item with clear action
- [ ] CHK002 Second checklist item
- [ ] CHK003 Third checklist item
## [Category 2]
- [ ] CHK004 Another category item
- [ ] CHK005 Item with specific criteria
- [ ] CHK006 Final item in this category
## Notes
- Check items off as completed: `[x]`
- Add comments or findings inline
- Link to relevant resources or documentation
- Items are numbered sequentially for easy reference

View file

@ -0,0 +1,104 @@
# Implementation Plan: [FEATURE]
**Branch**: `[###-feature-name]` | **Date**: [DATE] | **Spec**: [link]
**Input**: Feature specification from `/specs/[###-feature-name]/spec.md`
**Note**: This template is filled in by the `/speckit.plan` command. See `.specify/templates/commands/plan.md` for the execution workflow.
## Summary
[Extract from feature spec: primary requirement + technical approach from research]
## Technical Context
<!--
ACTION REQUIRED: Replace the content in this section with the technical details
for the project. The structure here is presented in an advisory capacity to guide
the iteration process.
-->
**Language/Version**: [e.g., Python 3.11, Swift 5.9, Rust 1.75 or NEEDS CLARIFICATION]
**Primary Dependencies**: [e.g., FastAPI, UIKit, LLVM or NEEDS CLARIFICATION]
**Storage**: [if applicable, e.g., PostgreSQL, CoreData, files or N/A]
**Testing**: [e.g., pytest, XCTest, cargo test or NEEDS CLARIFICATION]
**Target Platform**: [e.g., Linux server, iOS 15+, WASM or NEEDS CLARIFICATION]
**Project Type**: [single/web/mobile - determines source structure]
**Performance Goals**: [domain-specific, e.g., 1000 req/s, 10k lines/sec, 60 fps or NEEDS CLARIFICATION]
**Constraints**: [domain-specific, e.g., <200ms p95, <100MB memory, offline-capable or NEEDS CLARIFICATION]
**Scale/Scope**: [domain-specific, e.g., 10k users, 1M LOC, 50 screens or NEEDS CLARIFICATION]
## Constitution Check
*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
[Gates determined based on constitution file]
## Project Structure
### Documentation (this feature)
```text
specs/[###-feature]/
├── plan.md # This file (/speckit.plan command output)
├── research.md # Phase 0 output (/speckit.plan command)
├── data-model.md # Phase 1 output (/speckit.plan command)
├── quickstart.md # Phase 1 output (/speckit.plan command)
├── contracts/ # Phase 1 output (/speckit.plan command)
└── tasks.md # Phase 2 output (/speckit.tasks command - NOT created by /speckit.plan)
```
### Source Code (repository root)
<!--
ACTION REQUIRED: Replace the placeholder tree below with the concrete layout
for this feature. Delete unused options and expand the chosen structure with
real paths (e.g., apps/admin, packages/something). The delivered plan must
not include Option labels.
-->
```text
# [REMOVE IF UNUSED] Option 1: Single project (DEFAULT)
src/
├── models/
├── services/
├── cli/
└── lib/
tests/
├── contract/
├── integration/
└── unit/
# [REMOVE IF UNUSED] Option 2: Web application (when "frontend" + "backend" detected)
backend/
├── src/
│ ├── models/
│ ├── services/
│ └── api/
└── tests/
frontend/
├── src/
│ ├── components/
│ ├── pages/
│ └── services/
└── tests/
# [REMOVE IF UNUSED] Option 3: Mobile + API (when "iOS/Android" detected)
api/
└── [same as backend above]
ios/ or android/
└── [platform-specific structure: feature modules, UI flows, platform tests]
```
**Structure Decision**: [Document the selected structure and reference the real
directories captured above]
## Complexity Tracking
> **Fill ONLY if Constitution Check has violations that must be justified**
| Violation | Why Needed | Simpler Alternative Rejected Because |
|-----------|------------|-------------------------------------|
| [e.g., 4th project] | [current need] | [why 3 projects insufficient] |
| [e.g., Repository pattern] | [specific problem] | [why direct DB access insufficient] |

View file

@ -0,0 +1,115 @@
# Feature Specification: [FEATURE NAME]
**Feature Branch**: `[###-feature-name]`
**Created**: [DATE]
**Status**: Draft
**Input**: User description: "$ARGUMENTS"
## User Scenarios & Testing *(mandatory)*
<!--
IMPORTANT: User stories should be PRIORITIZED as user journeys ordered by importance.
Each user story/journey must be INDEPENDENTLY TESTABLE - meaning if you implement just ONE of them,
you should still have a viable MVP (Minimum Viable Product) that delivers value.
Assign priorities (P1, P2, P3, etc.) to each story, where P1 is the most critical.
Think of each story as a standalone slice of functionality that can be:
- Developed independently
- Tested independently
- Deployed independently
- Demonstrated to users independently
-->
### User Story 1 - [Brief Title] (Priority: P1)
[Describe this user journey in plain language]
**Why this priority**: [Explain the value and why it has this priority level]
**Independent Test**: [Describe how this can be tested independently - e.g., "Can be fully tested by [specific action] and delivers [specific value]"]
**Acceptance Scenarios**:
1. **Given** [initial state], **When** [action], **Then** [expected outcome]
2. **Given** [initial state], **When** [action], **Then** [expected outcome]
---
### User Story 2 - [Brief Title] (Priority: P2)
[Describe this user journey in plain language]
**Why this priority**: [Explain the value and why it has this priority level]
**Independent Test**: [Describe how this can be tested independently]
**Acceptance Scenarios**:
1. **Given** [initial state], **When** [action], **Then** [expected outcome]
---
### User Story 3 - [Brief Title] (Priority: P3)
[Describe this user journey in plain language]
**Why this priority**: [Explain the value and why it has this priority level]
**Independent Test**: [Describe how this can be tested independently]
**Acceptance Scenarios**:
1. **Given** [initial state], **When** [action], **Then** [expected outcome]
---
[Add more user stories as needed, each with an assigned priority]
### Edge Cases
<!--
ACTION REQUIRED: The content in this section represents placeholders.
Fill them out with the right edge cases.
-->
- What happens when [boundary condition]?
- How does system handle [error scenario]?
## Requirements *(mandatory)*
<!--
ACTION REQUIRED: The content in this section represents placeholders.
Fill them out with the right functional requirements.
-->
### Functional Requirements
- **FR-001**: System MUST [specific capability, e.g., "allow users to create accounts"]
- **FR-002**: System MUST [specific capability, e.g., "validate email addresses"]
- **FR-003**: Users MUST be able to [key interaction, e.g., "reset their password"]
- **FR-004**: System MUST [data requirement, e.g., "persist user preferences"]
- **FR-005**: System MUST [behavior, e.g., "log all security events"]
*Example of marking unclear requirements:*
- **FR-006**: System MUST authenticate users via [NEEDS CLARIFICATION: auth method not specified - email/password, SSO, OAuth?]
- **FR-007**: System MUST retain user data for [NEEDS CLARIFICATION: retention period not specified]
### Key Entities *(include if feature involves data)*
- **[Entity 1]**: [What it represents, key attributes without implementation]
- **[Entity 2]**: [What it represents, relationships to other entities]
## Success Criteria *(mandatory)*
<!--
ACTION REQUIRED: Define measurable success criteria.
These must be technology-agnostic and measurable.
-->
### Measurable Outcomes
- **SC-001**: [Measurable metric, e.g., "Users can complete account creation in under 2 minutes"]
- **SC-002**: [Measurable metric, e.g., "System handles 1000 concurrent users without degradation"]
- **SC-003**: [User satisfaction metric, e.g., "90% of users successfully complete primary task on first attempt"]
- **SC-004**: [Business metric, e.g., "Reduce support tickets related to [X] by 50%"]

View file

@ -0,0 +1,251 @@
---
description: "Task list template for feature implementation"
---
# Tasks: [FEATURE NAME]
**Input**: Design documents from `/specs/[###-feature-name]/`
**Prerequisites**: plan.md (required), spec.md (required for user stories), research.md, data-model.md, contracts/
**Tests**: The examples below include test tasks. Tests are OPTIONAL - only include them if explicitly requested in the feature specification.
**Organization**: Tasks are grouped by user story to enable independent implementation and testing of each story.
## Format: `[ID] [P?] [Story] Description`
- **[P]**: Can run in parallel (different files, no dependencies)
- **[Story]**: Which user story this task belongs to (e.g., US1, US2, US3)
- Include exact file paths in descriptions
## Path Conventions
- **Single project**: `src/`, `tests/` at repository root
- **Web app**: `backend/src/`, `frontend/src/`
- **Mobile**: `api/src/`, `ios/src/` or `android/src/`
- Paths shown below assume single project - adjust based on plan.md structure
<!--
============================================================================
IMPORTANT: The tasks below are SAMPLE TASKS for illustration purposes only.
The /speckit.tasks command MUST replace these with actual tasks based on:
- User stories from spec.md (with their priorities P1, P2, P3...)
- Feature requirements from plan.md
- Entities from data-model.md
- Endpoints from contracts/
Tasks MUST be organized by user story so each story can be:
- Implemented independently
- Tested independently
- Delivered as an MVP increment
DO NOT keep these sample tasks in the generated tasks.md file.
============================================================================
-->
## Phase 1: Setup (Shared Infrastructure)
**Purpose**: Project initialization and basic structure
- [ ] T001 Create project structure per implementation plan
- [ ] T002 Initialize [language] project with [framework] dependencies
- [ ] T003 [P] Configure linting and formatting tools
---
## Phase 2: Foundational (Blocking Prerequisites)
**Purpose**: Core infrastructure that MUST be complete before ANY user story can be implemented
**⚠️ CRITICAL**: No user story work can begin until this phase is complete
Examples of foundational tasks (adjust based on your project):
- [ ] T004 Setup database schema and migrations framework
- [ ] T005 [P] Implement authentication/authorization framework
- [ ] T006 [P] Setup API routing and middleware structure
- [ ] T007 Create base models/entities that all stories depend on
- [ ] T008 Configure error handling and logging infrastructure
- [ ] T009 Setup environment configuration management
**Checkpoint**: Foundation ready - user story implementation can now begin in parallel
---
## Phase 3: User Story 1 - [Title] (Priority: P1) 🎯 MVP
**Goal**: [Brief description of what this story delivers]
**Independent Test**: [How to verify this story works on its own]
### Tests for User Story 1 (OPTIONAL - only if tests requested) ⚠️
> **NOTE: Write these tests FIRST, ensure they FAIL before implementation**
- [ ] T010 [P] [US1] Contract test for [endpoint] in tests/contract/test_[name].py
- [ ] T011 [P] [US1] Integration test for [user journey] in tests/integration/test_[name].py
### Implementation for User Story 1
- [ ] T012 [P] [US1] Create [Entity1] model in src/models/[entity1].py
- [ ] T013 [P] [US1] Create [Entity2] model in src/models/[entity2].py
- [ ] T014 [US1] Implement [Service] in src/services/[service].py (depends on T012, T013)
- [ ] T015 [US1] Implement [endpoint/feature] in src/[location]/[file].py
- [ ] T016 [US1] Add validation and error handling
- [ ] T017 [US1] Add logging for user story 1 operations
**Checkpoint**: At this point, User Story 1 should be fully functional and testable independently
---
## Phase 4: User Story 2 - [Title] (Priority: P2)
**Goal**: [Brief description of what this story delivers]
**Independent Test**: [How to verify this story works on its own]
### Tests for User Story 2 (OPTIONAL - only if tests requested) ⚠️
- [ ] T018 [P] [US2] Contract test for [endpoint] in tests/contract/test_[name].py
- [ ] T019 [P] [US2] Integration test for [user journey] in tests/integration/test_[name].py
### Implementation for User Story 2
- [ ] T020 [P] [US2] Create [Entity] model in src/models/[entity].py
- [ ] T021 [US2] Implement [Service] in src/services/[service].py
- [ ] T022 [US2] Implement [endpoint/feature] in src/[location]/[file].py
- [ ] T023 [US2] Integrate with User Story 1 components (if needed)
**Checkpoint**: At this point, User Stories 1 AND 2 should both work independently
---
## Phase 5: User Story 3 - [Title] (Priority: P3)
**Goal**: [Brief description of what this story delivers]
**Independent Test**: [How to verify this story works on its own]
### Tests for User Story 3 (OPTIONAL - only if tests requested) ⚠️
- [ ] T024 [P] [US3] Contract test for [endpoint] in tests/contract/test_[name].py
- [ ] T025 [P] [US3] Integration test for [user journey] in tests/integration/test_[name].py
### Implementation for User Story 3
- [ ] T026 [P] [US3] Create [Entity] model in src/models/[entity].py
- [ ] T027 [US3] Implement [Service] in src/services/[service].py
- [ ] T028 [US3] Implement [endpoint/feature] in src/[location]/[file].py
**Checkpoint**: All user stories should now be independently functional
---
[Add more user story phases as needed, following the same pattern]
---
## Phase N: Polish & Cross-Cutting Concerns
**Purpose**: Improvements that affect multiple user stories
- [ ] TXXX [P] Documentation updates in docs/
- [ ] TXXX Code cleanup and refactoring
- [ ] TXXX Performance optimization across all stories
- [ ] TXXX [P] Additional unit tests (if requested) in tests/unit/
- [ ] TXXX Security hardening
- [ ] TXXX Run quickstart.md validation
---
## Dependencies & Execution Order
### Phase Dependencies
- **Setup (Phase 1)**: No dependencies - can start immediately
- **Foundational (Phase 2)**: Depends on Setup completion - BLOCKS all user stories
- **User Stories (Phase 3+)**: All depend on Foundational phase completion
- User stories can then proceed in parallel (if staffed)
- Or sequentially in priority order (P1 → P2 → P3)
- **Polish (Final Phase)**: Depends on all desired user stories being complete
### User Story Dependencies
- **User Story 1 (P1)**: Can start after Foundational (Phase 2) - No dependencies on other stories
- **User Story 2 (P2)**: Can start after Foundational (Phase 2) - May integrate with US1 but should be independently testable
- **User Story 3 (P3)**: Can start after Foundational (Phase 2) - May integrate with US1/US2 but should be independently testable
### Within Each User Story
- Tests (if included) MUST be written and FAIL before implementation
- Models before services
- Services before endpoints
- Core implementation before integration
- Story complete before moving to next priority
### Parallel Opportunities
- All Setup tasks marked [P] can run in parallel
- All Foundational tasks marked [P] can run in parallel (within Phase 2)
- Once Foundational phase completes, all user stories can start in parallel (if team capacity allows)
- All tests for a user story marked [P] can run in parallel
- Models within a story marked [P] can run in parallel
- Different user stories can be worked on in parallel by different team members
---
## Parallel Example: User Story 1
```bash
# Launch all tests for User Story 1 together (if tests requested):
Task: "Contract test for [endpoint] in tests/contract/test_[name].py"
Task: "Integration test for [user journey] in tests/integration/test_[name].py"
# Launch all models for User Story 1 together:
Task: "Create [Entity1] model in src/models/[entity1].py"
Task: "Create [Entity2] model in src/models/[entity2].py"
```
---
## Implementation Strategy
### MVP First (User Story 1 Only)
1. Complete Phase 1: Setup
2. Complete Phase 2: Foundational (CRITICAL - blocks all stories)
3. Complete Phase 3: User Story 1
4. **STOP and VALIDATE**: Test User Story 1 independently
5. Deploy/demo if ready
### Incremental Delivery
1. Complete Setup + Foundational → Foundation ready
2. Add User Story 1 → Test independently → Deploy/Demo (MVP!)
3. Add User Story 2 → Test independently → Deploy/Demo
4. Add User Story 3 → Test independently → Deploy/Demo
5. Each story adds value without breaking previous stories
### Parallel Team Strategy
With multiple developers:
1. Team completes Setup + Foundational together
2. Once Foundational is done:
- Developer A: User Story 1
- Developer B: User Story 2
- Developer C: User Story 3
3. Stories complete and integrate independently
---
## Notes
- [P] tasks = different files, no dependencies
- [Story] label maps task to specific user story for traceability
- Each user story should be independently completable and testable
- Verify tests fail before implementing
- Commit after each task or logical group
- Stop at any checkpoint to validate story independently
- Avoid: vague tasks, same file conflicts, cross-story dependencies that break independence

66
flaredb/Cargo.toml Normal file
View file

@ -0,0 +1,66 @@
[workspace]
members = [
"crates/flaredb-types",
"crates/flaredb-proto",
"crates/flaredb-storage",
"crates/flaredb-raft",
"crates/flaredb-server",
"crates/flaredb-pd",
"crates/flaredb-client",
"crates/flaredb-cli",
]
resolver = "2"
[workspace.package]
version = "0.1.0"
edition = "2021"
license = "MIT OR Apache-2.0"
rust-version = "1.75"
[workspace.dependencies]
# Async runtime
tokio = { version = "1.40", features = ["full"] }
tokio-stream = "0.1"
futures = "0.3"
async-trait = "0.1"
# Distributed Consensus
openraft = { version = "0.9", features = ["serde"] }
# Storage
rocksdb = { version = "0.24", default-features = false, features = ["multi-threaded-cf", "zstd", "lz4", "snappy"] }
# gRPC
tonic = "0.12"
tonic-build = "0.12"
tonic-health = "0.12"
prost = "0.13"
prost-types = "0.13"
# Serialization
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
bincode = "1.3"
# Logging & Tracing
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
# Error handling
thiserror = "1.0"
anyhow = "1.0"
# CLI
clap = { version = "4.5", features = ["derive"] }
# Utilities
sha2 = "0.10"
bytes = "1.5"
# Testing
tempfile = "3"
[profile.release]
lto = true
opt-level = 3
codegen-units = 1

124
flaredb/advice.md Normal file
View file

@ -0,0 +1,124 @@
Quick verdict
* **Chainfire** aims at an "etcd-style distributed KV + Watch" built on Raft, RocksDB, gRPC, and Gossip (SWIM/foca). The Rust workspace split is clean, with API/storage/watch/gossip/raft nicely modularized. However, **the external Raft RPCs are still unwired (in-memory/dummy)**, so it is not yet a real multi-node cluster. It is already usable for single-node or in-process verification.
* **FlareDB** has a PD (Placement Driver), a TSO (monotonically increasing timestamps), KV (Raw/CAS), a Raft service, the groundwork for regions/multi-raft, and a Merkle consistency-check skeleton: a well-organized **minimal distributed storage system for experimentation**. CI/test items, a Quickstart, and verification scripts are in place, so the developer experience is good. Production use still needs the **next steps**: maturing multi-raft, replication/rebalancing, follower reads/linearizable reads, and transactions.
---
## Chainfire: what works, what's missing
**What works (verifiable from the code)**
* The Rust workspace separates API/server/storage/raft/gossip/watch. Dependencies are organized around `openraft` (Raft), `foca` (SWIM gossip), `rocksdb`, and `tonic/prost` (gRPC).
* Raft is initialized with typical OpenRaft settings (heartbeat/election timeouts, snapshot policy, etc.) and has unit tests.
* The gRPC **KV / Watch / Cluster / internal Raft** services are bundled into a single Tonic server at startup.
* **Watch** is a proper implementation: a bidirectional stream wired to the internal WatchRegistry, with a client-side receive handle provided.
* RocksDB is used with separate column families. Snapshot build/apply tests exist (groundwork for data transfer).
**Rough edges / unfinished parts (current limitations)**
* **Raft RPCs are unwired**: `RaftRpcClient` remains a trait with a "plug in the gRPC implementation later" placeholder. Node construction uses the **dummy/in-memory client**, so there is no real inter-node communication. That allows **in-process verification**, but a cluster spanning processes or hosts does not work.
* **Raft port handling**: the logs print a Raft address, but the actual Tonic server also exposes `RaftService` **on the API address**. Port separation and the security/network design are unsettled.
* Production-grade Raft concerns, such as cluster membership changes (joint consensus), linearizable reads (ReadIndex), and hardened snapshot transfer, appear undocumented/unwired (by design, OpenRaft can cover them).
**Practical usefulness today (where does it help?)**
* Good enough as a **research/verification tool or a single-node metadata KV**. It is a fine way to get a feel for an "etcd-compatible-ish API + Watch".
* For **production clusters** or failover, wait until **Raft RPC wiring and membership management** land.
**High-impact short-term improvements (in order)**
1. Implement the **Raft gRPC client** based on `internal_proto` and plug it into `RaftRpcClient`.
2. **Separate the Raft port**: serve `api_addr` and `raft_addr` from separate servers, and lay the groundwork for TLS/auth.
3. **Gossip⇔Raft integration**: use foca liveness monitoring to trigger automatic member addition/removal via Raft joint consensus. The dependency is already in the workspace.
4. Implement **linearizable reads (ReadIndex)** and sort out **follower reads** (conditionally, if acceptable).
5. Tie the **strict ordering/Revision** guarantees of watches to state-machine apply (wire up watch_tx).
6. Put **snapshot transfer** into real use (chunking/retry/verification). Test groundwork exists.
7. **Metrics/tracing** (Prometheus/OpenTelemetry) and **fault-injection tests**.
8. Put the Docker/Helm/Flake packaging on CI.
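The swap in step 1 can be sketched in isolation. The trait and type names below are illustrative stand-ins, not Chainfire's actual API: the point is that the node depends only on a `RaftRpcClient`-style trait, so replacing the shipped dummy with a transport that really delivers (ultimately a tonic-generated stub) is pure wiring.

```rust
use std::collections::HashMap;

// Simplified stand-ins for an AppendEntries request/response pair.
#[allow(dead_code)]
struct AppendEntries { term: u64, entries: Vec<String> }
#[allow(dead_code)]
struct AppendReply { term: u64, success: bool }

// The trait the node depends on; today a dummy satisfies it.
trait RaftRpcClient {
    fn append_entries(&mut self, target: u64, req: AppendEntries) -> AppendReply;
}

// The placeholder style the repo ships with: accepts everything, sends nothing.
#[allow(dead_code)]
struct DummyClient;
impl RaftRpcClient for DummyClient {
    fn append_entries(&mut self, _target: u64, req: AppendEntries) -> AppendReply {
        AppendReply { term: req.term, success: true }
    }
}

// What the wiring step adds: a client that actually delivers to peers.
// An in-process map stands in here for the real gRPC stubs.
struct LoopbackClient { logs: HashMap<u64, Vec<String>> }
impl RaftRpcClient for LoopbackClient {
    fn append_entries(&mut self, target: u64, req: AppendEntries) -> AppendReply {
        self.logs.entry(target).or_default().extend(req.entries);
        AppendReply { term: req.term, success: true }
    }
}

fn main() {
    let mut client = LoopbackClient { logs: HashMap::new() };
    client.append_entries(2, AppendEntries { term: 1, entries: vec!["set x=1".into()] });
    // Unlike the dummy, the entry actually reached node 2's log.
    assert_eq!(client.logs[&2], vec!["set x=1".to_string()]);
    println!("delivered {} entry to node 2", client.logs[&2].len());
}
```

Because both clients satisfy the same trait, the swap does not touch the Raft core; only the construction site changes.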
---
## FlareDB: what works, what's missing
**What works (verifiable from the code)**
* **PD + TSO** as an independent process. The **Quickstart** documents startup order and CLI operations (TSO / Raw Put/Get / CAS), and the user-story checklist records the TSO milestone as achieved.
* **Server-side services**: `KvRaw`/`KvCas`/`RaftService` are provided from a single gRPC server.
* The skeleton of the **PD heartbeat/reconnect and region-update loop** exists (periodic heartbeats after startup, reconnect on failure, region info sync).
* **Merkle** (a skeleton of per-range hashing) anticipates later anti-entropy/consistency checks.
* **Rich tests and spec folders**: test suites for replication/multi-region/split/consistency, plus spec and scripts that guide verification.
**Rough edges / unfinished parts (current limitations)**
* **Multi-raft maturity**: the operational algorithms for region split/relocation, voter/learner transitions, and PD scheduling (rebalancing, hot-key mitigation) are still to come. Directories and specs exist, but production-grade tooling is unfinished.
* **Read path**: the choice among strong consistency, follower reads, and ReadIndex, and control of observed staleness, looks unsettled.
* **Transactions (MVCC)**: the TSO exists, but two-phase commit, pessimistic/optimistic concurrency control, and rollback/lock release are still to be implemented (CAS exists).
* **Failure behavior and durability**: snapshot/log recovery, region merge, and Merkle-driven background anti-entropy jobs are at the skeleton stage.
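The Merkle-driven anti-entropy mentioned above can be sketched as follows. Hashing, range layout, and names here are illustrative, not FlareDB's actual scheme: hash fixed key ranges on two replicas, compare the leaves, and repair only the ranges that differ.

```rust
use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type Replica = BTreeMap<String, String>;

// One Merkle leaf per range: hash of all (key, value) pairs whose key
// starts with the range prefix. BTreeMap iteration order makes this stable.
fn leaf_hashes(replica: &Replica, prefixes: &[&str]) -> Vec<u64> {
    prefixes.iter().map(|p| {
        let mut h = DefaultHasher::new();
        for (k, v) in replica.iter().filter(|(k, _)| k.starts_with(p)) {
            (k, v).hash(&mut h);
        }
        h.finish()
    }).collect()
}

// Compare leaves and return the prefixes that need repair.
fn divergent<'a>(a: &Replica, b: &Replica, prefixes: &[&'a str]) -> Vec<&'a str> {
    leaf_hashes(a, prefixes).iter()
        .zip(leaf_hashes(b, prefixes))
        .zip(prefixes)
        .filter(|((ha, hb), _)| **ha != *hb)
        .map(|(_, p)| *p)
        .collect()
}

fn main() {
    let mut a = Replica::new();
    let mut b = Replica::new();
    for (k, v) in [("a/1", "x"), ("b/1", "y")] { a.insert(k.into(), v.into()); }
    for (k, v) in [("a/1", "x"), ("b/1", "STALE")] { b.insert(k.into(), v.into()); }

    // Only the "b/" range differs, so only it would be re-synced.
    assert_eq!(divergent(&a, &b, &["a/", "b/"]), vec!["b/"]);
    println!("ranges to repair: {:?}", divergent(&a, &b, &["a/", "b/"]));
}
```

The payoff is that repair traffic is proportional to the divergent ranges, not to the whole region.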
**Practical usefulness today**
* For research and PoC work it is plenty: run a **single-/few-node KV (Raw/CAS)** and experiment with PD/TSO integration and the region concept.
* **Shipping to production** as a fully featured distributed transactional KV or SQL backend still requires multi-raft, region management, transactions, observability, and more.
**High-impact short-term improvements (in order)**
1. **Finish multi-raft**: wire the split trigger (size/load) → start the new region's Raft → update PD metadata → refresh the client Region Cache, end to end. The test skeleton already exists.
2. Introduce a switch between **follower reads and linearizable reads** (balancing read SLA and consistency).
3. **MVCC + 2PC**: use the TSO for commit_ts/read_ts and add Prewrite/Commit (TiKV-style) or OCC. Build up from the Quickstart's CAS.
4. **Merkle-based anti-entropy**: compare region Merkle leaves in the background and repair divergent ranges.
5. **PD scheduler**: placement that accounts for movement cost, hot keys, and fault isolation.
6. **Metrics/tracing/profiling** and **YCSB/Jepsen-style tests** to make performance and safety visible.
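The MVCC-on-TSO idea in step 3 can be reduced to a toy: a TSO hands out monotonic timestamps, and a versioned map serves reads as of a `read_ts`. This is a single-node sketch with invented names, not FlareDB's API; it only shows why a monotonic TSO is enough to give readers a stable snapshot.

```rust
use std::collections::BTreeMap;

// A trivially monotonic timestamp oracle.
struct Tso { next: u64 }
impl Tso {
    fn alloc(&mut self) -> u64 { let t = self.next; self.next += 1; t }
}

// Versions keyed by (key, commit_ts); a read at read_ts sees the newest
// version whose commit_ts <= read_ts.
struct MvccStore { versions: BTreeMap<(String, u64), String> }
impl MvccStore {
    fn put(&mut self, key: &str, commit_ts: u64, value: &str) {
        self.versions.insert((key.to_string(), commit_ts), value.to_string());
    }
    fn get(&self, key: &str, read_ts: u64) -> Option<&String> {
        self.versions
            .range((key.to_string(), 0)..=(key.to_string(), read_ts))
            .next_back()
            .map(|(_, v)| v)
    }
}

fn main() {
    let mut tso = Tso { next: 1 };
    let mut store = MvccStore { versions: BTreeMap::new() };

    let t1 = tso.alloc();
    store.put("region_size", t1, "small");
    let snapshot_ts = tso.alloc();   // a reader's snapshot timestamp
    let t3 = tso.alloc();
    store.put("region_size", t3, "large");

    // The snapshot still sees the old value; a fresh read sees the new one.
    assert_eq!(store.get("region_size", snapshot_ts).unwrap(), "small");
    assert_eq!(store.get("region_size", tso.alloc()).unwrap(), "large");
    println!("snapshot isolation holds");
}
```

2PC then layers on top: prewrite locks at a start_ts, commit stamps versions with a TSO-issued commit_ts, and readers keep using the rule above.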
---
## Going further (shared design guidance)
1. **Make the split between control plane (Chainfire) and data plane (FlareDB) explicit**
Lean Chainfire toward the "cluster control hub" (node metadata/allocation/config/watch) and FlareDB toward the data plane. Laying a **single path**, gossip liveness → Chainfire's KV → FlareDB's PD, simplifies operations.
2. **Centralize address resolution and membership**
Add a path to register/fetch Raft peers' `BasicNode` info via Chainfire's Cluster API so that the `NetworkFactory` can **dial dynamically** from it. The trait and factory already exist, so wiring alone moves this forward.
3. **Explicit port separation, assuming zero trust**
Serve the client API (KV/Watch) and the peer RPC (Raft) separately and phase in mTLS + authorization. Today they share one Tonic server.
4. **Document the linearizability "contract"**
Make watch ordering (Revision) and read consistency (ReadIndex/follower/leader) explicit as modes. The API layer is already separate, so this is easy to extend.
5. **Operational design for snapshots and rebuilds**
Building on the existing snapshot structures, implement **chunked streaming/retry/verification** to enable rolling upgrades and fast recovery.
6. **MVCC + TSO toward a "transactional FlareDB"**
First make 2PC/OCC work within a single region, then move on to cross-region distributed transactions. The Quickstart and task table chart the way forward.
7. **Observability and safety**
Thread a **trace ID** through every Raft RPC, apply, snapshot, and gossip event, monitor SLOs with Prometheus, and run fault-injection tests (network partitions, disk latency).
---
## A small starter checklist (1-2 sprints)
**Chainfire**
* [ ] Add a gRPC implementation of `RaftRpcClient` (turn `internal_proto` into a client) and replace `Dummy`.
* [ ] `serve` `api_addr` and `raft_addr` from separate `Server`s, matching the log output.
* [ ] Pick up member up/down events from gossip and feed Raft configuration changes through the Cluster API.
**FlareDB**
* [ ] Following `verify-multiraft.sh` and the test suites, make the sequence region split → new raft startup → PD update → client Region Cache refresh work end to end.
* [ ] Implement the follower-read/linearizable-read switch in the service.
* [ ] Add a minimal TSO-based MVCC (single region), then extend to 2PC.
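The read-mode switch in the checklist above can be sketched as a per-request gate. Names and fields are illustrative: a linearizable read must confirm it is serving at or beyond the leader's commit point (ReadIndex-style), while a follower read serves immediately from local applied state.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum ReadMode { Linearizable, Follower }

// A toy view of one node's replication state.
struct Node { is_leader: bool, applied_index: u64, leader_commit: u64 }

impl Node {
    // Follower reads always serve (possibly stale); linearizable reads
    // require leadership or having applied up to the leader's commit index.
    fn can_serve(&self, mode: ReadMode) -> bool {
        match mode {
            ReadMode::Follower => true,
            ReadMode::Linearizable => {
                self.is_leader || self.applied_index >= self.leader_commit
            }
        }
    }
}

fn main() {
    let lagging = Node { is_leader: false, applied_index: 5, leader_commit: 9 };
    // A lagging follower may serve relaxed reads but not linearizable ones.
    assert!(lagging.can_serve(ReadMode::Follower));
    assert!(!lagging.can_serve(ReadMode::Linearizable));
    println!("read-mode gate works");
}
```

Exposing the mode per request lets clients trade latency for consistency explicitly instead of the server picking one behavior globally.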
---
## Summary (a realistic adoption line)
* **Usable right now**: Chainfire suits a single-node configuration-management KV + Watch; FlareDB suits an experimental KV including PD/TSO.
* **Production clusters**: for Chainfire, **Raft RPC wiring + membership** is the first gate. For FlareDB, the goal is **multi-raft/region operations + MVCC/2PC**.
* Both designs point in a sound direction. The engine is installed; lay the **wiring and the road surface (operations)** and it will drive.
> Note: this assessment is based on the uploaded repositories' source layout, implementation, and spec/Quickstart (sampled examples: dependencies, OpenRaft configuration, server startup wiring, Dummy/Inmemory RPC, PD/TSO, tests/scripts). Where needed, specific files and lines can be traced as indicated.
Priorities shift with the target use case, for example a KV for a Kubernetes control plane, the backend of a large key-value service, or academic experimentation. State the intended use and this can be turned into a prioritized feature table.

1935
flaredb/chat.md Normal file

File diff suppressed because it is too large

View file

@ -0,0 +1,9 @@
[package]
name = "flaredb-cli"
version.workspace = true
edition.workspace = true
[dependencies]
flaredb-client = { path = "../flaredb-client" }
tokio.workspace = true
clap.workspace = true

View file

@ -0,0 +1,3 @@
fn main() {
    println!("Hello from flaredb-cli!");
}

View file

@ -0,0 +1,14 @@
[package]
name = "flaredb-client"
version.workspace = true
edition.workspace = true
[dependencies]
flaredb-proto = { path = "../flaredb-proto" }
tokio.workspace = true
tonic.workspace = true
prost.workspace = true
clap.workspace = true
[dev-dependencies]
tokio-stream.workspace = true

Some files were not shown because too many files have changed in this diff Show more