Chainfire Specification

Version: 1.0 | Status: Draft | Last Updated: 2025-12-08

1. Overview

1.1 Purpose

Chainfire is a distributed key-value store designed for cluster management with etcd-compatible semantics. It provides strongly consistent storage with MVCC (Multi-Version Concurrency Control), watch notifications, and transaction support.

1.2 Scope

  • In scope: Distributed KV storage, consensus (Raft), watch/subscribe, transactions, cluster membership
  • Out of scope: SQL queries, secondary indexes, full-text search

1.3 Design Goals

  • etcd API compatibility for ecosystem tooling
  • High availability via Raft consensus
  • Low latency for configuration management workloads
  • Simple deployment (single binary)

2. Architecture

2.1 Crate Structure

chainfire/
├── crates/
│   ├── chainfire-api/      # gRPC service implementations
│   ├── chainfire-core/     # Embeddable cluster library, config, callbacks
│   ├── chainfire-gossip/   # SWIM gossip protocol (foca)
│   ├── chainfire-raft/     # OpenRaft integration
│   ├── chainfire-server/   # Server binary, config
│   ├── chainfire-storage/  # RocksDB state machine
│   ├── chainfire-types/    # Shared types (KV, Watch, Command)
│   └── chainfire-watch/    # Watch registry
├── chainfire-client/       # Rust client library
└── proto/
    ├── chainfire.proto     # Public API (KV, Watch, Cluster)
    └── internal.proto      # Raft internal RPCs (Vote, AppendEntries)

2.2 Data Flow

[Client gRPC] → [API Layer] → [Raft Node] → [State Machine] → [RocksDB]
                     ↓              ↓
              [Watch Registry] ← [Events]

2.3 Dependencies

| Crate    | Version | Purpose              |
|----------|---------|----------------------|
| tokio    | 1.40    | Async runtime        |
| tonic    | 0.12    | gRPC framework       |
| openraft | 0.9     | Raft consensus       |
| rocksdb  | 0.24    | Storage engine       |
| foca     | 1.0     | SWIM gossip protocol |
| prost    | 0.13    | Protocol buffers     |
| dashmap  | 6       | Concurrent hash maps |

3. API

3.1 gRPC Services

KV Service (chainfire.v1.KV)

service KV {
  rpc Range(RangeRequest) returns (RangeResponse);
  rpc Put(PutRequest) returns (PutResponse);
  rpc Delete(DeleteRangeRequest) returns (DeleteRangeResponse);
  rpc Txn(TxnRequest) returns (TxnResponse);
}

Range (Get/Scan)

  • Single key lookup: key set, range_end empty
  • Range scan: key = start, range_end = end (exclusive)
  • Prefix scan: range_end = prefix with its last byte incremented (the first key beyond every key sharing the prefix)
  • Options: limit, revision (point-in-time), keys_only, count_only
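The prefix-scan rule above can be made concrete with a small sketch (the function name is illustrative, not part of the API). The exclusive `range_end` is the prefix with its last byte below 0xff incremented; trailing 0xff bytes carry, and an all-0xff prefix has no finite upper bound:

```rust
/// Compute the exclusive `range_end` for a prefix scan: the smallest key
/// strictly greater than every key that shares `prefix`. Trailing 0xff
/// bytes are dropped before incrementing (carry); an empty or all-0xff
/// prefix has no upper bound, so the scan runs to the end of the keyspace.
fn prefix_range_end(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut end = prefix.to_vec();
    while let Some(&last) = end.last() {
        if last < 0xff {
            *end.last_mut().unwrap() = last + 1;
            return Some(end);
        }
        end.pop(); // 0xff cannot be incremented: drop it and carry left
    }
    None // empty or all-0xff prefix: unbounded scan
}

fn main() {
    assert_eq!(prefix_range_end(b"prefix/"), Some(b"prefix0".to_vec())); // '/'+1 = '0'
    assert_eq!(prefix_range_end(&[0x61, 0xff]), Some(vec![0x62]));       // carry past 0xff
    assert_eq!(prefix_range_end(&[0xff]), None);
    println!("ok");
}
```

This mirrors etcd's convention, where a prefix watch or scan is expressed as a `[key, range_end)` pair.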

Put

  • Writes key-value pair
  • Optional: lease (TTL), prev_kv (return previous)

Delete

  • Single key or range delete
  • Optional: prev_kv (return deleted values)

Transaction (Txn)

  • Atomic compare-and-swap operations
  • compare: Conditions to check
  • success: Operations if all conditions pass
  • failure: Operations if any condition fails

Watch Service (chainfire.v1.Watch)

service Watch {
  rpc Watch(stream WatchRequest) returns (stream WatchResponse);
}

  • Bidirectional streaming
  • Supports: single key, prefix, range watches
  • Historical replay via start_revision
  • Progress notifications

Cluster Service (chainfire.v1.Cluster)

service Cluster {
  rpc MemberAdd(MemberAddRequest) returns (MemberAddResponse);
  rpc MemberRemove(MemberRemoveRequest) returns (MemberRemoveResponse);
  rpc MemberList(MemberListRequest) returns (MemberListResponse);
  rpc Status(StatusRequest) returns (StatusResponse);
}

3.2 Client Library

use chainfire_client::Client;

let mut client = Client::connect("http://127.0.0.1:2379").await?;

// Put
let revision = client.put("key", "value").await?;

// Get
let value = client.get("key").await?;  // Option<Vec<u8>>

// Get with string convenience
let value = client.get_str("key").await?;  // Option<String>

// Prefix scan
let kvs = client.get_prefix("prefix/").await?;  // Vec<(key, value, revision)>

// Delete
let deleted = client.delete("key").await?;  // bool

// Status
let status = client.status().await?;
println!("Leader: {}, Term: {}", status.leader, status.raft_term);

3.3 Public Traits (chainfire-core)

ClusterEventHandler

#[async_trait]
pub trait ClusterEventHandler: Send + Sync {
    async fn on_node_joined(&self, node: &NodeInfo) {}
    async fn on_node_left(&self, node_id: u64, reason: LeaveReason) {}
    async fn on_leader_changed(&self, old: Option<u64>, new: u64) {}
    async fn on_became_leader(&self) {}
    async fn on_lost_leadership(&self) {}
    async fn on_membership_changed(&self, members: &[NodeInfo]) {}
    async fn on_partition_detected(&self, reachable: &[u64], unreachable: &[u64]) {}
    async fn on_cluster_ready(&self) {}
}

KvEventHandler

#[async_trait]
pub trait KvEventHandler: Send + Sync {
    async fn on_key_changed(&self, namespace: &str, key: &[u8], value: &[u8], revision: u64) {}
    async fn on_key_deleted(&self, namespace: &str, key: &[u8], revision: u64) {}
    async fn on_prefix_changed(&self, namespace: &str, prefix: &[u8], entries: &[KvEntry]) {}
}

StorageBackend

#[async_trait]
pub trait StorageBackend: Send + Sync {
    async fn get(&self, key: &[u8]) -> io::Result<Option<Vec<u8>>>;
    async fn put(&self, key: &[u8], value: &[u8]) -> io::Result<()>;
    async fn delete(&self, key: &[u8]) -> io::Result<bool>;
}
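Since the trait is the extension point, a backend other than RocksDB can be plugged in. Below is a hypothetical in-memory implementation for illustration; it uses native async-fn-in-trait (Rust 1.75+) instead of #[async_trait], and a toy `block_on` so the sketch runs without tokio:

```rust
use std::collections::HashMap;
use std::future::Future;
use std::io;
use std::pin::pin;
use std::sync::Mutex;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// The StorageBackend contract, minus #[async_trait]: native
// async-fn-in-trait is enough for static dispatch in this sketch.
trait StorageBackend {
    async fn get(&self, key: &[u8]) -> io::Result<Option<Vec<u8>>>;
    async fn put(&self, key: &[u8], value: &[u8]) -> io::Result<()>;
    async fn delete(&self, key: &[u8]) -> io::Result<bool>;
}

// Hypothetical in-memory backend, e.g. for unit tests.
#[derive(Default)]
struct MemBackend {
    data: Mutex<HashMap<Vec<u8>, Vec<u8>>>,
}

impl StorageBackend for MemBackend {
    async fn get(&self, key: &[u8]) -> io::Result<Option<Vec<u8>>> {
        Ok(self.data.lock().unwrap().get(key).cloned())
    }
    async fn put(&self, key: &[u8], value: &[u8]) -> io::Result<()> {
        self.data.lock().unwrap().insert(key.to_vec(), value.to_vec());
        Ok(())
    }
    async fn delete(&self, key: &[u8]) -> io::Result<bool> {
        Ok(self.data.lock().unwrap().remove(key).is_some())
    }
}

// Minimal executor: busy-polls with a no-op waker. Fine here because the
// in-memory futures never actually suspend.
fn block_on<F: Future>(fut: F) -> F::Output {
    const NOOP: RawWaker = RawWaker::new(std::ptr::null(), &VTABLE);
    const VTABLE: RawWakerVTable = RawWakerVTable::new(|_| NOOP, |_| {}, |_| {}, |_| {});
    let waker = unsafe { Waker::from_raw(NOOP) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    let be = MemBackend::default();
    block_on(async {
        be.put(b"k", b"v").await.unwrap();
        assert_eq!(be.get(b"k").await.unwrap(), Some(b"v".to_vec()));
        assert!(be.delete(b"k").await.unwrap());
    });
    println!("ok");
}
```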

3.4 Embeddable Library (chainfire-core)

use chainfire_core::{ClusterBuilder, ClusterEventHandler};

let cluster = ClusterBuilder::new(node_id)
    .name("node-1")
    .gossip_addr("0.0.0.0:2381".parse()?)
    .raft_addr("0.0.0.0:2380".parse()?)
    .on_cluster_event(MyHandler)
    .build()
    .await?;

// Use the KVS
cluster.kv().put("key", b"value").await?;

4. Data Models

4.1 Core Types

KeyValue Entry

pub struct KvEntry {
    pub key: Vec<u8>,
    pub value: Vec<u8>,
    pub create_revision: u64,  // Revision when created (immutable)
    pub mod_revision: u64,     // Last modification revision
    pub version: u64,          // Update count (1, 2, 3, ...)
    pub lease_id: Option<i64>, // Lease ID for TTL expiration
}
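To make the revision semantics concrete, here is a minimal in-memory sketch (not the real RocksDB state machine) of how the three counters evolve: the store holds one global revision bumped by every write, create_revision is frozen at first write, and version counts updates to the key:

```rust
use std::collections::HashMap;

// Sketch of MVCC bookkeeping only; value bytes and lease handling are
// simplified relative to the full KvEntry above.
#[derive(Clone, Debug)]
struct KvEntry {
    value: Vec<u8>,
    create_revision: u64, // revision of the first Put (immutable)
    mod_revision: u64,    // revision of the latest Put
    version: u64,         // number of Puts to this key: 1, 2, 3, ...
}

#[derive(Default)]
struct Store {
    revision: u64, // global MVCC counter, shared by all keys
    data: HashMap<Vec<u8>, KvEntry>,
}

impl Store {
    fn put(&mut self, key: &[u8], value: &[u8]) -> u64 {
        self.revision += 1; // every write advances the store revision
        let rev = self.revision;
        self.data
            .entry(key.to_vec())
            .and_modify(|e| {
                e.value = value.to_vec();
                e.mod_revision = rev; // moves with each update
                e.version += 1;       // per-key update count
            })
            .or_insert(KvEntry {
                value: value.to_vec(),
                create_revision: rev, // fixed forever at first write
                mod_revision: rev,
                version: 1,
            });
        rev
    }
}

fn main() {
    let mut s = Store::default();
    s.put(b"k", b"v1"); // revision 1: create
    s.put(b"k", b"v2"); // revision 2: update
    let e = &s.data[b"k".as_slice()];
    assert_eq!((e.create_revision, e.mod_revision, e.version), (1, 2, 2));
    println!("ok");
}
```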

Read Consistency Levels

pub enum ReadConsistency {
    Local,         // Read from local storage (may be stale)
    Serializable,  // Verify with leader's committed index
    Linearizable,  // Read only from leader (default, strongest)
}

Watch Event

pub enum WatchEventType {
    Put,
    Delete,
}

pub struct WatchEvent {
    pub event_type: WatchEventType,
    pub kv: KvEntry,
    pub prev_kv: Option<KvEntry>,
}

Response Header

pub struct ResponseHeader {
    pub cluster_id: u64,
    pub member_id: u64,
    pub revision: u64,    // Current store revision
    pub raft_term: u64,
}

4.2 Transaction Types

pub struct Compare {
    pub key: Vec<u8>,
    pub target: CompareTarget,
    pub result: CompareResult,
}

pub enum CompareTarget {
    Version(u64),
    CreateRevision(u64),
    ModRevision(u64),
    Value(Vec<u8>),
}

pub enum CompareResult {
    Equal,
    NotEqual,
    Greater,
    Less,
}
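A transaction guard could be evaluated roughly as follows; this is a sketch, with the types abridged from above and `check` a hypothetical helper, not the server's actual code path:

```rust
use std::cmp::Ordering;

// Abridged from the KvEntry and transaction types in this section.
struct KvEntry { version: u64, mod_revision: u64, value: Vec<u8> }

enum CompareTarget { Version(u64), ModRevision(u64), Value(Vec<u8>) }
enum CompareResult { Equal, NotEqual, Greater, Less }

/// Evaluate one guard: order the chosen field of the current entry against
/// the target operand, then test that ordering against the requested result.
fn check(entry: &KvEntry, target: &CompareTarget, result: &CompareResult) -> bool {
    let ord = match target {
        CompareTarget::Version(v) => entry.version.cmp(v),
        CompareTarget::ModRevision(r) => entry.mod_revision.cmp(r),
        CompareTarget::Value(v) => entry.value.cmp(v), // lexicographic bytes
    };
    match result {
        CompareResult::Equal => ord == Ordering::Equal,
        CompareResult::NotEqual => ord != Ordering::Equal,
        CompareResult::Greater => ord == Ordering::Greater,
        CompareResult::Less => ord == Ordering::Less,
    }
}

fn main() {
    let e = KvEntry { version: 3, mod_revision: 10, value: b"v".to_vec() };
    // "version == 3" holds, so this Txn would execute its success branch.
    assert!(check(&e, &CompareTarget::Version(3), &CompareResult::Equal));
    // "mod_revision > 11" fails, which would route to the failure branch.
    assert!(!check(&e, &CompareTarget::ModRevision(11), &CompareResult::Greater));
    println!("ok");
}
```

A Txn passes only if every Compare in its list passes, which is what makes the classic compare-and-swap pattern (guard on mod_revision, Put on success) atomic.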

4.3 Storage Format

  • Engine: RocksDB
  • Column Families:
    • raft_logs: Raft log entries
    • raft_meta: Raft metadata (vote, term, membership)
    • key_value: KV data (key bytes → serialized KvEntry)
    • snapshot: Snapshot metadata
  • Metadata Keys: vote, last_applied, membership, revision, last_snapshot
  • Serialization: bincode for Raft, Protocol Buffers for gRPC
  • MVCC: Global revision counter, per-key create/mod revisions

5. Configuration

5.1 Config File Format (TOML)

[node]
id = 1
name = "chainfire-1"
role = "control_plane"  # or "worker"

[storage]
data_dir = "/var/lib/chainfire"

[network]
api_addr = "0.0.0.0:2379"
raft_addr = "0.0.0.0:2380"
gossip_addr = "0.0.0.0:2381"

[cluster]
id = 1
bootstrap = true
initial_members = []

[raft]
role = "voter"  # "voter", "learner", or "none"

5.2 Environment Variables

| Variable            | Default        | Description        |
|---------------------|----------------|--------------------|
| CHAINFIRE_DATA_DIR  | ./data         | Data directory     |
| CHAINFIRE_API_ADDR  | 127.0.0.1:2379 | Client API address |
| CHAINFIRE_RAFT_ADDR | 127.0.0.1:2380 | Raft peer address  |

5.3 Raft Tuning

heartbeat_interval: 150ms      // Leader heartbeat
election_timeout_min: 300ms    // Min election timeout
election_timeout_max: 600ms    // Max election timeout
snapshot_policy: LogsSinceLast(5000)
snapshot_max_chunk_size: 3MB
max_payload_entries: 300

6. Security

6.1 Authentication

  • Current: None (development mode)
  • Planned: mTLS for peer communication, token-based client auth

6.2 Authorization

  • Current: All operations permitted
  • Planned: RBAC integration with IAM (aegis)

6.3 Multi-tenancy

  • Namespace isolation: Key prefix per tenant
  • Planned: Per-namespace quotas, ACLs via IAM

7. Operations

7.1 Deployment

Single Node (Bootstrap)

chainfire-server --config config.toml
# With bootstrap = true in config

Cluster (3-node)

# Node 1 (bootstrap)
chainfire-server --config node1.toml

# Node 2, 3 (join)
# Set bootstrap = false, add node1 to initial_members
chainfire-server --config node2.toml
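For illustration, a joining node's config might look like the sketch below. It reuses the schema from section 5.1; the entry format of initial_members (host:raft_port) is an assumption, not something this spec pins down:

```toml
# node2.toml -- joins the cluster bootstrapped by node 1
[node]
id = 2
name = "chainfire-2"
role = "control_plane"

[storage]
data_dir = "/var/lib/chainfire"

[network]
api_addr = "0.0.0.0:2379"
raft_addr = "0.0.0.0:2380"
gossip_addr = "0.0.0.0:2381"

[cluster]
id = 1                            # must match node 1's cluster id
bootstrap = false                 # only the first node bootstraps
initial_members = ["node1:2380"]  # assumed format: existing Raft peer address

[raft]
role = "voter"
```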

7.2 Monitoring

  • Health: gRPC health check service
  • Metrics: Prometheus endpoint (planned)
    • chainfire_kv_operations_total
    • chainfire_raft_term
    • chainfire_storage_bytes
    • chainfire_watch_active

7.3 Backup & Recovery

  • Snapshot: Automatic via Raft (every 5000 log entries)
  • Manual backup: Copy data_dir while stopped
  • Point-in-time: Use revision parameter in Range requests

8. Compatibility

8.1 API Versioning

  • gRPC package: chainfire.v1
  • Breaking changes: New major version (v2, v3)
  • Backward compatible: Add fields, new RPCs

8.2 Wire Protocol

  • Protocol Buffers 3
  • tonic/prost for Rust
  • Compatible with any gRPC client

8.3 etcd Compatibility

  • Compatible: KV operations, Watch, basic transactions
  • Different: gRPC package names, some field names
  • Not implemented: Lease service, Auth service (planned)

Appendix

A. Error Codes

| Error              | Meaning                                 |
|--------------------|-----------------------------------------|
| NOT_LEADER         | Node is not the Raft leader             |
| KEY_NOT_FOUND      | Key does not exist                      |
| REVISION_COMPACTED | Requested revision no longer available  |
| TXN_FAILED         | Transaction condition not met           |

B. Raft Commands

pub enum RaftCommand {
    Put { key: Vec<u8>, value: Vec<u8>, lease_id: Option<i64>, prev_kv: bool },
    Delete { key: Vec<u8>, prev_kv: bool },
    DeleteRange { start: Vec<u8>, end: Vec<u8>, prev_kv: bool },
    Txn { compare: Vec<Compare>, success: Vec<RaftCommand>, failure: Vec<RaftCommand> },
    Noop,  // Leadership establishment
}

C. Port Assignments

| Port | Protocol | Purpose     |
|------|----------|-------------|
| 2379 | gRPC     | Client API  |
| 2380 | gRPC     | Raft peer   |
| 2381 | UDP      | SWIM gossip |

D. Node Roles

/// Role in cluster gossip
pub enum NodeRole {
    ControlPlane,  // Participates in Raft consensus
    Worker,        // Gossip only, watches Control Plane
}

/// Role in Raft consensus
pub enum RaftRole {
    Voter,    // Full voting member
    Learner,  // Non-voting replica (receives log replication)
    None,     // No Raft participation (agent/proxy only)
}

E. Internal Raft RPCs (internal.proto)

service RaftService {
    rpc Vote(VoteRequest) returns (VoteResponse);
    rpc AppendEntries(AppendEntriesRequest) returns (AppendEntriesResponse);
    rpc InstallSnapshot(stream InstallSnapshotRequest) returns (InstallSnapshotResponse);
}