# Option C: Snapshot Pre-seed Workaround

## Problem

OpenRaft 0.9.21 has a bug where the assertion `upto >= log_id_range.prev` fails in `progress/inflight/mod.rs:178` during learner replication. This occurs when:

1. A learner is added to the cluster with `add_learner()`
2. The leader's progress-tracking state becomes inconsistent during initial log replication

## Root Cause Analysis

A newly joined learner has an empty log, so the leader must replicate every entry from the beginning. During this catch-up phase, OpenRaft's progress tracking can become inconsistent when:

- Replication streams are re-spawned
- Progress reverts to zero
- The `upto >= log_id_range.prev` invariant is violated

## Workaround Approach: Snapshot Pre-seed

Instead of relying on OpenRaft's log replication to catch the learner up, we pre-seed the learner with a snapshot before adding it to the cluster.

### How It Works

1. **Leader exports snapshot:**

   ```rust
   // On leader node
   let snapshot = raft_storage.get_current_snapshot().await?;
   let bytes = snapshot.snapshot.into_inner(); // Vec<u8>
   ```

2. **Transfer snapshot to learner:**
   - Via file copy (manual)
   - Via a new gRPC API endpoint (automated)

3. **Learner imports snapshot:**

   ```rust
   // On learner node, before starting Raft (sketch; exact calls
   // depend on our storage implementation)
   let snapshot = Snapshot::from_bytes(&bytes)?;
   snapshot_builder.apply(&snapshot)?;
   // Also set log state to match the snapshot
   log_storage.purge(snapshot.meta.last_log_index)?;
   ```
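As a toy illustration of why pre-seeding helps (plain Rust, no OpenRaft types — `Learner` and `replication_window` are made up for this sketch), the leader only has to stream the entries written after the snapshot:

```rust
/// Illustrative only: a learner's durable state, reduced to the
/// index of the last log entry covered by its snapshot.
struct Learner {
    last_log_index: u64, // 0 = fresh, empty node
}

/// Range of log indices the leader must still stream to this learner.
fn replication_window(leader_last_index: u64, learner: &Learner) -> std::ops::Range<u64> {
    (learner.last_log_index + 1)..(leader_last_index + 1)
}

fn main() {
    let leader_last_index = 100_000;

    // Fresh learner: the leader replays everything. This is the long
    // catch-up phase in which the `upto >= log_id_range.prev`
    // assertion can trip.
    let fresh = Learner { last_log_index: 0 };
    let w = replication_window(leader_last_index, &fresh);
    assert_eq!(w.end - w.start, 100_000);

    // Pre-seeded learner: the snapshot already covers index 99_990,
    // so only the short tail since the snapshot needs replication.
    let seeded = Learner { last_log_index: 99_990 };
    let w = replication_window(leader_last_index, &seeded);
    assert_eq!(w.end - w.start, 10);
}
```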
4. **Add pre-seeded learner:**
   - The learner already has state at `last_log_index`
   - Only recent entries (since the snapshot) need replication
   - The minimal replication window avoids the bug

### Implementation Options

#### Option C1: Manual Data Directory Copy

- Copy the leader's `data_dir/` to the learner before starting it
- Simplest, but requires manual intervention
- Good for initial cluster setup

#### Option C2: New ClusterService API

```protobuf
service ClusterService {
  // Existing
  rpc AddMember(AddMemberRequest) returns (AddMemberResponse);

  // New
  rpc TransferSnapshot(TransferSnapshotRequest)
      returns (stream TransferSnapshotResponse);
}

message TransferSnapshotRequest {
  uint64 target_node_id = 1;
  string target_addr = 2;
}

message TransferSnapshotResponse {
  bytes chunk = 1;
  bool done = 2;
  SnapshotMeta meta = 3; // Only in first chunk
}
```

Modified join flow:

1. `ClusterService::add_member()` first calls `TransferSnapshot()` to pre-seed the learner
2. Waits for the learner to apply the snapshot
3. Then calls `add_learner()`

#### Option C3: Bootstrap from Snapshot

Add a config option `bootstrap_from = "node_id"`:

- The node fetches a snapshot from the specified node on startup
- Applies it before joining the cluster
- Then waits for the `add_learner()` call

### Recommended Approach: C2 (API-based)

**Pros:**

- Automated, no manual intervention
- Works with dynamic cluster expansion
- Fits the existing gRPC architecture

**Cons:**

- More code to implement (~200-300 lines)
- Snapshot transfer adds latency to the join

### Files to Modify

1. `chainfire/proto/cluster.proto` - add the `TransferSnapshot` RPC
2. `chainfire-api/src/cluster_service.rs` - implement the snapshot transfer
3. `chainfire-api/src/cluster_service.rs` - modify the `add_member` flow
4. `chainfire-storage/src/snapshot.rs` - expose snapshot APIs

### Test Plan

1. Start a single-node cluster
2. Write some data (create entries in the log)
3. Start a second node
4. Call `add_member()` - should trigger the snapshot transfer
5. Verify the second node receives the data
6. Verify no assertion failures

### Estimated Effort

- Implementation: 3-4 hours
- Testing: 1-2 hours
- Total: 4-6 hours

### Status

- [x] Research complete
- [ ] Awaiting 24h timer for upstream OpenRaft response
- [ ] Implementation (if needed)
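For reference, the chunked `TransferSnapshot` stream sketched in Option C2 round-trips like this. This is self-contained Rust, not the generated gRPC code: `TransferSnapshotResponse` is a hand-rolled stand-in, and a plain string stands in for `SnapshotMeta`.

```rust
/// Stand-in for the generated `TransferSnapshotResponse` message.
#[derive(Clone)]
struct TransferSnapshotResponse {
    chunk: Vec<u8>,
    done: bool,
    meta: Option<String>, // stand-in for SnapshotMeta; only in first chunk
}

/// Leader side: split a snapshot into fixed-size chunks, meta on the first.
fn to_chunks(snapshot: &[u8], meta: &str, chunk_size: usize) -> Vec<TransferSnapshotResponse> {
    let chunks: Vec<&[u8]> = snapshot.chunks(chunk_size.max(1)).collect();
    let n = chunks.len().max(1); // emit at least one message, even when empty
    (0..n)
        .map(|i| TransferSnapshotResponse {
            chunk: chunks.get(i).copied().unwrap_or(&[]).to_vec(),
            done: i == n - 1,
            meta: (i == 0).then(|| meta.to_string()),
        })
        .collect()
}

/// Learner side: reassemble the stream into (meta, snapshot bytes).
fn reassemble(stream: impl IntoIterator<Item = TransferSnapshotResponse>) -> (String, Vec<u8>) {
    let mut meta = String::new();
    let mut bytes = Vec::new();
    for msg in stream {
        if let Some(m) = msg.meta {
            meta = m;
        }
        bytes.extend_from_slice(&msg.chunk);
        if msg.done {
            break;
        }
    }
    (meta, bytes)
}

fn main() {
    let snapshot = vec![7u8; 10_000];
    let stream = to_chunks(&snapshot, "last_log_index=42", 4096);
    assert_eq!(stream.len(), 3); // 4096 + 4096 + 1808 bytes
    let (meta, bytes) = reassemble(stream);
    assert_eq!(meta, "last_log_index=42");
    assert_eq!(bytes, snapshot);
}
```

The real implementation would stream these messages over the gRPC connection rather than a `Vec`, but the chunking invariants (meta only in the first message, `done` only on the last) are the part `add_member()` has to get right.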