# Tasks: Raft Core Replication

**Input**: Design documents from `/specs/002-raft-features/`
**Prerequisites**: plan.md (required), spec.md (required for user stories), research.md, data-model.md, contracts/
**Tests**: Required per constitution; include unit/integration tests for Raft storage, proposal/commit, replication, and recovery.
**Organization**: Tasks are grouped by user story to enable independent implementation and testing.

**Format**: `[ID] [P?] [Story] Description`
- **[P]**: Can run in parallel (different files, no dependencies)
- **[Story]**: Which user story this task belongs to (e.g., US1, US2, US3)
- Include exact file paths in descriptions
## Phase 1: Setup (Shared Infrastructure)

**Purpose**: Ensure tooling and layout are ready for Raft feature work.

- T001 Verify Raft proto service definition matches contract in `rdb-proto/src/raft_server.proto`
- T002 Ensure Raft gRPC server/client wiring is enabled in `rdb-server/src/main.rs` and `rdb-server/src/raft_service.rs`
## Phase 2: Foundational (Blocking Prerequisites)

**Purpose**: Durable Raft storage primitives required by all stories.

- T003 Implement complete Raft storage persistence (log/hard state/conf state read/write) in `rdb-server/src/raft_storage.rs`
- T004 Add unit tests for Raft storage persistence (log append, load, truncate) in `rdb-server/src/raft_storage.rs`
- T005 Ensure the Peer ready loop persists entries and hard state before apply in `rdb-server/src/peer.rs`

**Checkpoint**: Raft storage durability verified.
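T003–T005 hinge on one invariant: log entries and hard state must be durable on disk before they are applied. A minimal stdlib-only sketch of the log-persistence core, assuming a simplified entry record (the `Entry` layout and `DiskLog` name are illustrative; the real `raft_storage.rs` would back the raft crate's `Storage` trait):

```rust
use std::convert::TryInto;
use std::fs::{File, OpenOptions};
use std::io::{Read, Write};
use std::path::PathBuf;

/// A log entry, simplified from the raft crate's eraftpb::Entry.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    index: u64,
    term: u64,
    data: Vec<u8>,
}

/// Append-only on-disk log; each record is index, term, data-len, data.
struct DiskLog {
    path: PathBuf,
}

impl DiskLog {
    fn new(path: PathBuf) -> Self {
        DiskLog { path }
    }

    /// Append entries and fsync so they are durable before apply (T005 invariant).
    fn append(&self, entries: &[Entry]) -> std::io::Result<()> {
        let mut f = OpenOptions::new().create(true).append(true).open(&self.path)?;
        for e in entries {
            f.write_all(&e.index.to_le_bytes())?;
            f.write_all(&e.term.to_le_bytes())?;
            f.write_all(&(e.data.len() as u32).to_le_bytes())?;
            f.write_all(&e.data)?;
        }
        f.sync_all()
    }

    /// Load all entries back from disk (the startup recovery path).
    fn load(&self) -> std::io::Result<Vec<Entry>> {
        let mut buf = Vec::new();
        match File::open(&self.path) {
            Ok(mut f) => {
                f.read_to_end(&mut buf)?;
            }
            Err(_) => return Ok(Vec::new()), // no log yet
        }
        let mut entries = Vec::new();
        let mut pos = 0;
        while pos + 20 <= buf.len() {
            let index = u64::from_le_bytes(buf[pos..pos + 8].try_into().unwrap());
            let term = u64::from_le_bytes(buf[pos + 8..pos + 16].try_into().unwrap());
            let len = u32::from_le_bytes(buf[pos + 16..pos + 20].try_into().unwrap()) as usize;
            let data = buf[pos + 20..pos + 20 + len].to_vec();
            entries.push(Entry { index, term, data });
            pos += 20 + len;
        }
        Ok(entries)
    }

    /// Drop every entry at or after `from`, rewriting the file (conflict truncation).
    fn truncate_from(&self, from: u64) -> std::io::Result<()> {
        let kept: Vec<Entry> = self.load()?.into_iter().filter(|e| e.index < from).collect();
        std::fs::remove_file(&self.path).ok();
        self.append(&kept)
    }
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("raft_log_sketch.bin");
    std::fs::remove_file(&path).ok();
    let log = DiskLog::new(path);
    log.append(&[
        Entry { index: 1, term: 1, data: b"set x=1".to_vec() },
        Entry { index: 2, term: 1, data: b"set y=2".to_vec() },
        Entry { index: 3, term: 2, data: b"set z=3".to_vec() },
    ])?;
    log.truncate_from(3)?; // a conflicting entry 3 arrived; drop ours
    println!("{} entries survive restart", log.load()?.len());
    Ok(())
}
```

The T004 unit tests would exercise exactly this append/load/truncate round trip against a temp directory.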
## Phase 3: User Story 1 - Single-Node Raft Baseline (Priority: P1)

**Goal**: A single node can self-elect, propose, commit, and apply entries to storage.

**Independent Test**: Run unit/integration tests that start one peer, campaign, propose a command, and verify commit/apply and a durable log.

### Tests

- T006 [US1] Add single-node campaign/propose/apply test in `rdb-server/src/peer.rs` (`#[cfg(test)]`) or `rdb-server/tests/test_single_node.rs`

### Implementation

- T007 [US1] Implement Peer campaign/propose handling with log apply in `rdb-server/src/peer.rs`
- T008 [US1] Expose a simple propose entry point (e.g., CLI or helper) for single-node testing in `rdb-server/src/main.rs`
- T009 [US1] Validate that the single-node flow passes tests and persists entries (run `cargo test -p rdb-server -- single_node`)

**Checkpoint**: Single-node Raft verified end to end.
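What makes the single-node baseline simple is that a lone node is its own quorum: a proposal can commit and apply immediately. A toy sketch of that flow, assuming a "key=value" command format (all names here are illustrative; the real `peer.rs` drives the raft crate's `RawNode` instead of hand-rolling this):

```rust
use std::collections::HashMap;

/// Toy single-node peer: with a cluster of one, every appended entry is
/// immediately committed (quorum of 1) and applied to the state machine.
struct SinglePeer {
    term: u64,
    log: Vec<(u64, String)>, // (term, command)
    committed: usize,        // count of committed entries
    applied: usize,          // count of applied entries
    kv: HashMap<String, String>,
}

impl SinglePeer {
    fn new() -> Self {
        SinglePeer { term: 0, log: Vec::new(), committed: 0, applied: 0, kv: HashMap::new() }
    }

    /// Self-elect: a lone candidate wins its own vote and bumps the term.
    fn campaign(&mut self) {
        self.term += 1;
    }

    /// Propose a "key=value" command; it commits at once (quorum of 1).
    fn propose(&mut self, cmd: &str) {
        self.log.push((self.term, cmd.to_string()));
        self.committed = self.log.len();
        self.apply();
    }

    /// Apply committed-but-unapplied entries to the KV state machine.
    fn apply(&mut self) {
        for (_, cmd) in &self.log[self.applied..self.committed] {
            if let Some((k, v)) = cmd.split_once('=') {
                self.kv.insert(k.to_string(), v.to_string());
            }
        }
        self.applied = self.committed;
    }
}

fn main() {
    let mut peer = SinglePeer::new();
    peer.campaign();
    peer.propose("x=1");
    peer.propose("x=2");
    println!("term={} applied={} x={:?}", peer.term, peer.applied, peer.kv.get("x"));
}
```

The T006 test is the same shape: campaign, propose, then assert on the committed/applied indexes and the state machine contents.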
## Phase 4: User Story 2 - Multi-Node Replication (Priority: P1)

**Goal**: A 3-node cluster replicates entries to a majority; leader/follower paths are wired via gRPC.

**Independent Test**: An integration harness spins up 3 nodes, elects a leader, proposes an entry, and asserts commit on at least 2 nodes.

### Tests

- T010 [US2] Create a 3-node integration test harness in `rdb-server/tests/test_replication.rs` to validate majority commit

### Implementation

- T011 [US2] Wire RaftService transport send/receive to dispatch messages to peers in `rdb-server/src/raft_service.rs`
- T012 [P] [US2] Implement a peer registry/peer manager to track remote addresses and send Raft messages in `rdb-server/src/peer_manager.rs`
- T013 [US2] Update server startup to create/join a fixed 3-node cluster with configured peers in `rdb-server/src/main.rs`
- T014 [US2] Ensure the ready loop sends outbound messages produced by RawNode in `rdb-server/src/peer.rs`
- T015 [US2] Verify majority replication via the integration harness (run `cargo test -p rdb-server -- test_replication`)

**Checkpoint**: Majority replication validated on 3 nodes.
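The majority-commit assertion in T010/T015 reduces to the leader's commit rule: an index is committed once a majority of match indexes have reached it. A self-contained sketch (the function name is ours; in the real server this logic lives behind the raft crate's progress tracking):

```rust
/// Highest log index replicated on a majority of nodes (leader included).
/// With match indexes sorted ascending, the element at position (n-1)/2 is
/// present on at least n/2 + 1 nodes, i.e. a majority.
fn majority_commit_index(match_index: &[u64]) -> u64 {
    let mut sorted = match_index.to_vec();
    sorted.sort_unstable();
    sorted[(sorted.len() - 1) / 2]
}

fn main() {
    // 3-node cluster: leader at 5, followers at 3 and 4 -> index 4 is on 2/3 nodes.
    println!("commit = {}", majority_commit_index(&[5, 3, 4]));
}
```

This is also why the harness asserts commit on "at least 2 nodes": with n = 3, two match indexes at or beyond an index make it committed.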
## Phase 5: User Story 3 - Failure and Recovery (Priority: P2)

**Goal**: Followers can restart and catch up without losing committed entries; a partitioned minority cannot advance commits.

**Independent Test**: An integration test stops a follower, commits an entry while it is down, restarts the follower, and verifies log reconciliation and apply.

### Tests

- T016 [US3] Add follower restart/catch-up integration test in `rdb-server/tests/test_recovery.rs` (in progress; currently ignored in `test_replication.rs`)

### Implementation

- T017 [US3] Implement startup recovery: load HardState/ConfState/log and reconcile via AppendEntries in `rdb-server/src/peer.rs`
- T018 [US3] Handle log truncate/append on conflict and apply committed entries after recovery in `rdb-server/src/peer.rs`
- T019 [US3] Add an isolation guard: prevent commit advancement on minority partition detection (e.g., via quorum checks) in `rdb-server/src/peer.rs`
- T020 [US3] Validate that recovery/integration tests pass (run `cargo test -p rdb-server -- test_recovery`)

**Checkpoint**: Recovery and partition safety validated.
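The reconciliation in T017/T018 comes down to the AppendEntries consistency check plus conflict truncation. A simplified stdlib sketch over `(index, term)` pairs (the function shape is an illustrative assumption; the real `peer.rs` delegates this to the raft crate's log handling):

```rust
/// Follower-side AppendEntries reconciliation, simplified: returns false if
/// the consistency check on (prev_index, prev_term) fails and the leader
/// must retry from an earlier index; otherwise truncates on conflict and
/// appends the leader's entries.
fn append_entries(
    log: &mut Vec<(u64, u64)>, // (index, term)
    prev_index: u64,
    prev_term: u64,
    entries: &[(u64, u64)],
) -> bool {
    // Consistency check: the entry at prev_index must carry prev_term.
    if prev_index > 0 {
        match log.iter().find(|(i, _)| *i == prev_index) {
            Some((_, t)) if *t == prev_term => {}
            _ => return false,
        }
    }
    for &(idx, term) in entries {
        if let Some(pos) = log.iter().position(|(i, _)| *i == idx) {
            if log[pos].1 != term {
                // Conflict: drop this entry and everything after it,
                // then take the leader's version.
                log.truncate(pos);
                log.push((idx, term));
            }
        } else {
            log.push((idx, term));
        }
    }
    true
}

fn main() {
    // Follower kept a stale entry 3 from term 2; leader overwrites from term 3.
    let mut log = vec![(1, 1), (2, 1), (3, 2)];
    let ok = append_entries(&mut log, 2, 1, &[(3, 3), (4, 3)]);
    println!("ok={} log={:?}", ok, log);
}
```

Because only committed entries are on a majority, this truncation can never discard a committed entry, which is exactly what the T016 restart test asserts after catch-up.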
## Phase 6: Polish & Cross-Cutting Concerns

**Purpose**: Hardening and operability.

- T021 Add structured Raft logging (term/index/apply/commit) in `rdb-server` with slog
- T022 Add a quickstart or script to launch a 3-node cluster and run the replication test in `scripts/verify-raft.sh`
- T023 Run full workspace tests and format/lint (`cargo test`, `cargo fmt`, `cargo clippy`)
## Dependencies & Execution Order

- Foundational (Phase 2) blocks all Raft user stories.
- US1 must complete before US2/US3 (it builds the basic propose/apply path).
- US2 should precede US3 (replication before recovery).
- Polish runs last.
## Parallel Examples

- T011 (transport wiring) and T012 (peer manager) can proceed in parallel once T003–T005 are done.
- US2 tests (T010) can be authored in parallel with the transport implementation, then enabled once wiring lands.
- Logging and script polish (T021–T022) can run in parallel after the core stories complete.
## Implementation Strategy

1. Complete Foundational (durable storage).
2. Deliver US1 (single-node MVP).
3. Deliver US2 (majority replication).
4. Deliver US3 (recovery/partition safety).
5. Polish (logging, scripts, fmt/clippy).