---
description: "Task list for Raft Core Replication"
---

# Tasks: Raft Core Replication

**Input**: Design documents from `/specs/002-raft-features/`
**Prerequisites**: plan.md (required), spec.md (required for user stories), research.md, data-model.md, contracts/

**Tests**: Required per constitution; include unit/integration tests for Raft storage, proposal/commit, replication, and recovery.

**Organization**: Tasks are grouped by user story to enable independent implementation and testing.

## Format: `[ID] [P?] [Story] Description`

- **[P]**: Can run in parallel (different files, no dependencies)
- **[Story]**: Which user story this task belongs to (e.g., US1, US2, US3)
- Include exact file paths in descriptions
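
To make the format concrete, here is a minimal sketch of a parser for one task line. It assumes the shape used by the entries in this list: a checkbox, a bare ID such as `T012`, an optional `[P]`, and an optional `[USn]` tag. The names `Task` and `parse_task` are hypothetical and not part of the repository.

```rust
/// One parsed task line, e.g. `- [X] T012 [P] [US2] Implement peer registry`.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
pub struct Task {
    pub done: bool,
    pub id: String,
    pub parallel: bool,
    pub story: Option<String>,
    pub description: String,
}

/// Parses a single checklist line; returns `None` if the line is not a task.
pub fn parse_task(line: &str) -> Option<Task> {
    // Checkbox: `- [X] ` (done) or `- [ ] ` (open).
    let rest = line.trim().strip_prefix("- [")?;
    let (mark, rest) = rest.split_once("] ")?;
    let done = match mark {
        "X" | "x" => true,
        " " => false,
        _ => return None,
    };

    // Task ID: assumed to be `T` followed by digits, as in `T001`.
    let (id, mut rest) = rest.split_once(' ')?;
    let digits = id.strip_prefix('T')?;
    if digits.is_empty() || !digits.chars().all(|c| c.is_ascii_digit()) {
        return None;
    }

    // Optional parallel marker.
    let mut parallel = false;
    if let Some(r) = rest.strip_prefix("[P] ") {
        parallel = true;
        rest = r;
    }

    // Optional user-story tag such as `[US2]`.
    let mut story = None;
    if let Some(r) = rest.strip_prefix("[US") {
        if let Some((n, after)) = r.split_once("] ") {
            if !n.is_empty() && n.chars().all(|c| c.is_ascii_digit()) {
                story = Some(format!("US{n}"));
                rest = after;
            }
        }
    }

    Some(Task {
        done,
        id: id.to_string(),
        parallel,
        story,
        description: rest.to_string(),
    })
}
```

Setup, foundational, and polish tasks carry no story tag, so `story` is `None` for them.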

## Phase 1: Setup (Shared Infrastructure)

**Purpose**: Ensure tooling and layout are ready for Raft feature work.

- [X] T001 Verify Raft proto service definition matches contract in `rdb-proto/src/raft_server.proto`
- [X] T002 Ensure Raft gRPC server/client wiring is enabled in `rdb-server/src/main.rs` and `rdb-server/src/raft_service.rs`

---

## Phase 2: Foundational (Blocking Prerequisites)

**Purpose**: Durable Raft storage primitives required by all stories.

- [X] T003 Implement complete Raft storage persistence (log/hard state/conf state read/write) in `rdb-server/src/raft_storage.rs`
- [X] T004 Add unit tests for Raft storage persistence (log append, load, truncate) in `rdb-server/src/raft_storage.rs`
- [X] T005 Ensure Peer ready loop persists entries and hard state before apply in `rdb-server/src/peer.rs`
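
The ordering required by T005 (persist entries, then hard state, then apply) can be sketched as follows. This is an in-memory illustration under stated assumptions, not the code in `rdb-server/src/peer.rs`: `Entry`, `HardState`, `Ready`, and `MemStore` are hypothetical stand-ins rather than the Raft library's types, and a real store would write to disk and sync before step 3.

```rust
/// Hypothetical stand-ins; not the Raft library's own types.
#[derive(Clone, Debug, PartialEq)]
pub struct Entry {
    pub term: u64,
    pub index: u64,
    pub data: Vec<u8>,
}

#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct HardState {
    pub term: u64,
    pub vote: u64,
    pub commit: u64,
}

/// What one ready-loop iteration hands to the peer.
pub struct Ready {
    pub entries: Vec<Entry>,           // new log entries to persist
    pub hard_state: Option<HardState>, // changed hard state, if any
    pub committed: Vec<Entry>,         // entries that are now safe to apply
}

/// In-memory store that records the order of operations in `steps`.
#[derive(Default)]
pub struct MemStore {
    pub log: Vec<Entry>,
    pub hard_state: HardState,
    pub applied: Vec<u64>,
    pub steps: Vec<&'static str>,
}

impl MemStore {
    pub fn handle_ready(&mut self, ready: Ready) {
        // 1. Persist new log entries first.
        if !ready.entries.is_empty() {
            self.log.extend(ready.entries);
            self.steps.push("append");
        }
        // 2. Persist the hard state (term, vote, commit).
        if let Some(hs) = ready.hard_state {
            self.hard_state = hs;
            self.steps.push("hard_state");
        }
        // 3. Only now apply committed entries to the state machine.
        for entry in ready.committed {
            self.applied.push(entry.index);
            self.steps.push("apply");
        }
    }
}
```

Applying before persisting would let a crash leave the state machine ahead of the durable log, which is the failure mode this ordering avoids.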

**Checkpoint**: Raft storage durability verified.

---

## Phase 3: User Story 1 - Single-Node Raft Baseline (Priority: P1)

**Goal**: Single node can self-elect, propose, commit, and apply entries to storage.

**Independent Test**: Run unit/integration tests that start one peer, campaign, propose a command, and verify commit/apply and durable log.

### Tests
- [X] T006 [US1] Add single-node campaign/propose/apply test in `rdb-server/src/peer.rs` (cfg(test)) or `rdb-server/tests/test_single_node.rs`

### Implementation
- [X] T007 [US1] Implement Peer campaign/propose handling with log apply in `rdb-server/src/peer.rs`
- [X] T008 [US1] Expose a simple propose entry point (e.g., CLI or helper) for single-node testing in `rdb-server/src/main.rs`
- [X] T009 [US1] Validate single-node flow passes tests and persists entries (run `cargo test -p rdb-server -- single_node`)

**Checkpoint**: Single-node Raft end-to-end verified.

---

## Phase 4: User Story 2 - Multi-Node Replication (Priority: P1)

**Goal**: 3-node cluster replicates entries to a majority; leader/follower paths wired via gRPC.

**Independent Test**: Integration harness spins up 3 nodes, elects leader, proposes entry, asserts commit on at least 2 nodes.

### Tests
- [X] T010 [US2] Create 3-node integration test harness in `rdb-server/tests/test_replication.rs` to validate majority commit

### Implementation
- [X] T011 [US2] Wire RaftService transport send/receive to dispatch messages to peers in `rdb-server/src/raft_service.rs`
- [X] T012 [P] [US2] Implement peer registry/peer manager to track remote addresses and send Raft messages in `rdb-server/src/peer_manager.rs`
- [X] T013 [US2] Update server startup to create/join fixed 3-node cluster with configured peers in `rdb-server/src/main.rs`
- [X] T014 [US2] Ensure ready loop sends outbound messages produced by RawNode in `rdb-server/src/peer.rs`
- [X] T015 [US2] Verify majority replication via integration harness (run `cargo test -p rdb-server -- test_replication`)
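
The "majority commit" condition that T010 and T015 check can be illustrated with a small calculation: given the highest log index known to be stored on each node (leader included), the committed index is the largest one present on a quorum. This is a simplified sketch with hypothetical function names; it omits the Raft rule that a leader only commits entries from its own term by counting replicas, and in practice the Raft library performs this computation internally.

```rust
/// Smallest number of nodes that forms a majority.
pub fn quorum(cluster_size: usize) -> usize {
    cluster_size / 2 + 1
}

/// Largest log index stored on at least a quorum of nodes.
/// `match_index` holds one value per node, the leader's own last index included.
/// Returns `None` for an empty cluster.
pub fn majority_committed(match_index: &[u64]) -> Option<u64> {
    if match_index.is_empty() {
        return None;
    }
    let mut sorted = match_index.to_vec();
    // Sort descending: the value at position `quorum - 1` is held by
    // at least `quorum` nodes.
    sorted.sort_unstable_by(|a, b| b.cmp(a));
    Some(sorted[quorum(sorted.len()) - 1])
}
```

For a 3-node cluster the quorum is 2, so with stored indexes 5, 3, and 1 only index 3 has reached a majority, which matches the "commit on at least 2 nodes" assertion in the independent test.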

**Checkpoint**: Majority replication validated on 3 nodes.

---

## Phase 5: User Story 3 - Failure and Recovery (Priority: P2)

**Goal**: Followers can restart and catch up without losing committed entries; isolation prevents commits.

**Independent Test**: Integration test stops a follower, commits entry while down, restarts follower, and verifies log reconciliation and apply.

### Tests
- [X] T016 [US3] Add follower restart/catch-up integration test in `rdb-server/tests/test_recovery.rs` (in progress; currently ignored in `test_replication.rs`)

### Implementation
- [X] T017 [US3] Implement startup recovery: load HardState/ConfState/log and reconcile via AppendEntries in `rdb-server/src/peer.rs`
- [X] T018 [US3] Handle log truncate/append on conflict and apply committed entries after recovery in `rdb-server/src/peer.rs`
- [X] T019 [US3] Add isolation guard: prevent commit advancement on minority partition detection (e.g., via quorum checks) in `rdb-server/src/peer.rs`
- [X] T020 [US3] Validate recovery/integration tests pass (run `cargo test -p rdb-server -- test_recovery`)
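
The truncate/append behavior in T018 follows the usual Raft follower rule: an incoming entry that matches an existing one is skipped, a term mismatch at the same index discards that entry and everything after it, and new entries are appended. A minimal sketch, assuming a log with contiguous indexes starting at 1 and hypothetical names (`Entry`, `reconcile`); it does not guard against truncating already committed entries, which real code must never do.

```rust
/// Hypothetical log entry; payload omitted for brevity.
#[derive(Clone, Debug, PartialEq)]
pub struct Entry {
    pub term: u64,
    pub index: u64,
}

/// Merges `incoming` entries into `log`, truncating on term conflicts.
/// Assumes `log` holds contiguous indexes starting at 1 (index 1 at position 0).
pub fn reconcile(log: &mut Vec<Entry>, incoming: &[Entry]) -> Result<(), String> {
    for entry in incoming {
        if entry.index == 0 {
            return Err("entry index must start at 1".to_string());
        }
        let pos = (entry.index - 1) as usize;
        if pos > log.len() {
            // Appending here would leave a hole in the log.
            return Err(format!("gap before index {}", entry.index));
        }
        let existing_term = log.get(pos).map(|e| e.term);
        match existing_term {
            // Same index and term: already stored, nothing to do.
            Some(term) if term == entry.term => {}
            // Conflict: drop this entry and all that follow, then append.
            Some(_) => {
                log.truncate(pos);
                log.push(entry.clone());
            }
            // Past the end of the log: plain append.
            None => log.push(entry.clone()),
        }
    }
    Ok(())
}
```

A restarted follower whose log diverged at index 3 ends up with the leader's entries from index 3 onward, while its matching prefix is left untouched.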

**Checkpoint**: Recovery and partition safety validated.

---

## Phase 6: Polish & Cross-Cutting Concerns

**Purpose**: Hardening and operability.

- [X] T021 Add structured Raft logging (term/index/apply/commit) in `rdb-server` with slog
- [X] T022 Add quickstart or script to launch 3-node cluster and run replication test in `scripts/verify-raft.sh`
- [X] T023 Run full workspace tests and format/lint (`cargo test`, `cargo fmt`, `cargo clippy`)

---

## Dependencies & Execution Order

- Foundational (Phase 2) blocks all Raft user stories.
- US1 must complete before US2/US3 (builds basic propose/apply).
- US2 should precede US3 (replication before recovery).
- Polish runs last.

## Parallel Examples

- T011 (transport wiring) and T012 (peer manager) can proceed in parallel once T003–T005 are done.
- US2 tests (T010) can be authored in parallel with transport implementation, then enabled once wiring lands.
- Logging and script polish (T021–T022) can run in parallel after core stories complete.

## Implementation Strategy

1. Complete Foundational (durable storage).
2. Deliver US1 (single-node MVP).
3. Deliver US2 (majority replication).
4. Deliver US3 (recovery/partition safety).
5. Polish (logging, scripts, fmt/clippy).