---
description: "Task list for Raft Core Replication"
---
# Tasks: Raft Core Replication
**Input**: Design documents from `/specs/002-raft-features/`
**Prerequisites**: plan.md (required), spec.md (required for user stories), research.md, data-model.md, contracts/
**Tests**: Required per constitution; include unit/integration tests for Raft storage, proposal/commit, replication, and recovery.
**Organization**: Tasks are grouped by user story to enable independent implementation and testing.
## Format: `[ID] [P?] [Story] Description`
- **[P]**: Can run in parallel (different files, no dependencies)
- **[Story]**: Which user story this task belongs to (e.g., US1, US2, US3)
- Include exact file paths in descriptions
## Phase 1: Setup (Shared Infrastructure)
**Purpose**: Ensure tooling and layout are ready for Raft feature work.
- [X] T001 Verify Raft proto service definition matches contract in `rdb-proto/src/raft_server.proto`
- [X] T002 Ensure Raft gRPC server/client wiring is enabled in `rdb-server/src/main.rs` and `rdb-server/src/raft_service.rs`
---
## Phase 2: Foundational (Blocking Prerequisites)
**Purpose**: Durable Raft storage primitives required by all stories.
- [X] T003 Implement complete Raft storage persistence (log/hard state/conf state read/write) in `rdb-server/src/raft_storage.rs`
- [X] T004 Add unit tests for Raft storage persistence (log append, load, truncate) in `rdb-server/src/raft_storage.rs`
- [X] T005 Ensure Peer ready loop persists entries and hard state before apply in `rdb-server/src/peer.rs`
**Checkpoint**: Raft storage durability verified.
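The persistence semantics T003-T004 exercise can be sketched in plain Rust. The `Entry`/`HardState`/`RaftLogStore` types below are hypothetical simplifications, not the actual raft-rs `Storage` trait used in `rdb-server/src/raft_storage.rs`; they model the three behaviors under test: append with suffix truncation, last-index lookup, and hard-state persistence before apply.

```rust
/// Simplified model of the Raft storage semantics tested in T003-T004.
/// Hypothetical types; the real implementation backs the raft-rs
/// `Storage` trait with durable media.

#[derive(Clone, Debug, PartialEq)]
struct Entry {
    term: u64,
    index: u64,
    data: Vec<u8>,
}

#[derive(Default, Debug, PartialEq)]
struct HardState {
    term: u64,
    vote: u64,
    commit: u64,
}

#[derive(Default)]
struct RaftLogStore {
    entries: Vec<Entry>,
    hard_state: HardState,
}

impl RaftLogStore {
    /// Append entries, truncating the local suffix from the first
    /// incoming index onward (a simplification: real Raft truncates
    /// only when an existing entry's term conflicts).
    fn append(&mut self, new: &[Entry]) {
        if let Some(first) = new.first() {
            self.entries.retain(|e| e.index < first.index);
        }
        self.entries.extend_from_slice(new);
    }

    fn last_index(&self) -> u64 {
        self.entries.last().map_or(0, |e| e.index)
    }

    /// Persist hard state; T005 requires this to happen before apply.
    fn set_hard_state(&mut self, hs: HardState) {
        self.hard_state = hs;
    }
}
```

A unit test along these lines covers the append/load/truncate cases named in T004.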
---
## Phase 3: User Story 1 - Single-Node Raft Baseline (Priority: P1)
**Goal**: Single node can self-elect, propose, commit, and apply entries to storage.
**Independent Test**: Run unit/integration tests that start one peer, campaign, propose a command, and verify commit/apply and durable log.
### Tests
- [X] T006 [US1] Add single-node campaign/propose/apply test in `rdb-server/src/peer.rs` (cfg(test)) or `rdb-server/tests/test_single_node.rs`
### Implementation
- [X] T007 [US1] Implement Peer campaign/propose handling with log apply in `rdb-server/src/peer.rs`
- [X] T008 [US1] Expose a simple propose entry point (e.g., CLI or helper) for single-node testing in `rdb-server/src/main.rs`
- [X] T009 [US1] Validate single-node flow passes tests and persists entries (run `cargo test -p rdb-server -- single_node`)
**Checkpoint**: Single-node Raft end-to-end verified.
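The single-node flow T006-T009 verify reduces to a simple invariant: with a cluster size of one, the quorum is one, so every proposal commits and applies immediately. A minimal sketch of that invariant (hypothetical `SingleNode` type; the real path goes through the raft-rs `RawNode` ready loop in `rdb-server/src/peer.rs`):

```rust
/// Toy model of US1 semantics: in a one-node cluster, propose implies
/// commit implies apply. Hypothetical type, not the real Peer.
struct SingleNode {
    log: Vec<Vec<u8>>,     // proposed entries, 1-based by position + 1
    commit: usize,         // highest committed index
    applied: Vec<Vec<u8>>, // applied state machine entries
}

impl SingleNode {
    fn new() -> Self {
        Self { log: vec![], commit: 0, applied: vec![] }
    }

    /// Propose a command; returns the committed index.
    fn propose(&mut self, cmd: Vec<u8>) -> usize {
        self.log.push(cmd);
        // Quorum of a 1-node cluster is 1: commit advances at once.
        self.commit = self.log.len();
        self.apply_committed();
        self.commit
    }

    /// Apply everything up to the commit index, in order.
    fn apply_committed(&mut self) {
        while self.applied.len() < self.commit {
            let e = self.log[self.applied.len()].clone();
            self.applied.push(e);
        }
    }
}
```

The T006 test asserts exactly this shape: campaign, propose, then observe commit/apply and a durable log.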
---
## Phase 4: User Story 2 - Multi-Node Replication (Priority: P1)
**Goal**: 3-node cluster replicates entries to a majority; leader/follower paths wired via gRPC.
**Independent Test**: Integration harness spins up 3 nodes, elects leader, proposes entry, asserts commit on at least 2 nodes.
### Tests
- [X] T010 [US2] Create 3-node integration test harness in `rdb-server/tests/test_replication.rs` to validate majority commit
### Implementation
- [X] T011 [US2] Wire RaftService transport send/receive to dispatch messages to peers in `rdb-server/src/raft_service.rs`
- [X] T012 [P] [US2] Implement peer registry/peer manager to track remote addresses and send Raft messages in `rdb-server/src/peer_manager.rs`
- [X] T013 [US2] Update server startup to create/join fixed 3-node cluster with configured peers in `rdb-server/src/main.rs`
- [X] T014 [US2] Ensure ready loop sends outbound messages produced by RawNode in `rdb-server/src/peer.rs`
- [X] T015 [US2] Verify majority replication via integration harness (run `cargo test -p rdb-server -- test_replication`)
**Checkpoint**: Majority replication validated on 3 nodes.
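The rule the T010/T015 harness checks is the majority-commit rule: the leader may advance its commit index to the highest index stored on a quorum of nodes. raft-rs computes this internally from its `Progress` tracking, but the arithmetic can be stated as a standalone helper (hypothetical function, assumes a non-empty slice of match indexes that includes the leader's own last index):

```rust
/// Highest index replicated on a majority of the cluster.
/// Hypothetical helper illustrating the commit rule US2 validates;
/// raft-rs performs this computation itself.
fn majority_commit(mut match_index: Vec<u64>) -> u64 {
    // Sort descending; the value at position n/2 is present on at
    // least floor(n/2) + 1 nodes, i.e. a majority.
    match_index.sort_unstable_by(|a, b| b.cmp(a));
    match_index[match_index.len() / 2]
}
```

For a 3-node cluster with match indexes `[5, 3, 3]`, the majority commit is 3: index 5 exists on only one node, while index 3 exists on all three.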
---
## Phase 5: User Story 3 - Failure and Recovery (Priority: P2)
**Goal**: Followers can restart and catch up without losing committed entries; a node isolated from the quorum cannot advance commits.
**Independent Test**: Integration test stops a follower, commits entry while down, restarts follower, and verifies log reconciliation and apply.
### Tests
- [X] T016 [US3] Add follower restart/catch-up integration test in `rdb-server/tests/test_recovery.rs`
### Implementation
- [X] T017 [US3] Implement startup recovery: load HardState/ConfState/log and reconcile via AppendEntries in `rdb-server/src/peer.rs`
- [X] T018 [US3] Handle log truncate/append on conflict and apply committed entries after recovery in `rdb-server/src/peer.rs`
- [X] T019 [US3] Add isolation guard: prevent commit advancement on minority partition detection (e.g., via quorum checks) in `rdb-server/src/peer.rs`
- [X] T020 [US3] Validate recovery/integration tests pass (run `cargo test -p rdb-server -- test_recovery`)
**Checkpoint**: Recovery and partition safety validated.
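The partition-safety property behind T019 follows from quorum arithmetic: a node cut off from the majority can never gather enough acks to advance commit. raft-rs enforces this implicitly, but an explicit guard (a hypothetical helper, useful for short-circuiting proposals on a known-minority node) is one line:

```rust
/// True if this node plus its reachable peers form a majority.
/// Hypothetical guard sketching the T019 isolation check; raft-rs
/// already refuses to commit without majority AppendEntries acks.
fn has_quorum(reachable_peers: usize, cluster_size: usize) -> bool {
    // +1 counts this node itself.
    reachable_peers + 1 > cluster_size / 2
}
```

In the 3-node recovery test, a leader that can reach one follower retains quorum (2 of 3); a fully isolated node (0 of 3 reachable) must not advance commit.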
---
## Phase 6: Polish & Cross-Cutting Concerns
**Purpose**: Hardening and operability.
- [X] T021 Add structured Raft logging (term/index/apply/commit) in `rdb-server` with slog
- [X] T022 Add quickstart or script to launch 3-node cluster and run replication test in `scripts/verify-raft.sh`
- [X] T023 Run full workspace tests and format/lint (`cargo test`, `cargo fmt`, `cargo clippy`)
---
## Dependencies & Execution Order
- Foundational (Phase 2) blocks all Raft user stories.
- US1 must complete before US2/US3 (builds basic propose/apply).
- US2 should precede US3 (replication before recovery).
- Polish runs last.
## Parallel Examples
- T011 (transport wiring) and T012 (peer manager) can proceed in parallel once T003-T005 are done.
- US2 tests (T010) can be authored in parallel with transport implementation, then enabled once wiring lands.
- Logging and script polish (T021-T022) can run in parallel after core stories complete.
## Implementation Strategy
1. Complete Foundational (durable storage).
2. Deliver US1 (single-node MVP).
3. Deliver US2 (majority replication).
4. Deliver US3 (recovery/partition safety).
5. Polish (logging, scripts, fmt/clippy).