---
description: Task list for Raft Core Replication
---

Tasks: Raft Core Replication

Input: Design documents from /specs/002-raft-features/
Prerequisites: plan.md (required), spec.md (required for user stories), research.md, data-model.md, contracts/

Tests: Required per constitution; include unit/integration tests for Raft storage, proposal/commit, replication, and recovery.

Organization: Tasks are grouped by user story to enable independent implementation and testing.

Format: [ID] [P?] [Story] Description

  • [P]: Can run in parallel (different files, no dependencies)
  • [Story]: Which user story this task belongs to (e.g., US1, US2, US3)
  • Include exact file paths in descriptions

Phase 1: Setup (Shared Infrastructure)

Purpose: Ensure tooling and layout are ready for Raft feature work.

  • T001 Verify Raft proto service definition matches contract in rdb-proto/src/raft_server.proto
  • T002 Ensure Raft gRPC server/client wiring is enabled in rdb-server/src/main.rs and rdb-server/src/raft_service.rs

Phase 2: Foundational (Blocking Prerequisites)

Purpose: Durable Raft storage primitives required by all stories.

  • T003 Implement complete Raft storage persistence (log/hard state/conf state read/write) in rdb-server/src/raft_storage.rs
  • T004 Add unit tests for Raft storage persistence (log append, load, truncate) in rdb-server/src/raft_storage.rs
  • T005 Ensure Peer ready loop persists entries and hard state before apply in rdb-server/src/peer.rs

Checkpoint: Raft storage durability verified.
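The durability work in T003-T005 boils down to writing log entries (and hard state) to stable storage before they are acknowledged or applied. As a minimal std-only sketch of the idea, not the actual rdb-server/src/raft_storage.rs implementation, a length-prefixed append-only log file supports the append and load paths that T004's tests exercise (all names here are illustrative):

```rust
use std::fs::{File, OpenOptions};
use std::io::{BufReader, BufWriter, Read, Write};
use std::path::Path;

/// A single log entry: (term, payload).
type Entry = (u64, Vec<u8>);

/// Append entries to a simple length-prefixed log file:
/// [term: u64 LE][len: u64 LE][payload bytes] per entry.
fn append_entries(path: &Path, entries: &[Entry]) -> std::io::Result<()> {
    let file = OpenOptions::new().create(true).append(true).open(path)?;
    let mut w = BufWriter::new(file);
    for (term, data) in entries {
        w.write_all(&term.to_le_bytes())?;
        w.write_all(&(data.len() as u64).to_le_bytes())?;
        w.write_all(data)?;
    }
    w.flush()?;
    // A production implementation would also fsync before reporting
    // the entries as durable; omitted here for brevity.
    Ok(())
}

/// Load every entry back from the log file (the startup/recovery path).
fn load_entries(path: &Path) -> std::io::Result<Vec<Entry>> {
    let mut r = BufReader::new(File::open(path)?);
    let mut out = Vec::new();
    let mut hdr = [0u8; 8];
    loop {
        // A clean EOF at an entry boundary terminates the scan.
        match r.read_exact(&mut hdr) {
            Ok(()) => {}
            Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
        let term = u64::from_le_bytes(hdr);
        r.read_exact(&mut hdr)?;
        let len = u64::from_le_bytes(hdr) as usize;
        let mut data = vec![0u8; len];
        r.read_exact(&mut data)?;
        out.push((term, data));
    }
    Ok(out)
}
```

The real storage layer additionally persists HardState/ConfState and supports truncation; this sketch only shows the append/load round trip that durability tests typically assert on.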


Phase 3: User Story 1 - Single-Node Raft Baseline (Priority: P1)

Goal: Single node can self-elect, propose, commit, and apply entries to storage.

Independent Test: Run unit/integration tests that start one peer, campaign, propose a command, and verify commit/apply and durable log.

Tests

  • T006 [US1] Add single-node campaign/propose/apply test in rdb-server/src/peer.rs (cfg(test)) or rdb-server/tests/test_single_node.rs

Implementation

  • T007 [US1] Implement Peer campaign/propose handling with log apply in rdb-server/src/peer.rs
  • T008 [US1] Expose a simple propose entry point (e.g., CLI or helper) for single-node testing in rdb-server/src/main.rs
  • T009 [US1] Validate single-node flow passes tests and persists entries (run cargo test -p rdb-server -- single_node)

Checkpoint: Single-node Raft end-to-end verified.
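Why a single node can commit immediately: with one voter, the majority is one, so every appended entry is committed as soon as it is durably stored, and apply simply drains the committed-but-unapplied window. A toy sketch of that flow (illustrative names only; the real Peer drives a raft-rs RawNode rather than this hand-rolled struct):

```rust
/// Minimal single-node "cluster": quorum of one, so an appended
/// entry commits immediately and can then be applied in order.
struct SingleNode {
    log: Vec<(u64, String)>, // (term, command)
    term: u64,
    commit: u64,             // 1-based index of last committed entry
    applied: u64,            // 1-based index of last applied entry
    state: Vec<String>,      // the applied state machine
}

impl SingleNode {
    fn new() -> Self {
        Self { log: Vec::new(), term: 1, commit: 0, applied: 0, state: Vec::new() }
    }

    /// Propose a command; with a single voter the entry is committed
    /// as soon as it lands in the log.
    fn propose(&mut self, cmd: &str) {
        self.log.push((self.term, cmd.to_string()));
        self.commit = self.log.len() as u64;
    }

    /// Apply everything committed but not yet applied, preserving order.
    fn apply(&mut self) {
        while self.applied < self.commit {
            let (_, cmd) = &self.log[self.applied as usize];
            self.state.push(cmd.clone());
            self.applied += 1;
        }
    }
}
```

The single-node tests in T006/T009 assert exactly this invariant: after propose + apply, commit == applied and the state machine reflects the proposed commands in order.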


Phase 4: User Story 2 - Multi-Node Replication (Priority: P1)

Goal: 3-node cluster replicates entries to a majority; leader/follower paths wired via gRPC.

Independent Test: Integration harness spins up 3 nodes, elects leader, proposes entry, asserts commit on at least 2 nodes.

Tests

  • T010 [US2] Create 3-node integration test harness in rdb-server/tests/test_replication.rs to validate majority commit

Implementation

  • T011 [US2] Wire RaftService transport send/receive to dispatch messages to peers in rdb-server/src/raft_service.rs
  • T012 [P] [US2] Implement peer registry/peer manager to track remote addresses and send Raft messages in rdb-server/src/peer_manager.rs
  • T013 [US2] Update server startup to create/join fixed 3-node cluster with configured peers in rdb-server/src/main.rs
  • T014 [US2] Ensure ready loop sends outbound messages produced by RawNode in rdb-server/src/peer.rs
  • T015 [US2] Verify majority replication via integration harness (run cargo test -p rdb-server -- test_replication)

Checkpoint: Majority replication validated on 3 nodes.
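The "commit on at least 2 of 3 nodes" property the harness asserts comes from the leader's quorum rule: an index is committed once a majority of voters have stored it. A minimal sketch of that computation, assuming a flat list of match indexes (leader included); raft-rs performs this internally, so this is illustration, not rdb-server code:

```rust
/// Return the highest log index replicated on a majority of voters.
/// `matched[i]` is the last index known to be stored on voter i
/// (the leader counts itself). Assumes at least one voter.
fn majority_commit_index(matched: &[u64]) -> u64 {
    let mut m = matched.to_vec();
    m.sort_unstable();
    // After ascending sort, the element at position len - quorum is
    // stored on at least `quorum` voters.
    let quorum = m.len() / 2 + 1;
    m[m.len() - quorum]
}
```

For a 3-node cluster with match indexes [5, 3, 4], the quorum is 2 and the commit index is 4: index 5 exists only on one node, so it is not yet safe to commit.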


Phase 5: User Story 3 - Failure and Recovery (Priority: P2)

Goal: Followers can restart and catch up without losing committed entries; a node isolated from the quorum cannot advance its commit index.

Independent Test: Integration test stops a follower, commits entry while down, restarts follower, and verifies log reconciliation and apply.

Tests

  • T016 [US3] Add follower restart/catch-up integration test in rdb-server/tests/test_recovery.rs (in progress; currently ignored in test_replication.rs)

Implementation

  • T017 [US3] Implement startup recovery: load HardState/ConfState/log and reconcile via AppendEntries in rdb-server/src/peer.rs
  • T018 [US3] Handle log truncate/append on conflict and apply committed entries after recovery in rdb-server/src/peer.rs
  • T019 [US3] Add isolation guard: prevent commit advancement on minority partition detection (e.g., via quorum checks) in rdb-server/src/peer.rs
  • T020 [US3] Validate recovery/integration tests pass (run cargo test -p rdb-server -- test_recovery)

Checkpoint: Recovery and partition safety validated.
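The truncate-on-conflict rule in T018 follows the Raft AppendEntries handling: at a given index, a matching term means the entry is identical and can be skipped, while a differing term means the follower's entire suffix from that index is stale and must be discarded before appending the leader's entries. A simplified in-memory sketch (0-based indices; the real protocol also checks prevLogIndex/prevLogTerm before this point):

```rust
/// Reconcile a follower's log with leader entries starting at `from`
/// (0-based index of the first incoming entry). Matching entries are
/// skipped, a term conflict truncates the stale suffix, and remaining
/// entries are appended.
fn reconcile(log: &mut Vec<(u64, Vec<u8>)>, from: usize, incoming: &[(u64, Vec<u8>)]) {
    for (i, entry) in incoming.iter().enumerate() {
        let idx = from + i;
        match log.get(idx) {
            // Same index and term: by the Log Matching property this
            // is the same entry, so keep it.
            Some(existing) if existing.0 == entry.0 => continue,
            // Conflict: drop this entry and everything after it.
            Some(_) => {
                log.truncate(idx);
                log.push(entry.clone());
            }
            // Past the end of the local log: plain append.
            None => log.push(entry.clone()),
        }
    }
}
```

This is the operation a restarted follower performs repeatedly while catching up; committed entries are never truncated because a leader can only be elected with a log containing them.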


Phase 6: Polish & Cross-Cutting Concerns

Purpose: Hardening and operability.

  • T021 Add structured Raft logging (term/index/apply/commit) in rdb-server with slog
  • T022 Add quickstart or script to launch 3-node cluster and run replication test in scripts/verify-raft.sh
  • T023 Run full workspace tests and format/lint (cargo test, cargo fmt, cargo clippy)

Dependencies & Execution Order

  • Foundational (Phase 2) blocks all Raft user stories.
  • US1 must complete before US2/US3 (builds basic propose/apply).
  • US2 should precede US3 (replication before recovery).
  • Polish runs last.

Parallel Examples

  • T011 (transport wiring) and T012 (peer manager) can proceed in parallel once T003-T005 are done.
  • US2 tests (T010) can be authored in parallel with transport implementation, then enabled once wiring lands.
  • Logging and script polish (T021-T022) can run in parallel after core stories complete.

Implementation Strategy

  1. Complete Foundational (durable storage).
  2. Deliver US1 (single-node MVP).
  3. Deliver US2 (majority replication).
  4. Deliver US3 (recovery/partition safety).
  5. Polish (logging, scripts, fmt/clippy).