---
description: Task list for Multi-Raft (Static -> Split -> Move)
---

# Tasks: Multi-Raft (Static -> Split -> Move)

**Input**: Design documents from `/specs/004-multi-raft/`
**Prerequisites**: plan.md (required), spec.md (user stories), research.md, data-model.md, contracts/

**Tests**: Required per the constitution; include unit and integration tests for multi-region routing, split, and confchange/move.

**Organization**: Tasks are grouped by user story to enable independent implementation and testing.

**Format**: `[ID] [P?] [Story] Description`

- **[P]**: Can run in parallel (different files, no dependencies)
- **[Story]**: The user story this task belongs to (e.g., US1, US2, US3)
- Include exact file paths in descriptions

## Phase 1: Setup (Shared Infrastructure)

**Purpose**: Prepare store/container and region-aware routing foundations.

- T001 Add a Store container skeleton managing the RegionID -> Peer map in `rdb-server/src/store.rs`
- T002 Wire RaftService to dispatch by `region_id` via the Store in `rdb-server/src/raft_service.rs`
- T003 Add a region-aware KV routing (Key -> Region) stub in `rdb-server/src/service.rs`
- T004 Add region-prefixed Raft storage keys to isolate logs, HardState, and ConfState per region in `rdb-server/src/raft_storage.rs`
- T005 Update main startup to initialize the Store from PD's initial region metadata in `rdb-server/src/main.rs`
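The shape of T001/T002 could be sketched roughly as follows. This is a hedged sketch only: `Peer`, `Store`, and the method names are hypothetical stand-ins, not the actual rdb-server types.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the per-region Raft peer.
struct Peer {
    region_id: u64,
}

#[derive(Default)]
struct Store {
    peers: HashMap<u64, Peer>, // RegionID -> Peer
}

impl Store {
    // T001: register a peer for a region.
    fn register(&mut self, region_id: u64) {
        self.peers.insert(region_id, Peer { region_id });
    }

    // T002: RaftService would call something like this to dispatch an
    // incoming Raft message to the peer owning `region_id`.
    fn dispatch(&mut self, region_id: u64) -> Result<&mut Peer, String> {
        self.peers
            .get_mut(&region_id)
            .ok_or_else(|| format!("no peer for region {region_id}"))
    }
}
```

The key property is that an unknown `region_id` yields a structured error instead of a panic, which T013 later builds on.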

## Phase 2: Foundational (Blocking Prerequisites)

**Purpose**: PD integration and routing validation.

- T006 Add a PD client call to fetch initial region metadata in `rdb-proto/src/pdpb.proto` and `rdb-server/src/main.rs`
- T007 Add a routing cache (region range map) with PD heartbeat refresh in `rdb-server/src/service.rs`
- T008 Add multi-region Raft message dispatch tests in `rdb-server/tests/test_multi_region.rs`
- T009 Add KV routing tests for disjoint regions in `rdb-server/tests/test_multi_region.rs`
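One common shape for the T007 routing cache is a `BTreeMap` keyed by region start key: look up the last region starting at or before the key, then bounds-check against its end key. A minimal sketch, with illustrative names (the real rdb-server definitions may differ):

```rust
use std::collections::BTreeMap;

// Illustrative region metadata, not the actual rdb-server type.
#[derive(Clone)]
struct RegionMeta {
    id: u64,
    start_key: Vec<u8>,
    end_key: Vec<u8>, // empty = unbounded on the right
}

#[derive(Default)]
struct Router {
    // Keyed by start_key so a lookup is "last region starting at or
    // before the key", then a bounds check against end_key.
    by_start: BTreeMap<Vec<u8>, RegionMeta>,
}

impl Router {
    fn insert(&mut self, r: RegionMeta) {
        self.by_start.insert(r.start_key.clone(), r);
    }

    fn route(&self, key: &[u8]) -> Option<u64> {
        self.by_start
            .range(..=key.to_vec())
            .next_back()
            .map(|(_, r)| r)
            .filter(|r| r.end_key.is_empty() || key < r.end_key.as_slice())
            .map(|r| r.id)
    }
}
```

A PD heartbeat refresh would simply rebuild or patch `by_start`; lookups stay O(log n) in the number of regions.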

**Checkpoint**: Multiple regions can start, elect leaders, and route KV without interference.


## Phase 3: User Story 1 - PD-Driven Multi-Region Startup (Priority: P1)

**Goal**: Auto-start multiple regions from PD metadata; independent read/write per region.

### Tests

- T010 [US1] Integration test: startup with PD returning 2 regions; both elect leaders and accept writes, in `rdb-server/tests/test_multi_region.rs`

### Implementation

- T011 [US1] Store registers peers per PD region metadata; validate against overlapping ranges, in `rdb-server/src/store.rs`
- T012 [US1] KV service uses the region router built from PD metadata to propose to the correct peer, in `rdb-server/src/service.rs`
- T013 [US1] Structured errors for unknown region/key range in `rdb-server/src/service.rs`
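The overlap validation in T011 reduces to a standard interval check: two half-open ranges `[s1, e1)` and `[s2, e2)` overlap iff each starts before the other ends. A hedged sketch, treating an empty end key as unbounded (the function name is illustrative):

```rust
// T011-style validation: do two half-open key ranges overlap?
// An empty end key means the range is unbounded on the right.
fn ranges_overlap(s1: &[u8], e1: &[u8], s2: &[u8], e2: &[u8]) -> bool {
    // `k` is strictly below an end key (empty end = +infinity).
    let below = |k: &[u8], end: &[u8]| end.is_empty() || k < end;
    below(s1, e2) && below(s2, e1)
}
```

Adjacent ranges sharing a boundary key (e.g. `[a, m)` and `[m, ...)`) correctly do not overlap, which is exactly the invariant a split must preserve.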

**Checkpoint**: Two or more regions operate independently with PD-provided metadata.


## Phase 4: User Story 2 - Region Split (Priority: P1)

**Goal**: Detect a size threshold and split online into two regions.

### Tests

- T014 [US2] Split trigger test (approximate size over threshold) in `rdb-server/tests/test_split.rs`
- T015 [US2] Post-split routing test: keys before/after `split_key` go to the correct regions, in `rdb-server/tests/test_split.rs`

### Implementation

- T016 [US2] Approximate size measurement and threshold check in `rdb-server/src/store.rs`
- T017 [US2] Define and apply the Split Raft command; update region metadata atomically, in `rdb-server/src/peer.rs`
- T018 [US2] Create and register a new peer for the split-off region and update the routing map, in `rdb-server/src/store.rs`
- T019 [US2] Persist updated region metadata (start/end keys) in `rdb-server/src/store.rs`
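The metadata side of T017/T019 is mechanical: splitting a parent range `[start, end)` at `split_key` yields `[start, split_key)` and `[split_key, end)`. A hedged sketch with illustrative type and function names:

```rust
// Illustrative range type; empty end = unbounded.
#[derive(Debug, Clone, PartialEq)]
struct KeyRange {
    start: Vec<u8>,
    end: Vec<u8>,
}

// Derive the two child ranges for a split. Returns None if split_key
// does not fall strictly inside the parent, since one child would
// otherwise be empty.
fn split_range(parent: &KeyRange, split_key: &[u8]) -> Option<(KeyRange, KeyRange)> {
    let inside = split_key > parent.start.as_slice()
        && (parent.end.is_empty() || split_key < parent.end.as_slice());
    if !inside {
        return None;
    }
    let left = KeyRange { start: parent.start.clone(), end: split_key.to_vec() };
    let right = KeyRange { start: split_key.to_vec(), end: parent.end.clone() };
    Some((left, right))
}
```

The important part is that both children must be written (and the router updated) atomically with applying the Split command, so routing never observes a gap or an overlap.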

**Checkpoint**: The region splits online; post-split reads and writes succeed in both regions.


## Phase 5: User Story 3 - Region Move (Priority: P2)

**Goal**: Rebalance region replicas via ConfChange (add → catch-up → remove).

### Tests

- T020 [US3] ConfChange add/remove replica test across two stores in `rdb-server/tests/test_confchange.rs`
- T021 [US3] Move scenario: PD directs the move; data remains reachable after the move, in `rdb-server/tests/test_confchange.rs`

### Implementation

- T022 [US3] Implement ConfChange apply for add/remove node per region in `rdb-server/src/peer.rs`
- T023 [US3] PD heartbeat reports the region list/sizes and applies PD move directives, in `rdb-server/src/store.rs`
- T024 [US3] Snapshot/fast catch-up path for a newly joined replica in `rdb-server/src/peer.rs`
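The safety argument behind the add → catch-up → remove ordering can be made concrete with a toy membership set. This is only an illustration of the ordering T022/T024 must preserve; the real implementation applies Raft ConfChange entries, and `Membership` and the `caught_up` probe are hypothetical stand-ins.

```rust
use std::collections::HashSet;

// Toy voter set standing in for the region's Raft configuration.
#[derive(Default)]
struct Membership {
    voters: HashSet<u64>,
}

impl Membership {
    fn move_replica(
        &mut self,
        from: u64,
        to: u64,
        caught_up: impl Fn(u64) -> bool,
    ) -> Result<(), String> {
        // 1. Add the destination first so quorum is never reduced.
        self.voters.insert(to);
        // 2. Remove the source only once the new replica has caught up
        //    (via snapshot or log replay, per T024).
        if !caught_up(to) {
            return Err(format!("replica {to} not caught up; keeping {from}"));
        }
        self.voters.remove(&from);
        Ok(())
    }
}
```

Doing the remove first would briefly shrink the group and risk losing quorum on a single further failure; adding first keeps the replica count at or above the original throughout the move.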

**Checkpoint**: A region can move between stores without data loss; quorum is maintained throughout.


## Phase 6: Polish & Cross-Cutting Concerns

**Purpose**: Hardening, docs, and verification.

- T025 Update contracts for PD/Region RPCs in `specs/004-multi-raft/contracts/`
- T026 Update the data model for Region/Store/PlacementMeta in `specs/004-multi-raft/data-model.md`
- T027 Quickstart covering multi-region start, split, and move flows in `specs/004-multi-raft/quickstart.md`
- T028 Verification script to run the multi-region/split/move tests in `scripts/verify-multiraft.sh`
- T029 [P] Clean up warnings; run `cargo fmt` and `cargo test -p rdb-server --tests` across the workspace

## Dependencies & Execution Order

- Phase 1 → Phase 2 → US1 → US2 → US3 → Polish
- Split (US2) depends on the routing from US1; Move (US3) depends on the ConfChange plumbing.

## Parallel Examples

- T008 and T009 can run in parallel after T002/T003/T004 (multi-region dispatch and routing tests).
- T014 and T015 can run in parallel once the routing map is in place (post-split tests).
- T020 and T021 can run in parallel once the ConfChange scaffolding exists.

## Implementation Strategy

1. Lay the Store/routing foundations (Phases 1-2).
2. Deliver US1 (PD-driven multi-region start).
3. Add the Split path (US2).
4. Add the ConfChange/move path (US3).
5. Polish docs, contracts, and the verification script.