
Feature Specification: Raft Core Replication

Feature Branch: 002-raft-features
Created: 2025-12-01
Status: Draft
Input: User description: "Please take care of the Raft-related features." (original: "Raft関連の機能についてお願いします。")

Clarifications

Session 2025-12-01

  • Q: Should this phase assume fixed 3-node membership or include dynamic membership? → A: Fixed 3-node, extensible for future scaling.

User Scenarios & Testing (mandatory)

User Story 1 - Single-Node Raft Baseline (Priority: P1)

As a platform engineer, I want a single-node Raft instance to accept proposals, elect a leader, and persist committed entries so I can validate the log/storage plumbing before scaling out.

Why this priority: Establishes correctness of log append/apply and persistence; blocks multi-node rollout.

Independent Test: Start one node, trigger self-election, propose an entry, verify it is committed and applied to storage with the expected data.

Acceptance Scenarios:

  1. Given a single node started fresh, When it campaigns, Then it becomes leader and can accept proposals.
  2. Given a proposed entry "e1", When it commits, Then storage contains "e1" and last index increments by 1.
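The single-node flow above can be sketched as a minimal model (illustrative names only, not the FlareDB API): with a one-voter configuration the quorum size is 1, so an entry commits as soon as it is appended and persisted locally, and is then applied in log order.

```python
class SingleNode:
    """Toy single-node Raft sketch: quorum of a 1-node cluster is 1."""

    def __init__(self):
        self.term = 0
        self.is_leader = False
        self.log = []          # list of (term, command); entry i lives at index i-1
        self.commit_index = 0
        self.applied = []      # commands applied to storage, in log order

    def campaign(self):
        # A lone voter always wins its own election.
        self.term += 1
        self.is_leader = True

    def propose(self, command):
        assert self.is_leader, "only the leader accepts proposals"
        self.log.append((self.term, command))
        # Majority of 1 node == the node itself: commit right after local append.
        self.commit_index = len(self.log)
        self.apply_committed()
        return self.commit_index

    def apply_committed(self):
        # Apply every committed-but-unapplied entry, strictly in order.
        while len(self.applied) < self.commit_index:
            _, command = self.log[len(self.applied)]
            self.applied.append(command)


node = SingleNode()
node.campaign()
idx = node.propose("e1")
print(idx, node.applied)  # → 1 ['e1']
```

This mirrors Acceptance Scenario 2: after proposing "e1", the last index increments to 1 and the applied state contains "e1".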

User Story 2 - Multi-Node Replication (Priority: P1)

As a platform engineer, I want a 3-node Raft cluster to replicate entries to a majority so that writes remain durable under follower failure.

Why this priority: Majority replication is the core availability guarantee of Raft.

Independent Test: Start 3 nodes, elect a leader, propose an entry; verify leader and at least one follower store the entry at the same index/term and report commit.

Acceptance Scenarios:

  1. Given a 3-node cluster, When a leader is elected, Then at least two nodes acknowledge commit for the same index/term.
  2. Given a committed entry on the leader, When one follower is stopped, Then the other follower still receives and persists the entry.
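The majority rule behind these scenarios can be shown with a small commit-index calculation (a sketch under assumed data structures, not production code): the leader advances its commit index to the highest entry replicated on a majority whose term equals the leader's current term.

```python
def majority_commit(match_index, current_term, log_terms, old_commit):
    """Return the new commit index for a leader.

    match_index: dict node_id -> highest log index known replicated on that node
    log_terms:   dict log index -> term of the entry at that index (leader's log)
    """
    n = len(match_index)
    commit = old_commit
    for candidate in sorted(match_index.values(), reverse=True):
        acks = sum(1 for m in match_index.values() if m >= candidate)
        # Commit only entries from the current term that a strict majority holds.
        if acks * 2 > n and candidate > commit and log_terms.get(candidate) == current_term:
            commit = candidate
            break
    return commit


# 3-node cluster: leader (node 1) and node 2 hold entry 5; node 3 is lagging
# or stopped. Two of three acks form a majority, so entry 5 commits anyway.
match = {1: 5, 2: 5, 3: 3}
terms = {3: 1, 4: 2, 5: 2}
print(majority_commit(match, 2, terms, 3))  # → 5
```

The same function also captures Scenario 2's failure case: with one follower down, the surviving two nodes still form a quorum, while an entry held only by one node never advances the commit index.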

User Story 3 - Failure and Recovery (Priority: P2)

As an operator, I want a stopped follower to recover and catch up without losing committed data so that the cluster can heal after restarts.

Why this priority: Ensures durability across restarts and supports rolling maintenance.

Independent Test: Commit an entry, stop a follower, commit another entry, restart the follower; verify it restores state and applies all committed entries.

Acceptance Scenarios:

  1. Given a follower stopped after entry N is committed, When the cluster commits entry N+1 while it is down, Then on restart the follower installs both entries in order.
  2. Given divergent logs on restart, When the leader sends AppendEntries, Then the follower truncates its conflicting suffix, aligns to the leader's log, and preserves all committed entries.
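The truncate/align behavior in Scenario 2 follows the Raft AppendEntries consistency check; a follower-side sketch (simplified, with illustrative parameter names) looks like this:

```python
def handle_append_entries(log, prev_index, prev_term, entries, leader_commit, commit_index):
    """Follower-side AppendEntries sketch.

    log: list of (term, command); entry i lives at list index i-1.
    Returns (success, new_log, new_commit_index).
    """
    # Consistency check: reject unless our log has the leader's previous entry.
    if prev_index > len(log) or (prev_index > 0 and log[prev_index - 1][0] != prev_term):
        return False, log, commit_index
    for i, entry in enumerate(entries):
        idx = prev_index + i + 1
        if idx <= len(log) and log[idx - 1][0] != entry[0]:
            log = log[: idx - 1]      # truncate the conflicting suffix
        if idx > len(log):
            log = log + [entry]       # append missing entries in order
    # Advance commit, never past what the leader actually sent us.
    commit_index = max(commit_index, min(leader_commit, prev_index + len(entries)))
    return True, log, commit_index


# Follower has a stale entry "x" from term 1; the leader overwrites it with
# term-2 entries "b" and "c". The committed prefix [(1, "a")] is preserved.
ok, new_log, new_commit = handle_append_entries(
    log=[(1, "a"), (1, "x")],
    prev_index=1, prev_term=1,
    entries=[(2, "b"), (2, "c")],
    leader_commit=3, commit_index=1)
# new_log is [(1, "a"), (2, "b"), (2, "c")], new_commit is 3
```

Safety here rests on the leader never sending entries that conflict below the follower's commit index, so truncation can only ever remove uncommitted suffixes.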

Edge Cases

  • Leader crashes immediately after commit but before followers apply.
  • A network partition isolates a minority from the majority; the minority must not commit new entries.
  • Log holes or conflicting terms discovered on recovery must be reconciled to the leader's log.

Requirements (mandatory)

Functional Requirements

  • FR-001: The system MUST support single-node leader election and proposal handling without external coordination.
  • FR-002: The system MUST replicate log entries to a majority in a 3-node cluster before marking them committed.
  • FR-003: The system MUST persist log entries, hard state (term, vote), and conf state to durable storage so that restarts preserve committed progress.
  • FR-004: The system MUST apply committed entries to the underlying storage engine in log order without gaps.
  • FR-005: The system MUST prevent a node in a minority partition from committing new entries while isolated.
  • FR-006: On restart, a node MUST reconcile its log with the leader (truncate/append) to match the committed log and reapply missing committed entries.
  • FR-007: For this phase, the system MUST operate with a fixed 3-node membership (no dynamic add/remove); the architecture MUST allow future extension to scale out safely.
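FR-003's ordering constraint (persist before acknowledge) can be sketched as follows. The on-disk layout here is a hypothetical JSON-lines file chosen only for illustration; the key point is that `fsync` must return before the node responds to a vote request or an append.

```python
import json
import os
import tempfile


def persist_before_ack(path, hard_state, new_entries):
    """Append hard state and new log entries durably (sketch of FR-003).

    hard_state: e.g. {"term": ..., "vote": ..., "commit": ...}
    new_entries: list of [term, command] pairs.
    The caller may acknowledge the RPC only after this function returns.
    """
    record = {"hard_state": hard_state, "entries": new_entries}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())  # data is on stable storage once this returns
    return True


wal_path = os.path.join(tempfile.mkdtemp(), "raft-wal.jsonl")
persist_before_ack(wal_path, {"term": 2, "vote": 1, "commit": 5}, [[2, "e1"]])
```

On restart, replaying this record stream restores the term, vote, commit index, and log, which is what lets a recovering follower satisfy FR-006 without losing committed progress.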

Key Entities

  • Peer: A Raft node with ID, region scope, in-memory state machine, and access to durable Raft storage.
  • Raft Log Entry: Indexed record containing term and opaque command bytes; persisted and replicated.
  • Hard State: Term, vote, commit index persisted to ensure safety across restarts.
  • Conf State: Voter set defining the quorum for replication.
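The entities above could be modeled roughly as follows (field names are illustrative, not FlareDB's actual types). With the fixed 3-node voter set of FR-007, the quorum size is 2.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class HardState:
    """Persisted per-node state that must survive restarts (safety-critical)."""
    term: int = 0
    vote: Optional[int] = None   # candidate this node voted for in `term`, if any
    commit: int = 0              # highest log index known committed


@dataclass
class ConfState:
    """Voter set defining the replication quorum; fixed at 3 for this phase."""
    voters: Tuple[int, ...] = (1, 2, 3)

    def quorum(self) -> int:
        return len(self.voters) // 2 + 1


print(ConfState().quorum())  # → 2
```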

Success Criteria (mandatory)

Measurable Outcomes

  • SC-001: Single-node bootstraps and accepts a proposal within 2 seconds, committing it and persisting the entry.
  • SC-002: In a 3-node cluster, a committed entry is present on at least two nodes within 3 seconds of proposal.
  • SC-003: After a follower restart, all previously committed entries are restored and applied in order within 5 seconds of rejoining a healthy leader.
  • SC-004: During a minority partition, isolated nodes do not advance commit index or apply uncommitted entries.