# Feature Specification: Raft Core Replication

**Feature Branch**: `002-raft-features`
**Created**: 2025-12-01
**Status**: Draft
**Input**: User description: "Raft関連の機能についてお願いします。" ("Please work on the Raft-related features.")

## Clarifications

### Session 2025-12-01

- Q: Should this phase assume fixed 3-node membership or include dynamic membership? → A: Fixed 3-node, extensible for future scaling.

## User Scenarios & Testing *(mandatory)*

### User Story 1 - Single-Node Raft Baseline (Priority: P1)

As a platform engineer, I want a single-node Raft instance to accept proposals, elect a leader, and persist committed entries so I can validate the log/storage plumbing before scaling out.

**Why this priority**: Establishes correctness of log append/apply and persistence; blocks multi-node rollout.

**Independent Test**: Start one node, trigger self-election, propose an entry, verify it is committed and applied to storage with the expected data.

**Acceptance Scenarios**:

1. **Given** a single node started fresh, **When** it campaigns, **Then** it becomes leader and can accept proposals.
2. **Given** a proposed entry "e1", **When** it commits, **Then** storage contains "e1" and last index increments by 1.
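
A minimal sketch of this baseline, using a toy in-memory node (class and field names are illustrative, not taken from any specific Raft library):

```python
# Toy single-node Raft baseline (illustrative only): with a one-node
# voter set, the node wins its own election, and every appended entry
# is immediately committed and applied in log order.

class SingleNode:
    def __init__(self):
        self.term = 0
        self.state = "follower"
        self.log = []          # list of (term, command) tuples
        self.commit_index = 0  # 1-based index of the last committed entry
        self.applied = []      # commands applied to the storage engine

    def campaign(self):
        # Quorum of a 1-node cluster is 1, so the node elects itself.
        self.term += 1
        self.state = "leader"

    def propose(self, command):
        assert self.state == "leader", "only a leader accepts proposals"
        self.log.append((self.term, command))
        # Single voter: the entry trivially reaches a majority.
        self.commit_index = len(self.log)
        self.apply_committed()

    def apply_committed(self):
        # Apply committed entries in log order, without gaps.
        while len(self.applied) < self.commit_index:
            _, command = self.log[len(self.applied)]
            self.applied.append(command)

node = SingleNode()
node.campaign()
node.propose("e1")
print(node.state, node.commit_index, node.applied)  # leader 1 ['e1']
```

Proposing a second entry increments the last index by exactly one, matching acceptance scenario 2.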

---

### User Story 2 - Multi-Node Replication (Priority: P1)

As a platform engineer, I want a 3-node Raft cluster to replicate entries to a majority so that writes remain durable under follower failure.

**Why this priority**: Majority replication is the core availability guarantee of Raft.

**Independent Test**: Start 3 nodes, elect a leader, propose an entry; verify leader and at least one follower store the entry at the same index/term and report commit.

**Acceptance Scenarios**:

1. **Given** a 3-node cluster, **When** a leader is elected, **Then** at least two nodes acknowledge commit for the same index/term.
2. **Given** a committed entry on the leader, **When** one follower is stopped, **Then** the other follower still receives and persists the entry.
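
The majority rule in these scenarios can be sketched as a commit-index calculation over per-voter match indices (the function name is illustrative; real Raft additionally requires the entry at the candidate commit index to be from the leader's current term):

```python
# Illustrative majority-commit rule for a fixed 3-node cluster: an
# index is committed once a majority of voters have persisted it, so
# the commit index is the majority-ranked (median) match index.

def majority_commit_index(match_indices):
    """match_indices: highest replicated log index per voter,
    including the leader itself. Returns the committable index."""
    ranked = sorted(match_indices, reverse=True)
    quorum = len(match_indices) // 2  # position of the majority ack
    return ranked[quorum]

# Leader at index 3, both followers caught up:
print(majority_commit_index([3, 3, 3]))  # 3
# One follower stopped at index 1: leader + one follower still commit 3.
print(majority_commit_index([3, 3, 1]))  # 3
# Leader isolated in a minority partition: only its own ack, commit stays at 1.
print(majority_commit_index([3, 1, 1]))  # 1
```

The last case is why a minority partition cannot advance the commit index, as required later by FR-005.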

---

### User Story 3 - Failure and Recovery (Priority: P2)

As an operator, I want a stopped follower to recover and catch up without losing committed data so that the cluster can heal after restarts.

**Why this priority**: Ensures durability across restarts and supports rolling maintenance.

**Independent Test**: Commit an entry, stop a follower, commit another entry, restart the follower; verify it restores state and applies all committed entries.

**Acceptance Scenarios**:

1. **Given** a follower stopped after entry N is committed, **When** the cluster commits entry N+1 while it is down, **Then** on restart the follower installs both entries in order.
2. **Given** divergent logs on restart, **When** the leader sends AppendEntries, **Then** the follower truncates conflicting entries to align with the leader's log, preserving the committed prefix.
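
A sketch of the truncate/align step in scenario 2, assuming 1-based log indices and entries as `(term, command)` pairs (names are illustrative):

```python
# Illustrative AppendEntries conflict handling: the follower deletes
# entries that conflict with the leader's (same index, different term),
# appends the leader's entries, and never touches its committed prefix.

def reconcile(follower_log, prev_index, leader_entries):
    """follower_log: list of (term, command); prev_index: 1-based log
    index that leader_entries start after. Returns the reconciled log."""
    log = list(follower_log)
    for offset, entry in enumerate(leader_entries):
        i = prev_index + offset        # 0-based slot for this entry
        if i < len(log) and log[i][0] != entry[0]:
            del log[i:]                # truncate the conflicting suffix
        if i >= len(log):
            log.append(entry)
    return log

committed = [(1, "e1"), (1, "e2")]     # committed prefix, must survive
follower = committed + [(2, "x")]      # stale uncommitted divergence
leader_tail = [(3, "e3"), (3, "e4")]   # leader's entries after index 2
print(reconcile(follower, 2, leader_tail))
# [(1, 'e1'), (1, 'e2'), (3, 'e3'), (3, 'e4')]
```

Only the uncommitted suffix `(2, "x")` is discarded; the committed prefix is untouched, which is exactly the safety property the scenario tests.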

---

### Edge Cases

- Leader crash immediately after commit but before followers apply.
- Network partition isolating a minority vs. majority; minority must not commit new entries.
- Log gaps or conflicting terms found on recovery must be reconciled to the leader's log.

## Requirements *(mandatory)*

### Functional Requirements

- **FR-001**: The system MUST support single-node leader election and proposal handling without external coordination.
- **FR-002**: The system MUST replicate log entries to a majority in a 3-node cluster before marking them committed.
- **FR-003**: The system MUST persist log entries, hard state (term, vote), and conf state to durable storage so that restarts preserve committed progress.
- **FR-004**: The system MUST apply committed entries to the underlying storage engine in log order without gaps.
- **FR-005**: The system MUST prevent a node in a minority partition from committing new entries while isolated.
- **FR-006**: On restart, a node MUST reconcile its log with the leader (truncate/append) to match the committed log and reapply missing committed entries.
- **FR-007**: For this phase, the system MUST operate with a fixed 3-node membership (no dynamic add/remove), but the architecture MUST allow safe future extension to dynamic scaling.
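
One way to sketch the durable hard state required by FR-003 is an fsync-plus-atomic-rename write, so a crash mid-write never leaves a torn state file; the file name and JSON layout here are illustrative assumptions:

```python
# Illustrative durable hard state (term, vote, commit): write to a
# temp file, fsync, then atomically rename over the real file.
import json
import os
import tempfile

def save_hard_state(path, term, vote, commit):
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump({"term": term, "vote": vote, "commit": commit}, f)
        f.flush()
        os.fsync(f.fileno())  # force bytes to disk before the rename
    os.replace(tmp, path)     # atomic replacement on POSIX

def load_hard_state(path):
    with open(path) as f:
        return json.load(f)

save_hard_state("hard_state.json", term=3, vote="node-2", commit=7)
print(load_hard_state("hard_state.json"))
# {'term': 3, 'vote': 'node-2', 'commit': 7}
```

A restarted node reloads this state before rejoining, so it never votes twice in the same term or regresses its commit index.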

### Key Entities

- **Peer**: A Raft node with ID, region scope, in-memory state machine, and access to durable Raft storage.
- **Raft Log Entry**: Indexed record containing term and opaque command bytes; persisted and replicated.
- **Hard State**: Term, vote, commit index persisted to ensure safety across restarts.
- **Conf State**: Voter set defining the quorum for replication.
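
As a sketch, the entities above map onto small record types (field names here are assumptions; a real implementation such as raft-rs defines its own types):

```python
# Illustrative record types for the key entities; field names are
# assumptions, not any particular library's schema.
from dataclasses import dataclass, field

@dataclass
class Entry:
    index: int   # 1-based position in the log
    term: int    # term in which the leader created the entry
    data: bytes  # opaque command bytes

@dataclass
class HardState:
    term: int = 0    # latest term this peer has seen
    vote: int = 0    # peer ID voted for in the current term (0 = none)
    commit: int = 0  # highest log index known to be committed

@dataclass
class ConfState:
    voters: list[int] = field(default_factory=lambda: [1, 2, 3])

@dataclass
class Peer:
    id: int
    hard_state: HardState = field(default_factory=HardState)
    conf_state: ConfState = field(default_factory=ConfState)
    log: list[Entry] = field(default_factory=list)

    def quorum(self) -> int:
        # Majority of the voter set; 2 of 3 for this phase.
        return len(self.conf_state.voters) // 2 + 1

peer = Peer(id=1)
print(peer.quorum())  # 2
```

Keeping the voter set in `ConfState` rather than hard-coding 3 is what leaves room for the future dynamic-membership extension noted in FR-007.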

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-001**: Single-node bootstraps and accepts a proposal within 2 seconds, committing it and persisting the entry.
- **SC-002**: In a 3-node cluster, a committed entry is present on at least two nodes within 3 seconds of proposal.
- **SC-003**: After a follower restart, all previously committed entries are restored and applied in order within 5 seconds of rejoining a healthy leader.
- **SC-004**: During a minority partition, isolated nodes do not advance commit index or apply uncommitted entries.