photoncloud-monorepo/flaredb/specs/004-multi-raft/spec.md

# Feature Specification: Multi-Raft (Static → Split → Move)

**Feature Branch**: `004-multi-raft`
**Created**: 2024-XX-XX
**Status**: Draft
**Input**: User description: "Please proceed on the assumption that we will go through roughly Phase 3."

## User Scenarios & Testing *(mandatory)*

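### User Story 1 - PD-driven startup of multiple Regions (Priority: P1)

As an operator, I want nodes to require no external configuration at startup and to start multiple Regions automatically from the initial Region metadata distributed by PD, with each Region electing its leader and handling writes independently.

**Why this priority**: This is the foundation of Multi-Raft and therefore the top priority. Without it, the later Split and Move stories cannot work.

**Independent Test**: An E2E test that starts a node with the initial Region set returned by PD (e.g., 2 Regions) and confirms that leader election succeeds in both Regions and that each Region can write and read its own key range.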

**Acceptance Scenarios**:

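1. **Given** PD returns initial Region metadata (e.g., Region1 `[start="", end="m")`, Region2 `[start="m", end="")`) **When** a node starts **Then** a leader is elected in both Regions and each Region accepts writes without interfering with the other.
2. **Given** RaftService receives a message carrying a region_id **When** it looks up the Peer for that region_id **Then** the message is delivered to the matching Peer if one is registered, and an error is returned otherwise.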
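
The routing rule in scenario 2 can be illustrated with a minimal sketch. The names `Store`, `Peer`, `dispatch`, and `RegionNotFound` are hypothetical and chosen only for illustration; the real Peer would hand the message to its Raft state machine rather than append it to a list.

```python
from dataclasses import dataclass, field


class RegionNotFound(Exception):
    """Raised when a message targets a region_id with no registered Peer."""


@dataclass
class Peer:
    region_id: int
    inbox: list = field(default_factory=list)

    def step(self, payload: bytes) -> None:
        # Stand-in for handing the message to this Region's Raft state machine.
        self.inbox.append(payload)


@dataclass
class Store:
    peers: dict = field(default_factory=dict)  # region_id -> Peer

    def register(self, peer: Peer) -> None:
        self.peers[peer.region_id] = peer

    def dispatch(self, region_id: int, payload: bytes) -> None:
        # Route by region_id; unregistered Regions are rejected, never guessed.
        peer = self.peers.get(region_id)
        if peer is None:
            raise RegionNotFound(f"no peer registered for region {region_id}")
        peer.step(payload)
```

Raising a dedicated exception keeps the "unregistered Region" case distinguishable from other failures, so the caller can log it and return an error to the sender.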

---

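### User Story 2 - Online Region Split (Priority: P1)

As an operator, I want a Split to run without downtime when a Region's size exceeds a threshold, with the new Region created and registered automatically.

**Why this priority**: It enables scale-out as the data volume grows.

**Independent Test**: Write a large volume of data to a single Region and confirm that, once the threshold is reached, a Split is agreed on and applied, and that both the original and the new Region remain readable and writable after the split into 2 Regions.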

**Acceptance Scenarios**:

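1. **Given** a Region's size has reached the threshold (e.g., roughly 96 MB) **When** the leader proposes a Split command and it is agreed on **Then** a new Region is created and the original Region's EndKey is lowered so that its range shrinks.
2. **Given** a Split has just been applied **When** writes are issued against the post-split key ranges **Then** the original and the new Region each handle their own range correctly and consistency is preserved.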
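
As a rough illustration of the range arithmetic behind a Split, the sketch below divides one half-open range `[start_key, end_key)` into two. The `Region` dataclass, `split_region`, and the `split_key` parameter are assumptions made for this example; the spec does not say how the split point is chosen, and an empty `end_key` is taken to mean "unbounded" as in the Region metadata example above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Region:
    region_id: int
    start_key: bytes  # inclusive
    end_key: bytes    # exclusive; b"" means unbounded


def contains(region: Region, key: bytes) -> bool:
    return region.start_key <= key and (region.end_key == b"" or key < region.end_key)


def split_region(region: Region, split_key: bytes, new_region_id: int):
    """Return (left, right): the shrunk original Region and the new Region."""
    if not contains(region, split_key) or split_key == region.start_key:
        raise ValueError("split_key must fall strictly inside the region")
    left = replace(region, end_key=split_key)               # original keeps its id
    right = Region(new_region_id, split_key, region.end_key)  # new Region takes the tail
    return left, right
```

Because both results come out of a single function call on an immutable value, a caller can persist them together, which is one way to approach the atomicity that the spec requires of Split metadata updates.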

---

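### User Story 3 - Load balancing by moving Regions (Priority: P2)

As an operator, I want to move Regions (by adding and removing replicas) from congested Stores to underused Stores so that disk and CPU load are balanced.

**Why this priority**: It enables rebalancing in Phase 3 and realizes the value of scale-out.

**Independent Test**: PD issues the instruction "move Region X from Store A to Store B", and the test confirms that the ConfChange sequence completes: replica added, caught up, old replica removed.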

**Acceptance Scenarios**:

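1. **Given** PD instructs the addition of a replica on Store B **When** the leader proposes a ConfChange **Then** the new replica is added and is granted voting rights once it has caught up.
2. **Given** the new replica has caught up **When** a ConfChange removing the old replica is applied **Then** the Region continues under the new configuration and quorum is maintained.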
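
The ordering in these two scenarios (add, catch up, grant voting rights, remove) can be sketched as a small membership model. Everything here is illustrative: `Membership`, `move_replica`, the `caught_up` callback, and the "learner" label for a not-yet-voting replica are assumptions, and in the real system each step would be a separately committed ConfChange.

```python
from dataclasses import dataclass, field


@dataclass
class Membership:
    voters: set = field(default_factory=set)    # Store ids with voting rights
    learners: set = field(default_factory=set)  # replicas still catching up

    def quorum(self) -> int:
        return len(self.voters) // 2 + 1


def move_replica(m: Membership, src: str, dst: str, caught_up) -> list:
    """Apply add -> catch up -> promote -> remove; return the steps taken."""
    steps = []
    m.learners.add(dst)
    steps.append(f"add_learner({dst})")
    if not caught_up(dst):
        return steps  # stop here; the old replica keeps serving
    m.learners.discard(dst)
    m.voters.add(dst)  # voting rights only after catch-up
    steps.append(f"promote({dst})")
    m.voters.discard(src)  # old replica leaves last
    steps.append(f"remove({src})")
    return steps
```

Adding the new voter before removing the old one means the voter set never drops below its original size during the move, which is what keeps quorum reachable throughout.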

---

### Edge Cases

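- When a Raft message carrying an unregistered region_id is received, the node rejects it safely and logs the event.
- If the leader changes during a Split, the system prevents a double Split and applies only committed Splits.
- If a network partition occurs during a Region move, writes are rejected while quorum is lost, and replicas resynchronize after the partition heals.
- If the initial Region metadata returned by PD contains overlapping key ranges, the node detects this at startup and fails.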
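
The overlap check in the last edge case could look roughly like the sketch below. The function name `validate_ranges` and the tuple layout are hypothetical; ranges are treated as half-open with an empty end key meaning "unbounded", matching the Region metadata example in User Story 1.

```python
def validate_ranges(regions):
    """Fail fast if any two Regions overlap.

    regions: iterable of (region_id, start_key, end_key) with bytes keys;
    end_key == b"" means the range is unbounded on the right.
    """
    ordered = sorted(regions, key=lambda r: r[1])
    for (id_a, _start_a, end_a), (id_b, start_b, _end_b) in zip(ordered, ordered[1:]):
        # Sorted by start_key, so A overlaps its successor B exactly when
        # A is unbounded or A ends after B begins.
        if end_a == b"" or end_a > start_b:
            raise ValueError(f"regions {id_a} and {id_b} overlap")
```

Checking only neighbours is enough after sorting: if any earlier range reached into a later one, it would also reach into its immediate successor.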

## Requirements *(mandatory)*

### Functional Requirements

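- **FR-001**: The system MUST start multiple Regions based on the initial Region metadata distributed by PD, and the Store MUST manage the RegionID→Peer mapping.
- **FR-002**: RaftService MUST deliver each incoming message to the appropriate Peer based on its region_id and MUST return an error for unregistered Regions.
- **FR-003**: KvService MUST determine the Region from the Key and propose the request to the corresponding Peer for processing.
- **FR-004**: Raft logs and hard state MUST be namespaced by RegionID so that different Regions never collide.
- **FR-005**: When a Region's size exceeds the threshold, the leader MUST propose a Split command and, once it is agreed on, register the new Region with the Store.
- **FR-006**: When a Split is applied, updating the original Region's metadata (Start/EndKey) and generating the new Region's metadata MUST be a single atomic operation.
- **FR-007**: Region moves (replica addition and removal) MUST be carried out with Raft ConfChange and MUST complete while maintaining quorum.
- **FR-008**: PD MUST hold Region placement metadata and issue move/add/remove instructions, and nodes MUST be able to apply them.
- **FR-009**: Region state (leader, replicas, size, key range) MUST be reported to PD via heartbeat.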
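
One possible way to satisfy FR-004 is to prefix every persisted Raft key with the RegionID. The sketch below is an assumption, not the project's actual storage layout: the tag bytes, the function names, and the fixed-width big-endian encoding are all chosen for illustration.

```python
import struct

LOG_TAG = b"\x01"         # hypothetical tag for Raft log entries
HARD_STATE_TAG = b"\x02"  # hypothetical tag for Raft hard state


def raft_log_key(region_id: int, index: int) -> bytes:
    # Big-endian fixed-width integers make byte order match (region_id, index) order.
    return LOG_TAG + struct.pack(">QQ", region_id, index)


def hard_state_key(region_id: int) -> bytes:
    return HARD_STATE_TAG + struct.pack(">Q", region_id)
```

With this layout all log entries of one Region are contiguous in a sorted key-value store, so a Region's log can be scanned or deleted as a single key range without touching other Regions.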

### Key Entities *(include if feature involves data)*

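- **Store**: A physical node. It manages the RegionID→Peer mapping, dispatches Raft messages, and sends heartbeats to PD.
- **Region**: A logical shard that owns a key range. It holds StartKey, EndKey, and size information.
- **Peer**: The Raft replica of a Region. It is responsible for leader election and log replication.
- **Placement Metadata (PD)**: Metadata holding Region placement, sizes, leader information, and the balancing policy.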
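
Because a Region is defined by its StartKey and EndKey, mapping a key to its owning Region reduces to a search over sorted start keys. The following is a minimal sketch under that assumption; `RegionRouter` and `locate` are hypothetical names, and the input ranges are assumed to be non-overlapping with an empty end key meaning "unbounded".

```python
import bisect


class RegionRouter:
    def __init__(self, regions):
        # regions: list of (region_id, start_key, end_key) with bytes keys.
        self._regions = sorted(regions, key=lambda r: r[1])
        self._starts = [r[1] for r in self._regions]

    def locate(self, key: bytes):
        """Return the region_id owning key, or None if no Region covers it."""
        # Rightmost Region whose start_key <= key.
        i = bisect.bisect_right(self._starts, key) - 1
        if i < 0:
            return None
        region_id, _start, end = self._regions[i]
        if end != b"" and key >= end:
            return None  # key falls in a gap after this Region
        return region_id
```

For the two-Region layout used earlier, `RegionRouter([(1, b"", b"m"), (2, b"m", b"")]).locate(b"m")` resolves to Region 2, since the end key of Region 1 is exclusive.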

## Success Criteria *(mandatory)*

### Measurable Outcomes

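- **SC-001**: When two or more Regions are started, leader election completes in every Region within 60 seconds.
- **SC-002**: Writes to one Region never leak into another Region, and 100% of accesses outside a Region's key range are rejected.
- **SC-003**: After a Split is triggered, the new Region is registered within 60 seconds, and the write success rate stays at 99% or higher after the split.
- **SC-004**: A Region move (add replica → catch up → remove) completes within 5 minutes, and the write success rate stays at 99% or higher during the move.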

## Clarifications

### Session 2025-01-05

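- Q: What are the interval and the contents of reports to PD? → A: Every 30 seconds, each node reports to PD the Region list plus approx_size, leader/peers, and health.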
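
The heartbeat contents agreed in this session could be modelled as follows. Only the 30-second interval and the listed fields (Region list, approx_size, leader, peers, health) come from the answer above; the type names, the `store_id` field, and the dictionary shape are assumptions for illustration, not a wire format.

```python
from dataclasses import dataclass, asdict

HEARTBEAT_INTERVAL_SECS = 30  # interval stated in the clarification


@dataclass
class RegionReport:
    region_id: int
    approx_size: int  # approximate Region size in bytes (unit is an assumption)
    leader: int       # id of the current leader
    peers: list       # ids of all replicas of this Region
    healthy: bool


def build_heartbeat(store_id: int, reports: list) -> dict:
    """Assemble one heartbeat payload covering every Region on this Store."""
    return {"store_id": store_id, "regions": [asdict(r) for r in reports]}
```

A caller would invoke `build_heartbeat` once per interval and send the result to PD; reporting every Region in a single payload keeps one request per Store per interval.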