# NixOS Deployment / Scheduler Roadmap (2026-03-20)

## Background

This repository already contains the following building blocks.

- `NixOS` modules: systemd units for each service and the test-cluster configuration
- `deployer`: phone-home and node inventory for bare metal / bootstrap
- `deployer-ctl`: CLI that applies the cluster desired state held in ChainFire
- `fleet-scheduler`: placement decisions for native services
- `node-agent`: reconciles processes/containers on each node
- `plasmavmc` / `k8shost`: separate schedulers for VMs and Pods

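However, these pieces do not yet form a single deployment path that starts from Nix.

At present, four paths exist side by side:

- building the host configuration with `Nix`
- loading cluster state through `deployer-ctl` YAML/JSON
- registering nodes through the `deployer` phone-home
- running native services through `fleet-scheduler` / `node-agent`

As a result, there is still no firm single source of truth, and the responsibility boundaries are weak.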

## Key gaps visible today

### 1. `Nix` is not the single source of truth

- `plasmacloud-cluster.nix`, `nix-nos`, and the `ClusterStateSpec` of `deployer-ctl` exist side by side
- It is undecided where the static topology should live
- node class / pool / enrollment rule / service schedule are not generated end to end from Nix

### 2. Bootstrap exists, but there is no agent that applies NixOS

- `deployer` can return `nix_profile`
- However, the bootstrap ISO effectively uses only `hostname` / `ip` from `node-config.json`
- `node-agent` is for processes/containers and does not apply NixOS generations

In other words, `NixOS deployment` and `runtime scheduling` are separate, and no node-side reconciler exists to connect them.

### 3. ISO / netboot is not yet a generic bootstrap

- The ISO is tied to `node01|node02|node03` and `nix/nodes/vm-cluster/$NODE_ID`
- As a selector of node class / profile / disk layout for real bare metal, it is still hard-coded
- A `cloud-init` endpoint exists, but the main path is not yet a "Nix native bootstrap API"

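### 4. `first-boot-automation` is mid-design and not on the main path

- The module exists, but it is barely wired into actual use
- Its responsibilities for ChainFire / FlareDB / IAM bootstrap are only half defined
- The current main path works through `initialPeers` and fixed node definitions, and this area has not been sorted out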

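### 5. Scheduler layering is still ambiguous

- `fleet-scheduler`: native service placement
- `plasmavmc`: VM placement
- `k8shost`: Pod placement

All three are "schedulers", but their targets differ.
Unless their responsibilities are deliberately separated, responsibility for NixOS deployment will get mixed in as well.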

### 6. Inventory / commissioning is still weak as a MaaS replacement

- machine-id and enrollment rules exist
- However, the layers for hardware facts, NIC facts, disk facts, BMC/Redfish, power cycle, reprovision, and rescue are not in place

To truly replace MaaS, at least a minimal set of commission / inventory / reinstall / power is required.

### 7. CI does not guarantee that the bootstrap path closes end to end

- CI mainly covers build / fmt / clippy / unit test
- The end-to-end flow across `deployer` + `fleet-scheduler` + `node-agent` + `test-cluster` is not a publishable gate

## Target architecture

### Principles

Make `Nix` the single source of truth for static desired state.  
Leave dynamic reconciliation to the PhotonCloud agents and control plane.

The layers are divided as follows.

### 1. Static layer: `Nix`

Defined here:

- cluster / datacenter / rack / VLAN / BGP / IP pool
- node class / pool / hardware policy
- bootstrap seed set
- disk layout policy
- host profile
- desired policy for native services
- generation of the install image / bootstrap image

This replaces the static parts of `Terraform`.

### 2. Bootstrap layer: `deployer`

What happens here:

- node discovery
- deciding class/pool/profile through enrollment rules
- binding machine-id/MAC/DMI and similar identifiers to a node-id
- issuing install-time secrets / SSH host key / TLS
- returning the install plan

This replaces the bootstrap parts of `cloud-init` / `MaaS`.
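
The enrollment step above can be sketched as a first-match rule lookup. This is a minimal illustration in Python, not the `deployer` implementation: the `EnrollmentRule` type, the fact keys (`machine_id`, `dmi_product`), and the class/pool/profile names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnrollmentRule:
    """Hypothetical rule: every key in `match` must equal the reported fact."""
    name: str
    match: dict
    node_class: str
    pool: str
    profile: str


def resolve_enrollment(facts: dict, rules: list) -> Optional[EnrollmentRule]:
    """Return the first rule whose match conditions all hold, or None."""
    for rule in rules:
        if all(facts.get(key) == value for key, value in rule.match.items()):
            return rule
    return None


# Assumed example rules; an empty `match` acts as a catch-all default.
RULES = [
    EnrollmentRule("gpu-workers", {"dmi_product": "ExampleServer-G"},
                   "gpu", "gpu-pool", "worker-gpu"),
    EnrollmentRule("default-workers", {}, "general", "worker-pool", "worker"),
]

decision = resolve_enrollment(
    {"machine_id": "abc123", "dmi_product": "ExampleServer-G"}, RULES
)
print(decision.node_class, decision.pool, decision.profile)
# prints: gpu gpu-pool worker-gpu
```

Rule order matters in this sketch, so specific rules go before the catch-all.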

### 3. Node system reconcile layer: a new `nix-agent` or equivalent

What happens here:

- receive the desired NixOS generation / flake attr / closure
- fetch the closure
- run `switch-to-configuration`
- report activation success/failure
- generation rollback and health gating

Without this layer, NixOS deployment stops after phone-home.

This is a different responsibility from the existing `node-agent`, so it is safer to start with a separate agent.

### 4. Native runtime layer: `fleet-scheduler` + `node-agent`

What happens here:

- place stateless / movable native services on the worker pool
- write the desired state of processes/containers to ChainFire
- node-agent handles execution and health

This replaces the part of `Kubernetes Deployment/DaemonSet/Service` that covers PhotonCloud's own native services.

### 5. Tenant workload layer: `plasmavmc` / `k8shost`

These can remain separate schedulers.

- VMs go to `plasmavmc`
- Pods go to `k8shost`

They are handled separately from NixOS host deployment.

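## What needs to be done

### Phase 0: Decide the single source of truth

This is the top priority.

1. Fix `Nix` as the cluster source of truth
2. Generate the `deployer-ctl` YAML/JSON from `Nix` instead of writing it by hand
3. Unify the duplicated generation logic in `plasmacloud-cluster` and `nix-nos`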

Recommended:

- Create `nix/lib/cluster-schema.nix`
- Generate the following from it
  - `nixosConfigurations.<node>`
  - install plan for bootstrap
  - JSON equivalent to `ClusterStateSpec`
  - topology for test-cluster
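
The idea of deriving every artifact from one declaration can be sketched as follows. The real generator would be written in Nix; Python is used here only to show the shape of the derivation. The `CLUSTER` dict, its field names, and the output JSON layout are assumptions, since the actual `ClusterStateSpec` schema is not described in this document.

```python
import json

# Hypothetical cluster declaration. In the roadmap this data would live in Nix;
# a Python dict stands in for it here.
CLUSTER = {
    "name": "example-cluster",
    "nodes": {
        "node01": {"class": "control", "pool": "core", "ip": "10.0.0.11"},
        "node02": {"class": "worker", "pool": "general", "ip": "10.0.0.12"},
    },
}


def nixos_configuration_attrs(cluster: dict) -> list:
    """Flake attribute names, one per declared node."""
    return [f"nixosConfigurations.{name}" for name in sorted(cluster["nodes"])]


def install_plans(cluster: dict) -> dict:
    """Per-node bootstrap data derived from the same declaration."""
    return {
        name: {"node_id": name, "primary_ip": node["ip"], "profile": node["class"]}
        for name, node in cluster["nodes"].items()
    }


def cluster_state_json(cluster: dict) -> str:
    """JSON document in the spirit of ClusterStateSpec (field names assumed)."""
    spec = {
        "cluster": cluster["name"],
        "nodes": [
            {"id": name, "class": node["class"], "pool": node["pool"]}
            for name, node in sorted(cluster["nodes"].items())
        ],
    }
    return json.dumps(spec, indent=2, sort_keys=True)


print(nixos_configuration_attrs(CLUSTER))
print(cluster_state_json(CLUSTER))
```

Because all three functions read the same input, a node added to the declaration shows up in every output without any hand-written YAML.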

### Phase 1: Complete the generic bootstrap

1. Change ISO / netboot from a node-specific implementation to a profile/class-based one
2. Move the `deployer` response from `node-config` toward `install-plan`
3. `install-plan` should contain at least the following
   - node id
   - hostname
   - primary IP / network facts
   - disk layout ref
   - flake attr or system profile ref
   - SSH host key / TLS / bootstrap token
4. The ISO side actually uses `nix_profile` to decide the install target
5. Generate the `disko` reference from profile/class as well, rather than from a node-specific path

At this stage the `cloud-init` endpoint may remain for compatibility, but it should no longer be the main path.
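
A minimal sketch of what the `install-plan` contract could look like on the ISO side, assuming a flat JSON document. The field names below are hypothetical spellings of the items listed above, and `parse_install_plan` is an illustrative helper, not an existing API.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class InstallPlan:
    # Field names are assumptions derived from the list in this section.
    node_id: str
    hostname: str
    primary_ip: str
    disk_layout_ref: str
    system_ref: str          # flake attr or system profile ref
    ssh_host_key_ref: str
    tls_cert_ref: str
    bootstrap_token: str


def parse_install_plan(raw: str) -> InstallPlan:
    """Parse the JSON body and fail loudly if any required field is absent."""
    data = json.loads(raw)
    names = [f.name for f in fields(InstallPlan)]
    missing = [n for n in names if n not in data]
    if missing:
        raise ValueError(f"install-plan is missing fields: {', '.join(missing)}")
    return InstallPlan(**{n: data[n] for n in names})


example = json.dumps({
    "node_id": "node-0001",
    "hostname": "worker-a",
    "primary_ip": "10.0.0.21",
    "disk_layout_ref": "layouts/single-disk",
    "system_ref": "profiles/worker",
    "ssh_host_key_ref": "secrets/ssh/node-0001",
    "tls_cert_ref": "secrets/tls/node-0001",
    "bootstrap_token": "example-token",
})
plan = parse_install_plan(example)
print(plan.system_ref, plan.disk_layout_ref)
# prints: profiles/worker layouts/single-disk
```

Rejecting incomplete plans up front keeps the installer from silently falling back to node-specific defaults.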

### Phase 2: Introduce the NixOS apply agent

1. Add `nix-agent` separately from `node-agent`
2. responsibilities:
   - fetch the desired generation
   - closure prefetch
   - activation
   - success/failure report
   - rollback
3. Keep the state model in ChainFire
   - `nodes/<id>/desired-system`
   - `nodes/<id>/observed-system`
   - `deployments/<generation-id>`
4. Add a health gate
   - reboot required
   - activation timeout
   - rollback on failed health check

Only at this point does the system become a foundation for host rollout without Terraform.
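
One reconcile pass of such an agent might look like the sketch below. It assumes a plain dict in place of ChainFire and injected `activate` / `healthy` callables in place of the real closure fetch, `switch-to-configuration` call, and health check. The key layout follows the state model above, while the status strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReconcileResult:
    generation: Optional[str]   # generation the node runs after this pass
    status: str                 # "unchanged", "activated", or "rolled-back"


def reconcile_once(store: dict, node_id: str,
                   activate: Callable[[str], bool],
                   healthy: Callable[[], bool]) -> ReconcileResult:
    """Compare desired and observed state, activate, and roll back on failure."""
    desired = store[f"nodes/{node_id}/desired-system"]
    observed_key = f"nodes/{node_id}/observed-system"
    previous = store.get(observed_key)

    if desired == previous:
        return ReconcileResult(desired, "unchanged")

    # Health gate: the new generation counts only if activation and the
    # health check both succeed.
    if activate(desired) and healthy():
        store[observed_key] = desired
        store[f"deployments/{desired}"] = "succeeded"
        return ReconcileResult(desired, "activated")

    store[f"deployments/{desired}"] = "failed"
    if previous is not None:
        activate(previous)      # roll back to the last known-good generation
    return ReconcileResult(previous, "rolled-back")


store = {"nodes/n1/desired-system": "gen-2", "nodes/n1/observed-system": "gen-1"}
result = reconcile_once(store, "n1", activate=lambda gen: True, healthy=lambda: False)
print(result.status, result.generation)
# prints: rolled-back gen-1
```

Keeping the side effects behind callables makes the gate logic testable without a real NixOS host.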

### Phase 3: Connect native service scheduling to Nix

1. Generate `ServiceSpec` / `PlacementPolicy` from Nix
2. Make `deployer-ctl apply` the entry point for a generator/exporter rather than a human-facing CLI
3. Limit `fleet-scheduler` to the following responsibilities only
   - node selection
   - rollout budget
   - failover
   - publication trigger
4. Limit `node-agent` to the following responsibilities only
   - process/container reconcile
   - health reporting
   - observed state update

Important: foundational core components such as the following should initially stay out of scheduler control and remain statically placed by NixOS.

- ChainFire
- FlareDB
- IAM bootstrap
- Deployer itself
- Scheduler itself

Turning everything into a movable service at once makes bootstrap fragile.
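
Two of the scheduler responsibilities listed above, node selection and rollout budget, can be illustrated with a small sketch. The node record shape (`pool`, `healthy`) and both function names are assumptions for illustration and do not reflect the actual `fleet-scheduler` code.

```python
def select_nodes(nodes: dict, pool: str, replicas: int) -> list:
    """Pick up to `replicas` healthy nodes from `pool`, in stable name order."""
    eligible = sorted(
        name for name, info in nodes.items()
        if info["pool"] == pool and info["healthy"]
    )
    return eligible[:replicas]


def next_batch(outdated: list, in_progress: int, budget: int) -> list:
    """Return the nodes that may be updated now without exceeding the budget."""
    free = max(budget - in_progress, 0)
    return outdated[:free]


NODES = {
    "worker-a": {"pool": "worker", "healthy": True},
    "worker-b": {"pool": "worker", "healthy": False},
    "worker-c": {"pool": "worker", "healthy": True},
    "core-a": {"pool": "core", "healthy": True},   # core stays outside scheduling
}

print(select_nodes(NODES, "worker", 2))
# prints: ['worker-a', 'worker-c']
print(next_batch(["worker-a", "worker-c"], in_progress=1, budget=2))
# prints: ['worker-a']
```

Here the core pool is never passed to `select_nodes`, which mirrors the recommendation to keep foundational components statically placed.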

### Phase 4: Expand inventory / MaaS replacement

1. Add hardware facts to the phone-home payload
   - CPU
   - memory
   - disks
   - NICs
   - virtualization capability
   - serial / DMI
2. Extend enrollment rules beyond machine-id
   - MAC
   - DMI
   - hardware traits
   - rack / TOR port metadata
3. Add more node lifecycle states
   - discovered
   - commissioned
   - install-pending
   - installing
   - active
   - draining
   - reprovisioning
   - rescue
4. Add Redfish/IPMI if capacity allows
   - power on/off
   - reboot
   - virtual media

At this point the system becomes a minimal MaaS replacement.
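
The lifecycle states listed above could be enforced with a small transition table. The state names come from this section, but the allowed transitions are an assumption made for illustration; the roadmap does not define them.

```python
# Assumed transition table; only the state names are taken from the roadmap.
ALLOWED_TRANSITIONS = {
    "discovered": {"commissioned"},
    "commissioned": {"install-pending"},
    "install-pending": {"installing"},
    "installing": {"active", "rescue"},
    "active": {"draining", "rescue"},
    "draining": {"active", "reprovisioning"},
    "reprovisioning": {"install-pending"},
    "rescue": {"active", "reprovisioning"},
}


def transition(current: str, target: str) -> str:
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal lifecycle transition: {current} -> {target}")
    return target


state = "discovered"
for step in ["commissioned", "install-pending", "installing", "active"]:
    state = transition(state, step)
print(state)
# prints: active
```

An explicit table makes it harder for a node to skip commissioning or to be reinstalled while still serving workloads.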

### Phase 5: Make the validation path the main path

1. Add bootstrap end-to-end to `nix/test-cluster`
   - generic ISO/netboot
   - deployer enrollment
   - install-plan
   - install
   - first boot
   - nix-agent apply
2. Put the `fleet-scheduler` native runtime flow into the CI gate
3. Add tests for node loss / reboot / reprovision / rollback

## Implementation priorities

### P0

- Fix `Nix` as the single source of truth
- Connect `nix_profile` to the actual install/apply path
- Define a generic install-plan
- Remove the node-specific ISO install path
- Build a minimal version of `nix-agent`

### P1

- Nix -> `ClusterStateSpec` generator
- Move `fleet-scheduler` and `node-agent` to Nix-generated service specs
- Strengthen hardware inventory / enrollment
- Integrate bootstrap E2E into `nix/test-cluster`

### P2

- Redfish/IPMI
- reprovision / rescue
- Coordination between publication and host rollout
- drain / cordon / maintenance window

## Concrete backlog for this repository

1. Build the bootstrap API and ISO handling that take `nix_profile` and actually decide the install target
2. Move node-specific configurations such as `nix/nodes/vm-cluster/node01` toward class/profile generation
3. Add a generator that exports JSON from Nix so that the YAML for `deployer-ctl` is no longer written by hand
4. If `first-boot-automation` is kept, rebuild it to match the bootstrap API contract. If not, freeze it for now
5. Add a `nix-agent` crate and NixOS module, separate from `node-agent`
6. Add hardware inventory and install state reporting to `deployer`
7. Add a bare-metal-like bootstrap scenario to `nix/test-cluster`
8. Gate CI on the equivalent of `deployer/scripts/verify-deployer-bootstrap-e2e.sh` and `verify-fleet-scheduler-e2e.sh`

## Conclusion

What this project lacks is not a scheduler by itself.  
What is missing is the following single path.

`Nix cluster declaration`
-> `bootstrap/install plan generation`
-> `deployer enrollment`
-> `NixOS installation`
-> `node-side NixOS reconcile`
-> `native service scheduling`
-> `runtime health/rollback`

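Today the individual parts exist, but they do not form one pipeline.  
The top priority is to fix `Nix` as the single source of truth and make it possible to generate and reconcile the following three things from that declaration:

- host install
- host rollout
- native service scheduling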