# Nix-NOS Simplification Plan (2026-04-04)

## Summary

`nix-nos` should not remain a second cluster authoring surface.

Status update:

- `ultracloud.cluster` is now the only in-repo cluster authoring path
- `services.first-boot-automation` no longer has a `useNixNOS` mode
- root `flake.nix` no longer imports `nix-nos`
- topology-specific `nix-nos` files have been removed
- the remaining `nix-nos` tree is only network/BGP/routing primitives

The right plan is:

- keep `ultracloud.cluster` as the only cluster source of truth
- keep `nix-nos` only as a compatibility facade for older topology-driven flows
- eventually shrink `nix-nos` down to network primitives, or remove it entirely if those primitives are moved into the main Nix module tree

## Current State

When this plan was drafted, the repo was already halfway through this transition. The items below describe that starting point; the status update in the Summary records what has changed since.

- `nix/lib/cluster-schema.nix` is the actual schema/helper library
- `nix/modules/ultracloud-cluster.nix` generates:
  - per-node `cluster-config.json`
  - `nix-nos.clusters`
  - deployer cluster state
- `nix-nos/modules/topology.nix` no longer owns its own schema logic; it delegates to `cluster-schema.nix`
- `services.first-boot-automation` still has a `useNixNOS` path and still treats `nix-nos.generateClusterConfig` as a real config source

So the duplication is smaller than before, but the user-facing model remains confusing because there still appear to be two ways to describe a cluster.

## Recommendation

The recommended target is:

1. `ultracloud.cluster` is the only supported cluster authoring API.
2. `nix-nos` is explicitly legacy-compatibility only for topology consumers that have not been migrated yet.
3. `nix-nos` should stop presenting itself as a general cluster definition layer.
4. `first-boot-automation` should stop depending on `nix-nos` as a primary provider.

This keeps the repo simpler without forcing a big-bang removal.

## What Nix-NOS Should Still Own

Only keep the parts that are actually distinct:

- interface/VLAN primitives
- BGP primitives
- static routing primitives
- any truly reusable NOS-style networking submodules

These are valid low-level modules.

What `nix-nos` should not own anymore:

- whole-cluster source of truth
- bootstrap node selection rules
- cluster-config generation semantics
- host inventory / deployer state generation

Those belong in `ultracloud.cluster` and `cluster-schema.nix`.

## Target Shape

### Primary path

- user writes `ultracloud.cluster`
- `cluster-schema.nix` derives:
  - node cluster config
  - deployer cluster state
  - compatibility topology objects if needed

### Compatibility path

- `nix-nos` may still expose `clusters` and `generateClusterConfig`
- but they are documented as legacy compatibility only and emit warnings to that effect
- ideally they become thin read-only views over `cluster-schema.nix`, not an authoring API

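One way to turn a compatibility option into a thin read-only view is the module system's `readOnly` flag. The fragment below is a minimal sketch under stated assumptions: the relative import path, the arguments passed to `cluster-schema.nix`, the `deriveTopology` helper, and the option type are all hypothetical, because this plan does not say what `cluster-schema.nix` exports.

```nix
# Sketch only. `deriveTopology`, the import arguments, and the relative
# import path are hypothetical placeholders.
{ config, lib, ... }:
let
  clusterSchema = import ../../lib/cluster-schema.nix { inherit lib; };
in
{
  options.nix-nos.clusters = lib.mkOption {
    # Loose placeholder type; the real schema is defined elsewhere.
    type = lib.types.attrsOf lib.types.anything;
    # A read-only option accepts exactly one definition, so a second,
    # hand-written definition fails at evaluation time.
    readOnly = true;
    description = "Legacy compatibility view derived from ultracloud.cluster.";
  };

  # The single definition comes from the schema library, not from users.
  config.nix-nos.clusters =
    clusterSchema.deriveTopology config.ultracloud.cluster;
}
```

With this shape, a configuration that still sets `nix-nos.clusters` directly stops evaluating. That is stricter than a warning, so it probably fits the later migration phases better than the freeze phase.
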
### First boot

`services.first-boot-automation` should eventually have only these modes:

- use generated UltraCloud cluster config
- use an explicit file path

It should not need a separate `useNixNOS` mode long-term.

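The two-mode end state could look roughly like the fragment below. This is a sketch, not the module's actual interface: the `clusterConfigFile` option name and the `/etc` target path are hypothetical, and it assumes `ultracloud.cluster.generated.nodeClusterConfig` is a JSON-serializable attribute set. `lib.mkRemovedOptionModule` is the nixpkgs helper for retiring an option with a migration hint.

```nix
# Sketch only. `clusterConfigFile` and the /etc target are hypothetical names.
{ config, lib, pkgs, ... }:
let
  cfg = config.services.first-boot-automation;
  # Assumes nodeClusterConfig is a JSON-serializable attribute set.
  generatedFile = pkgs.writeText "cluster-config.json"
    (builtins.toJSON config.ultracloud.cluster.generated.nodeClusterConfig);
in
{
  imports = [
    # Setting the removed option fails evaluation with this migration hint.
    (lib.mkRemovedOptionModule
      [ "services" "first-boot-automation" "useNixNOS" ]
      "Author the cluster through ultracloud.cluster or set an explicit cluster config file.")
  ];

  options.services.first-boot-automation.clusterConfigFile = lib.mkOption {
    type = lib.types.nullOr lib.types.path;
    default = null;
    description = "Explicit cluster config file; null selects the generated UltraCloud config.";
  };

  # Mode 2 (explicit file) wins when set; otherwise mode 1 (generated config).
  config.environment.etc."ultracloud/cluster-config.json".source =
    if cfg.clusterConfigFile != null then cfg.clusterConfigFile else generatedFile;
}
```

Using a nullable path instead of a separate mode switch keeps the option surface to a single knob: leaving it unset means "use the generated config".
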
## Migration Plan

### Phase 1: Freeze

- do not add new functionality to `nix-nos.clusters`
- mark `nix-nos` topology usage as legacy in warnings/docs
- keep all schema changes in `cluster-schema.nix`

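The legacy marking in this phase can be expressed with the standard NixOS `warnings` option. The fragment below is a minimal sketch; it assumes `nix-nos.clusters` defaults to an empty attribute set and that `useNixNOS` is a boolean option, neither of which this plan confirms.

```nix
# Sketch only. Assumes `nix-nos.clusters` defaults to `{ }` and
# `useNixNOS` is a boolean option.
{ config, lib, ... }:
{
  # `warnings` is the standard NixOS option; each string is printed
  # during evaluation without failing the build.
  warnings =
    lib.optional (config.nix-nos.clusters != { })
      "nix-nos.clusters is legacy compatibility only; author clusters through ultracloud.cluster."
    ++ lib.optional config.services.first-boot-automation.useNixNOS
      "services.first-boot-automation.useNixNOS is legacy; prefer the generated UltraCloud cluster config.";
}
```

Because `ultracloud-cluster.nix` also generates `nix-nos.clusters`, a check this simple would fire for generated values too. A real implementation would need some way to tell hand-written definitions from generated ones, which this sketch does not attempt.
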
### Phase 2: Move first-boot off Nix-NOS

- make `services.first-boot-automation` prefer `ultracloud.cluster.generated.nodeClusterConfig`
- keep `nix-nos` only as fallback/compat, not as the preferred path
- stop using `useNixNOS` in normal tests/configurations

### Phase 3: Remove topology authoring role

- deprecate direct authoring of `nix-nos.clusters`
- remove `nix/modules/nix-nos/cluster-config-generator.nix`
- collapse any remaining direct topology generation onto `cluster-schema.nix`

### Phase 4: Decide final fate

Choose one:

- keep `nix-nos` as a small network-primitives library
- or move those network primitives under `nix/modules/network/*` and delete `nix-nos`

The first option is lower risk. The second is cleaner.

## Recommended Decision

Recommended decision:

- short term: keep `nix-nos`, but only as a compatibility/network-primitives layer
- medium term: remove `nix-nos` as a cluster authoring concept
- long term: either rename/rehome the remaining network modules, or delete `nix-nos` if nothing substantial remains

## Immediate Next Steps

1. Mark `nix-nos.clusters` and `services.first-boot-automation.useNixNOS` as legacy in evaluation warnings.
2. Reduce test usage so only one compatibility smoke test still exercises direct `nix-nos` authoring.
3. Change docs/examples to author clusters through `ultracloud.cluster` only.
4. After that, remove the standalone `cluster-config-generator.nix` path.