# Storage Benchmarks
Generated on 2026-03-10T20:02:00+09:00 with:
```bash
nix run ./nix/test-cluster#cluster -- fresh-bench-storage
```
## CoronaFS
Cluster network baseline, measured with `iperf3` from `node04` to `node01` before the storage tests:

| Metric | Result |
|---|---:|
| TCP throughput | 22.83 MiB/s |
| TCP retransmits | 78 |
Local worker disk is the baseline. CoronaFS is the shared block volume path used for mutable VM disks, exported from `node01` over NBD.

| Metric | Local Disk | CoronaFS |
|---|---:|---:|
| Sequential write | 26.36 MiB/s | 5.24 MiB/s |
| Sequential read | 348.77 MiB/s | 10.08 MiB/s |
| 4k random read | 1243 IOPS | 145 IOPS |
Queue-depth profile (`libaio`, `iodepth=32`) from the same worker:

| Metric | Local Disk | CoronaFS |
|---|---:|---:|
| Depth-32 write | 27.12 MiB/s | 11.42 MiB/s |
| Depth-32 read | 4797.47 MiB/s | 10.06 MiB/s |
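
The depth-32 numbers are easiest to read as scaling ratios against the single-depth table above. A small illustrative calculation with values copied from the two tables (the reading that queueing helps CoronaFS writes but leaves reads flat is an inference from these ratios, not a separately measured result):

```python
# Throughputs in MiB/s, copied from the two CoronaFS tables above.
single_depth = {"write": 5.24, "read": 10.08}
depth_32 = {"write": 11.42, "read": 10.06}

# How much throughput changes when 32 requests are kept in flight.
scaling = {op: depth_32[op] / single_depth[op] for op in single_depth}

print(f"write scaling at depth 32: {scaling['write']:.2f}x")  # writes gain from queueing
print(f"read scaling at depth 32: {scaling['read']:.2f}x")    # reads stay flat
```
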
Cross-worker shared-volume visibility, measured by writing on `node04` and reading from `node05` over the same CoronaFS NBD export:

| Metric | Result |
|---|---:|
| Cross-worker sequential read | 17.72 MiB/s |
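
One more ratio helps place the cross-worker number: comparing it against the `node04` to `node01` TCP baseline measured above, which is a rough ceiling for any NBD traffic on this path. An illustrative calculation with values copied from the tables:

```python
cross_worker_mib_s = 17.72  # cross-worker sequential read, from the table above
tcp_baseline_mib_s = 22.83  # node04 -> node01 iperf3 TCP baseline

# Fraction of the raw TCP path that the cross-worker NBD read achieves.
fraction = cross_worker_mib_s / tcp_baseline_mib_s
print(f"cross-worker read reaches {fraction:.1%} of the TCP baseline")
```
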
## LightningStor
Measured from `node03` against the S3-compatible endpoint on `node01`.
The object path exercised the distributed backend with replication across the worker storage nodes.
Cluster network baseline for this client, measured with `iperf3` from `node03` to `node01` before the storage tests:

| Metric | Result |
|---|---:|
| TCP throughput | 18.35 MiB/s |
| TCP retransmits | 78 |
### Large-object path
| Metric | Result |
|---|---:|
| Object size | 256 MiB |
| Upload throughput | 8.11 MiB/s |
| Download throughput | 7.54 MiB/s |
### Small-object batch
Measured as 32 objects of 4 MiB each (128 MiB total).

| Metric | Result |
|---|---:|
| Batch upload throughput | 0.81 MiB/s |
| Batch download throughput | 0.83 MiB/s |
| PUT rate | 0.20 objects/s |
| GET rate | 0.21 objects/s |
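
The per-object rate and the batch throughput are two views of the same run, so they can be cross-checked. A quick consistency sketch (the elapsed time is back-derived from the measured PUT rate; it is not a separately logged number):

```python
n_objects = 32
object_mib = 4.0
put_rate = 0.20  # objects/s, from the table above

# Back-derive the batch wall-clock time from the PUT rate, then recover
# the aggregate upload throughput from it.
elapsed_s = n_objects / put_rate                 # implied upload wall-clock time
upload_mib_s = n_objects * object_mib / elapsed_s

print(f"implied elapsed: {elapsed_s:.0f} s, implied upload: {upload_mib_s:.2f} MiB/s")
```

The implied 0.80 MiB/s agrees with the measured 0.81 MiB/s within rounding of the published rate.
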
### Parallel small-object batch
Measured as the same 32 objects of 4 MiB each, but with 8 concurrent client jobs from `node03`.

| Metric | Result |
|---|---:|
| Parallel batch upload throughput | 3.03 MiB/s |
| Parallel batch download throughput | 2.89 MiB/s |
| Parallel PUT rate | 0.76 objects/s |
| Parallel GET rate | 0.72 objects/s |
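
Comparing the parallel batch against the single-client batch gives a rough concurrency picture. An illustrative calculation with values copied from the two small-object tables (the per-job efficiency framing is an interpretation, not a measured quantity):

```python
serial_mib_s = 0.81    # single-client batch upload, from the previous section
parallel_mib_s = 3.03  # 8 concurrent jobs, from the table above
jobs = 8

speedup = parallel_mib_s / serial_mib_s  # aggregate gain from concurrency
efficiency = speedup / jobs              # fraction of linear scaling achieved

print(f"speedup: {speedup:.2f}x, per-job efficiency: {efficiency:.0%}")
```
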
## VM Image Path
Measured end-to-end on `node01` against the `PlasmaVMC -> LightningStor artifact -> CoronaFS-backed managed volume` path.

| Metric | Result |
|---|---:|
| Guest image artifact size | 2017 MiB |
| Guest image virtual size | 4096 MiB |
| `CreateImage` latency | 176.03 s |
| First image-backed `CreateVolume` latency | 76.51 s |
| Second image-backed `CreateVolume` latency | 170.49 s |
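
Two derived numbers help put these latencies in context: treating `CreateImage` as one full transfer of the artifact gives a rough effective rate, and dividing artifact size by virtual size gives the allocated fraction of the guest disk. Both are back-of-the-envelope interpretations of the table, not separately measured values:

```python
artifact_mib = 2017
virtual_mib = 4096
create_image_s = 176.03

# Rough effective rate, assuming CreateImage moves the whole artifact once.
create_image_mib_s = artifact_mib / create_image_s

# Fraction of the virtual disk actually backed by artifact data.
allocated_fraction = artifact_mib / virtual_mib

print(f"CreateImage effective rate: {create_image_mib_s:.1f} MiB/s")
print(f"allocated fraction of virtual size: {allocated_fraction:.0%}")
```
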
## Assessment
- CoronaFS shared-volume sequential reads are currently 2.9% of the measured local-disk sequential-read baseline on this nested-QEMU lab cluster.
- CoronaFS 4k random reads are currently 11.7% of the measured local-disk baseline.
- CoronaFS cross-worker reads are currently 5.1% of the measured local-disk sequential-read baseline, which is the more relevant signal for VM restart and migration paths.
- CoronaFS sequential reads are currently 44.2% of the measured node04->node01 TCP baseline, which helps separate NBD/export overhead from raw cluster-network limits.
- CoronaFS depth-32 reads are currently 0.2% of the local depth-32 baseline, which is a better proxy for queued guest I/O than the single-depth path.
- The shared-volume path is functionally correct for mutable VM disks and migration tests, but its read-side throughput is still too low to call production-ready for heavier VM workloads.
- LightningStor's replicated S3 path is working correctly, but 8.11 MiB/s upload and 7.54 MiB/s download are still lab-grade numbers rather than strong object-store throughput.
- LightningStor large-object downloads are currently 41.1% of the node03->node01 TCP baseline, which shows how much throughput is lost above the raw network path.
- LightningStor's small-object batch path is also functional, but 0.20 PUT/s and 0.21 GET/s still indicate a lab cluster rather than a tuned object-storage deployment.
- The parallel small-object profile is the more relevant control-plane/object-ingest signal; it currently reaches 0.76 PUT/s and 0.72 GET/s.
- The VM image path is now measured directly rather than inferred. The cold `CreateVolume` path includes artifact fetch plus CoronaFS population; the warm `CreateVolume` path isolates repeated CoronaFS population from an already cached image.
- The local sequential-write baseline is noisy in this environment, so the read and random-read deltas are the more reliable signal.
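
The percentages in the bullets above are straight ratios of numbers from the earlier tables. For reference, they can be recomputed as:

```python
# All throughputs in MiB/s except the IOPS pair; values copied from the tables.
ratios = {
    "shared seq read vs local seq read":   10.08 / 348.77,
    "shared 4k rand read vs local (IOPS)": 145 / 1243,
    "cross-worker read vs local seq read": 17.72 / 348.77,
    "shared seq read vs TCP (node04)":     10.08 / 22.83,
    "depth-32 read vs local depth-32":     10.06 / 4797.47,
    "large-object GET vs TCP (node03)":    7.54 / 18.35,
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1%}")
```
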