Storage Benchmarks
Generated on 2026-03-27T12:08:47+09:00 with:
nix run ./nix/test-cluster#cluster -- bench-storage
CoronaFS
Cluster network baseline, measured with iperf3 from node04 to node01 before the storage tests:
| Metric | Result |
|---|---|
| TCP throughput | 45.92 MiB/s |
| TCP retransmits | 193 |
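A baseline like this can be reduced from `iperf3 -c <host> -J` JSON output. The sketch below parses the standard `end.sum_received.bits_per_second` and `end.sum_sent.retransmits` fields; the report object is synthetic, constructed to match the table above rather than captured from the run.

```python
import json

def iperf3_summary(report_json: str) -> dict:
    """Extract TCP throughput (MiB/s) and retransmits from `iperf3 -J` output."""
    end = json.loads(report_json)["end"]
    bits_per_second = end["sum_received"]["bits_per_second"]  # receiver-side goodput
    return {
        "throughput_mib_s": round(bits_per_second / 8 / 2**20, 2),
        "retransmits": end["sum_sent"]["retransmits"],  # sender-side TCP retransmits
    }

# Synthetic report shaped like iperf3's JSON summary (values chosen to match the table).
report = json.dumps({
    "end": {
        "sum_received": {"bits_per_second": 45.92 * 8 * 2**20},
        "sum_sent": {"retransmits": 193},
    }
})
print(iperf3_summary(report))  # {'throughput_mib_s': 45.92, 'retransmits': 193}
```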
All CoronaFS figures below were measured from node04, with the local worker disk as the baseline. CoronaFS now has two relevant data paths in the lab: the controller export sourced from node01, and the node-local export materialized onto the worker that actually attaches the mutable VM disk.
| Metric | Local Disk | Controller Export | Node-local Export |
|---|---|---|---|
| Sequential write | 679.05 MiB/s | 30.35 MiB/s | 395.06 MiB/s |
| Sequential read | 2723.40 MiB/s | 42.70 MiB/s | 709.14 MiB/s |
| 4k random read | 16958 IOPS | 2034 IOPS | 5087 IOPS |
| 4k queued random read (iodepth=32) | 106026 IOPS | 14261 IOPS | 28898 IOPS |
Queue-depth profile (libaio, iodepth=32) from the same worker:
| Metric | Local Disk | Controller Export | Node-local Export |
|---|---|---|---|
| Depth-32 write | 3417.45 MiB/s | 39.26 MiB/s | 178.04 MiB/s |
| Depth-32 read | 12996.47 MiB/s | 55.71 MiB/s | 112.88 MiB/s |
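The harness's actual fio job definitions were not captured in this report; a plausible job file for the depth-32 profile above (libaio, iodepth=32), with placeholder paths and sizes, would look like:

```ini
; Hypothetical fio job approximating the queue-depth profile above;
; filename and size are placeholders, not taken from bench-storage.
[global]
ioengine=libaio
direct=1
iodepth=32
bs=1M
size=4g
runtime=30
time_based

[seq-read-depth32]
rw=read
filename=/mnt/coronafs/bench.dat

[seq-write-depth32]
stonewall
rw=write
filename=/mnt/coronafs/bench.dat

[randread-4k-depth32]
stonewall
rw=randread
bs=4k
filename=/mnt/coronafs/bench.dat
```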
Node-local materialization timing and target-node steady-state read path:
| Metric | Result |
|---|---|
| Node04 materialize latency | 9.23 s |
| Node05 materialize latency | 5.82 s |
| Node05 node-local sequential read | 709.14 MiB/s |
PlasmaVMC now prefers the worker-local CoronaFS export for mutable node-local volumes, even when the underlying materialization is a qcow2 overlay. The VM runtime section below is therefore the closest end-to-end proxy for real local-attach VM I/O, while the node-local export numbers remain useful for CoronaFS service consumers and for diagnosing exporter overhead.
LightningStor
Measured from node03 against the S3-compatible endpoint on node01.
The object path exercised the distributed backend with replication across the worker storage nodes.
Cluster network baseline for this client, measured with iperf3 from node03 to node01 before the storage tests:
| Metric | Result |
|---|---|
| TCP throughput | 45.99 MiB/s |
| TCP retransmits | 207 |
Large-object path
| Metric | Result |
|---|---|
| Object size | 256 MiB |
| Upload throughput | 18.20 MiB/s |
| Download throughput | 39.21 MiB/s |
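Throughput here is object size over elapsed wall-clock time, so the table implies transfer times of roughly 14 s for the upload and 6.5 s for the download (derived figures, not separate measurements):

```python
OBJECT_MIB = 256  # large-object size from the table above

def transfer_seconds(size_mib: float, throughput_mib_s: float) -> float:
    """Wall-clock time implied by a size-over-time throughput figure."""
    return round(size_mib / throughput_mib_s, 1)

print(transfer_seconds(OBJECT_MIB, 18.20))  # upload: 14.1
print(transfer_seconds(OBJECT_MIB, 39.21))  # download: 6.5
```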
Small-object batch
Measured as 32 objects of 4 MiB each (128 MiB total).
| Metric | Result |
|---|---|
| Batch upload throughput | 18.96 MiB/s |
| Batch download throughput | 39.88 MiB/s |
| PUT rate | 4.74 objects/s |
| GET rate | 9.97 objects/s |
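The per-object rates are consistent with the batch throughputs divided by the fixed 4 MiB object size:

```python
OBJECT_MIB = 4  # size of each object in the batch

def object_rate(batch_throughput_mib_s: float) -> float:
    """Objects per second implied by batch throughput over a fixed object size."""
    return round(batch_throughput_mib_s / OBJECT_MIB, 2)

print(object_rate(18.96))  # PUT rate: 4.74 objects/s
print(object_rate(39.88))  # GET rate: 9.97 objects/s
```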
Parallel small-object batch
Measured as the same 32 objects of 4 MiB each, but with 8 concurrent client jobs from node03.
| Metric | Result |
|---|---|
| Parallel batch upload throughput | 16.23 MiB/s |
| Parallel batch download throughput | 26.07 MiB/s |
| Parallel PUT rate | 4.06 objects/s |
| Parallel GET rate | 6.52 objects/s |
VM Image Path
Measured against the PlasmaVMC -> LightningStor artifact -> CoronaFS-backed managed volume clone path on node01.
| Metric | Result |
|---|---|
| Guest image artifact size | 2017 MiB |
| Guest image virtual size | 4096 MiB |
| CreateImage latency | 66.49 s |
| First image-backed CreateVolume latency | 16.86 s |
| Second image-backed CreateVolume latency | 0.12 s |
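Derived from the clone latencies above: the second, backing-file-style clone is roughly two orders of magnitude cheaper than the first, eagerly materialized one.

```python
first_clone_s = 16.86   # first image-backed CreateVolume: materializes image data
second_clone_s = 0.12   # second CreateVolume: fast path, effectively metadata-only

speedup = first_clone_s / second_clone_s
print(f"second-clone speedup: ~{speedup:.0f}x")
```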
VM Runtime Path
Measured against the real StartVm -> qemu attach -> guest boot -> guest fio path on a worker node, using a CoronaFS-backed root disk and data disk.
| Metric | Result |
|---|---|
| StartVm to qemu attach | 0.60 s |
| StartVm to guest benchmark result | 35.69 s |
| Guest sequential write | 123.49 MiB/s |
| Guest sequential read | 1492.71 MiB/s |
| Guest 4k random read | 25550 IOPS |
Assessment
- CoronaFS controller-export sequential reads are currently 1.6% of the measured local-disk baseline on this nested-QEMU lab cluster.
- CoronaFS controller-export 4k random reads are currently 12.0% of the measured local-disk baseline.
- CoronaFS controller-export queued 4k random reads are currently 13.5% of the measured local queued-random-read baseline.
- CoronaFS controller-export sequential reads are currently 93.0% of the measured node04->node01 TCP baseline, which isolates the centralized source path from raw cluster-network limits.
- CoronaFS controller-export depth-32 reads are currently 0.4% of the local depth-32 baseline.
- CoronaFS node-local sequential reads are currently 26.0% of the measured local-disk baseline, which is the more relevant steady-state signal for mutable VM disks after attachment.
- CoronaFS node-local 4k random reads are currently 30.0% of the measured local-disk baseline.
- CoronaFS node-local queued 4k random reads are currently 27.3% of the measured local queued-random-read baseline.
- CoronaFS node-local depth-32 reads are currently 0.9% of the local depth-32 baseline.
- The target worker's node-local read path is 26.0% of the measured local sequential-read baseline after materialization, which is the better proxy for restart and migration steady state than the old shared-export read.
- PlasmaVMC now attaches writable node-local volumes through the worker-local CoronaFS export, so the guest-runtime section should be treated as the real local VM steady-state path rather than the node-local export numbers alone.
- CoronaFS single-depth writes remain sensitive to the nested-QEMU/VDE lab transport, so the queued-depth and guest-runtime numbers are still the more reliable proxy for real VM workload behavior than the single-stream write figure alone.
- The central export path is now best understood as a source/materialization path; the worker-local export is the path that should determine VM-disk readiness going forward.
- LightningStor's replicated S3 path is working correctly, but 18.20 MiB/s upload and 39.21 MiB/s download are still lab-grade numbers rather than strong object-store throughput.
- LightningStor large-object downloads are currently 85.3% of the node03->node01 TCP baseline, which bounds how much throughput the object layer itself is costing above the raw network path.
- The current S3 frontend tuning baseline is the built-in 16 MiB streaming threshold with multipart PUT/FETCH concurrency of 8; that combination is the best default observed on this lab cluster so far.
- LightningStor uploads should be read against the replication write quorum and the same ~45.99 MiB/s lab network ceiling; this environment still limits end-to-end throughput well before modern bare-metal NICs would.
- LightningStor's small-object batch path is also functional, but 4.74 PUT/s and 9.97 GET/s still indicate a lab cluster rather than a tuned object-storage deployment.
- The parallel small-object profile is the more relevant control-plane/object-ingest signal; it currently reaches 4.06 PUT/s and 6.52 GET/s.
- The VM image section measures clone/materialization cost, not guest runtime I/O.
- The PlasmaVMC local image-backed clone fast path is now active again; a 0.12 s second clone indicates the CoronaFS qcow2 backing-file path is being hit on node01 rather than falling back to eager raw materialization.
- The VM runtime section is the real PlasmaVMC + CoronaFS + QEMU virtio-blk + guest kernel path; use it to judge whether QEMU/NBD tuning is helping.
- The local sequential-write baseline is noisy in this environment, so the read and random-read deltas are the more reliable signal.
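The percentage figures in this assessment can be reproduced directly from the tables above:

```python
def pct(part: float, whole: float) -> float:
    """Percentage of baseline, rounded to one decimal as in the bullets above."""
    return round(part / whole * 100, 1)

# Controller export vs local disk (sequential read, 4k random, queued random)
print(pct(42.70, 2723.40))    # 1.6
print(pct(2034, 16958))       # 12.0
print(pct(14261, 106026))     # 13.5

# Controller-export sequential read vs node04->node01 TCP baseline
print(pct(42.70, 45.92))      # 93.0

# Node-local export vs local disk (sequential read, 4k random)
print(pct(709.14, 2723.40))   # 26.0
print(pct(5087, 16958))       # 30.0

# LightningStor large-object download vs node03->node01 TCP baseline
print(pct(39.21, 45.99))      # 85.3
```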