# Metricstor

A Prometheus-compatible metrics storage system with mTLS support, written in Rust.

## Overview

Metricstor is a high-performance time-series database designed to replace VictoriaMetrics in environments requiring open-source mTLS support. It provides:

- **Prometheus Compatibility**: Remote write ingestion and PromQL query support
- **mTLS Security**: Mutual TLS authentication for all connections
- **Push-based Ingestion**: Accept metrics via Prometheus remote_write protocol
- **Scalable Storage**: Efficient time-series storage with compression and retention
- **PromQL Engine**: Query metrics using the Prometheus query language

This project is part of the cloud infrastructure stack (PROJECT.md Item 12).

## Architecture

For detailed architecture documentation, see [`docs/por/T033-metricstor/DESIGN.md`](../docs/por/T033-metricstor/DESIGN.md).

### High-Level Components

```
┌─────────────────────────────────────────────────────────────────┐
│                        Metricstor Server                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────────┐      ┌──────────────────┐                 │
│  │  HTTP Ingestion  │      │   gRPC Query     │                 │
│  │  (remote_write)  │      │   (PromQL API)   │                 │
│  └────────┬─────────┘      └────────┬─────────┘                 │
│           │                         │                           │
│           ▼                         ▼                           │
│  ┌──────────────────────────────────────────────┐               │
│  │                Storage Engine                │               │
│  │  - In-memory head block (WAL-backed)         │               │
│  │  - Persistent blocks (Gorilla compression)   │               │
│  │  - Inverted index (label → series)           │               │
│  │  - Compaction & retention                    │               │
│  └──────────────────────────────────────────────┘               │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### Crates

- **metricstor-api**: gRPC client library and protobuf definitions
- **metricstor-types**: Core data types (Metric, TimeSeries, Label, Sample)
- **metricstor-server**: Main server implementation

## Building

### Prerequisites

- Rust 1.75 or later
- Protocol Buffers compiler (provided via `protoc-bin-vendored`)

### Build Commands

```bash
# Build all crates
cargo build --release

# Build specific crate
cargo build -p metricstor-server --release

# Run tests
cargo test

# Check code without building
cargo check
```

### NixOS

The project includes Nix flake support (per T024 patterns):

```bash
# Build with Nix
nix build

# Enter development shell
nix develop
```

## Configuration

Configuration is specified in YAML format. Default location: `config.yaml`

### Example Configuration

```yaml
server:
  grpc_addr: "0.0.0.0:9100"  # gRPC query API
  http_addr: "0.0.0.0:9101"  # HTTP remote_write endpoint
  max_concurrent_streams: 100
  query_timeout_seconds: 30
  max_samples_per_query: 10000000

storage:
  data_dir: "/var/lib/metricstor"
  retention_days: 15
  wal_segment_size_mb: 128
  block_duration_hours: 2
  max_head_samples: 1000000
  compaction_interval_seconds: 3600

# Optional: Enable mTLS (T027 unified TLS pattern)
tls:
  cert_file: "/etc/metricstor/tls/cert.pem"
  key_file: "/etc/metricstor/tls/key.pem"
  ca_file: "/etc/metricstor/tls/ca.pem"
  require_client_cert: true
```

## Running

```bash
# Run with default config
./target/release/metricstor-server

# Run with custom config
./target/release/metricstor-server --config /path/to/config.yaml
```

## Usage

### Ingesting Metrics

Metricstor implements the Prometheus remote_write protocol v1.0 for push-based metric ingestion.

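
Ingested series are subject to the label and sample validation listed under Features in this section. The sketch below shows what such checks might look like, assuming the standard Prometheus label-name pattern `[a-zA-Z_][a-zA-Z0-9_]*`. The function names `is_valid_label_name` and `is_valid_sample_value` are hypothetical and are not taken from the Metricstor source.

```rust
/// Hypothetical helper: returns true if `name` matches the Prometheus
/// label-name pattern `[a-zA-Z_][a-zA-Z0-9_]*`.
fn is_valid_label_name(name: &str) -> bool {
    let mut chars = name.chars();
    // The first character must be an ASCII letter or underscore.
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    // Remaining characters may also be ASCII digits.
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Hypothetical helper: rejects NaN and +/-Inf, mirroring the
/// "Sample Validation" feature described in this README.
fn is_valid_sample_value(value: f64) -> bool {
    value.is_finite()
}
```

For example, `is_valid_label_name("job")` and `is_valid_label_name("__name__")` return `true`, while `is_valid_label_name("2xx")` and `is_valid_sample_value(f64::NAN)` return `false`.
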
#### Using Prometheus Remote Write

Configure Prometheus to push metrics to Metricstor:

```yaml
# prometheus.yml
remote_write:
  - url: "http://localhost:9101/api/v1/write"
    queue_config:
      capacity: 10000
      max_shards: 10
      batch_send_deadline: 5s
    # Optional: mTLS configuration
    tls_config:
      cert_file: client.pem
      key_file: client-key.pem
      ca_file: ca.pem
```

#### Using the API Directly

You can also push metrics directly using the remote_write protocol:

```bash
# Run the example to push sample metrics
cargo run --example push_metrics
```

The remote_write endpoint (`POST /api/v1/write`) expects:

- **Content-Type**: `application/x-protobuf`
- **Content-Encoding**: `snappy`
- **Body**: Snappy-compressed Prometheus WriteRequest protobuf

See [`examples/push_metrics.rs`](crates/metricstor-server/examples/push_metrics.rs) for a complete implementation example.

#### Features

- **Snappy Compression**: Efficient compression for wire transfer
- **Label Validation**: Prometheus-compliant label name validation
- **Backpressure**: HTTP 429 when write buffer is full
- **Sample Validation**: Rejects NaN and Inf values
- **Buffered Writes**: In-memory batching for performance

### Querying Metrics

Metricstor provides a Prometheus-compatible HTTP API for querying metrics using PromQL.

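
PromQL expressions often contain characters such as `(`, `[` and `{` that need percent-encoding before they go into a URL. The following is a minimal, standard-library-only sketch of how a client might build request URLs for the `/api/v1/query` and `/api/v1/query_range` endpoints documented in this section. The helper names (`percent_encode`, `instant_query_url`, `range_query_url`) are illustrative assumptions, not part of the Metricstor client API.

```rust
/// Percent-encode a string for use as a URL query-parameter value.
/// RFC 3986 unreserved characters pass through; all other bytes become %XX.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Build an instant-query URL. `time_ms` is a Unix timestamp in
/// milliseconds; when it is `None` the server default (current time) applies.
fn instant_query_url(base: &str, promql: &str, time_ms: Option<u64>) -> String {
    let mut url = format!(
        "{}/api/v1/query?query={}",
        base.trim_end_matches('/'),
        percent_encode(promql)
    );
    if let Some(t) = time_ms {
        url.push_str(&format!("&time={}", t));
    }
    url
}

/// Build a range-query URL. All three time arguments are in milliseconds.
fn range_query_url(base: &str, promql: &str, start_ms: u64, end_ms: u64, step_ms: u64) -> String {
    format!(
        "{}/api/v1/query_range?query={}&start={}&end={}&step={}",
        base.trim_end_matches('/'),
        percent_encode(promql),
        start_ms,
        end_ms,
        step_ms
    )
}
```

Calling `instant_query_url("http://localhost:9101", "up", Some(1234567890000))` yields `http://localhost:9101/api/v1/query?query=up&time=1234567890000`, which can then be passed to any HTTP client.
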
#### API Endpoints

##### Instant Query

Query metric values at a specific point in time:

```bash
GET /api/v1/query?query=<query>&time=<time>

# Example
curl 'http://localhost:9101/api/v1/query?query=up&time=1234567890000'
```

Parameters:

- `query` (required): PromQL expression
- `time` (optional): Unix timestamp in milliseconds (defaults to current time)

Response format:

```json
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {"__name__": "up", "job": "prometheus"},
        "value": [1234567890000, 1.0]
      }
    ]
  }
}
```

##### Range Query

Query metric values over a time range:

```bash
GET /api/v1/query_range?query=<query>&start=<start>&end=<end>&step=<step>

# Example (-G sends the data as query parameters; --data-urlencode
# escapes the brackets, which curl would otherwise treat as a URL glob)
curl -G 'http://localhost:9101/api/v1/query_range' \
  --data-urlencode 'query=rate(http_requests_total[5m])' \
  --data-urlencode 'start=1234567890000' \
  --data-urlencode 'end=1234571490000' \
  --data-urlencode 'step=60000'
```

Parameters:

- `query` (required): PromQL expression
- `start` (required): Start timestamp in milliseconds
- `end` (required): End timestamp in milliseconds
- `step` (required): Step duration in milliseconds

##### Label Values

Get all values for a specific label:

```bash
GET /api/v1/label/<label_name>/values

# Example
curl 'http://localhost:9101/api/v1/label/job/values'
```

##### Series Metadata

Get metadata for all series:

```bash
GET /api/v1/series

# Example
curl 'http://localhost:9101/api/v1/series'
```

#### Supported PromQL

Metricstor implements a practical subset of PromQL, covering an estimated 80% of common use cases:

**Selectors:**

```promql
# Metric name
http_requests_total

# Label matching
http_requests_total{method="GET"}
http_requests_total{method="GET", status="200"}

# Label operators
metric{label="value"}    # Exact match
metric{label!="value"}   # Not equal
metric{label=~"regex"}   # Regex match
metric{label!~"regex"}   # Negative regex
```

**Range Selectors:**

```promql
http_requests_total[5m]  # Last 5 minutes
http_requests_total[1h]  # Last 1 hour
```

**Aggregations:**

```promql
sum(http_requests_total)
avg(http_requests_total)
min(http_requests_total)
max(http_requests_total)
count(http_requests_total)
```

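
Used without a grouping clause, each aggregation operator above reduces the sample values of all matched series at one timestamp to a single number. The sketch below illustrates those semantics only. The `Aggregation` enum and `aggregate` function are hypothetical names and do not describe how the Metricstor query engine is actually structured.

```rust
/// Hypothetical enum mirroring the five supported aggregation operators.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Aggregation {
    Sum,
    Avg,
    Min,
    Max,
    Count,
}

/// Reduce the sample values of all matched series at one timestamp to a
/// single value. Returns `None` when no series matched.
fn aggregate(op: Aggregation, values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let result = match op {
        Aggregation::Sum => values.iter().sum::<f64>(),
        Aggregation::Avg => values.iter().sum::<f64>() / values.len() as f64,
        Aggregation::Min => values.iter().copied().fold(f64::INFINITY, f64::min),
        Aggregation::Max => values.iter().copied().fold(f64::NEG_INFINITY, f64::max),
        Aggregation::Count => values.len() as f64,
    };
    Some(result)
}
```

With three series whose current values are `1.0`, `2.0` and `3.0`, `aggregate(Aggregation::Sum, &[1.0, 2.0, 3.0])` returns `Some(6.0)` and `aggregate(Aggregation::Avg, &[1.0, 2.0, 3.0])` returns `Some(2.0)`.
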
**Functions:**

```promql
# Rate functions
rate(http_requests_total[5m])      # Per-second rate
irate(http_requests_total[5m])     # Instant rate (last 2 points)
increase(http_requests_total[1h])  # Total increase over time
```

#### Example Client

Run the example query client to test all query endpoints:

```bash
cargo run --example query_metrics
```

See [`examples/query_metrics.rs`](crates/metricstor-server/examples/query_metrics.rs) for implementation details.

#### Grafana Integration

Configure Grafana to use Metricstor as a Prometheus data source:

1. Add a new Prometheus data source
2. Set URL to `http://localhost:9101`
3. (Optional) Configure mTLS certificates
4. Test connection with instant query

Grafana will automatically use the `/api/v1/query` and `/api/v1/query_range` endpoints for dashboard queries.

## Development Roadmap

This workspace scaffold (S2) provides the foundation. Implementation proceeds as:

- **S2 (Scaffold)**: Complete - workspace structure, types, protobuf definitions
- **S3 (Push Ingestion)**: Complete - Prometheus remote_write endpoint with validation, compression, and buffering (34 tests passing)
- **S4 (PromQL Engine)**: Complete - Query execution engine with instant/range queries, aggregations, rate functions (42 tests passing)
- **S5 (Storage Layer)**: Implement persistent time-series storage backend
- **S6 (Integration)**: NixOS module, testing, documentation

See [`docs/por/T033-metricstor/task.yaml`](../docs/por/T033-metricstor/task.yaml) for detailed task tracking.

## Integration

### Service Ports

- **9100**: gRPC query API (mTLS)
- **9101**: HTTP API for remote_write ingestion and the PromQL query endpoints shown above (mTLS)

### Monitoring

Metricstor exports its own metrics on the standard `/metrics` endpoint for self-monitoring.

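
A `/metrics` endpoint conventionally serves the Prometheus text exposition format, one sample per line. Assuming Metricstor follows that convention, the sketch below shows how a single sample line is rendered. The helper names and the metric name used in the example are hypothetical and not taken from the Metricstor source.

```rust
/// Escape a label value for the Prometheus text exposition format:
/// backslash, double quote and newline must be backslash-escaped.
fn escape_label_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for ch in value.chars() {
        match ch {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            '\n' => out.push_str("\\n"),
            other => out.push(other),
        }
    }
    out
}

/// Render one sample as `name{label="value",...} value`.
/// Samples without labels are rendered as `name value`.
fn render_sample(name: &str, labels: &[(&str, &str)], value: f64) -> String {
    let rendered: Vec<String> = labels
        .iter()
        .map(|(key, val)| format!("{}=\"{}\"", key, escape_label_value(val)))
        .collect();
    if rendered.is_empty() {
        format!("{} {}", name, value)
    } else {
        format!("{}{{{}}} {}", name, rendered.join(","), value)
    }
}
```

For instance, `render_sample("metricstor_ingested_samples_total", &[("job", "metricstor")], 42.0)` produces `metricstor_ingested_samples_total{job="metricstor"} 42`.
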
## License

MIT OR Apache-2.0

## References

- **Task**: T033 Metricstor (PROJECT.md Item 12)
- **Design**: [`docs/por/T033-metricstor/DESIGN.md`](../docs/por/T033-metricstor/DESIGN.md)
- **Dependencies**: T024 (NixOS), T027 (Unified TLS)
- **Prometheus Remote Write**: https://prometheus.io/docs/concepts/remote_write_spec/
- **PromQL**: https://prometheus.io/docs/prometheus/latest/querying/basics/