photoncloud-monorepo/metricstor/README.md
centra 5c6eb04a46 T036: Add VM cluster deployment configs for nixos-anywhere
2025-12-11 09:59:19 +09:00


# Metricstor
A Prometheus-compatible metrics storage system with mTLS support, written in Rust.
## Overview
Metricstor is a high-performance time-series database designed to replace VictoriaMetrics
in environments requiring open-source mTLS support. It provides:
- **Prometheus Compatibility**: Remote write ingestion and PromQL query support
- **mTLS Security**: Mutual TLS authentication for all connections
- **Push-based Ingestion**: Accept metrics via Prometheus remote_write protocol
- **Scalable Storage**: Efficient time-series storage with compression and retention
- **PromQL Engine**: Query metrics using the Prometheus query language

This project is part of the cloud infrastructure stack (PROJECT.md Item 12).
## Architecture
For detailed architecture documentation, see [`docs/por/T033-metricstor/DESIGN.md`](../docs/por/T033-metricstor/DESIGN.md).
### High-Level Components
```
┌─────────────────────────────────────────────────────────────────┐
│                        Metricstor Server                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│    ┌──────────────────┐                 ┌──────────────────┐    │
│    │  HTTP Ingestion  │                 │    gRPC Query    │    │
│    │  (remote_write)  │                 │   (PromQL API)   │    │
│    └────────┬─────────┘                 └────────┬─────────┘    │
│             │                                    │              │
│             ▼                                    ▼              │
│    ┌───────────────────────────────────────────────────────┐    │
│    │                    Storage Engine                     │    │
│    │  - In-memory head block (WAL-backed)                  │    │
│    │  - Persistent blocks (Gorilla compression)            │    │
│    │  - Inverted index (label → series)                    │    │
│    │  - Compaction & retention                             │    │
│    └───────────────────────────────────────────────────────┘    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
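
The inverted index in the diagram maps each label pair to the set of series that carry it, so a selector built from equality matchers reduces to an intersection of posting lists. The following is a minimal std-only sketch of that idea, assuming series are identified by a numeric ID; `InvertedIndex`, `SeriesId`, `add_series`, and `lookup` are hypothetical names, not the `metricstor-server` API:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical series identifier; the real storage engine may use a different ID scheme.
pub type SeriesId = u64;

/// Maps (label name, label value) to the sorted set of series carrying that label pair.
#[derive(Default)]
pub struct InvertedIndex {
    postings: BTreeMap<(String, String), BTreeSet<SeriesId>>,
}

impl InvertedIndex {
    /// Registers every label pair of a series under its ID.
    pub fn add_series(&mut self, id: SeriesId, labels: &[(&str, &str)]) {
        for (name, value) in labels {
            self.postings
                .entry((name.to_string(), value.to_string()))
                .or_default()
                .insert(id);
        }
    }

    /// Returns the IDs of series that carry *all* of the given label pairs
    /// (equality matchers only) by intersecting their posting lists.
    /// An empty matcher list returns no series in this sketch.
    pub fn lookup(&self, matchers: &[(&str, &str)]) -> BTreeSet<SeriesId> {
        let mut result: Option<BTreeSet<SeriesId>> = None;
        for (name, value) in matchers {
            let key = (name.to_string(), value.to_string());
            let posting = match self.postings.get(&key) {
                Some(p) => p,
                // A label pair that no series carries makes the intersection empty.
                None => return BTreeSet::new(),
            };
            result = Some(match result {
                None => posting.clone(),
                Some(acc) => acc.intersection(posting).copied().collect(),
            });
        }
        result.unwrap_or_default()
    }
}
```

Keeping each posting list sorted is what lets the intersection proceed in order; a production index would more likely store compact sorted arrays or bitmaps than `BTreeSet`s.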
### Crates
- **metricstor-api**: gRPC client library and protobuf definitions
- **metricstor-types**: Core data types (Metric, TimeSeries, Label, Sample)
- **metricstor-server**: Main server implementation
## Building
### Prerequisites
- Rust 1.75 or later
- Protocol Buffers compiler: bundled through the `protoc-bin-vendored` crate, so no system `protoc` installation is needed
### Build Commands
```bash
# Build all crates
cargo build --release
# Build specific crate
cargo build -p metricstor-server --release
# Run tests
cargo test
# Check code without building
cargo check
```
### NixOS
The project includes Nix flake support (per T024 patterns):
```bash
# Build with Nix
nix build
# Enter development shell
nix develop
```
## Configuration
Configuration is specified in YAML format. Default location: `config.yaml`
### Example Configuration
```yaml
server:
  grpc_addr: "0.0.0.0:9100"   # gRPC query API
  http_addr: "0.0.0.0:9101"   # HTTP API: remote_write ingestion and PromQL queries
  max_concurrent_streams: 100
  query_timeout_seconds: 30
  max_samples_per_query: 10000000

storage:
  data_dir: "/var/lib/metricstor"
  retention_days: 15
  wal_segment_size_mb: 128
  block_duration_hours: 2
  max_head_samples: 1000000
  compaction_interval_seconds: 3600

# Optional: Enable mTLS (T027 unified TLS pattern)
tls:
  cert_file: "/etc/metricstor/tls/cert.pem"
  key_file: "/etc/metricstor/tls/key.pem"
  ca_file: "/etc/metricstor/tls/ca.pem"
  require_client_cert: true
```
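
With `block_duration_hours: 2` and `retention_days: 15`, about 15 × 24 / 2 = 180 persistent blocks cover the retention window. The sketch below shows how a sample timestamp could be mapped to its block and how a block could be checked against the retention cutoff. It assumes blocks are aligned to multiples of the block duration since the Unix epoch, as in the Prometheus TSDB; `block_start`, `is_expired`, and `blocks_in_retention` are hypothetical helpers, not Metricstor functions:

```rust
const MS_PER_HOUR: i64 = 60 * 60 * 1000;
const MS_PER_DAY: i64 = 24 * MS_PER_HOUR;

/// Start (inclusive) of the block containing `ts_ms`, assuming blocks are
/// aligned to multiples of `block_duration_hours` since the Unix epoch.
pub fn block_start(ts_ms: i64, block_duration_hours: i64) -> i64 {
    let width = block_duration_hours * MS_PER_HOUR;
    // div_euclid rounds toward negative infinity, so pre-epoch timestamps align too.
    ts_ms.div_euclid(width) * width
}

/// True if a block ending at `block_end_ms` lies entirely outside the retention window.
pub fn is_expired(block_end_ms: i64, now_ms: i64, retention_days: i64) -> bool {
    block_end_ms <= now_ms - retention_days * MS_PER_DAY
}

/// Number of whole blocks that fit in the retention window.
pub fn blocks_in_retention(retention_days: i64, block_duration_hours: i64) -> i64 {
    retention_days * 24 / block_duration_hours
}
```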
## Running
```bash
# Run with default config
./target/release/metricstor-server
# Run with custom config
./target/release/metricstor-server --config /path/to/config.yaml
```
## Usage
### Ingesting Metrics
Metricstor implements the Prometheus remote_write protocol v1.0 for push-based metric ingestion.
#### Using Prometheus Remote Write
Configure Prometheus to push metrics to Metricstor:
```yaml
# prometheus.yml
remote_write:
  - url: "http://localhost:9101/api/v1/write"
    queue_config:
      capacity: 10000
      max_shards: 10
      batch_send_deadline: 5s
    # Optional: mTLS configuration (switch the url to https:// when enabled)
    tls_config:
      cert_file: client.pem
      key_file: client-key.pem
      ca_file: ca.pem
```
#### Using the API Directly
You can also push metrics directly using the remote_write protocol:
```bash
# Run the example to push sample metrics
cargo run --example push_metrics
```
The remote_write endpoint (`POST /api/v1/write`) expects:
- **Content-Type**: `application/x-protobuf`
- **Content-Encoding**: `snappy`
- **Body**: Snappy-compressed Prometheus WriteRequest protobuf

See [`examples/push_metrics.rs`](crates/metricstor-server/examples/push_metrics.rs) for a complete implementation example.
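
To show what the body contains before compression, the sketch below hand-encodes a one-series `WriteRequest` with only std. It uses the field numbers from the Prometheus `prompb` definitions (`WriteRequest.timeseries = 1`; `TimeSeries.labels = 1`, `samples = 2`; `Label.name = 1`, `value = 2`; `Sample.value = 1` as a double, `timestamp = 2` as an int64 in milliseconds). `encode_write_request` and its helpers are hypothetical; a real client would typically use generated protobuf types, and must then Snappy-compress the bytes in block format (not the framed format) before sending them with the headers listed above:

```rust
/// Appends `v` as a protobuf base-128 varint.
fn put_varint(buf: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 {
        buf.push((v as u8 & 0x7F) | 0x80);
        v >>= 7;
    }
    buf.push(v as u8);
}

/// Appends a field key: (field_number << 3) | wire_type.
fn put_key(buf: &mut Vec<u8>, field: u32, wire_type: u8) {
    put_varint(buf, ((field as u64) << 3) | wire_type as u64);
}

/// Appends a length-delimited field (wire type 2): strings and nested messages.
fn put_bytes(buf: &mut Vec<u8>, field: u32, bytes: &[u8]) {
    put_key(buf, field, 2);
    put_varint(buf, bytes.len() as u64);
    buf.extend_from_slice(bytes);
}

/// Encodes a WriteRequest containing a single TimeSeries.
/// `labels` should already be sorted by name, as the remote write spec requires.
pub fn encode_write_request(labels: &[(&str, &str)], samples: &[(f64, i64)]) -> Vec<u8> {
    let mut series = Vec::new();
    for (name, value) in labels {
        let mut label = Vec::new();
        put_bytes(&mut label, 1, name.as_bytes()); // Label.name
        put_bytes(&mut label, 2, value.as_bytes()); // Label.value
        put_bytes(&mut series, 1, &label); // TimeSeries.labels
    }
    for (value, ts_ms) in samples {
        let mut sample = Vec::new();
        put_key(&mut sample, 1, 1); // Sample.value: double, wire type 1 (64-bit little-endian)
        sample.extend_from_slice(&value.to_le_bytes());
        put_key(&mut sample, 2, 0); // Sample.timestamp: int64 varint, milliseconds
        put_varint(&mut sample, *ts_ms as u64); // negative values become 10-byte varints, as in protobuf
        put_bytes(&mut series, 2, &sample); // TimeSeries.samples
    }
    let mut request = Vec::new();
    put_bytes(&mut request, 1, &series); // WriteRequest.timeseries
    request
}
```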
#### Features
- **Snappy Compression**: Efficient compression for wire transfer
- **Label Validation**: Prometheus-compliant label name validation
- **Backpressure**: HTTP 429 when write buffer is full
- **Sample Validation**: Rejects NaN and Inf values
- **Buffered Writes**: In-memory batching for performance
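
The label and sample checks can be sketched as follows, assuming Metricstor applies the standard Prometheus data-model rules (metric names match `[a-zA-Z_:][a-zA-Z0-9_:]*`, label names match `[a-zA-Z_][a-zA-Z0-9_]*`); the function names are hypothetical. Prometheus encodes staleness markers as a special NaN value, so a server that rejects every NaN, as described above, also drops those markers.

```rust
/// Prometheus label names: [a-zA-Z_][a-zA-Z0-9_]*
pub fn is_valid_label_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false, // empty name or invalid first character
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Prometheus metric names additionally allow ':': [a-zA-Z_:][a-zA-Z0-9_:]*
pub fn is_valid_metric_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' || c == ':' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == ':')
}

/// Mirrors the documented rule: NaN and +/-Inf sample values are rejected.
pub fn is_accepted_sample_value(v: f64) -> bool {
    v.is_finite()
}
```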
### Querying Metrics
Metricstor provides a Prometheus-compatible HTTP API for querying metrics using PromQL.
#### API Endpoints
##### Instant Query
Query metric values at a specific point in time:
```bash
# GET /api/v1/query?query=<promql>&time=<timestamp_ms>
# Example
curl 'http://localhost:9101/api/v1/query?query=up&time=1234567890000'
```
Parameters:
- `query` (required): PromQL expression
- `time` (optional): Unix timestamp in milliseconds (defaults to current time)

Response format:
```json
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {"__name__": "up", "job": "prometheus"},
        "value": [1234567890000, 1.0]
      }
    ]
  }
}
```
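
PromQL expressions contain characters such as `{`, `}`, `"`, `[`, and spaces, which must be percent-encoded in the query string. A minimal std-only sketch of building an instant-query URL; `instant_query_url` is a hypothetical helper that encodes every byte outside the RFC 3986 unreserved set:

```rust
/// Percent-encodes every byte outside the RFC 3986 unreserved set (A-Z a-z 0-9 - . _ ~).
fn percent_encode(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for b in s.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => out.push(b as char),
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Builds a GET URL for /api/v1/query; `time_ms` is optional, as in the parameter list above.
pub fn instant_query_url(base: &str, promql: &str, time_ms: Option<i64>) -> String {
    let mut url = format!(
        "{}/api/v1/query?query={}",
        base.trim_end_matches('/'),
        percent_encode(promql)
    );
    if let Some(t) = time_ms {
        url.push_str(&format!("&time={}", t));
    }
    url
}
```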
##### Range Query
Query metric values over a time range:
```bash
# GET /api/v1/query_range?query=<promql>&start=<ts>&end=<ts>&step=<duration_ms>
# Example (-g disables curl's URL globbing so the [5m] brackets are sent as-is)
curl -g 'http://localhost:9101/api/v1/query_range?query=rate(http_requests_total[5m])&start=1234567890000&end=1234571490000&step=60000'
```
Parameters:
- `query` (required): PromQL expression
- `start` (required): Start timestamp in milliseconds
- `end` (required): End timestamp in milliseconds
- `step` (required): Step duration in milliseconds
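
In the example above, `end - start` is 3,600,000 ms (one hour) and `step` is 60,000 ms, so the expression is evaluated at 61 timestamps. The sketch below assumes Prometheus range-query semantics (evaluate at `start`, `start + step`, and so on, up to and including `end`); `evaluation_timestamps` is a hypothetical helper, and Metricstor's exact boundary handling may differ:

```rust
/// Evaluation timestamps for a range query: start, start + step, ..., <= end.
/// Returns an error for a non-positive step or an end before start.
pub fn evaluation_timestamps(start_ms: i64, end_ms: i64, step_ms: i64) -> Result<Vec<i64>, String> {
    if step_ms <= 0 {
        return Err("step must be positive".to_string());
    }
    if end_ms < start_ms {
        return Err("end must not be before start".to_string());
    }
    let mut ts = Vec::with_capacity(((end_ms - start_ms) / step_ms + 1) as usize);
    let mut t = start_ms;
    while t <= end_ms {
        ts.push(t);
        t += step_ms;
    }
    Ok(ts)
}
```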
##### Label Values
Get all values for a specific label:
```bash
# GET /api/v1/label/<label_name>/values
# Example
curl 'http://localhost:9101/api/v1/label/job/values'
```
##### Series Metadata
Get metadata for all series:
```bash
# GET /api/v1/series
# Example
curl 'http://localhost:9101/api/v1/series'
```
#### Supported PromQL
Metricstor implements a practical subset of PromQL that covers the most common query patterns:
**Selectors:**
```promql
# Metric name
http_requests_total
# Label matching
http_requests_total{method="GET"}
http_requests_total{method="GET", status="200"}
# Label operators
metric{label="value"} # Exact match
metric{label!="value"} # Not equal
metric{label=~"regex"} # Regex match
metric{label!~"regex"} # Negative regex
```
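
In PromQL, a series that lacks a label is treated as having that label set to the empty string, so `metric{label!="value"}` also matches series without `label`. Regex matchers (`=~`, `!~`) are fully anchored, so `=~"GET|POST"` behaves like `^(?:GET|POST)$`. The sketch below covers only the two equality operators with these semantics, using std alone; the `Matcher` type is hypothetical, and the regex operators are omitted because they need a regex engine such as the `regex` crate:

```rust
use std::collections::HashMap;

/// Hypothetical matcher type covering `=` and `!=`.
pub enum Matcher {
    Equal(String, String),
    NotEqual(String, String),
}

/// Looks up a label value; a missing label reads as "", as in PromQL.
fn label_value<'a>(labels: &'a HashMap<String, String>, name: &str) -> &'a str {
    labels.get(name).map(String::as_str).unwrap_or("")
}

impl Matcher {
    pub fn matches(&self, labels: &HashMap<String, String>) -> bool {
        match self {
            Matcher::Equal(name, value) => label_value(labels, name) == value.as_str(),
            Matcher::NotEqual(name, value) => label_value(labels, name) != value.as_str(),
        }
    }
}

/// A selector matches a series when every one of its matchers matches.
pub fn selector_matches(matchers: &[Matcher], labels: &HashMap<String, String>) -> bool {
    matchers.iter().all(|m| m.matches(labels))
}
```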
**Range Selectors:**
```promql
http_requests_total[5m] # Last 5 minutes
http_requests_total[1h] # Last 1 hour
```
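
A range-selector duration such as `[5m]` or `[1h]` is a number followed by a unit. A minimal sketch that converts a single-unit duration to milliseconds, assuming the standard PromQL units (`ms`, `s`, `m`, `h`, `d`, `w`, `y`, with `y` taken as 365 days); compound durations such as `1h30m`, which recent Prometheus versions accept, are not handled, and `parse_duration_ms` is hypothetical:

```rust
/// Parses a single-unit PromQL duration such as "5m" or "1h" into milliseconds.
pub fn parse_duration_ms(s: &str) -> Option<u64> {
    // Split at the first non-digit: "5m" -> ("5", "m").
    let split = s.find(|c: char| !c.is_ascii_digit())?;
    let (num, unit) = s.split_at(split);
    let n: u64 = num.parse().ok()?;
    let unit_ms: u64 = match unit {
        "ms" => 1,
        "s" => 1_000,
        "m" => 60_000,
        "h" => 3_600_000,
        "d" => 86_400_000,
        "w" => 7 * 86_400_000,
        "y" => 365 * 86_400_000,
        _ => return None, // unknown unit
    };
    n.checked_mul(unit_ms)
}
```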
**Aggregations:**
```promql
sum(http_requests_total)
avg(http_requests_total)
min(http_requests_total)
max(http_requests_total)
count(http_requests_total)
```
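
Without a grouping clause, each of these operators collapses an instant vector (one value per matching series) into a single value with no labels. A sketch of the five operators over plain `f64` values; the `Aggregation` enum and `aggregate` function are hypothetical, and grouping (`by`/`without`) is not shown:

```rust
#[derive(Clone, Copy)]
pub enum Aggregation {
    Sum,
    Avg,
    Min,
    Max,
    Count,
}

/// Aggregates the values of an instant vector into one value.
/// Returns None for an empty vector, for which PromQL returns an empty result.
pub fn aggregate(op: Aggregation, values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let sum: f64 = values.iter().sum();
    Some(match op {
        Aggregation::Sum => sum,
        Aggregation::Avg => sum / values.len() as f64,
        Aggregation::Min => values.iter().copied().fold(f64::INFINITY, f64::min),
        Aggregation::Max => values.iter().copied().fold(f64::NEG_INFINITY, f64::max),
        Aggregation::Count => values.len() as f64,
    })
}
```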
**Functions:**
```promql
# Rate functions
rate(http_requests_total[5m]) # Per-second rate
irate(http_requests_total[5m]) # Per-second rate from the last two samples in the window
increase(http_requests_total[1h]) # Total increase over the 1h window
```
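
All three functions treat the series as a counter. Whenever a sample is lower than its predecessor, the counter is assumed to have restarted from zero, so the new value itself counts as the increase since the reset. The following is a simplified sketch over `(timestamp_ms, value)` samples. Unlike Prometheus, it does not extrapolate `rate` and `increase` to the edges of the range window, so its results can differ from Prometheus (and possibly from Metricstor); the function names are hypothetical:

```rust
/// Total counter increase across the samples, compensating for counter resets.
pub fn increase(samples: &[(i64, f64)]) -> Option<f64> {
    if samples.len() < 2 {
        return None; // at least two samples are needed
    }
    let mut total = 0.0;
    for pair in samples.windows(2) {
        let (prev, cur) = (pair[0].1, pair[1].1);
        // After a reset the counter restarts from zero, so `cur` is the increase since the reset.
        total += if cur >= prev { cur - prev } else { cur };
    }
    Some(total)
}

/// Per-second average rate between the first and last sample (no extrapolation).
pub fn rate(samples: &[(i64, f64)]) -> Option<f64> {
    let inc = increase(samples)?;
    let secs = (samples[samples.len() - 1].0 - samples[0].0) as f64 / 1000.0;
    if secs > 0.0 { Some(inc / secs) } else { None }
}

/// Per-second rate computed from only the last two samples.
pub fn irate(samples: &[(i64, f64)]) -> Option<f64> {
    if samples.len() < 2 {
        return None;
    }
    rate(&samples[samples.len() - 2..])
}
```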
#### Example Client
Run the example query client to test all query endpoints:
```bash
cargo run --example query_metrics
```
See [`examples/query_metrics.rs`](crates/metricstor-server/examples/query_metrics.rs) for implementation details.
#### Grafana Integration
Configure Grafana to use Metricstor as a Prometheus data source:
1. Add a new Prometheus data source
2. Set URL to `http://localhost:9101` (use `https://` when mTLS is enabled)
3. (Optional) Configure mTLS certificates
4. Test the connection with an instant query

Grafana will automatically use the `/api/v1/query` and `/api/v1/query_range` endpoints for dashboard queries.
## Development Roadmap
This workspace scaffold (S2) provides the foundation. Implementation proceeds as:
- **S2 (Scaffold)**: Complete - workspace structure, types, protobuf definitions
- **S3 (Push Ingestion)**: Complete - Prometheus remote_write endpoint with validation, compression, and buffering (34 tests passing)
- **S4 (PromQL Engine)**: Complete - Query execution engine with instant/range queries, aggregations, rate functions (42 tests passing)
- **S5 (Storage Layer)**: Implement persistent time-series storage backend
- **S6 (Integration)**: NixOS module, testing, documentation

See [`docs/por/T033-metricstor/task.yaml`](../docs/por/T033-metricstor/task.yaml) for detailed task tracking.
## Integration
### Service Ports
- **9100**: gRPC query API (mTLS)
- **9101**: HTTP API for remote_write ingestion and the Prometheus-compatible query endpoints (mTLS)
### Monitoring
Metricstor exports its own metrics on the standard `/metrics` endpoint for self-monitoring.
## License
MIT OR Apache-2.0
## References
- **Task**: T033 Metricstor (PROJECT.md Item 12)
- **Design**: [`docs/por/T033-metricstor/DESIGN.md`](../docs/por/T033-metricstor/DESIGN.md)
- **Dependencies**: T024 (NixOS), T027 (Unified TLS)
- **Prometheus Remote Write**: https://prometheus.io/docs/concepts/remote_write_spec/
- **PromQL**: https://prometheus.io/docs/prometheus/latest/querying/basics/