# Nightlight

A Prometheus-compatible metrics storage system with mTLS support, written in Rust.

## Overview

Nightlight is a high-performance time-series database designed to replace VictoriaMetrics
in environments requiring open-source mTLS support. It provides:

- **Prometheus Compatibility**: Remote write ingestion and PromQL query support
- **mTLS Security**: Mutual TLS authentication for all connections
- **Push-based Ingestion**: Accept metrics via Prometheus remote_write protocol
- **Scalable Storage**: Efficient time-series storage with compression and retention
- **PromQL Engine**: Query metrics using the Prometheus query language

This project is part of the cloud infrastructure stack (PROJECT.md Item 12).

## Architecture

For detailed architecture documentation, see [`docs/por/T033-nightlight/DESIGN.md`](../docs/por/T033-nightlight/DESIGN.md).

### High-Level Components

```
┌─────────────────────────────────────────────────────────────────┐
│                        Nightlight Server                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────────┐         ┌──────────────────┐              │
│  │ HTTP Ingestion   │         │ gRPC Query       │              │
│  │ (remote_write)   │         │ (PromQL API)     │              │
│  └────────┬─────────┘         └────────┬─────────┘              │
│           │                            │                        │
│           ▼                            ▼                        │
│  ┌──────────────────────────────────────────────┐               │
│  │                Storage Engine                │               │
│  │  - In-memory head block (WAL-backed)         │               │
│  │  - Persistent blocks (Gorilla compression)   │               │
│  │  - Inverted index (label → series)           │               │
│  │  - Compaction & retention                    │               │
│  └──────────────────────────────────────────────┘               │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
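The diagram mentions Gorilla compression for persistent blocks. As a rough illustration of the idea behind its timestamp stage, the sketch below delta-of-delta encodes millisecond timestamps so that regularly spaced samples collapse to zeros. The function names are hypothetical, the bit-packing stage of Gorilla is omitted, and whether Nightlight's blocks use exactly this layout is an assumption.

```rust
/// Delta-of-delta encode millisecond timestamps.
/// Output layout: [first timestamp, first delta, delta-of-delta, ...].
/// (Hypothetical helper; the real Gorilla scheme also bit-packs these values.)
fn delta_of_delta_encode(timestamps: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(timestamps.len());
    let mut prev_ts = 0i64;
    let mut prev_delta = 0i64;
    for (i, &ts) in timestamps.iter().enumerate() {
        if i == 0 {
            out.push(ts);
        } else {
            let delta = ts - prev_ts;
            // For the second sample prev_delta is 0, so the raw delta is stored.
            out.push(delta - prev_delta);
            prev_delta = delta;
        }
        prev_ts = ts;
    }
    out
}

/// Inverse of `delta_of_delta_encode`.
fn delta_of_delta_decode(encoded: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(encoded.len());
    let mut ts = 0i64;
    let mut delta = 0i64;
    for (i, &v) in encoded.iter().enumerate() {
        if i == 0 {
            ts = v;
        } else {
            delta += v;
            ts += delta;
        }
        out.push(ts);
    }
    out
}
```

With a 15-second scrape interval, most encoded values are `0`, which is what makes the later bit-packing stage so compact.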

### Crates

- **nightlight-api**: gRPC client library and protobuf definitions
- **nightlight-types**: Core data types (Metric, TimeSeries, Label, Sample)
- **nightlight-server**: Main server implementation

## Building

### Prerequisites

- Rust 1.75 or later
- Protocol Buffers compiler (provided via `protoc-bin-vendored`)

### Build Commands

```bash
# Build all crates
cargo build --release

# Build specific crate
cargo build -p nightlight-server --release

# Run tests
cargo test

# Check code without building
cargo check
```

### NixOS

The project includes Nix flake support (per T024 patterns):

```bash
# Build with Nix
nix build

# Enter development shell
nix develop
```

## Configuration

Configuration is specified in YAML format. Default location: `config.yaml`

### Example Configuration

```yaml
server:
  grpc_addr: "0.0.0.0:9100"   # gRPC query API
  http_addr: "0.0.0.0:9101"   # HTTP remote_write endpoint
  max_concurrent_streams: 100
  query_timeout_seconds: 30
  max_samples_per_query: 10000000

storage:
  data_dir: "/var/lib/nightlight"
  retention_days: 15
  wal_segment_size_mb: 128
  block_duration_hours: 2
  max_head_samples: 1000000
  compaction_interval_seconds: 3600

# Optional: Enable mTLS (T027 unified TLS pattern)
tls:
  cert_file: "/etc/nightlight/tls/cert.pem"
  key_file: "/etc/nightlight/tls/key.pem"
  ca_file: "/etc/nightlight/tls/ca.pem"
  require_client_cert: true
```
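To make the interaction between `retention_days` and `block_duration_hours` concrete, here is a small arithmetic sketch. It assumes that blocks are dropped whole once they fall outside the retention window; the helper names are hypothetical and not part of Nightlight's API.

```rust
/// Number of blocks needed to cover the retention window, rounding up.
/// Returns None for a zero block duration. (Hypothetical helper.)
fn blocks_in_retention(retention_days: u64, block_duration_hours: u64) -> Option<u64> {
    if block_duration_hours == 0 {
        return None;
    }
    let retention_hours = retention_days * 24;
    Some((retention_hours + block_duration_hours - 1) / block_duration_hours)
}

/// Oldest timestamp (Unix milliseconds) still inside the retention window.
fn retention_cutoff_ms(now_ms: i64, retention_days: i64) -> i64 {
    const MS_PER_DAY: i64 = 24 * 60 * 60 * 1000;
    now_ms - retention_days * MS_PER_DAY
}
```

With the example values above (15 days of retention, 2-hour blocks), `blocks_in_retention(15, 2)` yields 180 blocks.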

## Running

```bash
# Run with default config
./target/release/nightlight-server

# Run with custom config
./target/release/nightlight-server --config /path/to/config.yaml
```

## Usage

### Ingesting Metrics

Nightlight implements the Prometheus remote_write protocol v1.0 for push-based metric ingestion.

#### Using Prometheus Remote Write

Configure Prometheus to push metrics to Nightlight:

```yaml
# prometheus.yml
remote_write:
  - url: "http://localhost:9101/api/v1/write"
    queue_config:
      capacity: 10000
      max_shards: 10
      batch_send_deadline: 5s
    # Optional: mTLS configuration
    tls_config:
      cert_file: client.pem
      key_file: client-key.pem
      ca_file: ca.pem
```

#### Using the API Directly

You can also push metrics directly using the remote_write protocol:

```bash
# Run the example to push sample metrics
cargo run --example push_metrics
```

The remote_write endpoint (`POST /api/v1/write`) expects:

- **Content-Type**: `application/x-protobuf`
- **Content-Encoding**: `snappy`
- **Body**: Snappy-compressed Prometheus WriteRequest protobuf
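The header requirements above could be checked before the body is decompressed. The following is a minimal sketch of such a check, not the server's actual handler: the `WriteHeaderError` type and `check_write_headers` function are hypothetical, and the case-insensitive comparison and tolerance for `;` parameters are assumptions.

```rust
/// Reasons a remote_write request could be rejected based on its headers.
/// (Hypothetical type for illustration.)
#[derive(Debug, PartialEq)]
enum WriteHeaderError {
    UnsupportedContentType,
    UnsupportedContentEncoding,
}

/// Check the two headers the endpoint expects.
fn check_write_headers(
    content_type: Option<&str>,
    content_encoding: Option<&str>,
) -> Result<(), WriteHeaderError> {
    // Drop any parameters after ';' and compare case-insensitively.
    let media_type = content_type
        .map(|v| v.split(';').next().unwrap_or("").trim().to_ascii_lowercase());
    if media_type.as_deref() != Some("application/x-protobuf") {
        return Err(WriteHeaderError::UnsupportedContentType);
    }
    let encoding = content_encoding.map(|v| v.trim().to_ascii_lowercase());
    if encoding.as_deref() != Some("snappy") {
        return Err(WriteHeaderError::UnsupportedContentEncoding);
    }
    Ok(())
}
```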

See [`examples/push_metrics.rs`](crates/nightlight-server/examples/push_metrics.rs) for a complete implementation example.

#### Features

- **Snappy Compression**: Efficient compression for wire transfer
- **Label Validation**: Prometheus-compliant label name validation
- **Backpressure**: HTTP 429 when write buffer is full
- **Sample Validation**: Rejects NaN and Inf values
- **Buffered Writes**: In-memory batching for performance
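The validation and backpressure rules listed above can be sketched as follows. The label rule is the Prometheus naming pattern `[a-zA-Z_][a-zA-Z0-9_]*`; the `WriteBuffer` type and the 204 and 400 status codes are assumptions made for illustration (only the 429 comes from this README).

```rust
/// Prometheus label names must match `[a-zA-Z_][a-zA-Z0-9_]*`.
fn is_valid_label_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Rejects NaN and +/-Inf, as described in the feature list.
fn is_valid_sample_value(value: f64) -> bool {
    value.is_finite()
}

/// Bounded in-memory buffer of (timestamp_ms, value) samples.
/// (Hypothetical type; the real buffer is likely more involved.)
struct WriteBuffer {
    samples: Vec<(i64, f64)>,
    capacity: usize,
}

impl WriteBuffer {
    fn new(capacity: usize) -> Self {
        WriteBuffer { samples: Vec::new(), capacity }
    }

    /// Returns the HTTP status a handler might send for this sample.
    /// 429 follows the README; 204 and 400 are assumed here.
    fn try_push(&mut self, timestamp_ms: i64, value: f64) -> u16 {
        if !is_valid_sample_value(value) {
            return 400;
        }
        if self.samples.len() >= self.capacity {
            return 429; // buffer full: ask the sender to back off
        }
        self.samples.push((timestamp_ms, value));
        204
    }
}
```

Returning 429 rather than dropping samples lets a remote_write sender such as Prometheus retry the batch later.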

### Querying Metrics

Nightlight provides a Prometheus-compatible HTTP API for querying metrics using PromQL.

#### API Endpoints

##### Instant Query

Query metric values at a specific point in time:

```bash
# GET /api/v1/query?query=<promql>&time=<timestamp_ms>

# Example
curl 'http://localhost:9101/api/v1/query?query=up&time=1234567890000'
```

Parameters:

- `query` (required): PromQL expression
- `time` (optional): Unix timestamp in milliseconds (defaults to current time)
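PromQL expressions often contain characters such as `{`, `"`, or `[` that must be percent-encoded in a query string. The sketch below builds an instant-query URL using only the standard library; `percent_encode` and `instant_query_url` are hypothetical helpers, and a real client would more likely rely on an HTTP library's own encoding.

```rust
/// Percent-encode everything except RFC 3986 unreserved characters.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for b in input.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Build the URL for an instant query; `time_ms` is optional, as in the API.
fn instant_query_url(base: &str, query: &str, time_ms: Option<i64>) -> String {
    let mut url = format!(
        "{}/api/v1/query?query={}",
        base.trim_end_matches('/'),
        percent_encode(query)
    );
    if let Some(t) = time_ms {
        url.push_str(&format!("&time={}", t));
    }
    url
}
```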

Response format:

```json
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {"__name__": "up", "job": "prometheus"},
        "value": [1234567890000, 1.0]
      }
    ]
  }
}
```

##### Range Query

Query metric values over a time range:

```bash
# GET /api/v1/query_range?query=<promql>&start=<ts>&end=<ts>&step=<duration_ms>

# Example (-G with --data-urlencode stops curl from globbing on [5m])
curl -G 'http://localhost:9101/api/v1/query_range' \
  --data-urlencode 'query=rate(http_requests_total[5m])' \
  -d 'start=1234567890000' -d 'end=1234571490000' -d 'step=60000'
```

Parameters:

- `query` (required): PromQL expression
- `start` (required): Start timestamp in milliseconds
- `end` (required): End timestamp in milliseconds
- `step` (required): Step duration in milliseconds
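A range query is evaluated once per step between `start` and `end`. The sketch below lists those evaluation timestamps, assuming both ends are inclusive (as in Prometheus); the function name and error strings are hypothetical.

```rust
/// Evaluation timestamps (Unix ms) for a range query, inclusive of both ends.
fn range_query_steps(start_ms: i64, end_ms: i64, step_ms: i64) -> Result<Vec<i64>, &'static str> {
    if step_ms <= 0 {
        return Err("step must be positive");
    }
    if end_ms < start_ms {
        return Err("end must not be before start");
    }
    Ok((start_ms..=end_ms).step_by(step_ms as usize).collect())
}
```

For the curl example above (a one-hour window with a 60000 ms step), this gives 61 evaluation points, which is worth keeping in mind alongside `max_samples_per_query`.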

##### Label Values

Get all values for a specific label:

```bash
# GET /api/v1/label/<label_name>/values

# Example
curl 'http://localhost:9101/api/v1/label/job/values'
```

##### Series Metadata

Get metadata for all series:

```bash
# GET /api/v1/series

# Example
curl 'http://localhost:9101/api/v1/series'
```

#### Supported PromQL

Nightlight implements a practical subset of PromQL covering 80% of common use cases:

**Selectors:**

```promql
# Metric name
http_requests_total

# Label matching
http_requests_total{method="GET"}
http_requests_total{method="GET", status="200"}

# Label operators
metric{label="value"}    # Exact match
metric{label!="value"}   # Not equal
metric{label=~"regex"}   # Regex match
metric{label!~"regex"}   # Negative regex
```
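As an illustration of how the `=` and `!=` operators above can be evaluated against a series' label set, here is a minimal sketch. It follows the Prometheus convention that a missing label behaves like an empty value; the `Matcher` type is hypothetical, and the regex operators are left out because they would need a regex crate.

```rust
use std::collections::BTreeMap;

/// Equality matchers only; `=~` and `!~` are omitted in this sketch.
enum Matcher {
    Equal(String, String),
    NotEqual(String, String),
}

/// A label that is absent is treated as having the empty string as its value.
fn label_value<'a>(labels: &'a BTreeMap<String, String>, name: &str) -> &'a str {
    labels.get(name).map(String::as_str).unwrap_or("")
}

impl Matcher {
    fn matches(&self, labels: &BTreeMap<String, String>) -> bool {
        match self {
            Matcher::Equal(name, want) => label_value(labels, name) == want.as_str(),
            Matcher::NotEqual(name, want) => label_value(labels, name) != want.as_str(),
        }
    }
}

/// A series is selected only if every matcher in the selector accepts it.
fn series_matches(labels: &BTreeMap<String, String>, matchers: &[Matcher]) -> bool {
    matchers.iter().all(|m| m.matches(labels))
}
```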

**Range Selectors:**

```promql
http_requests_total[5m]   # Last 5 minutes
http_requests_total[1h]   # Last 1 hour
```

**Aggregations:**

```promql
sum(http_requests_total)
avg(http_requests_total)
min(http_requests_total)
max(http_requests_total)
count(http_requests_total)
```
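Without a grouping clause, each of these aggregations collapses the values of all matching series at one evaluation timestamp into a single number. A minimal sketch of that reduction, with a hypothetical `Aggregation` enum (not Nightlight's internal type), might look like this:

```rust
/// The five aggregation operators listed above. (Hypothetical enum.)
#[derive(Debug, Clone, Copy)]
enum Aggregation {
    Sum,
    Avg,
    Min,
    Max,
    Count,
}

/// Reduce one value per series into a single value; None if no series matched.
fn aggregate(op: Aggregation, values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let sum: f64 = values.iter().sum();
    Some(match op {
        Aggregation::Sum => sum,
        Aggregation::Avg => sum / values.len() as f64,
        Aggregation::Min => values.iter().copied().fold(f64::INFINITY, f64::min),
        Aggregation::Max => values.iter().copied().fold(f64::NEG_INFINITY, f64::max),
        Aggregation::Count => values.len() as f64,
    })
}
```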

**Functions:**

```promql
# Rate functions
rate(http_requests_total[5m])      # Per-second rate
irate(http_requests_total[5m])     # Instant rate (last 2 points)
increase(http_requests_total[1h])  # Total increase over time
```
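The three functions differ mainly in which samples of the window they use. The sketch below is a simplified version that handles counter resets but, unlike Prometheus, does not extrapolate to the window boundaries; whether Nightlight extrapolates is not stated here, so treat the exact numbers as illustrative. All names are hypothetical.

```rust
/// (timestamp in Unix ms, counter value)
type Sample = (i64, f64);

/// Total counter increase across the window, treating a drop as a reset to zero.
fn increase(samples: &[Sample]) -> Option<f64> {
    if samples.len() < 2 {
        return None;
    }
    let mut total = 0.0;
    for pair in samples.windows(2) {
        let (prev, cur) = (pair[0].1, pair[1].1);
        // After a reset the counter restarts from 0, so `cur` is the increase.
        total += if cur >= prev { cur - prev } else { cur };
    }
    Some(total)
}

/// Per-second rate between the first and last sample in the window.
fn rate(samples: &[Sample]) -> Option<f64> {
    let elapsed_s = (samples.last()?.0 - samples.first()?.0) as f64 / 1000.0;
    if elapsed_s <= 0.0 {
        return None;
    }
    Some(increase(samples)? / elapsed_s)
}

/// Per-second rate using only the last two samples.
fn irate(samples: &[Sample]) -> Option<f64> {
    if samples.len() < 2 {
        return None;
    }
    rate(&samples[samples.len() - 2..])
}
```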

#### Example Client

Run the example query client to test all query endpoints:

```bash
cargo run --example query_metrics
```

See [`examples/query_metrics.rs`](crates/nightlight-server/examples/query_metrics.rs) for implementation details.

#### Grafana Integration

Configure Grafana to use Nightlight as a Prometheus data source:

1. Add a new Prometheus data source
2. Set URL to `http://localhost:9101`
3. (Optional) Configure mTLS certificates
4. Test connection with instant query

Grafana will automatically use the `/api/v1/query` and `/api/v1/query_range` endpoints for dashboard queries.

## Development Roadmap

The workspace scaffold (S2) provides the foundation. Implementation proceeds in the following stages:

- **S2 (Scaffold)**: Complete - workspace structure, types, protobuf definitions
- **S3 (Push Ingestion)**: Complete - Prometheus remote_write endpoint with validation, compression, and buffering (34 tests passing)
- **S4 (PromQL Engine)**: Complete - Query execution engine with instant/range queries, aggregations, rate functions (42 tests passing)
- **S5 (Storage Layer)**: Implement persistent time-series storage backend
- **S6 (Integration)**: NixOS module, testing, documentation

See [`docs/por/T033-nightlight/task.yaml`](../docs/por/T033-nightlight/task.yaml) for detailed task tracking.

## Integration

### Service Ports

- **9100**: gRPC query API (mTLS)
- **9101**: HTTP API for remote_write ingestion and PromQL queries (mTLS)

### Monitoring

Nightlight exports its own metrics on the standard `/metrics` endpoint for self-monitoring.
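A `/metrics` endpoint conventionally serves the Prometheus text exposition format, one sample per line. The sketch below renders such a line with the escaping that format requires for label values; the metric name used in the usage note is invented for illustration, and the names Nightlight actually exports are not listed in this README.

```rust
/// Escape a label value for the Prometheus text format:
/// backslash, double quote, and newline must be escaped.
fn escape_label_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        match c {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            '\n' => out.push_str("\\n"),
            _ => out.push(c),
        }
    }
    out
}

/// Render one sample line, e.g. `name{job="x"} 42`. (Hypothetical helper.)
fn render_sample(name: &str, labels: &[(&str, &str)], value: f64) -> String {
    let rendered: Vec<String> = labels
        .iter()
        .map(|(k, v)| format!("{}=\"{}\"", k, escape_label_value(v)))
        .collect();
    if rendered.is_empty() {
        format!("{} {}", name, value)
    } else {
        format!("{}{{{}}} {}", name, rendered.join(","), value)
    }
}
```

For example, `render_sample("nightlight_ingested_samples_total", &[("job", "nightlight")], 42.0)` produces a line that a Prometheus scraper can parse, where the metric name is a made-up placeholder.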

## License

MIT OR Apache-2.0

## References

- **Task**: T033 Nightlight (PROJECT.md Item 12)
- **Design**: [`docs/por/T033-nightlight/DESIGN.md`](../docs/por/T033-nightlight/DESIGN.md)
- **Dependencies**: T024 (NixOS), T027 (Unified TLS)
- **Prometheus Remote Write**: https://prometheus.io/docs/concepts/remote_write_spec/
- **PromQL**: https://prometheus.io/docs/prometheus/latest/querying/basics/