T033 Metricstor Validation Plan
Purpose: End-to-end validation checklist for the Metricstor integration fix (ingestion → query roundtrip).
Context: E2E validation (E2E_VALIDATION.md) discovered a critical bug: IngestionService and QueryService use isolated storage, so ingested samples are never visible to queries. PeerB is implementing a fix so that both services share storage. This plan guides validation of that fix.
Owner: PeerA Created: 2025-12-11 Status: Ready (awaiting PeerB fix completion)
1. Pre-Validation Checks
Before starting validation, verify PeerB has completed:
- Code changes committed to main
- Integration test test_ingestion_query_roundtrip exists in tests/integration_test.rs
- Integration test passes: cargo test test_ingestion_query_roundtrip
- All existing tests still pass: cargo test -p metricstor-server
- No new compiler warnings introduced
- PeerB has signaled completion via mailbox
Commands:
# Check git status
cd /home/centra/cloud/metricstor
git log -1 --oneline # Verify recent commit from PeerB
# Run integration test
cargo test test_ingestion_query_roundtrip -- --nocapture
# Run all tests
cargo test -p metricstor-server --no-fail-fast
# Check for warnings
cargo check -p metricstor-server 2>&1 | grep -i warning
2. Test Environment Setup
2.1 Clean Environment
# Stop any running metricstor-server instances
pkill -f metricstor-server || true
# Clean old data directory
rm -rf /home/centra/cloud/metricstor/data
# Rebuild in release mode
cd /home/centra/cloud/metricstor
cargo build --release -p metricstor-server
2.2 Verify plasma-demo-api Running
# Check plasma-demo-api is running (port 3000)
curl -s http://127.0.0.1:3000/metrics | head -5
# If not running, start it:
# cd /home/centra/cloud/docs/por/T029-practical-app-demo
# cargo run --release &
2.3 Start metricstor-server
cd /home/centra/cloud/metricstor
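# Redirect to the log file instead of piping through tee, so that $! is the server's PID
# (with "server | tee validation.log &", $! would be the PID of tee, not the server)
./target/release/metricstor-server > validation.log 2>&1 &
METRICSTOR_PID=$!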
# Wait for startup
sleep 2
# Verify server listening on port 9101
ss -tlnp | grep 9101
3. Test Execution
Test 1: Ingestion → Query Roundtrip (CRITICAL)
1.1 Push Metrics via remote_write
cd /home/centra/cloud/metricstor
cargo run --example push_metrics 2>&1 | tee push_output.txt
# Expected output:
# "Successfully pushed 3 samples to http://127.0.0.1:9101/api/v1/write"
Success Criteria:
- HTTP 204 response received
- No errors in push_output.txt
- Server logs show "Received 3 samples" (check validation.log)
1.2 Query Pushed Metrics (CRITICAL FIX VALIDATION)
# Query the metric we just pushed
curl -s "http://127.0.0.1:9101/api/v1/query?query=http_requests_total" | jq '.'
# Expected output:
# {
# "status": "success",
# "data": {
# "resultType": "vector",
# "result": [
# {
# "metric": {
# "__name__": "http_requests_total",
# "method": "GET",
# "status": "200"
# },
# "value": [<timestamp>, "100"]
# },
# {
# "metric": {
# "__name__": "http_requests_total",
# "method": "POST",
# "status": "201"
# },
# "value": [<timestamp>, "50"]
# }
# ]
# }
# }
Success Criteria:
- ✅ "status": "success"
- ✅ result array is NOT empty (critical fix - was empty before)
- ✅ Contains 2 series (GET and POST)
- ✅ Values match pushed data (100 and 50)
CRITICAL: If result is empty, the fix did NOT work. Stop validation and notify PeerB.
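If the same criteria are also asserted inside a Rust test, they could be written as a small predicate. This is only a sketch: the Series type below is a hypothetical, already-decoded view of one data.result entry, not a type from metricstor, and the expected values (GET = 100, POST = 50) are the ones listed above.

```rust
use std::collections::BTreeMap;

/// Hypothetical, already-decoded view of one entry in `data.result`.
struct Series {
    labels: BTreeMap<String, String>,
    value: f64,
}

/// Checks the Test 1 success criteria against a decoded result vector.
fn check_roundtrip(result: &[Series]) -> Result<(), String> {
    // An empty result is the original bug: ingested data is not queryable.
    if result.is_empty() {
        return Err("result is empty: the fix did NOT work".to_string());
    }
    if result.len() != 2 {
        return Err(format!("expected 2 series, found {}", result.len()));
    }
    // Each expected series is looked up by its `method` label.
    for &(method, expected) in &[("GET", 100.0), ("POST", 50.0)] {
        let series = result
            .iter()
            .find(|s| s.labels.get("method").map(String::as_str) == Some(method))
            .ok_or_else(|| format!("no series with method={}", method))?;
        if series.value != expected {
            return Err(format!(
                "method={}: expected {}, found {}",
                method, expected, series.value
            ));
        }
    }
    Ok(())
}
```

Returning a message rather than a bare bool keeps the failure reason available for the report to PeerB.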
Test 2: Series Metadata API
2.1 Query All Series
curl -s "http://127.0.0.1:9101/api/v1/series" | jq '.'
# Expected: Array with 2 series objects containing labels
Success Criteria:
- Series array contains at least 2 entries
- Each entry has __name__: "http_requests_total"
2.2 Query Label Values
curl -s "http://127.0.0.1:9101/api/v1/label/method/values" | jq '.'
# Expected output:
# {
# "status": "success",
# "data": ["GET", "POST"]
# }
Success Criteria:
- Returns both "GET" and "POST" values
Test 3: Real-World Scrape (plasma-demo-api)
3.1 Scrape Metrics from plasma-demo-api
# Generate some traffic first
curl http://127.0.0.1:3000/items
curl -X POST http://127.0.0.1:3000/items -H "Content-Type: application/json" -d '{"name":"test"}'
# Fetch metrics from plasma-demo-api
METRICS=$(curl -s http://127.0.0.1:3000/metrics)
# Convert to remote_write format (manual for now, or use existing example)
# This validates real Prometheus-compatible workflow
# NOTE: push_metrics example uses hard-coded data; may need to modify for real scrape
Success Criteria:
- plasma-demo-api exports metrics successfully
- Metrics can be ingested and queried back
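Converting a real scrape to remote_write starts with parsing the text exposition body returned by /metrics. The following is a minimal, std-only sketch of that parsing step, assuming simple lines of the form name{label="value",...} value with an optional trailing timestamp. The Sample type and both function names are hypothetical, and escaped quotes or commas inside label values are not handled.

```rust
use std::collections::BTreeMap;

/// One parsed sample; a hypothetical intermediate type, not metricstor's own.
#[derive(Debug, PartialEq)]
struct Sample {
    name: String,
    labels: BTreeMap<String, String>,
    value: f64,
}

/// Parses one exposition line such as `http_requests_total{method="GET"} 100`.
/// Returns None for blank lines, comments, and lines it cannot parse.
fn parse_line(line: &str) -> Option<Sample> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }

    // Split into the metric name, an optional label block, and the rest.
    let (name, label_block, rest) = match line.find('{') {
        Some(open) => {
            let close = line.rfind('}')?;
            if close < open {
                return None;
            }
            (&line[..open], &line[open + 1..close], &line[close + 1..])
        }
        None => {
            let split = line.find(char::is_whitespace)?;
            (&line[..split], "", &line[split..])
        }
    };

    // Labels are comma-separated key="value" pairs; empty pieces are skipped.
    let mut labels = BTreeMap::new();
    for pair in label_block.split(',').filter(|p| !p.trim().is_empty()) {
        let (key, raw) = pair.split_once('=')?;
        let value = raw.trim().trim_matches('"');
        labels.insert(key.trim().to_string(), value.to_string());
    }

    // The first token after the labels is the value; a trailing timestamp is ignored.
    let value = rest.split_whitespace().next()?.parse::<f64>().ok()?;
    Some(Sample {
        name: name.trim().to_string(),
        labels,
        value,
    })
}

/// Parses a whole /metrics body, keeping only the lines that parse.
fn parse_metrics(body: &str) -> Vec<Sample> {
    body.lines().filter_map(parse_line).collect()
}
```

The parsed samples would still need to be encoded as a remote_write request before being pushed; that step depends on how the push_metrics example builds its payload and is not sketched here.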
Test 4: Persistence Validation
4.1 Restart Server and Query Again
# Stop server gracefully
kill -TERM $METRICSTOR_PID
sleep 2
# Verify data saved to disk
ls -lh /home/centra/cloud/metricstor/data/metricstor.db
# Restart server
cd /home/centra/cloud/metricstor
./target/release/metricstor-server 2>&1 | tee validation_restart.log &
sleep 2
# Query again (should still return data from before restart)
curl -s "http://127.0.0.1:9101/api/v1/query?query=http_requests_total" | jq '.data.result | length'
# Expected output: 2 (same data as before restart)
Success Criteria:
- Data file exists and has non-zero size
- Server restarts successfully
- Query returns same data as before restart (persistence works)
4. Integration Test Verification
Run PeerB's new integration test:
cd /home/centra/cloud/metricstor
cargo test test_ingestion_query_roundtrip -- --nocapture --test-threads=1
# Expected: Test PASSES
# This test should verify POST /write -> GET /query returns data
Success Criteria:
- Test passes without errors
- Test output shows successful ingestion and query
- No race conditions or timing issues
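One way to keep timing issues out of a roundtrip test is to join all writers before reading, instead of sleeping. The sketch below illustrates that idea with a plain Vec behind std::sync::RwLock; the function name and the use of OS threads are assumptions for illustration, and PeerB's actual test may be async and structured differently.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Writes `per_writer` samples from each of `writers` threads into one shared
/// buffer, joins every thread, and returns how many samples a reader then sees.
fn concurrent_roundtrip(writers: usize, per_writer: usize) -> usize {
    let shared: Arc<RwLock<Vec<f64>>> = Arc::new(RwLock::new(Vec::new()));

    let handles: Vec<_> = (0..writers)
        .map(|w| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                for i in 0..per_writer {
                    // The write lock is taken per sample and released at the end of each iteration.
                    let mut guard = shared.write().expect("lock poisoned");
                    guard.push((w * per_writer + i) as f64);
                }
            })
        })
        .collect();

    // Joining before reading removes the need for sleeps in the test.
    for handle in handles {
        handle.join().expect("writer thread panicked");
    }

    let count = shared.read().expect("lock poisoned").len();
    count
}
```

A deterministic count after the joins (writers × per_writer) is the property to assert; a test that only passes with an added sleep would point to a remaining ordering problem.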
5. Evidence Collection
5.1 Test Results Summary
# Create evidence summary file (EOF is unquoted so that $(date -Iseconds) is expanded)
cat > /home/centra/cloud/docs/por/T033-metricstor/VALIDATION_EVIDENCE.md <<EOF
# T033 Metricstor Validation Evidence
**Date:** $(date -Iseconds)
**Validator:** PeerA
**Fix Implemented By:** PeerB
## Test Results
### Test 1: Ingestion → Query Roundtrip ✅/❌
- Push metrics: [PASS/FAIL]
- Query returns data: [PASS/FAIL]
- Data correctness: [PASS/FAIL]
### Test 2: Series Metadata API ✅/❌
- Series list: [PASS/FAIL]
- Label values: [PASS/FAIL]
### Test 3: Real-World Scrape ✅/❌
- Scrape plasma-demo-api: [PASS/FAIL]
- Query scraped metrics: [PASS/FAIL]
### Test 4: Persistence ✅/❌
- Data saved to disk: [PASS/FAIL]
- Data restored after restart: [PASS/FAIL]
### Integration Test ✅/❌
- test_ingestion_query_roundtrip: [PASS/FAIL]
## Artifacts
- validation.log (server startup logs)
- push_output.txt (ingestion test output)
- validation_restart.log (restart test logs)
## Conclusion
[PASS: MVP-Alpha 12/12 ACHIEVED | FAIL: Additional work required]
EOF
5.2 Capture Logs
# Archive validation logs
mkdir -p /home/centra/cloud/docs/por/T033-metricstor/validation_artifacts
cp validation.log push_output.txt validation_restart.log \
/home/centra/cloud/docs/por/T033-metricstor/validation_artifacts/
5.3 Update Task Status
# If ALL tests pass, update task.yaml status to "complete"
# Add validation evidence to evidence section
# Example evidence entry:
# - path: docs/por/T033-metricstor/VALIDATION_EVIDENCE.md
# note: "Post-fix E2E validation (2025-12-11) - ALL TESTS PASSED"
# outcome: PASS
# details: |
# Validated integration fix by PeerB:
# - ✅ Ingestion → Query roundtrip works (2 series, correct values)
# - ✅ Series metadata API returns data
# - ✅ Persistence across restarts validated
# - ✅ Integration test test_ingestion_query_roundtrip passes
# - Impact: Silent data loss bug FIXED
# - Status: T033 ready for production, MVP-Alpha 12/12 ACHIEVED
6. Decision Criteria
PASS Criteria (Mark T033 Complete)
All of the following must be true:
- ✅ Test 1 (Ingestion → Query) returns non-empty results with correct data
- ✅ Test 2 (Series Metadata) returns expected series and labels
- ✅ Test 4 (Persistence) data survives restart
- ✅ Integration test test_ingestion_query_roundtrip passes
- ✅ All existing tests (57 total) still pass
- ✅ No new compiler warnings
FAIL Criteria (Request Rework)
Any of the following:
- ❌ Query returns empty results (bug not fixed)
- ❌ Integration test fails
- ❌ Existing tests regressed
- ❌ Data not persisted correctly
- ❌ New critical bugs introduced
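The PASS rule is a plain conjunction, which can be made explicit in code. The sketch below uses hypothetical names (Results, Verdict, decide) and covers only the six PASS criteria listed above; "new critical bugs introduced" remains a manual judgment and is not modeled.

```rust
/// Outcome of each PASS criterion; field names are illustrative.
struct Results {
    roundtrip_ok: bool,
    series_metadata_ok: bool,
    persistence_ok: bool,
    integration_test_ok: bool,
    existing_tests_ok: bool,
    no_new_warnings: bool,
}

/// Overall decision, carrying the names of any failed criteria.
#[derive(Debug, PartialEq)]
enum Verdict {
    Pass,
    Fail(Vec<&'static str>),
}

/// PASS only if every criterion holds; otherwise list what failed.
fn decide(r: &Results) -> Verdict {
    let checks = [
        (r.roundtrip_ok, "Test 1 (Ingestion → Query)"),
        (r.series_metadata_ok, "Test 2 (Series Metadata)"),
        (r.persistence_ok, "Test 4 (Persistence)"),
        (r.integration_test_ok, "Integration test"),
        (r.existing_tests_ok, "Existing tests"),
        (r.no_new_warnings, "No new compiler warnings"),
    ];
    let failed: Vec<&'static str> = checks
        .iter()
        .filter(|(ok, _)| !*ok)
        .map(|(_, name)| *name)
        .collect();
    if failed.is_empty() {
        Verdict::Pass
    } else {
        Verdict::Fail(failed)
    }
}
```

The list of failed criteria maps directly onto the "specific test failures" to report to PeerB in the FAIL path.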
7. Post-Validation Actions
If PASS:
- Update task.yaml:
  - Change status: needs-fix → status: complete
  - Add validation evidence to evidence section
- Update POR.md:
  - Change MVP-Alpha from 11/12 to 12/12
  - Add decision log entry: "T033 integration fix validated, MVP-Alpha achieved"
- Notify user via to_user.md:
  - "T033 Metricstor validation COMPLETE - MVP-Alpha 12/12 ACHIEVED"
- Notify PeerB via to_peer.md:
  - "T033 validation passed - excellent fix, integration working correctly"
If FAIL:
- Document failure mode in VALIDATION_EVIDENCE.md
- Notify PeerB via to_peer.md:
  - Specific test failures
  - Observed vs expected behavior
  - Logs and error messages
  - Request for rework or guidance
- Do NOT update task.yaml status
- Do NOT update POR.md MVP status
8. Reference
Related Documents:
- E2E_VALIDATION.md - Original bug discovery report
- task.yaml - Task status and steps
- ../T029-practical-app-demo/ - plasma-demo-api source
Key Files to Inspect:
- metricstor-server/src/main.rs - Service initialization (PeerB's fix should be here)
- metricstor-server/src/ingestion.rs - Ingestion service
- metricstor-server/src/query.rs - Query service
- metricstor-server/tests/integration_test.rs - New roundtrip test
Expected Fix Pattern (from foreman message):
// BEFORE (bug): each service creates its own storage, so ingested samples never reach queries
let ingestion_service = IngestionService::new();
let query_service = QueryService::new_with_persistence(&data_path)?;
// AFTER (fixed): both services hold a handle to the same storage instance
let storage = Arc::new(RwLock::new(QueryableStorage::new()));
let ingestion_service = IngestionService::new(storage.clone());
let query_service = QueryService::new(storage.clone());
// OR: implement a flush mechanism from the ingestion buffer to query storage
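The difference between the two wirings can be reproduced in isolation. The sketch below is self-contained and uses hypothetical stand-ins (Storage, Ingestion, Query) rather than metricstor's real types; it also assumes std::sync::RwLock, whereas the real services may use an async lock.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the real storage type: series name -> samples.
#[derive(Default)]
struct Storage {
    series: HashMap<String, Vec<f64>>,
}

/// Hypothetical write-side service holding a handle to some storage.
struct Ingestion {
    storage: Arc<RwLock<Storage>>,
}

/// Hypothetical read-side service holding a handle to some storage.
struct Query {
    storage: Arc<RwLock<Storage>>,
}

impl Ingestion {
    fn new(storage: Arc<RwLock<Storage>>) -> Self {
        Self { storage }
    }

    fn write(&self, name: &str, value: f64) {
        let mut guard = self.storage.write().expect("storage lock poisoned");
        guard.series.entry(name.to_string()).or_default().push(value);
    }
}

impl Query {
    fn new(storage: Arc<RwLock<Storage>>) -> Self {
        Self { storage }
    }

    fn samples(&self, name: &str) -> Vec<f64> {
        let guard = self.storage.read().expect("storage lock poisoned");
        guard.series.get(name).cloned().unwrap_or_default()
    }
}

/// Fixed wiring: both services share one storage instance.
fn build_shared() -> (Ingestion, Query) {
    let storage = Arc::new(RwLock::new(Storage::default()));
    (Ingestion::new(Arc::clone(&storage)), Query::new(storage))
}

/// Buggy wiring: each service gets its own storage, so writes are invisible to reads.
fn build_isolated() -> (Ingestion, Query) {
    (
        Ingestion::new(Arc::new(RwLock::new(Storage::default()))),
        Query::new(Arc::new(RwLock::new(Storage::default()))),
    )
}
```

With build_shared, a value written through Ingestion is returned by Query::samples. With build_isolated the same query comes back empty, which matches the empty result array seen before the fix.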
END OF VALIDATION PLAN