# T033 Nightlight Validation Plan

**Purpose:** End-to-end validation checklist for the Nightlight integration fix (ingestion → query roundtrip).

**Context:** E2E validation (E2E_VALIDATION.md) discovered a critical bug in which IngestionService and QueryService use isolated storage. PeerB is implementing a fix to share storage. This plan guides validation of that fix.

**Owner:** PeerA

**Created:** 2025-12-11

**Status:** Ready (awaiting PeerB fix completion)

---

## 1. Pre-Validation Checks

**Before starting validation, verify PeerB has completed:**

- [ ] Code changes committed to main
- [ ] Integration test `test_ingestion_query_roundtrip` exists in `tests/integration_test.rs`
- [ ] Integration test passes: `cargo test test_ingestion_query_roundtrip`
- [ ] All existing tests still pass: `cargo test -p nightlight-server`
- [ ] No new compiler warnings introduced
- [ ] PeerB has signaled completion via mailbox

**Commands:**

```bash
# Check git status
cd /home/centra/cloud/nightlight
git log -1 --oneline  # Verify recent commit from PeerB

# Run integration test
cargo test test_ingestion_query_roundtrip -- --nocapture

# Run all tests
cargo test -p nightlight-server --no-fail-fast

# Check for warnings
cargo check -p nightlight-server 2>&1 | grep -i warning
```

---

## 2. Test Environment Setup

**2.1 Clean Environment**

```bash
# Stop any running nightlight-server instances
pkill -f nightlight-server || true

# Clean old data directory
rm -rf /home/centra/cloud/nightlight/data

# Rebuild in release mode
cd /home/centra/cloud/nightlight
cargo build --release -p nightlight-server
```

**2.2 Verify plasma-demo-api Is Running**

```bash
# Check plasma-demo-api is running (port 3000)
curl -s http://127.0.0.1:3000/metrics | head -5

# If not running, start it:
# cd /home/centra/cloud/docs/por/T029-practical-app-demo
# cargo run --release &
```

**2.3 Start nightlight-server**

```bash
cd /home/centra/cloud/nightlight
# Redirect to the log file instead of piping through tee, so that $! is the
# server's PID (with "server | tee ... &", $! would be tee's PID).
# Follow the log with: tail -f validation.log
./target/release/nightlight-server > validation.log 2>&1 &
METRICSTOR_PID=$!

# Wait for startup
sleep 2

# Verify server listening on port 9101
ss -tlnp | grep 9101
```

---

## 3. Test Execution

### Test 1: Ingestion → Query Roundtrip (CRITICAL)

**3.1 Push Metrics via remote_write**

```bash
cd /home/centra/cloud/nightlight
cargo run --example push_metrics 2>&1 | tee push_output.txt

# Expected output:
# "Successfully pushed 3 samples to http://127.0.0.1:9101/api/v1/write"
```

**Success Criteria:**

- HTTP 204 response received
- No errors in push_output.txt
- Server logs show "Received 3 samples" (check validation.log)

**3.2 Query Pushed Metrics (CRITICAL FIX VALIDATION)**

```bash
# Query the metric we just pushed
curl -s "http://127.0.0.1:9101/api/v1/query?query=http_requests_total" | jq '.'

# Expected output:
# {
#   "status": "success",
#   "data": {
#     "resultType": "vector",
#     "result": [
#       {
#         "metric": {
#           "__name__": "http_requests_total",
#           "method": "GET",
#           "status": "200"
#         },
#         "value": [<timestamp>, "100"]
#       },
#       {
#         "metric": {
#           "__name__": "http_requests_total",
#           "method": "POST",
#           "status": "201"
#         },
#         "value": [<timestamp>, "50"]
#       }
#     ]
#   }
# }
```

**Success Criteria:**

- ✅ `"status": "success"`
- ✅ `result` array is NOT empty (critical fix - was empty before)
- ✅ Contains 2 series (GET and POST)
- ✅ Values match pushed data (100 and 50)

**CRITICAL:** If `result` is empty, the fix did NOT work. Stop validation and notify PeerB.

---

### Test 2: Series Metadata API

**2.1 Query All Series**

```bash
curl -s "http://127.0.0.1:9101/api/v1/series" | jq '.'

# Expected: Array with 2 series objects containing labels
```

**Success Criteria:**

- Series array contains at least 2 entries
- Each entry has `__name__: "http_requests_total"`

**2.2 Query Label Values**

```bash
curl -s "http://127.0.0.1:9101/api/v1/label/method/values" | jq '.'

# Expected output:
# {
#   "status": "success",
#   "data": ["GET", "POST"]
# }
```

**Success Criteria:**

- Returns both "GET" and "POST" values

---

### Test 3: Real-World Scrape (plasma-demo-api)

**3.1 Scrape Metrics from plasma-demo-api**

```bash
# Generate some traffic first
curl http://127.0.0.1:3000/items
curl -X POST http://127.0.0.1:3000/items -H "Content-Type: application/json" -d '{"name":"test"}'

# Fetch metrics from plasma-demo-api
METRICS=$(curl -s http://127.0.0.1:3000/metrics)

# Convert to remote_write format (manual for now, or use existing example)
# This validates real Prometheus-compatible workflow

# NOTE: push_metrics example uses hard-coded data; may need to modify for real scrape
```

**Success Criteria:**

- plasma-demo-api exports metrics successfully
- Metrics can be ingested and queried back

---

### Test 4: Persistence Validation

**4.1 Restart Server and Query Again**

```bash
# Stop server gracefully
kill -TERM $METRICSTOR_PID
sleep 2

# Verify data saved to disk
ls -lh /home/centra/cloud/nightlight/data/nightlight.db

# Restart server (same redirect form as in 2.3 so that $! is the server's PID)
cd /home/centra/cloud/nightlight
./target/release/nightlight-server > validation_restart.log 2>&1 &
METRICSTOR_PID=$!
sleep 2

# Query again (should still return data from before restart)
curl -s "http://127.0.0.1:9101/api/v1/query?query=http_requests_total" | jq '.data.result | length'

# Expected output: 2 (same data as before restart)
```

**Success Criteria:**

- Data file exists and has non-zero size
- Server restarts successfully
- Query returns same data as before restart (persistence works)

---

## 4. Integration Test Verification

**Run PeerB's new integration test:**

```bash
cd /home/centra/cloud/nightlight
cargo test test_ingestion_query_roundtrip -- --nocapture --test-threads=1

# Expected: Test PASSES
# This test should verify POST /write -> GET /query returns data
```

**Success Criteria:**

- Test passes without errors
- Test output shows successful ingestion and query
- No race conditions or timing issues

---

## 5. Evidence Collection

**5.1 Test Results Summary**

```bash
# Create evidence summary file
# (unquoted EOF so that $(date -Iseconds) is expanded when the file is written)
cat > /home/centra/cloud/docs/por/T033-nightlight/VALIDATION_EVIDENCE.md <<EOF
# T033 Nightlight Validation Evidence

**Date:** $(date -Iseconds)
**Validator:** PeerA
**Fix Implemented By:** PeerB

## Test Results

### Test 1: Ingestion → Query Roundtrip ✅/❌
- Push metrics: [PASS/FAIL]
- Query returns data: [PASS/FAIL]
- Data correctness: [PASS/FAIL]

### Test 2: Series Metadata API ✅/❌
- Series list: [PASS/FAIL]
- Label values: [PASS/FAIL]

### Test 3: Real-World Scrape ✅/❌
- Scrape plasma-demo-api: [PASS/FAIL]
- Query scraped metrics: [PASS/FAIL]

### Test 4: Persistence ✅/❌
- Data saved to disk: [PASS/FAIL]
- Data restored after restart: [PASS/FAIL]

### Integration Test ✅/❌
- test_ingestion_query_roundtrip: [PASS/FAIL]

## Artifacts
- validation.log (server startup logs)
- push_output.txt (ingestion test output)
- validation_restart.log (restart test logs)

## Conclusion
[PASS: MVP-Alpha 12/12 ACHIEVED | FAIL: Additional work required]
EOF
```

**5.2 Capture Logs**

```bash
# Archive validation logs
mkdir -p /home/centra/cloud/docs/por/T033-nightlight/validation_artifacts
cp validation.log push_output.txt validation_restart.log \
   /home/centra/cloud/docs/por/T033-nightlight/validation_artifacts/
```

**5.3 Update Task Status**

```bash
# If ALL tests pass, update task.yaml status to "complete"
# Add validation evidence to evidence section

# Example evidence entry:
# - path: docs/por/T033-nightlight/VALIDATION_EVIDENCE.md
#   note: "Post-fix E2E validation (2025-12-11) - ALL TESTS PASSED"
#   outcome: PASS
#   details: |
#     Validated integration fix by PeerB:
#     - ✅ Ingestion → Query roundtrip works (2 series, correct values)
#     - ✅ Series metadata API returns data
#     - ✅ Persistence across restarts validated
#     - ✅ Integration test test_ingestion_query_roundtrip passes
#     - Impact: Silent data loss bug FIXED
#     - Status: T033 ready for production, MVP-Alpha 12/12 ACHIEVED
```

---

## 6. Decision Criteria

### PASS Criteria (Mark T033 Complete)

All of the following must be true:

1. ✅ Test 1 (Ingestion → Query) returns non-empty results with correct data
2. ✅ Test 2 (Series Metadata) returns expected series and labels
3. ✅ Test 4 (Persistence) data survives restart
4. ✅ Integration test `test_ingestion_query_roundtrip` passes
5. ✅ All existing tests (57 total) still pass
6. ✅ No new compiler warnings

### FAIL Criteria (Request Rework)

Any of the following:

1. ❌ Query returns empty results (bug not fixed)
2. ❌ Integration test fails
3. ❌ Existing tests regressed
4. ❌ Data not persisted correctly
5. ❌ New critical bugs introduced

---

## 7. Post-Validation Actions

### If PASS:

1. Update task.yaml:
   - Change `status: needs-fix` → `status: complete`
   - Add validation evidence to evidence section
2. Update POR.md:
   - Change MVP-Alpha from 11/12 to 12/12
   - Add decision log entry: "T033 integration fix validated, MVP-Alpha achieved"
3. Notify user via to_user.md:
   - "T033 Nightlight validation COMPLETE - MVP-Alpha 12/12 ACHIEVED"
4. Notify PeerB via to_peer.md:
   - "T033 validation passed - excellent fix, integration working correctly"

### If FAIL:

1. Document failure mode in VALIDATION_EVIDENCE.md
2. Notify PeerB via to_peer.md:
   - Specific test failures
   - Observed vs expected behavior
   - Logs and error messages
   - Request for rework or guidance
3. Do NOT update task.yaml status
4. Do NOT update POR.md MVP status

---

## 8. Reference

**Related Documents:**

- E2E_VALIDATION.md - Original bug discovery report
- task.yaml - Task status and steps
- ../T029-practical-app-demo/ - plasma-demo-api source

**Key Files to Inspect:**

- nightlight-server/src/main.rs - Service initialization (PeerB's fix should be here)
- nightlight-server/src/ingestion.rs - Ingestion service
- nightlight-server/src/query.rs - Query service
- nightlight-server/tests/integration_test.rs - New roundtrip test

**Expected Fix Pattern (from foreman message):**

```rust
// BEFORE (bug):
let ingestion_service = IngestionService::new();
let query_service = QueryService::new_with_persistence(&data_path)?;

// AFTER (fixed):
let storage = Arc::new(RwLock::new(QueryableStorage::new()));
let ingestion_service = IngestionService::new(storage.clone());
let query_service = QueryService::new(storage.clone());

// OR: Implement flush mechanism from ingestion buffer to query storage
```

---

**END OF VALIDATION PLAN**