
T033 Nightlight Validation Plan

Purpose: End-to-end validation checklist for the Nightlight integration fix (ingestion → query roundtrip).

Context: E2E validation (E2E_VALIDATION.md) discovered a critical bug: IngestionService and QueryService use isolated storage, so ingested samples are never visible to queries. PeerB is implementing a fix to share storage between them. This plan guides validation of that fix.

Owner: PeerA

Created: 2025-12-11

Status: Ready (awaiting PeerB fix completion)


1. Pre-Validation Checks

Before starting validation, verify PeerB has completed:

  • Code changes committed to main
  • Integration test test_ingestion_query_roundtrip exists in tests/integration_test.rs
  • Integration test passes: cargo test test_ingestion_query_roundtrip
  • All existing tests still pass: cargo test -p nightlight-server
  • No new compiler warnings introduced
  • PeerB has signaled completion via mailbox

Commands:

# Check git status
cd /home/centra/cloud/nightlight
git log -1 --oneline  # Verify recent commit from PeerB

# Run integration test
cargo test test_ingestion_query_roundtrip -- --nocapture

# Run all tests
cargo test -p nightlight-server --no-fail-fast

# Check for warnings
cargo check -p nightlight-server 2>&1 | grep -i warning
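The `grep -i warning` check lists every warning, not only new ones. One way to make "no new warnings" checkable is to count warnings before and after PeerB's change and compare the two numbers. The sketch below assumes cargo's usual output, where each diagnostic starts with `warning` at the beginning of a line and each crate ends with a `generated N warnings` summary line. `count_warnings` is a hypothetical helper, not part of the repository.

```shell
# count_warnings: reads `cargo check` output on stdin and prints the number of
# warning diagnostics, skipping the per-crate "generated N warnings" summary.
# The trailing `|| true` keeps the exit status 0 when the count is 0
# (grep -c exits 1 when nothing matches).
count_warnings() {
  grep '^warning' | grep -v -c 'generated [0-9][0-9]* warning' || true
}
```

For example, `cargo check -p nightlight-server 2>&1 | count_warnings` can be run on the commit before the fix and again on main. The second number should not be larger than the first.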

2. Test Environment Setup

2.1 Clean Environment

# Stop any running nightlight-server instances
pkill -f nightlight-server || true

# Clean old data directory
rm -rf /home/centra/cloud/nightlight/data

# Rebuild in release mode
cd /home/centra/cloud/nightlight
cargo build --release -p nightlight-server

2.2 Verify plasma-demo-api Running

# Check plasma-demo-api is running (port 3000)
curl -s http://127.0.0.1:3000/metrics | head -5

# If not running, start it:
# cd /home/centra/cloud/docs/por/T029-practical-app-demo
# cargo run --release &

2.3 Start nightlight-server

cd /home/centra/cloud/nightlight
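# Redirect to the log file instead of piping through tee: after "cmd | tee ... &",
# $! is the PID of tee, not of the server. Use "tail -f validation.log" to watch.
./target/release/nightlight-server > validation.log 2>&1 &
METRICSTOR_PID=$!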

# Wait for startup
sleep 2

# Verify server listening on port 9101
ss -tlnp | grep 9101
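A fixed `sleep 2` may be too short on a cold start and longer than needed on a warm one. The sketch below polls for the listening port instead. `retry_until` and `port_9101_listening` are hypothetical helper names, and the `ss` pattern assumes the local address column is followed by a space.

```shell
# retry_until TIMEOUT_SECS CMD [ARGS...]
# Re-runs CMD about once per second until it succeeds (returns 0)
# or TIMEOUT_SECS retries have elapsed (returns 1).
retry_until() {
  retry_timeout=$1
  shift
  retry_elapsed=0
  until "$@"; do
    if [ "$retry_elapsed" -ge "$retry_timeout" ]; then
      return 1
    fi
    sleep 1
    retry_elapsed=$((retry_elapsed + 1))
  done
  return 0
}

# Succeeds once some process listens on TCP port 9101.
port_9101_listening() {
  ss -tln | grep -q ':9101 '
}
```

Usage after starting the server: `retry_until 10 port_9101_listening || echo "nightlight-server did not start" >&2`.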

3. Test Execution

Test 1: Ingestion → Query Roundtrip (CRITICAL)

1.1 Push Metrics via remote_write

cd /home/centra/cloud/nightlight
cargo run --example push_metrics 2>&1 | tee push_output.txt

# Expected output:
# "Successfully pushed 3 samples to http://127.0.0.1:9101/api/v1/write"

Success Criteria:

  • HTTP 204 response received
  • No errors in push_output.txt
  • Server logs show "Received 3 samples" (check validation.log)

1.2 Query Pushed Metrics (CRITICAL FIX VALIDATION)

# Query the metric we just pushed
curl -s "http://127.0.0.1:9101/api/v1/query?query=http_requests_total" | jq '.'

# Expected output:
# {
#   "status": "success",
#   "data": {
#     "resultType": "vector",
#     "result": [
#       {
#         "metric": {
#           "__name__": "http_requests_total",
#           "method": "GET",
#           "status": "200"
#         },
#         "value": [<timestamp>, "100"]
#       },
#       {
#         "metric": {
#           "__name__": "http_requests_total",
#           "method": "POST",
#           "status": "201"
#         },
#         "value": [<timestamp>, "50"]
#       }
#     ]
#   }
# }

Success Criteria:

  • "status": "success"
  • result array is NOT empty (critical fix - was empty before)
  • Contains 2 series (GET and POST)
  • Values match pushed data (100 and 50)

CRITICAL: If result is empty, the fix did NOT work. Stop validation and notify PeerB.
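The success criteria can also be checked mechanically rather than by eye. The sketch below assumes the response shape shown in the expected output and that `jq` is installed (it is already used in this plan). `check_roundtrip` is a hypothetical helper name.

```shell
# check_roundtrip: reads a /api/v1/query response on stdin and exits 0 only if
# the status is "success" and exactly the two pushed series (GET and POST)
# come back with the pushed values. Values are compared numerically, so
# "100" and "100.0" both pass.
check_roundtrip() {
  jq -e '
    .status == "success"
    and ([.data.result[] | {method: .metric.method, value: (.value[1] | tonumber)}]
         | sort_by(.method)
         == [{method: "GET", value: 100}, {method: "POST", value: 50}])
  ' > /dev/null
}
```

Usage: `curl -s "http://127.0.0.1:9101/api/v1/query?query=http_requests_total" | check_roundtrip && echo PASS || echo FAIL`. An empty `result` array makes the check fail, which is the stop condition described above.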


Test 2: Series Metadata API

2.1 Query All Series

curl -s "http://127.0.0.1:9101/api/v1/series" | jq '.'

# Expected: Array with 2 series objects containing labels

Success Criteria:

  • Series array contains at least 2 entries
  • Each entry has __name__: "http_requests_total"

2.2 Query Label Values

curl -s "http://127.0.0.1:9101/api/v1/label/method/values" | jq '.'

# Expected output:
# {
#   "status": "success",
#   "data": ["GET", "POST"]
# }

Success Criteria:

  • Returns both "GET" and "POST" values

Test 3: Real-World Scrape (plasma-demo-api)

3.1 Scrape Metrics from plasma-demo-api

# Generate some traffic first
curl http://127.0.0.1:3000/items
curl -X POST http://127.0.0.1:3000/items -H "Content-Type: application/json" -d '{"name":"test"}'

# Fetch metrics from plasma-demo-api
METRICS=$(curl -s http://127.0.0.1:3000/metrics)

# Convert to remote_write format (manual for now, or use existing example)
# This validates real Prometheus-compatible workflow
# NOTE: push_metrics example uses hard-coded data; may need to modify for real scrape

Success Criteria:

  • plasma-demo-api exports metrics successfully
  • Metrics can be ingested and queried back
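Until the conversion to remote_write exists, a quick sanity check on the scrape is still possible. The sketch below lists the distinct metric names in a Prometheus text-format payload. This confirms that the export works and shows which names to query after ingestion. `metric_names` is a hypothetical helper. It assumes one sample per line in the form `name{labels} value`, with `#` lines as comments.

```shell
# metric_names: reads Prometheus text exposition format on stdin and prints
# each distinct metric name once, sorted.
metric_names() {
  grep -v -e '^#' -e '^[[:space:]]*$' \
    | sed 's/[{ ].*//' \
    | LC_ALL=C sort -u
}
```

Usage with the variable captured above: `printf '%s\n' "$METRICS" | metric_names`.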

Test 4: Persistence Validation

4.1 Restart Server and Query Again

# Stop server gracefully
kill -TERM $METRICSTOR_PID
sleep 2

# Verify data saved to disk
ls -lh /home/centra/cloud/nightlight/data/nightlight.db

# Restart server
cd /home/centra/cloud/nightlight
./target/release/nightlight-server 2>&1 | tee validation_restart.log &
sleep 2

# Query again (should still return data from before restart)
curl -s "http://127.0.0.1:9101/api/v1/query?query=http_requests_total" | jq '.data.result | length'

# Expected output: 2 (same data as before restart)

Success Criteria:

  • Data file exists and has non-zero size
  • Server restarts successfully
  • Query returns same data as before restart (persistence works)
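The length check confirms that two series come back, but not that their labels and values are unchanged. A stricter comparison, sketched below, normalizes the response by dropping the evaluation timestamp and sorting the series, so that snapshots taken before and after the restart can be diffed. `normalize_result` is a hypothetical helper and assumes the response shape shown in Test 1.

```shell
# normalize_result: reads a /api/v1/query response on stdin and prints a
# compact, key-sorted JSON array of {metric, value} objects. Timestamps are
# dropped because they differ between two instant queries.
normalize_result() {
  jq -S -c '[.data.result[] | {metric: .metric, value: .value[1]}] | sort_by(.metric)'
}
```

To use it, pipe the query output through `normalize_result` into `before.json` before stopping the server, do the same into `after.json` after the restart, then run `diff before.json after.json`. No output from `diff` means the data survived unchanged.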

4. Integration Test Verification

Run PeerB's new integration test:

cd /home/centra/cloud/nightlight
cargo test test_ingestion_query_roundtrip -- --nocapture --test-threads=1

# Expected: Test PASSES
# This test should verify POST /write -> GET /query returns data

Success Criteria:

  • Test passes without errors
  • Test output shows successful ingestion and query
  • No race conditions or timing issues

5. Evidence Collection

5.1 Test Results Summary

# Create evidence summary file
# (EOF is left unquoted so that $(date -Iseconds) is expanded when the file is written)
cat > /home/centra/cloud/docs/por/T033-nightlight/VALIDATION_EVIDENCE.md <<EOF
# T033 Nightlight Validation Evidence

**Date:** $(date -Iseconds)
**Validator:** PeerA
**Fix Implemented By:** PeerB

## Test Results

### Test 1: Ingestion → Query Roundtrip ✅/❌
- Push metrics: [PASS/FAIL]
- Query returns data: [PASS/FAIL]
- Data correctness: [PASS/FAIL]

### Test 2: Series Metadata API ✅/❌
- Series list: [PASS/FAIL]
- Label values: [PASS/FAIL]

### Test 3: Real-World Scrape ✅/❌
- Scrape plasma-demo-api: [PASS/FAIL]
- Query scraped metrics: [PASS/FAIL]

### Test 4: Persistence ✅/❌
- Data saved to disk: [PASS/FAIL]
- Data restored after restart: [PASS/FAIL]

### Integration Test ✅/❌
- test_ingestion_query_roundtrip: [PASS/FAIL]

## Artifacts
- validation.log (server startup logs)
- push_output.txt (ingestion test output)
- validation_restart.log (restart test logs)

## Conclusion
[PASS: MVP-Alpha 12/12 ACHIEVED | FAIL: Additional work required]
EOF

5.2 Capture Logs

# Archive validation logs
mkdir -p /home/centra/cloud/docs/por/T033-nightlight/validation_artifacts
cp validation.log push_output.txt validation_restart.log \
   /home/centra/cloud/docs/por/T033-nightlight/validation_artifacts/

5.3 Update Task Status

# If ALL tests pass, update task.yaml status to "complete"
# Add validation evidence to evidence section

# Example evidence entry:
# - path: docs/por/T033-nightlight/VALIDATION_EVIDENCE.md
#   note: "Post-fix E2E validation (2025-12-11) - ALL TESTS PASSED"
#   outcome: PASS
#   details: |
#     Validated integration fix by PeerB:
#     - ✅ Ingestion → Query roundtrip works (2 series, correct values)
#     - ✅ Series metadata API returns data
#     - ✅ Persistence across restarts validated
#     - ✅ Integration test test_ingestion_query_roundtrip passes
#     - Impact: Silent data loss bug FIXED
#     - Status: T033 ready for production, MVP-Alpha 12/12 ACHIEVED

6. Decision Criteria

PASS Criteria (Mark T033 Complete)

All of the following must be true:

  1. Test 1 (Ingestion → Query) returns non-empty results with correct data
  2. Test 2 (Series Metadata) returns expected series and labels
  3. Test 4 (Persistence) data survives restart
  4. Integration test test_ingestion_query_roundtrip passes
  5. All existing tests (57 total) still pass
  6. No new compiler warnings

FAIL Criteria (Request Rework)

Any of the following:

  1. Query returns empty results (bug not fixed)
  2. Integration test fails
  3. Existing tests regressed
  4. Data not persisted correctly
  5. New critical bugs introduced

7. Post-Validation Actions

If PASS:

  1. Update task.yaml:
    • Change status: needs-fix → status: complete
    • Add validation evidence to evidence section
  2. Update POR.md:
    • Change MVP-Alpha from 11/12 to 12/12
    • Add decision log entry: "T033 integration fix validated, MVP-Alpha achieved"
  3. Notify user via to_user.md:
    • "T033 Nightlight validation COMPLETE - MVP-Alpha 12/12 ACHIEVED"
  4. Notify PeerB via to_peer.md:
    • "T033 validation passed - excellent fix, integration working correctly"

If FAIL:

  1. Document failure mode in VALIDATION_EVIDENCE.md
  2. Notify PeerB via to_peer.md:
    • Specific test failures
    • Observed vs expected behavior
    • Logs and error messages
    • Request for rework or guidance
  3. Do NOT update task.yaml status
  4. Do NOT update POR.md MVP status

8. Reference

Related Documents:

  • E2E_VALIDATION.md - Original bug discovery report
  • task.yaml - Task status and steps
  • ../T029-practical-app-demo/ - plasma-demo-api source

Key Files to Inspect:

  • nightlight-server/src/main.rs - Service initialization (PeerB's fix should be here)
  • nightlight-server/src/ingestion.rs - Ingestion service
  • nightlight-server/src/query.rs - Query service
  • nightlight-server/tests/integration_test.rs - New roundtrip test

Expected Fix Pattern (from foreman message):

// BEFORE (bug):
let ingestion_service = IngestionService::new();
let query_service = QueryService::new_with_persistence(&data_path)?;

// AFTER (fixed):
let storage = Arc::new(RwLock::new(QueryableStorage::new()));
let ingestion_service = IngestionService::new(storage.clone());
let query_service = QueryService::new(storage.clone());
// OR: Implement flush mechanism from ingestion buffer to query storage

END OF VALIDATION PLAN