# FlareDB SQL Layer Design

## Overview

This document outlines the design for a SQL-compatible layer built on top of FlareDB's KVS foundation. The goal is to support SQL statements (DDL and DML) while reusing FlareDB's existing distributed KVS capabilities.

## Architecture Principles

1. **KVS Foundation**: All SQL data is stored as KVS key-value pairs
2. **Simple First**: Start with a core SQL subset (no JOINs and no transactions initially)
3. **Efficient Encoding**: Optimize key encoding for range scans
4. **Namespace Isolation**: Use FlareDB namespaces for multi-tenancy

## Key Design Decisions

### 1. SQL Parser

**Choice**: Use the `sqlparser-rs` crate

- Mature, well-tested SQL parser
- Supports MySQL/PostgreSQL/ANSI SQL dialects
- Easy to extend for custom syntax

### 2. Table Metadata Schema

Table metadata is stored in the KVS under a reserved prefix:

```
Key: __sql_meta:tables:{table_name}
Value: TableMetadata {
    table_id: u32,
    table_name: String,
    columns: Vec<ColumnDef>,
    primary_key: Vec<String>,
    created_at: u64,
}

ColumnDef {
    name: String,
    data_type: DataType,
    nullable: bool,
    default_value: Option<Value>,
}

DataType enum:
- Integer
- BigInt
- Text
- Boolean
- Timestamp
```

Table ID allocation:

```
Key: __sql_meta:next_table_id
Value: u32 (monotonic counter)
```

### 3. Row Key Encoding

Each table row is stored under a key derived from the table ID and the primary key:

```
Format: __sql_data:{table_id}:{primary_key_encoded}

Example:
  Table: users (table_id=1)
  Primary key: id=42
  Key: __sql_data:1:42
```

For composite primary keys:

```
Format: __sql_data:{table_id}:{pk1}:{pk2}:...

Example:
  Table: order_items (table_id=2)
  Primary key: (order_id=100, item_id=5)
  Key: __sql_data:2:100:5
```

For `raw_scan()` to return rows in primary-key order, the encoded key must sort byte-wise in the same order as the value it encodes. Plain decimal text does not (`10` sorts before `9`), so the decimal form in these examples should be read as illustrative, with integer components stored in a fixed-width, order-preserving form.

### 4. Row Value Encoding

Row values are stored as serialized structs:

```
Value: RowData {
    columns: HashMap<String, Value>,
    version: u64,  // For optimistic concurrency
}

Value enum:
- Null
- Integer(i64)
- Text(String)
- Boolean(bool)
- Timestamp(u64)
```

Serialization: Use `bincode` for efficient binary encoding

### 5. Query Execution Engine

Simple query execution pipeline:

```
SQL String
    ↓
[Parser]
    ↓
Abstract Syntax Tree (AST)
    ↓
[Planner]
    ↓
Execution Plan
    ↓
[Executor]
    ↓
Result Set
```

**Supported Operations (v1):**

DDL:
- CREATE TABLE
- DROP TABLE

DML:
- INSERT INTO ... VALUES (...)
- SELECT * FROM table WHERE ...
- SELECT col1, col2 FROM table WHERE ...
- UPDATE table SET ... WHERE ...
- DELETE FROM table WHERE ...

**WHERE Clause Support:**
- Simple comparisons: =, !=, <, >, <=, >=
- Logical operators: AND, OR, NOT
- Primary key lookups (optimized)
- Full table scans (for non-PK queries)

**Query Optimization:**
- Primary key point lookups → raw_get()
- Primary key range queries → raw_scan()
- Non-indexed queries → full table scan

### 6. API Surface

New gRPC service: `SqlService`

```protobuf
service SqlService {
  rpc Execute(SqlRequest) returns (SqlResponse);
  rpc Query(SqlRequest) returns (stream RowBatch);
}

message SqlRequest {
  string namespace = 1;
  string sql = 2;
}

message SqlResponse {
  oneof result {
    DdlResult ddl_result = 1;
    DmlResult dml_result = 2;
    QueryResult query_result = 3;
    ErrorResult error = 4;
  }
}

message DdlResult {
  string message = 1;  // "Table created", "Table dropped"
}

message DmlResult {
  uint64 rows_affected = 1;
}

message QueryResult {
  repeated string columns = 1;
  repeated Row rows = 2;
}

message Row {
  repeated Value values = 1;
}

message Value {
  oneof value {
    int64 int_value = 1;
    string text_value = 2;
    bool bool_value = 3;
    uint64 timestamp_value = 4;
  }
  bool is_null = 5;
}
```

### 7. Namespace Integration

The SQL layer respects FlareDB namespaces:

- Each namespace has isolated SQL tables
- Table IDs are namespace-scoped
- Metadata keys include the namespace prefix

```
Key format with namespace:
  {namespace_id}:__sql_meta:tables:{table_name}
  {namespace_id}:__sql_data:{table_id}:{primary_key}
```

## Implementation Plan

### Phase 1: Core Infrastructure (S2)

- Table metadata storage
- CREATE TABLE / DROP TABLE
- Table ID allocation

### Phase 2: Row Storage (S3)

- Row key/value encoding
- INSERT statement
- UPDATE statement
- DELETE statement

### Phase 3: Query Engine (S4)

- SELECT parser
- WHERE clause evaluator
- Result set builder
- Table scan implementation

### Phase 4: Integration (S5)

- E2E tests
- Example application
- Performance benchmarks

## Performance Considerations

1. **Primary Key Lookups**: One `raw_get()` call per lookup (O(1) KVS operations)
2. **Range Scans**: O(log N + K) via `raw_scan()` for K matching rows, provided the key encoding preserves primary-key order
3. **Full Table Scans**: O(N), unavoidable without indexes
4. **Metadata Access**: Cached in memory for frequently accessed tables

## Future Enhancements (Out of Scope)

1. **Secondary Indexes**: Additional KVS entries for non-PK queries
2. **JOINs**: Multi-table query support
3. **Transactions**: ACID guarantees across multiple operations
4. **Query Optimizer**: Cost-based query planning
5. **SQL Standard Compliance**: More data types, functions, etc.

## Testing Strategy

1. **Unit Tests**: Parser, executor, encoding/decoding
2. **Integration Tests**: Full SQL operations via gRPC
3. **E2E Tests**: Real-world application scenarios
4. **Performance Tests**: Benchmark vs PostgreSQL/SQLite baseline

## Example Usage

```rust
// Create connection (`mut` because tonic-generated client methods take `&mut self`)
let mut client = SqlServiceClient::connect("http://127.0.0.1:8001").await?;

// Create table
client.execute(SqlRequest {
    namespace: "default".to_string(),
    sql: "CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        email TEXT,
        created_at TIMESTAMP
    )".to_string(),
}).await?;

// Insert data
client.execute(SqlRequest {
    namespace: "default".to_string(),
    sql: "INSERT INTO users (id, name, email) VALUES (1, 'Alice', 'alice@example.com')".to_string(),
}).await?;

// Query data (the response carries a stream of row batches)
let response = client.query(SqlRequest {
    namespace: "default".to_string(),
    sql: "SELECT * FROM users WHERE id = 1".to_string(),
}).await?;
```

## Success Criteria

- ✓ CREATE/DROP TABLE working
- ✓ INSERT/UPDATE/DELETE working
- ✓ SELECT with WHERE clause working
- ✓ Primary key lookups optimized
- ✓ Integration tests passing
- ✓ Example application demonstrating CRUD

## References

- sqlparser-rs: https://github.com/sqlparser-rs/sqlparser-rs
- FlareDB KVS API: flaredb/proto/kvrpc.proto
- RocksDB encoding: https://github.com/facebook/rocksdb/wiki
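
## Appendix: Order-Preserving Key Encoding Sketch

The Row Key Encoding section writes keys such as `__sql_data:1:42` with the primary key in decimal text. If those bytes were stored literally, a range scan would see `10` before `9`. The following is a minimal sketch of one common alternative, assuming integer primary keys and a byte-wise sorted KVS. The function names (`encode_i64_ordered`, `decode_i64_ordered`, `row_key`) and the choice of a big-endian `u32` for the table ID are illustrative assumptions, not part of FlareDB's actual API.

```rust
/// Hypothetical helper: encode an i64 so that byte-wise comparison of the
/// output matches numeric comparison of the input.
fn encode_i64_ordered(v: i64) -> [u8; 8] {
    // Flipping the sign bit maps i64::MIN..=i64::MAX onto 0..=u64::MAX in
    // order; big-endian bytes then compare the same way the numbers do.
    ((v as u64) ^ (1u64 << 63)).to_be_bytes()
}

/// Inverse of `encode_i64_ordered`.
fn decode_i64_ordered(b: [u8; 8]) -> i64 {
    (u64::from_be_bytes(b) ^ (1u64 << 63)) as i64
}

/// Hypothetical row-key builder: same logical layout as
/// `__sql_data:{table_id}:{pk1}:{pk2}:...`, but with fixed-width binary
/// components, so no separator is needed between them.
fn row_key(table_id: u32, pk: &[i64]) -> Vec<u8> {
    let mut key = Vec::with_capacity(11 + 4 + 8 * pk.len());
    key.extend_from_slice(b"__sql_data:");
    key.extend_from_slice(&table_id.to_be_bytes());
    for part in pk {
        key.extend_from_slice(&encode_i64_ordered(*part));
    }
    key
}

fn main() {
    // Decimal text sorts 10 before 9 ...
    assert!("__sql_data:1:10" < "__sql_data:1:9");
    // ... while the binary form keeps numeric order, including negatives.
    assert!(row_key(1, &[9]) < row_key(1, &[10]));
    assert!(row_key(1, &[-5]) < row_key(1, &[3]));
    // Composite keys order by the first component, then the second.
    assert!(row_key(2, &[100, 5]) < row_key(2, &[100, 6]));
    assert!(row_key(2, &[100, 6]) < row_key(2, &[101, 0]));
    // Round trip.
    assert_eq!(decode_i64_ordered(encode_i64_ordered(-42)), -42);
}
```

Fixed-width components also avoid ambiguity when a value could contain the `:` separator. Text primary keys would need a separate scheme, such as escaping or length-prefixing, which this sketch does not cover.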
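
## Appendix: Access-Path Selection Sketch

The Query Optimization rules (point lookup → `raw_get()`, range query → `raw_scan()`, everything else → full table scan) can be illustrated with a small planner function. This is a simplified sketch under stated assumptions: a single-column integer primary key, conditions combined only with AND, and an executor that re-checks every condition on the rows it reads, so the chosen path only has to cover a superset of the matching rows. The types `Op`, `Cond`, and `AccessPath` and the function `choose_access_path` are hypothetical names introduced for this example.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op { Eq, Ne, Lt, Le, Gt, Ge }

/// One `column <op> value` comparison from an AND-ed WHERE clause.
struct Cond<'a> {
    column: &'a str,
    op: Op,
    value: i64,
}

#[derive(Debug, PartialEq)]
enum AccessPath {
    /// One raw_get() on the encoded primary key.
    PointGet(i64),
    /// raw_scan() over an inclusive primary-key range.
    RangeScan { lo: Option<i64>, hi: Option<i64> },
    /// raw_scan() over the whole table prefix, filtering each row.
    FullScan,
}

fn choose_access_path(pk: &str, conds: &[Cond<'_>]) -> AccessPath {
    let mut lo: Option<i64> = None;
    let mut hi: Option<i64> = None;
    // Only conditions on the primary-key column can narrow the key range.
    for c in conds.iter().filter(|c| c.column == pk) {
        match c.op {
            Op::Eq => return AccessPath::PointGet(c.value),
            Op::Ge => lo = Some(lo.map_or(c.value, |l| l.max(c.value))),
            Op::Gt => {
                let b = c.value.saturating_add(1);
                lo = Some(lo.map_or(b, |l| l.max(b)));
            }
            Op::Le => hi = Some(hi.map_or(c.value, |h| h.min(c.value))),
            Op::Lt => {
                let b = c.value.saturating_sub(1);
                hi = Some(hi.map_or(b, |h| h.min(b)));
            }
            Op::Ne => {} // excludes a single key; does not bound the range
        }
    }
    if lo.is_none() && hi.is_none() {
        AccessPath::FullScan
    } else {
        AccessPath::RangeScan { lo, hi }
    }
}

fn main() {
    // WHERE id = 1
    let point = [Cond { column: "id", op: Op::Eq, value: 1 }];
    assert_eq!(choose_access_path("id", &point), AccessPath::PointGet(1));

    // WHERE id > 10 AND id <= 20
    let range = [
        Cond { column: "id", op: Op::Gt, value: 10 },
        Cond { column: "id", op: Op::Le, value: 20 },
    ];
    assert_eq!(
        choose_access_path("id", &range),
        AccessPath::RangeScan { lo: Some(11), hi: Some(20) }
    );

    // WHERE age = 7 (no primary-key condition)
    let other = [Cond { column: "age", op: Op::Eq, value: 7 }];
    assert_eq!(choose_access_path("id", &other), AccessPath::FullScan);
}
```

A WHERE clause containing OR or NOT would fall back to a full scan in this scheme unless the planner first normalizes the expression, which is left out here.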