# SQL Query Engine Feature, Dev, and Test Plan

This document outlines the plan for adding comprehensive SQL support to SeaweedFS, focusing on schematized Message Queue (MQ) topics with full DDL and DML capabilities, plus querying of S3 objects.

## Feature Plan

**1. Goal**

To provide a full-featured SQL interface for SeaweedFS, treating schematized MQ topics as database tables with complete DDL/DML support. This enables:

- Database-like operations on MQ topics (CREATE TABLE, ALTER TABLE, DROP TABLE)
- Advanced querying with SELECT, WHERE, JOIN, and aggregations
- Schema management and metadata operations (SHOW DATABASES, SHOW TABLES)
- In-place analytics on Parquet-stored messages without data movement

**2. Key Features**

* **Schematized Topic Management (Priority 1):**
  * `SHOW DATABASES` - List all MQ namespaces
  * `SHOW TABLES` - List all topics in a namespace
  * `CREATE TABLE topic_name (field1 INT, field2 STRING, ...)` - Create a new MQ topic with a schema
  * `ALTER TABLE topic_name ADD COLUMN field3 BOOL` - Modify a topic schema (with versioning)
  * `DROP TABLE topic_name` - Delete an MQ topic
  * `DESCRIBE table_name` - Show topic schema details
* **Advanced Query Engine (Priority 1):**
  * Full `SELECT` support with `WHERE`, `ORDER BY`, `LIMIT`, `OFFSET`
  * Aggregation functions: `COUNT()`, `SUM()`, `AVG()`, `MIN()`, `MAX()`, `GROUP BY`
  * Join operations between topics (leveraging the Parquet columnar format)
  * Window functions and advanced analytics
  * Temporal queries with timestamp-based filtering
* **S3 Select (Priority 2):**
  * Support for querying objects in standard data formats (CSV, JSON, Parquet)
  * Queries executed directly on storage nodes to minimize data transfer
* **User Interfaces:**
  * New API endpoint `/sql` for HTTP-based SQL execution
  * New CLI command `weed sql` with an interactive shell mode
  * Optional: Web UI for query execution and result visualization
* **Output Formats:**
  * JSON (default), CSV, and Parquet for result sets
  * Streaming results for large queries
  * Pagination support for result navigation

## Development Plan

**1. Scaffolding & Dependencies**

* **SQL Parser:** **IMPORTANT ARCHITECTURAL DECISION**
  * **Current Implementation:** Native PostgreSQL parser (`pg_query_go`)
  * **PostgreSQL Compatibility Issue:** A MySQL-dialect parser behind the PostgreSQL wire protocol creates a dialect mismatch:
    * **Identifier Quoting:** PostgreSQL uses `"identifiers"` vs MySQL `` `identifiers` ``
    * **String Concatenation:** PostgreSQL uses `||` vs MySQL `CONCAT()`
    * **System Functions:** PostgreSQL has unique `pg_catalog` system functions
  * **Recommended Alternatives for Better PostgreSQL Compatibility:**
    * **`pg_query_go`** - Pure PostgreSQL dialect parser (best compatibility)
    * **Generic SQL parsers** supporting multiple dialects
    * **Custom translation layer** (current mitigation strategy)
  * **Current Mitigation:** Query translation in `protocol.go` handles PostgreSQL-specific queries
  * **Trade-off:** Implementation complexity vs dialect compatibility
* **Project Structure** (core interfaces sketched below):
  * Extend the existing `weed/query/` package for the SQL execution engine
  * Create `weed/query/engine/` for query planning and execution
  * Create `weed/query/metadata/` for schema catalog management
  * Integration point in `weed/mq/` for topic-to-table mapping
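To make the proposed layout concrete, the following Go sketch shows how the `weed/query/engine/` and `weed/query/metadata/` packages could expose their core types. All names (`Catalog`, `SQLEngine`, `QueryResult`, and so on) are illustrative assumptions, not a finalized API.

```go
// Hypothetical core interfaces for the proposed weed/query/engine and
// weed/query/metadata packages; names and signatures are illustrative only.
package engine

import "context"

// TableInfo describes one MQ topic exposed as a SQL table.
type TableInfo struct {
	Database string // MQ namespace
	Table    string // MQ topic name
	Columns  []Column
	Version  int // schema version
}

// Column is a single field of a topic schema.
type Column struct {
	Name string
	Type string // e.g. "INT", "STRING", "BOOL", "BIGINT"
}

// Catalog maps MQ namespaces/topics to SQL databases/tables
// (would live in weed/query/metadata).
type Catalog interface {
	ListDatabases(ctx context.Context) ([]string, error)
	ListTables(ctx context.Context, database string) ([]string, error)
	DescribeTable(ctx context.Context, database, table string) (*TableInfo, error)
}

// QueryResult is a streaming result set.
type QueryResult interface {
	Columns() []string
	Next() (row []interface{}, ok bool, err error)
	Close() error
}

// SQLEngine parses, plans, and executes a single SQL statement
// (would live in weed/query/engine).
type SQLEngine interface {
	Execute(ctx context.Context, sql string) (QueryResult, error)
}
```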
**2. SQL Engine Architecture**

* **Schema Catalog:**
  * Leverage the existing `weed/mq/schema/` infrastructure
  * Map MQ namespaces to "databases" and topics to "tables"
  * Store schema metadata with version history
  * Handle schema evolution and migration
* **Query Planner:**
  * Parse SQL into an AST using the chosen parser (see the SQL Parser decision above)
  * Create optimized execution plans that leverage the Parquet columnar format
  * Push predicates down to the storage layer for efficient filtering
  * Optimize joins using partition pruning
* **Query Executor:**
  * Utilize the existing `weed/mq/logstore/` for Parquet reading
  * Implement streaming execution for large result sets
  * Support parallel processing across topic partitions
  * Handle schema evolution during query execution

**3. Data Source Integration**

* **MQ Topic Connector (Primary):**
  * Build on the existing `weed/mq/logstore/read_parquet_to_log.go`
  * Implement efficient Parquet scanning with predicate pushdown
  * Support schema evolution and backward compatibility
  * Handle partition-based parallelism for scalable queries
* **Schema Registry Integration:**
  * Extend `weed/mq/schema/schema.go` for SQL metadata operations
  * Implement DDL operations that modify the underlying MQ topic schemas
  * Version control for schema changes with migration support
* **S3 Connector (Secondary):**
  * Read data from S3 objects with CSV, JSON, and Parquet parsers
  * Stream large files efficiently with columnar optimizations

**4. API & CLI Integration**

* **HTTP API Endpoint:**
  * Add a `/sql` endpoint to the Filer server following existing patterns in `weed/server/filer_server.go` (an example client sketch follows the usage scenarios below)
  * Support both POST (for queries) and GET (for metadata operations)
  * Include query result pagination and streaming
  * Authentication and authorization integration
* **CLI Command:**
  * New `weed sql` command with an interactive shell mode (similar to `weed shell`)
  * Support for script execution and result formatting
  * Connection management for remote SeaweedFS clusters
* **gRPC API:**
  * Add a SQL service to the existing MQ broker gRPC interface
  * Enable efficient query execution with streaming results

## Example Usage Scenarios

**Scenario 1: Topic Management**

```sql
-- List all namespaces (databases)
SHOW DATABASES;

-- List topics in a namespace
USE my_namespace;
SHOW TABLES;

-- Create a new topic with schema
CREATE TABLE user_events (
    user_id INT,
    event_type STRING,
    timestamp BIGINT,
    metadata STRING
);

-- Modify topic schema
ALTER TABLE user_events ADD COLUMN session_id STRING;

-- View topic structure
DESCRIBE user_events;
```

**Scenario 2: Data Querying**

```sql
-- Basic filtering and projection
SELECT user_id, event_type, timestamp
FROM user_events
WHERE timestamp > 1640995200000
ORDER BY timestamp DESC
LIMIT 100;

-- Aggregation queries
SELECT event_type, COUNT(*) as event_count
FROM user_events
WHERE timestamp >= 1640995200000
GROUP BY event_type;

-- Cross-topic joins
SELECT u.user_id, u.event_type, p.product_name
FROM user_events u
JOIN product_catalog p ON u.product_id = p.id
WHERE u.event_type = 'purchase';
```

**Scenario 3: Analytics & Monitoring**

```sql
-- Time-series analysis
SELECT
    DATE_TRUNC('hour', FROM_UNIXTIME(timestamp/1000)) as hour,
    COUNT(*) as events_per_hour
FROM user_events
WHERE timestamp >= 1640995200000
GROUP BY hour
ORDER BY hour;

-- Real-time monitoring
SELECT event_type, AVG(response_time) as avg_response
FROM api_logs
WHERE timestamp >= UNIX_TIMESTAMP() - 3600
GROUP BY event_type
HAVING avg_response > 1000;
```
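The statements above would typically be submitted through the proposed `/sql` HTTP endpoint. The following Go sketch shows how a client might POST a query and decode a JSON result set; the request/response shape (`query` field, `columns`/`rows` arrays) and the Filer address are assumptions for illustration, not a finalized API.

```go
// Minimal, hypothetical client for the proposed /sql endpoint.
// The JSON request/response shapes below are assumptions, not a final API.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

type sqlRequest struct {
	Query string `json:"query"`
}

type sqlResponse struct {
	Columns []string        `json:"columns"`
	Rows    [][]interface{} `json:"rows"`
}

func main() {
	body, _ := json.Marshal(sqlRequest{
		Query: "SELECT user_id, event_type FROM user_events LIMIT 10",
	})

	// Assumes a Filer listening on its default port with the /sql endpoint enabled.
	resp, err := http.Post("http://localhost:8888/sql", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var result sqlResponse
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		log.Fatal(err)
	}

	fmt.Println(result.Columns)
	for _, row := range result.Rows {
		fmt.Println(row)
	}
}
```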
## Architecture Overview

```
SQL Query Flow:

┌─────────────┐     ┌──────────────┐     ┌─────────────────┐     ┌──────────────┐
│   Client    │     │  SQL Parser  │     │  Query Planner  │     │  Execution   │
│  (CLI/HTTP) │──→  │  PostgreSQL  │──→  │  & Optimizer    │──→  │    Engine    │
│             │     │  (pg_query)  │     │                 │     │              │
└─────────────┘     └──────────────┘     └─────────────────┘     └──────────────┘
                                                  │                     │
                                                  ▼                     │
                    ┌─────────────────────────────────────────────────┐ │
                    │                 Schema Catalog                  │ │
                    │  • Namespace → Database mapping                 │ │
                    │  • Topic → Table mapping                        │ │
                    │  • Schema version management                    │ │
                    └─────────────────────────────────────────────────┘ │
                                                                        │
                                                                        ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              MQ Storage Layer                               │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐      │
│  │   Topic A   │   │   Topic B   │   │   Topic C   │   │     ...     │      │
│  │  (Parquet)  │   │  (Parquet)  │   │  (Parquet)  │   │  (Parquet)  │      │
│  └─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘      │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Key Design Decisions

**1. SQL-to-MQ Mapping Strategy:**

* MQ Namespaces ↔ SQL Databases
* MQ Topics ↔ SQL Tables
* Topic Partitions ↔ Table Shards (transparent to users)
* Schema Fields ↔ Table Columns

**2. Schema Evolution Handling:**

* Maintain schema version history in topic metadata
* Support backward-compatible queries across schema versions
* Automatic type coercion where possible
* Clear error messages for incompatible changes

**3. Query Optimization:**

* Leverage the Parquet columnar format for projection pushdown
* Use topic partitioning for parallel query execution
* Implement predicate pushdown to minimize data scanning (illustrated in the sketch after this section)
* Cache frequently accessed schema metadata

**4. SQL Parser Dialect Strategy:**

* **Challenge:** PostgreSQL wire protocol + MySQL-dialect parser = compatibility gap
* **Current Approach:** Translation layer in `protocol.go` for PostgreSQL-specific queries
* **Supported Translation:** System queries (`version()`, `BEGIN`, `COMMIT`), error codes, type mapping
* **Known Limitations:**
  * Identifier quoting differences (`"` vs `` ` ``)
  * Function differences (`||` vs `CONCAT()`)
  * System catalog access (`pg_catalog.*`)
* **Future Migration Path:** Consider `pg_query_go` for full PostgreSQL dialect support
* **Trade-off Decision:** Rapid development with a translation layer vs pure dialect compatibility

**5. Transaction Semantics:**

* DDL operations (CREATE/ALTER/DROP) are atomic per topic
* SELECT queries provide read-consistent snapshots
* No cross-topic transactions initially (future enhancement)

**6. Performance Considerations:**

* Prioritize read performance over write consistency
* Leverage MQ's natural partitioning for parallel queries
* Use Parquet metadata for query optimization
* Implement connection pooling and query caching
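To make the predicate-pushdown idea in decision 3 concrete, here is a minimal Go sketch that skips entire Parquet row groups using their column min/max statistics. The `RowGroupStats` type and `CanSkipRowGroup` function are illustrative assumptions; a real implementation would read these statistics from the Parquet footers via `weed/mq/logstore/`.

```go
// Illustrative predicate pushdown against Parquet row-group statistics.
// Types and function names are assumptions, not the final implementation.
package engine

// ColumnStats holds the min/max values a Parquet footer records per column.
type ColumnStats struct {
	Min int64
	Max int64
}

// RowGroupStats maps column names to their statistics for one row group.
type RowGroupStats struct {
	Columns map[string]ColumnStats
}

// RangePredicate represents a simple "column >= Low AND column <= High" filter,
// e.g. the timestamp range in "WHERE timestamp >= 1640995200000".
type RangePredicate struct {
	Column string
	Low    int64
	High   int64
}

// CanSkipRowGroup returns true when the row group's statistics prove that
// no row can satisfy the predicate, so the row group is never read or decoded.
func CanSkipRowGroup(stats RowGroupStats, p RangePredicate) bool {
	cs, ok := stats.Columns[p.Column]
	if !ok {
		// No statistics for this column: nothing can be proven, so scan it.
		return false
	}
	// Skip when the requested range [Low, High] does not overlap [Min, Max].
	return cs.Max < p.Low || cs.Min > p.High
}
```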
## Implementation Phases

**Phase 1: Core SQL Infrastructure (Weeks 1-3)**

1. Use the native PostgreSQL parser (`pg_query_go`) for better PostgreSQL compatibility
2. Create the `weed/query/engine/` package with a basic SQL execution framework
3. Implement a metadata catalog mapping MQ topics to SQL tables
4. Basic `SHOW DATABASES`, `SHOW TABLES`, `DESCRIBE` commands

**Phase 2: DDL Operations (Weeks 4-5)**

1. `CREATE TABLE` → Create an MQ topic with a schema
2. `ALTER TABLE` → Modify a topic schema with versioning
3. `DROP TABLE` → Delete an MQ topic
4. Schema validation and migration handling

**Phase 3: Query Engine (Weeks 6-8)**

1. `SELECT` with `WHERE`, `ORDER BY`, `LIMIT`, `OFFSET`
2. Aggregation functions and `GROUP BY`
3. Basic joins between topics
4. Predicate pushdown to the Parquet layer

**Phase 4: API & CLI Integration (Weeks 9-10)**

1. HTTP `/sql` endpoint implementation
2. `weed sql` CLI command with interactive mode
3. Result streaming and pagination
4. Error handling and query optimization

## Test Plan

**1. Unit Tests**

* **SQL Parser Tests:** Validate parsing of all supported DDL/DML statements
* **Schema Mapping Tests:** Test topic-to-table conversion and metadata operations
* **Query Planning Tests:** Verify optimization and predicate pushdown logic (a minimal example sketch appears at the end of this document)
* **Execution Engine Tests:** Test query execution with various data patterns
* **Edge Cases:** Malformed queries, schema evolution, concurrent operations

**2. Integration Tests**

* **End-to-End Workflow:** Complete SQL operations against a live SeaweedFS cluster
* **Schema Evolution:** Test backward compatibility during schema changes
* **Multi-Topic Joins:** Validate cross-topic query performance and correctness
* **Large Dataset Tests:** Performance validation with GB-scale Parquet data
* **Concurrent Access:** Multiple SQL sessions operating simultaneously

**3. Performance & Security Testing**

* **Query Performance:** Benchmark latency for various query patterns
* **Memory Usage:** Monitor resource consumption during large result sets
* **Scalability Tests:** Performance across multiple partitions and topics
* **SQL Injection Prevention:** Security validation of the parser and execution engine
* **Fuzz Testing:** Automated testing with malformed SQL inputs

## Success Metrics

* **Feature Completeness:** Support for all specified DDL/DML operations
* **Performance:** Query latency < 100ms for simple selects, < 1s for complex joins
* **Scalability:** Handle topics with millions of messages efficiently
* **Reliability:** 99.9% success rate for valid SQL operations
* **Usability:** Intuitive SQL interface matching standard database expectations
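As an illustration of the table-driven unit-test style called out in the Test Plan, the sketch below exercises the hypothetical `CanSkipRowGroup` helper shown earlier in the predicate-pushdown example, under the same assumptions about package and type names.

```go
// Table-driven unit test for the hypothetical CanSkipRowGroup helper
// sketched in the predicate-pushdown example above (same assumed package).
package engine

import "testing"

func TestCanSkipRowGroup(t *testing.T) {
	stats := RowGroupStats{
		Columns: map[string]ColumnStats{
			"timestamp": {Min: 1_640_995_200_000, Max: 1_640_998_800_000},
		},
	}

	cases := []struct {
		name string
		pred RangePredicate
		want bool
	}{
		{"range overlaps row group", RangePredicate{"timestamp", 1_640_996_000_000, 1_641_000_000_000}, false},
		{"range entirely after row group", RangePredicate{"timestamp", 1_641_000_000_000, 1_642_000_000_000}, true},
		{"range entirely before row group", RangePredicate{"timestamp", 0, 1_640_000_000_000}, true},
		{"column without statistics", RangePredicate{"user_id", 0, 10}, false},
	}

	for _, c := range cases {
		t.Run(c.name, func(t *testing.T) {
			if got := CanSkipRowGroup(stats, c.pred); got != c.want {
				t.Errorf("CanSkipRowGroup(%+v) = %v, want %v", c.pred, got, c.want)
			}
		})
	}
}
```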