chrislu
d60c542ecc
feat: Replace pg_query_go with lightweight SQL parser (no CGO required)
- Remove github.com/pganalyze/pg_query_go/v6 dependency to avoid CGO requirement
- Implement lightweight SQL parser for basic SELECT, SHOW, and DDL statements
- Fix operator precedence in WHERE clause parsing (handle AND/OR before comparisons)
- Support INTEGER, FLOAT, and STRING literals in WHERE conditions
- All SQL engine tests passing with new parser
- PostgreSQL integration tests can now build without CGO
The lightweight parser handles the essential SQL features needed for the
SeaweedFS query engine while maintaining compatibility and avoiding CGO
dependencies that caused Docker build issues.
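
The precedence fix noted above (AND/OR handled before comparisons) can be illustrated with a minimal recursive-descent sketch. This is not the parser from this commit: the names `Expr`, `parser`, and `ParseWhere` are hypothetical, and the sketch assumes whitespace-separated tokens with no parentheses and no string literals containing spaces.

```go
package main

import (
	"fmt"
	"strings"
)

// Expr is a hypothetical AST node: a boolean node (Op is "AND" or "OR")
// with two children, or a leaf comparison (Op is empty, Cmp holds the text).
type Expr struct {
	Op          string
	Left, Right *Expr
	Cmp         string
}

// String renders the tree with explicit parentheses so grouping is visible.
func (e *Expr) String() string {
	if e.Op == "" {
		return e.Cmp
	}
	return "(" + e.Left.String() + " " + e.Op + " " + e.Right.String() + ")"
}

type parser struct {
	toks []string
	pos  int
}

// peek returns the next token upper-cased, or "" at end of input.
func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return strings.ToUpper(p.toks[p.pos])
	}
	return ""
}

// parseBinary parses a left-associative chain of `op` over operands
// produced by `next`. OR is parsed over AND, and AND over comparisons,
// so OR binds loosest and comparisons bind tightest.
func (p *parser) parseBinary(op string, next func() (*Expr, error)) (*Expr, error) {
	left, err := next()
	if err != nil {
		return nil, err
	}
	for p.peek() == op {
		p.pos++
		right, err := next()
		if err != nil {
			return nil, err
		}
		left = &Expr{Op: op, Left: left, Right: right}
	}
	return left, nil
}

func (p *parser) parseOr() (*Expr, error)  { return p.parseBinary("OR", p.parseAnd) }
func (p *parser) parseAnd() (*Expr, error) { return p.parseBinary("AND", p.parseComparison) }

// parseComparison consumes exactly three tokens: operand, operator, operand.
func (p *parser) parseComparison() (*Expr, error) {
	if p.pos+3 > len(p.toks) {
		return nil, fmt.Errorf("incomplete comparison at token %d", p.pos)
	}
	e := &Expr{Cmp: strings.Join(p.toks[p.pos:p.pos+3], " ")}
	p.pos += 3
	return e, nil
}

// ParseWhere parses the text that follows the WHERE keyword.
func ParseWhere(s string) (*Expr, error) {
	p := &parser{toks: strings.Fields(s)}
	e, err := p.parseOr()
	if err != nil {
		return nil, err
	}
	if p.pos != len(p.toks) {
		return nil, fmt.Errorf("unexpected token %q", p.toks[p.pos])
	}
	return e, nil
}

func main() {
	e, err := ParseWhere("a = 1 OR b > 2 AND c < 3")
	if err != nil {
		panic(err)
	}
	fmt.Println(e) // AND groups tighter than OR
}
```

Splitting on OR first and AND second means a condition such as `a = 1 OR b > 2 AND c < 3` groups as `a = 1 OR (b > 2 AND c < 3)`, which matches standard SQL precedence.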
6 days ago
chrislu
4d9de40c5c
fmt
6 days ago
chrislu
42661ac110
fix tests
6 days ago
chrislu
991247facf
fix tests
6 days ago
chrislu
e3e369c264
change to pg_query_go
6 days ago
chrislu
ba4a8b91d5
fmt
6 days ago
chrislu
59d6806146
fix empty spaces and coercion
6 days ago
chrislu
3fa7670557
fix todo
6 days ago
chrislu
687c5d6bfd
fix tests
6 days ago
chrislu
a7eb178cec
Update engine.go
6 days ago
chrislu
60066a6a4c
read broker, logs, and parquet files
6 days ago
chrislu
d29f54e0be
de-support alter table and drop table
6 days ago
chrislu
8e15fdf2c7
remove sample data
6 days ago
chrislu
f776a49322
avoid sample data
6 days ago
chrislu
ed7102df6e
column name can be on left or right in where conditions
6 days ago
chrislu
586a795b02
return fast on error
7 days ago
chrislu
c9e093194d
setup integration test for postgres
7 days ago
chrislu
6fb88a8edb
buffer start stored as 8 bytes
1 week ago
chrislu
db75742e37
explain with broker buffer
1 week ago
chrislu
467034c8c7
process buffer from brokers
1 week ago
chrislu
7ca3b59c44
save source buffer index start for log files
1 week ago
chrislu
f5ed25f755
fix decoding data
1 week ago
chrislu
99bfe95e51
detailed logs during explain. Fix bugs on reading live logs.
1 week ago
chrislu
c7a0b89067
fix after refactoring
1 week ago
chrislu
e385f0ce7d
refactor
1 week ago
chrislu
61ad3c39ac
add tests
1 week ago
chrislu
4214d765cf
use mock for testing
1 week ago
chrislu
a3f6a5da27
skip
1 week ago
chrislu
dfd0897e49
improve tests
1 week ago
chrislu
7d88a81482
add tests
1 week ago
chrislu
eaa7136c92
explain the execution plan
1 week ago
chrislu
93a09f5da4
explain
1 week ago
chrislu
55cad6dc4a
combine parquet results with live logs
1 week ago
chrislu
e3798c2ec9
sql
1 week ago
chrislu
55dfb97fc8
parquet file generation also remembers the sources
1 week ago
chrislu
2fa8991a52
scan all files
1 week ago
chrislu
c7598d89f1
remove emoji
1 week ago
chrislu
c73ceac79f
use parquet statistics for optimization
1 week ago
chrislu
471ba271dc
fix reading system fields
1 week ago
chrislu
8645f3a264
column name case insensitive, better auto column names
1 week ago
chrislu
32e73811f2
support aggregation functions
1 week ago
chrislu
cf9ad26608
scan topic messages
1 week ago
chrislu
3e54e7356c
show tables works
1 week ago
chrislu
aa883472a5
show databases works
1 week ago
chrislu
675ec42fad
integer conversion
1 week ago
chrislu
4858f21639
feat: Extended WHERE Operators - Complete Advanced Filtering
✅ **EXTENDED WHERE OPERATORS IMPLEMENTED**
go test ./weed/query/engine/ -v | grep -E PASS
1 week ago
chrislu
db363d025d
feat: Time Filter Extraction - Complete Performance Optimization
✅ FOURTH HIGH PRIORITY TODO COMPLETED!
⏰ **Time Filter Extraction & Push-Down Optimization** (engine.go:198-199)
- Replaced hardcoded StartTimeNs=0, StopTimeNs=0 with intelligent extraction
- Added extractTimeFilters() with recursive WHERE clause analysis
- Smart time column detection (_timestamp_ns, created_at, timestamp, etc.)
- Comprehensive time value parsing (nanoseconds, ISO dates, datetime formats)
- Operator reversal handling (column op value vs value op column)
🧠 **Intelligent WHERE Clause Processing:**
- AND expressions: Combine time bounds (intersection) ✅
- OR expressions: Skip extraction (safety) ✅
- Parentheses: Recursive unwrapping ✅
- Comparison operators: >, >=, <, <=, = ✅
- Multiple time formats: nanoseconds, RFC3339, date-only, datetime ✅
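
The time value parsing listed above can be sketched as a chain of fallbacks. This is an illustration only, not the code from this commit: `parseTimeValue` is a hypothetical name, the exact layouts accepted by the engine are an assumption, and values without a zone are treated as UTC.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// parseTimeValue converts a WHERE-clause literal into Unix nanoseconds.
// It tries a raw integer first, then a short list of date layouts.
func parseTimeValue(s string) (int64, error) {
	// Plain integers are taken as nanoseconds since the Unix epoch.
	if ns, err := strconv.ParseInt(s, 10, 64); err == nil {
		return ns, nil
	}
	// Assumed layouts: RFC3339, "datetime", and date-only.
	layouts := []string{time.RFC3339, "2006-01-02 15:04:05", "2006-01-02"}
	for _, layout := range layouts {
		if t, err := time.Parse(layout, s); err == nil {
			return t.UnixNano(), nil
		}
	}
	return 0, fmt.Errorf("unrecognized time value %q", s)
}

func main() {
	for _, v := range []string{"1672531200000000000", "2023-01-01T00:00:00Z", "2023-01-01", "not-a-date"} {
		ns, err := parseTimeValue(v)
		fmt.Println(v, ns, err)
	}
}
```

Returning an error rather than a zero value lets the caller decide whether to fall back to an unbounded scan when a date cannot be parsed.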
🚀 **Performance Impact:**
- Push-down filtering to hybrid scanner level
- Reduced data scanning at source (live logs + Parquet files)
- Time-based partition pruning potential
- Significant performance gains for time-series queries
📊 **Comprehensive Testing (21 tests passing):**
- ✅ Time filter extraction (6 test scenarios)
- ✅ Time column recognition (case-insensitive)
- ✅ Time value parsing (5 formats)
- ✅ Full integration with SELECT queries
- ✅ Backward compatibility maintained
💡 **Real-World Query Examples:**
Before: Scans ALL data, filters in memory
SELECT * FROM events WHERE _timestamp_ns > 1672531200000000000;
After: Scans ONLY relevant time range at source level
→ StartTimeNs=1672531200000000000, StopTimeNs=0
→ Massive performance improvement for large datasets!
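
A rough sketch of how such bounds might be derived from a WHERE tree follows. It is not the `extractTimeFilters()` implementation from this commit: the `And`, `Or`, and `Cmp` node types are hypothetical stand-ins for the real AST, only integer nanosecond literals are handled, 0 is assumed to mean "unbounded" as in the example above, and strict and non-strict operators are treated alike on the assumption that row-level filtering still runs afterwards.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Hypothetical AST nodes for a parsed WHERE clause.
type And struct{ L, R any }
type Or struct{ L, R any }
type Cmp struct{ Left, Op, Right string }

var timeColumns = map[string]bool{"_timestamp_ns": true, "created_at": true, "timestamp": true}

// isTimeColumn matches known time column names case-insensitively.
func isTimeColumn(name string) bool { return timeColumns[strings.ToLower(name)] }

// flipped maps an operator to its mirror for "value op column" comparisons.
var flipped = map[string]string{">": "<", ">=": "<=", "<": ">", "<=": ">=", "=": "="}

// extractTimeFilters returns (startNs, stopNs); 0 means no bound.
func extractTimeFilters(e any) (startNs, stopNs int64) {
	switch n := e.(type) {
	case And:
		// Intersection: latest start, earliest non-zero stop.
		s1, e1 := extractTimeFilters(n.L)
		s2, e2 := extractTimeFilters(n.R)
		startNs = s1
		if s2 > s1 {
			startNs = s2
		}
		stopNs = e1
		if e1 == 0 || (e2 != 0 && e2 < e1) {
			stopNs = e2
		}
		return startNs, stopNs
	case Or:
		// Either branch may match rows outside the other's range, so skip.
		return 0, 0
	case Cmp:
		col, op, val := n.Left, n.Op, n.Right
		if !isTimeColumn(col) {
			// Try the reversed form: value op column.
			col, val, op = n.Right, n.Left, flipped[n.Op]
		}
		if !isTimeColumn(col) {
			return 0, 0
		}
		ns, err := strconv.ParseInt(val, 10, 64)
		if err != nil {
			return 0, 0
		}
		switch op {
		case ">", ">=":
			return ns, 0
		case "<", "<=":
			return 0, ns
		case "=":
			return ns, ns
		}
	}
	return 0, 0
}

func main() {
	where := And{
		L: Cmp{"_timestamp_ns", ">", "1672531200000000000"},
		R: Cmp{"1672617600000000000", ">=", "_TIMESTAMP_NS"},
	}
	start, stop := extractTimeFilters(where)
	fmt.Printf("StartTimeNs=%d, StopTimeNs=%d\n", start, stop)
}
```

Returning no bounds for OR is the conservative choice: the scan stays unbounded and the full predicate is still applied to each row.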
🎯 **Production Ready Features:**
- Multiple time column formats supported
- Graceful fallbacks for invalid dates
- OR clause safety (avoids incorrect optimization)
- Comprehensive error handling
**ALL MEDIUM PRIORITY TODOs NOW READY FOR NEXT PHASE** 🎉
go test ./weed/query/engine/ -v
1 week ago
chrislu
593c1ebef2
fix: Resolve High Priority TODOs - Real MQ Broker Integration
✅ COMPLETED HIGH PRIORITY TODOs:
🔧 **Real FilerClient Integration** (engine.go:131)
- Implemented GetFilerClient() method in BrokerClient
- Added filerClientImpl with full FilerClient interface compliance
- Added AdjustedUrl() and GetDataCenter() methods
- Real filerClient connection replaces nil fallback
🔧 **Partition Discovery via MQ Broker** (hybrid_message_scanner.go:116)
- Added ListTopicPartitions() method using topic configuration
- Implemented discoverTopicPartitions() in HybridMessageScanner
- Reads actual partition count from BrokerPartitionAssignments
- Generates proper partition ranges based on topic.PartitionCount
📋 **Technical Fixes:**
- Fixed compilation errors with undefined variables
- Proper error handling with filerClientErr variable
- Corrected ConfigureTopicResponse field usage (BrokerPartitionAssignments vs PartitionCount)
- Complete FilerClient interface implementation
🎯 **Impact:**
- SQL engine now connects to real MQ broker infrastructure
- Actual topic partition discovery instead of hardcoded defaults
- Production-ready broker integration with graceful fallbacks
- Maintains backward compatibility with sample data when broker unavailable
✅ All tests passing - High priority TODO resolution complete!
Next: Schema-aware message parsing and time filter optimization.
1 week ago
chrislu
fe41380d51
feat: Phase 2 - Add DDL operations and real MQ broker integration
Implements comprehensive DDL support for MQ topic management:
New Components:
- Real MQ broker connectivity via BrokerClient
- CREATE TABLE → ConfigureTopic gRPC calls
- DROP TABLE → DeleteTopic operations
- DESCRIBE table → Schema introspection
- SQL type mapping (SQL ↔ MQ schema types)
Enhanced Features:
- Live topic discovery from MQ broker
- Fallback to cached/sample data when broker unavailable
- MySQL-compatible DESCRIBE output
- Schema validation and error handling
- CREATE TABLE with column definitions
Key Infrastructure:
- broker_client.go: gRPC communication with MQ broker
- sql_types.go: Bidirectional SQL/MQ type conversion
- describe.go: Table schema introspection
- Enhanced engine.go: Full DDL routing and execution
Supported SQL Operations:
✅ SHOW DATABASES, SHOW TABLES (live + fallback)
✅ CREATE TABLE table_name (col1 INT, col2 VARCHAR(50), ...)
✅ DROP TABLE table_name
✅ DESCRIBE table_name / SHOW COLUMNS FROM table_name
Known Limitations:
- SQL parser issues with reserved keywords (e.g., 'timestamp')
- Requires running MQ broker for full functionality
- ALTER TABLE not yet implemented
- DeleteTopic method needs broker-side implementation
Architecture Decisions:
- Broker discovery via filer lock mechanism (same as shell commands)
- Graceful fallback when broker unavailable
- ConfigureTopic for CREATE TABLE with 6 default partitions
- Schema versioning ready for ALTER TABLE support
Testing:
- Unit tests updated with filer address parameter
- Integration tests for DDL operations
- Error handling for connection failures
Next Phase: SELECT query execution with Parquet scanning
1 week ago
chrislu
ad86637e0b
feat: Phase 1 - Add SQL query engine foundation for MQ topics
Implements core SQL infrastructure with metadata operations:
New Components:
- SQL parser integration using github.com/xwb1989/sqlparser
- Query engine framework in weed/query/engine/
- Schema catalog mapping MQ topics to SQL tables
- Interactive SQL CLI command 'weed sql'
Supported Operations:
- SHOW DATABASES (lists MQ namespaces)
- SHOW TABLES (lists MQ topics)
- SQL statement parsing and routing
- Error handling and result formatting
Key Design Decisions:
- MQ namespaces ↔ SQL databases
- MQ topics ↔ SQL tables
- Parquet message storage ready for querying
- Backward-compatible schema evolution support
Testing:
- Unit tests for core engine functionality
- Command integration tests
- Parse error handling validation
Assumptions (documented in code):
- All MQ messages stored in Parquet format
- Schema evolution maintains backward compatibility
- MySQL-compatible SQL syntax via sqlparser
- Single-threaded usage per SQL session
Next Phase: DDL operations (CREATE/ALTER/DROP TABLE)
1 week ago