# Bucket Policy Engine Integration - Complete

## Summary

Successfully integrated the `policy_engine` package to evaluate bucket policies for **all requests**, both anonymous and authenticated. This provides comprehensive AWS S3-compatible bucket policy support.

## What Changed

### 1. **New File: `s3api_bucket_policy_engine.go`**

Created a wrapper around `policy_engine.PolicyEngine` to:

- Load bucket policies from filer entries
- Sync policies from the bucket config cache
- Evaluate policies for any request (bucket, object, action, principal)
- Return structured results (allowed, evaluated, error)

### 2. **Modified: `s3api_server.go`**

- Added a `policyEngine *BucketPolicyEngine` field to the `S3ApiServer` struct
- Initialized the policy engine in `NewS3ApiServerWithStore()`
- Linked `IdentityAccessManagement` back to `S3ApiServer` for policy evaluation

### 3. **Modified: `auth_credentials.go`**

- Added an `s3ApiServer *S3ApiServer` field to the `IdentityAccessManagement` struct
- Added a `buildPrincipalARN()` helper to convert identities to AWS ARN format
- **Integrated bucket policy evaluation into the authentication flow:**
  - Policies are now checked **before** IAM/identity-based permissions
  - An explicit `Deny` in a bucket policy blocks access immediately
  - An explicit `Allow` in a bucket policy grants access and **bypasses IAM checks** (enabling cross-account access)
  - If no policy exists, the request falls through to the normal IAM checks
  - Policy evaluation errors result in access denial (fail-close security)
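
The ARN-building step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the signature and account-ID handling of `buildPrincipalARN()` here are hypothetical, not the actual SeaweedFS helper.

```go
package main

import "fmt"

// buildPrincipalARN maps an authenticated identity to an AWS-style principal
// ARN, and an anonymous request to the wildcard principal "*". The parameters
// are illustrative assumptions for this sketch.
func buildPrincipalARN(accountID, userName string) string {
	if userName == "" {
		return "*" // anonymous requests match the wildcard principal
	}
	return fmt.Sprintf("arn:aws:iam::%s:user/%s", accountID, userName)
}

func main() {
	fmt.Println(buildPrincipalARN("123456789012", "bob")) // arn:aws:iam::123456789012:user/bob
	fmt.Println(buildPrincipalARN("", ""))                // *
}
```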
|
||||
|
|
||||
### 4. **Modified: `s3api_bucket_config.go`** |
|
||||
- Added policy engine sync when bucket configs are loaded |
|
||||
- Ensures policies are loaded into the engine for evaluation |
|
||||
|
|
||||
### 5. **Modified: `auth_credentials_subscribe.go`** |
|
||||
- Added policy engine sync when bucket metadata changes |
|
||||
- Keeps the policy engine up-to-date via event-driven updates |
|
||||
|
|
||||
## How It Works |
|
||||
|
|
||||
### Anonymous Requests |
|
||||
``` |
|
||||
1. Request comes in (no credentials) |
|
||||
2. Check ACL-based public access → if public, allow |
|
||||
3. Check bucket policy for anonymous ("*") access → if allowed, allow |
|
||||
4. Otherwise, deny |
|
||||
``` |
|
||||
|
|
||||
### Authenticated Requests (NEW!) |
|
||||
``` |
|
||||
1. Request comes in (with credentials) |
|
||||
2. Authenticate user → get Identity |
|
||||
3. Build principal ARN (e.g., "arn:aws:iam::123456:user/bob") |
|
||||
4. Check bucket policy: |
|
||||
- If DENY → reject immediately |
|
||||
- If ALLOW → grant access immediately (bypasses IAM checks) |
|
||||
- If no policy or no matching statements → continue to step 5 |
|
||||
5. Check IAM/identity-based permissions (only if not already allowed by bucket policy) |
|
||||
6. Allow or deny based on identity permissions |
|
||||
``` |
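
The decision ordering above can be condensed into a small sketch. The type and function names here are illustrative, not the actual SeaweedFS API:

```go
package main

import "fmt"

// PolicyResult mirrors the three possible bucket-policy outcomes.
type PolicyResult int

const (
	PolicyDeny          PolicyResult = iota // explicit Deny: reject immediately
	PolicyAllow                             // explicit Allow: grant, bypassing IAM
	PolicyIndeterminate                     // no policy, or no matching statement
)

// authorize applies the ordering described above: Deny wins, Allow bypasses
// the IAM check, and an indeterminate result falls through to IAM.
func authorize(policy PolicyResult, iamAllows bool) bool {
	switch policy {
	case PolicyDeny:
		return false
	case PolicyAllow:
		return true
	default:
		return iamAllows
	}
}

func main() {
	fmt.Println(authorize(PolicyDeny, true))          // false: explicit deny wins
	fmt.Println(authorize(PolicyAllow, false))        // true: policy allow bypasses IAM
	fmt.Println(authorize(PolicyIndeterminate, true)) // true: falls back to IAM
}
```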

## Policy Evaluation Flow

```
┌─────────────────────────────────────────────────────────┐
│               Request (GET /bucket/file)                │
└───────────────────────────┬─────────────────────────────┘
                            │
                ┌───────────▼──────────┐
                │  Authenticate User   │
                │   (or Anonymous)     │
                └───────────┬──────────┘
                            │
        ┌───────────────────▼──────────────────────┐
        │  Build Principal ARN                     │
        │  - Anonymous: "*"                        │
        │  - User: "arn:aws:iam::123456:user/bob"  │
        └───────────────────┬──────────────────────┘
                            │
        ┌───────────────────▼──────────────────────┐
        │  Evaluate Bucket Policy (PolicyEngine)   │
        │  - Action: "s3:GetObject"                │
        │  - Resource: "arn:aws:s3:::bucket/file"  │
        │  - Principal: (from above)               │
        └───────────────────┬──────────────────────┘
                            │
              ┌─────────────┼─────────────┐
              │             │             │
         DENY │       ALLOW │   NO POLICY │
              ▼             ▼             ▼
       Reject Request Grant Access    Continue
                                          │
                            ┌─────────────┘
                            │
               ┌────────────▼─────────────┐
               │    IAM/Identity Check    │
               │     (identity.canDo)     │
               └────────────┬─────────────┘
                            │
                  ┌─────────┴─────────┐
                  │                   │
            ALLOW │              DENY │
                  ▼                   ▼
            Grant Access       Reject Request
```

## Example Policies That Now Work

### 1. **Public Read Access** (Anonymous)

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::mybucket/*"
  }]
}
```

- Anonymous users can read all objects
- Authenticated users are also evaluated against this policy; if they don't match an explicit `Allow` for the requested action, they fall back to their own IAM permissions

### 2. **Grant Access to Specific User** (Authenticated)

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::123456789012:user/bob"},
    "Action": ["s3:GetObject", "s3:PutObject"],
    "Resource": "arn:aws:s3:::mybucket/shared/*"
  }]
}
```

- User "bob" can read and write objects under the `/shared/` prefix
- Other users cannot (unless granted by their own IAM policies)

### 3. **Deny Access to Specific Path** (Both)

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": "arn:aws:s3:::mybucket/confidential/*"
  }]
}
```

- **No one** can access `/confidential/` objects
- Denies override all other allows (per AWS policy evaluation rules)

## Performance Characteristics

### Policy Loading

- **Cold start**: Policy loaded from filer → parsed → compiled → cached
- **Warm path**: Policy retrieved from `BucketConfigCache` (already parsed)
- **Updates**: Event-driven sync via metadata subscription (real-time)

### Policy Evaluation

- **Compiled policies**: Pre-compiled regex patterns and matchers
- **Pattern cache**: Regex patterns cached with LRU eviction (max 1000 entries)
- **Fast path**: Common patterns (`*`, exact matches) are optimized
- **Case sensitivity**: Actions are case-insensitive, resources case-sensitive (AWS-compatible)

### Overhead

- **Anonymous requests**: Minimal (policy was already checked; it now uses the compiled engine)
- **Authenticated requests**: ~1-2ms added for policy evaluation (compiled patterns)
- **No policy**: Near-zero overhead (quick indeterminate check)

## Testing

All tests pass:

```
✅ TestBucketPolicyValidationBasics
✅ TestPrincipalMatchesAnonymous
✅ TestActionToS3Action
✅ TestResourceMatching
✅ TestMatchesPatternRegexEscaping (security tests)
✅ TestActionMatchingCaseInsensitive
✅ TestResourceMatchingCaseSensitive
✅ All policy_engine package tests (30+ tests)
```

## Security Improvements

1. **Regex Metacharacter Escaping**: Patterns like `*.json` properly match only files ending in `.json` (not `filexjson`)
2. **Case-Insensitive Actions**: S3 actions are matched case-insensitively per the AWS spec
3. **Case-Sensitive Resources**: Resource paths are matched case-sensitively for security
4. **Pattern Cache Size Limit**: Prevents DoS via unbounded cache growth
5. **Principal Validation**: Supports `[]string` for manually constructed policies
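
The escaping and case-sensitivity behavior listed above can be sketched with Go's standard `regexp` package. This is an illustrative reconstruction, not the actual `policy_engine` code:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// patternToRegexp escapes every regex metacharacter in an S3-style pattern
// and then translates only the wildcards * and ?, so "*.json" matches
// "a.json" but not "axjson". Actions are compiled case-insensitively;
// resources are not.
func patternToRegexp(pattern string, caseInsensitive bool) *regexp.Regexp {
	quoted := regexp.QuoteMeta(pattern)           // "." becomes literal "\."
	quoted = strings.ReplaceAll(quoted, `\*`, ".*")
	quoted = strings.ReplaceAll(quoted, `\?`, ".")
	expr := "^" + quoted + "$"
	if caseInsensitive {
		expr = "(?i)" + expr
	}
	return regexp.MustCompile(expr)
}

func main() {
	re := patternToRegexp("*.json", false)
	fmt.Println(re.MatchString("data.json")) // true
	fmt.Println(re.MatchString("dataxjson")) // false: the dot is literal
}
```

A production engine would additionally cache the compiled patterns (the LRU cache mentioned above) rather than recompiling per request.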

## AWS Compatibility

The implementation follows AWS S3 bucket policy evaluation rules:

1. **Explicit Deny** always wins (checked first)
2. **Explicit Allow** grants access (checked second)
3. **Default Deny** applies if no statements match (implicit)
4. Bucket policies work alongside IAM policies (both are evaluated)

## Files Changed

```
Modified:
  weed/s3api/auth_credentials.go           (+47 lines)
  weed/s3api/auth_credentials_subscribe.go (+8 lines)
  weed/s3api/s3api_bucket_config.go        (+8 lines)
  weed/s3api/s3api_server.go               (+5 lines)

New:
  weed/s3api/s3api_bucket_policy_engine.go (115 lines)
```

## Migration Notes

- **Backward Compatible**: Existing setups without bucket policies work unchanged
- **No Breaking Changes**: All existing ACL- and IAM-based authorization still works
- **Additive Feature**: Bucket policies are an additional layer of authorization
- **Performance**: Minimal impact on existing workloads

## Future Enhancements

Potential improvements (not implemented yet):

- [ ] Condition support (IP address, time-based, etc.) - already in policy_engine
- [ ] Cross-account policies (different AWS accounts)
- [ ] Policy validation API endpoint
- [ ] Policy simulation/testing tool
- [ ] Metrics for policy evaluations (allow/deny counts)

## Conclusion

Bucket policies now work for **all requests** in the SeaweedFS S3 API:

- ✅ Anonymous requests (public access)
- ✅ Authenticated requests (user-specific policies)
- ✅ High performance (compiled policies, caching)
- ✅ AWS-compatible (follows AWS evaluation rules)
- ✅ Secure (proper escaping, correct case sensitivity)

The integration is complete, tested, and ready for use!

---

# SeaweedFS Task Distribution System Design

## Overview

This document describes the design of a distributed task management system for SeaweedFS that handles Erasure Coding (EC) and vacuum operations through a scalable admin server and worker process architecture.

## System Architecture

### High-Level Components

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│     Master      │◄──►│   Admin Server   │◄──►│     Workers     │
│                 │    │                  │    │                 │
│ - Volume Info   │    │ - Task Discovery │    │ - Task Exec     │
│ - Shard Status  │    │ - Task Assign    │    │ - Progress      │
│ - Heartbeats    │    │ - Progress Track │    │ - Error Report  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                      │                       │
         ▼                      ▼                       ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ Volume Servers  │    │  Volume Monitor  │    │ Task Execution  │
│                 │    │                  │    │                 │
│ - Store Volumes │    │ - Health Check   │    │ - EC Convert    │
│ - EC Shards     │    │ - Usage Stats    │    │ - Vacuum Clean  │
│ - Report Status │    │ - State Sync     │    │ - Status Report │
└─────────────────┘    └──────────────────┘    └─────────────────┘
```

## 1. Admin Server Design

### 1.1 Core Responsibilities

- **Task Discovery**: Scan volumes to identify EC and vacuum candidates
- **Worker Management**: Track available workers and their capabilities
- **Task Assignment**: Match tasks to optimal workers
- **Progress Tracking**: Monitor in-progress tasks for capacity planning
- **State Reconciliation**: Sync with the master server for volume state updates

### 1.2 Task Discovery Engine

```go
type TaskDiscoveryEngine struct {
	masterClient  MasterClient
	volumeScanner VolumeScanner
	taskDetectors map[TaskType]TaskDetector
	scanInterval  time.Duration
}

type VolumeCandidate struct {
	VolumeID   uint32
	Server     string
	Collection string
	TaskType   TaskType
	Priority   TaskPriority
	Reason     string
	DetectedAt time.Time
	Parameters map[string]interface{}
}
```

**EC Detection Logic**:
- Find volumes that are >= 95% full and idle for more than 1 hour
- Exclude volumes already in EC format
- Exclude volumes with ongoing operations
- Prioritize by collection and age

**Vacuum Detection Logic**:
- Find volumes with a garbage ratio > 30%
- Exclude read-only volumes
- Exclude volumes with recent vacuum operations
- Prioritize by garbage percentage

### 1.3 Worker Registry & Management

```go
type WorkerRegistry struct {
	workers        map[string]*Worker
	capabilities   map[TaskType][]*Worker
	lastHeartbeat  map[string]time.Time
	taskAssignment map[string]*Task
	mutex          sync.RWMutex
}

type Worker struct {
	ID            string
	Address       string
	Capabilities  []TaskType
	MaxConcurrent int
	CurrentLoad   int
	Status        WorkerStatus
	LastSeen      time.Time
	Performance   WorkerMetrics
}
```

### 1.4 Task Assignment Algorithm

```go
type TaskScheduler struct {
	registry           *WorkerRegistry
	taskQueue          *PriorityQueue
	inProgressTasks    map[string]*InProgressTask
	volumeReservations map[uint32]*VolumeReservation
}

// Worker Selection Criteria:
// 1. Has the required capability (EC or Vacuum)
// 2. Available capacity (CurrentLoad < MaxConcurrent)
// 3. Best performance history for the task type
// 4. Lowest current load
// 5. Geographically close to the volume server (optional)
```
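
The selection criteria above can be sketched as a filter-then-rank step. This is an illustrative sketch: capability filtering (criterion 1) is assumed to have already happened, and the field names are simplified:

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a trimmed-down view of a Worker for selection purposes.
type candidate struct {
	ID          string
	Load        int     // CurrentLoad
	Max         int     // MaxConcurrent
	SuccessRate float64 // performance history for this task type, 0..1
}

// pickWorker filters to workers with spare capacity, then prefers the best
// success rate, breaking ties by the lowest current load.
func pickWorker(workers []candidate) (candidate, bool) {
	var eligible []candidate
	for _, w := range workers {
		if w.Load < w.Max {
			eligible = append(eligible, w)
		}
	}
	if len(eligible) == 0 {
		return candidate{}, false
	}
	sort.Slice(eligible, func(i, j int) bool {
		if eligible[i].SuccessRate != eligible[j].SuccessRate {
			return eligible[i].SuccessRate > eligible[j].SuccessRate
		}
		return eligible[i].Load < eligible[j].Load
	})
	return eligible[0], true
}

func main() {
	w, _ := pickWorker([]candidate{
		{ID: "w1", Load: 2, Max: 2, SuccessRate: 0.99}, // at capacity, filtered out
		{ID: "w2", Load: 1, Max: 4, SuccessRate: 0.90},
		{ID: "w3", Load: 0, Max: 4, SuccessRate: 0.90},
	})
	fmt.Println(w.ID) // w3: same success rate as w2, lower load
}
```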

## 2. Worker Process Design

### 2.1 Worker Architecture

```go
type MaintenanceWorker struct {
	id              string
	config          *WorkerConfig
	adminClient     AdminClient
	taskExecutors   map[TaskType]TaskExecutor
	currentTasks    map[string]*RunningTask
	registry        *TaskRegistry
	heartbeatTicker *time.Ticker
	requestTicker   *time.Ticker
}
```

### 2.2 Task Execution Framework

```go
type TaskExecutor interface {
	Execute(ctx context.Context, task *Task) error
	EstimateTime(task *Task) time.Duration
	ValidateResources(task *Task) error
	GetProgress() float64
	Cancel() error
}

type ErasureCodingExecutor struct {
	volumeClient VolumeServerClient
	progress     float64
	cancelled    bool
}

type VacuumExecutor struct {
	volumeClient VolumeServerClient
	progress     float64
	cancelled    bool
}
```

### 2.3 Worker Capabilities & Registration

```go
type WorkerCapabilities struct {
	SupportedTasks   []TaskType
	MaxConcurrent    int
	ResourceLimits   ResourceLimits
	PreferredServers []string // affinity for specific volume servers
}

type ResourceLimits struct {
	MaxMemoryMB    int64
	MaxDiskSpaceMB int64
	MaxNetworkMbps int64
	MaxCPUPercent  float64
}
```

## 3. Task Lifecycle Management

### 3.1 Task States

```go
type TaskState string

const (
	TaskStatePending    TaskState = "pending"
	TaskStateAssigned   TaskState = "assigned"
	TaskStateInProgress TaskState = "in_progress"
	TaskStateCompleted  TaskState = "completed"
	TaskStateFailed     TaskState = "failed"
	TaskStateCancelled  TaskState = "cancelled"
	TaskStateStuck      TaskState = "stuck"     // taking too long
	TaskStateDuplicate  TaskState = "duplicate" // detected duplicate
)
```

### 3.2 Progress Tracking & Monitoring

```go
type InProgressTask struct {
	Task           *Task
	WorkerID       string
	StartedAt      time.Time
	LastUpdate     time.Time
	Progress       float64
	EstimatedEnd   time.Time
	VolumeReserved bool // reserved for capacity planning
}

type TaskMonitor struct {
	inProgressTasks  map[string]*InProgressTask
	timeoutChecker   *time.Ticker
	stuckDetector    *time.Ticker
	duplicateChecker *time.Ticker
}
```

## 4. Volume Capacity Reconciliation

### 4.1 Volume State Tracking

```go
type VolumeStateManager struct {
	masterClient      MasterClient
	inProgressTasks   map[uint32]*InProgressTask // VolumeID -> Task
	committedChanges  map[uint32]*VolumeChange   // changes not yet in master
	reconcileInterval time.Duration
}

type VolumeChange struct {
	VolumeID         uint32
	ChangeType       ChangeType // "ec_encoding", "vacuum_completed"
	OldCapacity      int64
	NewCapacity      int64
	TaskID           string
	CompletedAt      time.Time
	ReportedToMaster bool
}
```

### 4.2 Shard Assignment Integration

When the master needs to assign shards, it must consider:

1. **Current volume state** from its own records
2. **In-progress capacity changes** from the admin server
3. **Committed but unreported changes** from the admin server

```go
type CapacityOracle struct {
	adminServer AdminServerClient
	masterState *MasterVolumeState
	updateFreq  time.Duration
}

func (o *CapacityOracle) GetAdjustedCapacity(volumeID uint32) int64 {
	baseCapacity := o.masterState.GetCapacity(volumeID)

	// Adjust for in-progress tasks.
	if task := o.adminServer.GetInProgressTask(volumeID); task != nil {
		switch task.Type {
		case TaskTypeErasureCoding:
			// EC reduces effective capacity.
			return baseCapacity / 2 // simplified
		case TaskTypeVacuum:
			// Vacuum may increase available space.
			return baseCapacity + int64(float64(baseCapacity)*0.3)
		}
	}

	// Adjust for completed but not yet reported changes.
	if change := o.adminServer.GetPendingChange(volumeID); change != nil {
		return change.NewCapacity
	}

	return baseCapacity
}
```

## 5. Error Handling & Recovery

### 5.1 Worker Failure Scenarios

```go
type FailureHandler struct {
	taskRescheduler *TaskRescheduler
	workerMonitor   *WorkerMonitor
	alertManager    *AlertManager
}

// Failure Scenarios:
// 1. Worker becomes unresponsive (heartbeat timeout)
// 2. Task execution fails (reported by worker)
// 3. Task gets stuck (progress timeout)
// 4. Duplicate task detection
// 5. Resource exhaustion
```

### 5.2 Recovery Strategies

**Worker Timeout Recovery**:
- Mark the worker as inactive after 3 missed heartbeats
- Reschedule all of its assigned tasks to other workers
- Clean up any partial state

**Task Stuck Recovery**:
- Detect tasks with no progress for more than 2x the estimated time
- Cancel the stuck task and mark the volume for cleanup
- Reschedule if retry count < max_retries

**Duplicate Task Prevention**:

```go
type DuplicateDetector struct {
	activeFingerprints map[string]bool // VolumeID+TaskType
	recentCompleted    *LRUCache       // recently completed tasks
}

func (d *DuplicateDetector) IsTaskDuplicate(task *Task) bool {
	fingerprint := fmt.Sprintf("%d-%s", task.VolumeID, task.Type)
	return d.activeFingerprints[fingerprint] ||
		d.recentCompleted.Contains(fingerprint)
}
```

## 6. Simulation & Testing Framework

### 6.1 Failure Simulation

```go
type TaskSimulator struct {
	scenarios map[string]SimulationScenario
}

type SimulationScenario struct {
	Name            string
	WorkerCount     int
	VolumeCount     int
	FailurePatterns []FailurePattern
	Duration        time.Duration
}

type FailurePattern struct {
	Type        FailureType // "worker_timeout", "task_stuck", "duplicate"
	Probability float64     // 0.0 to 1.0
	Timing      TimingSpec  // when during task execution
	Duration    time.Duration
}
```

### 6.2 Test Scenarios

**Scenario 1: Worker Timeout During EC**
- Start an EC task on a 30GB volume
- Kill the worker at 50% progress
- Verify task reassignment
- Verify no duplicate EC operations

**Scenario 2: Stuck Vacuum Task**
- Start a vacuum on a high-garbage volume
- Simulate the worker hanging at 75% progress
- Verify timeout detection and cleanup
- Verify volume state consistency

**Scenario 3: Duplicate Task Prevention**
- Submit the same EC task from multiple sources
- Verify only one task executes
- Verify proper conflict resolution

**Scenario 4: Master-Admin State Divergence**
- Create an in-progress EC task
- Simulate a master restart
- Verify state reconciliation
- Verify shard assignment accounts for in-progress work

## 7. Performance & Scalability

### 7.1 Metrics & Monitoring

```go
type SystemMetrics struct {
	TasksPerSecond    float64
	WorkerUtilization float64
	AverageTaskTime   time.Duration
	FailureRate       float64
	QueueDepth        int
	VolumeStatesSync  bool
}
```

### 7.2 Scalability Considerations

- **Horizontal Worker Scaling**: Add workers without admin server changes
- **Admin Server HA**: Master-slave admin servers for fault tolerance
- **Task Partitioning**: Partition tasks by collection or datacenter
- **Batch Operations**: Group similar tasks for efficiency

## 8. Implementation Plan

### Phase 1: Core Infrastructure
1. Admin server basic framework
2. Worker registration and heartbeat
3. Simple task assignment
4. Basic progress tracking

### Phase 2: Advanced Features
1. Volume state reconciliation
2. Sophisticated worker selection
3. Failure detection and recovery
4. Duplicate prevention

### Phase 3: Optimization & Monitoring
1. Performance metrics
2. Load balancing algorithms
3. Capacity planning integration
4. Comprehensive monitoring

This design provides a robust, scalable foundation for distributed task management in SeaweedFS while maintaining consistency with the existing architecture patterns.

---

# SQL Query Engine Feature, Dev, and Test Plan

This document outlines the plan for adding SQL querying support to SeaweedFS, focusing on reading and analyzing data from Message Queue (MQ) topics.

## Feature Plan

**1. Goal**

To provide a SQL querying interface for SeaweedFS, enabling analytics on existing MQ topics. This enables:
- Basic querying with SELECT, WHERE, and aggregations on MQ topics
- Schema discovery and metadata operations (SHOW DATABASES, SHOW TABLES, DESCRIBE)
- In-place analytics on Parquet-stored messages without data movement

**2. Key Features**

* **Schema Discovery and Metadata:**
    * `SHOW DATABASES` - list all MQ namespaces
    * `SHOW TABLES` - list all topics in a namespace
    * `DESCRIBE table_name` - show topic schema details
    * Automatic schema detection from existing Parquet data
* **Basic Query Engine:**
    * `SELECT` support with `WHERE`, `LIMIT`, `OFFSET`
    * Aggregation functions: `COUNT()`, `SUM()`, `AVG()`, `MIN()`, `MAX()`
    * Temporal queries with timestamp-based filtering
* **User Interfaces:**
    * New CLI command `weed sql` with an interactive shell mode
    * Optional: web UI for query execution and result visualization
* **Output Formats:**
    * JSON (default), CSV, and Parquet for result sets
    * Streaming results for large queries
    * Pagination support for result navigation

## Development Plan

**3. Data Source Integration**

* **MQ Topic Connector (Primary):**
    * Build on the existing `weed/mq/logstore/read_parquet_to_log.go`
    * Implement efficient Parquet scanning with predicate pushdown
    * Support schema evolution and backward compatibility
    * Handle partition-based parallelism for scalable queries
* **Schema Registry Integration:**
    * Extend `weed/mq/schema/schema.go` for SQL metadata operations
    * Read existing topic schemas for query planning
    * Handle schema evolution during query execution
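
The predicate-pushdown idea mentioned above can be illustrated with a tiny sketch: Parquet row groups carry min/max column statistics, so a timestamp filter can skip whole row groups without reading them. The `rowGroupStats` type here is a hypothetical stand-in, not an actual parquet library API:

```go
package main

import "fmt"

// rowGroupStats models the min/max timestamp statistics a Parquet row group
// carries (illustrative names only).
type rowGroupStats struct {
	MinTs, MaxTs int64
}

// canSkip applies pushdown for a "timestamp > t" filter: a row group whose
// maximum timestamp is at or below the bound cannot contain matching rows,
// so the scanner can skip reading it entirely.
func canSkip(rg rowGroupStats, greaterThan int64) bool {
	return rg.MaxTs <= greaterThan
}

func main() {
	old := rowGroupStats{MinTs: 1000, MaxTs: 2000}
	recent := rowGroupStats{MinTs: 1500, MaxTs: 3000}
	fmt.Println(canSkip(old, 2000))    // true: nothing after the bound
	fmt.Println(canSkip(recent, 2000)) // false: must scan this group
}
```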

**4. API & CLI Integration**

* **CLI Command:**
    * New `weed sql` command with an interactive shell mode (similar to `weed shell`)
    * Support for script execution and result formatting
    * Connection management for remote SeaweedFS clusters
* **gRPC API:**
    * Add an SQL service to the existing MQ broker gRPC interface
    * Enable efficient query execution with streaming results

## Example Usage Scenarios

**Scenario 1: Schema Discovery and Metadata**
```sql
-- List all namespaces (databases)
SHOW DATABASES;

-- List topics in a namespace
USE my_namespace;
SHOW TABLES;

-- View topic structure and discovered schema
DESCRIBE user_events;
```

**Scenario 2: Data Querying**
```sql
-- Basic filtering and projection
SELECT user_id, event_type, timestamp
FROM user_events
WHERE timestamp > 1640995200000
LIMIT 100;

-- Aggregation queries
SELECT COUNT(*) as event_count
FROM user_events
WHERE timestamp >= 1640995200000;

-- More aggregation examples
SELECT MAX(timestamp), MIN(timestamp)
FROM user_events;
```

**Scenario 3: Analytics & Monitoring**
```sql
-- Basic analytics
SELECT COUNT(*) as total_events
FROM user_events
WHERE timestamp >= 1640995200000;

-- Simple monitoring
SELECT AVG(response_time) as avg_response
FROM api_logs
WHERE timestamp >= 1640995200000;
```

## Architecture Overview

```
SQL Query Flow:
  1. Parse SQL           2. Plan & Optimize        3. Execute Query

┌─────────────┐     ┌──────────────┐     ┌─────────────────┐     ┌──────────────┐
│   Client    │     │  SQL Parser  │     │  Query Planner  │     │  Execution   │
│   (CLI)     │ ──→ │  PostgreSQL  │ ──→ │  & Optimizer    │ ──→ │  Engine      │
│             │     │  (Custom)    │     │                 │     │              │
└─────────────┘     └──────────────┘     └─────────────────┘     └──────────────┘
                           │                                            │
                           │ Schema Lookup                              │ Data Access
                           ▼                                            ▼
      ┌─────────────────────────────────────────────────────────────┐
      │                       Schema Catalog                        │
      │  • Namespace → Database mapping                             │
      │  • Topic → Table mapping                                    │
      │  • Schema version management                                │
      └─────────────────────────────────────────────────────────────┘
                           ▲
                           │ Metadata
                           │
┌─────────────────────────────────────────────────────────────────────────────┐
│                              MQ Storage Layer                               │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
│  │   Topic A   │  │   Topic B   │  │   Topic C   │  │     ...     │         │
│  │  (Parquet)  │  │  (Parquet)  │  │  (Parquet)  │  │  (Parquet)  │         │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘         │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Success Metrics

* **Feature Completeness:** Support for all specified SELECT operations and metadata commands
* **Performance:**
    * **Simple SELECT queries**: < 100ms latency for single-table queries with up to 3 WHERE predicates on ≤ 100K records
    * **Complex queries**: < 1s latency for queries involving aggregations (COUNT, SUM, MAX, MIN) on ≤ 1M records
    * **Time-range queries**: < 500ms for timestamp-based filtering on ≤ 500K records within 24-hour windows
* **Scalability:** Handle topics with millions of messages efficiently