7.9 KiB

Raw Permalink Blame History

SeaweedMQ Integration Test Design

Overview

This document outlines the comprehensive integration test strategy for SeaweedMQ, covering all critical functionalities from basic pub/sub operations to advanced features like auto-scaling, failover, and performance testing.

Architecture Under Test

SeaweedMQ consists of:

Masters: Cluster coordination and metadata management
Volume Servers: Storage layer for persistent messages
Filers: File system interface for metadata storage
Brokers: Message processing and routing (stateless)
Agents: Client interface for pub/sub operations
Schema System: Protobuf-based message schema management

Test Categories

1. Basic Functionality Tests

1.1 Basic Pub/Sub Operations

Test: TestBasicPublishSubscribe
- Publish messages to a topic
- Subscribe and receive messages
- Verify message content and ordering
- Test with different data types (string, int, bytes, records)
Test: TestMultipleConsumers
- Multiple subscribers on same topic
- Verify message distribution
- Test consumer group functionality
Test: TestMessageOrdering
- Publish messages in sequence
- Verify FIFO ordering within partitions
- Test with different partition keys

1.2 Schema Management

Test: TestSchemaValidation
- Publish with valid schemas
- Reject invalid schema messages
- Test schema evolution scenarios
Test: TestRecordTypes
- Nested record structures
- List types and complex schemas
- Schema-to-Parquet conversion

2. Partitioning and Scaling Tests

2.1 Partition Management

Test: TestPartitionDistribution
- Messages distributed across partitions based on keys
- Verify partition assignment logic
- Test partition rebalancing
Test: TestAutoSplitMerge
- Simulate high load to trigger auto-split
- Simulate low load to trigger auto-merge
- Verify data consistency during splits/merges

2.2 Broker Scaling

Test: TestBrokerAddRemove
- Add brokers during operation
- Remove brokers gracefully
- Verify partition reassignment
Test: TestLoadBalancing
- Verify even load distribution across brokers
- Test with varying message sizes and rates
- Monitor broker resource utilization

3. Failover and Reliability Tests

3.1 Broker Failover

Test: TestBrokerFailover
- Kill leader broker during publishing
- Verify seamless failover to follower
- Test data consistency after failover
Test: TestBrokerRecovery
- Broker restart scenarios
- State recovery from storage
- Partition reassignment after recovery

3.2 Data Durability

Test: TestMessagePersistence
- Publish messages and restart cluster
- Verify all messages are recovered
- Test with different replication settings
Test: TestFollowerReplication
- Leader-follower message replication
- Verify consistency between replicas
- Test follower promotion scenarios

4. Agent Functionality Tests

4.1 Session Management

Test: TestPublishSessions
- Create/close publish sessions
- Concurrent session management
- Session cleanup after failures
Test: TestSubscribeSessions
- Subscribe session lifecycle
- Consumer group management
- Offset tracking and acknowledgments

4.2 Error Handling

Test: TestConnectionFailures
- Network partitions between agent and broker
- Automatic reconnection logic
- Message buffering during outages

5. Performance and Load Tests

5.1 Throughput Tests

Test: TestHighThroughputPublish
- Publish 100K+ messages/second
- Monitor system resources
- Verify no message loss
Test: TestHighThroughputSubscribe
- Multiple consumers processing high volume
- Monitor processing latency
- Test backpressure handling

5.2 Spike Traffic Tests

Test: TestTrafficSpikes
- Sudden increase in message volume
- Auto-scaling behavior verification
- Resource utilization patterns
Test: TestLargeMessages
- Messages with large payloads (MB size)
- Memory usage monitoring
- Storage efficiency testing

6. End-to-End Scenarios

6.1 Complete Workflow Tests

Test: TestProducerConsumerWorkflow
- Multi-stage data processing pipeline
- Producer → Topic → Multiple Consumers
- Data transformation and aggregation
Test: TestMultiTopicOperations
- Multiple topics with different schemas
- Cross-topic message routing
- Topic management operations

Test Infrastructure

Environment Setup

Docker Compose Configuration

# test-environment.yml
version: '3.9'
services:
  master-cluster:
    # 3 master nodes for HA
  volume-cluster:
    # 3 volume servers for data storage
  filer-cluster:
    # 2 filers for metadata
  broker-cluster:
    # 3 brokers for message processing
  test-runner:
    # Container to run integration tests

Test Data Management

Pre-defined test schemas
Sample message datasets
Performance benchmarking data

Test Framework Structure

// Base test framework
type IntegrationTestSuite struct {
    masters     []string
    brokers     []string
    filers      []string
    testClient  *TestClient
    cleanup     []func()
}

// Test utilities
type TestClient struct {
    publishers  map[string]*pub_client.TopicPublisher
    subscribers map[string]*sub_client.TopicSubscriber
    agents      []*agent.MessageQueueAgent
}

Monitoring and Metrics

Health Checks

Broker connectivity status
Master cluster health
Storage system availability
Network connectivity between components

Performance Metrics

Message throughput (msgs/sec)
End-to-end latency
Resource utilization (CPU, Memory, Disk)
Network bandwidth usage

Test Execution Strategy

Parallel Test Execution

Categorize tests by resource requirements
Run independent tests in parallel
Serialize tests that modify cluster state

Continuous Integration

Automated test runs on PR submissions
Performance regression detection
Multi-platform testing (Linux, macOS, Windows)

Test Environment Management

Docker-based isolated environments
Automatic cleanup after test completion
Resource monitoring and alerts

Success Criteria

Functional Requirements

✅ All messages published are received by subscribers
✅ Message ordering preserved within partitions
✅ Schema validation works correctly
✅ Auto-scaling triggers at expected thresholds
✅ Failover completes within 30 seconds
✅ No data loss during normal operations

Performance Requirements

✅ Throughput: 50K+ messages/second/broker
✅ Latency: P95 < 100ms end-to-end
✅ Memory usage: < 1GB per broker under normal load
✅ Storage efficiency: < 20% overhead vs raw message size

Reliability Requirements

✅ 99.9% uptime during normal operations
✅ Automatic recovery from single component failures
✅ Data consistency maintained across all scenarios
✅ Graceful degradation under resource constraints

Implementation Timeline

Phase 1: Core Functionality (Week 1-2)

Basic pub/sub tests
Schema validation tests
Simple failover scenarios

Phase 2: Advanced Features (Week 3-4)

Auto-scaling tests
Complex failover scenarios
Agent functionality tests

Phase 3: Performance & Load (Week 5-6)

Throughput and latency tests
Spike traffic handling
Resource utilization monitoring

Phase 4: End-to-End (Week 7-8)

Complete workflow tests
Multi-component integration
Performance regression testing

Maintenance and Updates

Regular Updates

Add tests for new features
Update performance baselines
Enhance error scenarios coverage

Test Data Refresh

Generate new test datasets quarterly
Update schema examples
Refresh performance benchmarks

This comprehensive test design ensures SeaweedMQ's reliability, performance, and functionality across all critical use cases and failure scenarios.

7.9 KiB Raw Permalink Blame History