# Kafka Client Load Test for SeaweedFS

This comprehensive load testing suite validates the SeaweedFS MQ stack using real Kafka client libraries. Unlike the existing SMQ tests, this uses actual Kafka clients (`sarama` and `confluent-kafka-go`) to test the complete integration through:

- **Kafka Clients** → **SeaweedFS Kafka Gateway** → **SeaweedFS MQ Broker** → **SeaweedFS Storage**

## Architecture

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────────┐
│  Kafka Client   │    │  Kafka Gateway   │    │   SeaweedFS MQ      │
│   Load Test     │───▶│   (Port 9093)    │───▶│     Broker          │
│  - Producers    │    │                  │    │                     │
│  - Consumers    │    │   Protocol       │    │  Topic Management   │
│                 │    │   Translation    │    │  Message Storage    │
└─────────────────┘    └──────────────────┘    └─────────────────────┘
                                                          │
                                                          ▼
                                               ┌─────────────────────┐
                                               │  SeaweedFS Storage  │
                                               │  - Master           │
                                               │  - Volume Server    │
                                               │  - Filer            │
                                               └─────────────────────┘
```

## Features

### 🚀 **Multiple Test Modes**

- **Producer-only**: Pure message production testing
- **Consumer-only**: Consumption from existing topics
- **Comprehensive**: Full producer + consumer load testing

### 📊 **Rich Metrics & Monitoring**

- Prometheus metrics collection
- Grafana dashboards
- Real-time throughput and latency tracking
- Consumer lag monitoring
- Error rate analysis

### 🔧 **Configurable Test Scenarios**

- **Quick Test**: 1-minute smoke test
- **Standard Test**: 5-minute medium load
- **Stress Test**: 10-minute high load
- **Endurance Test**: 30-minute sustained load
- **Custom**: Fully configurable parameters

### 📈 **Message Types**

- **JSON**: Structured test messages
- **Avro**: Schema Registry integration
- **Binary**: Raw binary payloads

### 🛠 **Kafka Client Support**

- **Sarama**: Native Go Kafka client
- **Confluent**: Official Confluent Go client
- Schema Registry integration
- Consumer group management

## Quick Start

### Prerequisites

- Docker & Docker Compose
- Make (optional, but recommended)

### 1. Run Default Test

```bash
make test
```

This runs a 5-minute comprehensive test with 10 producers and 5 consumers.

### 2. Quick Smoke Test

```bash
make quick-test
```

A 1-minute test with minimal load for validation.

### 3. Stress Test

```bash
make stress-test
```

A 10-minute high-throughput test with 20 producers and 10 consumers.

### 4. Test with Monitoring

```bash
make test-with-monitoring
```

Includes Prometheus + Grafana dashboards for real-time monitoring.
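### Connecting with a Plain Kafka Client

Because the gateway speaks the standard Kafka wire protocol on port 9093, any stock client can also be pointed at it directly, outside the test harness. The sketch below sends a single message with `sarama`; the bootstrap address (`localhost:9093`), the topic name (`loadtest-smoke`), and the `github.com/IBM/sarama` import path are assumptions for illustration, not values taken from the harness configuration.

```go
package main

import (
	"log"

	"github.com/IBM/sarama" // older builds use github.com/Shopify/sarama
)

func main() {
	cfg := sarama.NewConfig()
	cfg.Producer.RequiredAcks = sarama.WaitForAll
	cfg.Producer.Return.Successes = true // required by the sync producer

	// The gateway exposes the Kafka protocol on port 9093 (assumed reachable
	// as localhost:9093 when the compose stack is running locally).
	producer, err := sarama.NewSyncProducer([]string{"localhost:9093"}, cfg)
	if err != nil {
		log.Fatalf("connect to gateway: %v", err)
	}
	defer producer.Close()

	partition, offset, err := producer.SendMessage(&sarama.ProducerMessage{
		Topic: "loadtest-smoke", // hypothetical topic name
		Value: sarama.StringEncoder("hello from a plain Kafka client"),
	})
	if err != nil {
		log.Fatalf("produce: %v", err)
	}
	log.Printf("stored at partition %d, offset %d", partition, offset)
}
```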
## Detailed Usage

### Manual Control

```bash
# Start infrastructure only
make start

# Run load test against running infrastructure
make test TEST_MODE=comprehensive TEST_DURATION=10m

# Stop everything
make stop

# Clean up all resources
make clean
```

### Using Scripts Directly

```bash
# Full control with the main script
./scripts/run-loadtest.sh start -m comprehensive -d 10m --monitoring

# Check service health
./scripts/wait-for-services.sh check

# Setup monitoring configurations
./scripts/setup-monitoring.sh
```

### Environment Variables

```bash
export TEST_MODE=comprehensive     # producer, consumer, comprehensive
export TEST_DURATION=300s          # Test duration
export PRODUCER_COUNT=10           # Number of producer instances
export CONSUMER_COUNT=5            # Number of consumer instances
export MESSAGE_RATE=1000           # Messages/second per producer
export MESSAGE_SIZE=1024           # Message size in bytes
export TOPIC_COUNT=5               # Number of topics to create
export PARTITIONS_PER_TOPIC=3      # Partitions per topic

make test
```

## Configuration

### Main Configuration File

Edit `config/loadtest.yaml` to customize:

- **Kafka Settings**: Bootstrap servers, security, timeouts
- **Producer Config**: Batching, compression, acknowledgments
- **Consumer Config**: Group settings, fetch parameters
- **Message Settings**: Size, format (JSON/Avro/Binary)
- **Schema Registry**: Avro/Protobuf schema validation
- **Metrics**: Prometheus collection intervals
- **Test Scenarios**: Predefined load patterns

### Example Custom Configuration

```yaml
test_mode: "comprehensive"
duration: "600s"  # 10 minutes

producers:
  count: 15
  message_rate: 2000
  message_size: 2048
  compression_type: "snappy"
  acks: "all"

consumers:
  count: 8
  group_prefix: "high-load-group"
  max_poll_records: 1000

topics:
  count: 10
  partitions: 6
  replication_factor: 1
```

## Test Scenarios

### 1. Producer Performance Test

```bash
make producer-test TEST_DURATION=10m PRODUCER_COUNT=20 MESSAGE_RATE=3000
```

Tests maximum message production throughput.

### 2. Consumer Performance Test

```bash
# First produce messages
make producer-test TEST_DURATION=5m

# Then test consumption
make consumer-test TEST_DURATION=10m CONSUMER_COUNT=15
```

### 3. Schema Registry Integration

```bash
# Enable schemas in config/loadtest.yaml:
#   schemas:
#     enabled: true

make test
```

Tests Avro message serialization through Schema Registry.
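For reference, the Confluent Schema Registry convention wraps each Avro record in a small envelope: a zero magic byte, a 4-byte big-endian schema ID, then the Avro-encoded payload. The snippet below is a standalone sketch of building that envelope by hand with `goavro` and `sarama`; the schema, schema ID, topic name, and broker address are illustrative assumptions rather than values from `config/loadtest.yaml`.

```go
package main

import (
	"encoding/binary"
	"log"

	"github.com/IBM/sarama"
	"github.com/linkedin/goavro/v2"
)

// Illustrative schema and schema ID; in a real run these would come from
// the Schema Registry registration step, not from constants.
const testSchema = `{"type":"record","name":"TestMessage","fields":[
  {"name":"id","type":"string"},
  {"name":"value","type":"long"}]}`

const schemaID uint32 = 1

func main() {
	codec, err := goavro.NewCodec(testSchema)
	if err != nil {
		log.Fatalf("parse schema: %v", err)
	}

	// Encode one record to Avro binary.
	avroBytes, err := codec.BinaryFromNative(nil, map[string]interface{}{
		"id":    "msg-1",
		"value": int64(42),
	})
	if err != nil {
		log.Fatalf("encode: %v", err)
	}

	// Confluent wire format: magic byte 0x00, 4-byte big-endian schema ID,
	// then the Avro payload.
	envelope := make([]byte, 5, 5+len(avroBytes))
	envelope[0] = 0x00
	binary.BigEndian.PutUint32(envelope[1:5], schemaID)
	envelope = append(envelope, avroBytes...)

	cfg := sarama.NewConfig()
	cfg.Producer.Return.Successes = true
	producer, err := sarama.NewSyncProducer([]string{"localhost:9093"}, cfg)
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer producer.Close()

	if _, _, err := producer.SendMessage(&sarama.ProducerMessage{
		Topic: "loadtest-avro", // hypothetical topic name
		Value: sarama.ByteEncoder(envelope),
	}); err != nil {
		log.Fatalf("produce: %v", err)
	}
}
```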
### 4. High Availability Test

```bash
# Test with container restarts during load
make test TEST_DURATION=20m &
sleep 300
docker restart kafka-gateway
```

## Monitoring & Metrics

### Real-Time Dashboards

When monitoring is enabled:

- **Prometheus**: http://localhost:9090
- **Grafana**: http://localhost:3000 (admin/admin)

### Key Metrics Tracked

- **Throughput**: Messages/second, MB/second
- **Latency**: End-to-end message latency percentiles
- **Errors**: Producer/consumer error rates
- **Consumer Lag**: Per-partition lag monitoring
- **Resource Usage**: CPU, memory, disk I/O

### Grafana Dashboards

- **Kafka Load Test**: Comprehensive test metrics
- **SeaweedFS Cluster**: Storage system health
- **Custom Dashboards**: Extensible monitoring

## Advanced Features

### Schema Registry Testing

```bash
# Test Avro message serialization
export KAFKA_VALUE_TYPE=avro
make test
```

The load test includes:

- Schema registration
- Avro message encoding/decoding
- Schema evolution testing
- Compatibility validation

### Multi-Client Testing

The test supports both Sarama and Confluent clients:

```go
// Configure in producer/consumer code
useConfluent := true // Switch client implementation
```

### Consumer Group Rebalancing

- Automatic consumer group management
- Partition rebalancing simulation
- Consumer failure recovery testing

### Chaos Testing

```yaml
chaos:
  enabled: true
  producer_failure_rate: 0.01
  consumer_failure_rate: 0.01
  network_partition_probability: 0.001
```

## Troubleshooting

### Common Issues

#### Services Not Starting

```bash
# Check service health
make health-check

# View detailed logs
make logs

# Debug mode
make debug
```

#### Low Throughput

- Increase `MESSAGE_RATE` and `PRODUCER_COUNT`
- Adjust `batch_size` and `linger_ms` in config
- Check consumer `max_poll_records` setting

#### High Latency

- Reduce `linger_ms` for lower latency
- Adjust `acks` setting (0, 1, or "all")
- Monitor consumer lag

#### Memory Issues

```bash
# Reduce concurrent clients
make test PRODUCER_COUNT=5 CONSUMER_COUNT=3

# Adjust message size
make test MESSAGE_SIZE=512
```

### Debug Commands

```bash
# Execute shell in containers
make exec-master
make exec-filer
make exec-gateway

# Attach to load test
make attach-loadtest

# View real-time stats
curl http://localhost:8080/stats
```

## Development

### Building from Source

```bash
# Set up development environment
make dev-env

# Build load test binary
make build

# Run tests locally (requires Go 1.21+)
cd cmd/loadtest && go run main.go -config ../../config/loadtest.yaml
```

### Extending the Tests

1. **Add new message formats** in `internal/producer/`
2. **Add custom metrics** in `internal/metrics/` (see the sketch below)
3. **Create new test scenarios** in `config/loadtest.yaml`
4. **Add monitoring panels** in `monitoring/grafana/dashboards/`
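As one illustration of step 2, a custom metric could be registered with the Prometheus Go client and updated from the producer loop. This is only a sketch: the file name `internal/metrics/custom.go` and the names `MessagesBySize` and `ObserveMessage` are hypothetical, not part of the existing package.

```go
// internal/metrics/custom.go (hypothetical file)
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// MessagesBySize counts produced messages by payload-size class so that
// dashboards can break throughput down by message size.
var MessagesBySize = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Name: "loadtest_messages_by_size_total",
		Help: "Messages produced, labeled by payload size class.",
	},
	[]string{"size_class"},
)

// ObserveMessage classifies a payload and increments the counter.
func ObserveMessage(payloadBytes int) {
	class := "small"
	switch {
	case payloadBytes >= 65536:
		class = "large"
	case payloadBytes >= 4096:
		class = "medium"
	}
	MessagesBySize.WithLabelValues(class).Inc()
}
```

A producer would then call `metrics.ObserveMessage(len(value))` after each successful send; assuming the load test serves the default registry via `promhttp`, the new counter appears on the scraped `/metrics` endpoint without further wiring.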
### Contributing

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass: `make test`
5. Submit a pull request

## Performance Benchmarks

### Expected Performance (on typical hardware)

| Scenario  | Producers | Consumers | Rate (msg/s) | Latency (p95) |
|-----------|-----------|-----------|--------------|---------------|
| Quick     | 2         | 2         | 200          | <10ms         |
| Standard  | 5         | 3         | 2,500        | <20ms         |
| Stress    | 20        | 10        | 40,000       | <50ms         |
| Endurance | 10        | 5         | 10,000       | <30ms         |

*Results vary based on hardware, network, and SeaweedFS configuration.*

### Tuning for Maximum Performance

```yaml
producers:
  batch_size: 1000
  linger_ms: 10
  compression_type: "lz4"
  acks: "1"  # Balance between speed and durability

consumers:
  max_poll_records: 5000
  fetch_min_bytes: 1048576  # 1MB
  fetch_max_wait_ms: 100
```

## Comparison with Existing Tests

| Feature | SMQ Tests | **Kafka Client Load Test** |
|---------|-----------|----------------------------|
| Protocol | SMQ (SeaweedFS native) | **Kafka (industry standard)** |
| Clients | SMQ clients | **Real Kafka clients (Sarama, Confluent)** |
| Schema Registry | ❌ | **✅ Full Avro/Protobuf support** |
| Consumer Groups | Basic | **✅ Full Kafka consumer group features** |
| Monitoring | Basic | **✅ Prometheus + Grafana dashboards** |
| Test Scenarios | Limited | **✅ Multiple predefined scenarios** |
| Real-world | Synthetic | **✅ Production-like workloads** |

This load test provides comprehensive validation of the SeaweedFS Kafka Gateway using real-world Kafka clients and protocols.

---

## Quick Reference

```bash
# Essential Commands
make help                  # Show all available commands
make test                  # Run default comprehensive test
make quick-test            # 1-minute smoke test
make stress-test           # High-load stress test
make test-with-monitoring  # Include Grafana dashboards
make clean                 # Clean up all resources

# Monitoring
make monitor               # Start Prometheus + Grafana
                           # → http://localhost:9090 (Prometheus)
                           # → http://localhost:3000 (Grafana, admin/admin)

# Advanced
make benchmark             # Run full benchmark suite
make health-check          # Validate service health
make validate-setup        # Check configuration
```