Kafka Protocol Compatibility Review & TODOs
Overview
This document identifies areas in the current Kafka implementation that need attention for full protocol compatibility, including assumptions, simplifications, and potential issues.
Critical Protocol Issues
🚨 HIGH PRIORITY - Protocol Breaking Issues
1. Record Batch Parsing (Produce API)
File: protocol/produce.go
Issues:
- parseRecordSet() uses simplified parsing logic that doesn't handle the full Kafka record batch format
- Hardcoded assumptions about record batch structure
- Missing compression support (gzip, snappy, lz4, zstd)
- CRC validation is completely missing
TODOs:
// TODO: Implement full Kafka record batch parsing
// - Support all record batch versions (v0, v1, v2)
// - Handle compression codecs (gzip, snappy, lz4, zstd)
// - Validate CRC32 checksums
// - Parse individual record headers, keys, values, timestamps
// - Handle transaction markers and control records
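As a starting point, here is a minimal sketch of parsing the fixed v2 record batch header and validating its checksum. The field offsets and CRC-32C coverage follow the Kafka protocol documentation, but the function name, the assumption that the buffer holds exactly one batch, and how this would plug into parseRecordSet() are illustrative only.

```go
// A minimal sketch of v2 record batch header parsing, assuming buf holds
// exactly one batch; integration with parseRecordSet() is left out.
package protocol

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

type recordBatchHeader struct {
	BaseOffset      int64
	BatchLength     int32
	Magic           int8
	CRC             uint32
	Attributes      int16 // lower 3 bits = codec: 0 none, 1 gzip, 2 snappy, 3 lz4, 4 zstd
	LastOffsetDelta int32
	BaseTimestamp   int64
	MaxTimestamp    int64
	RecordCount     int32
}

func parseRecordBatchHeader(buf []byte) (*recordBatchHeader, error) {
	if len(buf) < 61 { // fixed header size for magic v2
		return nil, fmt.Errorf("record batch too short: %d bytes", len(buf))
	}
	h := &recordBatchHeader{
		BaseOffset:      int64(binary.BigEndian.Uint64(buf[0:8])),
		BatchLength:     int32(binary.BigEndian.Uint32(buf[8:12])),
		Magic:           int8(buf[16]),
		CRC:             binary.BigEndian.Uint32(buf[17:21]),
		Attributes:      int16(binary.BigEndian.Uint16(buf[21:23])),
		LastOffsetDelta: int32(binary.BigEndian.Uint32(buf[23:27])),
		BaseTimestamp:   int64(binary.BigEndian.Uint64(buf[27:35])),
		MaxTimestamp:    int64(binary.BigEndian.Uint64(buf[35:43])),
		RecordCount:     int32(binary.BigEndian.Uint32(buf[57:61])),
	}
	if h.Magic != 2 {
		return nil, fmt.Errorf("unsupported record batch magic %d", h.Magic)
	}
	// CRC-32C (Castagnoli) covers everything after the CRC field.
	if crc32.Checksum(buf[21:], crc32.MakeTable(crc32.Castagnoli)) != h.CRC {
		return nil, fmt.Errorf("record batch CRC mismatch")
	}
	return h, nil
}
```

Compression handling would then branch on the codec bits of Attributes before decoding the individual records.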
2. Request Parsing Assumptions
Files: protocol/offset_management.go, protocol/joingroup.go, protocol/consumer_coordination.go
Issues:
- Most parsing functions have hardcoded topic/partition assumptions
- Missing support for array parsing (topics, partitions, group protocols)
- Simplified request structures that don't match real Kafka clients
TODOs:
// TODO: Fix OffsetCommit/OffsetFetch request parsing
// Currently returns hardcoded "test-topic" with partition 0
// Need to parse actual topics array from request body
// TODO: Fix JoinGroup protocol parsing
// Currently ignores group protocols array and subscription metadata
// Need to extract actual subscribed topics from consumer metadata
// TODO: Add support for batch operations
// OffsetCommit can commit multiple topic-partitions
// LeaveGroup can handle multiple members leaving
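Below is a minimal sketch of the topics-array decoding these TODOs call for, using the non-flexible wire format (int32 element counts, int16-length-prefixed strings). The struct names are illustrative, and the partition layout shown (index, offset, metadata) matches OffsetCommit v0 and v2-v4; other versions add fields.

```go
// A minimal sketch of decoding an OffsetCommit topics array (non-flexible
// versions); struct names are illustrative, not the existing code.
package protocol

import (
	"encoding/binary"
	"fmt"
)

type offsetCommitPartition struct {
	Index    int32
	Offset   int64
	Metadata string
}

type offsetCommitTopic struct {
	Name       string
	Partitions []offsetCommitPartition
}

// readString decodes an int16-length-prefixed string (-1 means null).
func readString(buf []byte, pos int) (string, int, error) {
	if pos+2 > len(buf) {
		return "", 0, fmt.Errorf("truncated string length")
	}
	n := int(int16(binary.BigEndian.Uint16(buf[pos:])))
	pos += 2
	if n < 0 {
		return "", pos, nil
	}
	if pos+n > len(buf) {
		return "", 0, fmt.Errorf("truncated string body")
	}
	return string(buf[pos : pos+n]), pos + n, nil
}

// parseOffsetCommitTopics walks the int32-counted topics array; bounds checks
// on the fixed-width fields are elided for brevity.
func parseOffsetCommitTopics(buf []byte, pos int) ([]offsetCommitTopic, int, error) {
	topicCount := int(int32(binary.BigEndian.Uint32(buf[pos:])))
	pos += 4
	var topics []offsetCommitTopic
	for i := 0; i < topicCount; i++ {
		var t offsetCommitTopic
		var err error
		if t.Name, pos, err = readString(buf, pos); err != nil {
			return nil, 0, err
		}
		partCount := int(int32(binary.BigEndian.Uint32(buf[pos:])))
		pos += 4
		for j := 0; j < partCount; j++ {
			p := offsetCommitPartition{
				Index:  int32(binary.BigEndian.Uint32(buf[pos:])),
				Offset: int64(binary.BigEndian.Uint64(buf[pos+4:])),
			}
			pos += 12
			if p.Metadata, pos, err = readString(buf, pos); err != nil {
				return nil, 0, err
			}
			t.Partitions = append(t.Partitions, p)
		}
		topics = append(topics, t)
	}
	return topics, pos, nil
}
```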
3. Fetch Record Construction
File: protocol/fetch.go
Issues:
- constructRecordBatch() creates fake record batches with dummy data
- Varint encoding is simplified to single bytes, which is incorrect
- Missing proper record headers, timestamps, and metadata
TODOs:
// TODO: Replace dummy record batch construction with real data
// - Read actual message data from SeaweedMQ/storage
// - Implement proper varint encoding/decoding
// - Support record headers and custom timestamps
// - Handle different record batch versions correctly
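On the varint point specifically: the per-record fields inside a v2 batch (record length, offset delta, timestamp delta, key/value lengths) use zigzag-encoded varints, which is the same scheme as Go's signed varint helpers in encoding/binary. A minimal sketch, with illustrative helper names:

```go
// A minimal sketch of the zigzag varint encoding used for per-record fields
// inside a v2 record batch.
package protocol

import "encoding/binary"

// appendVarint appends the zigzag varint encoding of v to dst.
func appendVarint(dst []byte, v int64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutVarint(tmp[:], v)
	return append(dst, tmp[:n]...)
}

// readVarint decodes a zigzag varint from buf, returning the value and the
// number of bytes consumed (0 means the buffer was truncated or invalid).
func readVarint(buf []byte) (int64, int) {
	v, n := binary.Varint(buf)
	if n <= 0 {
		return 0, 0
	}
	return v, n
}
```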
⚠️ MEDIUM PRIORITY - Compatibility Issues
4. API Version Support
File: protocol/handler.go
Issues:
- The ApiVersions response advertises maximum supported versions, but the implementations may not support all features of those versions
- No version-specific handling in most APIs
TODOs:
// TODO: Add API version validation per request
// Different API versions have different request/response formats
// Need to validate apiVersion from request header and respond accordingly
// TODO: Update handleApiVersions to reflect actual supported features
// Current max versions may be too optimistic for partial implementations
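A minimal sketch of what per-request version validation could look like follows. The API keys are from the Kafka protocol, but the version ranges shown are placeholders that would need to mirror what each handler actually implements.

```go
// A minimal sketch of per-request API version validation; ranges are placeholders.
package protocol

import "fmt"

type versionRange struct{ Min, Max int16 }

// supportedVersions would back both the ApiVersions response and per-request checks.
var supportedVersions = map[int16]versionRange{
	0:  {Min: 0, Max: 2}, // Produce (placeholder)
	1:  {Min: 0, Max: 4}, // Fetch (placeholder)
	8:  {Min: 0, Max: 2}, // OffsetCommit (placeholder)
	9:  {Min: 0, Max: 2}, // OffsetFetch (placeholder)
	11: {Min: 0, Max: 2}, // JoinGroup (placeholder)
	18: {Min: 0, Max: 1}, // ApiVersions (placeholder)
}

func validateAPIVersion(apiKey, apiVersion int16) error {
	r, ok := supportedVersions[apiKey]
	if !ok {
		return fmt.Errorf("unsupported api key %d", apiKey)
	}
	if apiVersion < r.Min || apiVersion > r.Max {
		return fmt.Errorf("api key %d: version %d outside supported range [%d,%d]",
			apiKey, apiVersion, r.Min, r.Max)
	}
	return nil
}
```

Keeping one table that drives both the ApiVersions response and the per-request check avoids the two drifting apart.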
5. Consumer Group Protocol Metadata
File: protocol/joingroup.go
Issues:
- Consumer subscription extraction is hardcoded to return ["test-topic"]
- Group protocol metadata parsing is completely stubbed
TODOs:
// TODO: Implement proper consumer protocol metadata parsing
// Consumer clients send subscription information in protocol metadata
// Need to decode consumer subscription protocol format:
// - version (int16) + subscribed topics array + user data bytes
// TODO: Support multiple assignment strategies properly
// Currently only basic range/roundrobin, need to parse client preferences
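A minimal sketch of decoding the Subscription metadata (version, topics array, user data) is shown below. Version 1+ of the consumer protocol also appends owned partitions, which this sketch ignores; the function name is illustrative.

```go
// A minimal sketch of extracting subscribed topics from consumer
// Subscription protocol metadata.
package protocol

import (
	"encoding/binary"
	"fmt"
)

func parseSubscriptionTopics(meta []byte) ([]string, error) {
	if len(meta) < 6 {
		return nil, fmt.Errorf("subscription metadata too short")
	}
	pos := 2 // skip version (int16)
	count := int(int32(binary.BigEndian.Uint32(meta[pos:])))
	pos += 4
	var topics []string
	for i := 0; i < count; i++ {
		if pos+2 > len(meta) {
			return nil, fmt.Errorf("truncated topic name length")
		}
		n := int(binary.BigEndian.Uint16(meta[pos:]))
		pos += 2
		if pos+n > len(meta) {
			return nil, fmt.Errorf("truncated topic name")
		}
		topics = append(topics, string(meta[pos:pos+n]))
		pos += n
	}
	// user data (nullable bytes) and any newer fields follow; not needed here
	return topics, nil
}
```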
6. Error Code Mapping
Files: multiple protocol files
Issues:
- Some error codes may not match Kafka specifications exactly
- Missing error codes for edge cases
TODOs:
// TODO: Verify all error codes match Kafka specification
// Check ErrorCode constants against official Kafka protocol docs
// Some custom error codes may not be recognized by clients
// TODO: Add missing error codes for:
// - Network errors, timeout errors
// - Quota exceeded, throttling
// - Security/authorization errors
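For reference while auditing, here is a subset of error codes with the values defined in the Kafka protocol specification (see the error codes table in the protocol guide). The constant names below are illustrative and may not match the existing ErrorCode identifiers.

```go
// A reference subset of Kafka protocol error codes.
package protocol

const (
	errNone                      int16 = 0
	errOffsetOutOfRange          int16 = 1
	errUnknownTopicOrPartition   int16 = 3
	errNotLeaderOrFollower       int16 = 6
	errRequestTimedOut           int16 = 7
	errCoordinatorNotAvailable   int16 = 15
	errNotCoordinator            int16 = 16
	errIllegalGeneration         int16 = 22
	errInconsistentGroupProtocol int16 = 23
	errUnknownMemberID           int16 = 25
	errRebalanceInProgress       int16 = 27
	errTopicAuthorizationFailed  int16 = 29
	errGroupAuthorizationFailed  int16 = 30
	errUnsupportedVersion        int16 = 35
)
```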
🔧 LOW PRIORITY - Implementation Completeness
7. Connection Management
File: protocol/handler.go
Issues:
- Basic connection handling without connection pooling
- No support for SASL authentication or SSL/TLS
- Missing connection metadata (client host, version)
TODOs:
// TODO: Extract client connection metadata
// JoinGroup requests need actual client host instead of "unknown-host"
// Parse client version from request headers for better compatibility
// TODO: Add connection security support
// Support SASL/PLAIN, SASL/SCRAM authentication
// Support SSL/TLS encryption
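A minimal sketch of capturing per-connection client metadata when a connection is accepted, so JoinGroup can report a real client host. The connContext type and how it is threaded through the handlers are assumptions.

```go
// A minimal sketch of per-connection client metadata capture.
package protocol

import "net"

type connContext struct {
	ClientHost string // replaces the hardcoded "unknown-host" in JoinGroup
	ClientID   string // filled in later from the request header's client_id
}

func newConnContext(conn net.Conn) *connContext {
	host := conn.RemoteAddr().String()
	if h, _, err := net.SplitHostPort(host); err == nil {
		host = h // keep the IP/hostname, drop the ephemeral port
	}
	return &connContext{ClientHost: host}
}
```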
8. Record Timestamps and Offsets
Files: protocol/produce.go, protocol/fetch.go
Issues:
- Simplified timestamp handling
- Offset assignment may not match Kafka behavior exactly
TODOs:
// TODO: Implement proper offset assignment strategy
// Kafka offsets are partition-specific and strictly increasing
// Current implementation may have gaps or inconsistencies
// TODO: Support timestamp types correctly
// Kafka supports CreateTime vs LogAppendTime
// Need to handle timestamp-based offset lookups properly
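A minimal sketch of per-partition, strictly increasing offset assignment is shown below; a real implementation would persist the next offset per partition rather than keep it only in memory.

```go
// A minimal sketch of strictly increasing per-partition offset assignment.
package protocol

import "sync"

type offsetAllocator struct {
	mu   sync.Mutex
	next map[string]int64 // keyed by "topic-partition"
}

func newOffsetAllocator() *offsetAllocator {
	return &offsetAllocator{next: make(map[string]int64)}
}

// assign reserves count consecutive offsets and returns the base offset,
// mirroring how an appended record batch receives its baseOffset.
func (a *offsetAllocator) assign(topicPartition string, count int64) int64 {
	a.mu.Lock()
	defer a.mu.Unlock()
	base := a.next[topicPartition]
	a.next[topicPartition] = base + count
	return base
}
```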
9. SeaweedMQ Integration Assumptions
File: integration/seaweedmq_handler.go
Issues:
- Simplified record format conversion
- Single partition assumption for new topics
- Missing topic configuration support
TODOs:
// TODO: Implement proper Kafka->SeaweedMQ record conversion
// Currently uses placeholder keys/values
// Need to extract actual record data from Kafka record batches
// TODO: Support configurable partition counts
// Currently hardcoded to 1 partition per topic
// Need to respect CreateTopics partition count requests
// TODO: Add topic configuration support
// Kafka topics have configs like retention, compression, cleanup policy
// Map these to SeaweedMQ topic settings
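A minimal sketch of the Kafka-to-SeaweedMQ conversion boundary follows. Both record types here are hypothetical placeholders; the real integration would reuse whatever record shape integration/seaweedmq_handler.go already defines and its actual publish path.

```go
// A minimal sketch of converting a decoded Kafka record into a SeaweedMQ-
// bound record; both types are hypothetical placeholders.
package integration

import "time"

// kafkaRecord is one decoded record from a parsed record batch.
type kafkaRecord struct {
	Key       []byte
	Value     []byte
	Timestamp time.Time
	Headers   map[string][]byte
}

// seaweedRecord is a stand-in for whatever message shape the SeaweedMQ
// publisher in seaweedmq_handler.go expects.
type seaweedRecord struct {
	Key      []byte
	Value    []byte
	TsNs     int64
	Metadata map[string][]byte
}

func toSeaweedRecord(r kafkaRecord) seaweedRecord {
	return seaweedRecord{
		Key:      r.Key,
		Value:    r.Value,
		TsNs:     r.Timestamp.UnixNano(),
		Metadata: r.Headers,
	}
}
```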
Testing Compatibility Issues
Missing Integration Tests
TODOs:
// TODO: Add real Kafka client integration tests
// Test with kafka-go, Sarama, and other popular Go clients
// Verify producer/consumer workflows work end-to-end
// TODO: Add protocol conformance tests
// Use Kafka protocol test vectors if available
// Test edge cases and error conditions
// TODO: Add load testing
// Verify behavior under high throughput
// Test with multiple concurrent consumer groups
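A minimal sketch of the kafka-go round-trip test called for above, using github.com/segmentio/kafka-go; the gateway address, topic, and group ID are placeholders.

```go
// A minimal sketch of an end-to-end produce/consume test with kafka-go.
package integration

import (
	"context"
	"testing"
	"time"

	"github.com/segmentio/kafka-go"
)

func TestProduceConsumeRoundTrip(t *testing.T) {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	w := &kafka.Writer{
		Addr:  kafka.TCP("localhost:9092"), // gateway address is a placeholder
		Topic: "e2e-test-topic",
	}
	defer w.Close()
	if err := w.WriteMessages(ctx, kafka.Message{Key: []byte("k1"), Value: []byte("v1")}); err != nil {
		t.Fatalf("produce failed: %v", err)
	}

	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		Topic:   "e2e-test-topic",
		GroupID: "e2e-test-group",
	})
	defer r.Close()
	msg, err := r.ReadMessage(ctx)
	if err != nil {
		t.Fatalf("consume failed: %v", err)
	}
	if string(msg.Value) != "v1" {
		t.Fatalf("unexpected value: %q", msg.Value)
	}
}
```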
Protocol Version Testing
TODOs:
// TODO: Test multiple API versions
// Clients may use different API versions
// Ensure backward compatibility
// TODO: Test with different Kafka client libraries
// Java clients, Python clients, etc.
// Different clients may have different protocol expectations
Performance & Scalability TODOs
Memory Management
TODOs:
// TODO: Add memory pooling for large messages
// Avoid allocating large byte slices for each request
// Reuse buffers for protocol encoding/decoding
// TODO: Implement streaming for large record batches
// Don't load entire batches into memory at once
// Stream records directly from storage to client
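A minimal sketch of reusing encoding/decoding buffers with sync.Pool; the helper names are placeholders.

```go
// A minimal sketch of buffer reuse for protocol encoding/decoding.
package protocol

import (
	"bytes"
	"sync"
)

var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

func getBuffer() *bytes.Buffer { return bufPool.Get().(*bytes.Buffer) }

func putBuffer(b *bytes.Buffer) {
	b.Reset() // keep the backing array, drop the contents
	bufPool.Put(b)
}
```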
Connection Handling
TODOs:
// TODO: Add connection timeout handling
// Implement proper client timeout detection
// Clean up stale connections and consumer group members
// TODO: Add backpressure handling
// Implement flow control for high-throughput scenarios
// Prevent memory exhaustion during load spikes
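A minimal sketch of idle-connection timeout handling via read deadlines; the loop shape, serve callback, and timeout value are illustrative rather than the existing handler.

```go
// A minimal sketch of disconnecting stalled clients with read deadlines.
package protocol

import (
	"net"
	"time"
)

const idleTimeout = 5 * time.Minute // placeholder value

func serveConn(conn net.Conn, handleRequest func(net.Conn) error) {
	defer conn.Close()
	for {
		// Refresh the deadline before each request so a stalled client is
		// eventually disconnected and its group membership can be cleaned up.
		if err := conn.SetReadDeadline(time.Now().Add(idleTimeout)); err != nil {
			return
		}
		if err := handleRequest(conn); err != nil {
			return
		}
	}
}
```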
Immediate Action Items
Phase 4 Priority List:
- Fix Record Batch Parsing - Critical for real client compatibility
- Implement Proper Request Parsing - Remove hardcoded assumptions
- Add Compression Support - Essential for performance
- Real SeaweedMQ Integration - Move beyond placeholder data
- Consumer Protocol Metadata - Fix subscription handling
- API Version Handling - Support multiple protocol versions
Compatibility Validation:
- Test with kafka-go library - Most popular Go Kafka client
- Test with Sarama library - Alternative popular Go client
- Test with Java Kafka clients - Reference implementation
- Performance benchmarking - Compare against Apache Kafka
Protocol Standards References
- Kafka Protocol Guide: https://kafka.apache.org/protocol.html
- Record Batch Format: Kafka protocol v2 record format specification
- Consumer Protocol: Group coordination and assignment protocol details
- API Versioning: How different API versions affect request/response format
Notes on Current State
What Works Well:
- Basic produce/consume flow for simple cases
- Consumer group coordination state management
- In-memory testing mode for development
- Graceful error handling for most common cases
What Needs Work:
- Real-world client compatibility (requires fixing parsing issues)
- Performance under load (needs compression, streaming)
- Production deployment (needs security, monitoring)
- Edge case handling (various protocol versions, error conditions)
This review should be updated as protocol implementations improve and more compatibility issues are discovered.