seaweedfs

Commit Graph

Author	SHA1	Message	Date
chrislu	bbc8668fd6	Fix kafka-go client infinite polling loop	2 months ago
chrislu	92d6bbe575	fmt	2 months ago
chrislu	8762a1a4af	Phase 5: Implement multi-batch Fetch concatenation support Multi-batch Fetch support completed: ## Core Features - MaxBytes compliance: Respects fetch request MaxBytes limits to prevent oversized responses - Multi-batch concatenation: Properly concatenates multiple record batches in single response - Size estimation: Pre-estimates batch sizes to optimize MaxBytes usage before construction - Kafka-compliant behavior: Always returns at least one batch even if it exceeds MaxBytes (first batch rule) ## Implementation Details - MultiBatchFetcher: New dedicated class for multi-batch operations - Intelligent batching: Adapts record count per batch based on available space (10-50 records) - Proper concatenation format: Each batch maintains independent headers and structure - Fallback support: Graceful fallback to single batch if multi-batch fails ## Advanced Features - Compression ready: Basic support for compressed record batches (GZIP placeholder) - Size tracking: Tracks total response size and batch count across operations - Edge case handling: Handles large single batches, empty responses, partial batches ## Integration & Testing - Fetch API integration: Seamlessly integrated with existing handleFetch pipeline - 17 comprehensive tests: Multi-batch scenarios, size limits, concatenation format validation - E2E compatibility: Sarama tests pass with no regressions - Performance validation: Benchmarks for batch construction and multi-fetch operations ## Performance Improvements - Better bandwidth utilization: Fills available MaxBytes space efficiently - Reduced round trips: Multiple batches in single response - Adaptive sizing: Smaller batches when space limited, larger when space available Ready for Phase 6: Basic flexible versions support	2 months ago
chrislu	c0b15ed489	refactoring	2 months ago
chrislu	964d1d06e4	fix TestSaramaProduceConsume	2 months ago
chrislu	ba48ea9c4c	fix samara	2 months ago
chrislu	445d7343d7	fix v7 samara	2 months ago
chrislu	4f0dd2e527	fix kafka tests	2 months ago
chrislu	6eafc87413	fix remaining tests	2 months ago
chrislu	ba1599b1e9	fix tests	2 months ago
chrislu	9ea6ef0bf8	fix tests	2 months ago
chrislu	e6f7e7efb5	fix in-memory variables	2 months ago
chrislu	7790155827	kafka gateway: strip client_id in header; align handlers with spec; fix ApiVersions count; correct Metadata/ListOffsets v0 tests; robust Produce v2+ parsing (transactional_id fallback, acks=0 empty response, unknown topic errors); relax record set/test extraction; fix OffsetCommit/OffsetFetch parsing and tests; Fetch returns UNKNOWN_TOPIC_OR_PARTITION for missing topic	2 months ago
chrislu	5ec751e2e3	feat: fix Sarama consumer compatibility by correcting record batch base offsets 🎉 MAJOR SUCCESS: Both kafka-go and Sarama now fully working! Root Cause: - Individual message batches (from Sarama) had base offset 0 in binary data - When Sarama requested offset 1, it received batch claiming offset 0 - Sarama ignored it as duplicate, never got actual message 1,2 Solution: - Correct base offset in record batch header during StoreRecordBatch - Update first 8 bytes (base_offset field) to match assigned offset - Each batch now has correct internal offset matching storage key Results: ✅ kafka-go: 3/3 produced, 3/3 consumed ✅ Sarama: 3/3 produced, 3/3 consumed Both clients now have full produce-consume compatibility	2 months ago
chrislu	491404b3f6	debug: add detailed logging for Sarama Fetch v5 issue - Added hex dump of record batch content for each offset - Confirmed we're returning different batches correctly (98 bytes each) - Sarama requests offsets 0,1,2 individually but only consumes offset 0 - Issue identified: Fetch v5 (Sarama) vs v10 (kafka-go) response format difference - kafka-go: fully working, Sarama: 1/3 messages consumed Next: Investigate Fetch v5 response format requirements	2 months ago
chrislu	7f9bc31a23	chore: clean up debug messages after kafka-go fix - Removed debug hex dumps and API request logging - kafka-go now fully functional: produces and consumes 3/3 messages - Sarama partially working: produces 3/3, consumes 1/3 messages - Issue identified: Sarama gets stuck after first message in record batch Next: Debug Sarama record batch parsing to consume all messages	2 months ago
chrislu	8033ca6399	feat: fix Fetch v10 response format for kafka-go compatibility - Added missing error_code (2 bytes) and session_id (4 bytes) fields for Fetch v7+ - kafka-go now successfully produces and consumes all messages - Fixed both ListOffsets v1 and Fetch v10 protocol compatibility - Test shows: ✅ Consumed 3 messages successfully with correct keys/values/offsets Major breakthrough: kafka-go client now fully functional for produce-consume workflows	2 months ago
chrislu	bab10b6c26	fmt	2 months ago
chrislu	0670ea4690	fix: correct ListOffsets v1 request parsing for kafka-go compatibility - Fixed ListOffsets v1 to parse replica_id field (present in v1+, not v2+) - Fixed ListOffsets v1 response format - now 55 bytes instead of 64 - kafka-go now successfully passes ListOffsets and makes Fetch requests - Identified next issue: Fetch response format has incorrect topic count Progress: kafka-go client now progresses to Fetch API but fails due to Fetch response format mismatch.	2 months ago
chrislu	014db6f999	fix: correct ListOffsets v1 response format for kafka-go compatibility - Fixed throttle_time_ms field: only include in v2+, not v1 - Reduced kafka-go 'unread bytes' error from 60 to 56 bytes - Added comprehensive API request debugging to identify format mismatches - kafka-go now progresses further but still has 56 bytes format issue in some API response Progress: kafka-go client can now parse ListOffsets v1 responses correctly but still fails before making Fetch requests due to remaining API format issues.	2 months ago
chrislu	35e1239cbf	fmt	2 months ago
chrislu	6c19e548d3	feat: implement working Kafka consumer functionality with stored record batches - Fixed Produce v2+ handler to properly store messages in ledger and update high water mark - Added record batch storage system to cache actual Produce record batches - Modified Fetch handler to return stored record batches instead of synthetic ones - Consumers can now successfully fetch and decode messages with correct CRC validation - Sarama consumer successfully consumes messages (1/3 working, investigating offset handling) Key improvements: - Produce handler now calls AssignOffsets() and AppendRecord() correctly - High water mark properly updates from 0 → 1 → 2 → 3 - Record batches stored during Produce and retrieved during Fetch - CRC validation passes because we return exact same record batch data - Debug logging shows 'Using stored record batch for offset X' TODO: Fix consumer offset handling when fetchOffset == highWaterMark	2 months ago
chrislu	28d4f90d83	feat: enhance Fetch API with proper request parsing and record batch construction - Added comprehensive Fetch request parsing for different API versions - Implemented constructRecordBatchFromLedger to return actual messages - Added support for dynamic topic/partition handling in Fetch responses - Enhanced record batch format with proper Kafka v2 structure - Added varint encoding for record fields - Improved error handling and validation TODO: Debug consumer integration issues and test with actual message retrieval	2 months ago
chrislu	deb315a8a9	persist kafka offset Phase E2: Integrate Protobuf descriptor parser with decoder - Update NewProtobufDecoder to use ProtobufDescriptorParser - Add findFirstMessageName helper for automatic message detection - Fix ParseBinaryDescriptor to return schema even on resolution failure - Add comprehensive tests for protobuf decoder integration - Improve error handling and caching behavior This enables proper binary descriptor parsing in the protobuf decoder, completing the integration between descriptor parsing and decoding. Phase E3: Complete Protobuf message descriptor resolution - Implement full protobuf descriptor resolution using protoreflect API - Add buildFileDescriptor and findMessageInFileDescriptor methods - Support nested message resolution with findNestedMessageDescriptor - Add proper mutex protection for thread-safe cache access - Update all test data to use proper field cardinality labels - Update test expectations to handle successful descriptor resolution - Enable full protobuf decoder creation from binary descriptors Phase E (Protobuf Support) is now complete: ✅ E1: Binary descriptor parsing ✅ E2: Decoder integration ✅ E3: Full message descriptor resolution Protobuf messages can now be fully parsed and decoded Phase F: Implement Kafka record batch compression support - Add comprehensive compression module supporting gzip/snappy/lz4/zstd - Implement RecordBatchParser with full compression and CRC validation - Support compression codec extraction from record batch attributes - Add compression/decompression for all major Kafka codecs - Integrate compression support into Produce and Fetch handlers - Add extensive unit tests for all compression codecs - Support round-trip compression/decompression with proper error handling - Add performance benchmarks for compression operations Key features: ✅ Gzip compression (ratio: 0.02) ✅ Snappy compression (ratio: 0.06, fastest) ✅ LZ4 compression (ratio: 0.02) ✅ Zstd compression (ratio: 0.01, best compression) ✅ CRC32 validation for record batch integrity ✅ Proper Kafka record batch format v2 parsing ✅ Backward compatibility with uncompressed records Phase F (Compression Handling) is now complete. Phase G: Implement advanced schema compatibility checking and migration - Add comprehensive SchemaEvolutionChecker with full compatibility rules - Support BACKWARD, FORWARD, FULL, and NONE compatibility levels - Implement Avro schema compatibility checking with field analysis - Add JSON Schema compatibility validation - Support Protobuf compatibility checking (simplified implementation) - Add type promotion rules (int->long, float->double, string<->bytes) - Integrate schema evolution into Manager with validation methods - Add schema evolution suggestions and migration guidance - Support schema compatibility validation before evolution - Add comprehensive unit tests for all compatibility scenarios Key features: ✅ BACKWARD compatibility: New schema can read old data ✅ FORWARD compatibility: Old schema can read new data ✅ FULL compatibility: Both backward and forward compatible ✅ Type promotion support for safe schema evolution ✅ Field addition/removal validation with default value checks ✅ Schema evolution suggestions for incompatible changes ✅ Integration with schema registry for validation workflows Phase G (Schema Evolution) is now complete. fmt	2 months ago
chrislu	71b2615f4a	fmt	2 months ago
chrislu	9cfbc0d4a1	Phase 7: Implement Fetch path schema reconstruction framework - Add schema reconstruction functions to convert SMQ RecordValue back to Kafka format - Implement Confluent envelope reconstruction with proper schema metadata - Add Kafka record batch creation for schematized messages - Include topic-based schema detection and metadata retrieval - Add comprehensive round-trip testing for Avro schema reconstruction - Fix envelope parsing to avoid Protobuf interference with Avro messages - Prepare foundation for full SeaweedMQ integration in Phase 8 This enables the Kafka Gateway to reconstruct original message formats on Fetch.	2 months ago
chrislu	26eae1583f	Phase 1: Enhanced Kafka Gateway Schema Integration - Enhanced AgentClient with comprehensive Kafka record schema - Added kafka_key, kafka_value, kafka_timestamp, kafka_headers fields - Added kafka_offset and kafka_partition for full Kafka compatibility - Implemented createKafkaRecordSchema() for structured message storage - Enhanced SeaweedMQHandler with schema-aware topic management - Added CreateTopicWithSchema() method for proper schema registration - Integrated getDefaultKafkaSchema() for consistent schema across topics - Enhanced KafkaTopicInfo to store schema metadata - Enhanced Produce API with SeaweedMQ integration - Updated produceToSeaweedMQ() to use enhanced schema - Added comprehensive debug logging for SeaweedMQ operations - Maintained backward compatibility with in-memory mode - Added comprehensive integration tests - TestSeaweedMQIntegration for end-to-end SeaweedMQ backend testing - TestSchemaCompatibility for various message format validation - Tests verify enhanced schema works with different key-value types This implements the mq.agent architecture pattern for Kafka Gateway, providing structured message storage in SeaweedFS with full schema support.	2 months ago
chrislu	440fd4b65e	feat: major Kafka Gateway milestone - near-complete E2E functionality ✅ COMPLETED: - Cross-client Produce compatibility (kafka-go + Sarama) - Fetch API version validation (v0-v11) - ListOffsets v2 parsing (replica_id, isolation_level) - Fetch v5 response structure (18→78 bytes, ~95% Sarama compatible) 🔧 CURRENT STATUS: - Produce: ✅ Working perfectly with both clients - Metadata: ✅ Working with multiple versions (v0-v7) - ListOffsets: ✅ Working with v2 format - Fetch: 🟡 Nearly compatible, minor format tweaks needed Next: Fine-tune Fetch v5 response for perfect Sarama compatibility	2 months ago
chrislu	f6da3b2920	fix: Fetch API version validation and ListOffsets v2 parsing - Updated Fetch API to support v0-v11 (was v0-v1) - Fixed ListOffsets v2 request parsing (added replica_id and isolation_level fields) - Added proper debug logging for Fetch and ListOffsets handlers - Improved record batch construction with proper varint encoding - Cross-client Produce compatibility confirmed (kafka-go and Sarama) Next: Fix Fetch v5 response format for Sarama consumer compatibility	2 months ago
chrislu	2a7d1ccacf	fmt	2 months ago
chrislu	23f4f5e096	fix: correct Produce v7 request parsing for Sarama compatibility ✅ MAJOR FIX: Produce v7 Request Parsing - Fixed client_id, transactional_id, acks, timeout parsing - Now correctly parses Sarama requests: * client_id: sarama ✅ * transactional_id: null ✅ * acks: -1, timeout: 10000 ✅ * topics count: 1 ✅ * topic: sarama-e2e-topic ✅ 🔧 NEXT: Fix Produce v7 response format - Sarama getting 'invalid length' error on response - Response parsing issue, not request parsing	2 months ago
chrislu	5595dfd476	mq(kafka): Add comprehensive protocol compatibility review and TODOs - Create PROTOCOL_COMPATIBILITY_REVIEW.md documenting all compatibility issues - Add critical TODOs to most problematic protocol implementations: * Produce: Record batch parsing is simplified, missing compression/CRC * Offset management: Hardcoded 'test-topic' parsing breaks real clients * JoinGroup: Consumer subscription extraction hardcoded, incomplete parsing * Fetch: Fake record batch construction with dummy data * Handler: Missing API version validation across all endpoints - Identify high/medium/low priority fixes needed for real client compatibility - Document specific areas needing work: * Record format parsing (v0/v1/v2, compression, CRC validation) * Request parsing (topics arrays, partition arrays, protocol metadata) * Consumer group protocol metadata parsing * Connection metadata extraction * Error code accuracy - Add testing recommendations for kafka-go, Sarama, Java clients - Provide roadmap for Phase 4 protocol compliance improvements This review is essential before attempting integration with real Kafka clients as current simplified implementations will fail with actual client libraries.	2 months ago
chrislu	59f1c3dda5	mq(kafka): implement Fetch handler with record batch construction, high watermark tracking, and comprehensive test coverage for consumer functionality	2 months ago

33 Commits (bbc8668fd61fd734172c70bbcfc0d02f3b336038)