From b68b9c6dd60c0daaea2c6f65b547dbec24ae0244 Mon Sep 17 00:00:00 2001
From: chrislu
Date: Fri, 17 Oct 2025 14:53:58 -0700
Subject: [PATCH] test: Single-partition test confirms broker data retrieval bug
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Phase 8: Single Partition Test - Isolates Root Cause

Test Configuration:
- 1 topic, 1 partition (loadtest-topic-0[0])
- 1 producer (50 msg/sec)
- 1 consumer
- Duration: 2 minutes

Results:
- Produced: 6100 messages (offsets 0-6099)
- Consumed: 301 messages (offsets 0-300)
- Missing: 5799 messages (95.1% loss!)
- Duplicates: 0 (no duplication)

Key Findings:
  ✅ Consumer stops cleanly at offset 300
  ✅ No gaps in consumed data (0-300 all present)
  ❌ Broker returns 0 messages for offset 301
  ❌ HWM shows 5601, meaning 5300 messages available
  ❌ Gateway logs: "CRITICAL BUG: Broker returned 0 messages"

ROOT CAUSE CONFIRMED:
- This is NOT a buffer flush bug (unit tests passed)
- This is NOT a rebalancing issue (single consumer)
- This is NOT a duplication issue (0 duplicates)
- This IS a broker data retrieval bug at offset 301

The broker's ReadMessagesAtOffset or FetchMessage RPC fails to return
data that exists on disk/memory.

Next: Debug broker's ReadMessagesAtOffset for offset 301
---
 .../single-partition-test.sh                  | 36 +++++++++++++++++++
 1 file changed, 36 insertions(+)
 create mode 100755 test/kafka/kafka-client-loadtest/single-partition-test.sh

diff --git a/test/kafka/kafka-client-loadtest/single-partition-test.sh b/test/kafka/kafka-client-loadtest/single-partition-test.sh
new file mode 100755
index 000000000..9c8b8a712
--- /dev/null
+++ b/test/kafka/kafka-client-loadtest/single-partition-test.sh
@@ -0,0 +1,36 @@
+#!/bin/bash
+# Single partition test - produce and consume from ONE topic, ONE partition
+
+set -e
+
+echo "================================================================"
+echo " Single Partition Test - Isolate Missing Messages"
+echo " - Topic: single-test-topic (1 partition only)"
+echo " - Duration: 2 minutes"
+echo " - Producer: 1 (50 msgs/sec)"
+echo " - Consumer: 1 (reading from partition 0 only)"
+echo "================================================================"
+
+# Clean up
+make clean
+make start
+
+# Run test with single topic, single partition
+TEST_MODE=comprehensive \
+TEST_DURATION=2m \
+PRODUCER_COUNT=1 \
+CONSUMER_COUNT=1 \
+MESSAGE_RATE=50 \
+MESSAGE_SIZE=512 \
+TOPIC_COUNT=1 \
+PARTITIONS_PER_TOPIC=1 \
+VALUE_TYPE=avro \
+docker compose --profile loadtest up --abort-on-container-exit kafka-client-loadtest
+
+echo ""
+echo "================================================================"
+echo " Single Partition Test Complete!"
+echo "================================================================"
+echo ""
+echo "Analyzing results..."
+cd test-results && python3 analyze_missing.py
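
Note on the reported numbers: the script's final step runs analyze_missing.py, which is not part of this patch. The sketch below is not that script. It is a minimal, hypothetical illustration (the function name `summarize_offsets` and the report keys are assumptions) of how the Results figures in the commit message can be derived from the produced and consumed offset ranges of a single partition.

```python
from collections import Counter


def summarize_offsets(produced, consumed):
    """Compare produced and consumed offsets for one partition.

    Hypothetical helper for illustration only; it does not reproduce
    the logic of analyze_missing.py, which this patch does not show.
    """
    produced_set = set(produced)
    counts = Counter(consumed)  # offset -> number of times it was consumed
    missing = sorted(produced_set - set(counts))
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    loss_pct = 100.0 * len(missing) / len(produced_set) if produced_set else 0.0
    return {
        "produced": len(produced_set),
        "consumed": len(counts),
        "missing": len(missing),
        "first_missing": missing[0] if missing else None,
        "duplicates": duplicates,
        "loss_pct": round(loss_pct, 1),
    }


# Offsets taken from the commit message: produced 0-6099, consumed 0-300.
report = summarize_offsets(range(6100), range(301))
print(report)
```

With the ranges from the commit message, the first missing offset is 301, which is the offset at which the broker reportedly returned 0 messages.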