You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
chrislu
6ea2f8a4bd
feat: Add comprehensive timeout and hang detection logging
Phase 3 Implementation: Fetch Hang Debugging
Added detailed timing instrumentation to identify slow fetches:
- Track fetch request duration at partition reader level
- Log warnings if fetch > 2 seconds
- Track both multi-batch and fallback fetch times
- Consumer-side hung fetch detection (< 10 messages then stop)
- Mark partitions that terminate abnormally
Changes:
- fetch_partition_reader.go: +30 lines timing instrumentation
- consumer.go: Enhanced abnormal termination detection
Test Results - BREAKTHROUGH:
BEFORE: 71% delivery (1671/2349)
AFTER: 87.5% delivery (2055/2349) 🚀
IMPROVEMENT: +16.5 percentage points!
Remaining missing: 294 messages (12.5%)
Down from: 1705 messages (55%) at session start!
Pattern Evolution:
Session Start: 0% (0/3100) - topic not found errors
After Fix #1: 45% (1395/3100) - topic visibility fixed
After Fix #2: 71% (1671/2349) - comprehensive logging helped
Current: 87.5% (2055/2349) - timing/hang detection added
Key Findings:
- No slow fetches detected (> 2 seconds) - suggests issue is subtle
- Most partitions now consume completely
- Remaining gaps concentrated in specific offset ranges
- Likely edge case in offset boundary conditions
Next: Analyze remaining 12.5% gap patterns to find last edge case
|
1 month ago |
| .. |
|
config
|
Add Kafka Gateway (#7231)
|
1 month ago |
|
consumer
|
feat: Add comprehensive timeout and hang detection logging
|
1 month ago |
|
metrics
|
Add Kafka Gateway (#7231)
|
1 month ago |
|
producer
|
verify produced messages are consumed
|
1 month ago |
|
schema
|
Add Kafka Gateway (#7231)
|
1 month ago |
|
tracker
|
track messages with testStartTime
|
1 month ago |