Add detailed end-to-end debugging to track message consumption:
Consumer Changes:
- Log initial offset and HWM when partition assigned
- Track offset gaps (a jump past the next expected offset indicates missing messages)
- Log progress every 500 messages OR every 5 seconds
- Count and report total gaps encountered
- Show HWM progression during consumption
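The consumer-side bookkeeping above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: `GapTracker`, its method names, and the log line format are all hypothetical, and the 500-message / 5-second thresholds are taken from the list above.

```python
import time
from typing import Callable, List, Optional, Tuple


class GapTracker:
    """Hypothetical per-partition helper: records offset gaps and rate-limits progress logs."""

    def __init__(self, log_every_msgs: int = 500, log_every_secs: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.log_every_msgs = log_every_msgs
        self.log_every_secs = log_every_secs
        self.clock = clock
        self.next_expected: Optional[int] = None
        self.consumed = 0
        self.gaps: List[Tuple[int, int]] = []  # inclusive (first_missing, last_missing)
        self._last_log_count = 0
        self._last_log_time = clock()

    def on_assign(self, initial_offset: int) -> None:
        # Called when the partition is assigned; the first message should carry this offset.
        self.next_expected = initial_offset
        self._last_log_time = self.clock()

    def missing(self) -> int:
        # Total number of offsets skipped across all recorded gaps.
        return sum(end - start + 1 for start, end in self.gaps)

    def on_message(self, offset: int, hwm: int) -> Optional[str]:
        # A jump past the expected offset means messages were skipped.
        if self.next_expected is not None and offset > self.next_expected:
            self.gaps.append((self.next_expected, offset - 1))
        self.next_expected = offset + 1
        self.consumed += 1

        # Emit a progress line every N messages OR every T seconds, whichever comes first.
        now = self.clock()
        due_count = self.consumed - self._last_log_count >= self.log_every_msgs
        due_time = now - self._last_log_time >= self.log_every_secs
        if not (due_count or due_time):
            return None
        self._last_log_count = self.consumed
        self._last_log_time = now
        return (f"consumed={self.consumed} offset={offset} hwm={hwm} "
                f"gaps={len(self.gaps)} missing={self.missing()}")
```

The clock is injectable so the time-based trigger can be exercised without sleeping.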
Fetch Handler Changes:
- Log current offset updates
- Log fetch results (empty vs data)
- Show offset range and byte count returned
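As a rough illustration of the fetch-side log lines described above, the helper below distinguishes empty fetches from fetches that returned data. `describe_fetch` and its output format are assumptions made for this sketch, not the handler's actual logging.

```python
from typing import Sequence


def describe_fetch(partition: int, offsets: Sequence[int], num_bytes: int) -> str:
    """Hypothetical formatter: one log line per fetch result."""
    if not offsets:
        # Empty fetches are logged explicitly so stalls show up in the log.
        return f"fetch partition={partition} result=empty"
    # Offsets are assumed to be in ascending order, so the ends give the range.
    return (f"fetch partition={partition} result=data "
            f"offsets={offsets[0]}-{offsets[-1]} count={len(offsets)} bytes={num_bytes}")
```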
With this logging enabled, consumption improved substantially:
- Previous: 45% consumption (1395/3100)
- Current: 73% consumption (2275/3100)
- Improvement: 28 percentage points
The added logging itself appears to change timing enough to partially mask race conditions.
This suggests timing-sensitive bugs in offset/fetch coordination.
Remaining Tasks:
- Find the 825 missing messages (3100 - 2275, about 27%)
- Check if they're concentrated in specific partitions/offsets
- Investigate timing issues revealed by logging improvement
- Consider if there's a race between commit and next fetch
Next: Analyze logs to find offset gap patterns.
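One possible shape for that analysis, assuming the consumed (partition, offset) pairs have already been extracted from the logs: compute the missing ranges per partition to see whether the losses cluster. `missing_ranges` and its inputs are hypothetical, and the HWM is treated here as exclusive (the offset the next message would receive).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def missing_ranges(consumed: Iterable[Tuple[int, int]],
                   bounds: Dict[int, Tuple[int, int]]) -> Dict[int, List[Tuple[int, int]]]:
    """Return inclusive (first, last) ranges of unconsumed offsets per partition.

    consumed: (partition, offset) pairs seen by the consumer.
    bounds:   partition -> (start_offset, hwm), with hwm assumed exclusive.
    """
    seen = defaultdict(set)
    for partition, offset in consumed:
        seen[partition].add(offset)

    result: Dict[int, List[Tuple[int, int]]] = {}
    for partition, (start, hwm) in bounds.items():
        ranges: List[Tuple[int, int]] = []
        run_start = None  # first offset of the current run of missing offsets
        for off in range(start, hwm):
            if off in seen[partition]:
                if run_start is not None:
                    ranges.append((run_start, off - 1))
                    run_start = None
            elif run_start is None:
                run_start = off
        if run_start is not None:
            ranges.append((run_start, hwm - 1))
        result[partition] = ranges
    return result
```

Long contiguous ranges would point at whole fetch responses being dropped, while scattered single offsets would point more toward per-record handling.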