fix: Don't report long-poll duration as throttle time

PROBLEM:
Consumer test (make consumer-test) shows Sarama being heavily throttled:
  - Every Fetch response includes throttle_time = 100-112ms
  - Sarama interprets this as 'broker is throttling me'
  - Client backs off aggressively
  - Consumer throughput drops to nearly zero

ROOT CAUSE:
In the long-poll logic, when MaxWaitTime is reached with no data available,
the code sets throttleTimeMs = elapsed_time. If MaxWaitTime=100ms, the client
gets throttleTime=100ms in response, which it interprets as rate limiting.

This is WRONG: Kafka's throttle_time is for quota/rate-limiting enforcement,
NOT for reflecting long-poll duration. Clients use it to back off when the
broker is overloaded.

FIX:
- When long-poll times out with no data, set throttleTimeMs = 0
- Only use throttle_time for actual quota enforcement
- Long-poll duration is expected and should NOT trigger client backoff

BEFORE:
- Sarama throttled 100-112ms per fetch
- Consumer throughput near zero
- Test times out (never completes)

AFTER:
- No throttle signals
- Consumer can fetch continuously
- Test completes normally
pull/7329/head
chrislu 4 days ago
parent
commit
8969b45092
weed/mq/kafka/protocol/fetch.go (8 lines changed)

@@ -104,11 +104,9 @@ func (h *Handler) handleFetch(ctx context.Context, correlationID uint32, apiVers
 			}
 		}
 		// If we got here without breaking early, we hit the timeout
-		// Only set throttle time if we're returning without data (true long-poll timeout)
-		if throttleTimeMs == 0 && !hasDataAvailable() {
-			elapsed := time.Since(start)
-			throttleTimeMs = int32(elapsed / time.Millisecond)
-		}
+		// Long-poll timeout is NOT throttling - throttle time should only be used for quota/rate limiting
+		// Do NOT set throttle time based on long-poll duration
+		throttleTimeMs = 0
 	}
 	// Build the response
