Critical bug fixes from PR review:
1. Fix RetryCount reset bug (CRITICAL)
- Problem: When items are re-queued via AddOrUpdate, RetryCount resets to 1, breaking exponential backoff
- Solution: Add RequeueForRetry() method that preserves retry state
- Impact: Ensures proper exponential backoff progression
2. Add overflow protection in backoff calculation
- Check shift amount > 63 to prevent bit-shift overflow
- Additional safety: check if delay <= 0 or > MaxRetryDelay
- Protects against arithmetic overflow in extreme cases
3. Expand retryable error patterns
- Added: timeout, deadline exceeded, context canceled
- Added: lookup error/failed (volume discovery issues)
- Added: connection refused, broken pipe (network errors)
- Added: too many requests, service unavailable (backpressure)
- Added: temporarily unavailable, try again (transient errors)
- Added: i/o timeout (network timeouts)
Benefits:
- Retry mechanism now works correctly across restarts
- More robust against edge cases and overflow
- Better coverage of transient failure scenarios
- Improved resilience in high-failure environments
Addresses feedback from CodeRabbit and Gemini Code Assist in PR #7402.
pull/7402/head
1 changed file with 58 additions and 5 deletions