Address PR feedback with minor optimizations:
- Add MaxLoggedErrorDetails constant (replaces magic number 10)
- Pre-allocate slices and maps in processRetryBatch for efficiency
- Improve log message formatting to use constant
These changes improve code maintainability and runtime performance
without altering functionality.
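A minimal sketch of the two changes, assuming names other than MaxLoggedErrorDetails are illustrative:

```go
package main

import "fmt"

// MaxLoggedErrorDetails caps how many individual error details are logged
// per batch, replacing the previous magic number 10.
const MaxLoggedErrorDetails = 10

// processRetryBatch is a hypothetical shape of the method: slices and maps
// are pre-allocated to the batch size to avoid repeated reallocation.
func processRetryBatch(fileIds []string) {
	// Pre-allocate with the batch size instead of growing incrementally.
	succeeded := make([]string, 0, len(fileIds))
	failed := make(map[string]error, len(fileIds))

	for _, id := range fileIds {
		// ... attempt deletion; record into succeeded or failed ...
		succeeded = append(succeeded, id)
	}

	// Log at most MaxLoggedErrorDetails individual errors.
	logged := 0
	for id, err := range failed {
		if logged >= MaxLoggedErrorDetails {
			fmt.Printf("... and %d more errors\n", len(failed)-logged)
			break
		}
		fmt.Printf("delete %s failed: %v\n", id, err)
		logged++
	}
	_ = succeeded
}

func main() {
	processRetryBatch([]string{"3,01637037d6"})
}
```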
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Extract large callback functions into dedicated private methods to
improve code organization and maintainability.
Changes:
1. Extract processDeletionBatch method
- Handles deletion of a batch of file IDs
- Classifies errors (success, not found, retryable, permanent)
- Manages retry queue additions
- Consolidates logging logic
2. Extract processRetryBatch method
- Handles retry attempts for previously failed deletions
- Processes retry results and updates queue
- Symmetric to processDeletionBatch for consistency
Benefits:
- Main loop functions (loopProcessingDeletion, loopProcessingDeletionRetry)
are now concise and focused on orchestration
- Business logic is separated into testable methods
- Reduced nesting depth improves readability
- Easier to understand control flow at a glance
- Better separation of concerns
The refactored methods follow the single responsibility principle,
making the codebase more maintainable and easier to extend.
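A toy sketch of the resulting shape (the Filer type and batch plumbing here are stand-ins, not the real signatures):

```go
package main

import "fmt"

// Filer is a stand-in type so the sketch compiles; the real methods hang
// off the filer's deletion processor.
type Filer struct {
	deleted []string
}

// loopProcessingDeletion stays focused on orchestration: pull a batch,
// delegate to the extracted method, repeat.
func (f *Filer) loopProcessingDeletion(batches [][]string) {
	for _, batch := range batches {
		f.processDeletionBatch(batch)
	}
}

// processDeletionBatch carries the extracted business logic: per-file
// deletion, error classification, retry-queue additions, and logging.
func (f *Filer) processDeletionBatch(fileIds []string) {
	for _, id := range fileIds {
		// ... delete id, classify the error, maybe enqueue a retry ...
		f.deleted = append(f.deleted, id)
	}
}

func main() {
	f := &Filer{}
	f.loopProcessingDeletion([][]string{{"a", "b"}, {"c"}})
	fmt.Println(f.deleted)
}
```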
1. Replace interface{} with any in heap methods
- Addresses modern Go style (Go 1.18+)
- Improves code readability
2. Enhance isRetryableError documentation
- Acknowledge string matching brittleness
- Add comprehensive TODO for future improvements:
* Use HTTP status codes (503, 429, etc.)
* Implement structured error types with errors.Is/As
* Extract gRPC status codes
* Add error wrapping for better context
- Document each error pattern with context
- Add defensive check for empty error strings
Current implementation remains pragmatic for initial release while
documenting a clear path for future robustness improvements. String
matching is acceptable for now but should be replaced with structured
error checking when refactoring the deletion pipeline.
Replace map-based retry queue with a min-heap for better scalability
and deterministic ordering.
Performance improvements:
- GetReadyItems: O(N) → O(K log N) where K is items retrieved
- AddOrUpdate: O(1) → O(log N) (acceptable trade-off)
- Early exit when checking ready items (heap top is earliest)
- No full iteration over all items while holding lock
Benefits:
- Deterministic processing order (earliest NextRetryAt first)
- Better scalability for large retry queues (thousands of items)
- Reduced lock contention duration
- Memory efficient (no separate slice reconstruction)
Implementation:
- Min-heap ordered by NextRetryAt using container/heap
- Dual index: heap for ordering + map for O(1) FileId lookups
- heap.Fix() used when updating existing items
- Comprehensive complexity documentation in comments
This addresses the performance bottleneck identified in GetReadyItems
where iterating over the entire map with a write lock could block
other goroutines in high-failure scenarios.
Replace hardcoded values with package-level constants for better
maintainability:
- DeletionRetryPollInterval (1 minute): interval for checking retry queue
- DeletionRetryBatchSize (1000): max items to process per iteration
This improves code readability and makes configuration changes easier.
Implement a retry queue with exponential backoff for handling transient
deletion failures, particularly when volumes are temporarily read-only.
Key features:
- Automatic retry for retryable errors (read-only volumes, network issues)
- Exponential backoff: 5min → 10min → 20min → ... (max 6 hours)
- Maximum 10 retry attempts per file before giving up
- Separate goroutine processing retry queue every minute
- Map-based retry queue for O(1) lookups and deletions
- Enhanced logging with retry/permanent error classification
- Consistent error detail limiting (max 10 total errors logged)
- Graceful shutdown support with quit channel for both processors
This addresses the issue where file deletions fail when volumes are
temporarily read-only (tiered volumes, maintenance, etc.) and these
deletions were previously lost.
* Modify batch deletion operations to return individual error results
Batch deletion operations now return per-file error results instead of one aggregated error, enabling better tracking of which specific files failed to delete and helping reduce orphan file issues.
* Simplified logging logic
* Optimized nested loop
* handles the edge case where the RPC succeeds but connection cleanup fails
* simplify
* simplify
* ignore 'not found' errors here