* filer: async empty folder cleanup via metadata events
Implements asynchronous empty folder cleanup when files are deleted in S3.
Key changes:
1. EmptyFolderCleaner - New component that handles folder cleanup:
- Uses consistent hashing (LockRing) to determine folder ownership
- Each filer owns specific folders, avoiding duplicate cleanup work
- Debounces delete events (10s delay) to batch multiple deletes
- Caches rough folder counts to skip unnecessary checks
- Cancels pending cleanup when new files are created
- Handles both file and subdirectory deletions
2. Integration with metadata events:
- Listens to both local and remote filer metadata events
- Processes create/delete/rename events to track folder state
- Only processes folders under /buckets/<bucket>/...
3. Removed synchronous empty folder cleanup from S3 handlers:
- DeleteObjectHandler no longer calls DoDeleteEmptyParentDirectories
- DeleteMultipleObjectsHandler no longer tracks/cleans directories
- Cleanup now happens asynchronously via metadata events
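The folder-ownership rule in item 1 can be illustrated with a minimal consistent-hash ring. This is a sketch under stated assumptions, not the actual LockRing: the hashRing type, CRC-32 hashing, the virtual-node count and the filer addresses are all hypothetical. Each filer compares ownerOf(folder) with its own address and only schedules cleanup for folders it owns.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
	"strconv"
)

// hashRing is a hypothetical consistent-hash ring (not SeaweedFS's LockRing).
// Each filer is placed at several virtual points so folders spread evenly.
type hashRing struct {
	points []uint32          // sorted hash positions on the ring
	owners map[uint32]string // ring position -> filer address
}

func newHashRing(filers []string, virtualNodes int) *hashRing {
	r := &hashRing{owners: make(map[uint32]string)}
	for _, f := range filers {
		for i := 0; i < virtualNodes; i++ {
			h := crc32.ChecksumIEEE([]byte(f + "#" + strconv.Itoa(i)))
			r.points = append(r.points, h)
			r.owners[h] = f
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// ownerOf returns the filer responsible for folder: the first ring point at
// or after the folder's hash, wrapping around to the start of the ring.
func (r *hashRing) ownerOf(folder string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := crc32.ChecksumIEEE([]byte(folder))
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owners[r.points[i]]
}

func main() {
	ring := newHashRing([]string{"filer1:8888", "filer2:8888", "filer3:8888"}, 50)
	self := "filer2:8888" // hypothetical address of the local filer
	folder := "/buckets/photos/2024/01"
	if ring.ownerOf(folder) == self {
		fmt.Println("this filer schedules cleanup for", folder)
	} else {
		fmt.Println("another filer owns", folder)
	}
}
```

With consistent hashing, when a filer joins or leaves only the folders adjacent to its ring points change owner, so most folders keep the same owner and no two filers clean the same folder at once.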
Benefits:
- Non-blocking: S3 delete requests return immediately
- Coordinated: Only one filer (the owner) cleans each folder
- Efficient: Batching and caching reduce unnecessary checks
- Event-driven: Folder deletion triggers parent folder check automatically
* filer: add CleanupQueue data structure for deduplicated folder cleanup
CleanupQueue uses a linked list for FIFO ordering and a hashmap for O(1)
deduplication. Processing is triggered when:
- Queue size reaches maxSize (default 1000), OR
- Oldest item exceeds maxAge (default 10 minutes)
Key features:
- O(1) Add, Remove, Pop, Contains operations
- Duplicate folders are ignored (keeps original position/time)
- Testable with injectable time function
- Thread-safe with mutex protection
* filer: use CleanupQueue for empty folder cleanup
Replace timer-per-folder approach with queue-based processing:
- Use CleanupQueue for deduplication and ordered processing
- Process queue when full (1000 items) or oldest item exceeds 10 minutes
- Background processor checks queue every 10 seconds
- Remove from queue on create events to cancel pending cleanup
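The background processor described above can be sketched as a ticker loop. This is a minimal sketch under stated assumptions: the pendingQueue interface, processOnce and runProcessor are hypothetical names, and draining the whole queue once either trigger fires is only one possible batch policy; the commit does not specify how many items each round processes.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// pendingQueue is the subset of a cleanup queue the processor needs (hypothetical).
type pendingQueue interface {
	ShouldProcess() bool
	Pop() (string, bool)
}

// processOnce drains the queue if it is due (full, or oldest item too old),
// handing each folder to cleanup. It returns the number of folders handled.
func processOnce(q pendingQueue, cleanup func(folder string)) int {
	if !q.ShouldProcess() {
		return 0
	}
	n := 0
	for {
		folder, ok := q.Pop()
		if !ok {
			return n
		}
		cleanup(folder)
		n++
	}
}

// runProcessor checks the queue every interval until ctx is cancelled.
func runProcessor(ctx context.Context, q pendingQueue, interval time.Duration, cleanup func(string)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			processOnce(q, cleanup)
		}
	}
}

// sliceQueue is a toy queue used only to demonstrate the processor.
type sliceQueue struct {
	folders []string
	maxSize int
}

func (s *sliceQueue) ShouldProcess() bool { return len(s.folders) >= s.maxSize }

func (s *sliceQueue) Pop() (string, bool) {
	if len(s.folders) == 0 {
		return "", false
	}
	f := s.folders[0]
	s.folders = s.folders[1:]
	return f, true
}

func main() {
	q := &sliceQueue{folders: []string{"/buckets/b/x", "/buckets/b/y"}, maxSize: 2}
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	// The commit uses a 10-second interval; 10ms keeps the demo short.
	runProcessor(ctx, q, 10*time.Millisecond, func(f string) { fmt.Println("check", f) })
}
```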
Benefits:
- Bounded memory: queue has max size, not unlimited timers
- Efficient: O(1) add/remove/contains operations
- Batch processing: handle many folders efficiently
- Better for high-volume delete scenarios
* filer: CleanupQueue.Add moves duplicate to back with updated time
When adding a folder that already exists in the queue:
- Remove it from its current position
- Add it to the back of the queue
- Update the queue time to current time
This ensures that folders with recent delete activity are processed
later, giving more time for additional deletes to occur.
* filer: CleanupQueue uses event time and inserts in sorted order
Changes:
- Add() now takes eventTime parameter instead of using current time
- Insert items in time-sorted order (oldest at front) to handle out-of-order events
- When updating duplicate with newer time, reposition to maintain sort order
- Ignore updates with older time (keep existing later time)
This ensures proper ordering when processing events from distributed filers
where event arrival order may not match event occurrence order.
* filer: remove unused CleanupQueue functions (SetNowFunc, GetAll)
Removed test-only functions:
- SetNowFunc: tests now use real time with past event times
- GetAll: tests now use Pop() to verify order
Kept functions used in production:
- Peek: used in filer_notify_read.go
- OldestAge: used in empty_folder_cleaner.go logging
* filer: initialize cache entry on first delete/create event
Previously, roughCount was only updated if the cache entry already
existed, but entries were only created during executeCleanup. This
meant delete/create events before the first cleanup didn't track
the count.
Now create the cache entry on first event, so roughCount properly
tracks all changes from the start.
* filer: skip adding to cleanup queue if roughCount > 0
If the cached roughCount indicates there are still items in the
folder, don't bother adding it to the cleanup queue. This avoids
unnecessary queue entries and reduces wasted cleanup checks.
* filer: don't create cache entry on create event
Only update roughCount if the folder is already being tracked.
New folders don't need tracking until we see a delete event.
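Taken together, the three roughCount commits above give these event rules: a delete creates the cache entry if needed and only queues the folder when the count does not rule out emptiness; a create cancels any pending cleanup and bumps the count only for folders already tracked. The sketch below is hypothetical: folderState, cleaner, onDelete and onCreate are illustrative names, a plain map stands in for the CleanupQueue, and starting roughCount at zero on the first delete is an assumption.

```go
package main

import "fmt"

// folderState is a hypothetical per-folder cache entry.
type folderState struct {
	roughCount int // approximate number of entries still in the folder
}

// cleaner holds the count cache; a set stands in for the CleanupQueue.
type cleaner struct {
	counts  map[string]*folderState
	pending map[string]bool
}

func newCleaner() *cleaner {
	return &cleaner{counts: make(map[string]*folderState), pending: make(map[string]bool)}
}

// onDelete creates the entry on first sight, decrements the count, and
// queues the folder only if the cache says it may now be empty.
func (c *cleaner) onDelete(folder string) {
	st, ok := c.counts[folder]
	if !ok {
		st = &folderState{}
		c.counts[folder] = st
	}
	if st.roughCount > 0 {
		st.roughCount--
	}
	if st.roughCount > 0 {
		return // still has items: skip queueing
	}
	c.pending[folder] = true
}

// onCreate cancels pending cleanup; untracked folders get no cache entry.
func (c *cleaner) onCreate(folder string) {
	delete(c.pending, folder)
	if st, ok := c.counts[folder]; ok {
		st.roughCount++
	}
}

func main() {
	c := newCleaner()
	dir := "/buckets/b/photos"
	c.onCreate(dir) // untracked: no cache entry is created
	c.onDelete(dir) // first delete creates the entry and queues the folder
	c.onCreate(dir) // cancels the pending cleanup; count becomes 1
	fmt.Println(c.counts[dir].roughCount, c.pending[dir])
}
```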
* filer: move empty folder cleanup to its own package
- Created weed/filer/empty_folder_cleanup package
- Defined FilerOperations interface to break circular dependency
- Added CountDirectoryEntries method to Filer
- Exported IsUnderPath and IsUnderBucketPath helper functions
* filer: make isUnderPath and isUnderBucketPath private
These helpers are only used within the empty_folder_cleanup package.
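A sketch of what these two helpers might check, assuming "under the bucket path" means at least one level below /buckets/<bucket>, so the bucket directory itself is never a cleanup candidate. The exact semantics are an assumption; the sketch compares whole path components so that /bucketsX is not mistaken for a child of /buckets.

```go
package main

import (
	"fmt"
	"strings"
)

// isUnderPath reports whether child is strictly below parent, matching on
// whole path components ("/bucketsX/b" is not under "/buckets").
func isUnderPath(child, parent string) bool {
	parent = strings.TrimSuffix(parent, "/")
	return strings.HasPrefix(child, parent+"/") && len(child) > len(parent)+1
}

// isUnderBucketPath reports whether dir lies inside a bucket, i.e. at least
// one level below bucketsPath/<bucket> (assumed semantics).
func isUnderBucketPath(dir, bucketsPath string) bool {
	if !isUnderPath(dir, bucketsPath) {
		return false
	}
	rest := strings.TrimPrefix(dir, strings.TrimSuffix(bucketsPath, "/")+"/")
	// rest is "<bucket>/<sub...>"; require a non-empty part after the bucket.
	bucket, sub, found := strings.Cut(rest, "/")
	return found && bucket != "" && sub != ""
}

func main() {
	for _, d := range []string{"/buckets/photos", "/buckets/photos/2024", "/bucketsX/a/b"} {
		fmt.Println(d, isUnderBucketPath(d, "/buckets"))
	}
}
```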
1. go get aqwari.net/xml/cmd/xsdgen
2. Add EncodingType element for ListBucketResult in AmazonS3.xsd
3. xsdgen -o s3api_xsd_generated.go -pkg s3api AmazonS3.xsd
4. Remove empty Grantee struct in s3api_xsd_generated.go
5. Remove xmlns: sed s'/http:\/\/s3.amazonaws.com\/doc\/2006-03-01\/\ //' s3api_xsd_generated.go