* filer: async empty folder cleanup via metadata events
Implements asynchronous empty folder cleanup when files are deleted in S3.
Key changes:
1. EmptyFolderCleaner - New component that handles folder cleanup:
- Uses consistent hashing (LockRing) to determine folder ownership
- Each filer owns specific folders, avoiding duplicate cleanup work
- Debounces delete events (10s delay) to batch multiple deletes
- Caches rough folder counts to skip unnecessary checks
- Cancels pending cleanup when new files are created
- Handles both file and subdirectory deletions
2. Integration with metadata events:
- Listens to both local and remote filer metadata events
- Processes create/delete/rename events to track folder state
- Only processes folders under /buckets/<bucket>/...
3. Removed synchronous empty folder cleanup from S3 handlers:
- DeleteObjectHandler no longer calls DoDeleteEmptyParentDirectories
- DeleteMultipleObjectsHandler no longer tracks/cleans directories
- Cleanup now happens asynchronously via metadata events
Benefits:
- Non-blocking: S3 delete requests return immediately
- Coordinated: Only one filer (the owner) cleans each folder
- Efficient: Batching and caching reduce unnecessary checks
- Event-driven: Folder deletion triggers parent folder check automatically
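The ownership rule can be sketched as a small consistent-hash ring. This is an illustration only: the `ring` type, `hashKey`, `ownerOf`, the FNV hash, and the filer addresses are hypothetical, and LockRing's actual interface is not shown in this message.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ring is a minimal consistent-hash ring (hypothetical, not LockRing itself).
// Each filer address is hashed to a point; a folder is owned by the first
// filer point at or after the folder's own hash, wrapping around at the end.
type ring struct {
	points []uint32          // sorted hash points
	owner  map[uint32]string // hash point -> filer address
}

func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func newRing(filers []string) *ring {
	r := &ring{owner: make(map[uint32]string)}
	for _, f := range filers {
		p := hashKey(f)
		r.points = append(r.points, p)
		r.owner[p] = f
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// ownerOf returns the filer responsible for cleaning up the given folder,
// or "" when the ring is empty.
func (r *ring) ownerOf(folder string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := hashKey(folder)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.owner[r.points[i]]
}

func main() {
	r := newRing([]string{"filer-a:8888", "filer-b:8888", "filer-c:8888"})
	self := "filer-b:8888"
	folder := "/buckets/photos/2024/01"
	if owner := r.ownerOf(folder); owner == self {
		fmt.Println("this filer owns", folder)
	} else {
		fmt.Println("skip, owned by", owner)
	}
}
```

Every filer evaluates the same function over the same member list, so exactly one of them decides it is the owner and the others skip the folder.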
* filer: add CleanupQueue data structure for deduplicated folder cleanup
CleanupQueue uses a linked list for FIFO ordering and a hashmap for O(1)
deduplication. Processing is triggered when:
- Queue size reaches maxSize (default 1000), OR
- Oldest item exceeds maxAge (default 10 minutes)
Key features:
- O(1) Add, Remove, Pop, Contains operations
- Duplicate folders are ignored (keeps original position/time)
- Testable with injectable time function
- Thread-safe with mutex protection
* filer: use CleanupQueue for empty folder cleanup
Replace timer-per-folder approach with queue-based processing:
- Use CleanupQueue for deduplication and ordered processing
- Process queue when full (1000 items) or oldest item exceeds 10 minutes
- Background processor checks queue every 10 seconds
- Remove from queue on create events to cancel pending cleanup
Benefits:
- Bounded memory: queue has max size, not unlimited timers
- Efficient: O(1) add/remove/contains operations
- Batch processing: handle many folders efficiently
- Better for high-volume delete scenarios
* filer: CleanupQueue.Add moves duplicate to back with updated time
When adding a folder that already exists in the queue:
- Remove it from its current position
- Add it to the back of the queue
- Update the queue time to current time
This ensures that folders with recent delete activity are processed
later, giving more time for additional deletes to occur.
* filer: CleanupQueue uses event time and inserts in sorted order
Changes:
- Add() now takes eventTime parameter instead of using current time
- Insert items in time-sorted order (oldest at front) to handle out-of-order events
- When updating duplicate with newer time, reposition to maintain sort order
- Ignore updates with older time (keep existing later time)
This ensures proper ordering when processing events from distributed filers
where event arrival order may not match event occurrence order.
* filer: remove unused CleanupQueue functions (SetNowFunc, GetAll)
Removed test-only functions:
- SetNowFunc: tests now use real time with past event times
- GetAll: tests now use Pop() to verify order
Kept functions used in production:
- Peek: used in filer_notify_read.go
- OldestAge: used in empty_folder_cleaner.go logging
* filer: initialize cache entry on first delete/create event
Previously, roughCount was only updated if the cache entry already
existed, but entries were only created during executeCleanup. This
meant delete/create events before the first cleanup didn't track
the count.
Now create the cache entry on first event, so roughCount properly
tracks all changes from the start.
* filer: skip adding to cleanup queue if roughCount > 0
If the cached roughCount indicates there are still items in the
folder, don't bother adding it to the cleanup queue. This avoids
unnecessary queue entries and reduces wasted cleanup checks.
* filer: don't create cache entry on create event
Only update roughCount if the folder is already being tracked.
New folders don't need tracking until we see a delete event.
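Taken together, the three roughCount commits above suggest event handling along these lines. This is a sketch under assumptions: the `tracker` type and its methods are hypothetical, a new cache entry is assumed to start at zero, and the count is assumed never to go negative.

```go
package main

import "fmt"

// folderState is a hypothetical cache entry; only roughCount comes from the text.
type folderState struct {
	roughCount int
}

// tracker stands in for the cleaner's cache plus its cleanup queue.
type tracker struct {
	counts map[string]*folderState
	queued map[string]bool
}

func newTracker() *tracker {
	return &tracker{counts: map[string]*folderState{}, queued: map[string]bool{}}
}

// onDelete starts tracking the folder if needed, then queues it for cleanup
// unless the cached count says entries remain.
func (t *tracker) onDelete(folder string) {
	st, ok := t.counts[folder]
	if !ok {
		st = &folderState{} // assumption: a new entry starts at zero
		t.counts[folder] = st
	}
	if st.roughCount > 0 {
		st.roughCount--
	}
	if st.roughCount > 0 {
		return // folder still has entries, skip the queue
	}
	t.queued[folder] = true
}

// onCreate cancels any pending cleanup and bumps the count only for folders
// that are already tracked; it never creates a cache entry.
func (t *tracker) onCreate(folder string) {
	delete(t.queued, folder)
	if st, ok := t.counts[folder]; ok {
		st.roughCount++
	}
}

func main() {
	tr := newTracker()
	tr.onCreate("/buckets/b/x") // untracked, nothing recorded
	tr.onDelete("/buckets/b/x") // now tracked and queued
	tr.onCreate("/buckets/b/x") // count 1, pending cleanup cancelled
	tr.onCreate("/buckets/b/x") // count 2
	tr.onDelete("/buckets/b/x") // count 1, not queued
	fmt.Println(tr.queued["/buckets/b/x"], tr.counts["/buckets/b/x"].roughCount)
}
```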
* filer: move empty folder cleanup to its own package
- Created weed/filer/empty_folder_cleanup package
- Defined FilerOperations interface to break circular dependency
- Added CountDirectoryEntries method to Filer
- Exported IsUnderPath and IsUnderBucketPath helper functions
* filer: make isUnderPath and isUnderBucketPath private
These helpers are only used within the empty_folder_cleanup package.
This document describes the binary layout of the Needle structure as used in SeaweedFS storage, for all supported versions (v1, v2, v3).
A Needle represents a file or data blob stored in a volume file. The layout determines how the Needle is serialized to disk for efficient storage and retrieval.
Common Field Sizes
| Field | Size (bytes) |
| --- | --- |
| Cookie | 4 |
| NeedleId | 8 |
| Size | 4 |
| DataSize | 4 |
| Flags | 1 |
| NameSize | 1 |
| MimeSize | 1 |
| LastModified | 5 |
| Ttl | 2 |
| PairsSize | 2 |
| Checksum | 4 |
| Timestamp | 8 |
Needle Layouts by Version
Version 1
| Offset | Field | Size (bytes) | Description |
| --- | --- | --- | --- |
| 0 | Cookie | 4 | Random number to mitigate brute force lookups |
| 4 | Id | 8 | Needle ID |
| 12 | Size | 4 | Length of Data |
| 16 | Data | N | File data (N = Size) |
| 16+N | Checksum | 4 | CRC32 of Data |
| 20+N | Padding | 0-7 | To align to 8 bytes |
Version 2
| Offset | Field | Size (bytes) | Description |
| --- | --- | --- | --- |
| 0 | Cookie | 4 | Random number |
| 4 | Id | 8 | Needle ID |
| 12 | Size | 4 | Total size of the following fields |
| 16 | DataSize | 4 | Length of Data (N) |
| 20 | Data | N | File data |
| 20+N | Flags | 1 | Bit flags |
| 21+N | NameSize | 1 (opt) | Optional, if present |
| 22+N | Name | M (opt) | Optional, if present (M = NameSize) |
| ... | MimeSize | 1 (opt) | Optional, if present |
| ... | Mime | K (opt) | Optional, if present (K = MimeSize) |
| ... | LastModified | 5 (opt) | Optional, if present |
| ... | Ttl | 2 (opt) | Optional, if present |
| ... | PairsSize | 2 (opt) | Optional, if present |
| ... | Pairs | P (opt) | Optional, if present (P = PairsSize) |
| ... | Checksum | 4 | CRC32 |
| ... | Padding | 0-7 | To align to 8 bytes |
Version 3
| Offset | Field | Size (bytes) | Description |
| --- | --- | --- | --- |
| 0 | Cookie | 4 | Random number |
| 4 | Id | 8 | Needle ID |
| 12 | Size | 4 | Total size of the following fields |
| 16 | DataSize | 4 | Length of Data (N) |
| 20 | Data | N | File data |
| 20+N | Flags | 1 | Bit flags |
| 21+N | NameSize | 1 (opt) | Optional, if present |
| 22+N | Name | M (opt) | Optional, if present (M = NameSize) |
| ... | MimeSize | 1 (opt) | Optional, if present |
| ... | Mime | K (opt) | Optional, if present (K = MimeSize) |
| ... | LastModified | 5 (opt) | Optional, if present |
| ... | Ttl | 2 (opt) | Optional, if present |
| ... | PairsSize | 2 (opt) | Optional, if present |
| ... | Pairs | P (opt) | Optional, if present (P = PairsSize) |
| ... | Checksum | 4 | CRC32 |
| ... | Timestamp | 8 | Append time in nanoseconds |
| ... | Padding | 0-7 | To align to 8 bytes |
Offsets marked with ... depend on the presence and size of previous optional fields.
Fields marked (opt) are optional and only present if the corresponding size or flag is non-zero.
N = DataSize, M = NameSize, K = MimeSize, P = PairsSize.
Field Explanations
Cookie: 4 bytes, random value for security.
Id: 8 bytes, unique identifier for the Needle.
Size: 4 bytes. In v1, the length of Data; in v2 and v3, the total size of the fields from DataSize through Pairs. It excludes the 16-byte header (Cookie, Id, Size), checksum, timestamp, and padding.
Checksum: 4 bytes, CRC32 checksum of the Needle data.
Timestamp: 8 bytes, append time (only in v3).
Padding: 0-7 bytes, to align the total Needle size to 8 bytes.
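The relationship between Size and the space a Needle occupies on disk can be worked out from these fields. The helper below is a sketch that follows the sizes stated in this document; `diskLength` and the constant names are hypothetical.

```go
package main

import "fmt"

const (
	headerSize    = 16 // Cookie (4) + Id (8) + Size (4)
	checksumSize  = 4
	timestampSize = 8 // v3 only
	alignment     = 8
)

// diskLength returns the total bytes a Needle occupies for a given Size
// field value: header + Size + checksum (+ timestamp in v3), rounded up to
// the next multiple of 8 with 0-7 padding bytes.
func diskLength(size uint32, version int) int64 {
	n := int64(headerSize) + int64(size) + checksumSize
	if version == 3 {
		n += timestampSize
	}
	if rem := n % alignment; rem != 0 {
		n += alignment - rem
	}
	return n
}

func main() {
	for _, v := range []int{1, 2, 3} {
		fmt.Printf("v%d, Size=5: %d bytes\n", v, diskLength(5, v))
	}
}
```

For Size = 5, v1 and v2 give 25 bytes before padding and 32 after; v3 gives 33 before padding and 40 after.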
Version Comparison Table
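| Field | v1 | v2 | v3 |
| --- | --- | --- | --- |
| Cookie | ✔ | ✔ | ✔ |
| Id | ✔ | ✔ | ✔ |
| Size | ✔ | ✔ | ✔ |
| DataSize | | ✔ | ✔ |
| Data | ✔ | ✔ | ✔ |
| Flags | | ✔ | ✔ |
| NameSize/Name | | ✔ | ✔ |
| MimeSize/Mime | | ✔ | ✔ |
| LastModified | | ✔ | ✔ |
| Ttl | | ✔ | ✔ |
| PairsSize/Pairs | | ✔ | ✔ |
| Checksum | ✔ | ✔ | ✔ |
| Timestamp | | | ✔ |
| Padding | ✔ | ✔ | ✔ |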
Flags Field Details
The Flags field (present in v2 and v3) is a bitmask that encodes several boolean properties of the Needle. Each bit has a specific meaning:
| Bit Value | Name | Meaning |
| --- | --- | --- |
| 0x01 | FlagIsCompressed | Data is compressed (isCompressed) |
| 0x02 | FlagHasName | Name field is present (NameSize/Name) |
| 0x04 | FlagHasMime | Mime field is present (MimeSize/Mime) |
| 0x08 | FlagHasLastModifiedDate | LastModified field is present |
| 0x10 | FlagHasTtl | Ttl field is present |
| 0x20 | FlagHasPairs | Pairs field is present (PairsSize/Pairs) |
| 0x80 | FlagIsChunkManifest | Data is a chunk manifest (for large files) |
If a flag is set, the corresponding field(s) will appear in the Needle layout at the appropriate position.
The Flags field is always present in v2 and v3, immediately after the Data field.
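A short sketch of testing the bitmask. The constant names and values come from the table above; the `describeFlags` helper and the `flagNames` list are hypothetical.

```go
package main

import "fmt"

// Flag values as listed in the Flags table.
const (
	FlagIsCompressed        byte = 0x01
	FlagHasName             byte = 0x02
	FlagHasMime             byte = 0x04
	FlagHasLastModifiedDate byte = 0x08
	FlagHasTtl              byte = 0x10
	FlagHasPairs            byte = 0x20
	FlagIsChunkManifest     byte = 0x80
)

var flagNames = []struct {
	bit  byte
	name string
}{
	{FlagIsCompressed, "FlagIsCompressed"},
	{FlagHasName, "FlagHasName"},
	{FlagHasMime, "FlagHasMime"},
	{FlagHasLastModifiedDate, "FlagHasLastModifiedDate"},
	{FlagHasTtl, "FlagHasTtl"},
	{FlagHasPairs, "FlagHasPairs"},
	{FlagIsChunkManifest, "FlagIsChunkManifest"},
}

// describeFlags lists the names of all flags set in the given Flags byte,
// in ascending bit order. Bits without an assigned meaning are ignored.
func describeFlags(flags byte) []string {
	var set []string
	for _, f := range flagNames {
		if flags&f.bit != 0 {
			set = append(set, f.name)
		}
	}
	return set
}

func main() {
	fmt.Println(describeFlags(FlagHasName | FlagHasMime))
}
```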
Optional Fields
Fields marked as optional in the layout tables are only present if the corresponding flag in the Flags field is set (except for Name/Mime/Pairs, which also depend on their size fields being non-zero).
The order of optional fields is fixed and matches the order of their flags.
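Because the order is fixed, the v2/v3 body (the Size bytes that follow the 16-byte header) can be decoded with one forward pass. The sketch below assumes exactly the layout in the tables above; `reader`, `parseBody`, and the `body` struct are hypothetical names, LastModified is decoded as a plain big-endian integer without interpreting its unit, and Ttl is kept as its raw two bytes.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	flagHasName             = 0x02
	flagHasMime             = 0x04
	flagHasLastModifiedDate = 0x08
	flagHasTtl              = 0x10
	flagHasPairs            = 0x20
)

var errShort = errors.New("needle body too short")

// reader consumes a byte slice front to back and remembers the first error.
type reader struct {
	buf []byte
	err error
}

// take returns the next n bytes, or nil once any read has failed.
func (r *reader) take(n int) []byte {
	if r.err != nil || n < 0 || n > len(r.buf) {
		r.err = errShort
		return nil
	}
	out := r.buf[:n]
	r.buf = r.buf[n:]
	return out
}

// beUint reads an n-byte big-endian unsigned integer (0 on error).
func (r *reader) beUint(n int) uint64 {
	var v uint64
	for _, c := range r.take(n) {
		v = v<<8 | uint64(c)
	}
	return v
}

type body struct {
	Data, Name, Mime, Pairs []byte
	Flags                   byte
	LastModified            uint64 // 5 bytes on disk
	Ttl                     uint16 // raw 2 bytes
}

// parseBody decodes DataSize, Data, Flags, then each optional field in flag order.
func parseBody(b []byte) (*body, error) {
	r := &reader{buf: b}
	var out body
	out.Data = r.take(int(r.beUint(4)))
	out.Flags = byte(r.beUint(1))
	if out.Flags&flagHasName != 0 {
		out.Name = r.take(int(r.beUint(1)))
	}
	if out.Flags&flagHasMime != 0 {
		out.Mime = r.take(int(r.beUint(1)))
	}
	if out.Flags&flagHasLastModifiedDate != 0 {
		out.LastModified = r.beUint(5)
	}
	if out.Flags&flagHasTtl != 0 {
		out.Ttl = uint16(r.beUint(2))
	}
	if out.Flags&flagHasPairs != 0 {
		out.Pairs = r.take(int(r.beUint(2)))
	}
	if r.err != nil {
		return nil, r.err
	}
	return &out, nil
}

func main() {
	raw := []byte{0, 0, 0, 5}
	raw = append(raw, "hello"...)
	raw = append(raw, flagHasName|flagHasMime, 5)
	raw = append(raw, "a.txt"...)
	raw = append(raw, 10)
	raw = append(raw, "text/plain"...)
	parsed, err := parseBody(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s %s %s\n", parsed.Data, parsed.Name, parsed.Mime)
}
```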
Special Notes
isCompressed: If set, the Data field is compressed (typically using gzip). This is indicated by the lowest bit (0x01) in the Flags byte.
isChunkManifest: If set, the Data field contains a manifest describing chunks of a large file, not raw file data.
All multi-byte fields are stored in big-endian order.
Padding is always added at the end to align the total Needle size to 8 bytes.
N = DataSize, M = NameSize, K = MimeSize, P = PairsSize in the layout tables above.
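To illustrate the big-endian note, the 16-byte header shared by all versions can be decoded with `encoding/binary`. The `header` struct and `parseHeader` are hypothetical names used only for this sketch.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// header holds the three fields that start every Needle, in all versions.
type header struct {
	Cookie uint32
	Id     uint64
	Size   uint32
}

// parseHeader decodes Cookie (offset 0), Id (offset 4) and Size (offset 12)
// from the first 16 bytes of a record, all in big-endian order.
func parseHeader(b []byte) (header, error) {
	if len(b) < 16 {
		return header{}, errors.New("need at least 16 bytes for a needle header")
	}
	return header{
		Cookie: binary.BigEndian.Uint32(b[0:4]),
		Id:     binary.BigEndian.Uint64(b[4:12]),
		Size:   binary.BigEndian.Uint32(b[12:16]),
	}, nil
}

func main() {
	raw := []byte{0x12, 0x34, 0x56, 0x78, 0, 0, 0, 0, 0, 0, 0, 42, 0, 0, 0, 5}
	h, err := parseHeader(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("cookie=%#x id=%d size=%d\n", h.Cookie, h.Id, h.Size)
}
```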
For more details, see the implementation in the corresponding Go files in this directory.