* Add Spark Iceberg catalog integration tests and CI support
Implement comprehensive integration tests for Spark with SeaweedFS Iceberg REST catalog:
- Basic CRUD operations (Create, Read, Update, Delete) on Iceberg tables
- Namespace (database) management
- Data insertion, querying, and deletion
- Time travel capabilities via snapshot versioning
- Compatible with SeaweedFS S3 and Iceberg REST endpoints
Tests mirror the structure of the existing Trino integration tests but drive
Spark through its SQL interface via PySpark.
Add GitHub Actions CI job for spark-iceberg-catalog-tests in s3-tables-tests.yml
to automatically run Spark integration tests on pull requests.
* fmt
* Fix Spark integration tests - code review feedback
* go mod tidy
* Add go mod tidy step to integration test jobs
Add 'go mod tidy' step before test runs for all integration test jobs:
- s3-tables-tests
- iceberg-catalog-tests
- trino-iceberg-catalog-tests
- spark-iceberg-catalog-tests
This ensures dependencies are clean before running tests.
* Fix remaining Spark operations test issues
Address final code review comments:
Setup & Initialization:
- Add waitForSparkReady() helper function that polls Spark readiness
with backoff instead of a hardcoded 10-second sleep (see the sketch after this list)
- Extract setupSparkTestEnv() helper to reduce boilerplate duplication
between TestSparkCatalogBasicOperations and TestSparkTimeTravel
- Both tests now use helpers for consistent, reliable setup
Assertions & Validation:
- Make setup-critical operations (namespace, table creation, initial
insert) use t.Fatalf instead of t.Errorf to fail fast
- Validate setupSQL output in TestSparkTimeTravel and fail if not
'Setup complete'
- Add validation after second INSERT in TestSparkTimeTravel:
verify row count increased to 2 before time travel test
- Add context to error messages with namespace and tableName params
Code Quality:
- Remove code duplication between test functions
- All critical paths now properly validated
- Consistent error handling throughout
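For illustration, a minimal sketch of the polling approach behind waitForSparkReady(); the probe callback, timings, and package name are assumptions, not the actual test code:

```go
package sparktest

import (
	"context"
	"fmt"
	"time"
)

// waitForSparkReady polls a readiness probe with exponential backoff instead of
// sleeping for a fixed 10 seconds. The probe is assumed to run a trivial
// statement (e.g. "SELECT 1") inside the Spark container.
func waitForSparkReady(ctx context.Context, probe func(context.Context) error) error {
	backoff := time.Second
	for {
		if err := probe(ctx); err == nil {
			return nil // Spark answered, consider it ready
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("spark not ready before deadline: %w", ctx.Err())
		case <-time.After(backoff):
		}
		if backoff < 8*time.Second {
			backoff *= 2 // exponential backoff, capped
		}
	}
}
```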
* Fix go vet errors in S3 Tables tests
Fixes:
1. setup_test.go (Spark):
- Add missing import: github.com/testcontainers/testcontainers-go/wait
- Use wait.ForLog instead of undefined testcontainers.NewLogStrategy
- Remove unused strings import
2. trino_catalog_test.go:
- Use net.JoinHostPort instead of fmt.Sprintf for address formatting
- Properly handles IPv6 addresses by wrapping them in brackets
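A tiny standalone example of the difference (host and port values are made up):

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

func main() {
	host, port := "::1", 19120 // IPv6 loopback, illustrative port
	bad := fmt.Sprintf("%s:%d", host, port)            // "::1:19120" - ambiguous for IPv6
	good := net.JoinHostPort(host, strconv.Itoa(port)) // "[::1]:19120" - brackets added
	fmt.Println(bad, good)
}
```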
* Use weed mini for simpler SeaweedFS startup
Replace the complex multi-process startup (master, volume, filer, s3)
with a single 'weed mini' command that starts all services together.
Benefits:
- Simpler, more reliable startup
- Single weed mini process vs 4 separate processes
- Automatic coordination between components
- Better port management with no manual coordination
Changes:
- Remove separate master, volume, filer process startup
- Use weed mini with -master.port, -filer.port, -s3.port flags
- Keep Iceberg REST as separate service (still needed)
- Increase timeout to 15s for port readiness (weed mini startup)
- Remove volumePort and filerProcess fields from TestEnvironment
- Simplify cleanup to only handle two processes (mini, iceberg rest)
* Clean up dead code and temp directory leaks
Fixes:
1. Remove dead s3Process field and cleanup:
- weed mini bundles S3 gateway, no separate process needed
- Removed s3Process field from TestEnvironment
- Removed unnecessary s3Process cleanup code
2. Fix temp config directory leak:
- Add sparkConfigDir field to TestEnvironment
- Store returned configDir in writeSparkConfig
- Clean up sparkConfigDir in Cleanup() with os.RemoveAll
- Prevents accumulation of temp directories in test runs
3. Simplify Cleanup:
- Now handles only necessary processes (weed mini, iceberg rest)
- Removes both seaweedfsDataDir and sparkConfigDir
- Cleaner shutdown sequence
* Use weed mini's built-in Iceberg REST and fix python binary
Changes:
- Add -s3.port.iceberg flag to weed mini for built-in Iceberg REST Catalog
- Remove separate 'weed server' process for Iceberg REST
- Remove icebergRestProcess field from TestEnvironment
- Simplify Cleanup() to only manage weed mini + Spark
- Add port readiness check for iceberg REST from weed mini
- Set Spark container Cmd to '/bin/sh -c sleep 3600' to keep it running
- Change python to python3 in container.Exec calls
This simplifies to truly one all-in-one weed mini process (master, filer, s3,
iceberg-rest) plus just the Spark container.
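A rough sketch of launching the all-in-one process from a Go test, assuming a weed binary on PATH; only the flags named above are used, and the ports and helper name are illustrative:

```go
package sparktest

import (
	"context"
	"fmt"
	"os/exec"
)

// startWeedMini starts the single all-in-one SeaweedFS process used by the tests.
// The flag names come from the commit description; everything else is assumed.
func startWeedMini(ctx context.Context, masterPort, filerPort, s3Port, icebergPort int) (*exec.Cmd, error) {
	cmd := exec.CommandContext(ctx, "weed", "mini",
		fmt.Sprintf("-master.port=%d", masterPort),
		fmt.Sprintf("-filer.port=%d", filerPort),
		fmt.Sprintf("-s3.port=%d", s3Port),
		fmt.Sprintf("-s3.port.iceberg=%d", icebergPort), // built-in Iceberg REST catalog
	)
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	// caller polls the ports for readiness; Cleanup() later kills this one process
	return cmd, nil
}
```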
* go fmt
* clean up
* Bind on a non-loopback IP for container access, align Iceberg metadata save locations with table locations, and rework Spark time travel to use TIMESTAMP AS OF with safe timestamp extraction.
* shared mini start
* Fixed internal directory creation under /buckets so .objects paths can auto-create without failing bucket-name validation, which restores table bucket object writes
* fix path
Updated table bucket objects to write under `/buckets/<bucket>` and saved Iceberg metadata there,
adjusting the Spark time-travel timestamp to committed_at +1s. Rebuilt the weed binary
(`go install ./weed`) and confirmed passing tests for Spark and Trino with focused test commands.
* Updated table bucket creation to stop creating /buckets/.objects and switched Trino REST warehouse to s3://<bucket> to match Iceberg layout.
* Stabilize S3Tables integration tests
* Fix timestamp extraction and remove dead code in bucketDir
* Use table bucket as warehouse in s3tables tests
* Update trino_blog_operations_test.go
* Add the CASCADE option to handle any remaining table metadata/files in the schema directory
* skip namespace not empty
* Add Trino blog operations test
* Update test/s3tables/catalog_trino/trino_blog_operations_test.go
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* feat: add table bucket path helpers and filer operations
- Add table object root and table location mapping directories
- Implement ensureDirectory, upsertFile, deleteEntryIfExists helpers
- Support table location bucket mapping for S3 access
* feat: manage table bucket object roots on creation/deletion
- Create .objects directory for table buckets on creation
- Clean up table object bucket paths on deletion
- Enable S3 operations on table bucket object roots
* feat: add table location mapping for Iceberg REST
- Track table location bucket mappings when tables are created/updated/deleted
- Enable location-based routing for S3 operations on table data
* feat: route S3 operations to table bucket object roots
- Route table-s3 bucket names to mapped table paths
- Route table buckets to object root directories
- Support table location bucket mapping lookup
* feat: emit table-s3 locations from Iceberg REST
- Generate unique table-s3 bucket names with UUID suffix
- Store table metadata under table bucket paths
- Return table-s3 locations for Trino compatibility
* fix: handle missing directories in S3 list operations
- Propagate ErrNotFound from ListEntries for non-existent directories
- Treat missing directories as empty results for list operations
- Fixes Trino non-empty location checks on table creation
* test: improve Trino CSV parsing for single-value results
- Sanitize Trino output to skip jline warnings
- Handle single-value CSV results without header rows
- Strip quotes from numeric values in tests
* refactor: use bucket path helpers throughout S3 API
- Replace direct bucket path operations with helper functions
- Leverage centralized table bucket routing logic
- Improve maintainability with consistent path resolution
* fix: add table bucket cache and improve filer error handling
- Cache table bucket lookups to reduce filer overhead on repeated checks
- Use filer_pb.CreateEntry and filer_pb.UpdateEntry helpers to check resp.Error
- Fix delete order in handler_bucket_get_list_delete: delete table object before directory
- Make location mapping errors best-effort: log and continue, don't fail API
- Update table location mappings to delete stale prior bucket mappings on update
- Add a 1-second sleep before the timestamp time travel query to ensure timestamps are in the past
- Fix CSV parsing: examine all lines instead of skipping the first; handle single-value rows
* fix: properly handle stale metadata location mapping cleanup
- Capture oldMetadataLocation before mutation in handleUpdateTable
- Update updateTableLocationMapping to accept both old and new locations
- Use passed-in oldMetadataLocation to detect location changes
- Delete stale mapping only when location actually changes
- Pass empty string for oldLocation in handleCreateTable (new tables have no prior mapping)
- Improve logging to show old -> new location transitions
* refactor: cleanup imports and cache design
- Remove unused 'sync' import from bucket_paths.go
- Use filer_pb.UpdateEntry helper in setExtendedAttribute and deleteExtendedAttribute for consistent error handling
- Add dedicated tableBucketCache map[string]bool to BucketRegistry instead of mixing concerns with metadataCache
- Improve cache separation: table buckets cache is now separate from bucket metadata cache
* fix: improve cache invalidation and add transient error handling
Cache invalidation (critical fix):
- Add tableLocationCache to BucketRegistry for location mapping lookups
- Clear tableBucketCache and tableLocationCache in RemoveBucketMetadata
- Prevents stale cache entries when buckets are deleted/recreated
Transient error handling:
- Only cache table bucket lookups when conclusive (found or ErrNotFound)
- Skip caching on transient errors (network, permission, etc.); see the sketch after this list
- Prevents marking real table buckets as non-table due to transient failures
Performance optimization:
- Cache tableLocationDir results to avoid repeated filer RPCs on hot paths
- tableLocationDir now checks cache before making expensive filer lookups
- Cache stores empty string for 'not found' to avoid redundant lookups
Code clarity:
- Add comment to deleteDirectory explaining DeleteEntry response lacks Error field
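A simplified sketch of the conclusive-only caching rule; the type names and not-found sentinel are placeholders, not the actual BucketRegistry code:

```go
package s3sketch

import "errors"

var errNotFound = errors.New("not found") // stand-in for the filer's not-found error

// isTableBucketCached caches only definitive answers, so a transient lookup
// failure is never remembered as "not a table bucket".
func isTableBucketCached(bucket string, cache map[string]bool, lookup func(string) (bool, error)) (bool, error) {
	if v, ok := cache[bucket]; ok {
		return v, nil
	}
	found, err := lookup(bucket)
	switch {
	case err == nil:
		cache[bucket] = found // conclusive result, safe to cache
	case errors.Is(err, errNotFound):
		cache[bucket] = false // conclusive: the entry definitely does not exist
		return false, nil
	default:
		return false, err // transient error: do not cache, let the next call retry
	}
	return found, nil
}
```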
* go fmt
* fix: mirror transient error handling in tableLocationDir and optimize bucketDir
Transient error handling:
- tableLocationDir now only caches definitive results
- Mirrors isTableBucket behavior to prevent treating transient errors as permanent misses
- Improves reliability on flaky systems or during recovery
Performance optimization:
- bucketDir avoids redundant isTableBucket call via bucketRoot
- Directly use s3a.option.BucketsPath for regular buckets
- Saves one cache lookup for every non-table bucket operation
* fix: revert bucketDir optimization to preserve bucketRoot logic
The optimization to directly use BucketsPath bypassed bucketRoot's logic
and caused issues with S3 list operations on delimiter+prefix cases.
Revert to using path.Join(s3a.bucketRoot(bucket), bucket) which properly
handles all bucket types and ensures consistent path resolution across
the codebase.
The slight performance cost of an extra cache lookup is worth the correctness
and consistency benefits.
* feat: move table buckets under /buckets
Add a table-bucket marker attribute, reuse bucket metadata cache for table bucket detection, and update list/validation/UI/test paths to treat table buckets as /buckets entries.
* Fix S3 Tables code review issues
- handler_bucket_create.go: Fix bucket existence check to properly validate
entryResp.Entry before setting s3BucketExists flag (nil Entry should not
indicate existing bucket)
- bucket_paths.go: Add clarifying comment to bucketRoot() explaining unified
buckets root path for all bucket types
- file_browser_data.go: Optimize by extracting table bucket check early to
avoid redundant WithFilerClient call
* Fix list prefix delimiter handling
* Handle list errors conservatively
* Fix Trino FOR TIMESTAMP query - use past timestamp
Iceberg requires the timestamp to be strictly in the past.
Use current_timestamp - interval '1' second instead of current_timestamp.
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* fix multipart etag
* address comments
* clean up
* clean up
* optimization
* address comments
* unquoted etag
* dedup
* upgrade
* clean
* etag
* return quoted tag
* quoted etag
* debug
* s3api: unify ETag retrieval and quoting across handlers
Refactor newListEntry to take *S3ApiServer and use getObjectETag,
and update setResponseHeaders to use the same logic. This ensures
consistent ETags are returned for both listing and direct access.
* s3api: implement ListObjects deduplication for versioned buckets
Handle duplicate entries between the main path and the .versions
directory by prioritizing the latest version when bucket versioning
is enabled.
* s3api: cleanup stale main file entries during versioned uploads
Add explicit deletion of pre-existing "main" files when creating new
versions in versioned buckets. This prevents stale entries from
appearing in bucket listings and ensures consistency.
* s3api: fix cleanup code placement in versioned uploads
Correct the placement of rm calls in completeMultipartUpload and
putVersionedObject to ensure stale main files are properly deleted
during versioned uploads.
* s3api: improve getObjectETag fallback for empty ExtETagKey
Ensure that when ExtETagKey exists but contains an empty value,
the function falls through to MD5/chunk-based calculation instead
of returning an empty string.
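A minimal sketch of the empty-value guard, with a placeholder key name; the real code works on the filer entry and its ETag helpers:

```go
package s3sketch

// objectETag prefers the stored ETag, but a present-yet-empty value must fall
// through to the computed ETag instead of being returned as "".
func objectETag(extended map[string][]byte, computeETag func() string) string {
	const extETagKey = "etag" // placeholder for the real ExtETagKey constant
	if v, ok := extended[extETagKey]; ok && len(v) > 0 {
		return string(v)
	}
	return computeETag() // MD5/chunk-based calculation
}
```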
* s3api: fix test files for new newListEntry signature
Update test files to use the new newListEntry signature where the
first parameter is *S3ApiServer. Created mockS3ApiServer to properly
test owner display name lookup functionality.
* s3api: use filer.ETag for consistent Md5 handling in getEtagFromEntry
Change getEtagFromEntry fallback to use filer.ETag(entry) instead of
filer.ETagChunks to ensure legacy entries with Attributes.Md5 are
handled consistently with the rest of the codebase.
* s3api: optimize list logic and fix conditional header logging
- Hoist bucket versioning check out of per-entry callback to avoid
repeated getVersioningState calls
- Extract appendOrDedup helper function to eliminate duplicate
dedup/append logic across multiple code paths (see the sketch after this list)
- Change If-Match mismatch logging from glog.Errorf to glog.V(3).Infof
and remove DEBUG prefix for consistency
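A sketch of what an appendOrDedup-style helper can look like; the entry type and fields are simplified placeholders:

```go
package s3sketch

import "time"

type listEntry struct {
	Key          string
	LastModified time.Time
}

// appendOrDedup keeps at most one entry per key, preferring the newest one, so
// main-path and .versions-path duplicates collapse into a single listing row.
func appendOrDedup(contents []listEntry, seen map[string]int, e listEntry) []listEntry {
	if i, ok := seen[e.Key]; ok {
		if e.LastModified.After(contents[i].LastModified) {
			contents[i] = e // newer version wins
		}
		return contents
	}
	seen[e.Key] = len(contents)
	return append(contents, e)
}
```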
* s3api: fix test mock to properly initialize IAM accounts
Fixed nil pointer dereference in TestNewListEntryOwnerDisplayName by
directly initializing the IdentityAccessManagement.accounts map in the
test setup. This ensures newListEntry can properly look up account
display names without panicking.
* cleanup
* s3api: remove premature main file cleanup in versioned uploads
Removed incorrect cleanup logic that was deleting main files during
versioned uploads. This was causing test failures because it deleted
objects that should have been preserved as null versions when
versioning was first enabled. The deduplication logic in listing is
sufficient to handle duplicate entries without deleting files during
upload.
* s3api: add empty-value guard to getEtagFromEntry
Added the same empty-value guard used in getObjectETag to prevent
returning quoted empty strings. When ExtETagKey exists but is empty,
the function now falls through to filer.ETag calculation instead of
returning "".
* s3api: fix listing of directory key objects with matching prefix
Revert prefix handling logic to use strings.TrimPrefix instead of
checking HasPrefix with empty string result. This ensures that when a
directory key object exactly matches the prefix (e.g. prefix="dir/",
object="dir/"), it is correctly handled as a regular entry instead of
being skipped or incorrectly processed as a common prefix. Also fixed
missing variable definition.
* s3api: refactor list inline dedup to use appendOrDedup helper
Refactored the inline deduplication logic in listFilerEntries to use the
shared appendOrDedup helper function. This ensures consistent behavior
and reduces code duplication.
* test: fix port allocation race in s3tables integration test
Updated startMiniCluster to find all required ports simultaneously using
findAvailablePorts instead of sequentially. This prevents race conditions
where the OS reallocates a port that was just released, causing multiple
services (e.g. Filer and Volume) to be assigned the same port and fail
to start.
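A sketch of allocating every port at once so none is handed back to the OS before the others are reserved; the function name comes from the commit, the body is an assumption:

```go
package s3sketch

import "net"

// findAvailablePorts reserves n distinct ports by keeping every listener open
// until all of them have been chosen, then releasing them together.
func findAvailablePorts(n int) ([]int, error) {
	listeners := make([]net.Listener, 0, n)
	defer func() {
		for _, l := range listeners {
			l.Close() // release only after every port has been picked
		}
	}()
	ports := make([]int, 0, n)
	for i := 0; i < n; i++ {
		l, err := net.Listen("tcp", "127.0.0.1:0") // port 0: let the OS choose
		if err != nil {
			return nil, err
		}
		listeners = append(listeners, l)
		ports = append(ports, l.Addr().(*net.TCPAddr).Port)
	}
	return ports, nil
}
```

There is still a small window between Close() and the services binding the ports, but holding all listeners until the full set is chosen removes the sequential reuse race described above.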
* fix: S3 listing NextMarker missing intermediate directory component
When listing with nested prefixes like "character/member/", the NextMarker
was incorrectly constructed as "character/res024/" instead of
"character/member/res024/", causing continuation requests to fail.
Root cause: the code at line 331 constructed NextMarker as:
    nextMarker = requestDir + "/" + nextMarker
This worked when nextMarker already contained the full relative path,
but failed when it was just the entry name from the innermost recursion.
Fix: include the prefix component when constructing NextMarker:
    if prefix != "" {
        nextMarker = requestDir + "/" + prefix + "/" + nextMarker
    }
This ensures the full path is always constructed correctly for both:
- CommonPrefix entries (directories)
- Regular entries (files)
Also includes fix for cursor.prefixEndsOnDelimiter state leak that was
causing sibling directories to be incorrectly listed.
* test: add regression tests for NextMarker construction
Add comprehensive unit tests to verify NextMarker is correctly constructed
with nested prefixes. Tests cover:
- Regular entries with nested prefix (character/member/res024)
- CommonPrefix entries (directories)
- Edge cases (no requestDir, no prefix, deeply nested)
These tests ensure the fix prevents regression of the bug where
NextMarker was missing intermediate directory components.
* fix: directory incorrectly listed as object in S3 ListObjects
Regular directories (without MIME type) were only added to CommonPrefixes
when delimiter was exactly '/'. This caused directories to be silently
skipped for other delimiter values.
Changed the condition from 'delimiter == "/"' to 'delimiter != ""' to
ensure directories are correctly added to CommonPrefixes for any delimiter.
Fixes issue where directories like 'data/file.vhd' were being returned as
objects instead of prefixes in ListObjects responses.
* fix: complete the directory listing fix for all delimiters
Address reviewer feedback:
- Changed doListFilerEntries line 549 from 'delimiter != "/"' to 'delimiter == ""'
This ensures directories are yielded to the callback for ANY delimiter, not just "/"
- Parameterized test to verify fix works with multiple delimiters (/, _, :)
The previous fix only addressed line 260 but line 549 was still causing
recursion for non-"/" delimiters, preventing directories from being
added to CommonPrefixes.
* docs: update test comment to reflect multiple delimiters
Address reviewer feedback - clarify that the test verifies behavior
for any non-empty delimiter, not just '/'.
* docs: clarify test comment with delimiter examples
Add specific examples of delimiters ('/', '_', ':') to make it clear
that the test verifies behavior with multiple delimiter types.
* fix: revert line 549 to original logic, only line 260 needed changing
The fix for directories being listed as objects only required changing
line 260 from 'delimiter == "/"' to 'delimiter != ""'.
Line 549 should remain as 'delimiter != "/"' to allow recursion for
delimiters that don't exist in paths (e.g., delimiter=z for paths like
b/a/c). This is correct S3 behavior.
Updated test to only verify delimiter="/" since other delimiters should
recurse into directories to find actual files.
* docs: clarify test scope in directory listing test
* Fix: Eliminate duplicate versioned objects in S3 list operations
- Move versioned directory processing outside of pagination loop to process only once
- Add deduplication during .versions directory collection phase
- Fix directory handling to not add directories to results in recursive mode
- Directly add versioned entries to contents array instead of using callback
Fixes issue where AWS S3 list operations returned duplicated versioned objects
(e.g., 1000 duplicate entries from 4 unique objects). Now correctly returns only
the unique logical entries without duplication.
Verified with:
aws s3api list-objects --endpoint-url http://localhost:8333 --bucket pm-itatiaiucu-01
Returns exactly 4 entries (ClientInfo.xml and Repository from 2 Veeam backup folders)
* Refactor: Process .versions directories immediately when encountered
Instead of collecting .versions directories and processing them after the
pagination loop, process them immediately when encountered during traversal.
Benefits:
- Simpler code: removed versionedDirEntry struct and collection array
- More efficient: no need to store and iterate through collected entries
- Same O(V) complexity but with less memory overhead
- Clearer logic: processing happens in one pass during traversal
Since each .versions directory is only visited once during recursive
traversal (we never traverse into them), there's no need for deferred
processing or deduplication.
* Add comprehensive tests for versioned objects list
- TestListObjectsWithVersionedObjects: Tests listing with various delimiters
- TestVersionedObjectsNoDuplication: Core test validating no 250x duplication
- TestVersionedObjectsWithDeleteMarker: Tests delete marker filtering
- TestVersionedObjectsMaxKeys: Tests pagination with versioned objects
- TestVersionsDirectoryNotTraversed: Ensures .versions never traversed
- Fix existing test signature to match updated doListFilerEntries
* style: Fix formatting alignment in versioned objects tests
* perf: Optimize path extraction using string indexing
Replace multiple strings.Split/Join calls with efficient strings.Index
slicing to extract bucket-relative path from directory string.
Reduces unnecessary allocations and improves performance in versioned
objects listing path construction.
* refactor: Address code review feedback from Gemini Code Assist
1. Fix misleading comment about versioned directory processing location.
Versioned directories are processed immediately in doListFilerEntries,
not deferred to ListObjectsV1Handler.
2. Simplify path extraction logic using explicit bucket path construction
instead of index-based string slicing for better readability and
maintainability.
3. Add clarifying comment to test callback explaining why production logic
is duplicated - necessary because listFilerEntries is not easily testable
with filer client injection.
* fmt
* refactor: Address code review feedback from Copilot
- Fix misleading comment about versioned directory processing location
(note that processing happens within doListFilerEntries, not at top level)
- Add maxKeys validation checks in all test callbacks for consistency
- Add maxKeys check before calling eachEntryFn for versioned objects
- Improve test documentation to clarify testing approach and avoid apologetic tone
* refactor: Address code review feedback from Gemini Code Assist
- Remove redundant maxKeys check before eachEntryFn call on line 541
(the loop already checks maxKeys <= 0 at line 502, ensuring quota exists)
- Fix pagination pattern consistency in all test callbacks
- TestVersionedObjectsNoDuplication: Use cursor.maxKeys <= 0 check and decrement
- TestVersionedObjectsWithDeleteMarker: Use cursor.maxKeys <= 0 check and decrement
- TestVersionsDirectoryNotTraversed: Use cursor.maxKeys <= 0 check and decrement
- Ensures consistent pagination logic across all callbacks matching production behavior
* refactor: Address code review suggestions for code quality
- Adjust log verbosity from V(5) to V(4) for file additions to reduce noise
while maintaining useful debug output during troubleshooting
- Remove unused isRecursive parameter from doListFilerEntries function
signature and all call sites (not used for any logic decisions)
- Consolidate redundant comments about versioned directory handling
to reduce documentation duplication
These changes improve code maintainability and clarity.
* fmt
* refactor: Add pagination test and optimize stream processing
- Add comprehensive test validation to TestVersionedObjectsMaxKeys
that verifies truncation is correctly set when maxKeys is exhausted
with more entries available, ensuring proper pagination state
- Optimize stream processing in doListFilerEntries by using 'break'
instead of 'continue' when quota is exhausted (cursor.maxKeys <= 0)
This avoids receiving and discarding entries from the stream when
we've already reached the requested limit, improving efficiency
* s3api: fix bucket-root listing w/ delimiter
* test: improve mock robustness for bucket-root listing test
- Make testListEntriesStream implement interface explicitly without embedding
- Add prefix filtering logic to testFilerClient to simulate real filer behavior
- Special-case prefix='/' to not filter for bucket root compatibility
- Add required imports for metadata and strings packages
This addresses review comments about test mock brittleness and accuracy.
* test: add clarifying comment for mock filtering behavior
Add detailed comment explaining which ListEntriesRequest parameters
are implemented (Prefix) vs ignored (Limit, StartFromFileName, etc.)
in the test mock to improve code documentation and future maintenance.
* logging
* less logs
* less check if already locked
* fix: achieve single-scan efficiency for S3 versioned object listing
When listing objects in a versioning-enabled bucket, the original code
triggered multiple getEntry calls per versioned object (up to 12 with
retries), causing excessive 'find' operations visible in Grafana and
leading to high memory usage.
This fix achieves single-scan efficiency by caching list metadata
(size, ETag, mtime, owner) directly in the .versions directory:
1. Add new Extended keys for caching list metadata in .versions dir (see the sketch below)
2. Update upload/copy/multipart paths to cache metadata when creating versions
3. Update getLatestVersionEntryFromDirectoryEntry to use cached metadata
(zero getEntry calls when cache is available)
4. Update updateLatestVersionAfterDeletion to maintain cache consistency
Performance improvement for N versioned objects:
- Before: N×1 to N×12 find operations per list request
- After: 0 extra find operations (all metadata from single scan)
This matches the efficiency of normal (non-versioned) object listing.
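A rough sketch of the idea with invented key names; the real Extended keys and entry types live in the s3api package:

```go
package s3sketch

import "strconv"

// Hypothetical extended-attribute keys cached on the .versions directory entry.
const (
	keyLatestSize = "x-latest-size"
	keyLatestETag = "x-latest-etag"
)

// latestVersionFromCache builds a listing row from metadata cached on the
// .versions directory itself, avoiding a per-object getEntry round trip.
func latestVersionFromCache(extended map[string][]byte) (size int64, etag string, ok bool) {
	rawSize, haveSize := extended[keyLatestSize]
	rawETag, haveETag := extended[keyLatestETag]
	if !haveSize || !haveETag {
		return 0, "", false // cache miss: caller falls back to a getEntry lookup
	}
	n, err := strconv.ParseInt(string(rawSize), 10, 64)
	if err != nil {
		return 0, "", false
	}
	return n, string(rawETag), true
}
```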
* Update s3api_object_versioning.go
* s3api: fix ETag handling for versioned objects and simplify delete marker creation
- Add Md5 attribute to synthetic logicalEntry for single-part uploads to ensure
filer.ETag() returns correct value in ListObjects response
- Simplify delete marker creation by initializing entry directly in mkFile callback
- Add bytes and encoding/hex imports for ETag parsing
* s3api: preserve default attributes in delete marker mkFile callback
Only modify Mtime field instead of replacing the entire Attributes struct,
preserving default values like Crtime, FileMode, Uid, and Gid that mkFile
initializes.
* s3api: fix ETag handling in newListEntry for multipart uploads
Prioritize ExtETagKey from Extended attributes before falling back to
filer.ETag(). This properly handles multipart upload ETags (format: md5-parts)
for versioned objects, where the synthetic entry has cached ETag metadata
but no chunks to calculate from.
* s3api: reduce code duplication in delete marker creation
Extract deleteMarkerExtended map to be reused in both mkFile callback
and deleteMarkerEntry construction.
* test: add multipart upload versioning tests for ETag verification
Add tests to verify that multipart uploaded objects in versioned buckets
have correct ETags when listed:
- TestMultipartUploadVersioningListETag: Basic multipart upload with 2 parts
- TestMultipartUploadMultipleVersionsListETag: Multiple multipart versions
- TestMixedSingleAndMultipartVersionsListETag: Mix of single-part and multipart
These tests cover a bug where synthetic entries for versioned objects
didn't include proper ETag handling for multipart uploads.
* test: add delete marker test for multipart uploaded versioned objects
TestMultipartUploadDeleteMarkerListBehavior verifies:
- Delete marker creation hides object from ListObjectsV2
- ListObjectVersions shows both version and delete marker
- Version ETag (multipart format) is preserved after delete marker
- Object can be accessed by version ID after delete marker
- Removing delete marker restores object visibility
* refactor: address code review feedback
- test: use assert.ElementsMatch for ETag verification (more idiomatic)
- s3api: optimize newListEntry ETag logic (check ExtETagKey first)
- s3api: fix edge case in ETag parsing (>= 2 instead of > 2)
* s3api: prevent stale cached metadata and preserve existing extended attrs
- setCachedListMetadata: clear old cached keys before setting new values
to prevent stale data when new version lacks certain fields (e.g., owner)
- createDeleteMarker: merge extended attributes instead of overwriting
to preserve any existing metadata on the entry
* s3api: extract clearCachedVersionMetadata to reduce code duplication
- clearCachedVersionMetadata: clears only metadata fields (size, mtime, etag, owner, deleteMarker)
- clearCachedListMetadata: now reuses clearCachedVersionMetadata + clears ID/filename
- setCachedListMetadata: uses clearCachedVersionMetadata (not clearCachedListMetadata
because caller has already set ID/filename)
* s3api: share timestamp between version entry and cache entry
Capture versionMtime once before mkFile and reuse for both:
- versionEntry.Attributes.Mtime in the mkFile callback
- versionEntryForCache.Attributes.Mtime for list caching
This keeps list vs. HEAD LastModified timestamps aligned.
* s3api: remove amzAccountId variable shadowing in multipart upload
Extract amzAccountId before mkFile callback and reuse in both places,
similar to how versionMtime is handled. Avoids confusion from
redeclaring the same variable.
* fix: prevent path doubling in versioned object listing
Fix path doubling bug in getLatestVersionEntryForListOperation that caused
Velero/Kopia backups to fail when using S3 bucket versioning.
The issue was that when creating logical entries for versioned object listing,
the entry.Name was set to the full object path (e.g., 'kopia/logpaste/kopia.blobcfg')
instead of just the base filename ('kopia.blobcfg'). When this entry was used in
the list callback which combines dir + entry.Name, the paths got doubled:
'/buckets/velero/kopia/logpaste/kopia/logpaste/kopia.blobcfg'
This caused Kopia to fail loading pack indexes with the error:
'unable to load pack indexes despite 10 retries'
The fix uses path.Base(object) to extract only the filename portion, matching
how regular (non-versioned) entries work in the listing callback.
Fixes: GitHub discussion #7573
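For reference, the standard-library behavior the fix relies on (object path taken from the commit message):

```go
package main

import (
	"fmt"
	"path"
)

func main() {
	object := "kopia/logpaste/kopia.blobcfg"
	// entry.Name must be just the base filename; the listing callback joins it with dir.
	fmt.Println(path.Base(object)) // "kopia.blobcfg"
}
```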
* refactor: use path.Base directly in test instead of reimplementing
Address code review feedback to simplify the test by using the standard
library path.Base function directly instead of reimplementing it.
* remove test: unit test for path.Base doesn't add much value
* fix: add pagination to list-object-versions for buckets with >1000 objects
The findVersionsRecursively() function used a fixed limit of 1000 entries
without pagination. This caused objects beyond the first 1000 entries
(sorted alphabetically) to never appear in list-object-versions responses.
Changes:
- Add pagination loop using filer.PaginationSize (1024)
- Use isLast flag from s3a.list() to detect end of pagination
- Track startFrom marker for each page (see the sketch below)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
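A sketch of the pagination pattern described above; the list function signature and entry type are assumptions for illustration:

```go
package s3sketch

type entry struct{ Name string }

// findVersionsPaginated walks a directory one page at a time instead of relying
// on a single fixed-limit listing, so buckets with more than 1000 objects are covered.
func findVersionsPaginated(list func(startFrom string, limit int) ([]entry, bool, error), pageSize int, visit func(entry)) error {
	startFrom := ""
	for {
		entries, isLast, err := list(startFrom, pageSize)
		if err != nil {
			return err
		}
		for _, e := range entries {
			visit(e)
			startFrom = e.Name // the next page starts after the last entry seen
		}
		if isLast || len(entries) == 0 {
			return nil
		}
	}
}
```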
* fix: prevent infinite loop in ListObjects when processing .versions directories
The doListFilerEntries() function processes .versions directories in a
secondary loop after the main entry loop, but failed to update nextMarker.
This caused infinite pagination loops when results were truncated, as the
same .versions directories would be reprocessed on each page.
Bug introduced by: c196d03951
("fix listing object versions (#7006)")
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* Deprecate and ignore the allowEmptyFolder option
The allowEmptyFolder option is no longer functional because:
1. The code that used it was already commented out
2. Empty folder cleanup is now handled asynchronously by EmptyFolderCleaner
The CLI flags are kept for backward compatibility but marked as deprecated
and ignored. This removes:
- S3ApiServerOption.AllowEmptyFolder field
- The actual usage in s3api_object_handlers_list.go
- Helm chart values and template references
- References in test Makefiles and docker-compose files
* Lazy Versioning Check, Conditional SSE Entry Fetch, HEAD Request Optimization
* revert
Reverted the conditional versioning check to always check versioning status
Reverted the conditional SSE entry fetch to always fetch entry metadata
* Lazy Entry Fetch for SSE, Skip Conditional Header Check
* If SSE-KMS headers are present, this is not an SSE-C request (mutually exclusive)
* SSE-C is mutually exclusive with SSE-S3 and SSE-KMS
* refactor
* Removed Premature Mutual Exclusivity Check
* check for the presence of the X-Amz-Server-Side-Encryption header
* not used
* fmt
* directly read write volume servers
* HTTP Range Request Support
* set header
* md5
* copy object
* fix sse
* fmt
* implement sse
* sse continue
* fixed the suffix range bug (bytes=-N for "last N bytes")
* debug logs
* Missing PartsCount Header
* profiling
* url encoding
* test_multipart_get_part
* headers
* debug
* adjust log level
* handle part number
* Update s3api_object_handlers.go
* nil safety
* set ModifiedTsNs
* remove
* nil check
* fix sse header
* same logic as filer
* decode values
* decode ivBase64
* s3: Fix SSE decryption JWT authentication and streaming errors
Critical fix for SSE (Server-Side Encryption) test failures:
1. **JWT Authentication Bug** (Root Cause):
- Changed from GenJwtForFilerServer to GenJwtForVolumeServer
- S3 API now uses correct JWT when directly reading from volume servers
- Matches filer's authentication pattern for direct volume access
- Fixes 'unexpected EOF' and 500 errors in SSE tests
2. **Streaming Error Handling**:
- Added error propagation in getEncryptedStreamFromVolumes goroutine
- Use CloseWithError() to properly communicate stream failures
- Added debug logging for streaming errors
3. **Response Header Timing**:
- Removed premature WriteHeader(http.StatusOK) call
- Let Go's http package write status automatically on first write
- Prevents header lock when errors occur during streaming
4. **Enhanced SSE Decryption Debugging**:
- Added IV/Key validation and logging for SSE-C, SSE-KMS, SSE-S3
- Better error messages for missing or invalid encryption metadata
- Added glog.V(2) debugging for decryption setup
This fixes SSE integration test failures where encrypted objects
could not be retrieved due to volume server authentication failures.
The JWT bug was causing volume servers to reject requests, resulting
in truncated/empty streams (EOF) or internal errors.
* s3: Fix SSE multipart upload metadata preservation
Critical fix for SSE multipart upload test failures (SSE-C and SSE-KMS):
**Root Cause - Incomplete SSE Metadata Copying**:
The old code only tried to copy 'SeaweedFSSSEKMSKey' from the first
part to the completed object. This had TWO bugs:
1. **Wrong Constant Name** (Key Mismatch Bug):
- Storage uses: SeaweedFSSSEKMSKeyHeader = 'X-SeaweedFS-SSE-KMS-Key'
- Old code read: SeaweedFSSSEKMSKey = 'x-seaweedfs-sse-kms-key'
- Result: SSE-KMS metadata was NEVER copied → 500 errors
2. **Missing SSE-C and SSE-S3 Headers**:
- SSE-C requires: IV, Algorithm, KeyMD5
- SSE-S3 requires: encrypted key data + standard headers
- Old code: copied nothing for SSE-C/SSE-S3 → decryption failures
**Fix - Complete SSE Header Preservation**:
Now copies ALL SSE headers from first part to completed object:
- SSE-C: SeaweedFSSSEIV, CustomerAlgorithm, CustomerKeyMD5
- SSE-KMS: SeaweedFSSSEKMSKeyHeader, AwsKmsKeyId, ServerSideEncryption
- SSE-S3: SeaweedFSSSES3Key, ServerSideEncryption
Applied consistently to all 3 code paths:
1. Versioned buckets (creates version file)
2. Suspended versioning (creates main object with null versionId)
3. Non-versioned buckets (creates main object)
**Why This Is Correct**:
The headers copied EXACTLY match what putToFiler stores during part
upload (lines 496-521 in s3api_object_handlers_put.go). This ensures
detectPrimarySSEType() can correctly identify encrypted multipart
objects and trigger inline decryption with proper metadata.
Fixes: TestSSEMultipartUploadIntegration (SSE-C and SSE-KMS subtests)
* s3: Add debug logging for versioning state diagnosis
Temporary debug logging to diagnose test_versioning_obj_plain_null_version_overwrite_suspended failure.
Added glog.V(0) logging to show:
1. setBucketVersioningStatus: when versioning status is changed
2. PutObjectHandler: what versioning state is detected (Enabled/Suspended/none)
3. PutObjectHandler: which code path is taken (putVersionedObject vs putSuspendedVersioningObject)
This will help identify if:
- The versioning status is being set correctly in bucket config
- The cache is returning stale/incorrect versioning state
- The switch statement is correctly routing to suspended vs enabled handlers
* s3: Enhanced versioning state tracing for suspended versioning diagnosis
Added comprehensive logging across the entire versioning state flow:
PutBucketVersioningHandler:
- Log requested status (Enabled/Suspended)
- Log when calling setBucketVersioningStatus
- Log success/failure of status change
setBucketVersioningStatus:
- Log bucket and status being set
- Log when config is updated
- Log completion with error code
updateBucketConfig:
- Log versioning state being written to cache
- Immediate cache verification after Set
- Log if cache verification fails
getVersioningState:
- Log bucket name and state being returned
- Log if object lock forces VersioningEnabled
- Log errors
This will reveal:
1. If PutBucketVersioning(Suspended) is reaching the handler
2. If the cache update succeeds
3. What state getVersioningState returns during PUT
4. Any cache consistency issues
Expected to show why bucket still reports 'Enabled' after 'Suspended' call.
* s3: Add SSE chunk detection debugging for multipart uploads
Added comprehensive logging to diagnose why TestSSEMultipartUploadIntegration fails:
detectPrimarySSEType now logs:
1. Total chunk count and extended header count
2. All extended headers with 'sse'/'SSE'/'encryption' in the name
3. For each chunk: index, SseType, and whether it has metadata
4. Final SSE type counts (SSE-C, SSE-KMS, SSE-S3)
This will reveal if:
- Chunks are missing SSE metadata after multipart completion
- Extended headers are copied correctly from first part
- The SSE detection logic is working correctly
Expected to show if chunks have SseType=0 (none) or proper SSE types set.
* s3: Trace SSE chunk metadata through multipart completion and retrieval
Added end-to-end logging to track SSE chunk metadata lifecycle:
**During Multipart Completion (filer_multipart.go)**:
1. Log finalParts chunks BEFORE mkFile - shows SseType and metadata
2. Log versionEntry.Chunks INSIDE mkFile callback - shows if mkFile preserves SSE info
3. Log success after mkFile completes
**During GET Retrieval (s3api_object_handlers.go)**:
1. Log retrieved entry chunks - shows SseType and metadata after retrieval
2. Log detected SSE type result
This will reveal at which point SSE chunk metadata is lost:
- If finalParts have SSE metadata but versionEntry.Chunks don't → mkFile bug
- If versionEntry.Chunks have SSE metadata but retrieved chunks don't → storage/retrieval bug
- If chunks never have SSE metadata → multipart completion SSE processing bug
Expected to show chunks with SseType=NONE during retrieval even though
they were created with proper SseType during multipart completion.
* s3: Fix SSE-C multipart IV base64 decoding bug
**Critical Bug Found**: SSE-C multipart uploads were failing because:
Root Cause:
- entry.Extended[SeaweedFSSSEIV] stores base64-encoded IV (24 bytes for 16-byte IV)
- SerializeSSECMetadata expects raw IV bytes (16 bytes)
- During multipart completion, we were passing base64 IV directly → serialization error
Error Message:
"Failed to serialize SSE-C metadata for chunk in part X: invalid IV length: expected 16 bytes, got 24"
Fix:
- Base64-decode IV before passing to SerializeSSECMetadata (see the sketch below)
- Added error handling for decode failures
Impact:
- SSE-C multipart uploads will now correctly serialize chunk metadata
- Chunks will have proper SSE metadata for decryption during GET
This fixes the SSE-C subtest of TestSSEMultipartUploadIntegration.
SSE-KMS still has a separate issue (error code 23) being investigated.
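A minimal sketch of the decode step; the stored-IV source and the length check mirror the error message above, everything else is illustrative:

```go
package s3sketch

import (
	"encoding/base64"
	"fmt"
)

// decodeStoredIV converts the base64 IV kept in the entry's extended attributes
// back to the raw 16 bytes that SSE-C metadata serialization expects.
func decodeStoredIV(storedIV []byte) ([]byte, error) {
	iv, err := base64.StdEncoding.DecodeString(string(storedIV))
	if err != nil {
		return nil, fmt.Errorf("decode SSE-C IV: %w", err)
	}
	if len(iv) != 16 {
		return nil, fmt.Errorf("invalid IV length: expected 16 bytes, got %d", len(iv))
	}
	return iv, nil
}
```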
* fixes
* kms sse
* handle retry: if the object is not found in the .versions folder, fall back to reading the normal object
* quick check (no retries) to see if the .versions/ directory exists
* skip retry if object is not found
* explicit update to avoid sync delay
* fix map update lock
* Remove fmt.Printf debug statements
* Fix SSE-KMS multipart base IV fallback to fail instead of regenerating
* fmt
* Fix ACL grants storage logic
* header handling
* nil handling
* range read for sse content
* test range requests for sse objects
* fmt
* unused code
* upload in chunks
* header case
* fix url
* bucket policy error vs bucket not found
* jwt handling
* fmt
* jwt in request header
* Optimize Case-Insensitive Prefix Check
* dead code
* Eliminated Unnecessary Stream Prefetch for Multipart SSE
* range sse
* sse
* refactor
* context
* fmt
* fix type
* fix SSE-C IV Mismatch
* Fix Headers Being Set After WriteHeader
* fix url parsing
* propagate sse headers
* multipart sse-s3
* aws sig v4 authentication
* sse kms
* set content range
* better errors
* Update s3api_object_handlers_copy.go
* Update s3api_object_handlers.go
* Update s3api_object_handlers.go
* avoid magic number
* clean up
* Update s3api_bucket_policy_handlers.go
* fix url parsing
* context
* data and metadata both use background context
* adjust the offset
* SSE Range Request IV Calculation
* adjust logs
* IV relative to offset in each part, not the whole file
* collect logs
* offset
* fix offset
* fix url
* logs
* variable
* jwt
* Multipart ETag semantics: conditionally set object-level Md5 for single-chunk uploads only.
* sse
* adjust IV and offset
* multipart boundaries
* ensures PUT and GET operations return consistent ETags
* Metadata Header Case
* CommonPrefixes Sorting with URL Encoding
* always sort
* remove the extra PathUnescape call
* fix the multipart get part ETag
* the FileChunk is created without setting ModifiedTsNs
* Sort CommonPrefixes lexicographically to match AWS S3 behavior
* set md5 for multipart uploads
* prevents any potential data loss or corruption in the small-file inline storage path
* compiles correctly
* decryptedReader will now be properly closed after use
* Fixed URL encoding and sort order for CommonPrefixes
* Update s3api_object_handlers_list.go
* SSE-x Chunk View Decryption
* Different IV offset calculations for single-part vs multipart objects
* still too verbose in logs
* less logs
* ensure correct conversion
* fix listing
* nil check
* minor fixes
* nil check
* single character delimiter
* optimize
* range on empty object or zero-length
* correct IV based on its position within that part, not its position in the entire object
* adjust offset
* offset
- Fetch the FULL encrypted chunk (not just the range)
- Adjust the IV by PartOffset/ChunkOffset only
- Decrypt the full chunk
- Skip in the DECRYPTED stream to reach OffsetInChunk
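A sketch of that sequence for an AES-CTR layout, assuming a big-endian counter in the IV; the names and counter arithmetic are illustrative, not the production code:

```go
package s3sketch

import "crypto/aes"

// adjustIVAndSkip advances the base IV by whole cipher blocks for the requested
// offset within a part, and reports how many leading bytes of the decrypted
// stream must still be discarded for non-block-aligned offsets.
func adjustIVAndSkip(baseIV []byte, offsetInPart int64) (iv []byte, skip int64) {
	iv = make([]byte, len(baseIV))
	copy(iv, baseIV)
	blocks := uint64(offsetInPart / aes.BlockSize)
	skip = offsetInPart % aes.BlockSize
	// add `blocks` to the big-endian counter held in the IV
	carry := blocks
	for i := len(iv) - 1; i >= 0 && carry > 0; i-- {
		sum := uint64(iv[i]) + (carry & 0xff)
		iv[i] = byte(sum)
		carry = (carry >> 8) + (sum >> 8)
	}
	return iv, skip
}
```

The caller then decrypts the full chunk with the adjusted IV and discards the first skip bytes of the decrypted stream before serving OffsetInChunk.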
* look breaking
* refactor
* error on no content
* handle intra-block byte skipping
* Incomplete HTTP Response Error Handling
* multipart SSE
* Update s3api_object_handlers.go
* address comments
* less logs
* handling directory
* Optimized rejectDirectoryObjectWithoutSlash() to avoid unnecessary lookups
* Revert "handling directory"
This reverts commit 3a335f0ac3.
* constant
* Consolidate nil entry checks in GetObjectHandler
* add range tests
* Consolidate redundant nil entry checks in HeadObjectHandler
* adjust logs
* SSE type
* large files
* large files
Reverted the plain-object range test
* ErrNoEncryptionConfig
* Fixed SSERangeReader Infinite Loop Vulnerability
* Fixed SSE-KMS Multipart ChunkReader HTTP Body Leak
* handle empty directory in S3, added PyArrow tests
* purge unused code
* Update s3_parquet_test.py
* Update requirements.txt
* According to S3 specifications, when both partNumber and Range are present, the Range should apply within the selected part's boundaries, not to the full object.
* handle errors
* errors after writing header
* https
* fix: Wait for volume assignment readiness before running Parquet tests
The test-implicit-dir-with-server test was failing with an Internal Error
because volume assignment was not ready when tests started. This fix adds
a check that attempts a volume assignment and waits for it to succeed
before proceeding with tests.
This ensures that:
1. Volume servers are registered with the master
2. Volume growth is triggered if needed
3. The system can successfully assign volumes for writes
Fixes the timeout issue where boto3 would retry 4 times and fail with
'We encountered an internal error, please try again.'
* sse tests
* store derived IV
* fix: Clean up gRPC ports between tests to prevent port conflicts
The second test (test-implicit-dir-with-server) was failing because the
volume server's gRPC port (18080 = VOLUME_PORT + 10000) was still in use
from the first test. The cleanup code only killed HTTP port processes,
not gRPC port processes.
Added cleanup for gRPC ports in all stop targets:
- Master gRPC: MASTER_PORT + 10000 (19333)
- Volume gRPC: VOLUME_PORT + 10000 (18080)
- Filer gRPC: FILER_PORT + 10000 (18888)
This ensures clean state between test runs in CI.
* add import
* address comments
* docs: Add placeholder documentation files for Parquet test suite
Added three missing documentation files referenced in test/s3/parquet/README.md:
1. TEST_COVERAGE.md - Documents 43 total test cases (17 Go unit tests,
6 Python integration tests, 20 Python end-to-end tests)
2. FINAL_ROOT_CAUSE_ANALYSIS.md - Explains the s3fs compatibility issue
with PyArrow, the implicit directory problem, and how the fix works
3. MINIO_DIRECTORY_HANDLING.md - Compares MinIO's directory handling
approach with SeaweedFS's implementation
Each file contains:
- Title and overview
- Key technical details relevant to the topic
- TODO sections for future expansion
These placeholder files resolve the broken README links and provide
structure for future detailed documentation.
* clean up if metadata operation failed
* Update s3_parquet_test.py
* clean up
* Update Makefile
* Update s3_parquet_test.py
* Update Makefile
* Handle ivSkip for non-block-aligned offsets
* Update README.md
* stop volume server faster
* stop volume server in 1 second
* different IV for each chunk in SSE-S3 and SSE-KMS
* clean up if fails
* testing upload
* error propagation
* fmt
* simplify
* fix copying
* less logs
* endian
* Added marshaling error handling
* handling invalid ranges
* error handling for adding to log buffer
* fix logging
* avoid returning too quickly and ensure proper cleanup
* Activity Tracking for Disk Reads
* Cleanup Unused Parameters
* Activity Tracking for Kafka Publishers
* Proper Test Error Reporting
* refactoring
* less logs
* less logs
* go fmt
* guard it with if entry.Attributes.TtlSec > 0 to match the pattern used elsewhere.
* Handle bucket-default encryption config errors explicitly for multipart
* consistent activity tracking
* obsolete code for s3 on filer read/write handlers
* Update weed/s3api/s3api_object_handlers_list.go
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* fix listing objects
* add more list testing
* address comments
* fix next marker
* fix isTruncated in listing
* fix tests
* address tests
* Update s3api_object_handlers_multipart.go
* fixes
* store json into bucket content, for tagging and cors
* switch bucket metadata from json to proto
* fix
* Update s3api_bucket_config.go
* fix test issue
* fix test_bucket_listv2_delimiter_prefix
* Update cors.go
* skip special characters
* passing listing
* fix test_bucket_list_delimiter_prefix
* Fix the xsd-generated Go code
* fix cors tests
* fix test
* fix test_bucket_list_unordered and test_bucket_listv2_unordered
do not accept the allow-unordered and delimiter parameter combination
* fix test_bucket_list_objects_anonymous and test_bucket_listv2_objects_anonymous
The tests test_bucket_list_objects_anonymous and test_bucket_listv2_objects_anonymous were failing because they try to set bucket ACL to public-read, but SeaweedFS only supported private ACL.
- Updated PutBucketAclHandler to use the existing ExtractAcl function, which already supports all standard S3 canned ACLs
- Replaced the hardcoded check for only the private ACL with proper ACL parsing that handles public-read, public-read-write, authenticated-read, bucket-owner-read, bucket-owner-full-control, etc.
- Added unit tests to verify all standard canned ACLs are accepted
* fix list unordered
The test expects the error code to be InvalidArgument instead of InvalidRequest
* allow anonymous listing (and head, get)
* fix test_bucket_list_maxkeys_invalid
Invalid values: max-keys=blah → Returns ErrInvalidMaxKeys (HTTP 400)
* updating IsPublicRead when parsing acl
* more logs
* CORS Test Fix
* fix test_bucket_list_return_data
* default to private
* fix test_bucket_list_delimiter_not_skip_special
* default no acl
* add debug logging
* more logs
* use basic http client
remove logs also
* fixes
* debug
* Update stats.go
* debugging
* fix anonymous test expectation
anonymous user can read, as configured in s3 json.
* add s3test for sql
* fix test test_bucket_listv2_delimiter_basic for s3
* fix action s3tests
* regen s3 api xsd
* rm minor s3 test test_bucket_listv2_fetchowner_defaultempty
* add docs
* without xmlns
* fix s3test test_bucket_listv2_delimiter_prefix_ends_with_delimiter
* fix list with delimiter and start token
---------
Co-authored-by: Konstantin Lebedev <9497591+kmlebedev@users.noreply.github.co>
* s3 fix get list of dir object key with slash suffix
https://github.com/seaweedfs/seaweedfs/issues/3086
* list only entry dir eq prefix
---------
Co-authored-by: Konstantin Lebedev <9497591+kmlebedev@users.noreply.github.co>
Fix the case where the stored data is sufficient but s3api_object_list_handlers returns fewer entries than the specified limit
Signed-off-by: changlin.shi <changlin.shi@ly.com>
* Revert previous changes
* s3: use cursor to track tree traversal
fix https://github.com/seaweedfs/seaweedfs/issues/3166
* special cases for empty prefix and empty directory
* use constants
* address empty folder
* undo local changes
* fix IsTruncated
* adjust counting directories
* fix cases when prefix is a directory
* s3: handle directory object
works for
aws --endpoint-url http://127.0.0.1:8333/ s3api list-objects-v2 --bucket test --prefix "fakedir"