Find the source identity before checking for collisions, matching
the standalone handler's logic. Previously a non-existent user
renamed to an existing name would get EntityAlreadyExists instead
of NoSuchEntity.
Groups are always dynamic (from filer), never static (from s3.config).
Seeding from iam.groups caused stale deleted groups to persist.
Now only uses config.Groups from the dynamic filer config.
Pass nil newEntry to bucket, IAM, and circuit-breaker handlers for
the source directory during cross-directory moves, so all watchers
can clear caches for the moved-away resource.
When a file is moved out of an IAM directory (e.g., /etc/iam/groups),
the dir variable was overwritten with NewParentPath, causing the
source directory change to be missed. Now also notifies handlers
about the source directory for cross-directory moves.
Previously the merge started with empty group maps, dropping any
static-file groups. Now seeds from existing iam.groups before
overlaying dynamic config, and builds the reverse index after
merging to avoid stale entries from overridden groups.
Move the nil check on identity before accessing identity.Name to
prevent panic. Also refine hasAttachedPolicies to only consider groups
that are enabled and have actual policies attached, so membership in
a no-policy group doesn't incorrectly trigger IAM authorization.
Return ServiceFailure for credential manager errors instead of masking
them as NoSuchEntity. Also switch ListGroupsForUser to use s3cfg.Groups
instead of in-memory reverse index to avoid stale data. Add duplicate
name check to UpdateGroup rename.
Add UpdateGroup action to enable/disable groups and rename groups
via the IAM API. This is a SeaweedFS extension (not in AWS SDK) used
by tests to toggle group disabled status.
Policies created via CreatePolicy through credentialManager are stored
in the credential store, not in s3cfg.Policies (which only has static
config policies). Change AttachGroupPolicy to use credentialManager.GetPolicy()
for policy existence validation.
Reject DeleteGroup when group has attached policies, matching the
existing members check. Also fix GetGroup error handling in
DeletePolicy to only skip ErrGroupNotFound, not all errors.
Merge policy names from user's enabled groups into the IAMIdentity
used for authorization, so group-attached policies are evaluated
alongside user-attached policies.
- Fix XSS vulnerability in groups.templ: replace innerHTML string
concatenation with DOM APIs (createElement/textContent) for rendering
member and policy lists
- Use userGroups reverse index in embedded IAM ListGroupsForUser for
O(1) lookup instead of iterating all groups
- Add buildUserGroupsIndex helper in standalone IAM handlers; use it
in ListGroupsForUser and removeUserFromAllGroups for efficient lookup
- Add note about gRPC store load-modify-save race condition limitation
When a user is deleted, remove them from all groups they belong to.
When a user is renamed, update group membership references. Applied
to both embedded and standalone IAM handlers.
Add groups and userGroups reverse index to IdentityAccessManagement.
Populate both maps during ReplaceS3ApiConfiguration and
MergeS3ApiConfiguration. Modify evaluateIAMPolicies to evaluate
policies from user's enabled groups in addition to user policies.
Update VerifyActionPermission to consider group policies when
checking hasAttachedPolicies.
* s3: use url.PathUnescape for X-Amz-Copy-Source header (#8544)
The X-Amz-Copy-Source header is a URL-encoded path, not a query string.
Using url.QueryUnescape incorrectly converts literal '+' characters to
spaces, which can cause object key mismatches during copy operations.
Switch to url.PathUnescape in CopyObjectHandler, CopyObjectPartHandler,
and pathToBucketObjectAndVersion to correctly handle special characters
like '!', '+', and other RFC 3986 sub-delimiters that S3 clients may
percent-encode (e.g. '!' as %21).
* s3: add path validation to CopyObjectPartHandler
CopyObjectPartHandler was missing the validateTableBucketObjectPath
checks that CopyObjectHandler has, allowing potential path traversal
in the source bucket/object of copy part requests.
* s3: fix case-sensitive HeadersRegexp for copy source routing
The HeadersRegexp for X-Amz-Copy-Source used `%2F` which only matched
uppercase hex encoding. RFC 3986 allows both `%2F` and `%2f`, so
clients sending lowercase percent-encoding would bypass the copy
handler and hit PutObjectHandler instead. Add (?i) flag for
case-insensitive matching.
Also add test coverage for the versionId branch in
pathToBucketObjectAndVersion and for lowercase %2f routing.
* filer: expose metadata events and list snapshots
* mount: invalidate hot directory caches
* mount: read hot directories directly from filer
* mount: add sequenced metadata cache applier
* mount: apply metadata responses through cache applier
* mount: replay snapshot-consistent directory builds
* mount: dedupe self metadata events
* mount: factor directory build cleanup
* mount: replace proto marshal dedup with composite key and ring buffer
The dedup logic was doing a full deterministic proto.Marshal on every
metadata event just to produce a dedup key. Replace with a cheap
composite string key (TsNs|Directory|OldName|NewName).
Also replace the sliding-window slice (which leaked the backing array
unboundedly) with a fixed-size ring buffer that reuses the same array.
* filer: remove mutex and proto.Clone from request-scoped MetadataEventSink
MetadataEventSink is created per-request and only accessed by the
goroutine handling the gRPC call. The mutex and double proto.Clone
(once in Record, once in Last) were unnecessary overhead on every
filer write operation. Store the pointer directly instead.
* mount: skip proto.Clone for caller-owned metadata events
Add ApplyMetadataResponseOwned that takes ownership of the response
without cloning. Local metadata events (mkdir, create, flush, etc.)
are freshly constructed and never shared, so the clone is unnecessary.
* filer: only populate MetadataEvent on successful DeleteEntry
Avoid calling eventSink.Last() on error paths where the sink may
contain a partial event from an intermediate child deletion during
recursive deletes.
* mount: avoid map allocation in collectDirectoryNotifications
Replace the map with a fixed-size array and linear dedup. There are
at most 3 directories to notify (old parent, new parent, new child
if directory), so a 3-element array avoids the heap allocation on
every metadata event.
* mount: fix potential deadlock in enqueueApplyRequest
Release applyStateMu before the blocking channel send. Previously,
if the channel was full (cap 128), the send would block while holding
the mutex, preventing Shutdown from acquiring it to set applyClosed.
* mount: restore signature-based self-event filtering as fast path
Re-add the signature check that was removed when content-based dedup
was introduced. Checking signatures is O(1) on a small slice and
avoids enqueuing and processing events that originated from this
mount instance. The content-based dedup remains as a fallback.
* filer: send snapshotTsNs only in first ListEntries response
The snapshot timestamp is identical for every entry in a single
ListEntries stream. Sending it in every response message wastes
wire bandwidth for large directories. The client already reads
it only from the first response.
* mount: exit read-through mode after successful full directory listing
MarkDirectoryRefreshed was defined but never called, so directories
that entered read-through mode (hot invalidation threshold) stayed
there permanently, hitting the filer on every readdir even when cold.
Call it after a complete read-through listing finishes.
* mount: include event shape and full paths in dedup key
The previous dedup key only used Names, which could collapse distinct
rename targets. Include the event shape (C/D/U/R), source directory,
new parent path, and both entry names so structurally different events
are never treated as duplicates.
* mount: drain pending requests on shutdown in runApplyLoop
After receiving the shutdown sentinel, drain any remaining requests
from applyCh non-blockingly and signal each with errMetaCacheClosed
so callers waiting on req.done are released.
* mount: include IsDirectory in synthetic delete events
metadataDeleteEvent now accepts an isDirectory parameter so the
applier can distinguish directory deletes from file deletes. Rmdir
passes true, Unlink passes false.
* mount: fall back to synthetic event when MetadataEvent is nil
In mknod and mkdir, if the filer response omits MetadataEvent (e.g.
older filer without the field), synthesize an equivalent local
metadata event so the cache is always updated.
* mount: make Flush metadata apply best-effort after successful commit
After filer_pb.CreateEntryWithResponse succeeds, the entry is
persisted. Don't fail the Flush syscall if the local metadata cache
apply fails — log and invalidate the directory cache instead.
Also fall back to a synthetic event when MetadataEvent is nil.
* mount: make Rename metadata apply best-effort
The rename has already succeeded on the filer by the time we apply
the local metadata event. Log failures instead of returning errors
that would be dropped by the caller anyway.
* mount: make saveEntry metadata apply best-effort with fallback
After UpdateEntryWithResponse succeeds, treat local metadata apply
as non-fatal. Log and invalidate the directory cache on failure.
Also fall back to a synthetic event when MetadataEvent is nil.
* filer_pb: preserve snapshotTsNs on error in ReadDirAllEntriesWithSnapshot
Return the snapshot timestamp even when the first page fails, so
callers receive the snapshot boundary when partial data was received.
* filer: send snapshot token for empty directory listings
When no entries are streamed, send a final ListEntriesResponse with
only SnapshotTsNs so clients always receive the snapshot boundary.
* mount: distinguish not-found vs transient errors in lookupEntry
Return fuse.EIO for non-not-found filer errors instead of
unconditionally returning ENOENT, so transient failures don't
masquerade as missing entries.
* mount: make CacheRemoteObject metadata apply best-effort
The file content has already been cached successfully. Don't fail
the read if the local metadata cache update fails.
* mount: use consistent snapshot for readdir in direct mode
Capture the SnapshotTsNs from the first loadDirectoryEntriesDirect
call and store it on the DirectoryHandle. Subsequent batch loads
pass this stored timestamp so all batches use the same snapshot.
Also export DoSeaweedListWithSnapshot so mount can use it directly
with snapshot passthrough.
* filer_pb: fix test fake to send SnapshotTsNs only on first response
Match the server behavior: only the first ListEntriesResponse in a
page carries the snapshot timestamp, subsequent entries leave it zero.
* Fix nil pointer dereference in ListEntries stream consumers
Remove the empty-directory snapshot-only response from ListEntries
that sent a ListEntriesResponse with Entry==nil, which crashed every
raw stream consumer that assumed resp.Entry is always non-nil.
Also add defensive nil checks for resp.Entry in all raw ListEntries
stream consumers across: S3 listing, broker topic lookup, broker
topic config, admin dashboard, topic retention, hybrid message
scanner, Kafka integration, and consumer offset storage.
* Add nil guards for resp.Entry in remaining ListEntries stream consumers
Covers: S3 object lock check, MQ management dashboard (version/
partition/offset loops), and topic retention version loop.
* Make applyLocalMetadataEvent best-effort in Link and Symlink
The filer operations already succeeded; failing the syscall because
the local cache apply failed is wrong. Log a warning and invalidate
the parent directory cache instead.
* Make applyLocalMetadataEvent best-effort in Mkdir/Rmdir/Mknod/Unlink
The filer RPC already committed; don't fail the syscall when the
local metadata cache apply fails. Log a warning and invalidate the
parent directory cache to force a re-fetch on next access.
* flushFileMetadata: add nil-fallback for metadata event and best-effort apply
Synthesize a metadata event when resp.GetMetadataEvent() is nil
(matching doFlush), and make the apply best-effort with cache
invalidation on failure.
* Prevent double-invocation of cleanupBuild in doEnsureVisited
Add a cleanupDone guard so the deferred cleanup and inline error-path
cleanup don't both call DeleteFolderChildren/AbortDirectoryBuild.
* Fix comment: signature check is O(n) not O(1)
* Prevent deferred cleanup after successful CompleteDirectoryBuild
Set cleanupDone before returning from the success path so the
deferred context-cancellation check cannot undo a published build.
* Invalidate parent directory caches on rename metadata apply failure
When applyLocalMetadataEvent fails during rename, invalidate the
source and destination parent directory caches so subsequent accesses
trigger a re-fetch from the filer.
* Add event nil-fallback and cache invalidation to Link and Symlink
Synthesize metadata events when the server doesn't return one, and
invalidate parent directory caches on apply failure.
* Match requested partition when scanning partition directories
Parse the partition range format (NNNN-NNNN) and match against the
requested partition parameter instead of using the first directory.
* Preserve snapshot timestamp across empty directory listings
Initialize actualSnapshotTsNs from the caller-requested value so it
isn't lost when the server returns no entries. Re-add the server-side
snapshot-only response for empty directories (all raw stream consumers
now have nil guards for Entry).
* Fix CreateEntry error wrapping to support errors.Is/errors.As
Use errors.New + %w instead of %v for resp.Error so callers can
unwrap the underlying error.
* Fix object lock pagination: only advance on non-nil entries
Move entriesReceived inside the nil check so nil entries don't
cause repeated ListEntries calls with the same lastFileName.
* Guard Attributes nil check before accessing Mtime in MQ management
* Do not send nil-Entry response for empty directory listings
The snapshot-only ListEntriesResponse (with Entry == nil) for empty
directories breaks consumers that treat any received response as an
entry (Java FilerClient, S3 listing). The Go client-side
DoSeaweedListWithSnapshot already preserves the caller-requested
snapshot via actualSnapshotTsNs initialization, so the server-side
send is unnecessary.
* Fix review findings: subscriber dedup, invalidation normalization, nil guards, shutdown race
- Remove self-signature early-return in processEventFn so all events
flow through the applier (directory-build buffering sees self-originated
events that arrive after a snapshot)
- Normalize NewParentPath in collectEntryInvalidations to avoid duplicate
invalidations when NewParentPath is empty (same-directory update)
- Guard resp.Entry.Attributes for nil in admin_server.go and
topic_retention.go to prevent panics on entries without attributes
- Fix enqueueApplyRequest race with shutdown by using select on both
applyCh and applyDone, preventing sends after the apply loop exits
- Add cleanupDone check to deferred cleanup in meta_cache_init.go for
clarity alongside the existing guard in cleanupBuild
- Add empty directory test case for snapshot consistency
* Propagate authoritative metadata event from CacheRemoteObjectToLocalCluster and generate client-side snapshot for empty directories
- Add metadata_event field to CacheRemoteObjectToLocalClusterResponse
proto so the filer-emitted event is available to callers
- Use WithMetadataEventSink in the server handler to capture the event
from NotifyUpdateEvent and return it on the response
- Update filehandle_read.go to prefer the RPC's metadata event over
a locally fabricated one, falling back to metadataUpdateEvent when
the server doesn't provide one (e.g., older filers)
- Generate a client-side snapshot cutoff in DoSeaweedListWithSnapshot
when the server sends no snapshot (empty directory), so callers like
CompleteDirectoryBuild get a meaningful boundary for filtering
buffered events
* Skip directory notifications for dirs being built to prevent mid-build cache wipe
When a metadata event is buffered during a directory build,
applyMetadataSideEffects was still firing noteDirectoryUpdate for the
building directory. If the directory accumulated enough updates to
become "hot", markDirectoryReadThrough would call DeleteFolderChildren,
wiping entries that EnsureVisited had already inserted. The build would
then complete and mark the directory cached with incomplete data.
Fix by using applyMetadataSideEffectsSkippingBuildingDirs for buffered
events, which suppresses directory notifications for dirs currently in
buildingDirs while still applying entry invalidations.
* Add test for directory notification suppression during active build
TestDirectoryNotificationsSuppressedDuringBuild verifies that metadata
events targeting a directory under active EnsureVisited build do NOT
fire onDirectoryUpdate for that directory. In production, this prevents
markDirectoryReadThrough from calling DeleteFolderChildren mid-build,
which would wipe entries already inserted by the listing.
The test inserts an entry during a build, sends multiple metadata events
for the building directory, asserts no notifications fired for it,
verifies the entry survives, and confirms buffered events are replayed
after CompleteDirectoryBuild.
* Fix create invalidations, build guard, event shape, context, and snapshot error path
- collectEntryInvalidations: invalidate FUSE kernel cache on pure
create events (OldEntry==nil && NewEntry!=nil), not just updates
and deletes
- completeDirectoryBuildNow: only call markCachedFn when an active
build existed (state != nil), preventing an unpopulated directory
from being marked as cached
- Add metadataCreateEvent helper that produces a create-shaped event
(NewEntry only, no OldEntry) and use it in mkdir, mknod, symlink,
and hardlink create fallback paths instead of metadataUpdateEvent
which incorrectly set both OldEntry and NewEntry
- applyMetadataResponseEnqueue: use context.Background() for the
queued mutation so a cancelled caller context cannot abort the
apply loop mid-write
- DoSeaweedListWithSnapshot: move snapshot initialization before
ListEntries call so the error path returns the preserved snapshot
instead of 0
* Fix review findings: test loop, cache race, context safety, snapshot consistency
- Fix build test loop starting at i=1 instead of i=0, missing new-0.txt verification
- Re-check IsDirectoryCached after cache miss to avoid ENOENT race with markDirectoryReadThrough
- Use context.Background() in enqueueAndWait so caller cancellation can't abort build/complete mid-way
- Pass dh.snapshotTsNs in skip-batch loadDirectoryEntriesDirect for snapshot consistency
- Prefer resp.MetadataEvent over fallback in Unlink event derivation
- Add comment on MetadataEventSink.Record single-event assumption
* Fix empty-directory snapshot clock skew and build cancellation race
Empty-directory snapshot: Remove client-side time.Now() synthesis when
the server returns no entries. Instead return snapshotTsNs=0, and in
completeDirectoryBuildNow replay ALL buffered events when snapshot is 0.
This eliminates the clock-skew bug where a client ahead of the filer
would filter out legitimate post-list events.
Build cancellation: Use context.Background() for BeginDirectoryBuild
and CompleteDirectoryBuild calls in doEnsureVisited, so errgroup
cancellation doesn't cause enqueueAndWait to return early and trigger
cleanupBuild while the operation is still queued.
* Add tests for empty-directory build replay and cancellation resilience
TestEmptyDirectoryBuildReplaysAllBufferedEvents: verifies that when
CompleteDirectoryBuild receives snapshotTsNs=0 (empty directory, no
server snapshot), ALL buffered events are replayed regardless of their
TsNs values — no clock-skew-sensitive filtering occurs.
TestBuildCompletionSurvivesCallerCancellation: verifies that once
CompleteDirectoryBuild is enqueued, a cancelled caller context does not
prevent the build from completing. The apply loop runs with
context.Background(), so the directory becomes cached and buffered
events are replayed even when the caller gives up waiting.
* Fix directory subtree cleanup, Link rollback, test robustness
- applyMetadataResponseLocked: when a directory entry is deleted or
moved, call DeleteFolderChildren on the old path so cached descendants
don't leak as stale entries.
- Link: save original HardLinkId/Counter before mutation. If
CreateEntryWithResponse fails after the source was already updated,
rollback the source entry to its original state via UpdateEntry.
- TestBuildCompletionSurvivesCallerCancellation: replace fixed
time.Sleep(50ms) with a deadline-based poll that checks
IsDirectoryCached in a loop, failing only after 2s timeout.
- TestReadDirAllEntriesWithSnapshotEmptyDirectory: assert that
ListEntries was actually invoked on the mock client so the test
exercises the RPC path.
- newMetadataEvent: add early return when both oldEntry and newEntry are
nil to avoid emitting events with empty Directory.
---------
Co-authored-by: Copilot <copilot@github.com>
* request_id: add shared request middleware
* s3err: preserve request ids in responses and logs
* iam: reuse request ids in XML responses
* sts: reuse request ids in XML responses
* request_id: drop legacy header fallback
* request_id: use AWS-style request id format
* iam: fix AWS-compatible XML format for ErrorResponse and field ordering
- ErrorResponse uses bare <RequestId> at root level instead of
<ResponseMetadata> wrapper, matching the AWS IAM error response spec
- Move CommonResponse to last field in success response structs so
<ResponseMetadata> serializes after result elements
- Add randomness to request ID generation to avoid collisions
- Add tests for XML ordering and ErrorResponse format
* iam: remove duplicate error_response_test.go
Test is already covered by responses_test.go.
* address PR review comments
- Guard against typed nil pointers in SetResponseRequestID before
interface assertion (CodeRabbit)
- Use regexp instead of strings.Index in test helpers for extracting
request IDs (Gemini)
* request_id: prevent spoofing, fix nil-error branch, thread reqID to error writers
- Ensure() now always generates a server-side ID, ignoring client-sent
x-amz-request-id headers to prevent request ID spoofing. Uses a
private context key (contextKey{}) instead of the header string.
- writeIamErrorResponse in both iamapi and embedded IAM now accepts
reqID as a parameter instead of calling Ensure() internally, ensuring
a single request ID per request lifecycle.
- The nil-iamError branch in writeIamErrorResponse now writes a 500
Internal Server Error response instead of returning silently.
- Updated tests to set request IDs via context (not headers) and added
tests for spoofing prevention and context reuse.
* sts: add request-id consistency assertions to ActionInBody tests
* test: update admin test to expect server-generated request IDs
The test previously sent a client x-amz-request-id header and expected
it echoed back. Since Ensure() now ignores client headers to prevent
spoofing, update the test to verify the server returns a non-empty
server-generated request ID instead.
* iam: add generic WithRequestID helper alongside reflection-based fallback
Add WithRequestID[T] that uses generics to take the address of a value
type, satisfying the pointer receiver on SetRequestId without reflection.
The existing SetResponseRequestID is kept for the two call sites that
operate on interface{} (from large action switches where the concrete
type varies at runtime). Generics cannot replace reflection there since
Go cannot infer type parameters from interface{}.
* Remove reflection and generics from request ID setting
Call SetRequestId directly on concrete response types in each switch
branch before boxing into interface{}, eliminating the need for
WithRequestID (generics) and SetResponseRequestID (reflection).
* iam: return pointer responses in action dispatch
* Fix IAM error handling consistency and ensure request IDs on all responses
- UpdateUser/CreatePolicy error branches: use writeIamErrorResponse instead
of s3err.WriteErrorResponse to preserve IAM formatting and request ID
- ExecuteAction: accept reqID parameter and generate one if empty, ensuring
every response carries a RequestId regardless of caller
* Clean up inline policies on DeleteUser and UpdateUser rename
DeleteUser: remove InlinePolicies[userName] from policy storage before
removing the identity, so policies are not orphaned.
UpdateUser: move InlinePolicies[userName] to InlinePolicies[newUserName]
when renaming, so GetUserPolicy/DeleteUserPolicy work under the new name.
Both operations persist the updated policies and return an error if
the storage write fails, preventing partial state.
* Places the CommonResponse struct at the end of all IAM responses, rather than the start.
* iam: fix error response request id layout
* iam: add XML ordering regression test
* iam: share request id generation
---------
Co-authored-by: Aaron Segal <aaron.segal@rpsolutions.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
Previously, evaluateIAMPolicies created a new PolicyEngine and re-parsed
the JSON policy document for every policy on every request. This adds a
shared iamPolicyEngine field that caches compiled policies, kept in sync
by PutPolicy, DeletePolicy, and bulk config reload paths.
- PutPolicy deletes the old cache entry before setting the new one, so a
parse failure on update does not leave a stale allow.
- Log warnings when policy compilation fails instead of silently
discarding errors.
- Add test for valid-to-invalid policy update regression.
* s3api: add error code and header constants for GetObjectAttributes
Add ErrInvalidAttributeName error code and header constants
(X-Amz-Object-Attributes, X-Amz-Max-Parts, X-Amz-Part-Number-Marker,
X-Amz-Delete-Marker) needed by the S3 GetObjectAttributes API.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* s3api: implement GetObjectAttributes handler
Add GetObjectAttributesHandler that returns selected object metadata
(ETag, Checksum, StorageClass, ObjectSize, ObjectParts) without
returning the object body. Follows the same versioning and conditional
header patterns as HeadObjectHandler.
The handler parses the X-Amz-Object-Attributes header to determine
which attributes to include in the XML response, and supports
ObjectParts pagination via X-Amz-Max-Parts and X-Amz-Part-Number-Marker.
Ref: https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjectAttributes.html
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* s3api: register GetObjectAttributes route
Register the GET /{object}?attributes route for the
GetObjectAttributes API, placed before other object query
routes to ensure proper matching.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* s3api: add integration tests for GetObjectAttributes
Test coverage:
- Basic: simple object with all attribute types
- MultipartObject: multipart upload with parts pagination
- SelectiveAttributes: requesting only specific attributes
- InvalidAttribute: server rejects invalid attribute names
- NonExistentObject: returns NoSuchKey for missing objects
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* s3api: add versioned object test for GetObjectAttributes
Test puts two versions of the same object and verifies that:
- GetObjectAttributes returns the latest version by default
- GetObjectAttributes with versionId returns the specific version
- ObjectSize and VersionId are correct for each version
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* s3api: fix combined conditional header evaluation per RFC 7232
Per RFC 7232:
- Section 3.4: If-Unmodified-Since MUST be ignored when If-Match is
present (If-Match is the more accurate replacement)
- Section 3.3: If-Modified-Since MUST be ignored when If-None-Match is
present (If-None-Match is the more accurate replacement)
Previously, all four conditional headers were evaluated independently.
This caused incorrect 412 responses when If-Match succeeded but
If-Unmodified-Since failed (should return 200 per AWS S3 behavior).
Fix applied to both validateConditionalHeadersForReads (GET/HEAD) and
validateConditionalHeaders (PUT) paths.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* s3api: add conditional header combination tests for GetObjectAttributes
Test the RFC 7232 combined conditional header semantics:
- If-Match=true + If-Unmodified-Since=false => 200 (If-Unmodified-Since ignored)
- If-None-Match=false + If-Modified-Since=true => 304 (If-Modified-Since ignored)
- If-None-Match=true + If-Modified-Since=false => 200 (If-Modified-Since ignored)
- If-Match=true + If-Unmodified-Since=true => 200
- If-Match=false => 412 regardless
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* s3api: document Checksum attribute as not yet populated
Checksum is accepted in validation (so clients requesting it don't get
a 400 error, matching AWS behavior for objects without checksums) but
SeaweedFS does not yet store S3 checksums. Add a comment explaining
this and noting where to populate it when checksum storage is added.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* s3api: add s3:GetObjectAttributes IAM action for ?attributes query
Previously, GET /{object}?attributes resolved to s3:GetObject via the
fallback path since resolveFromQueryParameters had no case for the
"attributes" query parameter.
Add S3_ACTION_GET_OBJECT_ATTRIBUTES constant ("s3:GetObjectAttributes")
and a branch in resolveFromQueryParameters to return it for GET requests
with the "attributes" query parameter, so IAM policies can distinguish
GetObjectAttributes from GetObject.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* s3api: evaluate conditional headers after version resolution
Move conditional header evaluation (If-Match, If-None-Match, etc.) to
after the version resolution step in GetObjectAttributesHandler. This
ensures that when a specific versionId is requested, conditions are
checked against the correct version entry rather than always against
the latest version.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* s3api: use bounded HTTP client in GetObjectAttributes tests
Replace http.DefaultClient with a timeout-aware http.Client (10s) in
the signedGetObjectAttributes helper and testGetObjectAttributesInvalid
to prevent tests from hanging indefinitely.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* s3api: check attributes query before versionId in action resolver
Move the GetObjectAttributes action check before the versionId check
in resolveFromQueryParameters. This fixes GET /bucket/key?attributes&versionId=xyz
being incorrectly classified as s3:GetObjectVersion instead of
s3:GetObjectAttributes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* s3api: add tests for versioned conditional headers and action resolver
Add integration test that verifies conditional headers (If-Match,
If-None-Match) are evaluated against the requested version entry, not
the latest version. This covers the fix in 55c409dec.
Add unit test for ResolveS3Action verifying that the attributes query
parameter takes precedence over versionId, so GET ?attributes&versionId
resolves to s3:GetObjectAttributes. This covers the fix in b92c61c95.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* s3api: guard negative chunk indices and rename PartsCount field
Add bounds checks for b.StartChunk >= 0 and b.EndChunk >= 0 in
buildObjectAttributesParts to prevent panics from corrupted metadata
with negative index values.
Rename ObjectAttributesParts.PartsCount to TotalPartsCount to match
the AWS SDK v2 Go field naming convention, while preserving the XML
element name "PartsCount" via the struct tag.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* s3api: reject malformed max-parts and part-number-marker headers
Return ErrInvalidMaxParts and ErrInvalidPartNumberMarker when the
X-Amz-Max-Parts or X-Amz-Part-Number-Marker headers contain
non-integer or negative values, matching ListObjectPartsHandler
behavior. Previously these were silently ignored with defaults.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Prevent concurrent maintenance tasks per volume
* fix panic
* fix(s3api): correctly extract host header port when X-Forwarded-Port is present
* test(s3api): add test cases for misreported X-Forwarded-Port
* set working directory
* consolidate to worker directory
* working directory
* correct directory name
* refactoring to use wildcard matcher
* simplify
* cleaning ec working directory
* fix reference
* clean
* adjust test
* feat: drop table location mapping support
Disable external metadata locations for S3 Tables and remove the table location
mapping index entirely. Table metadata must live under the table bucket paths,
so lookups no longer use mapping directories.
Changes:
- Remove mapping lookup and cache from bucket path resolution
- Reject metadataLocation in CreateTable and UpdateTable
- Remove mapping helpers and tests
* compile
* refactor
* fix: accept metadataLocation in S3 Tables API requests
We removed the external table location mapping feature, but still need to
accept and store metadataLocation values from clients like Trino. The mapping
feature was an internal implementation detail that mapped external buckets to
internal table paths. The metadataLocation field itself is part of the S3 Tables
API and should be preserved.
* fmt
* fix: handle MetadataLocation in UpdateTable requests
Mirror handleCreateTable behavior by updating metadata.MetadataLocation
when req.MetadataLocation is provided in UpdateTable requests. This ensures
table metadata location can be updated, not just set during creation.
* fix: move table location mappings to /etc/s3tables to avoid bucket name validation
Fixes#8362 - table location mappings were stored under /buckets/.table-location-mappings
which fails bucket name validation because it starts with a dot. Moving them to
/etc/s3tables resolves the migration error for upgrades.
Changes:
- Table location mappings now stored under /etc/s3tables
- Ensure parent /etc directory exists before creating /etc/s3tables
- Normal writes go to new location only (no legacy compatibility)
- Removed bucket name validation exception for old location
* refactor: simplify lookupTableLocationMapping by removing redundant mappingPath parameter
The mappingPath function parameter was redundant as the path can be derived
from mappingDir and bucket using path.Join. This simplifies the code and
reduces the risk of path mismatches between parameters.
* Fix S3 signature verification behind reverse proxies
When SeaweedFS is deployed behind a reverse proxy (e.g. nginx, Kong,
Traefik), AWS S3 Signature V4 verification fails because the Host header
the client signed with (e.g. "localhost:9000") differs from the Host
header SeaweedFS receives on the backend (e.g. "seaweedfs:8333").
This commit adds a new -s3.externalUrl parameter (and S3_EXTERNAL_URL
environment variable) that tells SeaweedFS what public-facing URL clients
use to connect. When set, SeaweedFS uses this host value for signature
verification instead of the Host header from the incoming request.
New parameter:
-s3.externalUrl (flag) or S3_EXTERNAL_URL (environment variable)
Example: -s3.externalUrl=http://localhost:9000
Example: S3_EXTERNAL_URL=https://s3.example.com
The environment variable is particularly useful in Docker/Kubernetes
deployments where the external URL is injected via container config.
The flag takes precedence over the environment variable when both are set.
At startup, the URL is parsed and default ports are stripped to match
AWS SDK behavior (port 80 for HTTP, port 443 for HTTPS), so
"http://s3.example.com:80" and "http://s3.example.com" are equivalent.
Bugs fixed:
- Default port stripping was removed by a prior PR, causing signature
mismatches when clients connect on standard ports (80/443)
- X-Forwarded-Port was ignored when X-Forwarded-Host was not present
- Scheme detection now uses proper precedence: X-Forwarded-Proto >
TLS connection > URL scheme > "http"
- Test expectations for standard port stripping were incorrect
- expectedHost field in TestSignatureV4WithForwardedPort was declared
but never actually checked (self-referential test)
* Add Docker integration test for S3 proxy signature verification
Docker Compose setup with nginx reverse proxy to validate that the
-s3.externalUrl parameter (or S3_EXTERNAL_URL env var) correctly
resolves S3 signature verification when SeaweedFS runs behind a proxy.
The test uses nginx proxying port 9000 to SeaweedFS on port 8333,
with X-Forwarded-Host/Port/Proto headers set. SeaweedFS is configured
with -s3.externalUrl=http://localhost:9000 so it uses "localhost:9000"
for signature verification, matching what the AWS CLI signs with.
The test can be run with aws CLI on the host or without it by using
the amazon/aws-cli Docker image with --network host.
Test covers: create-bucket, list-buckets, put-object, head-object,
list-objects-v2, get-object, content round-trip integrity,
delete-object, and delete-bucket — all through the reverse proxy.
* Create s3-proxy-signature-tests.yml
* fix CLI
* fix CI
* Update s3-proxy-signature-tests.yml
* address comments
* Update Dockerfile
* add user
* no need for fuse
* Update s3-proxy-signature-tests.yml
* debug
* weed mini
* fix health check
* health check
* fix health checking
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
* Allow multipart upload operations when s3:PutObject is authorized
Multipart upload is an implementation detail of putting objects, not a separate
permission. When a policy grants s3:PutObject, it should implicitly allow:
- s3:CreateMultipartUpload
- s3:UploadPart
- s3:CompleteMultipartUpload
- s3:AbortMultipartUpload
- s3:ListParts
This fixes a compatibility issue where clients like PyArrow that use multipart
uploads by default would fail even though the role had s3:PutObject permission.
The session policy intersection still applies - both the identity-based policy
AND session policy must allow s3:PutObject for multipart operations to work.
Implementation:
- Added constants for S3 multipart action strings
- Added multipartActionSet to efficiently check if action is multipart-related
- Updated MatchesAction method to implicitly grant multipart when PutObject allowed
* Update weed/s3api/policy_engine/types.go
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Add s3:ListMultipartUploads to multipart action set
Include s3:ListMultipartUploads in the multipartActionSet so that listing
multipart uploads is implicitly granted when s3:PutObject is authorized.
ListMultipartUploads is a critical part of the multipart upload workflow,
allowing clients to query in-progress uploads before completing them.
Changes:
- Added s3ListMultipartUploads constant definition
- Included s3ListMultipartUploads in multipartActionSet initialization
- Existing references to multipartActionSet automatically now cover ListMultipartUploads
All policy engine tests pass (0.351s execution time)
* Refactor: reuse multipart action constants from s3_constants package
Remove duplicate constant definitions from policy_engine/types.go and import
the canonical definitions from s3api/s3_constants/s3_actions.go instead.
This eliminates duplication and ensures a single source of truth for
multipart action strings:
- ACTION_CREATE_MULTIPART_UPLOAD
- ACTION_UPLOAD_PART
- ACTION_COMPLETE_MULTIPART
- ACTION_ABORT_MULTIPART
- ACTION_LIST_PARTS
- ACTION_LIST_MULTIPART_UPLOADS
All policy engine tests pass (0.350s execution time)
* Fix S3_ACTION_LIST_MULTIPART_UPLOADS constant value
Move S3_ACTION_LIST_MULTIPART_UPLOADS from bucket operations to multipart
operations section and change value from 's3:ListBucketMultipartUploads' to
's3:ListMultipartUploads' to match the action strings used in policy_engine
and s3_actions.go.
This ensures consistent action naming across all S3 constant definitions.
* refactor names
* Fix S3 action constant mismatches and MatchesAction early return bug
Fix two critical issues in policy engine:
1. S3Actions map had incorrect multipart action mappings:
- 'ListMultipartUploads' was 's3:ListMultipartUploads' (should be 's3:ListBucketMultipartUploads')
- 'ListParts' was 's3:ListParts' (should be 's3:ListMultipartUploadParts')
These mismatches caused authorization checks to fail for list operations
2. CompiledStatement.MatchesAction() had early return bug:
- Previously returned true immediately upon first direct action match
- This prevented scanning remaining matchers for s3:PutObject permission
- Now scans ALL matchers before returning, tracking both direct match and PutObject grant
- Ensures multipart operations inherit s3:PutObject authorization even when
explicitly requested action doesn't match (e.g., s3:ListMultipartUploadParts)
Changes:
- Track matchedAction flag to defer
Fix two critical issues in policy engine:
1. S3Actions map had incorrect multipart action mappings:
- 'ListMultipartUploads' was 's3:ListMultipartUplPer
1. S3Actions map had incorrect multiparAll - 'ListMultipartUploads(0.334s execution time)
* Refactor S3Actions map to use s3_constants
Replace hardcoded action strings in the S3Actions map with references to
canonical S3_ACTION_* constants from s3_constants/s3_action_strings.go.
Benefits:
- Single source of truth for S3 action values
- Eliminates string duplication across codebase
- Ensures consistency between policy engine and middleware
- Reduces maintenance burden when action strings need updates
All policy engine tests pass (0.334s execution time)
* Remove unused S3Actions map
The S3Actions map in types.go was never referenced anywhere in the codebase.
All action mappings are handled by GetActionMappings() in integration.go instead.
This removes 42 lines of dead code.
* Fix test: reload configuration function must also reload IAM state
TestEmbeddedIamAttachUserPolicyRefreshesIAM was failing because the test's
reloadConfigurationFunc only updated mockConfig but didn't reload the actual IAM
state. When AttachUserPolicy calls refreshIAMConfiguration(), it would use the
test's incomplete reload function instead of the real LoadS3ApiConfigurationFromCredentialManager().
Fixed by making the test's reloadConfigurationFunc also call
e.iam.LoadS3ApiConfigurationFromCredentialManager() so lookupByIdentityName()
sees the updated policy attachments.
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Avoid stripping default ports (80/443) from the Host header in extractHostHeader.
This fixes SignatureDoesNotMatch errors when SeaweedFS is accessed via a proxy
(like Kong Ingress) that explicitly includes the port in the Host header or
X-Forwarded-Host, which S3 clients sign.
Also cleaned up unused variables and logic after refactoring.
* fix(s3api): make ListObjectsV1 namespaced and stop marker-echo pagination loops
* test(s3api): harden marker-echo coverage and align V1 encoding tag
* test(s3api): cover encoded marker matching and trim redundant setup
* refactor(s3api): tighten V1 list helper visibility and test mock docs
* S3: Truncate timestamps to milliseconds for CopyObjectResult and CopyPartResult
Fixes#8394
* S3: Address nitpick comments in copy handlers
- Synchronize Mtime and LastModified by capturing time once\n- Optimize copyChunksForRange loop\n- Use built-in min/max\n- Remove dead previewLen code
* Enforce IAM for s3tables bucket creation
* Prefer IAM path when policies exist
* Ensure IAM enforcement honors default allow
* address comments
* Reused the precomputed principal when setting tableBucketMetadata.OwnerAccountID, avoiding the redundant getAccountID call.
* get identity
* fix
* dedup
* fix
* comments
* fix tests
* update iam config
* go fmt
* fix ports
* fix flags
* mini clean shutdown
* Revert "update iam config"
This reverts commit ca48fdbb0a.
Revert "mini clean shutdown"
This reverts commit 9e17f6baff.
Revert "fix flags"
This reverts commit e9e7b29d2f.
Revert "go fmt"
This reverts commit bd3241960b.
* test/s3tables: share single weed mini per test package via TestMain
Previously each top-level test function in the catalog and s3tables
package started and stopped its own weed mini instance. This caused
failures when a prior instance wasn't cleanly stopped before the next
one started (port conflicts, leaked global state).
Changes:
- catalog/iceberg_catalog_test.go: introduce TestMain that starts one
shared TestEnvironment (external weed binary) before all tests and
tears it down after. All individual test functions now use sharedEnv.
Added randomSuffix() for unique resource names across tests.
- catalog/pyiceberg_test.go: updated to use sharedEnv instead of
per-test environments.
- catalog/pyiceberg_test_helpers.go -> pyiceberg_test_helpers_test.go:
renamed to a _test.go file so it can access TestEnvironment which is
defined in a test file.
- table-buckets/setup.go: add package-level sharedCluster variable.
- table-buckets/s3tables_integration_test.go: introduce TestMain that
starts one shared TestCluster before all tests. TestS3TablesIntegration
now uses sharedCluster. Extract startMiniClusterInDir (no *testing.T)
for TestMain use. TestS3TablesCreateBucketIAMPolicy keeps its own
cluster (different IAM config). Remove miniClusterMutex (no longer
needed). Fix Stop() to not panic when t is nil."
* delete
* parse
* default allow should work with anonymous
* fix port
* iceberg route
The failures are from Iceberg REST using the default bucket warehouse when no prefix is provided. Your tests create random buckets, so /v1/namespaces was looking in warehouse and failing. I updated the tests to use the prefixed Iceberg routes (/v1/{bucket}/...) via a small helper.
* test(s3tables): fix port conflicts and IAM ARN matching in integration tests
- Pass -master.dir explicitly to prevent filer store directory collision
between shared cluster and per-test clusters running in the same process
- Pass -volume.port.public and -volume.publicUrl to prevent the global
publicPort flag (mutated from 0 → concrete port by first cluster) from
being reused by a second cluster, causing 'address already in use'
- Remove the flag-reset loop in Stop() that reset global flag values while
other goroutines were reading them (race → panic)
- Fix IAM policy Resource ARN in TestS3TablesCreateBucketIAMPolicy to use
wildcards (arn:aws:s3tables:*:*:bucket/<name>) because the handler
generates ARNs with its own DefaultRegion (us-east-1) and principal name
('admin'), not the test constants testRegion/testAccountID
* Persist managed IAM policies
* Add IAM list/get policy integration test
* Faster marker lookup and cleanup
* Handle delete conflict and improve listing
* Add delete-in-use policy integration test
* Stabilize policy ID and guard path prefix
* Tighten CreatePolicy guard and reload
* Add ListPolicyNames to credential store
* s3: fix signature mismatch with non-standard ports and capitalized host
- ensure host header extraction is case-insensitive in SignedHeaders
- prioritize non-standard ports in X-Forwarded-Host over default ports in X-Forwarded-Port
- add regression tests for both scenarios
fixes https://github.com/seaweedfs/seaweedfs/issues/8382
* simplify