* fix: IAM authentication with AWS Signature V4 and environment credentials
Three key fixes to make authenticated IAM requests work:
1. Fix request body consumption before signature verification
- iamMatcher was calling r.ParseForm() which consumed POST body
- This broke AWS Signature V4 verification on subsequent reads
- Now only check query string in matcher, preserving body for verification
- File: weed/s3api/s3api_server.go
2. Preserve environment variable credentials across config reloads
- After IAM mutations, config reload overwrote env var credentials
- Extract env var loading into loadEnvironmentVariableCredentials()
- Call after every config reload to persist credentials
- File: weed/s3api/auth_credentials.go
3. Add authenticated IAM tests and test infrastructure
- New TestIAMAuthenticated suite with AWS SDK + Signature V4
- Dynamic port allocation for independent test execution
- Flag reset to prevent state leakage between tests
- CI workflow to run S3 and IAM tests separately
- Files: test/s3/example/*, .github/workflows/s3-example-integration-tests.yml
All tests pass:
- TestIAMCreateUser (unauthenticated)
- TestIAMAuthenticated (with AWS Signature V4)
- S3 integration tests
* fmt
* chore: rename test/s3/example to test/s3/normal
* simplify: CI runs all integration tests in single job
* Update s3-example-integration-tests.yml
* ci: run each test group separately to avoid raft registry conflicts
* Fix imbalance detection disk type grouping and volume grow errors
This PR addresses two issues:
1. Imbalance Detection: Previously, balance detection did not verify disk types, leading to false positives when comparing heterogeneous nodes (e.g. SSD vs HDD). The logic now groups volumes by DiskType before calculating imbalance.
2. Volume Grow Errors: Fixed a variable scope issue in master_grpc_server_volume.go and added a pre-check for available space to prevent 'only 0 volumes left' error logs when a disk type is full or abandoned.
Includes unit tests for the detection logic.
* Refactor balance detection loop into detectForDiskType
* Fix potential panic in volume grow logic by checking replica placement parse error
* storage: fix EC shard recovery with improved diagnostics and logging
- Fix buffer size mismatch in ReconstructData call
- Add detailed logging of available and missing shards
- Improve error messages when recovery is impossible
- Add unit tests for EC recovery shard counting logic
* test: refine EC recovery unit tests
- Remove redundant tests that only validate setup
- Use standard strings.Contains instead of custom recursive helper
* adjust tests and minor improvement
* fix: S3 listing NextMarker missing intermediate directory component
When listing with nested prefixes like "character/member/", the NextMarker
was incorrectly constructed as "character/res024/" instead of
"character/member/res024/", causing continuation requests to fail.
Root cause: The code at line 331 was constructing NextMarker as:
nextMarker = requestDir + "/" + nextMarker
This worked when nextMarker already contained the full relative path,
but failed when it was just the entry name from the innermost recursion.
Fix: Include the prefix component when constructing NextMarker:
    if prefix != "" {
        nextMarker = requestDir + "/" + prefix + "/" + nextMarker
    }
This ensures the full path is always constructed correctly for both:
- CommonPrefix entries (directories)
- Regular entries (files)
Also includes fix for cursor.prefixEndsOnDelimiter state leak that was
causing sibling directories to be incorrectly listed.
* test: add regression tests for NextMarker construction
Add comprehensive unit tests to verify NextMarker is correctly constructed
with nested prefixes. Tests cover:
- Regular entries with nested prefix (character/member/res024)
- CommonPrefix entries (directories)
- Edge cases (no requestDir, no prefix, deeply nested)
These tests ensure the fix prevents regression of the bug where
NextMarker was missing intermediate directory components.
Specifically:
- Use bytes.NewReader for binary data instead of strings.NewReader
- Increase binary test data from 8 bytes to 1KB to avoid edge cases
- Add 50ms delay between subtests to prevent overwhelming the server
* Fix: Populate Claims from STS session RequestContext for policy variable substitution
When using STS temporary credentials (from AssumeRoleWithWebIdentity) with
AWS Signature V4 authentication, JWT claims like preferred_username were
not available for bucket policy variable substitution (e.g., ${jwt:preferred_username}).
Root Cause:
- STS session tokens store user claims in the req_ctx field (added in PR #8079)
- validateSTSSessionToken() created Identity but didn't populate Claims field
- authorizeWithIAM() created IAMIdentity but didn't copy Claims
- Policy engine couldn't resolve ${jwt:*} variables without claims
Changes:
1. auth_signature_v4.go: Extract claims from sessionInfo.RequestContext
and populate Identity.Claims in validateSTSSessionToken()
2. auth_credentials.go: Copy Claims when creating IAMIdentity in
authorizeWithIAM()
3. auth_sts_identity_test.go: Add TestSTSIdentityClaimsPopulation to
verify claims are properly populated from RequestContext
This enables bucket policies with JWT claim variables to work correctly
with STS temporary credentials obtained via AssumeRoleWithWebIdentity.
Fixes #8037
* Refactor: Idiomatic map population for STS claims
* Fix S3 conditional writes with versioning (Issue #8073)
Refactors conditional header checks to properly resolve the latest object version when versioning is enabled. This prevents incorrect validation against non-versioned root objects.
* Add integration test for S3 conditional writes with versioning (Issue #8073)
* Refactor: Propagate internal errors in conditional header checks
- Make resolveObjectEntry return errors from isVersioningConfigured
- Update checkConditionalHeaders checks to return 500 on internal resolve errors
* Refactor: Stricter error handling and test assertions
- Propagate internal errors in checkConditionalHeaders*WithGetter functions
- Enforce strict 412 PreconditionFailed check in integration test
* Perf: Add early return for conditional headers + safety improvements
- Add fast path to skip resolveObjectEntry when no conditional headers present
- Avoids expensive getLatestObjectVersion retries in the common case
- Add nil checks before dereferencing pointers in integration test
- Fix grammar in test comments
- Remove duplicate comment in resolveObjectEntry
* Refactor: Use errors.Is for robust ErrNotFound checking
- Update checkConditionalHeaders* to use errors.Is(err, filer_pb.ErrNotFound)
- Update resolveObjectEntry to use errors.Is for wrapped error compatibility
- Remove duplicate comment lines in s3api handlers
* Perf: Optimize resolveObjectEntry for conditional checks
- Refactor getLatestObjectVersion to doGetLatestObjectVersion supporting variable retries
- Use 1-retry path in resolveObjectEntry to avoid exponential backoff latency
* Test: Enhance integration test with content verification
- Verify actual object content equals expected content after successful conditional write
- Add missing io and errors imports to test file
* Refactor: Final refinements based on feedback
- Optimize header validation by passing parsed headers to avoid redundant parsing
- Simplify integration test assertions using require.Error and assert.True
- Fix build errors in s3api handler and test imports
* Test: Use smithy.APIError for robust error code checking
- Replace string-based error checking with structured API error
- Add smithy-go import for AWS SDK v2 error handling
* Test: Use types.PreconditionFailed and handle io.ReadAll error
- Replace smithy.APIError with more specific types.PreconditionFailed
- Add proper error handling for io.ReadAll in content verification
* Refactor: Use combined error checking and add nil guards
- Use smithy.APIError with ErrorCode() for robust error checking
- Add nil guards for entry.Attributes before accessing Mtime
- Prevents potential panics when Attributes is uninitialized
* fix: Refactor CORS middleware to consistently apply the `Vary: Origin` header when a configuration exists and streamline request processing logic.
* fix: Add Vary: Origin header to CORS OPTIONS responses and refactor request handling for clarity and correctness.
* fix: update CORS middleware tests to correctly parse and check for 'Origin' in Vary header.
* refactor: extract `hasVaryOrigin` helper function to simplify Vary header checks in tests.
* test: Remove `Vary: Origin` header from CORS test expectations.
* refactor: consolidate CORS request handling into a new `processCORS` method using a `next` callback.
* Fix S3 CORS for non-existent buckets
Enable fallback to global CORS configuration when a bucket is not found (s3err.ErrNoSuchBucket). This ensures consistent CORS behavior and prevents information disclosure.
Fixes #8065
Problem:
- CORS headers were only applied after checking bucket existence
- Non-existent buckets returned responses without CORS headers
- This caused CORS preflight failures and information disclosure vulnerability
- Unauthenticated users could infer bucket existence from CORS header presence
Solution:
- Moved CORS evaluation before bucket existence check in middleware
- CORS headers now applied consistently regardless of bucket existence
- Preflight requests succeed for non-existent buckets (matching AWS S3)
- Actual requests still return NoSuchBucket error but with CORS headers
Changes:
- Modified Handler() and HandleOptionsRequest() in middleware.go
- Added comprehensive test suite for non-existent bucket scenarios
- All 39 tests passing (31 existing + 8 new)
Security Impact:
- Prevents information disclosure about bucket existence
- Bucket existence cannot be inferred from CORS header presence/absence
AWS S3 Compatibility:
- Improved compatibility with AWS S3 CORS behavior
- Preflight requests now succeed for non-existent buckets
* Fix nil pointer panic in maintenance worker when receiving empty task assignment
When a worker requests a task and none are available, the admin server
sends an empty TaskAssignment message. The worker was attempting to log
the task details without checking if the TaskId was empty, causing a
nil pointer dereference when accessing taskAssign.Params.VolumeId.
This fix adds a check for empty TaskId before processing the assignment,
preventing worker crashes and improving stability in production environments.
* Add EC integration test for admin-worker maintenance system
Adds comprehensive integration test that verifies the end-to-end flow
of erasure coding maintenance tasks:
- Admin server detects volumes needing EC encoding
- Workers register and receive task assignments
- EC encoding is executed and verified in master topology
- File read-back validation confirms data integrity
The test uses unique absolute working directories for each worker to
prevent ID conflicts and ensure stable worker registration. Includes
proper cleanup and process management for reliable test execution.
* Improve maintenance system stability and task deduplication
- Add cross-type task deduplication to prevent concurrent maintenance
operations on the same volume (EC, balance, vacuum)
- Implement HasAnyTask check in ActiveTopology for better coordination
- Increase RequestTask timeout from 5s to 30s to prevent unnecessary
worker reconnections
- Add TaskTypeNone sentinel for generic task checks
- Update all task detectors to use HasAnyTask for conflict prevention
- Improve config persistence and schema handling
* Add GitHub Actions workflow for EC integration tests
Adds CI workflow that runs EC integration tests on push and pull requests
to master branch. The workflow:
- Triggers on changes to admin, worker, or test files
- Builds the weed binary
- Runs the EC integration test suite
- Uploads test logs as artifacts on failure for debugging
This ensures the maintenance system remains stable and worker-admin
integration is validated in CI.
* go version 1.24
* address comments
* Update maintenance_integration.go
* support seconds
* ec prioritize over balancing in tests
* Fix: Propagate OIDC claims to IAM identity for dynamic policy variables
Fixes #8037. Ensures additional OIDC claims (like preferred_username) are preserved in ExternalIdentity attributes and propagated to IAM tokens, enabling substitution in dynamic policies.
* Prevent split-brain: Persistent ClusterID and Join Validation
- Persist ClusterId in Raft store to survive restarts.
- Validate ClusterId on Raft command application (piggybacked on MaxVolumeId).
- Prevent masters with conflicting ClusterIds from joining/operating together.
- Update Telemetry to report the persistent ClusterId.
* Refine ClusterID validation based on feedback
- Improved error message in cluster_commands.go.
- Added ClusterId mismatch check in RaftServer.Recovery.
* Handle Raft errors and support Hashicorp Raft for ClusterId
- Check for errors when persisting ClusterId in legacy Raft.
- Implement ClusterId generation and persistence for Hashicorp Raft leader changes.
- Ensure consistent error logging.
* Refactor ClusterId validation
- Centralize ClusterId mismatch check in Topology.SetClusterId.
- Simplify MaxVolumeIdCommand.Apply and RaftServer.Recovery to rely on SetClusterId.
* Fix goroutine leak and add timeout
- Handle channel closure in Hashicorp Raft leader listener.
- Add timeout to Raft Apply call to prevent blocking.
* Fix deadlock in legacy Raft listener
- Wrap ClusterId generation/persistence in a goroutine to avoid blocking the Raft event loop (deadlock).
* Rename ClusterId to SystemId
- Renamed ClusterId to SystemId across the codebase (protobuf, topology, server, telemetry).
- Regenerated telemetry.pb.go with new field.
* Rename SystemId to TopologyId
- The rename to SystemId was an intermediate step.
- Final name is TopologyId for the persistent cluster identifier.
- Updated protobuf, topology, raft server, master server, and telemetry.
* Optimize Hashicorp Raft listener
- Integrated TopologyId generation into existing monitorLeaderLoop.
- Removed extra goroutine in master_server.go.
* Fix optimistic TopologyId update
- Removed premature local state update of TopologyId in master_server.go and raft_hashicorp.go.
- State is now solely updated via the Raft state machine Apply/Restore methods after consensus.
* Add explicit log for recovered TopologyId
- Added glog.V(0) info log in RaftServer.Recovery to print the recovered TopologyId on startup.
* Add Raft barrier to prevent TopologyId race condition
- Implement ensureTopologyId helper method
- Send no-op MaxVolumeIdCommand to sync Raft log before checking TopologyId
- Ensures persisted TopologyId is recovered before generating new one
- Prevents race where generation happens during log replay
* Serialize TopologyId generation with mutex
- Add topologyIdGenLock mutex to MasterServer struct
- Wrap ensureTopologyId method with lock to prevent concurrent generation
- Fixes race where event listener and manual leadership check both generate IDs
- Second caller waits for first to complete and sees the generated ID
* Add TopologyId recovery logging to Apply method
- Change log level from V(1) to V(0) for visibility
- Log 'Recovered TopologyId' when applying from Raft log
- Ensures recovery is visible whether from snapshot or log replay
- Matches Recovery() method logging for consistency
* Fix Raft barrier timing issue
- Add 100ms delay after barrier command to ensure log application completes
- Add debug logging to track barrier execution and TopologyId state
- Return early if barrier command fails
- Prevents TopologyId generation before old logs are fully applied
* ensure leader
* address comments
* address comments
* redundant
* clean up
* double check
* refactoring
* comment
* filer: auto clean empty s3 implicit folders
Explicitly tag implicitly created S3 folders (parent directories from object uploads) with 'Seaweed-X-Amz-Implicit-Dir'.
Update EmptyFolderCleaner to check for this attribute and cache the result efficiently.
* filer: correctly handle nil attributes in empty folder cleaner cache
* filer: refine implicit tagging logic
Prevent tagging buckets as implicit directories. Reduce code duplication.
* filer: safeguard GetEntryAttributes against nil entry and not found error
* filer: move ErrNotFound handling to EmptyFolderCleaner
* filer: add comment to explain level > 3 check for implicit directories
* Add access key status management to Admin UI
- Add Status field to AccessKeyInfo struct
- Implement UpdateAccessKeyStatus API endpoint
- Add status dropdown in access keys modal
- Fix modal backdrop issue by using refreshAccessKeysList helper
- Status can be toggled between Active and Inactive
* Replace magic strings with constants for access key status
- Define AccessKeyStatusActive and AccessKeyStatusInactive constants in admin_data.go
- Define STATUS_ACTIVE and STATUS_INACTIVE constants in JavaScript
- Replace all hardcoded 'Active' and 'Inactive' strings with constants
- Update error messages to use constants for consistency
* Remove duplicate manageAccessKeys function definition
* Add security improvements to access key status management
- Add status validation in UpdateAccessKeyStatus to prevent invalid values
- Fix XSS vulnerability by replacing inline onchange with data attributes
- Add delegated event listener for status select changes
- Add URL encoding to API request path segments
* Fix bucket permission persistence and security issues (#7226)
Security Fixes:
- Fix XSS vulnerability in showModal by using DOM methods instead of template strings for title
- Add escapeHtmlForAttribute helper to properly escape all HTML entities (&, <, >, ", ')
- Fix XSS in showSecretKey and showNewAccessKeyModal by using proper HTML escaping
- Fix XSS in createAccessKeysContent by replacing inline onclick with data attributes and event delegation
Code Cleanup:
- Remove debug label "(DEBUG)" from page header
- Remove debug console.log statements from buildBucketPermissionsNew
- Remove dead functions: addBucketPermissionRow, removeBucketPermissionRow, parseBucketPermissions, buildBucketPermissions
Validation Improvements:
- Add validation in handleUpdateUser to prevent empty permissions submission
- Update buildBucketPermissionsNew to return null when no buckets selected (instead of empty array)
- Add proper error messages for validation failures
UI Improvements:
- Enhanced access key management with proper modals and copy buttons
- Improved copy-to-clipboard functionality with fallbacks
Fixes #7226
* Fix: Fail fast when initializing volume with Version 0
* Fix: Fail fast when loading unsupported volume version (e.g. 0 or 4)
* Refactor: Use IsSupportedVersion helper function for version validation
* Fix #8040: Support 'default' keyword in collectionPattern to match default collection
The default collection in SeaweedFS is represented as an empty string internally.
Previously, it was impossible to specifically target only the default collection
because:
- Empty collectionPattern matched ALL collections (filter was skipped)
- Using collectionPattern="default" tried to match the literal string "default"
This commit adds special handling for the keyword "default" in collectionPattern
across multiple shell commands:
- volume.tier.move
- volume.list
- volume.fix.replication
- volume.configure.replication
Now users can use -collectionPattern="default" to specifically target volumes
in the default collection (empty collection name), while maintaining backward
compatibility where empty pattern matches all collections.
Updated help text to document this feature.
* Update compileCollectionPattern to support 'default' keyword
This extends the fix to all commands that use regex-based collection
pattern matching:
- ec.encode
- ec.decode
- volume.tier.download
- volume.balance
The compileCollectionPattern function now treats "default" as a special
keyword that compiles to the regex "^$" (matching empty strings), making
it consistent with the other commands that use filepath.Match.
* Use CollectionDefault constant instead of hardcoded "default" string
Refactored the collection pattern matching logic to use a central constant
CollectionDefault defined in weed/shell/common.go. This improves maintainability
and ensures consistency across all shell commands.
* Address PR review feedback: simplify logic and use '_default' keyword
Changes:
1. Changed CollectionDefault from "default" to "_default" to avoid collision
with literal collection names
2. Simplified pattern matching logic to reduce code duplication across all
affected commands
3. Fixed error handling in command_volume_tier_move.go to properly propagate
filepath.Match errors instead of swallowing them
4. Updated documentation to clarify how to match a literal "default"
collection using regex patterns like "^default$"
This addresses all feedback from PR review comments.
* Remove unnecessary documentation about matching literal 'default'
Since we changed the keyword to '_default', users can now simply use
'default' to match a literal collection named "default". The previous
documentation about using regex patterns was confusing and no longer needed.
* Fix error propagation and empty pattern handling
1. command_volume_tier_move.go: Added early termination check after
eachDataNode callback to stop processing remaining nodes if a pattern
matching error occurred, improving efficiency
2. command_volume_configure_replication.go: Fixed empty pattern handling
to match all collections (collectionMatched = true when pattern is empty),
mirroring the behavior in other commands
These changes address the remaining PR review feedback.
* fix: S3 copying test Makefile syntax and add S3_ENDPOINT env support
* fix: add weed mini to stop-seaweedfs target
Ensure weed mini process is properly killed when stopping SeaweedFS,
matching the process started in start-seaweedfs target.
* Clean up PID file in stop-seaweedfs and clean targets
Address review feedback to ensure /tmp/weed-mini.pid is removed
for a clean state after tests.