Some filesystems, such as XFS, may over-allocate disk spaces when using
volume preallocation. Remove this option from the default docker entrypoint
scripts to allow volumes to use only the necessary disk space.
Fixes: https://github.com/seaweedfs/seaweedfs/issues/6465#issuecomment-3964174718
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Make EC detection context aware
* Update register.go
* Speed up EC detection planning
* Add tests for EC detection planner
* optimizations
detection.go: extracted ParseCollectionFilter (exported) and feed it into the detection loop so both detection and tracing share the same parsing/whitelisting logic; the detection loop now iterates on a sorted list of volume IDs, checks the context at every iteration, and only sets hasMore when there are still unprocessed groups after hitting maxResults, keeping runtime bounded while still scheduling planned tasks before returning the results.
erasure_coding_handler.go: dropped the duplicated inline filter parsing in emitErasureCodingDetectionDecisionTrace and now reuse erasurecodingtask.ParseCollectionFilter, and the summary suffix logic now only accounts for the hasMore case that can actually happen.
detection_test.go: updated the helper topology builder to use master_pb.VolumeInformationMessage (matching the current protobuf types) and tightened the cancellation/max-results tests so they reliably exercise the detection logic (cancel before calling Detection, and provide enough disks so one result is produced before the limit).
* use working directory
* fix compilation
* fix compilation
* rename
* go vet
* fix getenv
* address comments, fix error
* Fix SFTP file upload failures with JWT filer tokens (issue #8425)
When JWT authentication is enabled for filer operations via jwt.filer_signing.*
configuration, SFTP server file upload requests were rejected because they lacked
JWT authorization headers.
Changes:
- Added JWT signing key and expiration fields to SftpServer struct
- Modified putFile() to generate and include JWT tokens in upload requests
- Enhanced SFTPServiceOptions with JWT configuration fields
- Updated SFTP command startup to load and pass JWT config to service
This allows SFTP uploads to authenticate with JWT-enabled filers, consistent
with how other SeaweedFS components (S3 API, file browser) handle filer auth.
Fixes#8425
* Apply suggestion from @gemini-code-assist[bot]
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
TestS3MultipartOperationsInheritPutObjectPermissions verifies that multipart
upload operations (CreateMultipartUpload, UploadPart, ListParts,
CompleteMultipartUpload, AbortMultipartUpload, ListMultipartUploads) work
correctly when a user has only s3:PutObject permission granted.
This test validates the behavior where multipart operations are implicitly
granted when s3:PutObject is authorized, as multipart upload is an
implementation detail of putting objects in S3.
* Allow multipart upload operations when s3:PutObject is authorized
Multipart upload is an implementation detail of putting objects, not a separate
permission. When a policy grants s3:PutObject, it should implicitly allow:
- s3:CreateMultipartUpload
- s3:UploadPart
- s3:CompleteMultipartUpload
- s3:AbortMultipartUpload
- s3:ListParts
This fixes a compatibility issue where clients like PyArrow that use multipart
uploads by default would fail even though the role had s3:PutObject permission.
The session policy intersection still applies - both the identity-based policy
AND session policy must allow s3:PutObject for multipart operations to work.
Implementation:
- Added constants for S3 multipart action strings
- Added multipartActionSet to efficiently check if action is multipart-related
- Updated MatchesAction method to implicitly grant multipart when PutObject allowed
* Update weed/s3api/policy_engine/types.go
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Add s3:ListMultipartUploads to multipart action set
Include s3:ListMultipartUploads in the multipartActionSet so that listing
multipart uploads is implicitly granted when s3:PutObject is authorized.
ListMultipartUploads is a critical part of the multipart upload workflow,
allowing clients to query in-progress uploads before completing them.
Changes:
- Added s3ListMultipartUploads constant definition
- Included s3ListMultipartUploads in multipartActionSet initialization
- Existing references to multipartActionSet automatically now cover ListMultipartUploads
All policy engine tests pass (0.351s execution time)
* Refactor: reuse multipart action constants from s3_constants package
Remove duplicate constant definitions from policy_engine/types.go and import
the canonical definitions from s3api/s3_constants/s3_actions.go instead.
This eliminates duplication and ensures a single source of truth for
multipart action strings:
- ACTION_CREATE_MULTIPART_UPLOAD
- ACTION_UPLOAD_PART
- ACTION_COMPLETE_MULTIPART
- ACTION_ABORT_MULTIPART
- ACTION_LIST_PARTS
- ACTION_LIST_MULTIPART_UPLOADS
All policy engine tests pass (0.350s execution time)
* Fix S3_ACTION_LIST_MULTIPART_UPLOADS constant value
Move S3_ACTION_LIST_MULTIPART_UPLOADS from bucket operations to multipart
operations section and change value from 's3:ListBucketMultipartUploads' to
's3:ListMultipartUploads' to match the action strings used in policy_engine
and s3_actions.go.
This ensures consistent action naming across all S3 constant definitions.
* refactor names
* Fix S3 action constant mismatches and MatchesAction early return bug
Fix two critical issues in policy engine:
1. S3Actions map had incorrect multipart action mappings:
- 'ListMultipartUploads' was 's3:ListMultipartUploads' (should be 's3:ListBucketMultipartUploads')
- 'ListParts' was 's3:ListParts' (should be 's3:ListMultipartUploadParts')
These mismatches caused authorization checks to fail for list operations
2. CompiledStatement.MatchesAction() had early return bug:
- Previously returned true immediately upon first direct action match
- This prevented scanning remaining matchers for s3:PutObject permission
- Now scans ALL matchers before returning, tracking both direct match and PutObject grant
- Ensures multipart operations inherit s3:PutObject authorization even when
explicitly requested action doesn't match (e.g., s3:ListMultipartUploadParts)
Changes:
- Track matchedAction flag to defer
Fix two critical issues in policy engine:
1. S3Actions map had incorrect multipart action mappings:
- 'ListMultipartUploads' was 's3:ListMultipartUplPer
1. S3Actions map had incorrect multiparAll - 'ListMultipartUploads(0.334s execution time)
* Refactor S3Actions map to use s3_constants
Replace hardcoded action strings in the S3Actions map with references to
canonical S3_ACTION_* constants from s3_constants/s3_action_strings.go.
Benefits:
- Single source of truth for S3 action values
- Eliminates string duplication across codebase
- Ensures consistency between policy engine and middleware
- Reduces maintenance burden when action strings need updates
All policy engine tests pass (0.334s execution time)
* Remove unused S3Actions map
The S3Actions map in types.go was never referenced anywhere in the codebase.
All action mappings are handled by GetActionMappings() in integration.go instead.
This removes 42 lines of dead code.
* Fix test: reload configuration function must also reload IAM state
TestEmbeddedIamAttachUserPolicyRefreshesIAM was failing because the test's
reloadConfigurationFunc only updated mockConfig but didn't reload the actual IAM
state. When AttachUserPolicy calls refreshIAMConfiguration(), it would use the
test's incomplete reload function instead of the real LoadS3ApiConfigurationFromCredentialManager().
Fixed by making the test's reloadConfigurationFunc also call
e.iam.LoadS3ApiConfigurationFromCredentialManager() so lookupByIdentityName()
sees the updated policy attachments.
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* feat: add statfile; add error for remote storage misses
* feat: statfile implementations for storage providers
* test: add unit tests for StatFile method across providers
Add comprehensive unit tests for the StatFile implementation covering:
- S3: interface compliance and error constant accessibility
- Azure: interface compliance, error constants, and field population
- GCS: interface compliance, error constants, error detection, and field population
Also fix variable shadowing issue in S3 and Azure StatFile implementations where
named return parameters were being shadowed by local variable declarations.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: address StatFile review feedback
- Use errors.New for ErrRemoteObjectNotFound sentinel
- Fix S3 HeadObject 404 detection to use awserr.Error code check
- Remove hollow field-population tests that tested nothing
- Remove redundant stdlib error detection tests
- Trim verbose doc comment on ErrRemoteObjectNotFound
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: address second round of StatFile review feedback
- Rename interface assertion tests to TestXxxRemoteStorageClientImplementsInterface
- Delegate readFileRemoteEntry to StatFile in all three providers
- Revert S3 404 detection to RequestFailure.StatusCode() check
- Fix double-slash in GCS error message format string
- Add storage type prefix to S3 error message for consistency
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: comments
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
* Enhance volume.merge command with deduplication and disk-based backend
* Fix copyVolume function call with correct argument order and missing bool parameter
* Revert "Fix copyVolume function call with correct argument order and missing bool parameter"
This reverts commit 7b4a190643.
* Fix critical issues: per-replica writable tracking, tail goroutine cancellation via done channel, and debug logging for allocation failures
* Optimize memory usage with watermark approach for duplicate detection
* Fix critical issues: swap copyVolume arguments, increase idle timeout, remove file double-close, use glog for logging
* Replace temporary file with in-memory buffer for needle blob serialization
* test(volume.merge): Add comprehensive unit and integration tests
Add 7 unit tests covering:
- Ordering by timestamp
- Cross-stream duplicate deduplication
- Empty stream handling
- Complex multi-stream deduplication
- Single stream passthrough
- Large needle ID support
- LastModified fallback when timestamp unavailable
Add 2 integration validation tests:
- TestMergeWorkflowValidation: Documents 9-stage merge workflow
- TestMergeEdgeCaseHandling: Validates 10 edge case handling
All tests passing (9/9)
* fix(volume.merge): Use time window for deduplication to handle clock skew
The same needle ID can have different timestamps on different servers due to
clock skew and replication lag. Needles with the same ID within a 5-second
time window are now treated as duplicates (same write with timestamp variance).
Key changes:
- Add mergeDeduplicationWindowNs constant (5 seconds)
- Replace exact timestamp matching with time window comparison
- Use windowInitialized flag to properly detect window transitions
- Add TestMergeNeedleStreamsTimeWindowDeduplication test
This ensures that replicated writes with slight timestamp differences are
properly deduplicated during merge, while separate updates to the same file
ID (outside the window) are preserved.
All tests passing (10/10)
* test: Add volume.merge integration tests with 5 comprehensive test cases
* test: integration tests for volume.merge command
* Fix integration tests: use TripleVolumeCluster for volume.merge testing
- Created new TripleVolumeCluster framework (cluster_triple.go) with 3 volume servers
- Rebuilt weed binary with volume.merge command compiled in
- Updated all 5 integration tests to use TripleVolumeCluster instead of DualVolumeCluster
- Tests now properly allocate volumes on 2 servers and let merge allocate on 3rd
- All 5 integration tests now pass:
- TestVolumeMergeBasic
- TestVolumeMergeReadonly
- TestVolumeMergeRestore
- TestVolumeMergeTailNeedles
- TestVolumeMergeDivergentReplicas
* Refactor test framework: use parameterized server count instead of hardcoded
- Renamed TripleVolumeCluster to MultiVolumeCluster with serverCount parameter
- Replaced hardcoded volumePort0/1/2 with slices for flexible server count
- Updated StartTripleVolumeCluster as backward-compatible wrapper calling StartMultiVolumeCluster(t, profile, 3)
- Made directory creation, port allocation, and server startup loop-based
- Updated accessor methods (VolumeAdminAddress, VolumeGRPCAddress, etc.) to support any server count
- All 5 integration tests continue to pass with new parameterized cluster framework
- Enables future testing with 2, 4, 5+ volume servers by calling StartMultiVolumeCluster directly
* Consolidate cluster frameworks: StartDualVolumeCluster now uses MultiVolumeCluster
- Made DualVolumeCluster a type alias for MultiVolumeCluster
- Updated StartDualVolumeCluster to call StartMultiVolumeCluster(t, profile, 2)
- Removed duplicate code from cluster_dual.go (now just 17 lines)
- All existing tests using StartDualVolumeCluster continue to work without changes
- Backward compatible: existing code continues to use the old function signatures
- Added wrapper functions in cluster_multi.go for StartTripleVolumeCluster
- Enables unified cluster management across all test suites
* Address PR review comments: improve error handling and clean up code
- Replace parse error swallow with proper error return
- Log cleanup and restoration errors instead of silently discarding them
- Remove unused offset field from memoryBackendFile struct
- Fix WriteAt buffer truncation bug to preserve trailing bytes
- All unit tests passing (10/10)
- Code compiles successfully
* Fix PR review findings: test improvements and code quality
- Add timeout to runWeedShell to prevent hanging
- Add server 1 readonly status verification in tests
- Assert merge fails when replicas writable (not just log output)
- Replace sleep with polling for writable restoration check
- Fix WriteAt stale data snapshot bug in memoryBackendFile
- Fix startVolume error logging to show current server log
- Fix volumePubPorts double assignment in port allocation
- Rename test to reflect behavior: DoesNotDeduplicateAcrossWindows
- Fix misleading dedup window comment
Unit tests: 10/10 passing
Binary: Compiles successfully
* Fix test assumption: merge command marks volumes readonly automatically
TestVolumeMergeReadonly was expecting merge to fail on writable volumes, but the
merge command is designed to mark volumes readonly as part of its operation. Fixed
test to verify merge succeeds on writable volumes and properly restores writable
state afterward. Removed redundant Test 2 code that duplicated the new behavior.
* fmt
* Fix deduplication logic to correctly handle same-stream vs cross-stream duplicates
The dedup map previously used only NeedleId as key, causing same-stream
overwrites to be incorrectly skipped as duplicates. Changed to track which
stream first processed each needle ID in the current window:
- Cross-stream duplicates (same ID from different streams, within window) are skipped
- Same-stream duplicates (overwrites from same stream) are kept
- Map now stores: needleId -> streamIndex of first occurrence in window
Added TestMergeNeedleStreamsSameStreamDuplicates to verify same-stream
overwrites are preserved while cross-stream duplicates are skipped.
All unit tests passing (11/11)
Binary compiles successfully
* Fix IAM inline user policy retrieval
* fmt
* Persist inline user policies to avoid loss on server restart
- Use s3ApiConfig.PutPolicies/GetPolicies for persistent storage instead of non-persistent global map
- Remove unused global policyDocuments map
- Update PutUserPolicy to store policies in persistent storage
- Update GetUserPolicy to read from persistent storage
- Update DeleteUserPolicy to clean up persistent storage
- Add mock IamS3ApiConfig for testing
- Improve test to verify policy statements are not merged or lost
* Fix inline policy key collision and action aggregation
* Improve error handling and optimize inline policy management
- GetUserPolicy: Propagate GetPolicies errors instead of silently falling through
- DeleteUserPolicy: Return error immediately on GetPolicies failure
- computeAggregatedActionsForUser: Add optional Policies parameter for I/O optimization
- PutUserPolicy: Reuse fetched policies to avoid redundant GetPolicies call
- Improve logging with clearer messages about best-effort aggregation
- Update test to use exact action string matching instead of substring checks
All 15 tests pass with no regressions.
* Add per-user policy index for O(1) lookup performance
- Extend Policies struct with InlinePolicies map[userName]map[policyName]
- Add getOrCreateUserPolicies() helper for safe user map management
- Update computeAggregatedActionsForUser to use direct user map access
- Update PutUserPolicy, GetUserPolicy, DeleteUserPolicy for new structure
- Performance: O(1) user lookups instead of O(all_policies) iteration
- Eliminates string prefix matching loop
- All tests pass; backward compatible with managed policies
* Fix DeleteUserPolicy to validate user existence before storage modification
Refactor DeleteUserPolicy handler to check user existence early:
- First iterate s3cfg.Identities to verify user exists
- Return NoSuchEntity error immediately if user not found
- Only then proceed with GetPolicies and policy deletion
- Capture reference to found identity for direct update
This ensures consistency: if user doesn't exist, storage is not modified.
Previously the code would delete from storage first and check identity
afterwards, potentially leaving orphaned policies.
Benefits:
- Fail-fast validation before storage operations
- No orphaned policies in storage if validation fails
- Atomic from logical perspective
- Direct identity reference eliminates redundant loop
- All error paths preserved and tested
All 15 tests pass; no functional changes to behavior.
* Fix GetUserPolicy to return NoSuchEntity when inline policy not found
When InlinePolicies[userName] exists but does not contain policyName,
the handler now immediately returns NoSuchEntity error instead of
falling through to the reconstruction logic.
Changes:
- Add else clause after userPolicies[policyName] lookup
- Return IamError(NoSuchEntityException, "policy not found") immediately
- Prevents incorrect fallback to reconstructing ident.Actions
- Ensures explicit error when policy explicitly doesn't exist
This improves error semantics:
- Policy exists in stored inline policies → return error (not reconstruct)
- Policy doesn't exist in stored inline policies → try reconstruction (backward compat)
- Storage error → return service failure error
All 15 tests pass; no behavioral changes to existing error or success paths.
* fix(plugin/worker): make VacuumHandler report MaxExecutionConcurrency from worker startup flag
Previously, MaxExecutionConcurrency was hardcoded to 2 in VacuumHandler.Capability().
The scheduler's schedulerWorkerExecutionLimit() takes the minimum of the UI-configured
PerWorkerExecutionConcurrency and the worker-reported capability limit, so the hardcoded
value silently capped each worker to 2 concurrent vacuum executions regardless of the
--max-execute flag passed at worker startup.
Pass maxExecutionConcurrency into NewVacuumHandler() and wire it through
buildPluginWorkerHandler/buildPluginWorkerHandlers so the capability reflects the actual
worker configuration. The default falls back to 2 when the value is unset or zero.
* Update weed/command/worker_runtime.go
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Anton Ustyugov <anton@devops>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix(admin): release mutex before disk I/O in maintenance queue
saveTaskState performs synchronous BoltDB writes. Calling it while
holding mq.mutex.Lock() in AddTask, GetNextTask, and CompleteTask
blocks all readers (GetTasks via RLock) for the full disk write
duration on every task state change.
During a maintenance scan AddTasksFromResults calls AddTask for every
volume — potentially hundreds of times — meaning the write lock is
held almost continuously. The HTTP handler for /maintenance calls
GetTasks which blocks on RLock, exceeding the 30s timeout and
returning 408 to the browser.
Fix: update in-memory state (mq.tasks, mq.pendingTasks) under the
lock as before, then unlock before calling saveTaskState. In-memory
state is the authoritative source; persistence is crash-recovery only
and does not require lock protection during the write.
* fix(admin): add mutex to ConfigPersistence to synchronize tasks/ filesystem ops
saveTaskState is now called outside mq.mutex, meaning SaveTaskState,
LoadAllTaskStates, DeleteTaskState, and CleanupCompletedTasks can be
invoked concurrently from multiple goroutines. ConfigPersistence had no
internal synchronization, creating races on the tasks/ directory:
- concurrent os.WriteFile + os.ReadFile on the same .pb file could
yield a partial read and unmarshal error
- LoadAllTaskStates (ReadDir + per-file ReadFile) could see a
directory entry for a file being written or deleted concurrently
- CleanupCompletedTasks (LoadAllTaskStates + DeleteTaskState) could
race with SaveTaskState on the same file
Fix: add tasksMu sync.Mutex to ConfigPersistence, acquired at the top
of SaveTaskState, LoadTaskState, LoadAllTaskStates, DeleteTaskState,
and CleanupCompletedTasks. Extract private Locked helpers so that
CleanupCompletedTasks (which holds tasksMu) can call them internally
without deadlocking.
---------
Co-authored-by: Anton Ustyugov <anton@devops>
Avoid stripping default ports (80/443) from the Host header in extractHostHeader.
This fixes SignatureDoesNotMatch errors when SeaweedFS is accessed via a proxy
(like Kong Ingress) that explicitly includes the port in the Host header or
X-Forwarded-Host, which S3 clients sign.
Also cleaned up unused variables and logic after refactoring.
* fix(s3api): make ListObjectsV1 namespaced and stop marker-echo pagination loops
* test(s3api): harden marker-echo coverage and align V1 encoding tag
* test(s3api): cover encoded marker matching and trim redundant setup
* refactor(s3api): tighten V1 list helper visibility and test mock docs
Fixes#8401
When creating balance/vacuum tasks, the worker maintenance scheduler was
accidentally discarding the custom grpcPort defined on the DataNodeInfo
by using just its HTTP Address string, which defaults to +10000
during grpc dialing.
By using pb.NewServerAddressFromDataNode, the grpcPort suffix is correctly
encoded in the server address string, preventing connection refused errors
for users running volume servers with custom gRPC ports.
* helm: refine openshift-values.yaml to remove hardcoded UIDs
Remove hardcoded runAsUser, runAsGroup, and fsGroup from the
openshift-values.yaml example. This allows OpenShift's admission
controller to automatically assign a valid UID from the namespace's
allocated range, avoiding "forbidden" errors when UID 1000 is
outside the permissible range.
Updates #8381, #8390.
* helm: fix volume.logs and add consistent security context comments
* Update README.md
* fix volume.fsck crashing on EC volumes and add multi-volume vacuum support
* address comments
* S3: Truncate timestamps to milliseconds for CopyObjectResult and CopyPartResult
Fixes#8394
* S3: Address nitpick comments in copy handlers
- Synchronize Mtime and LastModified by capturing time once\n- Optimize copyChunksForRange loop\n- Use built-in min/max\n- Remove dead previewLen code