* admin: seed admin_script plugin config from master maintenance scripts
When the admin server starts, fetch the maintenance scripts configuration
from the master via GetMasterConfiguration. If the admin_script plugin
worker does not already have a saved config, use the master's scripts as
the default value. This enables seamless migration from master.toml
[master.maintenance] to the admin script plugin worker.
Changes:
- Add maintenance_scripts and maintenance_sleep_minutes fields to
GetMasterConfigurationResponse in master.proto
- Populate the new fields from viper config in master_grpc_server.go
- On admin server startup, fetch the master config and seed the
admin_script plugin config if no config exists yet
- Strip lock/unlock commands from the master scripts since the admin
script worker handles locking automatically
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address review comments on admin_script seeding
- Replace TOCTOU race (separate Load+Save) with atomic
SaveJobTypeConfigIfNotExists on ConfigStore and Plugin
- Replace ineffective polling loop with single GetMaster call using
30s context timeout, since GetMaster respects context cancellation
- Add unit tests for SaveJobTypeConfigIfNotExists (in-memory + on-disk)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: apply maintenance script defaults in gRPC handler
The gRPC handler for GetMasterConfiguration read maintenance scripts
from viper without calling SetDefault, relying on startAdminScripts
having run first. If the admin server calls GetMasterConfiguration
before startAdminScripts sets the defaults, viper returns empty
strings and the seeding is silently skipped.
Apply SetDefault in the gRPC handler itself so it is self-contained.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Revert "fix: apply maintenance script defaults in gRPC handler"
This reverts commit 068a506330.
* fix: use atomic save in ensureJobTypeConfigFromDescriptor
ensureJobTypeConfigFromDescriptor used a separate Load + Save, racing
with seedAdminScriptFromMaster. If the descriptor defaults (empty
script) were saved first, SaveJobTypeConfigIfNotExists in the seeding
goroutine would see an existing config and skip, losing the master's
maintenance scripts.
Switch to SaveJobTypeConfigIfNotExists so both paths are atomic. Whichever
wins, the other is a safe no-op.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: fetch master scripts inline during config bootstrap, not in goroutine
Replace the seedAdminScriptFromMaster goroutine with a
ConfigDefaultsProvider callback. When the plugin bootstraps
admin_script defaults from the worker descriptor, it calls the
provider which fetches maintenance scripts from the master
synchronously. This eliminates the race between the seeding
goroutine and the descriptor-based config bootstrap.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* skip commented lock unlock
Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>
* reduce grpc calls
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add volume dir tags to topology
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add preferred tag config for EC
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Prioritize EC destinations by tags
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add EC placement planner tag tests
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Refactor EC placement tests to reuse buildActiveTopology
Remove buildActiveTopologyWithDiskTags helper function and consolidate
tag setup inline in test cases. Tests now use UpdateTopology to apply
tags after topology creation, reusing the existing buildActiveTopology
function rather than duplicating its logic.
All tag scenario tests pass:
- TestECPlacementPlannerPrefersTaggedDisks
- TestECPlacementPlannerFallsBackWhenTagsInsufficient
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Consolidate normalizeTagList into shared util package
Extract normalizeTagList from three locations (volume.go,
detection.go, erasure_coding_handler.go) into new weed/util/tag.go
as exported NormalizeTagList function. Replace all duplicate
implementations with imports and calls to util.NormalizeTagList.
This improves code reuse and maintainability by centralizing
tag normalization logic.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add PreferredTags to EC config persistence
Add preferred_tags field to ErasureCodingTaskConfig protobuf with field
number 5. Update GetConfigSpec to include preferred_tags field in the
UI configuration schema. Add PreferredTags to ToTaskPolicy to serialize
config to protobuf. Add PreferredTags to FromTaskPolicy to deserialize
from protobuf with defensive copy to prevent external mutation.
This allows EC preferred tags to be persisted and restored across
worker restarts.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add defensive copy for Tags slice in DiskLocation
Copy the incoming tags slice in NewDiskLocation instead of storing
by reference. This prevents external callers from mutating the
DiskLocation.Tags slice after construction, improving encapsulation
and preventing unexpected changes to disk metadata.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add doc comment to buildCandidateSets method
Document the tiered candidate selection and fallback behavior. Explain
that for a planner with preferredTags, it accumulates disks matching
each tag in order into progressively larger tiers, emits a candidate
set once a tier reaches shardsNeeded, and finally falls back to the
full candidates set if preferred-tag tiers are insufficient.
This clarifies the intended semantics for future maintainers.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Apply final PR review fixes
1. Update parseVolumeTags to replicate single tag entry to all folders
instead of leaving some folders with nil tags. This prevents nil
pointer dereferences when processing folders without explicit tags.
2. Add defensive copy in ToTaskPolicy for PreferredTags slice to match
the pattern used in FromTaskPolicy, preventing external mutation of
the returned TaskPolicy.
3. Add clarifying comment in buildCandidateSets explaining that the
shardsNeeded <= 0 branch is a defensive check for direct callers,
since selectDestinations guarantees shardsNeeded > 0.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix nil pointer dereference in parseVolumeTags
Ensure all folder tags are initialized to either normalized tags or
empty slices, not nil. When multiple tag entries are provided and there
are more folders than entries, remaining folders now get empty slices
instead of nil, preventing nil pointer dereference in downstream code.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix NormalizeTagList to return empty slice instead of nil
Change NormalizeTagList to always return a non-nil slice. When all tags
are empty or whitespace after normalization, return an empty slice
instead of nil. This prevents nil pointer dereferences in downstream
code that expects a valid (possibly empty) slice.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add nil safety check for v.tags pointer
Add a safety check to handle the case where v.tags might be nil,
preventing a nil pointer dereference. If v.tags is nil, use an empty
string instead. This is defensive programming to prevent panics in
edge cases.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add volume.tags flag to weed server and weed mini commands
Add the volume.tags CLI option to both the 'weed server' and 'weed mini'
commands. This allows users to specify disk tags when running the
combined server modes, just like they can with 'weed volume'.
The flag uses the same format and description as the volume command:
comma-separated tag groups per data dir with ':' separators
(e.g. fast:ssd,archive).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* iam: add XML responses for managed user policy APIs
* s3api: implement attach/detach/list attached user policies
* s3api: add embedded IAM tests for managed user policies
* iam: update CredentialStore interface and Manager for managed policies
Updated the `CredentialStore` interface to include `AttachUserPolicy`,
`DetachUserPolicy`, and `ListAttachedUserPolicies` methods.
The `CredentialManager` was updated to delegate these calls to the store.
Added common error variables for policy management.
* iam: implement managed policy methods in MemoryStore
Implemented `AttachUserPolicy`, `DetachUserPolicy`, and
`ListAttachedUserPolicies` in the MemoryStore.
Also ensured deep copying of identities includes PolicyNames.
* iam: implement managed policy methods in PostgresStore
Modified Postgres schema to include `policy_names` JSONB column in `users`.
Implemented `AttachUserPolicy`, `DetachUserPolicy`, and `ListAttachedUserPolicies`.
Updated user CRUD operations to handle policy names persistence.
* iam: implement managed policy methods in remaining stores
Implemented user policy management in:
- `FilerEtcStore` (partial implementation)
- `IamGrpcStore` (delegated via GetUser/UpdateUser)
- `PropagatingCredentialStore` (to broadcast updates)
Ensures cluster-wide consistency for policy attachments.
* s3api: refactor EmbeddedIamApi to use managed policy APIs
- Refactored `AttachUserPolicy`, `DetachUserPolicy`, and `ListAttachedUserPolicies`
to use `e.credentialManager` directly.
- Fixed a critical error suppression bug in `ExecuteAction` that always
returned success even on failure.
- Implemented robust error matching using string comparison fallbacks.
- Improved consistency by reloading configuration after policy changes.
* s3api: update and refine IAM integration tests
- Updated tests to use a real `MemoryStore`-backed `CredentialManager`.
- Refined test configuration synchronization using `sync.Once` and
manual deep-copying to prevent state corruption.
- Improved `extractEmbeddedIamErrorCodeAndMessage` to handle more XML
formats robustly.
- Adjusted test expectations to match current AWS IAM behavior.
* fix compilation
* visibility
* ensure 10 policies
* reload
* add integration tests
* Guard raft command registration
* Allow IAM actions in policy tests
* Validate gRPC policy attachments
* Revert Validate gRPC policy attachments
* Tighten gRPC policy attach/detach
* Improve IAM managed policy handling
* Improve managed policy filters
* filer: add default log purging to master maintenance scripts
* filer: fix default maintenance scripts to include full set of tasks
* filer: refactor maintenance scripts to avoid duplication
* Fix master leader election startup issue
Fixes #error-log-leader-not-selected-yet
* Fix master leader election startup issue
This change improves server address comparison using the 'Equals' method and handles recursion in topology leader lookup, resolving the 'leader not selected yet' error during master startup.
* Merge user improvements: use MaybeLeader for non-blocking checks
* not useful test
* Address code review: optimize Equals, fix deadlock in IsLeader, safe access in Leader
* fix multipart etag
* address comments
* clean up
* clean up
* optimization
* address comments
* unquoted etag
* dedup
* upgrade
* clean
* etag
* return quoted tag
* quoted etag
* debug
* s3api: unify ETag retrieval and quoting across handlers
Refactor newListEntry to take *S3ApiServer and use getObjectETag,
and update setResponseHeaders to use the same logic. This ensures
consistent ETags are returned for both listing and direct access.
* s3api: implement ListObjects deduplication for versioned buckets
Handle duplicate entries between the main path and the .versions
directory by prioritizing the latest version when bucket versioning
is enabled.
* s3api: cleanup stale main file entries during versioned uploads
Add explicit deletion of pre-existing "main" files when creating new
versions in versioned buckets. This prevents stale entries from
appearing in bucket listings and ensures consistency.
* s3api: fix cleanup code placement in versioned uploads
Correct the placement of rm calls in completeMultipartUpload and
putVersionedObject to ensure stale main files are properly deleted
during versioned uploads.
* s3api: improve getObjectETag fallback for empty ExtETagKey
Ensure that when ExtETagKey exists but contains an empty value,
the function falls through to MD5/chunk-based calculation instead
of returning an empty string.
* s3api: fix test files for new newListEntry signature
Update test files to use the new newListEntry signature where the
first parameter is *S3ApiServer. Created mockS3ApiServer to properly
test owner display name lookup functionality.
* s3api: use filer.ETag for consistent Md5 handling in getEtagFromEntry
Change getEtagFromEntry fallback to use filer.ETag(entry) instead of
filer.ETagChunks to ensure legacy entries with Attributes.Md5 are
handled consistently with the rest of the codebase.
* s3api: optimize list logic and fix conditional header logging
- Hoist bucket versioning check out of per-entry callback to avoid
repeated getVersioningState calls
- Extract appendOrDedup helper function to eliminate duplicate
dedup/append logic across multiple code paths
- Change If-Match mismatch logging from glog.Errorf to glog.V(3).Infof
and remove DEBUG prefix for consistency
* s3api: fix test mock to properly initialize IAM accounts
Fixed nil pointer dereference in TestNewListEntryOwnerDisplayName by
directly initializing the IdentityAccessManagement.accounts map in the
test setup. This ensures newListEntry can properly look up account
display names without panicking.
* cleanup
* s3api: remove premature main file cleanup in versioned uploads
Removed incorrect cleanup logic that was deleting main files during
versioned uploads. This was causing test failures because it deleted
objects that should have been preserved as null versions when
versioning was first enabled. The deduplication logic in listing is
sufficient to handle duplicate entries without deleting files during
upload.
* s3api: add empty-value guard to getEtagFromEntry
Added the same empty-value guard used in getObjectETag to prevent
returning quoted empty strings. When ExtETagKey exists but is empty,
the function now falls through to filer.ETag calculation instead of
returning "".
* s3api: fix listing of directory key objects with matching prefix
Revert prefix handling logic to use strings.TrimPrefix instead of
checking HasPrefix with empty string result. This ensures that when a
directory key object exactly matches the prefix (e.g. prefix="dir/",
object="dir/"), it is correctly handled as a regular entry instead of
being skipped or incorrectly processed as a common prefix. Also fixed
missing variable definition.
* s3api: refactor list inline dedup to use appendOrDedup helper
Refactored the inline deduplication logic in listFilerEntries to use the
shared appendOrDedup helper function. This ensures consistent behavior
and reduces code duplication.
* test: fix port allocation race in s3tables integration test
Updated startMiniCluster to find all required ports simultaneously using
findAvailablePorts instead of sequentially. This prevents race conditions
where the OS reallocates a port that was just released, causing multiple
services (e.g. Filer and Volume) to be assigned the same port and fail
to start.
* Add a version token on `GetState()`/`SetState()` RPCs for volume server states.
* Make state version a property ov `VolumeServerState` instead of an in-memory counter.
Also extend state atomicity to reads, instead of just writes.
Implement index (fast) scrubbing for regular/EC volumes via `ScrubVolume()`/`ScrubEcVolume()`.
Also rearranges existing index test files for reuse across unit tests for different modules.
* fix float stepping
* do not auto refresh
* only logs when non 200 status
* fix maintenance task sorting and cleanup redundant handler logic
* Refactor log retrieval to persist to disk and fix slowness
- Move log retrieval to disk-based persistence in GetMaintenanceTaskDetail
- Implement background log fetching on task completion in worker_grpc_server.go
- Implement async background refresh for in-progress tasks
- Completely remove blocking gRPC calls from the UI path to fix 10s timeouts
- Cleanup debug logs and performance profiling code
* Ensure consistent deterministic sorting in config_persistence cleanup
* Replace magic numbers with constants and remove debug logs
- Added descriptive constants for truncation limits and timeouts in admin_server.go and worker_grpc_server.go
- Replaced magic numbers with these constants throughout the codebase
- Verified removal of stdout debug printing
- Ensured consistent truncation logic during log persistence
* Address code review feedback on history truncation and logging logic
- Fix AssignmentHistory double-serialization by copying task in GetMaintenanceTaskDetail
- Fix handleTaskCompletion logging logic (mutually exclusive success/failure logs)
- Remove unused Timeout field from LogRequestContext and sync select timeouts with constants
- Ensure AssignmentHistory is only provided in the top-level field for better JSON structure
* Implement goroutine leak protection and request deduplication
- Add request deduplication in RequestTaskLogs to prevent multiple concurrent fetches for the same task
- Implement safe cleanup in timeout handlers to avoid race conditions in pendingLogRequests map
- Add a 10s cooldown for background log refreshes in GetMaintenanceTaskDetail to prevent spamming
- Ensure all persistent log-fetching goroutines are bounded and efficiently managed
* Fix potential nil pointer panics in maintenance handlers
- Add nil checks for adminServer in ShowTaskDetail, ShowMaintenanceWorkers, and UpdateTaskConfig
- Update getMaintenanceQueueData to return a descriptive error instead of nil when adminServer is uninitialized
- Ensure internal helper methods consistently check for adminServer initialization before use
* Strictly enforce disk-only log reading
- Remove background log fetching from GetMaintenanceTaskDetail to prevent timeouts and network calls during page view
- Remove unused lastLogFetch tracking fields to clean up dead code
- Ensure logs are only updated upon task completion via handleTaskCompletion
* Refactor GetWorkerLogs to read from disk
- Update /api/maintenance/workers/:id/logs endpoint to use configPersistence.LoadTaskExecutionLogs
- Remove synchronous gRPC call RequestTaskLogs to prevent timeouts and bad gateway errors
- Ensure consistent log retrieval behavior across the application (disk-only)
* Fix timestamp parsing in log viewer
- Update task_detail.templ JS to handle both ISO 8601 strings and Unix timestamps
- Fix "Invalid time value" error when displaying logs fetched from disk
- Regenerate templates
* master: fallback to HDD if SSD volumes are full in Assign
* worker: improve EC detection logging and fix skip counters
* worker: add Sync method to TaskLogger interface
* worker: implement Sync and ensure logs are flushed before task completion
* admin: improve task log retrieval with retries and better timeouts
* admin: robust timestamp parsing in task detail view
* Fix: Initialize filer CredentialManager with filer address
* The fix involves checking for directory existence before creation.
* adjust error message
* Fix: Implement FilerAddressSetter in PropagatingCredentialStore
* Refactor: Reorder credential manager initialization in filer server
* refactor
* Implement RPC skeleton for regular/EC volumes scrubbing.
See https://github.com/seaweedfs/seaweedfs/issues/8018 for details.
* Minor proto improvements for `ScrubVolume()`, `ScrubEcVolume()`:
- Add fields for scrubbing details in `ScrubVolumeResponse` and `ScrubEcVolumeResponse`,
instead of reporting these through RPC errors.
- Return a list of broken shards when scrubbing EC volumes, via `EcShardInfo'.
* Boostrap persistent state for volume servers.
This PR implements logic load/save persistent state information for storages
associated with volume servers, and reporting state changes back to masters
via heartbeat messages.
More work ensues!
See https://github.com/seaweedfs/seaweedfs/issues/7977 for details.
* Add volume server RPCs to read and update state flags.
* Boostrap persistent state for volume servers.
This PR implements logic load/save persistent state information for storages
associated with volume servers, and reporting state changes back to masters
via heartbeat messages.
More work ensues!
See https://github.com/seaweedfs/seaweedfs/issues/7977 for details.
* Block RPC operations writing to volume servers when maintenance mode is on.
* fix Filer startup failure due to JWT on / path #8149
- Comment out JWT keys in security.toml.example
- Revert Dockerfile.local change that enabled security by default
- Exempt GET/HEAD on / from JWT check for health checks
* refactor: simplify JWT bypass condition as per PR feedback
This ensures that metadata load correctly restores entries with their
original TTL instead of overwriting with the default from filer.conf.
Fixes#8159.
* Implement IAM propagation to S3 servers
- Add PropagatingCredentialStore to propagate IAM changes to S3 servers via gRPC
- Add Policy management RPCs to S3 proto and S3ApiServer
- Update CredentialManager to use PropagatingCredentialStore when MasterClient is available
- Wire FilerServer to enable propagation
* Implement parallel IAM propagation and fix S3 cluster registration
- Parallelized IAM change propagation with 10s timeout.
- Refined context usage in PropagatingCredentialStore.
- Added S3Type support to cluster node management.
- Enabled S3 servers to register with gRPC address to the master.
- Ensured IAM configuration reload after policy updates via gRPC.
* Optimize IAM propagation with direct in-memory cache updates
* Secure IAM propagation: Use metadata to skip persistence only on propagation
* pb: refactor IAM and S3 services for unidirectional IAM propagation
- Move SeaweedS3IamCache service from iam.proto to s3.proto.
- Remove legacy IAM management RPCs and empty SeaweedS3 service from s3.proto.
- Enforce that S3 servers only use the synchronization interface.
* pb: regenerate Go code for IAM and S3 services
Updated generated code following the proto refactoring of IAM synchronization services.
* s3api: implement read-only mode for Embedded IAM API
- Add readOnly flag to EmbeddedIamApi to reject write operations via HTTP.
- Enable read-only mode by default in S3ApiServer.
- Handle AccessDenied error in writeIamErrorResponse.
- Embed SeaweedS3IamCacheServer in S3ApiServer.
* credential: refactor PropagatingCredentialStore for unidirectional IAM flow
- Update to use s3_pb.SeaweedS3IamCacheClient for propagation to S3 servers.
- Propagate full Identity object via PutIdentity for consistency.
- Remove redundant propagation of specific user/account/policy management RPCs.
- Add timeout context for propagation calls.
* s3api: implement SeaweedS3IamCacheServer for unidirectional sync
- Update S3ApiServer to implement the cache synchronization gRPC interface.
- Methods (PutIdentity, RemoveIdentity, etc.) now perform direct in-memory cache updates.
- Register SeaweedS3IamCacheServer in command/s3.go.
- Remove registration for the legacy and now empty SeaweedS3 service.
* s3api: update tests for read-only IAM and propagation
- Added TestEmbeddedIamReadOnly to verify rejection of write operations in read-only mode.
- Update test setup to pass readOnly=false to NewEmbeddedIamApi in routing tests.
- Updated EmbeddedIamApiForTest helper with read-only checks matching production behavior.
* s3api: add back temporary debug logs for IAM updates
Log IAM updates received via:
- gRPC propagation (PutIdentity, PutPolicy, etc.)
- Metadata configuration reloads (LoadS3ApiConfigurationFromCredentialManager)
- Core identity management (UpsertIdentity, RemoveIdentity)
* IAM: finalize propagation fix with reduced logging and clarified architecture
* Allow configuring IAM read-only mode for S3 server integration tests
* s3api: add defensive validation to UpsertIdentity
* s3api: fix log message to reference correct IAM read-only flag
* test/s3/iam: ensure WaitForS3Service checks for IAM write permissions
* test: enable writable IAM in Makefile for integration tests
* IAM: add GetPolicy/ListPolicies RPCs to s3.proto
* S3: add GetBucketPolicy and ListBucketPolicies helpers
* S3: support storing generic IAM policies in IdentityAccessManagement
* S3: implement IAM policy RPCs using IdentityAccessManagement
* IAM: fix stale user identity on rename propagation
* Add IAM gRPC service definition
- Add GetConfiguration/PutConfiguration for config management
- Add CreateUser/GetUser/UpdateUser/DeleteUser/ListUsers for user management
- Add CreateAccessKey/DeleteAccessKey/GetUserByAccessKey for access key management
- Methods mirror existing IAM HTTP API functionality
* Add IAM gRPC handlers on filer server
- Implement IamGrpcServer with CredentialManager integration
- Handle configuration get/put operations
- Handle user CRUD operations
- Handle access key create/delete operations
- All methods delegate to CredentialManager for actual storage
* Wire IAM gRPC service to filer server
- Add CredentialManager field to FilerOption and FilerServer
- Import credential store implementations in filer command
- Initialize CredentialManager from credential.toml if available
- Register IAM gRPC service on filer gRPC server
- Enable credential management via gRPC alongside existing filer services
* Regenerate IAM protobuf with gRPC service methods
* fix: compilation error in DeleteUser
* fix: address code review comments for IAM migration
* feat: migrate policies to multi-file layout and fix identity duplicated content
* refactor: remove configuration.json and migrate Service Accounts to multi-file layout
* refactor: standardize Service Accounts as distinct store entities and fix Admin Server persistence
* config: set ServiceAccountsDirectory to /etc/iam/service_accounts
* Fix Chrome dialog auto-dismiss with Bootstrap modals
- Add modal-alerts.js library with Bootstrap modal replacements
- Replace all 15 confirm() calls with showConfirm/showDeleteConfirm
- Auto-override window.alert() for all alert() calls
- Fixes Chrome 132+ aggressively blocking native dialogs
* Upgrade Bootstrap from 5.3.2 to 5.3.8
* Fix syntax error in object_store_users.templ - remove duplicate closing braces
* create policy
* display errors
* migrate to multi-file policies
* address PR feedback: use showDeleteConfirm and showErrorMessage in policies.templ, refine migration check
* Update policies_templ.go
* add service account to iam grpc
* iam: fix potential path traversal in policy names by validating name pattern
* iam: add GetServiceAccountByAccessKey to CredentialStore interface
* iam: implement service account support for PostgresStore
Includes full CRUD operations and efficient lookup by access key.
* iam: implement GetServiceAccountByAccessKey for filer_etc, grpc, and memory stores
Provides efficient lookup of service accounts by access key where possible,
with linear scan fallbacks for file-based stores.
* iam: remove filer_multiple support
Deleted its implementation and references in imports, scaffold config,
and core interface constants. Redundant with filer_etc.
* clear comment
* dash: robustify service account construction
- Guard against nil sa.Credential when constructing responses
- Fix Expiration logic to only set if > 0, avoiding Unix epoch 1970
- Ensure consistency across Get, Create, and Update handlers
* credential/filer_etc: improve error propagation in configuration handlers
- Return error from loadServiceAccountsFromMultiFile to callers
- Ensure listEntries errors in SaveConfiguration (cleanup logic) are
propagated unless they are "not found" failures.
- Fixes potential silent failures during IAM configuration sync.
* credential/filer_etc: add existence check to CreateServiceAccount
Ensures consistency with other stores by preventing accidental overwrite
of existing service accounts during creation.
* credential/memory: improve store robustness and Reset logic
- Enforce ID immutability in UpdateServiceAccount to prevent orphans
- Update Reset() to also clear the policies map, ensuring full state
cleanup for tests.
* dash: improve service account robustness and policy docs
- Wrap parent user lookup errors to preserve context
- Strictly validate Status field in UpdateServiceAccount
- Add deprecation comments to legacy policy management methods
* credential/filer_etc: protect against path traversal in service accounts
Implemented ID validation (alphanumeric, underscores, hyphens) and applied
it to Get, Save, and Delete operations to ensure no directory traversal
via saId.json filenames.
* credential/postgres: improve robustness and cleanup comments
- Removed brainstorming comments in GetServiceAccountByAccessKey
- Added missing rows.Err() check during iteration
- Properly propagate Scan and Unmarshal errors instead of swallowing them
* admin: unify UI alerts and confirmations using Bootstrap modals
- Updated modal-alerts.js with improved automated alert type detection
- Replaced native alert() and confirm() with showAlert(), showConfirm(),
and showDeleteConfirm() across various Templ components
- Improved UX for delete operations by providing better context and styling
- Ensured consistent error reporting across IAM and Maintenance views
* admin: additional UI consistency fixes for alerts and confirmations
- Replaced native alert() and confirm() with Bootstrap modals in:
- EC volumes (repair flow)
- Collection details (repair flow)
- File browser (properties and delete)
- Maintenance config schema (save and reset)
- Improved delete confirmation in file browser with item context
- Ensured consistent success/error/info styling for all feedbacks
* make
* iam: add GetServiceAccountByAccessKey RPC and update GetConfiguration
* iam: implement GetServiceAccountByAccessKey on server and client
* iam: centralize policy and service account validation
* iam: optimize MemoryStore service account lookups with indexing
* iam: fix postgres service_accounts table and optimize lookups
* admin: refactor modal alerts and clean up dashboard logic
* admin: fix EC shards table layout mismatch
* admin: URL-encode IAM path parameters for safety
* admin: implement pauseWorker logic in maintenance view
* iam: add rows.Err() check to postgres ListServiceAccounts
* iam: standardize ErrServiceAccountNotFound across credential stores
* iam: map ErrServiceAccountNotFound to codes.NotFound in DeleteServiceAccount
* iam: refine service account store logic, errors and schema
* iam: add validation to GetServiceAccountByAccessKey
* admin: refine modal titles and ensure URL safety
* admin: address bot review comments for alerts and async usage
* iam: fix syntax error by restoring missing function declaration
* [FilerEtcStore] improve error handling in CreateServiceAccount
Refine error handling to provide clearer messages when checking for
existing service accounts.
* [PostgresStore] add nil guards and validation to service account methods
Ensure input parameters are not nil and required IDs are present
to prevent runtime panics and ensure data integrity.
* [JS] add shared IAM utility script
Consolidate common IAM operations like deleteUser and deleteAccessKey
into a shared utility script for better maintainability.
* [View] include shared IAM utilities in layout
Include iam-utils.js in the main layout to make IAM functions
available across all administrative pages.
* [View] refactor IAM logic and restore async in EC Shards view
Remove redundant local IAM functions and ensure that delete
confirmation callbacks are properly marked as async.
* [View] consolidate IAM logic in Object Store Users view
Remove redundant local definitions of deleteUser and deleteAccessKey,
relying on the shared utilities instead.
* [View] update generated templ files for UI consistency
* credential/postgres: remove redundant name column from service_accounts table
The id is already used as the unique identifier and was being copied to the name column.
This removes the name column from the schema and updates the INSERT/UPDATE queries.
* credential/filer_etc: improve logging for policy migration failures
Added Errorf log if AtomicRenameEntry fails during migration to ensure visibility of common failure points.
* credential: allow uppercase characters in service account ID username
Updated ServiceAccountIdPattern to allow [A-Za-z0-9_-]+ for the username component,
matching the actual service account creation logic which uses the parent user name directly.
* Update object_store_users_templ.go
* admin: fix ec_shards pagination to handle numeric page arguments
Updated goToPage in cluster_ec_shards.templ to accept either an Event
or a numeric page argument. This prevents errors when goToPage(1)
is called directly. Corrected both the .templ source and generated Go code.
* credential/filer_etc: improve service account storage robustness
Added nil guard to saveServiceAccount, updated GetServiceAccount
to return ErrServiceAccountNotFound for empty data, and improved
deleteServiceAccount to handle response-level Filer errors.
* Add IAM gRPC service definition
- Add GetConfiguration/PutConfiguration for config management
- Add CreateUser/GetUser/UpdateUser/DeleteUser/ListUsers for user management
- Add CreateAccessKey/DeleteAccessKey/GetUserByAccessKey for access key management
- Methods mirror existing IAM HTTP API functionality
* Add IAM gRPC handlers on filer server
- Implement IamGrpcServer with CredentialManager integration
- Handle configuration get/put operations
- Handle user CRUD operations
- Handle access key create/delete operations
- All methods delegate to CredentialManager for actual storage
* Wire IAM gRPC service to filer server
- Add CredentialManager field to FilerOption and FilerServer
- Import credential store implementations in filer command
- Initialize CredentialManager from credential.toml if available
- Register IAM gRPC service on filer gRPC server
- Enable credential management via gRPC alongside existing filer services
* Regenerate IAM protobuf with gRPC service methods
* iam_pb: add Policy Management to protobuf definitions
* credential: implement PolicyManager in credential stores
* filer: implement IAM Policy Management RPCs
* shell: add s3.policy command
* test: add integration test for s3.policy
* test: fix compilation errors in policy_test
* pb
* fmt
* test
* weed shell: add -policies flag to s3.configure
This allows linking/unlinking IAM policies to/from identities
directly from the s3.configure command.
* test: verify s3.configure policy linking and fix port allocation
- Added test case for linking policies to users via s3.configure
- Implemented findAvailablePortPair to ensure HTTP and gRPC ports
are both available, avoiding conflicts with randomized port assignments.
- Updated assertion to match jsonpb output (policyNames)
* credential: add StoreTypeGrpc constant
* credential: add IAM gRPC store boilerplate
* credential: implement identity methods in gRPC store
* credential: implement policy methods in gRPC store
* admin: use gRPC credential store for AdminServer
This ensures that all IAM and policy changes made through the Admin UI
are persisted via the Filer's IAM gRPC service instead of direct file manipulation.
* shell: s3.configure use granular IAM gRPC APIs instead of full config patching
* shell: s3.configure use granular IAM gRPC APIs
* shell: replace deprecated ioutil with os in s3.policy
* filer: use gRPC FailedPrecondition for unconfigured credential manager
* test: improve s3.policy integration tests and fix error checks
* ci: add s3 policy shell integration tests to github workflow
* filer: fix LoadCredentialConfiguration error handling
* credential/grpc: propagate unmarshal errors in GetPolicies
* filer/grpc: improve error handling and validation
* shell: use gRPC status codes in s3.configure
* credential: document PutPolicy as create-or-replace
* credential/postgres: reuse CreatePolicy in PutPolicy to deduplicate logic
* shell: add timeout context and strictly enforce flags in s3.policy
* iam: standardize policy content field naming in gRPC and proto
* shell: extract slice helper functions in s3.configure
* filer: map credential store errors to gRPC status codes
* filer: add input validation for UpdateUser and CreateAccessKey
* iam: improve validation in policy and config handlers
* filer: ensure IAM service registration by defaulting credential manager
* credential: add GetStoreName method to manager
* test: verify policy deletion in integration test
* Fix imbalance detection disk type grouping and volume grow errors
This PR addresses two issues:
1. Imbalance Detection: Previously, balance detection did not verify disk types, leading to false positives when comparing heterogenous nodes (e.g. SSD vs HDD). Logic is now updated to group volumes by DiskType before calculating imbalance.
2. Volume Grow Errors: Fixed a variable scope issue in master_grpc_server_volume.go and added a pre-check for available space to prevent 'only 0 volumes left' error logs when a disk type is full or abandoned.
Included units tests for the detection logic.
* Refactor balance detection loop into detectForDiskType
* Fix potential panic in volume grow logic by checking replica placement parse error
* Prevent split-brain: Persistent ClusterID and Join Validation
- Persist ClusterId in Raft store to survive restarts.
- Validate ClusterId on Raft command application (piggybacked on MaxVolumeId).
- Prevent masters with conflicting ClusterIds from joining/operating together.
- Update Telemetry to report the persistent ClusterId.
* Refine ClusterID validation based on feedback
- Improved error message in cluster_commands.go.
- Added ClusterId mismatch check in RaftServer.Recovery.
* Handle Raft errors and support Hashicorp Raft for ClusterId
- Check for errors when persisting ClusterId in legacy Raft.
- Implement ClusterId generation and persistence for Hashicorp Raft leader changes.
- Ensure consistent error logging.
* Refactor ClusterId validation
- Centralize ClusterId mismatch check in Topology.SetClusterId.
- Simplify MaxVolumeIdCommand.Apply and RaftServer.Recovery to rely on SetClusterId.
* Fix goroutine leak and add timeout
- Handle channel closure in Hashicorp Raft leader listener.
- Add timeout to Raft Apply call to prevent blocking.
* Fix deadlock in legacy Raft listener
- Wrap ClusterId generation/persistence in a goroutine to avoid blocking the Raft event loop (deadlock).
* Rename ClusterId to SystemId
- Renamed ClusterId to SystemId across the codebase (protobuf, topology, server, telemetry).
- Regenerated telemetry.pb.go with new field.
* Rename SystemId to TopologyId
- Rename to SystemId was intermediate step.
- Final name is TopologyId for the persistent cluster identifier.
- Updated protobuf, topology, raft server, master server, and telemetry.
* Optimize Hashicorp Raft listener
- Integrated TopologyId generation into existing monitorLeaderLoop.
- Removed extra goroutine in master_server.go.
* Fix optimistic TopologyId update
- Removed premature local state update of TopologyId in master_server.go and raft_hashicorp.go.
- State is now solely updated via the Raft state machine Apply/Restore methods after consensus.
* Add explicit log for recovered TopologyId
- Added glog.V(0) info log in RaftServer.Recovery to print the recovered TopologyId on startup.
* Add Raft barrier to prevent TopologyId race condition
- Implement ensureTopologyId helper method
- Send no-op MaxVolumeIdCommand to sync Raft log before checking TopologyId
- Ensures persisted TopologyId is recovered before generating new one
- Prevents race where generation happens during log replay
* Serialize TopologyId generation with mutex
- Add topologyIdGenLock mutex to MasterServer struct
- Wrap ensureTopologyId method with lock to prevent concurrent generation
- Fixes race where event listener and manual leadership check both generate IDs
- Second caller waits for first to complete and sees the generated ID
* Add TopologyId recovery logging to Apply method
- Change log level from V(1) to V(0) for visibility
- Log 'Recovered TopologyId' when applying from Raft log
- Ensures recovery is visible whether from snapshot or log replay
- Matches Recovery() method logging for consistency
* Fix Raft barrier timing issue
- Add 100ms delay after barrier command to ensure log application completes
- Add debug logging to track barrier execution and TopologyId state
- Return early if barrier command fails
- Prevents TopologyId generation before old logs are fully applied
* ensure leader
* address comments
* address comments
* redundant
* clean up
* double check
* refactoring
* comment
* Fix remote.meta.sync TTL issue (#8021)
Remote entries should not have TTL applied because they represent files
in remote storage, not local SeaweedFS files. When TTL was configured on
a prefix, remote.meta.sync would create entries that immediately expired,
causing them to be deleted and recreated on each sync.
Changes:
- Set TtlSec=0 explicitly when creating remote entries in remote.meta.sync
- Skip TTL application in CreateEntry handler for entries with Remote field set
Fixes#8021
* Add TTL protection for remote entries in update path
- Set TtlSec=0 in doSaveRemoteEntry before calling UpdateEntry
- Add server-side TTL protection in UpdateEntry handler for remote entries
- Ensures remote entries don't inherit or preserve TTL when updated
* Implement optional path-prefix and method scoping for Filer HTTP JWT
* Fix security vulnerability and improve test error handling
* Address PR feedback: replace debug logging and improve tests
* Use URL.Path in logs to avoid leaking query params
This PR implements logic load/save persistent state information for storages
associated with volume servers, and reporting state changes back to masters
via heartbeat messages.
More work ensues!
See https://github.com/seaweedfs/seaweedfs/issues/7977 for details.
* chore: execute goimports to format the code
Signed-off-by: promalert <promalert@outlook.com>
* goimports -w .
---------
Signed-off-by: promalert <promalert@outlook.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
* fix: EC UI template error when viewing shard details
Fixed field name mismatch in volume.html where it was using .ShardDetails
instead of .Shards. Added a robust type conversion wrapper in templates.go
to handle int64 to uint64 conversion for bytesToHumanReadable.
Added regression test to ensure future stability.
* refactor: improve bytesToHumanReadable and test robustness
- Handled more integer types (uint32, int32, uint) in bytesToHumanReadable.
- Improved volume_test.go to verify both shards are formatted correctly.
* refactor: add bounds checking to bytesToHumanReadable
Added checks for negative values in signed integer types to avoid incorrect
formatting when converting to uint64.
Addressed feedback from coderabbitai.
* fix: use keyed fields in struct literals
- Replace unsafe reflect.StringHeader/SliceHeader with safe unsafe.String/Slice (weed/query/sqltypes/unsafe.go)
- Add field names to Type_ScalarType struct literals (weed/mq/schema/schema_builder.go)
- Add Duration field name to FlexibleDuration struct literals across test files
- Add field names to bson.D struct literals (weed/filer/mongodb/mongodb_store_kv.go)
Fixes go vet warnings about unkeyed struct literals.
* fix: remove unreachable code
- Remove unreachable return statements after infinite for loops
- Remove unreachable code after if/else blocks where all paths return
- Simplify recursive logic by removing unnecessary for loop (inode_to_path.go)
- Fix Type_ScalarType literal to use enum value directly (schema_builder.go)
- Call onCompletionFn on stream error (subscribe_session.go)
Files fixed:
- weed/query/sqltypes/unsafe.go
- weed/mq/schema/schema_builder.go
- weed/mq/client/sub_client/connect_to_sub_coordinator.go
- weed/filer/redis3/ItemList.go
- weed/mq/client/agent_client/subscribe_session.go
- weed/mq/broker/broker_grpc_pub_balancer.go
- weed/mount/inode_to_path.go
- weed/util/skiplist/name_list.go
* fix: avoid copying lock values in protobuf messages
- Use proto.Merge() instead of direct assignment to avoid copying sync.Mutex in S3ApiConfiguration (iamapi_server.go)
- Add explicit comments noting that channel-received values are already copies before taking addresses (volume_grpc_client_to_master.go)
The protobuf messages contain sync.Mutex fields from the message state, which should not be copied.
Using proto.Merge() properly merges messages without copying the embedded mutex.
* fix: correct byte array size for uint32 bit shift operations
The generateAccountId() function only needs 4 bytes to create a uint32 value.
Changed from allocating 8 bytes to 4 bytes to match the actual usage.
This fixes go vet warning about shifting 8-bit values (bytes) by more than 8 bits.
* fix: ensure context cancellation on all error paths
In broker_client_subscribe.go, ensure subscriberCancel() is called on all error return paths:
- When stream creation fails
- When partition assignment fails
- When sending initialization message fails
This prevents context leaks when an error occurs during subscriber creation.
* fix: ensure subscriberCancel called for CreateFreshSubscriber stream.Send error
Ensure subscriberCancel() is called when stream.Send fails in CreateFreshSubscriber.
* ci: add go vet step to prevent future lint regressions
- Add go vet step to GitHub Actions workflow
- Filter known protobuf lock warnings (MessageState sync.Mutex)
These are expected in generated protobuf code and are safe
- Prevents accumulation of go vet errors in future PRs
- Step runs before build to catch issues early
* fix: resolve remaining syntax and logic errors in vet fixes
- Fixed syntax errors in filer_sync.go caused by missing closing braces
- Added missing closing brace for if block and function
- Synchronized fixes to match previous commits on branch
* fix: add missing return statements to daemon functions
- Add 'return false' after infinite loops in filer_backup.go and filer_meta_backup.go
- Satisfies declared bool return type signatures
- Maintains consistency with other daemon functions (runMaster, runFilerSynchronize, runWorker)
- While unreachable, explicitly declares the return satisfies function signature contract
* fix: add nil check for onCompletionFn in SubscribeMessageRecord
- Check if onCompletionFn is not nil before calling it
- Prevents potential panic if nil function is passed
- Matches pattern used in other callback functions
* docs: clarify unreachable return statements in daemon functions
- Add comments documenting that return statements satisfy function signature
- Explains that these returns follow infinite loops and are unreachable
- Improves code clarity for future maintainers