* Fix ec.rebuild failing on unrepairable volumes instead of skipping them
When an EC volume has fewer shards than DataShardsCount, ec.rebuild would
return an error and abort the entire operation. Now it logs a warning and
continues rebuilding the remaining volumes.
Fixes#8630
* Remove duplicate volume ID in unrepairable log message
---------
Co-authored-by: Copilot <copilot@github.com>
* feat(vacuum): add volume state, location, and enhanced collection filters
Align the vacuum handler's admin config with the balance handler by adding:
- volume_state filter (ALL/ACTIVE/FULL) to scope vacuum to writable or
read-only volumes
- data_center_filter, rack_filter, node_filter to scope vacuum to
specific infrastructure locations
- Enhanced collection_filter description matching the balance handler's
ALL_COLLECTIONS/EACH_COLLECTION/regex modes
The new filters reuse filterMetricsByVolumeState() and
filterMetricsByLocation() already defined in the same package.
* use wildcard matchers for DC/rack/node filters
Replace exact-match and CSV set lookups with wildcard matching
from util/wildcard package. Patterns like "dc*", "rack-1?", or
"node-a*" are now supported in all location filter fields for
both balance and vacuum handlers.
* add nil guard in filterMetricsByLocation
* feat(plugin): make page tabs and sub-tabs addressable by URLs
Update the plugin page so that clicking tabs and sub-tabs pushes
browser history via history.pushState(), enabling bookmarkable URLs,
browser back/forward navigation, and shareable links.
URL mapping:
- /plugin → Overview tab
- /plugin/configuration → Configuration sub-tab
- /plugin/detection → Job Detection sub-tab
- /plugin/queue → Job Queue sub-tab
- /plugin/execution → Job Execution sub-tab
Job-type-specific URLs use the ?job= query parameter (e.g.,
/plugin/configuration?job=vacuum) so that a specific job type tab
is pre-selected on page load.
Changes:
- Add initialJob parameter to Plugin() template and handler
- Extract ?job= query param in renderPluginPage handler
- Add buildPluginURL/updateURL helpers in JavaScript
- Push history state on top-tab, sub-tab, and job-type clicks
- Listen for popstate to restore tab state on back/forward
- Replace initial history entry on page load via replaceState
* make popstate handler async with proper error handling
Await loadDescriptorAndConfig so data loading completes before
rendering dependent views. Log errors instead of silently
swallowing them.
* feat: auto-disable master vacuum when plugin vacuum worker is active
When a vacuum-capable plugin worker connects to the admin server, the
admin server calls DisableVacuum on the master to prevent the automatic
scheduled vacuum from conflicting with the plugin worker's vacuum. When
the worker disconnects, EnableVacuum is called to restore the default
behavior. A safety net in the topology refresh loop re-enables vacuum
if the admin server disconnects without cleanup.
* rename isAdminServerConnected to isAdminServerConnectedFunc
* add 5s timeout to DisableVacuum/EnableVacuum gRPC calls
Prevents the monitor goroutine from blocking indefinitely if the
master is unresponsive.
* track plugin ownership of vacuum disable to avoid overriding operator
- Add vacuumDisabledByPlugin flag to Topology, set when DisableVacuum
is called while admin server is connected (i.e., by plugin monitor)
- Safety net only re-enables vacuum when it was disabled by plugin,
not when an operator intentionally disabled it via shell command
- EnableVacuum clears the plugin flag
* extract syncVacuumState for testability, add fake toggler tests
Extract the single sync step into syncVacuumState() with a
vacuumToggler interface. Add TestSyncVacuumState with a fake
toggler that verifies disable/enable calls on state transitions.
* use atomic.Bool for isDisableVacuum and vacuumDisabledByPlugin
Both fields are written by gRPC handlers and read by the vacuum
goroutine, causing a data race. Use atomic.Bool with Store/Load
for thread-safe access.
* use explicit by_plugin field instead of connection heuristic
Add by_plugin bool to DisableVacuumRequest proto so the caller
declares intent explicitly. The admin server monitor sets it to
true; shell commands leave it false. This prevents an operator's
intentional disable from being auto-reversed by the safety net.
* use setter for admin server callback instead of function parameter
Move isAdminServerConnected from StartRefreshWritableVolumes
parameter to Topology.SetAdminServerConnectedFunc() setter.
Keeps the function signature stable and decouples the topology
layer from the admin server concept.
* suppress repeated log messages on persistent sync failures
Add retrying parameter to syncVacuumState so the initial
state transition is logged at V(0) but subsequent retries
of the same transition are silent until the call succeeds.
* clear plugin ownership flag on manual DisableVacuum
Prevents stale plugin flag from causing incorrect auto-enable
when an operator manually disables vacuum after a plugin had
previously disabled it.
* add by_plugin to EnableVacuumRequest for symmetric ownership tracking
Plugin-driven EnableVacuum now only re-enables if the plugin was
the one that disabled it. If an operator manually disabled vacuum
after the plugin, the plugin's EnableVacuum is a no-op. This
prevents the plugin monitor from overriding operator intent on
worker disconnect.
* use cancellable context for monitorVacuumWorker goroutine
Replace context.Background() with a cancellable context stored
as bgCancel on AdminServer. Shutdown() calls bgCancel() so
monitorVacuumWorker exits cleanly via ctx.Done().
* track operator and plugin vacuum disables independently
Replace single isDisableVacuum flag with two independent flags:
vacuumDisabledByOperator and vacuumDisabledByPlugin. Each caller
only flips its own flag. The effective disabled state is the OR
of both. This prevents a plugin connect/disconnect cycle from
overriding an operator's manual disable, and vice versa.
* fix safety net to clear plugin flag, not operator flag
The safety net should call EnableVacuumByPlugin() to clear only
the plugin disable flag when the admin server disconnects. The
previous call to EnableVacuum() incorrectly cleared the operator
flag instead.
* feat(balance): add replica placement validation for volume moves
When the volume balance detection proposes moving a volume, validate
that the move does not violate the volume's replication policy (e.g.,
ReplicaPlacement=010 requires replicas on different racks). If the
preferred destination violates the policy, fall back to score-based
planning; if that also violates, skip the volume entirely.
- Add ReplicaLocation type and VolumeReplicaMap to ClusterInfo
- Build replica map from all volumes before collection filtering
- Port placement validation logic from command_volume_fix_replication.go
- Thread replica map through collectVolumeMetrics call chain
- Add IsGoodMove check in createBalanceTask before destination use
* address PR review: extract validation closure, add defensive checks
- Extract validateMove closure to eliminate duplicated ReplicaLocation
construction and IsGoodMove calls
- Add defensive check for empty replica map entries (len(replicas) == 0)
- Add bounds check for int-to-byte cast on ExpectedReplicas (0-255)
* address nitpick: rp test helper accepts *testing.T and fails on error
Prevents silent failures from typos in replica placement codes.
* address review: add composite replica placement tests (011, 110)
Test multi-constraint placement policies where both rack and DC
rules must be satisfied simultaneously.
* address review: use struct keys instead of string concatenation
Replace string-concatenated map keys with typed rackKey/nodeKey
structs to eliminate allocations and avoid ambiguity if IDs
contain spaces.
* address review: simplify bounds check, log fallback error, guard source
- Remove unreachable ExpectedReplicas < 0 branch (outer condition
already guarantees > 0), fold bounds check into single condition
- Log error from planBalanceDestination in replica validation fallback
- Return false from IsGoodMove when sourceNodeID not found in
existing replicas (inconsistent cluster state)
* address review: use slices.Contains instead of hand-rolled helpers
Replace isAmongDC and isAmongRack with slices.Contains from the
standard library, reducing boilerplate.
* feat(plugin): add DC/rack/node filtering for volume balance detection
Add scoping filters so balance detection can be limited to specific data
centers, racks, or nodes. Filters are applied both at the metrics level
(in the handler) and at the topology seeding level (in detection) to
ensure only the targeted infrastructure participates in balancing.
* address PR review: use set lookups, deduplicate test helpers, add target checks
* address review: assert non-empty tasks in filter tests
Prevent vacuous test passes by requiring len(tasks) > 0
before checking source/target exclusions.
* address review: enforce filter scope in fallback, clarify DC filter
- Thread allowedServers into createBalanceTask so the fallback
planner cannot produce out-of-scope targets when DC/rack/node
filters are active
- Update data_center_filter description to clarify single-DC usage
* address review: centralize parseCSVSet, fix filter scope leak, iterate all targets
- Extract ParseCSVSet to shared weed/worker/tasks/util package,
remove duplicates from detection.go and volume_balance_handler.go
- Fix metric accumulation re-introducing filtered-out servers by
only counting metrics for servers that passed DC/rack/node filters
- Trim DataCenterFilter before matching to handle trailing spaces
- Iterate all task.TypedParams.Targets in filter tests, not just [0]
* remove useless descriptor string test
* feat(plugin): enhanced collection filtering for volume balance
Replace wildcard matching with three collection filter modes:
- ALL_COLLECTIONS (default): treat all volumes as one pool
- EACH_COLLECTION: run detection separately per collection
- Regex pattern: filter volumes by matching collection names
The EACH_COLLECTION mode extracts distinct collections from metrics
and calls Detection() per collection, sharing the maxResults budget
and clusterInfo (with ActiveTopology) across all calls.
* address PR review: fix wildcard→regexp replacement, optimize EACH_COLLECTION
* address nitpick: fail fast on config errors (invalid regex)
Add configError type so invalid collection_filter regex returns
immediately instead of retrying across all masters with the same
bad config. Transient errors still retry.
* address review: constants, unbounded maxResults, wildcard compat
- Define collectionFilterAll/collectionFilterEach constants to
eliminate magic strings across handler and metrics code
- Fix EACH_COLLECTION budget loop to treat maxResults <= 0 as
unbounded, matching Detection's existing semantics
- Treat "*" as ALL_COLLECTIONS for backward compat with wildcard
* address review: nil guard in EACH_COLLECTION grouping loop
* remove useless descriptor string test
* feat(balance): add volume state filter (ALL/ACTIVE/FULL)
Add a volume_state admin config field to the plugin worker volume balance
handler, matching the shell's -volumeBy flag. This allows filtering volumes
by state before balance detection:
- ALL (default): consider all volumes
- ACTIVE: only writable volumes below the size limit (FullnessRatio < 1.01)
- FULL: only read-only volumes above the size limit (FullnessRatio >= 1.01)
The 1.01 threshold mirrors the shell's thresholdVolumeSize constant.
* address PR review: use enum/select widget, switch-based filter, nil safety
- Change volume_state field from string/text to enum/select with
dropdown options (ALL, ACTIVE, FULL)
- Refactor filterMetricsByVolumeState to use switch with predicate
function for clearer extensibility
- Add nil-check guard to prevent panic on nil metric elements
- Add TestFilterMetricsByVolumeState_NilElement regression test
* feat(filer): add lazy directory listing for remote mounts
Directory listings on remote mounts previously only queried the local
filer store. With lazy mounts the listing was empty; with eager mounts
it went stale over time.
Add on-demand directory listing that fetches from remote and caches
results with a 5-minute TTL:
- Add `ListDirectory` to `RemoteStorageClient` interface (delimiter-based,
single-level listing, separate from recursive `Traverse`)
- Implement in S3, GCS, and Azure backends using each platform's
hierarchical listing API
- Add `maybeLazyListFromRemote` to filer: before each directory listing,
check if the directory is under a remote mount with an expired cache,
fetch from remote, persist entries to the local store, then let existing
listing logic run on the populated store
- Use singleflight to deduplicate concurrent requests for the same directory
- Skip local-only entries (no RemoteEntry) to avoid overwriting unsynced uploads
- Errors are logged and swallowed (availability over consistency)
* refactor: extract xattr key to constant xattrRemoteListingSyncedAt
* feat: make listing cache TTL configurable per mount via listing_cache_ttl_seconds
Add listing_cache_ttl_seconds field to RemoteStorageLocation protobuf.
When 0 (default), lazy directory listing is disabled for that mount.
When >0, enables on-demand directory listing with the specified TTL.
Expose as -listingCacheTTL flag on remote.mount command.
* refactor: address review feedback for lazy directory listing
- Add context.Context to ListDirectory interface and all implementations
- Capture startTime before remote call for accurate TTL tracking
- Simplify S3 ListDirectory using ListObjectsV2PagesWithContext
- Make maybeLazyListFromRemote return void (errors always swallowed)
- Remove redundant trailing-slash path manipulation in caller
- Update tests to match new signatures
* When an existing entry has Remote != nil, we should merge remote metadata into it rather than replacing it.
* fix(gcs): wrap ListDirectory iterator error with context
The raw iterator error was returned without bucket/path context,
making it harder to debug. Wrap it consistently with the S3 pattern.
* fix(s3): guard against nil pointer dereference in Traverse and ListDirectory
Some S3-compatible backends may return nil for LastModified, Size, or
ETag fields. Check for nil before dereferencing to prevent panics.
* fix(filer): remove blanket 2-minute timeout from lazy listing context
Individual SDK operations (S3, GCS, Azure) already have per-request
timeouts and retry policies. The blanket timeout could cut off large
directory listings mid-operation even though individual pages were
succeeding.
* fix(filer): preserve trace context in lazy listing with WithoutCancel
Use context.WithoutCancel(ctx) instead of context.Background() so
trace/span values from the incoming request are retained for
distributed tracing, while still decoupling cancellation.
* fix(filer): use Store.FindEntry for internal lookups, add Uid/Gid to files, fix updateDirectoryListingSyncedAt
- Use f.Store.FindEntry instead of f.FindEntry for staleness check and
child lookups to avoid unnecessary lazy-fetch overhead
- Set OS_UID/OS_GID on new file entries for consistency with directories
- In updateDirectoryListingSyncedAt, use Store.UpdateEntry for existing
directories instead of CreateEntry to avoid deleteChunksIfNotNew and
NotifyUpdateEvent side effects
* fix(filer): distinguish not-found from store errors in lazy listing
Previously, any error from Store.FindEntry was treated as "not found,"
which could cause entry recreation/overwrite on transient DB failures.
Now check for filer_pb.ErrNotFound explicitly and skip entries or
bail out on real store errors.
* refactor(filer): use errors.Is for ErrNotFound comparisons
* fix(helm): trim whitespace before s3 TLS args to prevent command breakage (#8613)
When global.enableSecurity is enabled, the `{{ include }}` call for
s3 TLS args lacked the leading dash (`{{-`), producing an extra blank
line in the rendered shell command. This broke shell continuation and
caused the filer (and s3/all-in-one) to crash because arguments after
the blank line were silently dropped.
* ci(helm): assert no blank lines in security+S3 command blocks
Renders the chart with global.enableSecurity=true and S3 enabled for
normal mode (filer + s3 deployments) and all-in-one mode, then parses
every /bin/sh -ec command block and fails if any contains blank lines.
This catches the whitespace regression from #8613 where a missing {{-
dash on the seaweedfs.s3.tlsArgs include produced a blank line that
broke shell continuation.
* ci(helm): enable S3 in all-in-one security render test
The s3.tlsArgs include is gated by allInOne.s3.enabled, so without
this flag the all-in-one command block wasn't actually exercising the
TLS args path.
* feat(remote): add -noSync flag to skip upfront metadata pull on mount
Made-with: Cursor
* refactor(remote): split mount setup from metadata sync
Extract ensureMountDirectory for create/validate; call pullMetadata
directly when sync is needed. Caller controls sync step for -noSync.
Made-with: Cursor
* fix(remote): validate mount root when -noSync so bad bucket/creds fail fast
When -noSync is used, perform a cheap remote check (ListBuckets and
verify bucket exists) instead of skipping all remote I/O. Invalid
buckets or credentials now fail at mount time.
Made-with: Cursor
* test(remote): add TestRemoteMountNoSync for -noSync mount and persisted mapping
Made-with: Cursor
* test(remote): assert no upfront metadata after -noSync mount
After remote.mount -noSync, run fs.ls on the mount dir and assert empty
listing so the test fails if pullMetadata was invoked eagerly.
Made-with: Cursor
* fix(remote): propagate non-ErrNotFound lookup errors in ensureMountDirectory
Return lookupErr immediately for any LookupDirectoryEntry failure that
is not filer_pb.ErrNotFound, so only the not-found case creates the
entry and other lookup failures are reported to the caller.
Made-with: Cursor
* fix(remote): use errors.Is for ErrNotFound in ensureMountDirectory
Replace fragile strings.Contains(lookupErr.Error(), ...) with
errors.Is(lookupErr, filer_pb.ErrNotFound) before calling CreateEntry.
Made-with: Cursor
* fix(remote): use LookupEntry so ErrNotFound is recognised after gRPC
Raw gRPC LookupDirectoryEntry returns a status error, not the sentinel,
so errors.Is(lookupErr, filer_pb.ErrNotFound) was always false. Use
filer_pb.LookupEntry which normalises not-found to ErrNotFound so the
mount directory is created when missing.
Made-with: Cursor
* test(remote): ignore weed shell banner in TestRemoteMountNoSync fs.ls count
Exclude master/filer and prompt lines from entry count so the assertion
checks only actual fs.ls output for empty -noSync mount.
Made-with: Cursor
* fix(remote.mount): use 0755 for mount dir, document bucket-less early return
Made-with: Cursor
* feat(remote.mount): replace -noSync with -metadataStrategy=lazy|eager
- Add -metadataStrategy flag (eager default, lazy skips upfront metadata pull)
- Accept lazy/eager case-insensitively; reject invalid values with clear error
- Rename TestRemoteMountNoSync to TestRemoteMountMetadataStrategyLazy
- Add TestRemoteMountMetadataStrategyEager and TestRemoteMountMetadataStrategyInvalid
Made-with: Cursor
* fix(remote.mount): validate strategy and remote before creating mount directory
Move strategy validation and validateMountRoot (lazy path) before
ensureMountDirectory so that invalid strategies or bad bucket/credentials
fail without leaving orphaned directory entries in the filer.
* refactor(remote.mount): remove unused remote param from ensureMountDirectory
The remote *RemoteStorageLocation parameter was left over from the old
syncMetadata signature. Only remoteConf.Name is used inside the function.
* doc(remote.mount): add TODO for HeadBucket-style validation
validateMountRoot currently lists all buckets to verify one exists.
Note the need for a targeted BucketExists method in the interface.
* refactor(remote.mount): use MetadataStrategy type and constants
Replace raw string comparisons with a MetadataStrategy type and
MetadataStrategyEager/MetadataStrategyLazy constants for clarity
and compile-time safety.
* refactor(remote.mount): rename MetadataStrategy to MetadataCacheStrategy
More precisely describes the purpose: controlling how metadata is
cached from the remote, not metadata handling in general.
* fix(remote.mount): remove validateMountRoot from lazy path
Lazy mount's purpose is to skip remote I/O. Validating via ListBuckets
contradicts that, especially on accounts with many buckets. Invalid
buckets or credentials will surface on first lazy access instead.
* fix(test): handle shell exit 0 in TestRemoteMountMetadataStrategyInvalid
The weed shell process exits with code 0 even when individual commands
fail — errors appear in stdout. Check output instead of requiring a
non-nil error.
* test(remote.mount): remove metadataStrategy shell integration tests
These tests only verify string output from a shell process that always
exits 0 — they cannot meaningfully validate eager vs lazy behavior
without a real remote backend.
---------
Co-authored-by: Chris Lu <chris.lu@gmail.com>
* filer: propagate lazy metadata deletes to remote mounts
Delete operations now call the remote backend for mounted remote-only entries before removing filer metadata, keeping remote state aligned and preserving retry semantics on remote failures.
Made-with: Cursor
* filer: harden remote delete metadata recovery
Persist remote-delete metadata pendings so local entry removal can be retried after failures, and return explicit errors when remote client resolution fails to prevent silent local-only deletes.
Made-with: Cursor
* filer: streamline remote delete client lookup and logging
Avoid a redundant mount trie traversal by resolving the remote client directly from the matched mount location, and add parity logging for successful remote directory deletions.
Made-with: Cursor
* filer: harden pending remote metadata deletion flow
Retry pending-marker writes before local delete, fail closed when marking cannot be persisted, and start remote pending reconciliation only after the filer store is initialised to avoid nil store access.
Made-with: Cursor
* filer: avoid lazy fetch in pending metadata reconciliation
Use a local-only entry lookup during pending remote metadata reconciliation so cache misses do not trigger remote lazy fetches.
Made-with: Cursor
* filer: serialise concurrent index read-modify-write in pending metadata deletion
Add remoteMetadataDeletionIndexMu to Filer and acquire it for the full
read→mutate→commit sequence in markRemoteMetadataDeletionPending and
clearRemoteMetadataDeletionPending, preventing concurrent goroutines
from overwriting each other's index updates.
Made-with: Cursor
* filer: start remote deletion reconciliation loop in NewFiler
Move the background goroutine for pending remote metadata deletion
reconciliation from SetStore (where it was gated by sync.Once) to
NewFiler alongside the existing loopProcessingDeletion goroutine.
The sync.Once approach was problematic: it buried a goroutine launch
as a side effect of a setter, was unrecoverable if the goroutine
panicked, could race with store initialisation, and coupled its
lifecycle to unrelated shutdown machinery. The existing nil-store
guard in reconcilePendingRemoteMetadataDeletions handles the window
before SetStore is called.
* filer: skip remote delete for replicated deletes from other filers
When isFromOtherCluster is true the delete was already propagated to
the remote backend by the originating filer. Repeating the remote
delete on every replica doubles API calls, and a transient remote
failure on the replica would block local metadata cleanup — leaving
filers inconsistent.
* filer: skip pending marking for directory remote deletes
Directory remote deletes are idempotent and do not need the
pending/reconcile machinery that was designed for file deletes where
the local metadata delete might fail after the remote object is
already removed.
* filer: propagate remote deletes for children in recursive folder deletion
doBatchDeleteFolderMetaAndData iterated child files but only called
NotifyUpdateEvent and collected chunks — it never called
maybeDeleteFromRemote for individual children. This left orphaned
objects in the remote backend when a directory containing remote-only
files was recursively deleted.
Also fix isFromOtherCluster being hardcoded to false in the recursive
call to doBatchDeleteFolderMetaAndData for subdirectories.
* filer: simplify pending remote deletion tracking to single index key
Replace the double-bookkeeping scheme (individual KV entry per path +
newline-delimited index key) with a single index key that stores paths
directly. This removes the per-path KV writes/deletes, the base64
encoding round-trip, and the transaction overhead that was only needed
to keep the two representations in sync.
* filer: address review feedback on remote deletion flow
- Distinguish missing remote config from client initialization failure
in maybeDeleteFromRemote error messages.
- Use a detached context (30s timeout) for pending-mark and
pending-clear KV writes so they survive request cancellation after
the remote object has already been deleted.
- Emit NotifyUpdateEvent in reconcilePendingRemoteMetadataDeletions
after a successful retry deletion so downstream watchers and replicas
learn about the eventual metadata removal.
* filer: remove background reconciliation for pending remote deletions
The pending-mark/reconciliation machinery (KV index, mutex, background
loop, detached contexts) handled the narrow case where the remote
object was deleted but the subsequent local metadata delete failed.
The client already receives the error and can retry — on retry the
remote not-found is treated as success and the local delete proceeds
normally. The added complexity (and new edge cases around
NotifyUpdateEvent, multi-filer consistency during reconciliation, and
context lifetime) is not justified for a transient store failure the
caller already handles.
Remove: loopProcessingRemoteMetadataDeletionPending,
reconcilePendingRemoteMetadataDeletions, markRemoteMetadataDeletionPending,
clearRemoteMetadataDeletionPending, listPendingRemoteMetadataDeletionPaths,
encodePendingRemoteMetadataDeletionIndex, FindEntryLocal, and all
associated constants, fields, and test infrastructure.
* filer: fix test stubs and add early exit on child remote delete error
- Refactor stubFilerStore to release lock before invoking callbacks and
propagate callback errors, preventing potential deadlocks in tests
- Implement ListDirectoryPrefixedEntries with proper prefix filtering
instead of delegating to the unfiltered ListDirectoryEntries
- Add continue after setting err on child remote delete failure in
doBatchDeleteFolderMetaAndData to skip further processing of the
failed entry
* filer: propagate child remote delete error instead of silently continuing
Replace `continue` with early `break` when maybeDeleteFromRemote fails
for a child entry during recursive folder deletion. The previous
`continue` skipped the error check at the end of the loop body, so a
subsequent successful entry would overwrite err and the remote delete
error was silently lost. Now the loop breaks, the existing error check
returns the error, and NotifyUpdateEvent / chunk collection are
correctly skipped for the failed entry.
* filer: delete remote file when entry has Remote pointer, not only when remote-only
Replace IsInRemoteOnly() guard with entry.Remote == nil check in
maybeDeleteFromRemote. IsInRemoteOnly() requires zero local chunks and
RemoteSize > 0, which incorrectly skips remote deletion for cached
files (local chunks exist) and zero-byte remote objects (RemoteSize 0).
The correct condition is whether the entry has a remote backing object
at all.
---------
Co-authored-by: Chris Lu <chris.lu@gmail.com>
* fix(helm): use componentName for all service names to fix truncation mismatch (#8610)
PR #8143 updated statefulsets and deployments to use the componentName
helper (which truncates the fullname before appending the suffix), but
left service definitions using the old `printf + trunc 63` pattern.
When release names are long enough, these two strategies produce
different names, causing DNS resolution failures (e.g., S3 cannot
find the filer-client service and falls back to localhost:8888).
Unify all service name definitions and cluster address helpers to use
the componentName helper consistently.
* refactor(helm): simplify cluster address helpers with ternary
* test(helm): add regression test for service name truncation with long release names
Renders the chart with a >63-char fullname in both normal and all-in-one
modes, then asserts that Service metadata.name values match the hostnames
produced by cluster.masterAddress, cluster.filerAddress, and the S3
deployment's -filer= argument. Prevents future truncation/DNS mismatch
regressions like #8610.
* fix(helm-ci): limit S3_FILER_HOST extraction to first match
* fix(filer): limit concurrent proxy reads per volume server
Add a per-volume-server semaphore (default 16) to proxyToVolumeServer
to prevent replication bursts from overwhelming individual volume
servers with hundreds of concurrent connections, which causes them
to drop connections with "unexpected EOF".
Excess requests queue up and respect the client's context, returning
503 if the client disconnects while waiting.
Also log io.CopyBuffer errors that were previously silently discarded.
* Apply suggestion from @gemini-code-assist[bot]
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* fix(filer): use non-blocking release for proxy semaphore
Prevents a goroutine from blocking forever if releaseProxySemaphore
is ever called without a matching acquire.
* test(filer): clean up proxySemaphores entries in all proxy tests
---------
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* fix(replication): resume partial chunk reads on EOF instead of re-downloading
When replicating chunks and the source connection drops mid-transfer,
accumulate the bytes already received and retry with a Range header
to fetch only the remaining bytes. This avoids re-downloading
potentially large chunks from scratch on each retry, reducing load
on busy source servers and speeding up recovery.
* test(replication): add tests for downloadWithRange including gzip partial reads
Tests cover:
- No offset (no Range header sent)
- With offset (Range header verified)
- Content-Disposition filename extraction
- Partial read + resume: server drops connection mid-transfer, client
resumes with Range from the offset of received bytes
- Gzip partial read + resume: first response is gzip-encoded (Go auto-
decompresses), connection drops, resume request gets decompressed data
(Go doesn't add Accept-Encoding when Range is set, so the server
decompresses), combined bytes match original
* fix(replication): address PR review comments
- Consolidate downloadWithRange into DownloadFile with optional offset
parameter (variadic), eliminating code duplication (DRY)
- Validate HTTP response status: require 206 + correct Content-Range
when offset > 0, reject when server ignores Range header
- Use if/else for fullData assignment for clarity
- Add test for rejected Range (server returns 200 instead of 206)
* refactor(replication): remove unused ReplicationSource interface
The interface was never referenced and its signature didn't match
the actual FilerSource.ReadPart method.
---------
Co-authored-by: Copilot <copilot@github.com>
* feat(security): add [admin] section to security.toml scaffold
Add admin credential fields (user, password, readonly.user,
readonly.password) to security.toml. Via viper's WEED_ env prefix and
AutomaticEnv(), these are automatically overridable as WEED_ADMIN_USER,
WEED_ADMIN_PASSWORD, etc.
Ref: https://github.com/seaweedfs/seaweedfs/discussions/8586
* feat(admin): support env var and security.toml fallbacks for credentials
Add applyViperFallback() to read admin credentials from security.toml /
WEED_* environment variables when CLI flags are not explicitly set.
This allows systems like NixOS to pass secrets via env vars instead of
CLI flags, which appear in process listings.
Precedence: CLI flag > env var / security.toml > default value.
Also change -adminUser default from "admin" to "" so that credentials
are fully opt-in.
Ref: https://github.com/seaweedfs/seaweedfs/discussions/8586
* feat(helm): use WEED_ env vars for admin credentials instead of CLI flags
Rename SEAWEEDFS_ADMIN_USER/PASSWORD to WEED_ADMIN_USER/PASSWORD so
viper picks them up natively. Remove -adminUser/-adminPassword shell
expansion from command args since the Go binary now reads these
directly via viper.
* docs(admin): document env var and security.toml credential support
Add environment variable mapping table, security.toml example, and
precedence rules to the admin README.
* style(security): use nested [admin.readonly] table in security.toml
Use a nested TOML table instead of dotted keys for the readonly
credentials. More idiomatic and easier to read; no change in how
Viper parses it.
* fix(admin): use util.GetViper() for env var support and fix README example
applyViperFallback() was using viper.GetString() directly, which
bypasses the WEED_ env prefix and AutomaticEnv setup that only
happens in util.GetViper(). Switch to util.GetViper().GetString()
so WEED_ADMIN_* environment variables are actually picked up.
Also fix the README example to include WEED_ADMIN_USER alongside
WEED_ADMIN_PASSWORD, since runAdmin() rejects an empty username
when a password is set.
* fix(admin): restore default adminUser to "admin"
Defaulting adminUser to "" broke the common flow of setting only
WEED_ADMIN_PASSWORD — runAdmin() rejects an empty username when a
password is set. Restore "admin" as the default so that setting
only the password works out of the box.
* docs(admin): align README security.toml example with scaffold format
Use nested [admin.readonly] table instead of flat dotted keys to
match the format in weed/command/scaffold/security.toml.
* docs(admin): remove README.md in favor of wiki page
Admin documentation lives at the wiki (Admin-UI.md). Remove the
in-repo README to avoid maintaining duplicate docs.
---------
Co-authored-by: Copilot <copilot@github.com>
The log message was comparing against the planned size of the destination
volume (including volumes already planned to merge into it) but only
displaying the raw volume size, making the output confusing when the
displayed sizes clearly didn't add up to exceed the limit.
When getEncryptionConfiguration encounters a not-found error (e.g.,
during bucket recreation after a partial delete), return
ErrNoSuchBucketEncryptionConfiguration instead of ErrInternalError.
This prevents uploads from failing with 500 errors during recovery.
When autoCreateBucket finds the bucket already exists, remove it from
the negative cache so subsequent requests don't unnecessarily trigger
another auto-create attempt.
Clarify the comment and log message for the case where a collection
exists but the bucket directory is missing, explaining the root cause
(partial deletion) more precisely.
Reorder DeleteBucketHandler to remove the bucket directory first, then
delete the collection. If collection deletion fails, the bucket is still
effectively deleted and can be recreated. Previously, if directory
deletion succeeded but collection deletion failed, the bucket was left
in an unrecoverable state.
* fix(s3api): allow bucket recreation when orphaned collection exists (#8601)
When a bucket is deleted, its filer directory is removed but the
underlying collection/volumes may not be fully cleaned up yet. If the
bucket is immediately recreated, PutBucketHandler was returning
ErrBucketAlreadyExists due to the orphaned collection, blocking bucket
recreation and causing subsequent uploads to fail with InternalError.
Allow bucket creation to proceed when a collection exists without a
corresponding bucket directory, since this is a transient orphaned state
from a previous deletion.
* fix(s3api): handle concurrent bucket creation race in mkdir
On mkdir failure, re-check whether the bucket directory now exists
and return BucketAlreadyExists instead of InternalError when another
request created the bucket concurrently.
remote.uncache checks LastLocalSyncTsNs to determine if a file has been
synced to remote. remote.copy.local was not setting this field, leaving
it at 0, which caused uncache to skip all files uploaded via
remote.copy.local.
Fixes#8602
* Reduce task logger noise: stop duplicating every log entry to glog and stderr
Every task log entry was being tripled: written to the task log file,
forwarded to glog (which writes to /tmp by default with no rotation),
and echoed to stderr. This caused glog files to fill /tmp on long-running
workers.
- Remove INFO/DEBUG forwarding to glog (only ERROR/WARNING remain)
- Remove stderr echo of every log line
- Remove fsync on every single log write (unnecessary for log files)
* Fix glog call depth for correct source file attribution
The call stack is: caller → Error() → log() → writeLogEntry() →
glog.ErrorDepth(), so depth=4 is needed for glog to report the
original caller's file and line number.
fix(s3api): ListObjects with trailing-slash prefix returns wrong results
When ListObjectsV2 is called with a prefix ending in "/" (e.g., "foo/"),
normalizePrefixMarker strips the trailing slash and splits into
dir="parent" and prefix="foo". The filer then lists entries matching
prefix "foo", which returns both directory "foo" and "foo1000".
The prefixEndsOnDelimiter guard correctly identifies directory "foo" as
the target and recurses into it, but then resets the guard to false.
The loop continues and incorrectly recurses into "foo1000" as well,
causing the listing to return objects from unrelated directories.
Fix: after recursing into the exact directory targeted by the
trailing-slash prefix, return immediately from the listing loop.
There is no reason to process sibling entries since the original
prefix specifically targeted one directory.
* Add tests for AWS user principal in AssumeRole trust policies
Add test cases that verify trust policy validation when using specific
AWS user principals (e.g., "arn:aws:iam::000000000000:user/backend")
in the Principal field of trust policies for AssumeRole.
Covers single user, multiple users (array), wildcard, and plain string
principal formats. These tests demonstrate the bug reported in #8588
where specific user principals always fail validation.
* Populate RequestContext in ValidateTrustPolicyForPrincipal
ValidateTrustPolicyForPrincipal was creating an EvaluationContext with
a nil RequestContext. The policy engine's principal matching logic looks
up "aws:PrincipalArn" in RequestContext for non-wildcard principals,
so specific user ARNs like "arn:aws:iam::000000000000:user/backend"
always failed to match, while wildcard "*" worked because it
short-circuits before the lookup.
Populate RequestContext with both "principal" and "aws:PrincipalArn"
keys, consistent with how IsActionAllowed already does it.
Fixes#8588
* Remove GitHub discussion URL from source code comments
* Add specific error message assertions in trust policy tests
Fix plugin configuration tab layout overflow (#8587)
Remove h-100 from Job Scheduling Settings card, which caused it to
stretch to 100% of the row height and push the Next Run card below
the row boundary, overflowing into the Detection Results section.
* Stop deleting counter metrics during bucket TTL cleanup
Counter metrics (traffic bytes, request counts, object counts) are
monotonically increasing by design. Deleting them after 10 minutes of
bucket inactivity causes them to vanish from /metrics output and reset
to zero when traffic resumes, breaking Prometheus rate()/increase()
queries and making historical traffic reporting impossible.
Only delete gauges and histograms in the TTL cleanup loop, as these
represent current state and are safely re-populated on next activity.
Fixes https://github.com/seaweedfs/seaweedfs/issues/8521
* Clean up all bucket metrics on bucket deletion
Add DeleteBucketMetrics() to delete all metrics (including counters)
for a bucket when it is explicitly deleted. This prevents unbounded
label cardinality from accumulating for buckets that no longer exist.
Called from DeleteBucketHandler after successful bucket deletion.
* Reduce mutex scope in bucket metrics TTL sweep
Collect expired bucket names under the lock, then release before
calling DeletePartialMatch on Prometheus metrics. This prevents
RecordBucketActiveTime from blocking during the expensive cleanup.
Collect expired bucket names under the lock, then release before
calling DeletePartialMatch on Prometheus metrics. This prevents
RecordBucketActiveTime from blocking during the expensive cleanup.
* Remove misleading Workers sub-menu items from admin sidebar
The sidebar sub-items (Job Detection, Job Queue, Job Execution,
Configuration) always navigated to the first job type's tabs
(typically EC Encoding) rather than showing cross-job-type views.
This was confusing as noted in #8590. Since the in-page tabs already
provide this navigation, remove the redundant sidebar sub-items and
keep only the top-level Workers link.
Fixes#8590
* Update layout_templ.go
* add dynamic timeouts to plugin worker vacuum gRPC calls
All vacuum gRPC calls used context.Background() with no deadline,
so the plugin scheduler's execution timeout could kill a job while
a large volume compact was still in progress. Use volume-size-scaled
timeouts matching the topology vacuum approach: 3 min/GB for compact,
1 min/GB for check, commit, and cleanup.
Fixes#8591
* scale scheduler execution timeout by volume size
The scheduler's per-job execution timeout (default 240s) would kill
vacuum jobs on large volumes before they finish. Three changes:
1. Vacuum detection now includes estimated_runtime_seconds in job
proposals, computed as 5 min/GB of volume size.
2. The scheduler checks for estimated_runtime_seconds in job
parameters and uses it as the execution timeout when larger than
the default — a generic mechanism any handler can use.
3. Vacuum task gRPC calls now use the passed-in ctx as parent
instead of context.Background(), so scheduler cancellation
propagates to in-flight RPCs.
* extend job type runtime when proposals need more time
The JobTypeMaxRuntime (default 30 min) wraps both detection and
execution. Its context is the parent of all per-job execution
contexts, so even with per-job estimated_runtime_seconds, jobCtx
would cancel everything when it expires.
After detection, scan proposals for the maximum
estimated_runtime_seconds. If any proposal needs more time than
the remaining JobTypeMaxRuntime, create a new execution context
with enough headroom. This lets large vacuum jobs complete without
being killed by the job type deadline while still respecting the
configured limit for normal-sized jobs.
* log missing volume size metric, remove dead minimum runtime guard
Add a debug log in vacuumTimeout when t.volumeSize is 0 so
operators can investigate why metrics are missing for a volume.
Remove the unreachable estimatedRuntimeSeconds < 180 check in
buildVacuumProposal — volumeSizeGB always >= 1 (due to +1 floor),
so estimatedRuntimeSeconds is always >= 300.
* cap estimated runtime and fix status check context
- Cap maxEstimatedRuntime and per-job timeout overrides to 8 hours
to prevent unbounded timeouts from bad metrics.
- Check execCtx.Err() instead of jobCtx.Err() for status reporting,
since dispatch runs under execCtx which may have a longer deadline.
A successful dispatch under execCtx was misreported as "timeout"
when jobCtx had expired.
* check for nil needle map before compaction sync
When CommitCompact runs concurrently, it sets v.nm = nil under
dataFileAccessLock. CompactByIndex does not hold that lock, so
v.nm.Sync() can hit a nil pointer. Add an early nil check to
return an error instead of crashing.
Fixes#8591
* guard copyDataBasedOnIndexFile size check against nil needle map
The post-compaction size validation at line 538 accesses
v.nm.ContentSize() and v.nm.DeletedSize(). If CommitCompact has
concurrently set v.nm to nil, this causes a SIGSEGV. Skip the
validation when v.nm is nil since the actual data copy uses local
needle maps (oldNm/newNm) and is unaffected.
Fixes#8591
* use atomic.Bool for compaction flags to prevent concurrent vacuum races
The isCompacting and isCommitCompacting flags were plain bools
read and written from multiple goroutines without synchronization.
This allowed concurrent vacuums on the same volume to pass the
guard checks and run simultaneously, leading to the nil pointer
crash. Using atomic.Bool with CompareAndSwap ensures only one
compaction or commit can run per volume at a time.
Fixes#8591
* use go-version-file in CI workflows instead of hardcoded versions
Use go-version-file: 'go.mod' so CI automatically picks up the Go
version from go.mod, avoiding future version drift. Reordered
checkout before setup-go in go.yml and e2e.yml so go.mod is
available. Removed the now-unused GO_VERSION env vars.
* capture v.nm locally in CompactByIndex to close TOCTOU race
A bare nil check on v.nm followed by v.nm.Sync() has a race window
where CommitCompact can set v.nm = nil between the two. Snapshot
the pointer into a local variable so the nil check and Sync operate
on the same reference.
* add dynamic timeouts to plugin worker vacuum gRPC calls
All vacuum gRPC calls used context.Background() with no deadline,
so the plugin scheduler's execution timeout could kill a job while
a large volume compact was still in progress. Use volume-size-scaled
timeouts matching the topology vacuum approach: 3 min/GB for compact,
1 min/GB for check, commit, and cleanup.
Fixes#8591
* Revert "add dynamic timeouts to plugin worker vacuum gRPC calls"
This reverts commit 80951934c3.
* unify compaction lifecycle into single atomic flag
Replace separate isCompacting and isCommitCompacting flags with a
single isCompactionInProgress atomic.Bool. This ensures CompactBy*,
CommitCompact, Close, and Destroy are mutually exclusive — only one
can run at a time per volume.
Key changes:
- All entry points use CompareAndSwap(false, true) to claim exclusive
access. CompactByVolumeData and CompactByIndex now also guard v.nm
and v.DataBackend with local captures.
- Close() waits for the flag outside dataFileAccessLock to avoid
deadlocking with CommitCompact (which holds the flag while waiting
for the lock). It claims the flag before acquiring the lock so no
new compaction can start.
- Destroy() uses CAS instead of a racy Load check, preventing
concurrent compaction from racing with volume teardown.
- unmountVolumeByCollection no longer deletes from the map;
DeleteCollectionFromDiskLocation removes entries only after
successful Destroy, preventing orphaned volumes on failure.
Fixes#8591
The SchedulerConfig struct and its persistence/API were unnecessary
indirection. Replace with a simple constant (reduced from 613s to 61s)
so the scheduler re-checks for detectable job types promptly after
going idle, improving the clean-install experience.