* Add volume dir tags to topology
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add preferred tag config for EC
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Prioritize EC destinations by tags
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add EC placement planner tag tests
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Refactor EC placement tests to reuse buildActiveTopology
Remove buildActiveTopologyWithDiskTags helper function and consolidate
tag setup inline in test cases. Tests now use UpdateTopology to apply
tags after topology creation, reusing the existing buildActiveTopology
function rather than duplicating its logic.
All tag scenario tests pass:
- TestECPlacementPlannerPrefersTaggedDisks
- TestECPlacementPlannerFallsBackWhenTagsInsufficient
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Consolidate normalizeTagList into shared util package
Extract normalizeTagList from three locations (volume.go,
detection.go, erasure_coding_handler.go) into new weed/util/tag.go
as exported NormalizeTagList function. Replace all duplicate
implementations with imports and calls to util.NormalizeTagList.
This improves code reuse and maintainability by centralizing
tag normalization logic.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add PreferredTags to EC config persistence
Add preferred_tags field to ErasureCodingTaskConfig protobuf with field
number 5. Update GetConfigSpec to include preferred_tags field in the
UI configuration schema. Add PreferredTags to ToTaskPolicy to serialize
config to protobuf. Add PreferredTags to FromTaskPolicy to deserialize
from protobuf with defensive copy to prevent external mutation.
This allows EC preferred tags to be persisted and restored across
worker restarts.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add defensive copy for Tags slice in DiskLocation
Copy the incoming tags slice in NewDiskLocation instead of storing
by reference. This prevents external callers from mutating the
DiskLocation.Tags slice after construction, improving encapsulation
and preventing unexpected changes to disk metadata.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
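The defensive copy described above can be sketched as follows. This is a minimal illustration, not the actual SeaweedFS code: the DiskLocation struct here is reduced to a single Tags field, and the constructor signature is an assumption.

```go
package main

import "fmt"

// DiskLocation is a reduced, hypothetical stand-in that keeps only the Tags field.
type DiskLocation struct {
	Tags []string
}

// NewDiskLocation copies the incoming slice so that later changes made by the
// caller to its own slice do not show up in DiskLocation.Tags.
func NewDiskLocation(tags []string) *DiskLocation {
	copied := make([]string, len(tags))
	copy(copied, tags)
	return &DiskLocation{Tags: copied}
}

func main() {
	in := []string{"fast", "ssd"}
	loc := NewDiskLocation(in)
	in[0] = "slow" // mutate the caller's slice after construction
	fmt.Println(loc.Tags) // [fast ssd]
}
```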
* Add doc comment to buildCandidateSets method
Document the tiered candidate selection and fallback behavior. Explain
that for a planner with preferredTags, it accumulates disks matching
each tag in order into progressively larger tiers, emits a candidate
set once a tier reaches shardsNeeded, and finally falls back to the
full candidates set if preferred-tag tiers are insufficient.
This clarifies the intended semantics for future maintainers.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
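One possible reading of the tiered selection documented above, as a sketch. The disk type, the hasTag helper, and the function signature are hypothetical, and the sketch assumes the full candidate set is always appended last as the fallback.

```go
package main

import "fmt"

// disk is a hypothetical, reduced candidate type.
type disk struct {
	ID   string
	Tags []string
}

func hasTag(d disk, tag string) bool {
	for _, t := range d.Tags {
		if t == tag {
			return true
		}
	}
	return false
}

// buildCandidateSets returns candidate sets in the order they should be tried.
// Disks matching each preferred tag are accumulated, in tag order, into one
// growing tier; a copy of the tier is emitted whenever it holds at least
// shardsNeeded disks. The full candidate list is appended as the last resort.
func buildCandidateSets(all []disk, preferredTags []string, shardsNeeded int) [][]disk {
	if shardsNeeded <= 0 || len(preferredTags) == 0 {
		return [][]disk{all}
	}
	var sets [][]disk
	var tier []disk
	inTier := make(map[string]bool)
	for _, tag := range preferredTags {
		for _, d := range all {
			if !inTier[d.ID] && hasTag(d, tag) {
				inTier[d.ID] = true
				tier = append(tier, d)
			}
		}
		if len(tier) >= shardsNeeded {
			sets = append(sets, append([]disk(nil), tier...))
		}
	}
	return append(sets, all)
}

func main() {
	all := []disk{
		{ID: "a", Tags: []string{"fast"}},
		{ID: "b", Tags: []string{"ssd"}},
		{ID: "c"},
		{ID: "d", Tags: []string{"fast"}},
	}
	// "fast" alone yields 2 disks (< 3), so the first emitted tier is fast+ssd.
	for i, set := range buildCandidateSets(all, []string{"fast", "ssd"}, 3) {
		fmt.Println(i, len(set))
	}
}
```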
* Apply final PR review fixes
1. Update parseVolumeTags to replicate a single tag entry to all folders
instead of leaving some folders with nil tags. This prevents nil
pointer dereferences when processing folders without explicit tags.
2. Add defensive copy in ToTaskPolicy for PreferredTags slice to match
the pattern used in FromTaskPolicy, preventing external mutation of
the returned TaskPolicy.
3. Add clarifying comment in buildCandidateSets explaining that the
shardsNeeded <= 0 branch is a defensive check for direct callers,
since selectDestinations guarantees shardsNeeded > 0.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix nil pointer dereference in parseVolumeTags
Ensure all folder tags are initialized to either normalized tags or
empty slices, not nil. When multiple tag entries are provided and there
are more folders than entries, remaining folders now get empty slices
instead of nil, preventing nil pointer dereference in downstream code.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix NormalizeTagList to return empty slice instead of nil
Change NormalizeTagList to always return a non-nil slice. When all tags
are empty or whitespace after normalization, return an empty slice
instead of nil. This prevents nil pointer dereferences in downstream
code that expects a valid (possibly empty) slice.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
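A minimal sketch of the non-nil guarantee described above. The function below is a hypothetical stand-in for util.NormalizeTagList; the exact normalization rules (trimming, lowercasing, and de-duplication here) are assumptions, and only the "always return a non-nil slice" behavior comes from the commit text.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeTagList is a hypothetical stand-in for util.NormalizeTagList.
// Assumed rules: trim whitespace, lowercase, drop empty entries and duplicates.
// The result is never nil, even when no tag survives normalization.
func normalizeTagList(tags []string) []string {
	out := make([]string, 0, len(tags)) // non-nil even when empty
	seen := make(map[string]struct{}, len(tags))
	for _, t := range tags {
		t = strings.ToLower(strings.TrimSpace(t))
		if t == "" {
			continue
		}
		if _, dup := seen[t]; dup {
			continue
		}
		seen[t] = struct{}{}
		out = append(out, t)
	}
	return out
}

func main() {
	fmt.Println(normalizeTagList([]string{" SSD ", "", "ssd", "fast"})) // [ssd fast]
	fmt.Println(normalizeTagList([]string{" ", ""}) != nil)             // true
}
```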
* Add nil safety check for v.tags pointer
Add a safety check to handle the case where v.tags might be nil,
preventing a nil pointer dereference. If v.tags is nil, use an empty
string instead. This is defensive programming to prevent panics in
edge cases.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add volume.tags flag to weed server and weed mini commands
Add the volume.tags CLI option to both the 'weed server' and 'weed mini'
commands. This allows users to specify disk tags when running the
combined server modes, just like they can with 'weed volume'.
The flag uses the same format and description as the volume command:
comma-separated tag groups per data dir with ':' separators
(e.g. fast:ssd,archive).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
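The flag format and the per-folder rules from the parseVolumeTags commits above can be sketched together like this. The function signature is hypothetical, and ignoring extra tag groups when there are fewer folders than groups is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// splitTags splits one ':'-separated group, trimming and dropping empty tags.
// It always returns a non-nil slice.
func splitTags(group string) []string {
	tags := []string{}
	for _, t := range strings.Split(group, ":") {
		if t = strings.TrimSpace(t); t != "" {
			tags = append(tags, t)
		}
	}
	return tags
}

// parseVolumeTags is a hypothetical reimplementation: spec holds
// comma-separated groups, one per data dir, e.g. "fast:ssd,archive".
// A single group is replicated to every folder; folders without a matching
// group get an empty (non-nil) slice. folderCount is assumed to be >= 0.
func parseVolumeTags(spec string, folderCount int) [][]string {
	result := make([][]string, folderCount)
	var groups []string
	if strings.TrimSpace(spec) != "" {
		groups = strings.Split(spec, ",")
	}
	for i := range result {
		switch {
		case len(groups) == 1:
			result[i] = splitTags(groups[0])
		case i < len(groups):
			result[i] = splitTags(groups[i])
		default:
			result[i] = []string{}
		}
	}
	return result
}

func main() {
	fmt.Println(parseVolumeTags("fast:ssd,archive", 3)) // [[fast ssd] [archive] []]
	fmt.Println(parseVolumeTags("ssd", 2))              // [[ssd] [ssd]]
}
```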
---------
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* admin: add plugin runtime UI page and route wiring
* pb: add plugin gRPC contract and generated bindings
* admin/plugin: implement worker registry, runtime, monitoring, and config store
* admin/dash: wire plugin runtime and expose plugin workflow APIs
* command: add flags to enable plugin runtime
* admin: rename remaining plugin v2 wording to plugin
* admin/plugin: add detectable job type registry helper
* admin/plugin: add scheduled detection and dispatch orchestration
* admin/plugin: prefetch job type descriptors when workers connect
* admin/plugin: add known job type discovery API and UI
* admin/plugin: refresh design doc to match current implementation
* admin/plugin: enforce per-worker scheduler concurrency limits
* admin/plugin: use descriptor runtime defaults for scheduler policy
* admin/ui: auto-load first known plugin job type on page open
* admin/plugin: bootstrap persisted config from descriptor defaults
* admin/plugin: dedupe scheduled proposals by dedupe key
* admin/ui: add job type and state filters for plugin monitoring
* admin/ui: add per-job-type plugin activity summary
* admin/plugin: split descriptor read API from schema refresh
* admin/ui: keep plugin summary metrics global while tables are filtered
* admin/plugin: retry executor reservation before timing out
* admin/plugin: expose scheduler states for monitoring
* admin/ui: show per-job-type scheduler states in plugin monitor
* pb/plugin: rename protobuf package to plugin
* admin/plugin: rename pluginRuntime wiring to plugin
* admin/plugin: remove runtime naming from plugin APIs and UI
* admin/plugin: rename runtime files to plugin naming
* admin/plugin: persist jobs and activities for monitor recovery
* admin/plugin: lease one detector worker per job type
* admin/ui: show worker load from plugin heartbeats
* admin/plugin: skip stale workers for detector and executor picks
* plugin/worker: add plugin worker command and stream runtime scaffold
* plugin/worker: implement vacuum detect and execute handlers
* admin/plugin: document external vacuum plugin worker starter
* command: update plugin.worker help to reflect implemented flow
* command/admin: drop legacy Plugin V2 label
* plugin/worker: validate vacuum job type and respect min interval
* plugin/worker: test no-op detect when min interval not elapsed
* command/admin: document plugin.worker external process
* plugin/worker: advertise configured concurrency in hello
* command/plugin.worker: add jobType handler selection
* command/plugin.worker: test handler selection by job type
* command/plugin.worker: persist worker id in workingDir
* admin/plugin: document plugin.worker jobType and workingDir flags
* plugin/worker: support cancel request for in-flight work
* plugin/worker: test cancel request acknowledgements
* command/plugin.worker: document workingDir and jobType behavior
* plugin/worker: emit executor activity events for monitor
* plugin/worker: test executor activity builder
* admin/plugin: send last successful run in detection request
* admin/plugin: send cancel request when detect or execute context ends
* admin/plugin: document worker cancel request responsibility
* admin/handlers: expose plugin scheduler states API in no-auth mode
* admin/handlers: test plugin scheduler states route registration
* admin/plugin: keep worker id on worker-generated activity records
* admin/plugin: test worker id propagation in monitor activities
* admin/dash: always initialize plugin service
* command/admin: remove plugin enable flags and default to enabled
* admin/dash: drop pluginEnabled constructor parameter
* admin/plugin UI: stop checking plugin enabled state
* admin/plugin: remove docs for plugin enable flags
* admin/dash: remove unused plugin enabled check method
* admin/dash: fall back to in-memory plugin init when dataDir fails
* admin/plugin API: expose worker gRPC port in status
* command/plugin.worker: resolve admin gRPC port via plugin status
* split plugin UI into overview/configuration/monitoring pages
* Update layout_templ.go
* add volume_balance plugin worker handler
* wire plugin.worker CLI for volume_balance job type
* add erasure_coding plugin worker handler
* wire plugin.worker CLI for erasure_coding job type
* support multi-job handlers in plugin worker runtime
* allow plugin.worker jobType as comma-separated list
* admin/plugin UI: rename to Workers and simplify config view
* plugin worker: queue detection requests instead of rejecting them at capacity
* Update plugin_worker.go
* plugin volume_balance: remove force_move/timeout from worker config UI
* plugin erasure_coding: enforce local working dir and cleanup
* admin/plugin UI: rename admin settings to job scheduling
* admin/plugin UI: persist and robustly render detection results
* admin/plugin: record and return detection trace metadata
* admin/plugin UI: show detection process and decision trace
* plugin: surface detector decision trace as activities
* mini: start a plugin worker by default
* admin/plugin UI: split monitoring into detection and execution tabs
* plugin worker: emit detection decision trace for EC and balance
* admin workers UI: split monitoring into detection and execution pages
* plugin scheduler: skip proposals for active assigned/running jobs
* admin workers UI: add job queue tab
* plugin worker: add dummy stress detector and executor job type
* admin workers UI: reorder tabs to detection, queue, execution
* admin workers UI: regenerate plugin template
* plugin defaults: include dummy stress and add stress tests
* plugin dummy stress: rotate detection selections across runs
* plugin scheduler: remove cross-run proposal dedupe
* plugin queue: track pending scheduled jobs
* plugin scheduler: wait for executor capacity before dispatch
* plugin scheduler: skip detection when waiting backlog is high
* plugin: add disk-backed job detail API and persistence
* admin ui: show plugin job detail modal from job id links
* plugin: generate unique job ids instead of reusing proposal ids
* plugin worker: emit heartbeats on work state changes
* plugin registry: round-robin tied executor and detector picks
* add temporary EC overnight stress runner
* plugin job details: persist and render EC execution plans
* ec volume details: color data and parity shard badges
* shard labels: keep parity ids numeric and distinguish them by color only
* admin: remove legacy maintenance UI routes and templates
* admin: remove dead maintenance endpoint helpers
* Update layout_templ.go
* remove dummy_stress worker and command support
* refactor plugin UI to job-type top tabs and sub-tabs
* migrate weed worker command to plugin runtime
* remove plugin.worker command and keep worker runtime with metrics
* update helm worker args for jobType and execution flags
* set plugin scheduling defaults to global 16 and per-worker 4
* stress: fix RPC context reuse and remove redundant variables in ec_stress_runner
* admin/plugin: fix lifecycle races, safe channel operations, and terminal state constants
* admin/dash: randomize job IDs and fix priority zero-value overwrite in plugin API
* admin/handlers: implement buffered rendering to prevent response corruption
* admin/plugin: implement debounced persistence flusher and optimize BuildJobDetail memory lookups
* admin/plugin: fix priority overwrite and implement bounded wait in scheduler reserve
* admin/plugin: implement atomic file writes and fix run record side effects
* admin/plugin: use P prefix for parity shard labels in execution plans
* admin/plugin: enable parallel execution for cancellation tests
* admin: refactor time.Time fields to pointers for better JSON omitempty support
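For context on why pointers help here: encoding/json's omitempty never omits a struct value such as a zero time.Time, but it does omit a nil pointer. A small sketch, with hypothetical struct and field names (timeToPtr mirrors the helper name mentioned in a later commit, but its body here is assumed):

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Hypothetical job records used only to contrast the two field types.
type jobWithValue struct {
	ID          string    `json:"id"`
	CompletedAt time.Time `json:"completed_at,omitempty"`
}

type jobWithPointer struct {
	ID          string     `json:"id"`
	CompletedAt *time.Time `json:"completed_at,omitempty"`
}

// timeToPtr returns a pointer to a copy of t.
func timeToPtr(t time.Time) *time.Time { return &t }

// marshalString marshals v to JSON and returns it as a string.
func marshalString(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// The zero time.Time is still emitted despite omitempty.
	fmt.Println(marshalString(jobWithValue{ID: "j1"}))
	// The nil pointer is omitted.
	fmt.Println(marshalString(jobWithPointer{ID: "j1"}))
	done := time.Date(2024, 1, 2, 3, 4, 5, 0, time.UTC)
	fmt.Println(marshalString(jobWithPointer{ID: "j2", CompletedAt: timeToPtr(done)}))
}
```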
* admin/plugin: implement pointer-safe time assignments and comparisons in plugin core
* admin/plugin: fix time assignment and sorting logic in plugin monitor after pointer refactor
* admin/plugin: update scheduler activity tracking to use time pointers
* admin/plugin: fix time-based run history trimming after pointer refactor
* admin/dash: fix JobSpec struct literal in plugin API after pointer refactor
* admin/view: add D/P prefixes to EC shard badges for UI consistency
* admin/plugin: use lifecycle-aware context for schema prefetching
* Update ec_volume_details_templ.go
* admin/stress: fix proposal sorting and log volume cleanup errors
* stress: refine ec stress runner with math/rand and collection name
- Added Collection field to VolumeEcShardsDeleteRequest for correct filename construction.
- Replaced crypto/rand with seeded math/rand PRNG for bulk payloads.
- Added documentation for EcMinAge zero-value behavior.
- Added logging for ignored errors in volume/shard deletion.
* admin: return internal server error for plugin store failures
Changed error status code from 400 Bad Request to 500 Internal Server Error for failures in GetPluginJobDetail to correctly reflect server-side errors.
* admin: implement safe channel sends and graceful shutdown sync
- Added sync.WaitGroup to Plugin struct to manage background goroutines.
- Implemented safeSendCh helper using recover() to prevent panics on closed channels.
- Ensured Shutdown() waits for all background operations to complete.
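A sketch of the recover()-based send guard named above. The generic signature and the non-blocking select are assumptions of this sketch; the actual safeSendCh helper may block or take different parameters.

```go
package main

import "fmt"

// safeSend tries to deliver v on ch without blocking and reports whether it
// was delivered. A send on a closed channel panics; the deferred recover
// turns that panic into a false return value.
func safeSend[T any](ch chan<- T, v T) (sent bool) {
	defer func() {
		if recover() != nil {
			sent = false
		}
	}()
	select {
	case ch <- v:
		return true
	default:
		return false // channel full or no receiver ready
	}
}

func main() {
	ch := make(chan int, 1)
	fmt.Println(safeSend(ch, 1)) // true: buffer has room
	fmt.Println(safeSend(ch, 2)) // false: buffer is full
	close(ch)
	fmt.Println(safeSend(ch, 3)) // false: closed channel, panic recovered
}
```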
* admin: harden plugin monitor with nil-safe time handling and record init
- Standardized nil-safe assignment for *time.Time pointers (CreatedAt, UpdatedAt, CompletedAt).
- Ensured persistJobDetailSnapshot initializes new records correctly if they don't exist on disk.
- Fixed debounced persistence to trigger immediate write on job completion.
* admin: improve scheduler shutdown behavior and logic guards
- Replaced brittle error string matching with explicit r.shutdownCh selection for shutdown detection.
- Removed redundant nil guard in buildScheduledJobSpec.
- Standardized WaitGroup usage for schedulerLoop.
* admin: implement deep copy for job parameters and atomic write fixes
- Implemented deepCopyGenericValue and used it in cloneTrackedJob to prevent shared state.
- Ensured atomicWriteFile creates parent directories before writing.
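One common way to get an atomic write with parent-directory creation is to write to a temporary file in the target directory and rename it into place. The sketch below assumes that technique; the commits do not state how atomicWriteFile is implemented, and the defaultDirPerm value and the writeAndReadBack demo helper are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

const defaultDirPerm = 0o755 // assumed value for illustration

// atomicWriteFile writes data to a temp file next to path and renames it into
// place, so readers see either the old or the new content, never a partial file.
func atomicWriteFile(path string, data []byte, perm os.FileMode) error {
	dir := filepath.Dir(path)
	if err := os.MkdirAll(dir, defaultDirPerm); err != nil {
		return err
	}
	tmp, err := os.CreateTemp(dir, filepath.Base(path)+".tmp-*")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	defer os.Remove(tmpName) // harmless after a successful rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	if err := os.Chmod(tmpName, perm); err != nil {
		return err
	}
	return os.Rename(tmpName, path)
}

// writeAndReadBack writes content below a fresh temp dir and reads it back.
func writeAndReadBack(content string) (string, error) {
	dir, err := os.MkdirTemp("", "atomic-demo-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "jobs", "detail", "job.json")
	if err := atomicWriteFile(path, []byte(content), 0o644); err != nil {
		return "", err
	}
	data, err := os.ReadFile(path)
	return string(data), err
}

func main() {
	got, err := writeAndReadBack(`{"id":"j1"}`)
	fmt.Println(got, err)
}
```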
* admin: remove unreachable branch in shard classification
Removed an unreachable 'totalShards <= 0' check in classifyShardID as dataShards and parityShards are already guarded.
* admin: secure UI links and use canonical shard constants
- Added rel="noopener noreferrer" to external links for security.
- Replaced magic number 14 with erasure_coding.TotalShardsCount.
- Used renderEcShardBadge for missing shard list consistency.
* admin: stabilize plugin tests and fix regressions
- Rewrote plugin_monitor_test.go so that it handles asynchronous persistence robustly.
- Updated all time.Time literals to use timeToPtr helper.
- Added explicit Shutdown() calls in tests to synchronize with debounced writes.
- Fixed syntax errors and orphaned struct literals in tests.
* Potential fix for code scanning alert no. 278: Slice memory allocation with excessive size value
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* Potential fix for code scanning alert no. 283: Uncontrolled data used in path expression
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* admin: finalize refinements for error handling, scheduler, and race fixes
- Standardized HTTP 500 status codes for store failures in plugin_api.go.
- Tracked scheduled detection goroutines with sync.WaitGroup for safe shutdown.
- Fixed race condition in safeSendDetectionComplete by extracting channel under lock.
- Implemented deep copy for JobActivity details.
- Used defaultDirPerm constant in atomicWriteFile.
* test(ec): migrate admin dockertest to plugin APIs
* admin/plugin_api: fix RunPluginJobTypeAPI to return 500 for server-side detection/filter errors
* admin/plugin_api: fix ExecutePluginJobAPI to return 500 for job execution failures
* admin/plugin_api: limit parseProtoJSONBody request body to 1MB to prevent unbounded memory usage
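A sketch of capping a request body at 1MB using http.MaxBytesReader from the standard library. Whether parseProtoJSONBody uses this call or another mechanism is not stated in the commit; readLimitedBody and tryBody are hypothetical helpers.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

const maxBodyBytes = 1 << 20 // 1MB cap, as described in the commit subject

// readLimitedBody reads the request body but fails once it exceeds the cap.
func readLimitedBody(w http.ResponseWriter, r *http.Request) ([]byte, error) {
	r.Body = http.MaxBytesReader(w, r.Body, maxBodyBytes)
	return io.ReadAll(r.Body)
}

// tryBody runs readLimitedBody against an in-memory request of the given size
// and returns how many bytes were read together with any error.
func tryBody(size int) (int, error) {
	body := strings.NewReader(strings.Repeat("x", size))
	req := httptest.NewRequest(http.MethodPost, "/", body)
	data, err := readLimitedBody(httptest.NewRecorder(), req)
	return len(data), err
}

func main() {
	n, err := tryBody(10)
	fmt.Println(n, err) // small body: read in full, no error
	_, err = tryBody(maxBodyBytes + 1)
	fmt.Println(err != nil) // oversized body: read fails
}
```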
* admin/plugin: consolidate regex to package-level validJobTypePattern; add char validation to sanitizeJobID
* admin/plugin: fix racy Shutdown channel close with sync.Once
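The sync.Once pattern for closing a channel exactly once can be sketched as below. The plugin struct and its field names are hypothetical; only the use of sync.Once around the channel close is taken from the commit subject.

```go
package main

import (
	"fmt"
	"sync"
)

// plugin is a hypothetical, reduced type with just the shutdown state.
type plugin struct {
	shutdownCh   chan struct{}
	shutdownOnce sync.Once
}

func newPlugin() *plugin {
	return &plugin{shutdownCh: make(chan struct{})}
}

// Shutdown may be called any number of times, from any goroutine.
// sync.Once guarantees the channel is closed exactly once, so concurrent
// callers cannot trigger a "close of closed channel" panic.
func (p *plugin) Shutdown() {
	p.shutdownOnce.Do(func() { close(p.shutdownCh) })
}

func main() {
	p := newPlugin()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.Shutdown()
		}()
	}
	wg.Wait()
	<-p.shutdownCh // returns immediately because the channel is closed
	fmt.Println("shutdown signalled")
}
```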
* admin/plugin: track sendLoop and recv goroutines in WorkerStream with r.wg
* admin/plugin: document writeProtoFiles atomicity — .pb is source of truth, .json is human-readable only
* admin/plugin: extract activityLess helper to deduplicate nil-safe OccurredAt sort comparators
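A possible shape for a nil-safe comparator over *time.Time fields. The activity struct is hypothetical, and sorting records with a nil OccurredAt first is an assumption of this sketch; the actual ordering chosen by activityLess is not stated in the commit.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// activity is a hypothetical, reduced record type.
type activity struct {
	Message    string
	OccurredAt *time.Time
}

// activityLess orders activities by OccurredAt without dereferencing nil.
// Assumption: records with a nil timestamp sort before all others.
func activityLess(a, b activity) bool {
	switch {
	case a.OccurredAt == nil && b.OccurredAt == nil:
		return false
	case a.OccurredAt == nil:
		return true
	case b.OccurredAt == nil:
		return false
	default:
		return a.OccurredAt.Before(*b.OccurredAt)
	}
}

// at builds a timestamp pointer from Unix seconds for the demo.
func at(sec int64) *time.Time {
	t := time.Unix(sec, 0)
	return &t
}

func main() {
	items := []activity{{"late", at(20)}, {"unknown", nil}, {"early", at(10)}}
	sort.SliceStable(items, func(i, j int) bool { return activityLess(items[i], items[j]) })
	for _, it := range items {
		fmt.Println(it.Message) // unknown, early, late
	}
}
```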
* test/ec: check http.NewRequest errors to prevent nil req panics
* test/ec: replace deprecated ioutil/math/rand, fix stale step comment 5.1→3.1
* plugin(ec): raise default detection and scheduling throughput limits
* topology: include empty disks in volume list and EC capacity fallback
* topology: remove hard 10-task cap for detection planning
* Update ec_volume_details_templ.go
* adjust default
* fix tests
---------
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* pb: add id field to Heartbeat message for stable volume server identification
This adds an 'id' field to the Heartbeat protobuf message that allows
volume servers to identify themselves independently of their IP:port address.
Ref: https://github.com/seaweedfs/seaweedfs/issues/7487
* storage: add Id field to Store struct
Add Id field to Store struct and include it in CollectHeartbeat().
The Id field provides a stable volume server identity independent of IP:port.
Ref: https://github.com/seaweedfs/seaweedfs/issues/7487
* topology: support id-based DataNode identification
Update GetOrCreateDataNode to accept an id parameter for stable node
identification. When id is provided, the DataNode can maintain its identity
even when its IP address changes (e.g., in Kubernetes pod reschedules).
For backward compatibility:
- If id is provided, use it as the node ID
- If id is empty, fall back to ip:port
Ref: https://github.com/seaweedfs/seaweedfs/issues/7487
* volume: add -id flag for stable volume server identity
Add -id command line flag to volume server that allows specifying a stable
identifier independent of the IP address. This is useful for Kubernetes
deployments with hostPath volumes where pods can be rescheduled to different
nodes while the persisted data remains on the original node.
Usage: weed volume -id=node-1 -ip=10.0.0.1 ...
If -id is not specified, it defaults to ip:port for backward compatibility.
Fixes https://github.com/seaweedfs/seaweedfs/issues/7487
* server: add -volume.id flag to weed server command
Support the -volume.id flag in the all-in-one 'weed server' command,
consistent with the standalone 'weed volume' command.
Usage: weed server -volume.id=node-1 ...
Ref: https://github.com/seaweedfs/seaweedfs/issues/7487
* topology: add test for id-based DataNode identification
Test the key scenarios:
1. Create DataNode with explicit id
2. Same id with different IP returns same DataNode (K8s reschedule)
3. IP/PublicUrl are updated when node reconnects with new address
4. Different id creates new DataNode
5. Empty id falls back to ip:port (backward compatibility)
Ref: https://github.com/seaweedfs/seaweedfs/issues/7487
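The scenarios listed above can be illustrated with a small map-backed sketch. The rack and dataNode types and the getOrCreateDataNode signature are hypothetical and omit the locking and extra fields a real topology would need.

```go
package main

import (
	"fmt"
	"strings"
)

// dataNode and rack are hypothetical, reduced types for illustration only.
type dataNode struct {
	ID   string
	IP   string
	Port int
}

type rack struct {
	nodes map[string]*dataNode
}

// getOrCreateDataNode keys nodes by explicit id when given, else by ip:port.
// A reconnect with the same id but a new address updates the existing node.
func (r *rack) getOrCreateDataNode(id, ip string, port int) *dataNode {
	key := strings.TrimSpace(id)
	if key == "" {
		key = fmt.Sprintf("%s:%d", ip, port)
	}
	if dn, ok := r.nodes[key]; ok {
		dn.IP, dn.Port = ip, port // keep the address current
		return dn
	}
	dn := &dataNode{ID: key, IP: ip, Port: port}
	r.nodes[key] = dn
	return dn
}

func main() {
	r := &rack{nodes: make(map[string]*dataNode)}
	a := r.getOrCreateDataNode("node-1", "10.0.0.1", 8080)
	b := r.getOrCreateDataNode("node-1", "10.0.0.2", 8080) // rescheduled pod
	fmt.Println(a == b, b.IP)                                 // true 10.0.0.2
	c := r.getOrCreateDataNode("", "10.0.0.3", 8080)
	fmt.Println(c.ID) // 10.0.0.3:8080
}
```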
* pb: add address field to DataNodeInfo for proper node addressing
Previously, DataNodeInfo.Id was used as the node address, which worked
when Id was always ip:port. Now that Id can be an explicit string,
we need a separate Address field for connection purposes.
Changes:
- Add 'address' field to DataNodeInfo protobuf message
- Update ToDataNodeInfo() to populate the address field
- Update NewServerAddressFromDataNode() to use Address (with Id fallback)
- Fix LookupEcVolume to use dn.Url() instead of dn.Id()
Ref: https://github.com/seaweedfs/seaweedfs/issues/7487
* fix: trim whitespace from volume server id and fix test
- Trim whitespace from the -id flag value so that a whitespace-only id such as ' ' is treated as empty
- Fix store_load_balancing_test.go to include id parameter in NewStore call
Ref: https://github.com/seaweedfs/seaweedfs/issues/7487
* refactor: extract GetVolumeServerId to util package
Move the volume server ID determination logic to a shared utility function
to avoid code duplication between volume.go and rack.go.
Ref: https://github.com/seaweedfs/seaweedfs/issues/7487
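The id-or-address fallback described across these commits might look like the sketch below. The function name and signature are hypothetical and may differ from the real util.GetVolumeServerId.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// volumeServerID is a hypothetical version of the shared helper: it returns
// the trimmed explicit id when one is set, and ip:port otherwise.
func volumeServerID(id, ip string, port int) string {
	if id = strings.TrimSpace(id); id != "" {
		return id
	}
	return ip + ":" + strconv.Itoa(port)
}

func main() {
	fmt.Println(volumeServerID("node-1", "10.0.0.1", 8080)) // node-1
	fmt.Println(volumeServerID(" ", "10.0.0.1", 8080))      // 10.0.0.1:8080
}
```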
* fix: improve transition logic for legacy nodes
- Use exact ip:port match instead of net.SplitHostPort heuristic
- Update GrpcPort and PublicUrl during transition for consistency
- Remove unused net import
Ref: https://github.com/seaweedfs/seaweedfs/issues/7487
* fix: add id normalization and address change logging
- Normalize id parameter at function boundary (trim whitespace)
- Log when DataNode IP:Port changes (helps debug K8s pod rescheduling)
Ref: https://github.com/seaweedfs/seaweedfs/issues/7487