seaweedfs

Commit Graph

Author	SHA1	Message	Date
Chris Lu	56773ebf18	fix: use backend-normalized group.Name in CreateGroup response After credentialManager.CreateGroup may normalize the name (e.g., trim whitespace), use group.Name instead of the raw input for the returned GroupData to ensure consistency.	1 day ago
Chris Lu	e814b23b48	fix: use appropriate error message in GetGroupDetails based on status Return "Group not found" only for 404, use "Failed to retrieve group" for other error statuses instead of always saying "Group not found".	1 day ago
Chris Lu	5db6687f3c	fix: add ErrGroupNotEmpty sentinel and map to HTTP 409 AdminServer.DeleteGroup now wraps conflict errors with ErrGroupNotEmpty, and groupErrorToHTTPStatus maps it to 409 Conflict instead of 500.	1 day ago
Chris Lu	a06d6c56e6	fix: validate members/policies before deleting group in admin handler AdminServer.DeleteGroup now checks for attached members and policies before delegating to credentialManager, matching the IAM handler guards.	1 day ago
Chris Lu	2d783a5ba7	fix: add ErrUserNotFound and ErrPolicyNotFound to groupErrorToHTTPStatus Map these sentinel errors to 404 so AddGroupMember and AttachGroupPolicy return proper HTTP status codes.	2 days ago
Chris Lu	cabcd5a697	fix: wrap DetachGroupPolicy error with ErrPolicyNotAttached sentinel Use credential.ErrPolicyNotAttached so groupErrorToHTTPStatus maps it to 400 instead of falling back to 500.	2 days ago
Chris Lu	5b77cc7a26	admin: use groupErrorToHTTPStatus in GetGroupMembers and GetGroupPolicies	2 days ago
Chris Lu	519ea3c03b	admin: regenerate groups_templ.go with XSS-safe data attributes Regenerated from groups.templ which uses data-group-name attributes instead of inline onclick with string interpolation.	2 days ago
Chris Lu	ff9a1fdbaa	iam: wrap ErrUserNotInGroup sentinel in RemoveGroupMember error Wrap credential.ErrUserNotInGroup so errors.Is works in groupErrorToHTTPStatus, returning proper 400 instead of 500.	2 days ago
Chris Lu	e367600b21	iam: add defensive copies, validation, and XSS fixes for group management - Memory store: clone groups on store/retrieve to prevent mutation - Admin dash: deep copy groups before mutation, validate user/policy exists - HTTP handlers: translate credential errors to proper HTTP status codes, use *bool for Enabled field to distinguish missing vs false - Groups templ: use data attributes + event delegation instead of inline onclick for XSS safety, prevent stale async responses	2 days ago
Chris Lu	07f6734e80	iam: address PR review comments for group management - Fix XSS vulnerability in groups.templ: replace innerHTML string concatenation with DOM APIs (createElement/textContent) for rendering member and policy lists - Use userGroups reverse index in embedded IAM ListGroupsForUser for O(1) lookup instead of iterating all groups - Add buildUserGroupsIndex helper in standalone IAM handlers; use it in ListGroupsForUser and removeUserFromAllGroups for efficient lookup - Add note about gRPC store load-modify-save race condition limitation	2 days ago
Chris Lu	75b8d7c821	admin: add group management page to admin UI Add groups page with CRUD operations, member management, policy attachment, and enable/disable toggle. Register routes in admin handlers and add Groups entry to sidebar navigation.	2 days ago
Chris Lu	ba66411337	Update plugin_templ.go	2 days ago
Chris Lu	7808b301ef	admin: remove Scheduler Settings cards from plugin UI (#8558 ) * admin: remove Scheduler Settings cards, make Next Run full-width Remove the two "Scheduler Settings" placeholder cards from the plugin UI (overview page and scheduler tab). They only contained a text note saying detection intervals are configured per job type, which is self-evident from the per-job-type settings form. Make the "Next Run" card full-width on the overview page since it no longer shares a row with the removed card. * plugin UI: promote Next Run to top summary card row Move "Next Run" from a standalone card into the top row alongside Workers, Active Jobs, and Activities as a compact stat card.	2 days ago
Chris Lu	fa7da0f57e	template	2 days ago
Chris Lu	961c270aba	admin: expose per-job-type detection interval in plugin UI (#8552 ) * admin: expose per-job-type detection interval in plugin UI The detection_interval_seconds field was not editable in the admin UI. collectAdminSettings() silently preserved the existing value, making it impossible for users to change how often a job type checks for new work. Users would change the global "Sleep Between Iterations" setting expecting it to control job scheduling frequency, but that only controls the scheduler loop's idle polling rate. Add a "Detection Interval (s)" input to the per-job-type admin settings form so users can actually configure it. Fixes #8549 * admin: remove global Sleep Between Iterations setting Now that per-job-type detection intervals are exposed in the UI, the global IdleSleepSeconds setting is redundant and confusing. It only controlled the scheduler loop's idle polling rate, which is always overridden by earliestNextDetectionAt() when job types exist. Replace the three usages with simpler alternatives: - Scheduler loop sleep: use defaultSchedulerIdleSleep constant - Initial delay for new job types: use policy.DetectionInterval/2 (more logical since it's already per-job-type) - Status fallback: use the constant The API endpoints are kept for backward compatibility but the UI no longer exposes or calls them. * admin: restore configurable idle sleep in scheduler loop The EC integration test sets idle_sleep_seconds=1 via the scheduler config API so the scheduler wakes quickly after workers connect. The previous commit replaced this with a hardcoded 613s constant, causing the scheduler to sleep through the entire test window. Restore GetSchedulerConfig().IdleSleepDuration() in the scheduler loop and status reporting. The UI removal of the setting is still correct — the API endpoint remains for programmatic use (e.g., tests). * admin: cap first-run initial delay to 5s instead of DetectionInterval/2 The initial delay for first-run job types was set to policy.DetectionInterval/2, which creates unbounded first-run latency (e.g., 1 hour for vacuum with a 2-hour detection interval). A small fixed 5-second delay provides sufficient stagger without penalizing startup time.	2 days ago
Chris Lu	e25558e4d8	admin: fix mobile sidebar menu inaccessible in portrait mode (#8556 ) * admin: fix mobile sidebar menu inaccessible in portrait mode The hamburger button only toggled the user dropdown, leaving the sidebar navigation inaccessible on mobile devices in portrait mode. Add a dedicated sidebar toggle button (visible only on mobile), give the sidebar an id so Bootstrap collapse can target it, add a backdrop overlay for the open state, and auto-close the sidebar when a nav link is clicked. Fixes #8550 * admin: address review feedback on mobile sidebar - Remove redundant JS show/hide.bs.collapse listeners; CSS sibling selector already handles backdrop visibility - Use const instead of var for non-reassigned variables - Move inline style on user icon to CSS class * admin: add aria attributes to user-menu toggler, use CSS variable for navbar height - Add aria-controls, aria-expanded, and aria-label to the user-menu toggle button for assistive technology - Extract hard-coded 56px navbar height into --navbar-height CSS custom property used by sidebar and backdrop positioning * admin: extract hideSidebar helper, use toggler visibility for breakpoint check - Extract duplicated collapse-hide logic into a hideSidebar helper - Replace hardcoded window.innerWidth < 768 with a check on the sidebar toggler's computed display, decoupling JS from CSS breakpoints - Add aria-expanded="false" to sidebar toggle button --------- Co-authored-by: Copilot <copilot@github.com>	2 days ago
Chris Lu	3f946fc0c0	mount: make metadata cache rebuilds snapshot-consistent (#8531 ) * filer: expose metadata events and list snapshots * mount: invalidate hot directory caches * mount: read hot directories directly from filer * mount: add sequenced metadata cache applier * mount: apply metadata responses through cache applier * mount: replay snapshot-consistent directory builds * mount: dedupe self metadata events * mount: factor directory build cleanup * mount: replace proto marshal dedup with composite key and ring buffer The dedup logic was doing a full deterministic proto.Marshal on every metadata event just to produce a dedup key. Replace with a cheap composite string key (TsNs\|Directory\|OldName\|NewName). Also replace the sliding-window slice (which leaked the backing array unboundedly) with a fixed-size ring buffer that reuses the same array. * filer: remove mutex and proto.Clone from request-scoped MetadataEventSink MetadataEventSink is created per-request and only accessed by the goroutine handling the gRPC call. The mutex and double proto.Clone (once in Record, once in Last) were unnecessary overhead on every filer write operation. Store the pointer directly instead. * mount: skip proto.Clone for caller-owned metadata events Add ApplyMetadataResponseOwned that takes ownership of the response without cloning. Local metadata events (mkdir, create, flush, etc.) are freshly constructed and never shared, so the clone is unnecessary. * filer: only populate MetadataEvent on successful DeleteEntry Avoid calling eventSink.Last() on error paths where the sink may contain a partial event from an intermediate child deletion during recursive deletes. * mount: avoid map allocation in collectDirectoryNotifications Replace the map with a fixed-size array and linear dedup. There are at most 3 directories to notify (old parent, new parent, new child if directory), so a 3-element array avoids the heap allocation on every metadata event. * mount: fix potential deadlock in enqueueApplyRequest Release applyStateMu before the blocking channel send. Previously, if the channel was full (cap 128), the send would block while holding the mutex, preventing Shutdown from acquiring it to set applyClosed. * mount: restore signature-based self-event filtering as fast path Re-add the signature check that was removed when content-based dedup was introduced. Checking signatures is O(1) on a small slice and avoids enqueuing and processing events that originated from this mount instance. The content-based dedup remains as a fallback. * filer: send snapshotTsNs only in first ListEntries response The snapshot timestamp is identical for every entry in a single ListEntries stream. Sending it in every response message wastes wire bandwidth for large directories. The client already reads it only from the first response. * mount: exit read-through mode after successful full directory listing MarkDirectoryRefreshed was defined but never called, so directories that entered read-through mode (hot invalidation threshold) stayed there permanently, hitting the filer on every readdir even when cold. Call it after a complete read-through listing finishes. * mount: include event shape and full paths in dedup key The previous dedup key only used Names, which could collapse distinct rename targets. Include the event shape (C/D/U/R), source directory, new parent path, and both entry names so structurally different events are never treated as duplicates. * mount: drain pending requests on shutdown in runApplyLoop After receiving the shutdown sentinel, drain any remaining requests from applyCh non-blockingly and signal each with errMetaCacheClosed so callers waiting on req.done are released. * mount: include IsDirectory in synthetic delete events metadataDeleteEvent now accepts an isDirectory parameter so the applier can distinguish directory deletes from file deletes. Rmdir passes true, Unlink passes false. * mount: fall back to synthetic event when MetadataEvent is nil In mknod and mkdir, if the filer response omits MetadataEvent (e.g. older filer without the field), synthesize an equivalent local metadata event so the cache is always updated. * mount: make Flush metadata apply best-effort after successful commit After filer_pb.CreateEntryWithResponse succeeds, the entry is persisted. Don't fail the Flush syscall if the local metadata cache apply fails — log and invalidate the directory cache instead. Also fall back to a synthetic event when MetadataEvent is nil. * mount: make Rename metadata apply best-effort The rename has already succeeded on the filer by the time we apply the local metadata event. Log failures instead of returning errors that would be dropped by the caller anyway. * mount: make saveEntry metadata apply best-effort with fallback After UpdateEntryWithResponse succeeds, treat local metadata apply as non-fatal. Log and invalidate the directory cache on failure. Also fall back to a synthetic event when MetadataEvent is nil. * filer_pb: preserve snapshotTsNs on error in ReadDirAllEntriesWithSnapshot Return the snapshot timestamp even when the first page fails, so callers receive the snapshot boundary when partial data was received. * filer: send snapshot token for empty directory listings When no entries are streamed, send a final ListEntriesResponse with only SnapshotTsNs so clients always receive the snapshot boundary. * mount: distinguish not-found vs transient errors in lookupEntry Return fuse.EIO for non-not-found filer errors instead of unconditionally returning ENOENT, so transient failures don't masquerade as missing entries. * mount: make CacheRemoteObject metadata apply best-effort The file content has already been cached successfully. Don't fail the read if the local metadata cache update fails. * mount: use consistent snapshot for readdir in direct mode Capture the SnapshotTsNs from the first loadDirectoryEntriesDirect call and store it on the DirectoryHandle. Subsequent batch loads pass this stored timestamp so all batches use the same snapshot. Also export DoSeaweedListWithSnapshot so mount can use it directly with snapshot passthrough. * filer_pb: fix test fake to send SnapshotTsNs only on first response Match the server behavior: only the first ListEntriesResponse in a page carries the snapshot timestamp, subsequent entries leave it zero. * Fix nil pointer dereference in ListEntries stream consumers Remove the empty-directory snapshot-only response from ListEntries that sent a ListEntriesResponse with Entry==nil, which crashed every raw stream consumer that assumed resp.Entry is always non-nil. Also add defensive nil checks for resp.Entry in all raw ListEntries stream consumers across: S3 listing, broker topic lookup, broker topic config, admin dashboard, topic retention, hybrid message scanner, Kafka integration, and consumer offset storage. * Add nil guards for resp.Entry in remaining ListEntries stream consumers Covers: S3 object lock check, MQ management dashboard (version/ partition/offset loops), and topic retention version loop. * Make applyLocalMetadataEvent best-effort in Link and Symlink The filer operations already succeeded; failing the syscall because the local cache apply failed is wrong. Log a warning and invalidate the parent directory cache instead. * Make applyLocalMetadataEvent best-effort in Mkdir/Rmdir/Mknod/Unlink The filer RPC already committed; don't fail the syscall when the local metadata cache apply fails. Log a warning and invalidate the parent directory cache to force a re-fetch on next access. * flushFileMetadata: add nil-fallback for metadata event and best-effort apply Synthesize a metadata event when resp.GetMetadataEvent() is nil (matching doFlush), and make the apply best-effort with cache invalidation on failure. * Prevent double-invocation of cleanupBuild in doEnsureVisited Add a cleanupDone guard so the deferred cleanup and inline error-path cleanup don't both call DeleteFolderChildren/AbortDirectoryBuild. * Fix comment: signature check is O(n) not O(1) * Prevent deferred cleanup after successful CompleteDirectoryBuild Set cleanupDone before returning from the success path so the deferred context-cancellation check cannot undo a published build. * Invalidate parent directory caches on rename metadata apply failure When applyLocalMetadataEvent fails during rename, invalidate the source and destination parent directory caches so subsequent accesses trigger a re-fetch from the filer. * Add event nil-fallback and cache invalidation to Link and Symlink Synthesize metadata events when the server doesn't return one, and invalidate parent directory caches on apply failure. * Match requested partition when scanning partition directories Parse the partition range format (NNNN-NNNN) and match against the requested partition parameter instead of using the first directory. * Preserve snapshot timestamp across empty directory listings Initialize actualSnapshotTsNs from the caller-requested value so it isn't lost when the server returns no entries. Re-add the server-side snapshot-only response for empty directories (all raw stream consumers now have nil guards for Entry). * Fix CreateEntry error wrapping to support errors.Is/errors.As Use errors.New + %w instead of %v for resp.Error so callers can unwrap the underlying error. * Fix object lock pagination: only advance on non-nil entries Move entriesReceived inside the nil check so nil entries don't cause repeated ListEntries calls with the same lastFileName. * Guard Attributes nil check before accessing Mtime in MQ management * Do not send nil-Entry response for empty directory listings The snapshot-only ListEntriesResponse (with Entry == nil) for empty directories breaks consumers that treat any received response as an entry (Java FilerClient, S3 listing). The Go client-side DoSeaweedListWithSnapshot already preserves the caller-requested snapshot via actualSnapshotTsNs initialization, so the server-side send is unnecessary. * Fix review findings: subscriber dedup, invalidation normalization, nil guards, shutdown race - Remove self-signature early-return in processEventFn so all events flow through the applier (directory-build buffering sees self-originated events that arrive after a snapshot) - Normalize NewParentPath in collectEntryInvalidations to avoid duplicate invalidations when NewParentPath is empty (same-directory update) - Guard resp.Entry.Attributes for nil in admin_server.go and topic_retention.go to prevent panics on entries without attributes - Fix enqueueApplyRequest race with shutdown by using select on both applyCh and applyDone, preventing sends after the apply loop exits - Add cleanupDone check to deferred cleanup in meta_cache_init.go for clarity alongside the existing guard in cleanupBuild - Add empty directory test case for snapshot consistency * Propagate authoritative metadata event from CacheRemoteObjectToLocalCluster and generate client-side snapshot for empty directories - Add metadata_event field to CacheRemoteObjectToLocalClusterResponse proto so the filer-emitted event is available to callers - Use WithMetadataEventSink in the server handler to capture the event from NotifyUpdateEvent and return it on the response - Update filehandle_read.go to prefer the RPC's metadata event over a locally fabricated one, falling back to metadataUpdateEvent when the server doesn't provide one (e.g., older filers) - Generate a client-side snapshot cutoff in DoSeaweedListWithSnapshot when the server sends no snapshot (empty directory), so callers like CompleteDirectoryBuild get a meaningful boundary for filtering buffered events * Skip directory notifications for dirs being built to prevent mid-build cache wipe When a metadata event is buffered during a directory build, applyMetadataSideEffects was still firing noteDirectoryUpdate for the building directory. If the directory accumulated enough updates to become "hot", markDirectoryReadThrough would call DeleteFolderChildren, wiping entries that EnsureVisited had already inserted. The build would then complete and mark the directory cached with incomplete data. Fix by using applyMetadataSideEffectsSkippingBuildingDirs for buffered events, which suppresses directory notifications for dirs currently in buildingDirs while still applying entry invalidations. * Add test for directory notification suppression during active build TestDirectoryNotificationsSuppressedDuringBuild verifies that metadata events targeting a directory under active EnsureVisited build do NOT fire onDirectoryUpdate for that directory. In production, this prevents markDirectoryReadThrough from calling DeleteFolderChildren mid-build, which would wipe entries already inserted by the listing. The test inserts an entry during a build, sends multiple metadata events for the building directory, asserts no notifications fired for it, verifies the entry survives, and confirms buffered events are replayed after CompleteDirectoryBuild. * Fix create invalidations, build guard, event shape, context, and snapshot error path - collectEntryInvalidations: invalidate FUSE kernel cache on pure create events (OldEntry==nil && NewEntry!=nil), not just updates and deletes - completeDirectoryBuildNow: only call markCachedFn when an active build existed (state != nil), preventing an unpopulated directory from being marked as cached - Add metadataCreateEvent helper that produces a create-shaped event (NewEntry only, no OldEntry) and use it in mkdir, mknod, symlink, and hardlink create fallback paths instead of metadataUpdateEvent which incorrectly set both OldEntry and NewEntry - applyMetadataResponseEnqueue: use context.Background() for the queued mutation so a cancelled caller context cannot abort the apply loop mid-write - DoSeaweedListWithSnapshot: move snapshot initialization before ListEntries call so the error path returns the preserved snapshot instead of 0 * Fix review findings: test loop, cache race, context safety, snapshot consistency - Fix build test loop starting at i=1 instead of i=0, missing new-0.txt verification - Re-check IsDirectoryCached after cache miss to avoid ENOENT race with markDirectoryReadThrough - Use context.Background() in enqueueAndWait so caller cancellation can't abort build/complete mid-way - Pass dh.snapshotTsNs in skip-batch loadDirectoryEntriesDirect for snapshot consistency - Prefer resp.MetadataEvent over fallback in Unlink event derivation - Add comment on MetadataEventSink.Record single-event assumption * Fix empty-directory snapshot clock skew and build cancellation race Empty-directory snapshot: Remove client-side time.Now() synthesis when the server returns no entries. Instead return snapshotTsNs=0, and in completeDirectoryBuildNow replay ALL buffered events when snapshot is 0. This eliminates the clock-skew bug where a client ahead of the filer would filter out legitimate post-list events. Build cancellation: Use context.Background() for BeginDirectoryBuild and CompleteDirectoryBuild calls in doEnsureVisited, so errgroup cancellation doesn't cause enqueueAndWait to return early and trigger cleanupBuild while the operation is still queued. * Add tests for empty-directory build replay and cancellation resilience TestEmptyDirectoryBuildReplaysAllBufferedEvents: verifies that when CompleteDirectoryBuild receives snapshotTsNs=0 (empty directory, no server snapshot), ALL buffered events are replayed regardless of their TsNs values — no clock-skew-sensitive filtering occurs. TestBuildCompletionSurvivesCallerCancellation: verifies that once CompleteDirectoryBuild is enqueued, a cancelled caller context does not prevent the build from completing. The apply loop runs with context.Background(), so the directory becomes cached and buffered events are replayed even when the caller gives up waiting. * Fix directory subtree cleanup, Link rollback, test robustness - applyMetadataResponseLocked: when a directory entry is deleted or moved, call DeleteFolderChildren on the old path so cached descendants don't leak as stale entries. - Link: save original HardLinkId/Counter before mutation. If CreateEntryWithResponse fails after the source was already updated, rollback the source entry to its original state via UpdateEntry. - TestBuildCompletionSurvivesCallerCancellation: replace fixed time.Sleep(50ms) with a deadline-based poll that checks IsDirectoryCached in a loop, failing only after 2s timeout. - TestReadDirAllEntriesWithSnapshotEmptyDirectory: assert that ListEntries was actually invoked on the mock client so the test exercises the RPC path. - newMetadataEvent: add early return when both oldEntry and newEntry are nil to avoid emitting events with empty Directory. --------- Co-authored-by: Copilot <copilot@github.com>	3 days ago
Chris Lu	f9311a3422	s3api: fix static IAM policy enforcement after reload (#8532 ) * s3api: honor attached IAM policies over legacy actions * s3api: hydrate IAM policy docs during config reload * s3api: use policy-aware auth when listing buckets * credential: propagate context through filer_etc policy reads * credential: make legacy policy deletes durable * s3api: exercise managed policy runtime loader * s3api: allow static IAM users without session tokens * iam: deny unmatched attached policies under default allow * iam: load embedded policy files from filer store * s3api: require session tokens for IAM presigning * s3api: sync runtime policies into zero-config IAM * credential: respect context in policy file loads * credential: serialize legacy policy deletes * iam: align filer policy store naming * s3api: use authenticated principals for presigning * iam: deep copy policy conditions * s3api: require request creation in policy tests * filer: keep ReadInsideFiler as the context-aware API * iam: harden filer policy store writes * credential: strengthen legacy policy serialization test * credential: forward runtime policy loaders through wrapper * s3api: harden runtime policy merging * iam: require typed already-exists errors	4 days ago
Chris Lu	1f3df6e9ef	admin: remove Alpha badge and unused Metrics/Logs menu items (#8525 ) * admin: remove Alpha badge and unused Metrics/Logs menu items * Update layout_templ.go	5 days ago
Chris Lu	b3620c7e14	admin: auto migrating master maintenance scripts to admin_script plugin config (#8509 ) * admin: seed admin_script plugin config from master maintenance scripts When the admin server starts, fetch the maintenance scripts configuration from the master via GetMasterConfiguration. If the admin_script plugin worker does not already have a saved config, use the master's scripts as the default value. This enables seamless migration from master.toml [master.maintenance] to the admin script plugin worker. Changes: - Add maintenance_scripts and maintenance_sleep_minutes fields to GetMasterConfigurationResponse in master.proto - Populate the new fields from viper config in master_grpc_server.go - On admin server startup, fetch the master config and seed the admin_script plugin config if no config exists yet - Strip lock/unlock commands from the master scripts since the admin script worker handles locking automatically Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address review comments on admin_script seeding - Replace TOCTOU race (separate Load+Save) with atomic SaveJobTypeConfigIfNotExists on ConfigStore and Plugin - Replace ineffective polling loop with single GetMaster call using 30s context timeout, since GetMaster respects context cancellation - Add unit tests for SaveJobTypeConfigIfNotExists (in-memory + on-disk) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: apply maintenance script defaults in gRPC handler The gRPC handler for GetMasterConfiguration read maintenance scripts from viper without calling SetDefault, relying on startAdminScripts having run first. If the admin server calls GetMasterConfiguration before startAdminScripts sets the defaults, viper returns empty strings and the seeding is silently skipped. Apply SetDefault in the gRPC handler itself so it is self-contained. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Revert "fix: apply maintenance script defaults in gRPC handler" This reverts commit `068a506330`. * fix: use atomic save in ensureJobTypeConfigFromDescriptor ensureJobTypeConfigFromDescriptor used a separate Load + Save, racing with seedAdminScriptFromMaster. If the descriptor defaults (empty script) were saved first, SaveJobTypeConfigIfNotExists in the seeding goroutine would see an existing config and skip, losing the master's maintenance scripts. Switch to SaveJobTypeConfigIfNotExists so both paths are atomic. Whichever wins, the other is a safe no-op. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: fetch master scripts inline during config bootstrap, not in goroutine Replace the seedAdminScriptFromMaster goroutine with a ConfigDefaultsProvider callback. When the plugin bootstraps admin_script defaults from the worker descriptor, it calls the provider which fetches maintenance scripts from the master synchronously. This eliminates the race between the seeding goroutine and the descriptor-based config bootstrap. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * skip commented lock unlock Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * reduce grpc calls --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	6 days ago
Chris Lu	c19f88eef1	fix: resolve ServerAddress to NodeId in maintenance task sync (#8508 ) * fix: maintenance task topology lookup, retry, and stale task cleanup 1. Strip gRPC port from ServerAddress in SyncTask using ToHttpAddress() so task targets match topology disk keys (NodeId format). 2. Skip capacity check when topology has no disks yet (startup race where tasks are loaded from persistence before first topology update). 3. Don't retry permanent errors like "volume not found" - these will never succeed on retry. 4. Cancel all pending tasks for each task type before re-detection, ensuring stale proposals from previous cycles are cleaned up. This prevents stale tasks from blocking new detection and from repeatedly failing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * logs Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * less lock scope Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	6 days ago
Fábio Henrique Araújo	88e8342e44	style: Reseted padding to container-fluid div in layout template (#8505 ) * style: Reseted padding to container-fluid div in layout template * address comment Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Chris Lu <chris.lu@gmail.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	6 days ago
Copilot	70ed9c2a55	Update plugin_templ.go Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>	6 days ago
Chris Lu	45ce18266a	Disable master maintenance scripts when admin server runs (#8499 ) * Disable master maintenance scripts when admin server runs * Stop defaulting master maintenance scripts * Apply suggestion from @gemini-code-assist[bot] Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Apply suggestion from @gemini-code-assist[bot] Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Clarify master scripts are disabled by default * Skip master maintenance scripts when admin server is connected * Restore default master maintenance scripts * Document admin server skip for master maintenance scripts --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	6 days ago
Chris Lu	18ccc9b773	Plugin scheduler: sequential iterations with max runtime (#8496 ) * pb: add job type max runtime setting * plugin: default job type max runtime * plugin: redesign scheduler loop * admin ui: update scheduler settings * plugin: fix scheduler loop state name * plugin scheduler: restore backlog skip * plugin scheduler: drop legacy detection helper * admin api: require scheduler config body * admin ui: preserve detection interval on save * plugin scheduler: use job context and drain cancels * plugin scheduler: respect detection intervals * plugin scheduler: gate runs and drain queue * ec test: reuse req/resp vars * ec test: add scheduler debug logs * Adjust scheduler idle sleep and initial run delay * Clear pending job queue before scheduler runs * Log next detection time in EC integration test * Improve plugin scheduler debug logging in EC test * Expose scheduler next detection time * Log scheduler next detection time in EC test * Wake scheduler on config or worker updates * Expose scheduler sleep interval in UI * Fix scheduler sleep save value selection * Set scheduler idle sleep default to 613s * Show scheduler next run time in plugin UI --------- Co-authored-by: Copilot <copilot@github.com>	6 days ago
Chris Lu	e1e5b4a8a6	add admin script worker (#8491 ) * admin: add plugin lock coordination * shell: allow bypassing lock checks * plugin worker: add admin script handler * mini: include admin_script in plugin defaults * admin script UI: drop name and enlarge text * admin script: add default script * admin_script: make run interval configurable * plugin: gate other jobs during admin_script runs * plugin: use last completed admin_script run * admin: backfill plugin config defaults * templ Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * comparable to default version Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * default to run Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * format Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * shell: respect pre-set noLock for fix.replication * shell: add force no-lock mode for admin scripts * volume balance worker already exists Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * admin: expose scheduler status JSON * shell: add sleep command * shell: restrict sleep syntax * Revert "shell: respect pre-set noLock for fix.replication" This reverts commit `2b14e8b826`. * templ Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * fix import Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * less logs Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * Reduce master client logs on canceled contexts * Update mini default job type count --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	7 days ago
Chris Lu	a61a2affe3	Expire stuck plugin jobs (#8492 ) * Add stale job expiry and expire API * Add expire job button * Add test hook and coverage for ExpirePluginJobAPI * Document scheduler filtering side effect and reuse helper * Restore job spec proposal test * Regenerate plugin template output --------- Co-authored-by: Copilot <copilot@github.com>	1 week ago
Chris Lu	f5c35240be	Add volume dir tags and EC placement priority (#8472 ) * Add volume dir tags to topology Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add preferred tag config for EC Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Prioritize EC destinations by tags Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add EC placement planner tag tests Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Refactor EC placement tests to reuse buildActiveTopology Remove buildActiveTopologyWithDiskTags helper function and consolidate tag setup inline in test cases. Tests now use UpdateTopology to apply tags after topology creation, reusing the existing buildActiveTopology function rather than duplicating its logic. All tag scenario tests pass: - TestECPlacementPlannerPrefersTaggedDisks - TestECPlacementPlannerFallsBackWhenTagsInsufficient Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Consolidate normalizeTagList into shared util package Extract normalizeTagList from three locations (volume.go, detection.go, erasure_coding_handler.go) into new weed/util/tag.go as exported NormalizeTagList function. Replace all duplicate implementations with imports and calls to util.NormalizeTagList. This improves code reuse and maintainability by centralizing tag normalization logic. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add PreferredTags to EC config persistence Add preferred_tags field to ErasureCodingTaskConfig protobuf with field number 5. Update GetConfigSpec to include preferred_tags field in the UI configuration schema. Add PreferredTags to ToTaskPolicy to serialize config to protobuf. Add PreferredTags to FromTaskPolicy to deserialize from protobuf with defensive copy to prevent external mutation. This allows EC preferred tags to be persisted and restored across worker restarts. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add defensive copy for Tags slice in DiskLocation Copy the incoming tags slice in NewDiskLocation instead of storing by reference. This prevents external callers from mutating the DiskLocation.Tags slice after construction, improving encapsulation and preventing unexpected changes to disk metadata. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add doc comment to buildCandidateSets method Document the tiered candidate selection and fallback behavior. Explain that for a planner with preferredTags, it accumulates disks matching each tag in order into progressively larger tiers, emits a candidate set once a tier reaches shardsNeeded, and finally falls back to the full candidates set if preferred-tag tiers are insufficient. This clarifies the intended semantics for future maintainers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Apply final PR review fixes 1. Update parseVolumeTags to replicate single tag entry to all folders instead of leaving some folders with nil tags. This prevents nil pointer dereferences when processing folders without explicit tags. 2. Add defensive copy in ToTaskPolicy for PreferredTags slice to match the pattern used in FromTaskPolicy, preventing external mutation of the returned TaskPolicy. 3. Add clarifying comment in buildCandidateSets explaining that the shardsNeeded <= 0 branch is a defensive check for direct callers, since selectDestinations guarantees shardsNeeded > 0. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix nil pointer dereference in parseVolumeTags Ensure all folder tags are initialized to either normalized tags or empty slices, not nil. When multiple tag entries are provided and there are more folders than entries, remaining folders now get empty slices instead of nil, preventing nil pointer dereference in downstream code. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix NormalizeTagList to return empty slice instead of nil Change NormalizeTagList to always return a non-nil slice. When all tags are empty or whitespace after normalization, return an empty slice instead of nil. This prevents nil pointer dereferences in downstream code that expects a valid (possibly empty) slice. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add nil safety check for v.tags pointer Add a safety check to handle the case where v.tags might be nil, preventing a nil pointer dereference. If v.tags is nil, use an empty string instead. This is defensive programming to prevent panics in edge cases. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add volume.tags flag to weed server and weed mini commands Add the volume.tags CLI option to both the 'weed server' and 'weed mini' commands. This allows users to specify disk tags when running the combined server modes, just like they can with 'weed volume'. The flag uses the same format and description as the volume command: comma-separated tag groups per data dir with ':' separators (e.g. fast:ssd,archive). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	1 week ago
Chris Lu	b9e560dcf1	Prevent overlapping maintenance tasks per volume (#8463 ) * Prevent concurrent maintenance tasks per volume * fix panic	2 weeks ago
Chris Lu	c73e65ad5e	Add customizable plugin display names and weights (#8459 ) * feat: add customizable plugin display names and weights - Add weight field to JobTypeCapability proto message - Modify ListKnownJobTypes() to return JobTypeInfo with display names and weights - Modify ListPluginJobTypes() to return JobTypeInfo instead of string - Sort plugins by weight (descending) then alphabetically - Update admin API to return enriched job type metadata - Update plugin UI template to display names instead of IDs - Consolidate API by reusing existing function names instead of suffixed variants * perf: optimize plugin job type capability lookup and add null-safe parsing - Pre-calculate job type capabilities in a map to reduce O(nm) nested loops to O(n+m) lookup time in ListKnownJobTypes() - Add parseJobTypeItem() helper function for null-safe job type item parsing - Refactor plugin.templ to use parseJobTypeItem() in all job type access points (hasJobType, applyInitialNavigation, ensureActiveNavigation, renderTopTabs) - Deterministic capability resolution by using first worker's capability templ * refactor: use parseJobTypeItem helper consistently in plugin.templ Replace duplicated job type extraction logic at line 1296-1298 with parseJobTypeItem() helper function for consistency and maintainability. * improve: prefer richer capability metadata and add null-safety checks - Improve capability selection in ListKnownJobTypes() to prefer capabilities with non-empty DisplayName and higher Weight across all workers instead of first-wins approach. Handles mixed-version clusters better. - Add defensive null checks in renderJobTypeSummary() to safely access parseJobTypeItem() result before property access - Ensures malformed or missing entries won't break the rendering pipeline * fix: preserve existing DisplayName when merging capabilities Fix capability merge logic to respect existing DisplayName values: - If existing has DisplayName but candidate doesn't, preserve existing - If existing doesn't have DisplayName but candidate does, use candidate - Only use Weight comparison if DisplayName status is equal - Prevents higher-weight capabilities with empty DisplayName from overriding capabilities with non-empty DisplayName	2 weeks ago
Chris Lu	a3cb7fa8cc	go fmt	2 weeks ago
Chris Lu	ce4940b441	fix filer link on dashboard	2 weeks ago
Anton	b4c7d42a06	fix(admin): release mutex before disk I/O in maintenance queue; remove per-request LoadAllTaskStates (#8433 ) * fix(admin): release mutex before disk I/O in maintenance queue saveTaskState performs synchronous BoltDB writes. Calling it while holding mq.mutex.Lock() in AddTask, GetNextTask, and CompleteTask blocks all readers (GetTasks via RLock) for the full disk write duration on every task state change. During a maintenance scan AddTasksFromResults calls AddTask for every volume — potentially hundreds of times — meaning the write lock is held almost continuously. The HTTP handler for /maintenance calls GetTasks which blocks on RLock, exceeding the 30s timeout and returning 408 to the browser. Fix: update in-memory state (mq.tasks, mq.pendingTasks) under the lock as before, then unlock before calling saveTaskState. In-memory state is the authoritative source; persistence is crash-recovery only and does not require lock protection during the write. * fix(admin): add mutex to ConfigPersistence to synchronize tasks/ filesystem ops saveTaskState is now called outside mq.mutex, meaning SaveTaskState, LoadAllTaskStates, DeleteTaskState, and CleanupCompletedTasks can be invoked concurrently from multiple goroutines. ConfigPersistence had no internal synchronization, creating races on the tasks/ directory: - concurrent os.WriteFile + os.ReadFile on the same .pb file could yield a partial read and unmarshal error - LoadAllTaskStates (ReadDir + per-file ReadFile) could see a directory entry for a file being written or deleted concurrently - CleanupCompletedTasks (LoadAllTaskStates + DeleteTaskState) could race with SaveTaskState on the same file Fix: add tasksMu sync.Mutex to ConfigPersistence, acquired at the top of SaveTaskState, LoadTaskState, LoadAllTaskStates, DeleteTaskState, and CleanupCompletedTasks. Extract private Locked helpers so that CleanupCompletedTasks (which holds tasksMu) can call them internally without deadlocking. --------- Co-authored-by: Anton Ustyugov <anton@devops>	2 weeks ago
Chris Lu	cba69f4593	Update layout_templ.go	2 weeks ago
Chris Lu	3f58e3bf8f	Use master shard sizes for EC volumes (#8423 ) * Use master shard sizes for EC volumes * Remove EC volume shard size fallback * Remove unused EC dash imports	2 weeks ago
Chris Lu	8d59ef41d5	Admin UI: replace gin with mux (#8420 ) * Replace admin gin router with mux * Update layout_templ.go * Harden admin handlers * Add login CSRF handling * Fix filer copy naming conflict * address comments * address comments	2 weeks ago
Chris Lu	07f284c391	fix links	2 weeks ago
Chris Lu	7b08cf74ed	consistent template generation	2 weeks ago
Chris Lu	8ec9ff4a12	Refactor plugin system and migrate worker runtime (#8369 ) * admin: add plugin runtime UI page and route wiring * pb: add plugin gRPC contract and generated bindings * admin/plugin: implement worker registry, runtime, monitoring, and config store * admin/dash: wire plugin runtime and expose plugin workflow APIs * command: add flags to enable plugin runtime * admin: rename remaining plugin v2 wording to plugin * admin/plugin: add detectable job type registry helper * admin/plugin: add scheduled detection and dispatch orchestration * admin/plugin: prefetch job type descriptors when workers connect * admin/plugin: add known job type discovery API and UI * admin/plugin: refresh design doc to match current implementation * admin/plugin: enforce per-worker scheduler concurrency limits * admin/plugin: use descriptor runtime defaults for scheduler policy * admin/ui: auto-load first known plugin job type on page open * admin/plugin: bootstrap persisted config from descriptor defaults * admin/plugin: dedupe scheduled proposals by dedupe key * admin/ui: add job type and state filters for plugin monitoring * admin/ui: add per-job-type plugin activity summary * admin/plugin: split descriptor read API from schema refresh * admin/ui: keep plugin summary metrics global while tables are filtered * admin/plugin: retry executor reservation before timing out * admin/plugin: expose scheduler states for monitoring * admin/ui: show per-job-type scheduler states in plugin monitor * pb/plugin: rename protobuf package to plugin * admin/plugin: rename pluginRuntime wiring to plugin * admin/plugin: remove runtime naming from plugin APIs and UI * admin/plugin: rename runtime files to plugin naming * admin/plugin: persist jobs and activities for monitor recovery * admin/plugin: lease one detector worker per job type * admin/ui: show worker load from plugin heartbeats * admin/plugin: skip stale workers for detector and executor picks * plugin/worker: add plugin worker command and stream runtime scaffold * plugin/worker: implement vacuum detect and execute handlers * admin/plugin: document external vacuum plugin worker starter * command: update plugin.worker help to reflect implemented flow * command/admin: drop legacy Plugin V2 label * plugin/worker: validate vacuum job type and respect min interval * plugin/worker: test no-op detect when min interval not elapsed * command/admin: document plugin.worker external process * plugin/worker: advertise configured concurrency in hello * command/plugin.worker: add jobType handler selection * command/plugin.worker: test handler selection by job type * command/plugin.worker: persist worker id in workingDir * admin/plugin: document plugin.worker jobType and workingDir flags * plugin/worker: support cancel request for in-flight work * plugin/worker: test cancel request acknowledgements * command/plugin.worker: document workingDir and jobType behavior * plugin/worker: emit executor activity events for monitor * plugin/worker: test executor activity builder * admin/plugin: send last successful run in detection request * admin/plugin: send cancel request when detect or execute context ends * admin/plugin: document worker cancel request responsibility * admin/handlers: expose plugin scheduler states API in no-auth mode * admin/handlers: test plugin scheduler states route registration * admin/plugin: keep worker id on worker-generated activity records * admin/plugin: test worker id propagation in monitor activities * admin/dash: always initialize plugin service * command/admin: remove plugin enable flags and default to enabled * admin/dash: drop pluginEnabled constructor parameter * admin/plugin UI: stop checking plugin enabled state * admin/plugin: remove docs for plugin enable flags * admin/dash: remove unused plugin enabled check method * admin/dash: fallback to in-memory plugin init when dataDir fails * admin/plugin API: expose worker gRPC port in status * command/plugin.worker: resolve admin gRPC port via plugin status * split plugin UI into overview/configuration/monitoring pages * Update layout_templ.go * add volume_balance plugin worker handler * wire plugin.worker CLI for volume_balance job type * add erasure_coding plugin worker handler * wire plugin.worker CLI for erasure_coding job type * support multi-job handlers in plugin worker runtime * allow plugin.worker jobType as comma-separated list * admin/plugin UI: rename to Workers and simplify config view * plugin worker: queue detection requests instead of capacity reject * Update plugin_worker.go * plugin volume_balance: remove force_move/timeout from worker config UI * plugin erasure_coding: enforce local working dir and cleanup * admin/plugin UI: rename admin settings to job scheduling * admin/plugin UI: persist and robustly render detection results * admin/plugin: record and return detection trace metadata * admin/plugin UI: show detection process and decision trace * plugin: surface detector decision trace as activities * mini: start a plugin worker by default * admin/plugin UI: split monitoring into detection and execution tabs * plugin worker: emit detection decision trace for EC and balance * admin workers UI: split monitoring into detection and execution pages * plugin scheduler: skip proposals for active assigned/running jobs * admin workers UI: add job queue tab * plugin worker: add dummy stress detector and executor job type * admin workers UI: reorder tabs to detection queue execution * admin workers UI: regenerate plugin template * plugin defaults: include dummy stress and add stress tests * plugin dummy stress: rotate detection selections across runs * plugin scheduler: remove cross-run proposal dedupe * plugin queue: track pending scheduled jobs * plugin scheduler: wait for executor capacity before dispatch * plugin scheduler: skip detection when waiting backlog is high * plugin: add disk-backed job detail API and persistence * admin ui: show plugin job detail modal from job id links * plugin: generate unique job ids instead of reusing proposal ids * plugin worker: emit heartbeats on work state changes * plugin registry: round-robin tied executor and detector picks * add temporary EC overnight stress runner * plugin job details: persist and render EC execution plans * ec volume details: color data and parity shard badges * shard labels: keep parity ids numeric and color-only distinction * admin: remove legacy maintenance UI routes and templates * admin: remove dead maintenance endpoint helpers * Update layout_templ.go * remove dummy_stress worker and command support * refactor plugin UI to job-type top tabs and sub-tabs * migrate weed worker command to plugin runtime * remove plugin.worker command and keep worker runtime with metrics * update helm worker args for jobType and execution flags * set plugin scheduling defaults to global 16 and per-worker 4 * stress: fix RPC context reuse and remove redundant variables in ec_stress_runner * admin/plugin: fix lifecycle races, safe channel operations, and terminal state constants * admin/dash: randomize job IDs and fix priority zero-value overwrite in plugin API * admin/handlers: implement buffered rendering to prevent response corruption * admin/plugin: implement debounced persistence flusher and optimize BuildJobDetail memory lookups * admin/plugin: fix priority overwrite and implement bounded wait in scheduler reserve * admin/plugin: implement atomic file writes and fix run record side effects * admin/plugin: use P prefix for parity shard labels in execution plans * admin/plugin: enable parallel execution for cancellation tests * admin: refactor time.Time fields to pointers for better JSON omitempty support * admin/plugin: implement pointer-safe time assignments and comparisons in plugin core * admin/plugin: fix time assignment and sorting logic in plugin monitor after pointer refactor * admin/plugin: update scheduler activity tracking to use time pointers * admin/plugin: fix time-based run history trimming after pointer refactor * admin/dash: fix JobSpec struct literal in plugin API after pointer refactor * admin/view: add D/P prefixes to EC shard badges for UI consistency * admin/plugin: use lifecycle-aware context for schema prefetching * Update ec_volume_details_templ.go * admin/stress: fix proposal sorting and log volume cleanup errors * stress: refine ec stress runner with math/rand and collection name - Added Collection field to VolumeEcShardsDeleteRequest for correct filename construction. - Replaced crypto/rand with seeded math/rand PRNG for bulk payloads. - Added documentation for EcMinAge zero-value behavior. - Added logging for ignored errors in volume/shard deletion. * admin: return internal server error for plugin store failures Changed error status code from 400 Bad Request to 500 Internal Server Error for failures in GetPluginJobDetail to correctly reflect server-side errors. * admin: implement safe channel sends and graceful shutdown sync - Added sync.WaitGroup to Plugin struct to manage background goroutines. - Implemented safeSendCh helper using recover() to prevent panics on closed channels. - Ensured Shutdown() waits for all background operations to complete. * admin: robustify plugin monitor with nil-safe time and record init - Standardized nil-safe assignment for time.Time pointers (CreatedAt, UpdatedAt, CompletedAt). - Ensured persistJobDetailSnapshot initializes new records correctly if they don't exist on disk. - Fixed debounced persistence to trigger immediate write on job completion. admin: improve scheduler shutdown behavior and logic guards - Replaced brittle error string matching with explicit r.shutdownCh selection for shutdown detection. - Removed redundant nil guard in buildScheduledJobSpec. - Standardized WaitGroup usage for schedulerLoop. * admin: implement deep copy for job parameters and atomic write fixes - Implemented deepCopyGenericValue and used it in cloneTrackedJob to prevent shared state. - Ensured atomicWriteFile creates parent directories before writing. * admin: remove unreachable branch in shard classification Removed an unreachable 'totalShards <= 0' check in classifyShardID as dataShards and parityShards are already guarded. * admin: secure UI links and use canonical shard constants - Added rel="noopener noreferrer" to external links for security. - Replaced magic number 14 with erasure_coding.TotalShardsCount. - Used renderEcShardBadge for missing shard list consistency. * admin: stabilize plugin tests and fix regressions - Composed a robust plugin_monitor_test.go to handle asynchronous persistence. - Updated all time.Time literals to use timeToPtr helper. - Added explicit Shutdown() calls in tests to synchronize with debounced writes. - Fixed syntax errors and orphaned struct literals in tests. * Potential fix for code scanning alert no. 278: Slice memory allocation with excessive size value Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * Potential fix for code scanning alert no. 283: Uncontrolled data used in path expression Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * admin: finalize refinements for error handling, scheduler, and race fixes - Standardized HTTP 500 status codes for store failures in plugin_api.go. - Tracked scheduled detection goroutines with sync.WaitGroup for safe shutdown. - Fixed race condition in safeSendDetectionComplete by extracting channel under lock. - Implemented deep copy for JobActivity details. - Used defaultDirPerm constant in atomicWriteFile. * test(ec): migrate admin dockertest to plugin APIs * admin/plugin_api: fix RunPluginJobTypeAPI to return 500 for server-side detection/filter errors * admin/plugin_api: fix ExecutePluginJobAPI to return 500 for job execution failures * admin/plugin_api: limit parseProtoJSONBody request body to 1MB to prevent unbounded memory usage * admin/plugin: consolidate regex to package-level validJobTypePattern; add char validation to sanitizeJobID * admin/plugin: fix racy Shutdown channel close with sync.Once * admin/plugin: track sendLoop and recv goroutines in WorkerStream with r.wg * admin/plugin: document writeProtoFiles atomicity — .pb is source of truth, .json is human-readable only * admin/plugin: extract activityLess helper to deduplicate nil-safe OccurredAt sort comparators * test/ec: check http.NewRequest errors to prevent nil req panics * test/ec: replace deprecated ioutil/math/rand, fix stale step comment 5.1→3.1 * plugin(ec): raise default detection and scheduling throughput limits * topology: include empty disks in volume list and EC capacity fallback * topology: remove hard 10-task cap for detection planning * Update ec_volume_details_templ.go * adjust default * fix tests --------- Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>	3 weeks ago
Chris Lu	f44e25b422	fix(iam): ensure access key status is persisted and defaulted to Active (#8341 ) * Fix master leader election startup issue Fixes #error-log-leader-not-selected-yet * not useful test * fix(iam): ensure access key status is persisted and defaulted to Active * make pb * update tests * using constants	4 weeks ago
Chris Lu	1b2f719d7c	admin: fix file browser items-per-page selector (#8291 ) * admin: fix file browser page size selector Fix file browser pagination page-size selectors to use explicit select IDs instead of this.value in templ-generated handlers, which could resolve to undefined and produce limit=undefined in requests. Add a focused template render regression test to prevent this from recurring. Fixes #8284 * revert file browser template regression test	4 weeks ago
Chris Lu	db76eb26e7	compile	4 weeks ago
Chris Lu	aef2de3109	s3tables: support multi-level namespaces in parser/admin paths (#8273 ) * s3tables: support multi-level namespace normalization * admin: handle namespace parsing errors centrally * admin: clean namespace validation duplication	4 weeks ago
Chris Lu	be26ce74ce	s3tables: support multi-level namespace normalization	4 weeks ago
Chris Lu	5a0204310c	Add Iceberg admin UI (#8246 ) * Add Iceberg table details view * Enhance Iceberg catalog browsing UI * Fix Iceberg UI security and logic issues - Fix selectSchema() and partitionFieldsFromFullMetadata() to always search for matching IDs instead of checking != 0 - Fix snapshotsFromFullMetadata() to defensive-copy before sorting to prevent mutating caller's slice - Fix XSS vulnerabilities in s3tables.js: replace innerHTML with textContent/createElement for user-controlled data - Fix deleteIcebergTable() to redirect to namespace tables list on details page instead of reloading - Fix data-bs-target in iceberg_namespaces.templ: remove templ.SafeURL for CSS selector - Add catalogName to delete modal data attributes for proper redirect - Remove unused hidden inputs from create table form (icebergTableBucketArn, icebergTableNamespace) * Regenerate templ files for Iceberg UI updates * Support complex Iceberg type objects in schema Change Type field from string to json.RawMessage in both IcebergSchemaFieldInfo and internal icebergSchemaField to properly handle Iceberg spec's complex type objects (e.g. {"type": "struct", "fields": [...]}). Currently test data only shows primitive string types, but this change makes the implementation defensively robust for future complex types by preserving the exact JSON representation. Add typeToString() helper and update schema extraction functions to marshal string types as JSON. Update template to convert json.RawMessage to string for display. * Regenerate templ files for Type field changes * templ * Fix additional Iceberg UI issues from code review - Fix lazy-load flag that was set before async operation completed, preventing retries on error; now sets loaded flag only after successful load and throws error to caller for proper error handling and UI updates - Add zero-time guards for CreatedAt and ModifiedAt fields in table details to avoid displaying Go zero-time values; render dash when time is zero - Add URL path escaping for all catalog/namespace/table names in URLs to prevent malformed URLs when names contain special characters like /, ?, or # - Remove redundant innerHTML clear in loadIcebergNamespaceTables that cleared twice before appending the table list - Fix selectSnapshotForMetrics to remove != 0 guard for consistency with selectSchema fix; now always searches for CurrentSnapshotID without zero-value gate - Enhance typeToString() helper to display '(complex)' for non-primitive JSON types * Regenerate templ files for Phase 3 updates * Fix template generation to use correct file paths Run templ generate from repo root instead of weed/admin directory to ensure generated _templ.go files have correct absolute paths in error messages (e.g., 'weed/admin/view/app/iceberg_table_details.templ' instead of 'app/iceberg_table_details.templ'). This ensures both 'make admin-generate' at repo root and 'make generate' in weed/admin directory produce identical output with consistent file path references. * Regenerate template files with correct path references * Validate S3 Tables names in UI - Add client-side validation for table bucket and namespace names to surface errors for invalid characters (dots/underscores) before submission - Use HTML validity messages with reportValidity for immediate feedback - Update namespace helper text to reflect actual constraints (single-level, lowercase letters, numbers, and underscores) * Regenerate templ files for namespace helper text * Fix Iceberg catalog REST link and actions * Disallow S3 object access on table buckets * Validate Iceberg layout for table bucket objects * Fix REST API link to /v1/config * merge iceberg page with table bucket page * Allowed Trino/Iceberg stats files in metadata validation * fixes - Backend/data handling: - Normalized Iceberg type display and fallback handling in weed/admin/dash/s3tables_management.go. - Fixed snapshot fallback pointer semantics in weed/admin/dash/s3tables_management.go. - Added CSRF token generation/propagation/validation for namespace create/delete in: - weed/admin/dash/csrf.go - weed/admin/dash/auth_middleware.go - weed/admin/dash/middleware.go - weed/admin/dash/s3tables_management.go - weed/admin/view/layout/layout.templ - weed/admin/static/js/s3tables.js - UI/template fixes: - Zero-time guards for CreatedAt fields in: - weed/admin/view/app/iceberg_namespaces.templ - weed/admin/view/app/iceberg_tables.templ - Fixed invalid templ-in-script interpolation and host/port rendering in: - weed/admin/view/app/iceberg_catalog.templ - weed/admin/view/app/s3tables_buckets.templ - Added data-catalog-name consistency on Iceberg delete action in weed/admin/view/app/iceberg_tables.templ. - Updated retry wording in weed/admin/static/js/s3tables.js. - Regenerated all affected _templ.go files. - S3 API/comment follow-ups: - Reused cached table-bucket validator in weed/s3api/bucket_paths.go. - Added validation-failure debug logging in weed/s3api/s3api_object_handlers_tagging.go. - Added multipart path-validation design comment in weed/s3api/s3api_object_handlers_multipart.go. - Build tooling: - Fixed templ generate working directory issues in weed/admin/Makefile (watch + pattern rule). * populate data * test/s3tables: harden populate service checks * admin: skip table buckets in object-store bucket list * admin sidebar: move object store to top-level links * admin iceberg catalog: guard zero times and escape links * admin forms: add csrf/error handling and client-side name validation * admin s3tables: fix namespace delete modal redeclaration * admin: replace native confirm dialogs with modal helpers * admin modal-alerts: remove noisy confirm usage console log * reduce logs * test/s3tables: use partitioned tables in trino and spark populate * admin file browser: normalize filer ServerAddress for HTTP parsing	4 weeks ago
Andrii Bratanin	aba42419be	Fix tip message in maintenance_workers.templ (#8245 )	1 month ago
Chris Lu	3bb9493a5b	Enhance Iceberg catalog browsing UI	1 month ago
Chris Lu	d9e3fb2b8e	Add Iceberg table details view	1 month ago
Chris Lu	e6ee293c17	Add table operations test (#8241 ) * Add Trino blog operations test * Update test/s3tables/catalog_trino/trino_blog_operations_test.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * feat: add table bucket path helpers and filer operations - Add table object root and table location mapping directories - Implement ensureDirectory, upsertFile, deleteEntryIfExists helpers - Support table location bucket mapping for S3 access * feat: manage table bucket object roots on creation/deletion - Create .objects directory for table buckets on creation - Clean up table object bucket paths on deletion - Enable S3 operations on table bucket object roots * feat: add table location mapping for Iceberg REST - Track table location bucket mappings when tables are created/updated/deleted - Enable location-based routing for S3 operations on table data * feat: route S3 operations to table bucket object roots - Route table-s3 bucket names to mapped table paths - Route table buckets to object root directories - Support table location bucket mapping lookup * feat: emit table-s3 locations from Iceberg REST - Generate unique table-s3 bucket names with UUID suffix - Store table metadata under table bucket paths - Return table-s3 locations for Trino compatibility * fix: handle missing directories in S3 list operations - Propagate ErrNotFound from ListEntries for non-existent directories - Treat missing directories as empty results for list operations - Fixes Trino non-empty location checks on table creation * test: improve Trino CSV parsing for single-value results - Sanitize Trino output to skip jline warnings - Handle single-value CSV results without header rows - Strip quotes from numeric values in tests * refactor: use bucket path helpers throughout S3 API - Replace direct bucket path operations with helper functions - Leverage centralized table bucket routing logic - Improve maintainability with consistent path resolution * fix: add table bucket cache and improve filer error handling - Cache table bucket lookups to reduce filer overhead on repeated checks - Use filer_pb.CreateEntry and filer_pb.UpdateEntry helpers to check resp.Error - Fix delete order in handler_bucket_get_list_delete: delete table object before directory - Make location mapping errors best-effort: log and continue, don't fail API - Update table location mappings to delete stale prior bucket mappings on update - Add 1-second sleep before timestamp time travel query to ensure timestamps are in past - Fix CSV parsing: examine all lines, not skip first; handle single-value rows * fix: properly handle stale metadata location mapping cleanup - Capture oldMetadataLocation before mutation in handleUpdateTable - Update updateTableLocationMapping to accept both old and new locations - Use passed-in oldMetadataLocation to detect location changes - Delete stale mapping only when location actually changes - Pass empty string for oldLocation in handleCreateTable (new tables have no prior mapping) - Improve logging to show old -> new location transitions * refactor: cleanup imports and cache design - Remove unused 'sync' import from bucket_paths.go - Use filer_pb.UpdateEntry helper in setExtendedAttribute and deleteExtendedAttribute for consistent error handling - Add dedicated tableBucketCache map[string]bool to BucketRegistry instead of mixing concerns with metadataCache - Improve cache separation: table buckets cache is now separate from bucket metadata cache * fix: improve cache invalidation and add transient error handling Cache invalidation (critical fix): - Add tableLocationCache to BucketRegistry for location mapping lookups - Clear tableBucketCache and tableLocationCache in RemoveBucketMetadata - Prevents stale cache entries when buckets are deleted/recreated Transient error handling: - Only cache table bucket lookups when conclusive (found or ErrNotFound) - Skip caching on transient errors (network, permission, etc) - Prevents marking real table buckets as non-table due to transient failures Performance optimization: - Cache tableLocationDir results to avoid repeated filer RPCs on hot paths - tableLocationDir now checks cache before making expensive filer lookups - Cache stores empty string for 'not found' to avoid redundant lookups Code clarity: - Add comment to deleteDirectory explaining DeleteEntry response lacks Error field * go fmt * fix: mirror transient error handling in tableLocationDir and optimize bucketDir Transient error handling: - tableLocationDir now only caches definitive results - Mirrors isTableBucket behavior to prevent treating transient errors as permanent misses - Improves reliability on flaky systems or during recovery Performance optimization: - bucketDir avoids redundant isTableBucket call via bucketRoot - Directly use s3a.option.BucketsPath for regular buckets - Saves one cache lookup for every non-table bucket operation * fix: revert bucketDir optimization to preserve bucketRoot logic The optimization to directly use BucketsPath bypassed bucketRoot's logic and caused issues with S3 list operations on delimiter+prefix cases. Revert to using path.Join(s3a.bucketRoot(bucket), bucket) which properly handles all bucket types and ensures consistent path resolution across the codebase. The slight performance cost of an extra cache lookup is worth the correctness and consistency benefits. * feat: move table buckets under /buckets Add a table-bucket marker attribute, reuse bucket metadata cache for table bucket detection, and update list/validation/UI/test paths to treat table buckets as /buckets entries. * Fix S3 Tables code review issues - handler_bucket_create.go: Fix bucket existence check to properly validate entryResp.Entry before setting s3BucketExists flag (nil Entry should not indicate existing bucket) - bucket_paths.go: Add clarifying comment to bucketRoot() explaining unified buckets root path for all bucket types - file_browser_data.go: Optimize by extracting table bucket check early to avoid redundant WithFilerClient call * Fix list prefix delimiter handling * Handle list errors conservatively * Fix Trino FOR TIMESTAMP query - use past timestamp Iceberg requires the timestamp to be strictly in the past. Use current_timestamp - interval '1' second instead of current_timestamp. --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	1 month ago

1 2 3 4

171 Commits (f8c06f3994a56bd24528ca72e2805e367683d13b)