seaweedfs

Commit Graph

Author	SHA1	Message	Date
Chris Lu	fa7da0f57e	template	1 day ago
Chris Lu	961c270aba	admin: expose per-job-type detection interval in plugin UI (#8552 ) * admin: expose per-job-type detection interval in plugin UI The detection_interval_seconds field was not editable in the admin UI. collectAdminSettings() silently preserved the existing value, making it impossible for users to change how often a job type checks for new work. Users would change the global "Sleep Between Iterations" setting expecting it to control job scheduling frequency, but that only controls the scheduler loop's idle polling rate. Add a "Detection Interval (s)" input to the per-job-type admin settings form so users can actually configure it. Fixes #8549 * admin: remove global Sleep Between Iterations setting Now that per-job-type detection intervals are exposed in the UI, the global IdleSleepSeconds setting is redundant and confusing. It only controlled the scheduler loop's idle polling rate, which is always overridden by earliestNextDetectionAt() when job types exist. Replace the three usages with simpler alternatives: - Scheduler loop sleep: use defaultSchedulerIdleSleep constant - Initial delay for new job types: use policy.DetectionInterval/2 (more logical since it's already per-job-type) - Status fallback: use the constant The API endpoints are kept for backward compatibility but the UI no longer exposes or calls them. * admin: restore configurable idle sleep in scheduler loop The EC integration test sets idle_sleep_seconds=1 via the scheduler config API so the scheduler wakes quickly after workers connect. The previous commit replaced this with a hardcoded 613s constant, causing the scheduler to sleep through the entire test window. Restore GetSchedulerConfig().IdleSleepDuration() in the scheduler loop and status reporting. The UI removal of the setting is still correct — the API endpoint remains for programmatic use (e.g., tests). * admin: cap first-run initial delay to 5s instead of DetectionInterval/2 The initial delay for first-run job types was set to policy.DetectionInterval/2, which creates unbounded first-run latency (e.g., 1 hour for vacuum with a 2-hour detection interval). A small fixed 5-second delay provides sufficient stagger without penalizing startup time.	1 day ago
Chris Lu	e25558e4d8	admin: fix mobile sidebar menu inaccessible in portrait mode (#8556 ) * admin: fix mobile sidebar menu inaccessible in portrait mode The hamburger button only toggled the user dropdown, leaving the sidebar navigation inaccessible on mobile devices in portrait mode. Add a dedicated sidebar toggle button (visible only on mobile), give the sidebar an id so Bootstrap collapse can target it, add a backdrop overlay for the open state, and auto-close the sidebar when a nav link is clicked. Fixes #8550 * admin: address review feedback on mobile sidebar - Remove redundant JS show/hide.bs.collapse listeners; CSS sibling selector already handles backdrop visibility - Use const instead of var for non-reassigned variables - Move inline style on user icon to CSS class * admin: add aria attributes to user-menu toggler, use CSS variable for navbar height - Add aria-controls, aria-expanded, and aria-label to the user-menu toggle button for assistive technology - Extract hard-coded 56px navbar height into --navbar-height CSS custom property used by sidebar and backdrop positioning * admin: extract hideSidebar helper, use toggler visibility for breakpoint check - Extract duplicated collapse-hide logic into a hideSidebar helper - Replace hardcoded window.innerWidth < 768 with a check on the sidebar toggler's computed display, decoupling JS from CSS breakpoints - Add aria-expanded="false" to sidebar toggle button --------- Co-authored-by: Copilot <copilot@github.com>	1 day ago
Chris Lu	3f946fc0c0	mount: make metadata cache rebuilds snapshot-consistent (#8531 ) * filer: expose metadata events and list snapshots * mount: invalidate hot directory caches * mount: read hot directories directly from filer * mount: add sequenced metadata cache applier * mount: apply metadata responses through cache applier * mount: replay snapshot-consistent directory builds * mount: dedupe self metadata events * mount: factor directory build cleanup * mount: replace proto marshal dedup with composite key and ring buffer The dedup logic was doing a full deterministic proto.Marshal on every metadata event just to produce a dedup key. Replace with a cheap composite string key (TsNs\|Directory\|OldName\|NewName). Also replace the sliding-window slice (which leaked the backing array unboundedly) with a fixed-size ring buffer that reuses the same array. * filer: remove mutex and proto.Clone from request-scoped MetadataEventSink MetadataEventSink is created per-request and only accessed by the goroutine handling the gRPC call. The mutex and double proto.Clone (once in Record, once in Last) were unnecessary overhead on every filer write operation. Store the pointer directly instead. * mount: skip proto.Clone for caller-owned metadata events Add ApplyMetadataResponseOwned that takes ownership of the response without cloning. Local metadata events (mkdir, create, flush, etc.) are freshly constructed and never shared, so the clone is unnecessary. * filer: only populate MetadataEvent on successful DeleteEntry Avoid calling eventSink.Last() on error paths where the sink may contain a partial event from an intermediate child deletion during recursive deletes. * mount: avoid map allocation in collectDirectoryNotifications Replace the map with a fixed-size array and linear dedup. There are at most 3 directories to notify (old parent, new parent, new child if directory), so a 3-element array avoids the heap allocation on every metadata event. * mount: fix potential deadlock in enqueueApplyRequest Release applyStateMu before the blocking channel send. Previously, if the channel was full (cap 128), the send would block while holding the mutex, preventing Shutdown from acquiring it to set applyClosed. * mount: restore signature-based self-event filtering as fast path Re-add the signature check that was removed when content-based dedup was introduced. Checking signatures is O(1) on a small slice and avoids enqueuing and processing events that originated from this mount instance. The content-based dedup remains as a fallback. * filer: send snapshotTsNs only in first ListEntries response The snapshot timestamp is identical for every entry in a single ListEntries stream. Sending it in every response message wastes wire bandwidth for large directories. The client already reads it only from the first response. * mount: exit read-through mode after successful full directory listing MarkDirectoryRefreshed was defined but never called, so directories that entered read-through mode (hot invalidation threshold) stayed there permanently, hitting the filer on every readdir even when cold. Call it after a complete read-through listing finishes. * mount: include event shape and full paths in dedup key The previous dedup key only used Names, which could collapse distinct rename targets. Include the event shape (C/D/U/R), source directory, new parent path, and both entry names so structurally different events are never treated as duplicates. * mount: drain pending requests on shutdown in runApplyLoop After receiving the shutdown sentinel, drain any remaining requests from applyCh non-blockingly and signal each with errMetaCacheClosed so callers waiting on req.done are released. * mount: include IsDirectory in synthetic delete events metadataDeleteEvent now accepts an isDirectory parameter so the applier can distinguish directory deletes from file deletes. Rmdir passes true, Unlink passes false. * mount: fall back to synthetic event when MetadataEvent is nil In mknod and mkdir, if the filer response omits MetadataEvent (e.g. older filer without the field), synthesize an equivalent local metadata event so the cache is always updated. * mount: make Flush metadata apply best-effort after successful commit After filer_pb.CreateEntryWithResponse succeeds, the entry is persisted. Don't fail the Flush syscall if the local metadata cache apply fails — log and invalidate the directory cache instead. Also fall back to a synthetic event when MetadataEvent is nil. * mount: make Rename metadata apply best-effort The rename has already succeeded on the filer by the time we apply the local metadata event. Log failures instead of returning errors that would be dropped by the caller anyway. * mount: make saveEntry metadata apply best-effort with fallback After UpdateEntryWithResponse succeeds, treat local metadata apply as non-fatal. Log and invalidate the directory cache on failure. Also fall back to a synthetic event when MetadataEvent is nil. * filer_pb: preserve snapshotTsNs on error in ReadDirAllEntriesWithSnapshot Return the snapshot timestamp even when the first page fails, so callers receive the snapshot boundary when partial data was received. * filer: send snapshot token for empty directory listings When no entries are streamed, send a final ListEntriesResponse with only SnapshotTsNs so clients always receive the snapshot boundary. * mount: distinguish not-found vs transient errors in lookupEntry Return fuse.EIO for non-not-found filer errors instead of unconditionally returning ENOENT, so transient failures don't masquerade as missing entries. * mount: make CacheRemoteObject metadata apply best-effort The file content has already been cached successfully. Don't fail the read if the local metadata cache update fails. * mount: use consistent snapshot for readdir in direct mode Capture the SnapshotTsNs from the first loadDirectoryEntriesDirect call and store it on the DirectoryHandle. Subsequent batch loads pass this stored timestamp so all batches use the same snapshot. Also export DoSeaweedListWithSnapshot so mount can use it directly with snapshot passthrough. * filer_pb: fix test fake to send SnapshotTsNs only on first response Match the server behavior: only the first ListEntriesResponse in a page carries the snapshot timestamp, subsequent entries leave it zero. * Fix nil pointer dereference in ListEntries stream consumers Remove the empty-directory snapshot-only response from ListEntries that sent a ListEntriesResponse with Entry==nil, which crashed every raw stream consumer that assumed resp.Entry is always non-nil. Also add defensive nil checks for resp.Entry in all raw ListEntries stream consumers across: S3 listing, broker topic lookup, broker topic config, admin dashboard, topic retention, hybrid message scanner, Kafka integration, and consumer offset storage. * Add nil guards for resp.Entry in remaining ListEntries stream consumers Covers: S3 object lock check, MQ management dashboard (version/ partition/offset loops), and topic retention version loop. * Make applyLocalMetadataEvent best-effort in Link and Symlink The filer operations already succeeded; failing the syscall because the local cache apply failed is wrong. Log a warning and invalidate the parent directory cache instead. * Make applyLocalMetadataEvent best-effort in Mkdir/Rmdir/Mknod/Unlink The filer RPC already committed; don't fail the syscall when the local metadata cache apply fails. Log a warning and invalidate the parent directory cache to force a re-fetch on next access. * flushFileMetadata: add nil-fallback for metadata event and best-effort apply Synthesize a metadata event when resp.GetMetadataEvent() is nil (matching doFlush), and make the apply best-effort with cache invalidation on failure. * Prevent double-invocation of cleanupBuild in doEnsureVisited Add a cleanupDone guard so the deferred cleanup and inline error-path cleanup don't both call DeleteFolderChildren/AbortDirectoryBuild. * Fix comment: signature check is O(n) not O(1) * Prevent deferred cleanup after successful CompleteDirectoryBuild Set cleanupDone before returning from the success path so the deferred context-cancellation check cannot undo a published build. * Invalidate parent directory caches on rename metadata apply failure When applyLocalMetadataEvent fails during rename, invalidate the source and destination parent directory caches so subsequent accesses trigger a re-fetch from the filer. * Add event nil-fallback and cache invalidation to Link and Symlink Synthesize metadata events when the server doesn't return one, and invalidate parent directory caches on apply failure. * Match requested partition when scanning partition directories Parse the partition range format (NNNN-NNNN) and match against the requested partition parameter instead of using the first directory. * Preserve snapshot timestamp across empty directory listings Initialize actualSnapshotTsNs from the caller-requested value so it isn't lost when the server returns no entries. Re-add the server-side snapshot-only response for empty directories (all raw stream consumers now have nil guards for Entry). * Fix CreateEntry error wrapping to support errors.Is/errors.As Use errors.New + %w instead of %v for resp.Error so callers can unwrap the underlying error. * Fix object lock pagination: only advance on non-nil entries Move entriesReceived inside the nil check so nil entries don't cause repeated ListEntries calls with the same lastFileName. * Guard Attributes nil check before accessing Mtime in MQ management * Do not send nil-Entry response for empty directory listings The snapshot-only ListEntriesResponse (with Entry == nil) for empty directories breaks consumers that treat any received response as an entry (Java FilerClient, S3 listing). The Go client-side DoSeaweedListWithSnapshot already preserves the caller-requested snapshot via actualSnapshotTsNs initialization, so the server-side send is unnecessary. * Fix review findings: subscriber dedup, invalidation normalization, nil guards, shutdown race - Remove self-signature early-return in processEventFn so all events flow through the applier (directory-build buffering sees self-originated events that arrive after a snapshot) - Normalize NewParentPath in collectEntryInvalidations to avoid duplicate invalidations when NewParentPath is empty (same-directory update) - Guard resp.Entry.Attributes for nil in admin_server.go and topic_retention.go to prevent panics on entries without attributes - Fix enqueueApplyRequest race with shutdown by using select on both applyCh and applyDone, preventing sends after the apply loop exits - Add cleanupDone check to deferred cleanup in meta_cache_init.go for clarity alongside the existing guard in cleanupBuild - Add empty directory test case for snapshot consistency * Propagate authoritative metadata event from CacheRemoteObjectToLocalCluster and generate client-side snapshot for empty directories - Add metadata_event field to CacheRemoteObjectToLocalClusterResponse proto so the filer-emitted event is available to callers - Use WithMetadataEventSink in the server handler to capture the event from NotifyUpdateEvent and return it on the response - Update filehandle_read.go to prefer the RPC's metadata event over a locally fabricated one, falling back to metadataUpdateEvent when the server doesn't provide one (e.g., older filers) - Generate a client-side snapshot cutoff in DoSeaweedListWithSnapshot when the server sends no snapshot (empty directory), so callers like CompleteDirectoryBuild get a meaningful boundary for filtering buffered events * Skip directory notifications for dirs being built to prevent mid-build cache wipe When a metadata event is buffered during a directory build, applyMetadataSideEffects was still firing noteDirectoryUpdate for the building directory. If the directory accumulated enough updates to become "hot", markDirectoryReadThrough would call DeleteFolderChildren, wiping entries that EnsureVisited had already inserted. The build would then complete and mark the directory cached with incomplete data. Fix by using applyMetadataSideEffectsSkippingBuildingDirs for buffered events, which suppresses directory notifications for dirs currently in buildingDirs while still applying entry invalidations. * Add test for directory notification suppression during active build TestDirectoryNotificationsSuppressedDuringBuild verifies that metadata events targeting a directory under active EnsureVisited build do NOT fire onDirectoryUpdate for that directory. In production, this prevents markDirectoryReadThrough from calling DeleteFolderChildren mid-build, which would wipe entries already inserted by the listing. The test inserts an entry during a build, sends multiple metadata events for the building directory, asserts no notifications fired for it, verifies the entry survives, and confirms buffered events are replayed after CompleteDirectoryBuild. * Fix create invalidations, build guard, event shape, context, and snapshot error path - collectEntryInvalidations: invalidate FUSE kernel cache on pure create events (OldEntry==nil && NewEntry!=nil), not just updates and deletes - completeDirectoryBuildNow: only call markCachedFn when an active build existed (state != nil), preventing an unpopulated directory from being marked as cached - Add metadataCreateEvent helper that produces a create-shaped event (NewEntry only, no OldEntry) and use it in mkdir, mknod, symlink, and hardlink create fallback paths instead of metadataUpdateEvent which incorrectly set both OldEntry and NewEntry - applyMetadataResponseEnqueue: use context.Background() for the queued mutation so a cancelled caller context cannot abort the apply loop mid-write - DoSeaweedListWithSnapshot: move snapshot initialization before ListEntries call so the error path returns the preserved snapshot instead of 0 * Fix review findings: test loop, cache race, context safety, snapshot consistency - Fix build test loop starting at i=1 instead of i=0, missing new-0.txt verification - Re-check IsDirectoryCached after cache miss to avoid ENOENT race with markDirectoryReadThrough - Use context.Background() in enqueueAndWait so caller cancellation can't abort build/complete mid-way - Pass dh.snapshotTsNs in skip-batch loadDirectoryEntriesDirect for snapshot consistency - Prefer resp.MetadataEvent over fallback in Unlink event derivation - Add comment on MetadataEventSink.Record single-event assumption * Fix empty-directory snapshot clock skew and build cancellation race Empty-directory snapshot: Remove client-side time.Now() synthesis when the server returns no entries. Instead return snapshotTsNs=0, and in completeDirectoryBuildNow replay ALL buffered events when snapshot is 0. This eliminates the clock-skew bug where a client ahead of the filer would filter out legitimate post-list events. Build cancellation: Use context.Background() for BeginDirectoryBuild and CompleteDirectoryBuild calls in doEnsureVisited, so errgroup cancellation doesn't cause enqueueAndWait to return early and trigger cleanupBuild while the operation is still queued. * Add tests for empty-directory build replay and cancellation resilience TestEmptyDirectoryBuildReplaysAllBufferedEvents: verifies that when CompleteDirectoryBuild receives snapshotTsNs=0 (empty directory, no server snapshot), ALL buffered events are replayed regardless of their TsNs values — no clock-skew-sensitive filtering occurs. TestBuildCompletionSurvivesCallerCancellation: verifies that once CompleteDirectoryBuild is enqueued, a cancelled caller context does not prevent the build from completing. The apply loop runs with context.Background(), so the directory becomes cached and buffered events are replayed even when the caller gives up waiting. * Fix directory subtree cleanup, Link rollback, test robustness - applyMetadataResponseLocked: when a directory entry is deleted or moved, call DeleteFolderChildren on the old path so cached descendants don't leak as stale entries. - Link: save original HardLinkId/Counter before mutation. If CreateEntryWithResponse fails after the source was already updated, rollback the source entry to its original state via UpdateEntry. - TestBuildCompletionSurvivesCallerCancellation: replace fixed time.Sleep(50ms) with a deadline-based poll that checks IsDirectoryCached in a loop, failing only after 2s timeout. - TestReadDirAllEntriesWithSnapshotEmptyDirectory: assert that ListEntries was actually invoked on the mock client so the test exercises the RPC path. - newMetadataEvent: add early return when both oldEntry and newEntry are nil to avoid emitting events with empty Directory. --------- Co-authored-by: Copilot <copilot@github.com>	2 days ago
Chris Lu	f9311a3422	s3api: fix static IAM policy enforcement after reload (#8532 ) * s3api: honor attached IAM policies over legacy actions * s3api: hydrate IAM policy docs during config reload * s3api: use policy-aware auth when listing buckets * credential: propagate context through filer_etc policy reads * credential: make legacy policy deletes durable * s3api: exercise managed policy runtime loader * s3api: allow static IAM users without session tokens * iam: deny unmatched attached policies under default allow * iam: load embedded policy files from filer store * s3api: require session tokens for IAM presigning * s3api: sync runtime policies into zero-config IAM * credential: respect context in policy file loads * credential: serialize legacy policy deletes * iam: align filer policy store naming * s3api: use authenticated principals for presigning * iam: deep copy policy conditions * s3api: require request creation in policy tests * filer: keep ReadInsideFiler as the context-aware API * iam: harden filer policy store writes * credential: strengthen legacy policy serialization test * credential: forward runtime policy loaders through wrapper * s3api: harden runtime policy merging * iam: require typed already-exists errors	3 days ago
Chris Lu	1f3df6e9ef	admin: remove Alpha badge and unused Metrics/Logs menu items (#8525 ) * admin: remove Alpha badge and unused Metrics/Logs menu items * Update layout_templ.go	4 days ago
Chris Lu	b3620c7e14	admin: auto migrating master maintenance scripts to admin_script plugin config (#8509 ) * admin: seed admin_script plugin config from master maintenance scripts When the admin server starts, fetch the maintenance scripts configuration from the master via GetMasterConfiguration. If the admin_script plugin worker does not already have a saved config, use the master's scripts as the default value. This enables seamless migration from master.toml [master.maintenance] to the admin script plugin worker. Changes: - Add maintenance_scripts and maintenance_sleep_minutes fields to GetMasterConfigurationResponse in master.proto - Populate the new fields from viper config in master_grpc_server.go - On admin server startup, fetch the master config and seed the admin_script plugin config if no config exists yet - Strip lock/unlock commands from the master scripts since the admin script worker handles locking automatically Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address review comments on admin_script seeding - Replace TOCTOU race (separate Load+Save) with atomic SaveJobTypeConfigIfNotExists on ConfigStore and Plugin - Replace ineffective polling loop with single GetMaster call using 30s context timeout, since GetMaster respects context cancellation - Add unit tests for SaveJobTypeConfigIfNotExists (in-memory + on-disk) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: apply maintenance script defaults in gRPC handler The gRPC handler for GetMasterConfiguration read maintenance scripts from viper without calling SetDefault, relying on startAdminScripts having run first. If the admin server calls GetMasterConfiguration before startAdminScripts sets the defaults, viper returns empty strings and the seeding is silently skipped. Apply SetDefault in the gRPC handler itself so it is self-contained. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Revert "fix: apply maintenance script defaults in gRPC handler" This reverts commit `068a506330`. * fix: use atomic save in ensureJobTypeConfigFromDescriptor ensureJobTypeConfigFromDescriptor used a separate Load + Save, racing with seedAdminScriptFromMaster. If the descriptor defaults (empty script) were saved first, SaveJobTypeConfigIfNotExists in the seeding goroutine would see an existing config and skip, losing the master's maintenance scripts. Switch to SaveJobTypeConfigIfNotExists so both paths are atomic. Whichever wins, the other is a safe no-op. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: fetch master scripts inline during config bootstrap, not in goroutine Replace the seedAdminScriptFromMaster goroutine with a ConfigDefaultsProvider callback. When the plugin bootstraps admin_script defaults from the worker descriptor, it calls the provider which fetches maintenance scripts from the master synchronously. This eliminates the race between the seeding goroutine and the descriptor-based config bootstrap. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * skip commented lock unlock Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * reduce grpc calls --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	5 days ago
Chris Lu	c19f88eef1	fix: resolve ServerAddress to NodeId in maintenance task sync (#8508 ) * fix: maintenance task topology lookup, retry, and stale task cleanup 1. Strip gRPC port from ServerAddress in SyncTask using ToHttpAddress() so task targets match topology disk keys (NodeId format). 2. Skip capacity check when topology has no disks yet (startup race where tasks are loaded from persistence before first topology update). 3. Don't retry permanent errors like "volume not found" - these will never succeed on retry. 4. Cancel all pending tasks for each task type before re-detection, ensuring stale proposals from previous cycles are cleaned up. This prevents stale tasks from blocking new detection and from repeatedly failing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * logs Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * less lock scope Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	5 days ago
Fábio Henrique Araújo	88e8342e44	style: Reseted padding to container-fluid div in layout template (#8505 ) * style: Reseted padding to container-fluid div in layout template * address comment Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Chris Lu <chris.lu@gmail.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	5 days ago
Copilot	70ed9c2a55	Update plugin_templ.go Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>	6 days ago
Chris Lu	45ce18266a	Disable master maintenance scripts when admin server runs (#8499 ) * Disable master maintenance scripts when admin server runs * Stop defaulting master maintenance scripts * Apply suggestion from @gemini-code-assist[bot] Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Apply suggestion from @gemini-code-assist[bot] Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Clarify master scripts are disabled by default * Skip master maintenance scripts when admin server is connected * Restore default master maintenance scripts * Document admin server skip for master maintenance scripts --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	6 days ago
Chris Lu	18ccc9b773	Plugin scheduler: sequential iterations with max runtime (#8496 ) * pb: add job type max runtime setting * plugin: default job type max runtime * plugin: redesign scheduler loop * admin ui: update scheduler settings * plugin: fix scheduler loop state name * plugin scheduler: restore backlog skip * plugin scheduler: drop legacy detection helper * admin api: require scheduler config body * admin ui: preserve detection interval on save * plugin scheduler: use job context and drain cancels * plugin scheduler: respect detection intervals * plugin scheduler: gate runs and drain queue * ec test: reuse req/resp vars * ec test: add scheduler debug logs * Adjust scheduler idle sleep and initial run delay * Clear pending job queue before scheduler runs * Log next detection time in EC integration test * Improve plugin scheduler debug logging in EC test * Expose scheduler next detection time * Log scheduler next detection time in EC test * Wake scheduler on config or worker updates * Expose scheduler sleep interval in UI * Fix scheduler sleep save value selection * Set scheduler idle sleep default to 613s * Show scheduler next run time in plugin UI --------- Co-authored-by: Copilot <copilot@github.com>	6 days ago
Chris Lu	e1e5b4a8a6	add admin script worker (#8491 ) * admin: add plugin lock coordination * shell: allow bypassing lock checks * plugin worker: add admin script handler * mini: include admin_script in plugin defaults * admin script UI: drop name and enlarge text * admin script: add default script * admin_script: make run interval configurable * plugin: gate other jobs during admin_script runs * plugin: use last completed admin_script run * admin: backfill plugin config defaults * templ Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * comparable to default version Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * default to run Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * format Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * shell: respect pre-set noLock for fix.replication * shell: add force no-lock mode for admin scripts * volume balance worker already exists Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * admin: expose scheduler status JSON * shell: add sleep command * shell: restrict sleep syntax * Revert "shell: respect pre-set noLock for fix.replication" This reverts commit `2b14e8b826`. * templ Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * fix import Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * less logs Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * Reduce master client logs on canceled contexts * Update mini default job type count --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	6 days ago
Chris Lu	a61a2affe3	Expire stuck plugin jobs (#8492 ) * Add stale job expiry and expire API * Add expire job button * Add test hook and coverage for ExpirePluginJobAPI * Document scheduler filtering side effect and reuse helper * Restore job spec proposal test * Regenerate plugin template output --------- Co-authored-by: Copilot <copilot@github.com>	7 days ago
Chris Lu	f5c35240be	Add volume dir tags and EC placement priority (#8472 ) * Add volume dir tags to topology Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add preferred tag config for EC Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Prioritize EC destinations by tags Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add EC placement planner tag tests Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Refactor EC placement tests to reuse buildActiveTopology Remove buildActiveTopologyWithDiskTags helper function and consolidate tag setup inline in test cases. Tests now use UpdateTopology to apply tags after topology creation, reusing the existing buildActiveTopology function rather than duplicating its logic. All tag scenario tests pass: - TestECPlacementPlannerPrefersTaggedDisks - TestECPlacementPlannerFallsBackWhenTagsInsufficient Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Consolidate normalizeTagList into shared util package Extract normalizeTagList from three locations (volume.go, detection.go, erasure_coding_handler.go) into new weed/util/tag.go as exported NormalizeTagList function. Replace all duplicate implementations with imports and calls to util.NormalizeTagList. This improves code reuse and maintainability by centralizing tag normalization logic. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add PreferredTags to EC config persistence Add preferred_tags field to ErasureCodingTaskConfig protobuf with field number 5. Update GetConfigSpec to include preferred_tags field in the UI configuration schema. Add PreferredTags to ToTaskPolicy to serialize config to protobuf. Add PreferredTags to FromTaskPolicy to deserialize from protobuf with defensive copy to prevent external mutation. This allows EC preferred tags to be persisted and restored across worker restarts. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add defensive copy for Tags slice in DiskLocation Copy the incoming tags slice in NewDiskLocation instead of storing by reference. This prevents external callers from mutating the DiskLocation.Tags slice after construction, improving encapsulation and preventing unexpected changes to disk metadata. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add doc comment to buildCandidateSets method Document the tiered candidate selection and fallback behavior. Explain that for a planner with preferredTags, it accumulates disks matching each tag in order into progressively larger tiers, emits a candidate set once a tier reaches shardsNeeded, and finally falls back to the full candidates set if preferred-tag tiers are insufficient. This clarifies the intended semantics for future maintainers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Apply final PR review fixes 1. Update parseVolumeTags to replicate single tag entry to all folders instead of leaving some folders with nil tags. This prevents nil pointer dereferences when processing folders without explicit tags. 2. Add defensive copy in ToTaskPolicy for PreferredTags slice to match the pattern used in FromTaskPolicy, preventing external mutation of the returned TaskPolicy. 3. Add clarifying comment in buildCandidateSets explaining that the shardsNeeded <= 0 branch is a defensive check for direct callers, since selectDestinations guarantees shardsNeeded > 0. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix nil pointer dereference in parseVolumeTags Ensure all folder tags are initialized to either normalized tags or empty slices, not nil. When multiple tag entries are provided and there are more folders than entries, remaining folders now get empty slices instead of nil, preventing nil pointer dereference in downstream code. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix NormalizeTagList to return empty slice instead of nil Change NormalizeTagList to always return a non-nil slice. When all tags are empty or whitespace after normalization, return an empty slice instead of nil. This prevents nil pointer dereferences in downstream code that expects a valid (possibly empty) slice. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add nil safety check for v.tags pointer Add a safety check to handle the case where v.tags might be nil, preventing a nil pointer dereference. If v.tags is nil, use an empty string instead. This is defensive programming to prevent panics in edge cases. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add volume.tags flag to weed server and weed mini commands Add the volume.tags CLI option to both the 'weed server' and 'weed mini' commands. This allows users to specify disk tags when running the combined server modes, just like they can with 'weed volume'. The flag uses the same format and description as the volume command: comma-separated tag groups per data dir with ':' separators (e.g. fast:ssd,archive). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	1 week ago
Chris Lu	b9e560dcf1	Prevent overlapping maintenance tasks per volume (#8463 ) * Prevent concurrent maintenance tasks per volume * fix panic	1 week ago
Chris Lu	c73e65ad5e	Add customizable plugin display names and weights (#8459 ) * feat: add customizable plugin display names and weights - Add weight field to JobTypeCapability proto message - Modify ListKnownJobTypes() to return JobTypeInfo with display names and weights - Modify ListPluginJobTypes() to return JobTypeInfo instead of string - Sort plugins by weight (descending) then alphabetically - Update admin API to return enriched job type metadata - Update plugin UI template to display names instead of IDs - Consolidate API by reusing existing function names instead of suffixed variants * perf: optimize plugin job type capability lookup and add null-safe parsing - Pre-calculate job type capabilities in a map to reduce O(nm) nested loops to O(n+m) lookup time in ListKnownJobTypes() - Add parseJobTypeItem() helper function for null-safe job type item parsing - Refactor plugin.templ to use parseJobTypeItem() in all job type access points (hasJobType, applyInitialNavigation, ensureActiveNavigation, renderTopTabs) - Deterministic capability resolution by using first worker's capability templ * refactor: use parseJobTypeItem helper consistently in plugin.templ Replace duplicated job type extraction logic at line 1296-1298 with parseJobTypeItem() helper function for consistency and maintainability. * improve: prefer richer capability metadata and add null-safety checks - Improve capability selection in ListKnownJobTypes() to prefer capabilities with non-empty DisplayName and higher Weight across all workers instead of first-wins approach. Handles mixed-version clusters better. - Add defensive null checks in renderJobTypeSummary() to safely access parseJobTypeItem() result before property access - Ensures malformed or missing entries won't break the rendering pipeline * fix: preserve existing DisplayName when merging capabilities Fix capability merge logic to respect existing DisplayName values: - If existing has DisplayName but candidate doesn't, preserve existing - If existing doesn't have DisplayName but candidate does, use candidate - Only use Weight comparison if DisplayName status is equal - Prevents higher-weight capabilities with empty DisplayName from overriding capabilities with non-empty DisplayName	2 weeks ago
Chris Lu	a3cb7fa8cc	go fmt	2 weeks ago
Chris Lu	ce4940b441	fix filer link on dashboard	2 weeks ago
Anton	b4c7d42a06	fix(admin): release mutex before disk I/O in maintenance queue; remove per-request LoadAllTaskStates (#8433 ) * fix(admin): release mutex before disk I/O in maintenance queue saveTaskState performs synchronous BoltDB writes. Calling it while holding mq.mutex.Lock() in AddTask, GetNextTask, and CompleteTask blocks all readers (GetTasks via RLock) for the full disk write duration on every task state change. During a maintenance scan AddTasksFromResults calls AddTask for every volume — potentially hundreds of times — meaning the write lock is held almost continuously. The HTTP handler for /maintenance calls GetTasks which blocks on RLock, exceeding the 30s timeout and returning 408 to the browser. Fix: update in-memory state (mq.tasks, mq.pendingTasks) under the lock as before, then unlock before calling saveTaskState. In-memory state is the authoritative source; persistence is crash-recovery only and does not require lock protection during the write. * fix(admin): add mutex to ConfigPersistence to synchronize tasks/ filesystem ops saveTaskState is now called outside mq.mutex, meaning SaveTaskState, LoadAllTaskStates, DeleteTaskState, and CleanupCompletedTasks can be invoked concurrently from multiple goroutines. ConfigPersistence had no internal synchronization, creating races on the tasks/ directory: - concurrent os.WriteFile + os.ReadFile on the same .pb file could yield a partial read and unmarshal error - LoadAllTaskStates (ReadDir + per-file ReadFile) could see a directory entry for a file being written or deleted concurrently - CleanupCompletedTasks (LoadAllTaskStates + DeleteTaskState) could race with SaveTaskState on the same file Fix: add tasksMu sync.Mutex to ConfigPersistence, acquired at the top of SaveTaskState, LoadTaskState, LoadAllTaskStates, DeleteTaskState, and CleanupCompletedTasks. Extract private Locked helpers so that CleanupCompletedTasks (which holds tasksMu) can call them internally without deadlocking. --------- Co-authored-by: Anton Ustyugov <anton@devops>	2 weeks ago
Chris Lu	cba69f4593	Update layout_templ.go	2 weeks ago
Chris Lu	3f58e3bf8f	Use master shard sizes for EC volumes (#8423 ) * Use master shard sizes for EC volumes * Remove EC volume shard size fallback * Remove unused EC dash imports	2 weeks ago
Chris Lu	8d59ef41d5	Admin UI: replace gin with mux (#8420 ) * Replace admin gin router with mux * Update layout_templ.go * Harden admin handlers * Add login CSRF handling * Fix filer copy naming conflict * address comments * address comments	2 weeks ago
Chris Lu	07f284c391	fix links	2 weeks ago
Chris Lu	7b08cf74ed	consistent template generation	2 weeks ago
Chris Lu	8ec9ff4a12	Refactor plugin system and migrate worker runtime (#8369 ) * admin: add plugin runtime UI page and route wiring * pb: add plugin gRPC contract and generated bindings * admin/plugin: implement worker registry, runtime, monitoring, and config store * admin/dash: wire plugin runtime and expose plugin workflow APIs * command: add flags to enable plugin runtime * admin: rename remaining plugin v2 wording to plugin * admin/plugin: add detectable job type registry helper * admin/plugin: add scheduled detection and dispatch orchestration * admin/plugin: prefetch job type descriptors when workers connect * admin/plugin: add known job type discovery API and UI * admin/plugin: refresh design doc to match current implementation * admin/plugin: enforce per-worker scheduler concurrency limits * admin/plugin: use descriptor runtime defaults for scheduler policy * admin/ui: auto-load first known plugin job type on page open * admin/plugin: bootstrap persisted config from descriptor defaults * admin/plugin: dedupe scheduled proposals by dedupe key * admin/ui: add job type and state filters for plugin monitoring * admin/ui: add per-job-type plugin activity summary * admin/plugin: split descriptor read API from schema refresh * admin/ui: keep plugin summary metrics global while tables are filtered * admin/plugin: retry executor reservation before timing out * admin/plugin: expose scheduler states for monitoring * admin/ui: show per-job-type scheduler states in plugin monitor * pb/plugin: rename protobuf package to plugin * admin/plugin: rename pluginRuntime wiring to plugin * admin/plugin: remove runtime naming from plugin APIs and UI * admin/plugin: rename runtime files to plugin naming * admin/plugin: persist jobs and activities for monitor recovery * admin/plugin: lease one detector worker per job type * admin/ui: show worker load from plugin heartbeats * admin/plugin: skip stale workers for detector and executor picks * plugin/worker: add plugin worker command and stream runtime scaffold * plugin/worker: implement vacuum detect and execute handlers * admin/plugin: document external vacuum plugin worker starter * command: update plugin.worker help to reflect implemented flow * command/admin: drop legacy Plugin V2 label * plugin/worker: validate vacuum job type and respect min interval * plugin/worker: test no-op detect when min interval not elapsed * command/admin: document plugin.worker external process * plugin/worker: advertise configured concurrency in hello * command/plugin.worker: add jobType handler selection * command/plugin.worker: test handler selection by job type * command/plugin.worker: persist worker id in workingDir * admin/plugin: document plugin.worker jobType and workingDir flags * plugin/worker: support cancel request for in-flight work * plugin/worker: test cancel request acknowledgements * command/plugin.worker: document workingDir and jobType behavior * plugin/worker: emit executor activity events for monitor * plugin/worker: test executor activity builder * admin/plugin: send last successful run in detection request * admin/plugin: send cancel request when detect or execute context ends * admin/plugin: document worker cancel request responsibility * admin/handlers: expose plugin scheduler states API in no-auth mode * admin/handlers: test plugin scheduler states route registration * admin/plugin: keep worker id on worker-generated activity records * admin/plugin: test worker id propagation in monitor activities * admin/dash: always initialize plugin service * command/admin: remove plugin enable flags and default to enabled * admin/dash: drop pluginEnabled constructor parameter * admin/plugin UI: stop checking plugin enabled state * admin/plugin: remove docs for plugin enable flags * admin/dash: remove unused plugin enabled check method * admin/dash: fallback to in-memory plugin init when dataDir fails * admin/plugin API: expose worker gRPC port in status * command/plugin.worker: resolve admin gRPC port via plugin status * split plugin UI into overview/configuration/monitoring pages * Update layout_templ.go * add volume_balance plugin worker handler * wire plugin.worker CLI for volume_balance job type * add erasure_coding plugin worker handler * wire plugin.worker CLI for erasure_coding job type * support multi-job handlers in plugin worker runtime * allow plugin.worker jobType as comma-separated list * admin/plugin UI: rename to Workers and simplify config view * plugin worker: queue detection requests instead of capacity reject * Update plugin_worker.go * plugin volume_balance: remove force_move/timeout from worker config UI * plugin erasure_coding: enforce local working dir and cleanup * admin/plugin UI: rename admin settings to job scheduling * admin/plugin UI: persist and robustly render detection results * admin/plugin: record and return detection trace metadata * admin/plugin UI: show detection process and decision trace * plugin: surface detector decision trace as activities * mini: start a plugin worker by default * admin/plugin UI: split monitoring into detection and execution tabs * plugin worker: emit detection decision trace for EC and balance * admin workers UI: split monitoring into detection and execution pages * plugin scheduler: skip proposals for active assigned/running jobs * admin workers UI: add job queue tab * plugin worker: add dummy stress detector and executor job type * admin workers UI: reorder tabs to detection queue execution * admin workers UI: regenerate plugin template * plugin defaults: include dummy stress and add stress tests * plugin dummy stress: rotate detection selections across runs * plugin scheduler: remove cross-run proposal dedupe * plugin queue: track pending scheduled jobs * plugin scheduler: wait for executor capacity before dispatch * plugin scheduler: skip detection when waiting backlog is high * plugin: add disk-backed job detail API and persistence * admin ui: show plugin job detail modal from job id links * plugin: generate unique job ids instead of reusing proposal ids * plugin worker: emit heartbeats on work state changes * plugin registry: round-robin tied executor and detector picks * add temporary EC overnight stress runner * plugin job details: persist and render EC execution plans * ec volume details: color data and parity shard badges * shard labels: keep parity ids numeric and color-only distinction * admin: remove legacy maintenance UI routes and templates * admin: remove dead maintenance endpoint helpers * Update layout_templ.go * remove dummy_stress worker and command support * refactor plugin UI to job-type top tabs and sub-tabs * migrate weed worker command to plugin runtime * remove plugin.worker command and keep worker runtime with metrics * update helm worker args for jobType and execution flags * set plugin scheduling defaults to global 16 and per-worker 4 * stress: fix RPC context reuse and remove redundant variables in ec_stress_runner * admin/plugin: fix lifecycle races, safe channel operations, and terminal state constants * admin/dash: randomize job IDs and fix priority zero-value overwrite in plugin API * admin/handlers: implement buffered rendering to prevent response corruption * admin/plugin: implement debounced persistence flusher and optimize BuildJobDetail memory lookups * admin/plugin: fix priority overwrite and implement bounded wait in scheduler reserve * admin/plugin: implement atomic file writes and fix run record side effects * admin/plugin: use P prefix for parity shard labels in execution plans * admin/plugin: enable parallel execution for cancellation tests * admin: refactor time.Time fields to pointers for better JSON omitempty support * admin/plugin: implement pointer-safe time assignments and comparisons in plugin core * admin/plugin: fix time assignment and sorting logic in plugin monitor after pointer refactor * admin/plugin: update scheduler activity tracking to use time pointers * admin/plugin: fix time-based run history trimming after pointer refactor * admin/dash: fix JobSpec struct literal in plugin API after pointer refactor * admin/view: add D/P prefixes to EC shard badges for UI consistency * admin/plugin: use lifecycle-aware context for schema prefetching * Update ec_volume_details_templ.go * admin/stress: fix proposal sorting and log volume cleanup errors * stress: refine ec stress runner with math/rand and collection name - Added Collection field to VolumeEcShardsDeleteRequest for correct filename construction. - Replaced crypto/rand with seeded math/rand PRNG for bulk payloads. - Added documentation for EcMinAge zero-value behavior. - Added logging for ignored errors in volume/shard deletion. * admin: return internal server error for plugin store failures Changed error status code from 400 Bad Request to 500 Internal Server Error for failures in GetPluginJobDetail to correctly reflect server-side errors. * admin: implement safe channel sends and graceful shutdown sync - Added sync.WaitGroup to Plugin struct to manage background goroutines. - Implemented safeSendCh helper using recover() to prevent panics on closed channels. - Ensured Shutdown() waits for all background operations to complete. * admin: robustify plugin monitor with nil-safe time and record init - Standardized nil-safe assignment for time.Time pointers (CreatedAt, UpdatedAt, CompletedAt). - Ensured persistJobDetailSnapshot initializes new records correctly if they don't exist on disk. - Fixed debounced persistence to trigger immediate write on job completion. admin: improve scheduler shutdown behavior and logic guards - Replaced brittle error string matching with explicit r.shutdownCh selection for shutdown detection. - Removed redundant nil guard in buildScheduledJobSpec. - Standardized WaitGroup usage for schedulerLoop. * admin: implement deep copy for job parameters and atomic write fixes - Implemented deepCopyGenericValue and used it in cloneTrackedJob to prevent shared state. - Ensured atomicWriteFile creates parent directories before writing. * admin: remove unreachable branch in shard classification Removed an unreachable 'totalShards <= 0' check in classifyShardID as dataShards and parityShards are already guarded. * admin: secure UI links and use canonical shard constants - Added rel="noopener noreferrer" to external links for security. - Replaced magic number 14 with erasure_coding.TotalShardsCount. - Used renderEcShardBadge for missing shard list consistency. * admin: stabilize plugin tests and fix regressions - Composed a robust plugin_monitor_test.go to handle asynchronous persistence. - Updated all time.Time literals to use timeToPtr helper. - Added explicit Shutdown() calls in tests to synchronize with debounced writes. - Fixed syntax errors and orphaned struct literals in tests. * Potential fix for code scanning alert no. 278: Slice memory allocation with excessive size value Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * Potential fix for code scanning alert no. 283: Uncontrolled data used in path expression Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * admin: finalize refinements for error handling, scheduler, and race fixes - Standardized HTTP 500 status codes for store failures in plugin_api.go. - Tracked scheduled detection goroutines with sync.WaitGroup for safe shutdown. - Fixed race condition in safeSendDetectionComplete by extracting channel under lock. - Implemented deep copy for JobActivity details. - Used defaultDirPerm constant in atomicWriteFile. * test(ec): migrate admin dockertest to plugin APIs * admin/plugin_api: fix RunPluginJobTypeAPI to return 500 for server-side detection/filter errors * admin/plugin_api: fix ExecutePluginJobAPI to return 500 for job execution failures * admin/plugin_api: limit parseProtoJSONBody request body to 1MB to prevent unbounded memory usage * admin/plugin: consolidate regex to package-level validJobTypePattern; add char validation to sanitizeJobID * admin/plugin: fix racy Shutdown channel close with sync.Once * admin/plugin: track sendLoop and recv goroutines in WorkerStream with r.wg * admin/plugin: document writeProtoFiles atomicity — .pb is source of truth, .json is human-readable only * admin/plugin: extract activityLess helper to deduplicate nil-safe OccurredAt sort comparators * test/ec: check http.NewRequest errors to prevent nil req panics * test/ec: replace deprecated ioutil/math/rand, fix stale step comment 5.1→3.1 * plugin(ec): raise default detection and scheduling throughput limits * topology: include empty disks in volume list and EC capacity fallback * topology: remove hard 10-task cap for detection planning * Update ec_volume_details_templ.go * adjust default * fix tests --------- Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>	3 weeks ago
Chris Lu	f44e25b422	fix(iam): ensure access key status is persisted and defaulted to Active (#8341 ) * Fix master leader election startup issue Fixes #error-log-leader-not-selected-yet * not useful test * fix(iam): ensure access key status is persisted and defaulted to Active * make pb * update tests * using constants	3 weeks ago
Chris Lu	1b2f719d7c	admin: fix file browser items-per-page selector (#8291 ) * admin: fix file browser page size selector Fix file browser pagination page-size selectors to use explicit select IDs instead of this.value in templ-generated handlers, which could resolve to undefined and produce limit=undefined in requests. Add a focused template render regression test to prevent this from recurring. Fixes #8284 * revert file browser template regression test	4 weeks ago
Chris Lu	db76eb26e7	compile	4 weeks ago
Chris Lu	aef2de3109	s3tables: support multi-level namespaces in parser/admin paths (#8273 ) * s3tables: support multi-level namespace normalization * admin: handle namespace parsing errors centrally * admin: clean namespace validation duplication	4 weeks ago
Chris Lu	be26ce74ce	s3tables: support multi-level namespace normalization	4 weeks ago
Chris Lu	5a0204310c	Add Iceberg admin UI (#8246 ) * Add Iceberg table details view * Enhance Iceberg catalog browsing UI * Fix Iceberg UI security and logic issues - Fix selectSchema() and partitionFieldsFromFullMetadata() to always search for matching IDs instead of checking != 0 - Fix snapshotsFromFullMetadata() to defensive-copy before sorting to prevent mutating caller's slice - Fix XSS vulnerabilities in s3tables.js: replace innerHTML with textContent/createElement for user-controlled data - Fix deleteIcebergTable() to redirect to namespace tables list on details page instead of reloading - Fix data-bs-target in iceberg_namespaces.templ: remove templ.SafeURL for CSS selector - Add catalogName to delete modal data attributes for proper redirect - Remove unused hidden inputs from create table form (icebergTableBucketArn, icebergTableNamespace) * Regenerate templ files for Iceberg UI updates * Support complex Iceberg type objects in schema Change Type field from string to json.RawMessage in both IcebergSchemaFieldInfo and internal icebergSchemaField to properly handle Iceberg spec's complex type objects (e.g. {"type": "struct", "fields": [...]}). Currently test data only shows primitive string types, but this change makes the implementation defensively robust for future complex types by preserving the exact JSON representation. Add typeToString() helper and update schema extraction functions to marshal string types as JSON. Update template to convert json.RawMessage to string for display. * Regenerate templ files for Type field changes * templ * Fix additional Iceberg UI issues from code review - Fix lazy-load flag that was set before async operation completed, preventing retries on error; now sets loaded flag only after successful load and throws error to caller for proper error handling and UI updates - Add zero-time guards for CreatedAt and ModifiedAt fields in table details to avoid displaying Go zero-time values; render dash when time is zero - Add URL path escaping for all catalog/namespace/table names in URLs to prevent malformed URLs when names contain special characters like /, ?, or # - Remove redundant innerHTML clear in loadIcebergNamespaceTables that cleared twice before appending the table list - Fix selectSnapshotForMetrics to remove != 0 guard for consistency with selectSchema fix; now always searches for CurrentSnapshotID without zero-value gate - Enhance typeToString() helper to display '(complex)' for non-primitive JSON types * Regenerate templ files for Phase 3 updates * Fix template generation to use correct file paths Run templ generate from repo root instead of weed/admin directory to ensure generated _templ.go files have correct absolute paths in error messages (e.g., 'weed/admin/view/app/iceberg_table_details.templ' instead of 'app/iceberg_table_details.templ'). This ensures both 'make admin-generate' at repo root and 'make generate' in weed/admin directory produce identical output with consistent file path references. * Regenerate template files with correct path references * Validate S3 Tables names in UI - Add client-side validation for table bucket and namespace names to surface errors for invalid characters (dots/underscores) before submission - Use HTML validity messages with reportValidity for immediate feedback - Update namespace helper text to reflect actual constraints (single-level, lowercase letters, numbers, and underscores) * Regenerate templ files for namespace helper text * Fix Iceberg catalog REST link and actions * Disallow S3 object access on table buckets * Validate Iceberg layout for table bucket objects * Fix REST API link to /v1/config * merge iceberg page with table bucket page * Allowed Trino/Iceberg stats files in metadata validation * fixes - Backend/data handling: - Normalized Iceberg type display and fallback handling in weed/admin/dash/s3tables_management.go. - Fixed snapshot fallback pointer semantics in weed/admin/dash/s3tables_management.go. - Added CSRF token generation/propagation/validation for namespace create/delete in: - weed/admin/dash/csrf.go - weed/admin/dash/auth_middleware.go - weed/admin/dash/middleware.go - weed/admin/dash/s3tables_management.go - weed/admin/view/layout/layout.templ - weed/admin/static/js/s3tables.js - UI/template fixes: - Zero-time guards for CreatedAt fields in: - weed/admin/view/app/iceberg_namespaces.templ - weed/admin/view/app/iceberg_tables.templ - Fixed invalid templ-in-script interpolation and host/port rendering in: - weed/admin/view/app/iceberg_catalog.templ - weed/admin/view/app/s3tables_buckets.templ - Added data-catalog-name consistency on Iceberg delete action in weed/admin/view/app/iceberg_tables.templ. - Updated retry wording in weed/admin/static/js/s3tables.js. - Regenerated all affected _templ.go files. - S3 API/comment follow-ups: - Reused cached table-bucket validator in weed/s3api/bucket_paths.go. - Added validation-failure debug logging in weed/s3api/s3api_object_handlers_tagging.go. - Added multipart path-validation design comment in weed/s3api/s3api_object_handlers_multipart.go. - Build tooling: - Fixed templ generate working directory issues in weed/admin/Makefile (watch + pattern rule). * populate data * test/s3tables: harden populate service checks * admin: skip table buckets in object-store bucket list * admin sidebar: move object store to top-level links * admin iceberg catalog: guard zero times and escape links * admin forms: add csrf/error handling and client-side name validation * admin s3tables: fix namespace delete modal redeclaration * admin: replace native confirm dialogs with modal helpers * admin modal-alerts: remove noisy confirm usage console log * reduce logs * test/s3tables: use partitioned tables in trino and spark populate * admin file browser: normalize filer ServerAddress for HTTP parsing	4 weeks ago
Andrii Bratanin	aba42419be	Fix tip message in maintenance_workers.templ (#8245 )	4 weeks ago
Chris Lu	3bb9493a5b	Enhance Iceberg catalog browsing UI	4 weeks ago
Chris Lu	d9e3fb2b8e	Add Iceberg table details view	4 weeks ago
Chris Lu	e6ee293c17	Add table operations test (#8241 ) * Add Trino blog operations test * Update test/s3tables/catalog_trino/trino_blog_operations_test.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * feat: add table bucket path helpers and filer operations - Add table object root and table location mapping directories - Implement ensureDirectory, upsertFile, deleteEntryIfExists helpers - Support table location bucket mapping for S3 access * feat: manage table bucket object roots on creation/deletion - Create .objects directory for table buckets on creation - Clean up table object bucket paths on deletion - Enable S3 operations on table bucket object roots * feat: add table location mapping for Iceberg REST - Track table location bucket mappings when tables are created/updated/deleted - Enable location-based routing for S3 operations on table data * feat: route S3 operations to table bucket object roots - Route table-s3 bucket names to mapped table paths - Route table buckets to object root directories - Support table location bucket mapping lookup * feat: emit table-s3 locations from Iceberg REST - Generate unique table-s3 bucket names with UUID suffix - Store table metadata under table bucket paths - Return table-s3 locations for Trino compatibility * fix: handle missing directories in S3 list operations - Propagate ErrNotFound from ListEntries for non-existent directories - Treat missing directories as empty results for list operations - Fixes Trino non-empty location checks on table creation * test: improve Trino CSV parsing for single-value results - Sanitize Trino output to skip jline warnings - Handle single-value CSV results without header rows - Strip quotes from numeric values in tests * refactor: use bucket path helpers throughout S3 API - Replace direct bucket path operations with helper functions - Leverage centralized table bucket routing logic - Improve maintainability with consistent path resolution * fix: add table bucket cache and improve filer error handling - Cache table bucket lookups to reduce filer overhead on repeated checks - Use filer_pb.CreateEntry and filer_pb.UpdateEntry helpers to check resp.Error - Fix delete order in handler_bucket_get_list_delete: delete table object before directory - Make location mapping errors best-effort: log and continue, don't fail API - Update table location mappings to delete stale prior bucket mappings on update - Add 1-second sleep before timestamp time travel query to ensure timestamps are in past - Fix CSV parsing: examine all lines, not skip first; handle single-value rows * fix: properly handle stale metadata location mapping cleanup - Capture oldMetadataLocation before mutation in handleUpdateTable - Update updateTableLocationMapping to accept both old and new locations - Use passed-in oldMetadataLocation to detect location changes - Delete stale mapping only when location actually changes - Pass empty string for oldLocation in handleCreateTable (new tables have no prior mapping) - Improve logging to show old -> new location transitions * refactor: cleanup imports and cache design - Remove unused 'sync' import from bucket_paths.go - Use filer_pb.UpdateEntry helper in setExtendedAttribute and deleteExtendedAttribute for consistent error handling - Add dedicated tableBucketCache map[string]bool to BucketRegistry instead of mixing concerns with metadataCache - Improve cache separation: table buckets cache is now separate from bucket metadata cache * fix: improve cache invalidation and add transient error handling Cache invalidation (critical fix): - Add tableLocationCache to BucketRegistry for location mapping lookups - Clear tableBucketCache and tableLocationCache in RemoveBucketMetadata - Prevents stale cache entries when buckets are deleted/recreated Transient error handling: - Only cache table bucket lookups when conclusive (found or ErrNotFound) - Skip caching on transient errors (network, permission, etc) - Prevents marking real table buckets as non-table due to transient failures Performance optimization: - Cache tableLocationDir results to avoid repeated filer RPCs on hot paths - tableLocationDir now checks cache before making expensive filer lookups - Cache stores empty string for 'not found' to avoid redundant lookups Code clarity: - Add comment to deleteDirectory explaining DeleteEntry response lacks Error field * go fmt * fix: mirror transient error handling in tableLocationDir and optimize bucketDir Transient error handling: - tableLocationDir now only caches definitive results - Mirrors isTableBucket behavior to prevent treating transient errors as permanent misses - Improves reliability on flaky systems or during recovery Performance optimization: - bucketDir avoids redundant isTableBucket call via bucketRoot - Directly use s3a.option.BucketsPath for regular buckets - Saves one cache lookup for every non-table bucket operation * fix: revert bucketDir optimization to preserve bucketRoot logic The optimization to directly use BucketsPath bypassed bucketRoot's logic and caused issues with S3 list operations on delimiter+prefix cases. Revert to using path.Join(s3a.bucketRoot(bucket), bucket) which properly handles all bucket types and ensures consistent path resolution across the codebase. The slight performance cost of an extra cache lookup is worth the correctness and consistency benefits. * feat: move table buckets under /buckets Add a table-bucket marker attribute, reuse bucket metadata cache for table bucket detection, and update list/validation/UI/test paths to treat table buckets as /buckets entries. * Fix S3 Tables code review issues - handler_bucket_create.go: Fix bucket existence check to properly validate entryResp.Entry before setting s3BucketExists flag (nil Entry should not indicate existing bucket) - bucket_paths.go: Add clarifying comment to bucketRoot() explaining unified buckets root path for all bucket types - file_browser_data.go: Optimize by extracting table bucket check early to avoid redundant WithFilerClient call * Fix list prefix delimiter handling * Handle list errors conservatively * Fix Trino FOR TIMESTAMP query - use past timestamp Iceberg requires the timestamp to be strictly in the past. Use current_timestamp - interval '1' second instead of current_timestamp. --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	1 month ago
Chris Lu	c284e51d20	fix: multipart upload ETag calculation (#8238 ) * fix multipart etag * address comments * clean up * clean up * optimization * address comments * unquoted etag * dedup * upgrade * clean * etag * return quoted tag * quoted etag * debug * s3api: unify ETag retrieval and quoting across handlers Refactor newListEntry to take S3ApiServer and use getObjectETag, and update setResponseHeaders to use the same logic. This ensures consistent ETags are returned for both listing and direct access. s3api: implement ListObjects deduplication for versioned buckets Handle duplicate entries between the main path and the .versions directory by prioritizing the latest version when bucket versioning is enabled. * s3api: cleanup stale main file entries during versioned uploads Add explicit deletion of pre-existing "main" files when creating new versions in versioned buckets. This prevents stale entries from appearing in bucket listings and ensures consistency. * s3api: fix cleanup code placement in versioned uploads Correct the placement of rm calls in completeMultipartUpload and putVersionedObject to ensure stale main files are properly deleted during versioned uploads. * s3api: improve getObjectETag fallback for empty ExtETagKey Ensure that when ExtETagKey exists but contains an empty value, the function falls through to MD5/chunk-based calculation instead of returning an empty string. * s3api: fix test files for new newListEntry signature Update test files to use the new newListEntry signature where the first parameter is S3ApiServer. Created mockS3ApiServer to properly test owner display name lookup functionality. s3api: use filer.ETag for consistent Md5 handling in getEtagFromEntry Change getEtagFromEntry fallback to use filer.ETag(entry) instead of filer.ETagChunks to ensure legacy entries with Attributes.Md5 are handled consistently with the rest of the codebase. * s3api: optimize list logic and fix conditional header logging - Hoist bucket versioning check out of per-entry callback to avoid repeated getVersioningState calls - Extract appendOrDedup helper function to eliminate duplicate dedup/append logic across multiple code paths - Change If-Match mismatch logging from glog.Errorf to glog.V(3).Infof and remove DEBUG prefix for consistency * s3api: fix test mock to properly initialize IAM accounts Fixed nil pointer dereference in TestNewListEntryOwnerDisplayName by directly initializing the IdentityAccessManagement.accounts map in the test setup. This ensures newListEntry can properly look up account display names without panicking. * cleanup * s3api: remove premature main file cleanup in versioned uploads Removed incorrect cleanup logic that was deleting main files during versioned uploads. This was causing test failures because it deleted objects that should have been preserved as null versions when versioning was first enabled. The deduplication logic in listing is sufficient to handle duplicate entries without deleting files during upload. * s3api: add empty-value guard to getEtagFromEntry Added the same empty-value guard used in getObjectETag to prevent returning quoted empty strings. When ExtETagKey exists but is empty, the function now falls through to filer.ETag calculation instead of returning "". * s3api: fix listing of directory key objects with matching prefix Revert prefix handling logic to use strings.TrimPrefix instead of checking HasPrefix with empty string result. This ensures that when a directory key object exactly matches the prefix (e.g. prefix="dir/", object="dir/"), it is correctly handled as a regular entry instead of being skipped or incorrectly processed as a common prefix. Also fixed missing variable definition. * s3api: refactor list inline dedup to use appendOrDedup helper Refactored the inline deduplication logic in listFilerEntries to use the shared appendOrDedup helper function. This ensures consistent behavior and reduces code duplication. * test: fix port allocation race in s3tables integration test Updated startMiniCluster to find all required ports simultaneously using findAvailablePorts instead of sequentially. This prevents race conditions where the OS reallocates a port that was just released, causing multiple services (e.g. Filer and Volume) to be assigned the same port and fail to start.	1 month ago
Chris Lu	19c18d827a	admin: fix capacity leak in maintenance system by preserving Task IDs (#8214 ) * admin: fix capacity leak in maintenance system by preserving Task IDs Preserve the original TaskID generated during detection and sync task states (Assign/Complete/Retry) with ActiveTopology. This ensures that capacity reserved during task assignment is properly released when a task completes or fails, preventing 'need 9, have 0' capacity exhaustion. Fixes https://github.com/seaweedfs/seaweedfs/issues/8202 * Update weed/admin/maintenance/maintenance_queue.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update weed/admin/maintenance/maintenance_queue.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * test: rename ActiveTopologySync to TaskIDPreservation Rename the test case to more accurately reflect its scope, as suggested by the code review bot. * Add TestMaintenanceQueue_ActiveTopologySync to verify task state synchronization and capacity management * Implement task assignment rollback and add verification test * Enhance ActiveTopology.CompleteTask to support pending tasks * Populate storage impact in MaintenanceIntegration.SyncTask * Release capacity in RemoveStaleWorkers when worker becomes unavailable * Release capacity in MaintenanceManager.CancelTask when pending task is cancelled * Sync reloaded tasks with ActiveTopology in LoadTasksFromPersistence * Add verification tests for consistent capacity management lifecycle * Add TestMaintenanceQueue_RetryCapacitySync to verify capacity tracking during retries --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	1 month ago
Chris Lu	000e2bd4a9	logging and debugging	1 month ago
Chris Lu	72a8f598f2	Fix Maintenance Task Sorting and Refactor Log Persistence (#8199 ) * fix float stepping * do not auto refresh * only logs when non 200 status * fix maintenance task sorting and cleanup redundant handler logic * Refactor log retrieval to persist to disk and fix slowness - Move log retrieval to disk-based persistence in GetMaintenanceTaskDetail - Implement background log fetching on task completion in worker_grpc_server.go - Implement async background refresh for in-progress tasks - Completely remove blocking gRPC calls from the UI path to fix 10s timeouts - Cleanup debug logs and performance profiling code * Ensure consistent deterministic sorting in config_persistence cleanup * Replace magic numbers with constants and remove debug logs - Added descriptive constants for truncation limits and timeouts in admin_server.go and worker_grpc_server.go - Replaced magic numbers with these constants throughout the codebase - Verified removal of stdout debug printing - Ensured consistent truncation logic during log persistence * Address code review feedback on history truncation and logging logic - Fix AssignmentHistory double-serialization by copying task in GetMaintenanceTaskDetail - Fix handleTaskCompletion logging logic (mutually exclusive success/failure logs) - Remove unused Timeout field from LogRequestContext and sync select timeouts with constants - Ensure AssignmentHistory is only provided in the top-level field for better JSON structure * Implement goroutine leak protection and request deduplication - Add request deduplication in RequestTaskLogs to prevent multiple concurrent fetches for the same task - Implement safe cleanup in timeout handlers to avoid race conditions in pendingLogRequests map - Add a 10s cooldown for background log refreshes in GetMaintenanceTaskDetail to prevent spamming - Ensure all persistent log-fetching goroutines are bounded and efficiently managed * Fix potential nil pointer panics in maintenance handlers - Add nil checks for adminServer in ShowTaskDetail, ShowMaintenanceWorkers, and UpdateTaskConfig - Update getMaintenanceQueueData to return a descriptive error instead of nil when adminServer is uninitialized - Ensure internal helper methods consistently check for adminServer initialization before use * Strictly enforce disk-only log reading - Remove background log fetching from GetMaintenanceTaskDetail to prevent timeouts and network calls during page view - Remove unused lastLogFetch tracking fields to clean up dead code - Ensure logs are only updated upon task completion via handleTaskCompletion * Refactor GetWorkerLogs to read from disk - Update /api/maintenance/workers/:id/logs endpoint to use configPersistence.LoadTaskExecutionLogs - Remove synchronous gRPC call RequestTaskLogs to prevent timeouts and bad gateway errors - Ensure consistent log retrieval behavior across the application (disk-only) * Fix timestamp parsing in log viewer - Update task_detail.templ JS to handle both ISO 8601 strings and Unix timestamps - Fix "Invalid time value" error when displaying logs fetched from disk - Regenerate templates * master: fallback to HDD if SSD volumes are full in Assign * worker: improve EC detection logging and fix skip counters * worker: add Sync method to TaskLogger interface * worker: implement Sync and ensure logs are flushed before task completion * admin: improve task log retrieval with retries and better timeouts * admin: robust timestamp parsing in task detail view	1 month ago
Chris Lu	746df25164	fix formatting	1 month ago
Chris Lu	1c1f80358f	templ	1 month ago
Chris Lu	2bb21ea276	feat: Add Iceberg REST Catalog server and admin UI (#8175 ) * feat: Add Iceberg REST Catalog server Implement Iceberg REST Catalog API on a separate port (default 8181) that exposes S3 Tables metadata through the Apache Iceberg REST protocol. - Add new weed/s3api/iceberg package with REST handlers - Implement /v1/config endpoint returning catalog configuration - Implement namespace endpoints (list/create/get/head/delete) - Implement table endpoints (list/create/load/head/delete/update) - Add -port.iceberg flag to S3 standalone server (s3.go) - Add -s3.port.iceberg flag to combined server mode (server.go) - Add -s3.port.iceberg flag to mini cluster mode (mini.go) - Support prefix-based routing for multiple catalogs The Iceberg REST server reuses S3 Tables metadata storage under /table-buckets and enables DuckDB, Spark, and other Iceberg clients to connect to SeaweedFS as a catalog. * feat: Add Iceberg Catalog pages to admin UI Add admin UI pages to browse Iceberg catalogs, namespaces, and tables. - Add Iceberg Catalog menu item under Object Store navigation - Create iceberg_catalog.templ showing catalog overview with REST info - Create iceberg_namespaces.templ listing namespaces in a catalog - Create iceberg_tables.templ listing tables in a namespace - Add handlers and routes in admin_handlers.go - Add Iceberg data provider methods in s3tables_management.go - Add Iceberg data types in types.go The Iceberg Catalog pages provide visibility into the same S3 Tables data through an Iceberg-centric lens, including REST endpoint examples for DuckDB and PyIceberg. * test: Add Iceberg catalog integration tests and reorg s3tables tests - Reorganize existing s3tables tests to test/s3tables/table-buckets/ - Add new test/s3tables/catalog/ for Iceberg REST catalog tests - Add TestIcebergConfig to verify /v1/config endpoint - Add TestIcebergNamespaces to verify namespace listing - Add TestDuckDBIntegration for DuckDB connectivity (requires Docker) - Update CI workflow to use new test paths * fix: Generate proper random UUIDs for Iceberg tables Address code review feedback: - Replace placeholder UUID with crypto/rand-based UUID v4 generation - Add detailed TODO comments for handleUpdateTable stub explaining the required atomic metadata swap implementation * fix: Serve Iceberg on localhost listener when binding to different interface Address code review feedback: properly serve the localhost listener when the Iceberg server is bound to a non-localhost interface. * ci: Add Iceberg catalog integration tests to CI Add new job to run Iceberg catalog tests in CI, along with: - Iceberg package build verification - Iceberg unit tests - Iceberg go vet checks - Iceberg format checks * fix: Address code review feedback for Iceberg implementation - fix: Replace hardcoded account ID with s3_constants.AccountAdminId in buildTableBucketARN() - fix: Improve UUID generation error handling with deterministic fallback (timestamp + PID + counter) - fix: Update handleUpdateTable to return HTTP 501 Not Implemented instead of fake success - fix: Better error handling in handleNamespaceExists to distinguish 404 from 500 errors - fix: Use relative URL in template instead of hardcoded localhost:8181 - fix: Add HTTP timeout to test's waitForService function to avoid hangs - fix: Use dynamic ephemeral ports in integration tests to avoid flaky parallel failures - fix: Add Iceberg port to final port configuration logging in mini.go * fix: Address critical issues in Iceberg implementation - fix: Cache table UUIDs to ensure persistence across LoadTable calls The UUID now remains stable for the lifetime of the server session. TODO: For production, UUIDs should be persisted in S3 Tables metadata. - fix: Remove redundant URL-encoded namespace parsing mux router already decodes %1F to \x1F before passing to handlers. Redundant ReplaceAll call could cause bugs with literal %1F in namespace. * fix: Improve test robustness and reduce code duplication - fix: Make DuckDB test more robust by failing on unexpected errors Instead of silently logging errors, now explicitly check for expected conditions (extension not available) and skip the test appropriately. - fix: Extract username helper method to reduce duplication Created getUsername() helper in AdminHandlers to avoid duplicating the username retrieval logic across Iceberg page handlers. * fix: Add mutex protection to table UUID cache Protects concurrent access to the tableUUIDs map with sync.RWMutex. Uses read-lock for fast path when UUID already cached, and write-lock for generating new UUIDs. Includes double-check pattern to handle race condition between read-unlock and write-lock. * style: fix go fmt errors * feat(iceberg): persist table UUID in S3 Tables metadata * feat(admin): configure Iceberg port in Admin UI and commands * refactor: address review comments (flags, tests, handlers) - command/mini: fix tracking of explicit s3.port.iceberg flag - command/admin: add explicit -iceberg.port flag - admin/handlers: reuse getUsername helper - tests: use 127.0.0.1 for ephemeral ports and os.Stat for file size check * test: check error from FileStat in verify_gc_empty_test	1 month ago
Chris Lu	3b9e367c1a	templ	1 month ago
Chris Lu	2ee6e4f391	mount: refresh and evict hot dir cache (#8174 ) * mount: refresh and evict hot dir cache * mount: guard dir update window and extend TTL * mount: reuse timestamp for cache mark * Apply suggestion from @gemini-code-assist[bot] Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * mount: make dir cache tuning configurable * mount: dedupe dir update notices * mount: restore invalidate-all cache helper * mount: keep hot dir tuning constants * mount: centralize cache state reset * mount: mark refresh completion time * mount: allow disabling idle eviction --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	1 month ago
Chris Lu	79722bcf30	Add s3tables shell and admin UI (#8172 ) * Add shared s3tables manager * Add s3tables shell commands * Add s3tables admin API * Add s3tables admin UI * Fix admin s3tables namespace create * Rename table buckets menu * Centralize s3tables tag validation * Reuse s3tables manager in admin * Extract s3tables list limit * Add s3tables bucket ARN helper * Remove write middleware from s3tables APIs * Fix bucket link and policy hint * Fix table tag parsing and nav link * Disable namespace table link on invalid ARN * Improve s3tables error decode * Return flag parse errors for s3tables tag * Accept query params for namespace create * Bind namespace create form data * Read s3tables JS data from DOM * s3tables: allow empty region ARN * shell: pass s3tables account id * shell: require account for table buckets * shell: use bucket name for namespaces * shell: use bucket name for tables * shell: use bucket name for tags * admin: add table buckets links in file browser * s3api: reuse s3tables tag validation * admin: harden s3tables UI handlers * fix admin list table buckets * allow admin s3tables access * validate s3tables bucket tags * log s3tables bucket metadata errors * rollback table bucket on owner failure * show s3tables bucket owner * add s3tables iam conditions * Add s3tables user permissions UI * Authorize s3tables using identity actions * Add s3tables permissions to user modal * Disambiguate bucket scope in user permissions * Block table bucket names that match S3 buckets * Pretty-print IAM identity JSON * Include tags in s3tables permission context * admin: refactor S3 Tables inline JavaScript into a separate file * s3tables: extend IAM policy condition operators support * shell: use LookupEntry wrapper for s3tables bucket conflict check * admin: handle buildBucketPermissions validation in create/update flows	1 month ago
MorezMartin	20952aa514	Fix jwt error in admin UI (#8140 ) * add jwt token in weed admin headers requests * add jwt token to header for download * :s/upload/download * filer_signing.read despite of filer_signing key * finalize filer_browser_handlers.go * admin: add JWT authorization to file browser handlers * security: fix typos in JWT read validation descriptions * Move security.toml to example and secure keys * security: address PR feedback on JWT enforcement and example keys * security: refactor JWT logic and improve example keys readability * Update docker/Dockerfile.local Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Chris Lu <chris.lu@gmail.com> Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	1 month ago
Alasdair Macmillan	41d079a316	Fix Javascript merge issue and UI worker detail display bug (#8135 ) * Fix previous merge issues in Javascript Signed-off-by: Alasdair Macmillan <aimmac23@gmail.com> * Fix issue where worker detail doesn't display without tasks --------- Signed-off-by: Alasdair Macmillan <aimmac23@gmail.com>	1 month ago
Chris Lu	c5b53397c6	templ	1 month ago
Chris Lu	5a7c74feac	migrate IAM policies to multi-file storage (#8114 ) * Add IAM gRPC service definition - Add GetConfiguration/PutConfiguration for config management - Add CreateUser/GetUser/UpdateUser/DeleteUser/ListUsers for user management - Add CreateAccessKey/DeleteAccessKey/GetUserByAccessKey for access key management - Methods mirror existing IAM HTTP API functionality * Add IAM gRPC handlers on filer server - Implement IamGrpcServer with CredentialManager integration - Handle configuration get/put operations - Handle user CRUD operations - Handle access key create/delete operations - All methods delegate to CredentialManager for actual storage * Wire IAM gRPC service to filer server - Add CredentialManager field to FilerOption and FilerServer - Import credential store implementations in filer command - Initialize CredentialManager from credential.toml if available - Register IAM gRPC service on filer gRPC server - Enable credential management via gRPC alongside existing filer services * Regenerate IAM protobuf with gRPC service methods * fix: compilation error in DeleteUser * fix: address code review comments for IAM migration * feat: migrate policies to multi-file layout and fix identity duplicated content * refactor: remove configuration.json and migrate Service Accounts to multi-file layout * refactor: standardize Service Accounts as distinct store entities and fix Admin Server persistence * config: set ServiceAccountsDirectory to /etc/iam/service_accounts * Fix Chrome dialog auto-dismiss with Bootstrap modals - Add modal-alerts.js library with Bootstrap modal replacements - Replace all 15 confirm() calls with showConfirm/showDeleteConfirm - Auto-override window.alert() for all alert() calls - Fixes Chrome 132+ aggressively blocking native dialogs * Upgrade Bootstrap from 5.3.2 to 5.3.8 * Fix syntax error in object_store_users.templ - remove duplicate closing braces * create policy * display errors * migrate to multi-file policies * address PR feedback: use showDeleteConfirm and showErrorMessage in policies.templ, refine migration check * Update policies_templ.go * add service account to iam grpc * iam: fix potential path traversal in policy names by validating name pattern * iam: add GetServiceAccountByAccessKey to CredentialStore interface * iam: implement service account support for PostgresStore Includes full CRUD operations and efficient lookup by access key. * iam: implement GetServiceAccountByAccessKey for filer_etc, grpc, and memory stores Provides efficient lookup of service accounts by access key where possible, with linear scan fallbacks for file-based stores. * iam: remove filer_multiple support Deleted its implementation and references in imports, scaffold config, and core interface constants. Redundant with filer_etc. * clear comment * dash: robustify service account construction - Guard against nil sa.Credential when constructing responses - Fix Expiration logic to only set if > 0, avoiding Unix epoch 1970 - Ensure consistency across Get, Create, and Update handlers * credential/filer_etc: improve error propagation in configuration handlers - Return error from loadServiceAccountsFromMultiFile to callers - Ensure listEntries errors in SaveConfiguration (cleanup logic) are propagated unless they are "not found" failures. - Fixes potential silent failures during IAM configuration sync. * credential/filer_etc: add existence check to CreateServiceAccount Ensures consistency with other stores by preventing accidental overwrite of existing service accounts during creation. * credential/memory: improve store robustness and Reset logic - Enforce ID immutability in UpdateServiceAccount to prevent orphans - Update Reset() to also clear the policies map, ensuring full state cleanup for tests. * dash: improve service account robustness and policy docs - Wrap parent user lookup errors to preserve context - Strictly validate Status field in UpdateServiceAccount - Add deprecation comments to legacy policy management methods * credential/filer_etc: protect against path traversal in service accounts Implemented ID validation (alphanumeric, underscores, hyphens) and applied it to Get, Save, and Delete operations to ensure no directory traversal via saId.json filenames. * credential/postgres: improve robustness and cleanup comments - Removed brainstorming comments in GetServiceAccountByAccessKey - Added missing rows.Err() check during iteration - Properly propagate Scan and Unmarshal errors instead of swallowing them * admin: unify UI alerts and confirmations using Bootstrap modals - Updated modal-alerts.js with improved automated alert type detection - Replaced native alert() and confirm() with showAlert(), showConfirm(), and showDeleteConfirm() across various Templ components - Improved UX for delete operations by providing better context and styling - Ensured consistent error reporting across IAM and Maintenance views * admin: additional UI consistency fixes for alerts and confirmations - Replaced native alert() and confirm() with Bootstrap modals in: - EC volumes (repair flow) - Collection details (repair flow) - File browser (properties and delete) - Maintenance config schema (save and reset) - Improved delete confirmation in file browser with item context - Ensured consistent success/error/info styling for all feedbacks * make * iam: add GetServiceAccountByAccessKey RPC and update GetConfiguration * iam: implement GetServiceAccountByAccessKey on server and client * iam: centralize policy and service account validation * iam: optimize MemoryStore service account lookups with indexing * iam: fix postgres service_accounts table and optimize lookups * admin: refactor modal alerts and clean up dashboard logic * admin: fix EC shards table layout mismatch * admin: URL-encode IAM path parameters for safety * admin: implement pauseWorker logic in maintenance view * iam: add rows.Err() check to postgres ListServiceAccounts * iam: standardize ErrServiceAccountNotFound across credential stores * iam: map ErrServiceAccountNotFound to codes.NotFound in DeleteServiceAccount * iam: refine service account store logic, errors and schema * iam: add validation to GetServiceAccountByAccessKey * admin: refine modal titles and ensure URL safety * admin: address bot review comments for alerts and async usage * iam: fix syntax error by restoring missing function declaration * [FilerEtcStore] improve error handling in CreateServiceAccount Refine error handling to provide clearer messages when checking for existing service accounts. * [PostgresStore] add nil guards and validation to service account methods Ensure input parameters are not nil and required IDs are present to prevent runtime panics and ensure data integrity. * [JS] add shared IAM utility script Consolidate common IAM operations like deleteUser and deleteAccessKey into a shared utility script for better maintainability. * [View] include shared IAM utilities in layout Include iam-utils.js in the main layout to make IAM functions available across all administrative pages. * [View] refactor IAM logic and restore async in EC Shards view Remove redundant local IAM functions and ensure that delete confirmation callbacks are properly marked as async. * [View] consolidate IAM logic in Object Store Users view Remove redundant local definitions of deleteUser and deleteAccessKey, relying on the shared utilities instead. * [View] update generated templ files for UI consistency * credential/postgres: remove redundant name column from service_accounts table The id is already used as the unique identifier and was being copied to the name column. This removes the name column from the schema and updates the INSERT/UPDATE queries. * credential/filer_etc: improve logging for policy migration failures Added Errorf log if AtomicRenameEntry fails during migration to ensure visibility of common failure points. * credential: allow uppercase characters in service account ID username Updated ServiceAccountIdPattern to allow [A-Za-z0-9_-]+ for the username component, matching the actual service account creation logic which uses the parent user name directly. * Update object_store_users_templ.go * admin: fix ec_shards pagination to handle numeric page arguments Updated goToPage in cluster_ec_shards.templ to accept either an Event or a numeric page argument. This prevents errors when goToPage(1) is called directly. Corrected both the .templ source and generated Go code. * credential/filer_etc: improve service account storage robustness Added nil guard to saveServiceAccount, updated GetServiceAccount to return ErrServiceAccountNotFound for empty data, and improved deleteServiceAccount to handle response-level Filer errors.	1 month ago

1 2 3 4

157 Commits (fa7da0f57e6d0ea273cea1e75afb9993d95732d8)