seaweedfs

Commit Graph

Author	SHA1	Message	Date
Chris Lu	0b576dec79	fix: remove unused policyNameFromArn and strings import	3 weeks ago
Chris Lu	85aa6e345b	docs: explain why group methods skip propagateChange Group changes propagate to S3 servers via filer subscription (watching /etc/iam/groups/) rather than gRPC RPCs, since there are no group-specific RPCs in the S3 cache protocol.	3 weeks ago
Chris Lu	7643f18b33	fix: use errors.Is for sql.ErrNoRows comparison in postgres group store	3 weeks ago
Chris Lu	cc2fb6a616	fix: fail group reload on unreadable or corrupt group files Return errors instead of logging and continuing when group files cannot be read or unmarshaled. This prevents silently applying a partial IAM config with missing group memberships or policies.	3 weeks ago
Chris Lu	4f47773cd9	fix: move identity nil check before map lookup and refine hasAttachedPolicies Move the nil check on identity before accessing identity.Name to prevent panic. Also refine hasAttachedPolicies to only consider groups that are enabled and have actual policies attached, so membership in a no-policy group doesn't incorrectly trigger IAM authorization.	3 weeks ago
Chris Lu	afffa9cab9	fix: correct test cleanup ordering for group tests Replace scattered defers with single ordered t.Cleanup in each test to ensure resources are torn down in reverse-creation order: remove membership, detach policies, delete access keys, delete users, delete groups, delete policies. Move bucket cleanup to parent test scope and delete objects before bucket.	3 weeks ago
Chris Lu	e98e2231b0	fix: rollback inline policies on UpdateUser PutPolicies failure If PutPolicies fails after moving inline policies to the new username, restore both the identity name and the inline policies map to their original state to avoid a partial-write window.	3 weeks ago
Chris Lu	d1bb4b6cd0	fix: standalone IAM AttachGroupPolicy uses persisted policy store Check managed policies from GetPolicies() instead of s3cfg.Policies so dynamically created policies are found. Also add duplicate name check to UpdateGroup rename.	3 weeks ago
Chris Lu	4e7302eaa3	fix: distinguish backend errors from missing policies in AttachGroupPolicy Return ServiceFailure for credential manager errors instead of masking them as NoSuchEntity. Also switch ListGroupsForUser to use s3cfg.Groups instead of in-memory reverse index to avoid stale data. Add duplicate name check to UpdateGroup rename.	3 weeks ago
Chris Lu	b709663936	fix: add omitempty to Marker XML tags in group responses Non-truncated responses should not emit an empty <Marker/> element.	3 weeks ago
Chris Lu	81825f9d25	feat: add UpdateGroup handler to standalone IAM API Mirror the embedded IAM UpdateGroup handler in the standalone IAM API for parity.	3 weeks ago
Chris Lu	58964de34c	fix: authenticate raw IAM API calls in group tests The embedded IAM endpoint rejects anonymous requests. Replace callIAMAPI with callIAMAPIAuthenticated that uses JWT bearer token authentication via the test framework.	3 weeks ago
Chris Lu	54dc6b4d40	feat: add UpdateGroup handler to embedded IAM API Add UpdateGroup action to enable/disable groups and rename groups via the IAM API. This is a SeaweedFS extension (not in AWS SDK) used by tests to toggle group disabled status.	3 weeks ago
Chris Lu	66661de746	fix: use credentialManager.GetPolicy for AttachGroupPolicy validation Policies created via CreatePolicy through credentialManager are stored in the credential store, not in s3cfg.Policies (which only has static config policies). Change AttachGroupPolicy to use credentialManager.GetPolicy() for policy existence validation.	3 weeks ago
Chris Lu	76fdae1c5c	test: replace time.Sleep with require.Eventually for propagation waits Use polling with timeout instead of fixed sleeps to reduce flakiness in integration tests waiting for IAM policy propagation.	3 weeks ago
Chris Lu	ef1b2c3cb2	ci: add weed/pb/** to S3 IAM test trigger paths	3 weeks ago
Chris Lu	b0bca50dfd	iam: require empty PolicyNames before group deletion Reject DeleteGroup when group has attached policies, matching the existing members check. Also fix GetGroup error handling in DeletePolicy to only skip ErrGroupNotFound, not all errors.	3 weeks ago
Chris Lu	f687001b9e	iam: short-circuit UpdateUser when newUserName equals current name	3 weeks ago
Chris Lu	5b77cc7a26	admin: use groupErrorToHTTPStatus in GetGroupMembers and GetGroupPolicies	3 weeks ago
Chris Lu	5012a1b716	iam: add input validation and persist groups during migration - Validate nil/empty group name in CreateGroup and UpdateGroup - Save groups in migrateToMultiFile so they survive legacy migration	3 weeks ago
Chris Lu	519ea3c03b	admin: regenerate groups_templ.go with XSS-safe data attributes Regenerated from groups.templ which uses data-group-name attributes instead of inline onclick with string interpolation.	3 weeks ago
Chris Lu	ff9a1fdbaa	iam: wrap ErrUserNotInGroup sentinel in RemoveGroupMember error Wrap credential.ErrUserNotInGroup so errors.Is works in groupErrorToHTTPStatus, returning proper 400 instead of 500.	3 weeks ago
Chris Lu	8089e9bb5b	test: use t.Cleanup for bucket cleanup in group policy test	3 weeks ago
Chris Lu	c84e2f56af	iam: check for name collision before renaming user in UpdateUser Scan identities and inline policies for newUserName before mutating, returning EntityAlreadyExists if a collision is found. Reuse the already-loaded policies instead of loading them again inside the loop.	3 weeks ago
Chris Lu	d80e7af8f0	iam: include group policies in IAM authorization Merge policy names from user's enabled groups into the IAMIdentity used for authorization, so group-attached policies are evaluated alongside user-attached policies.	3 weeks ago
Chris Lu	886019f467	iam: check group attachment before policy deletion Reject DeletePolicy if the policy is attached to any group, matching AWS IAM behavior. Add PolicyArn to ListAttachedGroupPolicies response.	3 weeks ago
Chris Lu	79da12eee9	iam: add Marker field to group list response types Add Marker string field to GetGroupResult, ListGroupsResult, ListAttachedGroupPoliciesResult, and ListGroupsForUserResult to match AWS IAM pagination response format.	3 weeks ago
Chris Lu	1487ecbc8e	iam: detect postgres unique constraint violation and add groups index Return ErrGroupAlreadyExists when INSERT hits SQLState 23505 instead of a generic error. Add index on groups(disabled) for filtered queries.	3 weeks ago
Chris Lu	2f7289d999	iam: add explicit group methods to PropagatingCredentialStore Add CreateGroup, GetGroup, DeleteGroup, ListGroups, and UpdateGroup methods instead of relying on embedded interface fallthrough. Group changes propagate via filer subscription so no RPC propagation needed.	3 weeks ago
Chris Lu	e367600b21	iam: add defensive copies, validation, and XSS fixes for group management - Memory store: clone groups on store/retrieve to prevent mutation - Admin dash: deep copy groups before mutation, validate user/policy exists - HTTP handlers: translate credential errors to proper HTTP status codes, use *bool for Enabled field to distinguish missing vs false - Groups templ: use data attributes + event delegation instead of inline onclick for XSS safety, prevent stale async responses	3 weeks ago
Chris Lu	07f6734e80	iam: address PR review comments for group management - Fix XSS vulnerability in groups.templ: replace innerHTML string concatenation with DOM APIs (createElement/textContent) for rendering member and policy lists - Use userGroups reverse index in embedded IAM ListGroupsForUser for O(1) lookup instead of iterating all groups - Add buildUserGroupsIndex helper in standalone IAM handlers; use it in ListGroupsForUser and removeUserFromAllGroups for efficient lookup - Add note about gRPC store load-modify-save race condition limitation	3 weeks ago
Chris Lu	930252532d	test: add IAM group management integration tests Add comprehensive integration tests for group CRUD, membership, policy attachment, policy enforcement, disabled group behavior, user deletion side effects, and multi-group membership. Add "group" test type to CI matrix in s3-iam-tests workflow.	3 weeks ago
Chris Lu	75b8d7c821	admin: add group management page to admin UI Add groups page with CRUD operations, member management, policy attachment, and enable/disable toggle. Register routes in admin handlers and add Groups entry to sidebar navigation.	3 weeks ago
Chris Lu	05c51d1d5b	iam: watch /etc/iam/groups directory for config changes Add groups directory to the filer subscription watcher so group file changes trigger IAM configuration reloads.	3 weeks ago
Chris Lu	ccf0057ea8	iam: add group side effects on user deletion and rename When a user is deleted, remove them from all groups they belong to. When a user is renamed, update group membership references. Applied to both embedded and standalone IAM handlers.	3 weeks ago
Chris Lu	e660180438	iam: integrate group policies into authorization Add groups and userGroups reverse index to IdentityAccessManagement. Populate both maps during ReplaceS3ApiConfiguration and MergeS3ApiConfiguration. Modify evaluateIAMPolicies to evaluate policies from user's enabled groups in addition to user policies. Update VerifyActionPermission to consider group policies when checking hasAttachedPolicies.	3 weeks ago
Chris Lu	204512151b	iam: add group management handlers to standalone IAM API Add group handlers (CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy, ListAttachedGroupPolicies, ListGroupsForUser) and wire into DoActions dispatch. Also add helper functions for user/policy side effects.	3 weeks ago
Chris Lu	38d504d182	iam: add group management handlers to embedded IAM API Add CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy, ListAttachedGroupPolicies, and ListGroupsForUser handlers with dispatch in ExecuteAction.	3 weeks ago
Chris Lu	b7c991688e	iam: add group IAM response types Add XML response types for group management IAM actions: CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy, ListAttachedGroupPolicies, ListGroupsForUser.	3 weeks ago
Chris Lu	6d8e36a415	iam: add group CRUD to CredentialStore interface and all backends Add group management methods (CreateGroup, GetGroup, DeleteGroup, ListGroups, UpdateGroup) to the CredentialStore interface with implementations for memory, filer_etc, postgres, and grpc stores. Wire group loading/saving into filer_etc LoadConfiguration and SaveConfiguration.	3 weeks ago
Chris Lu	07d8ceafc7	iam: add Group message to protobuf schema Add Group message (name, members, policy_names, disabled) and add groups field to S3ApiConfiguration for IAM group management support (issue #7742).	3 weeks ago
Chris Lu	0647f66bb5	filer.sync: add exponential backoff on unexpected EOF during replication (#8557 ) * filer.sync: add exponential backoff on unexpected EOF during replication When the source volume server drops connections under high traffic, filer.sync retries aggressively (every 1-6s), hammering the already overloaded source. This adds a longer exponential backoff (10s to 2min) specifically for "unexpected EOF" errors, reducing pressure on the source while still retrying indefinitely until success. Also adds more logging throughout the replication path: - Log source URL and error at V(0) when ReadPart or io.ReadAll fails - Log content-length and byte counts at V(4) on success - Log backoff duration in retry messages Fixes #8542 * filer.sync: extract backoff helper and fix 2-minute cap - Extract nextEofBackoff() and isEofError() helpers to deduplicate the backoff logic between fetchAndWrite and uploadManifestChunk - Fix the cap: previously 80s would double to 160s and pass the < 2min check uncapped. Now doubles first, then clamps to 2min. * filer.sync: log source URL instead of empty upload URL on read errors UploadUrl is not populated until after the reader is consumed, so the V(0) and V(4) logs were printing an empty string. Add SourceUrl field to UploadOption and populate it from the HTTP response in fetchAndWrite. * filer.sync: guard isEofError against nil error * filer.sync: use errors.Is for EOF detection, fix log wording - Replace broad substring matching ("read input", "unexpected EOF") with errors.Is(err, io.ErrUnexpectedEOF) and errors.Is(err, io.EOF) so only actual EOF errors trigger the longer backoff - Fix awkward log phrasing: "interrupted replicate" → "interrupted while replicating" * filer.sync: remove EOF backoff from uploadManifestChunk uploadManifestChunk reads from an in-memory bytes.Reader, so any EOF errors there are from the destination side, not a broken source stream. The long source-oriented backoff is inappropriate; let RetryUntil handle destination retries at its normal cadence. --------- Co-authored-by: Copilot <copilot@github.com>	3 weeks ago
Chris Lu	ba66411337	Update plugin_templ.go	3 weeks ago
Chris Lu	7808b301ef	admin: remove Scheduler Settings cards from plugin UI (#8558 ) * admin: remove Scheduler Settings cards, make Next Run full-width Remove the two "Scheduler Settings" placeholder cards from the plugin UI (overview page and scheduler tab). They only contained a text note saying detection intervals are configured per job type, which is self-evident from the per-job-type settings form. Make the "Next Run" card full-width on the overview page since it no longer shares a row with the removed card. * plugin UI: promote Next Run to top summary card row Move "Next Run" from a standalone card into the top row alongside Workers, Active Jobs, and Activities as a compact stat card.	3 weeks ago
Chris Lu	fa7da0f57e	template	3 weeks ago
Chris Lu	961c270aba	admin: expose per-job-type detection interval in plugin UI (#8552 ) * admin: expose per-job-type detection interval in plugin UI The detection_interval_seconds field was not editable in the admin UI. collectAdminSettings() silently preserved the existing value, making it impossible for users to change how often a job type checks for new work. Users would change the global "Sleep Between Iterations" setting expecting it to control job scheduling frequency, but that only controls the scheduler loop's idle polling rate. Add a "Detection Interval (s)" input to the per-job-type admin settings form so users can actually configure it. Fixes #8549 * admin: remove global Sleep Between Iterations setting Now that per-job-type detection intervals are exposed in the UI, the global IdleSleepSeconds setting is redundant and confusing. It only controlled the scheduler loop's idle polling rate, which is always overridden by earliestNextDetectionAt() when job types exist. Replace the three usages with simpler alternatives: - Scheduler loop sleep: use defaultSchedulerIdleSleep constant - Initial delay for new job types: use policy.DetectionInterval/2 (more logical since it's already per-job-type) - Status fallback: use the constant The API endpoints are kept for backward compatibility but the UI no longer exposes or calls them. * admin: restore configurable idle sleep in scheduler loop The EC integration test sets idle_sleep_seconds=1 via the scheduler config API so the scheduler wakes quickly after workers connect. The previous commit replaced this with a hardcoded 613s constant, causing the scheduler to sleep through the entire test window. Restore GetSchedulerConfig().IdleSleepDuration() in the scheduler loop and status reporting. The UI removal of the setting is still correct — the API endpoint remains for programmatic use (e.g., tests). * admin: cap first-run initial delay to 5s instead of DetectionInterval/2 The initial delay for first-run job types was set to policy.DetectionInterval/2, which creates unbounded first-run latency (e.g., 1 hour for vacuum with a 2-hour detection interval). A small fixed 5-second delay provides sufficient stagger without penalizing startup time.	3 weeks ago
Chris Lu	e25558e4d8	admin: fix mobile sidebar menu inaccessible in portrait mode (#8556 ) * admin: fix mobile sidebar menu inaccessible in portrait mode The hamburger button only toggled the user dropdown, leaving the sidebar navigation inaccessible on mobile devices in portrait mode. Add a dedicated sidebar toggle button (visible only on mobile), give the sidebar an id so Bootstrap collapse can target it, add a backdrop overlay for the open state, and auto-close the sidebar when a nav link is clicked. Fixes #8550 * admin: address review feedback on mobile sidebar - Remove redundant JS show/hide.bs.collapse listeners; CSS sibling selector already handles backdrop visibility - Use const instead of var for non-reassigned variables - Move inline style on user icon to CSS class * admin: add aria attributes to user-menu toggler, use CSS variable for navbar height - Add aria-controls, aria-expanded, and aria-label to the user-menu toggle button for assistive technology - Extract hard-coded 56px navbar height into --navbar-height CSS custom property used by sidebar and backdrop positioning * admin: extract hideSidebar helper, use toggler visibility for breakpoint check - Extract duplicated collapse-hide logic into a hideSidebar helper - Replace hardcoded window.innerWidth < 768 with a check on the sidebar toggler's computed display, decoupling JS from CSS breakpoints - Add aria-expanded="false" to sidebar toggle button --------- Co-authored-by: Copilot <copilot@github.com>	3 weeks ago
Chris Lu	587c24ec89	plugin worker: support job type categories (all, default, heavy) (#8547 ) * plugin worker: add handler registry with job categories Introduce a self-registration pattern for plugin worker job handlers. Each handler can register itself via init() with a HandlerFactory that declares its job type, category (default/heavy), CLI aliases, and a builder function. ResolveHandlerFactories accepts a mix of category names ("all", "default", "heavy") and explicit job type names/aliases, returning the matching factories. This enables workers to be configured by resource profile rather than requiring explicit job type enumeration. * plugin worker: register all handlers via init() Each job handler now self-registers into the global handler registry with its canonical job type, category, CLI aliases, and build function: - vacuum: category=default - volume_balance: category=default - admin_script: category=default - erasure_coding: category=heavy - iceberg_maintenance: category=heavy Adding a new job type now only requires adding the init() call in the handler file itself — no other files need to be touched. * plugin worker: replace hardcoded job type switch with registry Remove buildPluginWorkerHandler, parsePluginWorkerJobTypes, and canonicalPluginWorkerJobType from worker_runtime.go. The simplified buildPluginWorkerHandlers now delegates to pluginworker.ResolveHandlerFactories, which resolves category names ("all", "default", "heavy") and explicit job type names/aliases. The default job type is changed from an explicit list to "all", so new handlers registered via init() are automatically picked up. Update all tests to use the new API. * plugin worker: update CLI help text for job categories Update the -jobType flag description and command examples to document category support (all, default, heavy) alongside explicit job type names. * plugin worker: address review feedback - Add CategoryAll constant; use typed constants in tokenAsCategory - Pre-allocate result slice in ResolveHandlerFactories - Add vacuum aliases (vol.vacuum, volume.vacuum) - List alias examples (ec, balance, iceberg) in -jobType flag help - Create handlers aggregator package for subpackage blank imports so new handler subpackages only need to be added in one place - Make category tests relationship-based (subset/union checks) instead of asserting exact handler counts - Add clarifying comments to worker_test.go and mini_plugin_test.go listing expected handler names next to count assertions --------- Co-authored-by: Copilot <copilot@github.com>	3 weeks ago
Chris Lu	f249fb7e63	ci: build _full and _large_disk_full images for arm64 (#8548 ) The _full and _large_disk_full Docker image variants were only built for linux/amd64, preventing ARM64 users from using features like gocdk_pub_sub (RabbitMQ notifications) that require the gocdk build tag. Add linux/arm64 platform target to these variants. Closes #8546	3 weeks ago
Chris Lu	72c2c7ef8b	Add iceberg_maintenance plugin worker handler (Phase 1) (#8501 ) * Add iceberg_maintenance plugin worker handler (Phase 1) Implement automated Iceberg table maintenance as a new plugin worker job type. The handler scans S3 table buckets for tables needing maintenance and executes operations in the correct Iceberg order: expire snapshots, remove orphan files, and rewrite manifests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix unsafe int64→int narrowing for MaxSnapshotsToKeep Use int64(wouldKeep) instead of int(config.MaxSnapshotsToKeep) to avoid potential truncation on 32-bit platforms (CodeQL high severity). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix unsafe int64→int narrowing for MinInputFiles Use int64(len(manifests)) instead of int(config.MinInputFiles) to avoid potential truncation on 32-bit platforms (CodeQL high severity). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix unsafe int64→int narrowing for MaxCommitRetries Clamp MaxCommitRetries to [1,20] range and keep as int64 throughout the retry loop to avoid truncation on 32-bit platforms (CodeQL high severity). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Sort snapshots explicitly by timestamp in expireSnapshots The previous logic relied on implicit ordering of the snapshot list. Now explicitly sorts snapshots by timestamp descending (most recent first) and uses a simpler keep-count loop: keep the first MaxSnapshotsToKeep newest snapshots plus the current snapshot unconditionally, then expire the rest that exceed the retention window. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Handle errors properly in listFilerEntries Previously all errors from ListEntries and Recv were silently swallowed. Now: treat "not found" errors as empty directory, propagate other ListEntries errors, and check for io.EOF explicitly on Recv instead of breaking on any error. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix overly broad HasSuffix check in orphan detection The bare strings.HasSuffix(ref, entry.Name) could match files with similar suffixes (e.g. "123.avro" matching "snap-123.avro"). Replaced with exact relPath match and a "/"-prefixed suffix check to avoid false positives. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Replace fmt.Sscanf with strconv.Atoi in extractMetadataVersion strconv.Atoi is more explicit and less fragile than fmt.Sscanf for parsing a simple integer from a trimmed string. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Recursively traverse directories for orphan file detection The orphan cleanup only listed a single directory level under data/ and metadata/, skipping IsDirectory entries. Partitioned Iceberg tables store data files in nested partition directories (e.g. data/region=us-east/file.parquet) which were never evaluated. Add walkFilerEntries helper that recursively descends into subdirectories, and use it in removeOrphans so all nested files are considered for orphan checks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix manifest path drift from double time.Now() calls rewriteManifests called time.Now().UnixMilli() twice: once for the path embedded in WriteManifest and once for the filename passed to saveFilerFile. These timestamps would differ, causing the manifest's internal path reference to not match the actual saved filename. Compute the filename once and reuse it for both WriteManifest and saveFilerFile so they always reference the same path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add TestManifestRewritePathConsistency test Verifies that WriteManifest returns a ManifestFile whose FilePath() matches the path passed in, and that path.Base() of that path matches the filename used for saveFilerFile. This validates the single- timestamp pattern used in rewriteManifests produces consistent paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Make parseOperations return error on unknown operations Previously parseOperations silently dropped unknown operation names and could return an empty list. Now validates inputs against the canonical set and returns a clear error if any unknown operation is specified. Updated Execute to surface the error instead of proceeding with an empty operation list. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Use gRPC status codes instead of string matching in listFilerEntries Replace brittle strings.Contains(err.Error(), "not found") check with status.Code(err) == codes.NotFound for proper gRPC error handling. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add stale-plan guard in commit closures for expireSnapshots and rewriteManifests Both operations plan outside the commit mutation using a snapshot ID captured from the initial metadata read. If the table head advances concurrently, the mutation would create a snapshot parented to the wrong head or remove snapshots based on a stale view. Add a guard inside each mutation closure that verifies currentMeta.CurrentSnapshot().SnapshotID still matches the planned snapshot ID. If it differs, return errStalePlan which propagates immediately (not retried, since the plan itself is invalid). Also fix rewriteManifests to derive SequenceNumber from the fresh metadata (cs.SequenceNumber) instead of the captured currentSnap. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add compare-and-swap to updateTableMetadataXattr updateTableMetadataXattr previously re-read the entry but did not verify the metadataVersion matched what commitWithRetry had loaded. A concurrent update could be silently clobbered. Now accepts expectedVersion parameter and compares it against the stored metadataVersion before writing. Returns errMetadataVersionConflict on mismatch, which commitWithRetry treats as retryable (deletes the staged metadata file and retries with fresh state). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Export shared plugin worker helpers for use by sub-packages Export ShouldSkipDetectionByInterval, BuildExecutorActivity, and BuildDetectorActivity so the iceberg sub-package can reuse them without duplicating logic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Refactor iceberg maintenance handler into weed/plugin/worker/iceberg package Split the 1432-line iceberg_maintenance_handler.go into focused files in a new iceberg sub-package: handler.go, config.go, detection.go, operations.go, filer_io.go, and compact.go (Phase 2 data compaction). Key changes: - Rename types to drop stutter (IcebergMaintenanceHandler → Handler, etc.) - Fix loadFileByIcebergPath to preserve nested directory paths via normalizeIcebergPath instead of path.Base which dropped subdirectories - Check SendProgress errors instead of discarding them - Add stale-plan guard to compactDataFiles commitWithRetry closure - Add "compact" operation to parseOperations canonical order - Duplicate readStringConfig/readInt64Config helpers (~20 lines) - Update worker_runtime.go to import new iceberg sub-package Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Remove iceberg_maintenance from default plugin worker job types Iceberg maintenance is not yet ready to be enabled by default. Workers can still opt in by explicitly listing iceberg_maintenance in their job types configuration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Clamp config values to safe minimums in ParseConfig Prevents misconfiguration by enforcing minimum values using the default constants for all config fields. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Harden filer I/O: path helpers, strict CAS guard, path traversal prevention - Use path.Dir/path.Base instead of strings.SplitN in loadCurrentMetadata - Make CAS guard error on missing or unparseable metadataVersion - Add path.Clean and traversal validation in loadFileByIcebergPath Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix compact: single snapshot ID, oversized bin splitting, ensureFilerDir - Use single newSnapID for all manifest entries in a compaction run - Add splitOversizedBin to break bins exceeding targetSize - Make ensureFilerDir only create on NotFound, propagate other errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add wildcard filters, scan limit, and context cancellation to table scanning - Use wildcard matchers (, ?) for bucket/namespace/table filters - Add limit parameter to scanTablesForMaintenance for early termination - Add ctx.Done() checks in bucket and namespace scan loops - Update filter UI descriptions and placeholders for wildcard support Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Remove dead detection interval check and validate namespace parameter - Remove ineffective ShouldSkipDetectionByInterval call with hardcoded 0 - Add namespace to required parameter validation in Execute Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Improve operations: exponential backoff, orphan matching, full file cleanup - Use exponential backoff (50ms, 100ms, 200ms, ...) in commitWithRetry - Use normalizeIcebergPath for orphan matching instead of fragile suffix check - Add collectSnapshotFiles to traverse manifest lists → manifests → data files - Delete all unreferenced files after expiring snapshots, not just manifest lists - Refactor removeOrphans to reuse collectSnapshotFiles Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * iceberg: fix ensureFilerDir to handle filer_pb.ErrNotFound sentinel filer_pb.LookupEntry converts gRPC NotFound errors to filer_pb.ErrNotFound (a plain sentinel), so status.Code() never returns codes.NotFound for that error. This caused ensureFilerDir to return an error instead of creating the directory when it didn't exist. * iceberg: clean up orphaned artifacts when compaction commit fails Track all files written during compaction (merged data files, manifest, manifest list) and delete them if the commit or any subsequent write step fails, preventing orphaned files from accumulating in the filer. * iceberg: derive tablePath from namespace/tableName when empty An empty table_path parameter would be passed to maintenance operations unchecked. Default it to path.Join(namespace, tableName) when not provided. * iceberg: make collectSnapshotFiles return error on read/parse failure Previously, errors reading manifests were logged and skipped, returning a partial reference set. This could cause incorrect delete decisions during snapshot expiration or orphan cleanup. Now the function returns an error and all callers abort when reference data is incomplete. * iceberg: include active metadata file in removeOrphans referenced set The metadataFileName returned by loadCurrentMetadata was discarded, so the active metadata file could be incorrectly treated as an orphan and deleted. Capture it and add it to the referencedFiles map. * iceberg: only retry commitWithRetry on metadata version conflicts Previously all errors from updateTableMetadataXattr triggered retries. Now only errMetadataVersionConflict causes retry; other errors (permissions, transport, malformed xattr) fail immediately. * iceberg: respect req.Limit in fakeFilerServer.ListEntries mock The mock ListEntries ignored the Limit field, so tests couldn't exercise pagination. Now it stops streaming once Limit entries have been sent. * iceberg: validate parquet schema compatibility before merging files mergeParquetFiles now compares each source file's schema against the first file's schema and aborts with a clear error if they differ, instead of blindly writing rows that could panic or produce corrupt output. * iceberg: normalize empty JobType to canonical jobType in Execute events When request.Job.JobType is empty, status events and completion messages were emitted with a blank job type. Derive a canonical value early and use it consistently in all outbound events. * iceberg: log warning on unexpected config value types in read helpers readStringConfig and readInt64Config now log a V(1) warning when they encounter an unhandled ConfigValue kind, aiding debugging of unexpected config types that silently fall back to defaults. * worker: add iceberg_maintenance to default plugin worker job types Workers using the default job types list didn't advertise the iceberg_maintenance handler despite the handler and canonical name being registered. Add it so workers pick up the handler by default. * iceberg: use defer and detached context for compaction artifact cleanup The cleanup closure used the job context which could already be canceled, and was not called on ctx.Done() early exits. Switch to a deferred cleanup with a detached context (30s timeout) so artifact deletion completes on all exit paths including context cancellation. * iceberg: use proportional jitter in commitWithRetry backoff Fixed 25ms max jitter becomes insignificant at higher retry attempts. Use 0-20% of the current backoff value instead so jitter scales with the exponential delay. * iceberg: add malformed filename cases to extractMetadataVersion test Cover edge cases like "invalid.metadata.json", "metadata.json", "", and "v.metadata.json" to ensure the function returns 0 for unparseable inputs. * iceberg: fail compaction on manifest read errors and skip delete manifests Previously, unreadable manifests were silently skipped during compaction, which could drop live files from the entry set. Now manifest read/parse errors are returned as fatal errors. Also abort compaction when delete manifests exist since the compactor does not apply deletes — carrying them through unchanged could produce incorrect results. * iceberg: use table-relative path for active metadata file in orphan scan metadataFileName was stored as a basename (e.g. "v1.metadata.json") but the orphan scanner matches against table-relative paths like "metadata/v1.metadata.json". Prefix with "metadata/" so the active metadata file is correctly recognized as referenced. * iceberg: fix MetadataBuilderFromBase location to use metadata file path The second argument to MetadataBuilderFromBase records the previous metadata file in the metadata log. Using meta.Location() (the table root) was incorrect — it must be the actual metadata file path so old metadata files can be tracked and eventually cleaned up. * iceberg: update metadataLocation and versionToken in xattr on commit updateTableMetadataXattr was only updating metadataVersion, modifiedAt, and fullMetadata but not metadataLocation or versionToken. This left catalog state inconsistent after maintenance commits — the metadataLocation still pointed to the old metadata file and the versionToken was stale. Add a newMetadataLocation parameter and regenerate the versionToken on every commit, matching the S3 Tables handler behavior. * iceberg: group manifest entries by partition spec in rewriteManifests rewriteManifests was writing all entries into a single manifest using the table's current partition spec. For spec-evolved tables where manifests reference different partition specs, this produces an invalid manifest. Group entries by the source manifest's PartitionSpecID and write one merged manifest per spec, looking up each spec from the table's PartitionSpecs list. * iceberg: remove dead code loop for non-data manifests in compaction The early abort guard at the top of compactDataFiles already ensures no delete manifests are present. The loop that copied non-data manifests into allManifests was unreachable dead code. * iceberg: use JSON encoding in partitionKey for unambiguous grouping partitionKey used fmt.Sprintf("%d=%v") joined by commas, which produces ambiguous keys when partition values contain commas or '='. Use json.Marshal for values and NUL byte as separator to eliminate collisions. * iceberg: precompute normalized reference set in removeOrphans The orphan check was O(files × refs) because it normalized each reference path inside the per-file loop. Precompute the normalized set once for O(1) lookups per candidate file. * iceberg: add artifact cleanup to rewriteManifests on commit failure rewriteManifests writes merged manifests and a manifest list to the filer before committing but did not clean them up on failure. Add the same deferred cleanup pattern used by compactDataFiles: track written artifacts and delete them if the commit does not succeed. * iceberg: pass isDeleteData=true in deleteFilerFile deleteFilerFile called DoRemove with isDeleteData=false, which only removed filer metadata and left chunk data behind on volume servers. All other data-file deletion callers in the codebase pass true. * iceberg: clean up test: remove unused snapID, simplify TestDetectWithFakeFiler Remove unused snapID variable and eliminate the unnecessary second fake filer + entry copy in TestDetectWithFakeFiler by capturing the client from the first startFakeFiler call. * fix: update TestWorkerDefaultJobTypes to expect 5 job types The test expected 4 default job types but iceberg_maintenance was added as a 5th default in a previous commit. * iceberg: document client-side CAS TOCTOU limitation in updateTableMetadataXattr Add a note explaining the race window where two workers can both pass the version check and race at UpdateEntry. The proper fix requires server-side precondition support on UpdateEntryRequest. * iceberg: remove unused sender variable in TestFullExecuteFlow * iceberg: abort compaction when multiple partition specs are present The compactor writes all entries into a single manifest using the current partition spec, which is invalid for spec-evolved tables. Detect multiple PartitionSpecIDs and skip compaction until per-spec compaction is implemented. * iceberg: validate tablePath to prevent directory traversal Sanitize the table_path parameter with path.Clean and verify it matches the expected namespace/tableName prefix to prevent path traversal attacks via crafted job parameters. * iceberg: cap retry backoff at 5s and make it context-aware The exponential backoff could grow unbounded and blocked on time.Sleep ignoring context cancellation. Cap at 5s and use a timer with select on ctx.Done so retries respect cancellation. * iceberg: write manifest list with new snapshot identity in rewriteManifests The manifest list was written with the old snapshot's ID and sequence number, but the new snapshot created afterwards used a different identity. Compute newSnapshotID and newSeqNum before writing manifests and the manifest list so all artifacts are consistent. * ec: also remove .vif file in removeEcVolumeFiles removeEcVolumeFiles cleaned up .ecx, .ecj, and shard files but not the .vif volume info file, leaving it orphaned. The .vif file lives in the data directory alongside shard files. The directory handling for index vs data files was already correct: .ecx/.ecj are removed from IdxDirectory and shard files from Directory, matching how NewEcVolume loads them. Revert "ec: also remove .vif file in removeEcVolumeFiles" This reverts commit `acc82449e1`. * iceberg: skip orphan entries with nil Attributes instead of defaulting to epoch When entry.Attributes is nil, mtime defaulted to Unix epoch (1970), making unknown-age entries appear ancient and eligible for deletion. Skip these entries instead to avoid deleting files whose age cannot be determined. * iceberg: use unique metadata filenames to prevent concurrent write clobbering Add timestamp nonce to metadata filenames (e.g. v3-1709766000.metadata.json) so concurrent writers stage to distinct files. Update extractMetadataVersion to strip the nonce suffix, and loadCurrentMetadata to read the actual filename from the metadataLocation xattr field. * iceberg: defer artifact tracking until data file builder succeeds Move the writtenArtifacts append to after NewDataFileBuilder succeeds, so a failed builder doesn't leave a stale entry for an already-deleted file in the cleanup list. * iceberg: use detached context for metadata file cleanup Use context.WithTimeout(context.Background(), 10s) when deleting staged metadata files after CAS failure, so cleanup runs even if the original request context is canceled. * test: update default job types count to include iceberg_maintenance * iceberg: use parquet.EqualNodes for structural schema comparison Replace String()-based schema comparison with parquet.EqualNodes which correctly compares types, repetition levels, and logical types. * iceberg: add nonce-suffixed filename cases to TestExtractMetadataVersion * test: assert iceberg_maintenance is present in default job types * iceberg: validate operations config early in Detect Call parseOperations in Detect so typos in the operations config fail fast before emitting proposals, matching the validation already done in Execute. * iceberg: detect chunked files in loadFileByIcebergPath Return an explicit error when a file has chunks but no inline content, rather than silently returning empty data. Data files uploaded via S3 are stored as chunks, so compaction would otherwise produce corrupt merged files. --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago

1 2 3 4 5 ...

13094 Commits (0b576dec792bd2627deb6ddab8ea6668684067e7) All Branches Search

13094 Commits (0b576dec792bd2627deb6ddab8ea6668684067e7)

All Branches