seaweedfs

Commit Graph

Author	SHA1	Message	Date
Chris Lu	df5e8210df	Implement IAM managed policy operations (#8507 ) * feat: Implement IAM managed policy operations (GetPolicy, ListPolicies, DeletePolicy, AttachUserPolicy, DetachUserPolicy) - Add response type aliases in iamapi_response.go for managed policy operations - Implement 6 handler methods in iamapi_management_handlers.go: - GetPolicy: Lookup managed policy by ARN - DeletePolicy: Remove managed policy - ListPolicies: List all managed policies - AttachUserPolicy: Attach managed policy to user, aggregating inline + managed actions - DetachUserPolicy: Detach managed policy from user - ListAttachedUserPolicies: List user's attached managed policies - Add computeAllActionsForUser() to aggregate actions from both inline and managed policies - Wire 6 new DoActions switch cases for policy operations - Add comprehensive tests for all new handlers - Fixes #8506 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review feedback for IAM managed policy operations - Add parsePolicyArn() helper with proper ARN prefix validation, replacing fragile strings.Split parsing in GetPolicy, DeletePolicy, AttachUserPolicy, and DetachUserPolicy - DeletePolicy now detaches the policy from all users and recomputes their aggregated actions, preventing stale permissions after deletion - Set changed=true for DeletePolicy DoActions case so identity updates persist - Make PolicyId consistent: CreatePolicy now uses Hash(&policyName) matching GetPolicy and ListPolicies - Remove redundant nil map checks (Go handles nil map lookups safely) - DRY up action deduplication in computeAllActionsForUser with addUniqueActions closure - Add tests for invalid/empty ARN rejection and DeletePolicy identity cleanup Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add integration tests for managed policy lifecycle (#8506) Add two integration tests covering the user-reported use case where managed policy operations returned 500 errors: - 
TestS3IAMManagedPolicyLifecycle: end-to-end workflow matching the issue report — CreatePolicy, ListPolicies, GetPolicy, AttachUserPolicy, ListAttachedUserPolicies, idempotent re-attach, DeletePolicy while attached (expects DeleteConflict), DetachUserPolicy, DeletePolicy, and verification that deleted policy is gone - TestS3IAMManagedPolicyErrorCases: covers error paths — nonexistent policy/user for GetPolicy, DeletePolicy, AttachUserPolicy, DetachUserPolicy, and ListAttachedUserPolicies Also fixes DeletePolicy to reject deletion when policy is still attached to a user (AWS-compatible DeleteConflictException), and adds the 409 status code mapping for DeleteConflictException in the error response handler. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: nil map panic in CreatePolicy, add PolicyId test assertions - Initialize policies.Policies map in CreatePolicy if nil (prevents panic when no policies exist yet); also handle filer_pb.ErrNotFound like other callers - Add PolicyId assertions in TestGetPolicy and TestListPolicies to lock in the consistent Hash(&policyName) behavior - Remove redundant time.Sleep calls from new integration tests (startMiniCluster already blocks on waitForS3Ready) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: PutUserPolicy and DeleteUserPolicy now preserve managed policy actions PutUserPolicy and DeleteUserPolicy were calling computeAggregatedActionsForUser (inline-only), overwriting ident.Actions and dropping managed policy actions. Both now call computeAllActionsForUser which unions inline + managed actions. Add TestManagedPolicyActionsPreservedAcrossInlineMutations regression test: attaches a managed policy, adds an inline policy (verifies both actions present), deletes the inline policy, then asserts managed policy actions still persist. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: PutUserPolicy verifies user exists before persisting inline policy Previously the inline policy was written to storage before checking if the target user exists in s3cfg.Identities, leaving orphaned policy data when the user was absent. Now validates the user first, returning NoSuchEntityException immediately if not found. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: prevent stale/lost actions on computeAllActionsForUser failure - PutUserPolicy: on recomputation failure, preserve existing ident.Actions instead of falling back to only the current inline policy's actions - DeleteUserPolicy: on recomputation failure, preserve existing ident.Actions instead of assigning nil (which wiped all permissions) - AttachUserPolicy: roll back ident.PolicyNames and return error if action recomputation fails, keeping identity consistent - DetachUserPolicy: roll back ident.PolicyNames and return error if GetPolicies or action recomputation fails - Add doc comment on newTestIamApiServer noting it only sets s3ApiConfig Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	23 hours ago
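The ARN prefix validation and the inline + managed action union described in df5e8210df can be sketched roughly as follows. This is an illustrative sketch only, not the SeaweedFS implementation: the prefix constant `arn:aws:iam:::policy/`, the names `parsePolicyArnSketch` and `unionActions`, and the action strings are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// policyArnPrefix is an assumed prefix for illustration; the real constant may differ.
const policyArnPrefix = "arn:aws:iam:::policy/"

// parsePolicyArnSketch (hypothetical name) validates the prefix instead of
// splitting on "/" and returns the policy name that follows it.
func parsePolicyArnSketch(arn string) (string, error) {
	if !strings.HasPrefix(arn, policyArnPrefix) {
		return "", errors.New("invalid policy ARN: unexpected prefix")
	}
	name := strings.TrimPrefix(arn, policyArnPrefix)
	if name == "" {
		return "", errors.New("invalid policy ARN: empty policy name")
	}
	return name, nil
}

// unionActions (hypothetical name) merges inline and managed actions,
// keeping first-seen order and dropping duplicates via a shared closure.
func unionActions(inline, managed []string) []string {
	seen := make(map[string]struct{})
	var out []string
	addUnique := func(actions []string) {
		for _, a := range actions {
			if _, ok := seen[a]; ok {
				continue
			}
			seen[a] = struct{}{}
			out = append(out, a)
		}
	}
	addUnique(inline)
	addUnique(managed)
	return out
}

func main() {
	name, err := parsePolicyArnSketch("arn:aws:iam:::policy/ReadOnly")
	fmt.Println(name, err)
	fmt.Println(unionActions([]string{"Read", "List"}, []string{"List", "Write"}))
}
```

Recomputing the union after every attach, detach, or inline mutation is what keeps managed actions from being dropped when only the inline policy changes.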
Chris Lu	10a30a83e1	s3api: add GetObjectAttributes API support (#8504 ) * s3api: add error code and header constants for GetObjectAttributes Add ErrInvalidAttributeName error code and header constants (X-Amz-Object-Attributes, X-Amz-Max-Parts, X-Amz-Part-Number-Marker, X-Amz-Delete-Marker) needed by the S3 GetObjectAttributes API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: implement GetObjectAttributes handler Add GetObjectAttributesHandler that returns selected object metadata (ETag, Checksum, StorageClass, ObjectSize, ObjectParts) without returning the object body. Follows the same versioning and conditional header patterns as HeadObjectHandler. The handler parses the X-Amz-Object-Attributes header to determine which attributes to include in the XML response, and supports ObjectParts pagination via X-Amz-Max-Parts and X-Amz-Part-Number-Marker. Ref: https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjectAttributes.html Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: register GetObjectAttributes route Register the GET /{object}?attributes route for the GetObjectAttributes API, placed before other object query routes to ensure proper matching. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add integration tests for GetObjectAttributes Test coverage: - Basic: simple object with all attribute types - MultipartObject: multipart upload with parts pagination - SelectiveAttributes: requesting only specific attributes - InvalidAttribute: server rejects invalid attribute names - NonExistentObject: returns NoSuchKey for missing objects Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add versioned object test for GetObjectAttributes Test puts two versions of the same object and verifies that: - GetObjectAttributes returns the latest version by default - GetObjectAttributes with versionId returns the specific version - ObjectSize and VersionId are correct for each version Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: fix combined conditional header evaluation per RFC 7232 Per RFC 7232: - Section 3.4: If-Unmodified-Since MUST be ignored when If-Match is present (If-Match is the more accurate replacement) - Section 3.3: If-Modified-Since MUST be ignored when If-None-Match is present (If-None-Match is the more accurate replacement) Previously, all four conditional headers were evaluated independently. This caused incorrect 412 responses when If-Match succeeded but If-Unmodified-Since failed (should return 200 per AWS S3 behavior). Fix applied to both validateConditionalHeadersForReads (GET/HEAD) and validateConditionalHeaders (PUT) paths. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add conditional header combination tests for GetObjectAttributes Test the RFC 7232 combined conditional header semantics: - If-Match=true + If-Unmodified-Since=false => 200 (If-Unmodified-Since ignored) - If-None-Match=false + If-Modified-Since=true => 304 (If-Modified-Since ignored) - If-None-Match=true + If-Modified-Since=false => 200 (If-Modified-Since ignored) - If-Match=true + If-Unmodified-Since=true => 200 - If-Match=false => 412 regardless Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: document Checksum attribute as not yet populated Checksum is accepted in validation (so clients requesting it don't get a 400 error, matching AWS behavior for objects without checksums) but SeaweedFS does not yet store S3 checksums. Add a comment explaining this and noting where to populate it when checksum storage is added. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add s3:GetObjectAttributes IAM action for ?attributes query Previously, GET /{object}?attributes resolved to s3:GetObject via the fallback path since resolveFromQueryParameters had no case for the "attributes" query parameter. Add S3_ACTION_GET_OBJECT_ATTRIBUTES constant ("s3:GetObjectAttributes") and a branch in resolveFromQueryParameters to return it for GET requests with the "attributes" query parameter, so IAM policies can distinguish GetObjectAttributes from GetObject. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: evaluate conditional headers after version resolution Move conditional header evaluation (If-Match, If-None-Match, etc.) to after the version resolution step in GetObjectAttributesHandler. This ensures that when a specific versionId is requested, conditions are checked against the correct version entry rather than always against the latest version. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: use bounded HTTP client in GetObjectAttributes tests Replace http.DefaultClient with a timeout-aware http.Client (10s) in the signedGetObjectAttributes helper and testGetObjectAttributesInvalid to prevent tests from hanging indefinitely. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: check attributes query before versionId in action resolver Move the GetObjectAttributes action check before the versionId check in resolveFromQueryParameters. This fixes GET /bucket/key?attributes&versionId=xyz being incorrectly classified as s3:GetObjectVersion instead of s3:GetObjectAttributes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add tests for versioned conditional headers and action resolver Add integration test that verifies conditional headers (If-Match, If-None-Match) are evaluated against the requested version entry, not the latest version. This covers the fix in `55c409dec`. Add unit test for ResolveS3Action verifying that the attributes query parameter takes precedence over versionId, so GET ?attributes&versionId resolves to s3:GetObjectAttributes. This covers the fix in `b92c61c95`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: guard negative chunk indices and rename PartsCount field Add bounds checks for b.StartChunk >= 0 and b.EndChunk >= 0 in buildObjectAttributesParts to prevent panics from corrupted metadata with negative index values. Rename ObjectAttributesParts.PartsCount to TotalPartsCount to match the AWS SDK v2 Go field naming convention, while preserving the XML element name "PartsCount" via the struct tag. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: reject malformed max-parts and part-number-marker headers Return ErrInvalidMaxParts and ErrInvalidPartNumberMarker when the X-Amz-Max-Parts or X-Amz-Part-Number-Marker headers contain non-integer or negative values, matching ListObjectPartsHandler behavior. 
Previously these were silently ignored with defaults. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	24 hours ago
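The combined conditional header precedence from RFC 7232 that 10a30a83e1 describes can be illustrated with a small sketch. It is not the SeaweedFS code: `evaluateReadConditionsSketch` and `runCases` are hypothetical names, ETags are compared as exact strings (no lists or weak validators), and the five cases mirror the combinations listed in the commit message.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// evaluateReadConditionsSketch (hypothetical) returns the status a GET/HEAD-style
// read would produce. If-Unmodified-Since is consulted only when If-Match is
// absent, and If-Modified-Since only when If-None-Match is absent.
func evaluateReadConditionsSketch(h http.Header, etag string, modified time.Time) int {
	if m := h.Get("If-Match"); m != "" {
		if m != "*" && m != etag {
			return http.StatusPreconditionFailed
		}
	} else if s := h.Get("If-Unmodified-Since"); s != "" {
		if t, err := http.ParseTime(s); err == nil && modified.After(t) {
			return http.StatusPreconditionFailed
		}
	}
	if n := h.Get("If-None-Match"); n != "" {
		if n == "*" || n == etag {
			return http.StatusNotModified
		}
	} else if s := h.Get("If-Modified-Since"); s != "" {
		if t, err := http.ParseTime(s); err == nil && !modified.After(t) {
			return http.StatusNotModified
		}
	}
	return http.StatusOK
}

// runCases evaluates the five header combinations from the commit message.
func runCases() []int {
	const etag = `"abc"`
	modified := time.Date(2024, 1, 2, 0, 0, 0, 0, time.UTC)
	before := modified.Add(-24 * time.Hour).Format(http.TimeFormat)
	after := modified.Add(24 * time.Hour).Format(http.TimeFormat)
	cases := []map[string]string{
		{"If-Match": etag, "If-Unmodified-Since": before},        // date check ignored
		{"If-None-Match": etag, "If-Modified-Since": before},     // date check ignored
		{"If-None-Match": `"other"`, "If-Modified-Since": after}, // date check ignored
		{"If-Match": etag, "If-Unmodified-Since": after},
		{"If-Match": `"other"`},
	}
	var out []int
	for _, c := range cases {
		h := http.Header{}
		for k, v := range c {
			h.Set(k, v)
		}
		out = append(out, evaluateReadConditionsSketch(h, etag, modified))
	}
	return out
}

func main() {
	fmt.Println(runCases())
}
```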
Chris Lu	18ccc9b773	Plugin scheduler: sequential iterations with max runtime (#8496) * pb: add job type max runtime setting * plugin: default job type max runtime * plugin: redesign scheduler loop * admin ui: update scheduler settings * plugin: fix scheduler loop state name * plugin scheduler: restore backlog skip * plugin scheduler: drop legacy detection helper * admin api: require scheduler config body * admin ui: preserve detection interval on save * plugin scheduler: use job context and drain cancels * plugin scheduler: respect detection intervals * plugin scheduler: gate runs and drain queue * ec test: reuse req/resp vars * ec test: add scheduler debug logs * Adjust scheduler idle sleep and initial run delay * Clear pending job queue before scheduler runs * Log next detection time in EC integration test * Improve plugin scheduler debug logging in EC test * Expose scheduler next detection time * Log scheduler next detection time in EC test * Wake scheduler on config or worker updates * Expose scheduler sleep interval in UI * Fix scheduler sleep save value selection * Set scheduler idle sleep default to 613s * Show scheduler next run time in plugin UI --------- Co-authored-by: Copilot <copilot@github.com>	2 days ago
Chris Lu	fb944f0071	test: add Polaris S3 tables integration tests (#8489) * test: add polaris integration test harness * test: add polaris integration coverage * ci: run polaris s3 tables tests * test: harden polaris harness * test: DRY polaris integration tests * ci: pre-pull Polaris image * test: extend Polaris pull timeout * test: refine polaris credentials selection * test: keep Polaris tables inside allowed location * test: use fresh context for polaris cleanup * test: prefer specific Polaris storage credential * test: tolerate Polaris credential variants * test: request Polaris vended credentials * test: load Polaris table credentials * test: allow polaris vended access via bucket policy * test: align Polaris object keys with table location * test: rename Polaris vended role references * test: simplify Polaris vended credential extraction * test: marshal Polaris bucket policy	3 days ago
Chris Lu	340339f678	Add Apache Polaris integration tests (#8478) * test: add polaris integration test harness * test: add polaris integration coverage * ci: run polaris s3 tables tests * test: harden polaris harness * test: DRY polaris integration tests * ci: pre-pull Polaris image * test: extend Polaris pull timeout * test: refine polaris credentials selection * test: keep Polaris tables inside allowed location * test: use fresh context for polaris cleanup * test: prefer specific Polaris storage credential * test: tolerate Polaris credential variants * test: request Polaris vended credentials * test: load Polaris table credentials * test: allow polaris vended access via bucket policy * test: align Polaris object keys with table location	4 days ago
dependabot[bot]	2fc47a48ec	build(deps): bump com.fasterxml.jackson.core:jackson-core from 2.18.2 to 2.18.6 in /test/java/spark (#8476)	4 days ago
Chris Lu	c5d5b517f6	Add lakekeeper table bucket integration test (#8470) * Add lakekeeper table bucket integration test Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Convert lakekeeper table bucket test to Go Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add multipart upload to lakekeeper table bucket test Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Remove lakekeeper test skips Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Convert lakekeeper repro to Go Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	5 days ago
blitt001	3d81d5bef7	Fix S3 signature verification behind reverse proxies (#8444 ) * Fix S3 signature verification behind reverse proxies When SeaweedFS is deployed behind a reverse proxy (e.g. nginx, Kong, Traefik), AWS S3 Signature V4 verification fails because the Host header the client signed with (e.g. "localhost:9000") differs from the Host header SeaweedFS receives on the backend (e.g. "seaweedfs:8333"). This commit adds a new -s3.externalUrl parameter (and S3_EXTERNAL_URL environment variable) that tells SeaweedFS what public-facing URL clients use to connect. When set, SeaweedFS uses this host value for signature verification instead of the Host header from the incoming request. New parameter: -s3.externalUrl (flag) or S3_EXTERNAL_URL (environment variable) Example: -s3.externalUrl=http://localhost:9000 Example: S3_EXTERNAL_URL=https://s3.example.com The environment variable is particularly useful in Docker/Kubernetes deployments where the external URL is injected via container config. The flag takes precedence over the environment variable when both are set. At startup, the URL is parsed and default ports are stripped to match AWS SDK behavior (port 80 for HTTP, port 443 for HTTPS), so "http://s3.example.com:80" and "http://s3.example.com" are equivalent. 
Bugs fixed: - Default port stripping was removed by a prior PR, causing signature mismatches when clients connect on standard ports (80/443) - X-Forwarded-Port was ignored when X-Forwarded-Host was not present - Scheme detection now uses proper precedence: X-Forwarded-Proto > TLS connection > URL scheme > "http" - Test expectations for standard port stripping were incorrect - expectedHost field in TestSignatureV4WithForwardedPort was declared but never actually checked (self-referential test) * Add Docker integration test for S3 proxy signature verification Docker Compose setup with nginx reverse proxy to validate that the -s3.externalUrl parameter (or S3_EXTERNAL_URL env var) correctly resolves S3 signature verification when SeaweedFS runs behind a proxy. The test uses nginx proxying port 9000 to SeaweedFS on port 8333, with X-Forwarded-Host/Port/Proto headers set. SeaweedFS is configured with -s3.externalUrl=http://localhost:9000 so it uses "localhost:9000" for signature verification, matching what the AWS CLI signs with. The test can be run with aws CLI on the host or without it by using the amazon/aws-cli Docker image with --network host. Test covers: create-bucket, list-buckets, put-object, head-object, list-objects-v2, get-object, content round-trip integrity, delete-object, and delete-bucket — all through the reverse proxy. * Create s3-proxy-signature-tests.yml * fix CLI * fix CI * Update s3-proxy-signature-tests.yml * address comments * Update Dockerfile * add user * no need for fuse * Update s3-proxy-signature-tests.yml * debug * weed mini * fix health check * health check * fix health checking --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	7 days ago
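The default-port stripping that 3d81d5bef7 describes for the -s3.externalUrl value can be sketched as below. This is a minimal illustration under stated assumptions, not the actual SeaweedFS code: `externalHostSketch` is a hypothetical name, and the sketch only covers deriving the host used for signature verification from the configured URL.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// externalHostSketch (hypothetical) parses the configured external URL and
// returns the host to use for signature verification. Port 80 is dropped for
// http and port 443 for https; any other port is kept.
func externalHostSketch(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Host == "" {
		return "", fmt.Errorf("external URL %q has no host", raw)
	}
	port := u.Port()
	isDefault := (u.Scheme == "http" && port == "80") ||
		(u.Scheme == "https" && port == "443")
	if isDefault {
		// Trim only the ":port" suffix so bracketed IPv6 hosts stay intact.
		return strings.TrimSuffix(u.Host, ":"+port), nil
	}
	return u.Host, nil
}

func main() {
	for _, raw := range []string{
		"http://s3.example.com:80",
		"https://s3.example.com:443",
		"http://localhost:9000",
	} {
		host, err := externalHostSketch(raw)
		fmt.Println(raw, "=>", host, err)
	}
}
```

With this normalization, "http://s3.example.com:80" and "http://s3.example.com" yield the same host, which is the equivalence the commit message calls out.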
Chris Lu	453310b057	Add plugin worker integration tests for erasure coding (#8450) * test: add plugin worker integration harness * test: add erasure coding detection integration tests * test: add erasure coding execution integration tests * ci: add plugin worker integration workflow * test: extend fake volume server for vacuum and balance * test: expand erasure coding detection topologies * test: add large erasure coding detection topology * test: add vacuum plugin worker integration tests * test: add volume balance plugin worker integration tests * ci: run plugin worker tests per worker * fixes * erasure coding: stop after placement failures * erasure coding: record hasMore when early stopping * erasure coding: relax large topology expectations	1 week ago
Chris Lu	d2b92938ee	Make EC detection context aware (#8449 ) * Make EC detection context aware * Update register.go * Speed up EC detection planning * Add tests for EC detection planner * optimizations detection.go: extracted ParseCollectionFilter (exported) and feed it into the detection loop so both detection and tracing share the same parsing/whitelisting logic; the detection loop now iterates on a sorted list of volume IDs, checks the context at every iteration, and only sets hasMore when there are still unprocessed groups after hitting maxResults, keeping runtime bounded while still scheduling planned tasks before returning the results. erasure_coding_handler.go: dropped the duplicated inline filter parsing in emitErasureCodingDetectionDecisionTrace and now reuse erasurecodingtask.ParseCollectionFilter, and the summary suffix logic now only accounts for the hasMore case that can actually happen. detection_test.go: updated the helper topology builder to use master_pb.VolumeInformationMessage (matching the current protobuf types) and tightened the cancellation/max-results tests so they reliably exercise the detection logic (cancel before calling Detection, and provide enough disks so one result is produced before the limit). * use working directory * fix compilation * fix compilation * rename * go vet * fix getenv * address comments, fix error	1 week ago
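The bounded, context-aware detection loop described in d2b92938ee (sorted volume IDs, a context check on every iteration, and hasMore set only when unprocessed work remains after hitting maxResults) might look roughly like this. The names `detectSketch` and `runDetectDemo` and the use of bare volume IDs in place of real planning are assumptions for illustration.

```go
package main

import (
	"context"
	"fmt"
	"sort"
)

// detectSketch (hypothetical) walks volume IDs in sorted order, stops early
// when the context is cancelled, and reports hasMore only if at least one ID
// was left unprocessed after reaching maxResults.
func detectSketch(ctx context.Context, volumeIDs []uint32, maxResults int) ([]uint32, bool, error) {
	ids := append([]uint32(nil), volumeIDs...) // copy so the caller's slice is untouched
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })

	var planned []uint32
	for _, id := range ids {
		if err := ctx.Err(); err != nil {
			return planned, false, err
		}
		if maxResults > 0 && len(planned) >= maxResults {
			// We are looking at an ID we will not process, so more work remains.
			return planned, true, nil
		}
		planned = append(planned, id) // stand-in for real planning work
	}
	return planned, false, nil
}

// runDetectDemo exercises the limit and the cancel-before-call paths.
func runDetectDemo() (plannedCount int, hasMore bool, cancelled bool) {
	planned, more, _ := detectSketch(context.Background(), []uint32{7, 3, 5}, 2)
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, _, err := detectSketch(ctx, []uint32{1}, 0)
	return len(planned), more, err != nil
}

func main() {
	fmt.Println(runDetectDemo())
}
```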
Chris Lu	a92e9baddf	Add integration test for multipart operations inheriting s3:PutObject permissions TestS3MultipartOperationsInheritPutObjectPermissions verifies that multipart upload operations (CreateMultipartUpload, UploadPart, ListParts, CompleteMultipartUpload, AbortMultipartUpload, ListMultipartUploads) work correctly when a user has only s3:PutObject permission granted. This test validates the behavior where multipart operations are implicitly granted when s3:PutObject is authorized, as multipart upload is an implementation detail of putting objects in S3.	1 week ago
dependabot[bot]	96078bc87e	build(deps): bump github.com/cloudflare/circl from 1.6.1 to 1.6.3 in /test/kafka (#8447) build(deps): bump github.com/cloudflare/circl in /test/kafka Bumps [github.com/cloudflare/circl](https://github.com/cloudflare/circl) from 1.6.1 to 1.6.3. - [Release notes](https://github.com/cloudflare/circl/releases) - [Commits](https://github.com/cloudflare/circl/compare/v1.6.1...v1.6.3) --- updated-dependencies: - dependency-name: github.com/cloudflare/circl dependency-version: 1.6.3 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	1 week ago
Chris Lu	b565a0cc86	Adds volume.merge command with deduplication and disk-based backend (#8441 ) * Enhance volume.merge command with deduplication and disk-based backend * Fix copyVolume function call with correct argument order and missing bool parameter * Revert "Fix copyVolume function call with correct argument order and missing bool parameter" This reverts commit `7b4a190643`. * Fix critical issues: per-replica writable tracking, tail goroutine cancellation via done channel, and debug logging for allocation failures * Optimize memory usage with watermark approach for duplicate detection * Fix critical issues: swap copyVolume arguments, increase idle timeout, remove file double-close, use glog for logging * Replace temporary file with in-memory buffer for needle blob serialization * test(volume.merge): Add comprehensive unit and integration tests Add 7 unit tests covering: - Ordering by timestamp - Cross-stream duplicate deduplication - Empty stream handling - Complex multi-stream deduplication - Single stream passthrough - Large needle ID support - LastModified fallback when timestamp unavailable Add 2 integration validation tests: - TestMergeWorkflowValidation: Documents 9-stage merge workflow - TestMergeEdgeCaseHandling: Validates 10 edge case handling All tests passing (9/9) * fix(volume.merge): Use time window for deduplication to handle clock skew The same needle ID can have different timestamps on different servers due to clock skew and replication lag. Needles with the same ID within a 5-second time window are now treated as duplicates (same write with timestamp variance). 
Key changes: - Add mergeDeduplicationWindowNs constant (5 seconds) - Replace exact timestamp matching with time window comparison - Use windowInitialized flag to properly detect window transitions - Add TestMergeNeedleStreamsTimeWindowDeduplication test This ensures that replicated writes with slight timestamp differences are properly deduplicated during merge, while separate updates to the same file ID (outside the window) are preserved. All tests passing (10/10) * test: Add volume.merge integration tests with 5 comprehensive test cases * test: integration tests for volume.merge command * Fix integration tests: use TripleVolumeCluster for volume.merge testing - Created new TripleVolumeCluster framework (cluster_triple.go) with 3 volume servers - Rebuilt weed binary with volume.merge command compiled in - Updated all 5 integration tests to use TripleVolumeCluster instead of DualVolumeCluster - Tests now properly allocate volumes on 2 servers and let merge allocate on 3rd - All 5 integration tests now pass: - TestVolumeMergeBasic - TestVolumeMergeReadonly - TestVolumeMergeRestore - TestVolumeMergeTailNeedles - TestVolumeMergeDivergentReplicas * Refactor test framework: use parameterized server count instead of hardcoded - Renamed TripleVolumeCluster to MultiVolumeCluster with serverCount parameter - Replaced hardcoded volumePort0/1/2 with slices for flexible server count - Updated StartTripleVolumeCluster as backward-compatible wrapper calling StartMultiVolumeCluster(t, profile, 3) - Made directory creation, port allocation, and server startup loop-based - Updated accessor methods (VolumeAdminAddress, VolumeGRPCAddress, etc.) 
to support any server count - All 5 integration tests continue to pass with new parameterized cluster framework - Enables future testing with 2, 4, 5+ volume servers by calling StartMultiVolumeCluster directly * Consolidate cluster frameworks: StartDualVolumeCluster now uses MultiVolumeCluster - Made DualVolumeCluster a type alias for MultiVolumeCluster - Updated StartDualVolumeCluster to call StartMultiVolumeCluster(t, profile, 2) - Removed duplicate code from cluster_dual.go (now just 17 lines) - All existing tests using StartDualVolumeCluster continue to work without changes - Backward compatible: existing code continues to use the old function signatures - Added wrapper functions in cluster_multi.go for StartTripleVolumeCluster - Enables unified cluster management across all test suites * Address PR review comments: improve error handling and clean up code - Replace parse error swallow with proper error return - Log cleanup and restoration errors instead of silently discarding them - Remove unused offset field from memoryBackendFile struct - Fix WriteAt buffer truncation bug to preserve trailing bytes - All unit tests passing (10/10) - Code compiles successfully * Fix PR review findings: test improvements and code quality - Add timeout to runWeedShell to prevent hanging - Add server 1 readonly status verification in tests - Assert merge fails when replicas writable (not just log output) - Replace sleep with polling for writable restoration check - Fix WriteAt stale data snapshot bug in memoryBackendFile - Fix startVolume error logging to show current server log - Fix volumePubPorts double assignment in port allocation - Rename test to reflect behavior: DoesNotDeduplicateAcrossWindows - Fix misleading dedup window comment Unit tests: 10/10 passing Binary: Compiles successfully * Fix test assumption: merge command marks volumes readonly automatically TestVolumeMergeReadonly was expecting merge to fail on writable volumes, but the merge command is designed to mark 
volumes readonly as part of its operation. Fixed test to verify merge succeeds on writable volumes and properly restores writable state afterward. Removed redundant Test 2 code that duplicated the new behavior. * fmt * Fix deduplication logic to correctly handle same-stream vs cross-stream duplicates The dedup map previously used only NeedleId as key, causing same-stream overwrites to be incorrectly skipped as duplicates. Changed to track which stream first processed each needle ID in the current window: - Cross-stream duplicates (same ID from different streams, within window) are skipped - Same-stream duplicates (overwrites from same stream) are kept - Map now stores: needleId -> streamIndex of first occurrence in window Added TestMergeNeedleStreamsSameStreamDuplicates to verify same-stream overwrites are preserved while cross-stream duplicates are skipped. All unit tests passing (11/11) Binary compiles successfully	1 week ago
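The time-window deduplication rule from b565a0cc86 (cross-stream duplicates inside the 5-second window are skipped, same-stream overwrites are kept, and writes outside the window are preserved) can be approximated with the sketch below. It is a simplified illustration, not the volume.merge implementation: the `needleRecord` type, the `dedupSketch` function, and the unbounded map are assumptions, and the watermark-based memory optimization mentioned in the commit is omitted.

```go
package main

import "fmt"

// dedupWindowNs mirrors the 5-second window described in the commit message.
const dedupWindowNs = int64(5_000_000_000)

// needleRecord is a hypothetical, stripped-down view of a needle in the merged stream.
type needleRecord struct {
	ID     uint64
	Stream int   // index of the replica stream the needle came from
	TsNs   int64 // append timestamp in nanoseconds
}

type firstSeen struct {
	stream int
	tsNs   int64
}

// dedupSketch expects records already ordered by timestamp. For each needle ID
// it remembers which stream opened the current window and when.
func dedupSketch(sorted []needleRecord) []needleRecord {
	seen := make(map[uint64]firstSeen)
	var out []needleRecord
	for _, n := range sorted {
		prev, ok := seen[n.ID]
		if ok && n.TsNs-prev.tsNs <= dedupWindowNs {
			if prev.stream != n.Stream {
				continue // cross-stream duplicate of the same write: skip
			}
			out = append(out, n) // same-stream overwrite: keep
			continue
		}
		// First occurrence, or outside the window: start a new window.
		seen[n.ID] = firstSeen{stream: n.Stream, tsNs: n.TsNs}
		out = append(out, n)
	}
	return out
}

func main() {
	kept := dedupSketch([]needleRecord{
		{ID: 1, Stream: 0, TsNs: 0},
		{ID: 1, Stream: 1, TsNs: 2_000_000_000},  // replica copy with clock skew
		{ID: 1, Stream: 0, TsNs: 3_000_000_000},  // overwrite on the same stream
		{ID: 1, Stream: 1, TsNs: 10_000_000_000}, // separate update, outside window
	})
	fmt.Println(len(kept))
}
```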
Chris Lu	da4edb5fe6	Fix live volume move tail timestamp (#8440) * Improve move tail timestamp * Add move tail timestamp integration test * Simulate traffic during move	1 week ago
Chris Lu	91f59e73e5	close ports	1 week ago
Chris Lu	e596542295	Move SQL engine and PostgreSQL server to their own binaries (#8417) * Drop SQL engine and PostgreSQL server * Split SQL tooling into weed-db and weed-sql * move * fix building	1 week ago
Chris Lu	e4b70c2521	go fix	2 weeks ago
Chris Lu	f7c27cc81f	go fmt	2 weeks ago
Chris Lu	bd0b1fe9d5	S3 IAM: Added ListPolicyVersions and GetPolicyVersion support (#8395) * test(s3/iam): add managed policy CRUD lifecycle integration coverage * s3/iam: add ListPolicyVersions and GetPolicyVersion support * test(s3/iam): cover ListPolicyVersions and GetPolicyVersion	2 weeks ago
Michał Szynkiewicz	2f837c4780	Fix error on deleting non-empty bucket (#8376) * Move check for non-empty bucket deletion out of `WithFilerClient` call * Added proper checking if a bucket has "user" objects	2 weeks ago
Chris Lu	36c469e34e	Enforce IAM for S3 Tables bucket creation (#8388 ) * Enforce IAM for s3tables bucket creation * Prefer IAM path when policies exist * Ensure IAM enforcement honors default allow * address comments * Reused the precomputed principal when setting tableBucketMetadata.OwnerAccountID, avoiding the redundant getAccountID call. * get identity * fix * dedup * fix * comments * fix tests * update iam config * go fmt * fix ports * fix flags * mini clean shutdown * Revert "update iam config" This reverts commit `ca48fdbb0a`. Revert "mini clean shutdown" This reverts commit `9e17f6baff`. Revert "fix flags" This reverts commit `e9e7b29d2f`. Revert "go fmt" This reverts commit `bd3241960b`. * test/s3tables: share single weed mini per test package via TestMain Previously each top-level test function in the catalog and s3tables package started and stopped its own weed mini instance. This caused failures when a prior instance wasn't cleanly stopped before the next one started (port conflicts, leaked global state). Changes: - catalog/iceberg_catalog_test.go: introduce TestMain that starts one shared TestEnvironment (external weed binary) before all tests and tears it down after. All individual test functions now use sharedEnv. Added randomSuffix() for unique resource names across tests. - catalog/pyiceberg_test.go: updated to use sharedEnv instead of per-test environments. - catalog/pyiceberg_test_helpers.go -> pyiceberg_test_helpers_test.go: renamed to a _test.go file so it can access TestEnvironment which is defined in a test file. - table-buckets/setup.go: add package-level sharedCluster variable. - table-buckets/s3tables_integration_test.go: introduce TestMain that starts one shared TestCluster before all tests. TestS3TablesIntegration now uses sharedCluster. Extract startMiniClusterInDir (no testing.T) for TestMain use. TestS3TablesCreateBucketIAMPolicy keeps its own cluster (different IAM config). Remove miniClusterMutex (no longer needed). 
Fix Stop() to not panic when t is nil." delete * parse * default allow should work with anonymous * fix port * iceberg route The failures are from Iceberg REST using the default bucket warehouse when no prefix is provided. Your tests create random buckets, so /v1/namespaces was looking in warehouse and failing. I updated the tests to use the prefixed Iceberg routes (/v1/{bucket}/...) via a small helper. * test(s3tables): fix port conflicts and IAM ARN matching in integration tests - Pass -master.dir explicitly to prevent filer store directory collision between shared cluster and per-test clusters running in the same process - Pass -volume.port.public and -volume.publicUrl to prevent the global publicPort flag (mutated from 0 → concrete port by first cluster) from being reused by a second cluster, causing 'address already in use' - Remove the flag-reset loop in Stop() that reset global flag values while other goroutines were reading them (race → panic) - Fix IAM policy Resource ARN in TestS3TablesCreateBucketIAMPolicy to use wildcards (arn:aws:s3tables:::bucket/<name>) because the handler generates ARNs with its own DefaultRegion (us-east-1) and principal name ('admin'), not the test constants testRegion/testAccountID	2 weeks ago
Chris Lu	e9c45144cf	Implement managed policy storage (#8385) * Persist managed IAM policies * Add IAM list/get policy integration test * Faster marker lookup and cleanup * Handle delete conflict and improve listing * Add delete-in-use policy integration test * Stabilize policy ID and guard path prefix * Tighten CreatePolicy guard and reload * Add ListPolicyNames to credential store	2 weeks ago
Chris Lu	7b8df39cf7	s3api: add AttachUserPolicy/DetachUserPolicy/ListAttachedUserPolicies (#8379 ) * iam: add XML responses for managed user policy APIs * s3api: implement attach/detach/list attached user policies * s3api: add embedded IAM tests for managed user policies * iam: update CredentialStore interface and Manager for managed policies Updated the `CredentialStore` interface to include `AttachUserPolicy`, `DetachUserPolicy`, and `ListAttachedUserPolicies` methods. The `CredentialManager` was updated to delegate these calls to the store. Added common error variables for policy management. * iam: implement managed policy methods in MemoryStore Implemented `AttachUserPolicy`, `DetachUserPolicy`, and `ListAttachedUserPolicies` in the MemoryStore. Also ensured deep copying of identities includes PolicyNames. * iam: implement managed policy methods in PostgresStore Modified Postgres schema to include `policy_names` JSONB column in `users`. Implemented `AttachUserPolicy`, `DetachUserPolicy`, and `ListAttachedUserPolicies`. Updated user CRUD operations to handle policy names persistence. * iam: implement managed policy methods in remaining stores Implemented user policy management in: - `FilerEtcStore` (partial implementation) - `IamGrpcStore` (delegated via GetUser/UpdateUser) - `PropagatingCredentialStore` (to broadcast updates) Ensures cluster-wide consistency for policy attachments. * s3api: refactor EmbeddedIamApi to use managed policy APIs - Refactored `AttachUserPolicy`, `DetachUserPolicy`, and `ListAttachedUserPolicies` to use `e.credentialManager` directly. - Fixed a critical error suppression bug in `ExecuteAction` that always returned success even on failure. - Implemented robust error matching using string comparison fallbacks. - Improved consistency by reloading configuration after policy changes. * s3api: update and refine IAM integration tests - Updated tests to use a real `MemoryStore`-backed `CredentialManager`. 
- Refined test configuration synchronization using `sync.Once` and manual deep-copying to prevent state corruption. - Improved `extractEmbeddedIamErrorCodeAndMessage` to handle more XML formats robustly. - Adjusted test expectations to match current AWS IAM behavior. * fix compilation * visibility * ensure 10 policies * reload * add integration tests * Guard raft command registration * Allow IAM actions in policy tests * Validate gRPC policy attachments * Revert Validate gRPC policy attachments * Tighten gRPC policy attach/detach * Improve IAM managed policy handling * Improve managed policy filters	2 weeks ago
Michał Szynkiewicz	53048ffffb	Add md5 checksum validation support on PutObject and UploadPart (#8367) * Add md5 checksum validation support on PutObject and UploadPart Per the S3 specification, when a client sends a Content-MD5 header, the server must compare it against the MD5 of the received body and return BadDigest (HTTP 400) if they don't match. SeaweedFS was silently accepting objects with incorrect Content-MD5 headers, which breaks data integrity verification for clients that rely on this feature (e.g. boto3). The error infrastructure (ErrBadDigest, ErrMsgBadDigest) already existed from PR #7306 but was never wired to an actual check. This commit adds MD5 verification in putToFiler after the body is streamed and the MD5 is computed, and adds Content-MD5 header validation to PutObjectPartHandler (matching PutObjectHandler). Orphaned chunks are cleaned up on mismatch. Refs: https://github.com/seaweedfs/seaweedfs/discussions/3908 * handle SSE, add uploadpart test * s3 integration test: fix typo and add multipart upload checksum test * s3api: move validateContentMd5 after GetBucketAndObject in PutObjectPartHandler * s3api: move validateContentMd5 after GetBucketAndObject in PutObjectHandler * s3api: fix MD5 validation for SSE uploads and logging in putToFiler * add SSE test with checksum validation - mostly ai-generated * Update s3_integration_test.go * Address S3 integration test feedback: fix typos, rename variables, add verification steps, and clean up comments. --------- Co-authored-by: Chris Lu <chris.lu@gmail.com>	2 weeks ago
Chris Lu	8ec9ff4a12	Refactor plugin system and migrate worker runtime (#8369 ) * admin: add plugin runtime UI page and route wiring * pb: add plugin gRPC contract and generated bindings * admin/plugin: implement worker registry, runtime, monitoring, and config store * admin/dash: wire plugin runtime and expose plugin workflow APIs * command: add flags to enable plugin runtime * admin: rename remaining plugin v2 wording to plugin * admin/plugin: add detectable job type registry helper * admin/plugin: add scheduled detection and dispatch orchestration * admin/plugin: prefetch job type descriptors when workers connect * admin/plugin: add known job type discovery API and UI * admin/plugin: refresh design doc to match current implementation * admin/plugin: enforce per-worker scheduler concurrency limits * admin/plugin: use descriptor runtime defaults for scheduler policy * admin/ui: auto-load first known plugin job type on page open * admin/plugin: bootstrap persisted config from descriptor defaults * admin/plugin: dedupe scheduled proposals by dedupe key * admin/ui: add job type and state filters for plugin monitoring * admin/ui: add per-job-type plugin activity summary * admin/plugin: split descriptor read API from schema refresh * admin/ui: keep plugin summary metrics global while tables are filtered * admin/plugin: retry executor reservation before timing out * admin/plugin: expose scheduler states for monitoring * admin/ui: show per-job-type scheduler states in plugin monitor * pb/plugin: rename protobuf package to plugin * admin/plugin: rename pluginRuntime wiring to plugin * admin/plugin: remove runtime naming from plugin APIs and UI * admin/plugin: rename runtime files to plugin naming * admin/plugin: persist jobs and activities for monitor recovery * admin/plugin: lease one detector worker per job type * admin/ui: show worker load from plugin heartbeats * admin/plugin: skip stale workers for detector and executor picks * plugin/worker: add plugin worker command 
and stream runtime scaffold * plugin/worker: implement vacuum detect and execute handlers * admin/plugin: document external vacuum plugin worker starter * command: update plugin.worker help to reflect implemented flow * command/admin: drop legacy Plugin V2 label * plugin/worker: validate vacuum job type and respect min interval * plugin/worker: test no-op detect when min interval not elapsed * command/admin: document plugin.worker external process * plugin/worker: advertise configured concurrency in hello * command/plugin.worker: add jobType handler selection * command/plugin.worker: test handler selection by job type * command/plugin.worker: persist worker id in workingDir * admin/plugin: document plugin.worker jobType and workingDir flags * plugin/worker: support cancel request for in-flight work * plugin/worker: test cancel request acknowledgements * command/plugin.worker: document workingDir and jobType behavior * plugin/worker: emit executor activity events for monitor * plugin/worker: test executor activity builder * admin/plugin: send last successful run in detection request * admin/plugin: send cancel request when detect or execute context ends * admin/plugin: document worker cancel request responsibility * admin/handlers: expose plugin scheduler states API in no-auth mode * admin/handlers: test plugin scheduler states route registration * admin/plugin: keep worker id on worker-generated activity records * admin/plugin: test worker id propagation in monitor activities * admin/dash: always initialize plugin service * command/admin: remove plugin enable flags and default to enabled * admin/dash: drop pluginEnabled constructor parameter * admin/plugin UI: stop checking plugin enabled state * admin/plugin: remove docs for plugin enable flags * admin/dash: remove unused plugin enabled check method * admin/dash: fallback to in-memory plugin init when dataDir fails * admin/plugin API: expose worker gRPC port in status * command/plugin.worker: resolve admin gRPC 
port via plugin status * split plugin UI into overview/configuration/monitoring pages * Update layout_templ.go * add volume_balance plugin worker handler * wire plugin.worker CLI for volume_balance job type * add erasure_coding plugin worker handler * wire plugin.worker CLI for erasure_coding job type * support multi-job handlers in plugin worker runtime * allow plugin.worker jobType as comma-separated list * admin/plugin UI: rename to Workers and simplify config view * plugin worker: queue detection requests instead of capacity reject * Update plugin_worker.go * plugin volume_balance: remove force_move/timeout from worker config UI * plugin erasure_coding: enforce local working dir and cleanup * admin/plugin UI: rename admin settings to job scheduling * admin/plugin UI: persist and robustly render detection results * admin/plugin: record and return detection trace metadata * admin/plugin UI: show detection process and decision trace * plugin: surface detector decision trace as activities * mini: start a plugin worker by default * admin/plugin UI: split monitoring into detection and execution tabs * plugin worker: emit detection decision trace for EC and balance * admin workers UI: split monitoring into detection and execution pages * plugin scheduler: skip proposals for active assigned/running jobs * admin workers UI: add job queue tab * plugin worker: add dummy stress detector and executor job type * admin workers UI: reorder tabs to detection queue execution * admin workers UI: regenerate plugin template * plugin defaults: include dummy stress and add stress tests * plugin dummy stress: rotate detection selections across runs * plugin scheduler: remove cross-run proposal dedupe * plugin queue: track pending scheduled jobs * plugin scheduler: wait for executor capacity before dispatch * plugin scheduler: skip detection when waiting backlog is high * plugin: add disk-backed job detail API and persistence * admin ui: show plugin job detail modal from job id links * 
plugin: generate unique job ids instead of reusing proposal ids * plugin worker: emit heartbeats on work state changes * plugin registry: round-robin tied executor and detector picks * add temporary EC overnight stress runner * plugin job details: persist and render EC execution plans * ec volume details: color data and parity shard badges * shard labels: keep parity ids numeric and color-only distinction * admin: remove legacy maintenance UI routes and templates * admin: remove dead maintenance endpoint helpers * Update layout_templ.go * remove dummy_stress worker and command support * refactor plugin UI to job-type top tabs and sub-tabs * migrate weed worker command to plugin runtime * remove plugin.worker command and keep worker runtime with metrics * update helm worker args for jobType and execution flags * set plugin scheduling defaults to global 16 and per-worker 4 * stress: fix RPC context reuse and remove redundant variables in ec_stress_runner * admin/plugin: fix lifecycle races, safe channel operations, and terminal state constants * admin/dash: randomize job IDs and fix priority zero-value overwrite in plugin API * admin/handlers: implement buffered rendering to prevent response corruption * admin/plugin: implement debounced persistence flusher and optimize BuildJobDetail memory lookups * admin/plugin: fix priority overwrite and implement bounded wait in scheduler reserve * admin/plugin: implement atomic file writes and fix run record side effects * admin/plugin: use P prefix for parity shard labels in execution plans * admin/plugin: enable parallel execution for cancellation tests * admin: refactor time.Time fields to pointers for better JSON omitempty support * admin/plugin: implement pointer-safe time assignments and comparisons in plugin core * admin/plugin: fix time assignment and sorting logic in plugin monitor after pointer refactor * admin/plugin: update scheduler activity tracking to use time pointers * admin/plugin: fix time-based run history 
trimming after pointer refactor * admin/dash: fix JobSpec struct literal in plugin API after pointer refactor * admin/view: add D/P prefixes to EC shard badges for UI consistency * admin/plugin: use lifecycle-aware context for schema prefetching * Update ec_volume_details_templ.go * admin/stress: fix proposal sorting and log volume cleanup errors * stress: refine ec stress runner with math/rand and collection name - Added Collection field to VolumeEcShardsDeleteRequest for correct filename construction. - Replaced crypto/rand with seeded math/rand PRNG for bulk payloads. - Added documentation for EcMinAge zero-value behavior. - Added logging for ignored errors in volume/shard deletion. * admin: return internal server error for plugin store failures Changed error status code from 400 Bad Request to 500 Internal Server Error for failures in GetPluginJobDetail to correctly reflect server-side errors. * admin: implement safe channel sends and graceful shutdown sync - Added sync.WaitGroup to Plugin struct to manage background goroutines. - Implemented safeSendCh helper using recover() to prevent panics on closed channels. - Ensured Shutdown() waits for all background operations to complete. * admin: robustify plugin monitor with nil-safe time and record init - Standardized nil-safe assignment for time.Time pointers (CreatedAt, UpdatedAt, CompletedAt). - Ensured persistJobDetailSnapshot initializes new records correctly if they don't exist on disk. - Fixed debounced persistence to trigger immediate write on job completion. admin: improve scheduler shutdown behavior and logic guards - Replaced brittle error string matching with explicit r.shutdownCh selection for shutdown detection. - Removed redundant nil guard in buildScheduledJobSpec. - Standardized WaitGroup usage for schedulerLoop. * admin: implement deep copy for job parameters and atomic write fixes - Implemented deepCopyGenericValue and used it in cloneTrackedJob to prevent shared state. 
- Ensured atomicWriteFile creates parent directories before writing. * admin: remove unreachable branch in shard classification Removed an unreachable 'totalShards <= 0' check in classifyShardID as dataShards and parityShards are already guarded. * admin: secure UI links and use canonical shard constants - Added rel="noopener noreferrer" to external links for security. - Replaced magic number 14 with erasure_coding.TotalShardsCount. - Used renderEcShardBadge for missing shard list consistency. * admin: stabilize plugin tests and fix regressions - Composed a robust plugin_monitor_test.go to handle asynchronous persistence. - Updated all time.Time literals to use timeToPtr helper. - Added explicit Shutdown() calls in tests to synchronize with debounced writes. - Fixed syntax errors and orphaned struct literals in tests. * Potential fix for code scanning alert no. 278: Slice memory allocation with excessive size value Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * Potential fix for code scanning alert no. 283: Uncontrolled data used in path expression Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * admin: finalize refinements for error handling, scheduler, and race fixes - Standardized HTTP 500 status codes for store failures in plugin_api.go. - Tracked scheduled detection goroutines with sync.WaitGroup for safe shutdown. - Fixed race condition in safeSendDetectionComplete by extracting channel under lock. - Implemented deep copy for JobActivity details. - Used defaultDirPerm constant in atomicWriteFile. 
* test(ec): migrate admin dockertest to plugin APIs * admin/plugin_api: fix RunPluginJobTypeAPI to return 500 for server-side detection/filter errors * admin/plugin_api: fix ExecutePluginJobAPI to return 500 for job execution failures * admin/plugin_api: limit parseProtoJSONBody request body to 1MB to prevent unbounded memory usage * admin/plugin: consolidate regex to package-level validJobTypePattern; add char validation to sanitizeJobID * admin/plugin: fix racy Shutdown channel close with sync.Once * admin/plugin: track sendLoop and recv goroutines in WorkerStream with r.wg * admin/plugin: document writeProtoFiles atomicity — .pb is source of truth, .json is human-readable only * admin/plugin: extract activityLess helper to deduplicate nil-safe OccurredAt sort comparators * test/ec: check http.NewRequest errors to prevent nil req panics * test/ec: replace deprecated ioutil/math/rand, fix stale step comment 5.1→3.1 * plugin(ec): raise default detection and scheduling throughput limits * topology: include empty disks in volume list and EC capacity fallback * topology: remove hard 10-task cap for detection planning * Update ec_volume_details_templ.go * adjust default * fix tests --------- Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>	2 weeks ago
dependabot[bot]	bddd7960c1	build(deps): bump org.apache.avro:avro from 1.11.4 to 1.11.5 in /test/java/spark (#8358) build(deps): bump org.apache.avro:avro in /test/java/spark Bumps org.apache.avro:avro from 1.11.4 to 1.11.5. --- updated-dependencies: - dependency-name: org.apache.avro:avro dependency-version: 1.11.5 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>	2 weeks ago
Chris Lu	564fc56698	Update docker-compose.yml	2 weeks ago
Chris Lu	0d8588e3ae	S3: Implement IAM defaults and STS signing key fallback (#8348 ) * S3: Implement IAM defaults and STS signing key fallback logic * S3: Refactor startup order to init SSE-S3 key manager before IAM * S3: Derive STS signing key from KEK using HKDF for security isolation * S3: Document STS signing key fallback in security.toml * fix(s3api): refine anonymous access logic and secure-by-default behavior - Initialize anonymous identity by default in `NewIdentityAccessManagement` to prevent nil pointer exceptions. - Ensure `ReplaceS3ApiConfiguration` preserves the anonymous identity if not present in the new configuration. - Update `NewIdentityAccessManagement` signature to accept `filerClient`. - In legacy mode (no policy engine), anonymous defaults to Deny (no actions), preserving secure-by-default behavior. - Use specific `LookupAnonymous` method instead of generic map lookup. - Update tests to accommodate signature changes and verify improved anonymous handling. * feat(s3api): make IAM configuration optional - Start S3 API server without a configuration file if `EnableIam` option is set. - Default to `Allow` effect for policy engine when no configuration is provided (Zero-Config mode). - Handle empty configuration path gracefully in `loadIAMManagerFromConfig`. - Add integration test `iam_optional_test.go` to verify empty config behavior. * fix(iamapi): fix signature mismatch in NewIdentityAccessManagementWithStore * fix(iamapi): properly initialize FilerClient instead of passing nil * fix(iamapi): properly initialize filer client for IAM management - Instead of passing `nil`, construct a `wdclient.FilerClient` using the provided `Filers` addresses. - Ensure `NewIdentityAccessManagementWithStore` receives a valid `filerClient` to avoid potential nil pointer dereferences or limited functionality. 
* clean: remove dead code in s3api_server.go * refactor(s3api): improve IAM initialization, safety and anonymous access security * fix(s3api): ensure IAM config loads from filer after client init * fix(s3): resolve test failures in integration, CORS, and tagging tests - Fix CORS tests by providing explicit anonymous permissions config - Fix S3 integration tests by setting admin credentials in init - Align tagging test credentials in CI with IAM defaults - Added goroutine to retry IAM config load in iamapi server * fix(s3): allow anonymous access to health targets and S3 Tables when identities are present * fix(ci): use /healthz for Caddy health check in awscli tests * iam, s3api: expose DefaultAllow from IAM and Policy Engine This allows checking the global "Open by Default" configuration from other components like S3 Tables. * s3api/s3tables: support DefaultAllow in permission logic and handler Updated CheckPermissionWithContext to respect the DefaultAllow flag in PolicyContext. This enables "Open by Default" behavior for unauthenticated access in zero-config environments. Added a targeted unit test to verify the logic. * s3api/s3tables: propagate DefaultAllow through handlers Propagated the DefaultAllow flag to individual handlers for namespaces, buckets, tables, policies, and tagging. This ensures consistent "Open by Default" behavior across all S3 Tables API endpoints. * s3api: wire up DefaultAllow for S3 Tables API initialization Updated registerS3TablesRoutes to query the global IAM configuration and set the DefaultAllow flag on the S3 Tables API server. This completes the end-to-end propagation required for anonymous access in zero-config environments. Added a SetDefaultAllow method to S3TablesApiServer to facilitate this. * s3api: fix tests by adding DefaultAllow to mock IAM integrations The IAMIntegration interface was updated to include DefaultAllow(), breaking several mock implementations in tests. 
This commit fixes the build errors by adding the missing method to the mocks. * env * ensure ports * env * env * fix default allow * add one more test using non-anonymous user * debug * add more debug * less logs	2 weeks ago
Chris Lu	f44e25b422	fix(iam): ensure access key status is persisted and defaulted to Active (#8341) * Fix master leader election startup issue Fixes #error-log-leader-not-selected-yet * not useful test * fix(iam): ensure access key status is persisted and defaulted to Active * make pb * update tests * using constants	3 weeks ago
dependabot[bot]	35b6e895cc	build(deps): bump org.apache.avro:avro from 1.11.4 to 1.11.5 in /test/kafka/kafka-client-loadtest/tools (#8339) build(deps): bump org.apache.avro:avro Bumps org.apache.avro:avro from 1.11.4 to 1.11.5. --- updated-dependencies: - dependency-name: org.apache.avro:avro dependency-version: 1.11.5 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	3 weeks ago
Chris Lu	49a64f50f1	Add session policy support to IAM (#8338) * Add session policy support to IAM - Implement policy evaluation for session tokens in policy_engine.go - Add session_policy field to session claims for tracking applied policies - Update STS service to include session policies in token generation - Add IAM integration tests for session policy validation - Update IAM manager to support policy attachment to sessions - Extend S3 API STS endpoint to handle session policy restrictions * fix: optimize session policy evaluation and add documentation * sts: add NormalizeSessionPolicy helper for inline session policies * sts: support inline session policies for AssumeRoleWithWebIdentity and credential-based flows * s3api: parse and normalize Policy parameter for STS HTTP handlers * tests: add session policy unit tests and integration tests for inline policy downscoping * tests: add s3tables STS inline policy integration * iam: handle user principals and validate tokens * sts: enforce inline session policy size limit * tests: harden s3tables STS integration config * iam: clarify principal policy resolution errors * tests: improve STS integration endpoint selection	3 weeks ago
Chris Lu	beeb375a88	Add volume server integration test suite and CI workflow (#8322 ) * docs(volume_server): add integration test development plan * test(volume_server): add integration harness and profile matrix * test(volume_server/http): add admin and options integration coverage * test(volume_server/grpc): add state and status integration coverage * test(volume_server): auto-build weed binary and harden cluster startup * test(volume_server/http): add upload read range head delete coverage * test(volume_server/grpc): expand admin lifecycle and state coverage * docs(volume_server): update progress tracker for implemented tests * test(volume_server/http): cover if-none-match and invalid-range branches * test(volume_server/grpc): add batch delete integration coverage * docs(volume_server): log latest HTTP and gRPC test coverage * ci(volume_server): run volume server integration tests in github actions * test(volume_server/grpc): add needle status configure ping and leave coverage * docs(volume_server): record additional grpc coverage progress * test(volume_server/grpc): add vacuum integration coverage * docs(volume_server): record vacuum test coverage progress * test(volume_server/grpc): add read and write needle blob error-path coverage * docs(volume_server): record data rw grpc coverage progress * test(volume_server/http): add jwt auth integration coverage * test(volume_server/grpc): add sync copy and stream error-path coverage * docs(volume_server): record jwt and sync/copy test coverage * test(volume_server/grpc): add scrub and query integration coverage * test(volume_server/grpc): add volume tail sender and receiver coverage * docs(volume_server): record scrub query and tail test progress * test(volume_server/grpc): add readonly writable and collection lifecycle coverage * test(volume_server/http): add public-port cors and method parity coverage * test(volume_server/grpc): add blob meta and read-all success path coverage * test(volume_server/grpc): expand 
scrub and query variation coverage * test(volume_server/grpc): add tiering and remote fetch error-path coverage * test(volume_server/http): add unchanged write and delete edge-case coverage * test(volume_server/grpc): add ping unknown and unreachable target coverage * test(volume_server/grpc): add volume delete only-empty variation coverage * test(volume_server/http): add jwt fid-mismatch auth coverage * test(volume_server/grpc): add scrub ec auto-select empty coverage * test(volume_server/grpc): stabilize ping timestamp assertion * docs(volume_server): update integration coverage progress log * test(volume_server/grpc): add tier remote backend and config variation coverage * docs(volume_server): record tier remote variation progress * test(volume_server/grpc): add incremental copy and receive-file protocol coverage * test(volume_server/http): add read path shape and if-modified-since coverage * test(volume_server/grpc): add copy-file compaction and receive-file success coverage * test(volume_server/http): add passthrough headers and static asset coverage * test(volume_server/grpc): add ping filer unreachable coverage * docs(volume_server): record copy receive and http variant progress * test(volume_server/grpc): add erasure coding maintenance and missing-path coverage * docs(volume_server): record initial erasure coding rpc coverage * test(volume_server/http): add multi-range multipart response coverage * docs(volume_server): record multi-range http coverage progress * test(volume_server/grpc): add query empty-stripe no-match coverage * docs(volume_server): record query no-match stream behavior coverage * test(volume_server/http): add upload throttling timeout and replicate bypass coverage * docs(volume_server): record upload throttling coverage progress * test(volume_server/http): add download throttling timeout coverage * docs(volume_server): record download throttling coverage progress * test(volume_server/http): add jwt wrong-cookie fid mismatch coverage * 
docs(volume_server): record jwt wrong-cookie mismatch coverage * test(volume_server/http): add jwt expired-token rejection coverage * docs(volume_server): record jwt expired-token coverage * test(volume_server/http): add jwt query and cookie transport coverage * docs(volume_server): record jwt token transport coverage * test(volume_server/http): add jwt token-source precedence coverage * docs(volume_server): record jwt token-source precedence coverage * test(volume_server/http): add jwt header-over-cookie precedence coverage * docs(volume_server): record jwt header cookie precedence coverage * test(volume_server/http): add jwt query-over-cookie precedence coverage * docs(volume_server): record jwt query cookie precedence coverage * test(volume_server/grpc): add setstate version mismatch and nil-state coverage * docs(volume_server): record setstate validation coverage * test(volume_server/grpc): add readonly persist-true lifecycle coverage * docs(volume_server): record readonly persist variation coverage * test(volume_server/http): add options origin cors header coverage * docs(volume_server): record options origin cors coverage * test(volume_server/http): add trace unsupported-method parity coverage * docs(volume_server): record trace method parity coverage * test(volume_server/grpc): add batch delete cookie-check variation coverage * docs(volume_server): record batch delete cookie-check coverage * test(volume_server/grpc): add admin lifecycle missing and maintenance variants * docs(volume_server): record admin lifecycle edge-case coverage * test(volume_server/grpc): add mixed batch delete status matrix coverage * docs(volume_server): record mixed batch delete matrix coverage * test(volume_server/http): add jwt-profile ui access gating coverage * docs(volume_server): record jwt ui-gating http coverage * test(volume_server/http): add propfind unsupported-method parity coverage * docs(volume_server): record propfind method parity coverage * test(volume_server/grpc): 
add volume configure success and rollback-path coverage * docs(volume_server): record volume configure branch coverage * test(volume_server/grpc): add volume needle status missing-path coverage * docs(volume_server): record volume needle status error-path coverage * test(volume_server/http): add readDeleted query behavior coverage * docs(volume_server): record readDeleted http behavior coverage * test(volume_server/http): add delete ts override parity coverage * docs(volume_server): record delete ts parity coverage * test(volume_server/grpc): add invalid blob/meta offset coverage * docs(volume_server): record invalid blob/meta offset coverage * test(volume_server/grpc): add read-all mixed volume abort coverage * docs(volume_server): record read-all mixed-volume abort coverage * test(volume_server/http): assert head response body parity * docs(volume_server): record head body parity assertion * test(volume_server/grpc): assert status state and memory payload completeness * docs(volume_server): record volume server status payload coverage * test(volume_server/grpc): add batch delete chunk-manifest rejection coverage * docs(volume_server): record batch delete chunk-manifest coverage * test(volume_server/grpc): add query cookie-mismatch eof parity coverage * docs(volume_server): record query cookie-mismatch parity coverage * test(volume_server/grpc): add ping master success target coverage * docs(volume_server): record ping master success coverage * test(volume_server/http): add head if-none-match conditional parity * docs(volume_server): record head if-none-match parity coverage * test(volume_server/http): add head if-modified-since parity coverage * docs(volume_server): record head if-modified-since parity coverage * test(volume_server/http): add connect unsupported-method parity coverage * docs(volume_server): record connect method parity coverage * test(volume_server/http): assert options allow-headers cors parity * docs(volume_server): record options allow-headers 
coverage * test(volume_server/framework): add dual volume cluster integration harness * test(volume_server/http): add missing-local read mode proxy redirect local coverage * docs(volume_server): record read mode missing-local matrix coverage * test(volume_server/http): add download over-limit replica proxy fallback coverage * docs(volume_server): record download replica fallback coverage * test(volume_server/http): add missing-local readDeleted proxy redirect parity coverage * docs(volume_server): record missing-local readDeleted mode coverage * test(volume_server/framework): add single-volume cluster with filer harness * test(volume_server/grpc): add ping filer success target coverage * docs(volume_server): record ping filer success coverage * test(volume_server/http): add proxied-loop guard download timeout coverage * docs(volume_server): record proxied-loop download coverage * test(volume_server/http): add disabled upload and download limit coverage * docs(volume_server): record disabled throttling path coverage * test(volume_server/grpc): add idempotent volume server leave coverage * docs(volume_server): record leave idempotence coverage * test(volume_server/http): add redirect collection query preservation coverage * docs(volume_server): record redirect collection query coverage * test(volume_server/http): assert admin server headers on status and health * docs(volume_server): record admin server header coverage * test(volume_server/http): assert healthz request-id echo parity * docs(volume_server): record healthz request-id parity coverage * test(volume_server/http): add over-limit invalid-vid download branch coverage * docs(volume_server): record over-limit invalid-vid branch coverage * test(volume_server/http): add public-port static asset coverage * docs(volume_server): record public static endpoint coverage * test(volume_server/http): add public head method parity coverage * docs(volume_server): record public head parity coverage * 
test(volume_server/http): add throttling wait-then-proceed path coverage * docs(volume_server): record throttling wait-then-proceed coverage * test(volume_server/http): add read cookie-mismatch not-found coverage * docs(volume_server): record read cookie-mismatch coverage * test(volume_server/http): add throttling timeout-recovery coverage * docs(volume_server): record throttling timeout-recovery coverage * test(volume_server/grpc): add ec generate mount info unmount lifecycle coverage * docs(volume_server): record ec positive lifecycle coverage * test(volume_server/grpc): add ec shard read and blob delete lifecycle coverage * docs(volume_server): record ec shard read/blob delete lifecycle coverage * test(volume_server/grpc): add ec rebuild and to-volume error branch coverage * docs(volume_server): record ec rebuild and to-volume branch coverage * test(volume_server/grpc): add ec shards-to-volume success roundtrip coverage * docs(volume_server): record ec shards-to-volume success coverage * test(volume_server/grpc): add ec receive and copy-file missing-source coverage * docs(volume_server): record ec receive and copy-file coverage * test(volume_server/grpc): add ec last-shard delete cleanup coverage * docs(volume_server): record ec last-shard delete cleanup coverage * test(volume_server/grpc): add volume copy success path coverage * docs(volume_server): record volume copy success coverage * test(volume_server/grpc): add volume copy overwrite-destination coverage * docs(volume_server): record volume copy overwrite coverage * test(volume_server/http): add write error-path variant coverage * docs(volume_server): record http write error-path coverage * test(volume_server/http): add conditional header precedence coverage * docs(volume_server): record conditional header precedence coverage * test(volume_server/http): add oversized combined range guard coverage * docs(volume_server): record oversized range guard coverage * test(volume_server/http): add image resize and 
crop read coverage * docs(volume_server): record image transform coverage * test(volume_server/http): add chunk-manifest expansion and bypass coverage * docs(volume_server): record chunk-manifest read coverage * test(volume_server/http): add compressed read encoding matrix coverage * docs(volume_server): record compressed read matrix coverage * test(volume_server/grpc): add tail receiver source replication coverage * docs(volume_server): record tail receiver replication coverage * test(volume_server/grpc): add tail sender large-needle chunking coverage * docs(volume_server): record tail sender chunking coverage * test(volume_server/grpc): add ec-backed volume needle status coverage * docs(volume_server): record ec-backed needle status coverage * test(volume_server/grpc): add ec shard copy from peer success coverage * docs(volume_server): record ec shard copy success coverage * test(volume_server/http): add chunk-manifest delete child cleanup coverage * docs(volume_server): record chunk-manifest delete cleanup coverage * test(volume_server/http): add chunk-manifest delete failure-path coverage * docs(volume_server): record chunk-manifest delete failure coverage * test(volume_server/grpc): add ec shard copy source-unavailable coverage * docs(volume_server): record ec shard copy source-unavailable coverage * parallel	3 weeks ago
Chris Lu	796f23f68a	Fix STS InvalidAccessKeyId and request body consumption issues (#8328) * Fix STS InvalidAccessKeyId and request body consumption in Lakekeeper integration test * Remove debug prints * Add Lakekeeper integration tests to CI * Fix connection refused in CI by binding to 0.0.0.0 * Add timeout to docker run in Lakekeeper integration test * Update weed/s3api/auth_credentials.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	3 weeks ago
Chris Lu	25ea48227f	Fix STS temporary credentials to use ASIA prefix instead of AKIA (#8326) Temporary credentials from STS AssumeRole were using "AKIA" prefix (permanent IAM user credentials) instead of "ASIA" prefix (temporary security credentials). This violates AWS conventions and may cause compatibility issues with AWS SDKs that validate credential types. Changes: - Rename generateAccessKeyId to generateTemporaryAccessKeyId for clarity - Update function to use ASIA prefix for temporary credentials - Add unit tests to verify ASIA prefix format (weed/iam/sts/credential_prefix_test.go) - Add integration test to verify ASIA prefix in S3 API (test/s3/iam/s3_sts_credential_prefix_test.go) - Ensure AWS-compatible credential format (ASIA + 16 hex chars) The credentials are already deterministic (SHA256-based from session ID) and the SessionToken is correctly set to the JWT token, so this is just a prefix fix to follow AWS standards. Fixes #8312	3 weeks ago
Chris Lu	0082c47e04	Test: Add RisingWave DML verification test (#8317) * Test: Verify RisingWave DML operations (INSERT, UPDATE, DELETE) support * Test: Refine RisingWave DML test (remove sleeps, use polling)	3 weeks ago
Chris Lu	c1a9263e37	Fix STS AssumeRole with POST body param (#8320) * Fix STS AssumeRole with POST body param and add integration test * Add STS integration test to CI workflow * Address code review feedback: fix HPP vulnerability and style issues * Refactor: address code review feedback - Fix HTTP Parameter Pollution vulnerability in UnifiedPostHandler - Refactor permission check logic for better readability - Extract test helpers to testutil/docker.go to reduce duplication - Clean up imports and simplify context setting * Add SigV4-style test variant for AssumeRole POST body routing - Added ActionInBodyWithSigV4Style test case to validate real-world scenario - Test confirms routing works correctly for AWS SigV4-signed requests - Addresses code review feedback about testing with SigV4 signatures * Fix: always set identity in context when non-nil - Ensure UnifiedPostHandler always calls SetIdentityInContext when identity is non-nil - Only call SetIdentityNameInContext when identity.Name is non-empty - This ensures downstream handlers (embeddedIam.DoActions) always have access to identity - Addresses potential issue where empty identity.Name would skip context setting	3 weeks ago
Chris Lu	b8ef48c8f1	Add RisingWave catalog tests (#8308) * Add RisingWave catalog tests for S3 tables * Add RisingWave catalog integration tests to CI workflow * Refactor RisingWave catalog tests based on PR feedback * Address PR feedback: optimize checks, cleanup logs * fix tests * consistent	3 weeks ago
Chris Lu	7151181d54	fix flaky tests	3 weeks ago
Chris Lu	b57429ef2e	Switch empty-folder cleanup to bucket policy (#8292) * Fix Spark _temporary cleanup and add issue #8285 regression test * Generalize empty folder cleanup for Spark temp artifacts * Revert synchronous folder pruning and add cleanup diagnostics * Add actionable empty-folder cleanup diagnostics * Fix Spark temp marker cleanup in async folder cleaner * Fix Spark temp cleanup with implicit directory markers * Keep explicit directory markers non-implicit * logging * more logs * Switch empty-folder cleanup to bucket policy * Seaweed-X-Amz-Allow-Empty-Folders * less logs * go vet * less logs * refactoring	3 weeks ago
Chris Lu	17f85361e9	Remove unsupported iceberg rest signing-region from tests and docs (#8289)	3 weeks ago
Chris Lu	d6825ffce2	Iceberg: implement stage-create finalize flow (phase 1) (#8279) * iceberg: implement stage-create and create-on-commit finalize * iceberg: add create validation error typing and stage-create integration test * tests: merge stage-create integration check into catalog suite * tests: cover stage-create finalize lifecycle in catalog integration * iceberg: persist and cleanup stage-create markers * iceberg: add stage-create rollout flag and marker pruning * docs: add stage-create support design and rollout plan * docs: drop stage-create design draft from PR * iceberg: use conservative 72h stage-marker retention * iceberg: address review comments on create-on-commit and tests * iceberg: keep stage-create metadata out of table location * refactor(iceberg): split iceberg.go into focused files	3 weeks ago
Chris Lu	458c12fb99	test: add Spark S3 integration regression for issue #8234 (#8249) * test: add Spark S3 regression integration test for issue 8234 * Update test/s3/spark/setup_test.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	4 weeks ago
Chris Lu	5a0204310c	Add Iceberg admin UI (#8246 ) * Add Iceberg table details view * Enhance Iceberg catalog browsing UI * Fix Iceberg UI security and logic issues - Fix selectSchema() and partitionFieldsFromFullMetadata() to always search for matching IDs instead of checking != 0 - Fix snapshotsFromFullMetadata() to defensive-copy before sorting to prevent mutating caller's slice - Fix XSS vulnerabilities in s3tables.js: replace innerHTML with textContent/createElement for user-controlled data - Fix deleteIcebergTable() to redirect to namespace tables list on details page instead of reloading - Fix data-bs-target in iceberg_namespaces.templ: remove templ.SafeURL for CSS selector - Add catalogName to delete modal data attributes for proper redirect - Remove unused hidden inputs from create table form (icebergTableBucketArn, icebergTableNamespace) * Regenerate templ files for Iceberg UI updates * Support complex Iceberg type objects in schema Change Type field from string to json.RawMessage in both IcebergSchemaFieldInfo and internal icebergSchemaField to properly handle Iceberg spec's complex type objects (e.g. {"type": "struct", "fields": [...]}). Currently test data only shows primitive string types, but this change makes the implementation defensively robust for future complex types by preserving the exact JSON representation. Add typeToString() helper and update schema extraction functions to marshal string types as JSON. Update template to convert json.RawMessage to string for display. 
* Regenerate templ files for Type field changes * templ * Fix additional Iceberg UI issues from code review - Fix lazy-load flag that was set before async operation completed, preventing retries on error; now sets loaded flag only after successful load and throws error to caller for proper error handling and UI updates - Add zero-time guards for CreatedAt and ModifiedAt fields in table details to avoid displaying Go zero-time values; render dash when time is zero - Add URL path escaping for all catalog/namespace/table names in URLs to prevent malformed URLs when names contain special characters like /, ?, or # - Remove redundant innerHTML clear in loadIcebergNamespaceTables that cleared twice before appending the table list - Fix selectSnapshotForMetrics to remove != 0 guard for consistency with selectSchema fix; now always searches for CurrentSnapshotID without zero-value gate - Enhance typeToString() helper to display '(complex)' for non-primitive JSON types * Regenerate templ files for Phase 3 updates * Fix template generation to use correct file paths Run templ generate from repo root instead of weed/admin directory to ensure generated _templ.go files have correct absolute paths in error messages (e.g., 'weed/admin/view/app/iceberg_table_details.templ' instead of 'app/iceberg_table_details.templ'). This ensures both 'make admin-generate' at repo root and 'make generate' in weed/admin directory produce identical output with consistent file path references. 
* Regenerate template files with correct path references * Validate S3 Tables names in UI - Add client-side validation for table bucket and namespace names to surface errors for invalid characters (dots/underscores) before submission - Use HTML validity messages with reportValidity for immediate feedback - Update namespace helper text to reflect actual constraints (single-level, lowercase letters, numbers, and underscores) * Regenerate templ files for namespace helper text * Fix Iceberg catalog REST link and actions * Disallow S3 object access on table buckets * Validate Iceberg layout for table bucket objects * Fix REST API link to /v1/config * merge iceberg page with table bucket page * Allowed Trino/Iceberg stats files in metadata validation * fixes - Backend/data handling: - Normalized Iceberg type display and fallback handling in weed/admin/dash/s3tables_management.go. - Fixed snapshot fallback pointer semantics in weed/admin/dash/s3tables_management.go. - Added CSRF token generation/propagation/validation for namespace create/delete in: - weed/admin/dash/csrf.go - weed/admin/dash/auth_middleware.go - weed/admin/dash/middleware.go - weed/admin/dash/s3tables_management.go - weed/admin/view/layout/layout.templ - weed/admin/static/js/s3tables.js - UI/template fixes: - Zero-time guards for CreatedAt fields in: - weed/admin/view/app/iceberg_namespaces.templ - weed/admin/view/app/iceberg_tables.templ - Fixed invalid templ-in-script interpolation and host/port rendering in: - weed/admin/view/app/iceberg_catalog.templ - weed/admin/view/app/s3tables_buckets.templ - Added data-catalog-name consistency on Iceberg delete action in weed/admin/view/app/iceberg_tables.templ. - Updated retry wording in weed/admin/static/js/s3tables.js. - Regenerated all affected _templ.go files. - S3 API/comment follow-ups: - Reused cached table-bucket validator in weed/s3api/bucket_paths.go. - Added validation-failure debug logging in weed/s3api/s3api_object_handlers_tagging.go. 
- Added multipart path-validation design comment in weed/s3api/s3api_object_handlers_multipart.go. - Build tooling: - Fixed templ generate working directory issues in weed/admin/Makefile (watch + pattern rule). * populate data * test/s3tables: harden populate service checks * admin: skip table buckets in object-store bucket list * admin sidebar: move object store to top-level links * admin iceberg catalog: guard zero times and escape links * admin forms: add csrf/error handling and client-side name validation * admin s3tables: fix namespace delete modal redeclaration * admin: replace native confirm dialogs with modal helpers * admin modal-alerts: remove noisy confirm usage console log * reduce logs * test/s3tables: use partitioned tables in trino and spark populate * admin file browser: normalize filer ServerAddress for HTTP parsing	4 weeks ago
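The "defensive-copy before sorting" fix mentioned for snapshotsFromFullMetadata() in 5a0204310c follows a common Go pattern, sketched below. The snapshot type, its fields, the helper name sortedSnapshots, and the newest-first ordering are hypothetical stand-ins; the admin UI's real types are not shown in the commit message.

```go
package main

import (
	"fmt"
	"sort"
)

// snapshot is a hypothetical stand-in for an Iceberg snapshot summary.
type snapshot struct {
	ID          int64
	TimestampMs int64
}

// sortedSnapshots returns the snapshots ordered newest first without
// reordering the caller's slice: it sorts a copy instead of the input.
func sortedSnapshots(in []snapshot) []snapshot {
	out := make([]snapshot, len(in))
	copy(out, in)
	sort.Slice(out, func(i, j int) bool {
		return out[i].TimestampMs > out[j].TimestampMs
	})
	return out
}

func main() {
	original := []snapshot{{ID: 1, TimestampMs: 100}, {ID: 2, TimestampMs: 300}, {ID: 3, TimestampMs: 200}}
	sorted := sortedSnapshots(original)
	fmt.Println(sorted)   // newest first
	fmt.Println(original) // caller's order is untouched
}
```

Sorting in place with sort.Slice would silently reorder metadata that other code still holds a reference to, which is the mutation the commit message says it prevents.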
Chris Lu	403592bb9f	Add Spark Iceberg catalog integration tests and CI support (#8242 ) * Add Spark Iceberg catalog integration tests and CI support Implement comprehensive integration tests for Spark with SeaweedFS Iceberg REST catalog: - Basic CRUD operations (Create, Read, Update, Delete) on Iceberg tables - Namespace (database) management - Data insertion, querying, and deletion - Time travel capabilities via snapshot versioning - Compatible with SeaweedFS S3 and Iceberg REST endpoints Tests mirror the structure of existing Trino integration tests but use Spark's Python SQL API and PySpark for testing. Add GitHub Actions CI job for spark-iceberg-catalog-tests in s3-tables-tests.yml to automatically run Spark integration tests on pull requests. * fmt * Fix Spark integration tests - code review feedback * go mod tidy * Add go mod tidy step to integration test jobs Add 'go mod tidy' step before test runs for all integration test jobs: - s3-tables-tests - iceberg-catalog-tests - trino-iceberg-catalog-tests - spark-iceberg-catalog-tests This ensures dependencies are clean before running tests. 
* Fix remaining Spark operations test issues Address final code review comments: Setup & Initialization: - Add waitForSparkReady() helper function that polls Spark readiness with backoff instead of hardcoded 10-second sleep - Extract setupSparkTestEnv() helper to reduce boilerplate duplication between TestSparkCatalogBasicOperations and TestSparkTimeTravel - Both tests now use helpers for consistent, reliable setup Assertions & Validation: - Make setup-critical operations (namespace, table creation, initial insert) use t.Fatalf instead of t.Errorf to fail fast - Validate setupSQL output in TestSparkTimeTravel and fail if not 'Setup complete' - Add validation after second INSERT in TestSparkTimeTravel: verify row count increased to 2 before time travel test - Add context to error messages with namespace and tableName params Code Quality: - Remove code duplication between test functions - All critical paths now properly validated - Consistent error handling throughout * Fix go vet errors in S3 Tables tests Fixes: 1. setup_test.go (Spark): - Add missing import: github.com/testcontainers/testcontainers-go/wait - Use wait.ForLog instead of undefined testcontainers.NewLogStrategy - Remove unused strings import 2. trino_catalog_test.go: - Use net.JoinHostPort instead of fmt.Sprintf for address formatting - Properly handles IPv6 addresses by wrapping them in brackets * Use weed mini for simpler SeaweedFS startup Replace complex multi-process startup (master, volume, filer, s3) with single 'weed mini' command that starts all services together. 
Benefits: - Simpler, more reliable startup - Single weed mini process vs 4 separate processes - Automatic coordination between components - Better port management with no manual coordination Changes: - Remove separate master, volume, filer process startup - Use weed mini with -master.port, -filer.port, -s3.port flags - Keep Iceberg REST as separate service (still needed) - Increase timeout to 15s for port readiness (weed mini startup) - Remove volumePort and filerProcess fields from TestEnvironment - Simplify cleanup to only handle two processes (mini, iceberg rest) * Clean up dead code and temp directory leaks Fixes: 1. Remove dead s3Process field and cleanup: - weed mini bundles S3 gateway, no separate process needed - Removed s3Process field from TestEnvironment - Removed unnecessary s3Process cleanup code 2. Fix temp config directory leak: - Add sparkConfigDir field to TestEnvironment - Store returned configDir in writeSparkConfig - Clean up sparkConfigDir in Cleanup() with os.RemoveAll - Prevents accumulation of temp directories in test runs 3. Simplify Cleanup: - Now handles only necessary processes (weed mini, iceberg rest) - Removes both seaweedfsDataDir and sparkConfigDir - Cleaner shutdown sequence * Use weed mini's built-in Iceberg REST and fix python binary Changes: - Add -s3.port.iceberg flag to weed mini for built-in Iceberg REST Catalog - Remove separate 'weed server' process for Iceberg REST - Remove icebergRestProcess field from TestEnvironment - Simplify Cleanup() to only manage weed mini + Spark - Add port readiness check for iceberg REST from weed mini - Set Spark container Cmd to '/bin/sh -c sleep 3600' to keep it running - Change python to python3 in container.Exec calls This simplifies to truly one all-in-one weed mini process (master, filer, s3, iceberg-rest) plus just the Spark container. 
* go fmt * clean up * bind on a non-loopback IP for container access, aligned Iceberg metadata saves/locations with table locations, and reworked Spark time travel to use TIMESTAMP AS OF with safe timestamp extraction. * shared mini start * Fixed internal directory creation under /buckets so .objects paths can auto-create without failing bucket-name validation, which restores table bucket object writes * fix path Updated table bucket objects to write under `/buckets/<bucket>` and saved Iceberg metadata there, adjusting Spark time-travel timestamp to committed_at +1s. Rebuilt the weed binary (`go install ./weed`) and confirmed passing tests for Spark and Trino with focused test commands. * Updated table bucket creation to stop creating /buckets/.objects and switched Trino REST warehouse to s3://<bucket> to match Iceberg layout. * Stabilize S3Tables integration tests * Fix timestamp extraction and remove dead code in bucketDir * Use table bucket as warehouse in s3tables tests * Update trino_blog_operations_test.go * adds the CASCADE option to handle any remaining table metadata/files in the schema directory * skip namespace not empty	4 weeks ago
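The go vet fix in 403592bb9f that swaps fmt.Sprintf for net.JoinHostPort matters mainly for IPv6 hosts. The sketch below contrasts the two; the helper names naiveAddr and formatAddr are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// naiveAddr joins host and port with a plain colon. For an IPv6 literal the
// result is ambiguous because the host itself contains colons.
func naiveAddr(host string, port int) string {
	return fmt.Sprintf("%s:%d", host, port)
}

// formatAddr uses net.JoinHostPort, which wraps IPv6 literals in brackets.
func formatAddr(host string, port int) string {
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	fmt.Println(naiveAddr("::1", 8080))        // ::1:8080
	fmt.Println(formatAddr("::1", 8080))       // [::1]:8080
	fmt.Println(formatAddr("127.0.0.1", 8080)) // 127.0.0.1:8080
}
```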
Chris Lu	e6ee293c17	Add table operations test (#8241 ) * Add Trino blog operations test * Update test/s3tables/catalog_trino/trino_blog_operations_test.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * feat: add table bucket path helpers and filer operations - Add table object root and table location mapping directories - Implement ensureDirectory, upsertFile, deleteEntryIfExists helpers - Support table location bucket mapping for S3 access * feat: manage table bucket object roots on creation/deletion - Create .objects directory for table buckets on creation - Clean up table object bucket paths on deletion - Enable S3 operations on table bucket object roots * feat: add table location mapping for Iceberg REST - Track table location bucket mappings when tables are created/updated/deleted - Enable location-based routing for S3 operations on table data * feat: route S3 operations to table bucket object roots - Route table-s3 bucket names to mapped table paths - Route table buckets to object root directories - Support table location bucket mapping lookup * feat: emit table-s3 locations from Iceberg REST - Generate unique table-s3 bucket names with UUID suffix - Store table metadata under table bucket paths - Return table-s3 locations for Trino compatibility * fix: handle missing directories in S3 list operations - Propagate ErrNotFound from ListEntries for non-existent directories - Treat missing directories as empty results for list operations - Fixes Trino non-empty location checks on table creation * test: improve Trino CSV parsing for single-value results - Sanitize Trino output to skip jline warnings - Handle single-value CSV results without header rows - Strip quotes from numeric values in tests * refactor: use bucket path helpers throughout S3 API - Replace direct bucket path operations with helper functions - Leverage centralized table bucket routing logic - Improve maintainability with consistent path 
resolution * fix: add table bucket cache and improve filer error handling - Cache table bucket lookups to reduce filer overhead on repeated checks - Use filer_pb.CreateEntry and filer_pb.UpdateEntry helpers to check resp.Error - Fix delete order in handler_bucket_get_list_delete: delete table object before directory - Make location mapping errors best-effort: log and continue, don't fail API - Update table location mappings to delete stale prior bucket mappings on update - Add 1-second sleep before timestamp time travel query to ensure timestamps are in past - Fix CSV parsing: examine all lines, not skip first; handle single-value rows * fix: properly handle stale metadata location mapping cleanup - Capture oldMetadataLocation before mutation in handleUpdateTable - Update updateTableLocationMapping to accept both old and new locations - Use passed-in oldMetadataLocation to detect location changes - Delete stale mapping only when location actually changes - Pass empty string for oldLocation in handleCreateTable (new tables have no prior mapping) - Improve logging to show old -> new location transitions * refactor: cleanup imports and cache design - Remove unused 'sync' import from bucket_paths.go - Use filer_pb.UpdateEntry helper in setExtendedAttribute and deleteExtendedAttribute for consistent error handling - Add dedicated tableBucketCache map[string]bool to BucketRegistry instead of mixing concerns with metadataCache - Improve cache separation: table buckets cache is now separate from bucket metadata cache * fix: improve cache invalidation and add transient error handling Cache invalidation (critical fix): - Add tableLocationCache to BucketRegistry for location mapping lookups - Clear tableBucketCache and tableLocationCache in RemoveBucketMetadata - Prevents stale cache entries when buckets are deleted/recreated Transient error handling: - Only cache table bucket lookups when conclusive (found or ErrNotFound) - Skip caching on transient errors (network, 
permission, etc) - Prevents marking real table buckets as non-table due to transient failures Performance optimization: - Cache tableLocationDir results to avoid repeated filer RPCs on hot paths - tableLocationDir now checks cache before making expensive filer lookups - Cache stores empty string for 'not found' to avoid redundant lookups Code clarity: - Add comment to deleteDirectory explaining DeleteEntry response lacks Error field * go fmt * fix: mirror transient error handling in tableLocationDir and optimize bucketDir Transient error handling: - tableLocationDir now only caches definitive results - Mirrors isTableBucket behavior to prevent treating transient errors as permanent misses - Improves reliability on flaky systems or during recovery Performance optimization: - bucketDir avoids redundant isTableBucket call via bucketRoot - Directly use s3a.option.BucketsPath for regular buckets - Saves one cache lookup for every non-table bucket operation * fix: revert bucketDir optimization to preserve bucketRoot logic The optimization to directly use BucketsPath bypassed bucketRoot's logic and caused issues with S3 list operations on delimiter+prefix cases. Revert to using path.Join(s3a.bucketRoot(bucket), bucket) which properly handles all bucket types and ensures consistent path resolution across the codebase. The slight performance cost of an extra cache lookup is worth the correctness and consistency benefits. * feat: move table buckets under /buckets Add a table-bucket marker attribute, reuse bucket metadata cache for table bucket detection, and update list/validation/UI/test paths to treat table buckets as /buckets entries. 
* Fix S3 Tables code review issues - handler_bucket_create.go: Fix bucket existence check to properly validate entryResp.Entry before setting s3BucketExists flag (nil Entry should not indicate existing bucket) - bucket_paths.go: Add clarifying comment to bucketRoot() explaining unified buckets root path for all bucket types - file_browser_data.go: Optimize by extracting table bucket check early to avoid redundant WithFilerClient call * Fix list prefix delimiter handling * Handle list errors conservatively * Fix Trino FOR TIMESTAMP query - use past timestamp Iceberg requires the timestamp to be strictly in the past. Use current_timestamp - interval '1' second instead of current_timestamp. --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	4 weeks ago
Chris Lu	c284e51d20	fix: multipart upload ETag calculation (#8238 ) * fix multipart etag * address comments * clean up * clean up * optimization * address comments * unquoted etag * dedup * upgrade * clean * etag * return quoted tag * quoted etag * debug * s3api: unify ETag retrieval and quoting across handlers Refactor newListEntry to take S3ApiServer and use getObjectETag, and update setResponseHeaders to use the same logic. This ensures consistent ETags are returned for both listing and direct access. s3api: implement ListObjects deduplication for versioned buckets Handle duplicate entries between the main path and the .versions directory by prioritizing the latest version when bucket versioning is enabled. * s3api: cleanup stale main file entries during versioned uploads Add explicit deletion of pre-existing "main" files when creating new versions in versioned buckets. This prevents stale entries from appearing in bucket listings and ensures consistency. * s3api: fix cleanup code placement in versioned uploads Correct the placement of rm calls in completeMultipartUpload and putVersionedObject to ensure stale main files are properly deleted during versioned uploads. * s3api: improve getObjectETag fallback for empty ExtETagKey Ensure that when ExtETagKey exists but contains an empty value, the function falls through to MD5/chunk-based calculation instead of returning an empty string. * s3api: fix test files for new newListEntry signature Update test files to use the new newListEntry signature where the first parameter is S3ApiServer. Created mockS3ApiServer to properly test owner display name lookup functionality. s3api: use filer.ETag for consistent Md5 handling in getEtagFromEntry Change getEtagFromEntry fallback to use filer.ETag(entry) instead of filer.ETagChunks to ensure legacy entries with Attributes.Md5 are handled consistently with the rest of the codebase. 
* s3api: optimize list logic and fix conditional header logging - Hoist bucket versioning check out of per-entry callback to avoid repeated getVersioningState calls - Extract appendOrDedup helper function to eliminate duplicate dedup/append logic across multiple code paths - Change If-Match mismatch logging from glog.Errorf to glog.V(3).Infof and remove DEBUG prefix for consistency * s3api: fix test mock to properly initialize IAM accounts Fixed nil pointer dereference in TestNewListEntryOwnerDisplayName by directly initializing the IdentityAccessManagement.accounts map in the test setup. This ensures newListEntry can properly look up account display names without panicking. * cleanup * s3api: remove premature main file cleanup in versioned uploads Removed incorrect cleanup logic that was deleting main files during versioned uploads. This was causing test failures because it deleted objects that should have been preserved as null versions when versioning was first enabled. The deduplication logic in listing is sufficient to handle duplicate entries without deleting files during upload. * s3api: add empty-value guard to getEtagFromEntry Added the same empty-value guard used in getObjectETag to prevent returning quoted empty strings. When ExtETagKey exists but is empty, the function now falls through to filer.ETag calculation instead of returning "". * s3api: fix listing of directory key objects with matching prefix Revert prefix handling logic to use strings.TrimPrefix instead of checking HasPrefix with empty string result. This ensures that when a directory key object exactly matches the prefix (e.g. prefix="dir/", object="dir/"), it is correctly handled as a regular entry instead of being skipped or incorrectly processed as a common prefix. Also fixed missing variable definition. * s3api: refactor list inline dedup to use appendOrDedup helper Refactored the inline deduplication logic in listFilerEntries to use the shared appendOrDedup helper function. 
This ensures consistent behavior and reduces code duplication. * test: fix port allocation race in s3tables integration test Updated startMiniCluster to find all required ports simultaneously using findAvailablePorts instead of sequentially. This prevents race conditions where the OS reallocates a port that was just released, causing multiple services (e.g. Filer and Volume) to be assigned the same port and fail to start.	4 weeks ago
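Note on the port allocation fix in c284e51d20: the message describes reserving all required ports together rather than one at a time. A minimal sketch of that idea follows. It borrows the name findAvailablePorts from the commit message, but the body is an assumption for illustration, not the code from the SeaweedFS test suite. Every listener stays open until all ports have been picked, so the OS cannot hand the same port out twice within one call.

```go
package main

import (
	"fmt"
	"net"
)

// findAvailablePorts is an illustrative sketch (not the SeaweedFS helper).
// It asks the OS for n free TCP ports and keeps every listener open until
// all n ports are known, so no port can be returned twice in one call.
func findAvailablePorts(n int) ([]int, error) {
	listeners := make([]net.Listener, 0, n)
	// Release all ports only after the full set has been collected.
	defer func() {
		for _, l := range listeners {
			l.Close()
		}
	}()

	ports := make([]int, 0, n)
	for i := 0; i < n; i++ {
		// Port 0 lets the OS choose an unused port.
		l, err := net.Listen("tcp", "127.0.0.1:0")
		if err != nil {
			return nil, err
		}
		listeners = append(listeners, l)
		ports = append(ports, l.Addr().(*net.TCPAddr).Port)
	}
	return ports, nil
}

func main() {
	ports, err := findAvailablePorts(3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("reserved ports:", ports)
}
```

A window still remains between closing the listeners and the services binding the ports, so this narrows the race described in the message rather than removing it.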
Chris Lu	2163570d16	test: add CRUD tests for S3 Tables Catalog Trino integration (#8236 ) * test: add comprehensive CRUD tests for S3 Tables Catalog Trino integration - Add TestNamespaceCRUD: Tests complete Create-Read-Update-Delete lifecycle for namespaces - Add TestNamespaceListingPagination: Tests listing multiple namespaces with verification - Add TestNamespaceErrorHandling: Tests error handling for edge cases (IF EXISTS, IF NOT EXISTS) - Add TestSchemaIntegrationWithCatalog: Tests integration between Trino SQL and Iceberg REST Catalog All tests pass successfully and use Trino SQL interface for practical integration testing. Tests properly skip when Docker is unavailable. Use randomized namespace names to avoid conflicts in parallel execution. The tests provide comprehensive coverage of namespace/schema CRUD operations which form the foundation of the Iceberg catalog integration with Trino. * test: Address code review feedback for S3 Tables Catalog Trino CRUD tests - Extract common test setup into setupTrinoTest() helper function - Replace all fmt.Printf calls with idiomatic t.Logf - Change namespace deletion verification from t.Logf to t.Errorf for proper test failures - Enhance TestNamespaceErrorHandling with persistence verification test - Remove unnecessary fmt import - Improve test documentation with clarifying comments * test: Fix schema naming and remove verbose output logging - Fix TestNamespaceListingPagination schema name generation: use fmt.Sprintf instead of string(rune()) - Remove verbose logging of SHOW SCHEMAS output to reduce noise in test logs - Keep high-level operation logging while removing detailed result output	4 weeks ago
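Note on the schema naming fix in 2163570d16: the message replaces string(rune()) with fmt.Sprintf. The sketch below shows why that matters in Go. The function names and the "ns_" prefix are hypothetical and are not taken from the test file.

```go
package main

import "fmt"

// schemaNameRune reproduces the pitfall: rune(i) is a Unicode code point,
// so a small integer becomes a control character, not a decimal digit.
func schemaNameRune(prefix string, i int) string {
	return prefix + string(rune(i))
}

// schemaNameSprintf formats the index as decimal text, which is what a
// numbered schema name needs.
func schemaNameSprintf(prefix string, i int) string {
	return fmt.Sprintf("%s%d", prefix, i)
}

func main() {
	// %q makes any non-printable characters visible.
	fmt.Printf("%q\n", schemaNameRune("ns_", 1))
	fmt.Printf("%q\n", schemaNameSprintf("ns_", 1))
}
```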
Chris Lu	a3b83f8808	test: add Trino Iceberg catalog integration test (#8228 ) * test: add Trino Iceberg catalog integration test - Create test/s3/catalog_trino/trino_catalog_test.go with TestTrinoIcebergCatalog - Tests integration between Trino SQL engine and SeaweedFS Iceberg REST catalog - Starts weed mini with all services and Trino in Docker container - Validates Iceberg catalog schema creation and listing operations - Uses native S3 filesystem support in Trino with path-style access - Add workflow job to s3-tables-tests.yml for CI execution * fix: preserve AWS environment credentials when replacing S3 configuration When S3 configuration is loaded from filer/db, it replaces the identities list and inadvertently removes AWS_ACCESS_KEY_ID credentials that were added from environment variables. This caused auth to remain disabled even though valid credentials were present. Fix by preserving environment-based identities when replacing the configuration and re-adding them after the replacement. This ensures environment credentials persist across configuration reloads and properly enable authentication. * fix: use correct ServerAddress format with gRPC port encoding The admin server couldn't connect to master because the master address was missing the gRPC port information. Use pb.NewServerAddress() which properly encodes both HTTP and gRPC ports in the address string. Changes: - weed/command/mini.go: Use pb.NewServerAddress for master address in admin - test/s3/policy/policy_test.go: Store and use gRPC ports for master/filer addresses This fix applies to: 1. Admin server connection to master (mini.go) 2. Test shell commands that need master/filer addresses (policy_test.go) * move * move * fix: always include gRPC port in server address encoding The NewServerAddress() function was omitting the gRPC port from the address string when it matched the port+10000 convention. 
However, gRPC port allocation doesn't always follow this convention - when the calculated port is busy, an alternative port is allocated. This caused a bug where: 1. Master's gRPC port was allocated as 50661 (sequential, not port+10000) 2. Address was encoded as '192.168.1.66:50660' (gRPC port omitted) 3. Admin client called ToGrpcAddress() which assumed port+10000 offset 4. Admin tried to connect to 60660 but master was on 50661 → connection failed Fix: Always include explicit gRPC port in address format (host:httpPort.grpcPort) unless gRPC port is 0. This makes addresses unambiguous and works regardless of the port allocation strategy used. Impacts: All server-to-server gRPC connections now use properly formatted addresses. * test: fix Iceberg REST API readiness check The Iceberg REST API endpoints require authentication. When checked without credentials, the API returns 403 Forbidden (not 401 Unauthorized). The readiness check now accepts both auth error codes (401/403) as indicators that the service is up and ready, it just needs credentials. This fixes the 'Iceberg REST API did not become ready' test failure. * Fix AWS SigV4 signature verification for base64-encoded payload hashes AWS SigV4 canonical requests must use hex-encoded SHA256 hashes, but the X-Amz-Content-Sha256 header may be transmitted as base64. Changes: - Added normalizePayloadHash() function to convert base64 to hex - Call normalizePayloadHash() in extractV4AuthInfoFromHeader() - Added encoding/base64 import Fixes 403 Forbidden errors on POST requests to Iceberg REST API when clients send base64-encoded content hashes in the header. Impacted services: Iceberg REST API, S3Tables * Fix AWS SigV4 signature verification for base64-encoded payload hashes AWS SigV4 canonical requests must use hex-encoded SHA256 hashes, but the X-Amz-Content-Sha256 header may be transmitted as base64. 
Changes: - Added normalizePayloadHash() function to convert base64 to hex - Call normalizePayloadHash() in extractV4AuthInfoFromHeader() - Added encoding/base64 import - Removed unused fmt import Fixes 403 Forbidden errors on POST requests to Iceberg REST API when clients send base64-encoded content hashes in the header. Impacted services: Iceberg REST API, S3Tables * pass sigv4 * s3api: fix identity preservation and logging levels - Ensure environment-based identities are preserved during config replacement - Update accessKeyIdent and nameToIdentity maps correctly - Downgrade informational logs to V(2) to reduce noise * test: fix trino integration test and s3 policy test - Pin Trino image version to 479 - Fix port binding to 0.0.0.0 for Docker connectivity - Fix S3 policy test hang by correctly assigning MiniClusterCtx - Improve port finding robustness in policy tests * ci: pre-pull trino image to avoid timeouts - Pull trinodb/trino:479 after Docker setup - Ensure image is ready before integration tests start * iceberg: remove unused checkAuth and improve logging - Remove unused checkAuth method - Downgrade informational logs to V(2) - Ensure loggingMiddleware uses a status writer for accurate reported codes - Narrow catch-all route to avoid interfering with other subsystems * iceberg: fix build failure by removing unused s3api import * Update iceberg.go * use warehouse * Update trino_catalog_test.go	4 weeks ago
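Note on the server address fix in a3b83f8808: the message describes the format host:httpPort.grpcPort and the port+10000 fallback that caused the failed connection. The sketch below illustrates that encoding with the numbers from the message. encodeServerAddress and grpcAddress are hypothetical names for illustration; the real pb.NewServerAddress and ToGrpcAddress may differ in signature and behavior.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// encodeServerAddress (hypothetical helper) always writes the gRPC port
// explicitly as host:httpPort.grpcPort, unless the gRPC port is 0.
func encodeServerAddress(host string, httpPort, grpcPort int) string {
	if grpcPort == 0 {
		return fmt.Sprintf("%s:%d", host, httpPort)
	}
	return fmt.Sprintf("%s:%d.%d", host, httpPort, grpcPort)
}

// grpcAddress (hypothetical helper) returns host:grpcPort. When the gRPC
// port is not encoded it falls back to the httpPort+10000 convention, which
// is the assumption that broke in the scenario from the commit message.
func grpcAddress(addr string) (string, error) {
	i := strings.LastIndex(addr, ":")
	if i < 0 {
		return "", fmt.Errorf("missing port in %q", addr)
	}
	host, ports := addr[:i], addr[i+1:]
	httpPart, grpcPart, explicit := strings.Cut(ports, ".")
	if explicit {
		if _, err := strconv.Atoi(grpcPart); err != nil {
			return "", fmt.Errorf("bad gRPC port in %q: %w", addr, err)
		}
		return host + ":" + grpcPart, nil
	}
	httpPort, err := strconv.Atoi(httpPart)
	if err != nil {
		return "", fmt.Errorf("bad HTTP port in %q: %w", addr, err)
	}
	return fmt.Sprintf("%s:%d", host, httpPort+10000), nil
}

func main() {
	// gRPC port 50661 was allocated sequentially, not as 50660+10000.
	addr := encodeServerAddress("192.168.1.66", 50660, 50661)
	explicit, _ := grpcAddress(addr)
	guessed, _ := grpcAddress("192.168.1.66:50660")
	fmt.Println(addr, explicit, guessed)
}
```

With the port encoded, the client dials 50661; without it, the fallback guesses 60660, matching the failure described in the message.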
Chris Lu	833bcde9f3	test: add Trino Iceberg catalog integration test - Create test/s3/catalog_trino/trino_catalog_test.go with TestTrinoIcebergCatalog - Tests integration between Trino SQL engine and SeaweedFS Iceberg REST catalog - Starts weed mini with all services and Trino in Docker container - Validates Iceberg catalog schema creation and listing operations - Uses native S3 filesystem support in Trino with path-style access - Add workflow job to s3-tables-tests.yml for CI execution	4 weeks ago
Chris Lu	e39a4c2041	fix flaky test	4 weeks ago

221 Commits (master)