seaweedfs

Commit Graph

Author	SHA1	Message	Date
Chris Lu	35ad7d08a5	remove debug	3 days ago
Chris Lu	0d8588e3ae	S3: Implement IAM defaults and STS signing key fallback (#8348 ) * S3: Implement IAM defaults and STS signing key fallback logic * S3: Refactor startup order to init SSE-S3 key manager before IAM * S3: Derive STS signing key from KEK using HKDF for security isolation * S3: Document STS signing key fallback in security.toml * fix(s3api): refine anonymous access logic and secure-by-default behavior - Initialize anonymous identity by default in `NewIdentityAccessManagement` to prevent nil pointer exceptions. - Ensure `ReplaceS3ApiConfiguration` preserves the anonymous identity if not present in the new configuration. - Update `NewIdentityAccessManagement` signature to accept `filerClient`. - In legacy mode (no policy engine), anonymous defaults to Deny (no actions), preserving secure-by-default behavior. - Use specific `LookupAnonymous` method instead of generic map lookup. - Update tests to accommodate signature changes and verify improved anonymous handling. * feat(s3api): make IAM configuration optional - Start S3 API server without a configuration file if `EnableIam` option is set. - Default to `Allow` effect for policy engine when no configuration is provided (Zero-Config mode). - Handle empty configuration path gracefully in `loadIAMManagerFromConfig`. - Add integration test `iam_optional_test.go` to verify empty config behavior. * fix(iamapi): fix signature mismatch in NewIdentityAccessManagementWithStore * fix(iamapi): properly initialize FilerClient instead of passing nil * fix(iamapi): properly initialize filer client for IAM management - Instead of passing `nil`, construct a `wdclient.FilerClient` using the provided `Filers` addresses. - Ensure `NewIdentityAccessManagementWithStore` receives a valid `filerClient` to avoid potential nil pointer dereferences or limited functionality. 
* clean: remove dead code in s3api_server.go * refactor(s3api): improve IAM initialization, safety and anonymous access security * fix(s3api): ensure IAM config loads from filer after client init * fix(s3): resolve test failures in integration, CORS, and tagging tests - Fix CORS tests by providing explicit anonymous permissions config - Fix S3 integration tests by setting admin credentials in init - Align tagging test credentials in CI with IAM defaults - Added goroutine to retry IAM config load in iamapi server * fix(s3): allow anonymous access to health targets and S3 Tables when identities are present * fix(ci): use /healthz for Caddy health check in awscli tests * iam, s3api: expose DefaultAllow from IAM and Policy Engine This allows checking the global "Open by Default" configuration from other components like S3 Tables. * s3api/s3tables: support DefaultAllow in permission logic and handler Updated CheckPermissionWithContext to respect the DefaultAllow flag in PolicyContext. This enables "Open by Default" behavior for unauthenticated access in zero-config environments. Added a targeted unit test to verify the logic. * s3api/s3tables: propagate DefaultAllow through handlers Propagated the DefaultAllow flag to individual handlers for namespaces, buckets, tables, policies, and tagging. This ensures consistent "Open by Default" behavior across all S3 Tables API endpoints. * s3api: wire up DefaultAllow for S3 Tables API initialization Updated registerS3TablesRoutes to query the global IAM configuration and set the DefaultAllow flag on the S3 Tables API server. This completes the end-to-end propagation required for anonymous access in zero-config environments. Added a SetDefaultAllow method to S3TablesApiServer to facilitate this. * s3api: fix tests by adding DefaultAllow to mock IAM integrations The IAMIntegration interface was updated to include DefaultAllow(), breaking several mock implementations in tests. 
This commit fixes the build errors by adding the missing method to the mocks. * env * ensure ports * env * env * fix default allow * add one more test using non-anonymous user * debug * add more debug * less logs	3 days ago
Chris Lu	703d5e27b3	Fix S3 ListObjectsV2 recursion issue (#8347 ) * Fix S3 ListObjectsV2 recursion issue (#8346) Removed aggressive Limit=1 optimization in doListFilerEntries that caused missed directory entries when prefix ended with a delimiter. Added regression tests to verify deep directory traversal. * Address PR comments: condense test comments	4 days ago
Chris Lu	e863767ac7	cleanup(iam): final removal of temporary debug logging from STS and S3 API	5 days ago
Chris Lu	e29a7f1741	cleanup(iam): remove temporary debug logging from STS and S3 API (redo)	5 days ago
Chris Lu	cf8e383e1e	STS: Fallback to Caller Identity when RoleArn is missing in AssumeRole (#8345 ) * s3api: make RoleArn optional in AssumeRole * s3api: address PR feedback for optional RoleArn * iam: add configurable default role for AssumeRole * S3 STS: Use caller identity when RoleArn is missing - Fallback to PrincipalArn/Context in AssumeRole if RoleArn is empty - Handle User ARNs in prepareSTSCredentials - Fix PrincipalArn generation for env var credentials * Test: Add unit test for AssumeRole caller identity fallback * fix(s3api): propagate admin permissions to assumed role session when using caller identity fallback * STS: Fix is_admin propagation and optimize IAM policy evaluation for assumed roles - Restore is_admin propagation via JWT req_ctx - Optimize IsActionAllowed to skip role lookups for admin sessions - Ensure session policies are still applied for downscoping - Remove debug logging - Fix syntax errors in cleanup * fix(iam): resolve STS policy bypass for admin sessions - Fixed IsActionAllowed in iam_manager.go to correctly identify and validate internal STS tokens, ensuring session policies are enforced. - Refactored VerifyActionPermission in auth_credentials.go to properly handle session tokens and avoid legacy authorization short-circuits. - Added debug logging for better tracing of policy evaluation and session validation.	5 days ago
Chris Lu	7799915e50	Fix IAM identity loss on S3 restart migration (#8343 ) * Fix IAM reload after legacy config migration Handle legacy identity.json metadata events by reloading from the credential manager instead of parsing event content, and watch the correct /etc/iam multi-file directories so identity changes are applied. Add regression tests for legacy deletion and /etc/iam/identities change events. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix auth_credentials_subscribe_test helper to not pollute global memory store The SaveConfiguration call was affecting other tests. Use local credential manager and ReplaceS3ApiConfiguration instead. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix IAM event watching: subscribe to IAM directories and improve directory matching - Add /etc/iam and its subdirectories (identities, policies, service_accounts) to directoriesToWatch - Fix directory matching to avoid false positives from sibling directories - Use exact match or prefix with trailing slash instead of plain HasPrefix - Prevents matching hypothetical /etc/iam/identities_backup directory This ensures IAM config change events are actually delivered to the handler. * fix tests --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	6 days ago
Chris Lu	f44e25b422	fix(iam): ensure access key status is persisted and defaulted to Active (#8341 ) * Fix master leader election startup issue Fixes #error-log-leader-not-selected-yet * not useful test * fix(iam): ensure access key status is persisted and defaulted to Active * make pb * update tests * using constants	6 days ago
Chris Lu	49a64f50f1	Add session policy support to IAM (#8338 ) * Add session policy support to IAM - Implement policy evaluation for session tokens in policy_engine.go - Add session_policy field to session claims for tracking applied policies - Update STS service to include session policies in token generation - Add IAM integration tests for session policy validation - Update IAM manager to support policy attachment to sessions - Extend S3 API STS endpoint to handle session policy restrictions * fix: optimize session policy evaluation and add documentation * sts: add NormalizeSessionPolicy helper for inline session policies * sts: support inline session policies for AssumeRoleWithWebIdentity and credential-based flows * s3api: parse and normalize Policy parameter for STS HTTP handlers * tests: add session policy unit tests and integration tests for inline policy downscoping * tests: add s3tables STS inline policy integration * iam: handle user principals and validate tokens * sts: enforce inline session policy size limit * tests: harden s3tables STS integration config * iam: clarify principal policy resolution errors * tests: improve STS integration endpoint selection	6 days ago
Chris Lu	c433fee36a	s3api: fix AccessDenied by correctly propagating principal ARN in vended tokens (#8330 ) * s3api: fix AccessDenied by correctly propagating principal ARN in vended tokens * s3api: update TestLoadS3ApiConfiguration to match standardized ARN format * s3api: address PR review comments (nil-safety and cleanup) * s3api: address second round of PR review comments (cleanups and naming conventions) * s3api: address third round of PR review comments (unify default account ID and duplicate log) * s3api: address fourth round of PR review comments (define defaultAccountID as constant)	7 days ago
Chris Lu	796f23f68a	Fix STS InvalidAccessKeyId and request body consumption issues (#8328 ) * Fix STS InvalidAccessKeyId and request body consumption in Lakekeeper integration test * Remove debug prints * Add Lakekeeper integration tests to CI * Fix connection refused in CI by binding to 0.0.0.0 * Add timeout to docker run in Lakekeeper integration test * Update weed/s3api/auth_credentials.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	1 week ago
FivegenLLC	951eeefb76	fix(s3): lifecycle TTL rules inherit replication and volumeGrowthCount from filer config (#8321 ) * fix(s3): lifecycle TTL rules inherit replication from parent path and filer config PutBucketLifecycleConfiguration wrote filer.conf entries with empty replication, so effective replication could differ from operator default. Now we resolve replication from parent path rule (MatchStorageRule) then filer global config; only Replication is set on the rule (no DataCenter/Rack/DataNode for S3). * add volumeGrowthCount * review --------- Co-authored-by: Dmitiy Gushchin <dag@fivegen.ru>	1 week ago
Chris Lu	4e1065e485	Fix: preserve request body for STS signature verification (#8324 ) * Fix: preserve request body for STS signature verification - Save and restore request body in UnifiedPostHandler after ParseForm() - This allows STS handler to verify signatures correctly - Fixes 'invalid AWS signature: 53' error (ErrContentSHA256Mismatch) - ParseForm() consumes the body, so we need to restore it for downstream handlers * Improve error handling in UnifiedPostHandler - Add http.MaxBytesReader to limit body size to 10 MiB (iamRequestBodyLimit) - Add proper error handling for io.ReadAll failures - Log errors when body reading fails - Prevents DoS attacks from oversized request bodies - Addresses code review feedback	1 week ago
Chris Lu	c1a9263e37	Fix STS AssumeRole with POST body param (#8320 ) * Fix STS AssumeRole with POST body param and add integration test * Add STS integration test to CI workflow * Address code review feedback: fix HPP vulnerability and style issues * Refactor: address code review feedback - Fix HTTP Parameter Pollution vulnerability in UnifiedPostHandler - Refactor permission check logic for better readability - Extract test helpers to testutil/docker.go to reduce duplication - Clean up imports and simplify context setting * Add SigV4-style test variant for AssumeRole POST body routing - Added ActionInBodyWithSigV4Style test case to validate real-world scenario - Test confirms routing works correctly for AWS SigV4-signed requests - Addresses code review feedback about testing with SigV4 signatures * Fix: always set identity in context when non-nil - Ensure UnifiedPostHandler always calls SetIdentityInContext when identity is non-nil - Only call SetIdentityNameInContext when identity.Name is non-empty - This ensures downstream handlers (embeddedIam.DoActions) always have access to identity - Addresses potential issue where empty identity.Name would skip context setting	1 week ago
Chris Lu	8b5d31e5eb	s3api/policy_engine: use forwarded client IP for aws:SourceIp (#8304 ) * s3api: honor forwarded source IP for policy conditions Prefer X-Forwarded-For/X-Real-Ip before RemoteAddr when populating aws:SourceIp in policy condition evaluation. Also avoid noisy parsing behavior for unix socket markers and add coverage for precedence/fallback paths. Fixes #8301. * s3api: simplify remote addr parsing * s3api: guard aws:SourceIp against DNS hosts * s3api: simplify remote addr fallback * s3api: simplify remote addr parsing * Update weed/s3api/policy_engine/engine.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Fix TestExtractConditionValuesFromRequestSourceIPPrecedence using trusted private IP * Refactor extractSourceIP to use R-to-L XFF parsing and net.IP.IsPrivate --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	1 week ago
Chris Lu	b57429ef2e	Switch empty-folder cleanup to bucket policy (#8292 ) * Fix Spark _temporary cleanup and add issue #8285 regression test * Generalize empty folder cleanup for Spark temp artifacts * Revert synchronous folder pruning and add cleanup diagnostics * Add actionable empty-folder cleanup diagnostics * Fix Spark temp marker cleanup in async folder cleaner * Fix Spark temp cleanup with implicit directory markers * Keep explicit directory markers non-implicit * logging * more logs * Switch empty-folder cleanup to bucket policy * Seaweed-X-Amz-Allow-Empty-Folders * less logs * go vet * less logs * refactoring	1 week ago
Chris Lu	5c365e7090	s3api: return 400 for invalid namespace query in REST table routes (#8296 ) * s3api: reject invalid namespace query in REST table routes * s3api: expand namespace validation REST tests	1 week ago
Chris Lu	822dbed552	s3api: fix ListObjectsV2 NextContinuationToken duplication for nested prefix (#8294 ) * s3api: fix duplicate ListObjectsV2 continuation token for nested prefix * s3api: include prefix in common-prefix continuation token	1 week ago
Chris Lu	0385acba02	s3tables: fix shared table-location bucket mapping collisions (#8286 ) * s3tables: prevent shared table-location bucket mapping overwrite * Update weed/s3api/bucket_paths.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	1 week ago
Chris Lu	d6825ffce2	Iceberg: implement stage-create finalize flow (phase 1) (#8279 ) * iceberg: implement stage-create and create-on-commit finalize * iceberg: add create validation error typing and stage-create integration test * tests: merge stage-create integration check into catalog suite * tests: cover stage-create finalize lifecycle in catalog integration * iceberg: persist and cleanup stage-create markers * iceberg: add stage-create rollout flag and marker pruning * docs: add stage-create support design and rollout plan * docs: drop stage-create design draft from PR * iceberg: use conservative 72h stage-marker retention * iceberg: address review comments on create-on-commit and tests * iceberg: keep stage-create metadata out of table location * refactor(iceberg): split iceberg.go into focused files	1 week ago
Chris Lu	d88f6ed0af	Iceberg commit reliability: preserve statistics updates and return 409 conflicts (#8277 ) * iceberg: harden table commit updates and conflict handling * iceberg: refine commit retry and statistics patching * iceberg: cleanup metadata on non-conflict commit errors	1 week ago
Chris Lu	5ae3be44d1	iceberg: persist namespace properties for create/get (#8276 ) * iceberg: persist namespace properties via s3tables metadata * iceberg: simplify namespace properties normalization * s3tables: broaden namespace properties round-trip test * adjust logs * adjust logs	1 week ago
Chris Lu	1c62808c0e	iceberg: wire pagination for list namespaces/tables REST APIs (#8275 ) * s3api/iceberg: wire list pagination tokens and page size * fmt * Update weed/s3api/iceberg/iceberg.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	1 week ago
Chris Lu	aef2de3109	s3tables: support multi-level namespaces in parser/admin paths (#8273 ) * s3tables: support multi-level namespace normalization * admin: handle namespace parsing errors centrally * admin: clean namespace validation duplication	1 week ago
Chris Lu	be26ce74ce	s3tables: support multi-level namespace normalization	1 week ago
Chris Lu	59b02e0cba	s3api: fix multipart Complete ETag matching and lower empty-upload log noise (#8264 ) s3api: fix multipart part etag validation and reduce empty upload warning noise	1 week ago
Chris Lu	5a0204310c	Add Iceberg admin UI (#8246 ) * Add Iceberg table details view * Enhance Iceberg catalog browsing UI * Fix Iceberg UI security and logic issues - Fix selectSchema() and partitionFieldsFromFullMetadata() to always search for matching IDs instead of checking != 0 - Fix snapshotsFromFullMetadata() to defensive-copy before sorting to prevent mutating caller's slice - Fix XSS vulnerabilities in s3tables.js: replace innerHTML with textContent/createElement for user-controlled data - Fix deleteIcebergTable() to redirect to namespace tables list on details page instead of reloading - Fix data-bs-target in iceberg_namespaces.templ: remove templ.SafeURL for CSS selector - Add catalogName to delete modal data attributes for proper redirect - Remove unused hidden inputs from create table form (icebergTableBucketArn, icebergTableNamespace) * Regenerate templ files for Iceberg UI updates * Support complex Iceberg type objects in schema Change Type field from string to json.RawMessage in both IcebergSchemaFieldInfo and internal icebergSchemaField to properly handle Iceberg spec's complex type objects (e.g. {"type": "struct", "fields": [...]}). Currently test data only shows primitive string types, but this change makes the implementation defensively robust for future complex types by preserving the exact JSON representation. Add typeToString() helper and update schema extraction functions to marshal string types as JSON. Update template to convert json.RawMessage to string for display. 
* Regenerate templ files for Type field changes * templ * Fix additional Iceberg UI issues from code review - Fix lazy-load flag that was set before async operation completed, preventing retries on error; now sets loaded flag only after successful load and throws error to caller for proper error handling and UI updates - Add zero-time guards for CreatedAt and ModifiedAt fields in table details to avoid displaying Go zero-time values; render dash when time is zero - Add URL path escaping for all catalog/namespace/table names in URLs to prevent malformed URLs when names contain special characters like /, ?, or # - Remove redundant innerHTML clear in loadIcebergNamespaceTables that cleared twice before appending the table list - Fix selectSnapshotForMetrics to remove != 0 guard for consistency with selectSchema fix; now always searches for CurrentSnapshotID without zero-value gate - Enhance typeToString() helper to display '(complex)' for non-primitive JSON types * Regenerate templ files for Phase 3 updates * Fix template generation to use correct file paths Run templ generate from repo root instead of weed/admin directory to ensure generated _templ.go files have correct absolute paths in error messages (e.g., 'weed/admin/view/app/iceberg_table_details.templ' instead of 'app/iceberg_table_details.templ'). This ensures both 'make admin-generate' at repo root and 'make generate' in weed/admin directory produce identical output with consistent file path references. 
* Regenerate template files with correct path references * Validate S3 Tables names in UI - Add client-side validation for table bucket and namespace names to surface errors for invalid characters (dots/underscores) before submission - Use HTML validity messages with reportValidity for immediate feedback - Update namespace helper text to reflect actual constraints (single-level, lowercase letters, numbers, and underscores) * Regenerate templ files for namespace helper text * Fix Iceberg catalog REST link and actions * Disallow S3 object access on table buckets * Validate Iceberg layout for table bucket objects * Fix REST API link to /v1/config * merge iceberg page with table bucket page * Allowed Trino/Iceberg stats files in metadata validation * fixes - Backend/data handling: - Normalized Iceberg type display and fallback handling in weed/admin/dash/s3tables_management.go. - Fixed snapshot fallback pointer semantics in weed/admin/dash/s3tables_management.go. - Added CSRF token generation/propagation/validation for namespace create/delete in: - weed/admin/dash/csrf.go - weed/admin/dash/auth_middleware.go - weed/admin/dash/middleware.go - weed/admin/dash/s3tables_management.go - weed/admin/view/layout/layout.templ - weed/admin/static/js/s3tables.js - UI/template fixes: - Zero-time guards for CreatedAt fields in: - weed/admin/view/app/iceberg_namespaces.templ - weed/admin/view/app/iceberg_tables.templ - Fixed invalid templ-in-script interpolation and host/port rendering in: - weed/admin/view/app/iceberg_catalog.templ - weed/admin/view/app/s3tables_buckets.templ - Added data-catalog-name consistency on Iceberg delete action in weed/admin/view/app/iceberg_tables.templ. - Updated retry wording in weed/admin/static/js/s3tables.js. - Regenerated all affected _templ.go files. - S3 API/comment follow-ups: - Reused cached table-bucket validator in weed/s3api/bucket_paths.go. - Added validation-failure debug logging in weed/s3api/s3api_object_handlers_tagging.go. 
- Added multipart path-validation design comment in weed/s3api/s3api_object_handlers_multipart.go. - Build tooling: - Fixed templ generate working directory issues in weed/admin/Makefile (watch + pattern rule). * populate data * test/s3tables: harden populate service checks * admin: skip table buckets in object-store bucket list * admin sidebar: move object store to top-level links * admin iceberg catalog: guard zero times and escape links * admin forms: add csrf/error handling and client-side name validation * admin s3tables: fix namespace delete modal redeclaration * admin: replace native confirm dialogs with modal helpers * admin modal-alerts: remove noisy confirm usage console log * reduce logs * test/s3tables: use partitioned tables in trino and spark populate * admin file browser: normalize filer ServerAddress for HTTP parsing	2 weeks ago
Chris Lu	be6b5db65a	s3: fix health check endpoints returning 404 for HEAD requests #8243 (#8248 ) * Fix disk errors handling in vacuum compaction When a disk reports IO errors during vacuum compaction (e.g., 'read /mnt/d1/weed/oc_xyz.dat: input/output error'), the vacuum task should signal the error to the master so it can: 1. Drop the faulty volume replica 2. Rebuild the replica from healthy copies Changes: - Add checkReadWriteError() calls in vacuum read paths (ReadNeedleBlob, ReadData, ScanVolumeFile) to flag EIO errors in volume.lastIoError - Preserve error wrapping using %w format instead of %v so EIO propagates correctly - The existing heartbeat logic will detect lastIoError and remove the bad volume Fixes issue #8237 * error * s3: fix health check endpoints returning 404 for HEAD requests #8243	2 weeks ago
Chris Lu	403592bb9f	Add Spark Iceberg catalog integration tests and CI support (#8242 ) * Add Spark Iceberg catalog integration tests and CI support Implement comprehensive integration tests for Spark with SeaweedFS Iceberg REST catalog: - Basic CRUD operations (Create, Read, Update, Delete) on Iceberg tables - Namespace (database) management - Data insertion, querying, and deletion - Time travel capabilities via snapshot versioning - Compatible with SeaweedFS S3 and Iceberg REST endpoints Tests mirror the structure of existing Trino integration tests but use Spark's Python SQL API and PySpark for testing. Add GitHub Actions CI job for spark-iceberg-catalog-tests in s3-tables-tests.yml to automatically run Spark integration tests on pull requests. * fmt * Fix Spark integration tests - code review feedback * go mod tidy * Add go mod tidy step to integration test jobs Add 'go mod tidy' step before test runs for all integration test jobs: - s3-tables-tests - iceberg-catalog-tests - trino-iceberg-catalog-tests - spark-iceberg-catalog-tests This ensures dependencies are clean before running tests. 
* Fix remaining Spark operations test issues Address final code review comments: Setup & Initialization: - Add waitForSparkReady() helper function that polls Spark readiness with backoff instead of hardcoded 10-second sleep - Extract setupSparkTestEnv() helper to reduce boilerplate duplication between TestSparkCatalogBasicOperations and TestSparkTimeTravel - Both tests now use helpers for consistent, reliable setup Assertions & Validation: - Make setup-critical operations (namespace, table creation, initial insert) use t.Fatalf instead of t.Errorf to fail fast - Validate setupSQL output in TestSparkTimeTravel and fail if not 'Setup complete' - Add validation after second INSERT in TestSparkTimeTravel: verify row count increased to 2 before time travel test - Add context to error messages with namespace and tableName params Code Quality: - Remove code duplication between test functions - All critical paths now properly validated - Consistent error handling throughout * Fix go vet errors in S3 Tables tests Fixes: 1. setup_test.go (Spark): - Add missing import: github.com/testcontainers/testcontainers-go/wait - Use wait.ForLog instead of undefined testcontainers.NewLogStrategy - Remove unused strings import 2. trino_catalog_test.go: - Use net.JoinHostPort instead of fmt.Sprintf for address formatting - Properly handles IPv6 addresses by wrapping them in brackets * Use weed mini for simpler SeaweedFS startup Replace complex multi-process startup (master, volume, filer, s3) with single 'weed mini' command that starts all services together. 
Benefits: - Simpler, more reliable startup - Single weed mini process vs 4 separate processes - Automatic coordination between components - Better port management with no manual coordination Changes: - Remove separate master, volume, filer process startup - Use weed mini with -master.port, -filer.port, -s3.port flags - Keep Iceberg REST as separate service (still needed) - Increase timeout to 15s for port readiness (weed mini startup) - Remove volumePort and filerProcess fields from TestEnvironment - Simplify cleanup to only handle two processes (mini, iceberg rest) * Clean up dead code and temp directory leaks Fixes: 1. Remove dead s3Process field and cleanup: - weed mini bundles S3 gateway, no separate process needed - Removed s3Process field from TestEnvironment - Removed unnecessary s3Process cleanup code 2. Fix temp config directory leak: - Add sparkConfigDir field to TestEnvironment - Store returned configDir in writeSparkConfig - Clean up sparkConfigDir in Cleanup() with os.RemoveAll - Prevents accumulation of temp directories in test runs 3. Simplify Cleanup: - Now handles only necessary processes (weed mini, iceberg rest) - Removes both seaweedfsDataDir and sparkConfigDir - Cleaner shutdown sequence * Use weed mini's built-in Iceberg REST and fix python binary Changes: - Add -s3.port.iceberg flag to weed mini for built-in Iceberg REST Catalog - Remove separate 'weed server' process for Iceberg REST - Remove icebergRestProcess field from TestEnvironment - Simplify Cleanup() to only manage weed mini + Spark - Add port readiness check for iceberg REST from weed mini - Set Spark container Cmd to '/bin/sh -c sleep 3600' to keep it running - Change python to python3 in container.Exec calls This simplifies to truly one all-in-one weed mini process (master, filer, s3, iceberg-rest) plus just the Spark container. 
* go fmt * clean up * bind on a non-loopback IP for container access, aligned Iceberg metadata saves/locations with table locations, and reworked Spark time travel to use TIMESTAMP AS OF with safe timestamp extraction. * shared mini start * Fixed internal directory creation under /buckets so .objects paths can auto-create without failing bucket-name validation, which restores table bucket object writes * fix path Updated table bucket objects to write under `/buckets/<bucket>` and saved Iceberg metadata there, adjusting Spark time-travel timestamp to committed_at +1s. Rebuilt the weed binary (`go install ./weed`) and confirmed passing tests for Spark and Trino with focused test commands. * Updated table bucket creation to stop creating /buckets/.objects and switched Trino REST warehouse to s3://<bucket> to match Iceberg layout. * Stabilize S3Tables integration tests * Fix timestamp extraction and remove dead code in bucketDir * Use table bucket as warehouse in s3tables tests * Update trino_blog_operations_test.go * adds the CASCADE option to handle any remaining table metadata/files in the schema directory * skip namespace not empty	2 weeks ago
Chris Lu	e6ee293c17	Add table operations test (#8241 ) * Add Trino blog operations test * Update test/s3tables/catalog_trino/trino_blog_operations_test.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * feat: add table bucket path helpers and filer operations - Add table object root and table location mapping directories - Implement ensureDirectory, upsertFile, deleteEntryIfExists helpers - Support table location bucket mapping for S3 access * feat: manage table bucket object roots on creation/deletion - Create .objects directory for table buckets on creation - Clean up table object bucket paths on deletion - Enable S3 operations on table bucket object roots * feat: add table location mapping for Iceberg REST - Track table location bucket mappings when tables are created/updated/deleted - Enable location-based routing for S3 operations on table data * feat: route S3 operations to table bucket object roots - Route table-s3 bucket names to mapped table paths - Route table buckets to object root directories - Support table location bucket mapping lookup * feat: emit table-s3 locations from Iceberg REST - Generate unique table-s3 bucket names with UUID suffix - Store table metadata under table bucket paths - Return table-s3 locations for Trino compatibility * fix: handle missing directories in S3 list operations - Propagate ErrNotFound from ListEntries for non-existent directories - Treat missing directories as empty results for list operations - Fixes Trino non-empty location checks on table creation * test: improve Trino CSV parsing for single-value results - Sanitize Trino output to skip jline warnings - Handle single-value CSV results without header rows - Strip quotes from numeric values in tests * refactor: use bucket path helpers throughout S3 API - Replace direct bucket path operations with helper functions - Leverage centralized table bucket routing logic - Improve maintainability with consistent path 
resolution * fix: add table bucket cache and improve filer error handling - Cache table bucket lookups to reduce filer overhead on repeated checks - Use filer_pb.CreateEntry and filer_pb.UpdateEntry helpers to check resp.Error - Fix delete order in handler_bucket_get_list_delete: delete table object before directory - Make location mapping errors best-effort: log and continue, don't fail API - Update table location mappings to delete stale prior bucket mappings on update - Add 1-second sleep before timestamp time travel query to ensure timestamps are in past - Fix CSV parsing: examine all lines, not skip first; handle single-value rows * fix: properly handle stale metadata location mapping cleanup - Capture oldMetadataLocation before mutation in handleUpdateTable - Update updateTableLocationMapping to accept both old and new locations - Use passed-in oldMetadataLocation to detect location changes - Delete stale mapping only when location actually changes - Pass empty string for oldLocation in handleCreateTable (new tables have no prior mapping) - Improve logging to show old -> new location transitions * refactor: cleanup imports and cache design - Remove unused 'sync' import from bucket_paths.go - Use filer_pb.UpdateEntry helper in setExtendedAttribute and deleteExtendedAttribute for consistent error handling - Add dedicated tableBucketCache map[string]bool to BucketRegistry instead of mixing concerns with metadataCache - Improve cache separation: table buckets cache is now separate from bucket metadata cache * fix: improve cache invalidation and add transient error handling Cache invalidation (critical fix): - Add tableLocationCache to BucketRegistry for location mapping lookups - Clear tableBucketCache and tableLocationCache in RemoveBucketMetadata - Prevents stale cache entries when buckets are deleted/recreated Transient error handling: - Only cache table bucket lookups when conclusive (found or ErrNotFound) - Skip caching on transient errors (network, 
permission, etc) - Prevents marking real table buckets as non-table due to transient failures Performance optimization: - Cache tableLocationDir results to avoid repeated filer RPCs on hot paths - tableLocationDir now checks cache before making expensive filer lookups - Cache stores empty string for 'not found' to avoid redundant lookups Code clarity: - Add comment to deleteDirectory explaining DeleteEntry response lacks Error field * go fmt * fix: mirror transient error handling in tableLocationDir and optimize bucketDir Transient error handling: - tableLocationDir now only caches definitive results - Mirrors isTableBucket behavior to prevent treating transient errors as permanent misses - Improves reliability on flaky systems or during recovery Performance optimization: - bucketDir avoids redundant isTableBucket call via bucketRoot - Directly use s3a.option.BucketsPath for regular buckets - Saves one cache lookup for every non-table bucket operation * fix: revert bucketDir optimization to preserve bucketRoot logic The optimization to directly use BucketsPath bypassed bucketRoot's logic and caused issues with S3 list operations on delimiter+prefix cases. Revert to using path.Join(s3a.bucketRoot(bucket), bucket) which properly handles all bucket types and ensures consistent path resolution across the codebase. The slight performance cost of an extra cache lookup is worth the correctness and consistency benefits. * feat: move table buckets under /buckets Add a table-bucket marker attribute, reuse bucket metadata cache for table bucket detection, and update list/validation/UI/test paths to treat table buckets as /buckets entries. 
* Fix S3 Tables code review issues - handler_bucket_create.go: Fix bucket existence check to properly validate entryResp.Entry before setting s3BucketExists flag (nil Entry should not indicate existing bucket) - bucket_paths.go: Add clarifying comment to bucketRoot() explaining unified buckets root path for all bucket types - file_browser_data.go: Optimize by extracting table bucket check early to avoid redundant WithFilerClient call * Fix list prefix delimiter handling * Handle list errors conservatively * Fix Trino FOR TIMESTAMP query - use past timestamp Iceberg requires the timestamp to be strictly in the past. Use current_timestamp - interval '1' second instead of current_timestamp. --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2 weeks ago
Chris Lu	c284e51d20	fix: multipart upload ETag calculation (#8238 ) * fix multipart etag * address comments * clean up * clean up * optimization * address comments * unquoted etag * dedup * upgrade * clean * etag * return quoted tag * quoted etag * debug * s3api: unify ETag retrieval and quoting across handlers Refactor newListEntry to take S3ApiServer and use getObjectETag, and update setResponseHeaders to use the same logic. This ensures consistent ETags are returned for both listing and direct access. s3api: implement ListObjects deduplication for versioned buckets Handle duplicate entries between the main path and the .versions directory by prioritizing the latest version when bucket versioning is enabled. * s3api: cleanup stale main file entries during versioned uploads Add explicit deletion of pre-existing "main" files when creating new versions in versioned buckets. This prevents stale entries from appearing in bucket listings and ensures consistency. * s3api: fix cleanup code placement in versioned uploads Correct the placement of rm calls in completeMultipartUpload and putVersionedObject to ensure stale main files are properly deleted during versioned uploads. * s3api: improve getObjectETag fallback for empty ExtETagKey Ensure that when ExtETagKey exists but contains an empty value, the function falls through to MD5/chunk-based calculation instead of returning an empty string. * s3api: fix test files for new newListEntry signature Update test files to use the new newListEntry signature where the first parameter is S3ApiServer. Created mockS3ApiServer to properly test owner display name lookup functionality. s3api: use filer.ETag for consistent Md5 handling in getEtagFromEntry Change getEtagFromEntry fallback to use filer.ETag(entry) instead of filer.ETagChunks to ensure legacy entries with Attributes.Md5 are handled consistently with the rest of the codebase. 
* s3api: optimize list logic and fix conditional header logging - Hoist bucket versioning check out of per-entry callback to avoid repeated getVersioningState calls - Extract appendOrDedup helper function to eliminate duplicate dedup/append logic across multiple code paths - Change If-Match mismatch logging from glog.Errorf to glog.V(3).Infof and remove DEBUG prefix for consistency * s3api: fix test mock to properly initialize IAM accounts Fixed nil pointer dereference in TestNewListEntryOwnerDisplayName by directly initializing the IdentityAccessManagement.accounts map in the test setup. This ensures newListEntry can properly look up account display names without panicking. * cleanup * s3api: remove premature main file cleanup in versioned uploads Removed incorrect cleanup logic that was deleting main files during versioned uploads. This was causing test failures because it deleted objects that should have been preserved as null versions when versioning was first enabled. The deduplication logic in listing is sufficient to handle duplicate entries without deleting files during upload. * s3api: add empty-value guard to getEtagFromEntry Added the same empty-value guard used in getObjectETag to prevent returning quoted empty strings. When ExtETagKey exists but is empty, the function now falls through to filer.ETag calculation instead of returning "". * s3api: fix listing of directory key objects with matching prefix Revert prefix handling logic to use strings.TrimPrefix instead of checking HasPrefix with empty string result. This ensures that when a directory key object exactly matches the prefix (e.g. prefix="dir/", object="dir/"), it is correctly handled as a regular entry instead of being skipped or incorrectly processed as a common prefix. Also fixed missing variable definition. * s3api: refactor list inline dedup to use appendOrDedup helper Refactored the inline deduplication logic in listFilerEntries to use the shared appendOrDedup helper function. 
This ensures consistent behavior and reduces code duplication. * test: fix port allocation race in s3tables integration test Updated startMiniCluster to find all required ports simultaneously using findAvailablePorts instead of sequentially. This prevents race conditions where the OS reallocates a port that was just released, causing multiple services (e.g. Filer and Volume) to be assigned the same port and fail to start.	2 weeks ago
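The port allocation fix at the end of commit c284e51d20 relies on reserving all ports at once rather than one after another. A minimal sketch of that idea is shown here, assuming a helper shaped like the `findAvailablePorts` named in the message; its signature and body are guesses, not the test's actual code. Holding every listener open until all ports are collected guarantees the returned ports differ from each other, although another process could still grab one after the listeners close.

```go
package main

import (
	"fmt"
	"net"
)

// findAvailablePorts asks the OS for n free TCP ports on loopback.
// All listeners stay open until every port has been chosen, so the OS
// cannot hand the same port out twice within one call.
func findAvailablePorts(n int) ([]int, error) {
	listeners := make([]net.Listener, 0, n)
	defer func() {
		for _, l := range listeners {
			l.Close()
		}
	}()

	ports := make([]int, 0, n)
	for i := 0; i < n; i++ {
		l, err := net.Listen("tcp", "127.0.0.1:0") // port 0 lets the OS pick
		if err != nil {
			return nil, err
		}
		listeners = append(listeners, l)
		ports = append(ports, l.Addr().(*net.TCPAddr).Port)
	}
	return ports, nil
}

func main() {
	ports, err := findAvailablePorts(4)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("reserved ports:", ports)
}
```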
Chris Lu	a3b83f8808	test: add Trino Iceberg catalog integration test (#8228 ) * test: add Trino Iceberg catalog integration test - Create test/s3/catalog_trino/trino_catalog_test.go with TestTrinoIcebergCatalog - Tests integration between Trino SQL engine and SeaweedFS Iceberg REST catalog - Starts weed mini with all services and Trino in Docker container - Validates Iceberg catalog schema creation and listing operations - Uses native S3 filesystem support in Trino with path-style access - Add workflow job to s3-tables-tests.yml for CI execution * fix: preserve AWS environment credentials when replacing S3 configuration When S3 configuration is loaded from filer/db, it replaces the identities list and inadvertently removes AWS_ACCESS_KEY_ID credentials that were added from environment variables. This caused auth to remain disabled even though valid credentials were present. Fix by preserving environment-based identities when replacing the configuration and re-adding them after the replacement. This ensures environment credentials persist across configuration reloads and properly enable authentication. * fix: use correct ServerAddress format with gRPC port encoding The admin server couldn't connect to master because the master address was missing the gRPC port information. Use pb.NewServerAddress() which properly encodes both HTTP and gRPC ports in the address string. Changes: - weed/command/mini.go: Use pb.NewServerAddress for master address in admin - test/s3/policy/policy_test.go: Store and use gRPC ports for master/filer addresses This fix applies to: 1. Admin server connection to master (mini.go) 2. Test shell commands that need master/filer addresses (policy_test.go) * move * move * fix: always include gRPC port in server address encoding The NewServerAddress() function was omitting the gRPC port from the address string when it matched the port+10000 convention. 
However, gRPC port allocation doesn't always follow this convention - when the calculated port is busy, an alternative port is allocated. This caused a bug where: 1. Master's gRPC port was allocated as 50661 (sequential, not port+10000) 2. Address was encoded as '192.168.1.66:50660' (gRPC port omitted) 3. Admin client called ToGrpcAddress() which assumed port+10000 offset 4. Admin tried to connect to 60660 but master was on 50661 → connection failed Fix: Always include explicit gRPC port in address format (host:httpPort.grpcPort) unless gRPC port is 0. This makes addresses unambiguous and works regardless of the port allocation strategy used. Impacts: All server-to-server gRPC connections now use properly formatted addresses. * test: fix Iceberg REST API readiness check The Iceberg REST API endpoints require authentication. When checked without credentials, the API returns 403 Forbidden (not 401 Unauthorized). The readiness check now accepts both auth error codes (401/403) as indicators that the service is up and ready, it just needs credentials. This fixes the 'Iceberg REST API did not become ready' test failure. * Fix AWS SigV4 signature verification for base64-encoded payload hashes AWS SigV4 canonical requests must use hex-encoded SHA256 hashes, but the X-Amz-Content-Sha256 header may be transmitted as base64. Changes: - Added normalizePayloadHash() function to convert base64 to hex - Call normalizePayloadHash() in extractV4AuthInfoFromHeader() - Added encoding/base64 import Fixes 403 Forbidden errors on POST requests to Iceberg REST API when clients send base64-encoded content hashes in the header. Impacted services: Iceberg REST API, S3Tables * Fix AWS SigV4 signature verification for base64-encoded payload hashes AWS SigV4 canonical requests must use hex-encoded SHA256 hashes, but the X-Amz-Content-Sha256 header may be transmitted as base64. 
Changes: - Added normalizePayloadHash() function to convert base64 to hex - Call normalizePayloadHash() in extractV4AuthInfoFromHeader() - Added encoding/base64 import - Removed unused fmt import Fixes 403 Forbidden errors on POST requests to Iceberg REST API when clients send base64-encoded content hashes in the header. Impacted services: Iceberg REST API, S3Tables * pass sigv4 * s3api: fix identity preservation and logging levels - Ensure environment-based identities are preserved during config replacement - Update accessKeyIdent and nameToIdentity maps correctly - Downgrade informational logs to V(2) to reduce noise * test: fix trino integration test and s3 policy test - Pin Trino image version to 479 - Fix port binding to 0.0.0.0 for Docker connectivity - Fix S3 policy test hang by correctly assigning MiniClusterCtx - Improve port finding robustness in policy tests * ci: pre-pull trino image to avoid timeouts - Pull trinodb/trino:479 after Docker setup - Ensure image is ready before integration tests start * iceberg: remove unused checkAuth and improve logging - Remove unused checkAuth method - Downgrade informational logs to V(2) - Ensure loggingMiddleware uses a status writer for accurate reported codes - Narrow catch-all route to avoid interfering with other subsystems * iceberg: fix build failure by removing unused s3api import * Update iceberg.go * use warehouse * Update trino_catalog_test.go	2 weeks ago
Chris Lu	c2bfd7b524	fix: honor SSE-C chunk offsets in decryption for large chunked uploads (#8216 ) * fix: honor SSE-C chunk offsets in decryption for large chunked uploads Fixes issue #8215 where SSE-C decryption for large objects could corrupt data by ignoring per-chunk PartOffset values. Changes: - Add TestSSECLargeObjectChunkReassembly unit test to verify correct decryption of 19MB object split into 8MB chunks using PartOffset - Update decryptSSECChunkView and createMultipartSSECDecryptedReaderDirect to extract PartOffset from SSE-C metadata and pass to CreateSSECDecryptedReaderWithOffset for offset-aware decryption - Fix createCTRStreamWithOffset to use calculateIVWithOffset for proper block-aligned counter advancement, matching SSE-KMS/S3 behavior - Update comments to clarify SSE-C IV handling uses per-chunk offsets (unlike base IV approach used by KMS/S3) All tests pass: go test ./weed/s3api ✓ * fix: close chunkReader on error paths in createMultipartSSECDecryptedReader Address resource leak issue reported in PR #8216: ensure chunkReader is properly closed before returning on all error paths, including: - DeserializeSSECMetadata failures - IV decoding errors - Invalid PartOffset values - SSE-C reader creation failures - Missing per-chunk metadata This prevents leaking network connections and file handles during SSE-C multipart decryption error scenarios. 
* docs: clarify SSE-C IV handling in decryptSSECChunkView comment Replace misleading warning 'Do NOT call calculateIVWithOffset' with accurate explanation that: - CreateSSECDecryptedReaderWithOffset internally uses calculateIVWithOffset to advance the CTR counter to reach PartOffset - calculateIVWithOffset is applied only to the per-part IV, NOT to derive a global base IV for all parts - This differs fundamentally from SSE-KMS/SSE-S3 which use base IV + calculateIVWithOffset(ChunkOffset) This clarifies the IV advancement mechanism while contrasting it with the base IV approach used by other encryption schemes.	2 weeks ago
Chris Lu	7831257ed5	s3: allow single Statement object in policy document (#8212) * s3: allow single Statement object in policy document Fixes #8201 * s3: add unit test for single Statement object in policy * s3: improve error message for malformed PolicyDocument.Statement * s3: simplify error message for malformed PolicyDocument.Statement	2 weeks ago
Chris Lu	c9c46db77e	s3api: fix ListObjectVersions inconsistency with delimiters (#8210) * s3api: fix ListObjectVersions inconsistency with delimiters (fixes #8206) Prioritize handling of .versions and .uploads directories before delimiter processing in collectVersions. This ensures .versions directories are processed as version containers instead of being incorrectly rolled up into CommonPrefixes when a delimiter is used. * s3api: refactor processDirectory to remove redundant special directory checks These checks are now handled in the main collectVersions loop.	2 weeks ago
Chris Lu	000e2bd4a9	logging and debugging	2 weeks ago
Chris Lu	f66a23b472	Fix: filer not yet available in s3.configure (#8198) * Fix: Initialize filer CredentialManager with filer address * The fix involves checking for directory existence before creation. * adjust error message * Fix: Implement FilerAddressSetter in PropagatingCredentialStore * Refactor: Reorder credential manager initialization in filer server * refactor	2 weeks ago
Chris Lu	b244bb58aa	s3tables: redesign Iceberg REST Catalog using iceberg-go and automate integration tests (#8197 ) * full integration with iceberg-go * Table Commit Operations (handleUpdateTable) * s3tables: fix Iceberg v2 compliance and namespace properties This commit ensures SeaweedFS Iceberg REST Catalog is compliant with Iceberg Format Version 2 by: - Using iceberg-go's table.NewMetadataWithUUID for strict v2 compliance. - Explicitly initializing namespace properties to empty maps. - Removing omitempty from required Iceberg response fields. - Fixing CommitTableRequest unmarshaling using table.Requirements and table.Updates. * s3tables: automate Iceberg integration tests - Added Makefile for local test execution and cluster management. - Added docker-compose for PyIceberg compatibility kit. - Added Go integration test harness for PyIceberg. - Updated GitHub CI to run Iceberg catalog tests automatically. * s3tables: update PyIceberg test suite for compatibility - Updated test_rest_catalog.py to use latest PyIceberg transaction APIs. - Updated Dockerfile to include pyarrow and pandas dependencies. - Improved namespace and table handling in integration tests. * s3tables: address review feedback on Iceberg Catalog - Implemented robust metadata version parsing and incrementing. - Ensured table metadata changes are persisted during commit (handleUpdateTable). - Standardized namespace property initialization for consistency. - Fixed unused variable and incorrect struct field build errors. * s3tables: finalize Iceberg REST Catalog and optimize tests - Implemented robust metadata versioning and persistence. - Standardized namespace property initialization. - Optimized integration tests using pre-built Docker image. - Added strict property persistence validation to test suite. - Fixed build errors from previous partial updates. 
* Address PR review: fix Table UUID stability, implement S3Tables UpdateTable, and support full metadata persistence individually * fix: Iceberg catalog stable UUIDs, metadata persistence, and file writing - Ensure table UUIDs are stable (do not regenerate on load). - Persist full table metadata (Iceberg JSON) in s3tables extended attributes. - Add `MetadataVersion` to explicitly track version numbers, replacing regex parsing. - Implement `saveMetadataFile` to persist metadata JSON files to the Filer on commit. - Update `CreateTable` and `UpdateTable` handlers to use the new logic. * test: bind weed mini to 0.0.0.0 in integration tests to fix Docker connectivity * Iceberg: fix metadata handling in REST catalog - Add nil guard in createTable - Fix updateTable to correctly load existing metadata from storage - Ensure full metadata persistence on updates - Populate loadTable result with parsed metadata * S3Tables: add auth checks and fix response fields in UpdateTable - Add CheckPermissionWithContext to UpdateTable handler - Include TableARN and MetadataLocation in UpdateTable response - Use ErrCodeConflict (409) for version token mismatches * Tests: improve Iceberg catalog test infrastructure and cleanup - Makefile: use PID file for precise process killing - test_rest_catalog.py: remove unused variables and fix f-strings * Iceberg: fix variable shadowing in UpdateTable - Rename inner loop variable `req` to `requirement` to avoid shadowing outer request variable * S3Tables: simplify MetadataVersion initialization - Use `max(req.MetadataVersion, 1)` instead of anonymous function * Tests: remove unicode characters from S3 tables integration test logs - Remove unicode checkmarks from test output for cleaner logs * Iceberg: improve metadata persistence robustness - Fix MetadataLocation in LoadTableResult to fallback to generated location - Improve saveMetadataFile to ensure directory hierarchy existence and robust error handling	2 weeks ago
Chris Lu	1274cf038c	s3: enforce authentication and JSON error format for Iceberg REST Catalog (#8192 ) * s3: enforce authentication and JSON error format for Iceberg REST Catalog * s3/iceberg: align error exception types with OpenAPI spec examples * s3api: refactor AuthenticateRequest to return identity object * s3/iceberg: propagate full identity object to request context * s3/iceberg: differentiate NotAuthorizedException and ForbiddenException * s3/iceberg: reject requests if authenticator is nil to prevent auth bypass * s3/iceberg: refactor Auth middleware to build context incrementally and use switch for error mapping * s3api: update misleading comment for authRequestWithAuthType * s3api: return ErrAccessDenied if IAM is not configured to prevent auth bypass * s3/iceberg: optimize context update in Auth middleware * s3api: export CanDo for external authorization use * s3/iceberg: enforce identity-based authorization in all API handlers * s3api: fix compilation errors by updating internal CanDo references * s3/iceberg: robust identity validation and consistent action usage in handlers * s3api: complete CanDo rename across tests and policy engine integration * s3api: fix integration tests by allowing admin access when auth is disabled and explicit gRPC ports * duckdb * create test bucket	2 weeks ago
Chris Lu	2bb21ea276	feat: Add Iceberg REST Catalog server and admin UI (#8175 ) * feat: Add Iceberg REST Catalog server Implement Iceberg REST Catalog API on a separate port (default 8181) that exposes S3 Tables metadata through the Apache Iceberg REST protocol. - Add new weed/s3api/iceberg package with REST handlers - Implement /v1/config endpoint returning catalog configuration - Implement namespace endpoints (list/create/get/head/delete) - Implement table endpoints (list/create/load/head/delete/update) - Add -port.iceberg flag to S3 standalone server (s3.go) - Add -s3.port.iceberg flag to combined server mode (server.go) - Add -s3.port.iceberg flag to mini cluster mode (mini.go) - Support prefix-based routing for multiple catalogs The Iceberg REST server reuses S3 Tables metadata storage under /table-buckets and enables DuckDB, Spark, and other Iceberg clients to connect to SeaweedFS as a catalog. * feat: Add Iceberg Catalog pages to admin UI Add admin UI pages to browse Iceberg catalogs, namespaces, and tables. - Add Iceberg Catalog menu item under Object Store navigation - Create iceberg_catalog.templ showing catalog overview with REST info - Create iceberg_namespaces.templ listing namespaces in a catalog - Create iceberg_tables.templ listing tables in a namespace - Add handlers and routes in admin_handlers.go - Add Iceberg data provider methods in s3tables_management.go - Add Iceberg data types in types.go The Iceberg Catalog pages provide visibility into the same S3 Tables data through an Iceberg-centric lens, including REST endpoint examples for DuckDB and PyIceberg. 
* test: Add Iceberg catalog integration tests and reorg s3tables tests - Reorganize existing s3tables tests to test/s3tables/table-buckets/ - Add new test/s3tables/catalog/ for Iceberg REST catalog tests - Add TestIcebergConfig to verify /v1/config endpoint - Add TestIcebergNamespaces to verify namespace listing - Add TestDuckDBIntegration for DuckDB connectivity (requires Docker) - Update CI workflow to use new test paths * fix: Generate proper random UUIDs for Iceberg tables Address code review feedback: - Replace placeholder UUID with crypto/rand-based UUID v4 generation - Add detailed TODO comments for handleUpdateTable stub explaining the required atomic metadata swap implementation * fix: Serve Iceberg on localhost listener when binding to different interface Address code review feedback: properly serve the localhost listener when the Iceberg server is bound to a non-localhost interface. * ci: Add Iceberg catalog integration tests to CI Add new job to run Iceberg catalog tests in CI, along with: - Iceberg package build verification - Iceberg unit tests - Iceberg go vet checks - Iceberg format checks * fix: Address code review feedback for Iceberg implementation - fix: Replace hardcoded account ID with s3_constants.AccountAdminId in buildTableBucketARN() - fix: Improve UUID generation error handling with deterministic fallback (timestamp + PID + counter) - fix: Update handleUpdateTable to return HTTP 501 Not Implemented instead of fake success - fix: Better error handling in handleNamespaceExists to distinguish 404 from 500 errors - fix: Use relative URL in template instead of hardcoded localhost:8181 - fix: Add HTTP timeout to test's waitForService function to avoid hangs - fix: Use dynamic ephemeral ports in integration tests to avoid flaky parallel failures - fix: Add Iceberg port to final port configuration logging in mini.go * fix: Address critical issues in Iceberg implementation - fix: Cache table UUIDs to ensure persistence across LoadTable calls The 
UUID now remains stable for the lifetime of the server session. TODO: For production, UUIDs should be persisted in S3 Tables metadata. - fix: Remove redundant URL-encoded namespace parsing mux router already decodes %1F to \x1F before passing to handlers. Redundant ReplaceAll call could cause bugs with literal %1F in namespace. * fix: Improve test robustness and reduce code duplication - fix: Make DuckDB test more robust by failing on unexpected errors Instead of silently logging errors, now explicitly check for expected conditions (extension not available) and skip the test appropriately. - fix: Extract username helper method to reduce duplication Created getUsername() helper in AdminHandlers to avoid duplicating the username retrieval logic across Iceberg page handlers. * fix: Add mutex protection to table UUID cache Protects concurrent access to the tableUUIDs map with sync.RWMutex. Uses read-lock for fast path when UUID already cached, and write-lock for generating new UUIDs. Includes double-check pattern to handle race condition between read-unlock and write-lock. * style: fix go fmt errors * feat(iceberg): persist table UUID in S3 Tables metadata * feat(admin): configure Iceberg port in Admin UI and commands * refactor: address review comments (flags, tests, handlers) - command/mini: fix tracking of explicit s3.port.iceberg flag - command/admin: add explicit -iceberg.port flag - admin/handlers: reuse getUsername helper - tests: use 127.0.0.1 for ephemeral ports and os.Stat for file size check * test: check error from FileStat in verify_gc_empty_test	2 weeks ago
Chris Lu	621834d96a	s3tables: add Iceberg file layout validation for table buckets (#8176 ) * s3tables: add Iceberg file layout validation for table buckets This PR adds file layout validation for table buckets to enforce Apache Iceberg table structure. Files uploaded to table buckets must conform to the expected Iceberg layout: - metadata/ directory: contains metadata files (.json, .avro) - v.metadata.json (table metadata) - snap-.avro (snapshot manifests) - -m.avro (manifest files) - version-hint.text - data/ directory: contains data files (.parquet, .orc, .avro) - Supports partition paths (e.g., year=2024/month=01/) - Supports bucket subdirectories The validator exports functions for use by the S3 API: - IsTableBucketPath: checks if a path is under /table-buckets/ - GetTableInfoFromPath: extracts bucket/namespace/table from path - ValidateTableBucketUpload: validates file layout for table bucket uploads - ValidateTableBucketUploadWithClient: validates with filer client access Invalid uploads receive InvalidIcebergLayout error response. Address review comments: regex performance, error handling, stricter patterns * Fix validateMetadataFile and validateDataFile to handle subdirectories and directory creation * Fix error handling, metadata validation, reduce code duplication * Fix empty remainingPath handling for directory paths * Refactor: unify validateMetadataFile and validateDataFile * Refactor: extract UUID pattern constant * fix: allow Iceberg partition and directory paths without trailing slashes Modified validateFile to correctly handle directory paths that do not end with a trailing slash. This ensures that paths like 'data/year=2024' are validated as directories if they match partition or subdirectory patterns, rather than being incorrectly rejected as invalid files. Added comprehensive test cases for various directory and partition path combinations. 
* refactor: use standard path package and idiomatic returns Simplified directory and filename extraction in validateFile by using the standard path package (aliased as pathpkg). This improves readability and avoids manual string manipulation. Also updated GetTableInfoFromPath to use naked returns for named return values, aligning with Go conventions for short functions. * feat: enforce strict Iceberg top-level directories and metadata restrictions Implemented strict validation for Iceberg layout: - Bare top-level keys like 'metadata' and 'data' are now rejected; they must have a trailing slash or a subpath. - Subdirectories under 'metadata/' are now prohibited to enforce the flat structure required by Iceberg. - Updated the test suite with negative test cases and ensured proper formatting. * feat: allow table root directory markers in ValidateTableBucketUpload Modified ValidateTableBucketUpload to short-circuit and return nil when the relative path within a table is empty. This occurs when a trailing slash is used on the table directory (e.g., /table-buckets/mybucket/myns/mytable/). Added a test case 'table dir with slash' to verify this behavior. * test: add regression cases for metadata subdirs and table markers Enforced a strictly flat structure for the metadata directory by removing the "directory without trailing slash" fallback in validateFile for metadata. Added regression test cases: - metadata/nested (must fail) - /table-buckets/.../mytable/ (must pass) Verified all tests pass. * feat: reject double slashes in Iceberg table paths Modified validateDirectoryPath to return an error when encountering empty path segments, effectively rejecting double slashes like 'data//file.parquet'. Updated validateFile to use manual path splitting instead of the 'path' package for intermediate directories to ensure redundant slashes are not auto-cleaned before validation. Added regression tests for various double slash scenarios. 
* refactor: separate isMetadata logic in validateDirectoryPath Following reviewer feedback, refactored validateDirectoryPath to explicitly separate the handling of metadata and data paths. This improves readability and clarifies the function's intent while maintaining the strict validation rules and double-slash rejection previously implemented. * feat: validate bucket, namespace, and table path segments Updated ValidateTableBucketUpload to ensure that bucket, namespace, and table segments in the path are non-empty. This prevents invalid paths like '/table-buckets//myns/mytable/...' from being accepted during upload. Added regression tests for various empty segment scenarios. * Update weed/s3api/s3tables/iceberg_layout.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * feat: block double-slash bypass in table relative paths Added a guard in ValidateTableBucketUpload to reject tableRelativePath if it starts with a '/' or contains '//'. This ensures that paths like '/table-buckets/b/ns/t//data/file.parquet' are properly rejected and cannot bypass the layout validation. Added regression tests to verify. --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2 weeks ago
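The layout rules accumulated in commit 621834d96a (empty relative path allowed as a table root marker, no leading or double slashes, bare `metadata` and `data` keys rejected, a flat `metadata/` directory) can be condensed into a small sketch. It is a simplified approximation with a hypothetical function name, not the validator in `iceberg_layout.go`: filename patterns are reduced to a loose extension check, and data files are not inspected at all.

```go
package main

import (
	"fmt"
	"strings"
)

// validateTableRelativePath checks a path relative to the table directory.
// Simplified sketch: the real validator applies stricter filename patterns.
func validateTableRelativePath(p string) error {
	if p == "" {
		return nil // table root directory marker
	}
	if strings.HasPrefix(p, "/") || strings.Contains(p, "//") {
		return fmt.Errorf("invalid path %q: empty path segment", p)
	}
	top, rest, hasSlash := strings.Cut(p, "/")
	if !hasSlash {
		return fmt.Errorf("invalid path %q: bare top-level key", p)
	}
	switch top {
	case "metadata":
		if strings.Contains(rest, "/") {
			return fmt.Errorf("invalid path %q: metadata/ must be flat", p)
		}
		validName := rest == "" || rest == "version-hint.text" ||
			strings.HasSuffix(rest, ".json") || strings.HasSuffix(rest, ".avro")
		if !validName {
			return fmt.Errorf("invalid path %q: not a metadata file", p)
		}
	case "data":
		// Partition paths such as year=2024/month=01/ are allowed here.
	default:
		return fmt.Errorf("invalid path %q: must be under metadata/ or data/", p)
	}
	return nil
}

func main() {
	for _, p := range []string{
		"metadata/v1.metadata.json",
		"data/year=2024/part-0.parquet",
		"data//part-0.parquet",
		"metadata/nested/file.json",
	} {
		fmt.Printf("%-32s %v\n", p, validateTableRelativePath(p))
	}
}
```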
Chris Lu	79722bcf30	Add s3tables shell and admin UI (#8172 ) * Add shared s3tables manager * Add s3tables shell commands * Add s3tables admin API * Add s3tables admin UI * Fix admin s3tables namespace create * Rename table buckets menu * Centralize s3tables tag validation * Reuse s3tables manager in admin * Extract s3tables list limit * Add s3tables bucket ARN helper * Remove write middleware from s3tables APIs * Fix bucket link and policy hint * Fix table tag parsing and nav link * Disable namespace table link on invalid ARN * Improve s3tables error decode * Return flag parse errors for s3tables tag * Accept query params for namespace create * Bind namespace create form data * Read s3tables JS data from DOM * s3tables: allow empty region ARN * shell: pass s3tables account id * shell: require account for table buckets * shell: use bucket name for namespaces * shell: use bucket name for tables * shell: use bucket name for tags * admin: add table buckets links in file browser * s3api: reuse s3tables tag validation * admin: harden s3tables UI handlers * fix admin list table buckets * allow admin s3tables access * validate s3tables bucket tags * log s3tables bucket metadata errors * rollback table bucket on owner failure * show s3tables bucket owner * add s3tables iam conditions * Add s3tables user permissions UI * Authorize s3tables using identity actions * Add s3tables permissions to user modal * Disambiguate bucket scope in user permissions * Block table bucket names that match S3 buckets * Pretty-print IAM identity JSON * Include tags in s3tables permission context * admin: refactor S3 Tables inline JavaScript into a separate file * s3tables: extend IAM policy condition operators support * shell: use LookupEntry wrapper for s3tables bucket conflict check * admin: handle buildBucketPermissions validation in create/update flows	3 weeks ago
Chris Lu	b2b0a38e71	s3api: allow empty region and account id in s3tables ARN (#8171) * s3api: allow empty region and account id in s3tables ARN * s3api: refactor S3 Tables ARN regex into a constant	3 weeks ago
Chris Lu	6a9e7360df	s3api: fix S3 Tables auth to allow auto-hashing of body (#8170) * s3api: allow auto-hashing of request body for s3tables * s3api: add unit test for s3tables body hashing	3 weeks ago
Chris Lu	f1e27b8f30	s3: change s3 tables to use RESTful API (#8169) * s3: refactor s3 tables to use RESTful API * test/s3tables: guard empty namespaces * s3api: document tag parsing and validate get-table * s3api: limit S3Tables REST body size * Update weed/s3api/s3api_tables.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update weed/s3api/s3tables/handler.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * s3api: accept encoded table bucket ARNs * s3api: validate namespaces and close body * s3api: match encoded table bucket ARNs * s3api: scope table bucket ARN routes * s3api: dedupe table bucket request builders * test/s3tables: allow list tables without namespace * s3api: validate table params and tag ARN * s3api: tighten tag handling and get-table params * s3api: loosen tag ARN route matching * Fix S3 Tables REST routing and tests * Adjust S3 Tables request parsing * Gate S3 Tables target routing * Avoid double decoding namespaces --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	3 weeks ago
Chris Lu	88c27615c4	/table-buckets	3 weeks ago
Chris Lu	8b61fd77b5	s3api: ensure MD5 is calculated or reused during CopyObject (#8163) * s3api: ensure MD5 is calculated or reused during CopyObject Fixes #8155 - Capture and reuse source MD5 for direct copies - Calculate MD5 for small inline objects during copy * s3api: refactor encryption logic and safe MD5 copying - Extract duplicated bucket default encryption logic into helper - Use safe append copy for MD5 slice to avoid shared modifications * refactor * avoids unnecessary MD5 recalculations for small files	3 weeks ago
Chris Lu	d399113e0c	test: fix duplicate subtest names in permissions_test.go Rename duplicate 'combined * and ?' test cases to include singular/plural suffix for clarity and to support targeted test runs.	3 weeks ago
Chris Lu	a4217dff5f	s3tables: enhance DeleteTable authorization with policy checking Fetch and evaluate table policies in DeleteTable handler to support policy-based delegation. Aligns authorization behavior with GetTable and ListTables handlers instead of only checking ownership.	3 weeks ago
Chris Lu	745a7e40a6	s3tables: improve bucket policy error handling in DeleteTableBucket Explicitly handle ErrAttributeNotFound vs other errors when fetching bucket policy. Return errors for non-expected failures to prevent masking filer issues and ensure correct authorization decisions.	3 weeks ago


954 Commits (35ad7d08a5a03e489932438e6f656a2dff478048)