seaweedfs

History

Chris Lu 403592bb9f Add Spark Iceberg catalog integration tests and CI support (#8242 ) * Add Spark Iceberg catalog integration tests and CI support Implement comprehensive integration tests for Spark with SeaweedFS Iceberg REST catalog: - Basic CRUD operations (Create, Read, Update, Delete) on Iceberg tables - Namespace (database) management - Data insertion, querying, and deletion - Time travel capabilities via snapshot versioning - Compatible with SeaweedFS S3 and Iceberg REST endpoints Tests mirror the structure of existing Trino integration tests but use Spark's Python SQL API and PySpark for testing. Add GitHub Actions CI job for spark-iceberg-catalog-tests in s3-tables-tests.yml to automatically run Spark integration tests on pull requests. * fmt * Fix Spark integration tests - code review feedback * go mod tidy * Add go mod tidy step to integration test jobs Add 'go mod tidy' step before test runs for all integration test jobs: - s3-tables-tests - iceberg-catalog-tests - trino-iceberg-catalog-tests - spark-iceberg-catalog-tests This ensures dependencies are clean before running tests. * Fix remaining Spark operations test issues Address final code review comments: Setup & Initialization: - Add waitForSparkReady() helper function that polls Spark readiness with backoff instead of hardcoded 10-second sleep - Extract setupSparkTestEnv() helper to reduce boilerplate duplication between TestSparkCatalogBasicOperations and TestSparkTimeTravel - Both tests now use helpers for consistent, reliable setup Assertions & Validation: - Make setup-critical operations (namespace, table creation, initial insert) use t.Fatalf instead of t.Errorf to fail fast - Validate setupSQL output in TestSparkTimeTravel and fail if not 'Setup complete' - Add validation after second INSERT in TestSparkTimeTravel: verify row count increased to 2 before time travel test - Add context to error messages with namespace and tableName params Code Quality: - Remove code duplication between test functions - All critical paths now properly validated - Consistent error handling throughout * Fix go vet errors in S3 Tables tests Fixes: 1. setup_test.go (Spark): - Add missing import: github.com/testcontainers/testcontainers-go/wait - Use wait.ForLog instead of undefined testcontainers.NewLogStrategy - Remove unused strings import 2. trino_catalog_test.go: - Use net.JoinHostPort instead of fmt.Sprintf for address formatting - Properly handles IPv6 addresses by wrapping them in brackets * Use weed mini for simpler SeaweedFS startup Replace complex multi-process startup (master, volume, filer, s3) with single 'weed mini' command that starts all services together. Benefits: - Simpler, more reliable startup - Single weed mini process vs 4 separate processes - Automatic coordination between components - Better port management with no manual coordination Changes: - Remove separate master, volume, filer process startup - Use weed mini with -master.port, -filer.port, -s3.port flags - Keep Iceberg REST as separate service (still needed) - Increase timeout to 15s for port readiness (weed mini startup) - Remove volumePort and filerProcess fields from TestEnvironment - Simplify cleanup to only handle two processes (mini, iceberg rest) * Clean up dead code and temp directory leaks Fixes: 1. Remove dead s3Process field and cleanup: - weed mini bundles S3 gateway, no separate process needed - Removed s3Process field from TestEnvironment - Removed unnecessary s3Process cleanup code 2. Fix temp config directory leak: - Add sparkConfigDir field to TestEnvironment - Store returned configDir in writeSparkConfig - Clean up sparkConfigDir in Cleanup() with os.RemoveAll - Prevents accumulation of temp directories in test runs 3. Simplify Cleanup: - Now handles only necessary processes (weed mini, iceberg rest) - Removes both seaweedfsDataDir and sparkConfigDir - Cleaner shutdown sequence * Use weed mini's built-in Iceberg REST and fix python binary Changes: - Add -s3.port.iceberg flag to weed mini for built-in Iceberg REST Catalog - Remove separate 'weed server' process for Iceberg REST - Remove icebergRestProcess field from TestEnvironment - Simplify Cleanup() to only manage weed mini + Spark - Add port readiness check for iceberg REST from weed mini - Set Spark container Cmd to '/bin/sh -c sleep 3600' to keep it running - Change python to python3 in container.Exec calls This simplifies to truly one all-in-one weed mini process (master, filer, s3, iceberg-rest) plus just the Spark container. * go fmt * clean up * bind on a non-loopback IP for container access, aligned Iceberg metadata saves/locations with table locations, and reworked Spark time travel to use TIMESTAMP AS OF with safe timestamp extraction. * shared mini start * Fixed internal directory creation under /buckets so .objects paths can auto-create without failing bucket-name validation, which restores table bucket object writes * fix path Updated table bucket objects to write under `/buckets/<bucket>` and saved Iceberg metadata there, adjusting Spark time-travel timestamp to committed_at +1s. Rebuilt the weed binary (`go install ./weed`) and confirmed passing tests for Spark and Trino with focused test commands. * Updated table bucket creation to stop creating /buckets/.objects and switched Trino REST warehouse to s3://<bucket> to match Iceberg layout. * Stabilize S3Tables integration tests * Fix timestamp extraction and remove dead code in bucketDir * Use table bucket as warehouse in s3tables tests * Update trino_blog_operations_test.go * adds the CASCADE option to handle any remaining table metadata/files in the schema directory * skip namespace not empty		1 month ago
..
abstract_sql	Fix chown Input/output error on large file sets (#7996)	2 months ago
arangodb	chore: execute goimports to format the code (#7983)	2 months ago
cassandra	feat: add TLS configuration options for Cassandra2 store (#7998)	2 months ago
cassandra2	feat: add TLS configuration options for Cassandra2 store (#7998)	2 months ago
elastic/v7	Add error list each entry func (#7485)	4 months ago
empty_folder_cleanup	filer: auto clean empty implicit s3 folders (#8051)	2 months ago
etcd	chore: execute goimports to format the code (#7983)	2 months ago
foundationdb	filer: improve FoundationDB performance by disabling batch by default (#7770)	3 months ago
hbase	chore: execute goimports to format the code (#7983)	2 months ago
leveldb	chore: execute goimports to format the code (#7983)	2 months ago
leveldb2	chore: execute goimports to format the code (#7983)	2 months ago
leveldb3	chore: execute goimports to format the code (#7983)	2 months ago
mongodb	fix: comprehensive go vet error fixes and add CI enforcement (#7861)	3 months ago
mysql	Fix chown Input/output error on large file sets (#7996)	2 months ago
mysql2	print only adapted url	2 years ago
postgres	Filer Store: postgres backend support pgbouncer (#7077)	7 months ago
postgres2	Filer Store: postgres backend support pgbouncer (#7077)	7 months ago
redis	Add error list each entry func (#7485)	4 months ago
redis2	chore: execute goimports to format the code (#7983)	2 months ago
redis3	chore: execute goimports to format the code (#7983)	2 months ago
redis_lua	chore: execute goimports to format the code (#7983)	2 months ago
rocksdb	Add error list each entry func (#7485)	4 months ago
sqlite	serialize sqlite operations	3 years ago
store_test	Add error list each entry func (#7485)	4 months ago
tarantool	Add error list each entry func (#7485)	4 months ago
tikv	Add keyPrefix support for TiKV store (#7756)	3 months ago
ydb	chore: execute goimports to format the code (#7983)	2 months ago
configuration.go	chore: execute goimports to format the code (#7983)	2 months ago
entry.go	chore: execute goimports to format the code (#7983)	2 months ago
entry_codec.go	Reduce memory allocations in hot paths (#7725)	3 months ago
filechunk_group.go	mount: improve read throughput with parallel chunk fetching (#7569)	3 months ago
filechunk_group_test.go	Context cancellation during reading range reading large files (#7093)	7 months ago
filechunk_manifest.go	Fix S3 Gateway Read Failover #8076 (#8087)	2 months ago
filechunk_manifest_test.go	move to https://github.com/seaweedfs/seaweedfs	4 years ago
filechunk_section.go	mount: improve read throughput with parallel chunk fetching (#7569)	3 months ago
filechunk_section_test.go	chore: execute goimports to format the code (#7983)	2 months ago
filechunks.go	fix: multipart upload ETag calculation (#8238)	1 month ago
filechunks2_test.go	chore: execute goimports to format the code (#7983)	2 months ago
filechunks_read.go	chore: execute goimports to format the code (#7983)	2 months ago
filechunks_read_test.go	chore: execute goimports to format the code (#7983)	2 months ago
filechunks_test.go	S3 API: Advanced IAM System (#7160)	7 months ago
filer.go	Add Spark Iceberg catalog integration tests and CI support (#8242)	1 month ago
filer_buckets.go	Prevent bucket renaming in filer, fuse mount, and S3 (#8048)	2 months ago
filer_conf.go	Reduce memory allocations in hot paths (#7725)	3 months ago
filer_conf_test.go	Reduce memory allocations in hot paths (#7725)	3 months ago
filer_delete_entry.go	Prevent bucket renaming in filer, fuse mount, and S3 (#8048)	2 months ago
filer_deletion.go	Filer: Add retry mechanism for failed file deletions (#7402)	4 months ago
filer_deletion_test.go	Filer: Add retry mechanism for failed file deletions (#7402)	4 months ago
filer_hardlink.go	move to https://github.com/seaweedfs/seaweedfs	4 years ago
filer_notify.go	fix: skip log files with deleted volumes in filer backup (#7692)	3 months ago
filer_notify_append.go	fix: include DiskType in metadata log volume assignment (#7918)	2 months ago
filer_notify_read.go	fix: skip log files with deleted volumes in filer backup (#7692)	3 months ago
filer_notify_test.go	refactor filer_pb.Entry and filer.Entry to use GetChunks()	3 years ago
filer_on_meta_event.go	filer: async empty folder cleanup via metadata events (#7614)	3 months ago
filer_rename.go	Prevent bucket renaming in filer, fuse mount, and S3 (#8048)	2 months ago
filer_search.go	filer: async empty folder cleanup via metadata events (#7614)	3 months ago
filerstore.go	Add error list each entry func (#7485)	4 months ago
filerstore_hardlink.go	Changes logging function (#6919)	9 months ago
filerstore_translate_path.go	Add error list each entry func (#7485)	4 months ago
filerstore_wrapper.go	chore: execute goimports to format the code (#7983)	2 months ago
interval_list.go	Mount concurrent read (#4400)	3 years ago
interval_list_test.go	chore: execute goimports to format the code (#7983)	2 months ago
meta_aggregator.go	Fix unaligned 64-bit atomic operation on ARM32 (#7958) (#7959)	2 months ago
meta_replay.go	"golang.org/x/exp/slices" => "slices" and go fmt	1 year ago
read_remote.go	chore: execute goimports to format the code (#7983)	2 months ago
read_write.go	added context to filer_client method calls (#6808)	10 months ago
reader_at.go	mount: improve read throughput with parallel chunk fetching (#7627)	3 months ago
reader_at_test.go	Context cancellation during reading range reading large files (#7093)	7 months ago
reader_cache.go	mount: improve read throughput with parallel chunk fetching (#7627)	3 months ago
reader_cache_test.go	chore: execute goimports to format the code (#7983)	2 months ago
reader_pattern.go	Fix a few data races when reading files in mount (#3527)	4 years ago
remote_mapping.go	chore: execute goimports to format the code (#7983)	2 months ago
remote_storage.go	chore: execute goimports to format the code (#7983)	2 months ago
remote_storage_test.go	chore: execute goimports to format the code (#7983)	2 months ago
s3iam_conf.go	convert error fromating to %w everywhere (#6995)	8 months ago
s3iam_conf_test.go	s3: fix configuring IAM for the same user	4 years ago
stream.go	Fix S3 Gateway Read Failover #8076 (#8087)	2 months ago
stream_failover_test.go	Fix S3 Gateway Read Failover #8076 (#8087)	2 months ago
topics.go	merge current message queue code changes (#6201)	1 year ago