seaweedfs

Chris Lu 81369b8a83 improve: large file sync throughput for remote.cache and filer.sync (#8676)	2 days ago

* improve large file sync throughput for remote.cache and filer.sync

  Three main throughput improvements:

  1. Adaptive chunk sizing for remote.cache: targets ~32 chunks per file instead of always starting at 5MB. A 500MB file now uses ~16MB chunks (32 chunks) instead of 5MB chunks (100 chunks), reducing per-chunk overhead (volume assign, gRPC call, needle write) by 3x.

  2. Configurable concurrency at every layer:
     - remote.cache chunk concurrency: -chunkConcurrency flag (default 8)
     - remote.cache S3 download concurrency: -downloadConcurrency flag (default raised from 1 to 5 per chunk)
     - filer.sync chunk concurrency: -chunkConcurrency flag (default 32)

  3. S3 multipart download concurrency raised from 1 to 5: the S3 manager downloader was using Concurrency=1, serializing all part downloads within each chunk. This alone can 5x per-chunk download speed.

  The concurrency values flow through the gRPC request chain:
  shell command → CacheRemoteObjectToLocalClusterRequest → FetchAndWriteNeedleRequest → S3 downloader

  Zero values in the request mean "use server defaults", maintaining full backward compatibility with existing callers.

  Ref #8481

* fix: use full maxMB for chunk size cap and remove loop guard

  Address review feedback:
  - Use full maxMB instead of maxMB/2 for maxChunkSize to avoid unnecessarily limiting chunk size for very large files.
  - Remove chunkSize < maxChunkSize guard from the safety loop so it can always grow past maxChunkSize when needed to stay under 1000 chunks (e.g., extremely large files with small maxMB).

* address review feedback: help text, validation, naming, docs

  - Fix help text for -chunkConcurrency and -downloadConcurrency flags to say "0 = server default" instead of advertising specific numeric defaults that could drift from the server implementation.
  - Validate chunkConcurrency and downloadConcurrency are within int32 range before narrowing, returning a user-facing error if out of range.
  - Rename ReadRemoteErr to readRemoteErr to follow Go naming conventions.
  - Add doc comment to SetChunkConcurrency noting it must be called during initialization before replication goroutines start.
  - Replace doubling loop in chunk size safety check with direct ceil(remoteSize/1000) computation to guarantee the 1000-chunk cap.

* address Copilot review: clamp concurrency, fix chunk count, clarify proto docs

  - Use ceiling division for chunk count check to avoid overcounting when file size is an exact multiple of chunk size.
  - Clamp chunkConcurrency (max 1024) and downloadConcurrency (max 1024 at filer, max 64 at volume server) to prevent excessive goroutines.
  - Always use ReadFileWithConcurrency when the client supports it, falling back to the implementation's default when value is 0.
  - Clarify proto comments that download_concurrency only applies when the remote storage client supports it (currently S3).
  - Include specific server defaults in help text (e.g., "0 = server default 8") so users see the actual values in -h output.

* fix data race on executionErr and use %w for error wrapping

  - Protect concurrent writes to executionErr in remote.cache worker goroutines with a sync.Mutex to eliminate the data race.
  - Use %w instead of %v in volume_grpc_remote.go error formatting to preserve the error chain for errors.Is/errors.As callers.
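The adaptive chunk sizing described in the commit message above can be sketched roughly as follows. This is not the code from the commit: the function name `adaptiveChunkSize`, the helper `ceilDiv`, the constant names, and the exact rounding are assumptions made for illustration. Only the 5MB floor, the ~32-chunk target, the maxMB cap, and the 1000-chunk limit computed via ceil(remoteSize/1000) are taken from the message.

```go
package main

import "fmt"

// Hypothetical constants; the values come from the commit message,
// the names do not.
const (
	minChunkSize     = int64(5 * 1024 * 1024) // previous fixed starting size
	targetChunks     = int64(32)              // aim for ~32 chunks per file
	maxChunksPerFile = int64(1000)            // hard cap on chunk count
)

// ceilDiv returns ceil(a/b) for a >= 0 and b > 0.
func ceilDiv(a, b int64) int64 {
	return (a + b - 1) / b
}

// adaptiveChunkSize picks a chunk size in bytes for a remote object of
// remoteSize bytes. maxMB is the configured upper bound in MiB; a value
// <= 0 is treated here as "no cap" (an assumption of this sketch).
func adaptiveChunkSize(remoteSize, maxMB int64) int64 {
	// Start from the size that would yield about targetChunks chunks.
	chunkSize := ceilDiv(remoteSize, targetChunks)
	if chunkSize < minChunkSize {
		chunkSize = minChunkSize
	}
	// Cap at the full maxMB (not maxMB/2).
	if maxMB > 0 {
		if maxChunkSize := maxMB * 1024 * 1024; chunkSize > maxChunkSize {
			chunkSize = maxChunkSize
		}
	}
	// Safety check: ceiling division avoids overcounting when remoteSize
	// is an exact multiple of chunkSize. If the cap above would produce
	// more than 1000 chunks, grow past it directly instead of doubling.
	if ceilDiv(remoteSize, chunkSize) > maxChunksPerFile {
		chunkSize = ceilDiv(remoteSize, maxChunksPerFile)
	}
	return chunkSize
}

func main() {
	size := int64(500) << 20 // a 500MB object
	cs := adaptiveChunkSize(size, 64)
	fmt.Printf("chunk size: %d bytes, chunks: %d\n", cs, ceilDiv(size, cs))
}
```

Computing the final size with one ceiling division, rather than repeatedly doubling, gives the smallest chunk size that satisfies the 1000-chunk limit, even when that size exceeds the maxMB cap.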
constants	Nit: use `time.Duration`s instead of constants in seconds. (#7438)	5 months ago
filer_ui	fix FetchAndWriteNeedle to await all writes before checking errors	2 days ago
master_ui	fix FetchAndWriteNeedle to await all writes before checking errors	2 days ago
postgres	Move SQL engine and PostgreSQL server to their own binaries (#8417)	3 weeks ago
volume_server_ui	fix: EC UI template error when viewing shard details (#7955)	3 months ago
common.go	s3/iam: reuse one request id per request (#8538)	2 weeks ago
common_test.go	jwt check the base file id	7 years ago
filer_grpc_server.go	filer: add conditional update preconditions (#8647)	4 days ago
filer_grpc_server_admin.go	chore: execute goimports to format the code (#7983)	2 months ago
filer_grpc_server_dlm.go	reduce logs	3 months ago
filer_grpc_server_kv.go	chore: execute goimports to format the code (#7983)	2 months ago
filer_grpc_server_remote.go	improve: large file sync throughput for remote.cache and filer.sync (#8676)	2 days ago
filer_grpc_server_rename.go	Prevent bucket renaming in filer, fuse mount, and S3 (#8048)	2 months ago
filer_grpc_server_sub_meta.go	avoid repeated reading disk (#7369)	5 months ago
filer_grpc_server_traverse_meta.go	Add error list each entry func (#7485)	4 months ago
filer_grpc_server_traverse_meta_test.go	chore: execute goimports to format the code (#7983)	2 months ago
filer_jwt_test.go	fix Filer startup failure due to JWT on / path #8149 (#8167)	2 months ago
filer_server.go	Fix: filer not yet available in s3.configure (#8198)	1 month ago
filer_server_handlers.go	fix Filer startup failure due to JWT on / path #8149 (#8167)	2 months ago
filer_server_handlers_copy.go	use one http client	7 months ago
filer_server_handlers_iam_grpc.go	Implement IAM propagation to S3 servers (#8130)	2 months ago
filer_server_handlers_proxy.go	Add data file compaction to iceberg maintenance (Phase 2) (#8503)	5 days ago
filer_server_handlers_proxy_test.go	fix(filer): limit concurrent proxy reads per volume server (#8608)	1 week ago
filer_server_handlers_read.go	S3: set identity to request context, and remove obsolete code (#7523)	4 months ago
filer_server_handlers_read_dir.go	chore: execute goimports to format the code (#7983)	2 months ago
filer_server_handlers_tagging.go	Changes logging function (#6919)	9 months ago
filer_server_handlers_write.go	Reduce memory allocations in hot paths (#7725)	3 months ago
filer_server_handlers_write_autochunk.go	filer write request use context without cancellation (#7567)	4 months ago
filer_server_handlers_write_cipher.go	convert error formatting to %w everywhere (#6995)	8 months ago
filer_server_handlers_write_merge.go	S3 API: Add SSE-KMS (#7144)	7 months ago
filer_server_handlers_write_upload.go	fix: multipart upload ETag calculation (#8238)	1 month ago
filer_server_rocksdb.go	go fix	4 weeks ago
filer_server_tus_handlers.go	Add TUS protocol support for resumable uploads (#7592)	3 months ago
filer_server_tus_session.go	Add TUS protocol support for resumable uploads (#7592)	3 months ago
master_grpc_server.go	fix logs	2 weeks ago
master_grpc_server_admin.go	chore: execute goimports to format the code (#7983)	2 months ago
master_grpc_server_assign.go	master: return 503/Unavailable during topology warmup after leader change (#8529)	2 weeks ago
master_grpc_server_cluster.go	chore: execute goimports to format the code (#7983)	2 months ago
master_grpc_server_collection.go	move to https://github.com/seaweedfs/seaweedfs	4 years ago
master_grpc_server_raft.go	fix: admin does not show all master servers #7999 (#8002)	2 months ago
master_grpc_server_raft_test.go	Add cluster.raft.leader.transfer command for graceful leader change (#7819)	3 months ago
master_grpc_server_volume.go	feat: auto-disable master vacuum when plugin worker is active (#8624)	6 days ago
master_server.go	feat: auto-disable master vacuum when plugin worker is active (#8624)	6 days ago
master_server_handlers.go	master: return 503/Unavailable during topology warmup after leader change (#8529)	2 weeks ago
master_server_handlers_admin.go	fix: generate topology uuid uniformly in single-master mode (#8405)	4 weeks ago
master_server_handlers_ui.go	hide milliseconds in up time (#7553)	4 months ago
raft_common.go	Prevent split-brain: Persistent ClusterID and Join Validation (#8022)	2 months ago
raft_hashicorp.go	master: return 503/Unavailable during topology warmup after leader change (#8529)	2 weeks ago
raft_hashicorp_test.go	Normalize hashicorp raft peer ids (#8253)	1 month ago
raft_server.go	s3api: add AttachUserPolicy/DetachUserPolicy/ListAttachedUserPolicies (#8379)	4 weeks ago
raft_server_handlers.go	chore: execute goimports to format the code (#7983)	2 months ago
volume_grpc_admin.go	Add a version token on RPCs to read/update volume server states. (#8191)	1 month ago
volume_grpc_batch_delete.go	Block RPC write operations on volume servers when maintenance mode is enabled (#8115)	2 months ago
volume_grpc_client_to_master.go	go fix	4 weeks ago
volume_grpc_copy.go	Fix live volume move tail timestamp (#8440)	3 weeks ago
volume_grpc_copy_incremental.go	move to https://github.com/seaweedfs/seaweedfs	4 years ago
volume_grpc_erasure_coding.go	fix(ec): gather shards from all disk locations before rebuild (#8633)	5 days ago
volume_grpc_erasure_coding_test.go	iceberg: wire pagination for list namespaces/tables REST APIs (#8275)	1 month ago
volume_grpc_query.go	move to https://github.com/seaweedfs/seaweedfs	4 years ago
volume_grpc_read_all.go	chore: execute goimports to format the code (#7983)	2 months ago
volume_grpc_read_write.go	Block RPC write operations on volume servers when maintenance mode is enabled (#8115)	2 months ago
volume_grpc_remote.go	improve: large file sync throughput for remote.cache and filer.sync (#8676)	2 days ago
volume_grpc_scrub.go	Implement full scrubbing for EC volumes (#8318)	1 month ago
volume_grpc_state.go	Add a version token on RPCs to read/update volume server states. (#8191)	1 month ago
volume_grpc_tail.go	Block RPC write operations on volume servers when maintenance mode is enabled (#8115)	2 months ago
volume_grpc_tier_download.go	avoid load volume file with BytesOffset mismatch (#3841)	3 years ago
volume_grpc_tier_upload.go	Block RPC write operations on volume servers when maintenance mode is enabled (#8115)	2 months ago
volume_grpc_vacuum.go	Block RPC write operations on volume servers when maintenance mode is enabled (#8115)	2 months ago
volume_server.go	Add volume dir tags and EC placement priority (#8472)	3 weeks ago
volume_server_handlers.go	fix: JWT validation failures during replication (#7788) (#7795)	3 months ago
volume_server_handlers_admin.go	fix: volume server healthz now checks local conditions only (#7610)	4 months ago
volume_server_handlers_helper.go	directory structure change to work with glide	10 years ago
volume_server_handlers_read.go	fix: Use mime.FormatMediaType for RFC 6266 compliant Content-Disposition (#7635)	3 months ago
volume_server_handlers_ui.go	hide milliseconds in up time (#7553)	4 months ago
volume_server_handlers_write.go	fix: multipart upload ETag calculation (#8238)	1 month ago
volume_server_test.go	Add a version token on RPCs to read/update volume server states. (#8191)	1 month ago
webdav_server.go	mount: improve read throughput with parallel chunk fetching (#7569)	4 months ago
wrapped_webdav_fs.go	chore: execute goimports to format the code (#7983)	2 months ago