seaweedfs


Chris Lu 753e1db096 Prevent split-brain: Persistent ClusterID and Join Validation (#8022)	4 weeks ago

* Prevent split-brain: Persistent ClusterID and Join Validation
  - Persist ClusterId in Raft store to survive restarts.
  - Validate ClusterId on Raft command application (piggybacked on MaxVolumeId).
  - Prevent masters with conflicting ClusterIds from joining/operating together.
  - Update Telemetry to report the persistent ClusterId.
* Refine ClusterID validation based on feedback
  - Improved error message in cluster_commands.go.
  - Added ClusterId mismatch check in RaftServer.Recovery.
* Handle Raft errors and support Hashicorp Raft for ClusterId
  - Check for errors when persisting ClusterId in legacy Raft.
  - Implement ClusterId generation and persistence for Hashicorp Raft leader changes.
  - Ensure consistent error logging.
* Refactor ClusterId validation
  - Centralize ClusterId mismatch check in Topology.SetClusterId.
  - Simplify MaxVolumeIdCommand.Apply and RaftServer.Recovery to rely on SetClusterId.
* Fix goroutine leak and add timeout
  - Handle channel closure in Hashicorp Raft leader listener.
  - Add timeout to Raft Apply call to prevent blocking.
* Fix deadlock in legacy Raft listener
  - Wrap ClusterId generation/persistence in a goroutine to avoid blocking the Raft event loop (deadlock).
* Rename ClusterId to SystemId
  - Renamed ClusterId to SystemId across the codebase (protobuf, topology, server, telemetry).
  - Regenerated telemetry.pb.go with new field.
* Rename SystemId to TopologyId
  - Rename to SystemId was intermediate step.
  - Final name is TopologyId for the persistent cluster identifier.
  - Updated protobuf, topology, raft server, master server, and telemetry.
* Optimize Hashicorp Raft listener
  - Integrated TopologyId generation into existing monitorLeaderLoop.
  - Removed extra goroutine in master_server.go.
* Fix optimistic TopologyId update
  - Removed premature local state update of TopologyId in master_server.go and raft_hashicorp.go.
  - State is now solely updated via the Raft state machine Apply/Restore methods after consensus.
* Add explicit log for recovered TopologyId
  - Added glog.V(0) info log in RaftServer.Recovery to print the recovered TopologyId on startup.
* Add Raft barrier to prevent TopologyId race condition
  - Implement ensureTopologyId helper method
  - Send no-op MaxVolumeIdCommand to sync Raft log before checking TopologyId
  - Ensures persisted TopologyId is recovered before generating new one
  - Prevents race where generation happens during log replay
* Serialize TopologyId generation with mutex
  - Add topologyIdGenLock mutex to MasterServer struct
  - Wrap ensureTopologyId method with lock to prevent concurrent generation
  - Fixes race where event listener and manual leadership check both generate IDs
  - Second caller waits for first to complete and sees the generated ID
* Add TopologyId recovery logging to Apply method
  - Change log level from V(1) to V(0) for visibility
  - Log 'Recovered TopologyId' when applying from Raft log
  - Ensures recovery is visible whether from snapshot or log replay
  - Matches Recovery() method logging for consistency
* Fix Raft barrier timing issue
  - Add 100ms delay after barrier command to ensure log application completes
  - Add debug logging to track barrier execution and TopologyId state
  - Return early if barrier command fails
  - Prevents TopologyId generation before old logs are fully applied
* ensure leader
* address comments
* address comments
* redundant
* clean up
* double check
* refactoring
* comment

constants	Nit: use `time.Duration`s instead of constants in seconds. (#7438)	3 months ago
filer_ui	chore: execute goimports to format the code (#7983)	1 month ago
master_ui	Fix/parse upload filename (#6241)	1 year ago
postgres	Message Queue: Add sql querying (#7185)	5 months ago
volume_server_ui	fix: EC UI template error when viewing shard details (#7955)	1 month ago
common.go	fix: Use mime.FormatMediaType for RFC 6266 compliant Content-Disposition (#7635)	2 months ago
common_test.go	jwt check the base file id	7 years ago
filer_grpc_server.go	Fix remote.meta.sync TTL issue (#8021) (#8030)	4 weeks ago
filer_grpc_server_admin.go	chore: execute goimports to format the code (#7983)	1 month ago
filer_grpc_server_dlm.go	reduce logs	2 months ago
filer_grpc_server_kv.go	chore: execute goimports to format the code (#7983)	1 month ago
filer_grpc_server_remote.go	s3: fix remote object not caching (#7790)	2 months ago
filer_grpc_server_rename.go	Prevent bucket renaming in filer, fuse mount, and S3 (#8048)	4 weeks ago
filer_grpc_server_sub_meta.go	avoid repeated reading disk (#7369)	4 months ago
filer_grpc_server_traverse_meta.go	Add error list each entry func (#7485)	3 months ago
filer_grpc_server_traverse_meta_test.go	chore: execute goimports to format the code (#7983)	1 month ago
filer_jwt_test.go	feat: Optional path-prefix and method scoping for Filer HTTP JWT (#8014)	1 month ago
filer_server.go	s3: fix remote object not caching (#7790)	2 months ago
filer_server_handlers.go	feat: Optional path-prefix and method scoping for Filer HTTP JWT (#8014)	1 month ago
filer_server_handlers_copy.go	use one http client	6 months ago
filer_server_handlers_proxy.go	Changes logging function (#6919)	8 months ago
filer_server_handlers_read.go	S3: set identity to request context, and remove obsolete code (#7523)	3 months ago
filer_server_handlers_read_dir.go	chore: execute goimports to format the code (#7983)	1 month ago
filer_server_handlers_tagging.go	Changes logging function (#6919)	8 months ago
filer_server_handlers_write.go	Reduce memory allocations in hot paths (#7725)	2 months ago
filer_server_handlers_write_autochunk.go	filer write request use context without cancellation (#7567)	3 months ago
filer_server_handlers_write_cipher.go	convert error fromating to %w everywhere (#6995)	7 months ago
filer_server_handlers_write_merge.go	S3 API: Add SSE-KMS (#7144)	6 months ago
filer_server_handlers_write_upload.go	filer write request use context without cancellation (#7567)	3 months ago
filer_server_rocksdb.go	move to https://github.com/seaweedfs/seaweedfs	4 years ago
filer_server_tus_handlers.go	Add TUS protocol support for resumable uploads (#7592)	2 months ago
filer_server_tus_session.go	Add TUS protocol support for resumable uploads (#7592)	2 months ago
master_grpc_server.go	Boostrap persistent state for volume servers. (#7984)	1 month ago
master_grpc_server_admin.go	chore: execute goimports to format the code (#7983)	1 month ago
master_grpc_server_assign.go	add disable volume_growth flag (#7196)	5 months ago
master_grpc_server_cluster.go	chore: execute goimports to format the code (#7983)	1 month ago
master_grpc_server_collection.go	move to https://github.com/seaweedfs/seaweedfs	4 years ago
master_grpc_server_raft.go	fix: admin does not show all master servers #7999 (#8002)	1 month ago
master_grpc_server_raft_test.go	Add cluster.raft.leader.transfer command for graceful leader change (#7819)	2 months ago
master_grpc_server_volume.go	Support separate volume server ID independent of RPC bind address (#7609)	2 months ago
master_server.go	Prevent split-brain: Persistent ClusterID and Join Validation (#8022)	4 weeks ago
master_server_handlers.go	add disable volume_growth flag (#7196)	5 months ago
master_server_handlers_admin.go	follow grow volume option version	8 months ago
master_server_handlers_ui.go	hide millseconds in up time (#7553)	3 months ago
raft_common.go	Prevent split-brain: Persistent ClusterID and Join Validation (#8022)	4 weeks ago
raft_hashicorp.go	Prevent split-brain: Persistent ClusterID and Join Validation (#8022)	4 weeks ago
raft_server.go	Prevent split-brain: Persistent ClusterID and Join Validation (#8022)	4 weeks ago
raft_server_handlers.go	chore: execute goimports to format the code (#7983)	1 month ago
volume_grpc_admin.go	fix: restore volume mount when VolumeConfigure fails (#7669)	2 months ago
volume_grpc_batch_delete.go	fix has changes false if deleted result size eq zero (#4909)	2 years ago
volume_grpc_client_to_master.go	Boostrap persistent state for volume servers. (#7984)	1 month ago
volume_grpc_copy.go	chore: execute goimports to format the code (#7983)	1 month ago
volume_grpc_copy_incremental.go	move to https://github.com/seaweedfs/seaweedfs	4 years ago
volume_grpc_erasure_coding.go	Fix reporting of EC shard sizes from nodes to masters. (#7835)	2 months ago
volume_grpc_query.go	move to https://github.com/seaweedfs/seaweedfs	4 years ago
volume_grpc_read_all.go	chore: execute goimports to format the code (#7983)	1 month ago
volume_grpc_read_write.go	chore: execute goimports to format the code (#7983)	1 month ago
volume_grpc_remote.go	chore: execute goimports to format the code (#7983)	1 month ago
volume_grpc_tail.go	convert error fromating to %w everywhere (#6995)	7 months ago
volume_grpc_tier_download.go	avoid load volume file with BytesOffset mismatch (#3841)	3 years ago
volume_grpc_tier_upload.go	avoid load volume file with BytesOffset mismatch (#3841)	3 years ago
volume_grpc_vacuum.go	Fix no more writable volumes by delay judgment (#4548)	3 years ago
volume_server.go	Separate vacuum speed from replication speed (#7632)	2 months ago
volume_server_handlers.go	fix: JWT validation failures during replication (#7788) (#7795)	2 months ago
volume_server_handlers_admin.go	fix: volume server healthz now checks local conditions only (#7610)	2 months ago
volume_server_handlers_helper.go	directory structure change to work with glide	10 years ago
volume_server_handlers_read.go	fix: Use mime.FormatMediaType for RFC 6266 compliant Content-Disposition (#7635)	2 months ago
volume_server_handlers_ui.go	hide millseconds in up time (#7553)	3 months ago
volume_server_handlers_write.go	use context.WithoutCancel to avoid context cancellation when the client connection is closed	1 month ago
webdav_server.go	mount: improve read throughput with parallel chunk fetching (#7569)	2 months ago
wrapped_webdav_fs.go	chore: execute goimports to format the code (#7983)	1 month ago