New POST /block/volume/plan endpoint returns full placement preview:
resolved policy, ordered candidate list, selected primary/replicas,
and per-server rejection reasons with stable string constants.
Core design: evaluateBlockPlacement() is a pure function with no
registry/topology dependency. gatherPlacementCandidates() is the
single topology bridge point. Plan and create share the same planner;
the parity contract is that both yield the same ordered candidate list
for the same cluster state.
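
A minimal sketch of that split, in Go, assuming hypothetical type and
field names (ServerState, PlacementInput, and the rejection constants
stand in for the real identifiers):

    package placement

    // Hypothetical snapshot types; the real planner input carries more fields.
    type ServerState struct {
        Addr      string
        Alive     bool
        FreeBytes int64
    }

    type PlacementInput struct {
        Servers   []ServerState // snapshot produced by gatherPlacementCandidates()
        NeedBytes int64
        RF        int
    }

    type Candidate struct {
        Addr      string
        Rejection string // stable string constant; empty means eligible
    }

    type PlacementPlan struct {
        Candidates []Candidate // ordered: identical for plan and create
        Primary    string
        Replicas   []string
    }

    const (
        rejectNotAlive     = "server_not_alive"
        rejectInsufficient = "insufficient_capacity"
    )

    // evaluateBlockPlacement is pure: it reads only the snapshot it is
    // given, never the live registry or topology, which is what makes the
    // plan/create parity contract checkable.
    func evaluateBlockPlacement(in PlacementInput) PlacementPlan {
        var plan PlacementPlan
        for _, s := range in.Servers {
            c := Candidate{Addr: s.Addr}
            switch {
            case !s.Alive:
                c.Rejection = rejectNotAlive
            case s.FreeBytes < in.NeedBytes:
                c.Rejection = rejectInsufficient
            }
            plan.Candidates = append(plan.Candidates, c)
            if c.Rejection != "" {
                continue
            }
            if plan.Primary == "" {
                plan.Primary = s.Addr
            } else if len(plan.Replicas) < in.RF-1 {
                plan.Replicas = append(plan.Replicas, s.Addr)
            }
        }
        return plan
    }
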
Create path refactored: uses evaluateBlockPlacement() instead of
PickServer(), iterates all candidates (no 3-retry cap), recomputes
replica order after primary fallback. rf_not_satisfiable severity
is durability-mode-aware (warning for best_effort, error for strict).
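
A sketch of the refactored create loop, reusing the hypothetical types
above; tryCreatePrimary is an illustrative stand-in for the real RPC:

    package placement

    import "errors"

    // createWithFallback walks every eligible candidate instead of stopping
    // after three attempts, and recomputes the replica set after each
    // primary fallback so a failed primary never reappears as a replica.
    func createWithFallback(
        in PlacementInput,
        tryCreatePrimary func(addr string) error,
    ) (primary string, replicas []string, err error) {
        plan := evaluateBlockPlacement(in)
        for _, c := range plan.Candidates {
            if c.Rejection != "" {
                continue
            }
            if tryCreatePrimary(c.Addr) != nil {
                continue // fall through to the next ordered candidate
            }
            for _, r := range plan.Candidates {
                if r.Rejection == "" && r.Addr != c.Addr && len(replicas) < in.RF-1 {
                    replicas = append(replicas, r.Addr)
                }
            }
            return c.Addr, replicas, nil
        }
        // Severity of this condition is durability-mode-aware upstream:
        // a warning under best_effort, an error under strict.
        return "", nil, errors.New("rf_not_satisfiable")
    }
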
15 unit tests + 20 QA adversarial tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Preset system: ResolvePolicy resolves named presets (database, general,
throughput) with per-field overrides into concrete volume parameters.
Create path now uses resolved policy instead of ad-hoc validation.
New /block/volume/resolve diagnostic endpoint for dry-run resolution.
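
A sketch of the resolution step; the preset names come from this
change, while the field set, preset contents, and override keys are
illustrative:

    package policy

    import (
        "fmt"
        "strconv"
    )

    type VolumePolicy struct {
        ReplicaCount   int
        DurabilityMode string // "strict" or "best_effort"
    }

    // Hypothetical preset table standing in for the real definitions.
    var presets = map[string]VolumePolicy{
        "database":   {ReplicaCount: 3, DurabilityMode: "strict"},
        "general":    {ReplicaCount: 2, DurabilityMode: "best_effort"},
        "throughput": {ReplicaCount: 2, DurabilityMode: "best_effort"},
    }

    // ResolvePolicy merges per-field overrides onto a named preset,
    // yielding the concrete parameters consumed by the create path and
    // echoed by the /block/volume/resolve dry-run endpoint.
    func ResolvePolicy(name string, overrides map[string]string) (VolumePolicy, error) {
        p, ok := presets[name]
        if !ok {
            return VolumePolicy{}, fmt.Errorf("unknown preset %q", name)
        }
        if v, ok := overrides["replica_count"]; ok {
            n, err := strconv.Atoi(v)
            if err != nil {
                return VolumePolicy{}, fmt.Errorf("replica_count: %w", err)
            }
            p.ReplicaCount = n
        }
        if v, ok := overrides["durability_mode"]; ok {
            p.DurabilityMode = v
        }
        return p, nil
    }
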
Review fix 1 (MED): HasNVMeCapableServer now derives NVMe capability
from the server-level heartbeat attribute (block_nvme_addr proto field)
instead of scanning volume entries. Fixes false "no NVMe" warning on
fresh clusters with NVMe-capable servers but no volumes yet.
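
Roughly the shape of the fix; the heartbeat struct here is a stand-in
for the generated proto message:

    package topology

    // Stand-in for the heartbeat message carrying block_nvme_addr.
    type ServerHeartbeat struct {
        BlockNvmeAddr string
    }

    // HasNVMeCapableServer consults the server-level heartbeat attribute,
    // so a fresh cluster with NVMe-capable servers but zero volumes is
    // detected correctly (the old volume-entry scan returned a false
    // negative there).
    func HasNVMeCapableServer(servers []ServerHeartbeat) bool {
        for _, s := range servers {
            if s.BlockNvmeAddr != "" {
                return true
            }
        }
        return false
    }
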
Review fix 2 (LOW): /block/volume/resolve is no longer proxied to the
leader; as a read-only diagnostic endpoint it can be served by any master.
Engine fix: ReadLBA retry loop closes stale dirty-map race when WAL
entry is recycled between lookup and read.
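
A sketch of the retry shape; the lookup/read helpers are hypothetical
stand-ins for the engine internals:

    package engine

    // Hypothetical hooks into the dirty map and WAL.
    var (
        dirtyMapLookup     func(lba uint64) (entry uint64, gen uint64, ok bool)
        dirtyMapGeneration func(lba uint64) uint64
        readWAL            func(entry uint64, buf []byte) error
        readFromBase       func(lba uint64, buf []byte) error
    )

    // ReadLBA re-validates the dirty-map generation after reading: if the
    // WAL entry was recycled between lookup and read, the generation
    // changes and the loop retries instead of returning stale bytes.
    func ReadLBA(lba uint64, buf []byte) error {
        for {
            entry, gen, ok := dirtyMapLookup(lba)
            if !ok {
                return readFromBase(lba, buf)
            }
            if err := readWAL(entry, buf); err != nil {
                return err
            }
            if dirtyMapGeneration(lba) == gen {
                return nil // entry survived the read intact
            }
            // Recycled mid-read: retry with a fresh lookup.
        }
    }
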
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Six-task checkpoint hardening the promotion and failover paths:
T1: 4-gate candidate evaluation (heartbeat freshness, WAL lag, role,
server liveness) with structured rejection reasons; see the sketch
after this list.
T2: Orphaned-primary re-evaluation on replica reconnect (B-06/B-08).
T3: Deferred timer safety via epoch validation, which prevents stale
timers from firing on recreated/changed volumes (B-07).
T4: Rebuild addr cleanup on promotion (B-11), NVMe publication
refresh on heartbeat, and preflight endpoint wiring.
T5: Manual promote API (POST /block/volume/{name}/promote) with a
force flag, target server selection, and a structured rejection
response. Shared applyPromotionLocked/finalizePromotion helpers
eliminate duplication between auto and manual paths.
T6: Read-only preflight endpoint (GET /block/volume/{name}/preflight)
and blockapi client wrappers (Preflight, Promote).
BUG-T5-1: PromotionsTotal counter moved to finalizePromotion (shared
by both auto and manual paths) to prevent metrics divergence.
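
A sketch of the T1 gates; thresholds, field names, and reason strings
are illustrative, though the four gates match the list above:

    package promote

    import "time"

    // Hypothetical replica snapshot for candidate evaluation.
    type ReplicaState struct {
        Addr          string
        LastHeartbeat time.Time
        WALLagBytes   int64
        Role          string
        ServerAlive   bool
    }

    // evaluateCandidate applies the four gates in order and returns a
    // structured rejection reason for the first gate that fails.
    func evaluateCandidate(
        r ReplicaState, now time.Time,
        maxStale time.Duration, maxLag int64,
    ) (ok bool, reason string) {
        switch {
        case now.Sub(r.LastHeartbeat) > maxStale:
            return false, "heartbeat_stale"
        case r.WALLagBytes > maxLag:
            return false, "wal_lag_exceeded"
        case r.Role != "replica":
            return false, "wrong_role"
        case !r.ServerAlive:
            return false, "server_not_live"
        }
        return true, ""
    }
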
24 files changed, ~6500 lines added. 42 new QA adversarial tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two-phase prepare/commit/cancel protocol ensures all replicas expand
atomically. Standalone volumes use direct-commit (unchanged behavior).
Engine: PrepareExpand/CommitExpand/CancelExpand with on-disk
PreparedSize+ExpandEpoch in superblock, crash recovery clears stale
prepare state on open, v.mu serializes concurrent expand operations.
Proto: 3 new RPCs (PrepareExpand/CommitExpand/CancelExpandBlockVolume).
Coordinator: expandClean flag pattern, where ReleaseExpandInflight
runs only on
clean success or full cancel. Partial replica commit failure calls
MarkExpandFailed (keeps ExpandInProgress=true, suppresses heartbeat
size updates). ClearExpandFailed for manual reconciliation.
Registry: AcquireExpandInflight records PendingExpandSize+ExpandEpoch.
ExpandFailed state blocks new expands until cleared.
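
A sketch of the coordinator's control flow; the replica interface is
illustrative, while the RPC names and failure semantics follow the
description above:

    package expand

    type replica interface {
        PrepareExpand(size int64, epoch uint64) error
        CommitExpand(epoch uint64) error
        CancelExpand(epoch uint64) error
    }

    type result int

    const (
        expandClean  result = iota // safe: ReleaseExpandInflight
        expandFailed               // partial commit: MarkExpandFailed, keep inflight
    )

    func coordinateExpand(replicas []replica, size int64, epoch uint64) (result, error) {
        // Phase 1: prepare everywhere; any failure cancels the prepared
        // set, which is a full cancel and therefore still a clean release.
        for i, r := range replicas {
            if err := r.PrepareExpand(size, epoch); err != nil {
                for _, p := range replicas[:i] {
                    _ = p.CancelExpand(epoch)
                }
                return expandClean, err
            }
        }
        // Phase 2: commit everywhere; a failure after the first commit
        // leaves replicas diverged, so the volume is marked ExpandFailed
        // (ExpandInProgress stays true, heartbeat size updates suppressed)
        // until ClearExpandFailed is invoked manually.
        for i, r := range replicas {
            if err := r.CommitExpand(epoch); err != nil {
                if i == 0 {
                    // Nothing committed yet: cancel everywhere, still clean.
                    for _, p := range replicas {
                        _ = p.CancelExpand(epoch)
                    }
                    return expandClean, err
                }
                return expandFailed, err
            }
        }
        return expandClean, nil
    }
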
Tests: 15 engine + 4 VS + 10 coordinator + heartbeat suppression
regression + updated QA CP82/durability tests with prepare/commit mocks.
Also includes CP11A-1 remaining: QA storage profile tests, QA
io_backend config tests, testrunner perf-baseline scenarios and
coordinated-expand actions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add nvme_addr and nqn fields to proto messages (AllocateBlockVolume,
CreateBlockVolume, LookupBlockVolume, BlockVolumeInfoMessage), wire
through volume server → master registry → CSI driver. Volume servers
report NVMe address in heartbeats when NVMe target is running. CSI
MasterVolumeClient now populates NvmeAddr/NQN from master responses,
enabling NVMe/TCP via the master-backend path.
Proto files regenerated with protoc 29.5.
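
The CSI-side wiring, roughly; the response and location structs below
are stand-ins for the regenerated proto message and the driver's own
types:

    package csi

    // Stand-in for the generated LookupBlockVolume response message.
    type LookupBlockVolumeResponse struct {
        Addr     string
        NvmeAddr string // set when the volume server's NVMe target is running
        Nqn      string
    }

    type VolumeLocation struct {
        Addr     string
        NvmeAddr string
        NQN      string
    }

    // toLocation copies the NVMe fields reported via volume-server
    // heartbeats into the location handed to the NVMe/TCP attach path.
    func toLocation(resp *LookupBlockVolumeResponse) VolumeLocation {
        return VolumeLocation{
            Addr:     resp.Addr,
            NvmeAddr: resp.NvmeAddr,
            NQN:      resp.Nqn,
        }
    }
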
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* filer: add default log purging to master maintenance scripts
* filer: fix default maintenance scripts to include full set of tasks
* filer: refactor maintenance scripts to avoid duplication
* Prevent split-brain: Persistent ClusterId and Join Validation
- Persist ClusterId in Raft store to survive restarts.
- Validate ClusterId on Raft command application (piggybacked on MaxVolumeId).
- Prevent masters with conflicting ClusterIds from joining/operating together.
- Update Telemetry to report the persistent ClusterId.
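
A sketch of the piggybacked check; struct, field, and method names are
illustrative:

    package topology

    import "fmt"

    type Topology struct {
        clusterId   string
        maxVolumeId uint32
    }

    // Stand-in for the Raft command that piggybacks the sender's ClusterId.
    type MaxVolumeIdCommand struct {
        MaxVolumeId uint32
        ClusterId   string
    }

    func (t *Topology) applyMaxVolumeId(cmd MaxVolumeIdCommand) error {
        switch {
        case t.clusterId == "":
            t.clusterId = cmd.ClusterId // first sight: adopt, persisted via Raft
        case cmd.ClusterId != "" && cmd.ClusterId != t.clusterId:
            return fmt.Errorf("ClusterId mismatch (have %s, got %s): refusing to operate together",
                t.clusterId, cmd.ClusterId)
        }
        if cmd.MaxVolumeId > t.maxVolumeId {
            t.maxVolumeId = cmd.MaxVolumeId
        }
        return nil
    }
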
* Refine ClusterId validation based on feedback
- Improved error message in cluster_commands.go.
- Added ClusterId mismatch check in RaftServer.Recovery.
* Handle Raft errors and support Hashicorp Raft for ClusterId
- Check for errors when persisting ClusterId in legacy Raft.
- Implement ClusterId generation and persistence for Hashicorp Raft leader changes.
- Ensure consistent error logging.
* Refactor ClusterId validation
- Centralize ClusterId mismatch check in Topology.SetClusterId.
- Simplify MaxVolumeIdCommand.Apply and RaftServer.Recovery to rely on SetClusterId.
* Fix goroutine leak and add timeout
- Handle channel closure in Hashicorp Raft leader listener.
- Add timeout to Raft Apply call to prevent blocking.
* Fix deadlock in legacy Raft listener
- Wrap ClusterId generation/persistence in a goroutine to avoid blocking the Raft event loop (deadlock).
* Rename ClusterId to SystemId
- Renamed ClusterId to SystemId across the codebase (protobuf, topology, server, telemetry).
- Regenerated telemetry.pb.go with new field.
* Rename SystemId to TopologyId
- The rename to SystemId was an intermediate step.
- Final name is TopologyId for the persistent cluster identifier.
- Updated protobuf, topology, raft server, master server, and telemetry.
* Optimize Hashicorp Raft listener
- Integrated TopologyId generation into existing monitorLeaderLoop.
- Removed extra goroutine in master_server.go.
* Fix optimistic TopologyId update
- Removed premature local state update of TopologyId in master_server.go and raft_hashicorp.go.
- State is now solely updated via the Raft state machine Apply/Restore methods after consensus.
* Add explicit log for recovered TopologyId
- Added glog.V(0) info log in RaftServer.Recovery to print the recovered TopologyId on startup.
* Add Raft barrier to prevent TopologyId race condition
- Implement ensureTopologyId helper method
- Send no-op MaxVolumeIdCommand to sync Raft log before checking TopologyId
- Ensures persisted TopologyId is recovered before generating new one
- Prevents race where generation happens during log replay
* Serialize TopologyId generation with mutex
- Add topologyIdGenLock mutex to MasterServer struct
- Wrap ensureTopologyId method with lock to prevent concurrent generation
- Fixes race where event listener and manual leadership check both generate IDs
- Second caller waits for first to complete and sees the generated ID
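
A sketch combining the barrier and the mutex from the two commits
above; the stubs stand in for the real Raft plumbing:

    package master

    import "sync"

    type MasterServer struct {
        topologyIdGenLock sync.Mutex
        topologyId        string // mirror of state-machine state, set via Apply/Restore
    }

    func (ms *MasterServer) applyBarrier() error          { return nil }      // stub: no-op MaxVolumeIdCommand
    func (ms *MasterServer) applyTopologyId(string) error { return nil }      // stub: goes through Raft Apply
    func newRandomTopologyId() string                     { return "topo-x" } // stub

    // ensureTopologyId serializes generation and forces log replay to
    // finish (via the no-op barrier) before deciding whether an ID must be
    // created, closing the race where generation happened during replay.
    func (ms *MasterServer) ensureTopologyId() error {
        ms.topologyIdGenLock.Lock()
        defer ms.topologyIdGenLock.Unlock()

        if err := ms.applyBarrier(); err != nil {
            return err // bail: old logs may still carry a persisted ID
        }
        if ms.topologyId != "" {
            return nil // recovered from snapshot or log replay
        }
        return ms.applyTopologyId(newRandomTopologyId())
    }
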
* Add TopologyId recovery logging to Apply method
- Change log level from V(1) to V(0) for visibility
- Log 'Recovered TopologyId' when applying from Raft log
- Ensures recovery is visible whether from snapshot or log replay
- Matches Recovery() method logging for consistency
* Fix Raft barrier timing issue
- Add 100ms delay after barrier command to ensure log application completes
- Add debug logging to track barrier execution and TopologyId state
- Return early if barrier command fails
- Prevents TopologyId generation before old logs are fully applied
* ensure leader
* address comments
* address comments
* redundant
* clean up
* double check
* refactoring
* comment
* Added global http client
* Added Do func for global http client
* Changed the code to use the global http client
* Fixed http client in volume uploader
* Fixed pkg name
* Fixed http util funcs
* Fixed http client for bench_filer_upload
* Fixed http client for stress_filer_upload
* Fixed http client for filer_server_handlers_proxy
* Fixed http client for command_fs_merge_volumes
* Fixed http client for command_fs_merge_volumes and command_volume_fsck
* Fixed http client for s3api_server
* Added init global client for main funcs
* Rename global_client to client
* Changed:
  - fixed NewHttpClient
  - added CheckIsHttpsClientEnabled func
  - updated security.toml in scaffold
* Reduce the visibility of some functions in the util/http/client pkg
* Added the loadSecurityConfig function
* Use util.LoadSecurityConfiguration() in NewHttpClient func
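
A sketch of the intended flow; CheckIsHttpsClientEnabled and the CA
cert path are simplified stand-ins for what the util/http/client pkg
reads out of security.toml:

    package client

    import (
        "crypto/tls"
        "crypto/x509"
        "net/http"
        "os"
    )

    // Stubs standing in for values parsed from security.toml.
    func CheckIsHttpsClientEnabled() bool { return false }
    func clientCACertFile() string        { return "/etc/seaweedfs/ca.crt" }

    // NewHttpClient builds the shared global client, enabling TLS only
    // when the security configuration asks for an https client.
    func NewHttpClient() (*http.Client, error) {
        tr := &http.Transport{}
        if CheckIsHttpsClientEnabled() {
            pem, err := os.ReadFile(clientCACertFile())
            if err != nil {
                return nil, err
            }
            pool := x509.NewCertPool()
            pool.AppendCertsFromPEM(pem)
            tr.TLSClientConfig = &tls.Config{RootCAs: pool}
        }
        return &http.Client{Transport: tr}, nil
    }
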
* Added context for the MasterClient's methods to avoid endless loops
* Re-added the WithClient function. Added a WithClientCustomGetMaster function
* Hid unused ctx arguments
* Using a common context for the KeepConnectedToMaster and WaitUntilConnected functions
* Changed the context termination check in the tryConnectToMaster function
* Added a child context to the tryConnectToMaster function
* Added a common context for KeepConnectedToMaster and WaitUntilConnected functions in benchmark
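
The shared-context pattern in miniature; the MasterClient stub below
only marks the signatures:

    package example

    import "context"

    type MasterClient struct{}

    func (mc *MasterClient) KeepConnectedToMaster(ctx context.Context) { <-ctx.Done() } // stub
    func (mc *MasterClient) WaitUntilConnected(ctx context.Context)    {}               // stub

    // One context governs both the reconnect loop and the wait: cancelling
    // it stops tryConnectToMaster retries instead of leaving an endless loop.
    func run(mc *MasterClient) {
        ctx, cancel := context.WithCancel(context.Background())
        defer cancel()

        go mc.KeepConnectedToMaster(ctx)
        mc.WaitUntilConnected(ctx)
    }
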
* Remove old Raft servers if they don't answer pings for too long
(see the sketch after this list):
  - add ping durations as options
  - rename ping fields
  - fix some todos
  - get masters through masterclient
  - the leader removes the server via Raft
  - use Raft servers to ping them
  - CheckMastersAlive for Hashicorp Raft only
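
A sketch of that sweep; the peer bookkeeping and the removal hook are
illustrative:

    package master

    import "time"

    type peerPinger struct {
        lastSeen map[string]time.Time
        peers    func() []string         // from the Hashicorp Raft configuration
        ping     func(addr string) error // blocking ping with waitForReady
        isLeader func() bool
        remove   func(addr string)       // leader issues raft.RemoveServer
    }

    // checkMastersAlive pings each Raft peer and, as leader, removes
    // servers that have not answered for longer than deadAfter.
    func (p *peerPinger) checkMastersAlive(deadAfter time.Duration) {
        for _, addr := range p.peers() {
            if p.ping(addr) == nil {
                p.lastSeen[addr] = time.Now()
                continue
            }
            if p.isLeader() && time.Since(p.lastSeen[addr]) > deadAfter {
                p.remove(addr)
            }
        }
    }
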
* prepare blocking ping
* pass waitForReady as param
* pass waitForReady through all functions
* waitForReady works
* refactor
* remove unneeded params
* rollback unneeded changes
* fix