seaweedfs

Commit Graph

Author	SHA1	Message	Date
pingqiu	abbc8bff2b	fix: canonicalize host in AllocateBlockVolumeResponse (CP13-2 follow-up) AllocateBlockVolumeResponse used bs.ListenAddr() to derive replica addresses. When the VS binds to ":port" (no explicit IP), host resolved to empty string, producing ":dataPort" as the replica address. This ":port" propagated through master assignments to both primary and replica sides. Now canonicalizes empty/wildcard host using PreferredOutboundIP() before constructing replication addresses. Also exported PreferredOutboundIP for use by the server package. This is the source fix — all downstream paths (heartbeat, API response, assignment) inherit the canonical address. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	6 days ago
pingqiu	ae87a31d22	fix: store canonical replica addresses in heartbeat state setupReplicaReceiver now reads back canonical addresses from the ReplicaReceiver (which applies CP13-2 canonicalization) instead of storing raw assignment addresses in replStates. This fixes the API-level leak where replica_data_addr showed ":port" instead of "ip:port" in /block/volumes responses, even though the engine-level CP13-2 fix was working. New BlockVol.ReplicaReceiverAddr() returns canonical addresses from the running receiver. Falls back to assignment addresses if receiver didn't report. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	6 days ago
Ping Qiu	c263d082b5	fix: restart reconciliation — trust roles, upsert replicas Same-epoch reconciliation now trusts reported roles first: - one claims primary, other replica → trust roles - both claim primary → WALHeadLSN heuristic tiebreak - both claim replica → keep existing, log ambiguity Replaced addServerAsReplica with upsertServerAsReplica: checks for existing replica entry by server name before appending. Prevents duplicate ReplicaInfo rows during restart/replay windows. 2 new tests: role-trusted same-epoch, duplicate replica prevention. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	1 week ago
Ping Qiu	9137fa6486	fix: epoch-based reconciliation on master restart reconstruction When a second server reports the same volume during master restart, UpdateFullHeartbeat now uses epoch-based tie-breaking instead of first-heartbeat-wins: 1. Higher epoch wins as primary — old entry demoted to replica 2. Same epoch — higher WALHeadLSN wins (heuristic, warning logged) 3. Lower epoch — added as replica Applied in both code paths: the auto-register branch (no entry exists yet for this name) and the unlinked-server branch (entry exists but this server is not in it). This is a deterministic reconstruction improvement, not ground truth. The long-term fix is persisting authoritative volume state. 5 new tests covering all reconciliation scenarios. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	1 week ago
Ping Qiu	a9a5e455c6	fix: Lookup/ListAll return copies, add UpdateEntry for safe mutation Lookup() and ListAll() now return value copies (not pointers to internal registry state). Callers can no longer mutate registry entries without holding a lock. Added clone() on BlockVolumeEntry with deep-copied Replicas slice. Added UpdateEntry(name, func(*BlockVolumeEntry)) for locked mutation. ListByServer() also returns copies. Migrated 1 production mutation (ReplicaPlacement + Preset in create handler) and ~20 test mutations to use UpdateEntry. 5 new copy-correctness tests: Lookup returns copy, Replicas slice isolated, ListAll returns copies, UpdateEntry mutates, UpdateEntry not-found error. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	1 week ago
Ping Qiu	bb691a5458	feat: CP11B-4 observability pack — health state, alerts, dashboard Health-state derivation: deriveHealthStateWithLiveness() computes per-volume state (unsafe > rebuilding > degraded > healthy) using role, replica count, durability mode, degraded flag, and primary server liveness. Used consistently in both volume responses and cluster summary. Extended GET /block/status with health counts (healthy, degraded, rebuilding, unsafe) and NVMe-capable server count. Response is now typed BlockStatusResponse instead of untyped map. Default alert pack: 7 Prometheus rules covering WAL pressure, flusher errors, replica degradation, rebuilding, scrub errors. Alert rules reference real seaweedfs_blockvol_* metric names. Default dashboard: Grafana JSON with 17 panels — cluster health, IOPS, latency P99, WAL pressure, flusher throughput, replication, scrub, dirty map, epoch. 17 tests: 9 health derivation, 1 cluster summary, 2 handler/API, 2 alert validation, 2 dashboard validation, 1 liveness parity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	1 week ago
Ping Qiu	f501c63009	feat: CP11B-2 explainable placement / plan API New POST /block/volume/plan endpoint returns full placement preview: resolved policy, ordered candidate list, selected primary/replicas, and per-server rejection reasons with stable string constants. Core design: evaluateBlockPlacement() is a pure function with no registry/topology dependency. gatherPlacementCandidates() is the single topology bridge point. Plan and create share the same planner — parity contract is same ordered candidate list for same cluster state. Create path refactored: uses evaluateBlockPlacement() instead of PickServer(), iterates all candidates (no 3-retry cap), recomputes replica order after primary fallback. rf_not_satisfiable severity is durability-mode-aware (warning for best_effort, error for strict). 15 unit tests + 20 QA adversarial tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	1 week ago
Ping Qiu	683969086c	feat: CP11B-1 provisioning presets + review fixes Preset system: ResolvePolicy resolves named presets (database, general, throughput) with per-field overrides into concrete volume parameters. Create path now uses resolved policy instead of ad-hoc validation. New /block/volume/resolve diagnostic endpoint for dry-run resolution. Review fix 1 (MED): HasNVMeCapableServer now derives NVMe capability from server-level heartbeat attribute (block_nvme_addr proto field) instead of scanning volume entries. Fixes false "no NVMe" warning on fresh clusters with NVMe-capable servers but no volumes yet. Review fix 2 (LOW): /block/volume/resolve no longer proxied to leader — read-only diagnostic endpoint can be served by any master. Engine fix: ReadLBA retry loop closes stale dirty-map race when WAL entry is recycled between lookup and read. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	1 week ago
Ping Qiu	075ff52219	feat: CP11B-3 safe ops — promotion hardening, preflight, manual promote Six-task checkpoint hardening the promotion and failover paths: T1: 4-gate candidate evaluation (heartbeat freshness, WAL lag, role, server liveness) with structured rejection reasons. T2: Orphaned-primary re-evaluation on replica reconnect (B-06/B-08). T3: Deferred timer safety — epoch validation prevents stale timers from firing on recreated/changed volumes (B-07). T4: Rebuild addr cleanup on promotion (B-11), NVMe publication refresh on heartbeat, and preflight endpoint wiring. T5: Manual promote API — POST /block/volume/{name}/promote with force flag, target server selection, and structured rejection response. Shared applyPromotionLocked/finalizePromotion helpers eliminate duplication between auto and manual paths. T6: Read-only preflight endpoint (GET /block/volume/{name}/preflight) and blockapi client wrappers (Preflight, Promote). BUG-T5-1: PromotionsTotal counter moved to finalizePromotion (shared by both auto and manual paths) to prevent metrics divergence. 24 files changed, ~6500 lines added. 42 new QA adversarial tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Ping Qiu	67f6e73ca7	fix: B-09 stale entry during expand, B-10 heartbeat deletes during expand B-09: ExpandBlockVolume re-reads the registry entry after acquiring the expand inflight lock. Previously it used the entry from the initial Lookup, which could be stale if failover changed VolumeServer or Replicas between Lookup and PREPARE. B-10: UpdateFullHeartbeat stale-cleanup now skips entries with ExpandInProgress=true. Previously a primary VS restart during coordinated expand would delete the entry (path not in heartbeat), orphaning the volume and stranding the expand coordinator. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Ping Qiu	1b3edd7856	feat: CP11A-2 coordinated expand protocol for replicated block volumes Two-phase prepare/commit/cancel protocol ensures all replicas expand atomically. Standalone volumes use direct-commit (unchanged behavior). Engine: PrepareExpand/CommitExpand/CancelExpand with on-disk PreparedSize+ExpandEpoch in superblock, crash recovery clears stale prepare state on open, v.mu serializes concurrent expand operations. Proto: 3 new RPCs (PrepareExpand/CommitExpand/CancelExpandBlockVolume). Coordinator: expandClean flag pattern — ReleaseExpandInflight only on clean success or full cancel. Partial replica commit failure calls MarkExpandFailed (keeps ExpandInProgress=true, suppresses heartbeat size updates). ClearExpandFailed for manual reconciliation. Registry: AcquireExpandInflight records PendingExpandSize+ExpandEpoch. ExpandFailed state blocks new expands until cleared. Tests: 15 engine + 4 VS + 10 coordinator + heartbeat suppression regression + updated QA CP82/durability tests with prepare/commit mocks. Also includes CP11A-1 remaining: QA storage profile tests, QA io_backend config tests, testrunner perf-baseline scenarios and coordinated-expand actions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Ping Qiu	a7b1b4cb22	fix: propagate NVMe fields through replica creation, heartbeat, and promotion ReplicaInfo now carries NvmeAddr/NQN. Fields are populated during replica allocation (tryCreateOneReplica), updated from replica heartbeats, and copied in PromoteBestReplica. This ensures master lookup returns correct NVMe endpoints immediately after failover, without waiting for the first post-promotion heartbeat. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Ping Qiu	9ef446d0cf	feat: master-backed NVMe/TCP publication (nvme_addr + nqn plumbing) Add nvme_addr and nqn fields to proto messages (AllocateBlockVolume, CreateBlockVolume, LookupBlockVolume, BlockVolumeInfoMessage), wire through volume server → master registry → CSI driver. Volume servers report NVMe address in heartbeats when NVMe target is running. CSI MasterVolumeClient now populates NvmeAddr/NQN from master responses, enabling NVMe/TCP via the master-backend path. Proto files regenerated with protoc 29.5. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Ping Qiu	bbadeeb89b	feat: Phase 10 CP10-2 -- CSI NVMe/TCP node plugin, 210 tests NVMe/TCP transport support in the CSI driver so Kubernetes pods can mount block volumes via NVMe alongside (or instead of) iSCSI. Transport selection: NVMe preferred when nvme_tcp module loaded + metadata present + nvmeUtil available. Fail-fast on NVMe errors (no silent iSCSI fallback). .transport file persists across CSI restarts. Key changes: - BuildNQN() single source of truth for NQN construction (naming.go) - NVMeUtil interface + realNVMeUtil wrapping nvme-cli (nvme_util.go) - NodeStageVolume/Unstage/Expand dual-transport paths (node.go) - NvmeAddr/NQN fields in VolumeInfo, Controller contexts - VolumeManager NvmeAddr()/VolumeNQN() getters - BlockService NvmeListenAddr()/NQN() accessors - 27 unit tests + 26 QA adversarial tests (nvme_node_test.go, qa_cp102) - Fix: flaky TestQA_Node_ConcurrentStageUnstage (pre-alloc temp dirs) Review fixes applied: F1 (NQN format mismatch), F2 (CreateVolume drops NVMe context), F3 (IsConnected error classification), F4 (findSubsys path validation), F5 (MasterVolumeClient NVMe gap documented). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Ping Qiu	0e234f5c80	feat: Phase 10 CP10-1 -- NVMe/TCP target MVP, 109 tests NVMe over Fabrics (TCP) target implementation sharing the same BlockVol engine, fencing, replication, and failover as the existing iSCSI target. New package: weed/storage/blockvol/nvme/ (11 files, 2,242 production LOC) - protocol.go: PDU types, opcodes, status codes, marshal/unmarshal - wire.go: TCP reader/writer with header bounds validation - controller.go: IC handshake, per-queue state, command dispatch, KATO - fabric.go: Connect (admin+IO), PropertyGet/Set, Disconnect - identify.go: Controller/Namespace/NS list/NS descriptors (Linux 5.15) - admin.go: SetFeatures, GetFeatures, GetLogPage (SMART/ANA), KeepAlive - io.go: Read (C2HData), Write (inline), Flush, WriteZeros/Trim - server.go: TCP listener, admin session registry, graceful shutdown - adapter.go: BlockVol-to-NVMe bridge, error mapping, ANA state Integration: NVMeConfig + CLI flags (-block.nvme.*), disabled by default. Key design: inline-data writes only (no R2T), MaxH2CDataLength=32KB, single ANA group coherent with BlockVol role, CNTLID session registry for cross-connection IO queues, HostNQN continuity enforcement. Tests: 65 dev + 44 QA adversarial = 109 total, all passing. Bugs fixed during review: IO queue cross-connection (A), header bounds validation (B), write payload size check (C), disconnect error (D), stream desync prevention (E), HostNQN enforcement (F), capsule-before-IC state guard (H), flowCtlOff SQHD timing (I). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Ping Qiu	9acd187587	feat: Phase 8 complete -- CP8-5 stability gate, lease grant fix, Docker e2e, 13 chaos scenarios Phase 8 closes with all 6 checkpoints done (CP8-1 through CP8-5 + CP8-3-1): - CP8-5: 12/12 enterprise QA scenarios PASS on real hardware (m01/M02) - Master-authoritative lease grants (BUG-CP85-11): master renews primary write leases on every heartbeat response, replacing retain-until-confirmed assignment queue semantics that caused 30s lease expiry - Post-rebuild WAL shipping gap fix (BUG-CP85-1): syncLSNAfterRebuild advances replica nextLSN so WAL entries are accepted after rebuild - Block heartbeat startup race fix (BUG-CP85-10): dynamic blockService check on each tick instead of one-shot at loop start - 8 new tests: 4 engine lease grant + 4 registry lease grant - 13 new YAML scenarios: chaos (kill-loop, partition, disk-full), database integrity (sqlite crash, ext4 fsck), perf baseline, metrics verify, snapshot stress, expand-failover, session storm, role flap, 24h soak - 12 new testrunner actions (database, fsck, grep_log, write_loop_bg, stop_bg, assert_metric_gt/eq/lt) + phase repeat support - Docker compose setup + getting-started guide for block storage users - 960+ cumulative unit tests, 24 YAML scenarios Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Ping Qiu	da1b81d1c9	feat: CP8-3-1 durability modes + testrunner platform + 21 adversarial tests Durability mode implementation (sync_all, sync_quorum, best_effort): - DurabilityMode type with superblock persistence, parse/validate/string - MakeDistributedSync mode-aware barrier enforcement in dist_group_commit - blockerr sentinel package (ErrDurabilityBarrierFailed, ErrDurabilityQuorumLost) - gRPC create path: mode validation, idempotent create consistency, partial cleanup - F1: strict mode rejects partial replica provisioning with cleanup - F3: empty heartbeat does not overwrite persisted strict mode - F4: SCSI error mapping uses errors.Is sentinels (not string matching) - Proto/wire/blockapi/CLI/UI plumbing for durability_mode field - Observability dashboard: cluster health cards + per-volume columns Testrunner platform (YAML-driven integration test framework): - Engine, parser, registry, reporter (JUnit XML + HTML), metrics scraping - 52 registered actions: block, iSCSI, I/O, fault injection, assertions - Baseline regression framework with 7 hard-fail conditions - 15 YAML scenarios (smoke, crash, HA, fault, consistency, snapshot) - 49 unit tests for testrunner internals QA adversarial suite (21 tests, all PASS): - Idempotent create mode/RF mismatch detection - Heartbeat mode downgrade prevention (F3) - sync_all/sync_quorum partial replica enforcement (F1) - Concurrent create race safety - Failover/expand mode preservation - Cleanup resilience when delete fails - Master restart auto-register mode handling - Superblock roundtrip all 3 modes - Validate edge cases (mode×RF matrix) - RequiredReplicas quorum math verification - Sentinel error categorization Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	4 weeks ago
Ping Qiu	979a9b496c	feat: Phase 8 CP8-1/2/3/4 -- ops control plane, multi-replica, CSI snapshots, observability CP8-1: HTTP REST API (create/delete/lookup/list/assign/servers), blockapi Go client with multi-master failover, 5 shell commands, HTML dashboard at /block/. CP8-2: RF=2/RF=3 multi-replica support -- ShipperGroup fan-out, distributed sync, health scoring, segment-based scrub, gated promotion (heartbeat freshness + WAL LSN + role checks), failover/rebuild for N>2 replicas. CP8-3: CSI snapshot + expansion -- CreateSnapshot/DeleteSnapshot/ListSnapshots RPCs, NodeExpandVolume with iSCSI rescan, snapshot ID helpers, 20 adversarial tests covering concurrent ops, edge cases, and error injection. CP8-4: Observability -- EngineMetrics atomic counters for flusher/group-commit/ WAL-shipper/scrub, 10 new Prometheus metrics, barrier_lag_lsn SLO gauge, failover/promotion/rebuild counters, request ID correlation in master gRPC logs, baseline regression framework with 7 hard-fail conditions. Total: 63 files, ~11.2K LOC, 160+ new tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	4 weeks ago
Ping Qiu	8b2b5f6f66	feat: Phase 6 CP6-3 -- failover + rebuild in Kubernetes, 126 tests Wire low-level fencing primitives to master/VS control plane and CSI: - Proto: replica/rebuild address fields on assignment/info/response messages - Assignment queue: retain-until-confirmed (Peek+Confirm), stale epoch pruning - VS assignment receiver: processes assignments from HeartbeatResponse - BlockService replication: ProcessAssignments, deterministic ports (FNV hash) - Registry replica tracking: SetReplica/ClearReplica/SwapPrimaryReplica - CreateBlockVolume: primary + replica, enqueues assignments, single-copy mode - Failover: lease-aware promotion, deferred timers with cancellation on reconnect - ControllerPublish: returns fresh primary iSCSI address after failover - Recovery: recoverBlockVolumes drains pendingRebuilds, enqueues Rebuilding - Real integration tests on M02: failover address switch, rebuild data consistency, full lifecycle failover+rebuild (3 tests, all PASS) Review fixes (12 findings, 5 High, 5 Medium, 2 Low): - R1-1: AllocateBlockVolume returns replication ports - R1-2: setupPrimaryReplication starts rebuild server - R1-3: VS sends periodic block heartbeat for assignment confirmation - R2-F1: LastLeaseGrant set before Register (no stale-lease race) - R2-F2: Deferred promotion timers cancelled on VS reconnect - R2-F3: SwapPrimaryReplica uses RoleToWire instead of uint32(1) - R2-F4: DeleteBlockVolume deletes replica (best-effort) - R2-F5: SwapPrimaryReplica computes epoch atomically under lock - QA: SetReplica removes old replica from byServer index (BUG-QA-CP63-1) 126 CP6-3 tests (67 dev + 48 QA + 8 integration + 3 real). Cumulative Phase 6: 352 tests. All PASS. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	4 weeks ago
Ping Qiu	5a9a52f2d0	feat: Phase 6 CP6-2 -- CSI control-plane integration + csi-sanity/k3s validation CP6-2 wires the CSI driver to SeaweedFS master/volume-server control plane: - Proto: block volume messages in master.proto/volume_server.proto, codegen - Master registry: in-memory BlockVolumeRegistry with Pending->Active status, full/delta heartbeat, inflight lock, placement (fewest volumes) - VS gRPC: AllocateBlockVolume/DeleteBlockVolume handlers, shared naming - Master RPCs: CreateBlockVolume (retry up to 3 servers), Delete, Lookup - Heartbeat: block volume fields wired into bidirectional stream - CSI Controller: VolumeBackend interface (Local + Master), returns volume_context - CSI Node: reads volume_context for remote targets, staged map + IQN derivation - Mode flag: --mode=controller/node/all, --master for control-plane - K8s manifests: csi-driver.yaml, csi-controller.yaml, csi-node.yaml csi-sanity conformance (33 pass, 58 skip) found 6 bugs: - BUG-SANITY-1/2/3: missing VolumeCapabilities/VolumeCapability validation - BUG-SANITY-4: NodePublish used mount instead of bind mount - BUG-SANITY-5: NodeUnpublish didn't remove target path - BUG-SANITY-6: NodeUnpublish failed on unmounted path k3s Level 4 (PVC->Pod data persistence) found 1 bug: - BUG-K3S-1: IsLoggedIn didn't handle iscsiadm exit code 21 226 CSI tests + 54 server tests = 280 new tests, all passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	4 weeks ago
Ping Qiu	7c07d9c95a	feat: Phase 4A CP4b-3 -- assignment processing, 2 bug fixes, 20 QA tests Add ProcessBlockVolumeAssignments to BlockVolumeStore and wire AssignmentSource/AssignmentCallback into the heartbeat collector's Run() loop. Assignments are fetched and applied each tick after status collection. Bug fixes: - BUG-CP4B3-1: TOCTOU between GetBlockVolume and HandleAssignment. Added withVolume() helper that holds RLock across lookup+operation, preventing RemoveBlockVolume from closing the volume mid-assignment. - BUG-CP4B3-2: Data race on callback fields read by Run() goroutine. Made StatusCallback/AssignmentSource/AssignmentCallback private, added cbMu mutex and SetXxx() setter methods. Lock held only for load/store, not during callback execution. 7 dev tests + 13 QA adversarial tests = 20 new tests. 972 total unit tests, all passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	4 weeks ago
Ping Qiu	a089bf6828	feat: Phase 4A CP4b-2 -- heartbeat collector, 3 bug fixes, 9 QA tests BlockVolumeHeartbeatCollector periodically collects block volume status via callback (standalone, no gRPC wiring yet). Store() accessor on BlockService. Three bugs found by QA and fixed: Stop-before-Run deadlock (BUG-CP4B2-1), zero interval panic (BUG-CP4B2-2), callback panic crashes goroutine (BUG-CP4B2-3). 12 new tests (3 dev + 9 QA adversarial). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	4 weeks ago
Ping Qiu	ffdde15bcd	feat: Phase 4A CP4b-1 -- wire types, conversion helpers, heartbeat collection Add BlockVolumeInfoMessage, BlockVolumeShortInfoMessage, BlockVolumeAssignment wire-type structs (proto-shaped Go structs). Add conversion helpers with DiskType plumbing, overflow-safe LeaseTTLToWire, validated RoleFromWire. Add CollectBlockVolumeHeartbeat on BlockVolumeStore. 9 new tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	4 weeks ago
Ping Qiu	80801b0fac	feat: Phase 3 — performance tuning, iSCSI session refactor, store integration Phase 3 delivers five checkpoints: CP1 Engine Tuning: BlockVolConfig tunables, 256-shard DirtyMap, adaptive group commit (low-watermark immediate flush), WAL pressure handling with backpressure and ErrWALFull timeout. CP2 iSCSI Session Refactor: RX/TX goroutine split with respCh (cap 64), txLoop for serialized response writes, StatSN assignment modes. Login phase stays single-goroutine; full-duplex after login. CP3 Store Integration: BlockVolAdapter (iscsi.BlockDevice interface), BlockVolumeStore management, BlockService in volume_server_block.go, CLI flags (--block.listen/dir/iqn.prefix), sw-block-attach.sh helper. CP5 Concurrency Hardening: WAL reuse guard (LSN validation in ReadLBA), opsOutstanding counter with beginOp/endOp + Close drain, appendWithRetry shared by WriteLBA and TrimLBA, flusher LSN guard in FlushOnce. Bug fixes (P3-BUG-2–11): unbounded pending queue cap, Data-Out timeout, flusher error logging, GroupCommitter panic recovery, Close vs concurrent ops guard, target shutdown race, WAL-full retry vs Close, WRITE SAME(16) for XFS, MODE SENSE(10) + VPD 0xB0/0xB2 for Linux kernel compatibility. 797 tests passing (517 engine + 280 iSCSI), go vet clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	1 month ago
Chris Lu	da4edb5fe6	Fix live volume move tail timestamp (#8440) * Improve move tail timestamp * Add move tail timestamp integration test * Simulate traffic during move	1 month ago
Chris Lu	e596542295	Move SQL engine and PostgreSQL server to their own binaries (#8417) * Drop SQL engine and PostgreSQL server * Split SQL tooling into weed-db and weed-sql * move * fix building	1 month ago
Chris Lu	57ab99d13e	fix: generate topology uuid uniformly in single-master mode (#8405) * fix: ensure topology uuid is generated in single master setups * ensureTopologyId adds a Hashicorp-aware implementation * simplify	1 month ago
Chris Lu	b5f3094619	fix format of internal node URLs in master UI templates	1 month ago
Chris Lu	e4b70c2521	go fix	1 month ago
Konstantin Lebedev	01b3125815	[shell]: volume balance capacity by min volume density (#8026) volume balance by min volume density and active volumes	1 month ago
Chris Lu	7b8df39cf7	s3api: add AttachUserPolicy/DetachUserPolicy/ListAttachedUserPolicies (#8379 ) * iam: add XML responses for managed user policy APIs * s3api: implement attach/detach/list attached user policies * s3api: add embedded IAM tests for managed user policies * iam: update CredentialStore interface and Manager for managed policies Updated the `CredentialStore` interface to include `AttachUserPolicy`, `DetachUserPolicy`, and `ListAttachedUserPolicies` methods. The `CredentialManager` was updated to delegate these calls to the store. Added common error variables for policy management. * iam: implement managed policy methods in MemoryStore Implemented `AttachUserPolicy`, `DetachUserPolicy`, and `ListAttachedUserPolicies` in the MemoryStore. Also ensured deep copying of identities includes PolicyNames. * iam: implement managed policy methods in PostgresStore Modified Postgres schema to include `policy_names` JSONB column in `users`. Implemented `AttachUserPolicy`, `DetachUserPolicy`, and `ListAttachedUserPolicies`. Updated user CRUD operations to handle policy names persistence. * iam: implement managed policy methods in remaining stores Implemented user policy management in: - `FilerEtcStore` (partial implementation) - `IamGrpcStore` (delegated via GetUser/UpdateUser) - `PropagatingCredentialStore` (to broadcast updates) Ensures cluster-wide consistency for policy attachments. * s3api: refactor EmbeddedIamApi to use managed policy APIs - Refactored `AttachUserPolicy`, `DetachUserPolicy`, and `ListAttachedUserPolicies` to use `e.credentialManager` directly. - Fixed a critical error suppression bug in `ExecuteAction` that always returned success even on failure. - Implemented robust error matching using string comparison fallbacks. - Improved consistency by reloading configuration after policy changes. * s3api: update and refine IAM integration tests - Updated tests to use a real `MemoryStore`-backed `CredentialManager`. 
- Refined test configuration synchronization using `sync.Once` and manual deep-copying to prevent state corruption. - Improved `extractEmbeddedIamErrorCodeAndMessage` to handle more XML formats robustly. - Adjusted test expectations to match current AWS IAM behavior. * fix compilation * visibility * ensure 10 policies * reload * add integration tests * Guard raft command registration * Allow IAM actions in policy tests * Validate gRPC policy attachments * Revert Validate gRPC policy attachments * Tighten gRPC policy attach/detach * Improve IAM managed policy handling * Improve managed policy filters	1 month ago
Chris Lu	3300874cb5	filer: add default log purging to master maintenance scripts (#8359) * filer: add default log purging to master maintenance scripts * filer: fix default maintenance scripts to include full set of tasks * filer: refactor maintenance scripts to avoid duplication	1 month ago
Lisandro Pin	a9d12a0792	Implement full scrubbing for EC volumes (#8318) Implement full scrubbing for EC volumes.	1 month ago
Lisandro Pin	fbe7dd32c2	Implement full scrubbing for regular volumes (#8254) Implement full scrubbing for regular volumes.	2 months ago
Chris Lu	b08bb8237c	Fix master leader election startup issue (#8340) * Fix master leader election startup issue Fixes #error-log-leader-not-selected-yet * Fix master leader election startup issue This change improves server address comparison using the 'Equals' method and handles recursion in topology leader lookup, resolving the 'leader not selected yet' error during master startup. * Merge user improvements: use MaybeLeader for non-blocking checks * not useful test * Address code review: optimize Equals, fix deadlock in IsLeader, safe access in Leader	2 months ago
Lisandro Pin	e657e7d827	Implement local scrubbing for EC volumes. (#8283)	2 months ago
Chris Lu	1c62808c0e	iceberg: wire pagination for list namespaces/tables REST APIs (#8275) * s3api/iceberg: wire list pagination tokens and page size * fmt * Update weed/s3api/iceberg/iceberg.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2 months ago
Chris Lu	839028b2e0	Fix EC rebuild shard detection (#8265) Fix EC rebuild shard counting	2 months ago
Lisandro Pin	1a5679a5eb	Implement a `VolumeEcStatus()` RPC for volume servers. (#8006) Just like `VolumeStatus()`, this call allows inspecting details for a given EC volume - including number of files and their total size.	2 months ago
Chris Lu	cb9e21cdc5	Normalize hashicorp raft peer ids (#8253) * Normalize raft voter ids * 4.11 * Update raft_hashicorp.go	2 months ago
Chris Lu	c284e51d20	fix: multipart upload ETag calculation (#8238 ) * fix multipart etag * address comments * clean up * clean up * optimization * address comments * unquoted etag * dedup * upgrade * clean * etag * return quoted tag * quoted etag * debug * s3api: unify ETag retrieval and quoting across handlers Refactor newListEntry to take S3ApiServer and use getObjectETag, and update setResponseHeaders to use the same logic. This ensures consistent ETags are returned for both listing and direct access. s3api: implement ListObjects deduplication for versioned buckets Handle duplicate entries between the main path and the .versions directory by prioritizing the latest version when bucket versioning is enabled. * s3api: cleanup stale main file entries during versioned uploads Add explicit deletion of pre-existing "main" files when creating new versions in versioned buckets. This prevents stale entries from appearing in bucket listings and ensures consistency. * s3api: fix cleanup code placement in versioned uploads Correct the placement of rm calls in completeMultipartUpload and putVersionedObject to ensure stale main files are properly deleted during versioned uploads. * s3api: improve getObjectETag fallback for empty ExtETagKey Ensure that when ExtETagKey exists but contains an empty value, the function falls through to MD5/chunk-based calculation instead of returning an empty string. * s3api: fix test files for new newListEntry signature Update test files to use the new newListEntry signature where the first parameter is S3ApiServer. Created mockS3ApiServer to properly test owner display name lookup functionality. s3api: use filer.ETag for consistent Md5 handling in getEtagFromEntry Change getEtagFromEntry fallback to use filer.ETag(entry) instead of filer.ETagChunks to ensure legacy entries with Attributes.Md5 are handled consistently with the rest of the codebase. 
* s3api: optimize list logic and fix conditional header logging - Hoist bucket versioning check out of per-entry callback to avoid repeated getVersioningState calls - Extract appendOrDedup helper function to eliminate duplicate dedup/append logic across multiple code paths - Change If-Match mismatch logging from glog.Errorf to glog.V(3).Infof and remove DEBUG prefix for consistency * s3api: fix test mock to properly initialize IAM accounts Fixed nil pointer dereference in TestNewListEntryOwnerDisplayName by directly initializing the IdentityAccessManagement.accounts map in the test setup. This ensures newListEntry can properly look up account display names without panicking. * cleanup * s3api: remove premature main file cleanup in versioned uploads Removed incorrect cleanup logic that was deleting main files during versioned uploads. This was causing test failures because it deleted objects that should have been preserved as null versions when versioning was first enabled. The deduplication logic in listing is sufficient to handle duplicate entries without deleting files during upload. * s3api: add empty-value guard to getEtagFromEntry Added the same empty-value guard used in getObjectETag to prevent returning quoted empty strings. When ExtETagKey exists but is empty, the function now falls through to filer.ETag calculation instead of returning "". * s3api: fix listing of directory key objects with matching prefix Revert prefix handling logic to use strings.TrimPrefix instead of checking HasPrefix with empty string result. This ensures that when a directory key object exactly matches the prefix (e.g. prefix="dir/", object="dir/"), it is correctly handled as a regular entry instead of being skipped or incorrectly processed as a common prefix. Also fixed missing variable definition. * s3api: refactor list inline dedup to use appendOrDedup helper Refactored the inline deduplication logic in listFilerEntries to use the shared appendOrDedup helper function. 
This ensures consistent behavior and reduces code duplication. * test: fix port allocation race in s3tables integration test Updated startMiniCluster to find all required ports simultaneously using findAvailablePorts instead of sequentially. This prevents race conditions where the OS reallocates a port that was just released, causing multiple services (e.g. Filer and Volume) to be assigned the same port and fail to start.	2 months ago
Lisandro Pin	2cda4289f4	Add a version token on RPCs to read/update volume server states. (#8191 ) * Add a version token on `GetState()`/`SetState()` RPCs for volume server states. * Make state version a property of `VolumeServerState` instead of an in-memory counter. Also extend state atomicity to reads, instead of just writes.	2 months ago
Lisandro Pin	9d751a7b61	Contrib/volume scrub local (#8226 )	2 months ago
Lisandro Pin	f84b70c362	Implement index (fast) scrubbing for regular/EC volumes. (#8207 ) Implement index (fast) scrubbing for regular/EC volumes via `ScrubVolume()`/`ScrubEcVolume()`. Also rearranges existing index test files for reuse across unit tests for different modules.	2 months ago
Chris Lu	72a8f598f2	Fix Maintenance Task Sorting and Refactor Log Persistence (#8199 ) * fix float stepping * do not auto refresh * only logs when non 200 status * fix maintenance task sorting and cleanup redundant handler logic * Refactor log retrieval to persist to disk and fix slowness - Move log retrieval to disk-based persistence in GetMaintenanceTaskDetail - Implement background log fetching on task completion in worker_grpc_server.go - Implement async background refresh for in-progress tasks - Completely remove blocking gRPC calls from the UI path to fix 10s timeouts - Cleanup debug logs and performance profiling code * Ensure consistent deterministic sorting in config_persistence cleanup * Replace magic numbers with constants and remove debug logs - Added descriptive constants for truncation limits and timeouts in admin_server.go and worker_grpc_server.go - Replaced magic numbers with these constants throughout the codebase - Verified removal of stdout debug printing - Ensured consistent truncation logic during log persistence * Address code review feedback on history truncation and logging logic - Fix AssignmentHistory double-serialization by copying task in GetMaintenanceTaskDetail - Fix handleTaskCompletion logging logic (mutually exclusive success/failure logs) - Remove unused Timeout field from LogRequestContext and sync select timeouts with constants - Ensure AssignmentHistory is only provided in the top-level field for better JSON structure * Implement goroutine leak protection and request deduplication - Add request deduplication in RequestTaskLogs to prevent multiple concurrent fetches for the same task - Implement safe cleanup in timeout handlers to avoid race conditions in pendingLogRequests map - Add a 10s cooldown for background log refreshes in GetMaintenanceTaskDetail to prevent spamming - Ensure all persistent log-fetching goroutines are bounded and efficiently managed * Fix potential nil pointer panics in maintenance handlers - Add nil 
checks for adminServer in ShowTaskDetail, ShowMaintenanceWorkers, and UpdateTaskConfig - Update getMaintenanceQueueData to return a descriptive error instead of nil when adminServer is uninitialized - Ensure internal helper methods consistently check for adminServer initialization before use * Strictly enforce disk-only log reading - Remove background log fetching from GetMaintenanceTaskDetail to prevent timeouts and network calls during page view - Remove unused lastLogFetch tracking fields to clean up dead code - Ensure logs are only updated upon task completion via handleTaskCompletion * Refactor GetWorkerLogs to read from disk - Update /api/maintenance/workers/:id/logs endpoint to use configPersistence.LoadTaskExecutionLogs - Remove synchronous gRPC call RequestTaskLogs to prevent timeouts and bad gateway errors - Ensure consistent log retrieval behavior across the application (disk-only) * Fix timestamp parsing in log viewer - Update task_detail.templ JS to handle both ISO 8601 strings and Unix timestamps - Fix "Invalid time value" error when displaying logs fetched from disk - Regenerate templates * master: fallback to HDD if SSD volumes are full in Assign * worker: improve EC detection logging and fix skip counters * worker: add Sync method to TaskLogger interface * worker: implement Sync and ensure logs are flushed before task completion * admin: improve task log retrieval with retries and better timeouts * admin: robust timestamp parsing in task detail view	2 months ago
Chris Lu	f66a23b472	Fix: filer not yet available in s3.configure (#8198 ) * Fix: Initialize filer CredentialManager with filer address * The fix involves checking for directory existence before creation. * adjust error message * Fix: Implement FilerAddressSetter in PropagatingCredentialStore * Refactor: Reorder credential manager initialization in filer server * refactor	2 months ago
Lisandro Pin	ff5a8f0579	Implement RPC skeleton for regular/EC volumes scrubbing. (#8187 ) * Implement RPC skeleton for regular/EC volumes scrubbing. See https://github.com/seaweedfs/seaweedfs/issues/8018 for details. * Minor proto improvements for `ScrubVolume()`, `ScrubEcVolume()`: - Add fields for scrubbing details in `ScrubVolumeResponse` and `ScrubEcVolumeResponse`, instead of reporting these through RPC errors. - Return a list of broken shards when scrubbing EC volumes, via `EcShardInfo`.	2 months ago
Lisandro Pin	345ac950b6	Add volume server RPCs to read and update state flags. (#8186 ) * Bootstrap persistent state for volume servers. This PR implements logic to load/save persistent state information for storages associated with volume servers, and to report state changes back to masters via heartbeat messages. More work ensues! See https://github.com/seaweedfs/seaweedfs/issues/7977 for details. * Add volume server RPCs to read and update state flags.	2 months ago
Lisandro Pin	9638d37fe2	Block RPC write operations on volume servers when maintenance mode is enabled (#8115 ) * Bootstrap persistent state for volume servers. This PR implements logic to load/save persistent state information for storages associated with volume servers, and to report state changes back to masters via heartbeat messages. More work ensues! See https://github.com/seaweedfs/seaweedfs/issues/7977 for details. * Block RPC operations writing to volume servers when maintenance mode is on.	2 months ago
Lisandro Pin	9e15823855	Have masters update DataNode details based on state heartbeats from volume servers. (#8017 )	2 months ago


1821 Commits (abbc8bff2bbfb22ee5d846f53b518eb916631522)