Sparse delta-file snapshots with copy-on-write in the flusher.
Zero write-path overhead when no snapshot is active.
New: snapshot.go (SnapshotBitmap, SnapshotHeader, delta file I/O)
Modified: flusher.go (flushMu, CoW phase in FlushOnce, PauseAndFlush)
Modified: blockvol.go (Create/Read/Delete/Restore/ListSnapshots, recovery)
Modified: wal_writer.go (Reset for snapshot restore)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
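The copy-on-write idea above can be sketched with a per-block bitmap the flusher consults before overwriting data while a snapshot is active. This is a hypothetical illustration; the names do not necessarily match the actual snapshot.go implementation.

```go
package main

import "fmt"

// SnapshotBitmap tracks, per block, whether the pre-snapshot contents
// have already been preserved in the delta file. Hypothetical sketch.
type SnapshotBitmap struct {
	bits []uint64
}

func NewSnapshotBitmap(nBlocks int) *SnapshotBitmap {
	return &SnapshotBitmap{bits: make([]uint64, (nBlocks+63)/64)}
}

// TestAndSet reports whether the block was already copied to the delta
// file, marking it as copied otherwise. The flusher pays the CoW cost
// only on the first overwrite of each block per snapshot; with no
// snapshot active the bitmap is nil and the write path is untouched.
func (b *SnapshotBitmap) TestAndSet(block int) bool {
	w, m := block/64, uint64(1)<<uint(block%64)
	if b.bits[w]&m != 0 {
		return true
	}
	b.bits[w] |= m
	return false
}

func main() {
	bm := NewSnapshotBitmap(128)
	fmt.Println(bm.TestAndSet(5)) // false: first overwrite, copy old block to delta
	fmt.Println(bm.TestAndSet(5)) // true: already preserved, skip the copy
}
```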
Add ALUA (Asymmetric Logical Unit Access) support to the iSCSI target,
enabling dm-multipath on Linux to automatically detect path state changes
and reroute I/O during HA failover without initiator-side intervention.
- ALUAProvider interface with implicit ALUA (TPGS=0x01)
- INQUIRY byte 5 TPGS bits, VPD 0x83 with NAA+TPG+RTP descriptors
- REPORT TARGET PORT GROUPS handler (MAINTENANCE IN SA=0x0A)
- MAINTENANCE OUT rejection (implicit-only, no SET TPG)
- Standby write rejection (NOT_READY ASC=04h ASCQ=0Bh)
- RoleNone maps to Active/Optimized (standalone single-node compatibility)
- NAA-6 device identifier derived from volume UUID
- -tpg-id flag with [1,65535] validation
- dm-multipath config + setup script (group_by_tpg, ALUA prio)
- 12 unit tests + 16 QA adversarial tests + 4 integration tests
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
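For reference, the TPGS field advertised in the INQUIRY change above lives in bits 5:4 of byte 5 of the standard INQUIRY data (per SPC-4), with 01b meaning implicit ALUA only. A minimal sketch, with an assumed helper name:

```go
package main

import "fmt"

// tpgsImplicit is the TPGS value advertising implicit ALUA only (01b).
const tpgsImplicit = 0x1

// setTPGS sets the TPGS field, bits 5:4 of byte 5 of standard INQUIRY
// data. Hypothetical helper; shown for illustration only.
func setTPGS(inquiry []byte, tpgs byte) {
	inquiry[5] |= (tpgs & 0x3) << 4
}

func main() {
	inq := make([]byte, 36) // standard INQUIRY response length
	setTPGS(inq, tpgsImplicit)
	fmt.Printf("byte5=0x%02x\n", inq[5]) // byte5=0x10 -> implicit ALUA
}
```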
Test harness for running blockvol iSCSI tests on WSL2 and remote nodes
(m01/m02). Includes Node (SSH/local exec), ISCSIClient (discover/login/

logout), WeedTarget (weed volume server lifecycle), and test suites for
smoke, stress, crash recovery, chaos, perf benchmarks, and apps (fio/dd).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add ProcessBlockVolumeAssignments to BlockVolumeStore and wire
AssignmentSource/AssignmentCallback into the heartbeat collector's
Run() loop. Assignments are fetched and applied each tick after
status collection.
Bug fixes:
- BUG-CP4B3-1: TOCTOU between GetBlockVolume and HandleAssignment.
Added withVolume() helper that holds RLock across lookup+operation,
preventing RemoveBlockVolume from closing the volume mid-assignment.
- BUG-CP4B3-2: Data race on callback fields read by Run() goroutine.
Made StatusCallback/AssignmentSource/AssignmentCallback private,
added cbMu mutex and SetXxx() setter methods. Lock held only for
load/store, not during callback execution.
7 dev tests + 13 QA adversarial tests = 20 new tests.
972 total unit tests, all passing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
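The BUG-CP4B3-2 fix above follows a standard pattern: load the callback under the mutex, then invoke it outside the lock. A minimal sketch with illustrative names (not the actual collector code):

```go
package main

import (
	"fmt"
	"sync"
)

// Collector sketches the setter pattern: callback fields are private,
// guarded by a mutex, and the lock covers only the load/store.
type Collector struct {
	cbMu           sync.Mutex
	statusCallback func() string
}

func (c *Collector) SetStatusCallback(cb func() string) {
	c.cbMu.Lock()
	defer c.cbMu.Unlock()
	c.statusCallback = cb
}

func (c *Collector) runOnce() string {
	c.cbMu.Lock()
	cb := c.statusCallback // copy under lock
	c.cbMu.Unlock()
	if cb == nil {
		return ""
	}
	return cb() // invoked outside the lock: no deadlock if cb calls Set*
}

func main() {
	c := &Collector{}
	c.SetStatusCallback(func() string { return "ok" })
	fmt.Println(c.runOnce()) // ok
}
```

Holding the lock during the callback instead would risk deadlock if the callback ever re-enters a setter.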
Boundary tests for RoleFromWire, LeaseTTLToWire overflow/clamp/negative,
ToBlockVolumeInfoMessage with primary/stale/closed/concurrent volumes,
BlockVolumeAssignment roundtrip, and heartbeat collection edge cases.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add SimulatedMaster test helper + 20 assignment sequence tests (8 sequence,
5 failover, 5 adversarial, 2 status). Add BlockVolumeStatus struct and
Status() method. Includes QA test files for CP1-CP4a. 940 total unit tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add master-driven lifecycle operations: promotion, demotion, rebuild,
and split-brain prevention. All testable on Windows with mock TCP.
New files:
- promotion.go: HandleAssignment (single entry point for role changes),
promote (Replica/None -> Primary with durable epoch), demote
(Primary -> Draining -> Stale with drain timeout)
- rebuild.go: RebuildServer (WAL catch-up + full extent streaming),
StartRebuild client (WAL catch-up with full extent fallback,
two-phase rebuild with second catch-up for concurrent writes)
Modified:
- wal_writer.go: ScanFrom() method, ErrWALRecycled sentinel
- repl_proto.go: rebuild message types + RebuildRequest encode/decode
- blockvol.go: assignMu, drainTimeout, rebuildServer fields;
HandleAssignment/StartRebuildServer/StopRebuildServer methods;
rebuild server stop in Close()
- dirty_map.go: Clear() method for full extent rebuild
32 new tests covering WAL scan, promotion/demotion, rebuild server,
rebuild client, split-brain prevention, and full lifecycle scenarios.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
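The split-brain prevention mentioned above typically hinges on a durable epoch: an assignment is rejected unless its epoch is strictly newer than the last one recorded. A simplified sketch under that assumption, with illustrative types:

```go
package main

import "fmt"

type Role int

const (
	RoleNone Role = iota
	RoleReplica
	RolePrimary
)

// Volume sketches epoch-guarded role assignment. Illustrative only;
// the real HandleAssignment also persists the epoch durably.
type Volume struct {
	role  Role
	epoch uint64
}

func (v *Volume) HandleAssignment(role Role, epoch uint64) error {
	if epoch <= v.epoch {
		// A stale controller (or a partitioned peer) cannot demote or
		// promote with an old epoch: this is the split-brain guard.
		return fmt.Errorf("stale epoch %d (current %d): assignment rejected", epoch, v.epoch)
	}
	v.role, v.epoch = role, epoch
	return nil
}

func main() {
	v := &Volume{role: RoleReplica, epoch: 7}
	fmt.Println(v.HandleAssignment(RolePrimary, 8)) // <nil>: fresh epoch accepted
	fmt.Println(v.HandleAssignment(RolePrimary, 8)) // stale epoch error
}
```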
Primary ships WAL entries to replica over TCP (data channel), confirms
durability via barrier RPC (control channel). SyncCache runs local fsync
and replica barrier in parallel via MakeDistributedSync. When replica is
unreachable, shipper enters permanent degraded mode and falls back to
local-only sync (Phase 3 behavior).
Key design: two separate TCP ports (data+control), contiguous LSN
enforcement, epoch equality check, WAL-full retry on replica,
cond.Wait-based barrier with configurable timeout, BarrierFsyncFailed
status code. Close lifecycle: shipper → receiver → drain → committer →
flusher → fd.
New files: repl_proto.go, wal_shipper.go, replica_apply.go,
replica_barrier.go, dist_group_commit.go
Modified: blockvol.go, blockvol_test.go
27 dev tests + 21 QA tests = 48 new tests; 889 total (609 engine + 280
iSCSI), all passing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
9 categories: PDU, Params, Login, Discovery, SCSI, DataIO, Session,
Target, Integration. 2,183 lines. All 219 tests pass (164 dev + 55 QA).
No new production bugs found.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1. Discovery session nil handler crash: reject SCSI commands with
Reject PDU when s.scsi is nil (discovery sessions have no target).
2. CmdSN window enforcement: validate incoming CmdSN against
[ExpCmdSN, MaxCmdSN] using serial arithmetic. Drop out-of-window
commands per RFC 7143 section 4.2.2.1.
3. Data-Out buffer offset validation: enforce BufferOffset == received
for ordered data (DataPDUInOrder=Yes). Prevents silent corruption
from out-of-order or overlapping data.
4. ImmediateData enforcement: reject immediate data in SCSI command
PDU when negotiated ImmediateData=No.
5. UNMAP descriptor length alignment: reject blockDescLen not a
multiple of 16 bytes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
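The serial arithmetic used for the CmdSN window check (item 2) handles 32-bit wraparound: a <= b iff the signed difference is non-positive. A minimal sketch:

```go
package main

import "fmt"

// serialLTE reports a <= b under 32-bit serial number arithmetic:
// int32(a-b) <= 0 is wraparound-safe, unlike a direct comparison.
func serialLTE(a, b uint32) bool {
	return int32(a-b) <= 0
}

// cmdSNInWindow checks ExpCmdSN <= CmdSN <= MaxCmdSN serially.
func cmdSNInWindow(cmdSN, expCmdSN, maxCmdSN uint32) bool {
	return serialLTE(expCmdSN, cmdSN) && serialLTE(cmdSN, maxCmdSN)
}

func main() {
	fmt.Println(cmdSNInWindow(5, 3, 10))         // true
	fmt.Println(cmdSNInWindow(11, 3, 10))        // false: beyond MaxCmdSN
	fmt.Println(cmdSNInWindow(2, 0xFFFFFFFE, 3)) // true: window wraps past zero
}
```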
The Linux kernel iSCSI initiator pipelines multiple SCSI commands on
the same TCP connection (command queuing). When a write needs R2T for
data beyond the immediate portion, collectDataOut may read a pipelined
SCSI command instead of the expected Data-Out PDU.
Fix: queue non-Data-Out PDUs received during collectDataOut into a
pending buffer. The main dispatch loop drains pending PDUs before
reading from the connection. This correctly handles interleaved
commands during multi-PDU write transfers.
Bug found during WSL2 smoke test: mkfs.ext4 hangs at "Writing
superblocks" because inode table zeroing sends large writes that
exceed FirstBurstLength, triggering R2T while the kernel has already
queued the next command.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
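The pending-buffer fix can be sketched as follows, with illustrative types standing in for the real PDU and connection code:

```go
package main

import "fmt"

type PDU struct{ Opcode byte }

const opDataOut = 0x05

// Conn sketches the fix: non-Data-Out PDUs seen while collecting
// Data-Out are queued, and the dispatch loop drains the queue before
// reading from the wire again. The wire slice stands in for TCP reads.
type Conn struct {
	pending []PDU
	wire    []PDU
}

func (c *Conn) readPDU() PDU {
	p := c.wire[0]
	c.wire = c.wire[1:]
	return p
}

// collectDataOut returns the next Data-Out PDU, deferring anything else
// (e.g. a pipelined SCSI command) to the pending buffer.
func (c *Conn) collectDataOut() PDU {
	for {
		p := c.readPDU()
		if p.Opcode == opDataOut {
			return p
		}
		c.pending = append(c.pending, p)
	}
}

// nextPDU is what the main dispatch loop calls: pending first.
func (c *Conn) nextPDU() PDU {
	if len(c.pending) > 0 {
		p := c.pending[0]
		c.pending = c.pending[1:]
		return p
	}
	return c.readPDU()
}

func main() {
	// A pipelined command (0x01) arrives before the expected Data-Out.
	c := &Conn{wire: []PDU{{0x01}, {opDataOut}, {0x01}}}
	fmt.Printf("0x%02x\n", c.collectDataOut().Opcode) // 0x05: the Data-Out
	fmt.Printf("0x%02x\n", c.nextPDU().Opcode)        // 0x01: the queued command
}
```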
- Skip InitiatorAlias in negotiation (was returning NotUnderstood)
- Capture TargetName in StageLoginOp direct-jump path (iscsiadm skips
security stage, sends CSG=LoginOp directly -- nil SCSIHandler crash)
- Add portalAddr to TargetServer for discovery responses (listener on
[::] is not routable from WSL2 clients)
- Add -portal flag to iscsi-target binary
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
If local EC scrubbing hits needles whose chunk locations reside entirely
in local shards, we can fully reconstruct them and check CRCs for
data integrity.
* fix LevelDB panic on lazy reload
Implemented a thread-safe reload mechanism using double-checked
locking and a retry loop in Get, Put, and Delete. Added a concurrency
test to verify the fix and prevent regressions.
Fixes #8269
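The double-checked locking described above takes the read lock on the fast path and re-checks after upgrading to the write lock, so only one goroutine performs the reload. A minimal sketch with illustrative names (a map stands in for the LevelDB handle):

```go
package main

import (
	"fmt"
	"sync"
)

// Store sketches the reload pattern: db == nil means "needs (re)load".
type Store struct {
	mu sync.RWMutex
	db map[string]string
}

func (s *Store) getFromDb(key string) (string, bool) {
	// Fast path: read lock, DB already loaded.
	s.mu.RLock()
	if s.db != nil {
		v, ok := s.db[key]
		s.mu.RUnlock()
		return v, ok
	}
	s.mu.RUnlock()

	// Slow path: write lock, double-check before reloading, since
	// another goroutine may have reloaded between our unlock and lock.
	s.mu.Lock()
	if s.db == nil {
		s.db = map[string]string{"k": "v"} // stands in for reopening LevelDB
	}
	v, ok := s.db[key]
	s.mu.Unlock()
	return v, ok
}

func main() {
	s := &Store{}
	v, ok := s.getFromDb("k")
	fmt.Println(v, ok) // v true
}
```

Extracting this into one helper is also what prevents the recursive-RLock deadlock fixed later in this series: Put/Delete call the helper instead of re-acquiring the lock through Get.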
* refactor: use helper for leveldb fix and remove deprecated ioutil
* fix: prevent deadlock by using getFromDb helper
Extracted DB lookup to internal helper to avoid recursive RLock in Put/Delete methods.
Updated Get to use the helper as well.
* fix: resolve syntax error and commit deadlock prevention
Fixed a duplicate function declaration syntax error.
Verified that getFromDb helper correctly prevents recursive RLock scenarios.
* refactor: remove redundant timeout checks
Removed nested `if m.ldbTimeout > 0` checks in Get, Put, and Delete
methods as suggested in PR review.
* Fix disk errors handling in vacuum compaction
When a disk reports IO errors during vacuum compaction (e.g., 'read /mnt/d1/weed/oc_xyz.dat: input/output error'), the vacuum task should signal the error to the master so it can:
1. Drop the faulty volume replica
2. Rebuild the replica from healthy copies
Changes:
- Add checkReadWriteError() calls in vacuum read paths (ReadNeedleBlob, ReadData, ScanVolumeFile) to flag EIO errors in volume.lastIoError
- Preserve error wrapping using %w format instead of %v so EIO propagates correctly
- The existing heartbeat logic will detect lastIoError and remove the bad volume
Fixes issue #8237
* error
* s3: fix health check endpoints returning 404 for HEAD requests #8243
* fix multipart etag
* address comments
* clean up
* clean up
* optimization
* address comments
* unquoted etag
* dedup
* upgrade
* clean
* etag
* return quoted tag
* quoted etag
* debug
* s3api: unify ETag retrieval and quoting across handlers
Refactor newListEntry to take *S3ApiServer and use getObjectETag,
and update setResponseHeaders to use the same logic. This ensures
consistent ETags are returned for both listing and direct access.
* s3api: implement ListObjects deduplication for versioned buckets
Handle duplicate entries between the main path and the .versions
directory by prioritizing the latest version when bucket versioning
is enabled.
* s3api: cleanup stale main file entries during versioned uploads
Add explicit deletion of pre-existing "main" files when creating new
versions in versioned buckets. This prevents stale entries from
appearing in bucket listings and ensures consistency.
* s3api: fix cleanup code placement in versioned uploads
Correct the placement of rm calls in completeMultipartUpload and
putVersionedObject to ensure stale main files are properly deleted
during versioned uploads.
* s3api: improve getObjectETag fallback for empty ExtETagKey
Ensure that when ExtETagKey exists but contains an empty value,
the function falls through to MD5/chunk-based calculation instead
of returning an empty string.
* s3api: fix test files for new newListEntry signature
Update test files to use the new newListEntry signature where the
first parameter is *S3ApiServer. Created mockS3ApiServer to properly
test owner display name lookup functionality.
* s3api: use filer.ETag for consistent Md5 handling in getEtagFromEntry
Change getEtagFromEntry fallback to use filer.ETag(entry) instead of
filer.ETagChunks to ensure legacy entries with Attributes.Md5 are
handled consistently with the rest of the codebase.
* s3api: optimize list logic and fix conditional header logging
- Hoist bucket versioning check out of per-entry callback to avoid
repeated getVersioningState calls
- Extract appendOrDedup helper function to eliminate duplicate
dedup/append logic across multiple code paths
- Change If-Match mismatch logging from glog.Errorf to glog.V(3).Infof
and remove DEBUG prefix for consistency
* s3api: fix test mock to properly initialize IAM accounts
Fixed nil pointer dereference in TestNewListEntryOwnerDisplayName by
directly initializing the IdentityAccessManagement.accounts map in the
test setup. This ensures newListEntry can properly look up account
display names without panicking.
* cleanup
* s3api: remove premature main file cleanup in versioned uploads
Removed incorrect cleanup logic that was deleting main files during
versioned uploads. This was causing test failures because it deleted
objects that should have been preserved as null versions when
versioning was first enabled. The deduplication logic in listing is
sufficient to handle duplicate entries without deleting files during
upload.
* s3api: add empty-value guard to getEtagFromEntry
Added the same empty-value guard used in getObjectETag to prevent
returning quoted empty strings. When ExtETagKey exists but is empty,
the function now falls through to filer.ETag calculation instead of
returning "".
* s3api: fix listing of directory key objects with matching prefix
Revert prefix handling logic to use strings.TrimPrefix instead of
checking HasPrefix with empty string result. This ensures that when a
directory key object exactly matches the prefix (e.g. prefix="dir/",
object="dir/"), it is correctly handled as a regular entry instead of
being skipped or incorrectly processed as a common prefix. Also fixed
missing variable definition.
* s3api: refactor list inline dedup to use appendOrDedup helper
Refactored the inline deduplication logic in listFilerEntries to use the
shared appendOrDedup helper function. This ensures consistent behavior
and reduces code duplication.
* test: fix port allocation race in s3tables integration test
Updated startMiniCluster to find all required ports simultaneously using
findAvailablePorts instead of sequentially. This prevents race conditions
where the OS reallocates a port that was just released, causing multiple
services (e.g. Filer and Volume) to be assigned the same port and fail
to start.
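The simultaneous-allocation fix works because every listener stays open until all ports are chosen, so the OS cannot hand the same ephemeral port out twice. A sketch of how such a helper might look (the real findAvailablePorts may differ):

```go
package main

import (
	"fmt"
	"net"
)

// findAvailablePorts reserves n distinct ports by holding n listeners
// open at once, releasing them only after all ports are recorded.
func findAvailablePorts(n int) ([]int, error) {
	listeners := make([]net.Listener, 0, n)
	ports := make([]int, 0, n)
	defer func() {
		for _, l := range listeners {
			l.Close() // release together, after all ports are chosen
		}
	}()
	for i := 0; i < n; i++ {
		l, err := net.Listen("tcp", "127.0.0.1:0")
		if err != nil {
			return nil, err
		}
		listeners = append(listeners, l)
		ports = append(ports, l.Addr().(*net.TCPAddr).Port)
	}
	return ports, nil
}

func main() {
	ports, err := findAvailablePorts(3)
	if err != nil {
		panic(err)
	}
	seen := map[int]bool{}
	for _, p := range ports {
		seen[p] = true
	}
	fmt.Println(len(ports), len(seen) == 3) // 3 true: three distinct ports
}
```

There is still a small window between releasing a port and the service binding it, but the ports are at least guaranteed distinct from each other.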
* Add a version token on `GetState()`/`SetState()` RPCs for volume server states.
* Make state version a property of `VolumeServerState` instead of an in-memory counter.
Also extend state atomicity to reads, instead of just writes.
Implement index (fast) scrubbing for regular/EC volumes via `ScrubVolume()`/`ScrubEcVolume()`.
Also rearranges existing index test files for reuse across unit tests for different modules.
* fix concurrent map access in EC shards info #8219
* refactor: simplify Disk.ToDiskInfo to use ecShards snapshot and avoid redundant locking
* refactor: improve GetEcShards with pre-allocation and defer
* feat: Add Iceberg REST Catalog server
Implement Iceberg REST Catalog API on a separate port (default 8181)
that exposes S3 Tables metadata through the Apache Iceberg REST protocol.
- Add new weed/s3api/iceberg package with REST handlers
- Implement /v1/config endpoint returning catalog configuration
- Implement namespace endpoints (list/create/get/head/delete)
- Implement table endpoints (list/create/load/head/delete/update)
- Add -port.iceberg flag to S3 standalone server (s3.go)
- Add -s3.port.iceberg flag to combined server mode (server.go)
- Add -s3.port.iceberg flag to mini cluster mode (mini.go)
- Support prefix-based routing for multiple catalogs
The Iceberg REST server reuses S3 Tables metadata storage under
/table-buckets and enables DuckDB, Spark, and other Iceberg clients
to connect to SeaweedFS as a catalog.
* feat: Add Iceberg Catalog pages to admin UI
Add admin UI pages to browse Iceberg catalogs, namespaces, and tables.
- Add Iceberg Catalog menu item under Object Store navigation
- Create iceberg_catalog.templ showing catalog overview with REST info
- Create iceberg_namespaces.templ listing namespaces in a catalog
- Create iceberg_tables.templ listing tables in a namespace
- Add handlers and routes in admin_handlers.go
- Add Iceberg data provider methods in s3tables_management.go
- Add Iceberg data types in types.go
The Iceberg Catalog pages provide visibility into the same S3 Tables
data through an Iceberg-centric lens, including REST endpoint examples
for DuckDB and PyIceberg.
* test: Add Iceberg catalog integration tests and reorg s3tables tests
- Reorganize existing s3tables tests to test/s3tables/table-buckets/
- Add new test/s3tables/catalog/ for Iceberg REST catalog tests
- Add TestIcebergConfig to verify /v1/config endpoint
- Add TestIcebergNamespaces to verify namespace listing
- Add TestDuckDBIntegration for DuckDB connectivity (requires Docker)
- Update CI workflow to use new test paths
* fix: Generate proper random UUIDs for Iceberg tables
Address code review feedback:
- Replace placeholder UUID with crypto/rand-based UUID v4 generation
- Add detailed TODO comments for handleUpdateTable stub explaining
the required atomic metadata swap implementation
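For reference, a crypto/rand-based UUID v4 sets the version and variant bits per RFC 4122; this sketch (with an assumed helper name) shows the shape of such a generator:

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 generates a random UUID: 16 random bytes with the version
// nibble set to 4 and the variant bits set to 10xx (RFC 4122).
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // variant 10xx
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	u, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(u), u[14] == '4') // 36 true: canonical form, version 4
}
```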
* fix: Serve Iceberg on localhost listener when binding to different interface
Address code review feedback: properly serve the localhost listener
when the Iceberg server is bound to a non-localhost interface.
* ci: Add Iceberg catalog integration tests to CI
Add new job to run Iceberg catalog tests in CI, along with:
- Iceberg package build verification
- Iceberg unit tests
- Iceberg go vet checks
- Iceberg format checks
* fix: Address code review feedback for Iceberg implementation
- fix: Replace hardcoded account ID with s3_constants.AccountAdminId in buildTableBucketARN()
- fix: Improve UUID generation error handling with deterministic fallback (timestamp + PID + counter)
- fix: Update handleUpdateTable to return HTTP 501 Not Implemented instead of fake success
- fix: Better error handling in handleNamespaceExists to distinguish 404 from 500 errors
- fix: Use relative URL in template instead of hardcoded localhost:8181
- fix: Add HTTP timeout to test's waitForService function to avoid hangs
- fix: Use dynamic ephemeral ports in integration tests to avoid flaky parallel failures
- fix: Add Iceberg port to final port configuration logging in mini.go
* fix: Address critical issues in Iceberg implementation
- fix: Cache table UUIDs to ensure persistence across LoadTable calls
The UUID now remains stable for the lifetime of the server session.
TODO: For production, UUIDs should be persisted in S3 Tables metadata.
- fix: Remove redundant URL-encoded namespace parsing
mux router already decodes %1F to \x1F before passing to handlers.
Redundant ReplaceAll call could cause bugs with literal %1F in namespace.
* fix: Improve test robustness and reduce code duplication
- fix: Make DuckDB test more robust by failing on unexpected errors
Instead of silently logging errors, now explicitly check for expected
conditions (extension not available) and skip the test appropriately.
- fix: Extract username helper method to reduce duplication
Created getUsername() helper in AdminHandlers to avoid duplicating
the username retrieval logic across Iceberg page handlers.
* fix: Add mutex protection to table UUID cache
Protects concurrent access to the tableUUIDs map with sync.RWMutex.
Uses read-lock for fast path when UUID already cached, and write-lock
for generating new UUIDs. Includes double-check pattern to handle race
condition between read-unlock and write-lock.
* style: fix go fmt errors
* feat(iceberg): persist table UUID in S3 Tables metadata
* feat(admin): configure Iceberg port in Admin UI and commands
* refactor: address review comments (flags, tests, handlers)
- command/mini: fix tracking of explicit s3.port.iceberg flag
- command/admin: add explicit -iceberg.port flag
- admin/handlers: reuse getUsername helper
- tests: use 127.0.0.1 for ephemeral ports and os.Stat for file size check
* test: check error from FileStat in verify_gc_empty_test
* Implement RPC skeleton for regular/EC volumes scrubbing.
See https://github.com/seaweedfs/seaweedfs/issues/8018 for details.
* Minor proto improvements for `ScrubVolume()`, `ScrubEcVolume()`:
- Add fields for scrubbing details in `ScrubVolumeResponse` and `ScrubEcVolumeResponse`,
instead of reporting these through RPC errors.
- Return a list of broken shards when scrubbing EC volumes, via `EcShardInfo`.
* Bootstrap persistent state for volume servers.
This PR implements logic to load/save persistent state information for storages
associated with volume servers, and to report state changes back to masters
via heartbeat messages.
More work ensues!
See https://github.com/seaweedfs/seaweedfs/issues/7977 for details.
* Add volume server RPCs to read and update state flags.
* fix: skip exhausted blocks before creating an interval
* refactor: optimize interval creation and fix logic duplication
* docs: add docstring for LocateData
* refactor: extract moveToNextBlock helper to deduplicate logic
* fix: use int64 for block index comparison to prevent overflow
* test: add unit test for LocateData boundary crossing (issue #8179)
* fix: skip exhausted blocks to prevent negative interval size and panics (issue #8179)
* refactor: apply review suggestions for test maintainability and code style
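The "skip exhausted blocks" fix above can be illustrated with a simplified locate loop: blocks with no bytes remaining at the current offset are skipped rather than emitted, so an interval can never be created with negative size. Illustrative types only, not the actual LocateData implementation:

```go
package main

import "fmt"

type block struct{ start, size int64 }

// locate maps [offset, offset+length) onto blocks, skipping any block
// that is exhausted at the current offset instead of emitting it.
func locate(blocks []block, offset, length int64) []block {
	var out []block
	for i := 0; length > 0 && i < len(blocks); i++ {
		b := blocks[i]
		avail := b.start + b.size - offset
		if avail <= 0 {
			continue // exhausted block: skip, don't create a <=0-size interval
		}
		n := avail
		if n > length {
			n = length
		}
		out = append(out, block{start: offset, size: n})
		offset += n
		length -= n
	}
	return out
}

func main() {
	// Middle block is empty: without the skip, it would yield a
	// zero- or negative-size interval at the boundary.
	blocks := []block{{0, 10}, {10, 0}, {10, 10}}
	for _, iv := range locate(blocks, 5, 10) {
		fmt.Println(iv.start, iv.size)
	}
}
```

Using int64 throughout (rather than int) matches the overflow fix noted above for block index comparisons.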