seaweedfs

Commit Graph

Author	SHA1	Message	Date
Ping Qiu	e22e57a3f7	feat: WAL admission metrics for visibility into write pressure behavior Add counters (total, soft, hard, timeout) and wait-time histogram to WALAdmission, wired through EngineMetrics and exported as Prometheus metrics. Six new tests verify all code paths. Nil-safe for backwards compatibility. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	1 week ago
Ping Qiu	003b8c2f28	fix: require explicit build tags for io_uring backends, add implementation logging All three io_uring backends (iceber, giouring, raw) now require explicit build tags — no tag means standard-only. Each backend registers its name via IOUringImpl so startup logs show compiled implementation alongside requested/selected backend mode. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	1 week ago
Ping Qiu	cd1e0afa3b	feat: three io_uring backends for A/B/C benchmarking Split iouring_linux.go into three build-tagged implementations: 1. iouring_iceber_linux.go (-tags iouring_iceber) iceber/iouring-go library. Goroutine-based completion model. Known -72% write regression due to per-op channel overhead. 2. iouring_giouring_linux.go (-tags iouring_giouring) pawelgaczynski/giouring — direct liburing port. No goroutines, no channels. Direct SQE/CQE ring manipulation. Kernel 6.0+. 3. iouring_raw_linux.go (default on Linux, no tags needed) Raw syscall wrappers — io_uring_setup/io_uring_enter + mmap. Zero dependencies. ~300 LOC. Kernel 5.6+. Build commands for benchmarking: go build -tags iouring_iceber ./... # option A go build -tags iouring_giouring ./... # option B go build ./... # option C (raw, default) go build -tags no_iouring ./... # disable all io_uring All variants implement the same BatchIO interface. Cross-compile verified for all four tag combinations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	1 week ago
Ping Qiu	5e4baccc46	fix: use RequestSet.Requests() API for io_uring result iteration The iceber/iouring-go SubmitRequests returns a RequestSet interface which cannot be ranged over directly. Use resultSet.Done() to wait for all completions, then iterate resultSet.Requests(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	1 week ago
Ping Qiu	9d0ec8efa3	feat: tri-state IOBackend config with explicit logging and CLI flag Replace UseIOUring bool with IOBackend IOBackendMode (tri-state): - "standard" (default): sequential pread/pwrite/fdatasync - "auto": try io_uring, fall back to standard with warning log - "io_uring": require io_uring, fail startup if unavailable NewIOUring now returns ErrIOUringUnavailable instead of silently falling back — callers decide whether to fail or fall back based on the requested mode. All mode transitions are logged: io backend: requested=auto selected=standard reason=... io backend: requested=io_uring selected=io_uring CLI: --io-backend=standard|auto|io_uring added to iscsi-target. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	1 week ago
Ping Qiu	66d5ba0a84	fix: BatchIO review fixes — linked SQE, ring overflow, resource leak, sync parity 1. HIGH: LinkedWriteFsync now uses SubmitLinkRequests (IOSQE_IO_LINK) instead of SubmitRequests, ensuring write+fdatasync execute as a linked chain in the kernel. Falls back to sequential on error. 2. HIGH: PreadBatch/PwriteBatch chunk ops by ring capacity to prevent "too many requests" rejection when dirty map exceeds ring size (256). 3. MED: CloseBatchIO() added to Flusher, called in BlockVol.Close() after final flush to release io_uring ring / kernel resources. 4. MED: Sync parity — both standard and io_uring paths now use fdatasync (via platform-specific fdatasync_linux.go / fdatasync_other.go). Standard path previously used fsync; now matches io_uring semantics. On non-Linux, fdatasync falls back to fsync (only option available). 10 batchio tests, all blockvol tests pass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	1 week ago
Ping Qiu	04b1827b4a	feat: io_uring BatchIO implementation + UseIOUring config wiring Add iouring_linux.go (build-tagged linux && !no_iouring) using iceber/iouring-go for batched pread/pwrite/fdatasync. Includes linked write+fsync chain for group commit optimization. iouring_other.go provides silent fallback to standard on non-Linux. blockvol.go wires UseIOUring config flag through to flusher BatchIO. NewIOUring gracefully falls back if kernel lacks io_uring support. 10 batchio tests, all blockvol tests pass unchanged. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	1 week ago
Ping Qiu	e55f369d66	feat: BatchIO interface for swappable flusher I/O backend New package batchio/ with BatchIO interface (PreadBatch, PwriteBatch, Fsync, LinkedWriteFsync) and standard sequential implementation. Flusher refactored to use BatchIO: WAL header reads, WAL entry reads, and extent writes are now batched through the interface. With the default NewStandard() backend, behavior is identical to before. UseIOUring config field added for future io_uring opt-in (Linux 5.6+). 9 interface tests, all existing blockvol tests pass unchanged. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	1 week ago
Ping Qiu	4c5f9f2b9d	feat: CP10B-1 NVMe/TCP RX/TX split + CP10B-2 bench/profiling fixes RX/TX split: rxLoop reads PDUs, txLoop writes responses via respCh. Handlers refactored to void + enqueueResponse pattern. IOCCSZ fix enables inline write data (100K IOPS vs 15K before). R2T deadlock fix via completeWaiters. Shutdown cleans up pendingCapsules buffers. Bench: ParseFioMetric accepts plain/quoted numbers for aggregated medians. Profiling actions: pprof_capture, vmstat_capture, iostat_capture. 196 NVMe tests, 92 testrunner actions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	1 week ago
Ping Qiu	3557ae283f	feat: Phase 10 CP10-3 -- NVMe/TCP Tier 1 optimizations, WAL admission control, benchmark platform CP10-3 Tier 1 optimizations (T1-T4): - TCP_NODELAY + 256KB socket buffers on NVMe/TCP connections - Response batching: all C2H data chunks + CapsuleResp in single flush - Tiered buffer pool (4KB/64KB/256KB sync.Pool) for write payloads - Configurable MaxH2CDataLength wiring through controller/IC/chunking BUG-CP103-1: NVMe write retry with jittered backoff for transient WAL pressure - writeWithRetry() with bounded backoff [50/200/800ms] - throttleOnWALPressure() pre-write delay above 90% WAL usage - WALPressureProvider interface + NVMeAdapter.WALPressure() BUG-CP103-2: Volume-level WAL admission control - WALAdmission with counting semaphore (max concurrent writers) - Soft watermark (0.7): small delay to desynchronize herd - Hard watermark (0.9): block until flusher drains - Single-deadline budget shared across watermark wait + semaphore - Close-aware during both watermark and semaphore waits - Wired into BlockVol.WriteLBA() and Trim() Benchmark platform enhancements: - NVMe benchmark actions and scenarios (A/B, CW sweep, IOQ sweep) - Database benchmark actions (SQLite, pgbench) - K8s operator QA reconciler tests - New testrunner scenarios for HA, fault injection, CSI lifecycle Test counts: 213 NVMe + 625 engine + operator + testrunner tests, all passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2 weeks ago
Ping Qiu	bbadeeb89b	feat: Phase 10 CP10-2 -- CSI NVMe/TCP node plugin, 210 tests NVMe/TCP transport support in the CSI driver so Kubernetes pods can mount block volumes via NVMe alongside (or instead of) iSCSI. Transport selection: NVMe preferred when nvme_tcp module loaded + metadata present + nvmeUtil available. Fail-fast on NVMe errors (no silent iSCSI fallback). .transport file persists across CSI restarts. Key changes: - BuildNQN() single source of truth for NQN construction (naming.go) - NVMeUtil interface + realNVMeUtil wrapping nvme-cli (nvme_util.go) - NodeStageVolume/Unstage/Expand dual-transport paths (node.go) - NvmeAddr/NQN fields in VolumeInfo, Controller contexts - VolumeManager NvmeAddr()/VolumeNQN() getters - BlockService NvmeListenAddr()/NQN() accessors - 27 unit tests + 26 QA adversarial tests (nvme_node_test.go, qa_cp102) - Fix: flaky TestQA_Node_ConcurrentStageUnstage (pre-alloc temp dirs) Review fixes applied: F1 (NQN format mismatch), F2 (CreateVolume drops NVMe context), F3 (IsConnected error classification), F4 (findSubsys path validation), F5 (MasterVolumeClient NVMe gap documented). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2 weeks ago
Ping Qiu	0e234f5c80	feat: Phase 10 CP10-1 -- NVMe/TCP target MVP, 109 tests NVMe over Fabrics (TCP) target implementation sharing the same BlockVol engine, fencing, replication, and failover as the existing iSCSI target. New package: weed/storage/blockvol/nvme/ (11 files, 2,242 production LOC) - protocol.go: PDU types, opcodes, status codes, marshal/unmarshal - wire.go: TCP reader/writer with header bounds validation - controller.go: IC handshake, per-queue state, command dispatch, KATO - fabric.go: Connect (admin+IO), PropertyGet/Set, Disconnect - identify.go: Controller/Namespace/NS list/NS descriptors (Linux 5.15) - admin.go: SetFeatures, GetFeatures, GetLogPage (SMART/ANA), KeepAlive - io.go: Read (C2HData), Write (inline), Flush, WriteZeros/Trim - server.go: TCP listener, admin session registry, graceful shutdown - adapter.go: BlockVol-to-NVMe bridge, error mapping, ANA state Integration: NVMeConfig + CLI flags (-block.nvme.*), disabled by default. Key design: inline-data writes only (no R2T), MaxH2CDataLength=32KB, single ANA group coherent with BlockVol role, CNTLID session registry for cross-connection IO queues, HostNQN continuity enforcement. Tests: 65 dev + 44 QA adversarial = 109 total, all passing. Bugs fixed during review: IO queue cross-connection (A), header bounds validation (B), write payload size check (C), disconnect error (D), stream desync prevention (E), HostNQN enforcement (F), capsule-before-IC state guard (H), flowCtlOff SQHD timing (I). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2 weeks ago
Ping Qiu	8fa1829992	feat: Phase 9A -- Kubernetes operator MVP for SeaweedFS block storage, 71 tests Nested Go module (operator/go.mod) isolating controller-runtime deps. CRD SeaweedBlockCluster (block.seaweedfs.com/v1alpha1) with dual-mode: CSI-only (MasterRef) connects to existing cluster; full-stack (Master) deploys master+volume StatefulSets. Single reconciler manages all sub-resources with ownership labels, finalizer cleanup, CHAP secret auto-generation, and multi-CR conflict detection. Review fixes: cross-NS label ownership (H1), ParseQuantity validation (H2), volume readiness probe (M1), leader election (M2), PVC StorageClassName (M3), condition type separation (M4), FQDN master address (L1), port validation (L3). QA adversarial fixes: ExtraArgs override rejection (BUG-QA-1), malformed lastRotated infinite rotation (BUG-QA-2), DNS label length validation (BUG-QA-3), replicas=0 error message (BUG-QA-4), RFC 1123 name validation (BUG-QA-5), whitespace field trimming (BUG-QA-6), zero storage size (BUG-QA-7). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2 weeks ago
Ping Qiu	9acd187587	feat: Phase 8 complete -- CP8-5 stability gate, lease grant fix, Docker e2e, 13 chaos scenarios Phase 8 closes with all 6 checkpoints done (CP8-1 through CP8-5 + CP8-3-1): - CP8-5: 12/12 enterprise QA scenarios PASS on real hardware (m01/M02) - Master-authoritative lease grants (BUG-CP85-11): master renews primary write leases on every heartbeat response, replacing retain-until-confirmed assignment queue semantics that caused 30s lease expiry - Post-rebuild WAL shipping gap fix (BUG-CP85-1): syncLSNAfterRebuild advances replica nextLSN so WAL entries are accepted after rebuild - Block heartbeat startup race fix (BUG-CP85-10): dynamic blockService check on each tick instead of one-shot at loop start - 8 new tests: 4 engine lease grant + 4 registry lease grant - 13 new YAML scenarios: chaos (kill-loop, partition, disk-full), database integrity (sqlite crash, ext4 fsck), perf baseline, metrics verify, snapshot stress, expand-failover, session storm, role flap, 24h soak - 12 new testrunner actions (database, fsck, grep_log, write_loop_bg, stop_bg, assert_metric_gt/eq/lt) + phase repeat support - Docker compose setup + getting-started guide for block storage users - 960+ cumulative unit tests, 24 YAML scenarios Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2 weeks ago
Ping Qiu	da1b81d1c9	feat: CP8-3-1 durability modes + testrunner platform + 21 adversarial tests Durability mode implementation (sync_all, sync_quorum, best_effort): - DurabilityMode type with superblock persistence, parse/validate/string - MakeDistributedSync mode-aware barrier enforcement in dist_group_commit - blockerr sentinel package (ErrDurabilityBarrierFailed, ErrDurabilityQuorumLost) - gRPC create path: mode validation, idempotent create consistency, partial cleanup - F1: strict mode rejects partial replica provisioning with cleanup - F3: empty heartbeat does not overwrite persisted strict mode - F4: SCSI error mapping uses errors.Is sentinels (not string matching) - Proto/wire/blockapi/CLI/UI plumbing for durability_mode field - Observability dashboard: cluster health cards + per-volume columns Testrunner platform (YAML-driven integration test framework): - Engine, parser, registry, reporter (JUnit XML + HTML), metrics scraping - 52 registered actions: block, iSCSI, I/O, fault injection, assertions - Baseline regression framework with 7 hard-fail conditions - 15 YAML scenarios (smoke, crash, HA, fault, consistency, snapshot) - 49 unit tests for testrunner internals QA adversarial suite (21 tests, all PASS): - Idempotent create mode/RF mismatch detection - Heartbeat mode downgrade prevention (F3) - sync_all/sync_quorum partial replica enforcement (F1) - Concurrent create race safety - Failover/expand mode preservation - Cleanup resilience when delete fails - Master restart auto-register mode handling - Superblock roundtrip all 3 modes - Validate edge cases (mode×RF matrix) - RequiredReplicas quorum math verification - Sentinel error categorization Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2 weeks ago
Ping Qiu	979a9b496c	feat: Phase 8 CP8-1/2/3/4 -- ops control plane, multi-replica, CSI snapshots, observability CP8-1: HTTP REST API (create/delete/lookup/list/assign/servers), blockapi Go client with multi-master failover, 5 shell commands, HTML dashboard at /block/. CP8-2: RF=2/RF=3 multi-replica support -- ShipperGroup fan-out, distributed sync, health scoring, segment-based scrub, gated promotion (heartbeat freshness + WAL LSN + role checks), failover/rebuild for N>2 replicas. CP8-3: CSI snapshot + expansion -- CreateSnapshot/DeleteSnapshot/ListSnapshots RPCs, NodeExpandVolume with iSCSI rescan, snapshot ID helpers, 20 adversarial tests covering concurrent ops, edge cases, and error injection. CP8-4: Observability -- EngineMetrics atomic counters for flusher/group-commit/ WAL-shipper/scrub, 10 new Prometheus metrics, barrier_lag_lsn SLO gauge, failover/promotion/rebuild counters, request ID correlation in master gRPC logs, baseline regression framework with 7 hard-fail conditions. Total: 63 files, ~11.2K LOC, 160+ new tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2 weeks ago
Ping Qiu	8b2b5f6f66	feat: Phase 6 CP6-3 -- failover + rebuild in Kubernetes, 126 tests Wire low-level fencing primitives to master/VS control plane and CSI: - Proto: replica/rebuild address fields on assignment/info/response messages - Assignment queue: retain-until-confirmed (Peek+Confirm), stale epoch pruning - VS assignment receiver: processes assignments from HeartbeatResponse - BlockService replication: ProcessAssignments, deterministic ports (FNV hash) - Registry replica tracking: SetReplica/ClearReplica/SwapPrimaryReplica - CreateBlockVolume: primary + replica, enqueues assignments, single-copy mode - Failover: lease-aware promotion, deferred timers with cancellation on reconnect - ControllerPublish: returns fresh primary iSCSI address after failover - Recovery: recoverBlockVolumes drains pendingRebuilds, enqueues Rebuilding - Real integration tests on M02: failover address switch, rebuild data consistency, full lifecycle failover+rebuild (3 tests, all PASS) Review fixes (12 findings, 5 High, 5 Medium, 2 Low): - R1-1: AllocateBlockVolume returns replication ports - R1-2: setupPrimaryReplication starts rebuild server - R1-3: VS sends periodic block heartbeat for assignment confirmation - R2-F1: LastLeaseGrant set before Register (no stale-lease race) - R2-F2: Deferred promotion timers cancelled on VS reconnect - R2-F3: SwapPrimaryReplica uses RoleToWire instead of uint32(1) - R2-F4: DeleteBlockVolume deletes replica (best-effort) - R2-F5: SwapPrimaryReplica computes epoch atomically under lock - QA: SetReplica removes old replica from byServer index (BUG-QA-CP63-1) 126 CP6-3 tests (67 dev + 48 QA + 8 integration + 3 real). Cumulative Phase 6: 352 tests. All PASS. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2 weeks ago
Ping Qiu	5a9a52f2d0	feat: Phase 6 CP6-2 -- CSI control-plane integration + csi-sanity/k3s validation CP6-2 wires the CSI driver to SeaweedFS master/volume-server control plane: - Proto: block volume messages in master.proto/volume_server.proto, codegen - Master registry: in-memory BlockVolumeRegistry with Pending->Active status, full/delta heartbeat, inflight lock, placement (fewest volumes) - VS gRPC: AllocateBlockVolume/DeleteBlockVolume handlers, shared naming - Master RPCs: CreateBlockVolume (retry up to 3 servers), Delete, Lookup - Heartbeat: block volume fields wired into bidirectional stream - CSI Controller: VolumeBackend interface (Local + Master), returns volume_context - CSI Node: reads volume_context for remote targets, staged map + IQN derivation - Mode flag: --mode=controller/node/all, --master for control-plane - K8s manifests: csi-driver.yaml, csi-controller.yaml, csi-node.yaml csi-sanity conformance (33 pass, 58 skip) found 6 bugs: - BUG-SANITY-1/2/3: missing VolumeCapabilities/VolumeCapability validation - BUG-SANITY-4: NodePublish used mount instead of bind mount - BUG-SANITY-5: NodeUnpublish didn't remove target path - BUG-SANITY-6: NodeUnpublish failed on unmounted path k3s Level 4 (PVC->Pod data persistence) found 1 bug: - BUG-K3S-1: IsLoggedIn didn't handle iscsiadm exit code 21 226 CSI tests + 54 server tests = 280 new tests, all passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2 weeks ago
Ping Qiu	797854b2d9	test: Phase 5 QA adversarial tests -- 49 tests for CHAP, resize, snapshots - qa_chap_test.go: 16 tests (empty secret, replay, missing fields, hex cases) - qa_resize_test.go: 12 tests (concurrent, reopen, replica reject, alignment) - qa_snapshot_test.go: 21 tests (concurrent create, CoW, recovery, role checks) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2 weeks ago
Ping Qiu	531ee764ee	feat: Phase 5 CP5-3 -- CHAP auth, online resize, Prometheus metrics, 12 tests CHAP authentication (RFC 7143 S12.1): - auth.go: CHAPAuthenticator with MD5 challenge-response, ValidateCHAPConfig - login.go: multi-PDU SecurityNeg flow (challenge → verify → transit) - main.go: -chap-user/-chap-secret CLI flags with validation Online volume expand: - blockvol.go: Expand() with flusher pause, snapMu TOCTOU guard, alignment check - Rejects shrink (ErrShrinkNotSupported) and resize with active snapshots Prometheus metrics: - metrics.go: metricsAdapter wrapping BlockDevice, 15 metrics (counters, histograms, gauge funcs for WAL/dirty-map/epoch/role/snapshots) - Dedicated prometheus.NewRegistry() per server instance Admin HTTP endpoints: - POST /snapshot (create/delete/restore/list) - POST /resize (online expand) - GET /metrics (Prometheus text format) - VolumeSize added to /status response Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2 weeks ago
Ping Qiu	d874e21f93	feat: Phase 5 CP5-2 -- CoW snapshots, 10 tests Sparse delta-file snapshots with copy-on-write in the flusher. Zero write-path overhead when no snapshot is active. New: snapshot.go (SnapshotBitmap, SnapshotHeader, delta file I/O) Modified: flusher.go (flushMu, CoW phase in FlushOnce, PauseAndFlush) Modified: blockvol.go (Create/Read/Delete/Restore/ListSnapshots, recovery) Modified: wal_writer.go (Reset for snapshot restore) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2 weeks ago
Ping Qiu	98d0e9e631	feat: Phase 5 CP5-1 -- ALUA + multipath failover, 28 tests Add ALUA (Asymmetric Logical Unit Access) support to the iSCSI target, enabling dm-multipath on Linux to automatically detect path state changes and reroute I/O during HA failover without initiator-side intervention. - ALUAProvider interface with implicit ALUA (TPGS=0x01) - INQUIRY byte 5 TPGS bits, VPD 0x83 with NAA+TPG+RTP descriptors - REPORT TARGET PORT GROUPS handler (MAINTENANCE IN SA=0x0A) - MAINTENANCE OUT rejection (implicit-only, no SET TPG) - Standby write rejection (NOT_READY ASC=04h ASCQ=0Bh) - RoleNone maps to Active/Optimized (standalone single-node compatibility) - NAA-6 device identifier derived from volume UUID - -tpg-id flag with [1,65535] validation - dm-multipath config + setup script (group_by_tpg, ALUA prio) - 12 unit tests + 16 QA adversarial tests + 4 integration tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2 weeks ago
Ping Qiu	7940e6b7c9	feat: Phase 4A CP4b-4 Windows iSCSI + instrumentation + QA tests Windows iSCSI Initiator compatibility: - Add TargetPortalGroupTag to login response (RFC 7143 S13.9) - Add REQUEST_SENSE, START_STOP_UNIT, MODE_SELECT(6/10) handlers - Add PERSISTENT_RESERVE_IN/OUT, MAINTENANCE_IN (REPORT SUPPORTED OPCODES) - Implement MODE SENSE caching (page 0x08) and control (page 0x0A) pages - Fix Data-In residual underflow/overflow flags (U/O bits on final PDU) - Rename ScsiReadCapacity16 -> ScsiServiceActionIn16 for correctness Instrumentation and tooling: - Add instrumentedAdapter with periodic PERF stats logging - Add pprof endpoints on admin HTTP server (/debug/pprof/*) - Add blockbench CLI tool for standalone block device benchmarking - Add SCSI CDB debug logging in session dispatch HA integration fixes: - Move HA test replica ports to 9011-9014 to avoid conflicts - Add QA adversarial tests for Phase 4A CP4b-4 (755 lines) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2 weeks ago
Ping Qiu	44da35faf6	test: add integration test infrastructure for blockvol iSCSI Test harness for running blockvol iSCSI tests on WSL2 and remote nodes (m01/M02). Includes Node (SSH/local exec), ISCSIClient (discover/login/ logout), WeedTarget (weed volume server lifecycle), and test suites for smoke, stress, crash recovery, chaos, perf benchmarks, and apps (fio/dd). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Ping Qiu	c39080ceaa	feat: Phase 4A CP4b-4 -- HA integration tests, admin HTTP, 5 bug fixes Add HTTP admin server to iscsi-target binary (POST /assign, GET /status, POST /replica, POST /rebuild) and 7 HA integration tests validating failover, split-brain prevention, epoch fencing, and demote-under-IO. New files: - admin.go: HTTP admin endpoint with input validation - ha_target.go: HATarget helper wrapping Target + admin HTTP calls - ha_test.go: 7 HA tests (all PASS on WSL2, 67.7s total) Bug fixes: - BUG-CP4B4-1: CmdSN init (expCmdSN=0 not 1, first SCSI cmd was dropped) - BUG-CP4B4-2: RoleNone->RoleReplica missing SetEpoch (WAL rejected) - BUG-CP4B4-3: replica applyEntry didn't update vol.nextLSN (status=0) - BUG-CP4B4-4: PID discovery killed primary instead of replica (shared binPath; fixed by grepping volFile) - BUG-CP4B4-5: artifact collector overwrote primary log with replica log (added CollectLabeled method) Also: 3s write deadline on WAL shipper data connection to avoid 120s TCP retransmission timeout when replica is dead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Ping Qiu	7c07d9c95a	feat: Phase 4A CP4b-3 -- assignment processing, 2 bug fixes, 20 QA tests Add ProcessBlockVolumeAssignments to BlockVolumeStore and wire AssignmentSource/AssignmentCallback into the heartbeat collector's Run() loop. Assignments are fetched and applied each tick after status collection. Bug fixes: - BUG-CP4B3-1: TOCTOU between GetBlockVolume and HandleAssignment. Added withVolume() helper that holds RLock across lookup+operation, preventing RemoveBlockVolume from closing the volume mid-assignment. - BUG-CP4B3-2: Data race on callback fields read by Run() goroutine. Made StatusCallback/AssignmentSource/AssignmentCallback private, added cbMu mutex and SetXxx() setter methods. Lock held only for load/store, not during callback execution. 7 dev tests + 13 QA adversarial tests = 20 new tests. 972 total unit tests, all passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Ping Qiu	c95500cc57	test: Phase 4A CP4b-1 QA adversarial tests (19 tests) Boundary tests for RoleFromWire, LeaseTTLToWire overflow/clamp/negative, ToBlockVolumeInfoMessage with primary/stale/closed/concurrent volumes, BlockVolumeAssignment roundtrip, and heartbeat collection edge cases. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Ping Qiu	ffdde15bcd	feat: Phase 4A CP4b-1 -- wire types, conversion helpers, heartbeat collection Add BlockVolumeInfoMessage, BlockVolumeShortInfoMessage, BlockVolumeAssignment wire-type structs (proto-shaped Go structs). Add conversion helpers with DiskType plumbing, overflow-safe LeaseTTLToWire, validated RoleFromWire. Add CollectBlockVolumeHeartbeat on BlockVolumeStore. 9 new tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Ping Qiu	09c7e40d29	feat: Phase 4A CP4a -- simulated master, assignment sequence tests, BlockVolumeStatus Add SimulatedMaster test helper + 20 assignment sequence tests (8 sequence, 5 failover, 5 adversarial, 2 status). Add BlockVolumeStatus struct and Status() method. Includes QA test files for CP1-CP4a. 940 total unit tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Ping Qiu	b31383e294	feat: Phase 4A CP3 -- promotion, rebuild, split-brain prevention Add master-driven lifecycle operations: promotion, demotion, rebuild, and split-brain prevention. All testable on Windows with mock TCP. New files: - promotion.go: HandleAssignment (single entry point for role changes), promote (Replica/None -> Primary with durable epoch), demote (Primary -> Draining -> Stale with drain timeout) - rebuild.go: RebuildServer (WAL catch-up + full extent streaming), StartRebuild client (WAL catch-up with full extent fallback, two-phase rebuild with second catch-up for concurrent writes) Modified: - wal_writer.go: ScanFrom() method, ErrWALRecycled sentinel - repl_proto.go: rebuild message types + RebuildRequest encode/decode - blockvol.go: assignMu, drainTimeout, rebuildServer fields; HandleAssignment/StartRebuildServer/StopRebuildServer methods; rebuild server stop in Close() - dirty_map.go: Clear() method for full extent rebuild 32 new tests covering WAL scan, promotion/demotion, rebuild server, rebuild client, split-brain prevention, and full lifecycle scenarios. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Ping Qiu	16a796e56d	feat: Phase 4A CP2 — WAL shipping, replica barrier, distributed group commit Primary ships WAL entries to replica over TCP (data channel), confirms durability via barrier RPC (control channel). SyncCache runs local fsync and replica barrier in parallel via MakeDistributedSync. When replica is unreachable, shipper enters permanent degraded mode and falls back to local-only sync (Phase 3 behavior). Key design: two separate TCP ports (data+control), contiguous LSN enforcement, epoch equality check, WAL-full retry on replica, cond.Wait-based barrier with configurable timeout, BarrierFsyncFailed status code. Close lifecycle: shipper → receiver → drain → committer → flusher → fd. New files: repl_proto.go, wal_shipper.go, replica_apply.go, replica_barrier.go, dist_group_commit.go Modified: blockvol.go, blockvol_test.go 27 dev tests + 21 QA tests = 48 new tests; 889 total (609 engine + 280 iSCSI), all passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Ping Qiu	a107685f00	feat: Phase 4A CP1 — epoch, lease, role state machine, write gate Local fencing primitives for block volumes. Every write path validates role + epoch + lease before accepting data. RoleNone (default) skips all checks for Phase 3 backward compatibility. New files: epoch.go, lease.go, role.go, write_gate.go Modified: superblock.go (Epoch field), blockvol.go (fencing fields, writeGate in WriteLBA/Trim), group_commit.go (PostSyncCheck/Gotcha A), dirty_map.go (P3-BUG-9 power-of-2 panic) Bug fixes: BUG-4A-1 (atomic epoch), BUG-4A-2 (CAS SetRole), BUG-4A-3 (mutex SetEpoch), BUG-4A-4 (single role.Load), BUG-4A-6 (safeCallback recover) 837 tests (557 engine + 280 iSCSI), all passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Ping Qiu	80801b0fac	feat: Phase 3 — performance tuning, iSCSI session refactor, store integration Phase 3 delivers five checkpoints: CP1 Engine Tuning: BlockVolConfig tunables, 256-shard DirtyMap, adaptive group commit (low-watermark immediate flush), WAL pressure handling with backpressure and ErrWALFull timeout. CP2 iSCSI Session Refactor: RX/TX goroutine split with respCh (cap 64), txLoop for serialized response writes, StatSN assignment modes. Login phase stays single-goroutine; full-duplex after login. CP3 Store Integration: BlockVolAdapter (iscsi.BlockDevice interface), BlockVolumeStore management, BlockService in volume_server_block.go, CLI flags (--block.listen/dir/iqn.prefix), sw-block-attach.sh helper. CP5 Concurrency Hardening: WAL reuse guard (LSN validation in ReadLBA), opsOutstanding counter with beginOp/endOp + Close drain, appendWithRetry shared by WriteLBA and TrimLBA, flusher LSN guard in FlushOnce. Bug fixes (P3-BUG-2–11): unbounded pending queue cap, Data-Out timeout, flusher error logging, GroupCommitter panic recovery, Close vs concurrent ops guard, target shutdown race, WAL-full retry vs Close, WRITE SAME(16) for XFS, MODE SENSE(10) + VPD 0xB0/0xB2 for Linux kernel compatibility. 797 tests passing (517 engine + 280 iSCSI), go vet clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Ping Qiu	9b7be60b0c	Add QA adversarial tests for iSCSI target (55 tests) 9 categories: PDU, Params, Login, Discovery, SCSI, DataIO, Session, Target, Integration. 2,183 lines. All 229 tests pass (164 dev + 55 QA). No new production bugs found. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Ping Qiu	feef0206ad	fix: address code review findings (nil handler, CmdSN, Data-Out order) 1. Discovery session nil handler crash: reject SCSI commands with Reject PDU when s.scsi is nil (discovery sessions have no target). 2. CmdSN window enforcement: validate incoming CmdSN against [ExpCmdSN, MaxCmdSN] using serial arithmetic. Drop out-of-window commands per RFC 7143 section 4.2.2.1. 3. Data-Out buffer offset validation: enforce BufferOffset == received for ordered data (DataPDUInOrder=Yes). Prevents silent corruption from out-of-order or overlapping data. 4. ImmediateData enforcement: reject immediate data in SCSI command PDU when negotiated ImmediateData=No. 5. UNMAP descriptor length alignment: reject blockDescLen not a multiple of 16 bytes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Ping Qiu	bd73a81e00	fix: handle pipelined SCSI commands during Data-Out collection The Linux kernel iSCSI initiator pipelines multiple SCSI commands on the same TCP connection (command queuing). When a write needs R2T for data beyond the immediate portion, collectDataOut may read a pipelined SCSI command instead of the expected Data-Out PDU. Fix: queue non-Data-Out PDUs received during collectDataOut into a pending buffer. The main dispatch loop drains pending PDUs before reading from the connection. This correctly handles interleaved commands during multi-PDU write transfers. Bug found during WSL2 smoke test: mkfs.ext4 hangs at "Writing superblocks" because inode table zeroing sends large writes that exceed FirstBurstLength, triggering R2T while the kernel has already queued the next command. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Ping Qiu	6546c549eb	fix: iSCSI login and discovery bugs found in WSL2 smoke test - Skip InitiatorAlias in negotiation (was returning NotUnderstood) - Capture TargetName in StageLoginOp direct-jump path (iscsiadm skips security stage, sends CSG=LoginOp directly -- nil SCSIHandler crash) - Add portalAddr to TargetServer for discovery responses (listener on [::] is not routable from WSL2 clients) - Add -portal flag to iscsi-target binary Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Ping Qiu	6a400f6760	feat: add BlockVol engine and iSCSI target (Phase 1 + Phase 2) Phase 1: Extent-mapped block storage engine with WAL, crash recovery, dirty map, flusher, and group commit. 174 tests, zero SeaweedFS imports. Phase 2: Pure Go iSCSI target (RFC 7143) with PDU codec, login negotiation, SendTargets discovery, 12 SCSI opcodes, Data-In/Out/R2T sequencing, session management, and standalone iscsi-target binary. 164 tests. IQN->BlockDevice binding via DeviceLookup interface. Total: 338 tests, 14.6K lines. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
Xiao Wei	9fa95dd2c6	fix: unload leveldb not take effect (#8431)	3 weeks ago
Chris Lu	e4b70c2521	go fix	4 weeks ago
Konstantin Lebedev	01b3125815	[shell]: volume balance capacity by min volume density (#8026) volume balance by min volume density and active volumes	4 weeks ago
Lisandro Pin	a9d12a0792	Implement full scrubbing for EC volumes (#8318) Implement full scrubbing for EC volumes.	1 month ago
Lisandro Pin	11fdb68281	Fix superblock write error checks on volume compaction. (#8352)	1 month ago
Lisandro Pin	0721e3c1e9	Rework volume compaction (a.k.a. vacuuming) logic to cleanly support new parameters. (#8337) We'll build on this to support an "ignore broken needles" option, necessary to properly recover damaged volumes, as described in https://github.com/seaweedfs/seaweedfs/issues/7442#issuecomment-3897784283.	1 month ago
Lisandro Pin	fbe7dd32c2	Implement full scrubbing for regular volumes (#8254) Implement full scrubbing for regular volumes.	1 month ago
Lisandro Pin	1ebc9dd530	Have local EC volume scrubbing check needle integrity whenever possible. (#8334) If local EC scrubbing hits needles whose chunk locations reside entirely in local shards, we can fully reconstruct them and check CRCs for data integrity.	1 month ago
Chris Lu	75faf826d4	Fix LevelDB panic on lazy reload (#8269) (#8307) * fix LevelDB panic on lazy reload Implemented a thread-safe reload mechanism using double-checked locking and a retry loop in Get, Put, and Delete. Added a concurrency test to verify the fix and prevent regressions. Fixes #8269 * refactor: use helper for leveldb fix and remove deprecated ioutil * fix: prevent deadlock by using getFromDb helper Extracted DB lookup to internal helper to avoid recursive RLock in Put/Delete methods. Updated Get to use the helper as well. * fix: resolve syntax error and commit deadlock prevention Fixed a duplicate function declaration syntax error. Verified that getFromDb helper correctly prevents recursive RLock scenarios. * refactor: remove redundant timeout checks Removed nested `if m.ldbTimeout > 0` checks in Get, Put, and Delete methods as suggested in PR review.	1 month ago
Lisandro Pin	e657e7d827	Implement local scrubbing for EC volumes. (#8283)	1 month ago
Lisandro Pin	1a5679a5eb	Implement a `VolumeEcStatus()` RPC for volume servers. (#8006) Just like `VolumeStatus()`, this call allows inspecting details for a given EC volume - including number of files and their total size.	1 month ago
Chris Lu	be6b5db65a	s3: fix health check endpoints returning 404 for HEAD requests #8243 (#8248) * Fix disk errors handling in vacuum compaction When a disk reports IO errors during vacuum compaction (e.g., 'read /mnt/d1/weed/oc_xyz.dat: input/output error'), the vacuum task should signal the error to the master so it can: 1. Drop the faulty volume replica 2. Rebuild the replica from healthy copies Changes: - Add checkReadWriteError() calls in vacuum read paths (ReadNeedleBlob, ReadData, ScanVolumeFile) to flag EIO errors in volume.lastIoError - Preserve error wrapping using %w format instead of %v so EIO propagates correctly - The existing heartbeat logic will detect lastIoError and remove the bad volume Fixes issue #8237 * error * s3: fix health check endpoints returning 404 for HEAD requests #8243	1 month ago
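
Commit be6b5db65a notes that wrapping with %w instead of %v lets EIO propagate correctly. The following is a minimal Go sketch of that difference, not the SeaweedFS code: readBlob, wrapV, wrapW, and isDiskIOError are hypothetical names introduced only for illustration, and the check uses the standard errors.Is against syscall.EIO.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// readBlob is a hypothetical stand-in for a vacuum read path.
// It always fails with EIO to simulate a faulty disk.
func readBlob() error {
	return syscall.EIO
}

// wrapV formats the cause with %v, which flattens it to text
// and drops the underlying error from the chain.
func wrapV() error {
	return fmt.Errorf("read needle blob: %v", readBlob())
}

// wrapW formats the cause with %w, which keeps the underlying
// error reachable through errors.Is and errors.As.
func wrapW() error {
	return fmt.Errorf("read needle blob: %w", readBlob())
}

// isDiskIOError reports whether EIO appears anywhere in err's chain.
func isDiskIOError(err error) bool {
	return errors.Is(err, syscall.EIO)
}

func main() {
	fmt.Println(isDiskIOError(wrapV())) // false: the chain was lost
	fmt.Println(isDiskIOError(wrapW())) // true: EIO is still detectable
}
```

Under these assumptions, a caller several layers up can still recognize a disk I/O failure only when every intermediate layer wraps with %w.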


1041 Commits (e22e57a3f712bec6092ecc997c2f6fabc7d8a82d)