The iceber/iouring-go SubmitRequests returns a RequestSet interface
which cannot be ranged over directly. Use resultSet.Done() to wait
for all completions, then iterate resultSet.Requests().
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace UseIOUring bool with IOBackend IOBackendMode (tri-state):
- "standard" (default): sequential pread/pwrite/fdatasync
- "auto": try io_uring, fall back to standard with warning log
- "io_uring": require io_uring, fail startup if unavailable
NewIOUring now returns ErrIOUringUnavailable instead of silently
falling back — callers decide whether to fail or fall back based
on the requested mode. All mode transitions are logged:
io backend: requested=auto selected=standard reason=...
io backend: requested=io_uring selected=io_uring
CLI: --io-backend=standard|auto|io_uring added to iscsi-target.
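The selection rules above can be sketched as follows (selectBackend and its signature are illustrative, not the real API):

```go
package main

import (
	"errors"
	"fmt"
)

// ErrIOUringUnavailable mirrors the sentinel described above; the
// tryIOUring parameter stands in for calling NewIOUring.
var ErrIOUringUnavailable = errors.New("io_uring unavailable")

// selectBackend resolves a requested IOBackend mode to the backend
// actually used, per the tri-state rules: standard always works, auto
// falls back with a log line, io_uring fails startup if unavailable.
func selectBackend(requested string, tryIOUring func() error) (string, error) {
	switch requested {
	case "standard":
		return "standard", nil
	case "auto":
		if e := tryIOUring(); e != nil {
			fmt.Printf("io backend: requested=auto selected=standard reason=%v\n", e)
			return "standard", nil
		}
		return "io_uring", nil
	case "io_uring":
		if e := tryIOUring(); e != nil {
			return "", e // caller fails startup
		}
		return "io_uring", nil
	}
	return "", fmt.Errorf("unknown io backend %q", requested)
}

func main() {
	sel, _ := selectBackend("auto", func() error { return ErrIOUringUnavailable })
	fmt.Println(sel) // standard
}
```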
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1. HIGH: LinkedWriteFsync now uses SubmitLinkRequests (IOSQE_IO_LINK)
instead of SubmitRequests, ensuring write+fdatasync execute as a
linked chain in the kernel. Falls back to sequential on error.
2. HIGH: PreadBatch/PwriteBatch chunk ops by ring capacity to prevent
"too many requests" rejection when dirty map exceeds ring size (256).
3. MED: CloseBatchIO() added to Flusher, called in BlockVol.Close()
after final flush to release io_uring ring / kernel resources.
4. MED: Sync parity — both standard and io_uring paths now use
fdatasync (via platform-specific fdatasync_linux.go / fdatasync_other.go).
Standard path previously used fsync; now matches io_uring semantics.
On non-Linux, fdatasync falls back to fsync (only option available).
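The ring-capacity chunking in item 2 amounts to splitting a batch into submissions that never exceed the SQ size; a minimal sketch (chunkByRing is illustrative, not the real code):

```go
package main

import "fmt"

// chunkByRing splits a batch of operations into submissions no larger
// than the ring's capacity, so a dirty map bigger than the ring (256)
// cannot trigger a "too many requests" rejection.
func chunkByRing(ops []int, ringSize int) [][]int {
	if ringSize <= 0 {
		ringSize = 1
	}
	var chunks [][]int
	for len(ops) > 0 {
		n := ringSize
		if len(ops) < n {
			n = len(ops)
		}
		chunks = append(chunks, ops[:n])
		ops = ops[n:]
	}
	return chunks
}

func main() {
	// 600 dirty extents against a 256-entry ring -> 3 submissions
	fmt.Println(len(chunkByRing(make([]int, 600), 256))) // 3
}
```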
10 batchio tests, all blockvol tests pass.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add iouring_linux.go (build-tagged linux && !no_iouring) using
iceber/iouring-go for batched pread/pwrite/fdatasync. Includes
linked write+fsync chain for group commit optimization.
iouring_other.go provides silent fallback to standard on non-Linux.
blockvol.go wires UseIOUring config flag through to flusher BatchIO.
NewIOUring gracefully falls back if kernel lacks io_uring support.
10 batchio tests, all blockvol tests pass unchanged.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New package batchio/ with BatchIO interface (PreadBatch, PwriteBatch,
Fsync, LinkedWriteFsync) and standard sequential implementation.
Flusher refactored to use BatchIO: WAL header reads, WAL entry reads,
and extent writes are now batched through the interface. With the
default NewStandard() backend, behavior is identical to before.
UseIOUring config field added for future io_uring opt-in (Linux 5.6+).
9 interface tests, all existing blockvol tests pass unchanged.
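The interface shape and the behavior-preserving standard backend can be sketched like this (Op, BatchIO, and the signatures are assumptions, not the real package API):

```go
package main

import "fmt"

// Op describes one positioned read or write.
type Op struct {
	Off int64
	Buf []byte
}

type fileAt interface {
	ReadAt(p []byte, off int64) (int, error)
	WriteAt(p []byte, off int64) (int, error)
}

// BatchIO sketches the interface shape described above.
type BatchIO interface {
	PreadBatch(f fileAt, ops []Op) error
	PwriteBatch(f fileAt, ops []Op) error
}

// standard executes each op sequentially, which is why the default
// backend leaves behavior identical to before.
type standard struct{}

func (standard) PreadBatch(f fileAt, ops []Op) error {
	for _, op := range ops {
		if _, err := f.ReadAt(op.Buf, op.Off); err != nil {
			return err
		}
	}
	return nil
}

func (standard) PwriteBatch(f fileAt, ops []Op) error {
	for _, op := range ops {
		if _, err := f.WriteAt(op.Buf, op.Off); err != nil {
			return err
		}
	}
	return nil
}

// memFile is a tiny in-memory stand-in for *os.File.
type memFile struct{ data []byte }

func (m *memFile) ReadAt(p []byte, off int64) (int, error)  { return copy(p, m.data[off:]), nil }
func (m *memFile) WriteAt(p []byte, off int64) (int, error) { return copy(m.data[off:], p), nil }

func main() {
	var b BatchIO = standard{}
	f := &memFile{data: make([]byte, 8)}
	_ = b.PwriteBatch(f, []Op{{Off: 2, Buf: []byte("hi")}})
	out := make([]byte, 2)
	_ = b.PreadBatch(f, []Op{{Off: 2, Buf: out}})
	fmt.Println(string(out)) // hi
}
```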
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sparse delta-file snapshots with copy-on-write in the flusher.
Zero write-path overhead when no snapshot is active.
New: snapshot.go (SnapshotBitmap, SnapshotHeader, delta file I/O)
Modified: flusher.go (flushMu, CoW phase in FlushOnce, PauseAndFlush)
Modified: blockvol.go (Create/Read/Delete/Restore/ListSnapshots, recovery)
Modified: wal_writer.go (Reset for snapshot restore)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add ALUA (Asymmetric Logical Unit Access) support to the iSCSI target,
enabling dm-multipath on Linux to automatically detect path state changes
and reroute I/O during HA failover without initiator-side intervention.
- ALUAProvider interface with implicit ALUA (TPGS=0x01)
- INQUIRY byte 5 TPGS bits, VPD 0x83 with NAA+TPG+RTP descriptors
- REPORT TARGET PORT GROUPS handler (MAINTENANCE IN SA=0x0A)
- MAINTENANCE OUT rejection (implicit-only, no SET TPG)
- Standby write rejection (NOT_READY ASC=04h ASCQ=0Bh)
- RoleNone maps to Active/Optimized (standalone single-node compatibility)
- NAA-6 device identifier derived from volume UUID
- -tpg-id flag with [1,65535] validation
- dm-multipath config + setup script (group_by_tpg, ALUA prio)
- 12 unit tests + 16 QA adversarial tests + 4 integration tests
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test harness for running blockvol iSCSI tests on WSL2 and remote nodes
(m01/m02). Includes Node (SSH/local exec), ISCSIClient (discover/login/
logout), WeedTarget (weed volume server lifecycle), and test suites for
smoke, stress, crash recovery, chaos, perf benchmarks, and apps (fio/dd).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add ProcessBlockVolumeAssignments to BlockVolumeStore and wire
AssignmentSource/AssignmentCallback into the heartbeat collector's
Run() loop. Assignments are fetched and applied each tick after
status collection.
Bug fixes:
- BUG-CP4B3-1: TOCTOU between GetBlockVolume and HandleAssignment.
Added withVolume() helper that holds RLock across lookup+operation,
preventing RemoveBlockVolume from closing the volume mid-assignment.
- BUG-CP4B3-2: Data race on callback fields read by Run() goroutine.
Made StatusCallback/AssignmentSource/AssignmentCallback private,
added cbMu mutex and SetXxx() setter methods. Lock held only for
load/store, not during callback execution.
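The withVolume() fix for BUG-CP4B3-1 can be sketched as follows (store/volume are illustrative types, not the real ones):

```go
package main

import (
	"fmt"
	"sync"
)

type volume struct{ closed bool }

// store sketches the TOCTOU fix: withVolume holds the read lock across
// both lookup and operation, so a concurrent remove (which needs the
// write lock) cannot close the volume mid-assignment.
type store struct {
	mu   sync.RWMutex
	vols map[string]*volume
}

func (s *store) withVolume(id string, fn func(*volume) error) error {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.vols[id]
	if !ok {
		return fmt.Errorf("volume %s not found", id)
	}
	return fn(v) // volume cannot be removed while fn runs
}

func (s *store) remove(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if v, ok := s.vols[id]; ok {
		v.closed = true
		delete(s.vols, id)
	}
}

func main() {
	s := &store{vols: map[string]*volume{"v1": {}}}
	err := s.withVolume("v1", func(v *volume) error {
		if v.closed {
			return fmt.Errorf("closed mid-assignment")
		}
		return nil
	})
	fmt.Println(err)
}
```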
7 dev tests + 13 QA adversarial tests = 20 new tests.
972 total unit tests, all passing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BlockVolumeHeartbeatCollector periodically collects block volume status
via callback (standalone, no gRPC wiring yet). Store() accessor on
BlockService. Three bugs found by QA and fixed: Stop-before-Run deadlock
(BUG-CP4B2-1), zero interval panic (BUG-CP4B2-2), callback panic crashes
goroutine (BUG-CP4B2-3). 12 new tests (3 dev + 9 QA adversarial).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Boundary tests for RoleFromWire, LeaseTTLToWire overflow/clamp/negative,
ToBlockVolumeInfoMessage with primary/stale/closed/concurrent volumes,
BlockVolumeAssignment roundtrip, and heartbeat collection edge cases.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add SimulatedMaster test helper + 20 assignment sequence tests (8 sequence,
5 failover, 5 adversarial, 2 status). Add BlockVolumeStatus struct and
Status() method. Includes QA test files for CP1-CP4a. 940 total unit tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add master-driven lifecycle operations: promotion, demotion, rebuild,
and split-brain prevention. All testable on Windows with mock TCP.
New files:
- promotion.go: HandleAssignment (single entry point for role changes),
promote (Replica/None -> Primary with durable epoch), demote
(Primary -> Draining -> Stale with drain timeout)
- rebuild.go: RebuildServer (WAL catch-up + full extent streaming),
StartRebuild client (WAL catch-up with full extent fallback,
two-phase rebuild with second catch-up for concurrent writes)
Modified:
- wal_writer.go: ScanFrom() method, ErrWALRecycled sentinel
- repl_proto.go: rebuild message types + RebuildRequest encode/decode
- blockvol.go: assignMu, drainTimeout, rebuildServer fields;
HandleAssignment/StartRebuildServer/StopRebuildServer methods;
rebuild server stop in Close()
- dirty_map.go: Clear() method for full extent rebuild
32 new tests covering WAL scan, promotion/demotion, rebuild server,
rebuild client, split-brain prevention, and full lifecycle scenarios.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Primary ships WAL entries to replica over TCP (data channel), confirms
durability via barrier RPC (control channel). SyncCache runs local fsync
and replica barrier in parallel via MakeDistributedSync. When the replica is
unreachable, the shipper enters permanent degraded mode and falls back to
local-only sync (Phase 3 behavior).
Key design: two separate TCP ports (data+control), contiguous LSN
enforcement, epoch equality check, WAL-full retry on replica,
cond.Wait-based barrier with configurable timeout, BarrierFsyncFailed
status code. Close lifecycle: shipper → receiver → drain → committer →
flusher → fd.
New files: repl_proto.go, wal_shipper.go, replica_apply.go,
replica_barrier.go, dist_group_commit.go
Modified: blockvol.go, blockvol_test.go
27 dev tests + 21 QA tests = 48 new tests; 889 total (609 engine + 280
iSCSI), all passing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
9 categories: PDU, Params, Login, Discovery, SCSI, DataIO, Session,
Target, Integration. 2,183 lines. All 229 tests pass (164 dev + 55 QA).
No new production bugs found.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1. Discovery session nil handler crash: reject SCSI commands with
Reject PDU when s.scsi is nil (discovery sessions have no target).
2. CmdSN window enforcement: validate incoming CmdSN against
[ExpCmdSN, MaxCmdSN] using serial arithmetic. Drop out-of-window
commands per RFC 7143 section 4.2.2.1.
3. Data-Out buffer offset validation: enforce BufferOffset == received
for ordered data (DataPDUInOrder=Yes). Prevents silent corruption
from out-of-order or overlapping data.
4. ImmediateData enforcement: reject immediate data in SCSI command
PDU when negotiated ImmediateData=No.
5. UNMAP descriptor length alignment: reject blockDescLen not a
multiple of 16 bytes.
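The CmdSN window check in item 2 depends on serial arithmetic so comparisons stay correct when the 32-bit counter wraps; a minimal sketch:

```go
package main

import "fmt"

// serialLTE reports a <= b under 32-bit serial arithmetic (RFC 1982
// style): the signed difference decides ordering across wraparound.
func serialLTE(a, b uint32) bool {
	return int32(b-a) >= 0
}

// inCmdSNWindow validates ExpCmdSN <= cmdSN <= MaxCmdSN; out-of-window
// commands are dropped per RFC 7143 section 4.2.2.1.
func inCmdSNWindow(cmdSN, expCmdSN, maxCmdSN uint32) bool {
	return serialLTE(expCmdSN, cmdSN) && serialLTE(cmdSN, maxCmdSN)
}

func main() {
	// window straddling the 32-bit wrap: ExpCmdSN=0xFFFFFFFE, MaxCmdSN=2
	fmt.Println(inCmdSNWindow(0, 0xFFFFFFFE, 2), inCmdSNWindow(3, 0xFFFFFFFE, 2)) // true false
}
```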
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Linux kernel iSCSI initiator pipelines multiple SCSI commands on
the same TCP connection (command queuing). When a write needs R2T for
data beyond the immediate portion, collectDataOut may read a pipelined
SCSI command instead of the expected Data-Out PDU.
Fix: queue non-Data-Out PDUs received during collectDataOut into a
pending buffer. The main dispatch loop drains pending PDUs before
reading from the connection. This correctly handles interleaved
commands during multi-PDU write transfers.
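The pending-buffer mechanism can be sketched with toy types (pdu/conn and the method names are illustrative, not the real implementation):

```go
package main

import "fmt"

type pdu struct{ op string }

// conn sketches the fix: non-Data-Out PDUs seen while collecting write
// data are parked in pending; the dispatch loop drains pending before
// reading the socket again.
type conn struct {
	pending []pdu // pipelined PDUs parked during collectDataOut
	wire    []pdu // stands in for the TCP stream
}

// nextPDU is what the main dispatch loop calls: pending first, then wire.
func (c *conn) nextPDU() (pdu, bool) {
	if len(c.pending) > 0 {
		p := c.pending[0]
		c.pending = c.pending[1:]
		return p, true
	}
	if len(c.wire) == 0 {
		return pdu{}, false
	}
	p := c.wire[0]
	c.wire = c.wire[1:]
	return p, true
}

// collectDataOut reads from the wire until a Data-Out PDU appears,
// queueing anything else (e.g. a pipelined SCSI command) for later.
func (c *conn) collectDataOut() (pdu, bool) {
	for len(c.wire) > 0 {
		p := c.wire[0]
		c.wire = c.wire[1:]
		if p.op == "data-out" {
			return p, true
		}
		c.pending = append(c.pending, p)
	}
	return pdu{}, false
}

func main() {
	c := &conn{wire: []pdu{{op: "scsi-cmd"}, {op: "data-out"}}}
	d, _ := c.collectDataOut()
	next, _ := c.nextPDU() // the queued command is not lost
	fmt.Println(d.op, next.op) // data-out scsi-cmd
}
```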
Bug found during WSL2 smoke test: mkfs.ext4 hangs at "Writing
superblocks" because inode table zeroing sends large writes that
exceed FirstBurstLength, triggering R2T while the kernel has already
queued the next command.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Skip InitiatorAlias in negotiation (was returning NotUnderstood)
- Capture TargetName in StageLoginOp direct-jump path (iscsiadm skips
security stage, sends CSG=LoginOp directly -- nil SCSIHandler crash)
- Add portalAddr to TargetServer for discovery responses (listener on
[::] is not routable from WSL2 clients)
- Add -portal flag to iscsi-target binary
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add customizable plugin display names and weights
- Add weight field to JobTypeCapability proto message
- Modify ListKnownJobTypes() to return JobTypeInfo with display names and weights
- Modify ListPluginJobTypes() to return JobTypeInfo instead of string
- Sort plugins by weight (descending) then alphabetically
- Update admin API to return enriched job type metadata
- Update plugin UI template to display names instead of IDs
- Consolidate API by reusing existing function names instead of suffixed variants
* perf: optimize plugin job type capability lookup and add null-safe parsing
- Pre-calculate job type capabilities in a map to reduce O(n*m) nested loops
to O(n+m) lookup time in ListKnownJobTypes()
- Add parseJobTypeItem() helper function for null-safe job type item parsing
- Refactor plugin.templ to use parseJobTypeItem() in all job type access points
(hasJobType, applyInitialNavigation, ensureActiveNavigation, renderTopTabs)
- Deterministic capability resolution by using first worker's capability
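The map pre-calculation described above can be sketched as a single pass over all workers (capability and indexCapabilities are illustrative names):

```go
package main

import "fmt"

type capability struct {
	JobType     string
	DisplayName string
	Weight      int32
}

// indexCapabilities builds a jobType -> capability map in one pass,
// turning the nested O(n*m) scan into O(n+m) build-plus-lookup. The
// first worker's capability wins, giving deterministic resolution.
func indexCapabilities(workers [][]capability) map[string]capability {
	idx := make(map[string]capability)
	for _, caps := range workers {
		for _, c := range caps {
			if _, seen := idx[c.JobType]; !seen {
				idx[c.JobType] = c
			}
		}
	}
	return idx
}

func main() {
	workers := [][]capability{
		{{JobType: "vacuum", DisplayName: "Vacuum", Weight: 10}},
		{{JobType: "vacuum", Weight: 99}, {JobType: "balance", Weight: 5}},
	}
	idx := indexCapabilities(workers)
	fmt.Println(idx["vacuum"].DisplayName, len(idx)) // Vacuum 2
}
```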
* templ
* refactor: use parseJobTypeItem helper consistently in plugin.templ
Replace duplicated job type extraction logic at lines 1296-1298 with the
parseJobTypeItem() helper function for consistency and maintainability.
* improve: prefer richer capability metadata and add null-safety checks
- Improve capability selection in ListKnownJobTypes() to prefer capabilities
with non-empty DisplayName and higher Weight across all workers instead of
first-wins approach. Handles mixed-version clusters better.
- Add defensive null checks in renderJobTypeSummary() to safely access
parseJobTypeItem() result before property access
- Ensures malformed or missing entries won't break the rendering pipeline
* fix: preserve existing DisplayName when merging capabilities
Fix capability merge logic to respect existing DisplayName values:
- If existing has DisplayName but candidate doesn't, preserve existing
- If existing doesn't have DisplayName but candidate does, use candidate
- Only use Weight comparison if DisplayName status is equal
- Prevents higher-weight capabilities with empty DisplayName from
overriding capabilities with non-empty DisplayName
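The merge rule above can be written out directly (merge and the capability struct are illustrative, not the real code):

```go
package main

import "fmt"

type capability struct {
	DisplayName string
	Weight      int32
}

// merge applies the stated rule: a non-empty DisplayName always beats
// an empty one, and Weight only decides when both sides agree on
// DisplayName presence.
func merge(existing, candidate capability) capability {
	eHas, cHas := existing.DisplayName != "", candidate.DisplayName != ""
	switch {
	case eHas && !cHas:
		return existing
	case !eHas && cHas:
		return candidate
	default:
		if candidate.Weight > existing.Weight {
			return candidate
		}
		return existing
	}
}

func main() {
	kept := merge(capability{DisplayName: "Vacuum", Weight: 1}, capability{Weight: 9})
	fmt.Println(kept.DisplayName) // named capability survives a heavier unnamed one
}
```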
* feat: drop table location mapping support
Disable external metadata locations for S3 Tables and remove the table location
mapping index entirely. Table metadata must live under the table bucket paths,
so lookups no longer use mapping directories.
Changes:
- Remove mapping lookup and cache from bucket path resolution
- Reject metadataLocation in CreateTable and UpdateTable
- Remove mapping helpers and tests
* compile
* refactor
* fix: accept metadataLocation in S3 Tables API requests
We removed the external table location mapping feature, but still need to
accept and store metadataLocation values from clients like Trino. The mapping
feature was an internal implementation detail that mapped external buckets to
internal table paths. The metadataLocation field itself is part of the S3 Tables
API and should be preserved.
* fmt
* fix: handle MetadataLocation in UpdateTable requests
Mirror handleCreateTable behavior by updating metadata.MetadataLocation
when req.MetadataLocation is provided in UpdateTable requests. This ensures
table metadata location can be updated, not just set during creation.
* fix: move table location mappings to /etc/s3tables to avoid bucket name validation
Fixes #8362 - table location mappings were stored under /buckets/.table-location-mappings
which fails bucket name validation because it starts with a dot. Moving them to
/etc/s3tables resolves the migration error for upgrades.
Changes:
- Table location mappings now stored under /etc/s3tables
- Ensure parent /etc directory exists before creating /etc/s3tables
- Normal writes go to new location only (no legacy compatibility)
- Removed bucket name validation exception for old location
* refactor: simplify lookupTableLocationMapping by removing redundant mappingPath parameter
The mappingPath function parameter was redundant as the path can be derived
from mappingDir and bucket using path.Join. This simplifies the code and
reduces the risk of path mismatches between parameters.
* Fix S3 signature verification behind reverse proxies
When SeaweedFS is deployed behind a reverse proxy (e.g. nginx, Kong,
Traefik), AWS S3 Signature V4 verification fails because the Host header
the client signed with (e.g. "localhost:9000") differs from the Host
header SeaweedFS receives on the backend (e.g. "seaweedfs:8333").
This commit adds a new -s3.externalUrl parameter (and S3_EXTERNAL_URL
environment variable) that tells SeaweedFS what public-facing URL clients
use to connect. When set, SeaweedFS uses this host value for signature
verification instead of the Host header from the incoming request.
New parameter:
-s3.externalUrl (flag) or S3_EXTERNAL_URL (environment variable)
Example: -s3.externalUrl=http://localhost:9000
Example: S3_EXTERNAL_URL=https://s3.example.com
The environment variable is particularly useful in Docker/Kubernetes
deployments where the external URL is injected via container config.
The flag takes precedence over the environment variable when both are set.
At startup, the URL is parsed and default ports are stripped to match
AWS SDK behavior (port 80 for HTTP, port 443 for HTTPS), so
"http://s3.example.com:80" and "http://s3.example.com" are equivalent.
Bugs fixed:
- Default port stripping was removed by a prior PR, causing signature
mismatches when clients connect on standard ports (80/443)
- X-Forwarded-Port was ignored when X-Forwarded-Host was not present
- Scheme detection now uses proper precedence: X-Forwarded-Proto >
TLS connection > URL scheme > "http"
- Test expectations for standard port stripping were incorrect
- expectedHost field in TestSignatureV4WithForwardedPort was declared
but never actually checked (self-referential test)
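The scheme-detection precedence in the third bug fix amounts to one ordered fallback chain (requestScheme is an illustrative helper):

```go
package main

import "fmt"

// requestScheme applies the stated precedence: X-Forwarded-Proto, then
// the connection's TLS state, then the request URL's scheme, then "http".
func requestScheme(xForwardedProto string, tlsConn bool, urlScheme string) string {
	switch {
	case xForwardedProto != "":
		return xForwardedProto
	case tlsConn:
		return "https"
	case urlScheme != "":
		return urlScheme
	}
	return "http"
}

func main() {
	fmt.Println(requestScheme("https", false, "http")) // proxy header wins
}
```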
* Add Docker integration test for S3 proxy signature verification
Docker Compose setup with nginx reverse proxy to validate that the
-s3.externalUrl parameter (or S3_EXTERNAL_URL env var) correctly
resolves S3 signature verification when SeaweedFS runs behind a proxy.
The test uses nginx proxying port 9000 to SeaweedFS on port 8333,
with X-Forwarded-Host/Port/Proto headers set. SeaweedFS is configured
with -s3.externalUrl=http://localhost:9000 so it uses "localhost:9000"
for signature verification, matching what the AWS CLI signs with.
The test can be run with aws CLI on the host or without it by using
the amazon/aws-cli Docker image with --network host.
Test covers: create-bucket, list-buckets, put-object, head-object,
list-objects-v2, get-object, content round-trip integrity,
delete-object, and delete-bucket — all through the reverse proxy.
* Create s3-proxy-signature-tests.yml
* fix CLI
* fix CI
* Update s3-proxy-signature-tests.yml
* address comments
* Update Dockerfile
* add user
* no need for fuse
* Update s3-proxy-signature-tests.yml
* debug
* weed mini
* fix health check
* health check
* fix health checking
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
Add cosi.bucketClassParameters to allow passing arbitrary parameters
to the default BucketClass resource. This enables use cases like
tiered storage where a diskType parameter needs to be set on the
BucketClass to route objects to specific volume servers.
When bucketClassParameters is empty (default), the BucketClass is
rendered without a parameters block, preserving backward compatibility.
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
Co-authored-by: Claude <noreply@anthropic.com>
Some filesystems, such as XFS, may over-allocate disk space when using
volume preallocation. Remove this option from the default docker entrypoint
scripts to allow volumes to use only the necessary disk space.
Fixes: https://github.com/seaweedfs/seaweedfs/issues/6465#issuecomment-3964174718
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Make EC detection context aware
* Update register.go
* Speed up EC detection planning
* Add tests for EC detection planner
* optimizations
detection.go: extracted ParseCollectionFilter (exported) and fed it into the detection loop so detection and tracing share the same parsing/whitelisting logic. The detection loop now iterates over a sorted list of volume IDs, checks the context on every iteration, and only sets hasMore when unprocessed groups remain after hitting maxResults, keeping runtime bounded while still scheduling planned tasks before returning the results.
erasure_coding_handler.go: dropped the duplicated inline filter parsing in emitErasureCodingDetectionDecisionTrace in favor of erasurecodingtask.ParseCollectionFilter; the summary suffix logic now only accounts for the hasMore case that can actually happen.
detection_test.go: updated the helper topology builder to use master_pb.VolumeInformationMessage (matching the current protobuf types) and tightened the cancellation/max-results tests so they reliably exercise the detection logic (cancel before calling Detection, and provide enough disks so one result is produced before the limit).
* use working directory
* fix compilation
* fix compilation
* rename
* go vet
* fix getenv
* address comments, fix error
* Fix SFTP file upload failures with JWT filer tokens (issue #8425)
When JWT authentication is enabled for filer operations via jwt.filer_signing.*
configuration, SFTP server file upload requests were rejected because they lacked
JWT authorization headers.
Changes:
- Added JWT signing key and expiration fields to SftpServer struct
- Modified putFile() to generate and include JWT tokens in upload requests
- Enhanced SFTPServiceOptions with JWT configuration fields
- Updated SFTP command startup to load and pass JWT config to service
This allows SFTP uploads to authenticate with JWT-enabled filers, consistent
with how other SeaweedFS components (S3 API, file browser) handle filer auth.
Fixes #8425
* Apply suggestion from @gemini-code-assist[bot]
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
TestS3MultipartOperationsInheritPutObjectPermissions verifies that multipart
upload operations (CreateMultipartUpload, UploadPart, ListParts,
CompleteMultipartUpload, AbortMultipartUpload, ListMultipartUploads) work
correctly when a user has only s3:PutObject permission granted.
This test validates the behavior where multipart operations are implicitly
granted when s3:PutObject is authorized, as multipart upload is an
implementation detail of putting objects in S3.