Branch:
feature/sw-block
10 Commits (feature/sw-block)
| Author | SHA1 | Message | Date |
|---|---|---|---|
| | bbadeeb89b | feat: Phase 10 CP10-2 -- CSI NVMe/TCP node plugin, 210 tests<br>NVMe/TCP transport support in the CSI driver so Kubernetes pods can mount block volumes via NVMe alongside (or instead of) iSCSI.<br>Transport selection: NVMe preferred when nvme_tcp module loaded + metadata present + nvmeUtil available. Fail-fast on NVMe errors (no silent iSCSI fallback). .transport file persists across CSI restarts.<br>Key changes:<br>- BuildNQN() single source of truth for NQN construction (naming.go)<br>- NVMeUtil interface + realNVMeUtil wrapping nvme-cli (nvme_util.go)<br>- NodeStageVolume/Unstage/Expand dual-transport paths (node.go)<br>- NvmeAddr/NQN fields in VolumeInfo, Controller contexts<br>- VolumeManager NvmeAddr()/VolumeNQN() getters<br>- BlockService NvmeListenAddr()/NQN() accessors<br>- 27 unit tests + 26 QA adversarial tests (nvme_node_test.go, qa_cp102)<br>- Fix: flaky TestQA_Node_ConcurrentStageUnstage (pre-alloc temp dirs)<br>Review fixes applied: F1 (NQN format mismatch), F2 (CreateVolume drops NVMe context), F3 (IsConnected error classification), F4 (findSubsys path validation), F5 (MasterVolumeClient NVMe gap documented).<br>Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> | 3 days ago |
| | 0e234f5c80 | feat: Phase 10 CP10-1 -- NVMe/TCP target MVP, 109 tests<br>NVMe over Fabrics (TCP) target implementation sharing the same BlockVol engine, fencing, replication, and failover as the existing iSCSI target.<br>New package: weed/storage/blockvol/nvme/ (11 files, 2,242 production LOC)<br>- protocol.go: PDU types, opcodes, status codes, marshal/unmarshal<br>- wire.go: TCP reader/writer with header bounds validation<br>- controller.go: IC handshake, per-queue state, command dispatch, KATO<br>- fabric.go: Connect (admin+IO), PropertyGet/Set, Disconnect<br>- identify.go: Controller/Namespace/NS list/NS descriptors (Linux 5.15)<br>- admin.go: SetFeatures, GetFeatures, GetLogPage (SMART/ANA), KeepAlive<br>- io.go: Read (C2HData), Write (inline), Flush, WriteZeros/Trim<br>- server.go: TCP listener, admin session registry, graceful shutdown<br>- adapter.go: BlockVol-to-NVMe bridge, error mapping, ANA state<br>Integration: NVMeConfig + CLI flags (-block.nvme.*), disabled by default.<br>Key design: inline-data writes only (no R2T), MaxH2CDataLength=32KB, single ANA group coherent with BlockVol role, CNTLID session registry for cross-connection IO queues, HostNQN continuity enforcement.<br>Tests: 65 dev + 44 QA adversarial = 109 total, all passing.<br>Bugs fixed during review: IO queue cross-connection (A), header bounds validation (B), write payload size check (C), disconnect error (D), stream desync prevention (E), HostNQN enforcement (F), capsule-before-IC state guard (H), flowCtlOff SQHD timing (I).<br>Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> | 3 days ago |
| | 9acd187587 | feat: Phase 8 complete -- CP8-5 stability gate, lease grant fix, Docker e2e, 13 chaos scenarios<br>Phase 8 closes with all 6 checkpoints done (CP8-1 through CP8-5 + CP8-3-1):<br>- CP8-5: 12/12 enterprise QA scenarios PASS on real hardware (m01/M02)<br>- Master-authoritative lease grants (BUG-CP85-11): master renews primary write leases on every heartbeat response, replacing retain-until-confirmed assignment queue semantics that caused 30s lease expiry<br>- Post-rebuild WAL shipping gap fix (BUG-CP85-1): syncLSNAfterRebuild advances replica nextLSN so WAL entries are accepted after rebuild<br>- Block heartbeat startup race fix (BUG-CP85-10): dynamic blockService check on each tick instead of one-shot at loop start<br>- 8 new tests: 4 engine lease grant + 4 registry lease grant<br>- 13 new YAML scenarios: chaos (kill-loop, partition, disk-full), database integrity (sqlite crash, ext4 fsck), perf baseline, metrics verify, snapshot stress, expand-failover, session storm, role flap, 24h soak<br>- 12 new testrunner actions (database, fsck, grep_log, write_loop_bg, stop_bg, assert_metric_gt/eq/lt) + phase repeat support<br>- Docker compose setup + getting-started guide for block storage users<br>- 960+ cumulative unit tests, 24 YAML scenarios<br>Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> | 4 days ago |
| | da1b81d1c9 | feat: CP8-3-1 durability modes + testrunner platform + 21 adversarial tests<br>Durability mode implementation (sync_all, sync_quorum, best_effort):<br>- DurabilityMode type with superblock persistence, parse/validate/string<br>- MakeDistributedSync mode-aware barrier enforcement in dist_group_commit<br>- blockerr sentinel package (ErrDurabilityBarrierFailed, ErrDurabilityQuorumLost)<br>- gRPC create path: mode validation, idempotent create consistency, partial cleanup<br>- F1: strict mode rejects partial replica provisioning with cleanup<br>- F3: empty heartbeat does not overwrite persisted strict mode<br>- F4: SCSI error mapping uses errors.Is sentinels (not string matching)<br>- Proto/wire/blockapi/CLI/UI plumbing for durability_mode field<br>- Observability dashboard: cluster health cards + per-volume columns<br>Testrunner platform (YAML-driven integration test framework):<br>- Engine, parser, registry, reporter (JUnit XML + HTML), metrics scraping<br>- 52 registered actions: block, iSCSI, I/O, fault injection, assertions<br>- Baseline regression framework with 7 hard-fail conditions<br>- 15 YAML scenarios (smoke, crash, HA, fault, consistency, snapshot)<br>- 49 unit tests for testrunner internals<br>QA adversarial suite (21 tests, all PASS):<br>- Idempotent create mode/RF mismatch detection<br>- Heartbeat mode downgrade prevention (F3)<br>- sync_all/sync_quorum partial replica enforcement (F1)<br>- Concurrent create race safety<br>- Failover/expand mode preservation<br>- Cleanup resilience when delete fails<br>- Master restart auto-register mode handling<br>- Superblock roundtrip all 3 modes<br>- Validate edge cases (mode×RF matrix)<br>- RequiredReplicas quorum math verification<br>- Sentinel error categorization<br>Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> | 5 days ago |
| | 979a9b496c | feat: Phase 8 CP8-1/2/3/4 -- ops control plane, multi-replica, CSI snapshots, observability<br>CP8-1: HTTP REST API (create/delete/lookup/list/assign/servers), blockapi Go client with multi-master failover, 5 shell commands, HTML dashboard at /block/.<br>CP8-2: RF=2/RF=3 multi-replica support -- ShipperGroup fan-out, distributed sync, health scoring, segment-based scrub, gated promotion (heartbeat freshness + WAL LSN + role checks), failover/rebuild for N>2 replicas.<br>CP8-3: CSI snapshot + expansion -- CreateSnapshot/DeleteSnapshot/ListSnapshots RPCs, NodeExpandVolume with iSCSI rescan, snapshot ID helpers, 20 adversarial tests covering concurrent ops, edge cases, and error injection.<br>CP8-4: Observability -- EngineMetrics atomic counters for flusher/group-commit/WAL-shipper/scrub, 10 new Prometheus metrics, barrier_lag_lsn SLO gauge, failover/promotion/rebuild counters, request ID correlation in master gRPC logs, baseline regression framework with 7 hard-fail conditions.<br>Total: 63 files, ~11.2K LOC, 160+ new tests.<br>Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> | 6 days ago |
| | 8b2b5f6f66 | feat: Phase 6 CP6-3 -- failover + rebuild in Kubernetes, 126 tests<br>Wire low-level fencing primitives to master/VS control plane and CSI:<br>- Proto: replica/rebuild address fields on assignment/info/response messages<br>- Assignment queue: retain-until-confirmed (Peek+Confirm), stale epoch pruning<br>- VS assignment receiver: processes assignments from HeartbeatResponse<br>- BlockService replication: ProcessAssignments, deterministic ports (FNV hash)<br>- Registry replica tracking: SetReplica/ClearReplica/SwapPrimaryReplica<br>- CreateBlockVolume: primary + replica, enqueues assignments, single-copy mode<br>- Failover: lease-aware promotion, deferred timers with cancellation on reconnect<br>- ControllerPublish: returns fresh primary iSCSI address after failover<br>- Recovery: recoverBlockVolumes drains pendingRebuilds, enqueues Rebuilding<br>- Real integration tests on M02: failover address switch, rebuild data consistency, full lifecycle failover+rebuild (3 tests, all PASS)<br>Review fixes (12 findings, 5 High, 5 Medium, 2 Low):<br>- R1-1: AllocateBlockVolume returns replication ports<br>- R1-2: setupPrimaryReplication starts rebuild server<br>- R1-3: VS sends periodic block heartbeat for assignment confirmation<br>- R2-F1: LastLeaseGrant set before Register (no stale-lease race)<br>- R2-F2: Deferred promotion timers cancelled on VS reconnect<br>- R2-F3: SwapPrimaryReplica uses RoleToWire instead of uint32(1)<br>- R2-F4: DeleteBlockVolume deletes replica (best-effort)<br>- R2-F5: SwapPrimaryReplica computes epoch atomically under lock<br>- QA: SetReplica removes old replica from byServer index (BUG-QA-CP63-1)<br>126 CP6-3 tests (67 dev + 48 QA + 8 integration + 3 real). Cumulative Phase 6: 352 tests. All PASS.<br>Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> | 6 days ago |
| | 5a9a52f2d0 | feat: Phase 6 CP6-2 -- CSI control-plane integration + csi-sanity/k3s validation<br>CP6-2 wires the CSI driver to SeaweedFS master/volume-server control plane:<br>- Proto: block volume messages in master.proto/volume_server.proto, codegen<br>- Master registry: in-memory BlockVolumeRegistry with Pending->Active status, full/delta heartbeat, inflight lock, placement (fewest volumes)<br>- VS gRPC: AllocateBlockVolume/DeleteBlockVolume handlers, shared naming<br>- Master RPCs: CreateBlockVolume (retry up to 3 servers), Delete, Lookup<br>- Heartbeat: block volume fields wired into bidirectional stream<br>- CSI Controller: VolumeBackend interface (Local + Master), returns volume_context<br>- CSI Node: reads volume_context for remote targets, staged map + IQN derivation<br>- Mode flag: --mode=controller/node/all, --master for control-plane<br>- K8s manifests: csi-driver.yaml, csi-controller.yaml, csi-node.yaml<br>csi-sanity conformance (33 pass, 58 skip) found 6 bugs:<br>- BUG-SANITY-1/2/3: missing VolumeCapabilities/VolumeCapability validation<br>- BUG-SANITY-4: NodePublish used mount instead of bind mount<br>- BUG-SANITY-5: NodeUnpublish didn't remove target path<br>- BUG-SANITY-6: NodeUnpublish failed on unmounted path<br>k3s Level 4 (PVC->Pod data persistence) found 1 bug:<br>- BUG-K3S-1: IsLoggedIn didn't handle iscsiadm exit code 21<br>226 CSI tests + 54 server tests = 280 new tests, all passing.<br>Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> | 1 week ago |
| | a089bf6828 | feat: Phase 4A CP4b-2 -- heartbeat collector, 3 bug fixes, 9 QA tests<br>BlockVolumeHeartbeatCollector periodically collects block volume status via callback (standalone, no gRPC wiring yet). Store() accessor on BlockService.<br>Three bugs found by QA and fixed: Stop-before-Run deadlock (BUG-CP4B2-1), zero interval panic (BUG-CP4B2-2), callback panic crashes goroutine (BUG-CP4B2-3).<br>12 new tests (3 dev + 9 QA adversarial).<br>Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> | 1 week ago |
| | ffdde15bcd | feat: Phase 4A CP4b-1 -- wire types, conversion helpers, heartbeat collection<br>Add BlockVolumeInfoMessage, BlockVolumeShortInfoMessage, BlockVolumeAssignment wire-type structs (proto-shaped Go structs). Add conversion helpers with DiskType plumbing, overflow-safe LeaseTTLToWire, validated RoleFromWire. Add CollectBlockVolumeHeartbeat on BlockVolumeStore. 9 new tests.<br>Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> | 1 week ago |
| | 80801b0fac | feat: Phase 3 — performance tuning, iSCSI session refactor, store integration<br>Phase 3 delivers five checkpoints:<br>CP1 Engine Tuning: BlockVolConfig tunables, 256-shard DirtyMap, adaptive group commit (low-watermark immediate flush), WAL pressure handling with backpressure and ErrWALFull timeout.<br>CP2 iSCSI Session Refactor: RX/TX goroutine split with respCh (cap 64), txLoop for serialized response writes, StatSN assignment modes. Login phase stays single-goroutine; full-duplex after login.<br>CP3 Store Integration: BlockVolAdapter (iscsi.BlockDevice interface), BlockVolumeStore management, BlockService in volume_server_block.go, CLI flags (--block.listen/dir/iqn.prefix), sw-block-attach.sh helper.<br>CP5 Concurrency Hardening: WAL reuse guard (LSN validation in ReadLBA), opsOutstanding counter with beginOp/endOp + Close drain, appendWithRetry shared by WriteLBA and TrimLBA, flusher LSN guard in FlushOnce.<br>Bug fixes (P3-BUG-2–11): unbounded pending queue cap, Data-Out timeout, flusher error logging, GroupCommitter panic recovery, Close vs concurrent ops guard, target shutdown race, WAL-full retry vs Close, WRITE SAME(16) for XFS, MODE SENSE(10) + VPD 0xB0/0xB2 for Linux kernel compatibility.<br>797 tests passing (517 engine + 280 iSCSI), go vet clean.<br>Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> | 1 week ago |
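
Commit da1b81d1c9 above names three durability modes (sync_all, sync_quorum, best_effort), a DurabilityMode type with parse/validate/string helpers, and "RequiredReplicas quorum math". The following is a minimal sketch of how such a type could look in Go, not the branch's actual code. The function signatures, the constant names, and the quorum rule (sync_quorum waits for a majority of the RF copies, counting the primary) are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// DurabilityMode is a hypothetical stand-in for the type named in the commit.
type DurabilityMode uint8

const (
	BestEffort DurabilityMode = iota // acknowledge without waiting for replicas
	SyncQuorum                       // wait for a majority of copies
	SyncAll                          // wait for every replica
)

// String returns the mode name as it is spelled in the commit message.
func (m DurabilityMode) String() string {
	switch m {
	case BestEffort:
		return "best_effort"
	case SyncQuorum:
		return "sync_quorum"
	case SyncAll:
		return "sync_all"
	default:
		return fmt.Sprintf("unknown(%d)", uint8(m))
	}
}

// ParseDurabilityMode maps a mode name to its value and rejects anything else.
func ParseDurabilityMode(s string) (DurabilityMode, error) {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "best_effort":
		return BestEffort, nil
	case "sync_quorum":
		return SyncQuorum, nil
	case "sync_all":
		return SyncAll, nil
	}
	return BestEffort, fmt.Errorf("unknown durability mode %q", s)
}

// RequiredReplicas returns how many replica acknowledgements (excluding the
// primary) a write barrier must collect for replication factor rf.
// Assumed rule: a quorum is rf/2+1 copies including the primary, so the
// primary needs rf/2 replica acks.
func RequiredReplicas(m DurabilityMode, rf int) (int, error) {
	if rf < 1 {
		return 0, fmt.Errorf("replication factor must be >= 1, got %d", rf)
	}
	switch m {
	case BestEffort:
		return 0, nil
	case SyncQuorum:
		return rf / 2, nil
	case SyncAll:
		return rf - 1, nil
	default:
		return 0, fmt.Errorf("unknown durability mode %d", uint8(m))
	}
}

func main() {
	// Print the mode x RF matrix for RF=1..3.
	for _, name := range []string{"best_effort", "sync_quorum", "sync_all"} {
		mode, err := ParseDurabilityMode(name)
		if err != nil {
			panic(err)
		}
		for rf := 1; rf <= 3; rf++ {
			n, _ := RequiredReplicas(mode, rf)
			fmt.Printf("%-12s RF=%d -> wait for %d replica ack(s)\n", mode, rf, n)
		}
	}
}
```

Under this assumed rule, sync_quorum and sync_all require the same number of acknowledgements at RF=2 and only diverge at RF=3, which is one reason a mode×RF validation matrix like the one listed in the QA suite is worth testing explicitly.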