AllocateBlockVolumeResponse used bs.ListenAddr() to derive replica
addresses. When the VS binds to ":port" (no explicit IP), the host
resolved to an empty string, producing ":dataPort" as the replica
address. This ":port" then propagated through master assignments to
both the primary and replica sides.
Now canonicalizes empty/wildcard host using PreferredOutboundIP()
before constructing replication addresses. Also exported
PreferredOutboundIP for use by the server package.
This is the source fix — all downstream paths (heartbeat, API
response, assignment) inherit the canonical address.
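A minimal sketch of the canonicalization, using stdlib address helpers (the function name and the outbound-IP parameter are illustrative, not the actual API):

```go
package main

import (
	"fmt"
	"net"
)

// canonicalReplicaAddr replaces an empty or wildcard host in a listener
// address with a routable IP before the address is handed out to peers.
func canonicalReplicaAddr(listenAddr, outboundIP string) (string, error) {
	host, port, err := net.SplitHostPort(listenAddr)
	if err != nil {
		return "", err
	}
	if host == "" || host == "0.0.0.0" || host == "::" {
		host = outboundIP // e.g. result of a PreferredOutboundIP()-style probe
	}
	return net.JoinHostPort(host, port), nil
}

func main() {
	addr, _ := canonicalReplicaAddr(":9334", "192.168.1.5")
	fmt.Println(addr) // 192.168.1.5:9334
}
```

Fixing this at the point where the address is constructed is what lets every downstream consumer inherit the canonical form for free.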
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
setupReplicaReceiver now reads back canonical addresses from
the ReplicaReceiver (which applies CP13-2 canonicalization)
instead of storing raw assignment addresses in replStates.
This fixes the API-level leak where replica_data_addr showed
":port" instead of "ip:port" in /block/volumes responses,
even though the engine-level CP13-2 fix was working.
New BlockVol.ReplicaReceiverAddr() returns canonical addresses
from the running receiver, falling back to assignment addresses
if the receiver hasn't reported.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rebuildFullExtent updated superblock.WALCheckpointLSN but not the
flusher's internal checkpointLSN. NewReplicaReceiver then read a
stale 0 from flusher.CheckpointLSN(), leaving the post-rebuild
flushedLSN wrong.
Added Flusher.SetCheckpointLSN() and call it after rebuild
superblock persist. TestRebuild_PostRebuild_FlushedLSN_IsCheckpoint
flips FAIL→PASS.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test used createSyncAllPair(t) but discarded the replica
return value, leaving the volume file open. On Windows this
caused TempDir cleanup to fail. All 7 CP13-1 baseline FAILs
now PASS.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds per-replica state reporting in heartbeat so master can identify
which specific replica needs rebuild, not just a volume-level boolean.
New ReplicaShipperStatus{DataAddr, State, FlushedLSN} type reported
via ReplicaShipperStates field on BlockVolumeInfoMessage. Populated
from ShipperGroup.ShipperStates() on each heartbeat. Scales to RF=3+.
V1 constraints (explicit):
- NeedsRebuild cleared only by control-plane reassignment (no local exit)
- Post-rebuild replica re-enters as Disconnected/bootstrap, not InSync
- flushedLSN = checkpointLSN after rebuild (durable baseline only)
4 new tests: heartbeat per-replica state, NeedsRebuild reporting,
rebuild-complete-reenters-InSync (full cycle), epoch mismatch abort.
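A sketch of the reported status, with the surrounding shipper type assumed for illustration:

```go
package main

import "fmt"

// ReplicaShipperStatus follows the field names in the commit text; the
// shipper type around it is a stand-in for the real one.
type ReplicaShipperStatus struct {
	DataAddr   string
	State      string // e.g. "InSync", "CatchingUp", "NeedsRebuild"
	FlushedLSN uint64
}

type shipper struct {
	dataAddr   string
	state      string
	flushedLSN uint64
}

// shipperStates mirrors ShipperGroup.ShipperStates(): one status per
// replica, so the master can target a rebuild at a specific replica
// rather than acting on a volume-level boolean.
func shipperStates(shippers []shipper) []ReplicaShipperStatus {
	out := make([]ReplicaShipperStatus, 0, len(shippers))
	for _, s := range shippers {
		out = append(out, ReplicaShipperStatus{s.dataAddr, s.state, s.flushedLSN})
	}
	return out
}

func main() {
	g := []shipper{
		{"10.0.0.2:9334", "InSync", 42},
		{"10.0.0.3:9334", "NeedsRebuild", 7},
	}
	for _, st := range shipperStates(g) {
		fmt.Printf("%s state=%s flushed=%d\n", st.DataAddr, st.State, st.FlushedLSN)
	}
}
```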
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Flusher now holds WAL entries needed by recoverable replicas.
Both AdvanceTail (physical space) and checkpointLSN (scan gate)
are gated by the minimum flushed LSN across catch-up-eligible
replicas.
New methods on ShipperGroup:
- MinRecoverableFlushedLSN() (uint64, bool): pure read, returns
min flushed LSN across InSync/Degraded/Disconnected/CatchingUp
replicas with known progress. Excludes NeedsRebuild.
- EvaluateRetentionBudgets(timeout): separate mutation step,
escalates replicas that exceed walRetentionTimeout (5m default)
to NeedsRebuild, releasing their WAL hold.
Flusher integration: evaluates budgets then queries floor on each
flush cycle. If floor < maxLSN, holds both checkpoint and tail.
Extent writes proceed normally (reads work), only WAL reclaim
is deferred.
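The retention floor can be sketched as a pure function over replica states (simplified types, names assumed):

```go
package main

import "fmt"

type replica struct {
	state      string
	flushedLSN uint64
	known      bool // has the replica ever reported flushed progress?
}

// minRecoverableFlushedLSN mirrors MinRecoverableFlushedLSN(): a pure
// read returning the minimum flushed LSN across catch-up-eligible
// replicas with known progress. NeedsRebuild replicas are excluded,
// which releases their WAL hold. ok=false means no replica contributes
// a floor, so nothing constrains WAL reclaim.
func minRecoverableFlushedLSN(rs []replica) (uint64, bool) {
	var min uint64
	ok := false
	for _, r := range rs {
		if r.state == "NeedsRebuild" || !r.known {
			continue
		}
		if !ok || r.flushedLSN < min {
			min, ok = r.flushedLSN, true
		}
	}
	return min, ok
}

func main() {
	rs := []replica{
		{"InSync", 100, true},
		{"Disconnected", 40, true}, // lagging but recoverable: holds the floor
		{"NeedsRebuild", 5, true},  // excluded: no longer holds WAL
	}
	floor, ok := minRecoverableFlushedLSN(rs)
	fmt.Println(floor, ok) // 40 true
}
```

Keeping the read pure and pushing the escalation into a separate EvaluateRetentionBudgets step is what lets the flusher query the floor on every cycle without taking mutation locks.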
LastContactTime on WALShipper: updated on barrier success,
handshake success, and catch-up completion. Not on Ship (TCP
write only). Avoids misclassifying idle-but-healthy replicas.
CP13-6 ships with timeout budget only. walRetentionMaxBytes
is deferred (documented as partial slice).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Concurrent WriteLBA/Trim calls could deliver WAL entries to replicas
out of LSN order: two goroutines allocate LSN 4 and 5 concurrently,
but LSN 5 could reach the replica first via ShipAll, causing the
replica to reject it as an LSN gap.
shipMu now wraps nextLSN.Add + wal.Append + ShipAll in both
WriteLBA and Trim, guaranteeing LSN-ordered delivery to replicas
under concurrent writers.
The dirty map update and WAL pressure check happen after shipMu
is released — they don't need ordering guarantees.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
doReconnectAndCatchUp() now uses the replicaFlushedLSN returned by
the reconnect handshake as the catch-up start point, not the
shipper's stale cached value. The replica may have less durable
progress than the shipper last knew.
ReplicaReceiver initialization: flushedLSN now set from the
volume's checkpoint LSN (durable by definition), not nextLSN
(which includes unflushed entries). receivedLSN still uses
nextLSN-1 since those entries are in the WAL buffer even if
not yet synced.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Updated 3 reconnect tests to stop/restart the ReplicaReceiver on
the same addresses WITHOUT calling SetReplicaAddr. This preserves
the shipper object, its ReplicaFlushedLSN, HasFlushedProgress flag,
and catch-up state across the disconnect/reconnect cycle.
All 3 tests now PASS:
- TestReconnect_CatchupFromRetainedWal
- CatchupReplay_DataIntegrity_AllBlocksMatch
- CatchupReplay_DuplicateEntry_Idempotent
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds the sync_all reconnect protocol: when a degraded shipper
reconnects, it performs a handshake (ResumeShipReq/Resp) to
determine the replica's durable progress, then streams missed
WAL entries to close the gap before resuming live shipping.
New wire messages:
- MsgResumeShipReq (0x03): primary sends epoch, headLSN, retainStart
- MsgResumeShipResp (0x04): replica returns status + flushedLSN
- MsgCatchupDone (0x05): marks end of catch-up stream
Decision matrix after handshake (R = replica flushedLSN, H = primary
headLSN, S = retainStart):
- R == H: already caught up → InSync
- S <= R+1 <= H: recoverable gap → CatchingUp → stream → InSync
- R+1 < S: gap exceeds retained WAL → NeedsRebuild
- R > H: impossible progress → NeedsRebuild
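The matrix transcribes directly into a pure decision function (state labels follow the commit text; the function name is illustrative):

```go
package main

import "fmt"

// resumeDecision: R = replica's flushedLSN, H = primary's headLSN,
// S = oldest retained WAL LSN (retainStart).
func resumeDecision(r, h, s uint64) string {
	switch {
	case r > h:
		return "NeedsRebuild" // impossible progress
	case r == h:
		return "InSync" // already caught up
	case s <= r+1:
		return "CatchingUp" // recoverable gap: stream [r+1, h]
	default:
		return "NeedsRebuild" // gap exceeds retained WAL
	}
}

func main() {
	fmt.Println(resumeDecision(10, 10, 3)) // InSync
	fmt.Println(resumeDecision(5, 10, 3))  // CatchingUp
	fmt.Println(resumeDecision(1, 10, 5))  // NeedsRebuild
	fmt.Println(resumeDecision(12, 10, 3)) // NeedsRebuild
}
```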
WALAccess interface: narrow abstraction (RetainedRange + StreamEntries)
avoids coupling shipper to raw WAL internals.
Bootstrap vs reconnect split: fresh shippers (HasFlushedProgress=false)
use CP13-4 bootstrap path. Previously-synced shippers use handshake.
Catch-up retry budget: maxCatchupRetries=3 before NeedsRebuild.
ReplicaReceiver now initializes receivedLSN/flushedLSN from volume's
nextLSN on construction (handles receiver restart on existing volume).
TestBug2_SyncAll_SyncCache_AfterDegradedShipperRecovers flips FAIL→PASS.
All previously-passing baseline tests remain green.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces binary degraded flag with ReplicaState type:
Disconnected, Connecting, CatchingUp, InSync, Degraded, NeedsRebuild.
Ship() allowed from Disconnected (bootstrap: data must flow before
first barrier) and InSync (steady state). Ship does NOT change state.
Barrier() gating:
- InSync: proceed normally
- Disconnected: bootstrap path (connect + barrier)
- Degraded: reconnect both data+ctrl connections, then barrier
- Connecting/CatchingUp/NeedsRebuild: rejected immediately
Only barrier success grants InSync. Reconnect alone does not.
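The gating table above reduces to a small decision function (the action labels are illustrative shorthand for what the real shipper does):

```go
package main

import "fmt"

// barrierAction maps a replica state to how Barrier() treats it.
func barrierAction(state string) string {
	switch state {
	case "InSync":
		return "proceed"
	case "Disconnected":
		return "bootstrap" // connect, then barrier
	case "Degraded":
		return "reconnect" // re-dial data+ctrl connections, then barrier
	default: // Connecting, CatchingUp, NeedsRebuild
		return "reject"
	}
}

func main() {
	for _, s := range []string{"InSync", "Disconnected", "Degraded", "CatchingUp", "NeedsRebuild"} {
		fmt.Printf("%s -> %s\n", s, barrierAction(s))
	}
}
```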
IsDegraded() now means "not sync-eligible" (any non-InSync state).
InSyncCount() added to ShipperGroup.
dist_group_commit.go: removed AllDegraded short-circuit that
prevented bootstrap. Barrier attempts always run — individual
shippers handle their own state-based gating.
8 new CP13-4 tests; TestBarrier_RejectsReplicaNotInSync flips FAIL→PASS.
All previously-passing baseline tests remain green.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Barrier response extended from 1-byte status to 9-byte payload
carrying the replica's durable WAL progress (FlushedLSN). Updated
only after successful fd.Sync(), never on receive/append/send.
Replica side: new flushedLSN field on ReplicaReceiver, advanced
only in handleBarrier after proven contiguous receipt + sync.
max() guard prevents regression.
Shipper side: new replicaFlushedLSN (authoritative) replacing
ShippedLSN (diagnostic only). Monotonic CAS update from barrier
response. hasFlushedProgress flag tracks whether replica supports
the extended protocol.
ShipperGroup: MinReplicaFlushedLSN() returns (uint64, bool) —
minimum across shippers with known progress. (0, false) for empty
groups or legacy replicas.
Backward compat: 1-byte legacy responses decoded as FlushedLSN=0.
Legacy replicas explicitly excluded from sync_all correctness.
7 new tests: roundtrip, backward compat, flush-only-after-sync,
not-on-receive, shipper update, monotonicity, group minimum.
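A sketch of the extended response layout, assuming one status byte followed by an 8-byte FlushedLSN (the commit does not specify byte order; big-endian is an assumption here):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeBarrierResp builds the 9-byte extended response.
func encodeBarrierResp(status byte, flushedLSN uint64) []byte {
	buf := make([]byte, 9)
	buf[0] = status
	binary.BigEndian.PutUint64(buf[1:], flushedLSN)
	return buf
}

// decodeBarrierResp accepts both the 9-byte extended form and a legacy
// 1-byte response, which decodes as FlushedLSN=0 with hasFlushed=false,
// matching the backward-compat rule above.
func decodeBarrierResp(b []byte) (status byte, flushedLSN uint64, hasFlushed bool) {
	status = b[0]
	if len(b) >= 9 {
		return status, binary.BigEndian.Uint64(b[1:9]), true
	}
	return status, 0, false
}

func main() {
	st, lsn, ok := decodeBarrierResp(encodeBarrierResp(0, 1234))
	fmt.Println(st, lsn, ok) // 0 1234 true
	_, lsn, ok = decodeBarrierResp([]byte{0})
	fmt.Println(lsn, ok) // 0 false
}
```

The hasFlushed flag is what lets the shipper distinguish "replica reports LSN 0" from "replica predates the extended protocol" when computing the group minimum.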
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ReplicaReceiver.DataAddr()/CtrlAddr() now return canonical ip:port
instead of raw listener addresses that may be wildcard (:port,
0.0.0.0:port, [::]:port).
New canonicalizeListenerAddr() resolves wildcard IPs using the
provided advertised host (from VS listen address). Falls back to
outbound-IP detection when no advertised host is available.
NewReplicaReceiver accepts optional advertisedHost parameter for
multi-NIC correctness. In production, the assignment path already
provides canonical addresses; this fix ensures test patterns with
:0 bind also produce routable addresses.
7 new tests. TestBug3_ReplicaAddr_MustBeIPPort_WildcardBind flips
from FAIL to PASS.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Same-epoch reconciliation now trusts reported roles first:
- one claims primary, other replica → trust roles
- both claim primary → WALHeadLSN heuristic tiebreak
- both claim replica → keep existing, log ambiguity
Replaced addServerAsReplica with upsertServerAsReplica: checks
for existing replica entry by server name before appending.
Prevents duplicate ReplicaInfo rows during restart/replay windows.
2 new tests: role-trusted same-epoch, duplicate replica prevention.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a second server reports the same volume during master restart,
UpdateFullHeartbeat now uses epoch-based tie-breaking instead of
first-heartbeat-wins:
1. Higher epoch wins as primary — old entry demoted to replica
2. Same epoch — higher WALHeadLSN wins (heuristic, warning logged)
3. Lower epoch — added as replica
Applied in both code paths: the auto-register branch (no entry
exists yet for this name) and the unlinked-server branch (entry
exists but this server is not in it).
This is a deterministic reconstruction improvement, not ground
truth. The long-term fix is persisting authoritative volume state.
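The tie-break rules above can be sketched as a pure function (the report type and return labels are illustrative):

```go
package main

import "fmt"

type volReport struct {
	epoch      uint64
	walHeadLSN uint64
}

// reconcile decides how a newly reported volume relates to the existing
// registry entry: "promote" means the reporter replaces the primary
// (the old primary is demoted to replica), "replica" means the reporter
// is added as a replica.
func reconcile(existing, incoming volReport) string {
	switch {
	case incoming.epoch > existing.epoch:
		return "promote" // higher epoch wins as primary
	case incoming.epoch == existing.epoch && incoming.walHeadLSN > existing.walHeadLSN:
		return "promote" // same-epoch heuristic; the real path logs a warning
	default:
		return "replica" // lower epoch, or losing the tiebreak
	}
}

func main() {
	fmt.Println(reconcile(volReport{3, 100}, volReport{4, 50}))  // promote
	fmt.Println(reconcile(volReport{3, 100}, volReport{3, 200})) // promote
	fmt.Println(reconcile(volReport{3, 100}, volReport{2, 500})) // replica
}
```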
5 new tests covering all reconciliation scenarios.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lookup() and ListAll() now return value copies (not pointers to
internal registry state). Callers can no longer mutate registry
entries without holding a lock.
Added clone() on BlockVolumeEntry with deep-copied Replicas slice.
Added UpdateEntry(name, func(*BlockVolumeEntry)) for locked mutation.
ListByServer() also returns copies.
Migrated 1 production mutation (ReplicaPlacement + Preset in create
handler) and ~20 test mutations to use UpdateEntry.
5 new copy-correctness tests: Lookup returns copy, Replicas slice
isolated, ListAll returns copies, UpdateEntry mutates, UpdateEntry
not-found error.
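The copy-on-read pattern in miniature, with the types trimmed to the parts that matter for aliasing (the real registry holds a lock around these paths; it is elided here):

```go
package main

import "fmt"

type ReplicaInfo struct{ Addr string }

type BlockVolumeEntry struct {
	Name     string
	Replicas []ReplicaInfo
}

// clone deep-copies the Replicas slice so the copy shares no backing
// array with the registry's live entry.
func (e *BlockVolumeEntry) clone() BlockVolumeEntry {
	c := *e
	c.Replicas = append([]ReplicaInfo(nil), e.Replicas...)
	return c
}

type registry struct {
	entries map[string]*BlockVolumeEntry
}

// Lookup returns a value copy: mutating it cannot corrupt registry state.
func (r *registry) Lookup(name string) (BlockVolumeEntry, bool) {
	e, ok := r.entries[name]
	if !ok {
		return BlockVolumeEntry{}, false
	}
	return e.clone(), true
}

// UpdateEntry is the sanctioned mutation path: the callback runs against
// the live entry (under the registry lock, in the real code).
func (r *registry) UpdateEntry(name string, fn func(*BlockVolumeEntry)) bool {
	e, ok := r.entries[name]
	if ok {
		fn(e)
	}
	return ok
}

func main() {
	r := &registry{entries: map[string]*BlockVolumeEntry{
		"v1": {Name: "v1", Replicas: []ReplicaInfo{{Addr: "10.0.0.2:9334"}}},
	}}
	cp, _ := r.Lookup("v1")
	cp.Replicas[0].Addr = "mutated" // touches only the copy
	again, _ := r.Lookup("v1")
	fmt.Println(again.Replicas[0].Addr) // 10.0.0.2:9334
}
```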
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
superMu is mandatory for correctness — all superblock mutation+persist
must be serialized. Remove the nil guard in updateSuperblockCheckpoint
and add SuperMu to all 7 test FlusherConfig sites.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds sync.Mutex (superMu) to BlockVol, shared between group commit's
syncWithWALProgress() and flusher's updateSuperblockCheckpoint().
Both paths now serialize superblock mutation + persist, preventing
WALTail/WALCheckpointLSN regression when flusher and group commit
write the full superblock concurrently.
persistSuperblock() also guarded for consistency.
Removes temporary log.Printf lines in the open/recovery path that
were added during BUG-RESTART-ZEROS investigation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds sync.RWMutex (ioMu) to BlockVol enforcing mutual exclusion
between normal I/O and destructive state operations.
Shared (RLock): WriteLBA, ReadLBA, Trim, SyncCache, replica
applyEntry, rebuild applyRebuildEntry — concurrent I/O safe.
Exclusive (Lock): RestoreSnapshot, ImportSnapshot, Expand,
PrepareExpand, CommitExpand, CancelExpand — drains all in-flight
I/O before modifying extent/WAL/dirtyMap.
Scope rule: RLock covers local data-structure mutation only.
Replication shipping is asynchronous and outside the lock, so
exclusive holders block only behind local I/O, not network stalls.
Lock ordering: ioMu > snapMu > assignMu > mu.
Closes the critical ER item: restore/import vs concurrent WriteLBA
silent data corruption gap.
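A condensed sketch of the pattern (the data map and inner mutex stand in for the volume's real structures; ioMu orders I/O against destructive operations, while the innermost lock protects the shared map per the ordering rule above):

```go
package main

import "sync"

type blockVol struct {
	ioMu sync.RWMutex // I/O vs destructive state operations
	mu   sync.Mutex   // innermost lock, protects data
	data map[uint64]byte
}

// WriteLBA takes the shared lock: many I/O calls run concurrently.
func (v *blockVol) WriteLBA(lba uint64, b byte) {
	v.ioMu.RLock()
	defer v.ioMu.RUnlock()
	v.mu.Lock()
	v.data[lba] = b
	v.mu.Unlock()
}

func (v *blockVol) ReadLBA(lba uint64) byte {
	v.ioMu.RLock()
	defer v.ioMu.RUnlock()
	v.mu.Lock()
	defer v.mu.Unlock()
	return v.data[lba]
}

// RestoreSnapshot takes the exclusive lock, so it drains all in-flight
// I/O before replacing volume state.
func (v *blockVol) RestoreSnapshot(snap map[uint64]byte) {
	v.ioMu.Lock()
	defer v.ioMu.Unlock()
	v.data = make(map[uint64]byte, len(snap))
	for k, b := range snap {
		v.data[k] = b
	}
}

func main() {
	v := &blockVol{data: map[uint64]byte{}}
	var wg sync.WaitGroup
	for i := 0; i < 32; i++ {
		wg.Add(1)
		go func(i uint64) { defer wg.Done(); v.WriteLBA(i, 0xAA) }(uint64(i))
	}
	wg.Wait()
	v.RestoreSnapshot(map[uint64]byte{0: 0x55})
}
```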
3 new tests: concurrent writes allowed, real restore-vs-write
contention with data integrity check, close coordination.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New POST /block/volume/plan endpoint returns full placement preview:
resolved policy, ordered candidate list, selected primary/replicas,
and per-server rejection reasons with stable string constants.
Core design: evaluateBlockPlacement() is a pure function with no
registry/topology dependency. gatherPlacementCandidates() is the
single topology bridge point. Plan and create share the same planner —
parity contract is same ordered candidate list for same cluster state.
Create path refactored: uses evaluateBlockPlacement() instead of
PickServer(), iterates all candidates (no 3-retry cap), recomputes
replica order after primary fallback. rf_not_satisfiable severity
is durability-mode-aware (warning for best_effort, error for strict).
15 unit tests + 20 QA adversarial tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Preset system: ResolvePolicy resolves named presets (database, general,
throughput) with per-field overrides into concrete volume parameters.
Create path now uses resolved policy instead of ad-hoc validation.
New /block/volume/resolve diagnostic endpoint for dry-run resolution.
Review fix 1 (MED): HasNVMeCapableServer now derives NVMe capability
from server-level heartbeat attribute (block_nvme_addr proto field)
instead of scanning volume entries. Fixes false "no NVMe" warning on
fresh clusters with NVMe-capable servers but no volumes yet.
Review fix 2 (LOW): /block/volume/resolve no longer proxied to leader —
read-only diagnostic endpoint can be served by any master.
Engine fix: ReadLBA retry loop closes stale dirty-map race when WAL
entry is recycled between lookup and read.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Six-task checkpoint hardening the promotion and failover paths:
T1: 4-gate candidate evaluation (heartbeat freshness, WAL lag, role,
server liveness) with structured rejection reasons.
T2: Orphaned-primary re-evaluation on replica reconnect (B-06/B-08).
T3: Deferred timer safety — epoch validation prevents stale timers
from firing on recreated/changed volumes (B-07).
T4: Rebuild addr cleanup on promotion (B-11), NVMe publication
refresh on heartbeat, and preflight endpoint wiring.
T5: Manual promote API — POST /block/volume/{name}/promote with
force flag, target server selection, and structured rejection
response. Shared applyPromotionLocked/finalizePromotion helpers
eliminate duplication between auto and manual paths.
T6: Read-only preflight endpoint (GET /block/volume/{name}/preflight)
and blockapi client wrappers (Preflight, Promote).
BUG-T5-1: PromotionsTotal counter moved to finalizePromotion (shared
by both auto and manual paths) to prevent metrics divergence.
24 files changed, ~6500 lines added. 42 new QA adversarial tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BUG-CP11A4-1 (HIGH): ImportSnapshot now rejects when active snapshots
exist. Import overwrites the extent region that non-CoW'd snapshot blocks
read from, which would silently return import data instead of snapshot-time
data. New ErrImportActiveSnapshots error and snapMu-guarded check.
BUG-CP11A4-2 (HIGH): Double import without AllowOverwrite now correctly
rejected. Import bypasses WAL so nextLSN stays at 1; added FlagImported
(Superblock.Flags bit 0) set after successful import and checked alongside
nextLSN in the non-empty gate.
BUG-CP11A4-3 (MED): Replaced fixed exportTempSnapID (0xFFFFFFFE) with
atomic sequence counter (exportTempSnapBase + exportTempSnapSeq). Each
auto-export gets a unique temp snapshot ID, preventing concurrent export
races and user snapshot ID collisions.
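A sketch of the unique-ID scheme; the base value here is a hypothetical placeholder, since the commit names the constant but not its value:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Hypothetical reserved base for temp snapshot IDs.
const exportTempSnapBase uint32 = 0xF0000000

var exportTempSnapSeq atomic.Uint32

// nextExportTempSnapID replaces the old fixed 0xFFFFFFFE ID: each
// concurrent auto-export draws a distinct temp snapshot ID.
func nextExportTempSnapID() uint32 {
	return exportTempSnapBase + exportTempSnapSeq.Add(1)
}

func main() {
	var wg sync.WaitGroup
	ids := make([]uint32, 8)
	for i := range ids {
		wg.Add(1)
		go func(i int) { defer wg.Done(); ids[i] = nextExportTempSnapID() }(i)
	}
	wg.Wait()
	seen := map[uint32]bool{}
	for _, id := range ids {
		if seen[id] {
			fmt.Println("collision")
			return
		}
		seen[id] = true
	}
	fmt.Println("all unique:", len(seen)) // all unique: 8
}
```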
Also added beginOp()/endOp() lifecycle guards to both ExportSnapshot and
ImportSnapshot, and documented the non-atomic import failure semantics.
5 new regression tests + QA-EX-3 rewritten for rejection behavior.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add PressureState() and writer wait tracking to WALAdmission, WALStatus
snapshot API on BlockVol, WAL sizing guidance pure functions, Prometheus
histogram/gauge/counter exports, and admin /status WAL fields. 23 new
tests (7 admission, 10 guidance, 6 QA adversarial).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
B-09: ExpandBlockVolume re-reads the registry entry after acquiring
the expand inflight lock. Previously it used the entry from the
initial Lookup, which could be stale if failover changed VolumeServer
or Replicas between Lookup and PREPARE.
B-10: UpdateFullHeartbeat stale-cleanup now skips entries with
ExpandInProgress=true. Previously a primary VS restart during
coordinated expand would delete the entry (path not in heartbeat),
orphaning the volume and stranding the expand coordinator.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two-phase prepare/commit/cancel protocol ensures all replicas expand
atomically. Standalone volumes use direct-commit (unchanged behavior).
Engine: PrepareExpand/CommitExpand/CancelExpand with on-disk
PreparedSize+ExpandEpoch in superblock, crash recovery clears stale
prepare state on open, v.mu serializes concurrent expand operations.
Proto: 3 new RPCs (PrepareExpand/CommitExpand/CancelExpandBlockVolume).
Coordinator: expandClean flag pattern — ReleaseExpandInflight only on
clean success or full cancel. Partial replica commit failure calls
MarkExpandFailed (keeps ExpandInProgress=true, suppresses heartbeat
size updates). ClearExpandFailed for manual reconciliation.
Registry: AcquireExpandInflight records PendingExpandSize+ExpandEpoch.
ExpandFailed state blocks new expands until cleared.
Tests: 15 engine + 4 VS + 10 coordinator + heartbeat suppression
regression + updated QA CP82/durability tests with prepare/commit mocks.
Also includes CP11A-1 remaining: QA storage profile tests, QA
io_backend config tests, testrunner perf-baseline scenarios and
coordinated-expand actions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ReplicaInfo now carries NvmeAddr/NQN. Fields are populated during
replica allocation (tryCreateOneReplica), updated from replica
heartbeats, and copied in PromoteBestReplica. This ensures master
lookup returns correct NVMe endpoints immediately after failover,
without waiting for the first post-promotion heartbeat.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add nvme_addr and nqn fields to proto messages (AllocateBlockVolume,
CreateBlockVolume, LookupBlockVolume, BlockVolumeInfoMessage), wire
through volume server → master registry → CSI driver. Volume servers
report NVMe address in heartbeats when NVMe target is running. CSI
MasterVolumeClient now populates NvmeAddr/NQN from master responses,
enabling NVMe/TCP via the master-backend path.
Proto files regenerated with protoc 29.5.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Finding 1: IOBackend=io_uring was accepted and logged as resolved but
had no runtime effect. Now rejected by Validate() until actually wired,
preventing user confusion.
Finding 2: wal_admit_wait_seconds_total was exported as GaugeFunc but
is monotonically increasing. Changed to CounterFunc to match _total
naming convention.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add counters (total, soft, hard, timeout) and wait-time histogram to
WALAdmission, wired through EngineMetrics and exported as Prometheus
metrics. Six new tests verify all code paths. Nil-safe for backwards
compatibility.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All three io_uring backends (iceber, giouring, raw) now require explicit
build tags — no tag means standard-only. Each backend registers its name
via IOUringImpl so startup logs show compiled implementation alongside
requested/selected backend mode.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split iouring_linux.go into three build-tagged implementations:
1. iouring_iceber_linux.go (-tags iouring_iceber)
iceber/iouring-go library. Goroutine-based completion model.
Known -72% write regression due to per-op channel overhead.
2. iouring_giouring_linux.go (-tags iouring_giouring)
pawelgaczynski/giouring — direct liburing port. No goroutines,
no channels. Direct SQE/CQE ring manipulation. Kernel 6.0+.
3. iouring_raw_linux.go (default on Linux, no tags needed)
Raw syscall wrappers — io_uring_setup/io_uring_enter + mmap.
Zero dependencies. ~300 LOC. Kernel 5.6+.
Build commands for benchmarking:
go build -tags iouring_iceber ./... # option A
go build -tags iouring_giouring ./... # option B
go build ./... # option C (raw, default)
go build -tags no_iouring ./... # disable all io_uring
All variants implement the same BatchIO interface. Cross-compile
verified for all four tag combinations.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The iceber/iouring-go SubmitRequests returns a RequestSet interface
which cannot be ranged over directly. Use resultSet.Done() to wait
for all completions, then iterate resultSet.Requests().
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace UseIOUring bool with IOBackend IOBackendMode (tri-state):
- "standard" (default): sequential pread/pwrite/fdatasync
- "auto": try io_uring, fall back to standard with warning log
- "io_uring": require io_uring, fail startup if unavailable
NewIOUring now returns ErrIOUringUnavailable instead of silently
falling back — callers decide whether to fail or fall back based
on the requested mode. All mode transitions are logged:
io backend: requested=auto selected=standard reason=...
io backend: requested=io_uring selected=io_uring
CLI: --io-backend=standard|auto|io_uring added to iscsi-target.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1. HIGH: LinkedWriteFsync now uses SubmitLinkRequests (IOSQE_IO_LINK)
instead of SubmitRequests, ensuring write+fdatasync execute as a
linked chain in the kernel. Falls back to sequential on error.
2. HIGH: PreadBatch/PwriteBatch chunk ops by ring capacity to prevent
"too many requests" rejection when dirty map exceeds ring size (256).
3. MED: CloseBatchIO() added to Flusher, called in BlockVol.Close()
after final flush to release io_uring ring / kernel resources.
4. MED: Sync parity — both standard and io_uring paths now use
fdatasync (via platform-specific fdatasync_linux.go / fdatasync_other.go).
Standard path previously used fsync; now matches io_uring semantics.
On non-Linux, fdatasync falls back to fsync (only option available).
10 batchio tests, all blockvol tests pass.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add iouring_linux.go (build-tagged linux && !no_iouring) using
iceber/iouring-go for batched pread/pwrite/fdatasync. Includes
linked write+fsync chain for group commit optimization.
iouring_other.go provides silent fallback to standard on non-Linux.
blockvol.go wires UseIOUring config flag through to flusher BatchIO.
NewIOUring gracefully falls back if kernel lacks io_uring support.
10 batchio tests, all blockvol tests pass unchanged.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New package batchio/ with BatchIO interface (PreadBatch, PwriteBatch,
Fsync, LinkedWriteFsync) and standard sequential implementation.
Flusher refactored to use BatchIO: WAL header reads, WAL entry reads,
and extent writes are now batched through the interface. With the
default NewStandard() backend, behavior is identical to before.
UseIOUring config field added for future io_uring opt-in (Linux 5.6+).
9 interface tests, all existing blockvol tests pass unchanged.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>