feat: Phase 09 P0 — production execution closure plan

Execution-closure targets:
- P1: TransferFullBase - reuse rebuild.go TCP protocol
- P2: TransferSnapshot - checkpoint image + WAL tail
- P3: TruncateWAL - AdvanceTail + superblock update
- P4: Runtime ownership - V2 orchestrator drives execution

Key reuse sources identified:
- rebuild.go: rebuildFullExtent (client), RebuildServer (server)
- wal_writer.go: AdvanceTail
- flusher.go: updateSuperblockCheckpoint
- blockvol.go: ScanWALEntries (already wired)

Slice order: full-base first (highest value), then snapshot, then truncation, then runtime ownership.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
22 hours ago · 46faf0f7e3
11 changed files with 2092 additions and 60 deletions
--- a/sw-block/.private/phase/phase-08-decisions.md
+++ b/sw-block/.private/phase/phase-08-decisions.md
@@ -76,3 +76,112 @@ The hard rule remains:
 1. engine owns recovery policy
 2. bridge translates confirmed control/storage truth
 3. `blockvol` executes I/O
+
+## Decision 9: Phase 08 P1 is accepted with explicit scope limits
+
+Accepted `P1` coverage is:
+
+1. real `ProcessAssignments()` path drives V2 engine sender/session state change
+2. stable remote `ReplicaID` is derived from `ServerID`, not address
+3. address change preserves sender identity through the live control path
+4. stale epoch/session invalidation occurs through the live control path
+5. missing `ServerID` fails closed
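The identity rules in items 2, 3, and 5 above can be sketched in Go. This is a minimal illustration, not the code from this commit: the function name `deriveReplicaID` and the error `ErrMissingServerID` are hypothetical, and the `<volume>/<ServerID>` shape is taken from the `P1` status notes in `phase-08.md`.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrMissingServerID is a hypothetical error name used only in this sketch.
var ErrMissingServerID = errors.New("missing ServerID: refusing to derive ReplicaID")

// deriveReplicaID builds the "<volume>/<ServerID>" identity.
// The network address is deliberately not an input, so an address
// change cannot alter the resulting identity.
func deriveReplicaID(volume, serverID string) (string, error) {
	if serverID == "" {
		// Fail closed: do not fall back to an address-shaped identity.
		return "", ErrMissingServerID
	}
	return volume + "/" + serverID, nil
}

func main() {
	id, err := deriveReplicaID("vol1", "server-a")
	fmt.Println(id, err)

	_, err = deriveReplicaID("vol1", "")
	fmt.Println(err)
}
```

Because the address never enters the derivation, a changed-address restart maps to the same logical `ReplicaID`, which is the property the accepted coverage relies on.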
+
+Not accepted as part of `P1`:
+
+1. full end-to-end gRPC heartbeat delivery proof
+2. integrated catch-up execution through the live path
+3. rebuild execution through the live path
+4. final local stable identity beyond transport-shaped `listenAddr`
+
+## Decision 10: Phase 08 P2 is accepted as real execution closure
+
+Accepted `P2` coverage is:
+
+1. `CommittedLSN` is separated from `CheckpointLSN` on the chosen `sync_all` path
+2. catch-up is proven as one live chain:
+   - engine plan
+   - engine executor
+   - `v2bridge`
+   - real `blockvol` I/O
+   - completion
+   - cleanup
+3. rebuild is proven as one live chain for the delivered path
+4. cleanup/pin release is asserted after execution
+
+Residual non-blocking scope notes:
+
+1. `CatchUpStartLSN` is not directly asserted in tests
+2. rebuild source variants are not all forced and individually asserted
+
+## Decision 11: Phase 08 now moves to unified hardening validation
+
+With `P1` and `P2` accepted, the next required step is:
+
+1. replay the accepted failure-class set again on the unified live path
+2. validate at least one real failover / reassignment cycle
+3. validate concurrent retention/pinner behavior
+4. make the committed-truth gate decision explicit for the chosen candidate path
+
+## Decision 12: Phase 08 P3 is accepted as unified hardening validation
+
+Accepted `P3` coverage is:
+
+1. replay of the accepted failure-class set on the unified `P1` + `P2` live path
+2. at least one real failover / reassignment cycle through the live control path
+3. one true simultaneous-overlap retention/pinner safety proof
+4. stronger causality assertions for invalidation, escalation, catch-up, and completion
+
+## Decision 13: The committed-truth gate is decided for the chosen candidate path
+
+For the chosen `RF=2 sync_all` candidate path:
+
+1. `CommittedLSN = WALHeadLSN`
+2. `CheckpointLSN` remains the durable base-image boundary
+3. this separation is accepted as sufficient for the candidate-path hardening boundary
+
+This decision is intentionally scoped:
+
+1. it is accepted for the chosen candidate path
+2. it is not yet a blanket truth for every future path or durability mode
+
+## Decision 14: Phase 08 P4 is candidate-path judgment, not broad new engineering expansion
+
+`P4` should close `Phase 08` by producing one explicit candidate-path judgment.
+
+Its main output is not more isolated engineering progress, but:
+
+1. a bounded candidate-path statement
+2. an evidence-to-claim mapping from accepted `P1` / `P2` / `P3` results
+3. an explicit list of accepted bounds, remaining deferrals, and production blockers
+
+`P4` may include small closure work if needed to make the candidate statement coherent, but it should not reopen protocol design or grow into another broad hardening slice.
+
+## Decision 15: Phase 08 P4 is accepted as candidate package closure
+
+Accepted `P4` coverage is:
+
+1. one explicit candidate package for the chosen `RF=2 sync_all` path
+2. candidate-safe claims mapped to accepted `P1` / `P2` / `P3` evidence
+3. explicit bounds, deferred items, and production blockers
+4. committed-truth decision scoped to the chosen candidate path
+5. module/package boundary summary for the next heavy engineering phase
+
+Accepted judgment:
+
+1. candidate-safe-with-bounds
+2. not production-ready
+
+## Decision 16: Phase 08 is closed and the next heavy phase is production execution closure
+
+With `P0` through `P4` accepted, `Phase 08` is closed.
+
+The next phase should not be a light packaging-only round.
+It should begin with:
+
+1. `Phase 09: Production Execution Closure`
+2. `P0` planning for:
+   - real `TransferFullBase`
+   - real `TransferSnapshot`
+   - real `TruncateWAL`
+   - stronger live runtime execution ownership
--- a/sw-block/.private/phase/phase-08-log.md
+++ b/sw-block/.private/phase/phase-08-log.md
@@ -17,5 +17,398 @@
 ### Next

 1. Phase 08 P0 accepted
-2. Phase 08 P1 real master/control delivery integration
-3. Phase 08 P2 integrated execution closure
+2. Phase 08 P1 accepted
+3. Phase 08 P2 accepted
+4. Phase 08 P3 hardening validation on the unified live path
+5. Phase 08 P4 candidate package closure accepted
+6. Phase 08 closeout bookkeeping complete
+7. next: open Phase 09 P0 for production execution closure planning
+
+### P3 Technical Pack
+
+Purpose:
+
+- provide the minimum design/algo/test detail needed to execute `P3`
+- reuse accepted `P1` / `P2` live-path closure
+- avoid broad scenario growth or repeated proof of already accepted mechanics
+
+#### Design / algo focus
+
+`P3` is not another execution-closure slice.
+It assumes these are already accepted on the chosen path:
+
+- real control delivery
+- real catch-up one-chain closure
+- real rebuild one-chain closure
+
+What `P3` adds is hardening evidence on top of that live path:
+
+1. replay accepted failure classes again on the unified path
+2. prove one real failover / reassignment cycle
+3. prove one overlapping retention/pinner safety case
+4. produce one explicit committed-truth gate decision
+
+Key algorithm rules for `P3`:
+
+- control truth remains primary:
+  - failover / reassignment is driven by new assignment / epoch truth
+  - storage/runtime must not invent role changes
+- recovery choice remains engine-owned:
+  - engine chooses `zero_gap` / `catchup` / `needs_rebuild`
+  - bridge and `blockvol` execute what the engine already decided
+- overlapping recovery must remain fail-closed:
+  - retained floor = minimum active retention requirement
+  - stale or cancelled plan must release its hold
+  - a new authoritative plan must not inherit leaked resources from an old one
+- committed-truth gate must be output, not discussed informally:
+  - either the chosen candidate path is accepted with current committed/checkpoint semantics
+  - or the next phase is blocked on further separation/bounding work
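The engine-owned recovery choice described above can be sketched as a pure function. This is a hedged illustration only: the function name, parameter names, and exact boundary comparisons are assumptions, while the three outcome names come from the rules above. It assumes `replicaLSN` is the last LSN the replica holds, `walTailLSN` is the oldest LSN still retained on the primary, and `committedLSN` is the primary's committed boundary.

```go
package main

import "fmt"

// Outcome names mirror the engine outcomes named in the rules above.
type Outcome string

const (
	ZeroGap      Outcome = "zero_gap"
	CatchUp      Outcome = "catchup"
	NeedsRebuild Outcome = "needs_rebuild"
)

// classify is a hypothetical sketch of the engine-side decision.
// The bridge and blockvol never call this; they only execute the result.
func classify(replicaLSN, walTailLSN, committedLSN uint64) Outcome {
	switch {
	case replicaLSN >= committedLSN:
		// Replica already holds everything that is committed.
		return ZeroGap
	case replicaLSN+1 >= walTailLSN:
		// The next entry the replica needs is still retained in the WAL.
		return CatchUp
	default:
		// The replica fell behind retained WAL; catch-up would overclaim.
		return NeedsRebuild
	}
}

func main() {
	fmt.Println(classify(100, 50, 100))
	fmt.Println(classify(80, 50, 100))
	fmt.Println(classify(10, 50, 100))
}
```

Keeping the decision in one pure function makes the boundary easy to assert in a replay matrix without involving storage code.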
+
+#### Validation matrix
+
+Use one compact replay matrix rather than many near-duplicate tests.
+
+1. Changed-address restart
+   - trigger: address refresh / reassignment while prior identity is preserved
+   - expected: old session invalidated, same logical `ReplicaID`, new recovery starts cleanly
+   - assert:
+     - no stale session mutation
+     - no leaked pins
+     - logs show why identity stayed and session changed
+
+2. Stale epoch / stale session
+   - trigger: epoch bump during or before recovery continuation
+   - expected: stale execution loses authority immediately
+   - assert:
+     - old session cannot mutate
+     - replacement assignment/session becomes the only live authority
+     - logs show invalidation reason
+
+3. Unrecoverable gap / needs-rebuild
+   - trigger: replica falls behind retained WAL
+   - expected: engine chooses `needs_rebuild`, rebuild path executes or is prepared according to accepted boundary
+   - assert:
+     - no catch-up overclaim
+     - correct rebuild source/result logged
+     - no leaked pins after completion/failure
+
+4. Post-checkpoint boundary behavior
+   - trigger: replica state around checkpoint / committed boundary
+   - expected: classification and execution match the chosen candidate-path semantics
+   - assert:
+     - chosen path does not overclaim beyond the accepted boundary
+     - committed/checkpoint truth used here matches the explicit gate decision
+
+#### Required extra cases
+
+Besides the replay matrix, `P3` should add only two new validation cases:
+
+1. One real failover / promotion / reassignment cycle
+   - primary change or reassignment through the live control path
+   - verify old authority dies, new authority starts, recovery resumes/starts correctly
+
+2. One true simultaneous-overlap retention/pinner case
+   - two live recovery holds coexist before the earlier one is released
+   - verify:
+     - minimum retention floor is respected while both are live
+     - releasing one hold leaves the other hold still contributing the correct floor
+     - released/cancelled plan stops contributing to retention floor
+     - final hold count returns to zero
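The simultaneous-overlap case above can be sketched with a small in-memory pinner. This is an illustration under stated assumptions, not the `v2bridge` pinner: the type `retentionPinner` and the methods `Hold`, `Release`, and `Floor` are hypothetical; only the name `ActiveHoldCount` is borrowed from the evidence list in this log.

```go
package main

import (
	"fmt"
	"sync"
)

// retentionPinner is a hypothetical stand-in for a WAL retention pinner.
type retentionPinner struct {
	mu     sync.Mutex
	nextID int
	holds  map[int]uint64 // hold ID -> lowest LSN that hold still needs
}

func newRetentionPinner() *retentionPinner {
	return &retentionPinner{holds: make(map[int]uint64)}
}

// Hold registers a live recovery hold and returns its ID.
func (p *retentionPinner) Hold(startLSN uint64) int {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.nextID++
	p.holds[p.nextID] = startLSN
	return p.nextID
}

// Release drops a hold; releasing an unknown ID is a no-op.
func (p *retentionPinner) Release(id int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.holds, id)
}

// Floor returns the minimum LSN across active holds, or false if none.
func (p *retentionPinner) Floor() (uint64, bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	var floor uint64
	found := false
	for _, lsn := range p.holds {
		if !found || lsn < floor {
			floor, found = lsn, true
		}
	}
	return floor, found
}

// ActiveHoldCount reports how many holds are still live.
func (p *retentionPinner) ActiveHoldCount() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.holds)
}

func main() {
	p := newRetentionPinner()
	a := p.Hold(40)
	b := p.Hold(70)
	floor, ok := p.Floor()
	fmt.Println(floor, ok) // both holds live
	p.Release(a)
	floor, ok = p.Floor()
	fmt.Println(floor, ok) // only the later hold contributes
	p.Release(b)
	fmt.Println(p.ActiveHoldCount())
}
```

The four verification bullets map directly onto calls here: floor while both holds are live, floor after one release, no contribution from the released hold, and a final count of zero.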
+
+#### Expected evidence
+
+For each accepted `P3` case, prefer explicit evidence blocks:
+
+- entry truth:
+  - assignment / epoch / role that started the case
+- engine result:
+  - selected outcome or invalidation result
+- execution result:
+  - completion / cancel / failure
+- cleanup result:
+  - `ActiveHoldCount() == 0`
+  - no surviving active session when case should be closed
+- observability result:
+  - logs explain:
+    - why control truth changed
+    - why session changed
+    - why catch-up vs rebuild happened
+    - why execution completed / failed / cancelled
+
+#### Efficient test plan
+
+Keep `P3` small and high-signal:
+
+- one unified replay test package or compact matrix
+- one real failover-cycle test
+- one overlapping-retention test
+- one explicit gate-decision record in delivery / phase status
+
+Avoid:
+
+- re-proving isolated `P2` one-chain mechanics
+- broad combinatorial growth across many replicas / roles / timing permutations
+- turning `P3` into another protocol-design slice
+
+### P4 Technical Pack
+
+Purpose:
+
+- provide the minimum design/algo/test detail needed to close `Phase 08`
+- convert accepted `P1` / `P2` / `P3` evidence into one candidate-path judgment
+- keep `P4` as a closure slice, not another broad engineering slice
+
+#### Delivery sequence
+
+Use this order:
+
+1. `sw` develops the candidate package
+2. `architect` reviews code/claim shape before tester time is spent
+3. `tester` validates the evidence-to-claim mapping
+4. `manager` records the final phase/accounting decision
+
+Do not collapse these roles:
+
+- `sw` builds the candidate statement and supporting artifacts
+- `architect` checks whether the resulting package has obvious semantic, scope, or evidence-shape problems before tester validation
+- `tester` checks whether every claim is actually supported
+- `manager` decides acceptance/bookkeeping after architect + tester feedback
+
+Recommended handoff gate before tester:
+
+- if architect finds obvious overclaim, missing evidence mapping, or broken candidate shape, return to `sw` first
+- do not spend tester time on a package that is clearly not ready
+
+#### Design / algo focus
+
+`P4` should not introduce new protocol shape.
+It consumes already accepted results:
+
+- `P1`: real control delivery
+- `P2`: real execution closure
+- `P3`: unified hardening validation
+
+The main design task is to classify the chosen path into three buckets:
+
+1. candidate-safe
+   - supported by accepted evidence
+   - allowed to appear in the candidate statement
+2. intentionally bounded
+   - accepted only within narrow limits
+   - must appear as explicit candidate bounds
+3. deferred or blocking
+   - not yet supported enough
+   - must not be implied as candidate-ready
+
+Algorithmically, `P4` is a classification/output slice:
+
+- no new recovery FSM
+- no new identity model
+- no new rebuild policy
+- no new durability model
+
+It should only:
+
+- map accepted evidence to accepted candidate claims
+- map residual limitations to explicit bounds or blockers
+- separate candidate readiness from production readiness
+
+#### Required output artifacts
+
+`sw` should produce exactly these artifacts:
+
+1. Candidate statement
+   - what the chosen `RF=2 sync_all` path is allowed to claim
+
+2. Evidence-to-claim map
+   - each candidate claim points to accepted evidence from `P1` / `P2` / `P3`
+
+3. Bound list
+   - explicit candidate-safe bounds, for example:
+     - chosen path only
+     - chosen durability mode only
+     - accepted rebuild coverage only
+
+4. Deferred / blocking list
+   - what remains outside the candidate path
+   - what still blocks production readiness
+
+#### Candidate statement shape
+
+Keep the candidate statement short and structured.
+It should answer only:
+
+1. What path is the candidate?
+2. What is proven for that path?
+3. What is intentionally bounded for that path?
+4. What is still deferred or blocking?
+
+Good pattern:
+
+- candidate path:
+  - `RF=2 sync_all` on the accepted master/heartbeat control path
+- proven:
+  - real control delivery
+  - real catch-up closure
+  - real rebuild closure for accepted coverage
+  - unified replay and failover validation
+- bounded:
+  - only the chosen path / mode
+  - only accepted rebuild/source coverage
+- not yet claimed:
+  - general future path/mode truth
+  - production readiness
+
+#### Candidate statement template
+
+Use this exact structure for the `P4` delivery statement:
+
+1. Candidate path
+   - The first candidate path is:
+     - `<path / topology / durability mode>`
+
+2. Candidate-safe claims
+   - The candidate path is supported for:
+     - `<claim 1>` — evidence: `<P1/P2/P3 reference>`
+     - `<claim 2>` — evidence: `<P1/P2/P3 reference>`
+     - `<claim 3>` — evidence: `<P1/P2/P3 reference>`
+
+3. Explicit bounds
+   - This candidate statement is intentionally bounded to:
+     - `<bound 1>`
+     - `<bound 2>`
+     - `<bound 3>`
+
+4. Deferred or blocking items
+   - Not yet claimed as candidate-safe:
+     - `<deferred item 1>`
+     - `<deferred item 2>`
+   - Still blocking production readiness:
+     - `<blocker 1>`
+     - `<blocker 2>`
+
+5. Committed-truth decision
+   - For this candidate path:
+     - `<committed-truth decision>`
+   - Scope:
+     - `<why this does not automatically generalize>`
+
+6. Overall judgment
+   - Judgment:
+     - `<candidate-safe / candidate-safe-with-bounds / not-yet-candidate>`
+   - Reason:
+     - `<one short paragraph tying evidence to judgment>`
+
+When `sw` fills this template:
+
+- every positive claim must carry an evidence reference
+- every important missing area must appear either under:
+  - explicit bounds
+  - deferred
+  - blockers
+- avoid prose that mixes candidate judgment with production-readiness language
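The rule that every positive claim must carry an evidence reference can be checked mechanically. The sketch below is hypothetical tooling, not part of this commit: the `claim` type, the `acceptedSlices` set, and `unsupportedClaims` are illustrative names, and the slice labels `P1` / `P2` / `P3` are the only values taken from the template.

```go
package main

import "fmt"

// claim is a hypothetical record for one candidate-safe claim.
type claim struct {
	Text     string
	Evidence []string // slice references such as "P1", "P2", "P3"
}

// acceptedSlices lists the slices whose evidence may back a claim.
var acceptedSlices = map[string]bool{"P1": true, "P2": true, "P3": true}

// unsupportedClaims returns the text of every claim that has no evidence
// or cites something other than an accepted slice.
func unsupportedClaims(claims []claim) []string {
	var bad []string
	for _, c := range claims {
		ok := len(c.Evidence) > 0
		for _, e := range c.Evidence {
			if !acceptedSlices[e] {
				ok = false
			}
		}
		if !ok {
			bad = append(bad, c.Text)
		}
	}
	return bad
}

func main() {
	claims := []claim{
		{Text: "real control delivery", Evidence: []string{"P1"}},
		{Text: "production readiness"}, // no evidence: must be rejected
	}
	fmt.Println(unsupportedClaims(claims))
}
```

A non-empty result would correspond to the "reject before handoff" condition that claims exceed accepted evidence.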
+
+#### Assignment template
+
+Use this template when assigning `P4` work to `sw`:
+
+1. Goal
+   - Build the `P4` candidate package for the chosen path.
+
+2. Required outputs
+   - candidate statement
+   - evidence-to-claim mapping
+   - explicit bounds list
+   - deferred / blocking list
+   - committed-truth decision statement
+
+3. Hard rules
+   - no new protocol redesign
+   - no broad scope growth without candidate impact
+   - every positive claim must map to accepted `P1` / `P2` / `P3` evidence
+   - do not mix candidate readiness with production readiness
+
+4. Delivery order
+   - first hand to architect review
+   - only after architect review passes, hand to tester validation
+   - manager records final acceptance/bookkeeping last
+
+5. Reject before handoff if
+   - evidence-to-claim mapping is incomplete
+   - important limitations are not classified as bounded / deferred / blocking
+   - claims exceed accepted evidence
+
+Use this template when assigning `P4` validation to `tester`:
+
+1. Goal
+   - Validate that the candidate package is fully supported by accepted evidence.
+
+2. Validate
+   - each claim has accepted evidence
+   - each bound/deferred/blocker is explicit
+   - committed-truth decision stays scoped correctly
+   - no candidate-to-production overclaim exists
+
+3. Output
+   - pass/fail on each candidate claim group
+   - findings on unsupported claims, missing bounds, or hidden blockers
+
+#### Tester validation checklist
+
+`tester` should validate:
+
+1. every positive candidate claim has accepted evidence
+2. every important limitation appears in either:
+   - bounded
+   - deferred
+   - blocking
+3. no accepted evidence is stretched into a broader product claim
+4. committed-truth decision stays scoped to the chosen candidate path
+5. candidate readiness is not confused with production readiness
+
+#### Architect review focus
+
+`architect` should review only:
+
+1. semantic correctness of the candidate statement
+2. whether the evidence-to-claim mapping is honest
+3. whether bounds are explicit enough to prevent future drift
+4. whether any hidden overclaim remains
+
+This review should not reopen already accepted `P1` / `P2` / `P3` mechanics unless the candidate statement contradicts them.
+
+#### Efficient test / evidence plan
+
+`P4` should mostly reuse accepted evidence rather than add new broad tests.
+
+Preferred work:
+
+- collect accepted evidence references
+- compress them into candidate-safe claims
+- write one explicit residual-gap list
+
+Only add new code/tests if a small missing blocker prevents a coherent candidate statement.
+
+Avoid:
+
+- large new replay matrices
+- new protocol experiments
+- broad implementation growth without candidate impact
+
+### Closeout bookkeeping
+
+Manager follow-up after `P4` acceptance found only a minor bookkeeping concern:
+
+- ensure `phase-08.md` is explicitly closed before treating `Phase 09` as opened
+
+Closeout check:
+
+1. `phase-08.md` is `Status: complete`
+2. `P4` is recorded as accepted
+3. `Phase-close note` points to `Phase 09: Production Execution Closure`
+4. `phase-08-decisions.md` records `Decision 16`
+
+Final bookkeeping judgment:
+
+- `Phase 08` is closed
+- `Phase 09 P0` is the active next planning/engineering package
--- a/sw-block/.private/phase/phase-08.md
+++ b/sw-block/.private/phase/phase-08.md
@@ -1,7 +1,7 @@
 # Phase 08

 Date: 2026-03-31
-Status: active
+Status: complete
 Purpose: convert the accepted Phase 07 product path into a pre-production-hardening program without reopening accepted V2 protocol shape

 ## Why This Phase Exists
@@ -19,6 +19,15 @@ What still does not exist is a pre-production-ready system path. The remaining w

 Harden the first accepted V2 product path until the remaining gap to a production candidate is explicit, bounded, and implementation-driven.

+This phase doc is the canonical hardening contract for `sw` and `tester`.
+Use `phase-08-log.md` for deeper engineering process, alternatives, and implementation detail.
+
+Algorithm note:
+
+- the accepted V2 algorithm / protocol shape is treated as fixed for this phase
+- remaining work is engineering closure over real Seaweed/V1 runtime paths under V2 boundaries
+- do not reopen protocol design unless a live contradiction is found
+
 ## Scope

 ### In scope
@@ -67,6 +76,11 @@ Status:
  - candidate-path readiness vs production readiness
 - accepted

+Reference:
+
+- `sw-block/design/phase-08-engine-skeleton-map.md` is the implementation-side skeleton map for this phase
+- it is subordinate to `sw-block/design/v2-protocol-truths.md` and this `phase-08.md`; use it for module layout, execution order, interim fields, hard gates, and reuse guidance
+
 ### P1: Real Control Delivery

 1. connect real master/heartbeat assignment delivery into the bridge
@@ -109,13 +123,6 @@ Implementation route (`reuse map`):
  - keep engine as the recovery-policy owner
  - keep `blockvol` as the I/O executor

-Expectation note:
-
-- the `P1` tester expectation is already embedded in this phase doc under:
-  - `P1 / Validation focus`
-  - `P1 / Reject if`
-- do not grow a separate long template unless `P1` scope expands materially
-
 Validation focus:

 - prove live assignment delivery into the bridge/engine path
@@ -135,34 +142,312 @@ Reject if:
 - failover / reassignment is claimed without a real replay target
 - delivery claims general production readiness rather than control-path closure

+Status:
+
+- accepted
+- real assignment delivery into the V2 path is now proven through `ProcessAssignments()`
+- accepted evidence includes:
+  - live assignment -> engine sender/session creation
+  - stable remote `ReplicaID = <volume>/<ServerID>`
+  - address-change identity preservation through the live path
+  - stale epoch/session invalidation through the live path
+  - fail-closed skip on missing `ServerID`
+- accepted with explicit carry-forwards:
+  - `localServerID = listenAddr` remains transport-shaped for local identity
+  - heartbeat -> `ProcessAssignments()` is proven, but not full end-to-end gRPC delivery
+  - integrated catch-up execution is not yet proven through the live path
+  - rebuild execution remains deferred
+  - `CommittedLSN = CheckpointLSN` remains unresolved
+
 ### P2: Execution Closure

 1. close the live engine -> executor -> `v2bridge` execution chain
 2. make catch-up execution evidence integrated rather than split across layers
 3. close the first rebuild execution path required by the product path

+Technical focus:
+
+- keep execution ownership explicit:
+  - engine plans and owns recovery state transitions
+  - engine executor drives stepwise execution
+  - `v2bridge` translates execution requests into real blockvol work
+  - `blockvol` performs I/O only
+- prove catch-up as one real path:
+  - accepted control delivery
+  - real retained-history input
+  - real WAL retention pin
+  - real WAL scan / progress return
+  - real session completion
+- choose the narrowest rebuild closure required by the current product path:
+  - first real `full-base` rebuild path is preferred
+  - `snapshot + tail` can remain later unless needed by the chosen path
+- keep resource ownership fail-closed:
+  - pin acquisition before execution
+  - release on success
+  - release on cancel / invalidation
+  - release on partial failure
+- keep observability causal:
+  - execution start
+  - execution progress
+  - execution cancel / invalidation
+  - execution failure
+  - completion
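The fail-closed resource ownership rule above (pin before execution, release on success, cancel, and failure) maps naturally onto Go's `defer`. The following is a minimal sketch under assumptions: `holdTracker`, `acquire`, `runWithPin`, and `errStepFailed` are hypothetical names and do not correspond to the real `v2bridge` executor or pinner API.

```go
package main

import (
	"errors"
	"fmt"
)

// errStepFailed is a hypothetical error used to simulate a partial failure.
var errStepFailed = errors.New("execution step failed")

// holdTracker is a hypothetical counter of active retention pins.
type holdTracker struct{ active int }

// acquire takes a pin and returns an idempotent release function.
func (h *holdTracker) acquire() func() {
	h.active++
	released := false
	return func() {
		if !released {
			released = true
			h.active--
		}
	}
}

// runWithPin pins before execution and always releases afterwards,
// whether the step succeeds, fails, or returns a cancellation error.
func runWithPin(h *holdTracker, step func() error) error {
	release := h.acquire()
	defer release()
	return step()
}

func main() {
	h := &holdTracker{}
	fmt.Println(runWithPin(h, func() error { return nil }), h.active)
	fmt.Println(runWithPin(h, func() error { return errStepFailed }), h.active)
}
```

Tying release to function exit rather than to individual outcome branches is one way to keep release symmetric across every exit path, which is what the live-path cleanup assertions are meant to confirm.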
+
+Implementation route:
+
+- reuse engine-side execution core:
+  - `sw-block/engine/replication/driver.go`
+  - `sw-block/engine/replication/executor.go`
+  - `sw-block/engine/replication/orchestrator.go`
+- reuse storage/runtime execution bridge:
+  - `weed/storage/blockvol/v2bridge/executor.go`
+  - `weed/storage/blockvol/v2bridge/pinner.go`
+  - `weed/storage/blockvol/v2bridge/reader.go`
+- reuse block runtime execution reality:
+  - `weed/storage/blockvol/blockvol.go`
+  - `weed/storage/blockvol/replica_apply.go`
+  - `weed/storage/blockvol/replica_barrier.go`
+  - rebuild-side files under `weed/storage/blockvol/`
+- preserve the boundary:
+  - do not move zero-gap / catch-up / rebuild classification into `blockvol`
+  - do not let executor convenience paths redefine protocol semantics
+
+Validation focus:
+
+- prove one live integrated catch-up chain:
+  - assignment/control arrives through accepted `P1` path
+  - engine plans
+  - executor drives `v2bridge`
+  - `blockvol` executes
+  - progress returns
+  - session completes
+- prove one real rebuild execution path for the chosen product path
+- prove retention pin / release symmetry on the live path
+- prove rebuild resource pin / release symmetry on the live path
+- prove invalidation / cancel cleanup on the live path
+- prove execution logs explain:
+  - why catch-up started
+  - why rebuild started
+  - why execution failed
+  - why execution was cancelled
+  - why completion succeeded
+
+Reject if:
+
+- catch-up is still only proven by split evidence
+- rebuild remains only a detection outcome
+- `blockvol` starts deciding recovery mode or rebuild fallback
+- resources leak on cancel / invalidation / partial failure
+- execution logs are too weak to replay causality offline
+- the slice quietly broadens protocol semantics beyond the current accepted boundary
+
+Recommended first cut:
+
+1. close the live catch-up chain first
+2. close the first real `full-base` rebuild path second
+3. leave unified replay to `P3`
+
+Minimum closure threshold:
+
+- do not accept `P2` on glue code + partial chain tests alone
+- at least one accepted catch-up proof must drive the real engine executor path:
+  - `PlanRecovery(...)`
+  - `NewCatchUpExecutor(...)`
+  - executor-managed progress / completion
+  - real `v2bridge` / `blockvol` execution underneath
+- at least one accepted rebuild proof must drive the real engine executor path:
+  - rebuild assignment
+  - `PlanRebuild(...)`
+  - `NewRebuildExecutor(...)`
+  - executor-managed completion
+  - real `TransferFullBase(...)` underneath
+- resource-cleanup proof must include live-path assertions, not only logs:
+  - active holds released
+  - retention floor no longer pinned after release
+  - no surviving session/plan ownership after cancel / invalidation / failure
+- observability proof should include executor-generated events, not only planner-side events
+- if these thresholds are not met, record `P2` as partial execution progress, not execution closure
+
+Carry-forward note:
+
+- on the chosen `RF=2 sync_all` path, `CommittedLSN` separation is resolved in this slice:
+  - `CommittedLSN = WALHeadLSN`
+  - `CheckpointLSN` remains the durable base-image boundary
+- this is not yet a blanket truth for every future path or durability mode
+- post-checkpoint catch-up remains bounded unless explicitly closed
+- rebuild coverage is limited to the first chosen executable path if that is all that lands
+
+Status:
+
+- accepted
+- real one-chain execution is now proven for:
+  - catch-up
+  - rebuild
+- accepted evidence includes:
+  - `CommittedLSN` separated from `CheckpointLSN` on the chosen `sync_all` path
+  - live engine plan -> executor -> `v2bridge` -> `blockvol` catch-up chain
+  - live engine plan -> executor -> `v2bridge` -> `blockvol` rebuild chain
+  - explicit pin cleanup assertions after execution
+- accepted with explicit residual scope:
+  - `CatchUpStartLSN` is not directly asserted in tests
+  - rebuild source is not yet forced/verified per source variant
+  - broader rebuild-source coverage can remain follow-up work
+
+Review checklist:
+
+- is there one accepted catch-up proof from real `P1` control path to real session completion, using `CatchUpExecutor`
+- is there one accepted first rebuild proof on the chosen path, using `RebuildExecutor`
+- do live-path assertions prove pin/hold release on success, cancel, invalidation, and failure
+- do logs/status explain start, cancel, failure, and completion without hidden transitions
+- does the delivery avoid overclaiming general post-checkpoint catch-up, broad rebuild coverage, or production readiness
+
 ### P3: Hardening Validation

-1. validate diagnosability under the live integrated path
-2. validate retention/pinner behavior under concurrent load
-3. replay the accepted failure-class set again on the newly unified live path after `P1` and `P2` land
-4. confirm the remaining gap to a production candidate
+1. replay the accepted failure-class set again on the unified live path after `P1` + `P2`
+2. validate at least one real failover / promotion / reassignment cycle through the live control path
+3. validate concurrent retention/pinner behavior under overlapping recovery activity
+4. make the committed-truth gate decision explicit for the chosen candidate path
+
+Slice adjustment note:
+
+- if `P2` lands only partially, `P3` should first close the missing execution outcome:
+  - real catch-up closure if still missing
+  - real first rebuild closure if still missing
+- only after both are real should `P3` spend most of its weight on unified replay, failover / reassignment validation, and concurrent retention / cleanup hardening
+
+Efficiency note:
+
+- `P3` is a hardening-validation slice, not another execution-closure slice
+- reuse the accepted `P1` / `P2` live path as the base; do not re-prove already accepted chain mechanics in isolation
+- prefer one compact replay matrix over many near-duplicate tests
+- prefer one real failover cycle and one true simultaneous-overlap retention case over broad scenario expansion
+- the required new outputs are:
+  - unified replay evidence
+  - one real failover / reassignment replay
+  - one concurrent retention/pinner safety result
+  - one explicit committed-truth gate decision

 Validation focus:

-- prove the chosen path through a real control-delivery path
-- prove the live engine -> executor -> `v2bridge` execution chain as one path, not split evidence
-- prove the first rebuild execution path required by the chosen product path
-- prove at least one real failover / promotion / reassignment cycle
-- prove concurrent retention/pinner behavior does not break recovery guarantees
+- unified replay for:
+  - changed-address restart
+  - stale epoch / stale session
+  - unrecoverable gap / needs-rebuild
+  - post-checkpoint boundary behavior
+- at least one real failover / promotion / reassignment cycle
+- concurrent retention/pinner safety under at least one true simultaneous-overlap hold case
+- logs explain:
+  - why control truth changed
+  - why a session was invalidated
+  - why catch-up vs rebuild was chosen
+  - why execution completed, failed, or was cancelled

 Reject if:

-- catch-up semantics are overclaimed beyond the currently proven boundary
-- rebuild is claimed as supported without real execution closure
-- master/control delivery is claimed as real without the live path in place
-- `CommittedLSN` vs `CheckpointLSN` remains an unclassified note instead of a gate decision
-- `P1` and `P2` land independently but the accepted failure-class set is not replayed again on the unified live path
+- accepted failure classes are still only partially replayed on the unified path
+- failover / reassignment is claimed without a real live-path replay
+- concurrent retention/pinner behavior leaks pins or violates recovery safety
+- logs are too weak to replay causality offline
+- the committed-truth gate is still just a note instead of an explicit decision
+
+Status:
+
+- accepted
+- unified hardening replay is now proven on the accepted live path
+- accepted evidence includes:
+  - replay of the accepted failure-class set on the unified `P1` + `P2` path
+  - at least one real failover / reassignment cycle through the live control path
+  - one true simultaneous-overlap retention/pinner safety proof
+  - stronger causality assertions for invalidation, escalation, catch-up, and completion
+- committed-truth gate decision for the chosen candidate path:
+  - for the chosen `RF=2 sync_all` candidate path, `CommittedLSN = WALHeadLSN` with `CheckpointLSN` kept separate is accepted as sufficient for the candidate-path hardening boundary
+  - this is not yet a blanket truth for every future path or durability mode
+
+### P4: Candidate Package Closure
+
+1. classify what is truly ready for a first candidate path
+2. package the accepted `P1` / `P2` / `P3` evidence into one bounded candidate package
+3. turn carry-forwards into explicit candidate bounds or hard gates
+4. state clearly what still remains before production readiness
+
+Goal:
+
+- finish `Phase 08` with one explicit candidate package, not just a collection of accepted slices
+
+Verification mechanism:
+
+- evidence map:
+  - every candidate claim must point to accepted evidence from `P1` / `P2` / `P3`
+- tester validation:
+  - verify each candidate claim is supported by accepted evidence
+  - reject any claim that exceeds the proven boundary
+- manager validation:
+  - verify the candidate statement is explicit, bounded, and not confused with production readiness
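
The evidence-map check above can be sketched in Go. This is only an illustration: the `Claim` type, the evidence ID strings, and `unsupportedClaims` are hypothetical names, not part of the repository.

```go
package main

import (
	"fmt"
	"sort"
)

// Claim is a hypothetical record pairing a candidate claim with the
// evidence IDs it cites (for example "P2/catch-up-one-chain").
type Claim struct {
	Text     string
	Evidence []string
}

// unsupportedClaims returns the text of every claim that cites no evidence
// or cites at least one evidence ID that is not in the accepted set.
func unsupportedClaims(claims []Claim, accepted map[string]bool) []string {
	var rejected []string
	for _, c := range claims {
		ok := len(c.Evidence) > 0
		for _, id := range c.Evidence {
			if !accepted[id] {
				ok = false
				break
			}
		}
		if !ok {
			rejected = append(rejected, c.Text)
		}
	}
	sort.Strings(rejected)
	return rejected
}

func main() {
	// Illustrative evidence IDs only; real IDs would come from the phase docs.
	accepted := map[string]bool{"P1/live-assignment": true, "P2/catch-up-one-chain": true}
	claims := []Claim{
		{Text: "catch-up runs as one live chain", Evidence: []string{"P2/catch-up-one-chain"}},
		{Text: "production-ready", Evidence: nil},
	}
	fmt.Println("claims to reject:", unsupportedClaims(claims, accepted))
}
```

A claim with an empty evidence list is rejected on purpose, matching the rule that a prose-only summary without an evidence-to-claim mapping is not acceptable.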
+
+Output artifacts:
+
+1. candidate-path statement in `phase-08.md`
+2. candidate/gate decision record in `phase-08-decisions.md`
+3. concise candidate package summary:
+   - candidate-safe capabilities
+   - explicit bounds
+   - deferred / blocking items
+4. concise residual-gap summary:
+   - candidate-safe
+   - intentionally bounded
+   - still deferred / still blocking
+5. short module/package boundary summary for later phases:
+   - what is already strong enough
+   - what moves to the next heavy engineering phase
+
+Efficiency note:
+
+- `P4` should mostly consume already accepted evidence, not create broad new engineering work
+- only add implementation work if a small remaining blocker must be closed to make the candidate statement coherent
+- if a gap is real but not worth closing in `Phase 08`, classify it explicitly rather than expanding scope implicitly
+- `P4` exists inside `Phase 08` so the next phase can begin with substantial engineering work, not a light packaging-only round
+
+Validation focus:
+
+- make the candidate-path boundary explicit:
+  - what is proven
+  - what is intentionally bounded
+  - what is still deferred
+- make the candidate package explicit:
+  - candidate-safe capability list
+  - evidence-to-claim mapping
+  - short module/package boundary summary
+- make the committed-truth decision explicit:
+  - accepted for the chosen `RF=2 sync_all` candidate path
+  - still unclassified for future paths / durability modes unless separately proven
+- prove the accepted product path can be described as an engineering candidate, not only as a set of slice-local proofs
+- provide one explicit residual-gap list that separates:
+  - candidate-safe bounds
+  - future hardening work
+  - production blockers
+
+Reject if:
+
+- `P4` reopens protocol design instead of closing engineering gaps
+- candidate claims are broader than the proven path
+- carry-forwards remain informal notes rather than bounds or gates
+- production readiness is implied from candidate readiness
+- `P4` produces only prose summary without an evidence-to-claim mapping
+- `P4` is too thin to leave the next phase with substantial engineering closure work
+
+Status:
+
+- accepted
+- the first candidate package is now explicit for the chosen path
+- accepted evidence includes:
+  - candidate-safe claims mapped to accepted `P1` / `P2` / `P3` evidence
+  - explicit bounds for `RF=2 sync_all`
+  - explicit deferred / blocking items before production use
+  - committed-truth decision scoped to the chosen candidate path
+  - short module/package boundary summary for the next heavy engineering phase
+- accepted judgment:
+  - candidate-safe-with-bounds
+  - not production-ready

 ## Guardrails

@ -195,12 +480,13 @@ Especially:

 ### Guardrail 5: The committed-truth carry-forward must become a gate, not a note

-Before the next phase, `Phase 08` must decide one of:
+For the chosen `RF=2 sync_all` candidate path, this gate is now decided:

-1. committed-truth separation is mandatory before a production-candidate phase
-2. the first candidate path is intentionally bounded to the currently proven pre-checkpoint replay behavior
+1. `CommittedLSN = WALHeadLSN`
+2. `CheckpointLSN` remains the durable base-image boundary
+3. this separation is accepted as sufficient for the candidate-path hardening boundary

-It must not remain an unclassified carry-forward.
+For future paths or durability modes, the gate must still be classified explicitly rather than carried forward informally.
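
As a rough illustration of the gate decided above, the following Go sketch checks the accepted relationship between the three log positions. `LSNState` and `checkCommittedTruthGate` are hypothetical names; the extra check that `CheckpointLSN` never exceeds `CommittedLSN` is an assumption added for the sketch, not something the decision text states.

```go
package main

import "fmt"

// LSNState is a hypothetical snapshot of the log positions named in the gate.
type LSNState struct {
	WALHeadLSN    uint64
	CommittedLSN  uint64
	CheckpointLSN uint64
}

// checkCommittedTruthGate returns nil when the state matches the accepted
// RF=2 sync_all candidate-path relationship:
//   - CommittedLSN equals WALHeadLSN
//   - CheckpointLSN is tracked separately and (assumption) does not run
//     ahead of CommittedLSN
func checkCommittedTruthGate(s LSNState) error {
	if s.CommittedLSN != s.WALHeadLSN {
		return fmt.Errorf("CommittedLSN %d != WALHeadLSN %d", s.CommittedLSN, s.WALHeadLSN)
	}
	if s.CheckpointLSN > s.CommittedLSN {
		return fmt.Errorf("CheckpointLSN %d is ahead of CommittedLSN %d", s.CheckpointLSN, s.CommittedLSN)
	}
	return nil
}

func main() {
	s := LSNState{WALHeadLSN: 120, CommittedLSN: 120, CheckpointLSN: 90}
	fmt.Println("gate error:", checkCommittedTruthGate(s))
}
```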

 ## Exit Criteria

@ -214,41 +500,36 @@ Phase 08 is done when:
 6. operational/debug evidence is sufficient for pre-production use
 7. the remaining gap to a production candidate is small and explicit

+Phase-close note:
+
+- `Phase 08` is now closed
+- next phase:
+  - `Phase 09: Production Execution Closure`
+  - start with `P0` planning for real execution completeness:
+    - real `TransferFullBase`
+    - real `TransferSnapshot`
+    - real `TruncateWAL`
+    - stronger live runtime execution ownership
+
 ## Assignment For `sw`

-Next tasks:
+Current next tasks:

-1. drive `Phase 08 P1` as real master/control delivery integration
-2. replace direct `AssignmentIntent` construction for the first live path
-3. preserve through the real control path:
-   - stable `ReplicaID`
-   - epoch fencing
-   - address-change invalidation
-4. include at least one real failover / promotion / reassignment validation target
-5. keep acceptance claims scoped:
-   - real control delivery path
-   - not yet general production readiness
-6. keep explicit carry-forwards:
-   - `CommittedLSN != CheckpointLSN` still unresolved
-   - integrated catch-up execution chain still incomplete
-   - rebuild execution still incomplete
+1. close out `Phase 08` bookkeeping only if any wording drift remains
+2. move to `Phase 09 P0` planning for production execution closure
+3. focus the next heavy engineering package on:
+   - real `TransferFullBase`
+   - real `TransferSnapshot`
+   - real `TruncateWAL`
+   - stronger live runtime execution ownership

 ## Assignment For `tester`

-Next tasks:
-
-1. use the accepted `Phase 08` plan framing as the `P1` validation oracle
-2. validate real control delivery for:
-   - live assignment delivery
-   - stable identity through the control path
-   - stale epoch/session invalidation
-   - at least one real failover / reassignment cycle
-3. keep the no-overclaim rule active around:
-   - catch-up semantics
-   - rebuild execution
-   - master/control delivery
-4. keep the committed-truth gate explicit:
-   - still unresolved in `P1`
-5. prepare `P2` follow-up expectations for:
-   - integrated engine -> executor -> `v2bridge` execution closure
-   - unified replay after `P1` and `P2`
+Current next tasks:
+
+1. treat `Phase 08` as closed after any final wording/bookkeeping sync
+2. prepare the `Phase 09 P0` validation oracle for production execution closure
+3. keep the no-overclaim rule active around:
+   - validation-grade transfer vs production-grade transfer
+   - truncation execution
+   - stronger runtime ownership vs current bounded path
--- a/sw-block/.private/phase/phase-09-decisions.md
+++ b/sw-block/.private/phase/phase-09-decisions.md
@ -0,0 +1,26 @@
+# Phase 09 Decisions
+
+## Decision 1: Phase 09 is production execution closure, not packaging
+
+The candidate-path packaging/judgment work remains inside `Phase 08 P4`.
+
+`Phase 09` starts directly with substantial backend engineering closure.
+
+## Decision 2: The first Phase 09 targets are real transfer, truncation, and stronger runtime ownership
+
+The initial heavy execution blockers are:
+
+1. real `TransferFullBase`
+2. real `TransferSnapshot`
+3. real `TruncateWAL`
+4. stronger live runtime execution ownership
+
+## Decision 3: Phase 09 remains bounded to the chosen candidate path unless evidence forces expansion
+
+Default scope remains:
+
+1. `RF=2`
+2. `sync_all`
+3. existing master / volume-server heartbeat path
+
+Future paths or durability modes should not be absorbed casually into this phase.
--- a/sw-block/.private/phase/phase-09-log.md
+++ b/sw-block/.private/phase/phase-09-log.md
@ -0,0 +1,269 @@
+# Phase 09 Log
+
+## 2026-03-31
+
+### Opened
+
+`Phase 09` opened as:
+
+- production execution closure
+
+### Starting basis
+
+1. `Phase 08`: closed
+2. chosen candidate path exists for `RF=2 sync_all`
+3. main remaining heavy engineering work is backend production-grade execution
+
+### Next
+
+1. `Phase 09 P0` planning for:
+   - real `TransferFullBase`
+   - real `TransferSnapshot`
+   - real `TruncateWAL`
+   - stronger live runtime execution ownership
+
+### P0 Technical Pack
+
+Purpose:
+
+- provide the minimum design/algorithm/test detail needed to start `Phase 09`
+- keep the work centered on backend production execution closure
+- avoid broad scope growth into control-plane redesign or product-surface work
+
+#### Execution-closure target
+
+`Phase 09` should close this gap:
+
+- current path is candidate-safe but still partially validation-grade
+- next path must be backend-production-grade for the chosen `RF=2 sync_all` path
+
+This phase should not try to make every surrounding product surface complete.
+It should make the backend execution path real enough that later phases can build on it.
+
+#### What "real" means in this phase
+
+Use these definitions.
+
+1. Real `TransferFullBase`
+   - not only "extent is accessible"
+   - must read and transfer real block/base contents through the execution path
+   - completion must depend on the transfer actually occurring
+
+2. Real `TransferSnapshot`
+   - not only "checkpoint exists and is readable"
+   - must read and transfer the snapshot/base image through the execution path
+   - tail replay must remain aligned with the transferred snapshot boundary
+
+3. Real `TruncateWAL`
+   - not only "replica ahead detected"
+   - must execute the physical correction required by the chosen path
+   - completion must depend on truncation having actually happened
+
+4. Stronger live runtime execution ownership
+   - V2 recovery execution should be driven by a stronger live runtime path than test-only orchestration
+   - the volume-server path should own plan / execute / cancel / cleanup more directly
+   - avoid split ownership where tests prove the path but the running service still does not drive it coherently
+
+#### Recommended slice order inside Phase 09
+
+Keep the phase substantial, but still ordered by dependency:
+
+1. `P1` full-base execution closure
+   - make `TransferFullBase` real
+   - prove rebuild path no longer depends on accessibility-only validation
+
+2. `P2` snapshot execution closure
+   - make `TransferSnapshot` real
+   - prove snapshot/tail rebuild path can use a real transferred base
+
+3. `P3` truncation execution closure
+   - make `TruncateWAL` real
+   - prove replica-ahead path is executable, not only detectable
+
+4. `P4` stronger live runtime ownership
+   - move the accepted execution logic closer to the real volume-server/runtime loop
+   - prove cleanup / cancel / replacement under the stronger live path
+
+This order is recommended because:
+
+1. transfer closure is the largest production blocker
+2. truncation depends on a clearer execution contract
+3. runtime ownership should build on real execution, not on validation-grade stubs
+
+#### Design rules
+
+1. engine still owns policy
+   - do not move catch-up / rebuild / truncation decision logic into `v2bridge` or `blockvol`
+
+2. `v2bridge` owns real execution translation
+   - implement real transfer/truncate behavior there or through bounded runtime hooks
+   - keep it as the execution adapter, not the policy owner
+
+3. `blockvol` owns storage/runtime reality
+   - WAL
+   - checkpoint
+   - extent/snapshot data
+   - low-level execution primitives
+
+4. chosen-path bounds remain explicit
+   - `RF=2`
+   - `sync_all`
+   - existing master / volume-server heartbeat path
+
+#### Primary reuse / update targets
+
+For each target, state whether the expected action is:
+
+1. `update in place`
+2. `reference only`
+3. `copy is allowed`
+
+For `Phase 09`, the main targets are:
+
+1. `weed/storage/blockvol/v2bridge/executor.go`
+   - action: `update in place`
+   - why:
+     - this is the direct V2 execution adapter
+     - real transfer/truncate closure belongs here first
+   - boundary:
+     - add real execution behavior
+     - do not add policy decisions here
+
+2. `weed/storage/blockvol/blockvol.go`
+   - action: `update in place`
+   - why:
+     - authoritative runtime/storage hooks live here
+     - transfer/truncate/recovery primitives may need to be exposed or tightened here
+   - boundary:
+     - expose/execute real runtime behavior
+     - do not let old replication semantics redefine V2 truth
+
+3. `weed/storage/blockvol/rebuild.go`
+   - action: `reference only` first, then `update in place` if needed
+   - why:
+     - existing rebuild transport/runtime reality may be reused
+   - boundary:
+     - reuse transfer reality
+     - keep rebuild-source choice and recovery policy in the V2 engine
+
+4. `weed/server/volume_server_block.go`
+   - action: `update in place`
+   - why:
+     - stronger live runtime execution ownership will likely terminate here
+   - boundary:
+     - strengthen live orchestration/runtime handoff
+     - do not collapse V2 boundaries into server-local convenience semantics
+
+5. `weed/storage/blockvol/v2bridge/control.go`
+   - action: `reference only` for `Phase 09` unless execution work exposes a real gap
+   - why:
+     - `Phase 09` is not the main control-plane closure phase
+
+6. product surfaces (`CSI`, `NVMe`, `iSCSI`)
+   - action: `reference only`
+   - why:
+     - not in scope for this phase
+     - avoid accidental scope growth
+
+7. old V1 shipper/rebuild execution paths
+   - action: `reference only`; copying is not the default
+   - why:
+     - they are reality/integration references
+     - not the default semantic template for V2
+
+When `sw` submits a slice package in this phase, include a short reuse note:
+
+- files updated in place
+- files used as references only
+- any file copied from older code and why copy was safer than in-place update
+
+#### Validation expectations
+
+For each execution closure target, require:
+
+1. one-chain proof
+   - engine plan
+   - engine executor/runtime driver
+   - `v2bridge`
+   - real `blockvol` operation
+   - completion
+   - cleanup
+
+2. physical-effect proof
+   - the operation must do real transfer/truncate work, not only validation
+
+3. fail-closed behavior
+   - partial failure
+   - cancellation
+   - replacement
+   - all of the above must release resources correctly
+
+4. observability
+   - logs explain:
+     - why the execution started
+     - what exact execution path ran
+     - why it completed / failed / cancelled
+
+#### Suggested validation package
+
+Keep the package focused:
+
+1. one full-base execution test
+2. one snapshot execution test
+3. one truncation execution test
+4. one live runtime ownership test
+5. one cleanup/adversarial package covering:
+   - cancel
+   - replacement
+   - partial failure
+
+Avoid:
+
+1. broad matrix growth before these closures are real
+2. product-surface tests (`CSI` / `NVMe` / `iSCSI`) in this phase
+3. new protocol exploration
+
+#### Assignment template for `sw`
+
+1. Goal
+   - Build the `Phase 09` production-execution closure package for the chosen candidate path.
+
+2. Required outputs
+   - explicit definition of "real" for each target:
+     - `TransferFullBase`
+     - `TransferSnapshot`
+     - `TruncateWAL`
+     - stronger runtime ownership
+   - slice/package order inside `Phase 09`
+   - implementation plan for the first heavy closure target
+   - expected tests/evidence for each target
+
+3. Hard rules
+   - no protocol redesign
+   - no broad scope growth into product surfaces
+   - no moving policy logic into `v2bridge` / `blockvol`
+   - keep the chosen-path bound explicit
+
+4. Delivery order
+   - first hand to architect review
+   - only after architect review passes, hand to tester validation
+
+5. Reject before handoff if
+   - "real" is still defined as accessibility-only validation
+   - target order ignores execution dependency
+   - runtime ownership remains too vague to verify
+
+#### Assignment template for `tester`
+
+1. Goal
+   - Validate that the `Phase 09` execution-closure plan is concrete enough to support substantial engineering work.
+
+2. Validate
+   - each target has a real/physical-effect definition
+   - each target has a one-chain proof expectation
+   - cleanup and fail-closed expectations are explicit
+   - the phase remains bounded to the chosen path
+
+3. Output
+   - pass/fail on execution-target clarity
+   - findings on vague "real" definitions, missing cleanup proofs, or hidden scope growth
--- a/sw-block/.private/phase/phase-09.md
+++ b/sw-block/.private/phase/phase-09.md
@ -0,0 +1,127 @@
+# Phase 09
+
+Date: 2026-03-31
+Status: active
+Purpose: turn the accepted candidate-safe backend path into a production-grade execution path without reopening accepted V2 recovery semantics
+
+## Why This Phase Exists
+
+`Phase 08` closed:
+
+1. real control delivery on the chosen path
+2. real one-chain catch-up and rebuild closure on the chosen path
+3. unified hardening replay on the accepted live path
+4. one bounded candidate package for `RF=2 sync_all`
+
+What still does not exist is production-grade execution completeness.
+
+The main remaining gap is no longer:
+
+1. whether the path is candidate-safe
+
+It is now:
+
+1. whether the backend execution path is production-grade rather than validation-grade
+
+## Phase Goal
+
+Close the main backend execution gaps so the chosen path is no longer blocked by validation-grade transfer/truncation behavior.
+
+## Scope
+
+### In scope
+
+1. real `TransferFullBase`
+2. real `TransferSnapshot`
+3. real `TruncateWAL`
+4. stronger live runtime execution ownership on the volume-server path
+
+### Out of scope
+
+1. broad control-plane redesign
+2. `RF>2`
+3. `best_effort` / `sync_quorum` recovery semantics
+4. product-surface rebinding (`CSI` / `NVMe` / `iSCSI`)
+5. broad performance optimization
+
+## Phase 09 Items
+
+### P0: Production Execution Closure Plan
+
+1. convert the accepted candidate package into a production-execution closure plan
+2. define the minimum execution blockers that must be closed in this phase
+3. order the execution work by dependency and risk
+4. keep the chosen-path bound explicit while making the backend path production-grade
+
+Goal:
+
+- start `Phase 09` with one substantial execution-closure plan, not another light packaging round
+
+Must prove:
+
+1. the phase is centered on real backend execution work
+2. the required closures are explicit:
+   - `TransferFullBase`
+   - `TransferSnapshot`
+   - `TruncateWAL`
+   - stronger runtime ownership
+3. the phase remains bounded to the chosen candidate path unless new evidence expands it
+
+Verification mechanism:
+
+1. architect review:
+   - phase shape is substantial and outcome-based
+   - work is ordered by real engineering dependency
+2. tester review:
+   - validation expectations are explicit for each execution closure target
+3. manager review:
+   - the phase is large enough to justify a full engineering round
+
+Output artifacts:
+
+1. explicit execution-closure target list
+2. explicit execution blocker list
+3. initial slice/package order inside `Phase 09`
+
+Execution note:
+
+- use `phase-09-log.md` as the technical pack for:
+  - the definition of "real" for each execution target
+  - recommended slice order
+  - validation expectations
+  - assignment templates for `sw` and `tester`
+
+Reject if:
+
+1. `Phase 09` is framed as another packaging/documentation phase
+2. execution blockers remain implicit
+3. the phase quietly expands into product surfaces or unrelated control-plane work
+4. the phase has no clear verification mechanism
+
+## Assignment For `sw`
+
+Current next tasks:
+
+1. define the concrete execution-closure package for `Phase 09`
+2. specify what "real" means for:
+   - `TransferFullBase`
+   - `TransferSnapshot`
+   - `TruncateWAL`
+3. specify how stronger live runtime execution ownership should work on the volume-server path
+4. keep the phase bounded to the chosen candidate path unless new evidence forces expansion
+5. hand the package to architect review before tester work begins
+
+## Assignment For `tester`
+
+Current next tasks:
+
+1. prepare the validation oracle for production execution closure
+2. require explicit validation targets for:
+   - real transfer behavior
+   - truncation execution
+   - cleanup on success/failure/cancel
+   - stronger live runtime ownership
+3. keep the no-overclaim rule active around:
+   - validation-grade vs production-grade execution
+   - chosen path vs future paths/modes
+4. review only after architect pre-review passes
--- a/sw-block/design/README.md
+++ b/sw-block/design/README.md
@ -22,7 +22,10 @@ Current WAL V2 design set:
 - `v2-engine-slicing-plan.md`
 - `v2-protocol-truths.md`
 - `v2-production-roadmap.md`
+- `v2-product-completion-overview.md`
+- `v2-phase-development-plan.md`
 - `phase-07-service-slice-plan.md`
+- `phase-08-engine-skeleton-map.md`
 - `agent_dev_process.md`

 These documents are the working design home for the V2 line.
--- a/sw-block/design/agent_dev_process.md
+++ b/sw-block/design/agent_dev_process.md
@ -101,6 +101,10 @@ Each delivery should include:
 3. resources acquired/released
 4. test inventory
 5. known carry-forward notes
+6. reuse note:
+   - files updated in place
+   - files used as references only
+   - files copied and why

 This template is required between:

@ -126,6 +130,9 @@ Test inventory:

 Carry-forward notes:
 - ...
+
+Reuse note:
+- ...
 ```

 ## Phase Doc Usage
@ -153,6 +160,10 @@ Use for:
 3. carry-forward discussion
 4. open observations
 5. why wording or scope changed
+6. slice-level reuse instructions:
+   - `update in place`
+   - `reference only`
+   - `copy is allowed`

 This document may be longer and more detailed.

@ -290,6 +301,20 @@ Any such reuse should be reviewed explicitly as:
 3. temporary carry-forward
 4. hard gate before later phases

+### Rule 6: Every substantial slice should declare reuse instructions
+
+Before implementation grows, the slice package should state:
+
+1. which existing files are expected to be updated in place
+2. which existing files are reference-only
+3. whether any copying is allowed and why
+
+This helps prevent:
+
+1. accidental scope growth
+2. unclear ownership of old files
+3. hidden semantic inheritance from V1/V1.5 paths
+
 ## Current Direction

 The project has moved from exploration-heavy work to evidence-first engine work.
--- a/sw-block/design/phase-08-engine-skeleton-map.md
+++ b/sw-block/design/phase-08-engine-skeleton-map.md
@ -0,0 +1,301 @@
+# Phase 08 Engine Skeleton Map
+
+Date: 2026-03-31
+Status: active
+Purpose: provide a short structural map for the `Phase 08` hardening path so implementation can move faster without reopening accepted V2 boundaries
+
+## Scope
+
+This is not the final standalone `sw-block` architecture.
+
+It is the shortest useful engine skeleton for the accepted `Phase 08` hardening path:
+
+- `RF=2`
+- `sync_all`
+- existing `Seaweed` master / volume-server heartbeat path
+- V2 engine owns recovery policy
+- `blockvol` remains the execution backend
+
+## Module Map
+
+### 1. Control plane
+
+Role:
+
+- authoritative control truth
+
+Primary sources:
+
+- `weed/server/master_grpc_server.go`
+- `weed/server/master_block_registry.go`
+- `weed/server/master_block_failover.go`
+- `weed/server/volume_grpc_client_to_master.go`
+
+What it produces:
+
+- confirmed assignment
+- `Epoch`
+- target `Role`
+- failover / promotion / reassignment result
+- stable server identity
+
+### 2. Control bridge
+
+Role:
+
+- translate real control truth into V2 engine intent
+
+Primary files:
+
+- `weed/storage/blockvol/v2bridge/control.go`
+- `sw-block/bridge/blockvol/control_adapter.go`
+- entry path in `weed/server/volume_server_block.go`
+
+What it produces:
+
+- `AssignmentIntent`
+- stable `ReplicaID`
+- `Endpoint`
+- `SessionKind`
+
+### 3. Engine runtime
+
+Role:
+
+- recovery-policy core
+
+Primary files:
+
+- `sw-block/engine/replication/orchestrator.go`
+- `sw-block/engine/replication/driver.go`
+- `sw-block/engine/replication/executor.go`
+- `sw-block/engine/replication/sender.go`
+- `sw-block/engine/replication/history.go`
+
+What it decides:
+
+- zero-gap / catch-up / needs-rebuild
+- sender/session ownership
+- stale authority rejection
+- resource acquisition / release
+- rebuild source selection
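
A minimal sketch of the zero-gap / catch-up / needs-rebuild decision follows, written in Go. Everything here is an assumption for illustration: the `classify` function, its parameter names, and the exact boundary conditions are not taken from the engine code, and the replica-ahead outcome is included only because the `Phase 09` truncation target refers to it.

```go
package main

import "fmt"

// Outcome is a hypothetical label for the engine's recovery decision.
type Outcome string

const (
	ZeroGap      Outcome = "zero-gap"
	CatchUp      Outcome = "catch-up"
	NeedsRebuild Outcome = "needs-rebuild"
	ReplicaAhead Outcome = "replica-ahead"
)

// classify compares the replica's last applied LSN with the primary's WAL.
// Assumptions: retainedFrom is the oldest LSN still retained in the primary
// WAL, headLSN is the newest, and catch-up needs every entry from
// replicaLSN+1 through headLSN to still be retained.
func classify(replicaLSN, retainedFrom, headLSN uint64) Outcome {
	switch {
	case replicaLSN > headLSN:
		return ReplicaAhead
	case replicaLSN == headLSN:
		return ZeroGap
	case replicaLSN+1 >= retainedFrom:
		return CatchUp
	default:
		return NeedsRebuild
	}
}

func main() {
	for _, replica := range []uint64{10, 7, 2, 12} {
		fmt.Println("replica LSN", replica, "->", classify(replica, 5, 10))
	}
}
```

Keeping this decision in one pure function reflects the rule that the engine owns policy, while the bridge and `blockvol` only supply the LSN facts and execute the result.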
+
+### 4. Storage bridge
+
+Role:
+
+- translate real blockvol storage truth and execution capability into engine-facing adapters
+
+Primary files:
+
+- `weed/storage/blockvol/v2bridge/reader.go`
+- `weed/storage/blockvol/v2bridge/pinner.go`
+- `weed/storage/blockvol/v2bridge/executor.go`
+- `sw-block/bridge/blockvol/storage_adapter.go`
+
+What it provides:
+
+- `RetainedHistory`
+- WAL retention pin / release
+- snapshot pin / release
+- full-base pin / release
+- WAL scan execution
+
+### 5. Block runtime
+
+Role:
+
+- execute real I/O
+
+Primary files:
+
+- `weed/storage/blockvol/blockvol.go`
+- `weed/storage/blockvol/replica_apply.go`
+- `weed/storage/blockvol/replica_barrier.go`
+- `weed/storage/blockvol/recovery.go`
+- `weed/storage/blockvol/rebuild.go`
+- `weed/storage/blockvol/wal_shipper.go`
+
+What it owns:
+
+- WAL
+- extent
+- flusher
+- checkpoint / superblock
+- receiver / shipper
+- rebuild server
+
+## Execution Order
+
+### Control path
+
+```text
+master heartbeat / failover truth
+  -> BlockVolumeAssignment
+  -> volume server ProcessAssignments
+  -> v2bridge control conversion
+  -> engine ProcessAssignment
+  -> sender/session state updated
+```
+
+### Catch-up path
+
+```text
+assignment accepted
+  -> engine reads retained history
+  -> engine plans catch-up
+  -> storage bridge pins WAL retention
+  -> engine executor drives v2bridge executor
+  -> blockvol scans WAL / ships entries
+  -> engine completes session
+```
+
+### Rebuild path
+
+```text
+assignment accepted
+  -> engine detects NeedsRebuild
+  -> engine selects rebuild source
+  -> storage bridge pins snapshot/full-base/tail
+  -> executor drives transfer path
+  -> blockvol performs restore / replay work
+  -> engine completes rebuild
+```
+
+### Local durability path
+
+```text
+WriteLBA / Trim
+  -> WAL append
+  -> shipping / barrier
+  -> client-visible durability decision
+  -> flusher writes extent
+  -> checkpoint advances
+  -> retention floor decides WAL reclaimability
+```
+
+## Interim Fields
+
+These are currently acceptable only as explicit hardening carry-forwards:
+
+### `localServerID`
+
+Current source:
+
+- `BlockService.listenAddr`
+
+Meaning:
+
+- temporary local identity source for replica/rebuild-side assignment translation
+
+Status:
+
+- interim only
+- should become registry-assigned stable server identity later
+
+### `CommittedLSN = CheckpointLSN`
+
+Current source:
+
+- `v2bridge.Reader` / `BlockVol.StatusSnapshot()`
+
+Meaning:
+
+- current V1-style interim mapping where committed truth collapses to local checkpoint truth
+
+Status:
+
+- not final V2 truth
+- must become a gate decision before a production-candidate phase
+
+### heartbeat as control carrier
+
+Current source:
+
+- existing master <-> volume-server heartbeat path
+
+Meaning:
+
+- current transport for assignment/control delivery
+
+Status:
+
+- acceptable as current carrier
+- not yet a final proof that no separate control channel will ever be needed
+
+## Hard Gates
+
+These should remain explicit in `Phase 08`:
+
+### Gate 1: committed truth
+
+Before production-candidate:
+
+- either separate `CommittedLSN` from `CheckpointLSN`
+- or explicitly bound the first candidate path to currently proven pre-checkpoint replay behavior
+
+### Gate 2: live control delivery
+
+Required:
+
+- real assignment delivery must reach the engine on the live path
+- not only converter-level proof
+
+### Gate 3: integrated catch-up closure
+
+Required:
+
+- engine -> executor -> `v2bridge` -> blockvol must be proven as one live chain
+- not planner proof plus direct WAL-scan proof as separate evidence
+
+### Gate 4: first rebuild execution path
+
+Required:
+
+- rebuild must not remain only a detection outcome
+- the chosen product path needs one real executable rebuild closure
+
+### Gate 5: unified replay
+
+Required:
+
+- after control and execution closure land, rerun the accepted failure-class set on the unified live path
+
+## Reuse Map
+
+### Reuse directly
+
+- `weed/server/master_grpc_server.go`
+- `weed/server/volume_grpc_client_to_master.go`
+- `weed/server/volume_server_block.go`
+- `weed/server/master_block_registry.go`
+- `weed/server/master_block_failover.go`
+- `weed/storage/blockvol/blockvol.go`
+- `weed/storage/blockvol/replica_apply.go`
+- `weed/storage/blockvol/replica_barrier.go`
+- `weed/storage/blockvol/v2bridge/`
+
+### Reuse as implementation reality, not truth
+
+- `shipperGroup`
+- `RetentionFloorFn`
+- `ReplicaReceiver`
+- checkpoint/superblock machinery
+- existing failover heuristics
+
+### Do not inherit as V2 semantics
+
+- address-shaped identity
+- old degraded/catch-up intuition from V1/V1.5
+- `CommittedLSN = CheckpointLSN` as final truth
+- blockvol-side recovery policy decisions
+
+## Short Rule
+
+Use this skeleton as:
+
+- a hardening map for the current product path
+
+Do not mistake it for:
+
+- the final standalone `sw-block` architecture
--- a/sw-block/design/v2-phase-development-plan.md
+++ b/sw-block/design/v2-phase-development-plan.md
@ -0,0 +1,231 @@
+# V2 Phase Development Plan
+
+Date: 2026-03-31
+Status: active
+Purpose: define the execution-oriented phase plan after the current candidate-path work, with explicit module status and target phase ownership
+
+## Why This Document Exists
+
+The project now needs a development plan that is:
+
+1. phase-oriented
+2. execution-oriented
+3. large enough to avoid overhead-heavy micro-slices
+4. explicit about which module belongs to which future phase
+
+This document is the planning bridge between:
+
+1. `v2-product-completion-overview.md`
+2. `../.private/phase/phase-08.md`
+3. future implementation phases
+
+## Planning Rules
+
+Use these rules for all later phases:
+
+1. one phase should close one meaningful product/engineering outcome
+2. every phase must have a clear verification mechanism
+3. phases should prefer real code/test/evidence over wording-only progress
+4. later phases may reuse V1 engineering reality, but must not inherit V1 recovery semantics as truth
+5. a phase is too small if it does not move the overall product-completion state clearly
+
+## Current Baseline
+
+Current accepted/closing path through `Phase 08`:
+
+1. protocol/algorithm truth set is strong
+2. engine recovery core is strong
+3. real control delivery exists on the chosen path
+4. real one-chain catch-up and rebuild closure exist on the chosen path
+5. unified hardening validation exists on the chosen path
+6. one bounded candidate statement exists for:
+   - `RF=2`
+   - `sync_all`
+   - existing master / volume-server heartbeat path
+
+Phase-accounting note:
+
+1. this document assumes the current `Phase 08` path and bookkeeping are being closed consistently
+2. if `Phase 08` bookkeeping is still open, read the candidate statement items above as the current accepted/closing path, not as a fully closed phase label
+
+This means the next phases should focus mainly on:
+
+1. production-grade execution completeness
+2. stronger runtime ownership
+3. stronger control-plane closure
+4. later product-surface rebinding
+5. production hardening
+
+## Phase Roadmap
+
+### Phase 09: Production Execution Closure
+
+Goal:
+
+1. turn validation-grade backend execution into production-grade backend execution
+
+Must prove:
+
+1. full-base rebuild performs real data transfer
+2. snapshot rebuild performs real image transfer
+3. replica-ahead path is physically executable, not only detected
+4. runtime execution ownership is stronger than the current bounded candidate path
+
+Typical outputs:
+
+1. real `TransferFullBase`
+2. real `TransferSnapshot`
+3. real `TruncateWAL`
+4. stronger executor/runtime integration in the live volume-server path
+
+Verification mechanism:
+
+1. one-chain execution tests on real backend paths
+2. cleanup assertions after success/failure/cancel
+3. focused adversarial tests for truncation and rebuild execution
+
+Workload:
+
+1. large
+2. this is likely the single biggest remaining engineering phase
+
+### Phase 10: Real Control-Plane Closure
+
+Goal:
+
+1. strengthen from accepted assignment-entry closure to fuller end-to-end control-plane closure
+
+Must prove:
+
+1. heartbeat/gRPC-level delivery is real for the chosen path
+2. failover / reassignment state converges through the real control path
+3. local and remote identity are consistent enough for product use
+
+Typical outputs:
+
+1. stronger heartbeat/gRPC delivery proof
+2. stronger result/reporting convergence
+3. cleaner local identity than transport-shaped `listenAddr`
+
+Verification mechanism:
+
+1. real failover/reassignment tests at the fuller control-plane level
+2. identity/fencing assertions through the end-to-end path
+
+Workload:
+
+1. medium-large
+
+### Phase 11: Product Surface Rebinding
+
+Goal:
+
+1. bind product-facing surfaces onto the V2-backed block path after backend closure is strong enough
+
+Must prove:
+
+1. the V2-backed backend can support selected product surfaces without semantic drift
+2. reuse of V1 surfaces does not reintroduce V1 recovery truth
+
+Candidate areas:
+
+1. snapshot product path
+2. `CSI`
+3. `NVMe`
+4. `iSCSI`
+
+Verification mechanism:
+
+1. selected surface integration tests
+2. product-surface contract checks
+3. no-overclaim review confirming that the surface does not imply unsupported backend capability
+
+Workload:
+
+1. medium-large
+2. can be split by product surface if needed, but only after backend closure is strong
+
+### Phase 12: Production Hardening
+
+Goal:
+
+1. move from candidate-safe to production-safe
+
+Must prove:
+
+1. restart/recovery stability under repeated disturbance
+2. long-run/soak viability
+3. operational diagnosability
+4. an explicit, acceptable list of remaining production blockers, or a passed production-ready gate
+
+Verification mechanism:
+
+1. soak/adversarial runs
+2. failover/restart under disturbance
+3. runbook/debug validation
+
+Workload:
+
+1. large
+
+## Module Status Map
+
+
+| Module area                                                   | Current status               | Current owner phase      | Next target phase | Notes                                                                                      |
+| ------------------------------------------------------------- | ---------------------------- | ------------------------ | ----------------- | ------------------------------------------------------------------------------------------ |
+| `sw-block/engine/replication` core FSM/orchestrator/driver    | Strong                       | `Phase 08` accepted      | `Phase 09`        | Main later work is runtime/product execution closure, not new core semantics.              |
+| Engine executor real I/O boundary (`CatchUpIO` / `RebuildIO`) | Strong on chosen path        | `Phase 08 P2/P3`         | `Phase 09`        | Keep the boundary; make underlying transfer/truncate production-grade.                     |
+| `weed/storage/blockvol/v2bridge/control.go`                   | Strong on chosen path        | `Phase 08 P1`            | `Phase 10`        | Next step is fuller control-plane closure, not new mapping semantics.                      |
+| `weed/storage/blockvol/v2bridge/reader.go`                    | Strong                       | `Phase 08 P2/P3`         | `Phase 09/10`     | Keep comments/status aligned with candidate-path committed-truth decision.                 |
+| `weed/storage/blockvol/v2bridge/pinner.go`                    | Strong                       | `Phase 08 P1/P3`         | `Phase 09`        | Retention safety proven; later work is product-grade execution under that safety.          |
+| `weed/storage/blockvol/v2bridge/executor.go` WAL scan         | Strong                       | `Phase 08 P2`            | `Phase 09`        | Real scan is good; later work is real transfer/truncate completeness.                      |
+| `v2bridge` `TransferFullBase`                                 | Partial                      | `Phase 08 P2/P4`         | `Phase 09`        | Validation-grade now; target is real production streaming.                                 |
+| `v2bridge` `TransferSnapshot`                                 | Partial                      | `Phase 08 P2/P4`         | `Phase 09`        | Validation-grade now; target is real image transfer.                                       |
+| `v2bridge` `TruncateWAL`                                      | Weak/stub                    | `Phase 08 P4` bound      | `Phase 09`        | Must become a real executable path.                                                        |
+| `weed/server/volume_server_block.go` V2 assignment intake     | Medium-strong                | `Phase 08 P1`            | `Phase 09/10`     | Real intake exists; later work is stronger runtime ownership + fuller control-plane proof. |
+| `blockvol` WAL/flusher/checkpoint runtime                     | Reuse reality                | Existing production code | `Phase 09`        | Reuse implementation; do not let old semantics redefine V2 truth.                          |
+| `blockvol` rebuild transport/server reality                   | Reuse with redesign boundary | Existing production code | `Phase 09`        | Good area for production execution closure work.                                           |
+| local server identity (`localServerID`)                       | Partial                      | `Phase 08` bounded       | `Phase 10`        | Still transport-shaped; should become cleaner under control-plane closure.                 |
+| Snapshot product path                                         | Partial/reuse candidate      | not core in `Phase 08`   | `Phase 11`        | Reuse implementation, but V2 semantics own placement and claims.                           |
+| `CSI` integration                                             | Deferred reuse candidate     | not core in `Phase 08`   | `Phase 11`        | Product surface, not next core closure target.                                             |
+| `NVMe` / `iSCSI` front-ends                                   | Deferred reuse candidate     | not core in `Phase 08`   | `Phase 11`        | Rebind after backend path is stronger.                                                     |
+| Testrunner / infra / metrics                                  | Strong support layer         | existing                 | `Phase 10-12`     | Reuse to validate later control-plane and hardening phases.                                |
+
+
+## Completion-State Targets
+
+Use these rough targets to judge whether a phase is moving the product meaningfully.
+
+
+| Phase      | Expected completion move                                                      |
+| ---------- | ----------------------------------------------------------------------------- |
+| `Phase 09` | from validation-grade backend execution to production-grade backend execution |
+| `Phase 10` | from bounded control-entry proof to stronger end-to-end control-plane closure |
+| `Phase 11` | from backend-ready path to selected product-surface readiness                 |
+| `Phase 12` | from candidate-safe to production-safe                                        |
+
+
+## Near-Term Execution Direction
+
+If the goal is to maximize product completion efficiently, the recommended order is:
+
+1. finish `Phase 08` bookkeeping cleanly
+2. `Phase 09` production execution closure
+3. `Phase 10` real control-plane closure
+4. `Phase 11` product surface rebinding
+5. `Phase 12` production hardening
+
+The most important near-term engineering weight should go to `Phase 09`.
+
+## Short Summary
+
+The V2 line already has a real bounded candidate path.
+The next development plan should treat later work as product-completion phases, not as further protocol discovery.
+
+The main heavy engineering work still ahead is:
+
+1. production-grade execution
+2. stronger runtime/control closure
+3. later product-surface rebinding
+4. production hardening
+
--- a/sw-block/design/v2-product-completion-overview.md
+++ b/sw-block/design/v2-product-completion-overview.md
@ -0,0 +1,267 @@
+# V2 Product Completion Overview
+
+Date: 2026-03-31
+Status: active
+Purpose: provide one product-level overview of current V2 engineering completion, V1 reuse strategy, and the roadmap from the accepted candidate path to a production-ready block engine
+
+## Why This Document Exists
+
+The project now has enough accepted V2 algorithm, engine, and hardening evidence that the next question is no longer only:
+
+1. is the protocol correct
+
+It is also:
+
+1. how complete is the product path
+2. which parts are already strong
+3. which parts can reuse V1 engineering
+4. which parts still require major implementation work
+5. which future phases actually move product completion
+
+This document is the product-completion view.
+
+It complements:
+
+1. `v2-protocol-truths.md` for accepted semantics
+2. `v2-production-roadmap.md` for the older roadmap ladder
+3. `../.private/phase/phase-08.md` for the current phase contract
+
+## Current Position
+
+The accepted first candidate path is:
+
+1. `RF=2`
+2. `sync_all`
+3. existing master / volume-server heartbeat path
+4. V2 engine owns recovery policy
+5. `v2bridge` translates real storage/control truth
+6. `blockvol` remains the execution backend
+
+This means the project is no longer at "algorithm only".
+It already has:
+
+1. accepted protocol truths
+2. accepted engine execution closure
+3. accepted hardening replay on a real integrated path
+4. one bounded candidate statement
+
+## Engineering Completion Snapshot
+
+These levels are rough engineering estimates, not exact percentages.
+
+| Area | Current level | Notes |
+|------|---------------|-------|
+| Algorithm / protocol truths | Strong | Core V2 semantics are accepted and should remain stable unless contradicted by live evidence. |
+| Simulator / prototype evidence | Strong | Main failure classes and protocol boundaries are already well-exercised. |
+| Engine recovery core | Strong | Sender/session/orchestrator/driver/executor are substantially implemented. |
+| Weed bridge integration | Strong | Reader / pinner / control / executor are real and tested on the chosen path. |
+| Integrated candidate path | Medium-strong | `Phase 08` `P1` + `P2` + `P3` together prove one bounded candidate path. |
+| Runtime ownership inside live server loop | Medium | Real intake exists, but product-grade recovery ownership is not yet closed. |
+| Production-grade data transfer | Medium-weak | Validation-grade transfer exists; full production byte streaming is still incomplete. |
+| Truncation / replica-ahead execution | Weak | Detection exists; full execution path is still incomplete. |
+| End-to-end control-plane closure | Medium | `ProcessAssignments()` is real; full heartbeat/gRPC proof is still bounded. |
+| Product surfaces (`CSI`, `NVMe`, `iSCSI`, snapshot productization) | Partial | Mostly reuse candidates, but not the current core closure target. |
+| Production hardening / ops | Partial | Candidate-level evidence exists; production-grade hardening is still ahead. |
+
+## Reuse Strategy
+
+Use this rule:
+
+1. if a component decides truth, V2 must own it
+2. if a component consumes truth, V1 engineering can often be reused
+
+### V2-owned semantics
+
+These should not inherit V1 semantics casually:
+
+1. recovery choice: `zero_gap` / `catchup` / `needs_rebuild`
+2. sender/session ownership and fencing
+3. stable `ReplicaID` and stale-authority rejection
+4. committed/checkpoint interpretation
+5. rebuild-source choice and recovery outcome meaning
+
+### V1 engineering that is usually reusable
+
+These are implementation/reality layers, not protocol truth:
+
+1. `blockvol` storage runtime
+2. WAL / flusher / checkpoint machinery
+3. real assignment receive/apply path
+4. front-end adapters such as `NVMe` / `iSCSI`
+5. much of `CSI` lifecycle integration
+6. monitoring / metrics / test harness infrastructure
+
+### Reuse with explicit bounds
+
+These can reuse implementation, but their semantic placement must remain V2-owned:
+
+1. snapshot export / checkpoint plumbing
+2. rebuild transport / extent read path
+3. master/heartbeat/control delivery path
+
+## Module Treatment Overview
+
+| Module area | Current treatment | Near-term plan |
+|-------------|-------------------|----------------|
+| Recovery engine | V2-owned | Continue closing runtime/product path under accepted semantics. |
+| `v2bridge` | V2 boundary adapter | Keep expanding real I/O/runtime closure without leaking policy downward. |
+| `blockvol` WAL/flusher/runtime | Reuse reality | Reuse implementation, but do not let V1 replication semantics redefine V2 truth. |
+| Snapshot capability | Reuse implementation, V2-owned semantics | Do not make this a main near-term phase goal until core execution/runtime closure is stronger. |
+| `CSI` | Later product surface | Rebind after the V2-backed candidate path is stable enough. |
+| `NVMe` / `iSCSI` | Later product surface | Reuse as front-end adapters once the backend candidate path is stronger. |
+| Rebuild server / transfer mechanisms | Reuse with redesign boundary | Good candidate for later production execution closure work. |
+| Control plane | Reuse existing path | Continue from `ProcessAssignments()` toward stronger end-to-end closure. |
+
+## What The Candidate Path Already Proves
+
+For the chosen `RF=2 sync_all` path, the project can already claim:
+
+1. stable remote identity across address change when `ServerID` is present
+2. stale epoch/session fencing through the integrated path
+3. real catch-up one-chain closure on the chosen path
+4. rebuild control/execution chain proven on the chosen path
+   - validation-grade execution closure
+   - not yet production-grade block/image streaming
+5. replay of accepted failure classes on the unified live path
+6. one real failover / reassignment cycle
+7. one true simultaneous-overlap retention safety proof
+8. committed/checkpoint separation accepted for this candidate path:
+   - `CommittedLSN = WALHeadLSN`
+   - `CheckpointLSN` remains the durable base-image boundary
+
+## What Is Still Missing For Product Completion
+
+The biggest remaining product-completion gaps are:
+
+1. production-grade rebuild data transfer
+   - `TransferFullBase` must become real streaming, not only accessibility validation
+   - `TransferSnapshot` must become real image streaming, not only checkpoint validation
+2. replica-ahead physical correction
+   - `TruncateWAL` must stop being a stub
+3. stronger live runtime ownership
+   - the V2 recovery driver/executors should become a more complete live runtime path, not only a bounded hardening path
+4. stronger control-plane closure
+   - current proof reaches `ProcessAssignments()`
+   - full heartbeat/gRPC-level closure is still bounded
+5. product-surface rebinding
+   - `CSI`
+   - `NVMe`
+   - `iSCSI`
+   - snapshot product path
+6. production hardening
+   - restart / soak / repeated disturbance / diagnosis quality
+
+## Recommended Completion Roadmap
+
+### Stage 1: Finish Phase 08 cleanly
+
+Target:
+
+1. close candidate-path judgment with explicit bounds and package it cleanly inside `Phase 08 P4`
+
+Main output:
+
+1. one accepted candidate package for the chosen path
+
+### Stage 2: Phase 09 Production Execution Closure
+
+Target:
+
+1. turn validation-grade execution into production-grade execution
+
+Main work:
+
+1. real `TransferFullBase`
+2. real `TransferSnapshot`
+3. real `TruncateWAL`
+4. stronger runtime ownership of recovery execution
+
+Why it matters:
+
+This is the largest remaining engineering block between "candidate-safe-with-bounds" and a production-grade product path.
+
+### Stage 3: Phase 10 Real Control-Plane Closure
+
+Target:
+
+1. strengthen from accepted assignment-entry closure to fuller end-to-end control-plane closure
+
+Main work:
+
+1. heartbeat/gRPC-level proof
+2. stronger control/result convergence
+3. more complete identity for local and remote server roles
+
+### Stage 4: Phase 11 Product Surface Rebinding
+
+Target:
+
+1. connect product-facing surfaces to the V2-backed block path
+
+Candidate areas:
+
+1. snapshot product path
+2. `CSI`
+3. `NVMe`
+4. `iSCSI`
+
+Rule:
+
+Do this after the backend engine/runtime path is strong enough, not before.
+
+### Stage 5: Phase 12 Production Hardening
+
+Target:
+
+1. move from candidate-safe to production-safe
+
+Main work:
+
+1. soak / restart / repeated failover
+2. operational diagnosis quality
+3. performance floor and cost characterization
+4. explicit production blockers / rollout gates
+
+## Completion Gates
+
+The most important gates from here are:
+
+1. execution gate
+   - validation-grade transfer/truncation must become production-grade
+2. runtime ownership gate
+   - V2 recovery must be a stronger live runtime path, not only a bounded tested path
+3. control-plane gate
+   - stronger end-to-end control delivery proof
+4. product-surface gate
+   - front-end surfaces should only rebind after backend correctness is strong enough
+5. production-hardening gate
+   - restart, soak, diagnosis, and repeated disturbance must be acceptable
+
+## Near-Term Planning Guidance
+
+If the goal is to maximize product completion efficiently:
+
+1. do not make `CSI`, `NVMe`, or broad snapshot productization the immediate next heavy phase
+2. first close production execution gaps in the backend path
+3. then strengthen control-plane closure
+4. then rebind product surfaces
+
+In short:
+
+1. backend truth and execution first
+2. product surfaces second
+3. production hardening last
+
+## Short Summary
+
+The V2 line is already beyond "algorithm only".
+It has a real bounded candidate path.
+
+The remaining work is still substantial, and it is mostly engineering:
+
+1. production-grade execution
+2. stronger runtime/control closure
+3. product-surface rebinding
+4. production hardening
+
+That is the practical path from the current candidate-safe engine to a production-ready block product.