diff --git a/sw-block/docs/archive/design/README.md b/sw-block/docs/archive/design/README.md new file mode 100644 index 000000000..aaeb5e5c2 --- /dev/null +++ b/sw-block/docs/archive/design/README.md @@ -0,0 +1,28 @@ +# Design Archive + +This directory contains historical `sw-block` design/planning documents that are still worth keeping as references, but are no longer the main entrypoints for current work. + +Use `sw-block/design/` for active design and process documents. +Use `sw-block/.private/phase/` for current phase contracts, logs, and slice-level execution packages. + +## Archived Here + +- `v2-production-roadmap.md` +- `v2-engine-readiness-review.md` +- `v2-engine-slicing-plan.md` +- `v2-prototype-roadmap-and-gates.md` +- `phase-07-service-slice-plan.md` +- `phase-08-engine-skeleton-map.md` +- `v2-first-slice-session-ownership.md` +- `v2-first-slice-sender-ownership.md` +- `a5-a8-traceability.md` + +## Why Archived + +These documents are useful for: + +1. historical decision context +2. earlier slice/phase rationale +3. traceability for passed reviews and planning gates + +They are not the canonical source for the current phase roadmap. diff --git a/sw-block/docs/archive/design/a5-a8-traceability.md b/sw-block/docs/archive/design/a5-a8-traceability.md new file mode 100644 index 000000000..7f1c44ff7 --- /dev/null +++ b/sw-block/docs/archive/design/a5-a8-traceability.md @@ -0,0 +1,117 @@ +# A5-A8 Acceptance Traceability + +Date: 2026-03-29 +Status: historical evidence traceability + +## Purpose + +Map each acceptance criterion to specific executable evidence. +Two evidence layers: +- **Simulator** (distsim): protocol-level proof +- **Prototype** (enginev2): ownership/session-level proof + +--- + +## A5: Non-Convergent Catch-Up Escalates Explicitly + +**Must prove**: tail-chasing or failed catch-up does not pretend success. + +**Pass condition**: explicit `CatchingUp → NeedsRebuild` transition. 
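As a minimal illustration of this pass condition (type and function names here are illustrative, not the real engine API): bounded catch-up either converges within its round budget or exits through an explicit `NeedsRebuild` escalation, with no branch that reports success while the gap is still open.

```go
package main

import "fmt"

// Outcome is an explicit catch-up result. There is deliberately no
// "assume success" value: the only exits are CaughtUp or NeedsRebuild.
type Outcome int

const (
	CaughtUp Outcome = iota
	NeedsRebuild
)

func (o Outcome) String() string {
	if o == CaughtUp {
		return "CaughtUp"
	}
	return "NeedsRebuild"
}

// runCatchUp models tail-chasing: each round ships shipPerRound entries
// while the primary appends writeRate new ones. If the gap does not close
// within maxRounds, the only exit is an explicit NeedsRebuild escalation.
func runCatchUp(gap, shipPerRound, writeRate, maxRounds int) Outcome {
	for round := 0; round < maxRounds; round++ {
		gap -= shipPerRound // entries shipped this round
		gap += writeRate    // new entries appended meanwhile
		if gap <= 0 {
			return CaughtUp
		}
	}
	return NeedsRebuild // non-convergent: escalate, never pretend success
}

func main() {
	fmt.Println(runCatchUp(100, 50, 10, 5)) // ships faster than writes: converges
	fmt.Println(runCatchUp(100, 10, 10, 5)) // tail-chasing: gap never closes
}
```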
+ +| Evidence | Test | File | Layer | Status | +|----------|------|------|-------|--------| +| Tail-chasing converges or aborts | `TestS6_TailChasing_ConvergesOrAborts` | `cluster_test.go` | distsim | PASS | +| Tail-chasing non-convergent → NeedsRebuild | `TestS6_TailChasing_NonConvergent_EscalatesToNeedsRebuild` | `phase02_advanced_test.go` | distsim | PASS | +| Catch-up timeout → NeedsRebuild | `TestP03_CatchupTimeout_EscalatesToNeedsRebuild` | `phase03_timeout_test.go` | distsim | PASS | +| Reservation expiry aborts catch-up | `TestReservationExpiryAbortsCatchup` | `cluster_test.go` | distsim | PASS | +| Flapping budget exceeded → NeedsRebuild | `TestP02_S5_FlappingExceedsBudget_EscalatesToNeedsRebuild` | `phase02_advanced_test.go` | distsim | PASS | +| Catch-up converges or escalates (I3) | `TestI3_CatchUpConvergesOrEscalates` | `phase045_crash_test.go` | distsim | PASS | +| Catch-up timeout in enginev2 | `TestE2E_NeedsRebuild_Escalation` | `p2_test.go` | enginev2 | PASS | + +**Verdict**: A5 is well-covered. Both simulator and prototype prove explicit escalation. No pretend-success path exists. + +--- + +## A6: Recoverability Boundary Is Explicit + +**Must prove**: recoverable vs unrecoverable gap is decided explicitly. + +**Pass condition**: recovery aborts when reservation/payload availability is lost; rebuild is explicit fallback. 
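A minimal sketch of that boundary decision (field and function names are illustrative): catch-up is allowed only when the source still holds its retention reservation and the retained WAL contiguously covers the replica's gap; everything else falls through to an explicit rebuild rather than an optimistic attempt.

```go
package main

import "fmt"

// RetainedHistory is an illustrative stand-in for what the real
// RecoverableLSN()-style check consults: reservation state plus the
// contiguous retained WAL window.
type RetainedHistory struct {
	ReservationHeld bool
	WALStartLSN     uint64 // oldest retained, contiguous WAL entry
	HeadLSN         uint64 // newest appended entry
}

// recoverable reports whether a replica at replicaLSN can be caught up
// from retained history alone. false means the explicit rebuild fallback.
func recoverable(h RetainedHistory, replicaLSN uint64) bool {
	if !h.ReservationHeld {
		return false // reservation lost: abort catch-up explicitly
	}
	// The gap (replicaLSN, HeadLSN] must sit inside retained WAL coverage.
	return replicaLSN+1 >= h.WALStartLSN
}

func main() {
	h := RetainedHistory{ReservationHeld: true, WALStartLSN: 40, HeadLSN: 100}
	fmt.Println(recoverable(h, 50)) // gap starts at 51, WAL retained from 40: recoverable
	fmt.Println(recoverable(h, 30)) // entries 31..39 already reclaimed: rebuild
	h.ReservationHeld = false
	fmt.Println(recoverable(h, 50)) // reservation expired: abort, rebuild
}
```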
+ +| Evidence | Test | File | Layer | Status | +|----------|------|------|-------|--------| +| Reservation expiry aborts catch-up | `TestReservationExpiryAbortsCatchup` | `cluster_test.go` | distsim | PASS | +| WAL GC beyond replica → NeedsRebuild | `TestI5_CheckpointGC_PreservesAckedBoundary` | `phase045_crash_test.go` | distsim | PASS | +| Rebuild from snapshot + tail | `TestReplicaRebuildFromSnapshotAndTail` | `cluster_test.go` | distsim | PASS | +| Smart WAL: resolvable → unresolvable | `TestP02_SmartWAL_RecoverableThenUnrecoverable` | `phase02_advanced_test.go` | distsim | PASS | +| Time-varying payload availability | `TestP02_SmartWAL_TimeVaryingAvailability` | `phase02_advanced_test.go` | distsim | PASS | +| RecoverableLSN is replayability proof | `RecoverableLSN()` in `storage.go` | `storage.go` | distsim | Implemented | +| Handshake outcome: NeedsRebuild | `TestExec_HandshakeOutcome_NeedsRebuild_InvalidatesSession` | `execution_test.go` | enginev2 | PASS | + +**Verdict**: A6 is covered. Recovery boundary is decided by explicit reservation + recoverability check, not by optimistic assumption. `RecoverableLSN()` verifies contiguous WAL coverage. + +--- + +## A7: Historical Data Correctness Holds + +**Must prove**: recovered data for target LSN is historically correct; current extent cannot fake old history. + +**Pass condition**: snapshot + tail rebuild matches reference; current-extent reconstruction of old LSN fails correctness. 
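The shape of this check can be sketched as follows (`CanReconstructAt()` is real in the simulator's `storage.go`; the fields and logic below are an illustrative reduction, not that code): state at a target LSN is reconstructable only from a snapshot at or before that LSN plus contiguous WAL covering the span up to it. Once the covering WAL is gone, the current extent cannot stand in for old history.

```go
package main

import "fmt"

// Store is an illustrative reduction of the recoverability inputs:
// a consistent snapshot point plus the retained WAL window.
type Store struct {
	SnapshotLSN uint64 // consistent snapshot point
	WALStartLSN uint64 // oldest retained WAL entry
	WALHeadLSN  uint64 // newest WAL entry
}

// canReconstructAt reports whether historical state at target is provably
// reconstructable: snapshot at/before target, plus retained WAL covering
// every entry in (SnapshotLSN, target].
func canReconstructAt(s Store, target uint64) bool {
	if s.SnapshotLSN > target {
		return false // snapshot is newer than the target: cannot rewind
	}
	if s.SnapshotLSN == target {
		return true // snapshot alone is the exact historical state
	}
	// Need WAL entries SnapshotLSN+1 .. target, all still retained.
	return s.WALStartLSN <= s.SnapshotLSN+1 && target <= s.WALHeadLSN
}

func main() {
	s := Store{SnapshotLSN: 50, WALStartLSN: 51, WALHeadLSN: 100}
	fmt.Println(canReconstructAt(s, 80)) // snapshot@50 + WAL 51..80: reconstructable
	s.WALStartLSN = 60                   // GC removed entries 51..59
	fmt.Println(canReconstructAt(s, 80)) // covering history is gone: refused
	fmt.Println(canReconstructAt(s, 40)) // target predates the snapshot: refused
}
```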
+ +| Evidence | Test | File | Layer | Status | +|----------|------|------|-------|--------| +| Snapshot + tail matches reference | `TestReplicaRebuildFromSnapshotAndTail` | `cluster_test.go` | distsim | PASS | +| Historical state not reconstructable after GC | `TestA7_HistoricalState_NotReconstructableAfterGC` | `phase045_crash_test.go` | distsim | PASS | +| `CanReconstructAt()` rejects faked history | `CanReconstructAt()` in `storage.go` | `storage.go` | distsim | Implemented | +| Checkpoint does not leak applied state | `TestI2_CheckpointDoesNotLeakAppliedState` | `phase045_crash_test.go` | distsim | PASS | +| Extent-referenced resolvable records | `TestExtentReferencedResolvableRecordsAreRecoverable` | `cluster_test.go` | distsim | PASS | +| Extent-referenced unresolvable → rebuild | `TestExtentReferencedUnresolvableForcesRebuild` | `cluster_test.go` | distsim | PASS | +| ACK'd flush recoverable after crash (I1) | `TestI1_AckedFlush_RecoverableAfterPrimaryCrash` | `phase045_crash_test.go` | distsim | PASS | + +**Verdict**: A7 is now covered with the Phase 4.5 crash-consistency additions. The critical gap ("current extent cannot fake old history") is proven by `CanReconstructAt()` + `TestA7_HistoricalState_NotReconstructableAfterGC`. + +--- + +## A8: Durability Mode Semantics Are Correct + +**Must prove**: best_effort, sync_all, sync_quorum behave as intended under mixed replica states. + +**Pass condition**: sync_all strict, sync_quorum commits only with true durable quorum, invalid topology rejected. 
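The three modes reduce to one commit rule, sketched below (the mode names come from this document; everything else is illustrative): acknowledgement requires the mode's quota of truly durable replica state at the write's LSN, never mere connectivity.

```go
package main

import "fmt"

// Mode names follow the document; the decision logic is an illustrative
// reduction, not the real engine's commit path.
type Mode int

const (
	BestEffort Mode = iota
	SyncQuorum
	SyncAll
)

// canCommit reports whether a write at LSN n may be acknowledged, given
// the durable LSN of the primary and of each replica. Durability is
// counted from persisted LSNs, not from which replicas are connected.
func canCommit(mode Mode, n, primaryLSN uint64, replicaLSNs []uint64) bool {
	durable := 0
	if primaryLSN >= n {
		durable++
	}
	for _, r := range replicaLSNs {
		if r >= n {
			durable++
		}
	}
	total := 1 + len(replicaLSNs)
	switch mode {
	case SyncAll:
		return durable == total // strict: one lagging replica blocks commit
	case SyncQuorum:
		return durable > total/2 // true durable majority
	default: // BestEffort
		return primaryLSN >= n // local durability only
	}
}

func main() {
	lagging := []uint64{10, 4} // second replica is behind
	fmt.Println(canCommit(SyncAll, 10, 10, lagging))    // blocked by the laggard
	fmt.Println(canCommit(SyncQuorum, 10, 10, lagging)) // 2 of 3 durable: commits
	fmt.Println(canCommit(BestEffort, 10, 10, lagging)) // local durability suffices
}
```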
+ +| Evidence | Test | File | Layer | Status | +|----------|------|------|-------|--------| +| sync_quorum continues with one lagging | `TestSyncQuorumContinuesWithOneLaggingReplica` | `cluster_test.go` | distsim | PASS | +| sync_all blocks with one lagging | `TestSyncAllBlocksWithOneLaggingReplica` | `cluster_test.go` | distsim | PASS | +| sync_quorum mixed states | `TestSyncQuorumWithMixedReplicaStates` | `cluster_test.go` | distsim | PASS | +| sync_all mixed states | `TestSyncAllBlocksWithMixedReplicaStates` | `cluster_test.go` | distsim | PASS | +| Barrier timeout: sync_all blocked | `TestP03_BarrierTimeout_SyncAll_Blocked` | `phase03_timeout_test.go` | distsim | PASS | +| Barrier timeout: sync_quorum commits | `TestP03_BarrierTimeout_SyncQuorum_StillCommits` | `phase03_timeout_test.go` | distsim | PASS | +| Promotion uses RecoverableLSN | `EvaluateCandidateEligibility()` | `cluster.go` | distsim | Implemented | +| Promoted replica has committed prefix (I4) | `TestI4_PromotedReplica_HasCommittedPrefix` | `phase045_crash_test.go` | distsim | PASS | + +**Verdict**: A8 is well-covered. sync_all is strict (blocks on lagging), sync_quorum uses true durable quorum (not connection count). Promotion now uses `RecoverableLSN()` for committed-prefix check. + +--- + +## Summary + +| Criterion | Simulator Evidence | Prototype Evidence | Status | +|-----------|-------------------|-------------------|--------| +| A5 (catch-up escalation) | 6 tests | 1 test | **Strong** | +| A6 (recoverability boundary) | 6 tests + RecoverableLSN() | 1 test | **Strong** | +| A7 (historical correctness) | 7 tests + CanReconstructAt() | — | **Strong** (new in Phase 4.5) | +| A8 (durability modes) | 7 tests + RecoverableLSN() | — | **Strong** | + +**Total executable evidence**: 26 simulator tests + 2 prototype tests + 2 new storage methods. + +All A5-A8 acceptance criteria have direct test evidence. No criterion depends solely on design-doc claims. 
+ +--- + +## Still Open (Not Blocking) + +| Item | Priority | Why not blocking | +|------|----------|-----------------| +| Predicate exploration / adversarial search | P2 | Manual scenarios already cover known failure classes | +| Catch-up convergence under sustained load | P2 | I3 proves escalation; load-rate modeling is optimization | +| A5-A8 in a single grouped runner view | P3 | Traceability doc serves as grouped evidence for now | diff --git a/sw-block/docs/archive/design/phase-07-service-slice-plan.md b/sw-block/docs/archive/design/phase-07-service-slice-plan.md new file mode 100644 index 000000000..3f0ece875 --- /dev/null +++ b/sw-block/docs/archive/design/phase-07-service-slice-plan.md @@ -0,0 +1,403 @@ +# Phase 07 Service-Slice Plan + +Date: 2026-03-30 +Status: historical phase-planning artifact +Scope: `Phase 07 P0` + +## Purpose + +Define the first real-system service slice that will host the V2 engine, choose the first concrete integration path in the existing codebase, and map engine adapters onto real modules. + +This is a planning document. It does not claim the integration already works. + +## Decision + +The first service slice should be: + +- a single `blockvol` primary on a real volume server +- with one replica target (`RF=2` path) +- driven by the existing master heartbeat / assignment loop +- using the V2 engine only for replication recovery ownership / planning / execution + +This is the narrowest real-system slice that still exercises: + +1. real assignment delivery +2. real epoch and failover signals +3. real volume-server lifecycle +4. real WAL/checkpoint/base-image truth +5. real changed-address / reconnect behavior + +It is narrow enough to avoid reopening the whole system, but real enough to stop hiding behind engine-local mocks. + +## Why This Slice + +This slice is the right first integration target because: + +1. `weed/server/master_grpc_server.go` already delivers block-volume assignments over heartbeat +2. 
`weed/server/master_block_failover.go` already owns failover / promotion / pending rebuild decisions +3. `weed/storage/blockvol/blockvol.go` already owns the current replication runtime (`shipperGroup`, receiver, WAL retention, checkpoint state) +4. the existing V1/V1.5 failure history is concentrated in exactly this master <-> volume-server <-> blockvol path + +So this slice gives maximum validation value with minimum new surface. + +## First Concrete Integration Path + +The first integration path should be: + +1. master receives volume-server heartbeat +2. master updates block registry and emits `BlockVolumeAssignment` +3. volume server receives assignment +4. block volume adapter converts assignment + local storage state into V2 engine inputs +5. V2 engine drives sender/session/recovery state +6. existing block-volume runtime executes the actual data-path work under engine decisions + +In code, that path starts here: + +- master side: + - `weed/server/master_grpc_server.go` + - `weed/server/master_block_failover.go` + - `weed/server/master_block_registry.go` +- volume / storage side: + - `weed/storage/blockvol/blockvol.go` + - `weed/storage/blockvol/recovery.go` + - `weed/storage/blockvol/wal_shipper.go` + - assignment-handling code under `weed/storage/blockvol/` +- V2 engine side: + - `sw-block/engine/replication/` + +## Service-Slice Boundaries + +### In-process placement + +The V2 engine should initially live: + +- in-process with the volume server / `blockvol` runtime +- not in master +- not as a separate service yet + +Reason: + +- the engine needs local access to storage truth and local recovery execution +- master should remain control-plane authority, not recovery executor + +### Control-plane boundary + +Master remains authoritative for: + +1. epoch +2. role / assignment +3. promotion / failover decision +4. replica membership + +The engine consumes these as control inputs. It does not replace master failover policy in `Phase 07`. 
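The consumption side of this boundary can be sketched as follows (hypothetical type and field names, not the real bridge API): the engine takes the master's epoch/role/assignment as input and fences on epoch, but it never originates failover decisions itself.

```go
package main

import "fmt"

// Assignment is a hypothetical reduction of a confirmed master control
// input: which volume, under which epoch, with which role.
type Assignment struct {
	VolumeID  uint32
	Epoch     uint64
	IsPrimary bool
}

// Engine holds only the last epoch it has confirmed; promotion and
// membership decisions stay with the master.
type Engine struct {
	confirmedEpoch uint64
}

// Apply accepts the assignment as control input or rejects it as stale.
// The engine does not re-decide failover policy; it only fences on epoch.
func (e *Engine) Apply(a Assignment) bool {
	if a.Epoch < e.confirmedEpoch {
		return false // stale assignment from a pre-failover master view
	}
	e.confirmedEpoch = a.Epoch
	return true
}

func main() {
	e := &Engine{}
	fmt.Println(e.Apply(Assignment{VolumeID: 7, Epoch: 3, IsPrimary: true}))  // accepted
	fmt.Println(e.Apply(Assignment{VolumeID: 7, Epoch: 2, IsPrimary: false})) // stale: rejected
	fmt.Println(e.Apply(Assignment{VolumeID: 7, Epoch: 3, IsPrimary: false})) // same epoch: accepted
}
```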
+ +### Control-Over-Heartbeat Upgrade Path + +For the first V2 product path, the recommended direction is: + +- reuse the existing master <-> volume-server heartbeat path as the control carrier +- upgrade the block-specific control semantics carried on that path +- do not immediately invent a separate control service or assignment channel + +Why: + +1. this is the real Seaweed path already carrying block assignments and confirmations today +2. this gives the fastest route to a real integrated control path +3. it preserves compatibility with existing Seaweed master/volume-server semantics while V2 hardens its own control truth + +Concretely, the current V1 path already provides: + +1. block assignments delivered in heartbeat responses from `weed/server/master_grpc_server.go` +2. assignment application on the volume server in `weed/server/volume_grpc_client_to_master.go` and `weed/server/volume_server_block.go` +3. assignment confirmation and address-change refresh driven by later heartbeats in `weed/server/master_grpc_server.go` and `weed/server/master_block_registry.go` +4. immediate block heartbeat on selected shipper state changes in `weed/server/volume_grpc_client_to_master.go` + +What should be upgraded for V2 is not mainly the transport, but the control contract carried on it: + +1. stable `ReplicaID` +2. explicit `Epoch` +3. explicit role / assignment authority +4. explicit apply/confirm semantics +5. explicit stale assignment rejection +6. 
explicit address-change refresh as endpoint change, not identity change + +Current cadence note: + +- the block volume heartbeat is periodic (`5 * sleepInterval`) with some immediate state-change heartbeats +- this is acceptable as the first hardening carrier +- it should not be assumed to be the final control responsiveness model + +Deferred design decision: + +- whether block control should eventually move beyond heartbeat-only carriage into a more explicit control/assignment channel should be decided only after the `Phase 08 P1` real control-delivery path exists and can be measured + +That later decision should be based on: + +1. failover / reassignment responsiveness +2. assignment confirmation precision +3. operational complexity +4. whether heartbeat carriage remains too coarse for the block-control path + +Until then, the preferred direction is: + +- strengthen block control semantics over the existing heartbeat path +- do not prematurely create a second control plane + +### Storage boundary + +`blockvol` remains authoritative for: + +1. WAL head / retention reality +2. checkpoint/base-image reality +3. actual catch-up streaming +4. actual rebuild transfer / restore operations + +The engine consumes these as storage truth and recovery execution capabilities. It does not replace the storage backend in `Phase 07`. + +## First-Slice Identity Mapping + +This must be explicit in the first integration slice. + +For `RF=2` on the existing master / block registry path: + +- stable engine `ReplicaID` should be derived from: + - `/` +- not from: + - `DataAddr` + - `CtrlAddr` + - heartbeat transport endpoint + +For this slice, the adapter should map: + +1. `ReplicaID` +- from master/block-registry identity for the replica host entry + +2. `Endpoint` +- from the current replica receiver/data/control addresses reported by the real runtime + +3. `Epoch` +- from the confirmed master assignment for the volume + +4. 
`SessionKind` +- from master-driven recovery intent / role transition outcome + +This is a hard first-slice requirement because address refresh must not collapse identity back into endpoint-shaped keys. + +## Adapter Mapping + +### 1. ControlPlaneAdapter + +Engine interface today: + +- `HandleHeartbeat(serverID, volumes)` +- `HandleFailover(deadServerID)` + +Real mapping should be: + +- master-side source: + - `weed/server/master_grpc_server.go` + - `weed/server/master_block_failover.go` + - `weed/server/master_block_registry.go` +- volume-server side sink: + - assignment receive/apply path in `weed/storage/blockvol/` + +Recommended real shape: + +- do not literally push raw heartbeat messages into the engine +- instead introduce a thin adapter that converts confirmed master assignment state into: + - stable `ReplicaID` + - endpoint set + - epoch + - recovery target kind + +That keeps master as control owner and the engine as execution owner. + +Important note: + +- the adapter should treat heartbeat as the transport carrier, not as the final protocol shape +- block-control semantics should be made explicit over that carrier +- if a later phase concludes that heartbeat-only carriage is too coarse, that should be a separate design decision after the real hardening path is measured + +### 2. 
StorageAdapter + +Engine interface today: + +- `GetRetainedHistory()` +- `PinSnapshot(lsn)` / `ReleaseSnapshot(pin)` +- `PinWALRetention(startLSN)` / `ReleaseWALRetention(pin)` +- `PinFullBase(committedLSN)` / `ReleaseFullBase(pin)` + +Real mapping should be: + +- retained history source: + - current WAL head/tail/checkpoint state from `weed/storage/blockvol/blockvol.go` + - recovery helpers in `weed/storage/blockvol/recovery.go` +- WAL retention pin: + - existing retention-floor / replica-aware WAL retention machinery around `shipperGroup` +- snapshot pin: + - existing snapshot/checkpoint artifacts in `blockvol` +- full-base pin: + - explicit pinned full-extent export or equivalent consistent base handle from `blockvol` + +Important constraint: + +- `Phase 07` must not fake this by reconstructing `RetainedHistory` from tests or metadata alone + +### 3. Execution Driver / Executor hookup + +Engine side already has: + +- planner/executor split in `sw-block/engine/replication/driver.go` +- stepwise executors in `sw-block/engine/replication/executor.go` + +Real mapping should be: + +- engine planner decides: + - zero-gap / catch-up / rebuild + - trusted-base requirement + - replayable-tail requirement +- blockvol runtime performs: + - actual WAL catch-up transport + - actual snapshot/base transfer + - actual truncation / apply operations + +Recommended split: + +- engine owns contract and state transitions +- blockvol adapter owns concrete I/O work + +## First-Slice Acceptance Rule + +For the first integration slice, this is a hard rule: + +- `blockvol` may execute recovery I/O +- `blockvol` must not own recovery policy + +Concretely, `blockvol` must not decide: + +1. zero-gap vs catch-up vs rebuild +2. trusted-base validity +3. replayable-tail sufficiency +4. whether rebuild fallback is required + +Those decisions must remain in the V2 engine. 
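As a sketch of that split (illustrative names, not the real planner API): blockvol reports its retained truth, and the engine alone maps that truth onto a recovery branch.

```go
package main

import "fmt"

// Plan is the engine-owned recovery decision. Blockvol never picks a
// Plan value; it only supplies the storage truth the decision reads.
type Plan int

const (
	ZeroGap Plan = iota
	CatchUp
	Rebuild
)

func (p Plan) String() string {
	return [...]string{"ZeroGap", "CatchUp", "Rebuild"}[p]
}

// decide is engine policy: walStartLSN/headLSN come from blockvol's
// retained history, but the branch choice is made here, not underneath.
func decide(replicaLSN, walStartLSN, headLSN uint64) Plan {
	switch {
	case replicaLSN >= headLSN:
		return ZeroGap // nothing to ship
	case replicaLSN+1 >= walStartLSN:
		return CatchUp // replayable tail fully covers the gap
	default:
		return Rebuild // gap predates retained WAL: formal fallback
	}
}

func main() {
	fmt.Println(decide(100, 40, 100)) // ZeroGap
	fmt.Println(decide(60, 40, 100))  // CatchUp
	fmt.Println(decide(20, 40, 100))  // Rebuild
}
```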
+ +The bridge may translate engine decisions into concrete blockvol actions, but it must not re-decide recovery policy underneath the engine. + +## First Product Path + +The first product path should be: + +- `RF=2` block volume replication on the existing heartbeat/assignment loop +- primary + one replica +- failover / reconnect / changed-address handling +- rebuild as the formal non-catch-up recovery path + +This is the right first path because it exercises the core correctness boundary without introducing N-replica coordination complexity too early. + +## What Must Be Replaced First + +Current engine-stage pieces that are still mock/test-only or too abstract: + +### Replace first + +1. `mockStorage` in engine tests +- replace with a real `blockvol`-backed `StorageAdapter` + +2. synthetic control events in engine tests +- replace with assignment-driven events from the real master/volume-server path + +3. convenience recovery completion wrappers +- keep them test-only +- real integration should use planner + executor + storage work loop + +### Can remain temporarily abstract in Phase 07 P0/P1 + +1. `ControlPlaneAdapter` exact public shape +- can remain thin while the integration path is being chosen + +2. async production scheduler details +- executor can still be driven by a service loop before full background-task architecture is finalized + +## Recommended Concrete Modules + +### Engine stays here + +- `sw-block/engine/replication/` + +### First real adapter package should be added near blockvol + +Recommended initial location: + +- `weed/storage/blockvol/v2bridge/` + +Reason: + +- keeps V2 engine independent under `sw-block/` +- keeps real-system glue close to blockvol storage truth +- avoids copying engine logic into `weed/` + +Suggested contents: + +1. `control_adapter.go` +- convert master assignment / local apply path into engine intents + +2. 
`storage_adapter.go` +- expose retained history, pin/release, trusted-base export handles from real blockvol state + +3. `executor_bridge.go` +- translate engine executor steps into actual blockvol recovery actions + +4. `observe_adapter.go` +- map engine status/logs into service-visible diagnostics + +## First Failure Replay Set For Phase 07 + +The first real-system replay set should be: + +1. changed-address restart +- current risk: old identity/address coupling reappears in service glue + +2. stale epoch / stale result after failover +- current risk: master and engine disagree on authority timing + +3. unreplayable-tail rebuild fallback +- current risk: service glue over-trusts checkpoint/base availability + +4. plan/execution cleanup after resource failure +- current risk: blockvol-side resource failures leave engine or service state dangling + +5. primary failover to replica with rebuild pending on old primary reconnect +- current risk: old V1/V1.5 semantics leak back into reconnect handling + +## Non-Goals For This Slice + +Do not use `Phase 07` to: + +1. widen catch-up semantics +2. add smart rebuild optimizations +3. redesign all blockvol internals +4. replace the full V1 runtime in one move +5. claim production readiness + +## Deliverables For Phase 07 P0 + +A good `P0` delivery should include: + +1. chosen service slice +2. chosen integration path in the current repo +3. adapter-to-module mapping +4. list of test-only adapters to replace first +5. first failure replay set +6. 
explicit note of what remains outside this first slice + +## Short Form + +`Phase 07 P0` should start with: + +- engine in `sw-block/engine/replication/` +- bridge in `weed/storage/blockvol/v2bridge/` +- first real slice = blockvol primary + one replica on the existing master heartbeat / assignment path +- `ReplicaID = /` for the first slice +- `blockvol` executes I/O but does not own recovery policy +- first product path = `RF=2` failover/reconnect/rebuild correctness diff --git a/sw-block/docs/archive/design/phase-08-engine-skeleton-map.md b/sw-block/docs/archive/design/phase-08-engine-skeleton-map.md new file mode 100644 index 000000000..312cca5e0 --- /dev/null +++ b/sw-block/docs/archive/design/phase-08-engine-skeleton-map.md @@ -0,0 +1,301 @@ +# Phase 08 Engine Skeleton Map + +Date: 2026-03-31 +Status: historical phase map +Purpose: provide a short structural map for the `Phase 08` hardening path so implementation can move faster without reopening accepted V2 boundaries + +## Scope + +This is not the final standalone `sw-block` architecture. + +It is the shortest useful engine skeleton for the accepted `Phase 08` hardening path: + +- `RF=2` +- `sync_all` +- existing `Seaweed` master / volume-server heartbeat path +- V2 engine owns recovery policy +- `blockvol` remains the execution backend + +## Module Map + +### 1. Control plane + +Role: + +- authoritative control truth + +Primary sources: + +- `weed/server/master_grpc_server.go` +- `weed/server/master_block_registry.go` +- `weed/server/master_block_failover.go` +- `weed/server/volume_grpc_client_to_master.go` + +What it produces: + +- confirmed assignment +- `Epoch` +- target `Role` +- failover / promotion / reassignment result +- stable server identity + +### 2. 
Control bridge + +Role: + +- translate real control truth into V2 engine intent + +Primary files: + +- `weed/storage/blockvol/v2bridge/control.go` +- `sw-block/bridge/blockvol/control_adapter.go` +- entry path in `weed/server/volume_server_block.go` + +What it produces: + +- `AssignmentIntent` +- stable `ReplicaID` +- `Endpoint` +- `SessionKind` + +### 3. Engine runtime + +Role: + +- recovery-policy core + +Primary files: + +- `sw-block/engine/replication/orchestrator.go` +- `sw-block/engine/replication/driver.go` +- `sw-block/engine/replication/executor.go` +- `sw-block/engine/replication/sender.go` +- `sw-block/engine/replication/history.go` + +What it decides: + +- zero-gap / catch-up / needs-rebuild +- sender/session ownership +- stale authority rejection +- resource acquisition / release +- rebuild source selection + +### 4. Storage bridge + +Role: + +- translate real blockvol storage truth and execution capability into engine-facing adapters + +Primary files: + +- `weed/storage/blockvol/v2bridge/reader.go` +- `weed/storage/blockvol/v2bridge/pinner.go` +- `weed/storage/blockvol/v2bridge/executor.go` +- `sw-block/bridge/blockvol/storage_adapter.go` + +What it provides: + +- `RetainedHistory` +- WAL retention pin / release +- snapshot pin / release +- full-base pin / release +- WAL scan execution + +### 5. 
Block runtime + +Role: + +- execute real I/O + +Primary files: + +- `weed/storage/blockvol/blockvol.go` +- `weed/storage/blockvol/replica_apply.go` +- `weed/storage/blockvol/replica_barrier.go` +- `weed/storage/blockvol/recovery.go` +- `weed/storage/blockvol/rebuild.go` +- `weed/storage/blockvol/wal_shipper.go` + +What it owns: + +- WAL +- extent +- flusher +- checkpoint / superblock +- receiver / shipper +- rebuild server + +## Execution Order + +### Control path + +```text +master heartbeat / failover truth + -> BlockVolumeAssignment + -> volume server ProcessAssignments + -> v2bridge control conversion + -> engine ProcessAssignment + -> sender/session state updated +``` + +### Catch-up path + +```text +assignment accepted + -> engine reads retained history + -> engine plans catch-up + -> storage bridge pins WAL retention + -> engine executor drives v2bridge executor + -> blockvol scans WAL / ships entries + -> engine completes session +``` + +### Rebuild path + +```text +assignment accepted + -> engine detects NeedsRebuild + -> engine selects rebuild source + -> storage bridge pins snapshot/full-base/tail + -> executor drives transfer path + -> blockvol performs restore / replay work + -> engine completes rebuild +``` + +### Local durability path + +```text +WriteLBA / Trim + -> WAL append + -> shipping / barrier + -> client-visible durability decision + -> flusher writes extent + -> checkpoint advances + -> retention floor decides WAL reclaimability +``` + +## Interim Fields + +These are currently acceptable only as explicit hardening carry-forwards: + +### `localServerID` + +Current source: + +- `BlockService.listenAddr` + +Meaning: + +- temporary local identity source for replica/rebuild-side assignment translation + +Status: + +- interim only +- should become registry-assigned stable server identity later + +### `CommittedLSN = CheckpointLSN` + +Current source: + +- `v2bridge.Reader` / `BlockVol.StatusSnapshot()` + +Meaning: + +- current V1-style interim 
mapping where committed truth collapses to local checkpoint truth + +Status: + +- not final V2 truth +- must become a gate decision before a production-candidate phase + +### heartbeat as control carrier + +Current source: + +- existing master <-> volume-server heartbeat path + +Meaning: + +- current transport for assignment/control delivery + +Status: + +- acceptable as current carrier +- not yet a final proof that no separate control channel will ever be needed + +## Hard Gates + +These should remain explicit in `Phase 08`: + +### Gate 1: committed truth + +Before production-candidate: + +- either separate `CommittedLSN` from `CheckpointLSN` +- or explicitly bound the first candidate path to currently proven pre-checkpoint replay behavior + +### Gate 2: live control delivery + +Required: + +- real assignment delivery must reach the engine on the live path +- not only converter-level proof + +### Gate 3: integrated catch-up closure + +Required: + +- engine -> executor -> `v2bridge` -> blockvol must be proven as one live chain +- not planner proof plus direct WAL-scan proof as separate evidence + +### Gate 4: first rebuild execution path + +Required: + +- rebuild must not remain only a detection outcome +- the chosen product path needs one real executable rebuild closure + +### Gate 5: unified replay + +Required: + +- after control and execution closure land, rerun the accepted failure-class set on the unified live path + +## Reuse Map + +### Reuse directly + +- `weed/server/master_grpc_server.go` +- `weed/server/volume_grpc_client_to_master.go` +- `weed/server/volume_server_block.go` +- `weed/server/master_block_registry.go` +- `weed/server/master_block_failover.go` +- `weed/storage/blockvol/blockvol.go` +- `weed/storage/blockvol/replica_apply.go` +- `weed/storage/blockvol/replica_barrier.go` +- `weed/storage/blockvol/v2bridge/` + +### Reuse as implementation reality, not truth + +- `shipperGroup` +- `RetentionFloorFn` +- `ReplicaReceiver` +- checkpoint/superblock 
machinery +- existing failover heuristics + +### Do not inherit as V2 semantics + +- address-shaped identity +- old degraded/catch-up intuition from V1/V1.5 +- `CommittedLSN = CheckpointLSN` as final truth +- blockvol-side recovery policy decisions + +## Short Rule + +Use this skeleton as: + +- a hardening map for the current product path + +Do not mistake it for: + +- the final standalone `sw-block` architecture diff --git a/sw-block/docs/archive/design/v2-engine-readiness-review.md b/sw-block/docs/archive/design/v2-engine-readiness-review.md new file mode 100644 index 000000000..4c00158cf --- /dev/null +++ b/sw-block/docs/archive/design/v2-engine-readiness-review.md @@ -0,0 +1,170 @@ +# V2 Engine Readiness Review + +Date: 2026-03-29 +Status: historical readiness review +Purpose: record the decision on whether the current V2 design + prototype + simulator stack is strong enough to begin real V2 engine slicing + +## Decision + +Current judgment: + +- proceed to real V2 engine planning +- do not open a `V2.5` redesign track at this time + +This is a planning-readiness decision, not a production-readiness claim. + +## Why This Review Exists + +The project has now completed: + +1. design/FSM closure for the V2 line +2. protocol simulation closure for: + - V1 / V1.5 / V2 comparison + - timeout/race behavior + - ownership/session semantics +3. standalone prototype closure for: + - sender/session ownership + - execution authority + - recovery branching + - minimal historical-data proof + - prototype scenario closure +4. `Phase 4.5` hardening for: + - bounded `CatchUp` + - first-class `Rebuild` + - crash-consistency / restart-recoverability + - `A5-A8` stronger evidence + +So the question is no longer: + +- "can the prototype be made richer?" + +The question is: + +- "is the evidence now strong enough to begin real engine slicing?" + +## Evidence Summary + +### 1. 
Design / Protocol + +Primary docs: + +- `sw-block/design/v2-acceptance-criteria.md` +- `sw-block/design/v2-open-questions.md` +- `sw-block/design/v2_scenarios.md` +- `sw-block/design/v1-v15-v2-comparison.md` +- `sw-block/docs/archive/design/v2-prototype-roadmap-and-gates.md` + +Judgment: + +- protocol story is coherent +- acceptance set exists +- major V1 / V1.5 failures are mapped into V2 scenarios + +### 2. Simulator + +Primary code/tests: + +- `sw-block/prototype/distsim/` +- `sw-block/prototype/distsim/eventsim.go` +- `learn/projects/sw-block/test/results/v2-simulation-review.md` + +Judgment: + +- strong enough for protocol/design validation +- strong enough to challenge crash-consistency and liveness assumptions +- not a substitute for real engine / hardware proof + +### 3. Prototype + +Primary code/tests: + +- `sw-block/prototype/enginev2/` +- `sw-block/prototype/enginev2/acceptance_test.go` + +Judgment: + +- ownership is explicit and fenced +- execution authority is explicit and fenced +- bounded `CatchUp` is semantic, not documentary +- `Rebuild` is a first-class sender-owned path +- historical-data and recoverability reasoning are executable + +### 4. `A5-A8` Double Evidence + +Prototype-side grouped evidence: + +- `sw-block/prototype/enginev2/acceptance_test.go` + +Simulator-side grouped evidence: + +- `sw-block/docs/archive/design/a5-a8-traceability.md` +- `sw-block/prototype/distsim/` + +Judgment: + +- the critical acceptance items that most affect engine risk now have materially stronger proof on both sides + +## What Is Good Enough Now + +The following are good enough to begin engine slicing: + +1. sender/session ownership model +2. stale authority fencing +3. recovery orchestration shape +4. bounded `CatchUp` contract +5. `Rebuild` as formal path +6. committed/recoverable boundary thinking +7. 
crash-consistency / restart-recoverability proof style + +## What Is Still Not Proven + +The following still require real engine work and later real-system validation: + +1. actual engine lifecycle integration +2. real storage/backend implementation +3. real control-plane integration +4. real durability / fsync behavior under the actual engine +5. real hardware timing / performance +6. final production observability and failure handling + +These are expected gaps. They do not block engine planning. + +## Open Risks To Carry Forward + +These are not blockers, but they should remain explicit: + +1. prototype and simulator are still reduced models +2. rebuild-source quality in the real engine will depend on actual checkpoint/base-image mechanics +3. durability truth in the real engine must still be re-proven against actual persistence behavior +4. predicate exploration can still grow, but should not block engine slicing + +## Engine-Planning Decision + +Decision: + +- start real V2 engine planning + +Reason: + +1. no current evidence points to a structural flaw requiring `V2.5` +2. the remaining gaps are implementation/system gaps, not prototype ambiguity +3. continuing to extend prototype/simulator breadth would have diminishing returns + +## Required Outputs After This Review + +1. `sw-block/docs/archive/design/v2-engine-slicing-plan.md` +2. first real engine slice definition +3. explicit non-goals for first engine stage +4. explicit validation plan for engine slices + +## Non-Goals Of This Review + +This review does not claim: + +1. V2 is production-ready +2. V2 should replace V1 immediately +3. 
all design questions are forever closed + +It only claims: + +- the project now has enough evidence to begin disciplined real engine slicing diff --git a/sw-block/docs/archive/design/v2-engine-slicing-plan.md b/sw-block/docs/archive/design/v2-engine-slicing-plan.md new file mode 100644 index 000000000..6c3bc2d03 --- /dev/null +++ b/sw-block/docs/archive/design/v2-engine-slicing-plan.md @@ -0,0 +1,191 @@ +# V2 Engine Slicing Plan + +Date: 2026-03-29 +Status: historical slicing plan +Purpose: define the first real V2 engine slices after prototype and `Phase 4.5` closure + +## Goal + +Move from: + +- standalone design/prototype truth under `sw-block/prototype/` + +to: + +- a real V2 engine core under `sw-block/` + +without dragging V1.5 lifecycle assumptions into the implementation. + +## Planning Rules + +1. reuse V1 ideas and tests selectively, not structurally +2. prefer narrow vertical slices over broad skeletons +3. each slice must preserve the accepted V2 ownership/fencing model +4. keep simulator/prototype as validation support, not as the implementation itself +5. do not mix V2 engine work into `weed/storage/blockvol/` + +## First Engine Stage + +The first engine stage should build the control/recovery core, not the full storage engine. + +That means: + +1. per-replica sender identity +2. one active recovery session per replica per epoch +3. sender-owned execution authority +4. explicit recovery outcomes: + - zero gap + - bounded catch-up + - rebuild +5. rebuild execution shell only + - do not hard-code final snapshot + tail vs full base decision logic yet + - keep real rebuild-source choice tied to Slice 3 recoverability inputs + +## Recommended Slice Order + +### Slice 1: Engine Ownership Core + +Purpose: + +- carry the accepted `enginev2` ownership/fencing model into the real engine core + +Scope: + +1. stable per-replica sender object +2. stable recovery-session object +3. session identity fencing +4. endpoint / epoch invalidation +5. 
sender-group or equivalent ownership registry + +Acceptance: + +1. stale session results cannot mutate current authority +2. changed-address and epoch-bump invalidation work in engine code +3. the 4 V2-boundary ownership themes remain provable + +### Slice 2: Engine Recovery Execution Core + +Purpose: + +- move the prototype execution APIs into real engine behavior + +Scope: + +1. connect / handshake / catch-up flow +2. bounded `CatchUp` +3. explicit `NeedsRebuild` +4. sender-owned rebuild execution path +5. rebuild execution shell without final trusted-base selection policy + +Acceptance: + +1. bounded catch-up does not chase indefinitely +2. rebuild and catch-up are mutually exclusive +3. session completion rules are explicit and fenced + +### Slice 3: Engine Data / Recoverability Core + +Purpose: + +- connect recovery behavior to real retained-history / checkpoint mechanics + +Scope: + +1. real recoverability decision inputs +2. trusted-base decision for rebuild source +3. minimal real checkpoint/base-image integration +4. real truncation / safe-boundary handling + +This is the first slice that should decide, from real engine inputs, between: + +1. `snapshot + tail` +2. `full base` + +Acceptance: + +1. engine can explain why recovery is allowed +2. rebuild-source choice is explicit and testable +3. historical correctness and truncation rules remain intact + +### Slice 4: Engine Integration Closure + +Purpose: + +- bind engine control/recovery core to real orchestration and validation surfaces + +Scope: + +1. real assignment/control intent entry path +2. engine-facing observability +3. focused real-engine tests for V2-boundary cases +4. first integration review against real failure classes + +Acceptance: + +1. key V2-boundary failures are reproduced and closed in engine tests +2. engine observability is good enough to debug ownership/recovery failures +3.
remaining gaps are system/performance gaps, not control-model ambiguity + +## What To Reuse + +Good reuse candidates: + +1. tests and failure cases from V1 / V1.5 +2. narrow utility/data helpers where not coupled to V1 lifecycle +3. selected WAL/history concepts if they fit V2 ownership boundaries + +Do not structurally reuse: + +1. V1/V1.5 shipper lifecycle +2. address-based identity assumptions +3. `SetReplicaAddrs`-style behavior +4. old recovery control structure + +## Where The Work Should Live + +Real V2 engine work should continue under: + +- `sw-block/` + +Recommended next area: + +- `sw-block/core/` +or +- `sw-block/engine/` + +Exact path can be chosen later, but it should remain separate from: + +- `sw-block/prototype/` +- `weed/storage/blockvol/` + +## Validation Plan For Engine Slices + +Each engine slice should be validated at three levels: + +1. prototype alignment +- does engine behavior preserve the accepted prototype invariant? + +2. focused engine tests +- does the real engine slice enforce the same contract? + +3. scenario mapping +- does at least one important V1/V1.5 failure class remain closed? + +## Non-Goals For First Engine Stage + +Do not try to do these immediately: + +1. full Smart WAL expansion +2. performance optimization +3. V1 replacement/migration plan +4. full product integration +5. all storage/backend redesign at once + +## Immediate Next Assignment + +The first concrete engine-planning task should be: + +1. choose the real V2 engine module location under `sw-block/` +2. define Slice 1 file/module boundaries +3. write a short engine ownership-core spec +4. 
map 3-5 acceptance scenarios directly onto Slice 1 expectations diff --git a/sw-block/docs/archive/design/v2-first-slice-sender-ownership.md b/sw-block/docs/archive/design/v2-first-slice-sender-ownership.md new file mode 100644 index 000000000..e8dd94aec --- /dev/null +++ b/sw-block/docs/archive/design/v2-first-slice-sender-ownership.md @@ -0,0 +1,159 @@ +# V2 First Slice: Per-Replica Sender/Session Ownership + +Date: 2026-03-27 +Status: historical first-slice note +Depends-on: Q1 (recovery session), Q6 (orchestrator scope), Q7 (first slice) + +## Problem + +`SetReplicaAddrs()` replaces the entire `ShipperGroup` atomically. This causes: + +1. **State loss on topology change.** All shippers are destroyed and recreated. + Recovery state (`replicaFlushedLSN`, `lastContactTime`, catch-up progress) is lost. + After a changed-address restart, the new shipper starts from scratch. + +2. **No per-replica identity.** Shippers are identified by array index. The master + cannot target a specific replica for rebuild/catch-up — it must re-issue the + entire address set. + +3. **Background reconnect races.** A reconnect cycle may be in progress when + `SetReplicaAddrs` replaces the group. The in-progress reconnect's connection + objects become orphaned. + +## Design + +### Per-replica sender identity + +`ShipperGroup` changes from `[]*WALShipper` to `map[string]*WALShipper`, keyed by +the replica's canonical data address. Each shipper stores its own `ReplicaID`. + +```go +type WALShipper struct { + ReplicaID string // canonical data address — identity across reconnects + // ... 
existing fields +} + +type ShipperGroup struct { + mu sync.RWMutex + shippers map[string]*WALShipper // keyed by ReplicaID +} +``` + +### ReconcileReplicas replaces SetReplicaAddrs + +Instead of replacing the entire group, `ReconcileReplicas` diffs old vs new: + +``` +ReconcileReplicas(newAddrs []ReplicaAddr): + for each existing shipper: + if NOT in newAddrs → Stop and remove + for each newAddr: + if matching shipper exists → keep (preserve state) + if no match → create new shipper +``` + +This preserves `replicaFlushedLSN`, `lastContactTime`, catch-up progress, and +background reconnect goroutines for replicas that stay in the set. + +`SetReplicaAddrs` becomes a wrapper: +```go +func (v *BlockVol) SetReplicaAddrs(addrs []ReplicaAddr) { + if v.shipperGroup == nil { + v.shipperGroup = NewShipperGroup(nil) + } + v.shipperGroup.ReconcileReplicas(addrs, v.makeShipperFactory()) +} +``` + +### Changed-address restart flow + +1. Replica restarts on new port. Heartbeat reports new address. +2. Master detects endpoint change (address differs, same volume). +3. Master sends assignment update to primary with new replica address. +4. Primary's `ReconcileReplicas` receives `[oldAddr1, newAddr2]`. +5. Old shipper for the changed replica is stopped (old address gone from set). +6. New shipper created with new address — but this is a fresh shipper. +7. New shipper bootstraps: Disconnected → Connecting → CatchingUp → InSync. + +The improvement over V1.5: the **other** replicas in the set are NOT disturbed. +Only the changed replica gets a fresh shipper. Recovery state for stable replicas +is preserved. 
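The diff logic above can be made concrete as a small runnable sketch. `Shipper`, `Group`, and the factory signature below are deliberately simplified stand-ins for `WALShipper`/`ShipperGroup` (only the state that reconciliation must preserve is modeled), not the real types:

```go
package main

import "fmt"

// Shipper is a simplified stand-in for WALShipper: just enough state
// to show what reconciliation must preserve across topology changes.
type Shipper struct {
	ReplicaID  string // canonical data address — identity across reconnects
	FlushedLSN uint64 // stands in for replicaFlushedLSN
	stopped    bool
}

func (s *Shipper) Stop() { s.stopped = true }

// Group is a simplified ShipperGroup keyed by ReplicaID.
type Group struct {
	shippers map[string]*Shipper
}

// Reconcile diffs the current set against newAddrs:
// shippers absent from newAddrs are stopped and removed,
// existing shippers are kept with their state intact,
// and genuinely new addresses get fresh shippers from factory.
func (g *Group) Reconcile(newAddrs []string, factory func(addr string) *Shipper) {
	want := make(map[string]bool, len(newAddrs))
	for _, a := range newAddrs {
		want[a] = true
	}
	for id, s := range g.shippers {
		if !want[id] {
			s.Stop() // replica left the set: stop and forget
			delete(g.shippers, id)
		}
	}
	for _, a := range newAddrs {
		if _, ok := g.shippers[a]; !ok {
			g.shippers[a] = factory(a) // new replica: fresh bootstrap
		}
	}
}

func main() {
	g := &Group{shippers: map[string]*Shipper{
		"r1:8080": {ReplicaID: "r1:8080", FlushedLSN: 42},
		"r2:8080": {ReplicaID: "r2:8080", FlushedLSN: 42},
	}}
	// r2 restarted on a new port; r1 is unchanged and keeps its state.
	g.Reconcile([]string{"r1:8080", "r2:9090"}, func(addr string) *Shipper {
		return &Shipper{ReplicaID: addr}
	})
	fmt.Println(len(g.shippers), g.shippers["r1:8080"].FlushedLSN) // prints: 2 42
}
```

Keying by canonical data address makes "same replica at the same endpoint" the identity test, which is exactly what lets stable replicas keep `replicaFlushedLSN` and their reconnect goroutines while only the changed replica is rebuilt from scratch.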
+ +### Recovery session + +Each WALShipper already contains the recovery state machine: +- `state` (Disconnected → Connecting → CatchingUp → InSync → Degraded → NeedsRebuild) +- `replicaFlushedLSN` (authoritative progress) +- `lastContactTime` (retention budget) +- `catchupFailures` (escalation counter) +- Background reconnect goroutine + +No separate `RecoverySession` object is needed. The WALShipper IS the per-replica +recovery session. The state machine already tracks the session lifecycle. + +What changes: the session is no longer destroyed on topology change (unless the +replica itself is removed from the set). + +### Coordinator vs primary responsibilities + +| Responsibility | Owner | +|---------------|-------| +| Endpoint truth (canonical address) | Coordinator (master) | +| Assignment updates (add/remove replicas) | Coordinator | +| Epoch authority | Coordinator | +| Session creation trigger | Coordinator (via assignment) | +| Session execution (reconnect, catch-up, barrier) | Primary (via WALShipper) | +| Timeout enforcement | Primary | +| Ordered receive/apply | Replica | +| Barrier ack | Replica | +| Heartbeat reporting | Replica | + +### Migration from current code + +| Current | V2 | +|---------|-----| +| `ShipperGroup.shippers []*WALShipper` | `ShipperGroup.shippers map[string]*WALShipper` | +| `SetReplicaAddrs()` creates all new | `ReconcileReplicas()` diffs and preserves | +| `StopAll()` in demote | `StopAll()` unchanged (stops all) | +| `ShipAll(entry)` iterates slice | `ShipAll(entry)` iterates map values | +| `BarrierAll(lsn)` parallel slice | `BarrierAll(lsn)` parallel map values | +| `MinReplicaFlushedLSN()` iterates slice | Same, iterates map values | +| `ShipperStates()` iterates slice | Same, iterates map values | +| No per-shipper identity | `WALShipper.ReplicaID` = canonical data addr | + +### Files changed + +| File | Change | +|------|--------| +| `wal_shipper.go` | Add `ReplicaID` field, pass in constructor | +| `shipper_group.go` | 
`map[string]*WALShipper`, `ReconcileReplicas`, update iterators | +| `blockvol.go` | `SetReplicaAddrs` calls `ReconcileReplicas`, shipper factory | +| `promotion.go` | No change (StopAll unchanged) | +| `dist_group_commit.go` | No change (uses ShipperGroup API) | +| `block_heartbeat.go` | No change (uses ShipperStates) | + +### Acceptance bar + +The following existing tests must continue to pass: +- All CP13-1 through CP13-7 protocol tests (sync_all_protocol_test.go) +- All adversarial tests (sync_all_adversarial_test.go) +- All baseline tests (sync_all_bug_test.go) +- All rebuild tests (rebuild_v1_test.go) + +The following CP13-8 tests validate the V2 improvement: +- `TestCP13_SyncAll_ReplicaRestart_Rejoin` — changed-address recovery +- `TestAdversarial_ReconnectUsesHandshakeNotBootstrap` — V2 reconnect protocol +- `TestAdversarial_CatchupMultipleDisconnects` — state preservation across reconnects + +New tests to add: +- `TestReconcileReplicas_PreservesExistingShipper` — stable replica keeps state +- `TestReconcileReplicas_RemovesStaleShipper` — removed replica stopped +- `TestReconcileReplicas_AddsNewShipper` — new replica bootstraps +- `TestReconcileReplicas_MixedUpdate` — one kept, one removed, one added + +## Non-goals for this slice + +- Smart WAL payload classes +- Recovery reservation protocol +- Full coordinator orchestration +- New transport layer diff --git a/sw-block/docs/archive/design/v2-first-slice-session-ownership.md b/sw-block/docs/archive/design/v2-first-slice-session-ownership.md new file mode 100644 index 000000000..025e30254 --- /dev/null +++ b/sw-block/docs/archive/design/v2-first-slice-session-ownership.md @@ -0,0 +1,194 @@ +# V2 First Slice: Per-Replica Sender and Recovery Session Ownership + +Date: 2026-03-27 +Status: historical first-slice note + +## Purpose + +This document defines the first real V2 implementation slice. 
+ +The slice is intentionally narrow: + +- per-replica sender ownership +- explicit recovery session ownership +- clear coordinator vs primary responsibility + +This is the first step toward a standalone V2 block engine under `sw-block/`. + +## Why This Slice First + +It directly addresses the clearest V1.5 structural limits: + +- sender identity loss when replica sets are refreshed +- changed-address restart recovery complexity +- repeated reconnect cycles without stable per-replica ownership +- adversarial Phase 13 boundary tests that V1.5 cannot cleanly satisfy + +It also avoids jumping too early into: + +- Smart WAL +- new backend storage layout +- full production transport redesign + +## Core Decision + +Use: + +- **one sender owner per replica** +- **at most one active recovery session per replica per epoch** + +Healthy replicas may only need their steady sender object. + +Degraded / reconnecting replicas gain an explicit recovery session owned by the primary. + +## Ownership Split + +### Coordinator + +Owns: + +- replica identity / endpoint truth +- assignment updates +- epoch authority +- session creation / destruction intent + +Does not own: + +- byte-by-byte catch-up execution +- local sender loop scheduling + +### Primary + +Owns: + +- per-replica sender objects +- per-replica recovery session execution +- reconnect / catch-up progress +- timeout enforcement for active session +- transition from: + - normal sender + - to recovery session + - back to normal sender + +### Replica + +Owns: + +- receive/apply path +- barrier ack +- heartbeat/reporting + +Replica remains passive from the recovery-orchestration point of view. 
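The core decision above (one stable sender owner per replica, at most one active recovery session per replica per epoch) can be sketched as follows. The types, fields, and method names here are illustrative assumptions, not the engine's real API:

```go
package main

import "fmt"

// Session is an illustrative recovery-session handle.
type Session struct {
	ID    int    // session generation within this owner
	Epoch uint64 // epoch the session was created under
}

// SenderOwner is the stable per-replica sender with at most one
// active recovery session at a time.
type SenderOwner struct {
	ReplicaID string
	epoch     uint64
	active    *Session
	nextID    int
}

// StartSession enforces "at most one active session per replica per
// epoch": a stale epoch is rejected, and a newer request supersedes
// any currently active session.
func (o *SenderOwner) StartSession(epoch uint64) (*Session, error) {
	if epoch < o.epoch {
		return nil, fmt.Errorf("stale epoch %d < %d", epoch, o.epoch)
	}
	o.epoch = epoch
	o.nextID++
	o.active = &Session{ID: o.nextID, Epoch: epoch}
	return o.active, nil
}

// Complete applies a session result only if the session is still the
// active one; superseded sessions are fenced out.
func (o *SenderOwner) Complete(s *Session) bool {
	if o.active == nil || s.ID != o.active.ID {
		return false // stale session must not mutate current authority
	}
	o.active = nil
	return true
}

func main() {
	owner := &SenderOwner{ReplicaID: "r1"}
	s1, _ := owner.StartSession(1)
	s2, _ := owner.StartSession(2) // epoch bump supersedes s1
	fmt.Println(owner.Complete(s1), owner.Complete(s2)) // prints: false true
}
```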
+ +## Data Model + +### Sender Owner + +Per replica, maintain a stable sender owner with: + +- replica logical ID +- current endpoint +- current epoch view +- steady-state health/status +- optional active recovery session reference + +### Recovery Session + +Per replica, per epoch: + +- `ReplicaID` +- `Epoch` +- `EndpointVersion` or equivalent endpoint truth +- `State` + - `connecting` + - `catching_up` + - `in_sync` + - `needs_rebuild` +- `StartLSN` +- `TargetLSN` +- timeout / deadline metadata + +## Session Rules + +1. only one active session per replica per epoch +2. new assignment for same replica: +- supersedes old session only if epoch/session generation is newer +3. stale session must not continue after: +- epoch bump +- endpoint truth change +- explicit coordinator replacement + +## Minimal State Transitions + +### Healthy path + +1. replica sender exists +2. sender ships normally +3. replica remains `InSync` + +### Recovery path + +1. sender detects or is told replica is not healthy +2. coordinator provides valid assignment/endpoint truth +3. primary creates recovery session +4. session connects +5. session catches up if recoverable +6. on success: +- session closes +- steady sender resumes normal state + +### Rebuild path + +1. session determines catch-up is not sufficient +2. session transitions to `needs_rebuild` +3. higher layer rebuild flow takes over + +## What This Slice Does Not Include + +Not in the first slice: + +- Smart WAL payload classes in production +- snapshot pinning / GC logic +- new on-disk engine +- frontend publication changes +- full production event scheduler + +## Proposed V2 Workspace Target + +Do this under `sw-block/`, not `weed/storage/blockvol/`.
+ +Suggested area: + +- `sw-block/prototype/enginev2/` + +Suggested first files: + +- `sw-block/prototype/enginev2/session.go` +- `sw-block/prototype/enginev2/sender.go` +- `sw-block/prototype/enginev2/group.go` +- `sw-block/prototype/enginev2/session_test.go` + +The first code does not need full storage I/O. +It should prove ownership and transition shape first. + +## Acceptance For This Slice + +The slice is good enough when: + +1. sender identity is stable per replica +2. changed-address reassignment updates the right sender owner +3. multiple reconnect cycles do not lose recovery ownership +4. stale session does not survive epoch bump +5. the 4 Phase 13 V2-boundary tests have a clear path to become satisfiable + +## Relationship To Existing Simulator + +This slice should align with: + +- `v2-acceptance-criteria.md` +- `v2-open-questions.md` +- `v1-v15-v2-comparison.md` +- `distsim` / `eventsim` behavior + +The simulator remains the design oracle. +The first implementation slice should not contradict it. diff --git a/sw-block/docs/archive/design/v2-production-roadmap.md b/sw-block/docs/archive/design/v2-production-roadmap.md new file mode 100644 index 000000000..3766cd678 --- /dev/null +++ b/sw-block/docs/archive/design/v2-production-roadmap.md @@ -0,0 +1,199 @@ +# V2 Production Roadmap + +Date: 2026-03-30 +Status: historical roadmap +Purpose: define the path from the accepted V2 engine core to a production candidate + +## Current Position + +Completed: + +1. design / FSM closure +2. simulator / protocol validation +3. prototype closure +4. evidence hardening +5. 
engine core slices: + - Slice 1 ownership core + - Slice 2 recovery execution core + - Slice 3 data / recoverability core + - Slice 4 integration closure + +Current stage: + +- entering broader engine implementation + +This means the main risk is no longer: + +- whether the V2 idea stands up + +The main risk is: + +- whether the accepted engine core can be turned into a real system without reintroducing V1/V1.5 structure and semantics + +## Roadmap Summary + +1. Phase 06: broader engine implementation stage +2. Phase 07: real-system integration / product-path decision +3. Phase 08: pre-production hardening +4. Phase 09: performance / scale / soak validation +5. Phase 10: production candidate and rollout gate + +## Phase 06 + +### Goal + +Connect the accepted engine core to: + +1. real control truth +2. real storage truth +3. explicit engine execution steps + +### Outputs + +1. control-plane adapter into the engine core +2. storage/base/recoverability adapters +3. explicit execution-driver model where synchronous helpers are no longer sufficient +4. validation against selected real failure classes + +### Gate + +At the end of Phase 06, the project should be able to say: + +- the engine core can live inside a real system shape + +## Phase 07 + +### Goal + +Move from engine-local correctness to a real runnable subsystem. + +### Outputs + +1. service-style runnable engine slice +2. integration with real control and storage surfaces +3. crash/failover/restart integration tests +4. decision on the first viable product path + +### Gate + +At the end of Phase 07, the project should be able to say: + +- the engine can run as a real subsystem, not only as an isolated core + +## Phase 08 + +### Goal + +Turn correctness into operational safety. + +### Outputs + +1. observability hardening +2. operator/debug flows +3. recovery/runbook procedures +4. config surface cleanup +5. 
realistic durability/restart validation + +### Gate + +At the end of Phase 08, the project should be able to say: + +- operators can run, debug, and recover the system safely + +## Phase 09 + +### Goal + +Prove viability under load and over time. + +### Outputs + +1. throughput / latency baselines +2. rebuild / catch-up cost characterization +3. steady-state overhead measurement +4. soak testing +5. scale and failure-under-load validation + +### Gate + +At the end of Phase 09, the project should be able to say: + +- the design is not only correct, but viable at useful scale and duration + +## Phase 10 + +### Goal + +Produce a controlled production candidate. + +### Outputs + +1. feature-gated production candidate +2. rollback strategy +3. migration/coexistence plan with V1 +4. staged rollout plan +5. production acceptance checklist + +### Gate + +At the end of Phase 10, the project should be able to say: + +- the system is ready for a controlled production rollout + +## Cross-Phase Rules + +### Rule 1: Do not reopen protocol shape casually + +The accepted core should remain stable unless new implementation evidence forces a change. + +### Rule 2: Use V1 as validation source, not design template + +Use: + +1. `learn/projects/sw-block/` +2. `weed/storage/block*` + +for: + +1. failure gates +2. constraints +3. integration references + +Do not use them as the default V2 architecture template. + +### Rule 3: Keep `CatchUp` narrow + +Do not let later implementation phases re-expand `CatchUp` into a broad, optimistic, long-lived recovery mode. + +### Rule 4: Keep evidence quality ahead of object growth + +New work should preferentially improve: + +1. traceability +2. diagnosability +3. real-failure validation +4. operational confidence + +not simply add new objects, states, or mechanisms. + +## Production Readiness Ladder + +The project should move through this ladder explicitly: + +1. proof-of-design +2. proof-of-engine-shape +3. proof-of-runnable-engine-stage +4. 
proof-of-operable-system +5. proof-of-viable-production-candidate + +Current ladder position: + +- between `2` and `3` +- engine core accepted; broader runnable engine stage underway + +## Next Documents To Maintain + +1. `sw-block/.private/phase/phase-06.md` +2. `sw-block/docs/archive/design/v2-engine-readiness-review.md` +3. `sw-block/docs/archive/design/v2-engine-slicing-plan.md` +4. this roadmap diff --git a/sw-block/docs/archive/design/v2-prototype-roadmap-and-gates.md b/sw-block/docs/archive/design/v2-prototype-roadmap-and-gates.md new file mode 100644 index 000000000..99967766c --- /dev/null +++ b/sw-block/docs/archive/design/v2-prototype-roadmap-and-gates.md @@ -0,0 +1,239 @@ +# V2 Prototype Roadmap And Gates + +Date: 2026-03-27 +Status: historical prototype roadmap +Purpose: define the remaining prototype roadmap, the validation gates between stages, and the decision point between real V2 engine work and possible V2.5 redesign + +## Current Position + +V2 design/FSM/simulator work is sufficiently closed for serious prototyping, but not frozen against later `V2.5` adjustments. + +Current state: + +- design proof: high +- execution proof: medium +- data/recovery proof: low +- prototype end-to-end proof: low + +Rough prototype progress: + +- `25%` to `35%` + +This is early executable prototype, not engine-ready prototype. + +## Roadmap Goal + +Answer this question with prototype evidence: + +- can V2 become a real engine path? +- or should it become `V2.5` before real implementation begins? + +## Step 1: Execution Authority Closure + +Purpose: + +- finish the sender / recovery-session authority model so stale work is unambiguously rejected + +Scope: + +1. ownership-only `AttachSession()` / `SupersedeSession()` +2. execution begins only through execution APIs +3. stale handshake / progress / completion fenced by `sessionID` +4. endpoint bump / epoch bump invalidate execution authority +5. sender-group preserve-or-kill behavior is explicit + +Done when: + +1. 
all execution APIs are sender-gated and reject stale `sessionID` +2. session creation is separated from execution start +3. phase ordering is enforced +4. endpoint bump / epoch bump invalidate execution authority correctly +5. mixed add/remove/update reconciliation preserves or kills state exactly as intended + +Main files: + +- `sw-block/prototype/enginev2/` +- `sw-block/prototype/distsim/` +- `learn/projects/sw-block/phases/phase-13-v2-boundary-tests.md` + +Key gate: + +- old recovery work cannot mutate current sender state at any execution stage + +## Step 2: Orchestrated Recovery Prototype + +Purpose: + +- move from good local sender APIs to an actual prototype recovery flow driven by assignment/update intent + +Scope: + +1. assignment/update intent creates or supersedes recovery attempts +2. reconnect / reassignment / catch-up / rebuild decision path +3. sender-group becomes orchestration entry point +4. explicit outcome branching: + - zero-gap fast completion + - positive-gap catch-up + - unrecoverable gap -> `NeedsRebuild` + +Done when: + +1. the prototype expresses a realistic recovery flow from topology/control intent +2. sender-group drives recovery creation, not only unit helpers +3. recovery outcomes are explicit and testable +4. orchestrator responsibility is clear enough to narrow `v2-open-questions.md` item 6 + +Key gate: + +- recovery control is no longer scattered across helper calls; it has one clear orchestration path + +## Step 3: Minimal Historical Data Prototype + +Purpose: + +- prove the recovery model against real data-history assumptions, not only control logic + +Scope: + +1. minimal WAL/history model, not full engine +2. enough to exercise: + - catch-up range + - retained prefix/window + - rebuild fallback + - historical correctness at target LSN +3. enough reservation/recoverability state to make recovery explicit + +Done when: + +1. the prototype can prove why a gap is recoverable or unrecoverable +2. 
catch-up and rebuild decisions are backed by minimal data/history state +3. `v2-open-questions.md` items 3, 4, 5 are closed or sharply narrowed +4. prototype evidence strengthens acceptance criteria `A5`, `A6`, and `A7` + +Key gate: + +- the prototype must explain why recovery is allowed, not just that policy says it is + +## Step 4: Prototype Scenario Closure + +Purpose: + +- make the prototype itself demonstrate the V2 story end-to-end + +Scope: + +1. map key V2 scenarios onto the prototype +2. express the 4 V2-boundary cases against prototype behavior +3. add one small end-to-end harness inside `sw-block/prototype/` +4. align prototype evidence with acceptance criteria + +Done when: + +1. prototype behavior can be reviewed scenario-by-scenario +2. key V1/V1.5 failures have prototype equivalents +3. prototype outcomes match intended V2 design claims +4. remaining gaps are clearly real-engine gaps, not protocol/prototype ambiguity + +Key gate: + +- a reviewer can trace: + - acceptance criteria -> scenario -> prototype behavior + without hand-waving + +## Gates + +### Gate 1: Design Closed Enough + +Status: + +- mostly passed + +Meaning: + +1. acceptance criteria exist +2. core simulator exists +3. ownership gap from V1.5 is understood + +### Gate 2: Execution Authority Closed + +Passes after Step 1. + +Meaning: + +- stale execution results cannot mutate current authority + +### Gate 3: Orchestrated Recovery Closed + +Passes after Step 2. + +Meaning: + +- recovery flow is controlled by one coherent orchestration model + +### Gate 4: Historical Data Model Closed + +Passes after Step 3. + +Meaning: + +- catch-up vs rebuild is backed by executable data-history logic + +### Gate 5: Prototype Convincing + +Passes after Step 4. + +Meaning: + +- enough evidence exists to choose: + - real V2 engine path + - or `V2.5` redesign + +## Decision Gate After Step 4 + +### Path A: Real V2 Engine Planning + +Choose this if: + +1. prototype control logic is coherent +2. 
recovery boundary is explicit +3. boundary cases are convincing +4. no major structural flaw remains + +Outputs: + +1. real engine slicing plan +2. migration/integration plan into future standalone `sw-block` +3. explicit non-goals for first production version + +### Path B: V2.5 Redesign + +Choose this if the prototype reveals: + +1. ownership/orchestration still too fragile +2. recovery boundary still too implicit +3. historical correctness model too costly or too unclear +4. too much complexity leaks into the hot path + +Output: + +- write `V2.5` as a design/prototype correction before engine work + +## What Not To Do Yet + +1. no Smart WAL expansion beyond what Step 3 minimally needs +2. no backend/storage-engine redesign +3. no V1 production integration +4. no frontend/wire protocol work +5. no performance optimization as a primary goal + +## Practical Summary + +Current sequence: + +1. finish execution authority +2. build orchestrated recovery +3. add minimal historical-data proof +4. close key scenarios against the prototype +5. decide: + - V2 engine + - or `V2.5`
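Step 3's key gate requires the prototype to explain why recovery is allowed, not just that policy permits it. That boundary check can be sketched as a pure decision function; the names and the retained-window model below are illustrative assumptions, not engine API:

```go
package main

import "fmt"

// Decision is the explicit recovery outcome: zero gap, bounded
// catch-up, or rebuild. No implicit "pretend success" path exists.
type Decision string

const (
	ZeroGap      Decision = "zero_gap"
	CatchUp      Decision = "catch_up"
	NeedsRebuild Decision = "needs_rebuild"
)

// decide classifies a replica's gap against the retained history
// window. retainedFrom is the oldest LSN still present in the
// primary's WAL (everything before it is truncated); head is the
// primary's latest LSN. The returned reason is the "why", which is
// what the Step 3 gate demands.
func decide(replicaFlushed, retainedFrom, head uint64) (Decision, string) {
	switch {
	case replicaFlushed >= head:
		return ZeroGap, "replica already at head"
	case replicaFlushed+1 >= retainedFrom:
		return CatchUp, fmt.Sprintf("gap [%d, %d] fully retained", replicaFlushed+1, head)
	default:
		return NeedsRebuild, fmt.Sprintf("entries before %d are truncated", retainedFrom)
	}
}

func main() {
	// Window [50, 100]: three replicas with different gaps.
	for _, flushed := range []uint64{100, 70, 10} {
		d, why := decide(flushed, 50, 100)
		fmt.Println(d, "-", why) // zero_gap, catch_up, needs_rebuild
	}
}
```

Because the decision is a function of explicit inputs, a test (or a reviewer) can trace every recovery outcome back to the retained window and the replica's flushed position, rather than to a timeout that happened to fire.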