chore: archive superseded V2 design docs
Copies of design docs removed in Phase 09, preserved in sw-block/docs/archive/ for historical reference.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Branch: feature/sw-block
10 changed files with 2001 additions and 0 deletions
- +28  sw-block/docs/archive/design/README.md
- +117 sw-block/docs/archive/design/a5-a8-traceability.md
- +403 sw-block/docs/archive/design/phase-07-service-slice-plan.md
- +301 sw-block/docs/archive/design/phase-08-engine-skeleton-map.md
- +170 sw-block/docs/archive/design/v2-engine-readiness-review.md
- +191 sw-block/docs/archive/design/v2-engine-slicing-plan.md
- +159 sw-block/docs/archive/design/v2-first-slice-sender-ownership.md
- +194 sw-block/docs/archive/design/v2-first-slice-session-ownership.md
- +199 sw-block/docs/archive/design/v2-production-roadmap.md
- +239 sw-block/docs/archive/design/v2-prototype-roadmap-and-gates.md
sw-block/docs/archive/design/README.md
@@ -0,0 +1,28 @@

# Design Archive

This directory contains historical `sw-block` design/planning documents that are still worth keeping as references, but are no longer the main entrypoints for current work.

Use `sw-block/design/` for active design and process documents.
Use `sw-block/.private/phase/` for current phase contracts, logs, and slice-level execution packages.

## Archived Here

- `v2-production-roadmap.md`
- `v2-engine-readiness-review.md`
- `v2-engine-slicing-plan.md`
- `v2-prototype-roadmap-and-gates.md`
- `phase-07-service-slice-plan.md`
- `phase-08-engine-skeleton-map.md`
- `v2-first-slice-session-ownership.md`
- `v2-first-slice-sender-ownership.md`
- `a5-a8-traceability.md`

## Why Archived

These documents are useful for:

1. historical decision context
2. earlier slice/phase rationale
3. traceability for passed reviews and planning gates

They are not the canonical source for the current phase roadmap.
sw-block/docs/archive/design/a5-a8-traceability.md
@@ -0,0 +1,117 @@

# A5-A8 Acceptance Traceability

Date: 2026-03-29
Status: historical evidence traceability

## Purpose

Map each acceptance criterion to specific executable evidence.
Two evidence layers:

- **Simulator** (distsim): protocol-level proof
- **Prototype** (enginev2): ownership/session-level proof

---

## A5: Non-Convergent Catch-Up Escalates Explicitly

**Must prove**: tail-chasing or failed catch-up does not pretend success.

**Pass condition**: explicit `CatchingUp → NeedsRebuild` transition.

| Evidence | Test | File | Layer | Status |
|----------|------|------|-------|--------|
| Tail-chasing converges or aborts | `TestS6_TailChasing_ConvergesOrAborts` | `cluster_test.go` | distsim | PASS |
| Tail-chasing non-convergent → NeedsRebuild | `TestS6_TailChasing_NonConvergent_EscalatesToNeedsRebuild` | `phase02_advanced_test.go` | distsim | PASS |
| Catch-up timeout → NeedsRebuild | `TestP03_CatchupTimeout_EscalatesToNeedsRebuild` | `phase03_timeout_test.go` | distsim | PASS |
| Reservation expiry aborts catch-up | `TestReservationExpiryAbortsCatchup` | `cluster_test.go` | distsim | PASS |
| Flapping budget exceeded → NeedsRebuild | `TestP02_S5_FlappingExceedsBudget_EscalatesToNeedsRebuild` | `phase02_advanced_test.go` | distsim | PASS |
| Catch-up converges or escalates (I3) | `TestI3_CatchUpConvergesOrEscalates` | `phase045_crash_test.go` | distsim | PASS |
| Catch-up timeout in enginev2 | `TestE2E_NeedsRebuild_Escalation` | `p2_test.go` | enginev2 | PASS |

**Verdict**: A5 is well covered. Both simulator and prototype prove explicit escalation. No pretend-success path exists.
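The escalation contract above can be sketched as an explicit state transition. This is a hypothetical illustration, not the engine's actual types: `SessionState`, `StepCatchUp`, and the rounds budget are invented here.

```go
package main

import "fmt"

// SessionState is a hypothetical catch-up session state.
type SessionState int

const (
	CatchingUp SessionState = iota
	Converged
	NeedsRebuild
)

// StepCatchUp models one catch-up round. It never reports success
// optimistically: either the replica reached the primary tail, or the
// tail-chasing budget is exhausted and the session escalates explicitly.
func StepCatchUp(replicaLSN, primaryTail uint64, roundsLeft int) SessionState {
	if replicaLSN >= primaryTail {
		return Converged
	}
	if roundsLeft <= 0 {
		// Tail-chasing did not converge: escalate, do not pretend success.
		return NeedsRebuild
	}
	return CatchingUp
}

func main() {
	// Budget exhausted while still behind the tail: explicit escalation.
	fmt.Println(StepCatchUp(90, 100, 0) == NeedsRebuild)
}
```

The key property is that there is no branch returning `Converged` while `replicaLSN < primaryTail`: the pretend-success path does not exist by construction.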

---

## A6: Recoverability Boundary Is Explicit

**Must prove**: the recoverable vs unrecoverable gap is decided explicitly.

**Pass condition**: recovery aborts when reservation/payload availability is lost; rebuild is the explicit fallback.

| Evidence | Test | File | Layer | Status |
|----------|------|------|-------|--------|
| Reservation expiry aborts catch-up | `TestReservationExpiryAbortsCatchup` | `cluster_test.go` | distsim | PASS |
| WAL GC beyond replica → NeedsRebuild | `TestI5_CheckpointGC_PreservesAckedBoundary` | `phase045_crash_test.go` | distsim | PASS |
| Rebuild from snapshot + tail | `TestReplicaRebuildFromSnapshotAndTail` | `cluster_test.go` | distsim | PASS |
| Smart WAL: resolvable → unresolvable | `TestP02_SmartWAL_RecoverableThenUnrecoverable` | `phase02_advanced_test.go` | distsim | PASS |
| Time-varying payload availability | `TestP02_SmartWAL_TimeVaryingAvailability` | `phase02_advanced_test.go` | distsim | PASS |
| RecoverableLSN is replayability proof | `RecoverableLSN()` in `storage.go` | `storage.go` | distsim | Implemented |
| Handshake outcome: NeedsRebuild | `TestExec_HandshakeOutcome_NeedsRebuild_InvalidatesSession` | `execution_test.go` | enginev2 | PASS |

**Verdict**: A6 is covered. The recovery boundary is decided by an explicit reservation + recoverability check, not by optimistic assumption. `RecoverableLSN()` verifies contiguous WAL coverage.
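The contiguous-coverage idea behind `RecoverableLSN()` can be sketched as follows. This is a simplified illustration under invented types (`Segment`); the real `RecoverableLSN()` in `storage.go` is not reproduced here.

```go
package main

// Segment is a hypothetical WAL segment descriptor.
type Segment struct {
	StartLSN, EndLSN uint64 // half-open range: [StartLSN, EndLSN)
	PayloadAvailable bool
}

// RecoverableLSN returns the highest LSN up to which the retained WAL
// forms a contiguous, payload-resolvable run starting at `from`.
// Any gap or unresolvable payload stops the scan: everything beyond it
// is not provably replayable and must fall back to rebuild.
func RecoverableLSN(from uint64, segs []Segment) uint64 {
	lsn := from
	for _, s := range segs {
		if s.StartLSN > lsn || !s.PayloadAvailable {
			break // gap or lost payload: the replay proof ends here
		}
		if s.EndLSN > lsn {
			lsn = s.EndLSN
		}
	}
	return lsn
}

func main() {}
```

The point of the sketch is the explicit boundary: the function returns a proven-replayable LSN rather than assuming the whole retained range is usable.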

---

## A7: Historical Data Correctness Holds

**Must prove**: recovered data for a target LSN is historically correct; the current extent cannot fake old history.

**Pass condition**: snapshot + tail rebuild matches the reference; current-extent reconstruction of an old LSN fails correctness.

| Evidence | Test | File | Layer | Status |
|----------|------|------|-------|--------|
| Snapshot + tail matches reference | `TestReplicaRebuildFromSnapshotAndTail` | `cluster_test.go` | distsim | PASS |
| Historical state not reconstructable after GC | `TestA7_HistoricalState_NotReconstructableAfterGC` | `phase045_crash_test.go` | distsim | PASS |
| `CanReconstructAt()` rejects faked history | `CanReconstructAt()` in `storage.go` | `storage.go` | distsim | Implemented |
| Checkpoint does not leak applied state | `TestI2_CheckpointDoesNotLeakAppliedState` | `phase045_crash_test.go` | distsim | PASS |
| Extent-referenced resolvable records | `TestExtentReferencedResolvableRecordsAreRecoverable` | `cluster_test.go` | distsim | PASS |
| Extent-referenced unresolvable → rebuild | `TestExtentReferencedUnresolvableForcesRebuild` | `cluster_test.go` | distsim | PASS |
| ACK'd flush recoverable after crash (I1) | `TestI1_AckedFlush_RecoverableAfterPrimaryCrash` | `phase045_crash_test.go` | distsim | PASS |

**Verdict**: A7 is now covered with the Phase 4.5 crash-consistency additions. The critical gap ("current extent cannot fake old history") is proven by `CanReconstructAt()` + `TestA7_HistoricalState_NotReconstructableAfterGC`.
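The shape of a `CanReconstructAt`-style check can be sketched as below. This is a deliberately simplified illustration with invented parameters; the real `CanReconstructAt()` in `storage.go` is not reproduced here, and the WAL-coverage condition is conservative.

```go
package main

// CanReconstructAt reports whether the historical state at `target` is
// provably reconstructable. A historical state is reconstructable only
// from a snapshot taken at or before `target` plus a WAL tail that
// contiguously covers everything between the snapshot and the target.
// The current extent alone never qualifies: it reflects the latest
// applied state, not history.
func CanReconstructAt(target, snapshotLSN, walStart, walEnd uint64) bool {
	if snapshotLSN > target {
		return false // snapshot is newer than the requested history
	}
	// Conservative coverage check: retained WAL must reach back to the
	// snapshot and forward past the target.
	return walStart <= snapshotLSN && walEnd >= target
}

func main() {}
```

After GC reclaims the WAL below the snapshot boundary, the coverage condition fails, which is exactly the "cannot fake old history" property the A7 tests exercise.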

---

## A8: Durability Mode Semantics Are Correct

**Must prove**: `best_effort`, `sync_all`, and `sync_quorum` behave as intended under mixed replica states.

**Pass condition**: `sync_all` is strict, `sync_quorum` commits only with a true durable quorum, and invalid topology is rejected.

| Evidence | Test | File | Layer | Status |
|----------|------|------|-------|--------|
| sync_quorum continues with one lagging | `TestSyncQuorumContinuesWithOneLaggingReplica` | `cluster_test.go` | distsim | PASS |
| sync_all blocks with one lagging | `TestSyncAllBlocksWithOneLaggingReplica` | `cluster_test.go` | distsim | PASS |
| sync_quorum mixed states | `TestSyncQuorumWithMixedReplicaStates` | `cluster_test.go` | distsim | PASS |
| sync_all mixed states | `TestSyncAllBlocksWithMixedReplicaStates` | `cluster_test.go` | distsim | PASS |
| Barrier timeout: sync_all blocked | `TestP03_BarrierTimeout_SyncAll_Blocked` | `phase03_timeout_test.go` | distsim | PASS |
| Barrier timeout: sync_quorum commits | `TestP03_BarrierTimeout_SyncQuorum_StillCommits` | `phase03_timeout_test.go` | distsim | PASS |
| Promotion uses RecoverableLSN | `EvaluateCandidateEligibility()` | `cluster.go` | distsim | Implemented |
| Promoted replica has committed prefix (I4) | `TestI4_PromotedReplica_HasCommittedPrefix` | `phase045_crash_test.go` | distsim | PASS |

**Verdict**: A8 is well covered. `sync_all` is strict (blocks on lagging), and `sync_quorum` uses a true durable quorum (not connection count). Promotion now uses `RecoverableLSN()` for the committed-prefix check.
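The intended mode semantics can be stated as one commit rule. This is an illustration of the semantics the tests above verify, with invented names (`DurabilityMode`, `CanCommit`), not the engine's actual implementation.

```go
package main

// DurabilityMode is a hypothetical encoding of the three modes.
type DurabilityMode int

const (
	BestEffort DurabilityMode = iota
	SyncQuorum
	SyncAll
)

// CanCommit reports whether a write at `lsn` is client-visible, given
// the durable LSN of each replica in the replication group.
// The quorum is counted over durable LSNs, never over connections.
func CanCommit(mode DurabilityMode, lsn uint64, durable []uint64) bool {
	count := 0
	for _, d := range durable {
		if d >= lsn {
			count++
		}
	}
	switch mode {
	case SyncAll:
		// Strict: a single lagging replica blocks the commit.
		return count == len(durable)
	case SyncQuorum:
		// True durable majority.
		return count > len(durable)/2
	default: // BestEffort
		return true
	}
}

func main() {}
```

With durable LSNs `{100, 100, 90}` and a write at LSN 100, `sync_all` blocks, `sync_quorum` commits (2 of 3 durable), and `best_effort` always commits: exactly the mixed-state behavior the table above pins down.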

---

## Summary

| Criterion | Simulator Evidence | Prototype Evidence | Status |
|-----------|-------------------|-------------------|--------|
| A5 (catch-up escalation) | 6 tests | 1 test | **Strong** |
| A6 (recoverability boundary) | 6 tests + RecoverableLSN() | 1 test | **Strong** |
| A7 (historical correctness) | 7 tests + CanReconstructAt() | — | **Strong** (new in Phase 4.5) |
| A8 (durability modes) | 7 tests + RecoverableLSN() | — | **Strong** |

**Total executable evidence**: 26 simulator tests + 2 prototype tests + 2 new storage methods.

All A5-A8 acceptance criteria have direct test evidence. No criterion depends solely on design-doc claims.

---

## Still Open (Not Blocking)

| Item | Priority | Why not blocking |
|------|----------|-----------------|
| Predicate exploration / adversarial search | P2 | Manual scenarios already cover known failure classes |
| Catch-up convergence under sustained load | P2 | I3 proves escalation; load-rate modeling is optimization |
| A5-A8 in a single grouped runner view | P3 | Traceability doc serves as grouped evidence for now |
sw-block/docs/archive/design/phase-07-service-slice-plan.md
@@ -0,0 +1,403 @@

# Phase 07 Service-Slice Plan

Date: 2026-03-30
Status: historical phase-planning artifact
Scope: `Phase 07 P0`

## Purpose

Define the first real-system service slice that will host the V2 engine, choose the first concrete integration path in the existing codebase, and map engine adapters onto real modules.

This is a planning document. It does not claim the integration already works.

## Decision

The first service slice should be:

- a single `blockvol` primary on a real volume server
- with one replica target (`RF=2` path)
- driven by the existing master heartbeat / assignment loop
- using the V2 engine only for replication recovery ownership / planning / execution

This is the narrowest real-system slice that still exercises:

1. real assignment delivery
2. real epoch and failover signals
3. real volume-server lifecycle
4. real WAL/checkpoint/base-image truth
5. real changed-address / reconnect behavior

It is narrow enough to avoid reopening the whole system, but real enough to stop hiding behind engine-local mocks.

## Why This Slice

This slice is the right first integration target because:

1. `weed/server/master_grpc_server.go` already delivers block-volume assignments over heartbeat
2. `weed/server/master_block_failover.go` already owns failover / promotion / pending-rebuild decisions
3. `weed/storage/blockvol/blockvol.go` already owns the current replication runtime (`shipperGroup`, receiver, WAL retention, checkpoint state)
4. the existing V1/V1.5 failure history is concentrated in exactly this master <-> volume-server <-> blockvol path

So this slice gives maximum validation value with minimum new surface.

## First Concrete Integration Path

The first integration path should be:

1. master receives a volume-server heartbeat
2. master updates the block registry and emits `BlockVolumeAssignment`
3. the volume server receives the assignment
4. the block volume adapter converts the assignment + local storage state into V2 engine inputs
5. the V2 engine drives sender/session/recovery state
6. the existing block-volume runtime executes the actual data-path work under engine decisions

In code, that path starts here:

- master side:
  - `weed/server/master_grpc_server.go`
  - `weed/server/master_block_failover.go`
  - `weed/server/master_block_registry.go`
- volume / storage side:
  - `weed/storage/blockvol/blockvol.go`
  - `weed/storage/blockvol/recovery.go`
  - `weed/storage/blockvol/wal_shipper.go`
  - assignment-handling code under `weed/storage/blockvol/`
- V2 engine side:
  - `sw-block/engine/replication/`

## Service-Slice Boundaries

### In-process placement

The V2 engine should initially live:

- in-process with the volume server / `blockvol` runtime
- not in master
- not as a separate service yet

Reason:

- the engine needs local access to storage truth and local recovery execution
- master should remain the control-plane authority, not the recovery executor

### Control-plane boundary

Master remains authoritative for:

1. epoch
2. role / assignment
3. promotion / failover decision
4. replica membership

The engine consumes these as control inputs. It does not replace master failover policy in `Phase 07`.

### Control-Over-Heartbeat Upgrade Path

For the first V2 product path, the recommended direction is:

- reuse the existing master <-> volume-server heartbeat path as the control carrier
- upgrade the block-specific control semantics carried on that path
- do not immediately invent a separate control service or assignment channel

Why:

1. this is the real Seaweed path already carrying block assignments and confirmations today
2. this gives the fastest route to a real integrated control path
3. it preserves compatibility with existing Seaweed master/volume-server semantics while V2 hardens its own control truth

Concretely, the current V1 path already provides:

1. block assignments delivered in heartbeat responses from `weed/server/master_grpc_server.go`
2. assignment application on the volume server in `weed/server/volume_grpc_client_to_master.go` and `weed/server/volume_server_block.go`
3. assignment confirmation and address-change refresh driven by later heartbeats in `weed/server/master_grpc_server.go` and `weed/server/master_block_registry.go`
4. immediate block heartbeat on selected shipper state changes in `weed/server/volume_grpc_client_to_master.go`

What should be upgraded for V2 is not mainly the transport, but the control contract carried on it:

1. stable `ReplicaID`
2. explicit `Epoch`
3. explicit role / assignment authority
4. explicit apply/confirm semantics
5. explicit stale-assignment rejection
6. explicit address-change refresh as an endpoint change, not an identity change

Current cadence note:

- the block volume heartbeat is periodic (`5 * sleepInterval`) with some immediate state-change heartbeats
- this is acceptable as the first hardening carrier
- it should not be assumed to be the final control-responsiveness model

Deferred design decision:

- whether block control should eventually move beyond heartbeat-only carriage into a more explicit control/assignment channel should be decided only after the `Phase 08 P1` real control-delivery path exists and can be measured

That later decision should be based on:

1. failover / reassignment responsiveness
2. assignment confirmation precision
3. operational complexity
4. whether heartbeat carriage remains too coarse for the block-control path

Until then, the preferred direction is:

- strengthen block control semantics over the existing heartbeat path
- do not prematurely create a second control plane

### Storage boundary

`blockvol` remains authoritative for:

1. WAL head / retention reality
2. checkpoint/base-image reality
3. actual catch-up streaming
4. actual rebuild transfer / restore operations

The engine consumes these as storage truth and recovery-execution capabilities. It does not replace the storage backend in `Phase 07`.

## First-Slice Identity Mapping

This must be explicit in the first integration slice.

For `RF=2` on the existing master / block-registry path:

- the stable engine `ReplicaID` should be derived from:
  - `<volume-name>/<replica-server-id>`
- not from:
  - `DataAddr`
  - `CtrlAddr`
  - the heartbeat transport endpoint

For this slice, the adapter should map:

1. `ReplicaID`
   - from master/block-registry identity for the replica host entry
2. `Endpoint`
   - from the current replica receiver/data/control addresses reported by the real runtime
3. `Epoch`
   - from the confirmed master assignment for the volume
4. `SessionKind`
   - from master-driven recovery intent / role-transition outcome

This is a hard first-slice requirement because address refresh must not collapse identity back into endpoint-shaped keys.
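The identity rule above can be sketched in a few lines. This is a hypothetical illustration (`MakeReplicaID` and `Endpoint` are not the real adapter types): identity is the map key, addresses are the mutable value.

```go
package main

import "fmt"

// ReplicaID is stable identity: volume name + registry-assigned server
// ID. It deliberately excludes DataAddr/CtrlAddr so that an address
// refresh changes the endpoint, never the identity.
type ReplicaID string

func MakeReplicaID(volumeName, replicaServerID string) ReplicaID {
	return ReplicaID(fmt.Sprintf("%s/%s", volumeName, replicaServerID))
}

// Endpoint is mutable transport state keyed by ReplicaID.
type Endpoint struct {
	DataAddr string
	CtrlAddr string
}

func main() {
	id := MakeReplicaID("vol-7", "srv-2")
	endpoints := map[ReplicaID]Endpoint{
		id: {DataAddr: "10.0.0.5:18080", CtrlAddr: "10.0.0.5:18081"},
	}
	// A changed-address restart rewrites the endpoint entry only;
	// the identity key (and all state keyed by it) survives.
	endpoints[id] = Endpoint{DataAddr: "10.0.0.9:18080", CtrlAddr: "10.0.0.9:18081"}
	fmt.Println(id, endpoints[id].DataAddr)
}
```

The addresses here are placeholders; the point is only the key/value split between identity and endpoint.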
## Adapter Mapping

### 1. ControlPlaneAdapter

Engine interface today:

- `HandleHeartbeat(serverID, volumes)`
- `HandleFailover(deadServerID)`

Real mapping should be:

- master-side source:
  - `weed/server/master_grpc_server.go`
  - `weed/server/master_block_failover.go`
  - `weed/server/master_block_registry.go`
- volume-server side sink:
  - the assignment receive/apply path in `weed/storage/blockvol/`

Recommended real shape:

- do not literally push raw heartbeat messages into the engine
- instead introduce a thin adapter that converts confirmed master assignment state into:
  - a stable `ReplicaID`
  - an endpoint set
  - an epoch
  - a recovery target kind

That keeps master as the control owner and the engine as the execution owner.

Important note:

- the adapter should treat heartbeat as the transport carrier, not as the final protocol shape
- block-control semantics should be made explicit over that carrier
- if a later phase concludes that heartbeat-only carriage is too coarse, that should be a separate design decision made after the real hardening path is measured
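The thin-adapter conversion described above could look roughly like this. Everything here is an invented sketch (`AssignmentIntent`, `ConvertAssignment`, the field names) of the recommended shape, not the real adapter:

```go
package main

import "errors"

// AssignmentIntent is the engine-facing result of converting confirmed
// master assignment state: identity, endpoints, epoch, recovery kind.
type AssignmentIntent struct {
	ReplicaID   string
	Endpoints   []string
	Epoch       uint64
	SessionKind string // e.g. "zero-gap", "catch-up", "rebuild"
}

// ConvertAssignment turns confirmed master assignment state into an
// engine intent. Raw heartbeat messages never reach the engine; the
// heartbeat is only the carrier of this contract.
func ConvertAssignment(volumeName, serverID string, epoch uint64, endpoints []string, kind string) (AssignmentIntent, error) {
	if volumeName == "" || serverID == "" {
		return AssignmentIntent{}, errors.New("assignment lacks stable identity")
	}
	return AssignmentIntent{
		ReplicaID:   volumeName + "/" + serverID, // identity, never an address
		Endpoints:   endpoints,
		Epoch:       epoch,
		SessionKind: kind,
	}, nil
}

func main() {}
```

Note that the conversion fails loudly when stable identity is missing, instead of falling back to an endpoint-shaped key.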
### 2. StorageAdapter

Engine interface today:

- `GetRetainedHistory()`
- `PinSnapshot(lsn)` / `ReleaseSnapshot(pin)`
- `PinWALRetention(startLSN)` / `ReleaseWALRetention(pin)`
- `PinFullBase(committedLSN)` / `ReleaseFullBase(pin)`

Real mapping should be:

- retained history source:
  - current WAL head/tail/checkpoint state from `weed/storage/blockvol/blockvol.go`
  - recovery helpers in `weed/storage/blockvol/recovery.go`
- WAL retention pin:
  - the existing retention-floor / replica-aware WAL retention machinery around `shipperGroup`
- snapshot pin:
  - existing snapshot/checkpoint artifacts in `blockvol`
- full-base pin:
  - an explicit pinned full-extent export or equivalent consistent base handle from `blockvol`

Important constraint:

- `Phase 07` must not fake this by reconstructing `RetainedHistory` from tests or metadata alone
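As an illustration of the pin/release contract, here is a toy in-memory pin counter standing in for the real retention machinery. All names are invented for this sketch; the real adapter must back the pins with blockvol's retention-floor state, never with metadata alone.

```go
package main

// RetainedHistory is a hypothetical summary of local storage truth.
type RetainedHistory struct {
	CheckpointLSN uint64
	WALStartLSN   uint64
	WALHeadLSN    uint64
}

type Pin int

// memAdapter is a stub: pinning records a start LSN that must survive
// WAL reclamation while the pin is held.
type memAdapter struct {
	hist RetainedHistory
	pins map[Pin]uint64
	next Pin
}

func (m *memAdapter) GetRetainedHistory() RetainedHistory { return m.hist }

func (m *memAdapter) PinWALRetention(startLSN uint64) Pin {
	m.next++
	m.pins[m.next] = startLSN
	return m.next
}

func (m *memAdapter) ReleaseWALRetention(p Pin) { delete(m.pins, p) }

// RetentionFloor returns the lowest pinned start LSN; WAL at or above
// it may not be reclaimed while any pin is held.
func (m *memAdapter) RetentionFloor() (uint64, bool) {
	var floor uint64
	found := false
	for _, lsn := range m.pins {
		if !found || lsn < floor {
			floor = lsn
			found = true
		}
	}
	return floor, found
}

func main() {}
```

The design point the sketch makes: releasing one pin may raise the floor, but the floor never moves past a still-held pin, which is what keeps a planned catch-up replayable while it runs.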
### 3. Execution Driver / Executor Hookup

Engine side already has:

- the planner/executor split in `sw-block/engine/replication/driver.go`
- stepwise executors in `sw-block/engine/replication/executor.go`

Real mapping should be:

- the engine planner decides:
  - zero-gap / catch-up / rebuild
  - the trusted-base requirement
  - the replayable-tail requirement
- the blockvol runtime performs:
  - actual WAL catch-up transport
  - actual snapshot/base transfer
  - actual truncation / apply operations

Recommended split:

- the engine owns the contract and state transitions
- the blockvol adapter owns concrete I/O work
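The planner decision named above (zero-gap / catch-up / rebuild) can be sketched as one function. This is a toy version under invented names and conditions; the real planner in `sw-block/engine/replication/driver.go` is not reproduced here.

```go
package main

type PlanKind int

const (
	ZeroGap PlanKind = iota
	CatchUp
	Rebuild
)

// PlanRecovery chooses the recovery plan from the replica position and
// the primary's retained WAL window. The engine owns this decision;
// blockvol only executes the resulting I/O.
func PlanRecovery(replicaLSN, walStart, walHead uint64, trustedBase bool) PlanKind {
	if !trustedBase {
		return Rebuild // without a trusted base, tail replay proves nothing
	}
	switch {
	case replicaLSN >= walHead:
		return ZeroGap
	case replicaLSN >= walStart:
		return CatchUp // a replayable tail covers the gap
	default:
		return Rebuild // WAL already reclaimed below the replica position
	}
}

func main() {}
```

This is exactly the decision the acceptance rule below says `blockvol` must not re-make underneath the engine.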
## First-Slice Acceptance Rule

For the first integration slice, this is a hard rule:

- `blockvol` may execute recovery I/O
- `blockvol` must not own recovery policy

Concretely, `blockvol` must not decide:

1. zero-gap vs catch-up vs rebuild
2. trusted-base validity
3. replayable-tail sufficiency
4. whether rebuild fallback is required

Those decisions must remain in the V2 engine.

The bridge may translate engine decisions into concrete blockvol actions, but it must not re-decide recovery policy underneath the engine.

## First Product Path

The first product path should be:

- `RF=2` block volume replication on the existing heartbeat/assignment loop
- primary + one replica
- failover / reconnect / changed-address handling
- rebuild as the formal non-catch-up recovery path

This is the right first path because it exercises the core correctness boundary without introducing N-replica coordination complexity too early.

## What Must Be Replaced First

Current engine-stage pieces that are still mock/test-only or too abstract:

### Replace first

1. `mockStorage` in engine tests
   - replace with a real `blockvol`-backed `StorageAdapter`
2. synthetic control events in engine tests
   - replace with assignment-driven events from the real master/volume-server path
3. convenience recovery-completion wrappers
   - keep them test-only
   - real integration should use the planner + executor + storage work loop

### Can remain temporarily abstract in Phase 07 P0/P1

1. the exact public shape of `ControlPlaneAdapter`
   - can remain thin while the integration path is being chosen
2. async production scheduler details
   - the executor can still be driven by a service loop before the full background-task architecture is finalized

## Recommended Concrete Modules

### Engine stays here

- `sw-block/engine/replication/`

### First real adapter package should be added near blockvol

Recommended initial location:

- `weed/storage/blockvol/v2bridge/`

Reason:

- keeps the V2 engine independent under `sw-block/`
- keeps real-system glue close to blockvol storage truth
- avoids copying engine logic into `weed/`

Suggested contents:

1. `control_adapter.go`
   - convert the master assignment / local apply path into engine intents
2. `storage_adapter.go`
   - expose retained history, pin/release, and trusted-base export handles from real blockvol state
3. `executor_bridge.go`
   - translate engine executor steps into actual blockvol recovery actions
4. `observe_adapter.go`
   - map engine status/logs into service-visible diagnostics

## First Failure Replay Set For Phase 07

The first real-system replay set should be:

1. changed-address restart
   - current risk: old identity/address coupling reappears in service glue
2. stale epoch / stale result after failover
   - current risk: master and engine disagree on authority timing
3. unreplayable-tail rebuild fallback
   - current risk: service glue over-trusts checkpoint/base availability
4. plan/execution cleanup after resource failure
   - current risk: blockvol-side resource failures leave engine or service state dangling
5. primary failover to replica, with rebuild pending on old-primary reconnect
   - current risk: old V1/V1.5 semantics leak back into reconnect handling

## Non-Goals For This Slice

Do not use `Phase 07` to:

1. widen catch-up semantics
2. add smart rebuild optimizations
3. redesign all blockvol internals
4. replace the full V1 runtime in one move
5. claim production readiness

## Deliverables For Phase 07 P0

A good `P0` delivery should include:

1. the chosen service slice
2. the chosen integration path in the current repo
3. the adapter-to-module mapping
4. the list of test-only adapters to replace first
5. the first failure replay set
6. an explicit note of what remains outside this first slice

## Short Form

`Phase 07 P0` should start with:

- engine in `sw-block/engine/replication/`
- bridge in `weed/storage/blockvol/v2bridge/`
- first real slice = blockvol primary + one replica on the existing master heartbeat / assignment path
- `ReplicaID = <volume-name>/<replica-server-id>` for the first slice
- `blockvol` executes I/O but does not own recovery policy
- first product path = `RF=2` failover/reconnect/rebuild correctness
sw-block/docs/archive/design/phase-08-engine-skeleton-map.md
@@ -0,0 +1,301 @@

# Phase 08 Engine Skeleton Map

Date: 2026-03-31
Status: historical phase map
Purpose: provide a short structural map for the `Phase 08` hardening path so implementation can move faster without reopening accepted V2 boundaries

## Scope

This is not the final standalone `sw-block` architecture.

It is the shortest useful engine skeleton for the accepted `Phase 08` hardening path:

- `RF=2`
- `sync_all`
- the existing `Seaweed` master / volume-server heartbeat path
- the V2 engine owns recovery policy
- `blockvol` remains the execution backend

## Module Map

### 1. Control plane

Role:

- authoritative control truth

Primary sources:

- `weed/server/master_grpc_server.go`
- `weed/server/master_block_registry.go`
- `weed/server/master_block_failover.go`
- `weed/server/volume_grpc_client_to_master.go`

What it produces:

- confirmed assignment
- `Epoch`
- target `Role`
- failover / promotion / reassignment result
- stable server identity

### 2. Control bridge

Role:

- translate real control truth into V2 engine intent

Primary files:

- `weed/storage/blockvol/v2bridge/control.go`
- `sw-block/bridge/blockvol/control_adapter.go`
- the entry path in `weed/server/volume_server_block.go`

What it produces:

- `AssignmentIntent`
- stable `ReplicaID`
- `Endpoint`
- `SessionKind`
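Because the bridge carries `Epoch` into the engine, the engine can reject stale authority mechanically. Here is a hypothetical sketch of such a gate (the real check lives in the engine runtime and is not reproduced here):

```go
package main

import "errors"

// ErrStaleEpoch marks an intent carrying authority older than what
// the engine has already accepted.
var ErrStaleEpoch = errors.New("stale epoch: assignment rejected")

// EpochGate accepts only monotonically non-decreasing epochs, so a
// delayed heartbeat from before a failover cannot roll authority back.
type EpochGate struct {
	current uint64
}

func (g *EpochGate) Accept(epoch uint64) error {
	if epoch < g.current {
		return ErrStaleEpoch
	}
	g.current = epoch
	return nil
}

func main() {}
```

Equal-epoch re-delivery is accepted, since heartbeat carriage may legitimately repeat the same confirmed assignment.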
### 3. Engine runtime |
||||
|
|
||||
|
Role: |
||||
|
|
||||
|
- recovery-policy core |
||||
|
|
||||
|
Primary files: |
||||
|
|
||||
|
- `sw-block/engine/replication/orchestrator.go` |
||||
|
- `sw-block/engine/replication/driver.go` |
||||
|
- `sw-block/engine/replication/executor.go` |
||||
|
- `sw-block/engine/replication/sender.go` |
||||
|
- `sw-block/engine/replication/history.go` |
||||
|
|
||||
|
What it decides: |
||||
|
|
||||
|
- zero-gap / catch-up / needs-rebuild |
||||
|
- sender/session ownership |
||||
|
- stale authority rejection |
||||
|
- resource acquisition / release |
||||
|
- rebuild source selection |
||||
|
|
||||
|
### 4. Storage bridge |
||||
|
|
||||
|
Role: |
||||
|
|
||||
|
- translate real blockvol storage truth and execution capability into engine-facing adapters |
||||
|
|
||||
|
Primary files: |
||||
|
|
||||
|
- `weed/storage/blockvol/v2bridge/reader.go` |
||||
|
- `weed/storage/blockvol/v2bridge/pinner.go` |
||||
|
- `weed/storage/blockvol/v2bridge/executor.go` |
||||
|
- `sw-block/bridge/blockvol/storage_adapter.go` |
||||
|
|
||||
|
What it provides: |
||||
|
|
||||
|
- `RetainedHistory` |
||||
|
- WAL retention pin / release |
||||
|
- snapshot pin / release |
||||
|
- full-base pin / release |
||||
|
- WAL scan execution |
||||
|
|
||||
|
### 5. Block runtime |
||||
|
|
||||
|
Role: |
||||
|
|
||||
|
- execute real I/O |
||||
|
|
||||
|
Primary files: |
||||
|
|
||||
|
- `weed/storage/blockvol/blockvol.go` |
||||
|
- `weed/storage/blockvol/replica_apply.go` |
||||
|
- `weed/storage/blockvol/replica_barrier.go` |
||||
|
- `weed/storage/blockvol/recovery.go` |
||||
|
- `weed/storage/blockvol/rebuild.go` |
||||
|
- `weed/storage/blockvol/wal_shipper.go` |
||||
|
|
||||
|
What it owns: |
||||
|
|
||||
|
- WAL |
||||
|
- extent |
||||
|
- flusher |
||||
|
- checkpoint / superblock |
||||
|
- receiver / shipper |
||||
|
- rebuild server |
||||
|
|
||||
|
## Execution Order |
||||
|
|
||||
|
### Control path |
||||
|
|
||||
|
```text |
||||
|
master heartbeat / failover truth |
||||
|
-> BlockVolumeAssignment |
||||
|
-> volume server ProcessAssignments |
||||
|
-> v2bridge control conversion |
||||
|
-> engine ProcessAssignment |
||||
|
-> sender/session state updated |
||||
|
``` |
||||

### Catch-up path

```text
assignment accepted
-> engine reads retained history
-> engine plans catch-up
-> storage bridge pins WAL retention
-> engine executor drives v2bridge executor
-> blockvol scans WAL / ships entries
-> engine completes session
```

### Rebuild path

```text
assignment accepted
-> engine detects NeedsRebuild
-> engine selects rebuild source
-> storage bridge pins snapshot/full-base/tail
-> executor drives transfer path
-> blockvol performs restore / replay work
-> engine completes rebuild
```

### Local durability path

```text
WriteLBA / Trim
-> WAL append
-> shipping / barrier
-> client-visible durability decision
-> flusher writes extent
-> checkpoint advances
-> retention floor decides WAL reclaimability
```
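
The two decisions at the ends of this path can be stated as arithmetic. This is a sketch with illustrative names, under the assumption that durability is capped by both local WAL fsync progress and the replication barrier, and that the retention floor is held down by the checkpoint and any active retention pins.

```go
package main

import "fmt"

// durableLSN models the client-visible durability decision on the path above:
// a write at lsn is durable once the local WAL has fsynced through it and the
// replication barrier has been acknowledged through it.
func durableLSN(walFsyncedLSN, barrierAckedLSN uint64) uint64 {
	if walFsyncedLSN < barrierAckedLSN {
		return walFsyncedLSN
	}
	return barrierAckedLSN
}

// retentionFloor models WAL reclaimability: entries below both the checkpoint
// and every active retention pin may be reclaimed.
func retentionFloor(checkpointLSN uint64, pins []uint64) uint64 {
	floor := checkpointLSN
	for _, p := range pins {
		if p < floor {
			floor = p
		}
	}
	return floor
}

func main() {
	fmt.Println(durableLSN(120, 100))                  // barrier lags: 100
	fmt.Println(retentionFloor(90, []uint64{70, 110})) // pinned catch-up holds floor: 70
}
```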

## Interim Fields

These are currently acceptable only as explicit hardening carry-forwards:

### `localServerID`

Current source:

- `BlockService.listenAddr`

Meaning:

- temporary local identity source for replica/rebuild-side assignment translation

Status:

- interim only
- should become registry-assigned stable server identity later

### `CommittedLSN = CheckpointLSN`

Current source:

- `v2bridge.Reader` / `BlockVol.StatusSnapshot()`

Meaning:

- current V1-style interim mapping where committed truth collapses to local checkpoint truth

Status:

- not final V2 truth
- must become a gate decision before a production-candidate phase
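
Stated as code, the interim mapping is a one-liner (a sketch; the field and function names here are simplified, not the actual `StatusSnapshot` shape):

```go
package main

import "fmt"

// StatusSnapshot is a reduced model of the status the v2bridge reader exposes.
type StatusSnapshot struct {
	CheckpointLSN uint64
}

// committedLSN shows the interim V1-style collapse: committed truth is simply
// local checkpoint truth. This is explicitly not final V2 truth.
func committedLSN(s StatusSnapshot) uint64 {
	return s.CheckpointLSN // interim only
}

func main() {
	fmt.Println(committedLSN(StatusSnapshot{CheckpointLSN: 4096})) // prints: 4096
}
```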

### heartbeat as control carrier

Current source:

- existing master <-> volume-server heartbeat path

Meaning:

- current transport for assignment/control delivery

Status:

- acceptable as current carrier
- not yet a final proof that no separate control channel will ever be needed

## Hard Gates

These should remain explicit in `Phase 08`:

### Gate 1: committed truth

Before production-candidate:

- either separate `CommittedLSN` from `CheckpointLSN`
- or explicitly bound the first candidate path to currently proven pre-checkpoint replay behavior

### Gate 2: live control delivery

Required:

- real assignment delivery must reach the engine on the live path
- not only converter-level proof

### Gate 3: integrated catch-up closure

Required:

- engine -> executor -> `v2bridge` -> blockvol must be proven as one live chain
- not planner proof plus direct WAL-scan proof as separate evidence

### Gate 4: first rebuild execution path

Required:

- rebuild must not remain only a detection outcome
- the chosen product path needs one real executable rebuild closure

### Gate 5: unified replay

Required:

- after control and execution closure land, rerun the accepted failure-class set on the unified live path

## Reuse Map

### Reuse directly

- `weed/server/master_grpc_server.go`
- `weed/server/volume_grpc_client_to_master.go`
- `weed/server/volume_server_block.go`
- `weed/server/master_block_registry.go`
- `weed/server/master_block_failover.go`
- `weed/storage/blockvol/blockvol.go`
- `weed/storage/blockvol/replica_apply.go`
- `weed/storage/blockvol/replica_barrier.go`
- `weed/storage/blockvol/v2bridge/`

### Reuse as implementation reality, not truth

- `shipperGroup`
- `RetentionFloorFn`
- `ReplicaReceiver`
- checkpoint/superblock machinery
- existing failover heuristics

### Do not inherit as V2 semantics

- address-shaped identity
- old degraded/catch-up intuition from V1/V1.5
- `CommittedLSN = CheckpointLSN` as final truth
- blockvol-side recovery policy decisions

## Short Rule

Use this skeleton as:

- a hardening map for the current product path

Do not mistake it for:

- the final standalone `sw-block` architecture

# V2 Engine Readiness Review

Date: 2026-03-29
Status: historical readiness review
Purpose: record the decision on whether the current V2 design + prototype + simulator stack is strong enough to begin real V2 engine slicing

## Decision

Current judgment:

- proceed to real V2 engine planning
- do not open a `V2.5` redesign track at this time

This is a planning-readiness decision, not a production-readiness claim.

## Why This Review Exists

The project has now completed:

1. design/FSM closure for the V2 line
2. protocol simulation closure for:
   - V1 / V1.5 / V2 comparison
   - timeout/race behavior
   - ownership/session semantics
3. standalone prototype closure for:
   - sender/session ownership
   - execution authority
   - recovery branching
   - minimal historical-data proof
   - prototype scenario closure
4. `Phase 4.5` hardening for:
   - bounded `CatchUp`
   - first-class `Rebuild`
   - crash-consistency / restart-recoverability
   - `A5-A8` stronger evidence

So the question is no longer:

- "can the prototype be made richer?"

The question is:

- "is the evidence now strong enough to begin real engine slicing?"

## Evidence Summary

### 1. Design / Protocol

Primary docs:

- `sw-block/design/v2-acceptance-criteria.md`
- `sw-block/design/v2-open-questions.md`
- `sw-block/design/v2_scenarios.md`
- `sw-block/design/v1-v15-v2-comparison.md`
- `sw-block/docs/archive/design/v2-prototype-roadmap-and-gates.md`

Judgment:

- protocol story is coherent
- acceptance set exists
- major V1 / V1.5 failures are mapped into V2 scenarios

### 2. Simulator

Primary code/tests:

- `sw-block/prototype/distsim/`
- `sw-block/prototype/distsim/eventsim.go`
- `learn/projects/sw-block/test/results/v2-simulation-review.md`

Judgment:

- strong enough for protocol/design validation
- strong enough to challenge crash-consistency and liveness assumptions
- not a substitute for real engine / hardware proof

### 3. Prototype

Primary code/tests:

- `sw-block/prototype/enginev2/`
- `sw-block/prototype/enginev2/acceptance_test.go`

Judgment:

- ownership is explicit and fenced
- execution authority is explicit and fenced
- bounded `CatchUp` is semantic, not documentary
- `Rebuild` is a first-class sender-owned path
- historical-data and recoverability reasoning are executable

### 4. `A5-A8` Double Evidence

Prototype-side grouped evidence:

- `sw-block/prototype/enginev2/acceptance_test.go`

Simulator-side grouped evidence:

- `sw-block/docs/archive/design/a5-a8-traceability.md`
- `sw-block/prototype/distsim/`

Judgment:

- the critical acceptance items that most affect engine risk now have materially stronger proof on both sides

## What Is Good Enough Now

The following are good enough to begin engine slicing:

1. sender/session ownership model
2. stale authority fencing
3. recovery orchestration shape
4. bounded `CatchUp` contract
5. `Rebuild` as a formal path
6. committed/recoverable boundary thinking
7. crash-consistency / restart-recoverability proof style

## What Is Still Not Proven

The following still require real engine work and later real-system validation:

1. actual engine lifecycle integration
2. real storage/backend implementation
3. real control-plane integration
4. real durability / fsync behavior under the actual engine
5. real hardware timing / performance
6. final production observability and failure handling

These are expected gaps. They do not block engine planning.

## Open Risks To Carry Forward

These are not blockers, but they should remain explicit:

1. prototype and simulator are still reduced models
2. rebuild-source quality in the real engine will depend on actual checkpoint/base-image mechanics
3. durability truth in the real engine must still be re-proven against actual persistence behavior
4. predicate exploration can still grow, but should not block engine slicing

## Engine-Planning Decision

Decision:

- start real V2 engine planning

Reason:

1. no current evidence points to a structural flaw requiring `V2.5`
2. the remaining gaps are implementation/system gaps, not prototype ambiguity
3. continuing to extend prototype/simulator breadth would have diminishing returns

## Required Outputs After This Review

1. `sw-block/docs/archive/design/v2-engine-slicing-plan.md`
2. first real engine slice definition
3. explicit non-goals for the first engine stage
4. explicit validation plan for engine slices

## Non-Goals Of This Review

This review does not claim:

1. V2 is production-ready
2. V2 should replace V1 immediately
3. all design questions are forever closed

It only claims:

- the project now has enough evidence to begin disciplined real engine slicing

# V2 Engine Slicing Plan

Date: 2026-03-29
Status: historical slicing plan
Purpose: define the first real V2 engine slices after prototype and `Phase 4.5` closure

## Goal

Move from:

- standalone design/prototype truth under `sw-block/prototype/`

to:

- a real V2 engine core under `sw-block/`

without dragging V1.5 lifecycle assumptions into the implementation.

## Planning Rules

1. reuse V1 ideas and tests selectively, not structurally
2. prefer narrow vertical slices over broad skeletons
3. each slice must preserve the accepted V2 ownership/fencing model
4. keep simulator/prototype as validation support, not as the implementation itself
5. do not mix V2 engine work into `weed/storage/blockvol/`

## First Engine Stage

The first engine stage should build the control/recovery core, not the full storage engine.

That means:

1. per-replica sender identity
2. one active recovery session per replica per epoch
3. sender-owned execution authority
4. explicit recovery outcomes:
   - zero gap
   - bounded catch-up
   - rebuild
5. rebuild execution shell only
   - do not hard-code final snapshot + tail vs full base decision logic yet
   - keep real rebuild-source choice tied to Slice 3 recoverability inputs

## Recommended Slice Order

### Slice 1: Engine Ownership Core

Purpose:

- carry the accepted `enginev2` ownership/fencing model into the real engine core

Scope:

1. stable per-replica sender object
2. stable recovery-session object
3. session identity fencing
4. endpoint / epoch invalidation
5. sender-group or equivalent ownership registry

Acceptance:

1. stale session results cannot mutate current authority
2. changed-address and epoch-bump invalidation work in engine code
3. the 4 V2-boundary ownership themes remain provable

### Slice 2: Engine Recovery Execution Core

Purpose:

- move the prototype execution APIs into real engine behavior

Scope:

1. connect / handshake / catch-up flow
2. bounded `CatchUp`
3. explicit `NeedsRebuild`
4. sender-owned rebuild execution path
5. rebuild execution shell without final trusted-base selection policy

Acceptance:

1. bounded catch-up does not chase indefinitely
2. rebuild is exclusive from catch-up
3. session completion rules are explicit and fenced

### Slice 3: Engine Data / Recoverability Core

Purpose:

- connect recovery behavior to real retained-history / checkpoint mechanics

Scope:

1. real recoverability decision inputs
2. trusted-base decision for rebuild source
3. minimal real checkpoint/base-image integration
4. real truncation / safe-boundary handling

This is the first slice that should decide, from real engine inputs, between:

1. `snapshot + tail`
2. `full base`

Acceptance:

1. engine can explain why recovery is allowed
2. rebuild-source choice is explicit and testable
3. historical correctness and truncation rules remain intact

### Slice 4: Engine Integration Closure

Purpose:

- bind the engine control/recovery core to real orchestration and validation surfaces

Scope:

1. real assignment/control intent entry path
2. engine-facing observability
3. focused real-engine tests for V2-boundary cases
4. first integration review against real failure classes

Acceptance:

1. key V2-boundary failures are reproduced and closed in engine tests
2. engine observability is good enough to debug ownership/recovery failures
3. remaining gaps are system/performance gaps, not control-model ambiguity

## What To Reuse

Good reuse candidates:

1. tests and failure cases from V1 / V1.5
2. narrow utility/data helpers where not coupled to V1 lifecycle
3. selected WAL/history concepts if they fit V2 ownership boundaries

Do not structurally reuse:

1. V1/V1.5 shipper lifecycle
2. address-based identity assumptions
3. `SetReplicaAddrs`-style behavior
4. old recovery control structure

## Where The Work Should Live

Real V2 engine work should continue under:

- `sw-block/`

Recommended next area:

- `sw-block/core/`

or

- `sw-block/engine/`

The exact path can be chosen later, but it should remain separate from:

- `sw-block/prototype/`
- `weed/storage/blockvol/`

## Validation Plan For Engine Slices

Each engine slice should be validated at three levels:

1. prototype alignment
   - does engine behavior preserve the accepted prototype invariant?
2. focused engine tests
   - does the real engine slice enforce the same contract?
3. scenario mapping
   - does at least one important V1/V1.5 failure class remain closed?

## Non-Goals For First Engine Stage

Do not try to do these immediately:

1. full Smart WAL expansion
2. performance optimization
3. V1 replacement/migration plan
4. full product integration
5. all storage/backend redesign at once

## Immediate Next Assignment

The first concrete engine-planning task should be:

1. choose the real V2 engine module location under `sw-block/`
2. define Slice 1 file/module boundaries
3. write a short engine ownership-core spec
4. map 3-5 acceptance scenarios directly onto Slice 1 expectations

# V2 First Slice: Per-Replica Sender/Session Ownership

Date: 2026-03-27
Status: historical first-slice note
Depends-on: Q1 (recovery session), Q6 (orchestrator scope), Q7 (first slice)

## Problem

`SetReplicaAddrs()` replaces the entire `ShipperGroup` atomically. This causes:

1. **State loss on topology change.** All shippers are destroyed and recreated.
   Recovery state (`replicaFlushedLSN`, `lastContactTime`, catch-up progress) is lost.
   After a changed-address restart, the new shipper starts from scratch.

2. **No per-replica identity.** Shippers are identified by array index. The master
   cannot target a specific replica for rebuild/catch-up — it must re-issue the
   entire address set.

3. **Background reconnect races.** A reconnect cycle may be in progress when
   `SetReplicaAddrs` replaces the group. The in-progress reconnect's connection
   objects become orphaned.

## Design

### Per-replica sender identity

`ShipperGroup` changes from `[]*WALShipper` to `map[string]*WALShipper`, keyed by
the replica's canonical data address. Each shipper stores its own `ReplicaID`.

```go
type WALShipper struct {
    ReplicaID string // canonical data address — identity across reconnects
    // ... existing fields
}

type ShipperGroup struct {
    mu       sync.RWMutex
    shippers map[string]*WALShipper // keyed by ReplicaID
}
```

### ReconcileReplicas replaces SetReplicaAddrs

Instead of replacing the entire group, `ReconcileReplicas` diffs old vs new:

```
ReconcileReplicas(newAddrs []ReplicaAddr):
    for each existing shipper:
        if NOT in newAddrs → Stop and remove
    for each newAddr:
        if matching shipper exists → keep (preserve state)
        if no match → create new shipper
```

This preserves `replicaFlushedLSN`, `lastContactTime`, catch-up progress, and
background reconnect goroutines for replicas that stay in the set.

`SetReplicaAddrs` becomes a wrapper:

```go
func (v *BlockVol) SetReplicaAddrs(addrs []ReplicaAddr) {
    if v.shipperGroup == nil {
        v.shipperGroup = NewShipperGroup(nil)
    }
    v.shipperGroup.ReconcileReplicas(addrs, v.makeShipperFactory())
}
```
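
The diff logic can be sketched end to end. This is a reduced model: `WALShipper` is collapsed to its identity plus one field standing in for preserved recovery state, addresses are plain strings, and the factory is a plain function; the real types carry connections and goroutines.

```go
package main

import "fmt"

// WALShipper is reduced to identity plus the state we want preserved.
type WALShipper struct {
	ReplicaID  string
	FlushedLSN uint64 // stands in for preserved recovery state (replicaFlushedLSN, etc.)
	stopped    bool
}

func (s *WALShipper) Stop() { s.stopped = true }

type ShipperGroup struct {
	shippers map[string]*WALShipper
}

// ReconcileReplicas diffs the current set against newAddrs: shippers absent
// from newAddrs are stopped and removed, existing ones are kept with their
// state intact, and missing ones are created fresh via the factory.
func (g *ShipperGroup) ReconcileReplicas(newAddrs []string, factory func(addr string) *WALShipper) {
	want := map[string]bool{}
	for _, a := range newAddrs {
		want[a] = true
	}
	for id, s := range g.shippers {
		if !want[id] {
			s.Stop()
			delete(g.shippers, id)
		}
	}
	for _, a := range newAddrs {
		if _, ok := g.shippers[a]; !ok {
			g.shippers[a] = factory(a)
		}
	}
}

func main() {
	g := &ShipperGroup{shippers: map[string]*WALShipper{}}
	factory := func(addr string) *WALShipper { return &WALShipper{ReplicaID: addr} }
	g.ReconcileReplicas([]string{"r1:8080", "r2:8080"}, factory)
	g.shippers["r1:8080"].FlushedLSN = 42 // progress on a stable replica

	// r2 restarts on a new port; r1 is untouched and keeps its progress.
	g.ReconcileReplicas([]string{"r1:8080", "r2:9090"}, factory)
	fmt.Println(g.shippers["r1:8080"].FlushedLSN, g.shippers["r2:9090"].FlushedLSN) // prints: 42 0
}
```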

### Changed-address restart flow

1. Replica restarts on a new port. Heartbeat reports the new address.
2. Master detects the endpoint change (address differs, same volume).
3. Master sends an assignment update to the primary with the new replica address.
4. Primary's `ReconcileReplicas` receives `[oldAddr1, newAddr2]`.
5. The old shipper for the changed replica is stopped (old address gone from the set).
6. A new shipper is created with the new address — but this is a fresh shipper.
7. The new shipper bootstraps: Disconnected → Connecting → CatchingUp → InSync.

The improvement over V1.5: the **other** replicas in the set are NOT disturbed.
Only the changed replica gets a fresh shipper. Recovery state for stable replicas
is preserved.

### Recovery session

Each WALShipper already contains the recovery state machine:

- `state` (Disconnected → Connecting → CatchingUp → InSync → Degraded → NeedsRebuild)
- `replicaFlushedLSN` (authoritative progress)
- `lastContactTime` (retention budget)
- `catchupFailures` (escalation counter)
- background reconnect goroutine

No separate `RecoverySession` object is needed. The WALShipper IS the per-replica
recovery session. The state machine already tracks the session lifecycle.

What changes: the session is no longer destroyed on topology change (unless the
replica itself is removed from the set).

### Coordinator vs primary responsibilities

| Responsibility | Owner |
|---------------|-------|
| Endpoint truth (canonical address) | Coordinator (master) |
| Assignment updates (add/remove replicas) | Coordinator |
| Epoch authority | Coordinator |
| Session creation trigger | Coordinator (via assignment) |
| Session execution (reconnect, catch-up, barrier) | Primary (via WALShipper) |
| Timeout enforcement | Primary |
| Ordered receive/apply | Replica |
| Barrier ack | Replica |
| Heartbeat reporting | Replica |

### Migration from current code

| Current | V2 |
|---------|-----|
| `ShipperGroup.shippers []*WALShipper` | `ShipperGroup.shippers map[string]*WALShipper` |
| `SetReplicaAddrs()` creates all new | `ReconcileReplicas()` diffs and preserves |
| `StopAll()` in demote | `StopAll()` unchanged (stops all) |
| `ShipAll(entry)` iterates slice | `ShipAll(entry)` iterates map values |
| `BarrierAll(lsn)` parallel slice | `BarrierAll(lsn)` parallel map values |
| `MinReplicaFlushedLSN()` iterates slice | Same, iterates map values |
| `ShipperStates()` iterates slice | Same, iterates map values |
| No per-shipper identity | `WALShipper.ReplicaID` = canonical data addr |

### Files changed

| File | Change |
|------|--------|
| `wal_shipper.go` | Add `ReplicaID` field, pass in constructor |
| `shipper_group.go` | `map[string]*WALShipper`, `ReconcileReplicas`, update iterators |
| `blockvol.go` | `SetReplicaAddrs` calls `ReconcileReplicas`, shipper factory |
| `promotion.go` | No change (`StopAll` unchanged) |
| `dist_group_commit.go` | No change (uses `ShipperGroup` API) |
| `block_heartbeat.go` | No change (uses `ShipperStates`) |

### Acceptance bar

The following existing tests must continue to pass:

- all CP13-1 through CP13-7 protocol tests (`sync_all_protocol_test.go`)
- all adversarial tests (`sync_all_adversarial_test.go`)
- all baseline tests (`sync_all_bug_test.go`)
- all rebuild tests (`rebuild_v1_test.go`)

The following CP13-8 tests validate the V2 improvement:

- `TestCP13_SyncAll_ReplicaRestart_Rejoin` — changed-address recovery
- `TestAdversarial_ReconnectUsesHandshakeNotBootstrap` — V2 reconnect protocol
- `TestAdversarial_CatchupMultipleDisconnects` — state preservation across reconnects

New tests to add:

- `TestReconcileReplicas_PreservesExistingShipper` — stable replica keeps state
- `TestReconcileReplicas_RemovesStaleShipper` — removed replica stopped
- `TestReconcileReplicas_AddsNewShipper` — new replica bootstraps
- `TestReconcileReplicas_MixedUpdate` — one kept, one removed, one added

## Non-goals for this slice

- Smart WAL payload classes
- Recovery reservation protocol
- Full coordinator orchestration
- New transport layer

# V2 First Slice: Per-Replica Sender and Recovery Session Ownership

Date: 2026-03-27
Status: historical first-slice note

## Purpose

This document defines the first real V2 implementation slice.

The slice is intentionally narrow:

- per-replica sender ownership
- explicit recovery session ownership
- clear coordinator vs primary responsibility

This is the first step toward a standalone V2 block engine under `sw-block/`.

## Why This Slice First

It directly addresses the clearest V1.5 structural limits:

- sender identity loss when replica sets are refreshed
- changed-address restart recovery complexity
- repeated reconnect cycles without stable per-replica ownership
- adversarial Phase 13 boundary tests that V1.5 cannot cleanly satisfy

It also avoids jumping too early into:

- Smart WAL
- new backend storage layout
- full production transport redesign

## Core Decision

Use:

- **one sender owner per replica**
- **at most one active recovery session per replica per epoch**

Healthy replicas need only their steady sender object.

Degraded / reconnecting replicas gain an explicit recovery session owned by the primary.

## Ownership Split

### Coordinator

Owns:

- replica identity / endpoint truth
- assignment updates
- epoch authority
- session creation / destruction intent

Does not own:

- byte-by-byte catch-up execution
- local sender loop scheduling

### Primary

Owns:

- per-replica sender objects
- per-replica recovery session execution
- reconnect / catch-up progress
- timeout enforcement for the active session
- transition from:
  - normal sender
  - to recovery session
  - back to normal sender

### Replica

Owns:

- receive/apply path
- barrier ack
- heartbeat/reporting

The replica remains passive from the recovery-orchestration point of view.

## Data Model

### Sender Owner

Per replica, maintain a stable sender owner with:

- replica logical ID
- current endpoint
- current epoch view
- steady-state health/status
- optional active recovery session reference

### Recovery Session

Per replica, per epoch:

- `ReplicaID`
- `Epoch`
- `EndpointVersion` or equivalent endpoint truth
- `State`
  - `connecting`
  - `catching_up`
  - `in_sync`
  - `needs_rebuild`
- `StartLSN`
- `TargetLSN`
- timeout / deadline metadata

## Session Rules

1. only one active session per replica per epoch
2. a new assignment for the same replica:
   - supersedes the old session only if the epoch/session generation is newer
3. a stale session must not continue after:
   - an epoch bump
   - an endpoint truth change
   - explicit coordinator replacement
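
These rules reduce to a single supersession predicate. The sketch below follows the data model above; `Generation` is an illustrative stand-in for "session generation", which the document names but does not assign a field.

```go
package main

import "fmt"

type SessionState int

const (
	Connecting SessionState = iota
	CatchingUp
	InSync
	NeedsRebuild
)

// RecoverySession follows the per-replica, per-epoch data model above.
type RecoverySession struct {
	ReplicaID       string
	Epoch           uint64
	EndpointVersion uint64
	Generation      uint64 // illustrative: session generation within an epoch
	State           SessionState
}

// supersedes reports whether a proposed session may replace the active one:
// only a newer epoch, newer endpoint truth, or newer generation wins, so a
// stale session can never displace current authority.
func supersedes(active, proposed *RecoverySession) bool {
	if active == nil {
		return true
	}
	if proposed.Epoch != active.Epoch {
		return proposed.Epoch > active.Epoch
	}
	if proposed.EndpointVersion != active.EndpointVersion {
		return proposed.EndpointVersion > active.EndpointVersion
	}
	return proposed.Generation > active.Generation
}

func main() {
	active := &RecoverySession{ReplicaID: "r1", Epoch: 3, EndpointVersion: 1, Generation: 2}
	stale := &RecoverySession{ReplicaID: "r1", Epoch: 3, EndpointVersion: 1, Generation: 2}
	bumped := &RecoverySession{ReplicaID: "r1", Epoch: 4, EndpointVersion: 1, Generation: 1}
	fmt.Println(supersedes(active, stale), supersedes(active, bumped)) // prints: false true
}
```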

## Minimal State Transitions

### Healthy path

1. replica sender exists
2. sender ships normally
3. replica remains `InSync`

### Recovery path

1. sender detects or is told the replica is not healthy
2. coordinator provides valid assignment/endpoint truth
3. primary creates a recovery session
4. session connects
5. session catches up if recoverable
6. on success:
   - session closes
   - steady sender resumes normal state

### Rebuild path

1. session determines catch-up is not sufficient
2. session transitions to `needs_rebuild`
3. higher-layer rebuild flow takes over

## What This Slice Does Not Include

Not in the first slice:

- Smart WAL payload classes in production
- snapshot pinning / GC logic
- new on-disk engine
- frontend publication changes
- full production event scheduler

## Proposed V2 Workspace Target

Do this under `sw-block/`, not `weed/storage/blockvol/`.

Suggested area:

- `sw-block/prototype/enginev2/`

Suggested first files:

- `sw-block/prototype/enginev2/session.go`
- `sw-block/prototype/enginev2/sender.go`
- `sw-block/prototype/enginev2/group.go`
- `sw-block/prototype/enginev2/session_test.go`

The first code does not need full storage I/O.
It should prove ownership and transition shape first.

## Acceptance For This Slice

The slice is good enough when:

1. sender identity is stable per replica
2. changed-address reassignment updates the right sender owner
3. multiple reconnect cycles do not lose recovery ownership
4. a stale session does not survive an epoch bump
5. the 4 Phase 13 V2-boundary tests have a clear path to become satisfiable

## Relationship To Existing Simulator

This slice should align with:

- `v2-acceptance-criteria.md`
- `v2-open-questions.md`
- `v1-v15-v2-comparison.md`
- `distsim` / `eventsim` behavior

The simulator remains the design oracle.
The first implementation slice should not contradict it.
# V2 Production Roadmap

Date: 2026-03-30
Status: historical roadmap
Purpose: define the path from the accepted V2 engine core to a production candidate

## Current Position

Completed:

1. design / FSM closure
2. simulator / protocol validation
3. prototype closure
4. evidence hardening
5. engine core slices:
   - Slice 1 ownership core
   - Slice 2 recovery execution core
   - Slice 3 data / recoverability core
   - Slice 4 integration closure

Current stage:

- entering broader engine implementation

This means the main risk is no longer:

- whether the V2 idea stands up

The main risk is:

- whether the accepted engine core can be turned into a real system without reintroducing V1/V1.5 structure and semantics

## Roadmap Summary

1. Phase 06: broader engine implementation stage
2. Phase 07: real-system integration / product-path decision
3. Phase 08: pre-production hardening
4. Phase 09: performance / scale / soak validation
5. Phase 10: production candidate and rollout gate

## Phase 06

### Goal

Connect the accepted engine core to:

1. real control truth
2. real storage truth
3. explicit engine execution steps

### Outputs

1. control-plane adapter into the engine core
2. storage/base/recoverability adapters
3. explicit execution-driver model where synchronous helpers are no longer sufficient
4. validation against selected real failure classes

### Gate

At the end of Phase 06, the project should be able to say:

- the engine core can live inside a real system shape

## Phase 07

### Goal

Move from engine-local correctness to a real runnable subsystem.

### Outputs

1. service-style runnable engine slice
2. integration with real control and storage surfaces
3. crash/failover/restart integration tests
4. decision on the first viable product path

### Gate

At the end of Phase 07, the project should be able to say:

- the engine can run as a real subsystem, not only as an isolated core

## Phase 08

### Goal

Turn correctness into operational safety.

### Outputs

1. observability hardening
2. operator/debug flows
3. recovery/runbook procedures
4. config surface cleanup
5. realistic durability/restart validation

### Gate

At the end of Phase 08, the project should be able to say:

- operators can run, debug, and recover the system safely

## Phase 09

### Goal

Prove viability under load and over time.

### Outputs

1. throughput / latency baselines
2. rebuild / catch-up cost characterization
3. steady-state overhead measurement
4. soak testing
5. scale and failure-under-load validation

### Gate

At the end of Phase 09, the project should be able to say:

- the design is not only correct, but viable at useful scale and duration

## Phase 10

### Goal

Produce a controlled production candidate.

### Outputs

1. feature-gated production candidate
2. rollback strategy
3. migration/coexistence plan with V1
4. staged rollout plan
5. production acceptance checklist

### Gate

At the end of Phase 10, the project should be able to say:

- the system is ready for a controlled production rollout

## Cross-Phase Rules

### Rule 1: Do not reopen protocol shape casually

The accepted core should remain stable unless new implementation evidence forces a change.

### Rule 2: Use V1 as validation source, not design template

Use:

1. `learn/projects/sw-block/`
2. `weed/storage/block*`

for:

1. failure gates
2. constraints
3. integration references

Do not use them as the default V2 architecture template.

### Rule 3: Keep `CatchUp` narrow

Do not let later implementation phases re-expand `CatchUp` into a broad, optimistic, long-lived recovery mode.

### Rule 4: Keep evidence quality ahead of object growth

New work should preferentially improve:

1. traceability
2. diagnosability
3. real-failure validation
4. operational confidence

not simply add new objects, states, or mechanisms.

## Production Readiness Ladder

The project should move through this ladder explicitly:

1. proof-of-design
2. proof-of-engine-shape
3. proof-of-runnable-engine-stage
4. proof-of-operable-system
5. proof-of-viable-production-candidate

Current ladder position:

- between `2` and `3`
- engine core accepted; broader runnable engine stage underway

## Next Documents To Maintain

1. `sw-block/.private/phase/phase-06.md`
2. `sw-block/docs/archive/design/v2-engine-readiness-review.md`
3. `sw-block/docs/archive/design/v2-engine-slicing-plan.md`
4. this roadmap
@ -0,0 +1,239 @@

# V2 Prototype Roadmap And Gates

Date: 2026-03-27
Status: historical prototype roadmap
Purpose: define the remaining prototype roadmap, the validation gates between stages, and the decision point between real V2 engine work and a possible V2.5 redesign

## Current Position

V2 design/FSM/simulator work is sufficiently closed for serious prototyping, but not frozen against later `V2.5` adjustments.

Current state:

- design proof: high
- execution proof: medium
- data/recovery proof: low
- prototype end-to-end proof: low

Rough prototype progress:

- `25%` to `35%`

This is an early executable prototype, not an engine-ready prototype.

## Roadmap Goal

Answer this question with prototype evidence:

- can V2 become a real engine path?
- or should it become `V2.5` before real implementation begins?
## Step 1: Execution Authority Closure

Purpose:

- finish the sender / recovery-session authority model so stale work is unambiguously rejected

Scope:

1. ownership-only `AttachSession()` / `SupersedeSession()`
2. execution begins only through execution APIs
3. stale handshake / progress / completion fenced by `sessionID`
4. endpoint bump / epoch bump invalidate execution authority
5. sender-group preserve-or-kill behavior is explicit

Done when:

1. all execution APIs are sender-gated and reject stale `sessionID`
2. session creation is separated from execution start
3. phase ordering is enforced
4. endpoint bump / epoch bump invalidate execution authority correctly
5. mixed add/remove/update reconciliation preserves or kills state exactly as intended

Main files:

- `sw-block/prototype/enginev2/`
- `sw-block/prototype/distsim/`
- `learn/projects/sw-block/phases/phase-13-v2-boundary-tests.md`

Key gate:

- old recovery work cannot mutate current sender state at any execution stage

## Step 2: Orchestrated Recovery Prototype

Purpose:

- move from good local sender APIs to an actual prototype recovery flow driven by assignment/update intent

Scope:

1. assignment/update intent creates or supersedes recovery attempts
2. reconnect / reassignment / catch-up / rebuild decision path
3. sender-group becomes orchestration entry point
4. explicit outcome branching:
   - zero-gap fast completion
   - positive-gap catch-up
   - unrecoverable gap -> `NeedsRebuild`

Done when:

1. the prototype expresses a realistic recovery flow from topology/control intent
2. sender-group drives recovery creation, not only unit helpers
3. recovery outcomes are explicit and testable
4. orchestrator responsibility is clear enough to narrow `v2-open-questions.md` item 6

Key gate:

- recovery control is no longer scattered across helper calls; it has one clear orchestration path

## Step 3: Minimal Historical Data Prototype

Purpose:

- prove the recovery model against real data-history assumptions, not only control logic

Scope:

1. minimal WAL/history model, not full engine
2. enough to exercise:
   - catch-up range
   - retained prefix/window
   - rebuild fallback
   - historical correctness at target LSN
3. enough reservation/recoverability state to make recovery explicit

Done when:

1. the prototype can prove why a gap is recoverable or unrecoverable
2. catch-up and rebuild decisions are backed by minimal data/history state
3. `v2-open-questions.md` items 3, 4, 5 are closed or sharply narrowed
4. prototype evidence strengthens acceptance criteria `A5`, `A6`, and `A7`

Key gate:

- the prototype must explain why recovery is allowed, not just that policy says it is

## Step 4: Prototype Scenario Closure

Purpose:

- make the prototype itself demonstrate the V2 story end-to-end

Scope:

1. map key V2 scenarios onto the prototype
2. express the 4 V2-boundary cases against prototype behavior
3. add one small end-to-end harness inside `sw-block/prototype/`
4. align prototype evidence with acceptance criteria

Done when:

1. prototype behavior can be reviewed scenario-by-scenario
2. key V1/V1.5 failures have prototype equivalents
3. prototype outcomes match intended V2 design claims
4. remaining gaps are clearly real-engine gaps, not protocol/prototype ambiguity

Key gate:

- a reviewer can trace acceptance criteria -> scenario -> prototype behavior without hand-waving

## Gates

### Gate 1: Design Closed Enough

Status:

- mostly passed

Meaning:

1. acceptance criteria exist
2. core simulator exists
3. ownership gap from V1.5 is understood

### Gate 2: Execution Authority Closed

Passes after Step 1.

Meaning:

- stale execution results cannot mutate current authority

### Gate 3: Orchestrated Recovery Closed

Passes after Step 2.

Meaning:

- recovery flow is controlled by one coherent orchestration model

### Gate 4: Historical Data Model Closed

Passes after Step 3.

Meaning:

- catch-up vs rebuild is backed by executable data-history logic

### Gate 5: Prototype Convincing

Passes after Step 4.

Meaning:

- enough evidence exists to choose:
  - real V2 engine path
  - or `V2.5` redesign

## Decision Gate After Step 4

### Path A: Real V2 Engine Planning

Choose this if:

1. prototype control logic is coherent
2. recovery boundary is explicit
3. boundary cases are convincing
4. no major structural flaw remains

Outputs:

1. real engine slicing plan
2. migration/integration plan into future standalone `sw-block`
3. explicit non-goals for first production version

### Path B: V2.5 Redesign

Choose this if the prototype reveals:

1. ownership/orchestration still too fragile
2. recovery boundary still too implicit
3. historical correctness model too costly or too unclear
4. too much complexity leaks into the hot path

Output:

- write `V2.5` as a design/prototype correction before engine work

## What Not To Do Yet

1. no Smart WAL expansion beyond what Step 3 minimally needs
2. no backend/storage-engine redesign
3. no V1 production integration
4. no frontend/wire protocol work
5. no performance optimization as a primary goal

## Practical Summary

Current sequence:

1. finish execution authority
2. build orchestrated recovery
3. add minimal historical-data proof
4. close key scenarios against the prototype
5. decide:
   - V2 engine
   - or `V2.5`