5.2 KiB

Raw Blame History

Protocol Development Process

Date: 2026-03-27

Purpose

This document defines how sw-block protocol work should be developed.

The process is meant to work for:

V2
future V3
or a later block algorithm that is not WAL-based

The point is to make protocol work systematic rather than reactive.

Core Philosophy

1. Design before implementation

Do not start with production code and hope the protocol becomes clear later.

Start with:

system contract
invariants
state model
scenario backlog

Only then move to implementation.

2. Real failures are inputs, not just bugs

When V1 or V1.5 fails in real testing, treat that as:

a design requirement
a scenario source
a simulator input

Do not patch and forget.

3. Simulator is part of the protocol, not a side tool

The simulator exists to answer:

what should happen
what must never happen
which old designs fail
why the new design is better

It is not a replacement for real testing. It is the design-validation layer before production implementation.

4. Passing tests are not enough

Green tests are necessary, not sufficient.

We also require:

explicit invariants
explicit scenario intent
clear state transitions
review of assumptions and abstraction boundaries

5. Keep hot-path and recovery-path reasoning separate

Healthy steady-state behavior and degraded recovery behavior are different problems.

Both must be designed explicitly.

Development Ladder

Every major protocol feature should move through these steps:

Problem statement

what real bug, limit, or product goal is driving the work

Contract

what the protocol guarantees
what it does not guarantee

State model

node state
coordinator state
recovery state
role / epoch / lineage rules

Scenario backlog

named scenarios
source:
- real failure
- design obligation
- adversarial distributed case

Prototype / simulator

reduced but explicit model
invariant checks
V1 / V1.5 / V2 comparison where relevant

Implementation

production code only after the protocol shape is clear enough

Real validation

unit
component
integration
real hardware where needed

Feedback loop

turn new failures back into scenario/design inputs

Required Artifacts

For protocol work to be considered real progress, we usually want:

Design

design doc
scenario doc
comparison doc when replacing an older approach

Prototype

simulator or prototype code
tests that assert protocol behavior

Implementation

production patch
production tests
docs updated to match the actual algorithm

Review

implementation gate
design/protocol gate

Two-Gate Rule

We use two acceptance gates.

Gate 1: implementation

Owned by the coding side.

Questions:

does it build?
do tests pass?
does it behave as intended in code?

Gate 2: protocol/design

Owned by the design/review side.

Questions:

is the logic actually sound?
do tests prove the intended thing?
are assumptions explicit?
is the abstraction boundary honest?

A task is not accepted until both gates pass.

Layering Rule

Keep simulation layers separate.

`distsim`

Use for:

protocol correctness
state transitions
fencing
recoverability
promotion / lineage
reference-state checking

`eventsim`

Use for:

timeout behavior
timer races
event ordering
same-tick / delayed event interactions

Do not duplicate scenarios blindly across both layers.

Test Selection Rule

Do not choose simulator inputs only from failing tests.

Review all relevant tests and classify them by:

protocol significance
simulator value
implementation specificity

Good simulator candidates often come from:

barrier truth
catch-up vs rebuild
stale message rejection
failover / promotion safety
changed-address restart
mode semantics

Keep real-only tests for:

wire format
OS timing
exact WAL file behavior
frontend transport specifics

Version Comparison Rule

When designing a successor protocol:

keep the old version visible
reproduce the old failure or limitation
show the improved behavior in the new version

For sw-block, that means:

V1
V1.5
V2

should be compared explicitly where possible.

Documentation Rule

The docs must track three different things:

`learn/projects/sw-block/`

Use for:

project history
V1/V1.5 algorithm records
phase records
real test history

`sw-block/design/`

Use for:

active design truth
V2 and later protocol docs
scenario backlog
comparison docs

`sw-block/.private/phase/`

Use for:

active execution plan
log
decisions

What Good Progress Looks Like

A good protocol iteration usually has this pattern:

real failure or design pressure identified
scenario named and written down
simulator reproduces the bad case
new protocol handles it explicitly
implementation follows
real tests validate it

If one of those steps is missing, confidence is weaker.

Bottom Line

The process is:

design the contract
model the state
define the scenarios
simulate the protocol
implement carefully
validate in real tests
feed failures back into design

That is the process we should keep using for V2 and any later protocol line.

5.2 KiB Raw Blame History

Protocol Development Process

Purpose

Core Philosophy

1. Design before implementation

2. Real failures are inputs, not just bugs

3. Simulator is part of the protocol, not a side tool

4. Passing tests are not enough

5. Keep hot-path and recovery-path reasoning separate

Development Ladder

Required Artifacts

Design

Prototype

Implementation

Review

Two-Gate Rule

Gate 1: implementation

Gate 2: protocol/design

Layering Rule

distsim

eventsim

Test Selection Rule

Version Comparison Rule

Documentation Rule

learn/projects/sw-block/

sw-block/design/

sw-block/.private/phase/

What Good Progress Looks Like

Bottom Line

5.2 KiB

Raw Blame History

`distsim`

`eventsim`

`learn/projects/sw-block/`

`sw-block/design/`

`sw-block/.private/phase/`