# Protocol Development Process Date: 2026-03-27 ## Purpose This document defines how `sw-block` protocol work should be developed. The process is meant to work for: - V2 - future V3 - or a later block algorithm that is not WAL-based The point is to make protocol work systematic rather than reactive. ## Core Philosophy ### 1. Design before implementation Do not start with production code and hope the protocol becomes clear later. Start with: 1. system contract 2. invariants 3. state model 4. scenario backlog Only then move to implementation. ### 2. Real failures are inputs, not just bugs When V1 or V1.5 fails in real testing, treat that as: - a design requirement - a scenario source - a simulator input Do not patch and forget. ### 3. Simulator is part of the protocol, not a side tool The simulator exists to answer: - what should happen - what must never happen - which old designs fail - why the new design is better It is not a replacement for real testing. It is the design-validation layer before production implementation. ### 4. Passing tests are not enough Green tests are necessary, not sufficient. We also require: - explicit invariants - explicit scenario intent - clear state transitions - review of assumptions and abstraction boundaries ### 5. Keep hot-path and recovery-path reasoning separate Healthy steady-state behavior and degraded recovery behavior are different problems. Both must be designed explicitly. ## Development Ladder Every major protocol feature should move through these steps: 1. **Problem statement** - what real bug, limit, or product goal is driving the work 2. **Contract** - what the protocol guarantees - what it does not guarantee 3. **State model** - node state - coordinator state - recovery state - role / epoch / lineage rules 4. **Scenario backlog** - named scenarios - source: - real failure - design obligation - adversarial distributed case 5. **Prototype / simulator** - reduced but explicit model - invariant checks - V1 / V1.5 / V2 comparison where relevant 6. **Implementation** - production code only after the protocol shape is clear enough 7. **Real validation** - unit - component - integration - real hardware where needed 8. **Feedback loop** - turn new failures back into scenario/design inputs ## Required Artifacts For protocol work to be considered real progress, we usually want: ### Design - design doc - scenario doc - comparison doc when replacing an older approach ### Prototype - simulator or prototype code - tests that assert protocol behavior ### Implementation - production patch - production tests - docs updated to match the actual algorithm ### Review - implementation gate - design/protocol gate ## Two-Gate Rule We use two acceptance gates. ### Gate 1: implementation Owned by the coding side. Questions: - does it build? - do tests pass? - does it behave as intended in code? ### Gate 2: protocol/design Owned by the design/review side. Questions: - is the logic actually sound? - do tests prove the intended thing? - are assumptions explicit? - is the abstraction boundary honest? A task is not accepted until both gates pass. ## Layering Rule Keep simulation layers separate. ### `distsim` Use for: - protocol correctness - state transitions - fencing - recoverability - promotion / lineage - reference-state checking ### `eventsim` Use for: - timeout behavior - timer races - event ordering - same-tick / delayed event interactions Do not duplicate scenarios blindly across both layers. ## Test Selection Rule Do not choose simulator inputs only from failing tests. Review all relevant tests and classify them by: - protocol significance - simulator value - implementation specificity Good simulator candidates often come from: - barrier truth - catch-up vs rebuild - stale message rejection - failover / promotion safety - changed-address restart - mode semantics Keep real-only tests for: - wire format - OS timing - exact WAL file behavior - frontend transport specifics ## Version Comparison Rule When designing a successor protocol: - keep the old version visible - reproduce the old failure or limitation - show the improved behavior in the new version For `sw-block`, that means: - `V1` - `V1.5` - `V2` should be compared explicitly where possible. ## Documentation Rule The docs must track three different things: ### `learn/projects/sw-block/` Use for: - project history - V1/V1.5 algorithm records - phase records - real test history ### `sw-block/design/` Use for: - active design truth - V2 and later protocol docs - scenario backlog - comparison docs ### `sw-block/.private/phase/` Use for: - active execution plan - log - decisions ## What Good Progress Looks Like A good protocol iteration usually has this pattern: 1. real failure or design pressure identified 2. scenario named and written down 3. simulator reproduces the bad case 4. new protocol handles it explicitly 5. implementation follows 6. real tests validate it If one of those steps is missing, confidence is weaker. ## Bottom Line The process is: 1. design the contract 2. model the state 3. define the scenarios 4. simulate the protocol 5. implement carefully 6. validate in real tests 7. feed failures back into design That is the process we should keep using for V2 and any later protocol line.