Flusher now holds WAL entries needed by recoverable replicas.
Both AdvanceTail (physical space) and checkpointLSN (scan gate)
are gated by the minimum flushed LSN across catch-up-eligible
replicas.
New methods on ShipperGroup:
- MinRecoverableFlushedLSN() (uint64, bool): pure read, returns
min flushed LSN across InSync/Degraded/Disconnected/CatchingUp
replicas with known progress. Excludes NeedsRebuild.
- EvaluateRetentionBudgets(timeout): separate mutation step,
escalates replicas that exceed walRetentionTimeout (5m default)
to NeedsRebuild, releasing their WAL hold.
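A minimal sketch of how the two methods could fit together, not
the committed code. The replica struct, its fields, and the
lowercase function names are hypothetical, and it assumes the
retention budget is measured from the last contact time.

```go
package main

import (
	"fmt"
	"time"
)

// ReplicaState is a hypothetical enum mirroring the states named above.
type ReplicaState int

const (
	InSync ReplicaState = iota
	Degraded
	Disconnected
	CatchingUp
	NeedsRebuild
)

// replica is a hypothetical stand-in for per-replica shipper state.
type replica struct {
	state       ReplicaState
	flushedLSN  uint64
	hasProgress bool      // false until a flushed LSN has been reported
	lastContact time.Time // assumption: the budget is measured from here
}

// minRecoverableFlushedLSN is a pure read. It returns the minimum
// flushed LSN across replicas that can still catch up from the WAL,
// and false when no such replica has known progress.
func minRecoverableFlushedLSN(rs []*replica) (uint64, bool) {
	var floor uint64
	found := false
	for _, r := range rs {
		if r.state == NeedsRebuild || !r.hasProgress {
			continue
		}
		if !found || r.flushedLSN < floor {
			floor, found = r.flushedLSN, true
		}
	}
	return floor, found
}

// evaluateRetentionBudgets is the separate mutation step. It marks
// replicas NeedsRebuild once they have been out of contact for longer
// than timeout, which removes them from the floor computation.
func evaluateRetentionBudgets(rs []*replica, now time.Time, timeout time.Duration) {
	for _, r := range rs {
		if r.state != NeedsRebuild && now.Sub(r.lastContact) > timeout {
			r.state = NeedsRebuild
		}
	}
}

func main() {
	now := time.Now()
	rs := []*replica{
		{state: InSync, flushedLSN: 120, hasProgress: true, lastContact: now},
		{state: Disconnected, flushedLSN: 40, hasProgress: true,
			lastContact: now.Add(-10 * time.Minute)},
	}
	fmt.Println(minRecoverableFlushedLSN(rs)) // 40 true
	evaluateRetentionBudgets(rs, now, 5*time.Minute)
	fmt.Println(minRecoverableFlushedLSN(rs)) // 120 true
}
```

Keeping the read free of side effects means it can be called from
any path without changing replica state; only the budget step
escalates.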
Flusher integration: evaluates budgets then queries floor on each
flush cycle. If floor < maxLSN, holds both checkpoint and tail.
Extent writes proceed normally (reads work); only WAL reclaim
is deferred.
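The per-cycle ordering can be sketched as follows. This is an
illustration, not the flusher code: retentionSource, reclaimLimit,
and fakeGroup are hypothetical names, the timeout parameter type
is assumed to be time.Duration, and "hold" is assumed to mean that
neither checkpointLSN nor the tail advances past the floor.

```go
package main

import (
	"fmt"
	"time"
)

// retentionSource is a hypothetical interface covering the two
// ShipperGroup methods the flusher needs.
type retentionSource interface {
	EvaluateRetentionBudgets(timeout time.Duration)
	MinRecoverableFlushedLSN() (uint64, bool)
}

// reclaimLimit runs the retention check for one flush cycle and
// returns the LSN up to which checkpoint and tail may advance.
func reclaimLimit(g retentionSource, maxLSN uint64, timeout time.Duration) uint64 {
	// Mutation first: expired replicas stop holding the WAL.
	g.EvaluateRetentionBudgets(timeout)
	floor, ok := g.MinRecoverableFlushedLSN()
	if ok && floor < maxLSN {
		return floor // a recoverable replica still needs entries above floor
	}
	return maxLSN // no hold: reclaim everything that was flushed
}

// fakeGroup is a test double standing in for ShipperGroup.
type fakeGroup struct {
	floor     uint64
	ok        bool
	evaluated bool
}

func (f *fakeGroup) EvaluateRetentionBudgets(time.Duration)   { f.evaluated = true }
func (f *fakeGroup) MinRecoverableFlushedLSN() (uint64, bool) { return f.floor, f.ok }

func main() {
	held := &fakeGroup{floor: 40, ok: true}
	fmt.Println(reclaimLimit(held, 100, 5*time.Minute))         // 40
	fmt.Println(reclaimLimit(&fakeGroup{}, 100, 5*time.Minute)) // 100
}
```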
LastContactTime on WALShipper: updated on barrier success,
handshake success, and catch-up completion. Not on Ship (TCP
write only). Avoids misclassifying idle-but-healthy replicas.
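One way to picture the update rule, as a hedged sketch only. The
shipper type, its method names, and the atomic nanosecond field
are hypothetical and are not taken from WALShipper itself.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// shipper is a hypothetical stand-in for WALShipper.
type shipper struct {
	lastContactNanos atomic.Int64 // Unix nanoseconds; 0 means never
}

// touch records that the replica has just been heard from.
func (s *shipper) touch() { s.lastContactNanos.Store(time.Now().UnixNano()) }

// These three events update the contact time.
func (s *shipper) onBarrierSuccess()   { s.touch() }
func (s *shipper) onHandshakeSuccess() { s.touch() }
func (s *shipper) onCatchUpComplete()  { s.touch() }

// onShip deliberately does not touch: the source describes Ship
// as a TCP write only.
func (s *shipper) onShip() {}

// LastContactTime returns the zero time if no contact was recorded.
func (s *shipper) LastContactTime() time.Time {
	n := s.lastContactNanos.Load()
	if n == 0 {
		return time.Time{}
	}
	return time.Unix(0, n)
}

func main() {
	var s shipper
	s.onShip()
	fmt.Println(s.LastContactTime().IsZero()) // true
	s.onBarrierSuccess()
	fmt.Println(s.LastContactTime().IsZero()) // false
}
```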
CP13-6 ships with timeout budget only. walRetentionMaxBytes
is deferred (documented as partial slice).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>