19 Commits (47238df0d71682f47c8ec646bf1bd865a1487f20)
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
47238df0d7 |
fix: add RecoveryOrchestrator as real integrated entry path
New: orchestrator.go — RecoveryOrchestrator drives recovery lifecycle from assignment through execution to completion/escalation:
- ProcessAssignment: reconcile + session creation + auto-log
- ExecuteRecovery: connect → handshake from RetainedHistory → outcome
- CompleteCatchUp: begin catch-up → progress → complete + auto-log
- CompleteRebuild: connect → handshake → history-driven source → transfer → tail replay → complete + auto-log
- InvalidateEpoch: invalidate stale sessions + auto-log
All integration tests rewritten to use orchestrator as entry path. No direct sender API calls in recovery lifecycle.
SessionSnapshot now includes: TruncateRequired/ToLSN/Recorded, RebuildSource, RebuildPhase.
RecoveryLog is auto-populated by orchestrator at every transition.
7 integration tests via orchestrator:
- ChangedAddress, NeedsRebuild→Rebuild, EpochBump, MultiReplica
- Observability: session snapshot, rebuild snapshot, auto-populated log
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2 days ago |
|
|
7436b3b79c |
feat: add integration closure and observability (Phase 05 Slice 4)
New files:
- observe.go: RegistryStatus, SenderStatus, RecoveryLog for debugging
- integration_test.go: V2-boundary integration tests through real engine entry path
Observability:
- Registry.Status() returns full snapshot: per-sender state, session snapshots, counts by category (InSync, Recovering, Rebuilding)
- RecoveryLog: append-only event log for recovery lifecycle debugging
Integration tests (6):
- ChangedAddress_FullFlow: initial recovery → address change → sender preserved → new session → recovery with proof
- NeedsRebuild_ThenRebuildAssignment: catch-up fails → NeedsRebuild → rebuild assignment → history-driven source → InSync
- EpochBump_DuringRecovery: mid-recovery epoch bump → old session rejected → new assignment at new epoch → InSync
- MultiReplica_MixedOutcomes: 3 replicas, 3 outcomes via RetainedHistory proofs, registry status verified
- RegistryStatus_Snapshot: observability snapshot structure
- RecoveryLog: event recording and filtering
Engine module at 54 tests (12 + 18 + 18 + 6).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2 days ago |
|
|
4d06622c01 |
fix: add nil check for RetainedHistory in sender APIs
RecordHandshakeFromHistory and SelectRebuildFromHistory now return an error instead of panicking on nil history input.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2 days ago |
|
|
cc8c529962 |
fix: connect recovery decisions to RetainedHistory, fix rebuild source
RetainedHistory as engine input:
- RecordHandshakeFromHistory: sender-level API consuming RetainedHistory directly, returns RecoverabilityProof alongside outcome
- SelectRebuildFromHistory: sender-level API consuming RetainedHistory for rebuild-source decision
RebuildSourceDecision soundness:
- Now requires BOTH trusted checkpoint AND replayable tail (CheckpointLSN >= TailLSN and CommittedLSN <= HeadLSN)
- Trusted checkpoint with unreplayable tail falls back to full_base
4 new tests:
- TrustedCheckpoint_UnreplayableTail (the regression case)
- SenderDriven_CatchUp (history → proof → outcome → complete)
- SenderDriven_Rebuild_SnapshotTail (history → source → rebuild)
- SenderDriven_Rebuild_FallsBackToFullBase (unreplayable tail)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2 days ago |
|
|
ff7ea41099 |
feat: add engine data/recoverability core (Phase 05 Slice 3)
New file: history.go — RetainedHistory connects recovery decisions to actual WAL retention state:
- IsRecoverable: checks gap against tail/head boundaries
- MakeHandshakeResult: generates HandshakeResult from retention state
- RebuildSourceDecision: chooses snapshot+tail vs full base from checkpoint state (trusted vs untrusted)
- ProveRecoverability: generates explicit proof explaining why recovery is or is not allowed
14 new tests (recoverability_test.go):
- Recoverable/unrecoverable gap (exact boundary, beyond head)
- Trusted/untrusted/no checkpoint → rebuild source selection
- Handshake from retained history → outcome classification
- Recoverability proofs (zero-gap, ahead, within retention, beyond)
- E2E: two replicas driven by retained history (catch-up + rebuild)
- Truncation required for replica ahead of committed
Engine module at 44 tests (12 + 18 + 14).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2 days ago |
|
|
368a956aee |
fix: correct catch-up entry counting and rebuild transfer gate
Entry counting:
- Session.setRange now initializes recoveredTo = startLSN
- RecordCatchUpProgress delta counts only actual catch-up work (recoveredTo - startLSN), not the replica's pre-existing prefix
Rebuild transfer gate:
- BeginTailReplay requires TransferredTo >= SnapshotLSN
- Prevents tail replay on incomplete base transfer
3 new regression tests:
- BudgetEntries_NonZeroStart_CountsOnlyDelta (30 entries within 50 budget)
- BudgetEntries_NonZeroStart_ExceedsBudget (30 entries exceeds 20 budget)
- Rebuild_PartialTransfer_BlocksTailReplay
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
3 days ago |
|
|
930de4ba78 |
feat: add Slice 2 recovery execution tests (Phase 05)
15 new engine-level recovery execution tests:
- Zero-gap / catch-up / needs-rebuild branching (3 tests)
- Stale execution rejection during active recovery (2 tests)
- Bounded catch-up: frozen target, duration, entries, stall (5 tests)
- Completion before convergence rejected
- Rebuild exclusivity: catch-up APIs excluded (1 test)
- Rebuild lifecycle: snapshot+tail, full base, stale ID (3 tests)
- Assignment-driven recovery flow
Engine module now at 27 tests (12 Slice 1 + 15 Slice 2).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
3 days ago |
|
|
61e9408261 |
fix: separate stable ReplicaID from Endpoint in registry
Registry is now keyed by stable ReplicaID, not by address.
DataAddr changes preserve sender identity — the core V2 invariant.
Changes:
- ReplicaAssignment{ReplicaID, Endpoint} replaces map[string]Endpoint
- AssignmentIntent.Replicas uses []ReplicaAssignment
- Registry.Reconcile takes []ReplicaAssignment
- Tests use stable IDs ("replica-1", "r1") independent of addresses
New test: ChangedDataAddr_PreservesSenderIdentity
- Same ReplicaID, different DataAddr (10.0.0.1 → 10.0.0.2)
- Sender pointer preserved, session invalidated, new session attached
- This is the exact V1/V1.5 regression that V2 must fix
doc.go: clarified Slice 1 core vs carried-forward files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
3 days ago |
|
|
bb24b4b039 |
fix: encapsulate engine sender/session authority state
All mutable state on Sender and Session is now unexported:
- Sender.state, .epoch, .endpoint, .session, .stopped → accessors
- Session.id, .phase, .kind, etc. → read-only accessors
- Session() replaced by SessionSnapshot() (returns disconnected copy)
- SessionID() and HasActiveSession() for common queries
- AttachSession returns (sessionID, error) not (*Session, error)
- SupersedeSession returns sessionID not *Session
Budget configuration via SessionOption:
- WithBudget(CatchUpBudget) passed to AttachSession
- No direct field mutation on session from external code
New test: Encapsulation_SnapshotIsReadOnly proves snapshot mutation does not leak back to sender state.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
3 days ago |
|
|
20d70f9fb6 |
feat: add V2 engine replication core (Phase 05 Slice 1)
Creates sw-block/engine/replication/ — the real V2 engine ownership core, promoted from sw-block/prototype/enginev2/ with all accepted invariants.
Files:
- types.go: Endpoint, ReplicaState, SessionKind, SessionPhase, FSM transitions
- sender.go: per-replica Sender with full execution + rebuild APIs
- session.go: Session with identity, phases, frozen target, truncation, budget
- registry.go: Registry with reconcile + assignment intent + epoch invalidation
- budget.go: CatchUpBudget (duration, entries, stall detection)
- rebuild.go: RebuildState FSM (snapshot+tail vs full base)
- outcome.go: HandshakeResult + ClassifyRecoveryOutcome
Tests (ownership_test.go, 13 tests):
- Changed-address invalidation (A10)
- Stale session ID rejected at all APIs (A3)
- Stale completion after supersede (A3)
- Epoch bump invalidates all sessions (A3)
- Stale assignment epoch rejected
- Rebuild exclusivity (catch-up APIs rejected)
- Rebuild full lifecycle
- Frozen target rejects chase (A5)
- Budget violation escalates (A5)
- E2E: 3 replicas, 3 outcomes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
3 days ago |
|
|
26a1b33c2e |
feat: add A5-A8 acceptance traceability and rebuild-source evidence
Cleanup: removed redundant TargetLSNAtStart from CatchUpBudget. FrozenTargetLSN on RecoverySession is the single source of truth.
Acceptance traceability (acceptance_test.go):
- A5: 3 evidence tests (unrecoverable gap, budget escalation, frozen target)
- A6: 2 evidence tests (exact boundary, contiguity required)
- A7: 3 evidence tests (snapshot history, catch-up replay, truncation)
- A8: 2 evidence tests (convergence required, truncation required)
Rebuild-source decision evidence:
- snapshot_tail when trusted base exists
- full_base when no snapshot or untrusted
- 3 explicit tests
13 new tests total.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
3 days ago |
|
|
8f5070679c |
fix: make frozen target intrinsic and rebuild completion exclusive
Frozen target is now unconditional:
- FrozenTargetLSN field on RecoverySession, set by BeginCatchUp
- RecordCatchUpProgress enforces FrozenTargetLSN regardless of Budget
- Catch-up is always a bounded (R, H0] contract
Rebuild completion exclusivity:
- CompleteSessionByID explicitly rejects SessionRebuild by kind
- Rebuild sessions can ONLY complete via CompleteRebuild
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
3 days ago |
|
|
8e4028758f |
fix: make rebuild path exclusive, enforce phase discipline, require tick for stall budget
Rebuild exclusivity:
- BeginCatchUp rejects SessionRebuild ("must use rebuild APIs")
- RecordCatchUpProgress rejects SessionRebuild
- Rebuild sessions can only be completed via CompleteRebuild
- All legacy rebuild-through-catch-up paths in tests converted
Phase discipline:
- SelectRebuildSource requires session.Phase == PhaseHandshake
- Cannot skip BeginConnect + RecordHandshake
Stall budget:
- RecordCatchUpProgress requires tick parameter when
ProgressDeadlineTicks > 0 (no silent stall budget bypass)
3 new tests: rebuild exclusivity (catch-up APIs rejected),
rebuild source requires handshake phase, stall budget requires tick.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
3 days ago |
|
|
5b66a85f92 |
fix: wire rebuild FSM into sender, enforce frozen target, fix entry counting
Rebuild execution path:
- newRecoverySession auto-initializes RebuildState for SessionRebuild
- Sender rebuild APIs: SelectRebuildSource, BeginRebuildTransfer, RecordRebuildTransferProgress, BeginRebuildTailReplay, RecordRebuildTailProgress, CompleteRebuild
- All rebuild APIs are sender-authority-gated by sessionID
- E2E rebuild test now drives through rebuild FSM, not catch-up APIs
Bounded CatchUp enforcement:
- BeginCatchUp freezes TargetLSNAtStart from session.TargetLSN
- RecordCatchUpProgress rejects progress beyond frozen target
- Entry counting uses LSN delta (recoveredTo - previous), not call count
- Merged RecordCatchUpProgressAt into RecordCatchUpProgress (tick param)
5 new tests: target-frozen enforcement, sender-level rebuild via rebuild APIs, reject non-rebuild, reject stale ID on rebuild.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
3 days ago |
|
|
3f0048cbd9 |
feat: add bounded CatchUp budget and Rebuild mode state machine (Phase 4.5 P0)
Bounded CatchUp:
- CatchUpBudget: MaxDurationTicks, MaxEntries, ProgressDeadlineTicks
- BudgetCheck: runtime consumption tracker (StartTick, EntriesReplayed, LastProgressTick)
- Sender.CheckBudget: evaluates budget, escalates to NeedsRebuild on violation
- RecordCatchUpProgressAt: tracks progress tick for stall detection
- BeginCatchUp accepts optional startTick for budget tracking
Rebuild state machine:
- RebuildSource: snapshot_tail (preferred) vs full_base (fallback)
- RebuildPhase: init → source_select → transfer → tail_replay → completed|aborted
- SelectSource: chooses based on snapshot availability
- Phase ordering enforced, transfer regression rejected
- ReadyToComplete validates target reached
13 new tests: budget enforcement (duration, entries, stall, no-budget), sender budget integration, rebuild lifecycle (snapshot+tail, full base, abort, phase order, regression), E2E bounded catch-up → rebuild.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
3 days ago |
|
|
90c39b549d |
feat: add prototype scenario closure (Phase 04 P4)
Maps V2 acceptance criteria A1-A7, A10 to enginev2 prototype evidence. Adds 4 V2-boundary scenarios against the prototype.
Scenario tests:
- A1: committed data survives promotion (WAL truncation boundary)
- A2: uncommitted data truncated, not revived
- A3: stale epoch fenced at sender + session + assignment layers
- A4: short-gap catch-up with WAL-backed proof + data verification
- A5: unrecoverable gap escalates to NeedsRebuild with proof
- A6: recoverability boundary exact (tail +/- 1 LSN)
- A7: historical data correct after tail advancement (snapshot)
- A10: changed-address → invalidation → new assignment → recovery
V2-boundary scenarios:
- NeedsRebuild persists across topology update
- catch-up does not overwrite safe data
- 5 disconnect/reconnect cycles preserve sender identity
- full V2 harness: 3 replicas, 3 outcomes (zero-gap, catch-up, rebuild)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
3 days ago |
|
|
942a0b7da7 |
fix: strengthen IsRecoverable contiguity check and StateAt snapshot correctness
IsRecoverable now verifies three conditions:
- startExclusive >= tailLSN (not recycled)
- endInclusive <= headLSN (within WAL)
- all LSNs in range exist contiguously (no holes)
StateAt now uses base snapshot captured during AdvanceTail:
- returns nil for LSNs before snapshot boundary (unreconstructable)
- correctly includes block state from recycled entries via snapshot
5 new tests: end-beyond-head, missing entries, state after tail advance, nil before snapshot, block last written before tail.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
4 days ago |
|
|
c89709e47e |
feat: add WAL history model and recoverability proof (Phase 04 P3)
Adds minimal historical-data prototype to enginev2:
- WALHistory: retained-prefix model with Append, Commit, AdvanceTail, Truncate, EntriesInRange, IsRecoverable, StateAt
- MakeHandshakeResult connects WAL state to outcome classification
- RecordTruncation execution API for divergent tail cleanup
- CompleteSessionByID gates on truncation when required
- Zero-gap requires exact equality (FlushedLSN == CommittedLSN)
- Replica-ahead classified as CatchUp with mandatory truncation
15 new tests: WAL basics, provable recoverability, unprovable gap, exact boundary, truncation enforcement, WAL-backed end-to-end recovery with data verification.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
4 days ago |
|
|
edec7098e8 |
feat: add V2 protocol simulator and enginev2 sender/session prototype
Adds sw-block/ directory with:
- distsim: protocol correctness simulator (96 tests)
  - cluster model with epoch fencing, barrier semantics, commit modes
  - endpoint identity, control-plane flow, candidate eligibility
  - timeout events, timer races, same-tick ordering
  - session ownership tracking with ID-based stale fencing
- enginev2: standalone V2 sender/session implementation (63 tests)
  - per-replica Sender with identity-preserving reconciliation
  - RecoverySession with FSM phase transitions and session ID
  - execution APIs: BeginConnect, RecordHandshake, BeginCatchUp, RecordCatchUpProgress, CompleteSessionByID — all sender-authority-gated
  - recovery outcome branching: zero-gap, catch-up, needs-rebuild
  - assignment-intent orchestration with epoch fencing
- design docs: acceptance criteria, open questions, first-slice spec, protocol development process
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
4 days ago |
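
The three conditions that commit 942a0b7da7 lists for IsRecoverable can be illustrated with a minimal Go sketch. The code in that commit is not shown on this page, so the `walHistory` type, its fields, and the helper `newWALHistory` below are hypothetical and only mirror the names used in the commit message (startExclusive, endInclusive, tailLSN, headLSN).

```go
package main

import "fmt"

// walHistory is a hypothetical, simplified retained-WAL model.
// Its fields are assumptions for illustration, not the project's types.
type walHistory struct {
	tailLSN uint64          // entries at or below this LSN have been recycled
	headLSN uint64          // newest LSN written to the WAL
	entries map[uint64]bool // LSNs that are actually present
}

// newWALHistory builds a history holding every LSN in (tail, head].
func newWALHistory(tail, head uint64) *walHistory {
	w := &walHistory{tailLSN: tail, headLSN: head, entries: map[uint64]bool{}}
	for lsn := tail + 1; lsn <= head; lsn++ {
		w.entries[lsn] = true
	}
	return w
}

// isRecoverable reports whether the range (startExclusive, endInclusive]
// can be replayed, applying the three checks from the commit message.
func (w *walHistory) isRecoverable(startExclusive, endInclusive uint64) bool {
	if startExclusive < w.tailLSN {
		return false // part of the range was already recycled
	}
	if endInclusive > w.headLSN {
		return false // the range extends past the WAL head
	}
	for lsn := startExclusive + 1; lsn <= endInclusive; lsn++ {
		if !w.entries[lsn] {
			return false // a hole breaks contiguity
		}
	}
	return true
}

func main() {
	w := newWALHistory(10, 20)
	fmt.Println(w.isRecoverable(10, 20)) // whole retained range
	fmt.Println(w.isRecoverable(9, 20))  // starts before the tail
	fmt.Println(w.isRecoverable(10, 21)) // ends beyond the head
	delete(w.entries, 15)
	fmt.Println(w.isRecoverable(10, 20)) // hole at LSN 15
}
```

In this sketch a replica whose gap fails any one of the checks would not be eligible for catch-up, which corresponds to the NeedsRebuild escalation described in the other commit messages.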