fix(test): address flaky S3 distributed lock integration test (#8888)
* fix(test): address flaky S3 distributed lock integration test

  Two root causes:

  1. Lock ring convergence race: After waitForFilerCount(2) confirms the
     master sees both filers, there's a window where filer0's lock ring
     still only contains itself (master's LockRingUpdate broadcast is
     delayed by the 1s stabilization timer). During this window filer0
     considers itself primary for ALL keys, so both filers can
     independently grant the same lock.

     Fix: Add waitForLockRingConverged() that acquires the same lock
     through both filers and verifies mutual exclusion before proceeding.

  2. Hash function mismatch: ownerForObjectLock used util.HashStringToLong
     (MD5 + modulo) to predict lock owners, but the production DLM uses
     CRC32 consistent hashing via HashRing. This meant the test could pick
     keys that route to the same filer, not exercising the cross-filer
     coordination it intended to test.

     Fix: Use lock_manager.NewHashRing + GetPrimary() to match production
     routing exactly.

* fix(test): verify lock denial reason in convergence check

  Ensure the convergence check only returns true when the second lock
  attempt is denied specifically because the lock is already owned,
  avoiding false positives from transient errors.

* fix(test): check one key per primary filer in convergence wait

  A single arbitrary key can false-pass: if its real primary is the filer
  with the stale ring, mutual exclusion holds trivially because that filer
  IS the correct primary. Generate one test key per distinct primary using
  the same consistent-hash ring as production, so a stale ring on any
  filer is caught deterministically.

pull/8879/merge
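To illustrate the hash mismatch and the "one key per primary" idea described in the commit message, here is a minimal, self-contained sketch. It is not the SeaweedFS code: the `ring` type, `newRing`, `primary`, `md5ModuloOwner`, and `keysPerPrimary` are hypothetical stand-ins for `lock_manager.NewHashRing`, `GetPrimary()`, and `util.HashStringToLong`, and the filer addresses and virtual-node count are made up. The point is only that an MD5-plus-modulo predictor and a CRC32 consistent-hash ring are different functions, so a test has to consult the same ring as production to know which filer owns a key.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
	"hash/crc32"
	"sort"
)

// ring is a toy CRC32 consistent-hash ring (hypothetical, for illustration only).
type ring struct {
	points []uint32          // sorted virtual-node positions
	owner  map[uint32]string // position -> server
}

// newRing places vnodes virtual points per server on the ring.
func newRing(servers []string, vnodes int) *ring {
	r := &ring{owner: make(map[uint32]string)}
	for _, s := range servers {
		for i := 0; i < vnodes; i++ {
			p := crc32.ChecksumIEEE([]byte(fmt.Sprintf("%s#%d", s, i)))
			if _, taken := r.owner[p]; taken {
				continue // skip the rare position collision
			}
			r.owner[p] = s
			r.points = append(r.points, p)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// primary returns the server owning the first point at or after the key's hash,
// wrapping around to the first point. An empty ring yields "".
func (r *ring) primary(key string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := crc32.ChecksumIEEE([]byte(key))
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}

// md5ModuloOwner is an assumed shape for an "MD5 + modulo" owner predictor.
func md5ModuloOwner(key string, servers []string) string {
	sum := md5.Sum([]byte(key))
	return servers[binary.BigEndian.Uint64(sum[:8])%uint64(len(servers))]
}

// keysPerPrimary searches candidate keys until it has one key for each
// distinct primary, or gives up after maxTries. Result maps server -> key.
func keysPerPrimary(r *ring, servers []string, prefix string, maxTries int) map[string]string {
	found := make(map[string]string)
	for i := 0; i < maxTries && len(found) < len(servers); i++ {
		k := fmt.Sprintf("%s-%d", prefix, i)
		if p := r.primary(k); p != "" {
			if _, ok := found[p]; !ok {
				found[p] = k
			}
		}
	}
	return found
}

func main() {
	servers := []string{"filer0:8888", "filer1:8889"} // made-up addresses
	r := newRing(servers, 64)

	for p, k := range keysPerPrimary(r, servers, "convergence-probe", 10000) {
		fmt.Printf("key %q routes to %s\n", k, p)
	}

	// Count how often the two predictors name a different owner.
	disagree := 0
	for i := 0; i < 1000; i++ {
		k := fmt.Sprintf("object-%d", i)
		if md5ModuloOwner(k, servers) != r.primary(k) {
			disagree++
		}
	}
	fmt.Printf("predictors disagree on %d of 1000 keys\n", disagree)
}
```

A convergence check built on this would try each returned key through every filer: a filer with a stale, single-member ring would wrongly grant at least the key whose real primary is the other filer.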
committed by GitHub
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
2 changed files with 133 additions and 15 deletions
127  test/s3/distributed_lock/distributed_lock_cluster_test.go
 21  test/s3/distributed_lock/distributed_lock_test.go