
fix: always reset vidMap cache on master reconnection

The previous refactoring removed the else block that resets vidMap when
the first message from a newly connected master is not a VolumeLocation.

Problem scenario:
  1. Client connects to master-1 and builds vidMap cache
  2. Master-1 fails, client connects to master-2
  3. First message from master-2 is a ClusterNodeUpdate (not VolumeLocation)
  4. Old code: vidMap is reset (the else branch runs)
  5. New code: vidMap is NOT reset (the else branch is missing)
  6. Result: Client uses stale cache from master-1 → data access errors

Example flow with bug:
  Connect to master-2
  First message: ClusterNodeUpdate {filer.x added}
  → No resetVidMap() call
  → vidMap still has master-1's stale volume locations
  → Client reads from wrong volume servers → 404 errors
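
The flow above can be reproduced with a small toy model. This is only a sketch: the `message` and `client` types and the `onFirstMessageBuggy` helper are hypothetical stand-ins, not the real `wdclient` types, and the volume id and server address are made up for illustration.

```go
package main

import "fmt"

// message is a hypothetical stand-in for the master's streamed response.
// Exactly one of the two fields is set.
type message struct {
	VolumeLocation    map[uint32]string // volume id -> volume server address
	ClusterNodeUpdate string
}

// client is a toy model of a master client that caches volume locations.
type client struct {
	vidMap map[uint32]string
}

func (c *client) resetVidMap() { c.vidMap = make(map[uint32]string) }

func (c *client) updateVidMap(locs map[uint32]string) {
	for vid, addr := range locs {
		c.vidMap[vid] = addr
	}
}

// onFirstMessageBuggy models the regressed behavior: the cache is reset
// only when the first message carries volume locations.
func (c *client) onFirstMessageBuggy(first message) {
	if first.VolumeLocation != nil {
		c.resetVidMap()
		c.updateVidMap(first.VolumeLocation)
	}
	// No else branch: any other first message leaves the old cache in place.
}

// staleAfterFailover connects to master-1, fails over to master-2, and
// reports whether the entry learned from master-1 is still cached.
func staleAfterFailover() (string, bool) {
	c := &client{}
	// Connection to master-1: first message is a VolumeLocation.
	c.onFirstMessageBuggy(message{VolumeLocation: map[uint32]string{7: "volume-a:8080"}})
	// Failover to master-2: first message is a ClusterNodeUpdate.
	c.onFirstMessageBuggy(message{ClusterNodeUpdate: "filer.x added"})
	addr, ok := c.vidMap[7]
	return addr, ok
}

func main() {
	addr, ok := staleAfterFailover()
	fmt.Println(addr, ok) // prints "volume-a:8080 true": the stale entry survived
}
```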

Fix:
  Restored the else block that resets vidMap when first message is not
  a VolumeLocation:

    if resp.VolumeLocation != nil {
      // ... check leader, reset, and update ...
    } else {
      // First message is ClusterNodeUpdate or other type
      // Must still reset to avoid stale data
      mc.resetVidMap()
    }

This ensures the cache is always cleared when establishing a new master
connection, regardless of what the first message type is.
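
As a rough illustration of the restored branch, the sketch below models the fixed handler in isolation. The `response` and `masterClient` types here are simplified, hypothetical stand-ins (the real method also checks the leader, which is omitted), so treat it as a model of the control flow rather than the actual implementation.

```go
package main

import "fmt"

// response is a simplified, hypothetical stand-in for the master's message.
type response struct {
	VolumeLocation    map[uint32]string // volume id -> volume server address
	ClusterNodeUpdate string
}

// masterClient is a toy model holding only the volume-location cache.
type masterClient struct {
	vidMap map[uint32]string
}

func (mc *masterClient) resetVidMap() { mc.vidMap = make(map[uint32]string) }

func (mc *masterClient) updateVidMap(resp response) {
	for vid, addr := range resp.VolumeLocation {
		mc.vidMap[vid] = addr
	}
}

// handleFirstMessage models the fixed behavior: the cache is cleared on
// every new connection, whatever the first message type is.
func (mc *masterClient) handleFirstMessage(resp response) {
	if resp.VolumeLocation != nil {
		mc.resetVidMap()
		mc.updateVidMap(resp)
	} else {
		// First message is a ClusterNodeUpdate or another type:
		// still reset so no entries from the previous master remain.
		mc.resetVidMap()
	}
}

// cacheSizeAfterFailover returns how many entries remain cached after a
// failover whose first message is a ClusterNodeUpdate.
func cacheSizeAfterFailover() int {
	mc := &masterClient{}
	mc.handleFirstMessage(response{VolumeLocation: map[uint32]string{7: "volume-a:8080"}})
	mc.handleFirstMessage(response{ClusterNodeUpdate: "filer.x added"})
	return len(mc.vidMap)
}

func main() {
	fmt.Println(cacheSizeAfterFailover()) // prints 0: nothing from master-1 remains
}
```

In this model an empty cache after failover is the desired outcome; it would be refilled once the new master sends volume locations.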

Root cause:
  During the vidMapClient refactoring, this else block was accidentally
  dropped, making failover behavior fragile and non-deterministic (depends
  on which message type arrives first from the new master).
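
The dependence on message order can be made explicit by comparing both behaviors for both first-message types. The following is a hypothetical model, not code from the repository: `staleEntries`, the `firstMessage` constants, and the sample addresses are all invented for illustration.

```go
package main

import "fmt"

// firstMessage names the type of the first message from the new master.
type firstMessage int

const (
	volumeLocation firstMessage = iota
	clusterNodeUpdate
)

// staleEntries reports how many entries learned from the previous master
// are still cached after reconnecting. alwaysReset selects the fixed
// behavior (reset on every connection) over the regressed one.
func staleEntries(first firstMessage, alwaysReset bool) int {
	// Cache as it stood while connected to the previous master.
	cache := map[uint32]string{7: "volume-a:8080"}

	switch {
	case first == volumeLocation:
		// Both behaviors reset and then apply the new locations.
		cache = map[uint32]string{9: "volume-b:8080"}
	case alwaysReset:
		// Fixed behavior: reset even without volume locations.
		cache = map[uint32]string{}
	}
	// Regressed behavior with a non-VolumeLocation first message
	// falls through and keeps the old cache.

	if _, stale := cache[7]; stale {
		return 1
	}
	return 0
}

func main() {
	for _, first := range []firstMessage{volumeLocation, clusterNodeUpdate} {
		fmt.Println(first, staleEntries(first, false), staleEntries(first, true))
	}
}
```

Only the combination of the regressed behavior with a `clusterNodeUpdate` first message leaves a stale entry, which is why the bug appears intermittently.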

Impact:
  - High severity for master failover scenarios
  - Could cause read failures, 404s, or wrong data access
  - Only manifests when first message is not VolumeLocation
pull/7518/head
chrislu 2 weeks ago
commit 7b264afdb4
weed/wdclient/masterclient.go (6 changed lines)

@@ -201,6 +201,10 @@ func (mc *MasterClient) tryConnectToMaster(ctx context.Context, master pb.Server
         }
         mc.resetVidMap()
         mc.updateVidMap(resp)
+    } else {
+        // First message from master is not VolumeLocation (e.g., ClusterNodeUpdate)
+        // Still need to reset cache to ensure we don't use stale data from previous master
+        mc.resetVidMap()
     }
     mc.setCurrentMaster(master)
@@ -324,6 +328,7 @@ func (mc *MasterClient) setCurrentMaster(master pb.ServerAddress) {
// background goroutine, this will block indefinitely (or until ctx is canceled).
//
// Typical initialization pattern:
//
// mc := wdclient.NewMasterClient(...)
// go mc.KeepConnectedToMaster(ctx) // Start connection management
// // ... later ...
@@ -404,4 +409,3 @@ func (mc *MasterClient) FindLeaderFromOtherPeers(myMasterAddress pb.ServerAddres
glog.V(0).Infof("No existing leader found!")
return
}