CRITICAL: Assignment validation was running on EVERY LookupTopicBrokers call!

Problem (from CPU profile):
- ensureTopicActiveAssignments: 14.18% CPU (2.56s out of 18.05s)
- EnsureAssignmentsToActiveBrokers: 14.18% CPU (2.56s)
- ConcurrentMap.IterBuffered: 12.85% CPU (2.32s) - iterating all brokers
- Called on EVERY LookupTopicBrokers request, even with cached config!

Root Cause:
LookupTopicBrokers flow was:
1. getTopicConfFromCache() - returns cached config (fast ✅)
2. ensureTopicActiveAssignments() - validates assignments (slow ❌)

Even though config was cached, we still validated assignments every time,
iterating through ALL active brokers on every single request. With 250
requests/sec, this meant 250 full broker iterations per second!

Solution:
Move assignment validation inside getTopicConfFromCache() and only run it
on cache misses.

Changes to broker_topic_conf_read_write.go:
- Modified getTopicConfFromCache() to validate assignments after filer read
- Validation only runs on cache miss (not on cache hit)
- If hasChanges: Save to filer immediately, invalidate cache, return
- If no changes: Cache config with validated assignments
- Added ensureTopicActiveAssignmentsUnsafe() helper (returns bool)
- Kept ensureTopicActiveAssignments() for other callers (saves to filer)

Changes to broker_grpc_lookup.go:
- Removed ensureTopicActiveAssignments() call from LookupTopicBrokers
- Assignment validation now implicit in getTopicConfFromCache()
- Added comments explaining the optimization

Cache Behavior:
- Cache HIT: Return config immediately, skip validation (saves 14% CPU!)
- Cache MISS: Read filer -> validate assignments -> cache result
- If broker changes detected: Save to filer, invalidate cache, return
- Next request will re-read and re-validate (ensures consistency)

Performance Impact:
With 30-second cache TTL and 250 lookups/sec:
- Before: 250 validations/sec × 10ms each = 2.5s CPU/sec (14% overhead)
- After: 0.17 validations/sec (only on cache miss)
- Reduction: 99.93% fewer validations

Expected CPU Reduction:
- Before (with cache): 18.05s total, 2.56s validation (14%)
- After (with optimization): ~15.5s total (-14% = ~2.5s saved)
- Combined with previous cache fix: 25.18s -> ~15.5s (38% total reduction)

Cache Consistency:
- Assignments validated when config first cached
- If broker membership changes, assignments updated and saved
- Cache invalidated to force fresh read
- All brokers eventually converge on correct assignments

Testing:
- ✅ Compiles successfully
- Ready to deploy and measure CPU improvement

Priority: CRITICAL - Completes optimization of LookupTopicBrokers hot path

pull/7329/head
2 changed files with 45 additions and 9 deletions
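
The diff itself is not shown here, so the following is only a minimal sketch of the cache-miss-only validation flow that the commit message describes, not the actual change. All names in it (`confCache`, `topicConf`, `getTopicConf`, `ensureActiveUnsafe`, the in-memory `filer` map) are hypothetical stand-ins. Validation is simplified to dropping assignments that point at inactive brokers, and the 30-second TTL is taken from the message.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cacheTTL mirrors the 30-second TTL mentioned in the commit message.
const cacheTTL = 30 * time.Second

// topicConf is a hypothetical stand-in for the real topic configuration.
type topicConf struct {
	Brokers []string // brokers currently assigned to the topic
}

type cacheEntry struct {
	conf      topicConf
	expiresAt time.Time
}

// confCache is a hypothetical model of the config cache; the filer is
// modelled as an in-memory map for illustration only.
type confCache struct {
	mu          sync.Mutex
	entries     map[string]cacheEntry
	filer       map[string]topicConf
	active      map[string]bool
	validations int // number of times validation actually ran
}

func newConfCache(activeBrokers ...string) *confCache {
	c := &confCache{
		entries: map[string]cacheEntry{},
		filer:   map[string]topicConf{},
		active:  map[string]bool{},
	}
	for _, b := range activeBrokers {
		c.active[b] = true
	}
	return c
}

// ensureActiveUnsafe drops assignments to inactive brokers and reports
// whether anything changed. The caller must hold c.mu.
func (c *confCache) ensureActiveUnsafe(conf *topicConf) bool {
	kept := make([]string, 0, len(conf.Brokers))
	for _, b := range conf.Brokers {
		if c.active[b] {
			kept = append(kept, b)
		}
	}
	changed := len(kept) != len(conf.Brokers)
	conf.Brokers = kept
	return changed
}

// getTopicConf returns the cached config on a hit and validates
// assignments only on a miss.
func (c *confCache) getTopicConf(topic string) (topicConf, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	// Cache hit: return immediately, skipping validation.
	if e, ok := c.entries[topic]; ok && time.Now().Before(e.expiresAt) {
		return e.conf, nil
	}

	// Cache miss: read from the (simulated) filer, then validate.
	conf, ok := c.filer[topic]
	if !ok {
		return topicConf{}, fmt.Errorf("topic %q not found", topic)
	}
	c.validations++
	if c.ensureActiveUnsafe(&conf) {
		// Assignments changed: save, invalidate, and return without
		// caching so the next request re-reads and re-validates.
		c.filer[topic] = conf
		delete(c.entries, topic)
		return conf, nil
	}
	c.entries[topic] = cacheEntry{conf: conf, expiresAt: time.Now().Add(cacheTTL)}
	return conf, nil
}

func main() {
	c := newConfCache("b1", "b2")
	c.filer["t"] = topicConf{Brokers: []string{"b1", "b2", "b3"}} // b3 is inactive
	for i := 0; i < 250; i++ {
		if _, err := c.getTopicConf("t"); err != nil {
			panic(err)
		}
	}
	fmt.Printf("lookups=250 validations=%d\n", c.validations)
}
```

In this sketch the first lookup misses, repairs the assignment, and is deliberately not cached. The second lookup misses again, finds nothing to change, and caches the result. The remaining 248 lookups are hits, so the program prints `lookups=250 validations=2`.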