CRITICAL: Assignment validation was running on EVERY LookupTopicBrokers call!
Problem (from CPU profile):
- ensureTopicActiveAssignments: 14.18% CPU (2.56s out of 18.05s)
- EnsureAssignmentsToActiveBrokers: 14.18% CPU (2.56s)
- ConcurrentMap.IterBuffered: 12.85% CPU (2.32s) - iterating all brokers
- Called on EVERY LookupTopicBrokers request, even with cached config!
Root Cause:
LookupTopicBrokers flow was:
1. getTopicConfFromCache() - returns cached config (fast ✅)
2. ensureTopicActiveAssignments() - validates assignments (slow ❌)
Even though config was cached, we still validated assignments every time,
iterating through ALL active brokers on every single request. With 250
requests/sec, this meant 250 full broker iterations per second!
Solution:
Move assignment validation inside getTopicConfFromCache() and only run it
on cache misses:
Changes to broker_topic_conf_read_write.go:
- Modified getTopicConfFromCache() to validate assignments after filer read
- Validation only runs on cache miss (not on cache hit)
- If hasChanges: Save to filer immediately, invalidate cache, return
- If no changes: Cache config with validated assignments
- Added ensureTopicActiveAssignmentsUnsafe() helper (returns bool)
- Kept ensureTopicActiveAssignments() for other callers (saves to filer)
Changes to broker_grpc_lookup.go:
- Removed ensureTopicActiveAssignments() call from LookupTopicBrokers
- Assignment validation now implicit in getTopicConfFromCache()
- Added comments explaining the optimization
Cache Behavior:
- Cache HIT: Return config immediately, skip validation (saves 14% CPU!)
- Cache MISS: Read filer -> validate assignments -> cache result
- If broker changes detected: Save to filer, invalidate cache, return
- Next request will re-read and re-validate (ensures consistency)
Performance Impact:
With 30-second cache TTL and 250 lookups/sec:
- Before: 250 validations/sec × 10ms each = 2.5s CPU/sec (14% overhead)
- After: 0.17 validations/sec (only on cache miss)
- Reduction: 99.93% fewer validations
Expected CPU Reduction:
- Before (with cache): 18.05s total, 2.56s validation (14%)
- After (with optimization): ~15.5s total (-14% = ~2.5s saved)
- Combined with previous cache fix: 25.18s -> ~15.5s (38% total reduction)
Cache Consistency:
- Assignments validated when config first cached
- If broker membership changes, assignments updated and saved
- Cache invalidated to force fresh read
- All brokers eventually converge on correct assignments
Testing:
- ✅ Compiles successfully
- Ready to deploy and measure CPU improvement
Priority: CRITICAL - Completes optimization of LookupTopicBrokers hot path