seaweedfs

..
jquery.sparkline.min.js	embed static resources via statik	7 years ago

chrislu 0e1afe8943 perf: add topic configuration cache to fix 60% CPU overhead

CRITICAL PERFORMANCE FIX: Added topic configuration caching to eliminate
massive CPU overhead from repeated filer reads and JSON unmarshaling on
EVERY fetch request.

Problem (from CPU profile):
- ReadTopicConfFromFiler: 42.45% CPU (5.76s out of 13.57s)
- protojson.Unmarshal: 25.64% CPU (3.48s)
- GetOrGenerateLocalPartition called on EVERY FetchMessage request
- No caching - reading from filer and unmarshaling JSON every time
- This caused filer, gateway, and broker to be extremely busy

Root Cause:
GetOrGenerateLocalPartition() is called on every FetchMessage request and
was calling ReadTopicConfFromFiler() without any caching. Each call:
1. Makes gRPC call to filer (expensive)
2. Reads JSON from disk (expensive)
3. Unmarshals protobuf JSON (25% of CPU!)

The disk I/O fix (previous commit) made this worse by enabling more reads,
exposing this performance bottleneck.

Solution:
Added topicConfCache similar to existing topicExistsCache:

Changes to broker_server.go:
- Added topicConfCacheEntry struct
- Added topicConfCache map to MessageQueueBroker
- Added topicConfCacheMu RWMutex for thread safety
- Added topicConfCacheTTL (30 seconds)
- Initialize cache in NewMessageBroker()

Changes to broker_topic_conf_read_write.go:
- Modified GetOrGenerateLocalPartition() to check cache first
- Cache HIT: Return cached config immediately (V(4) log)
- Cache MISS: Read from filer, cache result, proceed
- Added invalidateTopicConfCache() for cache invalidation
- Added import "time" for cache TTL

Cache Strategy:
- TTL: 30 seconds (matches topicExistsCache)
- Thread-safe with RWMutex
- Cache key: topic.String() (e.g., "kafka.loadtest-topic-0")
- Invalidation: Call invalidateTopicConfCache() when config changes

Expected Results:
- Before: 60% CPU on filer reads + JSON unmarshaling
- After: <1% CPU (only on cache miss every 30s)
- Filer load: Reduced by ~99% (from every fetch to once per 30s)
- Gateway CPU: Dramatically reduced
- Broker CPU: Dramatically reduced
- Throughput: Should increase significantly

Performance Impact:
With 50 msgs/sec per topic × 5 topics = 250 fetches/sec (assuming one fetch per message):
- Before: 250 filer reads/sec (one per fetch)
- After: 0.17 filer reads/sec (5 topics / 30s TTL)
- Reduction: 99.93% fewer filer calls
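
The figures above can be reproduced with a short calculation. The helper filerReadsPerSec is hypothetical and assumes one fetch per message without the cache and one reload per topic per TTL window with it.

```go
package main

import "fmt"

// filerReadsPerSec returns filer reads per second without and with a TTL cache.
// Assumptions: every message triggers one fetch, and with the cache each topic
// is reloaded once per TTL window.
func filerReadsPerSec(msgsPerSecPerTopic, topics, ttlSeconds float64) (before, after float64) {
	before = msgsPerSecPerTopic * topics
	after = topics / ttlSeconds
	return before, after
}

func main() {
	before, after := filerReadsPerSec(50, 5, 30)
	reduction := (1 - after/before) * 100
	fmt.Printf("before: %.0f reads/sec\n", before)
	fmt.Printf("after: %.2f reads/sec\n", after)
	fmt.Printf("reduction: %.2f%%\n", reduction)
}
```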

Testing:
- ✅ Compiles successfully
- Ready for load test to verify CPU reduction

Priority: CRITICAL - Fixes production-breaking performance issue
Related: Works with previous commit (disk I/O fix) to enable correct and fast reads

2 weeks ago
