* feat: add S3 bucket size and object count metrics
Adds periodic collection of bucket size metrics:
- SeaweedFS_s3_bucket_size_bytes: logical size (deduplicated across replicas)
- SeaweedFS_s3_bucket_physical_size_bytes: physical size (including replicas)
- SeaweedFS_s3_bucket_object_count: object count (deduplicated)
Collection runs every minute via a background goroutine that queries the
filer Statistics RPC for each bucket's collection.
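A minimal sketch of this loop, with hypothetical helpers (`listBucketNames`, `collectBucketSize`) standing in for the real filer RPC calls; only the gauge name mirrors the metrics listed above:

```go
package main

import (
	"log"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// Expands to SeaweedFS_s3_bucket_size_bytes, matching the metric above.
var bucketSizeBytes = prometheus.NewGaugeVec(prometheus.GaugeOpts{
	Namespace: "SeaweedFS",
	Subsystem: "s3",
	Name:      "bucket_size_bytes",
	Help:      "Logical size of each bucket in bytes.",
}, []string{"bucket"})

func init() { prometheus.MustRegister(bucketSizeBytes) }

// Stand-ins for the real lookups (filer ListEntries / Statistics RPCs).
func listBucketNames() ([]string, error)            { return []string{"demo"}, nil }
func collectBucketSize(name string) (uint64, error) { return 0, nil }

// The background goroutine: refresh the gauges once per minute.
func collectBucketSizeMetricsLoop() {
	for {
		buckets, err := listBucketNames()
		if err != nil {
			log.Printf("list buckets: %v", err)
		}
		for _, name := range buckets {
			if size, err := collectBucketSize(name); err == nil {
				bucketSizeBytes.WithLabelValues(name).Set(float64(size))
			}
		}
		time.Sleep(time.Minute)
	}
}

func main() {
	go collectBucketSizeMetricsLoop()
	select {} // block forever; real code runs this inside the S3 server
}
```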
Also adds Grafana dashboard panels for:
- S3 Bucket Size (logical vs physical)
- S3 Bucket Object Count
* address PR comments: fix bucket size metrics collection
1. Fix collectCollectionInfoFromMaster to use master VolumeList API
- Now properly queries master for topology info
- Uses WithMasterClient to get volume list from master
- Correctly calculates logical vs physical size based on replication
2. Return error when filerClient is nil to trigger fallback
- Changed from 'return nil, nil' to returning an error
- Ensures fallback to filer stats is properly triggered
3. Implement pagination in listBucketNames
- Added listBucketPageSize constant (1000)
- Uses StartFromFileName for pagination
- Continues fetching until fewer entries than the limit are returned
4. Handle NewReplicaPlacementFromByte error and prevent division by zero
- Check error return from NewReplicaPlacementFromByte
- Default to 1 copy if error occurs
- Add explicit check for copyCount == 0
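A sketch of the item-4 guard; `NewReplicaPlacementFromByte` and `GetCopyCount` are the SeaweedFS identifiers the commit names, while the surrounding function shape is assumed:

```go
package metrics

import (
	"log"

	"github.com/seaweedfs/seaweedfs/weed/storage/super_block"
)

// logicalSize derives a deduplicated size from a physical total by
// dividing by the replica count, defaulting to 1 copy on a parse error
// so the division can never hit zero.
func logicalSize(physical uint64, replicaPlacementByte byte) uint64 {
	copyCount := 1
	if rp, err := super_block.NewReplicaPlacementFromByte(replicaPlacementByte); err != nil {
		log.Printf("parse replica placement %d: %v", replicaPlacementByte, err)
	} else if c := rp.GetCopyCount(); c > 0 {
		copyCount = c
	}
	return physical / uint64(copyCount)
}
```

(A later commit in this PR replaces the division with volume-ID deduplication.)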
* simplify bucket size metrics: remove filer fallback, align with quota enforcement
- Remove fallback to filer Statistics RPC
- Use only master topology for collection info (same as s3.bucket.quota.enforce)
- Updated comments to clarify this runs the same collection logic as quota enforcement
- Simplified code by removing collectBucketSizeFromFilerStats
* use s3a.option.Masters directly instead of querying filer
* address PR comments: fix dashboard overlaps and improve metrics collection
Grafana dashboard fixes:
- Fix overlapping panels 55 and 59 in grafana_seaweedfs.json (moved 59 to y=30)
- Fix grid collision in k8s dashboard (moved panel 72 to y=48)
- Aggregate bucket metrics with max() by (bucket) for multi-instance S3 gateways
Go code improvements:
- Add graceful shutdown support via context cancellation
- Use ticker instead of time.Sleep for better shutdown responsiveness
- Distinguish EOF from actual errors in stream handling
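The loop shape after these changes, as a sketch; the `collect` and `recv` callbacks stand in for the real metric collection and the generated gRPC stream:

```go
package metrics

import (
	"context"
	"errors"
	"io"
	"log"
	"time"
)

// A ticker keeps the interval steady, and ctx.Done() lets shutdown
// interrupt the wait immediately instead of sleeping through it.
func runCollection(ctx context.Context, collect func(context.Context) error) {
	ticker := time.NewTicker(time.Minute)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return // graceful shutdown
		case <-ticker.C:
			if err := collect(ctx); err != nil {
				log.Printf("collect bucket metrics: %v", err)
			}
		}
	}
}

// drainStream shows the EOF distinction: io.EOF is the normal end of a
// server stream, anything else is a real error worth propagating.
func drainStream(recv func() (any, error), handle func(any)) error {
	for {
		msg, err := recv()
		if errors.Is(err, io.EOF) {
			return nil // normal end of stream
		}
		if err != nil {
			return err // actual error: propagate, don't just log
		}
		handle(msg)
	}
}
```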
* improve bucket size metrics: multi-master failover and proper error handling
- Initial delay now respects context cancellation using select with time.After
- Use WithOneOfGrpcMasterClients for multi-master failover instead of hardcoding Masters[0]
- Properly propagate stream errors instead of just logging them (EOF vs real errors)
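Two sketches of these points; `withOneOfMasters` is an illustrative reduction of what `pb.WithOneOfGrpcMasterClients` does, not its real signature:

```go
package metrics

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Initial delay that honors cancellation: waiting via select means a
// shutdown during the warm-up period returns immediately.
func waitInitialDelay(ctx context.Context, delay time.Duration) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-time.After(delay):
		return nil
	}
}

// The failover idea: try each master in turn until one call succeeds,
// instead of pinning everything to Masters[0].
func withOneOfMasters(masters []string, call func(addr string) error) error {
	if len(masters) == 0 {
		return errors.New("no masters configured")
	}
	var lastErr error
	for _, addr := range masters {
		if lastErr = call(addr); lastErr == nil {
			return nil
		}
	}
	return fmt.Errorf("all masters failed: %w", lastErr)
}
```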
* improve bucket size metrics: distributed lock and volume ID deduplication
- Add distributed lock (LiveLock) so only one S3 instance collects metrics at a time
- Add IsLocked() method to LiveLock for checking lock status
- Fix deduplication: use volume ID tracking instead of dividing by copyCount
- Previous approach gave wrong results if replicas were missing
- Now tracks seen volume IDs and counts each volume only once
- Physical size still includes all replicas for accurate disk usage reporting
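A sketch of the dedup logic under an assumed `VolumeStat` shape (the real data comes from the master's VolumeList response):

```go
package metrics

// VolumeStat is a hypothetical stand-in for one volume replica as
// reported by the master topology.
type VolumeStat struct {
	ID    uint32
	Size  uint64
	Files uint64
}

// aggregate counts each volume ID once for logical size and object
// count, while physical size sums every replica. Unlike dividing by
// copyCount, this stays correct when some replicas are missing.
func aggregate(volumes []VolumeStat) (logical, physical, objects uint64) {
	seen := make(map[uint32]bool)
	for _, v := range volumes {
		physical += v.Size // every replica contributes to disk usage
		if seen[v.ID] {
			continue // replica of a volume already counted
		}
		seen[v.ID] = true
		logical += v.Size
		objects += v.Files
	}
	return logical, physical, objects
}
```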
* rename lock to s3.leader
* simplify: remove StartBucketSizeMetricsCollection wrapper function
* fix data race: use atomic operations for LiveLock.isLocked field
- Change isLocked from bool to int32
- Use atomic.LoadInt32/StoreInt32 for all reads/writes
- Sync shared isLocked field in StartLongLivedLock goroutine
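The shape of the fix, sketched outside the real package; the struct here is illustrative, only the field type and accessor follow the commit:

```go
package cluster

import "sync/atomic"

// isLocked is an int32 touched only through atomic loads and stores, so
// the lock-holder goroutine and IsLocked callers never race on a plain
// bool.
type LiveLock struct {
	isLocked int32 // 1 = held, 0 = not held
}

func (l *LiveLock) setLocked(held bool) {
	var v int32
	if held {
		v = 1
	}
	atomic.StoreInt32(&l.isLocked, v)
}

// IsLocked is the accessor added in an earlier commit, now race-free.
func (l *LiveLock) IsLocked() bool {
	return atomic.LoadInt32(&l.isLocked) == 1
}
```

(On Go 1.19+ an `atomic.Bool` would express the same thing; the commit uses int32 with Load/Store.)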
* add nil check for topology info to prevent panic
* fix bucket metrics: use Ticker for consistent intervals, fix pagination logic
- Use time.Ticker instead of time.After for consistent interval execution
- Fix pagination: count all entries (not just directories) for proper termination
- Update lastFileName for all entries to prevent pagination issues
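A sketch of the corrected pagination, with `listPage` standing in for the filer ListEntries RPC:

```go
package metrics

import "context"

const listBucketPageSize = 1000

type entry struct {
	Name  string
	IsDir bool
}

// listBucketNames pages through the buckets directory. Every returned
// entry (not just directories) counts toward the page total and advances
// lastFileName, so the loop terminates exactly when a short page comes
// back.
func listBucketNames(ctx context.Context,
	listPage func(ctx context.Context, startFrom string, limit int) ([]entry, error),
) ([]string, error) {
	var buckets []string
	lastFileName := ""
	for {
		entries, err := listPage(ctx, lastFileName, listBucketPageSize)
		if err != nil {
			return nil, err
		}
		for _, e := range entries {
			lastFileName = e.Name // advance for every entry, not just directories
			if e.IsDir {
				buckets = append(buckets, e.Name)
			}
		}
		if len(entries) < listBucketPageSize {
			return buckets, nil // short page: no more entries
		}
	}
}
```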
* address PR comments: remove redundant atomic store, propagate context
- Remove redundant atomic.StoreInt32 in StartLongLivedLock (AttemptToLock already sets it)
- Propagate context through metrics collection for proper cancellation on shutdown
- collectAndUpdateBucketSizeMetrics now accepts ctx
- collectCollectionInfoFromMaster uses ctx for VolumeList RPC
- listBucketNames uses ctx for ListEntries RPC
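Reduced to its shape, the ctx threading looks like this; the function names follow the commit, the bodies are stand-ins:

```go
package metrics

import "context"

func collectAndUpdateBucketSizeMetrics(ctx context.Context) error {
	buckets, err := listBucketNames(ctx) // ListEntries RPC sees ctx
	if err != nil {
		return err
	}
	return collectCollectionInfoFromMaster(ctx, buckets) // VolumeList RPC sees ctx
}

// Stand-ins for the real RPC-backed helpers.
func listBucketNames(ctx context.Context) ([]string, error)                 { return nil, nil }
func collectCollectionInfoFromMaster(ctx context.Context, _ []string) error { return nil }
```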
* metrics: add Prometheus metrics for concurrent upload tracking
Add Prometheus metrics to monitor concurrent upload activity for both
filer and S3 servers. This provides visibility into the upload limiting
feature added in the previous PR.
New Metrics:
- SeaweedFS_filer_in_flight_upload_bytes: Current bytes being uploaded to filer
- SeaweedFS_filer_in_flight_upload_count: Current number of uploads to filer
- SeaweedFS_s3_in_flight_upload_bytes: Current bytes being uploaded to S3
- SeaweedFS_s3_in_flight_upload_count: Current number of uploads to S3
The metrics are updated atomically whenever uploads start or complete,
providing real-time visibility into upload concurrency levels.
This helps operators:
- Monitor upload concurrency in real-time
- Set appropriate limits based on actual usage patterns
- Detect potential bottlenecks or capacity issues
- Track the effectiveness of upload limiting configuration
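A sketch of the update pattern, assuming a wrapper like `trackUpload` at the call sites; the real code updates the counters directly in the filer and S3 upload handlers:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// Gauges expand to SeaweedFS_s3_in_flight_upload_bytes / _count,
// matching the metric names above.
var (
	inFlightUploadBytes = prometheus.NewGauge(prometheus.GaugeOpts{
		Namespace: "SeaweedFS", Subsystem: "s3",
		Name: "in_flight_upload_bytes",
		Help: "Current bytes being uploaded.",
	})
	inFlightUploadCount = prometheus.NewGauge(prometheus.GaugeOpts{
		Namespace: "SeaweedFS", Subsystem: "s3",
		Name: "in_flight_upload_count",
		Help: "Current number of uploads.",
	})
)

func init() { prometheus.MustRegister(inFlightUploadBytes, inFlightUploadCount) }

// trackUpload wraps an upload of n bytes; the deferred Sub/Dec runs on
// every return path, so the gauges cannot leak on error.
func trackUpload(n int64, upload func() error) error {
	inFlightUploadBytes.Add(float64(n))
	inFlightUploadCount.Inc()
	defer func() {
		inFlightUploadBytes.Sub(float64(n))
		inFlightUploadCount.Dec()
	}()
	return upload()
}
```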
* grafana: add dashboard panels for concurrent upload metrics
Add 4 new panels to the Grafana dashboard to visualize the concurrent
upload metrics added in this PR:
Filer Section:
- Filer Concurrent Uploads: Shows current number of concurrent uploads
- Filer Concurrent Upload Bytes: Shows current bytes being uploaded
S3 Gateway Section:
- S3 Concurrent Uploads: Shows current number of concurrent uploads
- S3 Concurrent Upload Bytes: Shows current bytes being uploaded
These panels help operators monitor upload concurrency in real-time and
tune the upload limiting configuration based on actual usage patterns.
* more efficient
fix deadlock when broadcasting to clients
When the master transfers leadership, the old master disconnects from all
filers and volume servers. In a large cluster, the broadcast messages can
outnumber the channel capacity of 100; if KeepConnect is no longer
listening on the channel during the disconnect, the send blocks forever.
The broadcast deadlocks and the whole cluster stops serving!
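The failure mode, reduced to a sketch; the drop-on-full policy shown here is one way a broadcaster can avoid blocking, not necessarily the exact fix this commit ships:

```go
package cluster

// Each connected client gets a buffered update channel; 100 matches the
// capacity described above.
const broadcastBuffer = 100

type Update struct{ Node string }

func newClientChan() chan *Update { return make(chan *Update, broadcastBuffer) }

// broadcast must never block: if a client's KeepConnect loop has already
// stopped receiving during a disconnect, a plain `ch <- msg` wedges once
// the buffer fills, stalling the whole broadcast loop. A select with
// default keeps the broadcaster moving.
func broadcast(clients []chan *Update, msg *Update) {
	for _, ch := range clients {
		select {
		case ch <- msg:
			// delivered (or buffered)
		default:
			// buffer full and nobody listening: skip this client instead
			// of blocking every other client behind it.
		}
	}
}
```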