seaweedfs

chrislu 3a5b5ea02c improve: add circuit breaker to skip known-unhealthy filers		2 months ago

The previous implementation tried all filers on every failure, including
known-unhealthy ones. This wasted time retrying permanently down filers.

Problem scenario (3 filers, filer0 is down):
- Last successful: filer1 (saved as filerIndex=1)
- Next lookup when filer1 fails:
  Retry 1: filer1(fail) → filer2(fail) → filer0(fail, wastes 5s timeout)
  Retry 2: filer1(fail) → filer2(fail) → filer0(fail, wastes 5s timeout)
  Retry 3: filer1(fail) → filer2(fail) → filer0(fail, wastes 5s timeout)
  Total wasted: 15 seconds on known-bad filer!

Solution: Circuit breaker pattern
- Track consecutive failures per filer (atomic int32)
- Skip filers with 3+ consecutive failures
- Re-check unhealthy filers every 30 seconds
- Reset failure count on success

New behavior:
- filer0 fails 3 times → marked unhealthy
- Future lookups skip filer0 for 30 seconds
- After 30s, re-check filer0 (allows recovery)
- If filer0 succeeds, reset failure count to 0

Benefits:
1. Avoids wasting time on known-down filers
2. Still sticks to last healthy filer (via filerIndex)
3. Allows recovery (30s re-check window)
4. No configuration needed (automatic)

Implementation details:
- filerHealth struct tracks failureCount (atomic) + lastFailureTime
- shouldSkipUnhealthyFiler(): checks if we should skip this filer
- recordFilerSuccess(): resets failure count to 0
- recordFilerFailure(): increments count, updates timestamp
- Logs when skipping unhealthy filers (V(2) level)

Example with circuit breaker:
- filer0 down, saved filerIndex=1 (filer1 healthy)
- Lookup 1: filer1(ok) → Done (0.01s)
- Lookup 2: filer1(fail) → filer2(ok) → Done, save filerIndex=2 (0.01s)
- Lookup 3: filer2(fail) → skip filer0 (unhealthy) → filer1(ok) → Done (0.01s)

Much better than wasting 15s trying filer0 repeatedly!
admin	muted texts	3 months ago
cluster	adds FilerClient to use cached volume id	2 months ago
command	backup: handle volume not found when backing up (#7465)	3 months ago
credential	Filer Store: postgres backend support pgbouncer (#7077)	6 months ago
filer	improve: address remaining code review findings	2 months ago
filer_client	Clean up logs and deprecated functions (#7339)	4 months ago
glog	Add Kafka Gateway (#7231)	4 months ago
iam	S3: Enforce bucket policy (#7471)	3 months ago
iamapi	Clean up logs and deprecated functions (#7339)	4 months ago
images	Migrates from disintegration/imaging c2019 to cognusion/imaging c2024. (#5533)	2 years ago
kms	S3 API: Add integration with KMS providers (#7152)	5 months ago
mount	improve: address remaining code review findings	2 months ago
mq	S3: Directly read write volume servers (#7481)	3 months ago
notification	fix: dead letter message log message (#7072)	6 months ago
operation	S3: Directly read write volume servers (#7481)	3 months ago
pb	S3: Directly read write volume servers (#7481)	3 months ago
query	Fix date string parsing bug for the SQL Engine. (#7446)	3 months ago
remote_storage	Filer: Fixed critical bugs in the Azure SDK migration (PR #7310) (#7401)	3 months ago
replication	Filer: Fixed critical bugs in the Azure SDK migration (PR #7310) (#7401)	3 months ago
s3api	fix: FilerClient supports multiple filer addresses for high availability	2 months ago
security	remove spoof-able request header (#7103)	6 months ago
sequence	remove unused function	2 years ago
server	filer store: add foundationdb (#7178)	3 months ago
sftpd	S3 API: Advanced IAM System (#7160)	5 months ago
shell	Account Info (#7507)	3 months ago
static	Fix Broken Links (#5287)	2 years ago
stats	[volume] refactor and add metrics for flight upload and download data limit condition (#6920)	7 months ago
storage	Volume Server: avoid aggressive volume assignment (#7501)	3 months ago
telemetry	convert error fromating to %w everywhere (#6995)	7 months ago
topology	master: fix negative active volumes (#7440)	3 months ago
util	S3: Directly read write volume servers (#7481)	3 months ago
wdclient	improve: add circuit breaker to skip known-unhealthy filers	2 months ago
worker	go fmt	3 months ago
Makefile	test versioning also (#7000)	7 months ago
weed.go	set exit status	11 months ago