* feat: add S3 bucket size and object count metrics
Adds periodic collection of bucket size metrics:
- SeaweedFS_s3_bucket_size_bytes: logical size (deduplicated across replicas)
- SeaweedFS_s3_bucket_physical_size_bytes: physical size (including replicas)
- SeaweedFS_s3_bucket_object_count: object count (deduplicated)
Collection runs every minute via a background goroutine that queries the
filer Statistics RPC for each bucket's collection.
Also adds Grafana dashboard panels for:
- S3 Bucket Size (logical vs physical)
- S3 Bucket Object Count
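A minimal sketch of how the three gauges above might be declared with the standard Prometheus Go client; the metric names mirror the ones listed, but the declarations themselves are illustrative, not the actual SeaweedFS code.

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
	// Renders as SeaweedFS_s3_bucket_size_bytes{bucket="..."}.
	bucketSizeBytes = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Namespace: "SeaweedFS",
		Subsystem: "s3",
		Name:      "bucket_size_bytes",
		Help:      "Logical bucket size in bytes, deduplicated across replicas.",
	}, []string{"bucket"})

	// Renders as SeaweedFS_s3_bucket_object_count{bucket="..."}.
	bucketObjectCount = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Namespace: "SeaweedFS",
		Subsystem: "s3",
		Name:      "bucket_object_count",
		Help:      "Deduplicated object count per bucket.",
	}, []string{"bucket"})
)

func init() {
	prometheus.MustRegister(bucketSizeBytes, bucketObjectCount)
}
```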
* address PR comments: fix bucket size metrics collection
1. Fix collectCollectionInfoFromMaster to use master VolumeList API
- Now properly queries master for topology info
- Uses WithMasterClient to get volume list from master
- Correctly calculates logical vs physical size based on replication
2. Return error when filerClient is nil to trigger fallback
- Changed 'return nil, nil' to return an explicit error
- Ensures the fallback to filer stats is properly triggered
3. Implement pagination in listBucketNames (sketched after this list)
- Added listBucketPageSize constant (1000)
- Uses StartFromFileName for pagination
- Continues fetching until fewer entries than limit returned
4. Handle NewReplicaPlacementFromByte error and prevent division by zero
- Check error return from NewReplicaPlacementFromByte
- Default to 1 copy if error occurs
- Add explicit check for copyCount == 0
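A schematic version of the pagination in item 3; listPage is a hypothetical stand-in for the filer ListEntries RPC, and only the pattern (page-size constant, StartFromFileName-style cursor, stop on a short page) reflects the actual change.

```go
package main

const listBucketPageSize = 1000

// listBucketNames pages through bucket entries until a short page signals
// the end; listPage is a hypothetical helper wrapping the ListEntries RPC.
func listBucketNames(listPage func(startFromFileName string, limit int) ([]string, error)) ([]string, error) {
	var buckets []string
	startFrom := ""
	for {
		page, err := listPage(startFrom, listBucketPageSize)
		if err != nil {
			return nil, err
		}
		buckets = append(buckets, page...)
		if len(page) < listBucketPageSize {
			return buckets, nil // fewer entries than the limit: last page
		}
		startFrom = page[len(page)-1] // resume after the last entry seen
	}
}
```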
* simplify bucket size metrics: remove filer fallback, align with quota enforcement
- Remove fallback to filer Statistics RPC
- Use only master topology for collection info (same as s3.bucket.quota.enforce)
- Updated comments to clarify this runs the same collection logic as quota enforcement
- Simplified code by removing collectBucketSizeFromFilerStats
* use s3a.option.Masters directly instead of querying filer
* address PR comments: fix dashboard overlaps and improve metrics collection
Grafana dashboard fixes:
- Fix overlapping panels 55 and 59 in grafana_seaweedfs.json (moved 59 to y=30)
- Fix grid collision in k8s dashboard (moved panel 72 to y=48)
- Aggregate bucket metrics with max() by (bucket) for multi-instance S3 gateways
Go code improvements:
- Add graceful shutdown support via context cancellation
- Use ticker instead of time.Sleep for better shutdown responsiveness
- Distinguish EOF from actual errors in stream handling
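A minimal sketch of the loop shape these bullets describe, assuming a ticker-driven loop with context cancellation; the function and collectOnce are illustrative names.

```go
package main

import (
	"context"
	"time"
)

// runCollectionLoop replaces a time.Sleep-based loop: the ticker keeps the
// interval steady, and ctx.Done() lets shutdown interrupt the wait.
func runCollectionLoop(ctx context.Context, interval time.Duration, collectOnce func(context.Context)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return // graceful shutdown
		case <-ticker.C:
			collectOnce(ctx)
		}
	}
}
```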
* improve bucket size metrics: multi-master failover and proper error handling
- Initial delay now respects context cancellation using select with time.After
- Use WithOneOfGrpcMasterClients for multi-master failover instead of hardcoding Masters[0]
- Properly propagate stream errors instead of just logging them (EOF vs real errors)
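Sketches of the first and third bullets, under assumed names: a cancellable initial delay, and stream draining that treats io.EOF as normal end-of-stream while propagating real errors.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"time"
)

// waitInitialDelay blocks for the initial delay, returning false early if
// the context is cancelled first.
func waitInitialDelay(ctx context.Context, delay time.Duration) bool {
	select {
	case <-ctx.Done():
		return false // cancelled before the first collection ran
	case <-time.After(delay):
		return true
	}
}

// drainStream consumes messages until io.EOF; any other error is propagated
// instead of merely logged. recv and handle are hypothetical stand-ins for
// the gRPC stream's Recv method and per-message processing.
func drainStream(recv func() (string, error), handle func(string)) error {
	for {
		msg, err := recv()
		if errors.Is(err, io.EOF) {
			return nil // normal end of stream
		}
		if err != nil {
			return fmt.Errorf("stream receive: %w", err)
		}
		handle(msg)
	}
}
```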
* improve bucket size metrics: distributed lock and volume ID deduplication
- Add distributed lock (LiveLock) so only one S3 instance collects metrics at a time
- Add IsLocked() method to LiveLock for checking lock status
- Fix deduplication: use volume ID tracking instead of dividing by copyCount
- Previous approach gave wrong results if replicas were missing
- Now tracks seen volume IDs and counts each volume only once
- Physical size still includes all replicas for accurate disk usage reporting
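A sketch of the deduplication described above, with a simplified stand-in for the master's volume topology: each volume ID counts once toward logical size, while every replica counts toward physical size.

```go
package main

// volumeInfo is a simplified stand-in for a volume entry in the master's
// VolumeList response; each replica of a volume appears as its own entry.
type volumeInfo struct {
	id   uint32
	size uint64
}

// aggregateSizes counts every replica toward physical size, but each volume
// ID only once toward logical size, so missing replicas no longer skew the
// result the way dividing by copyCount did.
func aggregateSizes(volumes []volumeInfo) (logical, physical uint64) {
	seen := make(map[uint32]bool)
	for _, v := range volumes {
		physical += v.size
		if !seen[v.id] {
			seen[v.id] = true
			logical += v.size
		}
	}
	return logical, physical
}
```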
* rename lock to s3.leader
* simplify: remove StartBucketSizeMetricsCollection wrapper function
* fix data race: use atomic operations for LiveLock.isLocked field
- Change isLocked from bool to int32
- Use atomic.LoadInt32/StoreInt32 for all reads/writes
- Sync shared isLocked field in StartLongLivedLock goroutine
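The shape of the fix, following the commit text; everything except the field and its atomic accessors is elided.

```go
package main

import "sync/atomic"

// LiveLock's isLocked flag is stored as an int32 so all reads and writes
// can go through the atomic package (1 = held, 0 = not held).
type LiveLock struct {
	isLocked int32
}

func (l *LiveLock) IsLocked() bool {
	return atomic.LoadInt32(&l.isLocked) == 1
}

func (l *LiveLock) setLocked(held bool) {
	var v int32
	if held {
		v = 1
	}
	atomic.StoreInt32(&l.isLocked, v)
}
```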
* add nil check for topology info to prevent panic
* fix bucket metrics: use Ticker for consistent intervals, fix pagination logic
- Use time.Ticker instead of time.After for consistent interval execution
- Fix pagination: count all entries (not just directories) for proper termination
- Update lastFileName for all entries to prevent pagination issues
* address PR comments: remove redundant atomic store, propagate context
- Remove redundant atomic.StoreInt32 in StartLongLivedLock (AttemptToLock already sets it)
- Propagate context through metrics collection for proper cancellation on shutdown
- collectAndUpdateBucketSizeMetrics now accepts ctx
- collectCollectionInfoFromMaster uses ctx for VolumeList RPC
- listBucketNames uses ctx for ListEntries RPC
* fix: add S3 bucket traffic sent metric tracking
The BucketTrafficSent() function was defined but never called, so the
S3 Bucket Traffic Sent Grafana dashboard panel showed no data.
Added BucketTrafficSent() calls in the streaming functions:
- streamFromVolumeServers: for inline and chunked content
- streamFromVolumeServersWithSSE: for encrypted range and full object requests
The traffic received metric already worked because BucketTrafficReceived()
was properly called in putToFiler() for both regular and multipart uploads.
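A hedged sketch of where the hook lands in a streaming path; BucketTrafficSent is the function named above, but its signature and the surrounding code are assumptions for illustration.

```go
package main

import "io"

// BucketTrafficSent is stubbed here with an assumed signature; the real
// function lives in the S3 metrics package.
func BucketTrafficSent(bytes int64, bucket string) { /* metrics update elided */ }

// streamInline writes inline content and records egress on success.
func streamInline(w io.Writer, data []byte, bucket string) error {
	written, err := w.Write(data)
	if err == nil {
		BucketTrafficSent(int64(written), bucket)
	}
	return err
}
```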
* feat: add S3 API Calls per Bucket panel to Grafana dashboards
Added a new panel showing API calls per bucket using the existing
SeaweedFS_s3_request_total metric aggregated by bucket.
Updated all Grafana dashboard files:
- other/metrics/grafana_seaweedfs.json
- other/metrics/grafana_seaweedfs_k8s.json
- other/metrics/grafana_seaweedfs_heartbeat.json
- k8s/charts/seaweedfs/dashboards/seaweedfs-grafana-dashboard.json
* address PR comments: use actual bytes written for traffic metrics
- Use actual bytes written from w.Write instead of expected size for inline content
- Add countingWriter wrapper to track actual bytes for chunked content streaming (sketched after this list)
- Update streamDecryptedRangeFromChunks to return actual bytes written for SSE
- Remove redundant nil check that caused linter warning
- Fix duplicate panel id 86 in grafana_seaweedfs.json (changed to 90)
- Fix overlapping panel positions in grafana_seaweedfs_k8s.json (rebalanced x positions)
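A minimal countingWriter of the kind described above: it wraps an io.Writer and records the bytes actually written, so metrics reflect real egress rather than the expected size.

```go
package main

import "io"

// countingWriter wraps an io.Writer and tallies the bytes actually written.
type countingWriter struct {
	w io.Writer
	n int64
}

func (cw *countingWriter) Write(p []byte) (int, error) {
	n, err := cw.w.Write(p)
	cw.n += int64(n) // counts even partial writes before an error
	return n, err
}
```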
* fix grafana k8s dashboard: rebalance S3 panels to avoid overlap
- Panel 86 (S3 API Calls per Bucket): w:6, x:0, y:15
- Panel 67 (S3 Request Duration 95th): w:6, x:6, y:15
- Panel 68 (S3 Request Duration 80th): w:6, x:12, y:15
- Panel 65 (S3 Request Duration 99th): w:6, x:18, y:15
All four S3 panels now fit in a single row (y:15) with width 6 each.
Filer row header at y:22 and subsequent panels remain correctly positioned.
* add input validation and clarify comments in adjustRangeForPart
- Add validation that partStartOffset <= partEndOffset at function start
- Add clarifying comments for suffix-range handling where clientEnd
temporarily holds the suffix length before being reassigned
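A hedged sketch of the two points above, with an assumed signature: the up-front validation, and the suffix-range branch where clientEnd temporarily holds the suffix length before being reassigned to an absolute offset.

```go
package main

import "fmt"

// adjustRangeForPart maps a client range onto one part's byte range. The
// signature and arithmetic are assumptions for illustration; only the
// validation and the suffix-range comment reflect the actual change.
func adjustRangeForPart(partStart, partEnd, clientStart, clientEnd int64) (start, end int64, err error) {
	if partStart > partEnd {
		return 0, 0, fmt.Errorf("invalid part range: start %d > end %d", partStart, partEnd)
	}
	partSize := partEnd - partStart + 1
	if clientStart < 0 {
		// Suffix range ("bytes=-N"): clientEnd temporarily holds the
		// suffix length N here, before being reassigned below.
		suffixLen := clientEnd
		if suffixLen > partSize {
			suffixLen = partSize
		}
		clientStart = partSize - suffixLen
		clientEnd = partSize - 1
	}
	return partStart + clientStart, partStart + clientEnd, nil
}
```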
* align pluginVersion for panel 86 to 10.3.1 in k8s dashboard
* track partial writes for accurate egress traffic accounting
- Change condition from 'err == nil' to 'written > 0' for inline content
- Move BucketTrafficSent before error check for chunked content streaming
- Track traffic even on partial SSE range writes
- Track traffic even on partial full SSE object copies
This ensures egress traffic is counted even when writes fail partway through,
providing more accurate bandwidth metrics.
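The refinement from this commit, continuing the earlier streaming sketch: the condition moves from err == nil to written > 0, so a failed write still accounts for the bytes that did go out.

```go
// Same assumed BucketTrafficSent signature as the earlier sketch.
func streamInline(w io.Writer, data []byte, bucket string) error {
	written, err := w.Write(data)
	if written > 0 {
		BucketTrafficSent(int64(written), bucket) // count partial writes too
	}
	return err
}
```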
Fixes #7467
The -mserver argument line in volume-statefulset.yaml was missing a
trailing backslash, which prevented extraArgs from being passed to
the weed volume process.
Also:
- Extracted master server list generation logic into shared helper
templates in _helpers.tpl for better maintainability
- Updated all occurrences of the deprecated -mserver flag to -master
across docker-compose files, test files, and documentation
The allowEmptyFolder option is no longer functional because:
1. The code that used it was already commented out
2. Empty folder cleanup is now handled asynchronously by EmptyFolderCleaner
The CLI flags are kept for backward compatibility but marked as deprecated
and ignored. This removes:
- S3ApiServerOption.AllowEmptyFolder field
- The actual usage in s3api_object_handlers_list.go
- Helm chart values and template references
- References in test Makefiles and docker-compose files
* helm: enhance all-in-one deployment configuration
Fixes #7110
This PR addresses multiple issues with the all-in-one Helm chart configuration:
## New Features
### Configurable Replicas
- Added `allInOne.replicas` (was hardcoded to 1)
### S3 Gateway Configuration
- Added full S3 config under `allInOne.s3`:
- port, httpsPort, domainName, allowEmptyFolder
- enableAuth, existingConfigSecret, auditLogConfig
- createBuckets for declarative bucket creation
### SFTP Server Configuration
- Added full SFTP config under `allInOne.sftp`:
- port, sshPrivateKey, hostKeysFolder, authMethods
- maxAuthTries, bannerMessage, loginGraceTime
- clientAliveInterval, clientAliveCountMax, enableAuth
### Command Line Arguments
- Added `allInOne.extraArgs` for custom CLI arguments
### Update Strategy
- Added `allInOne.updateStrategy.type` (Recreate/RollingUpdate)
### Secret Environment Variables
- Added `allInOne.secretExtraEnvironmentVars` for injecting secrets
### Ingress Support
- Added `allInOne.ingress` with S3, filer, and master sub-configs
### Storage Options
- Enhanced `allInOne.data` with existingClaim support
- Added PVC template for persistentVolumeClaim type
## CI Enhancements
- Added comprehensive tests for all-in-one configurations
- Tests cover replicas, S3, SFTP, extraArgs, strategies, PVC, ingress
* helm: add real cluster deployment tests to CI
- Deploy all-in-one cluster with S3 enabled on kind cluster
- Test Master API (/cluster/status endpoint)
- Test Filer API (file upload/download)
- Test S3 API (/status endpoint)
- Test S3 operations with AWS CLI:
- Create/delete buckets
- Upload/download/delete objects
- Verify file content integrity
* helm: simplify CI and remove all-in-one ingress
Address review comments:
- Remove detailed all-in-one template rendering tests from CI
- Remove real cluster deployment tests from CI
- Remove all-in-one ingress template and values configuration
Keep the core improvements:
- allInOne.replicas configuration
- allInOne.s3.* full configuration
- allInOne.sftp.* full configuration
- allInOne.extraArgs support
- allInOne.updateStrategy configuration
- allInOne.secretExtraEnvironmentVars support
* helm: address review comments
- Fix post-install-bucket-hook.yaml: add filer.s3.enableAuth and
filer.s3.existingConfigSecret to or statements for consistency
- Fix all-in-one-deployment.yaml: use default function for s3.domainName
- Fix all-in-one-deployment.yaml: use hasKey function for s3.allowEmptyFolder
* helm: clarify updateStrategy multi-replica behavior
Expand comment to warn users that RollingUpdate with multiple replicas
requires shared storage (ReadWriteMany) to avoid data loss.
* helm: address gemini-code-assist review comments
- Make PVC accessModes configurable to support ReadWriteMany for
multi-replica deployments (defaults to ReadWriteOnce)
- Use configured readiness probe paths in post-install bucket hook
instead of hardcoded paths, respecting custom configurations
* helm: simplify allowEmptyFolder logic using coalesce
Use coalesce function for cleaner template code as suggested in review.
* helm: fix extraArgs trailing backslash issue
Remove trailing backslash after the last extraArgs argument to avoid
shell syntax error. Use counter to only add backslash between arguments.
* helm: fix fallback logic for allInOne s3/sftp configuration
Changes:
- Set allInOne.s3.* and allInOne.sftp.* override parameters to null by default
This allows proper inheritance from global s3.* and sftp.* settings
- Fix allowEmptyFolder logic to use explicit nil checking instead of coalesce
The coalesce/default functions treat 'false' as empty, causing incorrect
fallback behavior when users want to explicitly set false values
Addresses review feedback about default value conflicts with fallback logic.
* helm: fix exec in bucket creation loop causing premature termination
Remove 'exec' from the range loops that create and configure S3 buckets.
The exec command replaces the current shell process, causing the script
to terminate after the first bucket, preventing creation/configuration
of subsequent buckets.
* helm: quote extraArgs to handle arguments with spaces
Use the quote function to ensure each item in extraArgs is treated as
a single, complete argument even if it contains spaces.
* helm: make s3/filer ingress work for both normal and all-in-one modes
Modified s3-ingress.yaml and filer-ingress.yaml to dynamically select
the service name based on deployment mode:
- Normal mode: points to seaweedfs-s3 / seaweedfs-filer services
- All-in-one mode: points to seaweedfs-all-in-one service
This eliminates the need for separate all-in-one ingress templates.
Users can now use the standard s3.ingress and filer.ingress settings
for both deployment modes.
* helm: fix allInOne.data.size and storageClass to use null defaults
Change size and storageClass from empty strings to null so the template
defaults (10Gi for size, cluster default for storageClass) will apply
correctly. Empty strings prevent the Helm | default function from working.
* helm: fix S3 ingress to include standalone S3 gateway case
Add s3.enabled check to the $s3Enabled logic so the ingress works for:
1. Standalone S3 gateway (s3.enabled)
2. S3 on Filer (filer.s3.enabled) when not in all-in-one mode
3. S3 in all-in-one mode (allInOne.s3.enabled)
* fix: S3 downloads failing after idle timeout (#7618)
The idle timeout was incorrectly terminating active downloads because
read and write deadlines were managed independently. During a download,
the server writes data but rarely reads, so the read deadline would
expire even though the connection was actively being used.
Changes:
1. Simplify to single Timeout field - since this is a 'no activity timeout'
where any activity extends the deadline, separate read/write timeouts
are unnecessary. Now uses SetDeadline() which sets both at once.
2. Implement proper 'no activity timeout' - any activity (read or write)
now extends the deadline. The connection only times out when there's
genuinely no activity in either direction.
3. Increase default S3 idleTimeout from 10s to 120s for additional safety
margin when fetching chunks from slow storage backends.
Fixes #7618
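A minimal sketch of the "no activity timeout" described above, assuming a net.Conn wrapper with illustrative names: every read or write extends one combined deadline via SetDeadline.

```go
package main

import (
	"net"
	"time"
)

// activityConn times out only when there is no activity in either
// direction: each Read or Write pushes the combined deadline forward.
type activityConn struct {
	net.Conn
	timeout time.Duration
}

func (c *activityConn) Read(p []byte) (int, error) {
	_ = c.Conn.SetDeadline(time.Now().Add(c.timeout)) // any activity extends the deadline
	return c.Conn.Read(p)
}

func (c *activityConn) Write(p []byte) (int, error) {
	_ = c.Conn.SetDeadline(time.Now().Add(c.timeout))
	return c.Conn.Write(p)
}
```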
* Update weed/util/net_timeout.go
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Fix the templates to read scheme from httpGet.scheme instead of the
probe level, matching the structure defined in values.yaml.
This ensures that changing *.livenessProbe.httpGet.scheme or
*.readinessProbe.httpGet.scheme in values.yaml now correctly affects
the rendered manifests.
Affected components: master, filer, volume, s3, all-in-one
Fixes #7615
* ingress config
* fixing issues
* prefix path type
For the S3 ingress path /, using pathType: Prefix is more explicit and standard-compliant for matching all subpaths. While ImplementationSpecific might work similarly with your current Ingress controller (often defaulting to a prefix match when use-regex is not enabled), Prefix clearly states the intent and improves portability across different Ingress controllers.
---------
Co-authored-by: Philipp Kraus <philipp.kraus@flashpixx.de>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
* WEED_CLUSTER_SW_* Environment Variables should not be passed to allInOne config
* address comment
* address comments
Fixed filtering logic: replaced specific key matching with regex patterns
that catch ALL WEED_CLUSTER_*_MASTER and WEED_CLUSTER_*_FILER variables
(see the Go sketch below).
Corrected merge precedence: fixed the merge order so global environment
variables properly override allInOne variables.
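The actual change lives in Helm templates, but the filtering idea translates directly to Go; the helper names here are hypothetical.

```go
package main

import "regexp"

// clusterVarPattern matches ALL WEED_CLUSTER_*_MASTER and
// WEED_CLUSTER_*_FILER variables, not just specific known keys.
var clusterVarPattern = regexp.MustCompile(`^WEED_CLUSTER_.+_(MASTER|FILER)$`)

// filterClusterVars drops cluster-topology variables from an env map
// before it is handed to the all-in-one config.
func filterClusterVars(env map[string]string) map[string]string {
	out := make(map[string]string, len(env))
	for k, v := range env {
		if clusterVarPattern.MatchString(k) {
			continue
		}
		out[k] = v
	}
	return out
}
```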
* refactoring
The `helm.sh/chart` label with its changing version number breaks helm
upgrades due to `matchLabels` being immutable. Drop the offending line, as
it does not belong in `matchLabels`.
* fix missing support for .Values.global.repository
* rework based on gemini feedback to handle repository+imageName more cleanly
* use base rather than last + splitList