Add rust_builder stage to Dockerfile that builds weed-volume from the
cloned source. Uses TARGETARCH to only compile on amd64/arm64, placing
an empty placeholder on arm/386. TAGS=5BytesOffset controls the 5-byte
offset feature (large disk mode).
Add volume-rust entrypoint case that checks for a real binary before
exec, printing a helpful error on unsupported platforms.
Rename the Rust volume server binary to weed-volume for consistency
with the Go weed binary naming. The library crate name remains
seaweed_volume to avoid changing all internal imports.
The Rust volume server uses 5-byte offsets by default (17-byte idx
entries). Go tests that parse idx files need -tags 5BytesOffset so
NeedleMapEntrySize matches (17 instead of 16).
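The 16 vs 17 figure follows from the entry layout. A minimal sketch, assuming each idx entry is an 8-byte needle id, a 4- or 5-byte offset, and a 4-byte size (the constant and function names below are illustrative, not the ones in the codebase):

```go
package main

import "fmt"

// Field widths of one .idx entry, assumed for illustration:
// an 8-byte needle id and a 4-byte size around the offset.
const (
	needleIDSize = 8
	sizeSize     = 4
)

// entrySize returns the bytes per .idx entry for a given offset width.
func entrySize(offsetSize int) int {
	return needleIDSize + offsetSize + sizeSize
}

func main() {
	fmt.Println(entrySize(4)) // default build: 16
	fmt.Println(entrySize(5)) // -tags 5BytesOffset: 17
}
```

A reader built without the tag would step through a 17-byte-per-entry file 16 bytes at a time and misread every entry after the first.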
Go's request_id.New() generates "%X%08X" (timestamp hex + random hex),
not a UUID. Update Rust to use the same format, and fix the test
assertion which expected UUID format (>= 32 chars with hyphens).
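For reference, a rough sketch of what the "%X%08X" format produces. The helper names are hypothetical, and the nanosecond timestamp is an assumption; only the format string comes from the Go side:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// formatRequestID renders a timestamp and a random value as uppercase
// hex, with the random part zero-padded to 8 digits.
func formatRequestID(ts int64, r uint32) string {
	return fmt.Sprintf("%X%08X", ts, r)
}

// newRequestID is a hypothetical stand-in for request_id.New();
// the timestamp unit used here is an assumption.
func newRequestID() string {
	return formatRequestID(time.Now().UnixNano(), rand.Uint32())
}

func main() {
	fmt.Println(formatRequestID(255, 1)) // FF00000001
	fmt.Println(newRequestID())
}
```

The result has no hyphens and no fixed 32-character minimum, which is why a UUID-shaped assertion cannot pass against it.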
The go.mod was bumped to go 1.25.0 on master, breaking CI workflows
that hardcoded GO_VERSION: '1.24'. Switch to go-version-file: 'go.mod'
so the Go version is always derived from go.mod automatically.
* admin: remove misleading "secret key only shown once" warning
The access key details modal already allows viewing both the access key
and secret key at any time, so the warning about the secret key only
being displayed once is incorrect and misleading.
* admin: allow specifying custom access key and secret key
Add optional access_key and secret_key fields to the create access key
API. When provided, the specified keys are used instead of generating
random ones. The UI now shows a form with optional fields when creating
a new key, with a note that leaving them blank auto-generates keys.
* admin: check access key uniqueness before creating
Access keys must be globally unique across all users since S3 auth
looks them up in a single global map. Add an explicit check using
GetUserByAccessKey before creating, so the user gets a clear error
("access key is already in use") rather than a generic store error.
* Update object_store_users_templ.go
* admin: address review feedback for access key creation
Handler:
- Use decodeJSONBody/newJSONMaxReader instead of raw json.Decode to
enforce request size limits and handle malformed JSON properly
- Return 409 Conflict for duplicate access keys, 400 Bad Request for
validation errors, instead of generic 500
Backend:
- Validate access key length (4-128 chars) and secret key length
(8-128 chars) when user-provided
Frontend:
- Extract resetCreateKeyForm() helper to avoid duplicated cleanup logic
- Wire resetCreateKeyForm to accessKeysModal hidden.bs.modal event so
form state is always cleared when modal is dismissed
- Change secret key input to type="password" with a visibility toggle
* admin: guard against nil request and handle GetUserByAccessKey errors
- Add nil check for the CreateAccessKeyRequest pointer before
dereferencing, defaulting to an empty request (auto-generate both
keys).
- Handle non-"not found" errors from GetUserByAccessKey explicitly
instead of silently proceeding, so store errors (e.g. db connection
failures) surface rather than being swallowed.
* Update object_store_users_templ.go
* admin: fix access key uniqueness check with gRPC store
GetUserByAccessKey returns a gRPC NotFound status error (not the
sentinel credential.ErrAccessKeyNotFound) when using the gRPC store,
causing the uniqueness check to fail with a spurious error.
Treat the lookup as best-effort: only reject when a user is found
(err == nil). Any error (not-found via any store, connectivity issues)
falls through to the store's own CreateAccessKey which enforces
uniqueness definitively.
* admin: fix error handling and input validation for access key creation
Backend:
- Remove access key value from the duplicate-key error message to avoid
logging the caller-supplied identifier.
Handler:
- Handle empty POST body (io.EOF) as a valid request that auto-generates
both keys, instead of rejecting it as malformed JSON.
- Return 404 for "not found" errors (e.g. non-existent user) instead of
collapsing them into a 500.
Frontend:
- Add minlength/maxlength attributes matching backend constraints
(access key 4-128, secret key 8-128).
- Call reportValidity() before submitting so invalid lengths are caught
client-side without a round trip.
* admin: use sentinel errors and fix GetUserByAccessKey error handling
Backend (user_management.go):
- Define sentinel errors (ErrAccessKeyInUse, ErrUserNotFound,
ErrInvalidInput) and wrap them in returned errors so callers can use
errors.Is.
- Handle GetUserByAccessKey errors properly: check the sentinel
credential.ErrAccessKeyNotFound first, then fall back to string
matching for stores (gRPC) that return non-sentinel not-found errors.
Surface unexpected errors instead of silently proceeding.
Handler (user_handlers.go):
- Replace fragile strings.Contains error matching with errors.Is
against the new sentinels in the dash package.
Frontend (object_store_users.templ):
- Add double-submit guard (isCreatingKey flag + button disabling) to
prevent duplicate access key creation requests.
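A minimal sketch of the sentinel pattern described above. The sentinel names come from this change; statusFor and validateSecretKey are hypothetical helpers, and the real handler may map errors differently:

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Sentinel errors; the message texts are illustrative.
var (
	ErrAccessKeyInUse = errors.New("access key is already in use")
	ErrUserNotFound   = errors.New("user not found")
	ErrInvalidInput   = errors.New("invalid input")
)

// validateSecretKey wraps ErrInvalidInput so callers can match it
// with errors.Is even though the message carries extra detail.
func validateSecretKey(key string) error {
	if len(key) < 8 || len(key) > 128 {
		return fmt.Errorf("%w: secret key must be 8-128 chars", ErrInvalidInput)
	}
	return nil
}

// statusFor maps an error to an HTTP status without inspecting
// the error text.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrAccessKeyInUse):
		return http.StatusConflict
	case errors.Is(err, ErrUserNotFound):
		return http.StatusNotFound
	case errors.Is(err, ErrInvalidInput):
		return http.StatusBadRequest
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	fmt.Println(statusFor(validateSecretKey("short"))) // 400
}
```

Because the match goes through errors.Is, rewording an error message no longer changes the status code a client sees.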
Three operational improvements to match Go volume server behavior:
1. Options file (-options flag):
Load CLI options from a file, one per line (key=value format).
Lines starting with # are comments, and leading dashes on option
names are stripped. CLI args override values from the file.
2. /metrics on admin port:
Serve Prometheus metrics on the main admin HTTP port in addition
to the separate metrics port, matching Go's behavior.
3. SIGHUP reload:
On SIGHUP, reload security config (whitelist from security.toml)
and scan disk locations for new volumes (LoadNewVolumes equivalent).
Guard wrapped in RwLock for runtime whitelist updates.
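The options-file rules in item 1 can be sketched as follows. This is an illustration of the described behavior written in Go, not the Rust implementation; parseOptions and mergeOptions are hypothetical names:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseOptions reads "key=value" lines, skipping blank lines and
// "#" comments and stripping leading dashes from each key.
func parseOptions(text string) map[string]string {
	opts := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		line = strings.TrimLeft(line, "-")
		key, value, _ := strings.Cut(line, "=")
		opts[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return opts
}

// mergeOptions overlays CLI values on top of file values,
// so anything given on the command line wins.
func mergeOptions(file, cli map[string]string) map[string]string {
	merged := make(map[string]string, len(file)+len(cli))
	for k, v := range file {
		merged[k] = v
	}
	for k, v := range cli {
		merged[k] = v
	}
	return merged
}

func main() {
	file := parseOptions("# volume server\n-port=8080\n--max=0\n")
	fmt.Println(mergeOptions(file, map[string]string{"port": "9090"}))
}
```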
* fix: ListObjectVersions interleave Version and DeleteMarker in sort order
Go's default xml.Marshal serializes struct fields in definition order,
causing all <Version> elements to appear before all <DeleteMarker>
elements. The S3 API contract requires these elements to be interleaved
in the correct global sort order (by key ascending, then newest version
first within each key).
This broke clients that validate version list ordering within a single
key — an older Version would appear before a newer DeleteMarker for the
same object.
Fix: Replace the separate Versions/DeleteMarkers/CommonPrefixes arrays
with a single Entries []VersionListEntry slice. Each VersionListEntry
uses a per-element MarshalXML that outputs the correct XML tag name
(<Version>, <DeleteMarker>, or <CommonPrefixes>) based on which field
is populated. Since the entries are already in their correct sorted
order from buildSortedCombinedList, the XML output is automatically
interleaved correctly.
Also removes the unused ListObjectVersionsResult struct.
Note: The reporter also mentioned a cross-key timestamp ordering issue
when paginating with max-keys=1, but that is correct S3 behavior —
ListObjectVersions sorts by key name (ascending), not by timestamp.
Different keys having non-monotonic timestamps is expected.
* test: add CommonPrefixes XML marshaling coverage for ListObjectVersions
* fix: validate VersionListEntry has exactly one field set in MarshalXML
Return an error instead of silently emitting an empty <Version> element
when no field (or multiple fields) are populated. Also clean up the
misleading xml:"Version" struct tag on the Entries field.
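A reduced sketch of the per-element MarshalXML approach, including the exactly-one-field check. The Item, Prefix, and listResult types are simplified stand-ins for the real response structs:

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
)

// Item and Prefix are simplified stand-ins for the real entry types.
type Item struct {
	Key       string
	VersionId string
}

type Prefix struct {
	Prefix string
}

// VersionListEntry holds exactly one of the three element kinds.
type VersionListEntry struct {
	Version        *Item
	DeleteMarker   *Item
	CommonPrefixes *Prefix
}

// MarshalXML ignores the caller-supplied start element and emits the
// tag matching whichever field is populated.
func (e VersionListEntry) MarshalXML(enc *xml.Encoder, _ xml.StartElement) error {
	set := 0
	for _, populated := range []bool{e.Version != nil, e.DeleteMarker != nil, e.CommonPrefixes != nil} {
		if populated {
			set++
		}
	}
	if set != 1 {
		return errors.New("VersionListEntry must have exactly one field set")
	}
	start := func(name string) xml.StartElement {
		return xml.StartElement{Name: xml.Name{Local: name}}
	}
	switch {
	case e.Version != nil:
		return enc.EncodeElement(e.Version, start("Version"))
	case e.DeleteMarker != nil:
		return enc.EncodeElement(e.DeleteMarker, start("DeleteMarker"))
	default:
		return enc.EncodeElement(e.CommonPrefixes, start("CommonPrefixes"))
	}
}

// listResult is a hypothetical container; entries keep slice order.
type listResult struct {
	XMLName xml.Name `xml:"ListVersionsResult"`
	Entries []VersionListEntry
}

func render(entries []VersionListEntry) (string, error) {
	out, err := xml.Marshal(listResult{Entries: entries})
	return string(out), err
}

func main() {
	out, err := render([]VersionListEntry{
		{DeleteMarker: &Item{Key: "a", VersionId: "v2"}},
		{Version: &Item{Key: "a", VersionId: "v1"}},
	})
	fmt.Println(out, err)
}
```

Since the slice is marshaled element by element, the XML order is whatever order the entries were sorted into, regardless of element kind.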
* iam: add Group message to protobuf schema
Add Group message (name, members, policy_names, disabled) and
add groups field to S3ApiConfiguration for IAM group management
support (issue #7742).
* iam: add group CRUD to CredentialStore interface and all backends
Add group management methods (CreateGroup, GetGroup, DeleteGroup,
ListGroups, UpdateGroup) to the CredentialStore interface with
implementations for memory, filer_etc, postgres, and grpc stores.
Wire group loading/saving into filer_etc LoadConfiguration and
SaveConfiguration.
* iam: add group IAM response types
Add XML response types for group management IAM actions:
CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup,
RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy,
ListAttachedGroupPolicies, ListGroupsForUser.
* iam: add group management handlers to embedded IAM API
Add CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup,
RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy,
ListAttachedGroupPolicies, and ListGroupsForUser handlers with
dispatch in ExecuteAction.
* iam: add group management handlers to standalone IAM API
Add group handlers (CreateGroup, DeleteGroup, GetGroup, ListGroups,
AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy,
ListAttachedGroupPolicies, ListGroupsForUser) and wire into DoActions
dispatch. Also add helper functions for user/policy side effects.
* iam: integrate group policies into authorization
Add groups and userGroups reverse index to IdentityAccessManagement.
Populate both maps during ReplaceS3ApiConfiguration and
MergeS3ApiConfiguration. Modify evaluateIAMPolicies to evaluate
policies from user's enabled groups in addition to user policies.
Update VerifyActionPermission to consider group policies when
checking hasAttachedPolicies.
* iam: add group side effects on user deletion and rename
When a user is deleted, remove them from all groups they belong to.
When a user is renamed, update group membership references. Applied
to both embedded and standalone IAM handlers.
* iam: watch /etc/iam/groups directory for config changes
Add groups directory to the filer subscription watcher so group
file changes trigger IAM configuration reloads.
* admin: add group management page to admin UI
Add groups page with CRUD operations, member management, policy
attachment, and enable/disable toggle. Register routes in admin
handlers and add Groups entry to sidebar navigation.
* test: add IAM group management integration tests
Add comprehensive integration tests for group CRUD, membership,
policy attachment, policy enforcement, disabled group behavior,
user deletion side effects, and multi-group membership. Add
"group" test type to CI matrix in s3-iam-tests workflow.
* iam: address PR review comments for group management
- Fix XSS vulnerability in groups.templ: replace innerHTML string
concatenation with DOM APIs (createElement/textContent) for rendering
member and policy lists
- Use userGroups reverse index in embedded IAM ListGroupsForUser for
O(1) lookup instead of iterating all groups
- Add buildUserGroupsIndex helper in standalone IAM handlers; use it
in ListGroupsForUser and removeUserFromAllGroups for efficient lookup
- Add note about gRPC store load-modify-save race condition limitation
* iam: add defensive copies, validation, and XSS fixes for group management
- Memory store: clone groups on store/retrieve to prevent mutation
- Admin dash: deep copy groups before mutation, validate that the user/policy exists
- HTTP handlers: translate credential errors to proper HTTP status codes,
use *bool for Enabled field to distinguish missing vs false
- Groups templ: use data attributes + event delegation instead of inline
onclick for XSS safety, prevent stale async responses
* iam: add explicit group methods to PropagatingCredentialStore
Add CreateGroup, GetGroup, DeleteGroup, ListGroups, and UpdateGroup
methods instead of relying on embedded interface fallthrough. Group
changes propagate via filer subscription so no RPC propagation needed.
* iam: detect postgres unique constraint violation and add groups index
Return ErrGroupAlreadyExists when INSERT hits SQLState 23505 instead of
a generic error. Add index on groups(disabled) for filtered queries.
* iam: add Marker field to group list response types
Add Marker string field to GetGroupResult, ListGroupsResult,
ListAttachedGroupPoliciesResult, and ListGroupsForUserResult to
match AWS IAM pagination response format.
* iam: check group attachment before policy deletion
Reject DeletePolicy if the policy is attached to any group, matching
AWS IAM behavior. Add PolicyArn to ListAttachedGroupPolicies response.
* iam: include group policies in IAM authorization
Merge policy names from user's enabled groups into the IAMIdentity
used for authorization, so group-attached policies are evaluated
alongside user-attached policies.
* iam: check for name collision before renaming user in UpdateUser
Scan identities and inline policies for newUserName before mutating,
returning EntityAlreadyExists if a collision is found. Reuse the
already-loaded policies instead of loading them again inside the loop.
* test: use t.Cleanup for bucket cleanup in group policy test
* iam: wrap ErrUserNotInGroup sentinel in RemoveGroupMember error
Wrap credential.ErrUserNotInGroup so errors.Is works in
groupErrorToHTTPStatus, returning proper 400 instead of 500.
* admin: regenerate groups_templ.go with XSS-safe data attributes
Regenerated from groups.templ which uses data-group-name attributes
instead of inline onclick with string interpolation.
* iam: add input validation and persist groups during migration
- Validate nil/empty group name in CreateGroup and UpdateGroup
- Save groups in migrateToMultiFile so they survive legacy migration
* admin: use groupErrorToHTTPStatus in GetGroupMembers and GetGroupPolicies
* iam: short-circuit UpdateUser when newUserName equals current name
* iam: require empty PolicyNames before group deletion
Reject DeleteGroup when group has attached policies, matching the
existing members check. Also fix GetGroup error handling in
DeletePolicy to only skip ErrGroupNotFound, not all errors.
* ci: add weed/pb/** to S3 IAM test trigger paths
* test: replace time.Sleep with require.Eventually for propagation waits
Use polling with timeout instead of fixed sleeps to reduce flakiness
in integration tests waiting for IAM policy propagation.
* fix: use credentialManager.GetPolicy for AttachGroupPolicy validation
Policies created via CreatePolicy through credentialManager are stored
in the credential store, not in s3cfg.Policies (which only has static
config policies). Change AttachGroupPolicy to use credentialManager.GetPolicy()
for policy existence validation.
* feat: add UpdateGroup handler to embedded IAM API
Add UpdateGroup action to enable/disable groups and rename groups
via the IAM API. This is a SeaweedFS extension (not in AWS SDK) used
by tests to toggle group disabled status.
* fix: authenticate raw IAM API calls in group tests
The embedded IAM endpoint rejects anonymous requests. Replace
callIAMAPI with callIAMAPIAuthenticated that uses JWT bearer token
authentication via the test framework.
* feat: add UpdateGroup handler to standalone IAM API
Mirror the embedded IAM UpdateGroup handler in the standalone IAM API
for parity.
* fix: add omitempty to Marker XML tags in group responses
Non-truncated responses should not emit an empty <Marker/> element.
* fix: distinguish backend errors from missing policies in AttachGroupPolicy
Return ServiceFailure for credential manager errors instead of masking
them as NoSuchEntity. Also switch ListGroupsForUser to use s3cfg.Groups
instead of in-memory reverse index to avoid stale data. Add duplicate
name check to UpdateGroup rename.
* fix: standalone IAM AttachGroupPolicy uses persisted policy store
Check managed policies from GetPolicies() instead of s3cfg.Policies
so dynamically created policies are found. Also add duplicate name
check to UpdateGroup rename.
* fix: rollback inline policies on UpdateUser PutPolicies failure
If PutPolicies fails after moving inline policies to the new username,
restore both the identity name and the inline policies map to their
original state to avoid a partial-write window.
* fix: correct test cleanup ordering for group tests
Replace scattered defers with single ordered t.Cleanup in each test
to ensure resources are torn down in reverse-creation order:
remove membership, detach policies, delete access keys, delete users,
delete groups, delete policies. Move bucket cleanup to parent test
scope and delete objects before bucket.
* fix: move identity nil check before map lookup and refine hasAttachedPolicies
Move the nil check on identity before accessing identity.Name to
prevent panic. Also refine hasAttachedPolicies to only consider groups
that are enabled and have actual policies attached, so membership in
a no-policy group doesn't incorrectly trigger IAM authorization.
* fix: fail group reload on unreadable or corrupt group files
Return errors instead of logging and continuing when group files
cannot be read or unmarshaled. This prevents silently applying a
partial IAM config with missing group memberships or policies.
* fix: use errors.Is for sql.ErrNoRows comparison in postgres group store
* docs: explain why group methods skip propagateChange
Group changes propagate to S3 servers via filer subscription
(watching /etc/iam/groups/) rather than gRPC RPCs, since there
are no group-specific RPCs in the S3 cache protocol.
* fix: remove unused policyNameFromArn and strings import
* fix: update service account ParentUser on user rename
When renaming a user via UpdateUser, also update ParentUser references
in service accounts to prevent them from becoming orphaned after the
next configuration reload.
* fix: wrap DetachGroupPolicy error with ErrPolicyNotAttached sentinel
Use credential.ErrPolicyNotAttached so groupErrorToHTTPStatus maps
it to 400 instead of falling back to 500.
* fix: use admin S3 client for bucket cleanup in enforcement test
The user S3 client may lack permissions by cleanup time since the
user is removed from the group in an earlier subtest. Use the admin
S3 client to ensure bucket and object cleanup always succeeds.
* fix: add nil guard for group param in propagating store log calls
Prevent potential nil dereference when logging group.Name in
CreateGroup and UpdateGroup of PropagatingCredentialStore.
* fix: validate Disabled field in UpdateGroup handlers
Reject values other than "true" or "false" with InvalidInputException
instead of silently treating them as false.
* fix: seed mergedGroups from existing groups in MergeS3ApiConfiguration
Previously the merge started with empty group maps, dropping any
static-file groups. Now seeds from existing iam.groups before
overlaying dynamic config, and builds the reverse index after
merging to avoid stale entries from overridden groups.
* fix: use errors.Is for filer_pb.ErrNotFound comparison in group loading
Replace direct equality (==) with errors.Is() to correctly match
wrapped errors, consistent with the rest of the codebase.
* fix: add ErrUserNotFound and ErrPolicyNotFound to groupErrorToHTTPStatus
Map these sentinel errors to 404 so AddGroupMember and
AttachGroupPolicy return proper HTTP status codes.
* fix: log cleanup errors in group integration tests
Replace fire-and-forget cleanup calls with error-checked versions
that log failures via t.Logf for debugging visibility.
* fix: prevent duplicate group test runs in CI matrix
The basic lane's -run "TestIAM" regex also matched TestIAMGroup*
tests, causing them to run in both the basic and group lanes.
Replace with explicit test function names.
* fix: add GIN index on groups.members JSONB for membership lookups
Without this index, ListGroupsForUser and membership queries
require full table scans on the groups table.
* fix: handle cross-directory moves in IAM config subscription
When a file is moved out of an IAM directory (e.g., /etc/iam/groups),
the dir variable was overwritten with NewParentPath, causing the
source directory change to be missed. Now also notifies handlers
about the source directory for cross-directory moves.
* fix: validate members/policies before deleting group in admin handler
AdminServer.DeleteGroup now checks for attached members and policies
before delegating to credentialManager, matching the IAM handler guards.
* fix: merge groups by name instead of blind append during filer load
Match the identity loader's merge behavior: find existing group
by name and replace, only append when no match exists. Prevents
duplicates when legacy and multi-file configs overlap.
* fix: check DeleteEntry response error when cleaning obsolete group files
Capture and log resp.Error from filer DeleteEntry calls during
group file cleanup, matching the pattern used in deleteGroupFile.
* fix: verify source user exists before no-op check in UpdateUser
Reorder UpdateUser to find the source identity first and return
NoSuchEntityException if not found, before checking if the rename
is a no-op. Previously a non-existent user renamed to itself
would incorrectly return success.
* fix: update service account parent refs on user rename in embedded IAM
The embedded IAM UpdateUser handler updated group membership but
not service account ParentUser fields, unlike the standalone handler.
* fix: replay source-side events for all handlers on cross-dir moves
Pass nil newEntry to bucket, IAM, and circuit-breaker handlers for
the source directory during cross-directory moves, so all watchers
can clear caches for the moved-away resource.
* fix: don't seed mergedGroups from existing iam.groups in merge
Groups are always dynamic (from filer), never static (from s3.config).
Seeding from iam.groups caused stale deleted groups to persist.
Now only uses config.Groups from the dynamic filer config.
* fix: add deferred user cleanup in TestIAMGroupUserDeletionSideEffect
Register t.Cleanup for the created user so it gets cleaned up
even if the test fails before the inline DeleteUser call.
* fix: assert UpdateGroup HTTP status in disabled group tests
Add require.Equal checks for 200 status after UpdateGroup calls
so the test fails immediately on API errors rather than relying
on the subsequent Eventually timeout.
* fix: trim whitespace from group name in filer store operations
Trim leading/trailing whitespace from group.Name before validation
in CreateGroup and UpdateGroup to prevent whitespace-only filenames.
Also merge groups by name during multi-file load to prevent duplicates.
* fix: add nil/empty group validation in gRPC store
Guard CreateGroup and UpdateGroup against nil group or empty name
to prevent panics and invalid persistence.
* fix: add nil/empty group validation in postgres store
Guard CreateGroup and UpdateGroup against nil group or empty name
to prevent panics from nil member access and empty-name row inserts.
* fix: add name collision check in embedded IAM UpdateUser
The embedded IAM handler renamed users without checking if the
target name already existed, unlike the standalone handler.
* fix: add ErrGroupNotEmpty sentinel and map to HTTP 409
AdminServer.DeleteGroup now wraps conflict errors with
ErrGroupNotEmpty, and groupErrorToHTTPStatus maps it to
409 Conflict instead of 500.
* fix: use appropriate error message in GetGroupDetails based on status
Return "Group not found" only for 404, use "Failed to retrieve group"
for other error statuses instead of always saying "Group not found".
* fix: use backend-normalized group.Name in CreateGroup response
After credentialManager.CreateGroup may normalize the name (e.g.,
trim whitespace), use group.Name instead of the raw input for
the returned GroupData to ensure consistency.
* fix: add nil/empty group validation in memory store
Guard CreateGroup and UpdateGroup against nil group or empty name
to prevent panics from nil pointer dereference on map access.
* fix: reorder embedded IAM UpdateUser to verify source first
Find the source identity before checking for collisions, matching
the standalone handler's logic. Previously a non-existent user
renamed to an existing name would get EntityAlreadyExists instead
of NoSuchEntity.
* fix: handle same-directory renames in metadata subscription
Replay a delete event for the old entry name during same-directory
renames so handlers like onBucketMetadataChange can clean up stale
state for the old name.
* fix: abort GetGroups on non-ErrGroupNotFound errors
Only skip groups that return ErrGroupNotFound. Other errors (e.g.,
transient backend failures) now abort the handler and return the
error to the caller instead of silently producing partial results.
* fix: add aria-label and title to icon-only group action buttons
Add accessible labels to View and Delete buttons so screen readers
and tooltips provide meaningful context.
* fix: validate group name in saveGroup to prevent invalid filenames
Trim whitespace and reject empty names before writing group JSON
files, preventing creation of files like ".json".
* fix: add /etc/iam/groups to filer subscription watched directories
The groups directory was missing from the watched directories list,
so S3 servers in a cluster would not detect group changes made by
other servers via filer. The onIamConfigChange handler already had
code to handle group directory changes but it was never triggered.
* add direct gRPC propagation for group changes to S3 servers
Groups now have the same dual propagation as identities and policies:
direct gRPC push via propagateChange + async filer subscription.
- Add PutGroup/RemoveGroup proto messages and RPCs
- Add PutGroup/RemoveGroup in-memory cache methods on IAM
- Add PutGroup/RemoveGroup gRPC server handlers
- Update PropagatingCredentialStore to call propagateChange on group mutations
* reduce log verbosity for config load summary
Change ReplaceS3ApiConfiguration log from Infof to V(1).Infof
to avoid noisy output on every config reload.
* admin: show user groups in view and edit user modals
- Add Groups field to UserDetails and populate from credential manager
- Show groups as badges in user details view modal
- Add group management to edit user modal: display current groups,
add to group via dropdown, remove from group via badge x button
* fix: remove duplicate showAlert that broke modal-alerts.js
admin.js defined showAlert(type, message) which overwrote the
modal-alerts.js version showAlert(message, type), causing broken
unstyled alert boxes. Remove the duplicate and swap all callers
in admin.js to use the correct (message, type) argument order.
* fix: unwrap groups API response in edit user modal
The /api/groups endpoint returns {"groups": [...]}, not a bare array.
* Update object_store_users_templ.go
* test: assert AccessDenied error code in group denial tests
Replace plain assert.Error checks with awserr.Error type assertion
and AccessDenied code verification, matching the pattern used in
other IAM integration tests.
* fix: propagate GetGroups errors in ShowGroups handler
getGroupsPageData was swallowing errors and returning an empty page
with 200 status. Now returns the error so ShowGroups can respond
with a proper error status.
* fix: reject AttachGroupPolicy when credential manager is nil
Previously skipped policy existence validation when credentialManager
was nil, allowing attachment of nonexistent policies. Now returns
a ServiceFailureException error.
* fix: preserve groups during partial MergeS3ApiConfiguration updates
UpsertIdentity calls MergeS3ApiConfiguration with a partial config
containing only the updated identity (nil Groups). This was wiping
all in-memory group state. Now only replaces groups when
config.Groups is non-nil (full config reload).
* fix: propagate errors from group lookup in GetObjectStoreUserDetails
ListGroups and GetGroup errors were silently ignored, potentially
showing incomplete group data in the UI.
* fix: use DOM APIs for group badge remove button to prevent XSS
Replace the innerHTML markup that built onclick handlers through
string interpolation with the DOM createElement + addEventListener
pattern. Also add aria-label and title to the add-to-group button.
* fix: snapshot group policies under RLock to prevent concurrent map access
evaluateIAMPolicies was copying the map reference via groupMap :=
iam.groups under RLock then iterating after RUnlock, while PutGroup
mutates the map in-place. Now copies the needed policy names into
a slice while holding the lock.
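A simplified sketch of the snapshot pattern, assuming a reduced IAM struct. The type, field, and method names here are illustrative and do not mirror IdentityAccessManagement exactly:

```go
package main

import (
	"fmt"
	"sync"
)

type group struct {
	disabled    bool
	policyNames []string
}

// iam is a reduced stand-in holding the group maps behind one lock.
type iam struct {
	mu         sync.RWMutex
	groups     map[string]*group
	userGroups map[string][]string // user name -> group names
}

func newIAM() *iam {
	return &iam{groups: map[string]*group{}, userGroups: map[string][]string{}}
}

// putGroup mutates the map in place under the write lock.
func (i *iam) putGroup(name string, g *group) {
	i.mu.Lock()
	defer i.mu.Unlock()
	i.groups[name] = g
}

func (i *iam) addMember(user, groupName string) {
	i.mu.Lock()
	defer i.mu.Unlock()
	i.userGroups[user] = append(i.userGroups[user], groupName)
}

// groupPoliciesFor copies the policy names of the user's enabled
// groups into a fresh slice while the read lock is held, so the
// caller never touches the shared maps after the lock is released.
func (i *iam) groupPoliciesFor(user string) []string {
	i.mu.RLock()
	defer i.mu.RUnlock()
	var names []string
	for _, gName := range i.userGroups[user] {
		g, ok := i.groups[gName]
		if !ok || g.disabled {
			continue
		}
		names = append(names, g.policyNames...)
	}
	return names
}

func main() {
	m := newIAM()
	m.putGroup("dev", &group{policyNames: []string{"read"}})
	m.addMember("alice", "dev")
	fmt.Println(m.groupPoliciesFor("alice"))
}
```

Holding only a reference to the map past RUnlock is what made the original code race with in-place writers; copying the strings out removes that window.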
* fix: add nil IAM check to PutGroup and RemoveGroup gRPC handlers
Match the nil guard pattern used by PutPolicy/DeletePolicy to
prevent nil pointer dereference when IAM is not initialized.
- test_ec_encode_with_separate_idx_dir: EC encode reads .idx from separate dir
- test_ec_encode_fails_with_wrong_idx_dir: guards against ignoring idx_dir param
- test_destroy_preserves_vif: .vif survives volume destroy (needed by EC volumes)
- test_destroy_with_separate_idx_dir: destroy cleans both data and idx dirs
- test_parse_grpc_address_*: 5 tests for IP:port.grpcPort parsing including
IPv4 dot regression that broke VolumeEcShardsCopy
write_ec_files now accepts an idx_dir parameter so it can find the .idx
file when --dir.idx is configured. Previously it looked for .idx in the
data directory, which failed when a separate index directory was in use.
Two bugs found during EC encoding + balancing test with 4 Rust volume servers:
1. VolumeEcShardsCopy manually split source_data_node on '.' which broke
on IP addresses with dots. Now uses parse_grpc_address() like other RPCs.
2. remove_volume_files() deleted .vif files, breaking EC volumes that
need the .vif after the original volume is destroyed. Matches Go's
Destroy() which only removes .dat/.idx.
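The parsing bug in item 1 comes down to where the split happens. A sketch of the idea in Go, not the Rust parse_grpc_address itself; the fallback of HTTP port + 10000 when no gRPC port is given is an assumption about the convention, and IPv6 literals are not handled:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseGrpcAddress extracts the host and gRPC port from
// "host:port.grpcPort". It splits on the last ':' first, so dots
// inside an IPv4 host are never mistaken for the port separator.
func parseGrpcAddress(addr string) (host string, grpcPort int, err error) {
	colon := strings.LastIndex(addr, ":")
	if colon < 0 {
		return "", 0, fmt.Errorf("missing port in %q", addr)
	}
	host = addr[:colon]
	httpPart, grpcPart, hasGrpc := strings.Cut(addr[colon+1:], ".")
	if hasGrpc {
		grpcPort, err = strconv.Atoi(grpcPart)
	} else {
		// Assumed default: gRPC port is the HTTP port plus 10000.
		grpcPort, err = strconv.Atoi(httpPart)
		grpcPort += 10000
	}
	if err != nil {
		return "", 0, fmt.Errorf("bad port in %q: %w", addr, err)
	}
	return host, grpcPort, nil
}

func main() {
	fmt.Println(parseGrpcAddress("10.0.0.1:8080.18080"))
}
```

Splitting the whole string on the first '.' would have cut "10.0.0.1:8080.18080" after "10", which is the failure mode described above.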
- rust_binaries_release.yml: builds linux/darwin/windows on tags
- rust_binaries_dev.yml: builds dev binaries on master push
- install.sh: universal installer for Go weed and Rust volume server
Calls GetMasterConfiguration RPC before heartbeat loop to fetch metrics
config, matching Go's checkWithMaster(). Ping RPC now actually connects
to the target volume/master server instead of returning a dummy response.
Pretty printing now activates for any non-empty ?pretty value (not just
?pretty=y). JSONP output drops trailing semicolon/newline to match Go.
StreamingBody now re-reads needle offset on compaction revision change.
During streaming reads, checks if the volume's compaction revision changed
between chunks and re-looks up the needle offset from the needle map,
matching Go's readNeedleDataInto behavior. Also threads ReadOption through
the volume read path.
Adds ReadOption with fields for meta-only reads, volume revision tracking,
slow read detection, and out-of-range flagging. Threaded through volume
read paths for future behavioral parity.
Parse CA certificate path from [https.volume] and [grpc.volume] sections
in security.toml. When configured, enables client certificate verification
using WebPkiClientVerifier for HTTP and client_ca_root for gRPC.
The previous preallocation implementation (set_len/set_len(0)) was a no-op.
Now uses fallocate(FALLOC_FL_KEEP_SIZE) on Linux to actually reserve
disk blocks without changing the visible file size.
Matches Go's architecture where EC volumes are managed per disk location,
enabling correct per-location max volume count calculation and proper
distribution of EC shards across disks.
When -max=0, dynamically calculate max volume count based on free disk
space, existing volumes, and EC shard count — matching Go's
MaybeAdjustVolumeMax(). Recalculate on each heartbeat tick and when
volume_size_limit changes from master.
Go's flag package uses single dash (-port) while clap uses double dash
(--port). Add normalize_args_vec() to convert single-dash long options
to double-dash before clap parsing, so both formats work. Update test
framework to use single-dash flags matching Go convention.
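The normalization rule, sketched here in Go rather than Rust. The exact heuristics in normalize_args_vec may differ; this version assumes a long option is a single dash followed by a letter and at least one more character:

```go
package main

import "fmt"

// isSingleDashLong reports whether arg looks like a Go-style long
// option such as "-port" or "-max=0". Short flags ("-p"), existing
// double-dash flags, and negative numbers ("-10") are left alone.
func isSingleDashLong(arg string) bool {
	if len(arg) < 3 || arg[0] != '-' || arg[1] == '-' {
		return false
	}
	c := arg[1]
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// normalizeArgs rewrites single-dash long options to double-dash
// form and returns a new slice; the input is not modified.
func normalizeArgs(args []string) []string {
	out := make([]string, 0, len(args))
	for _, a := range args {
		if isSingleDashLong(a) {
			a = "-" + a
		}
		out = append(out, a)
	}
	return out
}

func main() {
	fmt.Println(normalizeArgs([]string{"-port", "8080", "--max=0", "-dir.idx=/idx"}))
}
```

One trade-off of this heuristic: bundled short flags such as "-abc" would be read as a long option, so it only suits a CLI that does not use bundling.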
Port of Go's CompactMap: segmented sorted arrays with compressed keys.
NeedleId is split into chunk (u64) and compact key (u16), reducing
per-entry memory from ~40-48 bytes (HashMap) to ~10 bytes.
For 1M needles: ~10 MB instead of ~40-48 MB.
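A sketch of the key split and the memory estimate. The segmentChunkSize value is hypothetical; any value up to 65536 keeps the remainder within 16 bits, and the real CompactMap may split keys differently:

```go
package main

import "fmt"

// segmentChunkSize is a hypothetical chunk width for illustration.
const segmentChunkSize = 50000

// splitKey breaks a needle id into a chunk number and a 16-bit
// compact key relative to the start of that chunk.
func splitKey(id uint64) (chunk uint64, compact uint16) {
	return id / segmentChunkSize, uint16(id % segmentChunkSize)
}

// joinKey is the inverse of splitKey.
func joinKey(chunk uint64, compact uint16) uint64 {
	return chunk*segmentChunkSize + uint64(compact)
}

// estimateBytes multiplies an entry count by a per-entry cost.
func estimateBytes(entries, bytesPerEntry int) int {
	return entries * bytesPerEntry
}

func main() {
	chunk, compact := splitKey(123456)
	fmt.Println(chunk, compact, joinKey(chunk, compact))
	fmt.Println(estimateBytes(1_000_000, 10)) // 10000000, about 10 MB
}
```

Only the 16-bit remainder is stored per entry; the chunk number is shared by every entry in a segment, which is where most of the saving over a HashMap keyed by the full id comes from.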
Three improvements to RedbNeedleMap:
- Use Durability::None on all write transactions since .idx is the
crash recovery source and redb is always rebuildable from it
- Delete stale .rdb before rebuild in load_from_idx to prevent
leftover entries surviving a crash
- Reuse existing .rdb on clean restart by storing .idx file size in
a metadata table; incrementally replay only new .idx entries when
the .idx has grown since last build
Add .rdb (redb index) cleanup to removeVolumeFiles and vacuum commit
in Go code, for compatibility with mixed Rust/Go volume server
deployments. Route .rdb through dirIdx in FileName() like .idx/.ldb.
Closes the metrics gap between Rust and Go volume servers (8 → 23
metrics). Adds handler counters, vacuuming histograms, volume/disk
gauges, inflight request tracking, and concurrent limit gauges.
Centralizes request counting in store handlers instead of per-handler.
* fix: volume balance detection now returns multiple tasks per run (#8551)
Previously, detectForDiskType() returned at most 1 balance task per disk
type, making the MaxJobsPerDetection setting ineffective. The detection
loop now iterates within each disk type, planning multiple moves until
the imbalance drops below threshold or maxResults is reached. Effective
volume counts are adjusted after each planned move so the algorithm
correctly re-evaluates which server is overloaded.
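
A simplified sketch of the multi-move loop described above, not the
actual detectForDiskType code. The imbalance formula ((max-min)/avg),
the one-volume-per-move model, and all names here are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

type move struct{ From, To string }

// planMoves (hypothetical) plans one-volume moves from the fullest to the
// emptiest server until the imbalance is within threshold or maxResults
// moves are planned. maxResults <= 0 means no cap.
func planMoves(counts map[string]int, threshold float64, maxResults int) []move {
	effective := make(map[string]int, len(counts))
	names := make([]string, 0, len(counts))
	total := 0
	for name, c := range counts {
		effective[name] = c // work on a copy; the input is not modified
		names = append(names, name)
		total += c
	}
	sort.Strings(names) // deterministic iteration and tie-breaking
	if len(names) < 2 || total == 0 {
		return nil
	}
	avg := float64(total) / float64(len(names))

	var moves []move
	for maxResults <= 0 || len(moves) < maxResults {
		maxS, minS := names[0], names[0]
		for _, n := range names[1:] {
			if effective[n] > effective[maxS] {
				maxS = n
			}
			if effective[n] < effective[minS] {
				minS = n
			}
		}
		diff := effective[maxS] - effective[minS]
		// Stop when balanced, or when another move could not help.
		if diff <= 1 || float64(diff)/avg <= threshold {
			break
		}
		// Adjust effective counts so the next iteration re-evaluates
		// which server is overloaded.
		effective[maxS]--
		effective[minS]++
		moves = append(moves, move{From: maxS, To: minS})
	}
	return moves
}

func main() {
	counts := map[string]int{"a": 10, "b": 2, "c": 0}
	fmt.Println(planMoves(counts, 0.2, 0))
	fmt.Println(planMoves(counts, 0.2, 3))
}
```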
* fix: factor pending tasks into destination scoring and use UnixNano for task IDs
- Use UnixNano instead of Unix for task IDs to avoid collisions when
multiple tasks are created within the same second
- Adjust calculateBalanceScore to include LoadCount (pending + assigned
tasks) in the utilization estimate, so the destination picker avoids
stacking multiple planned moves onto the same target disk
* test: add comprehensive balance detection tests for complex scenarios
Cover multi-server convergence, max-server shifting, destination
spreading, pre-existing pending task skipping, no-duplicate-volume
invariant, and parameterized convergence verification across different
cluster shapes and thresholds.
* fix: address PR review findings in balance detection
- hasMore flag: compute from len(results) >= maxResults so the scheduler
knows more pages may exist, matching vacuum/EC handler pattern
- Exhausted server fallthrough: when no eligible volumes remain on the
current maxServer (all have pending tasks) or destination planning
fails, mark the server as exhausted and continue to the next
overloaded server instead of stopping the entire detection loop
- Return canonical destination server ID directly from createBalanceTask
instead of resolving via findServerIDByAddress, eliminating the
fragile address→ID lookup for adjustment tracking
- Fix bestScore sentinel: use math.Inf(-1) instead of -1.0 so disks
with negative scores (high pending load, same rack/DC) are still
selected as the best available destination
- Add TestDetection_ExhaustedServerFallsThrough covering the scenario
where the top server's volumes are all blocked by pre-existing tasks
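
To illustrate the bestScore sentinel fix, here is a small sketch with a
hypothetical disk type and picker functions; it is not the real
destination planner. With a -1.0 sentinel, a candidate set whose scores
are all negative yields no destination at all.

```go
package main

import (
	"fmt"
	"math"
)

type disk struct {
	ID    string
	Score float64
}

// pickBest uses math.Inf(-1) as the sentinel, so any finite score wins.
func pickBest(disks []disk) (disk, bool) {
	bestScore := math.Inf(-1)
	var best disk
	found := false
	for _, d := range disks {
		if d.Score > bestScore {
			bestScore, best, found = d.Score, d, true
		}
	}
	return best, found
}

// pickBestOldSentinel reproduces the earlier behaviour: scores at or
// below -1.0 can never be selected.
func pickBestOldSentinel(disks []disk) (disk, bool) {
	bestScore := -1.0
	var best disk
	found := false
	for _, d := range disks {
		if d.Score > bestScore {
			bestScore, best, found = d.Score, d, true
		}
	}
	return best, found
}

func main() {
	disks := []disk{{"d1", -5}, {"d2", -2}}
	fmt.Println(pickBest(disks))
	fmt.Println(pickBestOldSentinel(disks))
}
```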
* test: fix computeEffectiveCounts and add len guard in no-duplicate test
- computeEffectiveCounts now takes a servers slice to seed counts for all
known servers (including empty ones) and uses an address→ID map from
the topology spec instead of scanning metrics, so destination servers
with zero initial volumes are tracked correctly
- TestDetection_NoDuplicateVolumesAcrossIterations now asserts len > 1
before checking duplicates, so the test actually fails if Detection
regresses to returning a single task
* fix: remove redundant HasAnyTask check in createBalanceTask
The HasAnyTask check in createBalanceTask duplicated the same check
already performed in detectForDiskType's volume selection loop.
Since detection runs single-threaded (MaxDetectionConcurrency: 1),
no race can occur between the two points.
* fix: consistent hasMore pattern and remove double-counted LoadCount in scoring
- Adopt vacuum_handler's hasMore pattern: over-fetch by 1, check
len > maxResults, and truncate — consistent truncation semantics
- Remove direct LoadCount penalty in calculateBalanceScore since
LoadCount is already factored into effectiveVolumeCount for
utilization scoring; bump utilization weight from 40 to 50 to
compensate for the removed 10-point load penalty
* fix: handle zero maxResults as no-cap, emit trace after trim, seed empty servers
- When MaxResults is 0 (omitted), treat as no explicit cap instead of
defaulting to 1; only apply the +1 over-fetch probe when caller
supplies a positive limit
- Move decision trace emission after hasMore/trim so the trace
accurately reflects the returned proposals
- Seed serverVolumeCounts from ActiveTopology so servers that have a
matching disk type but zero volumes are included in the imbalance
calculation and MinServerCount check
* fix: nil-guard clusterInfo, uncap legacy DetectionFunc, deterministic disk type order
- Add early nil guard for clusterInfo in Detection to prevent panics
in downstream helpers (detectForDiskType, createBalanceTask)
- Change register.go DetectionFunc wrapper from maxResults=1 to 0
(no cap) so the legacy code path returns all detected tasks
- Sort disk type keys before iteration so results are deterministic
when maxResults spans multiple disk types (HDD/SSD)
* fix: don't over-fetch in stateful detection to avoid orphaned pending tasks
Detection registers planned moves in ActiveTopology via AddPendingTask,
so requesting maxResults+1 would create an extra pending task that gets
discarded during trim. Use len(results) >= maxResults as the hasMore
signal instead, which is correct since Detection already caps internally.
* fix: return explicit truncated flag from Detection instead of approximating
Detection now returns (results, truncated, error) where truncated is true
only when the loop stopped because it hit maxResults, not when it ran out
of work naturally. This eliminates false hasMore signals when detection
resolves the imbalance with exactly maxResults results.
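
A sketch contrasting the explicit flag with the len-based approximation.
The detect function and its "needed" parameter (the number of moves
required to balance) are hypothetical stand-ins for the real Detection.

```go
package main

import "fmt"

// detect (hypothetical) plans up to maxResults of the needed moves and
// reports truncated=true only if it stopped at the cap with work left.
// maxResults <= 0 means no cap.
func detect(needed, maxResults int) (results []int, truncated bool) {
	for i := 0; i < needed; i++ {
		if maxResults > 0 && len(results) >= maxResults {
			return results, true // cap reached, more work remains
		}
		results = append(results, i)
	}
	return results, false // ran out of work naturally
}

// approxHasMore is the approximation the explicit flag replaces.
func approxHasMore(results []int, maxResults int) bool {
	return maxResults > 0 && len(results) >= maxResults
}

func main() {
	for _, needed := range []int{2, 3, 5} {
		results, truncated := detect(needed, 3)
		fmt.Println(needed, len(results), truncated, approxHasMore(results, 3))
	}
}
```

The two signals differ only when the work needed equals the cap: the
approximation reports more work, the explicit flag does not.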
* cleanup: simplify detection logic and remove redundancies
- Remove redundant clusterInfo nil check in detectForDiskType since
Detection already guards against nil clusterInfo
- Remove adjustments loop for destination servers not in
serverVolumeCounts — topology seeding ensures all servers with
matching disk type are already present
- Merge two-loop min/max calculation into a single loop: min across
all servers, max only among non-exhausted servers
- Replace magic number 100 with len(metrics) for minC initialization
in convergence test
* fix: accurate truncation flag, deterministic server order, indexed volume lookup
- Track balanced flag to distinguish "hit maxResults cap" from "cluster
balanced at exactly maxResults" — truncated is only true when there's
genuinely more work to do
- Sort servers for deterministic iteration and tie-breaking when
multiple servers have equal volume counts
- Pre-index volumes by server with per-server cursors to avoid
O(maxResults * volumes) rescanning on each iteration
- Add truncation flag assertions to RespectsMaxResults test: true when
capped, false when detection finishes naturally
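
One way to picture the per-server index with cursors is the sketch
below. The volume and volumeIndex types are hypothetical, and the sort
by ID is included only to keep the sketch deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

type volume struct {
	ID     uint32
	Server string
}

// volumeIndex (hypothetical) groups volumes by server and remembers, per
// server, how far selection has advanced.
type volumeIndex struct {
	byServer map[string][]volume
	cursor   map[string]int
}

func newVolumeIndex(vols []volume) *volumeIndex {
	idx := &volumeIndex{
		byServer: make(map[string][]volume),
		cursor:   make(map[string]int),
	}
	for _, v := range vols {
		idx.byServer[v.Server] = append(idx.byServer[v.Server], v)
	}
	for _, list := range idx.byServer {
		sort.Slice(list, func(i, j int) bool { return list[i].ID < list[j].ID })
	}
	return idx
}

// next returns the next unvisited volume on server for which skip is
// false. Each volume is examined at most once across all calls, so the
// full volume list is never rescanned.
func (x *volumeIndex) next(server string, skip func(volume) bool) (volume, bool) {
	list := x.byServer[server]
	for x.cursor[server] < len(list) {
		v := list[x.cursor[server]]
		x.cursor[server]++
		if skip == nil || !skip(v) {
			return v, true
		}
	}
	return volume{}, false
}

func main() {
	idx := newVolumeIndex([]volume{{3, "a"}, {1, "a"}, {2, "b"}, {5, "a"}})
	hasPendingTask := func(v volume) bool { return v.ID == 3 }
	for {
		v, ok := idx.next("a", hasPendingTask)
		if !ok {
			break
		}
		fmt.Println(v.ID)
	}
}
```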
* fix: seed trace server counts from ActiveTopology to match detection logic
The decision trace was building serverVolumeCounts only from metrics,
missing zero-volume servers seeded from ActiveTopology by Detection.
This could cause the trace to report wrong server counts, incorrect
imbalance ratios, or spurious "too few servers" messages. Pass
activeTopology into the trace function and seed server counts the
same way Detection does.
* fix: don't exhaust server on per-volume planning failure, sort volumes by ID
- When createBalanceTask returns nil, continue to the next volume on
the same server instead of marking the entire server as exhausted.
The failure may be volume-specific (not found in topology, pending
task registration failed) and other volumes on the server may still
be viable candidates.
- Sort each server's volume slice by VolumeID after pre-indexing so
volume selection is fully deterministic regardless of input order.
* fix: use require instead of assert to prevent nil dereference panic in CORS test
The test used assert.NoError (non-fatal) for GetBucketCors, then
immediately accessed getResp.CORSRules. When the API returns an error,
getResp is nil causing a panic. Switch to require.NoError/NotNil/Len
so the test stops before dereferencing a nil response.
* fix: deterministic disk tie-breaking and stronger pre-existing task test
- Sort available disks by NodeID then DiskID before scoring so
destination selection is deterministic when two disks score equally
- Add task count bounds assertion to SkipsPreExistingPendingTasks test:
with 15 of 20 volumes already having pending tasks, at least 1 new
task (the imbalance still exists) and at most 5 should be created
* fix: seed adjustments from existing pending/assigned tasks to prevent over-scheduling
Detection now calls ActiveTopology.GetTaskServerAdjustments() to
initialize the adjustments map with source/destination deltas from
existing pending and assigned balance tasks. This ensures
effectiveCounts reflects in-flight moves, preventing the algorithm
from planning additional moves in the same direction when prior
moves already address the imbalance.
Added GetTaskServerAdjustments(taskType) to ActiveTopology which
iterates pending and assigned tasks, decrementing source servers
and incrementing destination servers for the given task type.
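
A rough sketch of the adjustment calculation described above. The task
struct, status constants, and the free function taskServerAdjustments
are assumptions for illustration; the real method lives on
ActiveTopology and its types may differ.

```go
package main

import "fmt"

type taskStatus int

const (
	statusPending taskStatus = iota
	statusAssigned
	statusCompleted
)

// task is a hypothetical, simplified view of a scheduled task.
type task struct {
	Type        string
	Status      taskStatus
	Source      string
	Destination string
}

// taskServerAdjustments returns per-server volume count deltas implied by
// pending and assigned tasks of the given type: -1 for each source and
// +1 for each destination.
func taskServerAdjustments(tasks []task, taskType string) map[string]int {
	adjustments := make(map[string]int)
	for _, t := range tasks {
		if t.Type != taskType {
			continue
		}
		if t.Status != statusPending && t.Status != statusAssigned {
			continue // finished tasks are already reflected in real counts
		}
		adjustments[t.Source]--
		adjustments[t.Destination]++
	}
	return adjustments
}

func main() {
	tasks := []task{
		{"balance", statusPending, "s1", "s2"},
		{"balance", statusAssigned, "s1", "s3"},
		{"balance", statusCompleted, "s1", "s2"},
		{"vacuum", statusPending, "s2", "s2"},
	}
	fmt.Println(taskServerAdjustments(tasks, "balance"))
}
```

Adding these deltas to the raw volume counts gives the effective counts
that detection starts from, so in-flight moves are not planned twice.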