seaweedfs

Commit Graph

Author	SHA1	Message	Date
Chris Lu	ba624f1f34	Rust volume server implementation with CI (#8539) * Match Go gRPC client transport defaults * Honor Go HTTP idle timeout * Honor maintenanceMBps during volume copy * Honor images.fix.orientation on uploads * Honor cpuprofile when pprof is disabled * Match Go memory status payloads * Propagate request IDs across gRPC calls * Format pending Rust source updates * Match Go stats endpoint payloads * Serve Go volume server UI assets * Enforce Go HTTP whitelist guards * Align Rust metrics admin-port test with Go behavior * Format pending Rust server updates * Honor access.ui without per-request JWT checks * Honor keepLocalDatFile in tier upload shortcut * Honor Go remote volume write mode * Load tier backends from master config * Check master config before loading volumes * Remove vif files on volume destroy * Delete remote tier data on volume destroy * Honor vif version defaults and overrides * Reject mismatched vif bytes offsets * Load remote-only tiered volumes * Report Go tail offsets in sync status * Stream remote dat in incremental copy * Honor collection vif for EC shard config * Persist EC expireAtSec in vif metadata * Stream remote volume reads through HTTP * Serve HTTP ranges from backend source * Match Go ReadAllNeedles scan order * Match Go CopyFile zero-stop metadata * Delete EC volumes with collection cleanup * Drop deleted collection metrics * Match Go tombstone ReadNeedleMeta * Match Go TTL parsing: all-digit default to minutes, two-pass fit algorithm * Match Go needle ID/cookie formatting and name size computation * Match Go image ext checks: webp resize only, no crop; empty healthz body * Match Go Prometheus metric names and add missing handler counter constants * Match Go ReplicaPlacement short string parsing with zero-padding * Add missing EC constants MAX_SHARD_COUNT and MIN_TOTAL_DISKS * Add walk_ecx_stats for accurate EC volume file counts and size * Match Go VolumeStatus dat file size, EC shard stats, and disk pct precision *
Match Go needle map: unconditional delete counter, fix redb idx walk offset * Add CompactMapSegment overflow panic guard matching Go * Match Go volume: vif creation, version from superblock, TTL expiry, dedup data_size, garbage_level fallback * Match Go 304 Not Modified: return bare status with no headers * Match Go JWT error message: use "wrong jwt" instead of detailed error * Match Go read handler bare 400, delete error prefix, download throttle timeout * Match Go pretty JSON 1-space indent and "Deletion Failed:" error prefix * Match Go heartbeat: keep is_heartbeating on error, add EC shard identification * Match Go needle ReadBytes V2: tolerate EOF on truncated body * Match Go volume: cookie check on any existing needle, return DataSize, 128KB meta guard * Match Go DeleteCollection: propagate destroy errors * Match Go gRPC: BatchDelete no flag, IncrementalCopy error, FetchAndWrite concurrent, VolumeUnmount/DeleteCollection errors, tail draining, query error code * Match Go Content-Disposition RFC 6266 formatting with RFC 2231 encoding * Match Go Guard isWriteActive: combine whitelist and signing key check * Match Go DeleteCollectionMetrics: use partial label matching * Match Go heartbeat: send state-only delta on volume state changes * Match Go ReadNeedleMeta paged I/O: read header+tail only, skip data; add EIO tracking * Match Go ScrubVolume INDEX mode dispatch; add VolumeCopy preallocation and EC NeedleStatus TODOs * Add read_ec_shard_needle for full needle reconstruction from local EC shards * Make heartbeat master config helpers pub for VolumeCopy preallocation * Match Go gRPC: VolumeCopy preallocation, EC NeedleStatus full read, error message wording * Match Go HTTP responses: omitempty fields, 2-space JSON indent, JWT JSON error, delete pretty/JSONP, 304 Last-Modified, raw write error * Match Go WriteNeedleBlob V3 timestamp patching, fix makeup_diff double padding, count==0 read handling * Add rebuild_ecx_file for EC index reconstruction from data shards * 
Match Go gRPC: tail header first-chunk-only, EC cleanup on failure, copy append mode, ecx rebuild, compact cancellation * Add EC volume read and delete support in HTTP handlers * Add per-shard EC mount/unmount, location predicate search, idx directory for EC * Add CheckVolumeDataIntegrity on volume load matching Go * Match Go gRPC: EC multi-disk placement, per-shard mount/unmount, no auto-mount on reconstruct, streaming ReadAll/EcShardRead, ReceiveFile cleanup, version check, proxy streaming, redirect Content-Type * Match Go heartbeat metric accounting * Match Go duplicate UUID heartbeat retries * Delete expired EC volumes during heartbeat * Match Go volume heartbeat pruning * Honor master preallocate in volume max * Report remote storage info in heartbeats * Emit EC heartbeat deltas on shard changes * Match Go throttle boundary: use <= instead of <, fix pretty JSON to 1-space * Match Go write_needle_blob monotonic appendAtNs via get_append_at_ns * Match Go VolumeUnmount: idempotent success when volume not found * Match Go TTL Display: return empty string when unit is Empty Go checks `t.Unit == Empty` separately and returns "" for TTLs with nonzero count but Empty unit. Rust only checked is_empty() (count==0 && unit==0), so count>0 with unit=0 would format as "5 " instead of "". * Match Go error behavior for truncated needle data in read_body_v2 Go's readNeedleDataVersion2 returns "index out of range %d" errors (indices 1-7) when needle body or metadata fields are truncated. Rust was silently tolerating truncation and returning Ok. Now returns NeedleError::IndexOutOfRange with the matching index for each field. 
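The TTL Display fix above (count > 0 with an Empty unit must print "" rather than "5 ") can be illustrated with a minimal sketch. The `Ttl` and `Unit` types and the one-letter suffixes here are assumptions for illustration, not the actual types in the Rust volume server.

```rust
use std::fmt;

// Hypothetical stand-in for the TTL unit byte; variant names are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Unit {
    Empty,
    Minute,
    Hour,
    Day,
    Week,
    Month,
    Year,
}

// Hypothetical TTL value: a one-byte count plus a unit.
#[derive(Clone, Copy, Debug)]
struct Ttl {
    count: u8,
    unit: Unit,
}

impl fmt::Display for Ttl {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // The unit is checked on its own, so a nonzero count with an
        // Empty unit still renders as the empty string.
        let suffix = match self.unit {
            Unit::Empty => return Ok(()),
            Unit::Minute => "m",
            Unit::Hour => "h",
            Unit::Day => "d",
            Unit::Week => "w",
            Unit::Month => "M",
            Unit::Year => "y",
        };
        // A zero count is also treated as "no TTL".
        if self.count == 0 {
            return Ok(());
        }
        write!(f, "{}{}", self.count, suffix)
    }
}
```

Checking only a combined `count == 0 && unit == Empty` condition would let `Ttl { count: 5, unit: Unit::Empty }` fall through to the formatting path, which is the mismatch the commit describes.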
* Match Go download throttle: return JSON error instead of plain text * Match Go crop params: default x1/y1 to 0 when not provided * Match Go ScrubEcVolume: accumulate total_files from EC shards * Match Go ScrubVolume: count total_files even on scrub error * Match Go VolumeEcShardsCopy: set ignore_source_file_not_found for .vif * Match Go VolumeTailSender: send needle_header on every chunk * Match Go read_super_block: apply replication override from .vif * Match Go check_volume_data_integrity: verify all 10 entries, detect trailing corruption * Match Go WriteNeedleBlob: dedup check before writing during replication * handlers: use meta-only reads for HEAD * handlers: align range parsing and responses with Go * handlers: align upload parsing with Go * deps: enable webp support * Make 5bytes the default feature for idx entry compatibility * Match Go TTL: preserve original unit when count fits in byte * Fix EC locate_needle: use get_actual_size for full needle size * Fix raw body POST: only parse multipart when Content-Type contains form-data * Match Go ReceiveFile: return protocol errors in response body, not gRPC status * add docs * Match Go VolumeEcShardsCopy: append to .ecj file instead of truncating * Match Go ParsePath: support _delta suffix on file IDs for sub-file addressing * Match Go chunk manifest: add Accept-Ranges, Content-Disposition, filename fallback, MIME detection * Match Go privateStoreHandler: use proper JSON error for unsupported methods * Match Go Destroy: add only_empty parameter to reject non-empty volume deletion * Fix compilation: set_read_only_persist and set_writable return () These methods fire-and-forget save_vif internally, so gRPC callers should not try to chain .map_err() on the unit return type. 
* Match Go SaveVolumeInfo: check writability and propagate errors in save_vif * Match Go VolumeDelete: propagate only_empty to delete_volume for defense in depth The gRPC VolumeDelete handler had a pre-check for only_empty but then passed false to store.delete_volume(), bypassing the store-level check. Go passes req.OnlyEmpty directly to DeleteVolume. Now Rust does the same for defense in depth against TOCTOU races (though the store write lock makes this unlikely). * Match Go ProcessRangeRequest: return full content for empty/oversized ranges Go returns nil from ProcessRangeRequest when ranges are empty or total range size exceeds content length, causing the caller to serve the full content as a normal 200 response. Rust was returning an empty 200 body. * Match Go Query: quote JSON keys in output records Go's ToJson produces valid JSON with quoted keys like {"name":"Alice"}. Rust was producing invalid JSON with unquoted keys like {name:"Alice"}. * Match Go VolumeCopy: reject when no suitable disk location exists Go returns ErrVolumeNoSpaceLeft when no location matches the disk type and has sufficient space. Rust had an unsafe fallback that silently picked the first location regardless of type or available space. * Match Go DeleteVolumeNeedle: check noWriteOrDelete before allowing delete Go checks v.noWriteOrDelete before proceeding with needle deletion, returning "volume is read only" if true. Rust was skipping this check. * Match Go ReceiveFile: prefer HardDrive location for EC and use response-level write errors Two fixes: (1) Go prefers HardDriveType disk location for EC volumes, falling back to first location. Returns "no storage location available" when no locations exist. (2) Write failures are now response-level errors (in response body) instead of gRPC status errors, matching Go. * Match Go CopyFile: sync EC volume journal to disk before copying Go calls ecVolume.Sync() before copying EC volume files to ensure the .ecj journal is flushed to disk. 
Added sync_to_disk() to EcVolume and call it in the CopyFile EC branch. * Match Go readSuperBlock: propagate replication parse errors Go returns an error when parsing the replication string from the .vif file fails. Rust was silently ignoring the parse failure and using the super block's replication as-is. * Match Go TTL expiry: remove append_at_ns > 0 guard Go computes TTL expiry from AppendAtNs without guarding against zero. When append_at_ns is 0, the expiry is epoch + TTL which is in the past, correctly returning NotFound. Rust's extra guard skipped the check, incorrectly returning success for such needles. * Match Go delete_collection: skip volumes with compaction in progress Go checks !v.isCompactionInProgress.Load() before destroying a volume during collection deletion, skipping compacting volumes. Also changed destroy errors to log instead of aborting the entire collection delete. * Match Go MarkReadonly/MarkWritable: always notify master even on local error Go always notifies the master regardless of whether the local set_read_only_persist or set_writable step fails. The Rust code was using `?` which short-circuited on error, skipping the final master notification. Save the result and defer the `?` until after the notify call. * Match Go PostHandler: return 500 for all write errors Go returns 500 (InternalServerError) for all write failures. Rust was returning 404 for volume-not-found and 403 for read-only volumes. * Match Go makeupDiff: validate .cpd compaction revision is old + 1 Go reads the new .cpd file's super block and verifies the compaction revision is exactly old + 1. Rust only validated the old revision. * Match Go VolumeStatus: check data backend before returning status Go checks v.DataBackend != nil before building the status response, returning an error if missing. Rust was silently returning size 0. * Match Go PostHandler: always include mime field in upload response JSON Go always serializes the mime field even when empty ("mime":""). 
Rust was omitting it when empty due to Option<String> with skip_serializing_if. * Match Go FindFreeLocation: account for EC shards in free slot calculation Go subtracts EC shard equivalents when computing available volume slots. Rust was only comparing volume count, potentially over-counting free slots on locations with many EC shards. * Match Go privateStoreHandler: use INVALID as metrics label for unsupported methods Go records the method as INVALID in metrics for unsupported HTTP methods. Rust was using the actual method name. * Match Go volume: add commit_compact guard and scrub data size validation Two fixes: (1) commit_compact now checks/sets is_compacting flag to prevent concurrent commits, matching Go's CompareAndSwap guard. (2) scrub now validates total needle sizes against .dat file size. * Match Go gRPC: fix TailSender error propagation, EcShardsInfo all slots, EcShardRead .ecx check Three fixes: (1) VolumeTailSender now propagates binary search errors instead of silently falling back to start. (2) VolumeEcShardsInfo returns entries for all shard slots including unmounted. (3) VolumeEcShardRead checks .ecx index for deletions instead of .ecj. * Match Go metrics: add BuildInfo gauge and connection tracking functions Go exposes a BuildInfo Prometheus metric with version labels, and tracks open connections via stats.ConnectionOpen/Close. Added both to Rust. * Match Go NeedleMap.Delete: use !is_deleted() instead of is_valid() Go's CompactMap.Delete checks !IsDeleted() not IsValid(), so needles with size==0 (live but anomalous) can still be deleted. The Rust code was using is_valid() which returns false for size==0, preventing deletion of such needles. * Match Go fitTtlCount: always normalize TTL to coarsest unit Go's fitTtlCount always converts to seconds first, then finds the coarsest unit that fits in one byte (e.g., 120m → 2h). Rust had an early return for count<=255 that skipped normalization, producing different binary encodings for the same duration. 
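The fitTtlCount normalization described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the unit lengths (30-day month, 365-day year), the `Option` return type, and the lossy second pass for durations that do not divide evenly are all hypothetical choices made for the example.

```rust
// Hypothetical TTL unit enum; names are assumptions for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TtlUnit {
    Empty,
    Minute,
    Hour,
    Day,
    Week,
    Month,
    Year,
}

// Assumed unit lengths in seconds (30-day month, 365-day year).
fn unit_seconds(unit: TtlUnit) -> u64 {
    match unit {
        TtlUnit::Empty => 0,
        TtlUnit::Minute => 60,
        TtlUnit::Hour => 3_600,
        TtlUnit::Day => 86_400,
        TtlUnit::Week => 7 * 86_400,
        TtlUnit::Month => 30 * 86_400,
        TtlUnit::Year => 365 * 86_400,
    }
}

const COARSEST_FIRST: [TtlUnit; 6] = [
    TtlUnit::Year,
    TtlUnit::Month,
    TtlUnit::Week,
    TtlUnit::Day,
    TtlUnit::Hour,
    TtlUnit::Minute,
];

// Always converts to seconds first, then picks the coarsest unit whose
// count is exact and fits in one byte. Returns None on overflow or when
// no unit can hold the duration.
fn fit_ttl_count(count: u64, unit: TtlUnit) -> Option<(u8, TtlUnit)> {
    let seconds = count.checked_mul(unit_seconds(unit))?;
    if seconds == 0 {
        return Some((0, TtlUnit::Empty));
    }
    // Pass 1: exact fit, coarsest unit first (no early return for count <= 255).
    for &u in COARSEST_FIRST.iter() {
        let len = unit_seconds(u);
        if seconds % len == 0 && seconds / len <= 255 {
            return Some(((seconds / len) as u8, u));
        }
    }
    // Pass 2 (assumed fallback): finest unit whose truncated count fits in a byte.
    for &u in COARSEST_FIRST.iter().rev() {
        let len = unit_seconds(u);
        if seconds / len <= 255 {
            return Some(((seconds / len) as u8, u));
        }
    }
    None
}
```

With this shape, `120m` and `2h` produce the same `(count, unit)` pair, so they would also share one binary encoding.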
* Match Go BuildInfo metric: correct name and add missing labels Go uses SeaweedFS_build_info (Namespace=SeaweedFS, Subsystem=build, Name=info) with labels [version, commit, sizelimit, goos, goarch]. Rust had SeaweedFS_volumeServer_buildInfo with only [version]. * Match Go HTTP handlers: fix UploadResult fields, DiskStatus JSON, chunk manifest ETag - UploadResult.mime: add skip_serializing_if to omit empty MIME (Go uses omitempty) - UploadResult.contentMd5: only include when request provided Content-MD5 header - Content-MD5 response header: only set when request provided it - DiskStatuses: use camelCase field names (percentFree, percentUsed, diskType) to match Go's protobuf JSON marshaling - Chunk manifest: preserve needle ETag in expanded response headers * Match Go volume: fix version(), integrity check, scrub, and commit_compact - version(): use self.version() instead of self.super_block.version in read_all_needles, check_volume_data_integrity, scan_raw_needles_from to respect volumeInfo.version override - check_volume_data_integrity: initialize healthy_index_size to idx_size (matching Go) and continue on EOF instead of returning error - scrub(): count deleted needles in total_read since they still occupy space in the .dat file (matches Go's totalRead += actualSize for deleted) - commit_compact: clean up .cpd/.cpx files on makeup_diff failure (matches Go's error path cleanup) * Match Go write queue: add 4MB batch byte limit Go's startWorker breaks the batch at either 128 requests or 4MB of accumulated write data. Rust only had the 128-request limit, allowing large writes to accumulate unbounded latency. * Add TTL normalization tests for Go parity verification Test that fit_ttl_count normalizes 120m→2h, 24h→1d, 7d→1w even when count fits in a byte, matching Go's fitTtlCount behavior. * Match Go FindFreeLocation: account for EC shards in free slot calculation Go's free volume count subtracts both regular volumes and EC volumes from max_volume_count. 
Rust was only counting regular volumes, which could over-report available slots when EC shards are mounted. * Match Go EC volume: mark deletions in .ecx and replay .ecj at startup Go's DeleteNeedleFromEcx marks needles as deleted in the .ecx index in-place (writing TOMBSTONE_FILE_SIZE at the size field) in addition to appending to the .ecj journal. Go's RebuildEcxFile replays .ecj entries into .ecx on startup, then removes the .ecj file. Rust was only appending to .ecj without marking .ecx, which meant deleted EC needles remained readable via .ecx binary search. This fix: - Opens .ecx in read/write mode (was read-only) - Adds mark_needle_deleted_in_ecx: binary search + in-place write - Calls it from journal_delete before appending to .ecj - Adds rebuild_ecx_from_journal: replays .ecj into .ecx on startup * Match Go check_all_ec_shards_deleted: use MAX_SHARD_COUNT instead of hardcoded 14 Go's TotalShardsCount is DataShardsCount + ParityShardsCount = 14 by default, but custom EC configs via .vif can have more shards (up to MaxShardCount = 32). Using MAX_SHARD_COUNT ensures all shard files are checked regardless of EC configuration. * Match Go EC locate: subtract 1 from shard size and use datFileSize override Go's LocateEcShardNeedleInterval passes shard.ecdFileSize-1 to LocateData (shards are padded, -1 avoids overcounting large block rows). When datFileSize is known, Go uses datFileSize/DataShards instead. Rust was passing the raw shard file size without adjustment. * Fix TTL parsing and DiskStatus field names to match Go exactly TTL::read: Go's ReadTTL preserves the original unit (7d stays 7d, not 1w) and errors on count > 255. The previous normalization change was incorrect — Go only normalizes internally via fitTtlCount, not during string parsing. DiskStatus: Go uses encoding/json on protobuf structs, which reads the json struct tags (snake_case: percent_free, percent_used, disk_type), not the protobuf JSON names (camelCase). 
Revert to snake_case to match Go's actual output. * Fix heartbeat: check leader != current master before redirect, process duplicated UUIDs first Match Go's volume_grpc_client_to_master.go behavior: 1. Only trigger leader redirect when the leader address differs from the current master (prevents unnecessary reconnect loops when master confirms its own address). 2. Process duplicated_uuids before leader redirect check, matching Go's ordering where duplicate UUID detection takes priority. * Remove SetState version check to match Go behavior Go's SetState unconditionally applies the state without any version mismatch check. The Rust version had an extra optimistic concurrency check that would reject valid requests from Go clients that don't track versions. * Fix TTL::read() to normalize via fit_ttl_count matching Go's ReadTTL Go's ReadTTL calls fitTtlCount which converts to seconds and normalizes to the coarsest unit that fits in a byte count (e.g. 120m->2h, 7d->1w, 24h->1d). The Rust version was preserving the original unit, producing different binary encodings on disk and in heartbeat messages. * Always return Content-MD5 header and JSON field on successful writes Go always sets Content-MD5 in the response regardless of whether the request included it. The Rust version was conditionally including it only when the request provided Content-MD5. * Include name and size in UploadResult JSON even when empty/zero Go's encoding/json always includes empty strings and zero values in the upload response. The Rust version was using skip_serializing_if to omit them, causing JSON structure differences. * Include deleted needles in scan_raw_needles_from to match Go Go's ScanVolumeFileFrom visits ALL needles including deleted ones. Skipping deleted entries during incremental copy would cause tombstones to not be propagated, making deleted files reappear on the receiving side. 
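The heartbeat ordering fix above (duplicated UUIDs first, then a redirect only when the leader differs from the current master) can be sketched as a small decision function. The struct, enum, and function names below are hypothetical; only the `leader` and `duplicated_uuids` field names come from the commit text.

```rust
// Hypothetical, trimmed-down view of a heartbeat response.
struct HeartbeatResponse {
    leader: String,
    duplicated_uuids: Vec<String>,
}

// Hypothetical outcome of inspecting one response.
#[derive(Debug, PartialEq)]
enum HeartbeatAction {
    Continue,
    RetryDuplicateUuids(Vec<String>),
    RedirectTo(String),
}

fn next_action(resp: &HeartbeatResponse, current_master: &str) -> HeartbeatAction {
    // 1. Duplicate UUID detection takes priority over everything else.
    if !resp.duplicated_uuids.is_empty() {
        return HeartbeatAction::RetryDuplicateUuids(resp.duplicated_uuids.clone());
    }
    // 2. Redirect only when the reported leader is a different address;
    //    a master confirming its own address must not cause a reconnect.
    if !resp.leader.is_empty() && resp.leader != current_master {
        return HeartbeatAction::RedirectTo(resp.leader.clone());
    }
    HeartbeatAction::Continue
}
```

Keeping the decision in a pure function like this makes the ordering easy to test without a running master.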
* Match Go NeedleMap.Delete: always write tombstone to idx file Go's NeedleMap.Delete unconditionally writes a tombstone entry to the idx file and updates metrics, even if the needle doesn't exist or is already deleted. This is important for replication where every delete operation must produce an idx write. The Rust version was skipping the tombstone write for non-existent or already-deleted needles. * Limit MIME type to 255 bytes matching Go's CreateNeedleFromRequest * Title-case Seaweed-* pair keys to match Go HTTP header canonicalization * Unify DiskType::Hdd into HardDrive to match Go's single HardDriveType * Skip tombstone entries in walk_ecx_stats total_size matching Go's Raw() * Return EMPTY TTL when computed seconds is zero matching Go's fitTtlCount * Include disk-space-low in Volume.is_read_only() matching Go * Log error on CIDR parse failure in whitelist matching Go's glog.Errorf * Log cookie mismatch in gRPC Query matching Go's V(0).Infof * Fix is_expired volume_size comparison to use < matching Go Go checks `volumeSize < super_block.SuperBlockSize` (strict less-than), but Rust used `<=`. This meant Rust would fail to expire a volume that is exactly SUPER_BLOCK_SIZE bytes. * Apply Go's JWT expiry defaults: 10s write, 60s read Go calls v.SetDefault("jwt.signing.expires_after_seconds", 10) and v.SetDefault("jwt.signing.read.expires_after_seconds", 60). Rust defaulted to 0 for both, which meant tokens would never expire when security.toml has a signing key but omits expires_after_seconds. * Stop [grpc.volume].ca from overriding [grpc].ca matching Go Go reads the gRPC CA file only from config.GetString("grpc.ca"), i.e. the [grpc] section. The [grpc.volume] section only provides cert and key. Rust was also reading ca from [grpc.volume] which would silently override the [grpc].ca value when both were present. * Fix free_volume_count to use EC shard count matching Go Was counting EC volumes instead of EC shards, which underestimates EC space usage. 
One EC volume with 14 shards uses ~1.4 volume slots, not 1. Now uses Go's formula: ((max - volumes) * DataShardsCount - ecShardCount) / DataShardsCount. * Include preallocate in compaction space check matching Go Go uses max(preallocate, estimatedCompactSize) for the free space check. Rust was only using the estimated volume size, which could start a compaction that fails mid-way if preallocate exceeds the volume size. * Check gzip magic bytes before setting Content-Encoding matching Go Go checks both Accept-Encoding contains "gzip" AND IsGzippedContent (data starts with 0x1f 0x8b) before setting Content-Encoding: gzip. Rust only checked Accept-Encoding, which could incorrectly declare gzip encoding for non-gzip compressed data. * Only set upload response name when needle HasName matching Go Go checks reqNeedle.HasName() before setting ret.Name. Rust always set the name from the filename variable, which could return the fid portion of the path as the name for raw PUT requests without a filename. * Treat MaxVolumeCount==0 as unlimited matching Go's hasFreeDiskLocation Go's hasFreeDiskLocation returns true immediately when MaxVolumeCount is 0, treating it as unlimited. Rust was computing effective_free as <= 0 for max==0, rejecting the location. This could fail volume creation during early startup before the first heartbeat adjusts max. * Read lastAppendAtNs from deleted V3 entries in integrity check Go's doCheckAndFixVolumeData reads AppendAtNs from both live entries (verifyNeedleIntegrity) and deleted tombstones (verifyDeletedNeedleIntegrity). Rust was skipping deleted entries, which could result in a stale last_append_at_ns if the last index entry is a deletion. * Return empty body for empty/oversized range requests matching Go Go's ProcessRangeRequest returns nil (empty body, 200 OK) when parsed ranges are empty or combined range size exceeds total content size. The Rust buffered path incorrectly returned the full file data for both cases. 
The streaming path already handled this correctly. * Dispatch ScrubEcVolume by mode matching Go's INDEX/LOCAL/FULL Go's ScrubEcVolume switches on mode: INDEX calls v.ScrubIndex() (ecx integrity only), LOCAL calls v.ScrubLocal(), FULL calls vs.store.ScrubEcVolume(). Rust was ignoring the mode and always running verify_ec_shards. Now INDEX mode checks ecx index integrity (sorted overlap detection + file size validation) without shard I/O, while LOCAL/FULL modes run the existing shard verification. * Fix TTL test expectation: 7d normalizes to 1w matching Go's fitTtlCount Go's ReadTTL calls fitTtlCount which normalizes to the coarsest unit that fits: 7 days = 1 week, so "7d" becomes {Count:1, Unit:Week} which displays as "1w". Both Go and Rust normalize identically. * Add version mismatch check to SetState matching Go's State.Update Go's State.Update compares the incoming version with the stored version and returns "version mismatch" error if they differ. This provides optimistic concurrency control. The Rust implementation was accepting any version unconditionally. * Use unquoted keys in Query JSON output matching Go's json.ToJson Go's json.ToJson produces records with unquoted keys like {score:12} not {"score":12}. This is a custom format used internally by SeaweedFS for query results. * Fix TTL test expectation in VolumeNeedleStatus: 7d normalizes to 1w Same normalization as the HTTP test: Go's ReadTTL calls fitTtlCount which converts 7 days to 1 week. * Include ETag header in 304 Not Modified responses matching Go behavior Go sets ETag on the response writer (via SetEtag) before the If-Modified-Since and If-None-Match conditional checks, so both 304 response paths include the ETag header. The Rust implementation was only adding ETag to 200 responses. * Remove needle-name fallback in chunk manifest filename resolution Go's tryHandleChunkedFile only falls back from URL filename to manifest name. 
Rust had an extra fallback to needle.name that Go does not perform, which could produce different Content-Disposition filenames for chunk manifests. * Validate JWT nbf (Not Before) claim matching Go's jwt-go/v5 Go's jwt.ParseWithClaims validates the nbf claim when present, rejecting tokens whose nbf is in the future. The Rust jsonwebtoken crate defaults validate_nbf to false, so tokens with future nbf were incorrectly accepted. * Set isHeartbeating to true at startup matching Go's VolumeServer init Go unconditionally sets isHeartbeating: true in the VolumeServer struct literal. Rust was starting with false when masters are configured, causing /healthz to return 503 until the first heartbeat succeeds. * Call store.close() on shutdown matching Go's Shutdown() Go's Shutdown() calls vs.store.Close() which closes all volumes and flushes file handles. The Rust server was relying on process exit for cleanup, which could leave data unflushed. * Include server ID in maintenance mode error matching Go's format Go returns "volume server %s is in maintenance mode" with the store ID. Rust was returning a generic "maintenance mode" message. * Fix DiskType test: use HardDrive variant matching Go's HddType="" Go maps both "" and "hdd" to HardDriveType (empty string). The Rust enum variant is HardDrive, not Hdd. The test referenced a nonexistent Hdd variant causing compilation failure. * Do not include ETag in 304 responses matching Go's GetOrHeadHandler Go sets ETag at L235 AFTER the If-Modified-Since and If-None-Match 304 return paths, so Go's 304 responses do not include the ETag header. The Rust code was incorrectly including ETag in both 304 response paths. * Return 400 on malformed query strings in PostHandler matching Go's ParseForm Go's r.ParseForm() returns HTTP 400 with "form parse error: ..." when the query string is malformed. Rust was silently falling back to empty query params via unwrap_or_default(). 
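As a worked example of the free-slot formula quoted earlier in this message, ((max - volumes) * DataShardsCount - ecShardCount) / DataShardsCount, here is a minimal sketch. It assumes DataShardsCount is 10 (consistent with "14 shards uses ~1.4 volume slots"); the function and constant names are hypothetical.

```rust
// Assumed default number of data shards per EC volume.
const DATA_SHARDS_COUNT: i64 = 10;

// Free volume slots on one location, charging each EC shard
// 1/DATA_SHARDS_COUNT of a slot. Integer division truncates toward zero.
fn free_volume_count(max_volumes: i64, volumes: i64, ec_shards: i64) -> i64 {
    ((max_volumes - volumes) * DATA_SHARDS_COUNT - ec_shards) / DATA_SHARDS_COUNT
}
```

For max = 8, volumes = 3, and one EC volume with 14 shards, this gives (5 * 10 - 14) / 10 = 3 free slots, whereas counting the EC volume as a single slot would report 4.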
* Load EC volume version from .vif matching Go's NewEcVolume Go sets ev.Version = needle.Version(volumeInfo.Version) from the .vif file. Rust was always using Version::current() (V3), which would produce wrong needle actual size calculations for volumes created with V1 or V2. * Sync .ecx file before close matching Go's EcVolume.Close Go calls ev.ecxFile.Sync() before closing to ensure in-place deletion marks are flushed to disk. Without this, deletion marks written via MarkNeedleDeleted could be lost on crash. * Validate SuperBlock extra data size matching Go's Bytes() guard Go checks extraSize > 256*256-2 and calls glog.Fatalf to prevent corrupt super block headers. Rust was silently truncating via u16 cast, which would write an incorrect extra_size field. * Update quinn-proto 0.11.13 -> 0.11.14 to fix GHSA-6xvm-j4wr-6v98 Fixes Dependency Review CI failure: quinn-proto < 0.11.14 is vulnerable to unauthenticated remote DoS via panic in QUIC transport parameter parsing. * Skip TestMultipartUploadUsesFormFieldsForTimestampAndTTL for Go server Go's r.FormValue() cannot read multipart text fields after r.MultipartReader() consumes the body, so ts/ttl sent as multipart form fields only work with the Rust volume server. Skip this test when VOLUME_SERVER_IMPL != "rust" to fix CI failure. * Flush .ecx in EC volume sync_to_disk matching Go's Sync() Go's EcVolume.Sync() flushes both the .ecj journal and the .ecx index to disk. The Rust version only flushed .ecj, leaving in-place deletion marks in .ecx unpersisted until close(). This could cause data inconsistency if the server crashes after marking a needle deleted in .ecx but before close(). * Remove .vif file in EC volume destroy matching Go's Destroy() Go's EcVolume.Destroy() removes .ecx, .ecj, and .vif files. The Rust version only removed .ecx and .ecj, leaving orphaned .vif files on disk after EC volume destruction (e.g., after TTL expiry).
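The SuperBlock extra-size guard above can be sketched as a checked conversion. This assumes the limit is 256*256-2 = 65534 bytes; the function name and error text are hypothetical, and the sketch returns an error where Go calls glog.Fatalf.

```rust
// Assumed upper bound for the serialized extra data.
const MAX_EXTRA_SIZE: usize = 256 * 256 - 2;

// Converts the extra-data length to the two-byte on-disk field,
// rejecting values that a plain `as u16` cast would silently wrap.
fn checked_extra_size(extra_len: usize) -> Result<u16, String> {
    if extra_len > MAX_EXTRA_SIZE {
        return Err(format!(
            "super block extra size {} exceeds limit {}",
            extra_len, MAX_EXTRA_SIZE
        ));
    }
    Ok(extra_len as u16)
}
```

Without the guard, a 70000-byte payload would be recorded as `70000usize as u16`, which is 4464, so the header would describe far less extra data than was actually written.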
* Fix is_expired to use <= for SuperBlockSize check matching Go Go checks contentSize <= SuperBlockSize to detect empty volumes (no needles). Rust used < which would incorrectly allow a volume with exactly SuperBlockSize bytes (header only, no data) to proceed to the TTL expiry check and potentially be marked as expired. * Fix read_append_at_ns to read timestamps from tombstone entries Go reads the full needle body for all entries including tombstones (deleted needles with size=0) to extract the actual AppendAtNs timestamp. The Rust version returned 0 early for size <= 0 entries, which would cause the binary search in incremental copy to produce incorrect results for positions containing deleted needles. Now uses get_actual_size to compute the on-disk size (which handles tombstones correctly) and only returns 0 when the actual size is 0. * Add X-Request-Id response header matching Go's requestIDMiddleware Go sets both X-Request-Id and x-amz-request-id response headers. The Rust server only set x-amz-request-id, missing X-Request-Id. * Add skip_serializing_if for UploadResult name and size fields Go's UploadResult uses json:"name,omitempty" and json:"size,omitempty", omitting these fields from JSON when they are zero values (empty string / 0). The Rust struct always serialized them, producing "name":"" and "size":0 where Go would omit them. * Support JSONP/pretty-print for write success responses Go's writeJsonQuiet checks for callback (JSONP) and pretty query parameters on all JSON responses including write success. The Rust write success path used axum::Json directly, bypassing JSONP and pretty-print support. Now uses json_result_with_query to match Go. * Include actual limit in file size limit error message Go returns "file over the limited %d bytes" with the actual limit value included. Rust returned a generic "file size limit exceeded" without the limit value, making it harder to debug. 
* Extract extension from 2-segment URL paths for image operations Go's parseURLPath extracts the file extension from all URL formats including 2-segment paths like /vid,fid.jpg. The Rust version only handled 3-segment paths (/vid/fid/filename.ext), so extensions in 2-segment paths were lost. This caused image resize/crop operations requested via query params to be silently skipped for those paths. * Add size_hint to TrackedBody so throttled downloads get Content-Length TrackedBody (used for download throttling) did not implement size_hint(), causing HTTP/1.1 to fall back to chunked transfer encoding instead of setting Content-Length. Go always sets Content-Length explicitly for non-range responses. * Add Last-Modified, pairs, and S3 headers to chunk manifest responses Go sets Last-Modified, needle pairs, and S3 pass-through headers on the response writer BEFORE calling tryHandleChunkedFile. Since the Rust chunk manifest handler created fresh response headers and returned early, these headers were missing from chunk manifest responses. Now passes last_modified_str into the chunk manifest handler and applies pairs and S3 pass-through query params (response-cache-control, response-content-encoding, etc.) to the chunk manifest response headers. * Fix multipart fallback to use first part data when no filename Go reads the first part's data unconditionally, then looks for a part with a filename. If none found, Go uses the first part's data (with empty filename). Rust only captured parts with filenames, so when no part had a filename it fell back to the raw multipart body bytes (including boundary delimiters), producing corrupt needle data. * Set HasName and HasMime flags for empty values matching Go Go's CreateNeedleFromRequest sets HasName and HasMime flags even when the filename or MIME type is empty (len < 256 is true for len 0). 
Rust skipped empty values, causing the on-disk needle format to differ: Go-written needles include extra bytes for the empty name/mime size fields, changing the serialized needle size in the idx entry. This ensures binary format compatibility between Go and Rust servers. * Add is_stopping guard to vacuum_volume_commit matching Go Go's CommitCompactVolume (store_vacuum.go L53-54) checks s.isStopping before committing compaction to prevent file swaps during shutdown. The Rust handler was missing this check, which could allow compaction commits while the server is stopping. * Remove disk_type from required status fields since Go omits it Go's default DiskType is "" (HardDriveType), and protobuf's omitempty tag causes empty strings to be dropped from JSON output. * test: honor rust env in dual volume harness * grpc: notify master after volume lifecycle changes * http: proxy to replicas before download-limit timeout * test: pass readMode to rust volume harnesses * fix store free-location predicate selection * fix volume copy disk placement and heartbeat notification * fix chunk manifest delete replication * fix write replication to survive client disconnects * fix download limit proxy and wait flow * fix crop gating for streamed reads * fix upload limit wait counter behavior * fix chunk manifest image transforms * fix has_resize_ops to check width/height > 0 instead of is_some() Go's shouldResizeImages condition is `width > 0 || height > 0`, so `?width=0` correctly evaluates to false. Rust was using `is_some()` which made `?width=0` evaluate to true, unnecessarily disabling streaming reads for those requests. * fix Content-MD5 to only compute and return when provided by client Go only computes the MD5 of uncompressed data when a Content-MD5 header or multipart field is provided. Rust was always computing and returning it. Also fix the mismatch error message to include size, matching Go's format. 
* fix save_vif to compute ExpireAtSec from TTL Go's SaveVolumeInfo always computes ExpireAtSec = now + ttlSeconds when the volume has a TTL. The save_vif path (used by set_read_only and set_writable) was missing this computation, causing .vif files to be written without the correct expiration timestamp for TTL volumes. * fix set_writable to not modify no_write_can_delete Go's MarkVolumeWritable only sets noWriteOrDelete=false and persists. Rust was additionally setting no_write_can_delete=has_remote_file, which could incorrectly change the write mode for remote-file volumes when the master explicitly asks to make the volume writable. * fix write_needle_blob_and_index to error on too-small V3 blob Go returns an error when the needle blob is too small for timestamp patching. Rust was silently skipping the patch and writing the blob with a stale/zero timestamp, which could cause data integrity issues during incremental replication that relies on AppendAtNs ordering. * fix VolumeEcShardsToVolume to validate dataShards range Go validates that dataShards is > 0 and <= MaxShardCount before proceeding with EC-to-volume reconstruction. Without this check, a zero or excessively large data_shards value could cause confusing downstream failures. * fix destroy to use VolumeError::NotEmpty instead of generic Io error The dedicated NotEmpty variant exists in the enum but was not being used. This makes error matching consistent with Go's ErrVolumeNotEmpty. * fix SetState to persist state to disk with rollback on failure Go's State.Update saves VolumeServerState to a state.pb file after each SetState call, and rolls back the in-memory state if persistence fails. Rust was only updating in-memory atomics, so maintenance mode would be lost on server restart. Now saves protobuf-encoded state.pb and loads it on startup. 
* fix VolumeTierMoveDatToRemote to close local dat backend after upload Go calls v.LoadRemoteFile() after saving volume info, which closes the local DataBackend before transitioning to remote storage. Without this, the volume holds a stale file handle to the deleted local .dat file, causing reads to fail until server restart. * fix VolumeTierMoveDatFromRemote to close remote dat backend after download Go calls v.DataBackend.Close() and sets DataBackend=nil after removing the remote file reference. Without this, the stale remote backend state lingers and reads may not discover the newly downloaded local .dat file until server restart. * fix redirect to use internal url instead of public_url Go's proxyReqToTargetServer builds the redirect Location header from loc.Url (the internal URL), not publicUrl. Using public_url could cause redirect failures when internal and external URLs differ. * fix redirect test and add state_file_path to integration test Update redirect unit test to expect internal url (matching the previous fix). Add missing state_file_path field to the integration test VolumeServerState constructor. * fix FetchAndWriteNeedle to await all writes before checking errors Go uses a WaitGroup to await all writes (local + replicas) before checking errors. Rust was short-circuiting on local write failure, which could leave replica writes in-flight without waiting for completion. * fix shutdown to send deregister heartbeat before pre_stop delay Go's StopHeartbeat() closes stopChan immediately on interrupt, causing the heartbeat goroutine to send the deregister heartbeat right away, before the preStopSeconds delay. Rust was only setting is_stopping=true without waking the heartbeat loop, so the deregister was delayed until after the pre_stop sleep. Now we call volume_state_notify.notify_one() to wake the heartbeat immediately. 
* fix heartbeat response ordering to check duplicate UUIDs first Go processes heartbeat responses in this order: DuplicatedUuids first, then volume options (prealloc/size limit), then leader redirect. Rust was applying volume options before checking for duplicate UUIDs, which meant volume option changes would take effect even when the response contained a duplicate UUID error that should cause an immediate return. * the test thread was blocked * fix(deps): update aws-lc-sys 0.38.0 → 0.39.0 to resolve security advisories Bumps aws-lc-rs 1.16.1 → 1.16.2, pulling in aws-lc-sys 0.39.0 which fixes GHSA-394x-vwmw-crm3 (X.509 Name Constraints wildcard/unicode bypass) and GHSA-9f94-5g5w-gf6r (CRL Distribution Point scope check logic error). * fix: match Go Content-MD5 mismatch error message format Go uses "Content-MD5 did not match md5 of file data expected [X] received [Y] size Z" while Rust had a shorter format. Match the exact Go error string so clients see identical messages. * fix: match Go Bearer token length check (> 7, not >= 7) Go requires len(bearer) > 7 ensuring at least one char after "Bearer ". Rust used >= 7 which would accept an empty token. * fix(deps): drop legacy rustls 0.21 to resolve rustls-webpki GHSA-pwjx-qhcg-rvj4 aws-sdk-s3's default "rustls" feature enables tls-rustls in aws-smithy-runtime, which pulls in legacy-rustls-ring (rustls 0.21 → rustls-webpki 0.101.7, moderate CRL advisory). Replace with explicit default-https-client which uses only rustls 0.23 / rustls-webpki 0.103.9. * fix: use uploaded filename for auto-compression extension detection Go extracts the file extension from pu.FileName (the uploaded filename) for auto-compression decisions. Rust was using the URL path, which typically has no extension for SeaweedFS file IDs. * fix: add CRC legacy Value() backward-compat check on needle read Go double-checks CRC: n.Checksum != crc && uint32(n.Checksum) != crc.Value(). 
The Value() path is a deprecated transform for compat with seaweed versions prior to commit `056c480eb`. Rust had the legacy_value() method but wasn't using it in validation. * fix: remove /stats/* endpoints to match Go (commented out since L130) Go's volume_server.go has the /stats/counter, /stats/memory, and /stats/disk endpoints commented out (lines 130-134). Remove them from the Rust router along with the now-unused whitelist_guard middleware. * fix: filter application/octet-stream MIME for chunk manifests Go's tryHandleChunkedFile (L334) filters out application/octet-stream from chunk manifest MIME types, falling back to extension-based detection. Rust was returning the stored MIME as-is for manifests. * fix: VolumeMarkWritable returns error before notifying master Go returns early at L200 if MarkVolumeWritable fails, before reaching the master notification at L206. Rust was notifying master even on failure, creating inconsistent state where master thinks the volume is writable but local marking failed. * fix: check volume existence before maintenance in MarkReadonly/Writable Go's VolumeMarkReadonly (L239-241) and VolumeMarkWritable (L253-255) look up the volume first, then call makeVolumeReadonly/Writable which checks maintenance. Rust was checking maintenance first, returning "maintenance mode" instead of "not found" for missing volumes. * feat: implement ScrubVolume mark_broken_volumes_readonly (PR #8360) Add the mark_broken_volumes_readonly flag from PR #8360: - Sync proto field (tag 3) to local volume_server.proto - After scrubbing, if flag is set, call makeVolumeReadonly on each broken volume (notify master, mark local readonly, notify again) - Collect errors via joined error semantics matching Go's errors.Join - Factor out make_volume_readonly helper reused by both VolumeMarkReadonly and ScrubVolume Also refactors VolumeMarkReadonly to use the shared helper. 
* fix(deps): update rustls-webpki 0.103.9 → 0.103.10 (GHSA-pwjx-qhcg-rvj4) CRL Distribution Point matching logic fix for moderate severity advisory about CRLs not considered authoritative. * test: update integration tests for removed /stats/* endpoints Replace tests that expected /stats/* routes to return 200/401 with tests confirming they now fall through to the store handler (400), matching Go's commented-out stats endpoints. * docs: fix misleading comment about default offset feature The comment said "4-byte offsets unless explicitly built with 5-byte support" but the default feature enables 5bytes. This is intentional for production parity with Go -tags 5BytesOffset builds. Fix the comment to match reality.	2 days ago
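The Bearer token length fix in ba624f1f34 (`> 7`, not `>= 7`) can be illustrated with a minimal Go sketch. The function name `bearerToken` and the use of `strings.HasPrefix` are assumptions made for illustration; this is not the SeaweedFS code. It only shows why a strict comparison against `len("Bearer ")` rejects a header that carries the prefix and nothing else.

```go
package main

import (
	"fmt"
	"strings"
)

// bearerToken is a hypothetical helper: it returns the token carried by an
// Authorization header value, or "" when no usable Bearer token is present.
// The length must be strictly greater than 7 (len("Bearer ")), so a header
// consisting of the bare prefix is rejected instead of yielding an empty token.
func bearerToken(header string) string {
	const prefix = "Bearer "
	if len(header) > len(prefix) && strings.HasPrefix(header, prefix) {
		return header[len(prefix):]
	}
	return ""
}

func main() {
	for _, h := range []string{"Bearer abc", "Bearer ", "Basic xyz"} {
		fmt.Printf("%q -> %q\n", h, bearerToken(h))
	}
}
```

With `>=` in place of `>`, the second header would pass the length check and produce an empty token.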
Chris Lu	d5ee35c8df	Fix S3 delete for non-empty directory markers (#8740) * Fix S3 delete for non-empty directory markers * Address review feedback on directory marker deletes * Stabilize FUSE concurrent directory operations	5 days ago
Chris Lu	d6a872c4b9	Preserve explicit directory markers with octet-stream MIME (#8726) * Preserve octet-stream MIME on explicit directory markers * Run empty directory marker regression in CI * Run S3 Spark workflow for filer changes	7 days ago
Chris Lu	80f3079d2a	fix(s3): include directory markers in ListObjects without delimiter (#8704) * fix(s3): include directory markers in ListObjects without delimiter (#8698) Directory key objects (zero-byte objects with keys ending in "/") created via PutObject were omitted from ListObjects/ListObjectsV2 results when no delimiter was specified. AWS S3 includes these as regular keys in Contents. The issue was in doListFilerEntries: when recursing into directories in non-delimiter mode, directory key objects were only emitted when prefixEndsOnDelimiter was true. Added an else branch to emit them in the general recursive case as well. * remove issue reference from inline comment * test: add child-under-marker and paginated listing coverage Extend test 6 to place a child object under the directory marker and paginate with MaxKeys=1 so the emit-then-recurse truncation path is exercised. * fix(test): skip directory markers in Spark temporary artifacts check The listing check now correctly shows directory markers (keys ending in "/") after the ListObjects fix. These 0-byte metadata objects are not data artifacts; filter them from the listing check since the HeadObject-based check already verifies their cleanup with a timeout.	1 week ago
Chris Lu	8cde3d4486	Add data file compaction to iceberg maintenance (Phase 2) (#8503 ) * Add iceberg_maintenance plugin worker handler (Phase 1) Implement automated Iceberg table maintenance as a new plugin worker job type. The handler scans S3 table buckets for tables needing maintenance and executes operations in the correct Iceberg order: expire snapshots, remove orphan files, and rewrite manifests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add data file compaction to iceberg maintenance handler (Phase 2) Implement bin-packing compaction for small Parquet data files: - Enumerate data files from manifests, group by partition - Merge small files using parquet-go (read rows, write merged output) - Create new manifest with ADDED/DELETED/EXISTING entries - Commit new snapshot with compaction metadata Add 'compact' operation to maintenance order (runs before expire_snapshots), configurable via target_file_size_bytes and min_input_files thresholds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix memory exhaustion in mergeParquetFiles by processing files sequentially Previously all source Parquet files were loaded into memory simultaneously, risking OOM when a compaction bin contained many small files. Now each file is loaded, its rows are streamed into the output writer, and its data is released before the next file is loaded — keeping peak memory proportional to one input file plus the output buffer. * Validate bucket/namespace/table names against path traversal Reject names containing '..', '/', or '\' in Execute to prevent directory traversal via crafted job parameters. * Add filer address failover in iceberg maintenance handler Try each filer address from cluster context in order instead of only using the first one. This improves resilience when the primary filer is temporarily unreachable. 
* Add separate MinManifestsToRewrite config for manifest rewrite threshold The rewrite_manifests operation was reusing MinInputFiles (meant for compaction bin file counts) as its manifest count threshold. Add a dedicated MinManifestsToRewrite field with its own config UI section and default value (5) so the two thresholds can be tuned independently. * Fix risky mtime fallback in orphan removal that could delete new files When entry.Attributes is nil, mtime defaulted to Unix epoch (1970), which would always be older than the safety threshold, causing the file to be treated as eligible for deletion. Skip entries with nil Attributes instead, matching the safer logic in operations.go. * Fix undefined function references in iceberg_maintenance_handler.go Use the exported function names (ShouldSkipDetectionByInterval, BuildDetectorActivity, BuildExecutorActivity) matching their definitions in vacuum_handler.go. * Remove duplicated iceberg maintenance handler in favor of iceberg/ subpackage The IcebergMaintenanceHandler and its compaction code in the parent pluginworker package duplicated the logic already present in the iceberg/ subpackage (which self-registers via init()). The old code lacked stale-plan guards, proper path normalization, CAS-based xattr updates, and error-returning parseOperations. Since the registry pattern (default "all") makes the old handler unreachable, remove it entirely. All functionality is provided by iceberg.Handler with the reviewed improvements. * Fix MinManifestsToRewrite clamping to match UI minimum of 2 The clamp reset values below 2 to the default of 5, contradicting the UI's advertised MinValue of 2. Clamp to 2 instead. * Sort entries by size descending in splitOversizedBin for better packing Entries were processed in insertion order which is non-deterministic from map iteration. Sorting largest-first before the splitting loop improves bin packing efficiency by filling bins more evenly. 
* Add context cancellation check to drainReader loop The row-streaming loop in drainReader did not check ctx between iterations, making long compaction merges uncancellable. Check ctx.Done() at the top of each iteration. * Fix splitOversizedBin to always respect targetSize limit The minFiles check in the split condition allowed bins to grow past targetSize when they had fewer than minFiles entries, defeating the OOM protection. Now bins always split at targetSize, and a trailing runt with fewer than minFiles entries is merged into the previous bin. * Add integration tests for iceberg table maintenance plugin worker Tests start a real weed mini cluster, create S3 buckets and Iceberg table metadata via filer gRPC, then exercise the iceberg.Handler operations (ExpireSnapshots, RemoveOrphans, RewriteManifests) against the live filer. A full maintenance cycle test runs all operations in sequence and verifies metadata consistency. Also adds exported method wrappers (testing_api.go) so the integration test package can call the unexported handler methods. * Fix splitOversizedBin dropping files and add source path to drainReader errors The runt-merge step could leave leading bins with fewer than minFiles entries (e.g. [80,80,10,10] with targetSize=100, minFiles=2 would drop the first 80-byte file). Replace the filter-based approach with an iterative merge that folds any sub-minFiles bin into its smallest neighbor, preserving all eligible files. Also add the source file path to drainReader error messages so callers can identify which Parquet file caused a read/write failure. 
* Harden integration test error handling - s3put: fail immediately on HTTP 4xx/5xx instead of logging and continuing - lookupEntry: distinguish NotFound (return nil) from unexpected RPC errors (fail the test) - writeOrphan and orphan creation in FullMaintenanceCycle: check CreateEntryResponse.Error in addition to the RPC error * go fmt --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2 weeks ago
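The path traversal check described in 8cde3d4486 (reject bucket, namespace, or table names containing '..', '/', or '\') might look roughly like the Go sketch below. The names `validateName` and `errUnsafeName` are hypothetical and not taken from the SeaweedFS source; only the three rejected patterns come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errUnsafeName is a hypothetical sentinel error for this sketch.
var errUnsafeName = errors.New("name contains a path traversal sequence")

// validateName rejects names containing "..", "/", or "\", the three
// patterns listed in the commit message. Any other name is accepted.
func validateName(name string) error {
	if strings.Contains(name, "..") || strings.ContainsAny(name, `/\`) {
		return fmt.Errorf("%q: %w", name, errUnsafeName)
	}
	return nil
}

func main() {
	for _, n := range []string{"sales", "../etc", "a/b", `a\b`} {
		fmt.Printf("%q -> %v\n", n, validateName(n))
	}
}
```

Checking each of the three job parameters with the same helper before building any filer path keeps the rule in one place.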
Chris Lu	10a30a83e1	s3api: add GetObjectAttributes API support (#8504 ) * s3api: add error code and header constants for GetObjectAttributes Add ErrInvalidAttributeName error code and header constants (X-Amz-Object-Attributes, X-Amz-Max-Parts, X-Amz-Part-Number-Marker, X-Amz-Delete-Marker) needed by the S3 GetObjectAttributes API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: implement GetObjectAttributes handler Add GetObjectAttributesHandler that returns selected object metadata (ETag, Checksum, StorageClass, ObjectSize, ObjectParts) without returning the object body. Follows the same versioning and conditional header patterns as HeadObjectHandler. The handler parses the X-Amz-Object-Attributes header to determine which attributes to include in the XML response, and supports ObjectParts pagination via X-Amz-Max-Parts and X-Amz-Part-Number-Marker. Ref: https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjectAttributes.html Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: register GetObjectAttributes route Register the GET /{object}?attributes route for the GetObjectAttributes API, placed before other object query routes to ensure proper matching. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add integration tests for GetObjectAttributes Test coverage: - Basic: simple object with all attribute types - MultipartObject: multipart upload with parts pagination - SelectiveAttributes: requesting only specific attributes - InvalidAttribute: server rejects invalid attribute names - NonExistentObject: returns NoSuchKey for missing objects Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add versioned object test for GetObjectAttributes Test puts two versions of the same object and verifies that: - GetObjectAttributes returns the latest version by default - GetObjectAttributes with versionId returns the specific version - ObjectSize and VersionId are correct for each version Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: fix combined conditional header evaluation per RFC 7232 Per RFC 7232: - Section 3.4: If-Unmodified-Since MUST be ignored when If-Match is present (If-Match is the more accurate replacement) - Section 3.3: If-Modified-Since MUST be ignored when If-None-Match is present (If-None-Match is the more accurate replacement) Previously, all four conditional headers were evaluated independently. This caused incorrect 412 responses when If-Match succeeded but If-Unmodified-Since failed (should return 200 per AWS S3 behavior). Fix applied to both validateConditionalHeadersForReads (GET/HEAD) and validateConditionalHeaders (PUT) paths. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add conditional header combination tests for GetObjectAttributes Test the RFC 7232 combined conditional header semantics: - If-Match=true + If-Unmodified-Since=false => 200 (If-Unmodified-Since ignored) - If-None-Match=false + If-Modified-Since=true => 304 (If-Modified-Since ignored) - If-None-Match=true + If-Modified-Since=false => 200 (If-Modified-Since ignored) - If-Match=true + If-Unmodified-Since=true => 200 - If-Match=false => 412 regardless Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: document Checksum attribute as not yet populated Checksum is accepted in validation (so clients requesting it don't get a 400 error, matching AWS behavior for objects without checksums) but SeaweedFS does not yet store S3 checksums. Add a comment explaining this and noting where to populate it when checksum storage is added. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add s3:GetObjectAttributes IAM action for ?attributes query Previously, GET /{object}?attributes resolved to s3:GetObject via the fallback path since resolveFromQueryParameters had no case for the "attributes" query parameter. Add S3_ACTION_GET_OBJECT_ATTRIBUTES constant ("s3:GetObjectAttributes") and a branch in resolveFromQueryParameters to return it for GET requests with the "attributes" query parameter, so IAM policies can distinguish GetObjectAttributes from GetObject. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: evaluate conditional headers after version resolution Move conditional header evaluation (If-Match, If-None-Match, etc.) to after the version resolution step in GetObjectAttributesHandler. This ensures that when a specific versionId is requested, conditions are checked against the correct version entry rather than always against the latest version. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: use bounded HTTP client in GetObjectAttributes tests Replace http.DefaultClient with a timeout-aware http.Client (10s) in the signedGetObjectAttributes helper and testGetObjectAttributesInvalid to prevent tests from hanging indefinitely. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: check attributes query before versionId in action resolver Move the GetObjectAttributes action check before the versionId check in resolveFromQueryParameters. This fixes GET /bucket/key?attributes&versionId=xyz being incorrectly classified as s3:GetObjectVersion instead of s3:GetObjectAttributes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add tests for versioned conditional headers and action resolver Add integration test that verifies conditional headers (If-Match, If-None-Match) are evaluated against the requested version entry, not the latest version. This covers the fix in `55c409dec`. Add unit test for ResolveS3Action verifying that the attributes query parameter takes precedence over versionId, so GET ?attributes&versionId resolves to s3:GetObjectAttributes. This covers the fix in `b92c61c95`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: guard negative chunk indices and rename PartsCount field Add bounds checks for b.StartChunk >= 0 and b.EndChunk >= 0 in buildObjectAttributesParts to prevent panics from corrupted metadata with negative index values. Rename ObjectAttributesParts.PartsCount to TotalPartsCount to match the AWS SDK v2 Go field naming convention, while preserving the XML element name "PartsCount" via the struct tag. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: reject malformed max-parts and part-number-marker headers Return ErrInvalidMaxParts and ErrInvalidPartNumberMarker when the X-Amz-Max-Parts or X-Amz-Part-Number-Marker headers contain non-integer or negative values, matching ListObjectPartsHandler behavior. 
Previously these were silently ignored with defaults. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	3 weeks ago
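The combined conditional header rule from 10a30a83e1 (If-Unmodified-Since ignored when If-Match is present, If-Modified-Since ignored when If-None-Match is present) can be sketched in Go as follows. This is an illustrative model only: the `condition` type, the `conditions` struct, and `evaluateRead` are hypothetical names, and each header is reduced to a tri-state (absent, passed, failed) rather than parsed from a real request.

```go
package main

import (
	"fmt"
	"net/http"
)

// condition is the outcome of evaluating one conditional header.
type condition int

const (
	absent condition = iota // header was not sent
	passed                  // header was sent and its condition holds
	failed                  // header was sent and its condition does not hold
)

// conditions holds the four conditional headers for a GET/HEAD request.
type conditions struct {
	ifMatch, ifUnmodifiedSince, ifNoneMatch, ifModifiedSince condition
}

// evaluateRead returns the HTTP status for a read request.
// If-Unmodified-Since is consulted only when If-Match is absent, and
// If-Modified-Since only when If-None-Match is absent.
func evaluateRead(c conditions) int {
	if c.ifMatch == failed || (c.ifMatch == absent && c.ifUnmodifiedSince == failed) {
		return http.StatusPreconditionFailed
	}
	if c.ifNoneMatch == failed || (c.ifNoneMatch == absent && c.ifModifiedSince == failed) {
		return http.StatusNotModified
	}
	return http.StatusOK
}

func main() {
	fmt.Println(evaluateRead(conditions{ifMatch: passed, ifUnmodifiedSince: failed}))
	fmt.Println(evaluateRead(conditions{ifNoneMatch: failed, ifModifiedSince: passed}))
	fmt.Println(evaluateRead(conditions{ifMatch: failed}))
}
```

The five combinations listed in the test commit above map directly onto this function, which makes them easy to express as a table-driven test.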
Michał Szynkiewicz	2f837c4780	Fix error on deleting non-empty bucket (#8376) * Move check for non-empty bucket deletion out of `WithFilerClient` call * Added proper checking if a bucket has "user" objects	1 month ago
Michał Szynkiewicz	53048ffffb	Add md5 checksum validation support on PutObject and UploadPart (#8367 ) * Add md5 checksum validation support on PutObject and UploadPart Per the S3 specification, when a client sends a Content-MD5 header, the server must compare it against the MD5 of the received body and return BadDigest (HTTP 400) if they don't match. SeaweedFS was silently accepting objects with incorrect Content-MD5 headers, which breaks data integrity verification for clients that rely on this feature (e.g. boto3). The error infrastructure (ErrBadDigest, ErrMsgBadDigest) already existed from PR #7306 but was never wired to an actual check. This commit adds MD5 verification in putToFiler after the body is streamed and the MD5 is computed, and adds Content-MD5 header validation to PutObjectPartHandler (matching PutObjectHandler). Orphaned chunks are cleaned up on mismatch. Refs: https://github.com/seaweedfs/seaweedfs/discussions/3908 * handle SSE, add uploadpart test * s3 integration test: fix typo and add multipart upload checksum test * s3api: move validateContentMd5 after GetBucketAndObject in PutObjectPartHandler * s3api: move validateContentMd5 after GetBucketAndObject in PutObjectHandler * s3api: fix MD5 validation for SSE uploads and logging in putToFiler * add SSE test with checksum validation - mostly ai-generated * Update s3_integration_test.go * Address S3 integration test feedback: fix typos, rename variables, add verification steps, and clean up comments. --------- Co-authored-by: Chris Lu <chris.lu@gmail.com>	1 month ago
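The Content-MD5 check described in 53048ffffb compares the client-supplied header against the MD5 of the received body. A minimal Go sketch of that comparison follows, under stated assumptions: `checkContentMD5`, `errBadDigest`, and `errInvalidDigest` are hypothetical names, the body is held fully in memory (the commit message says the real code computes the MD5 while streaming), and the malformed-header branch is an addition for this sketch that the commit message does not describe.

```go
package main

import (
	"bytes"
	"crypto/md5"
	"encoding/base64"
	"errors"
	"fmt"
)

// Hypothetical sentinel errors for this sketch.
var (
	errBadDigest     = errors.New("BadDigest: Content-MD5 does not match the received body")
	errInvalidDigest = errors.New("InvalidDigest: Content-MD5 is not a base64-encoded MD5")
)

// checkContentMD5 verifies body against a base64-encoded Content-MD5 value.
// An empty header means the client did not request verification.
func checkContentMD5(header string, body []byte) error {
	if header == "" {
		return nil
	}
	want, err := base64.StdEncoding.DecodeString(header)
	if err != nil || len(want) != md5.Size {
		return errInvalidDigest // assumption: malformed headers are rejected
	}
	got := md5.Sum(body)
	if !bytes.Equal(got[:], want) {
		return errBadDigest
	}
	return nil
}

func main() {
	sum := md5.Sum([]byte("hello"))
	header := base64.StdEncoding.EncodeToString(sum[:])
	fmt.Println(checkContentMD5(header, []byte("hello")))
	fmt.Println(checkContentMD5(header, []byte("world")))
}
```

On a mismatch the caller would also need to clean up any chunks already written, as the commit message notes.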
Chris Lu	0d8588e3ae	S3: Implement IAM defaults and STS signing key fallback (#8348 ) * S3: Implement IAM defaults and STS signing key fallback logic * S3: Refactor startup order to init SSE-S3 key manager before IAM * S3: Derive STS signing key from KEK using HKDF for security isolation * S3: Document STS signing key fallback in security.toml * fix(s3api): refine anonymous access logic and secure-by-default behavior - Initialize anonymous identity by default in `NewIdentityAccessManagement` to prevent nil pointer exceptions. - Ensure `ReplaceS3ApiConfiguration` preserves the anonymous identity if not present in the new configuration. - Update `NewIdentityAccessManagement` signature to accept `filerClient`. - In legacy mode (no policy engine), anonymous defaults to Deny (no actions), preserving secure-by-default behavior. - Use specific `LookupAnonymous` method instead of generic map lookup. - Update tests to accommodate signature changes and verify improved anonymous handling. * feat(s3api): make IAM configuration optional - Start S3 API server without a configuration file if `EnableIam` option is set. - Default to `Allow` effect for policy engine when no configuration is provided (Zero-Config mode). - Handle empty configuration path gracefully in `loadIAMManagerFromConfig`. - Add integration test `iam_optional_test.go` to verify empty config behavior. * fix(iamapi): fix signature mismatch in NewIdentityAccessManagementWithStore * fix(iamapi): properly initialize FilerClient instead of passing nil * fix(iamapi): properly initialize filer client for IAM management - Instead of passing `nil`, construct a `wdclient.FilerClient` using the provided `Filers` addresses. - Ensure `NewIdentityAccessManagementWithStore` receives a valid `filerClient` to avoid potential nil pointer dereferences or limited functionality. 
* clean: remove dead code in s3api_server.go * refactor(s3api): improve IAM initialization, safety and anonymous access security * fix(s3api): ensure IAM config loads from filer after client init * fix(s3): resolve test failures in integration, CORS, and tagging tests - Fix CORS tests by providing explicit anonymous permissions config - Fix S3 integration tests by setting admin credentials in init - Align tagging test credentials in CI with IAM defaults - Added goroutine to retry IAM config load in iamapi server * fix(s3): allow anonymous access to health targets and S3 Tables when identities are present * fix(ci): use /healthz for Caddy health check in awscli tests * iam, s3api: expose DefaultAllow from IAM and Policy Engine This allows checking the global "Open by Default" configuration from other components like S3 Tables. * s3api/s3tables: support DefaultAllow in permission logic and handler Updated CheckPermissionWithContext to respect the DefaultAllow flag in PolicyContext. This enables "Open by Default" behavior for unauthenticated access in zero-config environments. Added a targeted unit test to verify the logic. * s3api/s3tables: propagate DefaultAllow through handlers Propagated the DefaultAllow flag to individual handlers for namespaces, buckets, tables, policies, and tagging. This ensures consistent "Open by Default" behavior across all S3 Tables API endpoints. * s3api: wire up DefaultAllow for S3 Tables API initialization Updated registerS3TablesRoutes to query the global IAM configuration and set the DefaultAllow flag on the S3 Tables API server. This completes the end-to-end propagation required for anonymous access in zero-config environments. Added a SetDefaultAllow method to S3TablesApiServer to facilitate this. * s3api: fix tests by adding DefaultAllow to mock IAM integrations The IAMIntegration interface was updated to include DefaultAllow(), breaking several mock implementations in tests. 
This commit fixes the build errors by adding the missing method to the mocks. * env * ensure ports * env * env * fix default allow * add one more test using non-anonymous user * debug * add more debug * less logs	1 month ago
Chris Lu	551a31e156	Implement IAM propagation to S3 servers (#8130 ) * Implement IAM propagation to S3 servers - Add PropagatingCredentialStore to propagate IAM changes to S3 servers via gRPC - Add Policy management RPCs to S3 proto and S3ApiServer - Update CredentialManager to use PropagatingCredentialStore when MasterClient is available - Wire FilerServer to enable propagation * Implement parallel IAM propagation and fix S3 cluster registration - Parallelized IAM change propagation with 10s timeout. - Refined context usage in PropagatingCredentialStore. - Added S3Type support to cluster node management. - Enabled S3 servers to register with gRPC address to the master. - Ensured IAM configuration reload after policy updates via gRPC. * Optimize IAM propagation with direct in-memory cache updates * Secure IAM propagation: Use metadata to skip persistence only on propagation * pb: refactor IAM and S3 services for unidirectional IAM propagation - Move SeaweedS3IamCache service from iam.proto to s3.proto. - Remove legacy IAM management RPCs and empty SeaweedS3 service from s3.proto. - Enforce that S3 servers only use the synchronization interface. * pb: regenerate Go code for IAM and S3 services Updated generated code following the proto refactoring of IAM synchronization services. * s3api: implement read-only mode for Embedded IAM API - Add readOnly flag to EmbeddedIamApi to reject write operations via HTTP. - Enable read-only mode by default in S3ApiServer. - Handle AccessDenied error in writeIamErrorResponse. - Embed SeaweedS3IamCacheServer in S3ApiServer. * credential: refactor PropagatingCredentialStore for unidirectional IAM flow - Update to use s3_pb.SeaweedS3IamCacheClient for propagation to S3 servers. - Propagate full Identity object via PutIdentity for consistency. - Remove redundant propagation of specific user/account/policy management RPCs. - Add timeout context for propagation calls. 
* s3api: implement SeaweedS3IamCacheServer for unidirectional sync - Update S3ApiServer to implement the cache synchronization gRPC interface. - Methods (PutIdentity, RemoveIdentity, etc.) now perform direct in-memory cache updates. - Register SeaweedS3IamCacheServer in command/s3.go. - Remove registration for the legacy and now empty SeaweedS3 service. * s3api: update tests for read-only IAM and propagation - Added TestEmbeddedIamReadOnly to verify rejection of write operations in read-only mode. - Update test setup to pass readOnly=false to NewEmbeddedIamApi in routing tests. - Updated EmbeddedIamApiForTest helper with read-only checks matching production behavior. * s3api: add back temporary debug logs for IAM updates Log IAM updates received via: - gRPC propagation (PutIdentity, PutPolicy, etc.) - Metadata configuration reloads (LoadS3ApiConfigurationFromCredentialManager) - Core identity management (UpsertIdentity, RemoveIdentity) * IAM: finalize propagation fix with reduced logging and clarified architecture * Allow configuring IAM read-only mode for S3 server integration tests * s3api: add defensive validation to UpsertIdentity * s3api: fix log message to reference correct IAM read-only flag * test/s3/iam: ensure WaitForS3Service checks for IAM write permissions * test: enable writable IAM in Makefile for integration tests * IAM: add GetPolicy/ListPolicies RPCs to s3.proto * S3: add GetBucketPolicy and ListBucketPolicies helpers * S3: support storing generic IAM policies in IdentityAccessManagement * S3: implement IAM policy RPCs using IdentityAccessManagement * IAM: fix stale user identity on rename propagation	2 months ago
Chris Lu	d664ca5ed3	fix: IAM authentication with AWS Signature V4 and environment credentials (#8099) * fix: IAM authentication with AWS Signature V4 and environment credentials Three key fixes for authenticated IAM requests to work: 1. Fix request body consumption before signature verification - iamMatcher was calling r.ParseForm() which consumed POST body - This broke AWS Signature V4 verification on subsequent reads - Now only check query string in matcher, preserving body for verification - File: weed/s3api/s3api_server.go 2. Preserve environment variable credentials across config reloads - After IAM mutations, config reload overwrote env var credentials - Extract env var loading into loadEnvironmentVariableCredentials() - Call after every config reload to persist credentials - File: weed/s3api/auth_credentials.go 3. Add authenticated IAM tests and test infrastructure - New TestIAMAuthenticated suite with AWS SDK + Signature V4 - Dynamic port allocation for independent test execution - Flag reset to prevent state leakage between tests - CI workflow to run S3 and IAM tests separately - Files: test/s3/example/, .github/workflows/s3-example-integration-tests.yml All tests pass: - TestIAMCreateUser (unauthenticated) - TestIAMAuthenticated (with AWS Signature V4) - S3 integration tests fmt * chore: rename test/s3/example to test/s3/normal * simplify: CI runs all integration tests in single job * Update s3-example-integration-tests.yml * ci: run each test group separately to avoid raft registry conflicts	2 months ago

11 Commits (f98d63fcd036aeb68290fefe9d5eaaf8654a26c9)