* Add multi-partition-spec compaction and delete-aware compaction (Phase 3)
Multi-partition-spec compaction:
- Add SpecID to compactionBin struct and group by spec+partition key
- Remove the len(specIDs) > 1 skip that blocked spec-evolved tables
- Write per-spec manifests in compaction commit using specByID map
- Use per-bin PartitionSpec when calling NewDataFileBuilder
Delete-aware compaction:
- Add ApplyDeletes config (default: true) with readBoolConfig helper
- Implement position delete collection (file_path + pos Parquet columns)
- Implement equality delete collection (field ID to column mapping)
- Update mergeParquetFiles to filter rows via position deletes (binary
search) and equality deletes (hash set lookup)
- Smart delete manifest carry-forward: drop when all data files compacted
- Fix EXISTING/DELETED entries to include sequence numbers
Add tests for multi-spec bins, delete collection, merge filtering, and
end-to-end compaction with position/equality/mixed deletes.
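A minimal sketch of the row filtering described above, with all names
hypothetical: per-file position deletes are kept sorted so membership is
a binary search, while equality-deleted rows are matched against a hash
set of composite keys.

```go
package sketch

import "slices"

// isDeleted reports whether the row at position pos in the given data
// file should be dropped. posDeletes maps a normalized file path to its
// sorted delete positions; eqDeleted holds composite equality keys.
func isDeleted(filePath string, pos int64, rowKey string,
	posDeletes map[string][]int64, eqDeleted map[string]struct{}) bool {
	if positions := posDeletes[filePath]; len(positions) > 0 {
		if _, found := slices.BinarySearch(positions, pos); found {
			return true
		}
	}
	_, hit := eqDeleted[rowKey]
	return hit
}
```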
* Add structured metrics and per-bin progress to iceberg maintenance
- Change return type of all four operations from (string, error) to
(string, map[string]int64, error) with structured metric counts
(files_merged, snapshots_expired, orphans_removed, duration_ms, etc.)
- Add onProgress callback to compactDataFiles for per-bin progress
- In Execute, pass progress callback that sends JobProgressUpdate with
per-bin stage messages
- Accumulate per-operation metrics with dot-prefixed keys
(e.g. compact.files_merged) into OutputValues on completion
- Update testing_api.go wrappers and integration test call sites
- Add tests: TestCompactDataFilesMetrics, TestExpireSnapshotsMetrics,
TestExecuteCompletionOutputValues
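A rough sketch of the new shape, with names approximated from the
bullets above: each operation returns a metrics map next to its summary
string, and completion folds the maps into OutputValues under
dot-prefixed keys.

```go
package sketch

// operation approximates the changed signature:
// (summary, metrics, error) instead of (summary, error).
type operation func() (string, map[string]int64, error)

// progressFunc approximates the per-bin callback Execute passes to
// compactDataFiles.
type progressFunc func(stage string, binsDone, binsTotal int)

// accumulate copies per-operation metrics into OutputValues with
// dot-prefixed keys, e.g. "compact.files_merged".
func accumulate(outputValues map[string]int64, opName string, metrics map[string]int64) {
	for k, v := range metrics {
		outputValues[opName+"."+k] = v
	}
}
```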
* Address review feedback: group equality deletes by field IDs, use metric constants
- Group equality deletes by distinct equality_ids sets so different
delete files with different equality columns are handled correctly
- Use length-prefixed type-aware encoding in buildEqualityKey to avoid
ambiguity between types and collisions from null bytes
- Extract metric key strings into package-level constants
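A sketch of the grouping idea, assuming a hypothetical groupKey helper:
sort each delete file's equality_ids so the same column set always maps
to the same group key, regardless of field-ID order.

```go
package sketch

import (
	"fmt"
	"sort"
)

// groupKey canonicalizes a delete file's equality_ids; delete files
// with the same column set share a group. Illustrative only.
func groupKey(equalityIDs []int) string {
	ids := append([]int(nil), equalityIDs...)
	sort.Ints(ids)
	return fmt.Sprint(ids) // e.g. "[1 3 7]"
}
```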
* Fix buildEqualityKey to use length-prefixed type-aware encoding
The previous implementation used plain String() concatenation with null
byte separators, which caused type ambiguity (int 123 vs string "123")
and separator collisions when values contain null bytes. Now each value
is serialized as "kind:length:value" for unambiguous composite keys.
This fix was missed in the prior cherry-pick due to a merge conflict.
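A minimal illustration of the scheme (helper name hypothetical):

```go
package sketch

import "fmt"

// encodeComponent renders one value as "kind:length:value": int 123
// becomes "int:3:123" and string "123" becomes "string:3:123", so the
// types cannot collide, and the explicit length bounds each component
// even when the value contains null bytes.
func encodeComponent(kind, value string) string {
	return fmt.Sprintf("%s:%d:%s", kind, len(value), value)
}
```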
* Address nitpick review comments
- Document patchManifestContentToDeletes workaround: explain that
iceberg-go WriteManifest cannot create delete manifests, and note
the fail-fast validation on pattern match
- Document makeTestEntries: note that specID field is ignored and
callers should use makeTestEntriesWithSpec for multi-spec testing
* fmt
* Fix path normalization, manifest threshold, and artifact filename collisions
- Normalize file paths in position delete collection and lookup so that
absolute S3 URLs and relative paths match correctly
- Fix rewriteManifests threshold check to count only data manifests
(was including delete manifests in the count and metric)
- Add random suffix to artifact filenames in compactDataFiles and
rewriteManifests to prevent collisions between concurrent runs
- Sort compaction bins by SpecID then PartitionKey for deterministic
ordering across specs
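The normalization idea behind the first bullet, sketched with a
hypothetical helper; the actual code may differ in detail.

```go
package sketch

import "strings"

// normalizePath reduces "s3://bucket/a/b.parquet" and "/a/b.parquet"
// to the same comparable form ("a/b.parquet") before map lookups.
func normalizePath(p string) string {
	if i := strings.Index(p, "://"); i >= 0 {
		p = p[i+3:] // drop the scheme
		if j := strings.IndexByte(p, '/'); j >= 0 {
			p = p[j+1:] // drop the bucket segment
		}
	}
	return strings.TrimPrefix(p, "/")
}
```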
* Fix pos delete read, deduplicate column resolution, minor cleanups
- Remove broken Column() guard in position delete reading that silently
defaulted pos to 0; unconditionally extract Int64() instead
- Deduplicate column resolution in readEqualityDeleteFile by calling
resolveEqualityColIndices instead of inlining the same logic
- Add warning log in readBoolConfig for unrecognized string values
- Fix CompactDataFiles call site in integration test to capture 3 return
values
* Advance progress on all bins, deterministic manifest order, assert metrics
- Call onProgress for every bin iteration including skipped/failed bins
so progress reporting never appears stalled
- Sort spec IDs before iterating specEntriesMap to produce deterministic
manifest list ordering across runs
- Assert expected metric keys in CompactDataFiles integration test
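The sorted-keys pattern behind the deterministic manifest ordering,
sketched with approximated types: Go randomizes map iteration order, so
the keys are collected and sorted before the walk.

```go
package sketch

import "sort"

// sortedSpecIDs returns the spec IDs of specEntriesMap in ascending
// order so every run emits manifests in the same sequence.
func sortedSpecIDs(specEntriesMap map[int][]string) []int {
	ids := make([]int, 0, len(specEntriesMap))
	for id := range specEntriesMap {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	return ids
}
```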
---------
Co-authored-by: Copilot <copilot@github.com>
When S3StorageClass is empty (the default), aws.String("") was passed
as the StorageClass in PutObject requests. While AWS S3 treats this as
"use default," S3-compatible providers (e.g. SharkTech) reject it with
InvalidStorageClass. Only set StorageClass when a non-empty value is
configured, letting the provider use its default.
Fixes #8644
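A sketch of the fix in AWS SDK for Go v1 style (which the aws.String
reference above suggests); surrounding names are illustrative.

```go
package sketch

import (
	"io"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

// buildPutObjectInput leaves StorageClass unset unless a value is
// configured, so providers that reject "" use their own default.
func buildPutObjectInput(bucket, key, storageClass string, body io.ReadSeeker) *s3.PutObjectInput {
	input := &s3.PutObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
		Body:   body,
	}
	if storageClass != "" {
		input.StorageClass = aws.String(storageClass)
	}
	return input
}
```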
Change iceberg target_file_size config from bytes to MB
Rename the config field from target_file_size_bytes to
target_file_size_mb with a default of 256 (MB). The value is
converted to bytes internally. This makes the config more
user-friendly — entering 256 is clearer than 268435456.
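The conversion, sketched (function name hypothetical):

```go
package sketch

// The config stores megabytes; the handler expands it to bytes once at
// read time, falling back to the documented default of 256 MB.
const defaultTargetFileSizeMB int64 = 256

func targetFileSizeBytes(targetFileSizeMB int64) int64 {
	if targetFileSizeMB <= 0 {
		targetFileSizeMB = defaultTargetFileSizeMB
	}
	return targetFileSizeMB * 1024 * 1024 // 256 -> 268435456
}
```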
Co-authored-by: Copilot <copilot@github.com>
* Add iceberg_maintenance plugin worker handler (Phase 1)
Implement automated Iceberg table maintenance as a new plugin worker job
type. The handler scans S3 table buckets for tables needing maintenance
and executes operations in the correct Iceberg order: expire snapshots,
remove orphan files, and rewrite manifests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add data file compaction to iceberg maintenance handler (Phase 2)
Implement bin-packing compaction for small Parquet data files:
- Enumerate data files from manifests, group by partition
- Merge small files using parquet-go (read rows, write merged output)
- Create new manifest with ADDED/DELETED/EXISTING entries
- Commit new snapshot with compaction metadata
Add 'compact' operation to maintenance order (runs before expire_snapshots),
configurable via target_file_size_bytes and min_input_files thresholds.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
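A bin-packing sketch consistent with the description above; the types,
names, and exact threshold handling are illustrative, not the actual
implementation.

```go
package sketch

type dataFile struct {
	PartitionKey string
	Size         int64
}

// packBins groups data files by partition key, then packs each group
// into bins that stay under targetSize; groups with fewer than
// minInputFiles files are skipped as not worth compacting.
func packBins(files []dataFile, targetSize int64, minInputFiles int) [][]dataFile {
	byPartition := make(map[string][]dataFile)
	for _, f := range files {
		byPartition[f.PartitionKey] = append(byPartition[f.PartitionKey], f)
	}
	var bins [][]dataFile
	for _, group := range byPartition {
		if len(group) < minInputFiles {
			continue
		}
		var cur []dataFile
		var curSize int64
		for _, f := range group {
			if len(cur) > 0 && curSize+f.Size > targetSize {
				bins = append(bins, cur)
				cur, curSize = nil, 0
			}
			cur = append(cur, f)
			curSize += f.Size
		}
		if len(cur) > 0 {
			bins = append(bins, cur)
		}
	}
	return bins
}
```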
* Fix memory exhaustion in mergeParquetFiles by processing files sequentially
Previously all source Parquet files were loaded into memory simultaneously,
risking OOM when a compaction bin contained many small files. Now each file
is loaded, its rows are streamed into the output writer, and its data is
released before the next file is loaded — keeping peak memory proportional
to one input file plus the output buffer.
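The lifecycle, sketched over stub interfaces that stand in for the
parquet-go reader and writer; the point is when files are opened and
released, not the real API.

```go
package sketch

import "io"

type rowSource interface {
	io.Closer
	Next() (any, error) // io.EOF once the file is drained
}

type rowSink interface{ Write(row any) error }

// mergeSequentially opens, streams, and closes one source at a time,
// keeping peak memory near a single input file plus the output buffer.
func mergeSequentially(open func(path string) (rowSource, error), paths []string, out rowSink) error {
	for _, p := range paths {
		src, err := open(p)
		if err != nil {
			return err
		}
		err = drain(src, out)
		src.Close() // release this file before loading the next
		if err != nil {
			return err
		}
	}
	return nil
}

func drain(src rowSource, out rowSink) error {
	for {
		row, err := src.Next()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		if err := out.Write(row); err != nil {
			return err
		}
	}
}
```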
* Validate bucket/namespace/table names against path traversal
Reject names containing '..', '/', or '\' in Execute to prevent
directory traversal via crafted job parameters.
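A minimal validation sketch (helper name hypothetical):

```go
package sketch

import (
	"fmt"
	"strings"
)

// validateName rejects any bucket/namespace/table name that could
// escape its directory when spliced into a path.
func validateName(name string) error {
	if strings.Contains(name, "..") || strings.ContainsAny(name, `/\`) {
		return fmt.Errorf("invalid name %q: path traversal characters", name)
	}
	return nil
}
```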
* Add filer address failover in iceberg maintenance handler
Try each filer address from cluster context in order instead of only
using the first one. This improves resilience when the primary filer
is temporarily unreachable.
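The failover loop, sketched with a stand-in for the real gRPC call:

```go
package sketch

import "fmt"

// withFilerFailover tries each filer address in order and returns on
// the first success.
func withFilerFailover(addresses []string, tryFiler func(addr string) error) error {
	if len(addresses) == 0 {
		return fmt.Errorf("no filer addresses available")
	}
	var lastErr error
	for _, addr := range addresses {
		if lastErr = tryFiler(addr); lastErr == nil {
			return nil
		}
	}
	return fmt.Errorf("all filer addresses failed: %w", lastErr)
}
```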
* Add separate MinManifestsToRewrite config for manifest rewrite threshold
The rewrite_manifests operation was reusing MinInputFiles (meant for
compaction bin file counts) as its manifest count threshold. Add a
dedicated MinManifestsToRewrite field with its own config UI section
and default value (5) so the two thresholds can be tuned independently.
* Fix risky mtime fallback in orphan removal that could delete new files
When entry.Attributes is nil, mtime defaulted to Unix epoch (1970),
which would always be older than the safety threshold, causing the
file to be treated as eligible for deletion. Skip entries with nil
Attributes instead, matching the safer logic in operations.go.
* Fix undefined function references in iceberg_maintenance_handler.go
Use the exported function names (ShouldSkipDetectionByInterval,
BuildDetectorActivity, BuildExecutorActivity) matching their
definitions in vacuum_handler.go.
* Remove duplicated iceberg maintenance handler in favor of iceberg/ subpackage
The IcebergMaintenanceHandler and its compaction code in the parent
pluginworker package duplicated the logic already present in the
iceberg/ subpackage (which self-registers via init()). The old code
lacked stale-plan guards, proper path normalization, CAS-based xattr
updates, and error-returning parseOperations.
Since the registry pattern (default "all") makes the old handler
unreachable, remove it entirely. All functionality is provided by
iceberg.Handler with the reviewed improvements.
* Fix MinManifestsToRewrite clamping to match UI minimum of 2
The clamp reset values below 2 to the default of 5, contradicting the
UI's advertised MinValue of 2. Clamp to 2 instead.
* Sort entries by size descending in splitOversizedBin for better packing
Entries were processed in insertion order, which is non-deterministic
because it derives from map iteration. Sorting largest-first before the
splitting loop improves bin-packing efficiency by filling bins more evenly.
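The ordering step, sketched with an illustrative entry type:

```go
package sketch

import "sort"

type binEntry struct{ Size int64 }

// sortBySizeDesc orders entries largest-first before the splitting
// loop, replacing the arbitrary order left over from map iteration.
func sortBySizeDesc(entries []binEntry) {
	sort.Slice(entries, func(i, j int) bool {
		return entries[i].Size > entries[j].Size
	})
}
```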
* Add context cancellation check to drainReader loop
The row-streaming loop in drainReader did not check ctx between
iterations, making long compaction merges uncancellable. Check
ctx.Done() at the top of each iteration.
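The cancellation pattern, sketched with stand-in row functions:

```go
package sketch

import "context"

// drainWithCancel polls ctx at the top of every iteration so a
// long-running merge aborts promptly between rows.
func drainWithCancel(ctx context.Context, nextRow func() (any, bool), write func(any) error) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		default:
		}
		row, ok := nextRow()
		if !ok {
			return nil
		}
		if err := write(row); err != nil {
			return err
		}
	}
}
```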
* Fix splitOversizedBin to always respect targetSize limit
The minFiles check in the split condition allowed bins to grow past
targetSize when they had fewer than minFiles entries, defeating the
OOM protection. Now bins always split at targetSize, and a trailing
runt with fewer than minFiles entries is merged into the previous bin.
* Add integration tests for iceberg table maintenance plugin worker
Tests start a real weed mini cluster, create S3 buckets and Iceberg
table metadata via filer gRPC, then exercise the iceberg.Handler
operations (ExpireSnapshots, RemoveOrphans, RewriteManifests) against
the live filer. A full maintenance cycle test runs all operations in
sequence and verifies metadata consistency.
Also adds exported method wrappers (testing_api.go) so the integration
test package can call the unexported handler methods.
* Fix splitOversizedBin dropping files and add source path to drainReader errors
The runt-merge step could leave leading bins with fewer than minFiles
entries (e.g. [80,80,10,10] with targetSize=100, minFiles=2 would drop
the first 80-byte file). Replace the filter-based approach with an
iterative merge that folds any sub-minFiles bin into its smallest
neighbor, preserving all eligible files.
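A sketch of the iterative merge, assuming bins hold file sizes and
reading "smallest neighbor" as the adjacent bin with the smaller total;
the real code may differ.

```go
package sketch

// mergeRunts repeatedly folds any bin with fewer than minFiles entries
// into its smaller adjacent neighbor, so every eligible file stays in
// some bin. Each merge shrinks the slice by one, so the loop terminates.
func mergeRunts(bins [][]int64, minFiles int) [][]int64 {
	for len(bins) > 1 {
		runt := -1
		for i, b := range bins {
			if len(b) < minFiles {
				runt = i
				break
			}
		}
		if runt < 0 {
			break // no undersized bins remain
		}
		target := runt - 1
		if runt == 0 || (runt+1 < len(bins) && total(bins[runt+1]) < total(bins[runt-1])) {
			target = runt + 1
		}
		bins[target] = append(bins[target], bins[runt]...)
		bins = append(bins[:runt], bins[runt+1:]...)
	}
	return bins
}

func total(sizes []int64) (sum int64) {
	for _, s := range sizes {
		sum += s
	}
	return
}
```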
Also add the source file path to drainReader error messages so callers
can identify which Parquet file caused a read/write failure.
* Harden integration test error handling
- s3put: fail immediately on HTTP 4xx/5xx instead of logging and
continuing
- lookupEntry: distinguish NotFound (return nil) from unexpected RPC
errors (fail the test)
- writeOrphan and orphan creation in FullMaintenanceCycle: check
CreateEntryResponse.Error in addition to the RPC error
* go fmt
---------
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>