Branch:
rust-volume-server
add-ec-vacuum
add-filer-iam-grpc
add-iam-grpc-management
add_fasthttp_client
add_remote_storage
adding-message-queue-integration-tests
adjust-fsck-cutoff-default
admin/csrf-s3tables
allow-no-role-arn
also-delete-parent-directory-if-empty
avoid_releasing_temp_file_on_write
changing-to-zap
codex-rust-volume-server-bootstrap
codex/admin-oidc-auth-ui
codex/cache-iam-policy-engines
codex/ec-repair-worker
codex/erasure-coding-shard-distribution
codex/list-object-versions-newest-first
codex/s3tables-maint-lifecycle-parity
codex/s3tables-maint-planner-multispec
codex/s3tables-maintenance-designs
collect-public-metrics
copilot/fix-helm-chart-installation
copilot/fix-s3-object-tagging-issue
copilot/make-renew-interval-configurable
copilot/make-renew-interval-configurable-again
copilot/sub-pr-7677
create-table-snapshot-api-design
data_query_pushdown
dependabot/maven/other/java/client/com.google.protobuf-protobuf-java-3.25.5
dependabot/maven/other/java/examples/org.apache.hadoop-hadoop-common-3.4.0
detect-and-plan-ec-tasks
do-not-retry-if-error-is-NotFound
ec-disk-type-support
enhance-erasure-coding
expand-the-s3-PutObject-permission-to-the-multipart-permissions
fasthttp
feature-8113-storage-class-disk-routing
feature/mini-port-detection
feature/modernize-s3-tests
feature/s3-multi-cert-support
feature/s3tables-improvements-and-spark-tests
feature/sra-uds-handler
feature/sw-block
filer1_maintenance_branch
fix-8303-s3-lifecycle-ttl-assign
fix-GetObjectLockConfigurationHandler
fix-bucket-name-case-7910
fix-helm-fromtoml-compatibility
fix-mount-http-parallelism
fix-mount-read-throughput-7504
fix-pr-7909
fix-s3-configure-consistency
fix-s3-object-tagging-issue-7589
fix-sts-session-token-7941
fix-versioning-listing-only
fix/iceberg-stage-create-semantics
fix/lock-table-shared-lock-precedence
fix/mount-cache-consistency
fix/object-lock-delete-enforcement
fix/plugin-ui-remove-scheduler-settings
fix/sts-body-preservation
fix/windows-test-file-cleanup
ftp
gh-pages
has-weed-sql-command
iam-multi-file-migration
iam-permissions-and-api
improve-fuse-mount
improve-fuse-mount2
logrus
master
message_send
mount2
mq-subscribe
mq2
nfs-cookie-prefix-list-fixes
optimize-delete-lookups
original_weed_mount
plugin-system-phase1
plugin-ui-enhancements-restored
pr-7412
pr/7984
pr/8140
pr/8680
raft-dual-write
random_access_file
refactor-needle-read-operations
refactor-volume-write
remote_overlay
remove-implicit-directory-handling
revert-5134-patch-1
revert-5819-patch-1
revert-6434-bugfix-missing-s3-audit
rust-volume-server
s3-remote-cache-singleflight
s3-select
s3tables-by-claude
scheduler-sequential-iteration
sub
tcp_read
test-reverting-lock-table
test_udp
testing
testing-sdx-generation
tikv
track-mount-e2e
upgrade-versions-to-4.00
volume_buffered_writes
worker-execute-ec-tasks
0.72
0.72.release
0.73
0.74
0.75
0.76
0.77
0.90
0.91
0.92
0.93
0.94
0.95
0.96
0.97
0.98
0.99
1.00
1.01
1.02
1.03
1.04
1.05
1.06
1.07
1.08
1.09
1.10
1.11
1.12
1.14
1.15
1.16
1.17
1.18
1.19
1.20
1.21
1.22
1.23
1.24
1.25
1.26
1.27
1.28
1.29
1.30
1.31
1.32
1.33
1.34
1.35
1.36
1.37
1.38
1.40
1.41
1.42
1.43
1.44
1.45
1.46
1.47
1.48
1.49
1.50
1.51
1.52
1.53
1.54
1.55
1.56
1.57
1.58
1.59
1.60
1.61
1.61RC
1.62
1.63
1.64
1.65
1.66
1.67
1.68
1.69
1.70
1.71
1.72
1.73
1.74
1.75
1.76
1.77
1.78
1.79
1.80
1.81
1.82
1.83
1.84
1.85
1.86
1.87
1.88
1.90
1.91
1.92
1.93
1.94
1.95
1.96
1.97
1.98
1.99
1;70
2.00
2.01
2.02
2.03
2.04
2.05
2.06
2.07
2.08
2.09
2.10
2.11
2.12
2.13
2.14
2.15
2.16
2.17
2.18
2.19
2.20
2.21
2.22
2.23
2.24
2.25
2.26
2.27
2.28
2.29
2.30
2.31
2.32
2.33
2.34
2.35
2.36
2.37
2.38
2.39
2.40
2.41
2.42
2.43
2.47
2.48
2.49
2.50
2.51
2.52
2.53
2.54
2.55
2.56
2.57
2.58
2.59
2.60
2.61
2.62
2.63
2.64
2.65
2.66
2.67
2.68
2.69
2.70
2.71
2.72
2.73
2.74
2.75
2.76
2.77
2.78
2.79
2.80
2.81
2.82
2.83
2.84
2.85
2.86
2.87
2.88
2.89
2.90
2.91
2.92
2.93
2.94
2.95
2.96
2.97
2.98
2.99
3.00
3.01
3.02
3.03
3.04
3.05
3.06
3.07
3.08
3.09
3.10
3.11
3.12
3.13
3.14
3.15
3.16
3.18
3.19
3.20
3.21
3.22
3.23
3.24
3.25
3.26
3.27
3.28
3.29
3.30
3.31
3.32
3.33
3.34
3.35
3.36
3.37
3.38
3.39
3.40
3.41
3.42
3.43
3.44
3.45
3.46
3.47
3.48
3.50
3.51
3.52
3.53
3.54
3.55
3.56
3.57
3.58
3.59
3.60
3.61
3.62
3.63
3.64
3.65
3.66
3.67
3.68
3.69
3.71
3.72
3.73
3.74
3.75
3.76
3.77
3.78
3.79
3.80
3.81
3.82
3.83
3.84
3.85
3.86
3.87
3.88
3.89
3.90
3.91
3.92
3.93
3.94
3.95
3.96
3.97
3.98
3.99
4.00
4.01
4.02
4.03
4.04
4.05
4.06
4.07
4.08
4.09
4.12
4.13
4.15
4.16
4.17
dev
helm-3.65.1
v0.69
v0.70beta
v3.33
${ noResults }
14 Commits (rust-volume-server)
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
8cde3d4486
|
Add data file compaction to iceberg maintenance (Phase 2) (#8503)
* Add iceberg_maintenance plugin worker handler (Phase 1) Implement automated Iceberg table maintenance as a new plugin worker job type. The handler scans S3 table buckets for tables needing maintenance and executes operations in the correct Iceberg order: expire snapshots, remove orphan files, and rewrite manifests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add data file compaction to iceberg maintenance handler (Phase 2) Implement bin-packing compaction for small Parquet data files: - Enumerate data files from manifests, group by partition - Merge small files using parquet-go (read rows, write merged output) - Create new manifest with ADDED/DELETED/EXISTING entries - Commit new snapshot with compaction metadata Add 'compact' operation to maintenance order (runs before expire_snapshots), configurable via target_file_size_bytes and min_input_files thresholds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix memory exhaustion in mergeParquetFiles by processing files sequentially Previously all source Parquet files were loaded into memory simultaneously, risking OOM when a compaction bin contained many small files. Now each file is loaded, its rows are streamed into the output writer, and its data is released before the next file is loaded — keeping peak memory proportional to one input file plus the output buffer. * Validate bucket/namespace/table names against path traversal Reject names containing '..', '/', or '\' in Execute to prevent directory traversal via crafted job parameters. * Add filer address failover in iceberg maintenance handler Try each filer address from cluster context in order instead of only using the first one. This improves resilience when the primary filer is temporarily unreachable. * Add separate MinManifestsToRewrite config for manifest rewrite threshold The rewrite_manifests operation was reusing MinInputFiles (meant for compaction bin file counts) as its manifest count threshold. Add a dedicated MinManifestsToRewrite field with its own config UI section and default value (5) so the two thresholds can be tuned independently. * Fix risky mtime fallback in orphan removal that could delete new files When entry.Attributes is nil, mtime defaulted to Unix epoch (1970), which would always be older than the safety threshold, causing the file to be treated as eligible for deletion. Skip entries with nil Attributes instead, matching the safer logic in operations.go. * Fix undefined function references in iceberg_maintenance_handler.go Use the exported function names (ShouldSkipDetectionByInterval, BuildDetectorActivity, BuildExecutorActivity) matching their definitions in vacuum_handler.go. * Remove duplicated iceberg maintenance handler in favor of iceberg/ subpackage The IcebergMaintenanceHandler and its compaction code in the parent pluginworker package duplicated the logic already present in the iceberg/ subpackage (which self-registers via init()). The old code lacked stale-plan guards, proper path normalization, CAS-based xattr updates, and error-returning parseOperations. Since the registry pattern (default "all") makes the old handler unreachable, remove it entirely. All functionality is provided by iceberg.Handler with the reviewed improvements. * Fix MinManifestsToRewrite clamping to match UI minimum of 2 The clamp reset values below 2 to the default of 5, contradicting the UI's advertised MinValue of 2. Clamp to 2 instead. * Sort entries by size descending in splitOversizedBin for better packing Entries were processed in insertion order which is non-deterministic from map iteration. Sorting largest-first before the splitting loop improves bin packing efficiency by filling bins more evenly. * Add context cancellation check to drainReader loop The row-streaming loop in drainReader did not check ctx between iterations, making long compaction merges uncancellable. Check ctx.Done() at the top of each iteration. * Fix splitOversizedBin to always respect targetSize limit The minFiles check in the split condition allowed bins to grow past targetSize when they had fewer than minFiles entries, defeating the OOM protection. Now bins always split at targetSize, and a trailing runt with fewer than minFiles entries is merged into the previous bin. * Add integration tests for iceberg table maintenance plugin worker Tests start a real weed mini cluster, create S3 buckets and Iceberg table metadata via filer gRPC, then exercise the iceberg.Handler operations (ExpireSnapshots, RemoveOrphans, RewriteManifests) against the live filer. A full maintenance cycle test runs all operations in sequence and verifies metadata consistency. Also adds exported method wrappers (testing_api.go) so the integration test package can call the unexported handler methods. * Fix splitOversizedBin dropping files and add source path to drainReader errors The runt-merge step could leave leading bins with fewer than minFiles entries (e.g. [80,80,10,10] with targetSize=100, minFiles=2 would drop the first 80-byte file). Replace the filter-based approach with an iterative merge that folds any sub-minFiles bin into its smallest neighbor, preserving all eligible files. Also add the source file path to drainReader error messages so callers can identify which Parquet file caused a read/write failure. * Harden integration test error handling - s3put: fail immediately on HTTP 4xx/5xx instead of logging and continuing - lookupEntry: distinguish NotFound (return nil) from unexpected RPC errors (fail the test) - writeOrphan and orphan creation in FullMaintenanceCycle: check CreateEntryResponse.Error in addition to the RPC error * go fmt --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
4 days ago |
|
|
a838661b83
|
feat(plugin): EC shard balance handler for plugin worker (#8629)
* feat(ec_balance): add TaskTypeECBalance constant and protobuf definitions Add the ec_balance task type constant to both topology and worker type systems. Define EcBalanceTaskParams, EcShardMoveSpec, and EcBalanceTaskConfig protobuf messages for EC shard balance operations. * feat(ec_balance): add configuration for EC shard balance task Config includes imbalance threshold, min server count, collection filter, disk type, and preferred tags for tag-aware placement. * feat(ec_balance): add multi-phase EC shard balance detection algorithm Implements four detection phases adapted from the ec.balance shell command: 1. Duplicate shard detection and removal proposals 2. Cross-rack shard distribution balancing 3. Within-rack node-level shard balancing 4. Global shard count equalization across nodes Detection is side-effect-free: it builds an EC topology view from ActiveTopology and generates move proposals without executing them. * feat(ec_balance): add EC shard move task execution Implements the shard move sequence using the same VolumeEcShardsCopy, VolumeEcShardsMount, VolumeEcShardsUnmount, and VolumeEcShardsDelete RPCs as the shell ec.balance command. Supports both regular shard moves and dedup-phase deletions (unmount+delete without copy). * feat(ec_balance): add task registration and scheduling Register EC balance task definition with auto-config update support. Scheduling respects max concurrent limits and worker capabilities. * feat(ec_balance): add plugin handler for EC shard balance Implements the full plugin handler with detection, execution, admin and worker config forms, proposal building, and decision trace reporting. Supports collection/DC/disk type filtering, preferred tag placement, and configurable detection intervals. Auto-registered via init() with the handler registry. * test(ec_balance): add tests for detection algorithm and plugin handler Detection tests cover: duplicate shard detection, cross-rack imbalance, within-rack imbalance, global rebalancing, topology building, collection filtering, and edge cases. Handler tests cover: config derivation with clamping, proposal building, protobuf encode/decode round-trip, fallback parameter decoding, capability, and config policy round-trip. * fix(ec_balance): address PR review feedback and fix CI test failure - Update TestWorkerDefaultJobTypes to expect 6 handlers (was 5) - Extract threshold constants (ecBalanceMinImbalanceThreshold, etc.) to eliminate magic numbers in Descriptor and config derivation - Remove duplicate ShardIdsToUint32 helper (use erasure_coding package) - Add bounds checks for int64→int/uint32 conversions to fix CodeQL integer conversion warnings * fix(ec_balance): address code review findings storage_impact.go: - Add TaskTypeECBalance case returning shard-level reservation (ShardSlots: -1/+1) instead of falling through to default which incorrectly reserves a full volume slot on target. detection.go: - Use dc:rack composite key to avoid cross-DC rack name collisions. Only create rack entries after confirming node has matching disks. - Add exceedsImbalanceThreshold check to cross-rack, within-rack, and global phases so trivial skews below the configured threshold are ignored. Dedup phase always runs since duplicates are errors. - Reserve destination capacity after each planned move (decrement destNode.freeSlots, update rackShardCount/nodeShardCount) to prevent overbooking the same destination. - Skip nodes with freeSlots <= 0 when selecting minNode in global balance to avoid proposing moves to full nodes. - Include loop index and source/target node IDs in TaskID to guarantee uniqueness across moves with the same volumeID/shardID. ec_balance_handler.go: - Fail fast with error when shard_id is absent in fallback parameter decoding instead of silently defaulting to shard 0. ec_balance_task.go: - Delegate GetProgress() to BaseTask.GetProgress() so progress updates from ReportProgressWithStage are visible to callers. - Add fail-fast guard rejecting multiple sources/targets until batch execution is implemented. Findings verified but not changed (matches existing codebase pattern in vacuum/balance/erasure_coding handlers): - register.go globalTaskDef.Config race: same unsynchronized pattern in all 4 task packages. - CreateTask using generated ID: same fmt.Sprintf pattern in all 4 task packages. * fix(ec_balance): harden parameter decoding, progress tracking, and validation ec_balance_handler.go (decodeECBalanceTaskParams): - Validate execution-critical fields (Sources[0].Node, ShardIds, Targets[0].Node, ShardIds) after protobuf deserialization. - Require source_disk_id and target_disk_id in legacy fallback path so Targets[0].DiskId is populated for VolumeEcShardsCopyRequest. - All error messages reference decodeECBalanceTaskParams and the specific missing field (TaskParams, shard_id, Targets[0].DiskId, EcBalanceTaskParams) for debuggability. ec_balance_task.go: - Track progress in ECBalanceTask.progress field, updated via reportProgress() helper called before ReportProgressWithStage(), so GetProgress() returns real stage progress instead of stale 0. - Validate: require exactly 1 source and 1 target (mirrors Execute guard), require ShardIds on both, with error messages referencing ECBalanceTask.Validate and the specific field. * fix(ec_balance): fix dedup execution path, stale topology, collection filter, timeout, and dedupeKey detection.go: - Dedup moves now set target=source so isDedupPhase() triggers the unmount+delete-only execution path instead of attempting a copy. - Apply moves to in-memory topology between phases via applyMovesToTopology() so subsequent phases see updated shard placement and don't conflict with already-planned moves. - detectGlobalImbalance now accepts allowedVids and filters both shard counting and shard selection to respect CollectionFilter. ec_balance_task.go: - Apply EcBalanceTaskParams.TimeoutSeconds to the context via context.WithTimeout so all RPC operations respect the configured timeout instead of hanging indefinitely. ec_balance_handler.go: - Include source node ID in dedupeKey so dedup deletions from different source nodes for the same shard aren't collapsed. - Clamp minServerCountRaw and minIntervalRaw lower bounds on int64 before narrowing to int, preventing undefined overflow on 32-bit. * fix(ec_balance): log warning before cancelling on progress send failure Log the error, job ID, job type, progress percentage, and stage before calling execCancel() in the progress callback so failed progress sends are diagnosable instead of silently cancelling. |
5 days ago |
|
|
78a3441b30
|
fix: volume balance detection returns multiple tasks per run (#8559)
* fix: volume balance detection now returns multiple tasks per run (#8551) Previously, detectForDiskType() returned at most 1 balance task per disk type, making the MaxJobsPerDetection setting ineffective. The detection loop now iterates within each disk type, planning multiple moves until the imbalance drops below threshold or maxResults is reached. Effective volume counts are adjusted after each planned move so the algorithm correctly re-evaluates which server is overloaded. * fix: factor pending tasks into destination scoring and use UnixNano for task IDs - Use UnixNano instead of Unix for task IDs to avoid collisions when multiple tasks are created within the same second - Adjust calculateBalanceScore to include LoadCount (pending + assigned tasks) in the utilization estimate, so the destination picker avoids stacking multiple planned moves onto the same target disk * test: add comprehensive balance detection tests for complex scenarios Cover multi-server convergence, max-server shifting, destination spreading, pre-existing pending task skipping, no-duplicate-volume invariant, and parameterized convergence verification across different cluster shapes and thresholds. * fix: address PR review findings in balance detection - hasMore flag: compute from len(results) >= maxResults so the scheduler knows more pages may exist, matching vacuum/EC handler pattern - Exhausted server fallthrough: when no eligible volumes remain on the current maxServer (all have pending tasks) or destination planning fails, mark the server as exhausted and continue to the next overloaded server instead of stopping the entire detection loop - Return canonical destination server ID directly from createBalanceTask instead of resolving via findServerIDByAddress, eliminating the fragile address→ID lookup for adjustment tracking - Fix bestScore sentinel: use math.Inf(-1) instead of -1.0 so disks with negative scores (high pending load, same rack/DC) are still selected as the best available destination - Add TestDetection_ExhaustedServerFallsThrough covering the scenario where the top server's volumes are all blocked by pre-existing tasks * test: fix computeEffectiveCounts and add len guard in no-duplicate test - computeEffectiveCounts now takes a servers slice to seed counts for all known servers (including empty ones) and uses an address→ID map from the topology spec instead of scanning metrics, so destination servers with zero initial volumes are tracked correctly - TestDetection_NoDuplicateVolumesAcrossIterations now asserts len > 1 before checking duplicates, so the test actually fails if Detection regresses to returning a single task * fix: remove redundant HasAnyTask check in createBalanceTask The HasAnyTask check in createBalanceTask duplicated the same check already performed in detectForDiskType's volume selection loop. Since detection runs single-threaded (MaxDetectionConcurrency: 1), no race can occur between the two points. * fix: consistent hasMore pattern and remove double-counted LoadCount in scoring - Adopt vacuum_handler's hasMore pattern: over-fetch by 1, check len > maxResults, and truncate — consistent truncation semantics - Remove direct LoadCount penalty in calculateBalanceScore since LoadCount is already factored into effectiveVolumeCount for utilization scoring; bump utilization weight from 40 to 50 to compensate for the removed 10-point load penalty * fix: handle zero maxResults as no-cap, emit trace after trim, seed empty servers - When MaxResults is 0 (omitted), treat as no explicit cap instead of defaulting to 1; only apply the +1 over-fetch probe when caller supplies a positive limit - Move decision trace emission after hasMore/trim so the trace accurately reflects the returned proposals - Seed serverVolumeCounts from ActiveTopology so servers that have a matching disk type but zero volumes are included in the imbalance calculation and MinServerCount check * fix: nil-guard clusterInfo, uncap legacy DetectionFunc, deterministic disk type order - Add early nil guard for clusterInfo in Detection to prevent panics in downstream helpers (detectForDiskType, createBalanceTask) - Change register.go DetectionFunc wrapper from maxResults=1 to 0 (no cap) so the legacy code path returns all detected tasks - Sort disk type keys before iteration so results are deterministic when maxResults spans multiple disk types (HDD/SSD) * fix: don't over-fetch in stateful detection to avoid orphaned pending tasks Detection registers planned moves in ActiveTopology via AddPendingTask, so requesting maxResults+1 would create an extra pending task that gets discarded during trim. Use len(results) >= maxResults as the hasMore signal instead, which is correct since Detection already caps internally. * fix: return explicit truncated flag from Detection instead of approximating Detection now returns (results, truncated, error) where truncated is true only when the loop stopped because it hit maxResults, not when it ran out of work naturally. This eliminates false hasMore signals when detection happens to produce exactly maxResults results by resolving the imbalance. * cleanup: simplify detection logic and remove redundancies - Remove redundant clusterInfo nil check in detectForDiskType since Detection already guards against nil clusterInfo - Remove adjustments loop for destination servers not in serverVolumeCounts — topology seeding ensures all servers with matching disk type are already present - Merge two-loop min/max calculation into a single loop: min across all servers, max only among non-exhausted servers - Replace magic number 100 with len(metrics) for minC initialization in convergence test * fix: accurate truncation flag, deterministic server order, indexed volume lookup - Track balanced flag to distinguish "hit maxResults cap" from "cluster balanced at exactly maxResults" — truncated is only true when there's genuinely more work to do - Sort servers for deterministic iteration and tie-breaking when multiple servers have equal volume counts - Pre-index volumes by server with per-server cursors to avoid O(maxResults * volumes) rescanning on each iteration - Add truncation flag assertions to RespectsMaxResults test: true when capped, false when detection finishes naturally * fix: seed trace server counts from ActiveTopology to match detection logic The decision trace was building serverVolumeCounts only from metrics, missing zero-volume servers seeded from ActiveTopology by Detection. This could cause the trace to report wrong server counts, incorrect imbalance ratios, or spurious "too few servers" messages. Pass activeTopology into the trace function and seed server counts the same way Detection does. * fix: don't exhaust server on per-volume planning failure, sort volumes by ID - When createBalanceTask returns nil, continue to the next volume on the same server instead of marking the entire server as exhausted. The failure may be volume-specific (not found in topology, pending task registration failed) and other volumes on the server may still be viable candidates. - Sort each server's volume slice by VolumeID after pre-indexing so volume selection is fully deterministic regardless of input order. * fix: use require instead of assert to prevent nil dereference panic in CORS test The test used assert.NoError (non-fatal) for GetBucketCors, then immediately accessed getResp.CORSRules. When the API returns an error, getResp is nil causing a panic. Switch to require.NoError/NotNil/Len so the test stops before dereferencing a nil response. * fix: deterministic disk tie-breaking and stronger pre-existing task test - Sort available disks by NodeID then DiskID before scoring so destination selection is deterministic when two disks score equally - Add task count bounds assertion to SkipsPreExistingPendingTasks test: with 15 of 20 volumes already having pending tasks, at most 5 new tasks should be created and at least 1 (imbalance still exists) * fix: seed adjustments from existing pending/assigned tasks to prevent over-scheduling Detection now calls ActiveTopology.GetTaskServerAdjustments() to initialize the adjustments map with source/destination deltas from existing pending and assigned balance tasks. This ensures effectiveCounts reflects in-flight moves, preventing the algorithm from planning additional moves in the same direction when prior moves already address the imbalance. Added GetTaskServerAdjustments(taskType) to ActiveTopology which iterates pending and assigned tasks, decrementing source servers and incrementing destination servers for the given task type. |
2 weeks ago |
|
|
c19f88eef1
|
fix: resolve ServerAddress to NodeId in maintenance task sync (#8508)
* fix: maintenance task topology lookup, retry, and stale task cleanup 1. Strip gRPC port from ServerAddress in SyncTask using ToHttpAddress() so task targets match topology disk keys (NodeId format). 2. Skip capacity check when topology has no disks yet (startup race where tasks are loaded from persistence before first topology update). 3. Don't retry permanent errors like "volume not found" - these will never succeed on retry. 4. Cancel all pending tasks for each task type before re-detection, ensuring stale proposals from previous cycles are cleaned up. This prevents stale tasks from blocking new detection and from repeatedly failing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * logs Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * less lock scope Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
2 weeks ago |
|
|
f5c35240be
|
Add volume dir tags and EC placement priority (#8472)
* Add volume dir tags to topology Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add preferred tag config for EC Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Prioritize EC destinations by tags Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add EC placement planner tag tests Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Refactor EC placement tests to reuse buildActiveTopology Remove buildActiveTopologyWithDiskTags helper function and consolidate tag setup inline in test cases. Tests now use UpdateTopology to apply tags after topology creation, reusing the existing buildActiveTopology function rather than duplicating its logic. All tag scenario tests pass: - TestECPlacementPlannerPrefersTaggedDisks - TestECPlacementPlannerFallsBackWhenTagsInsufficient Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Consolidate normalizeTagList into shared util package Extract normalizeTagList from three locations (volume.go, detection.go, erasure_coding_handler.go) into new weed/util/tag.go as exported NormalizeTagList function. Replace all duplicate implementations with imports and calls to util.NormalizeTagList. This improves code reuse and maintainability by centralizing tag normalization logic. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add PreferredTags to EC config persistence Add preferred_tags field to ErasureCodingTaskConfig protobuf with field number 5. Update GetConfigSpec to include preferred_tags field in the UI configuration schema. Add PreferredTags to ToTaskPolicy to serialize config to protobuf. Add PreferredTags to FromTaskPolicy to deserialize from protobuf with defensive copy to prevent external mutation. This allows EC preferred tags to be persisted and restored across worker restarts. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add defensive copy for Tags slice in DiskLocation Copy the incoming tags slice in NewDiskLocation instead of storing by reference. This prevents external callers from mutating the DiskLocation.Tags slice after construction, improving encapsulation and preventing unexpected changes to disk metadata. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add doc comment to buildCandidateSets method Document the tiered candidate selection and fallback behavior. Explain that for a planner with preferredTags, it accumulates disks matching each tag in order into progressively larger tiers, emits a candidate set once a tier reaches shardsNeeded, and finally falls back to the full candidates set if preferred-tag tiers are insufficient. This clarifies the intended semantics for future maintainers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Apply final PR review fixes 1. Update parseVolumeTags to replicate single tag entry to all folders instead of leaving some folders with nil tags. This prevents nil pointer dereferences when processing folders without explicit tags. 2. Add defensive copy in ToTaskPolicy for PreferredTags slice to match the pattern used in FromTaskPolicy, preventing external mutation of the returned TaskPolicy. 3. Add clarifying comment in buildCandidateSets explaining that the shardsNeeded <= 0 branch is a defensive check for direct callers, since selectDestinations guarantees shardsNeeded > 0. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix nil pointer dereference in parseVolumeTags Ensure all folder tags are initialized to either normalized tags or empty slices, not nil. When multiple tag entries are provided and there are more folders than entries, remaining folders now get empty slices instead of nil, preventing nil pointer dereference in downstream code. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix NormalizeTagList to return empty slice instead of nil Change NormalizeTagList to always return a non-nil slice. When all tags are empty or whitespace after normalization, return an empty slice instead of nil. This prevents nil pointer dereferences in downstream code that expects a valid (possibly empty) slice. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add nil safety check for v.tags pointer Add a safety check to handle the case where v.tags might be nil, preventing a nil pointer dereference. If v.tags is nil, use an empty string instead. This is defensive programming to prevent panics in edge cases. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add volume.tags flag to weed server and weed mini commands Add the volume.tags CLI option to both the 'weed server' and 'weed mini' commands. This allows users to specify disk tags when running the combined server modes, just like they can with 'weed volume'. The flag uses the same format and description as the volume command: comma-separated tag groups per data dir with ':' separators (e.g. fast:ssd,archive). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> |
3 weeks ago |
|
|
8ec9ff4a12
|
Refactor plugin system and migrate worker runtime (#8369)
* admin: add plugin runtime UI page and route wiring * pb: add plugin gRPC contract and generated bindings * admin/plugin: implement worker registry, runtime, monitoring, and config store * admin/dash: wire plugin runtime and expose plugin workflow APIs * command: add flags to enable plugin runtime * admin: rename remaining plugin v2 wording to plugin * admin/plugin: add detectable job type registry helper * admin/plugin: add scheduled detection and dispatch orchestration * admin/plugin: prefetch job type descriptors when workers connect * admin/plugin: add known job type discovery API and UI * admin/plugin: refresh design doc to match current implementation * admin/plugin: enforce per-worker scheduler concurrency limits * admin/plugin: use descriptor runtime defaults for scheduler policy * admin/ui: auto-load first known plugin job type on page open * admin/plugin: bootstrap persisted config from descriptor defaults * admin/plugin: dedupe scheduled proposals by dedupe key * admin/ui: add job type and state filters for plugin monitoring * admin/ui: add per-job-type plugin activity summary * admin/plugin: split descriptor read API from schema refresh * admin/ui: keep plugin summary metrics global while tables are filtered * admin/plugin: retry executor reservation before timing out * admin/plugin: expose scheduler states for monitoring * admin/ui: show per-job-type scheduler states in plugin monitor * pb/plugin: rename protobuf package to plugin * admin/plugin: rename pluginRuntime wiring to plugin * admin/plugin: remove runtime naming from plugin APIs and UI * admin/plugin: rename runtime files to plugin naming * admin/plugin: persist jobs and activities for monitor recovery * admin/plugin: lease one detector worker per job type * admin/ui: show worker load from plugin heartbeats * admin/plugin: skip stale workers for detector and executor picks * plugin/worker: add plugin worker command and stream runtime scaffold * plugin/worker: implement vacuum detect and execute handlers * admin/plugin: document external vacuum plugin worker starter * command: update plugin.worker help to reflect implemented flow * command/admin: drop legacy Plugin V2 label * plugin/worker: validate vacuum job type and respect min interval * plugin/worker: test no-op detect when min interval not elapsed * command/admin: document plugin.worker external process * plugin/worker: advertise configured concurrency in hello * command/plugin.worker: add jobType handler selection * command/plugin.worker: test handler selection by job type * command/plugin.worker: persist worker id in workingDir * admin/plugin: document plugin.worker jobType and workingDir flags * plugin/worker: support cancel request for in-flight work * plugin/worker: test cancel request acknowledgements * command/plugin.worker: document workingDir and jobType behavior * plugin/worker: emit executor activity events for monitor * plugin/worker: test executor activity builder * admin/plugin: send last successful run in detection request * admin/plugin: send cancel request when detect or execute context ends * admin/plugin: document worker cancel request responsibility * admin/handlers: expose plugin scheduler states API in no-auth mode * admin/handlers: test plugin scheduler states route registration * admin/plugin: keep worker id on worker-generated activity records * admin/plugin: test worker id propagation in monitor activities * admin/dash: always initialize plugin service * command/admin: remove plugin enable flags and default to enabled * admin/dash: drop pluginEnabled constructor parameter * admin/plugin UI: stop checking plugin enabled state * admin/plugin: remove docs for plugin enable flags * admin/dash: remove unused plugin enabled check method * admin/dash: fallback to in-memory plugin init when dataDir fails * admin/plugin API: expose worker gRPC port in status * command/plugin.worker: resolve admin gRPC port via plugin status * split plugin UI into overview/configuration/monitoring pages * Update layout_templ.go * add volume_balance plugin worker handler * wire plugin.worker CLI for volume_balance job type * add erasure_coding plugin worker handler * wire plugin.worker CLI for erasure_coding job type * support multi-job handlers in plugin worker runtime * allow plugin.worker jobType as comma-separated list * admin/plugin UI: rename to Workers and simplify config view * plugin worker: queue detection requests instead of capacity reject * Update plugin_worker.go * plugin volume_balance: remove force_move/timeout from worker config UI * plugin erasure_coding: enforce local working dir and cleanup * admin/plugin UI: rename admin settings to job scheduling * admin/plugin UI: persist and robustly render detection results * admin/plugin: record and return detection trace metadata * admin/plugin UI: show detection process and decision trace * plugin: surface detector decision trace as activities * mini: start a plugin worker by default * admin/plugin UI: split monitoring into detection and execution tabs * plugin worker: emit detection decision trace for EC and balance * admin workers UI: split monitoring into detection and execution pages * plugin scheduler: skip proposals for active assigned/running jobs * admin workers UI: add job queue tab * plugin worker: add dummy stress detector and executor job type * admin workers UI: reorder tabs to detection queue execution * admin workers UI: regenerate plugin template * plugin defaults: include dummy stress and add stress tests * plugin dummy stress: rotate detection selections across runs * plugin scheduler: remove cross-run proposal dedupe * plugin queue: track pending scheduled jobs * plugin scheduler: wait for executor capacity before dispatch * plugin scheduler: skip detection when waiting backlog is high * plugin: add disk-backed job detail API and persistence * admin ui: show plugin job detail modal from job id links * plugin: generate unique job ids instead of reusing proposal ids * plugin worker: emit heartbeats on work state changes * plugin registry: round-robin tied executor and detector picks * add temporary EC overnight stress runner * plugin job details: persist and render EC execution plans * ec volume details: color data and parity shard badges * shard labels: keep parity ids numeric and color-only distinction * admin: remove legacy maintenance UI routes and templates * admin: remove dead maintenance endpoint helpers * Update layout_templ.go * remove dummy_stress worker and command support * refactor plugin UI to job-type top tabs and sub-tabs * migrate weed worker command to plugin runtime * remove plugin.worker command and keep worker runtime with metrics * update helm worker args for jobType and execution flags * set plugin scheduling defaults to global 16 and per-worker 4 * stress: fix RPC context reuse and remove redundant variables in ec_stress_runner * admin/plugin: fix lifecycle races, safe channel operations, and terminal state constants * admin/dash: randomize job IDs and fix priority zero-value overwrite in plugin API * admin/handlers: implement buffered rendering to prevent response corruption * admin/plugin: implement debounced persistence flusher and optimize BuildJobDetail memory lookups * admin/plugin: fix priority overwrite and implement bounded wait in scheduler reserve * admin/plugin: implement atomic file writes and fix run record side effects * admin/plugin: use P prefix for parity shard labels in execution plans * admin/plugin: enable parallel execution for cancellation tests * admin: refactor time.Time fields to pointers for better JSON omitempty support * admin/plugin: implement pointer-safe time assignments and comparisons in plugin core * admin/plugin: fix time assignment and sorting logic in plugin monitor after pointer refactor * admin/plugin: update scheduler activity tracking to use time pointers * admin/plugin: fix time-based run history trimming after pointer refactor * admin/dash: fix JobSpec struct literal in plugin API after pointer refactor * admin/view: add D/P prefixes to EC shard badges for UI consistency * admin/plugin: use lifecycle-aware context for schema prefetching * Update ec_volume_details_templ.go * admin/stress: fix proposal sorting and log volume cleanup errors * stress: refine ec stress runner with math/rand and collection name - Added Collection field to VolumeEcShardsDeleteRequest for correct filename construction. - Replaced crypto/rand with seeded math/rand PRNG for bulk payloads. - Added documentation for EcMinAge zero-value behavior. - Added logging for ignored errors in volume/shard deletion. * admin: return internal server error for plugin store failures Changed error status code from 400 Bad Request to 500 Internal Server Error for failures in GetPluginJobDetail to correctly reflect server-side errors. * admin: implement safe channel sends and graceful shutdown sync - Added sync.WaitGroup to Plugin struct to manage background goroutines. - Implemented safeSendCh helper using recover() to prevent panics on closed channels. - Ensured Shutdown() waits for all background operations to complete. * admin: robustify plugin monitor with nil-safe time and record init - Standardized nil-safe assignment for *time.Time pointers (CreatedAt, UpdatedAt, CompletedAt). - Ensured persistJobDetailSnapshot initializes new records correctly if they don't exist on disk. - Fixed debounced persistence to trigger immediate write on job completion. * admin: improve scheduler shutdown behavior and logic guards - Replaced brittle error string matching with explicit r.shutdownCh selection for shutdown detection. - Removed redundant nil guard in buildScheduledJobSpec. - Standardized WaitGroup usage for schedulerLoop. * admin: implement deep copy for job parameters and atomic write fixes - Implemented deepCopyGenericValue and used it in cloneTrackedJob to prevent shared state. - Ensured atomicWriteFile creates parent directories before writing. * admin: remove unreachable branch in shard classification Removed an unreachable 'totalShards <= 0' check in classifyShardID as dataShards and parityShards are already guarded. * admin: secure UI links and use canonical shard constants - Added rel="noopener noreferrer" to external links for security. - Replaced magic number 14 with erasure_coding.TotalShardsCount. - Used renderEcShardBadge for missing shard list consistency. * admin: stabilize plugin tests and fix regressions - Composed a robust plugin_monitor_test.go to handle asynchronous persistence. - Updated all time.Time literals to use timeToPtr helper. - Added explicit Shutdown() calls in tests to synchronize with debounced writes. - Fixed syntax errors and orphaned struct literals in tests. * Potential fix for code scanning alert no. 278: Slice memory allocation with excessive size value Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * Potential fix for code scanning alert no. 283: Uncontrolled data used in path expression Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * admin: finalize refinements for error handling, scheduler, and race fixes - Standardized HTTP 500 status codes for store failures in plugin_api.go. - Tracked scheduled detection goroutines with sync.WaitGroup for safe shutdown. - Fixed race condition in safeSendDetectionComplete by extracting channel under lock. - Implemented deep copy for JobActivity details. - Used defaultDirPerm constant in atomicWriteFile. * test(ec): migrate admin dockertest to plugin APIs * admin/plugin_api: fix RunPluginJobTypeAPI to return 500 for server-side detection/filter errors * admin/plugin_api: fix ExecutePluginJobAPI to return 500 for job execution failures * admin/plugin_api: limit parseProtoJSONBody request body to 1MB to prevent unbounded memory usage * admin/plugin: consolidate regex to package-level validJobTypePattern; add char validation to sanitizeJobID * admin/plugin: fix racy Shutdown channel close with sync.Once * admin/plugin: track sendLoop and recv goroutines in WorkerStream with r.wg * admin/plugin: document writeProtoFiles atomicity — .pb is source of truth, .json is human-readable only * admin/plugin: extract activityLess helper to deduplicate nil-safe OccurredAt sort comparators * test/ec: check http.NewRequest errors to prevent nil req panics * test/ec: replace deprecated ioutil/math/rand, fix stale step comment 5.1→3.1 * plugin(ec): raise default detection and scheduling throughput limits * topology: include empty disks in volume list and EC capacity fallback * topology: remove hard 10-task cap for detection planning * Update ec_volume_details_templ.go * adjust default * fix tests --------- Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> |
4 weeks ago |
|
|
19c18d827a
|
admin: fix capacity leak in maintenance system by preserving Task IDs (#8214)
* admin: fix capacity leak in maintenance system by preserving Task IDs Preserve the original TaskID generated during detection and sync task states (Assign/Complete/Retry) with ActiveTopology. This ensures that capacity reserved during task assignment is properly released when a task completes or fails, preventing 'need 9, have 0' capacity exhaustion. Fixes https://github.com/seaweedfs/seaweedfs/issues/8202 * Update weed/admin/maintenance/maintenance_queue.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update weed/admin/maintenance/maintenance_queue.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * test: rename ActiveTopologySync to TaskIDPreservation Rename the test case to more accurately reflect its scope, as suggested by the code review bot. * Add TestMaintenanceQueue_ActiveTopologySync to verify task state synchronization and capacity management * Implement task assignment rollback and add verification test * Enhance ActiveTopology.CompleteTask to support pending tasks * Populate storage impact in MaintenanceIntegration.SyncTask * Release capacity in RemoveStaleWorkers when worker becomes unavailable * Release capacity in MaintenanceManager.CancelTask when pending task is cancelled * Sync reloaded tasks with ActiveTopology in LoadTasksFromPersistence * Add verification tests for consistent capacity management lifecycle * Add TestMaintenanceQueue_RetryCapacitySync to verify capacity tracking during retries --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> |
1 month ago |
|
|
13dcf445a4
|
Fix maintenance worker panic and add EC integration tests (#8068)
* Fix nil pointer panic in maintenance worker when receiving empty task assignment When a worker requests a task and none are available, the admin server sends an empty TaskAssignment message. The worker was attempting to log the task details without checking if the TaskId was empty, causing a nil pointer dereference when accessing taskAssign.Params.VolumeId. This fix adds a check for empty TaskId before processing the assignment, preventing worker crashes and improving stability in production environments. * Add EC integration test for admin-worker maintenance system Adds comprehensive integration test that verifies the end-to-end flow of erasure coding maintenance tasks: - Admin server detects volumes needing EC encoding - Workers register and receive task assignments - EC encoding is executed and verified in master topology - File read-back validation confirms data integrity The test uses unique absolute working directories for each worker to prevent ID conflicts and ensure stable worker registration. Includes proper cleanup and process management for reliable test execution. * Improve maintenance system stability and task deduplication - Add cross-type task deduplication to prevent concurrent maintenance operations on the same volume (EC, balance, vacuum) - Implement HasAnyTask check in ActiveTopology for better coordination - Increase RequestTask timeout from 5s to 30s to prevent unnecessary worker reconnections - Add TaskTypeNone sentinel for generic task checks - Update all task detectors to use HasAnyTask for conflict prevention - Improve config persistence and schema handling * Add GitHub Actions workflow for EC integration tests Adds CI workflow that runs EC integration tests on push and pull requests to master branch. The workflow: - Triggers on changes to admin, worker, or test files - Builds the weed binary - Runs the EC integration test suite - Uploads test logs as artifacts on failure for debugging This ensures the maintenance system remains stable and worker-admin integration is validated in CI. * go version 1.24 * address comments * Update maintenance_integration.go * support seconds * ec prioritize over balancing in tests |
2 months ago |
|
|
c260e6a22e
|
Fix issue #7880: Tasks use Volume IDs instead of ip:port (#7881)
* Fix issue #7880: Tasks use Volume IDs instead of ip:port When volume servers are registered with custom IDs, tasks were attempting to connect using the ID instead of the actual ip:port address, causing connection failures. Modified task detection logic in balance, erasure coding, and vacuum tasks to resolve volume server IDs to their actual ip:port addresses using ActiveTopology information. * Use server addresses directly instead of translating from IDs Modified VolumeHealthMetrics to include ServerAddress field populated directly from topology DataNodeInfo.Address. Updated task detection logic to use addresses directly without runtime lookups. Changes: - Added ServerAddress field to VolumeHealthMetrics - Updated maintenance scanner to populate ServerAddress - Modified task detection to use ServerAddress for Node fields - Updated DestinationPlan to include TargetAddress - Removed runtime address lookups in favor of direct address usage * Address PR comments: add ServerAddress field, improve error handling - Add missing ServerAddress field to VolumeHealthMetrics struct - Add warning in vacuum detection when server not found in topology - Improve error handling in erasure coding to abort task if sources missing - Make vacuum task stricter by skipping if server not found in topology * Refactor: Extract common address resolution logic into shared utility - Created weed/worker/tasks/util/address.go with ResolveServerAddress function - Updated balance, erasure_coding, and vacuum detection to use the shared utility - Removed code duplication and improved maintainability - Consistent error handling across all task types * Fix critical issues in task address resolution - Vacuum: Require topology availability and fail if server not found (no fallback to ID) - Ensure all task types consistently fail early when topology is incomplete - Prevent creation of tasks that would fail due to missing server addresses * Address additional PR feedback - Add validation for empty addresses in ResolveServerAddress - Remove redundant serverAddress variable in vacuum detection - Improve robustness of address resolution * Improve error logging in vacuum detection - Include actual error details in log message for better diagnostics - Make error messages consistent with other task types |
3 months ago |
|
|
5b86d33c3c
|
Fix worker reconnection race condition causing context canceled errors (#7825)
* Fix worker reconnection race condition causing context canceled errors Fixes #7824 This commit fixes critical connection stability issues between admin server and workers that manifested as rapid reconnection cycles with 'context canceled' errors, particularly after 24+ hours of operation in containerized environments. Root Cause: ----------- Race condition where TWO goroutines were calling stream.Recv() on the same gRPC bidirectional stream concurrently: 1. sendRegistrationSync() started a goroutine that calls stream.Recv() 2. handleIncoming() also calls stream.Recv() in a loop Per gRPC specification, only ONE goroutine can call Recv() on a stream at a time. Concurrent Recv() calls cause undefined behavior, manifesting as 'context canceled' errors and stream corruption. The race occurred during worker reconnection: - Sometimes sendRegistrationSync goroutine read the registration response first (success) - Sometimes handleIncoming read it first, causing sendRegistrationSync to timeout - This left the stream in an inconsistent state, triggering 'context canceled' error - The error triggered rapid reconnection attempts, creating a reconnection storm Why it happened after 24 hours: Container orchestration systems (Docker Swarm/Kubernetes) periodically restart pods. Over time, workers reconnect multiple times. Each reconnection had a chance of hitting the race condition. Eventually the race manifested and caused the connection storm. Changes: -------- weed/worker/client.go: - Start handleIncoming and handleOutgoing goroutines BEFORE sending registration - Use sendRegistration() instead of sendRegistrationSync() - Ensures only ONE goroutine (handleIncoming) calls stream.Recv() - Eliminates race condition entirely weed/admin/dash/worker_grpc_server.go: - Clean up old connection when worker reconnects with same ID - Cancel old connection context to stop its goroutines - Prevents resource leaks and stale connection accumulation Impact: ------- Before: Random 'context canceled' errors during reconnection, rapid reconnection cycles, resource leaks, requires manual restart to recover After: Reliable reconnection, single Recv() goroutine, proper cleanup, stable operation over 24+ hours Testing: -------- Build verified successful with no compilation errors. How to reproduce the bug: 1. Start admin server and worker 2. Restart admin server (simulates container recreation) 3. Worker reconnects 4. Race condition may manifest, causing 'context canceled' error 5. Observe rapid reconnection cycles in logs The fix is backward compatible and requires no configuration changes. * Add MaxConnectionAge to gRPC server for Docker Swarm DNS handling - Configure MaxConnectionAge and MaxConnectionAgeGrace for gRPC server - Expand error detection in shouldInvalidateConnection for better cache invalidation - Add connection lifecycle logging for debugging * Add topology validation and nil-safety checks - Add validation guards in UpdateTopology to prevent invalid updates - Add nil-safety checks in rebuildIndexes - Add GetDiskCount method for diagnostic purposes * Fix worker registration race condition - Reorder goroutine startup in WorkerStream to prevent race conditions - Add defensive cleanup in unregisterWorker with panic-safe channel closing * Add comprehensive topology update logging - Enhance UpdateTopologyInfo with detailed logging of datacenter/node/disk counts - Add metrics logging for topology changes * Add periodic diagnostic status logging - Implement topologyStatusLoop running every 5 minutes - Add logTopologyStatus function reporting system metrics - Run as background goroutine in maintenance manager * Enhance master client connection logging - Add connection timing logs in tryConnectToMaster - Add reconnection attempt counting in KeepConnectedToMaster - Improve diagnostic visibility for connection issues * Remove unused sendRegistrationSync function - Function is no longer called after switching to asynchronous sendRegistration - Contains the problematic concurrent stream.Recv() pattern that caused race conditions - Cleanup as suggested in PR review * Clarify comment for channel closing during disconnection - Improve comment to explain why channels are closed and their effect - Make the code more self-documenting as suggested in PR review * Address code review feedback: refactor and improvements - Extract topology counting logic to shared helper function CountTopologyResources() to eliminate duplication between topology_management.go and maintenance_integration.go - Use gRPC status codes for more robust error detection in shouldInvalidateConnection(), falling back to string matching for transport-level errors - Add recover wrapper for channel close consistency in cleanupStaleConnections() to match unregisterWorker() pattern * Update grpc_client_server.go * Fix data race on lastSeen field access - Add mutex protection around conn.lastSeen = time.Now() in WorkerStream method - Ensures thread-safe access consistent with cleanupStaleConnections * Fix goroutine leaks in worker reconnection logic - Close streamExit in reconnect() before creating new connection - Close streamExit in attemptConnection() when sendRegistration fails - Prevents orphaned handleOutgoing/handleIncoming goroutines from previous connections - Ensures proper cleanup of goroutines competing for shared outgoing channel * Minor cleanup improvements for consistency and clarity - Remove redundant string checks in shouldInvalidateConnection that overlap with gRPC status codes - Add recover block to Stop() method for consistency with other channel close operations - Maintains valuable DNS and transport-specific error detection while eliminating redundancy * Improve topology update error handling - Return descriptive errors instead of silently preserving topology for invalid updates - Change nil topologyInfo case to return 'rejected invalid topology update: nil topologyInfo' - Change empty DataCenterInfos case to return 'rejected invalid topology update: empty DataCenterInfos (had X nodes, Y disks)' - Keep existing glog.Warningf calls but append error details to logs before returning errors - Allows callers to distinguish rejected updates and handle them appropriately * Refactor safe channel closing into helper method - Add safeCloseOutgoingChannel helper method to eliminate code duplication - Replace repeated recover blocks in Stop, unregisterWorker, and cleanupStaleConnections - Improves maintainability and ensures consistent error handling across all channel close operations - Maintains same panic recovery behavior with contextual source identification * Make connection invalidation string matching case-insensitive - Convert error string to lowercase once for all string.Contains checks - Improves robustness by catching error message variations from different sources - Eliminates need for separate 'DNS resolution' and 'dns' checks - Maintains same error detection coverage with better reliability * Clean up warning logs in UpdateTopology to avoid duplicating error text - Remove duplicated error phrases from glog.Warningf messages - Keep concise contextual warnings that don't repeat the fmt.Errorf content - Maintain same error returns for backward compatibility * Add robust validation to prevent topology wipeout during master restart - Reject topology updates with 0 nodes when current topology has nodes - Prevents transient empty topology from overwriting valid state - Improves resilience during master restart scenarios - Maintains backward compatibility for legitimate empty topology updates |
3 months ago |
|
|
25bbf4c3d4
|
Admin UI: Fetch task logs (#7114)
* show task details * loading tasks * task UI works * generic rendering * rendering the export link * removing placementConflicts from task parameters * remove TaskSourceLocation * remove "Server ID" column * rendering balance task source * sources and targets * fix ec task generation * move info * render timeline * simplified worker id * simplify * read task logs from worker * isValidTaskID * address comments * Update weed/worker/tasks/balance/execution.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update weed/worker/tasks/erasure_coding/ec_task.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update weed/worker/tasks/task_log_handler.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix shard ids * plan distributing shard id * rendering planned shards in task details * remove Conflicts * worker logs correctly * pass in dc and rack * task logging * Update weed/admin/maintenance/maintenance_queue.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * display log details * logs have fields now * sort field keys * fix link * fix collection filtering * avoid hard coded ec shard counts --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> |
7 months ago |
|
|
0ecb466eda
|
Admin: refactoring active topology (#7073)
* refactoring * add ec shard size * address comments * passing task id There seems to be a disconnect between the pending tasks created in ActiveTopology and the TaskDetectionResult returned by this function. A taskID is generated locally and used to create pending tasks via AddPendingECShardTask, but this taskID is not stored in the TaskDetectionResult or passed along in any way. This makes it impossible for the worker that eventually executes the task to know which pending task in ActiveTopology it corresponds to. Without the correct taskID, the worker cannot call AssignTask or CompleteTask on the master, breaking the entire task lifecycle and capacity management feature. A potential solution is to add a TaskID field to TaskDetectionResult and worker_pb.TaskParams, ensuring the ID is propagated from detection to execution. * 1 source multiple destinations * task supports multi source and destination * ec needs to clean up previous shards * use erasure coding constants * getPlanningCapacityUnsafe getEffectiveAvailableCapacityUnsafe should return StorageSlotChange for calculation * use CanAccommodate to calculate * remove dead code * address comments * fix Mutex Copying in Protobuf Structs * use constants * fix estimatedSize The calculation for estimatedSize only considers source.EstimatedSize and dest.StorageChange, but omits dest.EstimatedSize. The TaskDestination struct has an EstimatedSize field, which seems to be ignored here. This could lead to an incorrect estimation of the total size of data involved in tasks on a disk. The loop should probably also include estimatedSize += dest.EstimatedSize. * at.assignTaskToDisk(task) * refactoring * Update weed/admin/topology/internal.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * fail fast * fix compilation * Update weed/worker/tasks/erasure_coding/detection.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * indexes for volume and shard locations * dedup with ToVolumeSlots * return an additional boolean to indicate success, or an error * Update abstract_sql_store.go * fix * Update weed/worker/tasks/erasure_coding/detection.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update weed/admin/topology/task_management.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * faster findVolumeDisk * Update weed/worker/tasks/erasure_coding/detection.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update weed/admin/topology/storage_slot_test.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * refactor * simplify * remove unused GetDiskStorageImpact function * refactor * add comments * Update weed/admin/topology/storage_impact.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update weed/admin/topology/storage_slot_test.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update storage_impact.go * AddPendingTask The unified AddPendingTask function now serves as the single entry point for all task creation, successfully consolidating the previously separate functions while maintaining full functionality and improving code organization. --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> |
8 months ago |
|
|
0975968e71
|
admin: Refactor task destination planning (#7063)
* refactor planning into task detection * refactoring worker tasks * refactor * compiles, but only balance task is registered * compiles, but has nil exception * avoid nil logger * add back ec task * setting ec log directory * implement balance and vacuum tasks * EC tasks will no longer fail with "file not found" errors * Use ReceiveFile API to send locally generated shards * distributing shard files and ecx,ecj,vif files * generate .ecx files correctly * do not mount all possible EC shards (0-13) on every destination * use constants * delete all replicas * rename files * pass in volume size to tasks |
8 months ago |
|
|
891a2fb6eb
|
Admin: misc improvements on admin server and workers. EC now works. (#7055)
* initial design * added simulation as tests * reorganized the codebase to move the simulation framework and tests into their own dedicated package * integration test. ec worker task * remove "enhanced" reference * start master, volume servers, filer Current Status ✅ Master: Healthy and running (port 9333) ✅ Filer: Healthy and running (port 8888) ✅ Volume Servers: All 6 servers running (ports 8080-8085) 🔄 Admin/Workers: Will start when dependencies are ready * generate write load * tasks are assigned * admin start wtih grpc port. worker has its own working directory * Update .gitignore * working worker and admin. Task detection is not working yet. * compiles, detection uses volumeSizeLimitMB from master * compiles * worker retries connecting to admin * build and restart * rendering pending tasks * skip task ID column * sticky worker id * test canScheduleTaskNow * worker reconnect to admin * clean up logs * worker register itself first * worker can run ec work and report status but: 1. one volume should not be repeatedly worked on. 2. ec shards needs to be distributed and source data should be deleted. * move ec task logic * listing ec shards * local copy, ec. Need to distribute. * ec is mostly working now * distribution of ec shards needs improvement * need configuration to enable ec * show ec volumes * interval field UI component * rename * integration test with vauuming * garbage percentage threshold * fix warning * display ec shard sizes * fix ec volumes list * Update ui.go * show default values * ensure correct default value * MaintenanceConfig use ConfigField * use schema defined defaults * config * reduce duplication * refactor to use BaseUIProvider * each task register its schema * checkECEncodingCandidate use ecDetector * use vacuumDetector * use volumeSizeLimitMB * remove remove * remove unused * refactor * use new framework * remove v2 reference * refactor * left menu can scroll now * The maintenance manager was not being initialized when no data directory was configured for persistent storage. * saving config * Update task_config_schema_templ.go * enable/disable tasks * protobuf encoded task configurations * fix system settings * use ui component * remove logs * interface{} Reduction * reduce interface{} * reduce interface{} * avoid from/to map * reduce interface{} * refactor * keep it DRY * added logging * debug messages * debug level * debug * show the log caller line * use configured task policy * log level * handle admin heartbeat response * Update worker.go * fix EC rack and dc count * Report task status to admin server * fix task logging, simplify interface checking, use erasure_coding constants * factor in empty volume server during task planning * volume.list adds disk id * track disk id also * fix locking scheduled and manual scanning * add active topology * simplify task detector * ec task completed, but shards are not showing up * implement ec in ec_typed.go * adjust log level * dedup * implementing ec copying shards and only ecx files * use disk id when distributing ec shards 🎯 Planning: ActiveTopology creates DestinationPlan with specific TargetDisk 📦 Task Creation: maintenance_integration.go creates ECDestination with DiskId 🚀 Task Execution: EC task passes DiskId in VolumeEcShardsCopyRequest 💾 Volume Server: Receives disk_id and stores shards on specific disk (vs.store.Locations[req.DiskId]) 📂 File System: EC shards and metadata land in the exact disk directory planned * Delete original volume from all locations * clean up existing shard locations * local encoding and distributing * Update docker/admin_integration/EC-TESTING-README.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * check volume id range * simplify * fix tests * fix types * clean up logs and tests --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> |
8 months ago |