Update pickBestDiskOnNode to accept a strictDiskType parameter:
- strictDiskType=true (balancing): Only use disks of matching type.
This maintains storage tier isolation during normal rebalancing.
- strictDiskType=false (evacuation): Prefer same disk type, but
fall back to other disk types if no matching disk is available.
This ensures evacuation can complete even when same-type capacity
is insufficient.
Priority order for evacuation:
1. Same disk type with lowest shard count (preferred)
2. Different disk type with lowest shard count (fallback)
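The strict and fallback modes above can be sketched as follows. This is a simplified illustration, not the actual pickBestDiskOnNode: the `disk` struct, its fields, and the `pickBestDisk` name are hypothetical, and shard count is the only criterion considered.

```go
package main

// disk is a hypothetical, simplified stand-in for a per-disk record on a node.
type disk struct {
	id         uint32
	diskType   string // e.g. "hdd" or "ssd"
	shardCount int
}

// pickBestDisk returns the disk with the fewest shards, preferring disks of
// the wanted type. With strict=true it never falls back to another type.
func pickBestDisk(disks []disk, want string, strict bool) (disk, bool) {
	var same, other disk
	haveSame, haveOther := false, false
	for _, d := range disks {
		if d.diskType == want {
			// Track the least-loaded disk of the wanted type.
			if !haveSame || d.shardCount < same.shardCount {
				same, haveSame = d, true
			}
		} else if !haveOther || d.shardCount < other.shardCount {
			// Track the least-loaded disk of any other type as a fallback.
			other, haveOther = d, true
		}
	}
	if haveSame {
		return same, true
	}
	if !strict && haveOther {
		return other, true
	}
	return disk{}, false
}
```

In this sketch, balancing would call it with strict=true and evacuation with strict=false.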
When evacuating or rebalancing EC shards, pickBestDiskOnNode now
filters disks by the target disk type. This ensures:
1. EC shards from SSD disks are moved to SSD disks on destination nodes
2. EC shards from HDD disks are moved to HDD disks on destination nodes
3. No cross-disk-type shard movement occurs
This maintains the storage tier isolation when moving EC shards
between nodes during evacuation or rebalancing operations.
Remove -diskType flag from volumeServer.evacuate since evacuation
should move all EC volumes regardless of disk type.
The command now iterates over all disk types (HDD, SSD) and evacuates
EC shards from each, moving them to destination nodes with matching
disk types.
Address code review comments:
1. Fix variable shadowing in collectEcVolumeServersByDc():
- Rename the loop variables named 'diskType' to 'diskTypeKey' and
'diskTypeStr' so they no longer shadow the function parameter
2. Fix hardcoded HardDriveType in ecBalancer methods:
- balanceEcRack(): use ecb.diskType instead of types.HardDriveType
- collectVolumeIdToEcNodes(): use ecb.diskType
3. Add -diskType flag to ec.rebuild command:
- Add diskType field to ecRebuilder struct
- Pass diskType to collectEcNodes() and addEcVolumeShards()
4. Add -diskType flag to volumeServer.evacuate command:
- Add diskType field to commandVolumeServerEvacuate struct
- Pass diskType to collectEcVolumeServersByDc() and moveMountedShardToEcNode()
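The shadowing problem from item 1 above can be illustrated with a minimal sketch. The function and map layout here are hypothetical and are not the code in collectEcVolumeServersByDc().

```go
package main

// countDisksOfType counts the disks of the requested type in a map keyed by
// disk-type name. All names here are hypothetical.
func countDisksOfType(disksByType map[string][]uint32, diskType string) int {
	total := 0
	// If the loop variable were also named diskType, it would shadow the
	// parameter: the filter below would compare the key with itself and
	// never skip anything. A distinct name keeps the parameter visible.
	for diskTypeKey, ids := range disksByType {
		if diskTypeKey != diskType {
			continue
		}
		total += len(ids)
	}
	return total
}
```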
Update the following functions to accept/use diskType parameter:
- findEcVolumeShards()
- addEcVolumeShards()
- deleteEcVolumeShards()
- moveMountedShardToEcNode()
- countShardsByRack()
- pickNEcShardsToMoveFrom()
All ecBalancer methods now use ecb.diskType instead of hardcoded
types.HardDriveType. Non-ecBalancer callers (like volumeServer.evacuate
and ec.rebuild) use types.HardDriveType as the default.
Update all test files to pass diskType where needed.
Add diskType parameter to:
- ecBalancer struct
- collectEcVolumeServersByDc()
- collectEcNodesForDC()
- collectEcNodes()
- EcBalance()
This allows EC operations to target specific disk types (hdd, ssd, etc.)
instead of being hardcoded to HardDriveType only.
For backward compatibility, all callers currently pass types.HardDriveType
as the default value. Subsequent commits will add -diskType flags to
the individual EC commands.
* Add placement package for EC shard placement logic
- Consolidate EC shard placement algorithm for reuse across shell and worker tasks
- Support multi-pass selection: racks, then servers, then disks
- Include proper spread verification and scoring functions
- Comprehensive test coverage for various cluster topologies
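A rough sketch of the multi-pass idea (racks, then servers, then disks) is shown below. It is an illustration under stated assumptions, not the placement package's API: the `candidate` type and `selectSpread` function are hypothetical, server names are assumed to be unique across racks, and candidates are ordered only by shard count.

```go
package main

import "sort"

// candidate is a hypothetical description of one disk that could host a shard.
type candidate struct {
	rack, server string
	diskID       uint32
	shardCount   int
}

type diskKey struct {
	server string
	diskID uint32
}

// selectSpread picks up to n disks, spreading across racks first, then
// across servers, and finally across any remaining disks.
func selectSpread(cands []candidate, n int) []candidate {
	// Work on a copy, least-loaded disks first.
	sorted := append([]candidate(nil), cands...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].shardCount < sorted[j].shardCount
	})

	usedRacks := map[string]bool{}
	usedServers := map[string]bool{}
	usedDisks := map[diskKey]bool{}
	var picked []candidate

	// Each pass relaxes the constraint of the previous one.
	passes := []func(c candidate) bool{
		func(c candidate) bool { return !usedRacks[c.rack] },     // new rack
		func(c candidate) bool { return !usedServers[c.server] }, // new server
		func(c candidate) bool { return true },                   // any unused disk
	}
	for _, eligible := range passes {
		for _, c := range sorted {
			if len(picked) == n {
				return picked
			}
			key := diskKey{c.server, c.diskID}
			if usedDisks[key] || !eligible(c) {
				continue
			}
			usedRacks[c.rack] = true
			usedServers[c.server] = true
			usedDisks[key] = true
			picked = append(picked, c)
		}
	}
	return picked
}
```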
* Make ec.balance disk-aware for multi-disk servers
- Add EcDisk struct to track individual disks on volume servers
- Update EcNode to maintain per-disk shard distribution
- Parse disk_id from EC shard information during topology collection
- Implement pickBestDiskOnNode() for selecting best disk per shard
- Add diskDistributionScore() for tie-breaking node selection
- Update all move operations to specify target disk in RPC calls
- Improves shard balance within multi-disk servers, not just across servers
* Use placement package in EC detection for consistent disk-level placement
- Replace custom EC disk selection logic with shared placement package
- Convert topology DiskInfo to placement.DiskCandidate format
- Use SelectDestinations() for multi-rack/server/disk spreading
- Convert placement results back to topology DiskInfo for task creation
- Ensures EC detection uses same placement logic as shell commands
* Make volume server evacuation disk-aware
- Use pickBestDiskOnNode() when selecting evacuation target disk
- Specify target disk in evacuation RPC requests
- Maintains balanced disk distribution during server evacuations
* Rename PlacementConfig to PlacementRequest for clarity
PlacementRequest better reflects that this is a request for placement
rather than a configuration object. This improves API semantics.
* Rename DefaultConfig to DefaultPlacementRequest
Aligns with the PlacementRequest type naming for consistency
* Address review comments from Gemini and CodeRabbit
Fix HIGH issues:
- Fix empty disk discovery: Now discovers all disks from VolumeInfos,
not just from EC shards. This ensures disks without EC shards are
still considered for placement.
- Fix EC shard count calculation in detection.go: Now correctly filters
by DiskId and sums actual shard counts using ShardBits.ShardIdCount()
instead of just counting EcShardInfo entries.
Fix MEDIUM issues:
- Add disk ID to evacuation log messages for consistency with other logging
- Remove unused serverToDisks variable in placement.go
- Fix comment that incorrectly said 'ascending' when sorting is 'descending'
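The shard count fix described above (summing shards per entry instead of counting entries) can be sketched like this. The `ecShardInfo` struct and `countShardsOnDisk` function are hypothetical; the sketch assumes each entry carries a bitmask in which bit i marks shard i as present, and uses a population count in place of ShardBits.ShardIdCount().

```go
package main

import "math/bits"

// ecShardInfo is a hypothetical, simplified EC shard entry.
type ecShardInfo struct {
	diskID    uint32
	shardBits uint32 // assumed layout: bit i set means shard i is present
}

// countShardsOnDisk sums the shards held on one disk. Counting entries
// instead would undercount, because one entry can hold several shards.
func countShardsOnDisk(infos []ecShardInfo, diskID uint32) int {
	total := 0
	for _, info := range infos {
		if info.diskID != diskID {
			continue // filter by disk ID first
		}
		total += bits.OnesCount32(info.shardBits)
	}
	return total
}
```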
* Add EC tests
* Update ec-integration-tests.yml
* Update ec_integration_test.go
* Fix EC integration tests CI: build weed binary and update actions
- Add 'Build weed binary' step before running tests
- Update actions/setup-go from v4 to v6 (Node20 compatibility)
- Update actions/checkout from v2 to v4 (Node20 compatibility)
- Move working-directory to test step only
* Add disk-aware EC rebalancing integration tests
- Add TestDiskAwareECRebalancing test with multi-disk cluster setup
- Test EC encode with disk awareness (shows disk ID in output)
- Test EC balance with disk-level shard distribution
- Add helper functions for disk-level verification:
- startMultiDiskCluster: 3 servers x 4 disks each
- countShardsPerDisk: track shards per disk per server
- calculateDiskShardVariance: measure distribution balance
- Verify no single disk is overloaded with shards
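One way a helper such as calculateDiskShardVariance could measure balance is the population variance of per-disk shard counts. The sketch below is an assumption about the approach, with a hypothetical function name and signature, not the test helper itself.

```go
package main

// shardVariance returns the population variance of per-disk shard counts.
// A value of 0 means every disk holds the same number of shards.
func shardVariance(counts []int) float64 {
	if len(counts) == 0 {
		return 0
	}
	sum := 0
	for _, c := range counts {
		sum += c
	}
	mean := float64(sum) / float64(len(counts))

	var squared float64
	for _, c := range counts {
		d := float64(c) - mean
		squared += d * d
	}
	return squared / float64(len(counts))
}
```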
* Unify the flag that disables dry-run in weed shell commands as --apply (replacing --force).
* lint
* refactor
* Correct execution order
* handle deprecated force flag
* fix help messages
* Refactoring: use flag.FlagSet.Visit()
* consistent with other commands
* Check for both flags
* fix toml files
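The deprecated force flag handling with flag.FlagSet.Visit() mentioned above can be sketched as follows. The `parseApply` function and the flag descriptions are hypothetical and this is not the code of any specific weed shell command. Visit only walks flags that were explicitly set, so it can tell "-force given" apart from "-force left at its default".

```go
package main

import (
	"flag"
	"io"
)

// parseApply parses args for a hypothetical command that accepts -apply and
// a deprecated -force alias. It reports whether changes should be applied
// and whether the deprecated flag was explicitly set.
func parseApply(args []string) (apply bool, usedDeprecated bool, err error) {
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr in this sketch
	applyFlag := fs.Bool("apply", false, "apply the changes instead of a dry run")
	forceFlag := fs.Bool("force", false, "deprecated: use -apply")
	if err = fs.Parse(args); err != nil {
		return false, false, err
	}
	// Visit calls the function only for flags set on the command line.
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "force" {
			usedDeprecated = true
		}
	})
	return *applyFlag || *forceFlag, usedDeprecated, nil
}
```

A caller could print a deprecation notice when usedDeprecated is true while still honoring the old flag.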
---------
Co-authored-by: chrislu <chris.lu@gmail.com>