seaweedfs

Commit Graph

Author	SHA1	Message	Date
Chris Lu	f9b4a4c396	fix: check freeEcSlot before evacuating EC shards to prevent data loss (#7621) * fix: check freeEcSlot before evacuating EC shards to prevent data loss Related to #7619 The moveAwayOneEcVolume function was missing the freeEcSlot check that exists in other EC shard placement functions. This could cause EC shards to be moved to volume servers that have no capacity, resulting in: 1. 0-byte shard files when disk is full 2. Data loss when source shards are deleted after 'successful' copy Changes: - Add freeEcSlot check before attempting to move EC shards - Sort destinations by both shard count and free slots - Refresh topology during evacuation to get updated slot counts - Log when nodes are skipped due to no free slots - Update freeEcSlot count after successful moves * fix: clarify comment wording per CodeRabbit review The comment stated 'after each move' but the code executes before calling moveAwayOneEcVolume. Updated to 'before moving each EC volume' for accuracy. * fix: collect topology once and track capacity changes locally Remove the topology refresh within the loop as it gives a false sense of correctness - the refreshed topology could still be stale (minutes old). Instead, we: 1. Collect topology once at the start 2. Track capacity changes ourselves via freeEcSlot decrement after each move This is more accurate because we know exactly what moves we've made, rather than relying on potentially stale topology refreshes. * fix: ensure partial EC volume moves are reported as failures Set hasMoved=false when a shard fails to move, even if previous shards succeeded. This prevents the caller from incorrectly assuming the entire volume was evacuated, which could lead to data loss if the source server is decommissioned based on this incorrect status. * fix: also reset hasMoved on moveMountedShardToEcNode error Same issue as the previous fix: if moveMountedShardToEcNode fails after some shards succeeded, hasMoved would incorrectly be true. Ensure partial moves are always reported as failures.	1 week ago
chrislusf	2b089065d6	ec: allow disk type fallback during evacuation Update pickBestDiskOnNode to accept a strictDiskType parameter: - strictDiskType=true (balancing): Only use disks of matching type. This maintains storage tier isolation during normal rebalancing. - strictDiskType=false (evacuation): Prefer same disk type, but fall back to other disk types if no matching disk is available. This ensures evacuation can complete even when same-type capacity is insufficient. Priority order for evacuation: 1. Same disk type with lowest shard count (preferred) 2. Different disk type with lowest shard count (fallback)	1 week ago
chrislusf	0d67470112	ec: filter disk selection by disk type in pickBestDiskOnNode When evacuating or rebalancing EC shards, pickBestDiskOnNode now filters disks by the target disk type. This ensures: 1. EC shards from SSD disks are moved to SSD disks on destination nodes 2. EC shards from HDD disks are moved to HDD disks on destination nodes 3. No cross-disk-type shard movement occurs This maintains the storage tier isolation when moving EC shards between nodes during evacuation or rebalancing operations.	1 week ago
chrislusf	5d85a424c5	volumeServer.evacuate: evacuate EC volumes from all disk types Remove -diskType flag from volumeServer.evacuate since evacuation should move all EC volumes regardless of disk type. The command now iterates over all disk types (HDD, SSD) and evacuates EC shards from each, moving them to destination nodes with matching disk types.	1 week ago
chrislusf	dc8a0fdf77	ec: fix variable shadowing and add -diskType to ec.rebuild and volumeServer.evacuate Address code review comments: 1. Fix variable shadowing in collectEcVolumeServersByDc(): - Rename loop variable 'diskType' to 'diskTypeKey' and 'diskTypeStr' to avoid shadowing the function parameter 2. Fix hardcoded HardDriveType in ecBalancer methods: - balanceEcRack(): use ecb.diskType instead of types.HardDriveType - collectVolumeIdToEcNodes(): use ecb.diskType 3. Add -diskType flag to ec.rebuild command: - Add diskType field to ecRebuilder struct - Pass diskType to collectEcNodes() and addEcVolumeShards() 4. Add -diskType flag to volumeServer.evacuate command: - Add diskType field to commandVolumeServerEvacuate struct - Pass diskType to collectEcVolumeServersByDc() and moveMountedShardToEcNode()	1 week ago
chrislusf	90b134f830	ec: update helper functions to use configurable diskType Update the following functions to accept/use diskType parameter: - findEcVolumeShards() - addEcVolumeShards() - deleteEcVolumeShards() - moveMountedShardToEcNode() - countShardsByRack() - pickNEcShardsToMoveFrom() All ecBalancer methods now use ecb.diskType instead of hardcoded types.HardDriveType. Non-ecBalancer callers (like volumeServer.evacuate and ec.rebuild) use types.HardDriveType as the default. Update all test files to pass diskType where needed.	2 weeks ago
chrislusf	306bc31a28	ec: add diskType parameter to core EC functions Add diskType parameter to: - ecBalancer struct - collectEcVolumeServersByDc() - collectEcNodesForDC() - collectEcNodes() - EcBalance() This allows EC operations to target specific disk types (hdd, ssd, etc.) instead of being hardcoded to HardDriveType only. For backward compatibility, all callers currently pass types.HardDriveType as the default value. Subsequent commits will add -diskType flags to the individual EC commands.	2 weeks ago
Chris Lu	4f038820dc	Add disk-aware EC rebalancing (#7597) * Add placement package for EC shard placement logic - Consolidate EC shard placement algorithm for reuse across shell and worker tasks - Support multi-pass selection: racks, then servers, then disks - Include proper spread verification and scoring functions - Comprehensive test coverage for various cluster topologies * Make ec.balance disk-aware for multi-disk servers - Add EcDisk struct to track individual disks on volume servers - Update EcNode to maintain per-disk shard distribution - Parse disk_id from EC shard information during topology collection - Implement pickBestDiskOnNode() for selecting best disk per shard - Add diskDistributionScore() for tie-breaking node selection - Update all move operations to specify target disk in RPC calls - Improves shard balance within multi-disk servers, not just across servers * Use placement package in EC detection for consistent disk-level placement - Replace custom EC disk selection logic with shared placement package - Convert topology DiskInfo to placement.DiskCandidate format - Use SelectDestinations() for multi-rack/server/disk spreading - Convert placement results back to topology DiskInfo for task creation - Ensures EC detection uses same placement logic as shell commands * Make volume server evacuation disk-aware - Use pickBestDiskOnNode() when selecting evacuation target disk - Specify target disk in evacuation RPC requests - Maintains balanced disk distribution during server evacuations * Rename PlacementConfig to PlacementRequest for clarity PlacementRequest better reflects that this is a request for placement rather than a configuration object. This improves API semantics. * Rename DefaultConfig to DefaultPlacementRequest Aligns with the PlacementRequest type naming for consistency * Address review comments from Gemini and CodeRabbit Fix HIGH issues: - Fix empty disk discovery: Now discovers all disks from VolumeInfos, not just from EC shards. This ensures disks without EC shards are still considered for placement. - Fix EC shard count calculation in detection.go: Now correctly filters by DiskId and sums actual shard counts using ShardBits.ShardIdCount() instead of just counting EcShardInfo entries. Fix MEDIUM issues: - Add disk ID to evacuation log messages for consistency with other logging - Remove unused serverToDisks variable in placement.go - Fix comment that incorrectly said 'ascending' when sorting is 'descending' * add ec tests * Update ec-integration-tests.yml * Update ec_integration_test.go * Fix EC integration tests CI: build weed binary and update actions - Add 'Build weed binary' step before running tests - Update actions/setup-go from v4 to v6 (Node20 compatibility) - Update actions/checkout from v2 to v4 (Node20 compatibility) - Move working-directory to test step only * Add disk-aware EC rebalancing integration tests - Add TestDiskAwareECRebalancing test with multi-disk cluster setup - Test EC encode with disk awareness (shows disk ID in output) - Test EC balance with disk-level shard distribution - Add helper functions for disk-level verification: - startMultiDiskCluster: 3 servers x 4 disks each - countShardsPerDisk: track shards per disk per server - calculateDiskShardVariance: measure distribution balance - Verify no single disk is overloaded with shards	2 weeks ago
Lisandro Pin	76e4a51964	Unify the parameter to disable dry-run on weed shell commands to `-apply` (instead of `-force`). (#7450) * Unify the parameter to disable dry-run on weed shell commands to --apply (instead of --force). * lint * refactor * Execution Order Corrected * handle deprecated force flag * fix help messages * Refactoring]: Using flag.FlagSet.Visit() * consistent with other commands * Checks for both flags * fix toml files --------- Co-authored-by: chrislu <chris.lu@gmail.com>	1 month ago
chrislu	ec155022e7	"golang.org/x/exp/slices" => "slices" and go fmt	12 months ago
Konstantin Lebedev	5bddf0c085	[shell] volume.balance collect volume servers by dc rack node (#6191) * chore: balance by rack * fix: rm check lock * fix: selected racks * fix: selected nodes * fix: containts * fix: one collectVolumeServersByDcRackNode * fix: revert lock and add lock * fix: panic test * revert noLock	1 year ago
chrislu	ec30a504ba	refactor	1 year ago
chrislu	701abbb9df	add IsResourceHeavy() to command interface	1 year ago
wyang	c1bffca246	fix evacuate volume to different disk types (#5530) Co-authored-by: wyang <wyang@wyangs-Air.lan>	1 year ago
chrislu	645ae8c57b	Revert "Revert "Merge branch 'master' of https://github.com/seaweedfs/seaweedfs"" This reverts commit `8cb42c39`	2 years ago
chrislu	8cb42c39ad	Revert "Merge branch 'master' of https://github.com/seaweedfs/seaweedfs" This reverts commit `2e5aa06026`, reversing changes made to `4d414f54a2`.	2 years ago
dependabot[bot]	a04bd4d26f	Bump github.com/rclone/rclone from 1.63.1 to 1.64.0 (#4850) * Bump github.com/rclone/rclone from 1.63.1 to 1.64.0 Bumps [github.com/rclone/rclone](https://github.com/rclone/rclone) from 1.63.1 to 1.64.0. - [Release notes](https://github.com/rclone/rclone/releases) - [Changelog](https://github.com/rclone/rclone/blob/master/RELEASE.md) - [Commits](https://github.com/rclone/rclone/compare/v1.63.1...v1.64.0) --- updated-dependencies: - dependency-name: github.com/rclone/rclone dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * API changes * go mod --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com> Co-authored-by: chrislu <chris.lu@gmail.com>	2 years ago
chrislu	50dc2fe96b	cleaning variables	3 years ago
Konstantin Lebedev	e17429223e	shell script unclean variables (#4298)	3 years ago
chrislu	98dc1e5c15	move volume: find target volume server by exiting/max ratio	3 years ago
chrislu	bf88006037	format	3 years ago
chrislu	26dbc6c905	move to https://github.com/seaweedfs/seaweedfs	3 years ago
Konstantin Lebedev	c5189c343b	remove ticker update the topology before each file	3 years ago
Konstantin Lebedev	e2d991d8d0	ticker.Stop	3 years ago
Konstantin Lebedev	de4fcc0e2c	sync update topologyInfo	3 years ago
Konstantin Lebedev	fa88dff7ce	update otherNodes	3 years ago
Konstantin Lebedev	72dca31cfa	fix update topologyInfo	3 years ago
Konstantin Lebedev	884ffbafee	clouse background update	3 years ago
Konstantin Lebedev	b5e5f6f55a	update topologyInfo	3 years ago
Konstantin Lebedev	867269cdcf	help rack	3 years ago
Konstantin Lebedev	6f764e1014	volume server evacuate from rack	3 years ago
Konstantin Lebedev	ba0e3ce5fa	volume server evacuate to target server	3 years ago
Konstantin Lebedev	d3f7c09c03	remove ticker update the topology before each file	3 years ago
Konstantin Lebedev	d422e7769c	ticker.Stop	3 years ago
Konstantin Lebedev	73a0dea16b	sync update topologyInfo	3 years ago
Konstantin Lebedev	2b4112e462	update otherNodes	3 years ago
Konstantin Lebedev	3c2774ec3d	fix update topologyInfo	3 years ago
Konstantin Lebedev	4d5144e50d	clouse background update	3 years ago
Konstantin Lebedev	8372721a62	update topologyInfo	3 years ago
Konstantin Lebedev	ee95d23a22	help rack	3 years ago
Konstantin Lebedev	087fa1347f	volume server evacuate from rack	3 years ago
Konstantin Lebedev	4236c36599	volume server evacuate to target server	3 years ago
Konstantin Lebedev	5ed8165161	fix logic add option targetServer https://github.com/chrislusf/seaweedfs/issues/3255	4 years ago
chrislu	6793bc853c	help message when in simulation mode	4 years ago
justin	3551ca2fcf	enhancement: replace sort.Slice with slices.SortFunc to reduce reflection	4 years ago
chrislu	f18803424a	volume.balance: add delay during tight loop fix https://github.com/chrislusf/seaweedfs/issues/2637	4 years ago
chrislu	a2d3f89c7b	add lock messages	4 years ago
Chris Lu	119d5908dd	shell: do not need to lock to see volume -h	4 years ago
Chris Lu	7359193e97	go fmt	4 years ago
Chris Lu	2f209675ab	Added `-retry` option for `volumeServer.evacuate` related to https://github.com/chrislusf/seaweedfs/issues/2191	4 years ago


52 Commits (4e8dca098bc3e2a7f5d7cc9129cfc9cd117d0d0e)