* fix: check freeEcSlot before evacuating EC shards to prevent data loss
Related to #7619
The moveAwayOneEcVolume function was missing the freeEcSlot check that
exists in other EC shard placement functions. This could cause EC shards
to be moved to volume servers that have no capacity, resulting in:
1. 0-byte shard files when disk is full
2. Data loss when source shards are deleted after 'successful' copy
Changes:
- Add freeEcSlot check before attempting to move EC shards
- Sort destinations by both shard count and free slots
- Refresh topology during evacuation to get updated slot counts
- Log when nodes are skipped due to no free slots
- Update freeEcSlot count after successful moves
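A minimal sketch of the capacity check and destination ordering described above, using simplified stand-in types (EcNode here is illustrative, not the actual shell struct):

```go
package sketch

import (
	"log"
	"sort"
)

// EcNode is a simplified stand-in for a candidate volume server.
type EcNode struct {
	id         string
	shardCount int
	freeEcSlot int
}

// pickDestination sorts candidates by shard count (ascending) and free EC
// slots (descending), then returns the first node that still has capacity.
// It returns nil when every candidate is full, so the caller can fail the
// move instead of copying a shard onto a server with no room for it.
func pickDestination(candidates []*EcNode) *EcNode {
	sort.Slice(candidates, func(i, j int) bool {
		if candidates[i].shardCount != candidates[j].shardCount {
			return candidates[i].shardCount < candidates[j].shardCount
		}
		return candidates[i].freeEcSlot > candidates[j].freeEcSlot
	})
	for _, node := range candidates {
		if node.freeEcSlot > 0 {
			return node
		}
		log.Printf("skipping %s: no free EC slots", node.id)
	}
	return nil
}
```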
* fix: clarify comment wording per CodeRabbit review
The comment said 'after each move', but the check actually runs before
moveAwayOneEcVolume is called. Updated to 'before moving each EC volume'
for accuracy.
* fix: collect topology once and track capacity changes locally
Remove the topology refresh within the loop as it gives a false sense
of correctness - the refreshed topology could still be stale (minutes old).
Instead, we:
1. Collect topology once at the start
2. Track capacity changes ourselves via freeEcSlot decrement after each move
This is more accurate because we know exactly what moves we've made,
rather than relying on potentially stale topology refreshes.
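In sketch form, with illustrative types and a moveShard stand-in for the real RPC call, the bookkeeping looks like this:

```go
package sketch

import "fmt"

// destination is a minimal stand-in for a candidate volume server.
type destination struct {
	id         string
	freeEcSlot int
}

// evacuateVolumes illustrates the bookkeeping described above: the caller
// collects topology (the destinations slice) once, and after every
// successful shard move the free-slot count is decremented locally, so
// later placements reflect the moves already made instead of a possibly
// stale topology refresh.
func evacuateVolumes(shardsPerVolume map[uint32]int, destinations []*destination,
	moveShard func(volumeId uint32, dst *destination) error) error {

	pick := func() *destination {
		var best *destination
		for _, d := range destinations {
			if d.freeEcSlot > 0 && (best == nil || d.freeEcSlot > best.freeEcSlot) {
				best = d
			}
		}
		return best
	}

	for vid, shardCount := range shardsPerVolume {
		for i := 0; i < shardCount; i++ {
			dst := pick()
			if dst == nil {
				return fmt.Errorf("no destination with free EC slots for volume %d", vid)
			}
			if err := moveShard(vid, dst); err != nil {
				return err
			}
			dst.freeEcSlot-- // track the capacity change ourselves
		}
	}
	return nil
}
```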
* fix: ensure partial EC volume moves are reported as failures
Set hasMoved=false when a shard fails to move, even if previous shards
succeeded. This prevents the caller from incorrectly assuming the entire
volume was evacuated, which could lead to data loss if the source server
is decommissioned based on this incorrect status.
* fix: also reset hasMoved on moveMountedShardToEcNode error
Same issue as the previous fix: if moveMountedShardToEcNode fails
after some shards succeeded, hasMoved would incorrectly be true.
Ensure partial moves are always reported as failures.
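Both fixes boil down to the same rule, sketched here with a hypothetical moveShard callback: hasMoved may only stay true if every shard of the volume made it across.

```go
package sketch

// moveAllShards mirrors the hasMoved handling described above: any failure,
// even after earlier shards succeeded, resets hasMoved to false so the
// caller never treats a partially evacuated volume as fully moved and
// decommissions the source server on that basis.
func moveAllShards(shardIds []uint32, moveShard func(shardId uint32) error) (hasMoved bool, err error) {
	for _, shardId := range shardIds {
		if err = moveShard(shardId); err != nil {
			hasMoved = false // partial moves are reported as failures
			return
		}
		hasMoved = true
	}
	return
}
```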
* Add placement package for EC shard placement logic
- Consolidate EC shard placement algorithm for reuse across shell and worker tasks
- Support multi-pass selection: racks, then servers, then disks
- Include proper spread verification and scoring functions
- Comprehensive test coverage for various cluster topologies
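A greedy sketch of the multi-pass idea, with an illustrative DiskCandidate type rather than the real placement API:

```go
package sketch

import "fmt"

// DiskCandidate is a simplified placement candidate; the real placement
// package carries more fields (data center, shard counts, and so on).
type DiskCandidate struct {
	Rack     string
	Server   string
	DiskId   uint32
	FreeSlot int
}

// selectDestinations spreads shards across racks first, then across servers
// within a rack, then across disks within a server, using free slots as the
// final tie-breaker.
func selectDestinations(candidates []DiskCandidate, shardCount int) []DiskCandidate {
	usedRacks := map[string]int{}
	usedServers := map[string]int{}
	usedDisks := map[string]int{}
	diskKey := func(c DiskCandidate) string { return fmt.Sprintf("%s/%d", c.Server, c.DiskId) }

	// better reports whether candidate a is preferable to candidate b.
	better := func(a, b DiskCandidate) bool {
		if usedRacks[a.Rack] != usedRacks[b.Rack] {
			return usedRacks[a.Rack] < usedRacks[b.Rack]
		}
		if usedServers[a.Server] != usedServers[b.Server] {
			return usedServers[a.Server] < usedServers[b.Server]
		}
		if usedDisks[diskKey(a)] != usedDisks[diskKey(b)] {
			return usedDisks[diskKey(a)] < usedDisks[diskKey(b)]
		}
		return a.FreeSlot > b.FreeSlot
	}

	var picked []DiskCandidate
	for len(picked) < shardCount {
		best := -1
		for i, c := range candidates {
			if c.FreeSlot <= 0 {
				continue
			}
			if best < 0 || better(c, candidates[best]) {
				best = i
			}
		}
		if best < 0 {
			break // not enough free slots to place every shard
		}
		c := candidates[best]
		candidates[best].FreeSlot--
		usedRacks[c.Rack]++
		usedServers[c.Server]++
		usedDisks[diskKey(c)]++
		picked = append(picked, c)
	}
	return picked
}
```

The real package exposes this through SelectDestinations with richer scoring and spread verification; the sketch only shows the rack-then-server-then-disk ordering.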
* Make ec.balance disk-aware for multi-disk servers
- Add EcDisk struct to track individual disks on volume servers
- Update EcNode to maintain per-disk shard distribution
- Parse disk_id from EC shard information during topology collection
- Implement pickBestDiskOnNode() for selecting best disk per shard
- Add diskDistributionScore() for tie-breaking node selection
- Update all move operations to specify target disk in RPC calls
- Improves shard balance within multi-disk servers, not just across servers
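One plausible shape of the per-disk selection and tie-breaking score (the real EcDisk struct and scoring may differ):

```go
package sketch

// EcDisk tracks one disk on a volume server (illustrative fields).
type EcDisk struct {
	DiskId     uint32
	ShardCount int
	FreeSlots  int
}

// pickBestDiskOnNode chooses the disk with the fewest EC shards that still
// has a free slot, so shards spread evenly across a server's disks.
func pickBestDiskOnNode(disks []EcDisk) (EcDisk, bool) {
	var best EcDisk
	found := false
	for _, d := range disks {
		if d.FreeSlots <= 0 {
			continue
		}
		if !found || d.ShardCount < best.ShardCount {
			best, found = d, true
		}
	}
	return best, found
}

// diskDistributionScore is one possible tie-breaker between otherwise equal
// servers: a smaller spread between the most and least loaded disks means
// the server can absorb another shard with less internal imbalance.
func diskDistributionScore(disks []EcDisk) int {
	if len(disks) == 0 {
		return 0
	}
	minCount, maxCount := disks[0].ShardCount, disks[0].ShardCount
	for _, d := range disks[1:] {
		if d.ShardCount < minCount {
			minCount = d.ShardCount
		}
		if d.ShardCount > maxCount {
			maxCount = d.ShardCount
		}
	}
	return maxCount - minCount
}
```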
* Use placement package in EC detection for consistent disk-level placement
- Replace custom EC disk selection logic with shared placement package
- Convert topology DiskInfo to placement.DiskCandidate format
- Use SelectDestinations() for multi-rack/server/disk spreading
- Convert placement results back to topology DiskInfo for task creation
- Ensures EC detection uses same placement logic as shell commands
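The conversion layer is a thin adapter; reusing the illustrative DiskCandidate from the placement sketch above, it amounts to something like:

```go
// diskRecord is an illustrative stand-in for the topology's per-disk info
// (not the real master_pb.DiskInfo).
type diskRecord struct {
	Rack, Server string
	DiskId       uint32
	FreeEcSlots  int
}

// toCandidates maps topology disk records into placement candidates so
// detection and shell commands run through the same selection logic; the
// chosen candidates are mapped back to topology disks by (Server, DiskId).
func toCandidates(records []diskRecord) []DiskCandidate {
	out := make([]DiskCandidate, 0, len(records))
	for _, r := range records {
		out = append(out, DiskCandidate{
			Rack:     r.Rack,
			Server:   r.Server,
			DiskId:   r.DiskId,
			FreeSlot: r.FreeEcSlots,
		})
	}
	return out
}
```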
* Make volume server evacuation disk-aware
- Use pickBestDiskOnNode() when selecting evacuation target disk
- Specify target disk in evacuation RPC requests
- Maintains balanced disk distribution during server evacuations
* Rename PlacementConfig to PlacementRequest for clarity
PlacementRequest better reflects that this is a request for placement
rather than a configuration object. This improves API semantics.
* Rename DefaultConfig to DefaultPlacementRequest
Aligns with the PlacementRequest type naming for consistency
* Address review comments from Gemini and CodeRabbit
Fix HIGH issues:
- Fix empty disk discovery: Now discovers all disks from VolumeInfos,
not just from EC shards. This ensures disks without EC shards are
still considered for placement.
- Fix EC shard count calculation in detection.go: Now correctly filters
by DiskId and sums actual shard counts using ShardBits.ShardIdCount()
instead of just counting EcShardInfo entries.
Fix MEDIUM issues:
- Add disk ID to evacuation log messages for consistency with other logging
- Remove unused serverToDisks variable in placement.go
- Fix comment that incorrectly said 'ascending' when sorting is 'descending'
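In sketch form, the corrected per-disk shard count (field names are illustrative, not the exact protobuf message; the real code uses ShardBits.ShardIdCount()):

```go
package sketch

import "math/bits"

// ecShardInfo is an illustrative stand-in for the EC shard information
// message; ShardBits is a bitmap with one bit per shard present.
type ecShardInfo struct {
	DiskId    uint32
	ShardBits uint32
}

// countShardsOnDisk filters shard records by disk and sums the number of
// set bits, i.e. the actual shard count, rather than counting records.
func countShardsOnDisk(infos []ecShardInfo, diskId uint32) int {
	total := 0
	for _, info := range infos {
		if info.DiskId != diskId {
			continue
		}
		total += bits.OnesCount32(info.ShardBits)
	}
	return total
}
```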
* add ec tests
* Update ec-integration-tests.yml
* Update ec_integration_test.go
* Fix EC integration tests CI: build weed binary and update actions
- Add 'Build weed binary' step before running tests
- Update actions/setup-go from v4 to v6 (Node20 compatibility)
- Update actions/checkout from v2 to v4 (Node20 compatibility)
- Move working-directory to test step only
* Add disk-aware EC rebalancing integration tests
- Add TestDiskAwareECRebalancing test with multi-disk cluster setup
- Test EC encode with disk awareness (shows disk ID in output)
- Test EC balance with disk-level shard distribution
- Add helper functions for disk-level verification:
- startMultiDiskCluster: 3 servers x 4 disks each
- countShardsPerDisk: track shards per disk per server
- calculateDiskShardVariance: measure distribution balance
- Verify no single disk is overloaded with shards
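A sketch of the balance metric those helpers compute, with an illustrative result shape (server, then disk id, then shard count):

```go
package sketch

// shardCounts indexes shard counts by server address, then by disk id,
// roughly the shape countShardsPerDisk would return.
type shardCounts map[string]map[uint32]int

// diskShardVariance computes the population variance of per-disk shard
// counts across the whole cluster; a low value means no single disk is
// overloaded relative to the others.
func diskShardVariance(counts shardCounts) float64 {
	var all []float64
	for _, disks := range counts {
		for _, n := range disks {
			all = append(all, float64(n))
		}
	}
	if len(all) == 0 {
		return 0
	}
	var mean float64
	for _, v := range all {
		mean += v
	}
	mean /= float64(len(all))
	var variance float64
	for _, v := range all {
		variance += (v - mean) * (v - mean)
	}
	return variance / float64(len(all))
}
```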
* Unify the parameter to disable dry-run on weed shell commands to --apply (instead of --force).
* lint
* refactor
* Correct execution order
* Handle deprecated --force flag
* Fix help messages
* Refactoring: use flag.FlagSet.Visit()
* Consistent with other commands
* Check for both flags
* Fix TOML files
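A sketch of how the deprecated flag can be honored next to the new one, using flag.FlagSet.Visit(), which only visits flags the user explicitly set (the surrounding command wiring here is illustrative):

```go
package sketch

import (
	"flag"
	"fmt"
)

// resolveApply returns whether the command should skip dry-run mode. It
// honors the new -apply flag, but also checks whether the deprecated
// -force flag was explicitly provided, via FlagSet.Visit.
func resolveApply(fs *flag.FlagSet, apply, force *bool) bool {
	forceSet := false
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "force" {
			forceSet = true
		}
	})
	if forceSet {
		fmt.Println("-force is deprecated, please use -apply instead")
	}
	return *apply || (forceSet && *force)
}
```

A command registers both -apply and the deprecated -force on its FlagSet, parses, and then calls a helper like this before deciding whether to stay in dry-run mode.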
---------
Co-authored-by: chrislu <chris.lu@gmail.com>