seaweedfs

Commit Graph

Author	SHA1	Message	Date
Chris Lu	4dcd33bbc8	fix: handle missing idx file for empty volumes during copy (#7777) (#7778) When copying/evacuating empty volumes, the .idx file may not exist on disk (this is allowed by checkIdxFile for volumes with only super block in .dat). This fix: 1. Uses os.IsNotExist() instead of err == os.ErrNotExist for proper wrapped error checking in CopyFile 2. Treats missing source file as success when StopOffset == 0 (empty file) 3. Allows checkCopyFiles to pass when idx file doesn't exist but IdxFileSize == 0 (empty volume) Fixes volumeServer.evacuate and volume.fix.replication for empty volumes.	3 days ago
Chris Lu	32a9a1f46f	fix: sync EC volume files before copying to fix deleted needles not being marked when decoding (#7755) * fix: sync EC volume files before copying to fix deleted needles not being marked when decoding (#7751) When a file is deleted from an EC volume, the deletion is written to both the .ecx and .ecj files. However, these writes were not synced to disk before the files were copied during ec.decode. This caused the copied files to miss the deletion markers, resulting in 'leaked' space where deleted files were not properly tracked after decoding. This fix: 1. Adds a Sync() method to EcVolume that flushes .ecx and .ecj files to disk without closing them 2. Calls Sync() in CopyFile before copying EC volume files to ensure all deletions are visible to the copy operation Fixes #7751 * test: add integration tests for EC volume deletion sync (issue #7751) Add comprehensive tests to verify that deleted needles are properly visible after EcVolume.Sync() is called. These tests cover: 1. TestWriteIdxFileFromEcIndex_PreservesDeletedNeedles - Verifies that WriteIdxFileFromEcIndex preserves deletion markers from .ecx files when generating .idx files 2. TestWriteIdxFileFromEcIndex_ProcessesEcjJournal - Verifies that deletions from .ecj journal file are correctly appended to the generated .idx file 3. TestEcxFileDeletionVisibleAfterSync - Verifies that MarkNeedleDeleted changes are visible after Sync() 4. TestEcxFileDeletionWithSeparateHandles - Tests that synced changes are visible across separate file handles 5. TestEcVolumeSyncEnsuresDeletionsVisible - Integration test for the full EcVolume.DeleteNeedleFromEcx + Sync() workflow that validates the fix for issue #7751 * refactor: log sync errors in EcVolume.Sync() instead of ignoring them Per code review feedback: sync errors could reintroduce the bug this PR fixes, so logging warnings helps with debugging.	4 days ago
Chris Lu	5c27522507	fix: prevent empty .vif files from ec.decode causing parse errors (#7686) * fix: prevent empty .vif files from ec.decode causing parse errors When ec.decode copies .vif files from EC shard nodes, if a source node doesn't have the .vif file, an empty .vif file was created on the target node. This caused volume.configure.replication to fail with 'proto: syntax error' when trying to parse the empty file. This fix: 1. In writeToFile: Remove empty files when no data was written (source file was not found) to avoid leaving corrupted empty files 2. In MaybeLoadVolumeInfo: Handle empty .vif files gracefully by treating them as non-existent, allowing the system to create a proper one Fixes #7666 * refactor: remove redundant dst.Close() and add error logging Address review feedback: - Remove redundant dst.Close() call since defer already handles it - Add error logging for os.Remove() failure	1 week ago
msementsov	c0dad091f1	Separate vacuum speed from replication speed (#7632)	2 weeks ago
Chris Lu	fdb888729b	fix: properly handle errors in writeToFile to prevent 0-byte EC shards (#7620) Fixes #7619 The writeToFile function had two critical bugs that could cause data loss during EC shard evacuation when the destination disk is full: Bug 1: When os.OpenFile fails (e.g., disk full), the error was silently ignored and nil was returned. This caused the caller to think the copy succeeded. Bug 2: When dst.Write fails (e.g., 'no space left on device'), the error was completely ignored because the return value was not checked. When evacuating EC shards to a full volume server (especially on BTRFS): 1. OpenFile may succeed (creates 0-byte file inode) 2. Write fails with 'no space left on device' 3. Errors were ignored, function returned nil 4. Caller thinks copy succeeded and deletes source shard 5. Result: 0-byte shard on destination, data loss! This fix ensures both errors are properly returned, preventing data loss. Added unit tests to verify the fix.	2 weeks ago
Chris Lu	891a2fb6eb	Admin: misc improvements on admin server and workers. EC now works. (#7055) * initial design * added simulation as tests * reorganized the codebase to move the simulation framework and tests into their own dedicated package * integration test. ec worker task * remove "enhanced" reference * start master, volume servers, filer Current Status ✅ Master: Healthy and running (port 9333) ✅ Filer: Healthy and running (port 8888) ✅ Volume Servers: All 6 servers running (ports 8080-8085) 🔄 Admin/Workers: Will start when dependencies are ready * generate write load * tasks are assigned * admin start with grpc port. worker has its own working directory * Update .gitignore * working worker and admin. Task detection is not working yet. * compiles, detection uses volumeSizeLimitMB from master * compiles * worker retries connecting to admin * build and restart * rendering pending tasks * skip task ID column * sticky worker id * test canScheduleTaskNow * worker reconnect to admin * clean up logs * worker register itself first * worker can run ec work and report status but: 1. one volume should not be repeatedly worked on. 2. ec shards needs to be distributed and source data should be deleted. * move ec task logic * listing ec shards * local copy, ec. Need to distribute. * ec is mostly working now * distribution of ec shards needs improvement * need configuration to enable ec * show ec volumes * interval field UI component * rename * integration test with vacuuming * garbage percentage threshold * fix warning * display ec shard sizes * fix ec volumes list * Update ui.go * show default values * ensure correct default value * MaintenanceConfig use ConfigField * use schema defined defaults * config * reduce duplication * refactor to use BaseUIProvider * each task register its schema * checkECEncodingCandidate use ecDetector * use vacuumDetector * use volumeSizeLimitMB * remove remove * remove unused * refactor * use new framework * remove v2 reference * refactor * left menu can scroll now * The maintenance manager was not being initialized when no data directory was configured for persistent storage. * saving config * Update task_config_schema_templ.go * enable/disable tasks * protobuf encoded task configurations * fix system settings * use ui component * remove logs * interface{} Reduction * reduce interface{} * reduce interface{} * avoid from/to map * reduce interface{} * refactor * keep it DRY * added logging * debug messages * debug level * debug * show the log caller line * use configured task policy * log level * handle admin heartbeat response * Update worker.go * fix EC rack and dc count * Report task status to admin server * fix task logging, simplify interface checking, use erasure_coding constants * factor in empty volume server during task planning * volume.list adds disk id * track disk id also * fix locking scheduled and manual scanning * add active topology * simplify task detector * ec task completed, but shards are not showing up * implement ec in ec_typed.go * adjust log level * dedup * implementing ec copying shards and only ecx files * use disk id when distributing ec shards 🎯 Planning: ActiveTopology creates DestinationPlan with specific TargetDisk 📦 Task Creation: maintenance_integration.go creates ECDestination with DiskId 🚀 Task Execution: EC task passes DiskId in VolumeEcShardsCopyRequest 💾 Volume Server: Receives disk_id and stores shards on specific disk (vs.store.Locations[req.DiskId]) 📂 File System: EC shards and metadata land in the exact disk directory planned * Delete original volume from all locations * clean up existing shard locations * local encoding and distributing * Update docker/admin_integration/EC-TESTING-README.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * check volume id range * simplify * fix tests * fix types * clean up logs and tests --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	5 months ago
Chris Lu	69553e5ba6	convert error formatting to %w everywhere (#6995)	5 months ago
chrislu	96632a34b1	add version to volume proto	6 months ago
chrislu	8e4bffc66b	copy ec shards to disks already having ec volumes fix https://github.com/seaweedfs/seaweedfs/issues/5615	1 year ago
vadimartynov	8aae82dd71	Added context for the MasterClient's methods to avoid endless loops (#5628) * Added context for the MasterClient's methods to avoid endless loops * Returned WithClient function. Added WithClientCustomGetMaster function * Hid unused ctx arguments * Using a common context for the KeepConnectedToMaster and WaitUntilConnected functions * Changed the context termination check in the tryConnectToMaster function * Added a child context to the tryConnectToMaster function * Added a common context for KeepConnectedToMaster and WaitUntilConnected functions in benchmark	2 years ago
rustrover	ab70aa92da	remove repetitive words (#5364)	2 years ago
zehweh	8b39bbbe2f	fix copying .vif file in VolumeCopy (#4943) closes #4934 fixes #2633 might fix #3528	2 years ago
Konstantin Lebedev	25535e9c36	Delete volume is empty (#4561) * use onlyEmpty for deleteVolume https://github.com/seaweedfs/seaweedfs/issues/4559 * fix IsEmpty * fix test --------- Co-authored-by: Konstantin Lebedev <9497591+kmlebedev@users.noreply.github.co>	3 years ago
chrislu	de286fe662	shell: volume.move handles volume moved to cloud tier fix https://github.com/seaweedfs/seaweedfs/issues/3803	3 years ago
askeipx	2e78a522ab	remove old raft servers if they don't answer to pings for too long (#3398) * remove old raft servers if they don't answer to pings for too long add ping durations as options rename ping fields fix some todos get masters through masterclient raft remove server from leader use raft servers to ping them CheckMastersAlive for hashicorp raft only * prepare blocking ping * pass waitForReady as param * pass waitForReady through all functions * waitForReady works * refactor * remove unneeded params * rollback unneeded changes * fix	3 years ago
qzh	74b53729e1	feat(weed.move): add a speed limit parameter of moving files (#3478) * feat(weed.move): add a speed limit parameter of moving files * fix(weed.move): set the default value of ioBytePerSecond to vs.compactionBytePerSecond Co-authored-by: zhihao.qu <zhihao.qu@ly.com>	3 years ago
chrislu	26dbc6c905	move to https://github.com/seaweedfs/seaweedfs	3 years ago
石昌林	81f7f08708	Determine whether to preallocate according to the master configuration before executing copy volume	4 years ago
chrislu	94f824e1ce	volume: sync to disk before copying volume files address https://github.com/chrislusf/seaweedfs/issues/2976	4 years ago
chrislu	affe3c2c12	change to util.WriteFile	4 years ago
chrislu	9f9ef1340c	use streaming mode for long poll grpc calls streaming mode would create separate grpc connections for each call. this is to ensure the long poll connections are properly closed.	4 years ago
Chris Lu	a0ef6e3611	prevent nil response fix https://github.com/chrislusf/seaweedfs/issues/2452	4 years ago
Chris Lu	0c8dea9de8	go fmt	4 years ago
Chris Lu	5435027ff0	volume copy: stream out copying progress and avoid grpc request timeout fix https://github.com/chrislusf/seaweedfs/issues/2386	4 years ago
Eng Zer Jun	a23bcbb7ec	refactor: move from io/ioutil to io and os package The io/ioutil package has been deprecated as of Go 1.16, see https://golang.org/doc/go1.16#ioutil. This commit replaces the existing io/ioutil functions with their new definitions in io and os packages. Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>	4 years ago
Chris Lu	e5fc35ed0c	change server address from string to a type	4 years ago
Chris Lu	7ce97b59d8	go fmt	4 years ago
Chris Lu	d1a4e19a3f	volume: copy file also copies modification time to ensure ttl can work well	4 years ago
Chris Lu	f4decf02df	volume copying: clean up stale volume data files fix https://github.com/chrislusf/seaweedfs/issues/2250	4 years ago
Chris Lu	7ca75347ec	minor	5 years ago
Chris Lu	30b30b8fe0	volume.tier.move: passing non-empty disk type	5 years ago
Chris Lu	3fe628f04e	use hdd instead of empty string	5 years ago
Chris Lu	f8446b42ab	this can compile now!!!	5 years ago
Chris Lu	770393a48c	volume: add capability to change disk type when moving a volume	5 years ago
Chris Lu	94525aa0fd	allocate volume by disk type	5 years ago
Chris Lu	6d30b21b10	volume: add "-dir.idx" option for separate index storage fix https://github.com/chrislusf/seaweedfs/issues/1265	5 years ago
Chris Lu	de86945aeb	go fmt	5 years ago
Chris Lu	53c3aad875	volume: add a note file to avoid incomplete volume files fix https://github.com/chrislusf/seaweedfs/issues/1567	5 years ago
Chris Lu	24bf142596	copy large file first	5 years ago
James Hartig	8e54e34576	volume: Don't unmount before deleting volume in copy If we unmount first and then delete, the delete fails because the volume was unmounted. Delete ends up doing the same thing as the unmount anyways.	5 years ago
Chris Lu	91b91d6cb7	add error to avoid copying not found volume fix https://github.com/chrislusf/seaweedfs/issues/1317	6 years ago
Chris Lu	91da7057b1	refactoring	6 years ago
Chris Lu	7bc3c93512	add util.PathJoin	6 years ago
Chris Lu	cea52a4faf	volume copying adds cleaning up on error fix https://github.com/chrislusf/seaweedfs/issues/1253	6 years ago
Chris Lu	97ab8a1976	remove ctx if possible	6 years ago
Chris Lu	892e726eb9	avoid reusing context object fix https://github.com/chrislusf/seaweedfs/issues/1182	6 years ago
Chris Lu	72a64a5cf8	use the same context object in order to retry	6 years ago
Chris Lu	4e731f1c8b	volume: copy volumes also include .vif file	6 years ago
Chris Lu	8fbc0a9163	fix edge cases	6 years ago
Chris Lu	09ca936c78	shell: add ec.decode command	6 years ago


68 Commits (4dcd33bbc865a40e37d1e6efb03af68945bc0852)