From 1729835b8ca9e2a6b3d38b292c742037cc1369b0 Mon Sep 17 00:00:00 2001 From: Antonio SJ Musumeci Date: Sat, 25 Jan 2025 12:50:40 -0600 Subject: [PATCH] Misc doc updates --- mkdocs/docs/config/terminology.md | 12 ---- mkdocs/docs/faq/limit_drive_spinup.md | 69 ++++++++++++++++++---- mkdocs/docs/faq/usage_and_functionality.md | 61 +++++++++++-------- mkdocs/docs/related_projects.md | 29 +++++---- mkdocs/docs/terminology.md | 33 +++++++++++ mkdocs/mkdocs.yml | 2 +- 6 files changed, 147 insertions(+), 59 deletions(-) delete mode 100644 mkdocs/docs/config/terminology.md create mode 100644 mkdocs/docs/terminology.md diff --git a/mkdocs/docs/config/terminology.md b/mkdocs/docs/config/terminology.md deleted file mode 100644 index f9aacaa1..00000000 --- a/mkdocs/docs/config/terminology.md +++ /dev/null @@ -1,12 +0,0 @@ -# Terminology - -- `branch`: A base path used in the pool. Keep in mind that mergerfs - does not work on devices or even filesystems but on paths. It can - accomidate for multiple paths pointing to the same filesystem. -- `pool`: The mergerfs mount. The union of the branches. The instance - of mergerfs. You can have as many pools as you wish. -- `relative path`: The path in the pool relative to the branch and mount. -- `function`: A filesystem call (open, unlink, create, getattr, rmdir, etc.) -- `category`: A collection of functions based on basic behavior (action, create, search). -- `policy`: The algorithm used to select a file or files when performing a function. -- `path preservation`: Aspect of some policies which includes checking the path for which a file would be created. diff --git a/mkdocs/docs/faq/limit_drive_spinup.md b/mkdocs/docs/faq/limit_drive_spinup.md index d81aef3c..a14274b3 100644 --- a/mkdocs/docs/faq/limit_drive_spinup.md +++ b/mkdocs/docs/faq/limit_drive_spinup.md @@ -2,31 +2,78 @@ ## How can I setup my system to limit drive spinup? -TL;DR: You really can't. Not through mergerfs alone. +TL;DR: You really can't. 
Not through mergerfs alone. In fact mergerfs +makes any attempt to do so more complicated. -mergerfs is a proxy. Not a cache. It proxies calls between client software and underlying filesystems. If a client does an `open`, `readdir`, `stat`, etc. it must translate that into something that makes sense across N filesystems. For `readdir` that means running the call against all branches and aggregating the output. For `open` that means finding the file to open and doing so. The only way to find the file to open is to scan across all branches and sort the results and pick one. There is no practical way to do otherwise. Especially given so many mergerfs users expect out of band changes to "just work." +mergerfs is a proxy. Despite having some caching behaviors it is not +designed to cache much more than metadata. It proxies calls between +client software and underlying filesystems. If a client makes a +request such as `open`, `readdir`, `stat`, etc. it must translate that +into something that makes sense across multiple filesystems. For +`readdir` that means running the call against all branches and +aggregating the output. For `open` that means finding the file to open +and doing so. The only way to find the file to open is to scan across +all branches and sort the results and pick one. There is no practical +way to do otherwise. Especially given so many mergerfs users expect +out-of-band changes to "just work." -The best way to limit spinup of drives is to limit their usage at the client level. Meaning keeping software from interacting with the filesystem all together. +The best way to limit spinup of drives is to limit their usage at the +client level. Meaning keeping software from interacting with the +filesystem (and therefore the drive) altogether. -### What if you assume no out of band changes and cache everything? -This would require a significant rewrite of mergerfs. 
Everything is done on the fly right now and all those calls to underlying filesystems can cause a spinup. To work around that a database of some sort would have to be used to store ALL metadata about the underlying filesystems and on startup everything scanned and stored. From then on it would have to carefully update all the same data the filesystems do. It couldn't be kept in RAM because it would take up too much space so it'd have to be on a SSD or other storage device. If anything changed out of band it would break things in weird ways. It could rescan on occasion but that would require spinning up everything. It could put file watches on every single directory but that probably won't scale (there are millions of directories in my system for example) and the open files might keep the drives from spinning down. Something as "simple" as keeping the current available free space on each filesystem isn't as easy as one might think given reflinks, snapshots, and other block level dedup technologies. +### What if you assume no out-of-band changes and cache everything? + +This would require a significant rewrite of mergerfs. Everything is +done on the fly right now and all those calls to underlying +filesystems can cause a spinup. To work around that a database of some +sort would have to be used to store ALL metadata about the underlying +filesystems and on startup everything scanned and stored. From then on +it would have to carefully update all the same data the filesystems +do. It couldn't be kept in RAM because it would take up too much space +so it'd have to be on a SSD or other storage device. If anything +changed out of band it would break things in weird ways. It could +rescan on occasion but that would require spinning up +everything. Filesystem watches could be used to get updates when the +filesystem changes but that would allow for race conditions and +might keep the drives from spinning down. 
Something as "simple" as +keeping the current available free space on each filesystem isn't as +easy as one might think given reflinks, snapshots, and other block +level dedup technologies, and because space used includes more than +raw file usage. + +Even if all metadata (including xattrs) is cached some software will +open files (media like videos and audio) to check their +metadata. Granted a Plex or Jellyfin scan which may do that is +different from a random directory listing but is still something to +consider. Those "deep" scans can't be kept from waking drives. -Even if all metadata (including xattrs) is cached some software will open files (media like videos and audio) to check their metadata. Granted a Plex or Jellyfin scan which may do that is different from a random directory listing but is still something to consider. Those "deep" scans can't be kept from waking drives. ### What if you only query already active drives? -Let's assume that is plausible (it isn't because some drives actually will spin up if you ask if they are spun down... yes... really) you would have to either cache all the metadata on the filesystem or treat it like the filesystem doesn't exist. The former has all the problems mentioned prior and the latter would break a lot of things. +Let's assume that is plausible (it isn't because some drives actually +will spin up if you ask if they are spun down... yes... really) you +would have to either cache all the metadata on the filesystem or treat +it like the filesystem doesn't exist. The former has all the problems +mentioned prior and the latter would break a lot of things. + ### Is there anything that can be done where mergerfs is involved? -Yes, but whether it works for you depends on your tolerance for the complexity. +Yes, but whether it works for you depends on your tolerance for the +complexity. 1. Cleanly separate writing, storing, and consuming the data. 1. 
Use a SSD or dedicated and limited pool of drives for downloads / torrents. 2. When downloaded move the files to the primary storage pool. - 3. When setting up software like Plex, Jellyfin, etc. point to the underlying filesystems. Not mergerfs. -2. Add a bunch of bcache, lvmcache, or similar block level cache to your setup. After a bit of use, assuming sufficient storage space, you can limit the likelihood of the underlying spinning disks from needing to be hit. + 3. When setting up software like Plex, Jellyfin, etc. point to the + underlying filesystems. Not mergerfs. +2. Add a bunch of bcache, lvmcache, dm-cache, or similar block level + cache to your setup. After a bit of use, assuming sufficient + storage space, you can limit the likelihood of the underlying + spinning disks from needing to be hit. -Remember too that while it may be a tradeoff you're willing to live with there is decent evidence that spinning down drives puts increased wear on them and can lead to their death earlier than otherwise. +Remember too that while it may be a tradeoff you are willing to live +with there is decent evidence that spinning down drives puts increased +wear on them and can lead to their death earlier than otherwise. diff --git a/mkdocs/docs/faq/usage_and_functionality.md b/mkdocs/docs/faq/usage_and_functionality.md index bf84583d..95536ffe 100644 --- a/mkdocs/docs/faq/usage_and_functionality.md +++ b/mkdocs/docs/faq/usage_and_functionality.md @@ -1,18 +1,18 @@ # Usage and Functionality -## Can mergerfs be used with filesystems which already have data / are in use? +## Can mergerfs be used with filesystems which already have data? Yes. mergerfs is really just a proxy and does **NOT** interfere with -the normal form or function of the filesystems, mounts, paths it -manages. A userland application that is acting as a -man-in-the-middle. It can't do anything that any other random piece of -software can't do. 
+the normal form or function of the filesystems, mounts, or paths it +manages. It literally is interacting with your filesystems as any +other application does. It can not do anything that any other random +piece of software can't do. mergerfs is **not** a traditional filesystem that takes control over -the underlying block device. mergerfs is **not** RAID. It does **not** -manipulate the data that passes through it. It does **not** shard data -across filesystems. It merely shards some **behavior** and aggregates -others. +the underlying disk or block device. mergerfs is **not** RAID. It does +**not** manipulate the data that passes through it. It does **not** +shard data across filesystems. It only shards some **behavior** and +aggregates others. ## Can filesystems be removed from the pool without affecting them? @@ -37,7 +37,7 @@ Yes. ## How do I migrate data into or out of the pool when adding/removing filesystems? -You don't need to. See the previous question's answer. +There is no need to do so. See the previous questions. ## How do I remove a filesystem but keep the data in the pool? @@ -47,18 +47,29 @@ config and copy (rsync) the data from the removed filesystem into the pool. The same as if it were you transfering data from one filesystem to another. -If you wish to continue using the pool while performing the transfer -simply create a temporary pool without the filesystem in question and -then copy the data. It would probably be a good idea to set the branch -to `RO` prior to doing this to ensure no new content is written to the -filesystem while performing the copy. - - -## Can filesystems be written to directly? Outside of mergerfs while pooled? - -Yes, however, it's not recommended to use the same file from within the -pool and from without at the same time (particularly -writing). Especially if using caching of any kind (cache.files, -cache.entry, cache.attr, cache.negative_entry, cache.symlinks, -cache.readdir, etc.) 
as there could be a conflict between cached data -and not. +If you wish to continue using the pool with all data available while +performing the transfer simply create a temporary pool without the +branch in question and then copy the data from the branch to the +temporary pool. It would probably be a good idea to set the branch +mode to `RO` prior to doing this to ensure no new content is written +to the filesystem while performing the copy. However, it is typically +good practice to run rsync or rclone again after the first copy +finishes to ensure nothing is left behind. + +NOTE: The above recommends "copy" rather than "move" because you want +to ensure that your data is transferred before wiping the drive or +filesystem. + + +## Can filesystems still be used directly? Outside of mergerfs while pooled? + +Yes, out-of-band interaction is generally fine. Remember that mergerfs +is just a userland application like any other software so its +interactions with the underlying filesystems are no different. It would +be like two normal applications interacting with the +filesystem. However, it's not recommended to write to the same file +from within the pool and from without at the same time. Especially if +using page caching (`cache.files!=off`) or writeback caching +(`cache.writeback=true`). That said this risk is not +really different from the risk of two applications writing to +the same file under normal conditions. diff --git a/mkdocs/docs/related_projects.md b/mkdocs/docs/related_projects.md index 7c1d4036..9d45ebd3 100644 --- a/mkdocs/docs/related_projects.md +++ b/mkdocs/docs/related_projects.md @@ -29,16 +29,24 @@ vendors' web storage interfaces. rclone's [union](https://rclone.org/union/) feature is based on mergerfs policies. -* [ZFS](https://openzfs.org/): Common to use ZFS w/ mergerfs. ZFS for - important data and mergerfs pool for replacable media. 
-* [UnRAID](https://unraid.net): While UnRAID has its own union - filesystem it isn't uncommon to see UnRAID users leverage mergerfs - given the differences in the technologies. There is a [plugin - available by +* [ZFS](https://openzfs.org/): A popular filesystem and volume + management platform originally part of Sun Solaris and later ported + to other operating systems. It is common to use ZFS with + mergerfs. ZFS for important data and a mergerfs pool for replaceable + media. +* [Proxmox](https://www.proxmox.com/): Proxmox is a popular, Debian + based, virtualization platform. Users tend to install mergerfs on + the host and pass the mount into containers. +* [UnRAID](https://unraid.net): "Unraid is a powerful, easy-to-use + operating system for self-hosted servers and network-attached + storage." While UnRAID has its own union filesystem it isn't + uncommon to see UnRAID users leverage mergerfs given the differences + in the technologies. There is a [plugin available by Rysz](https://forums.unraid.net/topic/144999-plugin-mergerfs-for-unraid-support-topic/) to ease installation and setup. -* [TrueNAS](https://www.truenas.com): Some users are requesting - mergerfs be [made part +* [TrueNAS SCALE](https://www.truenas.com/truenas-scale/): An + enterprise focused NAS operating system with OpenZFS support. Some + users are requesting mergerfs be [made part of](https://forums.truenas.com/t/add-unionfs-or-mergerfs-and-rdam-enhancement-then-beat-all-other-nas-systems/23218) TrueNAS. * For a time there were a number of Chia miners recommending mergerfs. @@ -46,8 +54,9 @@ details [on their wiki](https://cloudboxes.io/wiki/how-to/apps/set-up-mergerfs-using-ssh): on how to setup mergerfs. -* [QNAP](https://www.myqnap.org/product/mergerfs-apache83/): Someone - has create builds of mergerfs for different QNAP devices. +* [QNAP](https://www.myqnap.org/product/mergerfs-apache83/): A company + known for their turnkey, consumer focused NAS devices. 
Someone has + created builds of mergerfs for different QNAP devices. ## Distributions including mergerfs diff --git a/mkdocs/docs/terminology.md b/mkdocs/docs/terminology.md new file mode 100644 index 00000000..fde58dce --- /dev/null +++ b/mkdocs/docs/terminology.md @@ -0,0 +1,33 @@ +# Terminology + +* `disk`, `drive`, `disk drive`: A [physical data storage + device](https://en.wikipedia.org/wiki/Disk_storage). Such as a hard + drive or solid-state drive. Usually requires the use of a filesystem + to be useful. mergerfs does not deal with disks. +* `filesystem`: Low-level software which provides a way to organize data + and provide access to said data in a standard way. A filesystem is a + higher level abstraction that may or may not be stored on a + disk. mergerfs deals exclusively with filesystems. +* `path`: A location within a filesystem. mergerfs can work with any + path within a filesystem and not simply the root. +* `branch`: A base path used in a mergerfs pool. mergerfs can + accommodate multiple paths pointing to the same filesystem. +* `pool`: The mergerfs mount. The union of the branches. The instance + of mergerfs. You can mount multiple mergerfs pools. Even with the + same branches. +* `relative path`: The path in the pool relative to the branch and + mount. `foo/bar` is the relative path of mergerfs mount + `/mnt/mergerfs/foo/bar`. +* `function`: A filesystem call such as `open`, `unlink`, `create`, + `getattr`, `rmdir`, etc. The requests your software makes to the + filesystem. +* `category`: A collection of functions based on basic behavior + (action, create, search). +* `policy`: The algorithm used to select a file or files when + performing a function. +* `path preservation`: Aspect of some policies which includes checking + the path for which a file would be created. 
+* `out-of-band`: + [out-of-band](https://en.wikipedia.org/wiki/Out-of-band) in our + context refers to interacting with the underlying filesystem + directly instead of going through mergerfs (or NFS or Samba). diff --git a/mkdocs/mkdocs.yml b/mkdocs/mkdocs.yml index 0ca87cd7..82dd9495 100644 --- a/mkdocs/mkdocs.yml +++ b/mkdocs/mkdocs.yml @@ -55,8 +55,8 @@ nav: - setup/installation.md - setup/upgrade.md - setup/build.md +- terminology.md - Config: - - config/terminology.md - config/options.md - config/deprecated_options.md - config/branches.md