From 7e964283b9543b7ae60d30698537d98ccfe69042 Mon Sep 17 00:00:00 2001
From: Antonio SJ Musumeci
Date: Tue, 7 Jan 2025 19:16:07 -0600
Subject: [PATCH] Add tiered cache details to docs

---
 mkdocs/docs/usage_patterns.md | 88 +++++++++++++++++++++++++++++++++++
 mkdocs/mkdocs.yml             |  1 +
 2 files changed, 89 insertions(+)
 create mode 100644 mkdocs/docs/usage_patterns.md

diff --git a/mkdocs/docs/usage_patterns.md b/mkdocs/docs/usage_patterns.md
new file mode 100644
index 00000000..358efc73
--- /dev/null
+++ b/mkdocs/docs/usage_patterns.md
@@ -0,0 +1,88 @@
+# Usage Patterns
+
+## tiered cache
+
+Some storage technologies support what is called "tiered" caching:
+the placement of smaller, faster storage as a transparent cache in
+front of larger, slower storage. NVMe, SSD, or Optane in front of
+traditional HDDs, for instance.
+
+mergerfs does not natively support any sort of tiered caching. Most
+users have no use for such a feature and its inclusion would
+complicate the code as it exists today. However, there are a few
+situations where a cache filesystem could help with a typical
+mergerfs setup.
+
+1. Fast network, slow filesystems, many readers: You have a 10+Gbps
+   network with many readers and your regular filesystems can't keep
+   up.
+2. Fast network, slow filesystems, small-ish bursty writes: You have
+   a 10+Gbps network and wish to transfer bursts of data smaller
+   than your cache filesystem, wish to do so quickly, and the time
+   between bursts is long enough to migrate the data.
+
+With #1 it's arguable whether you should be using mergerfs at all. A
+RAID level that can aggregate performance, or higher performance
+storage, would probably be the better solution. If you're going to
+use mergerfs there are other tactics that may help: duplicating the
+data across filesystems (see the mergerfs.dup tool) and setting
+`func.open=rand` to spread reads, using `symlinkify`, or using
+dm-cache or a similar technology to add a tiered cache to the
+underlying devices themselves.
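As a rough illustration of the `func.open=rand` tactic, a mount invocation might look like the following. The branch paths and mount point are hypothetical placeholders; the idea is that with hot files duplicated across branches (e.g. via mergerfs.dup), opening a random copy spreads reads across devices:

```shell
# Hypothetical branches /mnt/hdd0../mnt/hdd2 pooled at /mnt/pool;
# func.open=rand picks a random branch holding the file on open().
mergerfs -o func.open=rand /mnt/hdd0:/mnt/hdd1:/mnt/hdd2 /mnt/pool
```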
+
+With #2 one could use dm-cache as well, but there is another
+solution which requires only mergerfs and a cron job.
+
+1. Create two mergerfs pools: one which includes just the slow
+   branches and one which has both the fast branches (SSD, NVMe,
+   etc.) and the slow branches. The 'base' pool and the 'cache'
+   pool, respectively.
+2. The 'cache' pool should have the cache branches listed first in
+   the branch list.
+3. The best `create` policies to use for the 'cache' pool would
+   probably be `ff`, `epff`, `lfs`, `msplfs`, or `eplfs`. The latter
+   three under the assumption that the cache filesystem(s) are far
+   smaller than the backing filesystems. If using path preserving
+   policies remember that you'll need to manually create the core
+   directories of those paths you wish to be cached. Be sure the
+   permissions are in sync; use `mergerfs.fsck` to check / correct
+   them. You could also set the slow filesystems' mode to `NC`,
+   though that'd mean if the cache filesystems fill you'd get "out
+   of space" errors.
+4. Enable `moveonenospc` and set `minfreespace` appropriately. To
+   make sure there is enough room on the 'base' pool you might want
+   to set `minfreespace` to at least as large as the size of the
+   largest cache filesystem, if not larger. This way, in the worst
+   case, the whole of the cache filesystem(s) can be moved to the
+   other filesystems.
+5. Set your programs to use the 'cache' pool.
+6. Save one of the below scripts or create your own. The script's
+   responsibility is to move files from the cache filesystems (not
+   the pool) to the 'base' pool.
+7. Use `cron` (as root) to schedule the command at whatever
+   frequency is appropriate for your workflow.
+
+
+### time based expiring
+
+Move files from the cache to the base pool based only on the last
+time the file was accessed. Replace `-atime` with `-amin` if you
+want minutes rather than days. You may want to use the `fadvise` /
+`--drop-cache` version of rsync, or run rsync with the tool
+[nocache](https://github.com/Feh/nocache).
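As a rough sketch of the time-based approach (the linked mergerfs.time-based-mover script is the maintained version; the paths below are hypothetical, and plain `mv` is used here for simplicity rather than rsync):

```shell
#!/usr/bin/env sh
# Minimal sketch: move files not accessed in DAYS days from one cache
# *filesystem* (never the cache pool!) into the base pool, preserving
# relative paths.
move_unaccessed()
{
  CACHE="$1"        # e.g. /mnt/cache/nvme0 (a single cache branch)
  BASE="$2"         # e.g. /mnt/base (mount of the slow-only pool)
  DAYS="${3:-30}"   # move files last accessed more than DAYS days ago

  # %P prints the path relative to CACHE; swap -atime for -amin to
  # work in minutes rather than days.
  find "$CACHE" -type f -atime "+$DAYS" -printf '%P\n' |
  while IFS= read -r rel; do
    mkdir -p "$BASE/$(dirname "$rel")"
    mv "$CACHE/$rel" "$BASE/$rel"
  done
}

# Example cron entry (hypothetical paths and schedule):
#   0 3 * * * /usr/local/bin/move_unaccessed /mnt/cache/nvme0 /mnt/base 7
```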
+
+**NOTE:** The arguments to these scripts include the cache
+**filesystem** itself, not the pool containing the cache
+filesystem. You could have data loss if the source is the cache
+pool.
+
+[mergerfs.time-based-mover](https://github.com/trapexit/mergerfs/blob/latest-release/tools/mergerfs.time-based-mover?raw=1)
+
+
+### percentage full expiring
+
+Move the oldest file from the cache to the backing pool. Continue
+until usage is below the percentage threshold.
+
+**NOTE:** The arguments to these scripts include the cache
+**filesystem** itself, not the pool containing the cache
+filesystem. You could have data loss if the source is the cache
+pool.
+
+[mergerfs.percent-full-mover](https://github.com/trapexit/mergerfs/blob/latest-release/tools/mergerfs.percent-full-mover?raw=1)

diff --git a/mkdocs/mkdocs.yml b/mkdocs/mkdocs.yml
index d53c2244..32c5f8f3 100644
--- a/mkdocs/mkdocs.yml
+++ b/mkdocs/mkdocs.yml
@@ -89,6 +89,7 @@ nav:
 - performance.md
 - benchmarking.md
 - tooling.md
+- usage_patterns.md
 - FAQ:
 - faq/reliability_and_scalability.md
 - faq/usage_and_functionality.md
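For the "percentage full expiring" approach described in the patch above, a rough standalone sketch might look like the following (paths are hypothetical; the linked mergerfs.percent-full-mover is the maintained tool):

```shell
#!/usr/bin/env sh
# Rough sketch: while the cache filesystem is above PCT percent full,
# move its least-recently-accessed file into the base pool.
# CACHE must be a cache *filesystem* branch, never the cache pool.
percent_full_mover()
{
  CACHE="$1"        # e.g. /mnt/cache/nvme0 (hypothetical)
  BASE="$2"         # e.g. /mnt/base (hypothetical)
  PCT="${3:-75}"    # target usage threshold, in percent

  while :; do
    # current usage percentage of the cache filesystem, e.g. "42"
    use=$(df --output=pcent "$CACHE" | tail -n1 | tr -dc '0-9')
    [ "$use" -gt "$PCT" ] || break
    # oldest file by access time: lines are "epoch-atime relative-path"
    f=$(find "$CACHE" -type f -printf '%A@ %P\n' | sort -n | head -n1 | cut -d' ' -f2-)
    [ -n "$f" ] || break
    mkdir -p "$BASE/$(dirname "$f")"
    mv "$CACHE/$f" "$BASE/$f"
  done
}
```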