mergerfs does **not** support the copy-on-write (CoW) or whiteout
behaviors found in **aufs** and **overlayfs**. You can **not** mount a
read-only filesystem and write to it. However, mergerfs will ignore
read-only filesystems when creating new files so you can mix
read-write and read-only filesystems. It also does **not** split data
across filesystems. It is not RAID0 / striping. It is simply a union of
other filesystems.
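
For example, a pool mixing writable and read-only branches might be
mounted like this (the paths and explicit mode tags are illustrative):

```
# new files will only ever be created on /mnt/rw
mergerfs /mnt/rw=RW:/mnt/ro=RO /mnt/pool -o allow_other
```
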
# TERMINOLOGY
* **ignorepponrename=BOOL**: Ignore path preserving on rename. Typically
rename and link act differently depending on the
policy of `create` (read below). Enabling this will cause rename and
link to always use the non-path preserving behavior. This means
files, when renamed or linked, will stay on the same
filesystem. (default: false)
* **security_capability=BOOL**: If false return ENOATTR when xattr
security.capability is queried. (default: true)
* **xattr=passthrough|noattr|nosys**: Runtime control of
xattrs. Default is to passthrough xattr requests. 'noattr' will short
circuit as if nothing exists. 'nosys' will respond with ENOSYS as if
xattrs are not supported or disabled. (default: passthrough)
* **link_cow=BOOL**: When enabled, if a regular file is opened which
has a link count > 1 it will copy the file to a temporary file and
rename over the original. Breaks the link and provides a basic
copy-on-write function similar to cow-shell. (default: false)
* **statfs=base|full**: Controls how statfs works. 'base' means it
will always use all branches in statfs calculations. 'full' is in
effect path preserving and only includes branches where the path
exists. (default: base)
* **statfs_ignore=none|ro|nc**: 'ro' will cause statfs calculations to
ignore available space for branches mounted or tagged as 'read-only'
or 'no create'. 'nc' will ignore available space for branches tagged
as 'no create'. (default: none)
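
As a sketch, these options are passed at mount time like any other
mergerfs option (the combination and paths below are illustrative):

```
mergerfs -o ignorepponrename=true,xattr=passthrough,statfs_ignore=nc \
  /mnt/disk1:/mnt/disk2 /mnt/pool
```
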
The 'branches' argument is a colon (':') delimited list of paths to be
pooled together. It does not matter if the paths are on the same or
different filesystems nor does it matter the filesystem type (within
reason). Used and available space will not be duplicated for paths on
the same filesystem and any features which aren't supported by the
underlying filesystem (such as file attributes or extended attributes)
will return the appropriate errors.
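
For instance, a minimal pooling invocation might look like the
following (hypothetical paths; the underlying filesystem types may
differ):

```
mergerfs /mnt/hdd1:/mnt/hdd2:/mnt/ssd /media/pool
```
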
Branches currently have two options which can be set. A type which
impacts whether or not the branch is included in a policy calculation
and an individual minfreespace value. The values are set by appending
an `=` to the end of a branch designation and using commas as
delimiters. Example: `/mnt/drive=RW,1234`
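
An illustrative `/etc/fstab` entry combining branch modes and a
per-branch minfreespace (all paths and values hypothetical):

```
# /mnt/ssd: read-write, 100G minfreespace; /mnt/hdd1: no new files; /mnt/hdd2: read-only
/mnt/ssd=RW,100G:/mnt/hdd1=NC:/mnt/hdd2=RO  /mnt/pool  fuse.mergerfs  allow_other  0 0
```
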
#### branch mode
**WARNING:** Some backup solutions, such as CrashPlan, do not backup
the target of a symlink. If using this feature it will be necessary to
point any backup software to the original filesystems or configure the
software to follow symlinks if such an option is available.
Alternatively create two mounts. One for backup and one for general
consumption.
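
A sketch of the two-mount approach (paths and option values are
illustrative): the same branches pooled twice, once plainly for backup
and once with `symlinkify` enabled for general consumption:

```
/mnt/disk1:/mnt/disk2  /mnt/backup  fuse.mergerfs  allow_other  0 0
/mnt/disk1:/mnt/disk2  /mnt/media   fuse.mergerfs  allow_other,symlinkify=true,symlinkify_timeout=3600  0 0
```
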
### nullrw
All policies which start with `ep` (**epff**, **eplfs**, **eplus**,
**epmfs**, **eprand**) are `path preserving`. `ep` stands for
`existing path`.
A path preserving policy will only consider branches where the relative
path being accessed already exists.
When using non-path preserving policies paths will be cloned to target
branches as necessary.
With the `msp` or `most shared path` policies they are defined as
`path preserving` for the purpose of controlling `link` and
`rename`'s behavior.

| Policy | Description |
|--------|-------------|
| all | Search: For **mkdir**, **mknod**, and **symlink** it will apply to all branches. **create** works like **ff**. |
| epall (existing path, all) | For **mkdir**, **mknod**, and **symlink** it will apply to all found. **create** works like **epff** (but more expensive because it doesn't stop after finding a valid branch). |
| epff (existing path, first found) | Given the order of the branches, as defined at mount time or configured at runtime, act on the first one found where the relative path exists. |
| eplfs (existing path, least free space) | Of all the branches on which the relative path exists choose the branch with the least free space. |
| eplus (existing path, least used space) | Of all the branches on which the relative path exists choose the branch with the least used space. |
| epmfs (existing path, most free space) | Of all the branches on which the relative path exists choose the branch with the most free space. |
| eppfrd (existing path, percentage free random distribution) | Like **pfrd** but limited to existing paths. |
| eprand (existing path, random) | Calls **epall** and then randomizes. Returns one branch. |
| ff (first found) | Given the order of the branches, as defined at mount time or configured at runtime, act on the first one found. |
| lfs (least free space) | Pick the branch with the least available free space. |
| lus (least used space) | Pick the branch with the least used space. |
| mfs (most free space) | Pick the branch with the most available free space. |
| msplfs (most shared path, least free space) | Like **eplfs** but if it fails to find a branch it will try again with the parent directory. Continues this pattern till finding one. |
| msplus (most shared path, least used space) | Like **eplus** but if it fails to find a branch it will try again with the parent directory. Continues this pattern till finding one. |
| mspmfs (most shared path, most free space) | Like **epmfs** but if it fails to find a branch it will try again with the parent directory. Continues this pattern till finding one. |
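
As an illustration, a policy can be assigned to a whole category or to
an individual function (the values chosen here are arbitrary):

```
# create-class calls pick the branch with the most free space,
# but mkdir specifically applies to all branches with an existing path
mergerfs -o category.create=mfs,func.mkdir=epall /mnt/disk1:/mnt/disk2 /mnt/pool
```
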

#### rename & link ####

**rename** is a tricky function in a union
filesystem. `rename` only works within a single filesystem or
device. If a rename can't be done atomically due to the source and
destination paths existing on different mount points it will return
**-1** with **errno = EXDEV** (cross device / improper link). So if a
`rename`'s source and target are on different filesystems within the pool
it creates an issue.
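
To illustrate with a hypothetical example, this is the same error any
program receives when calling rename(2) across two separate mounts
(output abbreviated):

```
$ python3 -c 'import os; os.rename("/mnt/disk1/a", "/mnt/disk2/a")'
OSError: [Errno 18] Invalid cross-device link: '/mnt/disk1/a' -> '/mnt/disk2/a'
```
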
Originally mergerfs would return EXDEV whenever a rename was requested
which was cross directory in any way. This made the code simple and
was technically compliant with POSIX requirements. That said, many
applications fail to handle EXDEV at all and treat it as a normal
error or otherwise handle it poorly. As a result a compromise was
made in order to get most software to work while still obeying
mergerfs' policies. Below is the basic logic.

* If using a **create** policy which tries to preserve directory paths (epff,eplfs,eplus,epmfs)
  * Using the **rename** policy get the list of files to rename
  * For each file attempt rename:
    * If failure with ENOENT (no such file or directory) run **create** policy
    * If create policy returns the same branch as currently evaluating then clone the path
    * Re-attempt rename
  * If **any** of the renames succeed the higher level rename is considered a success
  * If **no** renames succeed the first error encountered will be returned
  * On success:
    * Remove the target from all branches with no source file
    * Remove the source from all branches which failed to rename
* If using a **create** policy which does **not** try to preserve directory paths
  * Using the **rename** policy get the list of files to rename
  * Using the **getattr** policy get the target path
  * For each file attempt rename:
    * If the source branch != target branch:
      * Clone target path from target branch to source branch
    * Rename
  * If **any** of the renames succeed the higher level rename is considered a success
  * If **no** renames succeed the first error encountered will be returned
  * On success:
    * Remove the target from all branches with no source file
    * Remove the source from all branches which failed to rename

The removals are subject to normal entitlement checks.
#### statfs / statvfs ####

[statvfs](http://linux.die.net/man/2/statvfs) normalizes the source
filesystems based on the fragment size and sums the number of adjusted
blocks and inodes. This means you will see the combined space of all
sources. Total, used, and free. The sources however are deduped based
on the filesystem so multiple sources on the same filesystem will not
result in double counting its space. Other filesystems mounted further
down the tree of the branch will not be included when checking the
mount's stats.
The options `statfs` and `statfs_ignore` can be used to modify this
behavior.
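
For example (an illustrative combination), to have free-space
reporting behave in a path preserving manner while skipping read-only
branches:

```
mergerfs -o statfs=full,statfs_ignore=ro /mnt/disk1:/mnt/disk2 /mnt/pool
```
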

# TOOLING

* https://github.com/trapexit/mergerfs-tools
  * mergerfs.ctl: A tool to make it easier to query and configure mergerfs at runtime
  * mergerfs.fsck: Provides permissions and ownership auditing and the ability to fix them
  * mergerfs.dedup: Will help identify and optionally remove duplicate files
  * mergerfs.dup: Ensure there are at least N copies of a file across the pool
  * mergerfs.balance: Rebalance files across filesystems by moving them from the most filled to the least filled
  * mergerfs.consolidate: Move files within a single mergerfs directory to the filesystem with most free space
* https://github.com/trapexit/scorch
  * scorch: A tool to help discover silent corruption of files and keep track of files
* https://github.com/trapexit/bbf

of sizes below the FUSE message size (128K on older kernels, 1M on
newer).

#### policy caching
Policies are run every time a function (with a policy as mentioned
above) is called. These policies can be expensive depending on
mergerfs' setup and client usage patterns. Generally we wouldn't want
to cache policy results because it may result in stale responses if
the underlying filesystems are used directly.
The `open` policy cache will cache the result of an `open` policy for
a particular input for `cache.open` seconds or until the file is
unlinked. Each file close (release) will randomly choose to clean up
the cache of expired entries.
This cache is really only useful in cases where you have a large
number of branches and `open` is called on the same files repeatedly
(like **Transmission** which opens and closes a file on every
read/write presumably to keep file handle usage low).
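
A sketch of enabling it (the paths and timeout value are arbitrary):

```
# cache open policy results for 10 seconds (or until the file is unlinked)
mergerfs -o cache.open=10 /mnt/disk1:/mnt/disk2 /mnt/pool
```
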
#### statfs caching
Of the syscalls used by mergerfs in policies the `statfs` / `statvfs`
call is perhaps the most expensive. It's used to find out the
available space of a filesystem and whether it is mounted
read-only. Depending on the setup and usage pattern these queries can
be relatively costly. When `cache.statfs` is enabled all calls to
`statfs` by a policy will be cached for the number of seconds it's set
to.
Example: If the create policy is `mfs` and the timeout is 60 then for
that 60 seconds the same filesystem will be returned as the target for
creates because the available space won't be updated for that time.
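
A sketch matching that example (paths and timeout illustrative):

```
# statfs results cached for 60 seconds; create policy uses most free space
mergerfs -o cache.statfs=60,category.create=mfs /mnt/disk1:/mnt/disk2 /mnt/pool
```
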
MergerFS does not natively support any sort of tiered caching. Most
users have no use for such a feature and its inclusion would
complicate the code. However, there are a few situations where a cache
filesystem could help with a typical mergerfs setup.
1. Fast network, slow filesystems, many readers: You've a 10+Gbps network
with many readers and your regular filesystems can't keep up.
2. Fast network, slow filesystems, small'ish bursty writes: You have a
10+Gbps network and wish to transfer amounts of data less than your
cache filesystem but wish to do so quickly.
With #1 it's arguable if you should be using mergerfs at all. RAID
would probably be the better solution. If you're going to use mergerfs
there are other tactics that may help: spreading the data across
filesystems (see the mergerfs.dup tool) and setting `func.open=rand`,
using `symlinkify`, or using dm-cache or a similar technology to add
tiered cache to the underlying device.
With #2 one could use dm-cache as well but there is another solution
which requires only mergerfs and a cronjob.
1. Create 2 mergerfs pools. One which includes just the slow devices
and one which has both the fast devices (SSD,NVME,etc.) and slow
devices.
2. The 'cache' pool should have the cache filesystems listed first.
3. The best `create` policies to use for the 'cache' pool would
probably be `ff`, `epff`, `lfs`, or `eplfs`. The latter two under
the assumption that the cache filesystem(s) are far smaller than the
backing filesystems. If using path preserving policies remember that
you'll need to manually create the core directories of those paths
you wish to be cached. Be sure the permissions are in sync. Use
`mergerfs.fsck` to check / correct them. You could also set the
slow filesystems mode to `NC` though that'd mean if the cache
filesystems fill you'd get "out of space" errors.
4. Enable `moveonenospc` and set `minfreespace` appropriately. To make
sure there is enough room on the "slow" pool you might want to set
`minfreespace` to at least as large as the size of the largest
cache filesystem if not larger. This way in the worst case the
whole of the cache filesystem(s) can be moved to the other
filesystems.
5. Set your programs to use the cache pool.
6. Save one of the below scripts or create your own.
7. Use `cron` (as root) to schedule the command at whatever frequency
is appropriate for your workflow.

rather than days. You may want to use the `fadvise` / `--drop-cache`
version of rsync or run rsync with the tool "nocache".
*NOTE:* The arguments to these scripts include the cache
**filesystem** itself. Not the pool with the cache filesystem. You
could have data loss if the source is the cache pool.
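
One possible mover script, a minimal sketch of the time-based
expiring approach: it moves files not accessed in N days from the
cache filesystem to the backing pool. The script name and argument
order are illustrative; note again that the first argument is the
cache filesystem itself, not the cache pool.

```
#!/bin/bash
# usage: mover.sh <cache-fs> <backing-pool> <days-old>

if [ $# != 3 ]; then
  echo "usage: $0 <cache-fs> <backing-pool> <days-old>"
  exit 1
fi

CACHE="${1}"
BACKING="${2}"
N=${3}

# find files on the cache filesystem untouched for more than N days
# and move them, preserving attributes, to the backing pool
find "${CACHE}" -type f -atime +${N} -printf '%P\n' | \
  rsync --files-from=- -axqHAXWES --preallocate --remove-source-files \
  "${CACHE}/" "${BACKING}/"
```
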