This feature, when enabled, causes symlinks to be interpreted by mergerfs as their target (depending on the mode).
When there is a getattr/stat request for a file, mergerfs will check if the file is a symlink and, depending on the follow-symlinks setting, will replace the information about the symlink with that of its target.
When unlink'ing or rmdir'ing a followed symlink mergerfs will remove the symlink itself and not its target.
* `never`: Behave as normal. Symlinks are treated as such.
* `directory`: Resolve only symlinks which point to directories.
* `regular`: Resolve only symlinks which point to regular files.
* `all`: Resolve all symlinks to their targets.

Symlinks which do not point to anything are left as is.
WARNING: This feature works but there might be edge cases not yet found. If you find any odd behaviors please file a ticket on GitHub.
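If you want to check what a client actually sees, a small program like the following (illustrative only, not part of mergerfs) prints the file type reported by `lstat` for each path given. Run it against a symlink through the mergerfs mount and against the same entry on the underlying branch: with `follow-symlinks` set to anything other than `never` the mount should report the target's type while the branch still reports a symlink.

```c
/* Illustrative check (not part of mergerfs): print the file type that
 * lstat() reports for each path given on the command line. */
#include <stdio.h>
#include <sys/stat.h>

static const char *filetype(const char *path)
{
  struct stat st;
  if(lstat(path, &st) == -1)
    return "error";
  if(S_ISLNK(st.st_mode))
    return "symlink";
  if(S_ISDIR(st.st_mode))
    return "directory";
  if(S_ISREG(st.st_mode))
    return "regular file";
  return "other";
}

int main(int argc, char *argv[])
{
  for(int i = 1; i < argc; i++)
    printf("%s: %s\n", argv[i], filetype(argv[i]));
  return 0;
}
```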
<p><code>readdir</code> has policies to control how it manages reading directory
content.</p>
<table>
<thead>
<tr>
<th>Policy</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>seq</td>
<td>"sequential" : Iterate over branches in the order defined. This is the default and traditional behavior found prior to the readdir policy introduction. This will be increasingly slower the more banches are defined. Especially if waiting for drives to spinup or network filesystems to respond.</td>
</tr>
<tr>
<td>cosr</td>
<td>"concurrent open, sequential read" : Concurrently open branch directories using a thread pool and process them in order of definition. This keeps memory and CPU usage low while also reducing the time spent waiting on branches to respond. Number of threads defaults to the number of logical cores. Can be overwritten via the syntax <code>func.readdir=cosr:N</code> where <code>N</code> is the number of threads.</td>
</tr>
<tr>
<td>cor</td>
<td>"concurrent open and read" : Concurrently open branch directories and immediately start reading their contents using a thread pool. This will result in slightly higher memory and CPU usage but reduced latency. Particularly when using higher latency / slower speed network filesystem branches. Unlike <code>seq</code> and <code>cosr</code> the order of files could change due the async nature of the thread pool. Number of threads defaults to the number of logical cores. Can be overwritten via the syntax <code>func.readdir=cor:N</code> where <code>N</code> is the number of threads.</td>
</tr>
</tbody>
</table>
<p>Keep in mind that <code>readdir</code> mostly just provides a list of file names
in a directory and possibly some basic metadata about said files. To
know details about the files, as one would see from commands like
<code>find</code> or <code>ls</code>, it is required to call <code>stat</code> on the file, which is
controlled by <code>func.getattr</code>.</p>
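<p>As a rough mental model of what the <code>seq</code> policy described above does, the sketch below (illustrative only, not mergerfs' code) iterates over branch directories in the order given, collects entry names, and skips names already returned by an earlier branch.</p>

```c
/* Simplified sketch (not mergerfs' code) of a union-style readdir in
 * "seq" mode: iterate branches in order, collect entry names, and skip
 * names already seen on an earlier branch. */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 4096

int main(int argc, char *argv[])
{
  static char seen[MAX_ENTRIES][256];
  int count = 0;

  /* argv[1..] are branch directories, e.g. /mnt/disk0/media /mnt/disk1/media */
  for(int i = 1; i < argc; i++)
    {
      DIR *d = opendir(argv[i]);
      if(d == NULL)
        continue;

      struct dirent *de;
      while((de = readdir(d)) != NULL)
        {
          int dup = 0;
          for(int j = 0; j < count; j++)
            if(strcmp(seen[j], de->d_name) == 0)
              { dup = 1; break; }
          if(dup || count >= MAX_ENTRIES)
            continue;

          strncpy(seen[count], de->d_name, sizeof(seen[count]) - 1);
          seen[count][sizeof(seen[count]) - 1] = '\0';
          count++;
          printf("%s\n", de->d_name);
        }
      closedir(d);
    }
  return 0;
}
```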
<h2id="ioctl">ioctl</h2>
<p>When <code>ioctl</code> is used with an open file then it will use the file
handle which was created at the original <code>open</code> call. However, when
using <code>ioctl</code> with a directory mergerfs will use the <code>open</code> policy to
find the directory to act on.</p>
<h4id="rename-and-link">rename and link</h4>
<p><strong>NOTE:</strong> If you're receiving errors from software when files are
moved / renamed / linked then you should consider changing the create
policy to one which is <strong>not</strong> path preserving, enabling
<code>ignorepponrename</code>, or contacting the author of the offending software
and requesting that <code>EXDEV</code> (cross device / improper link) be properly
handled.</p>
<p><code>rename</code> and <code>link</code> are tricky functions in a union
filesystem. <code>rename</code> only works within a single filesystem or
device. If a rename can't be done atomically due to the source and
destination paths existing on different mount points it will return
<strong>-1</strong> with <strong>errno = EXDEV</strong> (cross device / improper link). So if a
<code>rename</code>'s source and target are on different filesystems within the pool
it creates an issue.</p>
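<p>"Properly handled", as mentioned above, usually means the application falls back to a copy and delete when <code>rename</code> fails with <code>EXDEV</code>. A minimal sketch of that fallback (illustrative only; a real implementation would also preserve metadata, use temporary files, and clean up on partial failure):</p>

```c
/* Minimal sketch of an EXDEV fallback: try rename(), and if the source
 * and target are on different filesystems copy the data then unlink the
 * source. Illustrative only. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int copy_file(const char *src, const char *dst)
{
  char buf[64 * 1024];
  ssize_t n;
  int in = open(src, O_RDONLY);
  if(in == -1)
    return -1;
  int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if(out == -1)
    { close(in); return -1; }
  while((n = read(in, buf, sizeof(buf))) > 0)
    if(write(out, buf, n) != n)
      { n = -1; break; }
  close(in);
  close(out);
  return (n == 0) ? 0 : -1;
}

int main(int argc, char *argv[])
{
  if(argc != 3)
    { fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]); return 1; }

  if(rename(argv[1], argv[2]) == 0)
    return 0;
  if(errno != EXDEV)
    { perror("rename"); return 1; }

  /* Cross-device: fall back to copy + unlink. */
  if(copy_file(argv[1], argv[2]) == -1)
    { perror("copy"); return 1; }
  if(unlink(argv[1]) == -1)
    { perror("unlink"); return 1; }
  return 0;
}
```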
<p>Originally mergerfs would return EXDEV whenever a rename was requested
which was cross directory in any way. This made the code simple and
was technically compliant with POSIX requirements. However, many
applications fail to handle EXDEV at all and treat it as a normal
error or otherwise handle it poorly. Such apps include: gvfsd-fuse
v1.20.3 and prior, Finder / CIFS/SMB client in Apple OSX 10.9+,
NZBGet, Samba's recycling bin feature.</p>
<p>As a result a compromise was made in order to get most software to
work while still obeying mergerfs' policies. Below is the basic logic.</p>
<ul>
<li>If using a <strong>create</strong> policy which tries to preserve directory paths (epff, eplfs, eplus, epmfs)</li>
<li>Using the <strong>rename</strong> policy get the list of files to rename</li>
<li>For each file attempt rename:<ul>
<li>If failure with ENOENT (no such file or directory) run <strong>create</strong> policy</li>
<li>If create policy returns the same branch as currently evaluating then clone the path</li>
<li>Re-attempt rename</li>
</ul>
</li>
<li>If <strong>any</strong> of the renames succeed the higher level rename is considered a success</li>
<li>If <strong>no</strong> renames succeed the first error encountered will be returned</li>
<li>On success:<ul>
<li>Remove the target from all branches with no source file</li>
<li>Remove the source from all branches which failed to rename</li>
</ul>
</li>
<li>If using a <strong>create</strong> policy which does <strong>not</strong> try to preserve directory paths</li>
<li>Using the <strong>rename</strong> policy get the list of files to rename</li>
<li>Using the <strong>getattr</strong> policy get the target path</li>
<li>For each file attempt rename:<ul>
<li>If the source branch != target branch:</li>
<li>Clone target path from target branch to source branch</li>
<li>Rename</li>
</ul>
</li>
<li>If <strong>any</strong> of the renames succeed the higher level rename is considered a success</li>
<li>If <strong>no</strong> renames succeed the first error encountered will be returned</li>
<li>On success:<ul>
<li>Remove the target from all branches with no source file</li>
<li>Remove the source from all branches which failed to rename</li>
</ul>
</li>
</ul>
<p>The removals are subject to normal entitlement checks.</p>
<p>The above behavior will help minimize the likelihood of EXDEV being
returned but it will still be possible.</p>
<p><strong>link</strong> uses the same strategy but without the removals.</p>
<h4id="statfs-statvfs">statfs / statvfs</h4>
<p><ahref="http://linux.die.net/man/2/statvfs">statvfs</a> normalizes the source
filesystems based on the fragment size and sums the number of adjusted
blocks and inodes. This means you will see the combined space of all
sources. Total, used, and free. The sources, however, are deduped based
on the filesystem so multiple sources on the same drive will not result in
double counting its space. Other filesystems mounted further down the tree
of the branch will not be included when checking the mount's stats.</p>
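<p>The aggregation can be pictured with the following sketch (illustrative only, not mergerfs' code): each branch's block counts are normalized by its fragment size before being summed, and branches which resolve to the same underlying filesystem (same <code>st_dev</code> here) are only counted once.</p>

```c
/* Rough sketch (not mergerfs' code) of summing branch statvfs results:
 * counts are normalized by each filesystem's fragment size and branches
 * sharing a device are only counted once. */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/statvfs.h>

int main(int argc, char *argv[])
{
  unsigned long long total = 0, avail = 0;
  dev_t seen[64];
  int seen_count = 0;

  /* argv[1..] are branch paths, e.g. /mnt/disk0 /mnt/disk1 */
  for(int i = 1; i < argc; i++)
    {
      struct stat st;
      struct statvfs stvfs;
      if(stat(argv[i], &st) == -1 || statvfs(argv[i], &stvfs) == -1)
        continue;

      /* Dedup branches which live on the same filesystem (same st_dev). */
      int dup = 0;
      for(int j = 0; j < seen_count; j++)
        if(seen[j] == st.st_dev)
          { dup = 1; break; }
      if(dup || seen_count >= 64)
        continue;
      seen[seen_count++] = st.st_dev;

      /* Normalize block counts by the fragment size before summing. */
      total += (unsigned long long)stvfs.f_blocks * stvfs.f_frsize;
      avail += (unsigned long long)stvfs.f_bavail * stvfs.f_frsize;
    }

  printf("total: %llu bytes, available: %llu bytes\n", total, avail);
  return 0;
}
```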
<p>The options <code>statfs</code> and <code>statfs_ignore</code> can be used to modify this behavior.</p>
FUSE applications communicate with the kernel over a special character device: `/dev/fuse`. A large portion of the overhead associated with FUSE is the cost of going back and forth between user space and kernel space over that device. Generally speaking, the fewer trips needed the better the performance will be. Reducing the number of trips can be done in a number of ways, kernel-level caching and increasing message sizes being two significant ones. When it comes to reads and writes, if the message size is doubled the number of trips is approximately halved.
In Linux 4.20 a new feature was added allowing the negotiation of the max message size. Since the size is in multiples of pages the feature is called `max_pages`. There is a maximum `max_pages` value of 256 (1MiB) and a minimum of 1 (4KiB). The default used by Linux >=4.20, and the hardcoded value used before 4.20, is 32 (128KiB). In mergerfs it's referred to as `fuse_msg_size` to make it clear what it impacts and to provide some abstraction.
Since there should be no downsides to increasing `fuse_msg_size` / `max_pages`, outside of a minor bump in RAM usage due to larger message buffers, mergerfs defaults the value to 256. On kernels before 4.20 the value has no effect. The reason the value is configurable is to enable experimentation and benchmarking. See the BENCHMARKING section for examples.
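To make the trip-count arithmetic concrete, the following trivial example (illustrative only) computes how many kernel/userspace round trips a 1GiB sequential transfer would need at the pre-4.20 default of 32 pages versus the 256 page maximum, ignoring all other request types and overhead.

```c
/* Trivial illustration of the round-trip arithmetic: a 1 GiB sequential
 * transfer split into FUSE messages of 128KiB (32 pages) vs 1MiB (256
 * pages). Ignores all other request types. */
#include <stdio.h>

int main(void)
{
  const unsigned long long transfer  = 1ULL << 30;      /* 1 GiB   */
  const unsigned long long small_msg = 32ULL * 4096;    /* 128 KiB */
  const unsigned long long large_msg = 256ULL * 4096;   /* 1 MiB   */

  printf("128KiB messages: %llu round trips\n", transfer / small_msg); /* 8192 */
  printf("1MiB messages:   %llu round trips\n", transfer / large_msg); /* 1024 */
  return 0;
}
```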
Inodes (st_ino) are unique identifiers within a filesystem. Each
mounted filesystem has a device ID (st_dev) as well, and together they
can uniquely identify a file on the whole of the system. Entries on
the same device with the same inode are in fact references to the same
underlying file. It is a many-to-one relationship between names and an
inode. Directories, however, do not have multiple links on most
systems due to the complexity they add.
FUSE allows the server (mergerfs) to set inode values but not device
IDs. Creating an inode value is somewhat complex in mergerfs' case as
files aren't really in its control. If a policy changes what directory
or file is to be selected or something changes out of band it becomes
unclear what value should be used. Most software does not care what
the values are but those that do often break if a value changes
unexpectedly. The tool `find` will abort a directory walk if it sees a
directory's inode change. NFS can return stale handle errors if the
inode changes out of band. File dedup tools will usually leverage
device IDs and inodes as a shortcut in searching for duplicate files
and would resort to full file comparisons should they find different
inode values.
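The (st_dev, st_ino) pairing described above can be checked with a small program like this (illustrative only); any two paths reporting the same pair reference the same underlying file, e.g. hard links.

```c
/* Print (st_dev, st_ino) for each path given. Two paths reporting the
 * same pair reference the same underlying file (e.g. hard links). */
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
  for(int i = 1; i < argc; i++)
    {
      struct stat st;
      if(stat(argv[i], &st) == -1)
        { perror(argv[i]); continue; }
      printf("%s: st_dev=%llu st_ino=%llu nlink=%llu\n",
             argv[i],
             (unsigned long long)st.st_dev,
             (unsigned long long)st.st_ino,
             (unsigned long long)st.st_nlink);
    }
  return 0;
}
```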
mergerfs offers multiple ways to calculate the inode in hopes of
covering different use cases.
* `passthrough`: Passes through the underlying inode value. Mostly
intended for testing as using this does not address any of the
problems mentioned above and could confuse file deduplication
software as inodes from different filesystems can be the same.
* `path-hash`: Hashes the relative path of the entry in question. The
underlying file's values are completely ignored. This means the
inode value will always be the same for that file path. This is
useful when using NFS and you make changes out of band such as copying
data between branches. This also means that entries that do point to
the same file will not be recognizable via inodes. That does not
mean hard links don't work. They will.
* `path-hash32`: 32bit version of path-hash.
* `devino-hash`: Hashes the device id and inode of the underlying
entry. This won't prevent issues with NFS should the policy pick a
different file or should files move out of band, but it will present the
same inode for entries which reference the same underlying file.
* `devino-hash32`: 32bit version of devino-hash.
* `hybrid-hash`: Performs path-hash on directories and devino-hash on
other file types. Since directories can't have hard links the static
value won't make a difference and the files will get values useful for
finding duplicates. Probably the best to use if not using NFS. As such
it is the default.
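As a rough illustration of the difference between the path based and devino based modes, the sketch below hashes either the relative path or the underlying (device id, inode) pair into a 64bit value. FNV-1a is used purely as an example hash and the input values are hypothetical; this is not necessarily the algorithm or output mergerfs uses.

```c
/* Illustrative sketch of the two hashing strategies (FNV-1a used here
 * purely as an example hash; not necessarily mergerfs' algorithm). */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint64_t fnv1a(const void *data, size_t len, uint64_t h)
{
  const unsigned char *p = data;
  for(size_t i = 0; i < len; i++)
    h = (h ^ p[i]) * 0x100000001b3ULL;
  return h;
}

/* "path-hash": the reported inode depends only on the relative path. */
static uint64_t path_hash(const char *relpath)
{
  return fnv1a(relpath, strlen(relpath), 0xcbf29ce484222325ULL);
}

/* "devino-hash": the reported inode depends on the underlying
   branch's device id and inode. */
static uint64_t devino_hash(uint64_t dev, uint64_t ino)
{
  uint64_t h = fnv1a(&dev, sizeof(dev), 0xcbf29ce484222325ULL);
  return fnv1a(&ino, sizeof(ino), h);
}

int main(void)
{
  /* Hypothetical example values for demonstration only. */
  printf("path-hash(\"/media/a.mkv\")  = %016llx\n",
         (unsigned long long)path_hash("/media/a.mkv"));
  printf("devino-hash(2049, 1048577) = %016llx\n",
         (unsigned long long)devino_hash(2049, 1048577));
  return 0;
}
```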
Runtime extended attribute support can be managed via the `xattr` option. By default it will pass through any xattr calls. Given that xattr support is rarely used and can have significant performance implications, mergerfs allows it to be disabled at runtime. The performance problems mostly come when file caching is enabled: the kernel will send a `getxattr` for `security.capability` _before every single write_ and does not cache the responses to any `getxattr`. This might be addressed in the future but for now mergerfs can really only offer the following workarounds.
`noattr` will cause mergerfs to short circuit all xattr calls and return ENOATTR where appropriate. mergerfs still gets all the requests but they will not be forwarded on to the underlying filesystems. The runtime control will still function in this mode.
`nosys` will cause mergerfs to return ENOSYS for any xattr call. The difference with `noattr` is that the kernel will cache this fact and itself short circuit future calls. This is more efficient than `noattr` but will cause mergerfs' runtime control via the hidden file to stop working.
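From an application's point of view the difference shows up in the error returned by xattr calls: with `noattr` the request is serviced and simply reports that the attribute is absent, while with `nosys` the kernel learns that xattrs are unsupported and short circuits future calls itself. A small probe program (illustrative only):

```c
/* Probe how a filesystem responds to getxattr: ENODATA/ENOATTR means the
 * call was serviced but the attribute is absent (the "noattr" behavior),
 * while ENOSYS/ENOTSUP means xattrs are reported as unsupported (the
 * "nosys" behavior, which the kernel caches). */
#include <errno.h>
#include <stdio.h>
#include <sys/xattr.h>

int main(int argc, char *argv[])
{
  if(argc != 2)
    { fprintf(stderr, "usage: %s <path>\n", argv[0]); return 1; }

  char buf[256];
  ssize_t rv = getxattr(argv[1], "security.capability", buf, sizeof(buf));
  if(rv >= 0)
    printf("security.capability present (%zd bytes)\n", rv);
  else if(errno == ENODATA)              /* ENOATTR on Linux */
    printf("attribute not present\n");
  else if(errno == ENOSYS || errno == ENOTSUP)
    printf("xattrs unsupported / disabled\n");
  else
    perror("getxattr");
  return 0;
}
```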
<h2id="how-can-i-setup-my-system-to-limit-drive-spinup">How can I setup my system to limit drive spinup?</h2>
<p>TL;DR: You really can't. Not through mergerfs alone.</p>
<p>mergerfs is a proxy. Not a cache. It proxies calls between client software and underlying filesystems. If a client does an <code>open</code>, <code>readdir</code>, <code>stat</code>, etc. it must translate that into something that makes sense across N filesystems. For <code>readdir</code> that means running the call against all branches and aggregating the output. For <code>open</code> that means finding the file to open and doing so. The only way to find the file to open is to scan across all branches and sort the results and pick one. There is no practical way to do otherwise. Especially given so many mergerfs users expect out of band changes to "just work."</p>
<p>The best way to limit spinup of drives is to limit their usage at the client level. That means keeping software from interacting with the filesystem altogether.</p>