.\"t
|
|
.\" Automatically generated by Pandoc 1.16.0.2
|
|
.\"
|
|
.TH "mergerfs" "1" "2017\-02\-18" "mergerfs user manual" ""
|
|
.hy
|
|
.SH NAME
|
|
.PP
|
|
mergerfs \- a featureful union filesystem
|
|
.SH SYNOPSIS
|
|
.PP
|
|
mergerfs \-o<options> <srcmounts> <mountpoint>
|
|
.SH DESCRIPTION
|
|
.PP
|
|
\f[B]mergerfs\f[] is a union filesystem geared towards simplifying
|
|
storage and management of files across numerous commodity storage
|
|
devices.
|
|
It is similar to \f[B]mhddfs\f[], \f[B]unionfs\f[], and \f[B]aufs\f[].
|
|
.SH FEATURES
|
|
.IP \[bu] 2
|
|
Runs in userspace (FUSE)
|
|
.IP \[bu] 2
|
|
Configurable behaviors
|
|
.IP \[bu] 2
|
|
Support for extended attributes (xattrs)
|
|
.IP \[bu] 2
|
|
Support for file attributes (chattr)
|
|
.IP \[bu] 2
|
|
Runtime configurable (via xattrs)
|
|
.IP \[bu] 2
|
|
Safe to run as root
|
|
.IP \[bu] 2
|
|
Opportunistic credential caching
|
|
.IP \[bu] 2
|
|
Works with heterogeneous filesystem types
|
|
.IP \[bu] 2
|
|
Handling of writes to full drives (transparently move file to drive with
|
|
capacity)
|
|
.IP \[bu] 2
|
|
Handles pool of readonly and read/write drives
|
|
.SH OPTIONS
|
|
.SS mount options
|
|
.IP \[bu] 2
|
|
\f[B]defaults\f[]: a shortcut for FUSE\[aq]s \f[B]atomic_o_trunc\f[],
|
|
\f[B]auto_cache\f[], \f[B]big_writes\f[], \f[B]default_permissions\f[],
|
|
\f[B]splice_move\f[], \f[B]splice_read\f[], and \f[B]splice_write\f[].
|
|
These options seem to provide the best performance.
|
|
.IP \[bu] 2
|
|
\f[B]direct_io\f[]: causes FUSE to bypass caching which can increase
|
|
write speeds at the detriment of reads.
|
|
Note that not enabling \f[C]direct_io\f[] will cause double caching of
|
|
files and therefore less memory for caching generally.
|
|
However, \f[C]mmap\f[] does not work when \f[C]direct_io\f[] is enabled.
|
|
.IP \[bu] 2
|
|
\f[B]minfreespace\f[]: the minimum space value used for creation
|
|
policies.
|
|
Understands \[aq]K\[aq], \[aq]M\[aq], and \[aq]G\[aq] to represent
|
|
kilobyte, megabyte, and gigabyte respectively.
|
|
(default: 4G)
|
|
.IP \[bu] 2
|
|
\f[B]moveonenospc\f[]: when enabled (set to \f[B]true\f[]), if a
\f[B]write\f[] fails with \f[B]ENOSPC\f[] or \f[B]EDQUOT\f[], all drives
are scanned for the one with the most free space, provided that space
is at least the size of the file plus the amount which failed to write.
|
|
An attempt to move the file to that drive will occur (keeping all
|
|
metadata possible) and if successful the original is unlinked and the
|
|
write retried.
|
|
(default: false)
|
|
.IP \[bu] 2
|
|
\f[B]use_ino\f[]: causes mergerfs to supply file/directory inodes rather
|
|
than libfuse.
|
|
While not a default it is generally recommended it be enabled so that
|
|
hard linked files share the same inode value.
|
|
.IP \[bu] 2
|
|
\f[B]dropcacheonclose\f[]: when a file is requested to be closed call
|
|
\f[C]posix_fadvise\f[] on it first to instruct the kernel that we no
|
|
longer need the data and it can drop its cache.
|
|
Recommended when \f[B]direct_io\f[] is not enabled to limit double
|
|
caching.
|
|
(default: false)
|
|
.IP \[bu] 2
|
|
\f[B]fsname\f[]: sets the name of the filesystem as seen in
|
|
\f[B]mount\f[], \f[B]df\f[], etc.
|
|
Defaults to a list of the source paths concatenated together with the
|
|
longest common prefix removed.
|
|
.IP \[bu] 2
|
|
\f[B]func.<func>=<policy>\f[]: sets the specific FUSE function\[aq]s
|
|
policy.
|
|
See below for the list of value types.
|
|
Example: \f[B]func.getattr=newest\f[]
|
|
.IP \[bu] 2
|
|
\f[B]category.<category>=<policy>\f[]: Sets policy of all FUSE functions
|
|
in the provided category.
|
|
Example: \f[B]category.create=mfs\f[]
|
|
.PP
|
|
\f[B]NOTE:\f[] Options are evaluated in the order listed so if the
|
|
options are \f[B]func.rmdir=rand,category.action=ff\f[] the
|
|
\f[B]action\f[] category setting will override the \f[B]rmdir\f[]
|
|
setting.
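.PP
The ordering rule can be illustrated with a short Python sketch.
It is not mergerfs code: \f[C]apply_options\f[] is a hypothetical helper, and the
category membership and default policies are copied from the tables in this manual.

```python
# Hypothetical sketch of left-to-right option evaluation (not mergerfs code).
CATEGORIES = {
    "action": ["chmod", "chown", "link", "removexattr", "rename",
               "rmdir", "setxattr", "truncate", "unlink", "utimens"],
    "create": ["create", "mkdir", "mknod", "symlink"],
    "search": ["access", "getattr", "getxattr", "ioctl", "listxattr",
               "open", "readlink"],
}
DEFAULTS = {"action": "all", "create": "epmfs", "search": "ff"}


def apply_options(option_string):
    """Return {function: policy} after applying options in the order given."""
    policies = {f: DEFAULTS[c] for c, funcs in CATEGORIES.items() for f in funcs}
    for option in option_string.split(","):
        key, _, value = option.partition("=")
        if key.startswith("func."):
            policies[key[len("func."):]] = value
        elif key.startswith("category."):
            # A category setting overwrites every function in that category.
            for func in CATEGORIES[key[len("category."):]]:
                policies[func] = value
    return policies


print(apply_options("func.rmdir=rand,category.action=ff")["rmdir"])  # ff
print(apply_options("category.action=ff,func.rmdir=rand")["rmdir"])  # rand
```

Listing the category first and the function second keeps the function specific override.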
|
|
.SS srcmounts
|
|
.PP
|
|
The srcmounts (source mounts) argument is a colon (\[aq]:\[aq])
|
|
delimited list of paths to be included in the pool.
|
|
It does not matter whether the paths are on the same or different drives,
nor which filesystems they use.
|
|
Used and available space will not be duplicated for paths on the same
|
|
device and any features which aren\[aq]t supported by the underlying
|
|
filesystem (such as file attributes or extended attributes) will return
|
|
the appropriate errors.
|
|
.PP
|
|
To make it easier to include multiple source mounts mergerfs supports
|
|
globbing (http://linux.die.net/man/7/glob).
|
|
\f[B]The globbing tokens MUST be escaped when used via the shell,
otherwise the shell itself will expand them.\f[]
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
$\ mergerfs\ \-o\ defaults,allow_other,use_ino\ /mnt/disk\\*:/mnt/cdrom\ /media/drives
|
|
\f[]
|
|
.fi
|
|
.PP
|
|
The above line will use all mount points in /mnt prefixed with
|
|
\f[B]disk\f[] and the \f[B]cdrom\f[].
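.PP
As a rough illustration of what the expansion does, the Python sketch below
splits a srcmounts string on colons and globs each entry.
\f[C]expand_srcmounts\f[] is a hypothetical name, and keeping a pattern which
matches nothing as a literal path is an assumption, not documented mergerfs behavior.

```python
import glob


def expand_srcmounts(srcmounts):
    """Expand a colon-delimited srcmounts string, globbing each entry."""
    paths = []
    for pattern in srcmounts.split(":"):
        matches = sorted(glob.glob(pattern))
        # Assumption: a pattern that matches nothing is kept literally.
        paths.extend(matches if matches else [pattern])
    return paths
```

With /mnt/disk1, /mnt/disk2 and /mnt/cdrom present, the string
/mnt/disk*:/mnt/cdrom would yield those three paths in that order.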
|
|
.PP
|
|
To have the pool mounted at boot or otherwise accessible from related
|
|
tools use \f[B]/etc/fstab\f[].
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
#\ <file\ system>\ \ \ \ \ \ \ \ <mount\ point>\ \ <type>\ \ \ \ \ \ \ \ \ <options>\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ <dump>\ \ <pass>
|
|
/mnt/disk*:/mnt/cdrom\ \ /media/drives\ \ fuse.mergerfs\ \ defaults,allow_other,use_ino\ \ 0\ \ \ \ \ \ \ 0
|
|
\f[]
|
|
.fi
|
|
.PP
|
|
\f[B]NOTE:\f[] the globbing is done at mount or xattr update time (see
|
|
below).
|
|
If a new directory is added matching the glob after the fact it will not
|
|
be automatically included.
|
|
.PP
|
|
\f[B]NOTE:\f[] for mounting via \f[B]fstab\f[] to work you must have
|
|
\f[B]mount.fuse\f[] installed.
|
|
For Ubuntu/Debian it is included in the \f[B]fuse\f[] package.
|
|
.SH FUNCTIONS / POLICIES / CATEGORIES
|
|
.PP
|
|
The POSIX filesystem API has a number of functions:
\f[B]creat\f[], \f[B]stat\f[], \f[B]chown\f[], etc.
|
|
In mergerfs these functions are grouped into 3 categories:
|
|
\f[B]action\f[], \f[B]create\f[], and \f[B]search\f[].
|
|
Functions and categories can be assigned a policy which dictates how
|
|
\f[B]mergerfs\f[] behaves.
|
|
Any policy can be assigned to a function or category though some may not
|
|
be very useful in practice.
|
|
For instance: \f[B]rand\f[] (random) may be useful for file creation
|
|
(create) but could lead to very odd behavior if used for \f[C]chmod\f[]
|
|
(though only if there were more than one copy of the file).
|
|
.PP
|
|
Policies, when called to create, will ignore drives which are readonly.
|
|
This allows for readonly and read/write drives to be mixed together.
|
|
Note that the drive must be explicitly mounted with the \f[B]ro\f[]
|
|
mount option for this to work.
|
|
.SS Function / Category classifications
|
|
.PP
|
|
.TS
|
|
tab(@);
|
|
lw(10.7n) lw(16.5n).
|
|
T{
|
|
Category
|
|
T}@T{
|
|
FUSE Functions
|
|
T}
|
|
_
|
|
T{
|
|
action
|
|
T}@T{
|
|
chmod, chown, link, removexattr, rename, rmdir, setxattr, truncate,
|
|
unlink, utimens
|
|
T}
|
|
T{
|
|
create
|
|
T}@T{
|
|
create, mkdir, mknod, symlink
|
|
T}
|
|
T{
|
|
search
|
|
T}@T{
|
|
access, getattr, getxattr, ioctl, listxattr, open, readlink
|
|
T}
|
|
T{
|
|
N/A
|
|
T}@T{
|
|
fallocate, fgetattr, fsync, ftruncate, ioctl, read, readdir, release,
|
|
statfs, write
|
|
T}
|
|
.TE
|
|
.PP
|
|
Due to FUSE limitations \f[B]ioctl\f[] behaves differently if it is acting
|
|
on a directory.
|
|
It\[aq]ll use the \f[B]getattr\f[] policy to find and open the directory
|
|
before issuing the \f[B]ioctl\f[].
|
|
In other cases where something may be searched (to confirm a directory
|
|
exists across all source mounts) \f[B]getattr\f[] will also be used.
|
|
.SS Path Preservation
|
|
.PP
|
|
Policies, as described below, are of two core types:
\f[C]path\ preserving\f[] and \f[C]non\-path\ preserving\f[].
|
|
.PP
|
|
All policies which start with \f[C]ep\f[] (\f[B]epff\f[],
\f[B]eplfs\f[], \f[B]eplus\f[], \f[B]epmfs\f[], \f[B]eprand\f[]) are
\f[C]path\ preserving\f[].
\f[C]ep\f[] stands for \f[C]existing\ path\f[].
|
|
.PP
|
|
As the descriptions explain a path preserving policy will only consider
|
|
drives where the relative path being accessed already exists.
|
|
.PP
|
|
When using non\-path preserving policies where something is created
|
|
paths will be copied to target drives as necessary.
|
|
.SS Policy descriptions
|
|
.PP
|
|
.TS
|
|
tab(@);
|
|
lw(14.6n) lw(13.6n).
|
|
T{
|
|
Policy
|
|
T}@T{
|
|
Description
|
|
T}
|
|
_
|
|
T{
|
|
all
|
|
T}@T{
|
|
Search category: acts like \f[B]ff\f[].
|
|
Action category: apply to all found.
|
|
Create category: for \f[B]mkdir\f[], \f[B]mknod\f[], and
|
|
\f[B]symlink\f[] it will apply to all found.
|
|
\f[B]create\f[] works like \f[B]ff\f[].
|
|
It will exclude readonly drives and those with free space less than
|
|
\f[B]minfreespace\f[].
|
|
T}
|
|
T{
|
|
epall (existing path, all)
|
|
T}@T{
|
|
Search category: acts like \f[B]epff\f[].
|
|
Action category: apply to all found.
|
|
Create category: for \f[B]mkdir\f[], \f[B]mknod\f[], and
|
|
\f[B]symlink\f[] it will apply to all existing paths found.
|
|
\f[B]create\f[] works like \f[B]epff\f[].
|
|
Excludes readonly drives and those with free space less than
|
|
\f[B]minfreespace\f[].
|
|
T}
|
|
T{
|
|
epff (existing path, first found)
|
|
T}@T{
|
|
Given the order of the drives, as defined at mount time or configured at
|
|
runtime, act on the first one found where the relative path already
|
|
exists.
|
|
For \f[B]create\f[] category functions it will exclude readonly drives
|
|
and those with free space less than \f[B]minfreespace\f[] (unless there
|
|
is no other option).
|
|
Falls back to \f[B]ff\f[].
|
|
T}
|
|
T{
|
|
eplfs (existing path, least free space)
|
|
T}@T{
|
|
Of all the drives on which the relative path exists choose the drive
|
|
with the least free space.
|
|
For \f[B]create\f[] category functions it will exclude readonly drives
|
|
and those with free space less than \f[B]minfreespace\f[].
|
|
Falls back to \f[B]lfs\f[].
|
|
T}
|
|
T{
|
|
eplus (existing path, least used space)
|
|
T}@T{
|
|
Of all the drives on which the relative path exists choose the drive
|
|
with the least used space.
|
|
For \f[B]create\f[] category functions it will exclude readonly drives
|
|
and those with free space less than \f[B]minfreespace\f[].
|
|
Falls back to \f[B]lus\f[].
|
|
T}
|
|
T{
|
|
epmfs (existing path, most free space)
|
|
T}@T{
|
|
Of all the drives on which the relative path exists choose the drive
|
|
with the most free space.
|
|
For \f[B]create\f[] category functions it will exclude readonly drives
|
|
and those with free space less than \f[B]minfreespace\f[].
|
|
Falls back to \f[B]mfs\f[].
|
|
T}
|
|
T{
|
|
eprand (existing path, random)
|
|
T}@T{
|
|
Calls \f[B]epall\f[] and then randomizes.
|
|
Otherwise behaves the same as \f[B]epall\f[].
|
|
T}
|
|
T{
|
|
erofs
|
|
T}@T{
|
|
Exclusively return \f[B]\-1\f[] with \f[B]errno\f[] set to
|
|
\f[B]EROFS\f[] (Read\-only filesystem).
|
|
By setting \f[B]create\f[] functions to this you can in effect turn the
|
|
filesystem mostly readonly.
|
|
T}
|
|
T{
|
|
ff (first found)
|
|
T}@T{
|
|
Given the order of the drives, as defined at mount time or configured at
|
|
runtime, act on the first one found.
|
|
For \f[B]create\f[] category functions it will exclude readonly drives
|
|
and those with free space less than \f[B]minfreespace\f[] (unless there
|
|
is no other option).
|
|
T}
|
|
T{
|
|
lfs (least free space)
|
|
T}@T{
|
|
Pick the drive with the least available free space.
|
|
For \f[B]create\f[] category functions it will exclude readonly drives
|
|
and those with free space less than \f[B]minfreespace\f[].
|
|
Falls back to \f[B]mfs\f[].
|
|
T}
|
|
T{
|
|
lus (least used space)
|
|
T}@T{
|
|
Pick the drive with the least used space.
|
|
For \f[B]create\f[] category functions it will exclude readonly drives
|
|
and those with free space less than \f[B]minfreespace\f[].
|
|
Falls back to \f[B]mfs\f[].
|
|
T}
|
|
T{
|
|
mfs (most free space)
|
|
T}@T{
|
|
Pick the drive with the most available free space.
|
|
For \f[B]create\f[] category functions it will exclude readonly drives.
|
|
Falls back to \f[B]ff\f[].
|
|
T}
|
|
T{
|
|
newest
|
|
T}@T{
|
|
Pick the file / directory with the largest mtime.
|
|
For \f[B]create\f[] category functions it will exclude readonly drives
|
|
and those with free space less than \f[B]minfreespace\f[] (unless there
|
|
is no other option).
|
|
T}
|
|
T{
|
|
rand (random)
|
|
T}@T{
|
|
Calls \f[B]all\f[] and then randomizes.
|
|
T}
|
|
.TE
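.PP
To make the table more concrete, here is a minimal Python sketch of three
create policies (\f[B]ff\f[], \f[B]mfs\f[], \f[B]epmfs\f[]) and their fallbacks.
The \f[C]Drive\f[] record and the function names are hypothetical, and reading
"unless there is no other option" as "ignore minfreespace when no drive
qualifies" is an interpretation of the table, not confirmed behavior.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    # Hypothetical record for illustration; not a mergerfs type.
    name: str
    free: int        # bytes available
    readonly: bool   # mounted with the ro option
    has_path: bool   # the relative path already exists on this drive


def eligible(drives, minfreespace):
    """Writable drives with at least minfreespace bytes free."""
    return [d for d in drives if not d.readonly and d.free >= minfreespace]


def ff(drives, minfreespace):
    """First found, in configured order."""
    ok = eligible(drives, minfreespace)
    if not ok:
        # Interpretation: with no other option, ignore minfreespace.
        ok = [d for d in drives if not d.readonly]
    return ok[0] if ok else None


def mfs(drives, minfreespace):
    """Most free space among writable drives; falls back to ff."""
    writable = [d for d in drives if not d.readonly]
    if writable:
        return max(writable, key=lambda d: d.free)
    return ff(drives, minfreespace)


def epmfs(drives, minfreespace):
    """Most free space where the path exists; falls back to mfs."""
    ok = [d for d in eligible(drives, minfreespace) if d.has_path]
    if ok:
        return max(ok, key=lambda d: d.free)
    return mfs(drives, minfreespace)
```

The path preserving policy only widens its search to other drives once no
drive holding the path qualifies, which is why it can keep files grouped together.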
|
|
.SS Defaults
|
|
.PP
|
|
.TS
|
|
tab(@);
|
|
l l.
|
|
T{
|
|
Category
|
|
T}@T{
|
|
Policy
|
|
T}
|
|
_
|
|
T{
|
|
action
|
|
T}@T{
|
|
all
|
|
T}
|
|
T{
|
|
create
|
|
T}@T{
|
|
epmfs
|
|
T}
|
|
T{
|
|
search
|
|
T}@T{
|
|
ff
|
|
T}
|
|
.TE
|
|
.SS rename & link
|
|
.PP
|
|
\f[B]NOTE:\f[] If you\[aq]re receiving errors from software when files
|
|
are moved / renamed then you should consider changing the create policy
|
|
to one which is \f[B]not\f[] path preserving or contacting the author of
|
|
the offending software and requesting that \f[C]EXDEV\f[] be properly
|
|
handled.
|
|
.PP
|
|
rename (http://man7.org/linux/man-pages/man2/rename.2.html) is a tricky
|
|
function in a merged system.
|
|
Under normal situations rename only works within a single filesystem or
|
|
device.
|
|
If a rename can\[aq]t be done atomically due to the source and
|
|
destination paths existing on different mount points it will return
|
|
\f[B]\-1\f[] with \f[B]errno = EXDEV\f[] (cross device).
|
|
.PP
|
|
Originally mergerfs would return EXDEV whenever a rename was requested
|
|
which was cross directory in any way.
|
|
This made the code simple and was technically compliant with POSIX
|
|
requirements.
|
|
However, many applications fail to handle EXDEV at all and treat it as a
|
|
normal error or otherwise handle it poorly.
|
|
Such apps include: gvfsd\-fuse v1.20.3 and prior, Finder / CIFS/SMB
|
|
client in Apple OSX 10.9+, NZBGet, Samba\[aq]s recycling bin feature.
|
|
.PP
|
|
As a result a compromise was made in order to get most software to work
|
|
while still obeying mergerfs\[aq] policies.
|
|
Below is the rather complicated logic.
|
|
.IP \[bu] 2
If using a \f[B]create\f[] policy which tries to preserve directory
paths (epff,eplfs,eplus,epmfs)
.RS 2
.IP \[bu] 2
Using the \f[B]rename\f[] policy get the list of files to rename
.IP \[bu] 2
For each file attempt rename:
.RS 2
.IP \[bu] 2
If failure with ENOENT run \f[B]create\f[] policy
.IP \[bu] 2
If create policy returns the same drive as currently evaluating then
clone the path
.IP \[bu] 2
Re\-attempt rename
.RE
.IP \[bu] 2
If \f[B]any\f[] of the renames succeed the higher level rename is
considered a success
.IP \[bu] 2
If \f[B]no\f[] renames succeed the first error encountered will be
returned
.IP \[bu] 2
On success:
.RS 2
.IP \[bu] 2
Remove the target from all drives with no source file
.IP \[bu] 2
Remove the source from all drives which failed to rename
.RE
.RE
.IP \[bu] 2
If using a \f[B]create\f[] policy which does \f[B]not\f[] try to
preserve directory paths
.RS 2
.IP \[bu] 2
Using the \f[B]rename\f[] policy get the list of files to rename
.IP \[bu] 2
Using the \f[B]getattr\f[] policy get the target path
.IP \[bu] 2
For each file attempt rename:
.RS 2
.IP \[bu] 2
If the source drive != target drive:
.RS 2
.IP \[bu] 2
Clone target path from target drive to source drive
.RE
.IP \[bu] 2
Rename
.RE
.IP \[bu] 2
If \f[B]any\f[] of the renames succeed the higher level rename is
considered a success
.IP \[bu] 2
If \f[B]no\f[] renames succeed the first error encountered will be
returned
.IP \[bu] 2
On success:
.RS 2
.IP \[bu] 2
Remove the target from all drives with no source file
.IP \[bu] 2
Remove the source from all drives which failed to rename
.RE
.RE
|
|
.PP
|
|
The removals are subject to normal entitlement checks.
|
|
.PP
|
|
The above behavior will help minimize the likelihood of EXDEV being
|
|
returned but it will still be possible.
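.PP
The non\-path preserving case can be sketched in Python as follows.
This is a simplified illustration, not the mergerfs implementation:
\f[C]merged_rename\f[] is a hypothetical name, no policies are consulted,
only regular files are handled, and the target directory is simply created on
each source drive instead of being cloned with its metadata.

```python
import os


def merged_rename(drives, old_rel, new_rel):
    """Rename old_rel to new_rel on every drive holding the source file."""
    first_error = None
    renamed, failed = [], []
    for drive in drives:
        src = os.path.join(drive, old_rel)
        if not os.path.lexists(src):
            continue  # this drive has no copy of the source
        dst = os.path.join(drive, new_rel)
        try:
            # Stand-in for "clone target path": create the directory locally.
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            os.rename(src, dst)
            renamed.append(drive)
        except OSError as err:
            first_error = first_error or err
            failed.append(drive)
    if not renamed:
        # No rename succeeded: report the first error encountered.
        raise first_error or FileNotFoundError(old_rel)
    # On success: remove stale targets from drives that had no source file.
    for drive in drives:
        if drive not in renamed and drive not in failed:
            stale = os.path.join(drive, new_rel)
            if os.path.isfile(stale):
                os.unlink(stale)
    # On success: remove sources which could not be renamed.
    for drive in failed:
        os.unlink(os.path.join(drive, old_rel))
    return renamed
```

Because each rename stays within one drive, the underlying
\f[C]rename\f[] call never crosses a device boundary in this sketch.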
|
|
.PP
|
|
\f[B]link\f[] uses the same basic strategy.
|
|
.SS readdir
|
|
.PP
|
|
readdir (http://linux.die.net/man/3/readdir) is different from all other
|
|
filesystem functions.
|
|
While it could have its own set of policies to tweak its behavior
|
|
at this time it provides a simple union of files and directories found.
|
|
Remember that any action or information queried about these files and
directories comes from the respective function.
|
|
For instance: an \f[B]ls\f[] is a \f[B]readdir\f[] and for each
|
|
file/directory returned \f[B]getattr\f[] is called.
|
|
Meaning the policy of \f[B]getattr\f[] is responsible for choosing the
|
|
file/directory which is the source of the metadata you see in an
|
|
\f[B]ls\f[].
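.PP
A minimal Python sketch of the union described above, assuming the drives are
plain directories.
\f[C]merged_listdir\f[] is a hypothetical name used only for illustration.

```python
import os


def merged_listdir(drives, rel_dir):
    """Union of the entry names found under rel_dir on each drive."""
    names = set()
    for drive in drives:
        path = os.path.join(drive, rel_dir)
        if os.path.isdir(path):
            names.update(os.listdir(path))
    return sorted(names)
```

A name present on several drives appears once in the result; which copy
supplies its metadata is decided separately by the \f[B]getattr\f[] policy.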
|
|
.SS statvfs
|
|
.PP
|
|
statvfs (http://linux.die.net/man/2/statvfs) normalizes the source
|
|
drives based on the fragment size and sums the number of adjusted blocks
|
|
and inodes.
|
|
This means you will see the combined space of all sources.
|
|
Total, used, and free.
|
|
The sources however are deduplicated based on the drive so multiple sources
on the same drive will not result in double counting its space.
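.PP
The calculation can be approximated in Python with \f[C]os.statvfs\f[].
This is a sketch under assumptions: \f[C]pooled_statvfs\f[] is a hypothetical
name, drives are told apart by \f[C]st_dev\f[], and block counts are
normalized to the smallest fragment size found.

```python
import os


def pooled_statvfs(paths):
    """Sum block counts across paths, counting each device only once."""
    seen = set()
    stats = []
    for path in paths:
        dev = os.stat(path).st_dev
        if dev in seen:
            continue  # same drive already counted
        seen.add(dev)
        stats.append(os.statvfs(path))
    # Assumption: use the smallest fragment size as the common unit.
    frsize = min(s.f_frsize for s in stats)
    total = sum(s.f_blocks * s.f_frsize // frsize for s in stats)
    free = sum(s.f_bfree * s.f_frsize // frsize for s in stats)
    return {"frsize": frsize, "blocks": total, "bfree": free,
            "used": total - free}
```

Passing the same directory twice gives the same totals as passing it once,
which mirrors the deduplication described above.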
|
|
.SH BUILDING
|
|
.PP
|
|
\f[B]NOTE:\f[] Prebuilt packages can be found at:
|
|
https://github.com/trapexit/mergerfs/releases
|
|
.PP
|
|
First get the code from github (http://github.com/trapexit/mergerfs).
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
$\ git\ clone\ https://github.com/trapexit/mergerfs.git
|
|
$\ #\ or
|
|
$\ wget\ https://github.com/trapexit/mergerfs/releases/download/<ver>/mergerfs\-<ver>.tar.gz
|
|
\f[]
|
|
.fi
|
|
.SS Debian / Ubuntu
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
$\ sudo\ apt\-get\ install\ g++\ pkg\-config\ git\ git\-buildpackage\ pandoc\ debhelper\ libfuse\-dev\ libattr1\-dev\ python
|
|
$\ cd\ mergerfs
|
|
$\ make\ deb
|
|
$\ sudo\ dpkg\ \-i\ ../mergerfs_version_arch.deb
|
|
\f[]
|
|
.fi
|
|
.SS Fedora
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
$\ su\ \-
|
|
#\ dnf\ install\ rpm\-build\ fuse\-devel\ libattr\-devel\ pandoc\ gcc\-c++\ git\ make\ which\ python
|
|
#\ cd\ mergerfs
|
|
#\ make\ rpm
|
|
#\ rpm\ \-i\ rpmbuild/RPMS/<arch>/mergerfs\-<version>.<arch>.rpm
|
|
\f[]
|
|
.fi
|
|
.SS Generically
|
|
.PP
|
|
Have git, python, pkg\-config, pandoc, libfuse, libattr1 installed.
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
$\ cd\ mergerfs
|
|
$\ make
|
|
$\ make\ man
|
|
$\ sudo\ make\ install
|
|
\f[]
|
|
.fi
|
|
.SH RUNTIME
|
|
.SS \&.mergerfs pseudo file
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
<mountpoint>/.mergerfs
|
|
\f[]
|
|
.fi
|
|
.PP
|
|
There is a pseudo file available at the mount point which allows for the
|
|
runtime modification of certain \f[B]mergerfs\f[] options.
|
|
The file will not show up in \f[B]readdir\f[] but can be
|
|
\f[B]stat\f[]\[aq]ed and manipulated via
|
|
{list,get,set}xattrs (http://linux.die.net/man/2/listxattr) calls.
|
|
.PP
|
|
Even if xattrs are disabled for mergerfs the
|
|
{list,get,set}xattrs (http://linux.die.net/man/2/listxattr) calls
|
|
against this pseudo file will still work.
|
|
.PP
|
|
Any changes made at runtime are \f[B]not\f[] persisted.
|
|
If you wish for values to persist they must be included as options
|
|
wherever you configure the mounting of mergerfs (fstab).
|
|
.SS Keys
|
|
.PP
|
|
Use \f[C]xattr\ \-l\ /mount/point/.mergerfs\f[] to see all supported
|
|
keys.
|
|
Some are informational and therefore readonly.
|
|
.SS user.mergerfs.srcmounts
|
|
.PP
|
|
Used to query or modify the list of source mounts.
|
|
When modifying there are several shortcuts to ease manipulation of the
|
|
list.
|
|
.PP
|
|
.TS
|
|
tab(@);
|
|
l l.
|
|
T{
|
|
Value
|
|
T}@T{
|
|
Description
|
|
T}
|
|
_
|
|
T{
|
|
[list]
|
|
T}@T{
|
|
set
|
|
T}
|
|
T{
|
|
+<[list]
|
|
T}@T{
|
|
prepend
|
|
T}
|
|
T{
|
|
+>[list]
|
|
T}@T{
|
|
append
|
|
T}
|
|
T{
|
|
\-[list]
|
|
T}@T{
|
|
remove all values provided
|
|
T}
|
|
T{
|
|
\-<
|
|
T}@T{
|
|
remove first in list
|
|
T}
|
|
T{
|
|
\->
|
|
T}@T{
|
|
remove last in list
|
|
T}
|
|
.TE
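.PP
The table can be read as a small set of list operations.
The Python sketch below models only the forms listed in the table;
\f[C]update_srcmounts\f[] is a hypothetical helper, not part of mergerfs.

```python
def update_srcmounts(current, value):
    """Apply a srcmounts value from the table to the current list of paths."""
    def split(text):
        return [p for p in text.split(":") if p]

    if value.startswith("+<"):
        return split(value[2:]) + current      # prepend
    if value.startswith("+>"):
        return current + split(value[2:])      # append
    if value == "-<":
        return current[1:]                     # remove first in list
    if value == "->":
        return current[:-1]                    # remove last in list
    if value.startswith("-"):
        drop = set(split(value[1:]))
        return [p for p in current if p not in drop]
    return split(value)                        # plain list: set


print(update_srcmounts(["/tmp/c"], "+</tmp/a:/tmp/b"))
# ['/tmp/a', '/tmp/b', '/tmp/c']
```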
|
|
.SS minfreespace
|
|
.PP
|
|
Input: integer with an optional multiplier suffix.
|
|
\f[B]K\f[], \f[B]M\f[], or \f[B]G\f[].
|
|
.PP
|
|
Output: value in bytes
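.PP
The conversion from input to output can be sketched in Python.
\f[C]parse_minfreespace\f[] is a hypothetical name, and treating the suffixes
as multiples of 1024 is an assumption.

```python
def parse_minfreespace(text):
    """Convert a value such as '4G' to a byte count."""
    # Assumption: suffixes are binary multiples (1K = 1024 bytes).
    multipliers = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    suffix = text[-1:]
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)  # no suffix: already in bytes


print(parse_minfreespace("10M"))  # 10485760
```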
|
|
.SS moveonenospc
|
|
.PP
|
|
Input: \f[B]true\f[] or \f[B]false\f[]
|
|
.PP
|
|
Output: \f[B]true\f[] or \f[B]false\f[]
|
|
.SS categories / funcs
|
|
.PP
|
|
Input: short policy string as described elsewhere in this document
|
|
.PP
|
|
Output: the policy string except for categories where its funcs have
|
|
multiple types.
|
|
In that case it will be a comma separated list
|
|
.SS Example
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
[trapexit:/tmp/mount]\ $\ xattr\ \-l\ .mergerfs
|
|
user.mergerfs.srcmounts:\ /tmp/a:/tmp/b
|
|
user.mergerfs.minfreespace:\ 4294967295
|
|
user.mergerfs.moveonenospc:\ false
|
|
\&...
|
|
|
|
[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.category.search\ .mergerfs
|
|
ff
|
|
|
|
[trapexit:/tmp/mount]\ $\ xattr\ \-w\ user.mergerfs.category.search\ newest\ .mergerfs
|
|
[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.category.search\ .mergerfs
|
|
newest
|
|
|
|
[trapexit:/tmp/mount]\ $\ xattr\ \-w\ user.mergerfs.srcmounts\ +/tmp/c\ .mergerfs
|
|
[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.srcmounts\ .mergerfs
|
|
/tmp/a:/tmp/b:/tmp/c
|
|
|
|
[trapexit:/tmp/mount]\ $\ xattr\ \-w\ user.mergerfs.srcmounts\ =/tmp/c\ .mergerfs
|
|
[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.srcmounts\ .mergerfs
|
|
/tmp/c
|
|
|
|
[trapexit:/tmp/mount]\ $\ xattr\ \-w\ user.mergerfs.srcmounts\ \[aq]+</tmp/a:/tmp/b\[aq]\ .mergerfs
|
|
[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.srcmounts\ .mergerfs
|
|
/tmp/a:/tmp/b:/tmp/c
|
|
\f[]
|
|
.fi
|
|
.SS file / directory xattrs
|
|
.PP
|
|
While they won\[aq]t show up when using
|
|
listxattr (http://linux.die.net/man/2/listxattr) \f[B]mergerfs\f[]
|
|
offers a number of special xattrs to query information about the files
|
|
served.
|
|
To access the values you will need to issue a
|
|
getxattr (http://linux.die.net/man/2/getxattr) for one of the following:
|
|
.IP \[bu] 2
|
|
\f[B]user.mergerfs.basepath:\f[] the base mount point for the file given
|
|
the current getattr policy
|
|
.IP \[bu] 2
|
|
\f[B]user.mergerfs.relpath:\f[] the relative path of the file from the
|
|
perspective of the mount point
|
|
.IP \[bu] 2
|
|
\f[B]user.mergerfs.fullpath:\f[] the full path of the original file
|
|
given the getattr policy
|
|
.IP \[bu] 2
|
|
\f[B]user.mergerfs.allpaths:\f[] a NUL (\[aq]\\0\[aq]) separated list of
|
|
full paths to all files found
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
[trapexit:/tmp/mount]\ $\ ls
|
|
A\ B\ C
|
|
[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.fullpath\ A
|
|
/mnt/a/full/path/to/A
|
|
[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.basepath\ A
|
|
/mnt/a
|
|
[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.relpath\ A
|
|
/full/path/to/A
|
|
[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.allpaths\ A\ |\ tr\ \[aq]\\0\[aq]\ \[aq]\\n\[aq]
|
|
/mnt/a/full/path/to/A
|
|
/mnt/b/full/path/to/A
|
|
\f[]
|
|
.fi
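.PP
The same query can be made from Python on Linux with \f[C]os.getxattr\f[].
The helper names below are hypothetical; only the xattr key comes from this manual.

```python
import os


def split_allpaths(raw):
    """Split the NUL-separated xattr value into a list of paths."""
    return [p.decode() for p in raw.split(b"\0") if p]


def all_paths(path):
    """Read user.mergerfs.allpaths for a file on a mergerfs mount (Linux only)."""
    return split_allpaths(os.getxattr(path, "user.mergerfs.allpaths"))
```

\f[C]all_paths\f[] only works for files served by a mergerfs mount;
on any other filesystem the \f[C]getxattr\f[] call is expected to fail.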
|
|
.SH TOOLING
|
|
.IP \[bu] 2
|
|
https://github.com/trapexit/mergerfs\-tools
|
|
.IP \[bu] 2
|
|
mergerfs.ctl: A tool to make it easier to query and configure mergerfs
|
|
at runtime
|
|
.IP \[bu] 2
|
|
mergerfs.fsck: Provides permissions and ownership auditing and the
|
|
ability to fix them
|
|
.IP \[bu] 2
|
|
mergerfs.dedup: Will help identify and optionally remove duplicate files
|
|
.IP \[bu] 2
|
|
mergerfs.balance: Rebalance files across drives by moving them from the
|
|
most filled to the least filled
|
|
.IP \[bu] 2
|
|
mergerfs.mktrash: Creates FreeDesktop.org Trash specification compatible
|
|
directories on a mergerfs mount
|
|
.IP \[bu] 2
|
|
https://github.com/trapexit/scorch
|
|
.IP \[bu] 2
|
|
scorch: A tool to help discover silent corruption of files
|
|
.IP \[bu] 2
|
|
https://github.com/trapexit/bbf
|
|
.IP \[bu] 2
|
|
bbf (bad block finder): a tool to scan for and \[aq]fix\[aq] hard drive
|
|
bad blocks and find the files using those blocks
|
|
.SH TIPS / NOTES
|
|
.IP \[bu] 2
|
|
The recommended options are
|
|
\f[B]defaults,allow_other,direct_io,use_ino\f[].
|
|
.IP \[bu] 2
|
|
Run mergerfs as \f[C]root\f[] unless you\[aq]re merging paths which are
owned by the same user; otherwise strange permission issues may arise.
|
|
.IP \[bu] 2
|
|
https://github.com/trapexit/backup\-and\-recovery\-howtos : A set of
|
|
guides / howtos on creating a data storage system, backing it up,
|
|
maintaining it, and recovering from failure.
|
|
.IP \[bu] 2
|
|
If you don\[aq]t see some directories and files you expect in a merged
|
|
point or policies seem to skip drives be sure the user has permission to
|
|
all the underlying directories.
|
|
Use \f[C]mergerfs.fsck\f[] to audit the drive for out of sync
|
|
permissions.
|
|
.IP \[bu] 2
|
|
Do \f[I]not\f[] use \f[C]direct_io\f[] if you expect applications (such
|
|
as rtorrent) to mmap (http://linux.die.net/man/2/mmap) files.
|
|
It is not currently supported in FUSE w/ \f[C]direct_io\f[] enabled.
|
|
.IP \[bu] 2
|
|
Since POSIX gives you only error or success on calls it\[aq]s difficult to
determine the proper behavior when applying an action to multiple
targets.
|
|
\f[B]mergerfs\f[] will return an error only if all attempts of an action
|
|
fail.
|
|
Any success will lead to a success returned.
|
|
This means however that some odd situations may arise.
|
|
.IP \[bu] 2
|
|
Kodi (http://kodi.tv), Plex (http://plex.tv),
|
|
Subsonic (http://subsonic.org), etc.
|
|
can use directory mtime (http://linux.die.net/man/2/stat) to more
|
|
efficiently determine whether to scan for new content rather than simply
|
|
performing a full scan.
|
|
If using the default \f[B]getattr\f[] policy of \f[B]ff\f[] it\[aq]s possible
\f[B]Kodi\f[] will miss an update because mergerfs returns the
\f[B]stat\f[] info of the first directory found while it was a later directory on
another mount which had the \f[B]mtime\f[] recently updated.
|
|
To fix this you will want to set \f[B]func.getattr=newest\f[].
|
|
Remember though that this is just \f[B]stat\f[].
|
|
If the file is later \f[B]open\f[]\[aq]ed or \f[B]unlink\f[]\[aq]ed and
|
|
the policy is different for those then a completely different file or
|
|
directory could be acted on.
|
|
.IP \[bu] 2
|
|
Some policies mixed with some functions may result in strange behaviors.
|
|
It is not that these behaviors and race conditions couldn\[aq]t happen
outside \f[B]mergerfs\f[], but they are far more likely to occur when
merging together multiple sources of data which
could be out of sync due to the different policies.
|
|
.IP \[bu] 2
|
|
For consistency it\[aq]s generally best to set \f[B]category\f[] wide
|
|
policies rather than individual \f[B]func\f[]\[aq]s.
|
|
This will help limit the confusion of tools such as
|
|
rsync (http://linux.die.net/man/1/rsync).
|
|
However, the flexibility is there if needed.
|
|
.SH KNOWN ISSUES / BUGS
|
|
.SS directory mtime is not being updated
|
|
.PP
|
|
Remember that the default policy for \f[C]getattr\f[] is \f[C]ff\f[].
|
|
The information for the first directory found will be returned.
|
|
If it wasn\[aq]t the directory which had been updated then it will
|
|
appear outdated.
|
|
.PP
|
|
The reason this is the default is because any other policy would be far
|
|
more expensive and for many applications it is unnecessary.
|
|
To always return the directory with the most recent mtime or a faked
|
|
value based on all found would require a scan of all drives.
|
|
That alone is far more expensive than \f[C]ff\f[] but would also
|
|
possibly spin up sleeping drives.
|
|
.PP
|
|
If you always want the directory information from the one with the most
|
|
recent mtime then use the \f[C]newest\f[] policy for \f[C]getattr\f[].
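.PP
The difference between the two policies for \f[C]getattr\f[] can be sketched
in Python.
The function names are hypothetical and the drives are assumed to be plain directories.

```python
import os


def getattr_ff(drives, rel):
    """First found: stop at the first drive on which rel exists."""
    for drive in drives:
        path = os.path.join(drive, rel)
        if os.path.lexists(path):
            return path
    return None


def getattr_newest(drives, rel):
    """Newest: check every drive and return the path with the largest mtime."""
    found = [os.path.join(d, rel) for d in drives
             if os.path.lexists(os.path.join(d, rel))]
    return max(found, key=lambda p: os.lstat(p).st_mtime) if found else None
```

\f[C]getattr_ff\f[] may touch a single drive, whereas
\f[C]getattr_newest\f[] always touches all of them, which is the cost
described above.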
|
|
.SS cached memory appears greater than it should be
|
|
.PP
|
|
Use the \f[C]direct_io\f[] option as described above.
|
|
Due to what mergerfs is doing there ends up being two caches of a file
|
|
under normal usage.
|
|
One from the underlying filesystem and one from mergerfs.
|
|
Enabling \f[C]direct_io\f[] removes the mergerfs cache.
|
|
This saves on memory but means the kernel needs to communicate with
|
|
mergerfs more often and can therefore result in slower speeds.
|
|
.PP
|
|
Since enabling \f[C]direct_io\f[] disables \f[C]mmap\f[] this is not an
ideal situation; however, write speeds should be increased.
|
|
.PP
|
|
If \f[C]direct_io\f[] is disabled it is probably a good idea to enable
|
|
\f[C]dropcacheonclose\f[] to minimize double caching.
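.PP
For reference, this is roughly what a \f[C]posix_fadvise\f[] call before close
looks like from Python.
It is a sketch, not mergerfs code: \f[C]close_dropping_cache\f[] is a
hypothetical name, and the use of \f[C]POSIX_FADV_DONTNEED\f[] is an
assumption about which advice is given.

```python
import os


def close_dropping_cache(fd):
    """Tell the kernel the cached data for fd is no longer needed, then close."""
    # offset=0 and len=0 cover the whole file.
    # Assumption: DONTNEED is the advice used; the manual does not name it.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    os.close(fd)
```

The advice is only a hint; the kernel is free to keep pages cached.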
|
|
.SS NFS clients don\[aq]t work
|
|
.PP
|
|
Some NFS clients appear to fail when a mergerfs mount is exported.
|
|
Kodi in particular seems to have issues.
|
|
.PP
|
|
Try enabling the \f[C]use_ino\f[] option.
|
|
Some have reported that it fixes the issue.
|
|
.SS rtorrent fails with ENODEV (No such device)
|
|
.PP
|
|
Be sure to turn off \f[C]direct_io\f[].
|
|
rtorrent and some other applications use
|
|
mmap (http://linux.die.net/man/2/mmap) to read and write to files and
|
|
offer no fallback to traditional methods.
|
|
FUSE does not currently support mmap while using \f[C]direct_io\f[].
|
|
There will be a performance penalty on writes with \f[C]direct_io\f[]
|
|
off as well as the problem of double caching but it\[aq]s the only way
|
|
to get such applications to work.
|
|
If the performance loss is too high for other apps you can mount
|
|
mergerfs twice.
|
|
Once with \f[C]direct_io\f[] enabled and once without it.
|
|
.SS mmap performance is really bad
|
|
.PP
|
|
There is a bug (https://lkml.org/lkml/2016/3/16/260) in caching which
|
|
affects overall performance of mmap through FUSE in Linux 4.x kernels.
|
|
It is fixed in 4.4.10 and 4.5.4 (https://lkml.org/lkml/2016/5/11/59).
|
|
.SS When a program tries to move or rename a file it fails
|
|
.PP
|
|
Please read the section above regarding rename & link.
|
|
.PP
|
|
The problem is that many applications do not properly handle
|
|
\f[C]EXDEV\f[] errors which \f[C]rename\f[] and \f[C]link\f[] may return
|
|
even though they are perfectly valid situations which do not indicate
|
|
actual drive or OS errors.
|
|
The error will only be returned by mergerfs if using a path preserving
|
|
policy as described in the policy section above.
|
|
If you do not care about path preservation simply change the mergerfs
|
|
policy to the non\-path preserving version.
|
|
For example: \f[C]\-o\ category.create=mfs\f[]
|
|
.PP
|
|
Ideally the offending software would be fixed and it is recommended that
|
|
if you run into this problem you contact the software\[aq]s author and
|
|
request proper handling of \f[C]EXDEV\f[] errors.
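.PP
For software authors, handling \f[C]EXDEV\f[] usually means falling back to
copy and unlink.
A minimal Python sketch for regular files follows; the function name
\f[C]move\f[] is illustrative only.

```python
import errno
import os
import shutil


def move(src, dst):
    """Rename src to dst, copying and unlinking if the rename crosses devices."""
    try:
        os.rename(src, dst)
    except OSError as err:
        if err.errno != errno.EXDEV:
            raise  # a real failure, not a cross-device rename
        shutil.copy2(src, dst)  # copy data and metadata
        os.unlink(src)          # then remove the original
```

Python\[aq]s own \f[C]shutil.move\f[] already falls back to copying when a
plain rename fails.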
|
|
.SS Samba: Moving files / directories fails
|
|
.PP
|
|
Workaround: Copy the file/directory and then remove the original rather
|
|
than move.
|
|
.PP
|
|
This isn\[aq]t an issue with Samba but some SMB clients.
|
|
GVFS\-fuse v1.20.3 and prior (found in Ubuntu 14.04 among others) failed
|
|
to handle certain error codes correctly.
|
|
Particularly \f[B]STATUS_NOT_SAME_DEVICE\f[] which comes from the
|
|
\f[B]EXDEV\f[] which is returned by \f[B]rename\f[] when the call is
|
|
crossing mount points.
|
|
When a program gets an \f[B]EXDEV\f[] it needs to explicitly take an
|
|
alternate action to accomplish its goal.
|
|
In the case of \f[B]mv\f[] or similar it tries \f[B]rename\f[] and on
|
|
\f[B]EXDEV\f[] falls back to a manual copying of data between the two
|
|
locations and unlinking the source.
|
|
In these older versions of GVFS\-fuse if it received \f[B]EXDEV\f[] it
|
|
would translate that into \f[B]EIO\f[].
|
|
This would cause \f[B]mv\f[] or most any application attempting to move
|
|
files around on that SMB share to fail with an I/O error.
|
|
.PP
|
|
GVFS\-fuse v1.22.0 (https://bugzilla.gnome.org/show_bug.cgi?id=734568)
|
|
and above fixed this issue but a large number of systems use the older
|
|
release.
|
|
On Ubuntu the version can be checked by issuing
|
|
\f[C]apt\-cache\ showpkg\ gvfs\-fuse\f[].
|
|
Most distros released in 2015 seem to have the updated release and will
|
|
work fine but older systems may not.
|
|
Upgrading gvfs\-fuse or the distro in general will address the problem.
|
|
.PP
|
|
In Apple\[aq]s MacOSX 10.9 they replaced Samba (client and server) with
|
|
their own product.
|
|
It appears their new client does not handle \f[B]EXDEV\f[] either and
|
|
responds similarly to older releases of gvfs on Linux.
|
|
.SS Trashing files occasionally fails
|
|
.PP
|
|
This is the same issue as with Samba.
|
|
\f[C]rename\f[] returns \f[C]EXDEV\f[] (in our case that will really
|
|
only happen with path preserving policies like \f[C]epmfs\f[]) and the
|
|
software doesn\[aq]t handle the situation well.
|
|
This is unfortunately a common failure of software which moves files
|
|
around.
|
|
The standard indicates that an implementation \f[C]MAY\f[] choose to
|
|
support trashing of files outside the user\[aq]s home directory (only
|
|
home directory trashing is a \f[C]MUST\f[]).
|
|
The implementation \f[C]MAY\f[] also support "top directory trashes"
|
|
which many probably do.
|
|
.PP
|
|
To create a \f[C]$topdir/.Trash\f[] directory as defined in the standard
|
|
use the mergerfs\-tools (https://github.com/trapexit/mergerfs-tools)
|
|
tool \f[C]mergerfs.mktrash\f[].
|
|
.SS Supplemental user groups
|
|
.PP
|
|
Due to the overhead of
|
|
getgroups/setgroups (http://linux.die.net/man/2/setgroups) mergerfs
|
|
utilizes a cache.
|
|
This cache is opportunistic and per thread.
|
|
Each thread will query the supplemental groups for a user when that
|
|
particular thread needs to change credentials and will keep that data
|
|
for the lifetime of the thread.
|
|
This means that if a user is added to a group it may not be picked up
|
|
without the restart of mergerfs.
|
|
However, since the high level FUSE API\[aq]s (at least the standard
|
|
version) thread pool dynamically grows and shrinks it\[aq]s possible
|
|
that over time a thread will be killed and later a new thread with no
|
|
cache will start and query the new data.
|
|
.PP
|
|
The gid cache uses fixed storage to simplify the design and be
|
|
compatible with older systems which may not have C++11 compilers.
|
|
There is enough storage for 256 users\[aq] supplemental groups.
|
|
Each user is allowed up to 32 supplemental groups.
|
|
Linux >= 2.6.3 allows up to 65535 groups per user but most other *nixes
|
|
allow far fewer, with NFS allowing only 16.
|
|
The system does handle overflow gracefully.
|
|
If the user has more than 32 supplemental groups only the first 32 will
|
|
be used.
|
|
If more than 256 users are using the system when an uncached user is
|
|
found it will evict an existing user\[aq]s cache at random.
|
|
So long as there aren\[aq]t more than 256 active users this should be
|
|
fine.
|
|
If either value is too low for your needs you will have to modify
|
|
\f[C]gidcache.hpp\f[] to increase the values.
|
|
Note that doing so will increase the memory needed by each thread.
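.PP
The overflow behavior described above can be illustrated with a small
Python sketch.
It is not the code found in \f[C]gidcache.hpp\f[]: the class name
\f[C]GidCache\f[] and the \f[C]lookup\f[] callback are hypothetical, and
only the limits (256 users, 32 groups) and the random eviction are taken
from the description.

```python
import random

MAX_USERS = 256   # users whose groups can be cached at once
MAX_GROUPS = 32   # supplemental groups kept per user


class GidCache:
    """Bounded cache of supplemental groups with random eviction."""

    def __init__(self, lookup, max_users=MAX_USERS, max_groups=MAX_GROUPS):
        self._lookup = lookup          # callable: uid -> iterable of gids
        self._max_users = max_users
        self._max_groups = max_groups
        self._cache = {}

    def groups(self, uid):
        if uid not in self._cache:
            if len(self._cache) >= self._max_users:
                # Cache is full: evict an existing user at random.
                victim = random.choice(list(self._cache))
                del self._cache[victim]
            # Only the first max_groups entries are kept.
            gids = tuple(self._lookup(uid))[: self._max_groups]
            self._cache[uid] = gids
        return self._cache[uid]
```

.PP
Because entries are never refreshed once cached, a group added to a user
after the first lookup is not seen until the entry is evicted or the
cache is discarded, which mirrors the restart caveat above.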
|
|
.SS mergerfs or libfuse crashing
|
|
.PP
|
|
If suddenly the mergerfs mount point disappears and
|
|
\f[C]Transport\ endpoint\ is\ not\ connected\f[] is returned when
|
|
attempting to perform actions within the mount directory \f[B]and\f[]
|
|
the version of libfuse (use \f[C]mergerfs\ \-v\f[] to find the version)
|
|
is older than \f[C]2.9.4\f[] it\[aq]s likely due to a bug in libfuse.
|
|
Affected versions of libfuse can be found in Debian Wheezy, Ubuntu
|
|
Precise and others.
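.PP
As a rough aid, the version test can be expressed as a comparison of
dotted version numbers.
This Python sketch assumes the libfuse version has already been
extracted as a plain string such as \f[C]2.9.3\f[]; the function name
\f[C]libfuse_is_affected\f[] is hypothetical and no particular output
format of \f[C]mergerfs\ \-v\f[] is assumed.

```python
def libfuse_is_affected(version):
    """Return True if a dotted libfuse version is older than 2.9.4."""
    parts = tuple(int(p) for p in version.split("."))
    # Tuples compare element by element, so (2, 9, 3) < (2, 9, 4).
    return parts < (2, 9, 4)
```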
|
|
.PP
|
|
In order to fix this please install newer versions of libfuse.
|
|
If using a Debian based distro (Debian, Ubuntu, Mint) you can likely just
|
|
install newer versions of
|
|
libfuse (https://packages.debian.org/unstable/libfuse2) and
|
|
fuse (https://packages.debian.org/unstable/fuse) from the repo of a
|
|
newer release.
|
|
.SS mergerfs appears to be crashing or exiting
|
|
.PP
|
|
There seems to be an issue with Linux version \f[C]4.9.0\f[] and above
|
|
in which an invalid message appears to be transmitted to libfuse (used
|
|
by mergerfs) causing it to exit.
|
|
No messages will be printed in any logs as it\[aq]s not a proper crash.
|
|
Debugging of the issue is still ongoing and can be followed via the
|
|
fuse\-devel
|
|
thread (https://sourceforge.net/p/fuse/mailman/message/35662577).
|
|
.SS mergerfs under heavy load and memory pressure leads to kernel panic
|
|
.PP
|
|
https://lkml.org/lkml/2016/9/14/527
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
[25192.515454]\ kernel\ BUG\ at\ /build/linux\-a2WvEb/linux\-4.4.0/mm/workingset.c:346!
|
|
[25192.517521]\ invalid\ opcode:\ 0000\ [#1]\ SMP
|
|
[25192.519602]\ Modules\ linked\ in:\ netconsole\ ip6t_REJECT\ nf_reject_ipv6\ ipt_REJECT\ nf_reject_ipv4\ configfs\ binfmt_misc\ veth\ bridge\ stp\ llc\ nf_conntrack_ipv6\ nf_defrag_ipv6\ xt_conntrack\ ip6table_filter\ ip6_tables\ xt_multiport\ iptable_filter\ ipt_MASQUERADE\ nf_nat_masquerade_ipv4\ xt_comment\ xt_nat\ iptable_nat\ nf_conntrack_ipv4\ nf_defrag_ipv4\ nf_nat_ipv4\ nf_nat\ nf_conntrack\ xt_CHECKSUM\ xt_tcpudp\ iptable_mangle\ ip_tables\ x_tables\ intel_rapl\ x86_pkg_temp_thermal\ intel_powerclamp\ eeepc_wmi\ asus_wmi\ coretemp\ sparse_keymap\ kvm_intel\ ppdev\ kvm\ irqbypass\ mei_me\ 8250_fintek\ input_leds\ serio_raw\ parport_pc\ tpm_infineon\ mei\ shpchp\ mac_hid\ parport\ lpc_ich\ autofs4\ drbg\ ansi_cprng\ dm_crypt\ algif_skcipher\ af_alg\ btrfs\ raid456\ async_raid6_recov\ async_memcpy\ async_pq\ async_xor\ async_tx\ xor\ raid6_pq\ libcrc32c\ raid0\ multipath\ linear\ raid10\ raid1\ i915\ crct10dif_pclmul\ crc32_pclmul\ aesni_intel\ i2c_algo_bit\ aes_x86_64\ drm_kms_helper\ lrw\ gf128mul\ glue_helper\ ablk_helper\ syscopyarea\ cryptd\ sysfillrect\ sysimgblt\ fb_sys_fops\ drm\ ahci\ r8169\ libahci\ mii\ wmi\ fjes\ video\ [last\ unloaded:\ netconsole]
|
|
[25192.540910]\ CPU:\ 2\ PID:\ 63\ Comm:\ kswapd0\ Not\ tainted\ 4.4.0\-36\-generic\ #55\-Ubuntu
|
|
[25192.543411]\ Hardware\ name:\ System\ manufacturer\ System\ Product\ Name/P8H67\-M\ PRO,\ BIOS\ 3904\ 04/27/2013
|
|
[25192.545840]\ task:\ ffff88040cae6040\ ti:\ ffff880407488000\ task.ti:\ ffff880407488000
|
|
[25192.548277]\ RIP:\ 0010:[<ffffffff811ba501>]\ \ [<ffffffff811ba501>]\ shadow_lru_isolate+0x181/0x190
|
|
[25192.550706]\ RSP:\ 0018:ffff88040748bbe0\ \ EFLAGS:\ 00010002
|
|
[25192.553127]\ RAX:\ 0000000000001c81\ RBX:\ ffff8802f91ee928\ RCX:\ ffff8802f91eeb38
|
|
[25192.555544]\ RDX:\ ffff8802f91ee938\ RSI:\ ffff8802f91ee928\ RDI:\ ffff8804099ba2c0
|
|
[25192.557914]\ RBP:\ ffff88040748bc08\ R08:\ 000000000001a7b6\ R09:\ 000000000000003f
|
|
[25192.560237]\ R10:\ 000000000001a750\ R11:\ 0000000000000000\ R12:\ ffff8804099ba2c0
|
|
[25192.562512]\ R13:\ ffff8803157e9680\ R14:\ ffff8803157e9668\ R15:\ ffff8804099ba2c8
|
|
[25192.564724]\ FS:\ \ 0000000000000000(0000)\ GS:ffff88041f280000(0000)\ knlGS:0000000000000000
|
|
[25192.566990]\ CS:\ \ 0010\ DS:\ 0000\ ES:\ 0000\ CR0:\ 0000000080050033
|
|
[25192.569201]\ CR2:\ 00007ffabb690000\ CR3:\ 0000000001e0a000\ CR4:\ 00000000000406e0
|
|
[25192.571419]\ Stack:
|
|
[25192.573550]\ \ ffff8804099ba2c0\ ffff88039e4f86f0\ ffff8802f91ee928\ ffff8804099ba2c8
|
|
[25192.575695]\ \ ffff88040748bd08\ ffff88040748bc58\ ffffffff811b99bf\ 0000000000000052
|
|
[25192.577814]\ \ 0000000000000000\ ffffffff811ba380\ 000000000000008a\ 0000000000000080
|
|
[25192.579947]\ Call\ Trace:
|
|
[25192.582022]\ \ [<ffffffff811b99bf>]\ __list_lru_walk_one.isra.3+0x8f/0x130
|
|
[25192.584137]\ \ [<ffffffff811ba380>]\ ?\ memcg_drain_all_list_lrus+0x190/0x190
|
|
[25192.586165]\ \ [<ffffffff811b9a83>]\ list_lru_walk_one+0x23/0x30
|
|
[25192.588145]\ \ [<ffffffff811ba544>]\ scan_shadow_nodes+0x34/0x50
|
|
[25192.590074]\ \ [<ffffffff811a0e9d>]\ shrink_slab.part.40+0x1ed/0x3d0
|
|
[25192.591985]\ \ [<ffffffff811a53da>]\ shrink_zone+0x2ca/0x2e0
|
|
[25192.593863]\ \ [<ffffffff811a64ce>]\ kswapd+0x51e/0x990
|
|
[25192.595737]\ \ [<ffffffff811a5fb0>]\ ?\ mem_cgroup_shrink_node_zone+0x1c0/0x1c0
|
|
[25192.597613]\ \ [<ffffffff810a0808>]\ kthread+0xd8/0xf0
|
|
[25192.599495]\ \ [<ffffffff810a0730>]\ ?\ kthread_create_on_node+0x1e0/0x1e0
|
|
[25192.601335]\ \ [<ffffffff8182e34f>]\ ret_from_fork+0x3f/0x70
|
|
[25192.603193]\ \ [<ffffffff810a0730>]\ ?\ kthread_create_on_node+0x1e0/0x1e0
|
|
\f[]
|
|
.fi
|
|
.PP
|
|
There is a bug in the kernel.
|
|
A workaround appears to be turning off \f[C]splice\f[].
|
|
Add \f[C]no_splice_write,no_splice_move,no_splice_read\f[] to
|
|
mergerfs\[aq] options.
|
|
These should be placed after \f[C]defaults\f[], if it is used, since
|
|
\f[C]defaults\f[] turns the splice options on.
|
|
This however is not guaranteed to work.
|
|
.SH FAQ
|
|
.SS Why use mergerfs over mhddfs?
|
|
.PP
|
|
mhddfs is no longer maintained and has some known stability and security
|
|
issues (see below).
|
|
MergerFS provides a superset of mhddfs\[aq] features and should offer
|
|
the same or maybe better performance.
|
|
.PP
|
|
If you wish to get similar behavior to mhddfs from mergerfs then set
|
|
\f[C]category.create=ff\f[].
|
|
.SS Why use mergerfs over aufs?
|
|
.PP
|
|
While aufs can offer better peak performance mergerfs provides more
|
|
configurability and is generally easier to use.
|
|
mergerfs however does not offer the overlay / copy\-on\-write (COW)
|
|
features which aufs and overlayfs have.
|
|
.SS Why use mergerfs over LVM/ZFS/BTRFS/RAID0 drive concatenation / striping?
|
|
.PP
|
|
With simple JBOD / drive concatenation / striping / RAID0 a single
|
|
drive failure will result in full pool failure.
|
|
mergerfs performs a similar behavior without the possibility of
|
|
catastrophic failure and difficulties in recovery.
|
|
Drives may fail; however, all other data will continue to be accessible.
|
|
.PP
|
|
When combined with something like SnapRaid (http://www.snapraid.it)
|
|
and/or an offsite backup solution you can have the flexibility of JBOD
|
|
without the single point of failure.
|
|
.SS Why use mergerfs over ZFS?
|
|
.PP
|
|
MergerFS is not intended to be a replacement for ZFS.
|
|
MergerFS is intended to provide flexible pooling of arbitrary drives
|
|
(local or remote), of arbitrary sizes, and arbitrary filesystems.
|
|
It is aimed at \f[C]write\ once,\ read\ many\f[] use cases such as bulk
|
|
media storage where data integrity and backup are managed in other ways.
|
|
In that situation ZFS can introduce major maintenance and cost burdens as
|
|
described
|
|
here (http://louwrentius.com/the-hidden-cost-of-using-zfs-for-your-home-nas.html).
|
|
.SS Can drives be written to directly? Outside of mergerfs while pooled?
|
|
.PP
|
|
Yes.
|
|
It will be represented immediately in the pool as the policies
|
|
prescribe.
|
|
.SS Why do I get an "out of space" error even though the system says there\[aq]s lots of space left?
|
|
.PP
|
|
First make sure you\[aq]ve read the sections above about policies, path
|
|
preserving, and the \f[B]moveonenospc\f[] option.
|
|
.PP
|
|
Remember that mergerfs is simply presenting a logical merging of the
|
|
contents of the pooled drives.
|
|
The reported free space is the aggregate space available \f[B]not\f[]
|
|
the contiguous space available.
|
|
MergerFS does not split files across drives.
|
|
If the writing of a file fills a drive and \f[B]moveonenospc\f[] is
|
|
disabled it will return an ENOSPC error.
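.PP
The difference between aggregate and per\-drive free space can be seen
by querying each source mount directly.
The Python sketch below uses \f[C]os.statvfs\f[]; the function names are
hypothetical, and the caller supplies the list of source mount paths
that make up the pool.

```python
import os


def free_bytes(path):
    """Free space, in bytes, available to unprivileged users on path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def pool_free_space(srcmounts):
    """Return (aggregate_free, largest_single_drive_free) in bytes."""
    per_drive = [free_bytes(p) for p in srcmounts]
    # The aggregate is what the pool reports as free; a single file can
    # never be larger than the free space of the emptiest drive.
    return sum(per_drive), max(per_drive)
```

.PP
A file larger than the second value cannot be written even when the
first value is far greater, since files are not split across drives.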
|
|
.PP
|
|
If \f[B]moveonenospc\f[] is enabled but there exist no drives with
|
|
enough space for the file and the data to be written (or the drive
|
|
happened to fill up as the file was being moved) it will error
|
|
indicating there isn\[aq]t enough space.
|
|
.PP
|
|
It is also possible that the filesystem selected has run out of inodes.
|
|
Use \f[C]df\ \-i\f[] to list the total and available inodes per
|
|
filesystem.
|
|
In the future it might be worth considering the number of inodes
|
|
available when making placement decisions in order to minimize this
|
|
situation.
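.PP
Inode availability can also be checked programmatically.
This Python sketch reads the inode counters from \f[C]os.statvfs\f[],
which should roughly correspond to what \f[C]df\ \-i\f[] shows; the
function name \f[C]inode_usage\f[] is hypothetical.

```python
import os


def inode_usage(path):
    """Return (total_inodes, available_inodes) for the filesystem at path."""
    st = os.statvfs(path)
    # f_files is the total inode count; f_favail is the number still
    # available to unprivileged users. Some filesystems report 0 for both.
    return st.f_files, st.f_favail
```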
|
|
.SS Can mergerfs mounts be exported over NFS?
|
|
.PP
|
|
Yes.
|
|
Some clients (Kodi) have issues in which the contents of the NFS mount
|
|
will not be presented but users have found that enabling the
|
|
\f[C]use_ino\f[] option often fixes that problem.
|
|
.SS Can mergerfs mounts be exported over Samba / SMB?
|
|
.PP
|
|
Yes.
|
|
.SS How are inodes calculated?
|
|
.PP
|
|
mergerfs\-inode = (original\-inode | (device\-id << 32))
|
|
.PP
|
|
While \f[C]ino_t\f[] is 64 bits only a few filesystems use more than 32.
|
|
Similarly, while \f[C]dev_t\f[] is also 64 bits it was traditionally 16
|
|
bits.
|
|
Bitwise or\[aq]ing them together should work most of the time.
|
|
While totally unique inodes are preferred the overhead which would be
|
|
needed does not seem to be outweighed by the benefits.
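.PP
The formula above can be written out as a short Python sketch.
The function name \f[C]mergerfs_inode\f[] is hypothetical, and the mask
to 64 bits is an assumption standing in for the fixed width of
\f[C]ino_t\f[].

```python
def mergerfs_inode(original_inode, device_id):
    """Combine an inode and a device id as (inode | (device << 32))."""
    # Mask to 64 bits to mimic a fixed-width ino_t (an assumption here).
    return (original_inode | (device_id << 32)) & 0xFFFFFFFFFFFFFFFF


# Inodes that fit in 32 bits stay distinct across devices.
a = mergerfs_inode(0x1234, 0x801)
b = mergerfs_inode(0x1234, 0x802)

# An inode wider than 32 bits can collide with another device's inode.
c = mergerfs_inode(1 << 32, 0)
d = mergerfs_inode(0, 1)
```

.PP
Here \f[C]a\f[] and \f[C]b\f[] differ while \f[C]c\f[] and \f[C]d\f[]
are equal, which is why the scheme works most of the time but not
always.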
|
|
.SS It\[aq]s mentioned that there are some security issues with mhddfs. What are they? How does mergerfs address them?
|
|
.PP
|
|
mhddfs (https://github.com/trapexit/mhddfs) manages running as
|
|
\f[B]root\f[] by calling
|
|
getuid() (https://github.com/trapexit/mhddfs/blob/cae96e6251dd91e2bdc24800b4a18a74044f6672/src/main.c#L319)
|
|
and if it returns \f[B]0\f[] then it will
|
|
chown (http://linux.die.net/man/1/chown) the file.
|
|
Not only is that a race condition but it doesn\[aq]t handle many other
|
|
situations.
|
|
Rather than attempting to simulate POSIX ACL behavior the proper way to
|
|
manage this is to use seteuid (http://linux.die.net/man/2/seteuid) and
|
|
setegid (http://linux.die.net/man/2/setegid), in effect becoming the
|
|
user making the original call, and perform the action as them.
|
|
This is what mergerfs does.
|
|
.PP
|
|
In Linux setreuid syscalls apply only to the thread.
|
|
GLIBC hides this away by using realtime signals to inform all threads to
|
|
change credentials.
|
|
Taking after \f[B]Samba\f[], mergerfs uses
|
|
\f[B]syscall(SYS_setreuid,...)\f[] to set the caller\[aq]s credentials for
|
|
that thread only.
|
|
It jumps back to \f[B]root\f[] as necessary should escalated privileges
|
|
be needed (for instance: to clone paths between drives).
|
|
.PP
|
|
For non\-Linux systems mergerfs uses a read\-write lock and changes
|
|
credentials only when necessary.
|
|
If multiple threads are to be user X then only the first one will need
|
|
to change the process\[aq]s credentials.
|
|
So long as the other threads need to be user X they will take a readlock
|
|
allowing multiple threads to share the credentials.
|
|
Once a request comes in to run as user Y that thread will attempt a
|
|
write lock and change to Y\[aq]s credentials when it can.
|
|
If the ability to give writers priority is supported then that flag will
|
|
be used so threads trying to change credentials don\[aq]t starve.
|
|
This isn\[aq]t the best solution but should work reasonably well
|
|
assuming there are few users.
|
|
.SH SUPPORT
|
|
.SS Issues with the software
|
|
.IP \[bu] 2
|
|
github.com: https://github.com/trapexit/mergerfs/issues
|
|
.IP \[bu] 2
|
|
email: trapexit\@spawn.link
|
|
.IP \[bu] 2
|
|
twitter: https://twitter.com/_trapexit
|
|
.SS Support development
|
|
.IP \[bu] 2
|
|
Gratipay: https://gratipay.com/~trapexit
|
|
.IP \[bu] 2
|
|
BitCoin: 12CdMhEPQVmjz3SSynkAEuD5q9JmhTDCZA
|
|
.SH LINKS
|
|
.IP \[bu] 2
|
|
http://github.com/trapexit/mergerfs
|
|
.IP \[bu] 2
|
|
http://github.com/trapexit/mergerfs\-tools
|
|
.IP \[bu] 2
|
|
http://github.com/trapexit/scorch
|
|
.IP \[bu] 2
|
|
http://github.com/trapexit/backup\-and\-recovery\-howtos
|
|
.SH AUTHORS
|
|
Antonio SJ Musumeci <trapexit@spawn.link>.
|