.\"t
.\" Automatically generated by Pandoc 1.19.2.4
.\"
.TH "mergerfs" "1" "2019\-06\-03" "mergerfs user manual" ""
.hy
.SH NAME
|
|
.PP
|
|
mergerfs \- a featureful union filesystem
|
|
.SH SYNOPSIS
|
|
.PP
|
|
mergerfs \-o<options> <branches> <mountpoint>
|
|
.SH DESCRIPTION
|
|
.PP
|
|
\f[B]mergerfs\f[] is a union filesystem geared towards simplifying
|
|
storage and management of files across numerous commodity storage
|
|
devices.
|
|
It is similar to \f[B]mhddfs\f[], \f[B]unionfs\f[], and \f[B]aufs\f[].
|
|
.SH FEATURES
|
|
.IP \[bu] 2
|
|
Runs in userspace (FUSE)
|
|
.IP \[bu] 2
|
|
Configurable behaviors / file placement
|
|
.IP \[bu] 2
|
|
Support for extended attributes (xattrs)
|
|
.IP \[bu] 2
|
|
Support for file attributes (chattr)
|
|
.IP \[bu] 2
|
|
Runtime configurable (via xattrs)
|
|
.IP \[bu] 2
|
|
Safe to run as root
|
|
.IP \[bu] 2
|
|
Opportunistic credential caching
|
|
.IP \[bu] 2
|
|
Works with heterogeneous filesystem types
|
|
.IP \[bu] 2
|
|
Handling of writes to full drives (transparently move file to drive with
|
|
capacity)
|
|
.IP \[bu] 2
|
|
Handles pool of read\-only and read/write drives
|
|
.IP \[bu] 2
|
|
Can turn read\-only files into symlinks to underlying file
|
|
.IP \[bu] 2
|
|
Hard link copy\-on\-write / CoW
|
|
.IP \[bu] 2
|
|
Supports POSIX ACLs
|
|
.SH How it works
|
|
.PP
|
|
mergerfs logically merges multiple paths together.
|
|
Think a union of sets.
|
|
The file/s or directory/s acted on or presented through mergerfs are
|
|
based on the policy chosen for that particular action.
|
|
Read more about policies below.
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
A\ \ \ \ \ \ \ \ \ +\ \ \ \ \ \ B\ \ \ \ \ \ \ \ =\ \ \ \ \ \ \ C
|
|
/disk1\ \ \ \ \ \ \ \ \ \ \ /disk2\ \ \ \ \ \ \ \ \ \ \ /merged
|
|
|\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |
|
|
+\-\-\ /dir1\ \ \ \ \ \ \ \ +\-\-\ /dir1\ \ \ \ \ \ \ \ +\-\-\ /dir1
|
|
|\ \ \ |\ \ \ \ \ \ \ \ \ \ \ \ |\ \ \ |\ \ \ \ \ \ \ \ \ \ \ \ |\ \ \ |
|
|
|\ \ \ +\-\-\ file1\ \ \ \ |\ \ \ +\-\-\ file2\ \ \ \ |\ \ \ +\-\-\ file1
|
|
|\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |\ \ \ +\-\-\ file3\ \ \ \ |\ \ \ +\-\-\ file2
|
|
+\-\-\ /dir2\ \ \ \ \ \ \ \ |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |\ \ \ +\-\-\ file3
|
|
|\ \ \ |\ \ \ \ \ \ \ \ \ \ \ \ +\-\-\ /dir3\ \ \ \ \ \ \ \ |
|
|
|\ \ \ +\-\-\ file4\ \ \ \ \ \ \ \ |\ \ \ \ \ \ \ \ \ \ \ \ +\-\-\ /dir2
|
|
|\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ +\-\-\ file5\ \ \ |\ \ \ |
|
|
+\-\-\ file6\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |\ \ \ +\-\-\ file4
|
|
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |
|
|
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ +\-\-\ /dir3
|
|
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |\ \ \ |
|
|
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |\ \ \ +\-\-\ file5
|
|
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |
|
|
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ +\-\-\ file6
|
|
\f[]
|
|
.fi
|
|
.PP
|
|
mergerfs does \f[B]not\f[] support the copy\-on\-write (CoW) behavior
|
|
found in \f[B]aufs\f[] and \f[B]overlayfs\f[].
|
|
You can \f[B]not\f[] mount a read\-only filesystem and write to it.
|
|
However, mergerfs will ignore read\-only drives when creating new files
|
|
so you can mix read\-write and read\-only drives.
|
|
It also does \f[B]not\f[] split data across drives.
|
|
It is not RAID0 / striping.
|
|
It is simply a union.
|
|
.SH OPTIONS
|
|
.SS mount options
|
|
.IP \[bu] 2
|
|
\f[B]allow_other\f[]: A libfuse option which allows users besides the
|
|
one which ran mergerfs to see the filesystem.
|
|
This is required for most use\-cases.
|
|
.IP \[bu] 2
|
|
\f[B]minfreespace=SIZE\f[]: The minimum space value used for creation
|
|
policies.
|
|
Understands \[aq]K\[aq], \[aq]M\[aq], and \[aq]G\[aq] to represent
|
|
kilobyte, megabyte, and gigabyte respectively.
|
|
(default: 4G)
|
|
.IP \[bu] 2
|
|
\f[B]moveonenospc=BOOL\f[]: When enabled if a \f[B]write\f[] fails with
|
|
\f[B]ENOSPC\f[] or \f[B]EDQUOT\f[] a scan of all drives will be done
|
|
looking for the drive with the most free space which is at least the
|
|
size of the file plus the amount which failed to write.
|
|
An attempt to move the file to that drive will occur (keeping all
|
|
metadata possible) and if successful the original is unlinked and the
|
|
write retried.
|
|
(default: false)
|
|
.IP \[bu] 2
|
|
\f[B]use_ino\f[]: Causes mergerfs to supply file/directory inodes rather
|
|
than libfuse.
|
|
While not a default it is recommended it be enabled so that linked files
|
|
share the same inode value.
|
|
.IP \[bu] 2
|
|
\f[B]dropcacheonclose=BOOL\f[]: When a file is requested to be closed
|
|
call \f[C]posix_fadvise\f[] on it first to instruct the kernel that we
|
|
no longer need the data and it can drop its cache.
|
|
Recommended when \f[B]cache.files=partial|full|auto\-full\f[] to limit
|
|
double caching.
|
|
(default: false)
|
|
.IP \[bu] 2
|
|
\f[B]symlinkify=BOOL\f[]: When enabled and a file is not writable and
|
|
its mtime or ctime is older than \f[B]symlinkify_timeout\f[] files will
|
|
be reported as symlinks to the original files.
|
|
Please read more below before using.
|
|
(default: false)
|
|
.IP \[bu] 2
|
|
\f[B]symlinkify_timeout=INT\f[]: Time to wait, in seconds, to activate
|
|
the \f[B]symlinkify\f[] behavior.
|
|
(default: 3600)
|
|
.IP \[bu] 2
|
|
\f[B]nullrw=BOOL\f[]: Turns reads and writes into no\-ops.
|
|
The request will succeed but do nothing.
|
|
Useful for benchmarking mergerfs.
|
|
(default: false)
|
|
.IP \[bu] 2
|
|
\f[B]ignorepponrename=BOOL\f[]: Ignore path preserving on rename.
|
|
Typically rename and link act differently depending on the policy of
|
|
\f[C]create\f[] (read below).
|
|
Enabling this will cause rename and link to always use the non\-path
|
|
preserving behavior.
|
|
This means files, when renamed or linked, will stay on the same drive.
|
|
(default: false)
|
|
.IP \[bu] 2
|
|
\f[B]security_capability=BOOL\f[]: If false return ENOATTR when xattr
|
|
security.capability is queried.
|
|
(default: true)
|
|
.IP \[bu] 2
|
|
\f[B]xattr=passthrough|noattr|nosys\f[]: Runtime control of xattrs.
|
|
Default is to passthrough xattr requests.
|
|
\[aq]noattr\[aq] will short circuit as if nothing exists.
|
|
\[aq]nosys\[aq] will respond with ENOSYS as if xattrs are not supported
|
|
or disabled.
|
|
(default: passthrough)
|
|
.IP \[bu] 2
|
|
\f[B]link_cow=BOOL\f[]: When enabled if a regular file is opened which
|
|
has a link count > 1 it will copy the file to a temporary file and
|
|
rename over the original.
|
|
Breaking the link and providing a basic copy\-on\-write function similar
|
|
to cow\-shell.
|
|
(default: false)
|
|
.IP \[bu] 2
|
|
\f[B]statfs=base|full\f[]: Controls how statfs works.
|
|
\[aq]base\[aq] means it will always use all branches in statfs
|
|
calculations.
|
|
\[aq]full\[aq] is in effect path preserving and only includes drives
|
|
where the path exists.
|
|
(default: base)
|
|
.IP \[bu] 2
|
|
\f[B]statfs_ignore=none|ro|nc\f[]: \[aq]ro\[aq] will cause statfs
|
|
calculations to ignore available space for branches mounted or tagged as
|
|
\[aq]read\-only\[aq] or \[aq]no create\[aq].
|
|
\[aq]nc\[aq] will ignore available space for branches tagged as \[aq]no
|
|
create\[aq].
|
|
(default: none)
|
|
.IP \[bu] 2
|
|
\f[B]posix_acl=BOOL\f[]: Enable POSIX ACL support (if supported by
|
|
kernel and underlying filesystem).
|
|
(default: false)
|
|
.IP \[bu] 2
|
|
\f[B]async_read=BOOL\f[]: Perform reads asynchronously.
|
|
If disabled or unavailable the kernel will ensure there is at most one
|
|
pending read request per file handle and will attempt to order requests
|
|
by offset.
|
|
(default: true)
|
|
.IP \[bu] 2
|
|
\f[B]fuse_msg_size=INT\f[]: Set the max number of pages per FUSE
|
|
message.
|
|
Only available on Linux >= 4.20 and ignored otherwise.
|
|
(min: 1; max: 256; default: 256)
|
|
.IP \[bu] 2
|
|
\f[B]threads=INT\f[]: Number of threads to use in multithreaded mode.
|
|
When set to zero it will attempt to discover and use the number of
|
|
logical cores.
|
|
If the lookup fails it will fall back to using 4.
|
|
If the thread count is set negative it will look up the number of cores
|
|
then divide by the absolute value.
|
|
i.e. threads=\-2 on an 8 core machine will result in 8 / 2 = 4 threads.
|
|
There will always be at least 1 thread.
|
|
NOTE: higher number of threads increases parallelism but usually
|
|
decreases throughput.
|
|
(default: 0)
|
|
.IP \[bu] 2
|
|
\f[B]fsname=STR\f[]: Sets the name of the filesystem as seen in
|
|
\f[B]mount\f[], \f[B]df\f[], etc.
|
|
Defaults to a list of the source paths concatenated together with the
|
|
longest common prefix removed.
|
|
.IP \[bu] 2
|
|
\f[B]func.FUNC=POLICY\f[]: Sets the specific FUSE function\[aq]s policy.
|
|
See below for the list of value types.
|
|
Example: \f[B]func.getattr=newest\f[]
|
|
.IP \[bu] 2
|
|
\f[B]category.CATEGORY=POLICY\f[]: Sets policy of all FUSE functions in
|
|
the provided category.
|
|
See POLICIES section for defaults.
|
|
Example: \f[B]category.create=mfs\f[]
|
|
.IP \[bu] 2
|
|
\f[B]cache.open=INT\f[]: \[aq]open\[aq] policy cache timeout in seconds.
|
|
(default: 0)
|
|
.IP \[bu] 2
|
|
\f[B]cache.statfs=INT\f[]: \[aq]statfs\[aq] cache timeout in seconds.
|
|
(default: 0)
|
|
.IP \[bu] 2
|
|
\f[B]cache.attr=INT\f[]: File attribute cache timeout in seconds.
|
|
(default: 1)
|
|
.IP \[bu] 2
|
|
\f[B]cache.entry=INT\f[]: File name lookup cache timeout in seconds.
|
|
(default: 1)
|
|
.IP \[bu] 2
|
|
\f[B]cache.negative_entry=INT\f[]: Negative file name lookup cache
|
|
timeout in seconds.
|
|
(default: 0)
|
|
.IP \[bu] 2
|
|
\f[B]cache.files=libfuse|off|partial|full|auto\-full\f[]: File page
|
|
caching mode (default: libfuse)
|
|
.IP \[bu] 2
|
|
\f[B]cache.symlinks=BOOL\f[]: Cache symlinks (if supported by kernel)
|
|
(default: false)
|
|
.IP \[bu] 2
|
|
\f[B]cache.readdir=BOOL\f[]: Cache readdir (if supported by kernel)
|
|
(default: false)
|
|
.IP \[bu] 2
|
|
\f[B]direct_io\f[]: deprecated \- Bypass page cache.
|
|
Use \f[C]cache.files=off\f[] instead.
|
|
(default: false)
|
|
.IP \[bu] 2
|
|
\f[B]kernel_cache\f[]: deprecated \- Do not invalidate data cache on
|
|
file open.
|
|
Use \f[C]cache.files=full\f[] instead.
|
|
(default: false)
|
|
.IP \[bu] 2
|
|
\f[B]auto_cache\f[]: deprecated \- Invalidate data cache if file mtime
|
|
or size change.
|
|
Use \f[C]cache.files=auto\-full\f[] instead.
|
|
(default: false)
|
|
.IP \[bu] 2
|
|
\f[B]async_read\f[]: deprecated \- Perform reads asynchronously.
|
|
Use \f[C]async_read=true\f[] instead.
|
|
.IP \[bu] 2
|
|
\f[B]sync_read\f[]: deprecated \- Perform reads synchronously.
|
|
Use \f[C]async_read=false\f[] instead.
|
|
.PP
|
|
\f[B]NOTE:\f[] Options are evaluated in the order listed so if the
|
|
options are \f[B]func.rmdir=rand,category.action=ff\f[] the
|
|
\f[B]action\f[] category setting will override the \f[B]rmdir\f[]
|
|
setting.
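.PP
As a sketch of that ordering rule, swapping the two options lets the
function setting win.
The branch and mount paths here are hypothetical:
.IP
.nf
\f[C]
#\ mergerfs\ \-o\ category.action=ff,func.rmdir=rand\ /mnt/disk1:/mnt/disk2\ /media/drives
\f[]
.fi
.PP
Here \f[B]category.action=ff\f[] is applied first and
\f[B]func.rmdir=rand\f[] then overrides it for \f[C]rmdir\f[] only.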
|
|
.SS Value Types
|
|
.IP \[bu] 2
|
|
BOOL = \[aq]true\[aq] | \[aq]false\[aq]
|
|
.IP \[bu] 2
|
|
INT = [0,MAX_INT]
|
|
.IP \[bu] 2
|
|
SIZE = \[aq]NNM\[aq]; NN = INT, M = \[aq]K\[aq] | \[aq]M\[aq] |
|
|
\[aq]G\[aq] | \[aq]T\[aq]
|
|
.IP \[bu] 2
|
|
STR = string
|
|
.IP \[bu] 2
|
|
FUNC = FUSE function
|
|
.IP \[bu] 2
|
|
CATEGORY = FUSE function category
|
|
.IP \[bu] 2
|
|
POLICY = mergerfs function policy
|
|
.SS branches
|
|
.PP
|
|
The \[aq]branches\[aq] (formerly \[aq]srcmounts\[aq]) argument is a
|
|
colon (\[aq]:\[aq]) delimited list of paths to be pooled together.
|
|
It does not matter if the paths are on the same or different drives nor
does the filesystem type matter.
|
|
Used and available space will not be duplicated for paths on the same
|
|
device and any features which aren\[aq]t supported by the underlying
|
|
filesystem (such as file attributes or extended attributes) will return
|
|
the appropriate errors.
|
|
.PP
|
|
To make it easier to include multiple branches mergerfs supports
|
|
globbing (http://linux.die.net/man/7/glob).
|
|
\f[B]The globbing tokens MUST be escaped when used via the shell, else
the shell will apply the glob itself.\f[]
|
|
.PP
|
|
Each branch can have a suffix of \f[C]=RW\f[] (read / write),
|
|
\f[C]=RO\f[] (read\-only), or \f[C]=NC\f[] (no create).
|
|
These suffixes work with globs as well and will apply to each path
|
|
found.
|
|
\f[C]RW\f[] is the default behavior and those paths will be eligible for
|
|
all policy categories.
|
|
\f[C]RO\f[] will exclude those paths from \f[C]create\f[] and
|
|
\f[C]action\f[] policies (just as a filesystem being mounted \f[C]ro\f[]
|
|
would).
|
|
\f[C]NC\f[] will exclude those paths from \f[C]create\f[] policies (you
|
|
can\[aq]t create but you can change / delete).
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
#\ mergerfs\ \-o\ allow_other,use_ino\ /mnt/disk\\*:/mnt/cdrom\ /media/drives
|
|
\f[]
|
|
.fi
|
|
.PP
|
|
The above line will use all mount points in /mnt prefixed with
|
|
\f[B]disk\f[] and the \f[B]cdrom\f[].
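.PP
A minimal sketch of the branch suffixes, assuming three hypothetical
branches:
.IP
.nf
\f[C]
#\ mergerfs\ \-o\ allow_other,use_ino\ /mnt/disk1=RW:/mnt/disk2=NC:/mnt/cdrom=RO\ /media/drives
\f[]
.fi
.PP
With this layout new files would only be created on /mnt/disk1, existing
files on /mnt/disk2 could still be changed or deleted, and /mnt/cdrom
would only be read from.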
|
|
.PP
|
|
To have the pool mounted at boot or otherwise accessible from related
tools use \f[B]/etc/fstab\f[].
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
#\ <file\ system>\ \ \ \ \ \ \ \ <mount\ point>\ \ <type>\ \ \ \ \ \ \ \ \ <options>\ \ \ \ \ \ \ \ \ \ \ \ \ <dump>\ \ <pass>
|
|
/mnt/disk*:/mnt/cdrom\ \ /media/drives\ \ fuse.mergerfs\ \ allow_other,use_ino\ \ \ 0\ \ \ \ \ \ \ 0
|
|
\f[]
|
|
.fi
|
|
.PP
|
|
\f[B]NOTE:\f[] the globbing is done at mount or xattr update time (see
|
|
below).
|
|
If a new directory is added matching the glob after the fact it will not
|
|
be automatically included.
|
|
.PP
|
|
\f[B]NOTE:\f[] for mounting via \f[B]fstab\f[] to work you must have
|
|
\f[B]mount.fuse\f[] installed.
|
|
For Ubuntu/Debian it is included in the \f[B]fuse\f[] package.
|
|
.SS fuse_msg_size
|
|
.PP
|
|
FUSE applications communicate with the kernel over a special character
|
|
device: \f[C]/dev/fuse\f[].
|
|
A large portion of the overhead associated with FUSE is the cost of
|
|
going back and forth from user space and kernel space over that device.
|
|
Generally speaking the fewer trips needed the better the performance
|
|
will be.
|
|
Reducing the number of trips can be done a number of ways,
kernel level caching and increasing message sizes being two significant
ones.
When it comes to reads and writes if the message size is doubled the
number of trips is approximately halved.
|
|
.PP
|
|
In Linux 4.20 a new feature was added allowing the negotiation of the
|
|
max message size.
|
|
Since the size is in multiples of
|
|
pages (https://en.wikipedia.org/wiki/Page_(computer_memory)) the feature
|
|
is called \f[C]max_pages\f[].
|
|
There is a maximum \f[C]max_pages\f[] value of 256 (1MiB) and minimum of
|
|
1 (4KiB).
|
|
The default used by Linux >=4.20, and hardcoded value used before 4.20,
|
|
is 32 (128KiB).
|
|
In mergerfs it is referred to as \f[C]fuse_msg_size\f[] to make it clear
|
|
what it impacts and provide some abstraction.
|
|
.PP
|
|
Since there should be no downsides to increasing \f[C]fuse_msg_size\f[]
|
|
/ \f[C]max_pages\f[], outside a minor bump in RAM usage due to larger
|
|
message buffers, mergerfs defaults the value to 256.
|
|
On kernels before 4.20 the value has no effect.
|
|
The reason the value is configurable is to enable experimentation and
|
|
benchmarking.
|
|
See the \f[C]nullrw\f[] section for benchmarking examples.
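.PP
As a rough worked example of the figures above, assuming 4KiB pages and
ignoring any per\-message overhead, the number of messages needed to
transfer 1MiB at a few \f[C]fuse_msg_size\f[] values would be about:
.IP
.nf
\f[C]
fuse_msg_size=32\ \ \ 32\ *\ 4KiB\ =\ 128KiB\ per\ message\ \ \ 8\ messages
fuse_msg_size=64\ \ \ 64\ *\ 4KiB\ =\ 256KiB\ per\ message\ \ \ 4\ messages
fuse_msg_size=256\ \ 256\ *\ 4KiB\ =\ 1MiB\ per\ message\ \ \ \ 1\ message
\f[]
.fi
.PP
A smaller value can be set at mount time when benchmarking.
The branch and mount paths below are hypothetical:
.IP
.nf
\f[C]
#\ mergerfs\ \-o\ allow_other,use_ino,fuse_msg_size=64\ /mnt/disk1:/mnt/disk2\ /media/drives
\f[]
.fi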
|
|
.SS symlinkify
|
|
.PP
|
|
Due to the levels of indirection introduced by mergerfs and the
underlying technology FUSE there can be varying levels of performance
degradation.
|
|
This feature will turn non\-directories which are not writable into
|
|
symlinks to the original file found by the \f[C]readlink\f[] policy
|
|
after the mtime and ctime are older than the timeout.
|
|
.PP
|
|
\f[B]WARNING:\f[] The current implementation has a known issue in which
|
|
if the file is open and being used when the file is converted to a
|
|
symlink then the application which has that file open will receive an
|
|
error when using it.
|
|
This is unlikely to occur in practice but is something to keep in mind.
|
|
.PP
|
|
\f[B]WARNING:\f[] Some backup solutions, such as CrashPlan, do not
|
|
backup the target of a symlink.
|
|
If using this feature it will be necessary to point any backup software
|
|
to the original drives or configure the software to follow symlinks if
|
|
such an option is available.
|
|
Alternatively create two mounts.
|
|
One for backup and one for general consumption.
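.PP
A sketch of the two\-mount arrangement, assuming hypothetical branches
and mount points.
The first pool enables \f[C]symlinkify\f[] for general use while the
second leaves it off so backup software sees regular files:
.IP
.nf
\f[C]
#\ mergerfs\ \-o\ allow_other,use_ino,symlinkify=true,symlinkify_timeout=3600\ /mnt/disk1:/mnt/disk2\ /media/drives
#\ mergerfs\ \-o\ allow_other,use_ino\ /mnt/disk1:/mnt/disk2\ /media/backup
\f[]
.fi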
|
|
.SS nullrw
|
|
.PP
|
|
Due to how FUSE works there is an overhead to all requests made to a
|
|
FUSE filesystem.
|
|
Meaning that even a simple passthrough will have some slowdown.
|
|
However, generally the overhead is minimal in comparison to the cost of
|
|
the underlying I/O.
|
|
By disabling the underlying I/O we can test the theoretical performance
boundaries.
|
|
.PP
|
|
By enabling \f[C]nullrw\f[] mergerfs will work as it always does
|
|
\f[B]except\f[] that all reads and writes will be no\-ops.
|
|
A write will succeed (the size of the write will be returned as if it
|
|
were successful) but mergerfs does nothing with the data it was given.
|
|
Similarly a read will return the size requested but won\[aq]t touch the
|
|
buffer.
|
|
.PP
|
|
Example:
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
$\ dd\ if=/dev/zero\ of=/path/to/mergerfs/mount/benchmark\ ibs=1M\ obs=512\ count=1024\ iflag=dsync,nocache\ oflag=dsync,nocache\ conv=fdatasync\ status=progress
|
|
1024+0\ records\ in
|
|
2097152+0\ records\ out
|
|
1073741824\ bytes\ (1.1\ GB,\ 1.0\ GiB)\ copied,\ 15.4067\ s,\ 69.7\ MB/s
|
|
|
|
$\ dd\ if=/dev/zero\ of=/path/to/mergerfs/mount/benchmark\ ibs=1M\ obs=1M\ count=1024\ iflag=dsync,nocache\ oflag=dsync,nocache\ conv=fdatasync\ status=progress
|
|
1024+0\ records\ in
|
|
1024+0\ records\ out
|
|
1073741824\ bytes\ (1.1\ GB,\ 1.0\ GiB)\ copied,\ 0.219585\ s,\ 4.9\ GB/s
|
|
|
|
$\ dd\ if=/path/to/mergerfs/mount/benchmark\ of=/dev/null\ bs=512\ count=102400\ iflag=dsync,nocache\ oflag=dsync,nocache\ conv=fdatasync\ status=progress
|
|
102400+0\ records\ in
|
|
102400+0\ records\ out
|
|
52428800\ bytes\ (52\ MB,\ 50\ MiB)\ copied,\ 0.757991\ s,\ 69.2\ MB/s
|
|
|
|
$\ dd\ if=/path/to/mergerfs/mount/benchmark\ of=/dev/null\ bs=1M\ count=1024\ iflag=dsync,nocache\ oflag=dsync,nocache\ conv=fdatasync\ status=progress
|
|
1024+0\ records\ in
|
|
1024+0\ records\ out
|
|
1073741824\ bytes\ (1.1\ GB,\ 1.0\ GiB)\ copied,\ 0.18405\ s,\ 5.8\ GB/s
|
|
\f[]
|
|
.fi
|
|
.PP
|
|
It\[aq]s important to test with different \f[C]obs\f[] (output block
|
|
size) values since the relative overhead is greater with smaller values.
|
|
As you can see above the size of a read or write can massively impact
|
|
theoretical performance.
|
|
If an application performs much worse through mergerfs it could very
|
|
well be that it doesn\[aq]t optimally size its read and write requests.
|
|
In such cases contact the mergerfs author so it can be investigated.
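.PP
The \f[C]dd\f[] runs above presumably target a pool mounted with
\f[C]nullrw\f[] enabled.
A minimal sketch of such a mount, with hypothetical branch paths:
.IP
.nf
\f[C]
#\ mergerfs\ \-o\ allow_other,use_ino,nullrw=true\ /mnt/disk1:/mnt/disk2\ /path/to/mergerfs/mount
\f[]
.fi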
|
|
.SS xattr
|
|
.PP
|
|
Runtime extended attribute support can be managed via the \f[C]xattr\f[]
|
|
option.
|
|
By default it will passthrough any xattr calls.
|
|
Given xattr support is rarely used and can have significant performance
|
|
implications mergerfs allows it to be disabled at runtime.
|
|
.PP
|
|
\f[C]noattr\f[] will cause mergerfs to short circuit all xattr calls and
|
|
return ENOATTR where appropriate.
|
|
mergerfs still gets all the requests but they will not be forwarded on
|
|
to the underlying filesystems.
|
|
The runtime control will still function in this mode.
|
|
.PP
|
|
\f[C]nosys\f[] will cause mergerfs to return ENOSYS for any xattr call.
|
|
The difference with \f[C]noattr\f[] is that the kernel will cache this
|
|
fact and itself short circuit future calls.
|
|
This will be more efficient than \f[C]noattr\f[] but will cause
|
|
mergerfs\[aq] runtime control via the hidden file to stop working.
|
|
.SH FUNCTIONS / POLICIES / CATEGORIES
|
|
.PP
|
|
The POSIX filesystem API is made up of a number of functions.
|
|
\f[B]creat\f[], \f[B]stat\f[], \f[B]chown\f[], etc.
|
|
In mergerfs most of the core functions are grouped into 3 categories:
|
|
\f[B]action\f[], \f[B]create\f[], and \f[B]search\f[].
|
|
These functions and categories can be assigned a policy which dictates
|
|
what file or directory is chosen when performing that behavior.
|
|
Any policy can be assigned to a function or category though some may not
|
|
be very useful in practice.
|
|
For instance: \f[B]rand\f[] (random) may be useful for file creation
|
|
(create) but could lead to very odd behavior if used for \f[C]chmod\f[]
|
|
if there were more than one copy of the file.
|
|
.PP
|
|
Some functions, listed in the category \f[C]N/A\f[] below, can not be
|
|
assigned the normal policies.
|
|
All functions which work on file handles use the handle which was
|
|
acquired by \f[C]open\f[] or \f[C]create\f[].
|
|
\f[C]readdir\f[] has no real need for a policy given the purpose is
|
|
merely to return a list of entries in a directory.
|
|
\f[C]statfs\f[]\[aq]s behavior can be modified via other options.
|
|
That said, the current FUSE kernel driver often does not
provide the file handle when a client calls \f[C]fgetattr\f[],
\f[C]fchown\f[], \f[C]fchmod\f[], \f[C]futimens\f[], \f[C]ftruncate\f[],
etc.
|
|
This means it will call the regular, path based, versions.
|
|
.PP
|
|
When using policies which are based on a branch\[aq]s available space
|
|
the base path provided is used.
|
|
Not the full path to the file in question.
|
|
Meaning that sub mounts won\[aq]t be considered in the space
|
|
calculations.
|
|
The reason is that it doesn\[aq]t really work for non\-path preserving
|
|
policies and can lead to non\-obvious behaviors.
|
|
.SS Function / Category classifications
|
|
.PP
|
|
.TS
|
|
tab(@);
|
|
lw(7.9n) lw(62.1n).
|
|
T{
|
|
Category
|
|
T}@T{
|
|
FUSE Functions
|
|
T}
|
|
_
|
|
T{
|
|
action
|
|
T}@T{
|
|
chmod, chown, link, removexattr, rename, rmdir, setxattr, truncate,
|
|
unlink, utimens
|
|
T}
|
|
T{
|
|
create
|
|
T}@T{
|
|
create, mkdir, mknod, symlink
|
|
T}
|
|
T{
|
|
search
|
|
T}@T{
|
|
access, getattr, getxattr, ioctl (directories), listxattr, open,
|
|
readlink
|
|
T}
|
|
T{
|
|
N/A
|
|
T}@T{
|
|
fchmod, fchown, futimens, ftruncate, fallocate, fgetattr, fsync, ioctl
|
|
(files), read, readdir, release, statfs, write, copy_file_range
|
|
T}
|
|
.TE
|
|
.PP
|
|
In cases where something may be searched (to confirm a directory exists
|
|
across all source mounts) \f[B]getattr\f[] will be used.
|
|
.SS Path Preservation
|
|
.PP
|
|
Policies, as described below, are of two basic types.
|
|
\f[C]path\ preserving\f[] and \f[C]non\-path\ preserving\f[].
|
|
.PP
|
|
All policies which start with \f[C]ep\f[] (\f[B]epff\f[],
|
|
\f[B]eplfs\f[], \f[B]eplus\f[], \f[B]epmfs\f[], \f[B]eprand\f[]) are
|
|
\f[C]path\ preserving\f[].
|
|
\f[C]ep\f[] stands for \f[C]existing\ path\f[].
|
|
.PP
|
|
A path preserving policy will only consider drives where the relative
|
|
path being accessed already exists.
|
|
.PP
|
|
When using non\-path preserving policies paths will be cloned to target
|
|
drives as necessary.
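.PP
As a hedged illustration of the difference, assume two hypothetical
branches where only one already contains the relative path and both pass
the usual create filters:
.IP
.nf
\f[C]
/disk1\ (100G\ free)\ \ \ \ /disk2\ (500G\ free)
|\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |
+\-\-\ /dir1\ \ \ \ \ \ \ \ \ \ \ \ \ +\-\-\ /dir2
\f[]
.fi
.PP
Creating \f[C]dir1/file\f[] through the pool with \f[B]epmfs\f[] would
only consider /disk1, since that is the only branch where \f[C]dir1\f[]
exists.
With \f[B]mfs\f[], /disk2 would be chosen for having the most free space
and \f[C]dir1\f[] would be cloned to it first.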
|
|
.SS Filters
|
|
.PP
|
|
Policies basically search branches and create a list of files / paths
|
|
for functions to work on.
|
|
The policy is responsible for filtering and sorting.
|
|
The policy type defines the sorting but filtering is mostly uniform as
|
|
described below.
|
|
.IP \[bu] 2
|
|
No \f[B]search\f[] policies filter.
|
|
.IP \[bu] 2
|
|
All \f[B]action\f[] policies will filter out branches which are mounted
|
|
\f[B]read\-only\f[] or tagged as \f[B]RO (read\-only)\f[].
|
|
.IP \[bu] 2
|
|
All \f[B]create\f[] policies will filter out branches which are mounted
|
|
\f[B]read\-only\f[], tagged \f[B]RO (read\-only)\f[] or \f[B]NC (no
|
|
create)\f[], or have available space less than \f[C]minfreespace\f[].
|
|
.PP
|
|
If all branches are filtered an error will be returned.
|
|
Typically \f[B]EROFS\f[] or \f[B]ENOSPC\f[] depending on the reasons.
|
|
.SS Policy descriptions
|
|
.PP
|
|
.TS
|
|
tab(@);
|
|
lw(16.6n) lw(53.4n).
|
|
T{
|
|
Policy
|
|
T}@T{
|
|
Description
|
|
T}
|
|
_
|
|
T{
|
|
all
|
|
T}@T{
|
|
Search category: same as \f[B]epall\f[].
|
|
Action category: same as \f[B]epall\f[].
|
|
Create category: for \f[B]mkdir\f[], \f[B]mknod\f[], and
|
|
\f[B]symlink\f[] it will apply to all branches.
|
|
\f[B]create\f[] works like \f[B]ff\f[].
|
|
T}
|
|
T{
|
|
epall (existing path, all)
|
|
T}@T{
|
|
Search category: same as \f[B]epff\f[] (but more expensive because it
|
|
doesn\[aq]t stop after finding a valid branch).
|
|
Action category: apply to all found.
|
|
Create category: for \f[B]mkdir\f[], \f[B]mknod\f[], and
|
|
\f[B]symlink\f[] it will apply to all found.
|
|
\f[B]create\f[] works like \f[B]epff\f[] (but more expensive because it
|
|
doesn\[aq]t stop after finding a valid branch).
|
|
T}
|
|
T{
|
|
epff (existing path, first found)
|
|
T}@T{
|
|
Given the order of the branches, as defined at mount time or configured
|
|
at runtime, act on the first one found where the relative path exists.
|
|
T}
|
|
T{
|
|
eplfs (existing path, least free space)
|
|
T}@T{
|
|
Of all the branches on which the relative path exists choose the drive
|
|
with the least free space.
|
|
T}
|
|
T{
|
|
eplus (existing path, least used space)
|
|
T}@T{
|
|
Of all the branches on which the relative path exists choose the drive
|
|
with the least used space.
|
|
T}
|
|
T{
|
|
epmfs (existing path, most free space)
|
|
T}@T{
|
|
Of all the branches on which the relative path exists choose the drive
|
|
with the most free space.
|
|
T}
|
|
T{
|
|
eprand (existing path, random)
|
|
T}@T{
|
|
Calls \f[B]epall\f[] and then randomizes.
|
|
T}
|
|
T{
|
|
erofs
|
|
T}@T{
|
|
Exclusively return \f[B]\-1\f[] with \f[B]errno\f[] set to
|
|
\f[B]EROFS\f[] (read\-only filesystem).
|
|
T}
|
|
T{
|
|
ff (first found)
|
|
T}@T{
|
|
Search category: same as \f[B]epff\f[].
|
|
Action category: same as \f[B]epff\f[].
|
|
Create category: Given the order of the drives, as defined at mount time
|
|
or configured at runtime, act on the first one found.
|
|
T}
|
|
T{
|
|
lfs (least free space)
|
|
T}@T{
|
|
Search category: same as \f[B]eplfs\f[].
|
|
Action category: same as \f[B]eplfs\f[].
|
|
Create category: Pick the drive with the least available free space.
|
|
T}
|
|
T{
|
|
lus (least used space)
|
|
T}@T{
|
|
Search category: same as \f[B]eplus\f[].
|
|
Action category: same as \f[B]eplus\f[].
|
|
Create category: Pick the drive with the least used space.
|
|
T}
|
|
T{
|
|
mfs (most free space)
|
|
T}@T{
|
|
Search category: same as \f[B]epmfs\f[].
|
|
Action category: same as \f[B]epmfs\f[].
|
|
Create category: Pick the drive with the most available free space.
|
|
T}
|
|
T{
|
|
newest
|
|
T}@T{
|
|
Pick the file / directory with the largest mtime.
|
|
T}
|
|
T{
|
|
rand (random)
|
|
T}@T{
|
|
Calls \f[B]all\f[] and then randomizes.
|
|
T}
|
|
.TE
|
|
.SS Defaults
|
|
.PP
|
|
.TS
|
|
tab(@);
|
|
l l.
|
|
T{
|
|
Category
|
|
T}@T{
|
|
Policy
|
|
T}
|
|
_
|
|
T{
|
|
action
|
|
T}@T{
|
|
epall
|
|
T}
|
|
T{
|
|
create
|
|
T}@T{
|
|
epmfs
|
|
T}
|
|
T{
|
|
search
|
|
T}@T{
|
|
ff
|
|
T}
|
|
.TE
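.PP
These defaults can be overridden with the \f[B]category.CATEGORY\f[] and
\f[B]func.FUNC\f[] mount options.
A minimal sketch, assuming hypothetical branch and mount paths, which
switches file creation to \f[B]mfs\f[] and has \f[C]getattr\f[] report
the newest copy:
.IP
.nf
\f[C]
#\ mergerfs\ \-o\ allow_other,use_ino,category.create=mfs,func.getattr=newest\ /mnt/disk1:/mnt/disk2\ /media/drives
\f[]
.fi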
|
|
.SS ioctl
|
|
.PP
|
|
When \f[C]ioctl\f[] is used with an open file then it will use the file
|
|
handle which was created at the original \f[C]open\f[] call.
|
|
However, when using \f[C]ioctl\f[] with a directory mergerfs will use
|
|
the \f[C]open\f[] policy to find the directory to act on.
|
|
.SS unlink
|
|
.PP
|
|
In FUSE there is an opaque "file handle" which is created by
|
|
\f[C]open\f[], \f[C]create\f[], or \f[C]opendir\f[], passed to the
|
|
kernel, and then is passed back to the FUSE userland application by the
|
|
kernel.
|
|
Unfortunately, the FUSE kernel driver does not always send the file
|
|
handle when it theoretically could/should.
|
|
This complicates certain behaviors / workflows particularly in the high
|
|
level API.
|
|
As a result mergerfs is currently doing a few hacky things.
|
|
.PP
|
|
libfuse2 and libfuse3, when using the high level API, will rename files
|
|
to \f[C]\&.fuse_hiddenXXXXXX\f[] if the file is open when unlinked or
|
|
renamed over.
|
|
It does this so the file is still available when a request referencing
|
|
the now missing file is made.
|
|
This file however keeps a \f[C]rmdir\f[] from succeeding and can be
|
|
picked up by software reading directories.
|
|
.PP
|
|
The change mergerfs has made is that if a file is open when an unlink or
|
|
rename happens it will open the file and keep it open till closed by all
|
|
those who opened it prior.
|
|
When a request comes in referencing that file and it doesn\[aq]t include
|
|
a file handle it will instead use the file handle created at
|
|
unlink/rename time.
|
|
.PP
|
|
This won\[aq]t result in technically proper behavior but is close enough
for many use cases.
|
|
.PP
|
|
The plan is to rewrite mergerfs to use the low level API so these
|
|
invasive libfuse changes are no longer necessary.
|
|
.SS rename & link
|
|
.PP
|
|
\f[B]NOTE:\f[] If you\[aq]re receiving errors from software when files
|
|
are moved / renamed / linked then you should consider changing the
|
|
create policy to one which is \f[B]not\f[] path preserving, enabling
|
|
\f[C]ignorepponrename\f[], or contacting the author of the offending
|
|
software and requesting that \f[C]EXDEV\f[] be properly handled.
|
|
.PP
|
|
\f[C]rename\f[] and \f[C]link\f[] are tricky functions in a union
|
|
filesystem.
|
|
\f[C]rename\f[] only works within a single filesystem or device.
|
|
If a rename can\[aq]t be done atomically due to the source and
|
|
destination paths existing on different mount points it will return
|
|
\f[B]\-1\f[] with \f[B]errno = EXDEV\f[] (cross device).
|
|
So if a \f[C]rename\f[]\[aq]s source and target are on different drives
|
|
within the pool it creates an issue.
|
|
.PP
|
|
Originally mergerfs would return EXDEV whenever a rename was requested
|
|
which was cross directory in any way.
|
|
This made the code simple and was technically compliant with POSIX
|
|
requirements.
|
|
However, many applications fail to handle EXDEV at all and treat it as a
|
|
normal error or otherwise handle it poorly.
|
|
Such apps include: gvfsd\-fuse v1.20.3 and prior, Finder / CIFS/SMB
|
|
client in Apple OSX 10.9+, NZBGet, Samba\[aq]s recycling bin feature.
|
|
.PP
|
|
As a result a compromise was made in order to get most software to work
|
|
while still obeying mergerfs\[aq] policies.
|
|
Below is the basic logic.
|
|
.IP \[bu] 2
If using a \f[B]create\f[] policy which tries to preserve directory
paths (epff,eplfs,eplus,epmfs)
.RS 2
.IP \[bu] 2
Using the \f[B]rename\f[] policy get the list of files to rename
.IP \[bu] 2
For each file attempt rename:
.RS 2
.IP \[bu] 2
If failure with ENOENT run \f[B]create\f[] policy
.IP \[bu] 2
If create policy returns the same drive as currently evaluating then
clone the path
.IP \[bu] 2
Re\-attempt rename
.RE
.IP \[bu] 2
If \f[B]any\f[] of the renames succeed the higher level rename is
considered a success
.IP \[bu] 2
If \f[B]no\f[] renames succeed the first error encountered will be
returned
.IP \[bu] 2
On success:
.RS 2
.IP \[bu] 2
Remove the target from all drives with no source file
.IP \[bu] 2
Remove the source from all drives which failed to rename
.RE
.RE
.IP \[bu] 2
If using a \f[B]create\f[] policy which does \f[B]not\f[] try to
preserve directory paths
.RS 2
.IP \[bu] 2
Using the \f[B]rename\f[] policy get the list of files to rename
.IP \[bu] 2
Using the \f[B]getattr\f[] policy get the target path
.IP \[bu] 2
For each file attempt rename:
.RS 2
.IP \[bu] 2
If the source drive != target drive:
.RS 2
.IP \[bu] 2
Clone target path from target drive to source drive
.RE
.IP \[bu] 2
Rename
.RE
.IP \[bu] 2
If \f[B]any\f[] of the renames succeed the higher level rename is
considered a success
.IP \[bu] 2
If \f[B]no\f[] renames succeed the first error encountered will be
returned
.IP \[bu] 2
On success:
.RS 2
.IP \[bu] 2
Remove the target from all drives with no source file
.IP \[bu] 2
Remove the source from all drives which failed to rename
.RE
.RE
|
|
.PP
|
|
The removals are subject to normal entitlement checks.
|
|
.PP
|
|
The above behavior will help minimize the likelihood of EXDEV being
|
|
returned but it will still be possible.
|
|
.PP
|
|
\f[B]link\f[] uses the same strategy but without the removals.
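.PP
The success and error rule in the lists above can be sketched in shell.
This is only an illustration of the rule, not mergerfs code: the
function name \f[C]combine_results\f[] is hypothetical and each argument
stands for the errno style result of one attempted rename (0 meaning
success).
.IP
.nf
\f[C]
```shell
# Sketch: combine the results of several per-branch renames.
# Prints 0 if any attempt succeeded, otherwise the first error seen.
# With no arguments it prints 0, since there is nothing to fail.
combine_results() {
  first_error=0
  for rc in "$@"; do
    if [ "$rc" -eq 0 ]; then
      # Any single success makes the higher level call a success.
      echo 0
      return 0
    fi
    if [ "$first_error" -eq 0 ]; then
      # Remember only the first failure; later ones are ignored.
      first_error="$rc"
    fi
  done
  echo "$first_error"
}
```
\f[]
.fi
.PP
For instance, \f[C]combine_results 18 0 2\f[] prints 0 while
\f[C]combine_results 18 2\f[] prints 18 (EXDEV on Linux).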
|
|
.SS readdir
|
|
.PP
|
|
readdir (http://linux.die.net/man/3/readdir) is different from all other
|
|
filesystem functions.
|
|
While it could have its own set of policies to tweak its behavior at
|
|
this time it provides a simple union of files and directories found.
|
|
Remember that any action or information queried about these files and
|
|
directories comes from the respective function.
|
|
For instance: an \f[B]ls\f[] is a \f[B]readdir\f[] and for each
|
|
file/directory returned \f[B]getattr\f[] is called.
|
|
Meaning the policy of \f[B]getattr\f[] is responsible for choosing the
|
|
file/directory which is the source of the metadata you see in an
|
|
\f[B]ls\f[].
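.PP
As a rough illustration of that union (a sketch only, not how mergerfs
is implemented), the listing of a directory can be approximated in shell
by listing the same relative path on every branch and removing
duplicates.
The function name \f[C]union_readdir\f[] and the branch paths are
hypothetical.
.IP
.nf
\f[C]
```shell
# Sketch: print the union of entry names found at one relative path
# across several branches (hypothetical helper, not part of mergerfs).
# usage: union_readdir <relpath> <branch>...
# Names containing newlines are not handled.
union_readdir() {
  relpath="$1"
  shift
  for branch in "$@"; do
    # A branch which lacks the directory simply contributes nothing.
    [ -d "${branch}/${relpath}" ] || continue
    ls -A "${branch}/${relpath}"
  done | sort -u
}
```
\f[]
.fi
.PP
Calling \f[C]union_readdir\ media\ /mnt/disk1\ /mnt/disk2\f[] would
print each name once even if it exists on both branches.
Which copy supplies the metadata is a separate question answered by the
\f[B]getattr\f[] policy.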
|
|
.SS statfs / statvfs
|
|
.PP
|
|
statvfs (http://linux.die.net/man/2/statvfs) normalizes the source
|
|
drives based on the fragment size and sums the number of adjusted blocks
|
|
and inodes.
|
|
This means you will see the combined space of all sources.
|
|
Total, used, and free.
|
|
The sources however are deduplicated based on the drive so multiple sources
|
|
on the same drive will not result in double counting its space.
|
|
Filesystems mounted further down the tree of the branch will not be
|
|
included when checking the mount\[aq]s stats.
|
|
.PP
|
|
The options \f[C]statfs\f[] and \f[C]statfs_ignore\f[] can be used to
|
|
modify \f[C]statfs\f[] behavior.
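.PP
The summing described above can be sketched as follows.
This is an illustration under stated assumptions rather than the actual
implementation: the input format (one line per source holding a device
id, fragment size, and total block count) and the function name
\f[C]sum_statvfs\f[] are made up for the example, and the block counts
are assumed to be normalized to the smallest fragment size among the
sources.
.IP
.nf
\f[C]
```shell
# Sketch of the summing described above (not mergerfs source code).
# Input:  one line per source: <device-id> <fragment-size> <total-blocks>
# Output: <fragment-size> <total-blocks>, normalized to the smallest
#         fragment size, counting each device only once.
sum_statvfs() {
  awk '
    # Keep only the first line seen for each device id.
    !($1 in frsize) { frsize[$1] = $2; blocks[$1] = $3 }
    END {
      min = 0
      for (dev in frsize)
        if (min == 0 || frsize[dev] < min) min = frsize[dev]
      total = 0
      for (dev in frsize)
        total += blocks[dev] * frsize[dev] / min
      print min, total
    }'
}
```
\f[]
.fi
.PP
Two sources on the same device with 1000 blocks of 4096 bytes plus one
source on another device with 8000 blocks of 512 bytes would be reported
as 16000 blocks of 512 bytes.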
|
|
.SH BUILDING
|
|
.PP
|
|
\f[B]NOTE:\f[] Prebuilt packages can be found at:
|
|
https://github.com/trapexit/mergerfs/releases
|
|
.PP
|
|
First get the code from github (https://github.com/trapexit/mergerfs).
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
$\ git\ clone\ https://github.com/trapexit/mergerfs.git
|
|
$\ #\ or
|
|
$\ wget\ https://github.com/trapexit/mergerfs/releases/download/<ver>/mergerfs\-<ver>.tar.gz
|
|
\f[]
|
|
.fi
|
|
.SS Debian / Ubuntu
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
$\ cd\ mergerfs
|
|
$\ sudo\ tools/install\-build\-pkgs
|
|
$\ make\ deb
|
|
$\ sudo\ dpkg\ \-i\ ../mergerfs_version_arch.deb
|
|
\f[]
|
|
.fi
|
|
.SS Fedora
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
$\ su\ \-
|
|
#\ cd\ mergerfs
|
|
#\ tools/install\-build\-pkgs
|
|
#\ make\ rpm
|
|
#\ rpm\ \-i\ rpmbuild/RPMS/<arch>/mergerfs\-<version>.<arch>.rpm
|
|
\f[]
|
|
.fi
|
|
.SS Generically
|
|
.PP
|
|
Have git, g++, make, python installed.
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
$\ cd\ mergerfs
|
|
$\ make
|
|
$\ sudo\ make\ install
|
|
\f[]
|
|
.fi
|
|
.SS Build options
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
$\ make\ help
|
|
usage:\ make
|
|
|
|
make\ USE_XATTR=0\ \ \ \ \ \ \-\ build\ program\ without\ xattrs\ functionality
|
|
make\ STATIC=1\ \ \ \ \ \ \ \ \ \-\ build\ static\ binary
|
|
make\ LTO=1\ \ \ \ \ \ \ \ \ \ \ \ \-\ build\ with\ link\ time\ optimization
|
|
\f[]
|
|
.fi
|
|
.SH RUNTIME CONFIG
|
|
.SS .mergerfs pseudo file
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
<mountpoint>/.mergerfs
|
|
\f[]
|
|
.fi
|
|
.PP
|
|
There is a pseudo file available at the mount point which allows for the
|
|
runtime modification of certain \f[B]mergerfs\f[] options.
|
|
The file will not show up in \f[B]readdir\f[] but can be
|
|
\f[B]stat\f[]\[aq]ed and manipulated via
|
|
{list,get,set}xattrs (http://linux.die.net/man/2/listxattr) calls.
|
|
.PP
|
|
Any changes made at runtime are \f[B]not\f[] persisted.
|
|
If you wish for values to persist they must be included as options
|
|
wherever you configure the mounting of mergerfs (/etc/fstab).
|
|
.SS Keys
|
|
.PP
|
|
Use \f[C]xattr\ \-l\ /mountpoint/.mergerfs\f[] to see all supported
|
|
keys.
|
|
Some are informational and therefore read\-only.
|
|
\f[C]setxattr\f[] will return EINVAL on read\-only keys.
|
|
.SS Values
|
|
.PP
|
|
Same as the command line.
|
|
.SS user.mergerfs.branches
|
|
.PP
|
|
\f[B]NOTE:\f[] formerly \f[C]user.mergerfs.srcmounts\f[] but said key is
|
|
still supported.
|
|
.PP
|
|
Used to query or modify the list of branches.
|
|
When modifying there are several shortcuts to ease manipulation of the
|
|
list.
|
|
.PP
|
|
.TS
|
|
tab(@);
|
|
l l.
|
|
T{
|
|
Value
|
|
T}@T{
|
|
Description
|
|
T}
|
|
_
|
|
T{
|
|
[list]
|
|
T}@T{
|
|
set
|
|
T}
|
|
T{
|
|
+<[list]
|
|
T}@T{
|
|
prepend
|
|
T}
|
|
T{
|
|
+>[list]
|
|
T}@T{
|
|
append
|
|
T}
|
|
T{
|
|
\-[list]
|
|
T}@T{
|
|
remove all values provided
|
|
T}
|
|
T{
|
|
\-<
|
|
T}@T{
|
|
remove first in list
|
|
T}
|
|
T{
|
|
\->
|
|
T}@T{
|
|
remove last in list
|
|
T}
|
|
.TE
|
|
.PP
|
|
\f[C]xattr\ \-w\ user.mergerfs.branches\ +</mnt/drive3\ /mnt/pool/.mergerfs\f[]
|
|
.PP
|
|
The \f[C]=NC\f[], \f[C]=RO\f[], \f[C]=RW\f[] syntax works just as on the
|
|
command line.
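.PP
To make the shortcuts concrete, the sketch below applies a value to a
colon separated list held in a plain string.
It does not talk to mergerfs at all: \f[C]apply_branches\f[] is a
hypothetical helper, branch modes such as \f[C]=RW\f[] are treated as
part of the entry text, and the handling of a bare \f[C]+\f[] (append)
and a leading \f[C]=\f[] (set) is inferred from the example in the next
section.
.IP
.nf
\f[C]
```shell
# Sketch: apply a user.mergerfs.branches style value to a
# colon separated list (hypothetical helper, not mergerfs code).
# usage: apply_branches <current-list> <value>
# Entries containing glob characters are not handled.
apply_branches() {
  cur="$1"
  val="$2"
  case "$val" in
    "+<"*) new="${val#??}:${cur}" ;;   # prepend
    "+>"*) new="${cur}:${val#??}" ;;   # append
    "+"*)  new="${cur}:${val#?}" ;;    # bare + also appends
    "="*)  new="${val#?}" ;;           # set
    "-<")                              # remove first in list
      case "$cur" in
        *:*) new="${cur#*:}" ;;
        *)   new="" ;;
      esac ;;
    "->")                              # remove last in list
      case "$cur" in
        *:*) new="${cur%:*}" ;;
        *)   new="" ;;
      esac ;;
    "-"*)                              # remove all values provided
      new=""
      old_ifs="$IFS"
      IFS=":"
      for b in $cur; do
        keep=1
        for r in ${val#?}; do
          if [ "$b" = "$r" ]; then keep=0; fi
        done
        if [ "$keep" -eq 1 ]; then new="${new}:${b}"; fi
      done
      IFS="$old_ifs"
      ;;
    *) new="$val" ;;                   # plain list: set
  esac
  # Drop a stray separator left by operating on an empty list.
  new="${new#:}"
  new="${new%:}"
  echo "$new"
}
```
\f[]
.fi
.PP
For example \f[C]apply_branches\ /mnt/a:/mnt/b\ \[aq]+</mnt/c\[aq]\f[]
prints \f[C]/mnt/c:/mnt/a:/mnt/b\f[].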
|
|
.SS Example
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
[trapexit:/mnt/mergerfs]\ $\ xattr\ \-l\ .mergerfs
|
|
user.mergerfs.branches:\ /mnt/a=RW:/mnt/b=RW
|
|
user.mergerfs.minfreespace:\ 4294967295
|
|
user.mergerfs.moveonenospc:\ false
|
|
\&...
|
|
|
|
[trapexit:/mnt/mergerfs]\ $\ xattr\ \-p\ user.mergerfs.category.search\ .mergerfs
|
|
ff
|
|
|
|
[trapexit:/mnt/mergerfs]\ $\ xattr\ \-w\ user.mergerfs.category.search\ newest\ .mergerfs
|
|
[trapexit:/mnt/mergerfs]\ $\ xattr\ \-p\ user.mergerfs.category.search\ .mergerfs
|
|
newest
|
|
|
|
[trapexit:/mnt/mergerfs]\ $\ xattr\ \-w\ user.mergerfs.branches\ +/mnt/c\ .mergerfs
|
|
[trapexit:/mnt/mergerfs]\ $\ xattr\ \-p\ user.mergerfs.branches\ .mergerfs
|
|
/mnt/a:/mnt/b:/mnt/c
|
|
|
|
[trapexit:/mnt/mergerfs]\ $\ xattr\ \-w\ user.mergerfs.branches\ =/mnt/c\ .mergerfs
|
|
[trapexit:/mnt/mergerfs]\ $\ xattr\ \-p\ user.mergerfs.branches\ .mergerfs
|
|
/mnt/c
|
|
|
|
[trapexit:/mnt/mergerfs]\ $\ xattr\ \-w\ user.mergerfs.branches\ \[aq]+</mnt/a:/mnt/b\[aq]\ .mergerfs
|
|
[trapexit:/mnt/mergerfs]\ $\ xattr\ \-p\ user.mergerfs.branches\ .mergerfs
|
|
/mnt/a:/mnt/b:/mnt/c
|
|
\f[]
|
|
.fi
|
|
.SS file / directory xattrs
|
|
.PP
|
|
While they won\[aq]t show up when using
|
|
listxattr (http://linux.die.net/man/2/listxattr) \f[B]mergerfs\f[]
|
|
offers a number of special xattrs to query information about the files
|
|
served.
|
|
To access the values you will need to issue a
|
|
getxattr (http://linux.die.net/man/2/getxattr) for one of the following:
|
|
.IP \[bu] 2
|
|
\f[B]user.mergerfs.basepath\f[]: the base mount point for the file given
|
|
the current getattr policy
|
|
.IP \[bu] 2
|
|
\f[B]user.mergerfs.relpath\f[]: the relative path of the file from the
|
|
perspective of the mount point
|
|
.IP \[bu] 2
|
|
\f[B]user.mergerfs.fullpath\f[]: the full path of the original file
|
|
given the getattr policy
|
|
.IP \[bu] 2
|
|
\f[B]user.mergerfs.allpaths\f[]: a NUL (\[aq]\\0\[aq]) separated list of
|
|
full paths to all files found
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
[trapexit:/mnt/mergerfs]\ $\ ls
|
|
A\ B\ C
|
|
[trapexit:/mnt/mergerfs]\ $\ xattr\ \-p\ user.mergerfs.fullpath\ A
|
|
/mnt/a/full/path/to/A
|
|
[trapexit:/mnt/mergerfs]\ $\ xattr\ \-p\ user.mergerfs.basepath\ A
|
|
/mnt/a
|
|
[trapexit:/mnt/mergerfs]\ $\ xattr\ \-p\ user.mergerfs.relpath\ A
|
|
/full/path/to/A
|
|
[trapexit:/mnt/mergerfs]\ $\ xattr\ \-p\ user.mergerfs.allpaths\ A\ |\ tr\ \[aq]\\0\[aq]\ \[aq]\\n\[aq]
|
|
/mnt/a/full/path/to/A
|
|
/mnt/b/full/path/to/A
|
|
\f[]
|
|
.fi
|
|
.SH TOOLING
|
|
.IP \[bu] 2
|
|
https://github.com/trapexit/mergerfs\-tools
|
|
.IP \[bu] 2
|
|
mergerfs.ctl: A tool to make it easier to query and configure mergerfs
|
|
at runtime
|
|
.IP \[bu] 2
|
|
mergerfs.fsck: Provides permissions and ownership auditing and the
|
|
ability to fix them
|
|
.IP \[bu] 2
|
|
mergerfs.dedup: Will help identify and optionally remove duplicate files
|
|
.IP \[bu] 2
|
|
mergerfs.dup: Ensure there are at least N copies of a file across the
|
|
pool
|
|
.IP \[bu] 2
|
|
mergerfs.balance: Rebalance files across drives by moving them from the
|
|
most filled to the least filled
|
|
.IP \[bu] 2
|
|
mergerfs.mktrash: Creates FreeDesktop.org Trash specification compatible
|
|
directories on a mergerfs mount
|
|
.IP \[bu] 2
|
|
https://github.com/trapexit/scorch
|
|
.IP \[bu] 2
|
|
scorch: A tool to help discover silent corruption of files and keep
|
|
track of files
|
|
.IP \[bu] 2
|
|
https://github.com/trapexit/bbf
|
|
.IP \[bu] 2
|
|
bbf (bad block finder): a tool to scan for and \[aq]fix\[aq] hard drive
|
|
bad blocks and find the files using those blocks
|
|
.SH CACHING
|
|
.SS page caching
|
|
.PP
|
|
https://en.wikipedia.org/wiki/Page_cache
|
|
.PP
|
|
tl;dr:
.IP \[bu] 2
cache.files=off: Disables page caching.
Underlying files cached, mergerfs files are not.
.IP \[bu] 2
cache.files=partial: Enables page caching.
Underlying files cached, mergerfs files cached while open.
.IP \[bu] 2
cache.files=full: Enables page caching.
Underlying files cached, mergerfs files cached across opens.
.IP \[bu] 2
cache.files=auto\-full: Enables page caching.
Underlying files cached, mergerfs files cached across opens if mtime and
size are unchanged since previous open.
.IP \[bu] 2
cache.files=libfuse: follow traditional libfuse \f[C]direct_io\f[],
\f[C]kernel_cache\f[], and \f[C]auto_cache\f[] arguments.
|
|
.PP
|
|
FUSE, which mergerfs uses, offers a number of page caching modes.
|
|
mergerfs tries to simplify their use via the \f[C]cache.files\f[]
|
|
option.
|
|
It can and should replace usage of \f[C]direct_io\f[],
|
|
\f[C]kernel_cache\f[], and \f[C]auto_cache\f[].
|
|
.PP
|
|
Due to mergerfs using FUSE and therefore being a userland process
|
|
proxying existing filesystems the kernel will double cache the content
|
|
being read and written through mergerfs.
|
|
Once from the underlying filesystem and once from mergerfs (it sees them
|
|
as two separate entities).
|
|
Using \f[C]cache.files=off\f[] will keep the double caching from
|
|
happening by disabling caching of mergerfs but this has the side effect
|
|
that \f[I]all\f[] read and write calls will be passed to mergerfs which
|
|
may be slower than enabling caching, you lose shared \f[C]mmap\f[]
|
|
support which can affect apps such as rtorrent, and no read\-ahead will
|
|
take place.
|
|
The kernel will still cache the underlying filesystem data but that only
|
|
helps so much given mergerfs will still process all requests.
|
|
.PP
|
|
If you do enable file page caching,
|
|
\f[C]cache.files=partial|full|auto\-full\f[], you should also enable
|
|
\f[C]dropcacheonclose\f[] which will cause mergerfs to instruct the
|
|
kernel to flush the underlying file\[aq]s page cache when the file is
|
|
closed.
|
|
This behavior is the same as the rsync fadvise / drop cache patch and
|
|
Feh\[aq]s nocache project.
|
|
.PP
|
|
If most files are read once through and closed (like media) it is best
|
|
to enable \f[C]dropcacheonclose\f[] regardless of caching mode in order
|
|
to minimize buffer bloat.
|
|
.PP
|
|
It is difficult to balance memory usage, cache bloat & duplication, and
|
|
performance.
|
|
Ideally mergerfs would be able to disable caching for the files it
|
|
reads/writes but allow page caching for itself.
|
|
That would limit the FUSE overhead.
|
|
However, there isn\[aq]t a good way to achieve this.
|
|
It would need to open all files with O_DIRECT which places limitations
|
|
on what underlying filesystems would be supported and complicates
|
|
the code.
|
|
.PP
|
|
kernel documentation:
|
|
https://www.kernel.org/doc/Documentation/filesystems/fuse\-io.txt
|
|
.SS entry & attribute caching
|
|
.PP
|
|
Given the relatively high cost of FUSE due to the kernel <\-> userspace
|
|
round trips there are kernel side caches for file entries and
|
|
attributes.
|
|
The entry cache limits the \f[C]lookup\f[] calls to mergerfs which ask
|
|
if a file exists.
|
|
The attribute cache limits the need to make \f[C]getattr\f[] calls to
|
|
mergerfs which provide file attributes (mode, size, type, etc.).
|
|
As with the page cache these should not be used if the underlying
|
|
filesystems are being manipulated at the same time as it could lead to
|
|
odd behavior or data corruption.
|
|
The options for setting these are \f[C]cache.entry\f[] and
|
|
\f[C]cache.negative_entry\f[] for the entry cache and
|
|
\f[C]cache.attr\f[] for the attributes cache.
|
|
\f[C]cache.negative_entry\f[] refers to the timeout for negative
|
|
responses to lookups (non\-existent files).
|
|
.SS policy caching
|
|
.PP
|
|
Policies are run every time a function (with a policy as mentioned
|
|
above) is called.
|
|
These policies can be expensive depending on mergerfs\[aq] setup and
|
|
client usage patterns.
|
|
Generally we wouldn\[aq]t want to cache policy results because it may
|
|
result in stale responses if the underlying drives are used directly.
|
|
.PP
|
|
The \f[C]open\f[] policy cache will cache the result of an \f[C]open\f[]
|
|
policy for a particular input for \f[C]cache.open\f[] seconds or until
|
|
the file is unlinked.
|
|
Each file close (release) will randomly choose to clean up the cache of
|
|
expired entries.
|
|
.PP
|
|
This cache is really only useful in cases where you have a large number
|
|
of branches and \f[C]open\f[] is called on the same files repeatedly
|
|
(like \f[B]Transmission\f[] which opens and closes a file on every
|
|
read/write presumably to keep file handle usage low).
|
|
.SS statfs caching
|
|
.PP
|
|
Of the syscalls used by mergerfs in policies the \f[C]statfs\f[] /
|
|
\f[C]statvfs\f[] call is perhaps the most expensive.
|
|
It\[aq]s used to find out the available space of a drive and whether it
|
|
is mounted read\-only.
|
|
Depending on the setup and usage pattern these queries can be relatively
|
|
costly.
|
|
When \f[C]cache.statfs\f[] is enabled all calls to \f[C]statfs\f[] by a
|
|
policy will be cached for the number of seconds it is set to.
|
|
.PP
|
|
Example: If the create policy is \f[C]mfs\f[] and the timeout is 60 then
|
|
for that 60 seconds the same drive will be returned as the target for
|
|
creates because the available space won\[aq]t be updated for that time.
|
|
.SS symlink caching
|
|
.PP
|
|
As of version 4.20 Linux supports symlink caching.
|
|
Significant performance increases can be had in workloads which use a
|
|
lot of symlinks.
|
|
Setting \f[C]cache.symlinks=true\f[] will result in requesting symlink
|
|
caching from the kernel only if supported.
|
|
As a result it\[aq]s safe to enable it on systems prior to 4.20.
|
|
That said it is disabled by default for now.
|
|
You can see if caching is enabled by querying the xattr
|
|
\f[C]user.mergerfs.cache.symlinks\f[] but given it must be requested at
|
|
startup you can not change it at runtime.
|
|
.SS readdir caching
|
|
.PP
|
|
As of version 4.20 Linux supports readdir caching.
|
|
This can have a significant impact on directory traversal.
|
|
Especially when combined with entry (\f[C]cache.entry\f[]) and attribute
|
|
(\f[C]cache.attr\f[]) caching.
|
|
Setting \f[C]cache.readdir=true\f[] will result in requesting readdir
|
|
caching from the kernel on each \f[C]opendir\f[].
|
|
If the kernel doesn\[aq]t support readdir caching setting the option to
|
|
\f[C]true\f[] has no effect.
|
|
This option is configurable at runtime via xattr
|
|
\f[C]user.mergerfs.cache.readdir\f[].
|
|
.SS writeback caching
|
|
.PP
|
|
writeback caching is a technique for improving write speeds by batching
|
|
writes at a faster device and then bulk writing to the slower device.
|
|
With FUSE the kernel will wait for a number of writes to be made and
|
|
then send it to the filesystem as one request.
|
|
mergerfs currently uses a modified and vendored libfuse 2.9.7 which does
|
|
not support writeback caching.
|
|
Adding said feature should not be difficult but benchmarking needs to be
|
|
done to see what effect it will have.
|
|
.SS tiered caching
|
|
.PP
|
|
Some storage technologies support what some call "tiered" caching.
|
|
The placing of usually smaller, faster storage as a transparent cache to
|
|
larger, slower storage.
|
|
NVMe, SSD, Optane in front of traditional HDDs for instance.
|
|
.PP
|
|
mergerfs does not natively support any sort of tiered caching.
|
|
Most users have no use for such a feature and its inclusion would
|
|
complicate the code.
|
|
However, there are a few situations where a cache drive could help with
|
|
a typical mergerfs setup.
|
|
.IP "1." 3
|
|
Fast network, slow drives, many readers: You\[aq]ve a 10+Gbps network
|
|
with many readers and your regular drives can\[aq]t keep up.
|
|
.IP "2." 3
|
|
Fast network, slow drives, small\[aq]ish bursty writes: You have a
|
|
10+Gbps network and wish to transfer amounts of data less than your
|
|
cache drive but wish to do so quickly.
|
|
.PP
|
|
With #1 it\[aq]s arguable whether you should be using mergerfs at all.
|
|
RAID would probably be the better solution.
|
|
If you\[aq]re going to use mergerfs there are other tactics that may
|
|
help: spreading the data across drives (see the mergerfs.dup tool) and
|
|
setting \f[C]func.open=rand\f[], using \f[C]symlinkify\f[], or using
|
|
dm\-cache or a similar technology to add tiered cache to the underlying
|
|
device.
|
|
.PP
|
|
With #2 one could use dm\-cache as well but there is another solution
|
|
which requires only mergerfs and a cronjob.
|
|
.IP "1." 3
|
|
Create 2 mergerfs pools.
|
|
One which includes just the slow drives and one which has both the fast
|
|
drives (SSD, NVMe, etc.) and slow drives.
|
|
.IP "2." 3
|
|
The \[aq]cache\[aq] pool should have the cache drives listed first.
|
|
.IP "3." 3
|
|
The best \f[C]create\f[] policies to use for the \[aq]cache\[aq] pool
|
|
would probably be \f[C]ff\f[], \f[C]epff\f[], \f[C]lfs\f[], or
|
|
\f[C]eplfs\f[].
|
|
The latter two under the assumption that the cache drive(s) are far
|
|
smaller than the backing drives.
|
|
If using path preserving policies remember that you\[aq]ll need to
|
|
manually create the core directories of those paths you wish to be
|
|
cached.
|
|
Be sure the permissions are in sync.
|
|
Use \f[C]mergerfs.fsck\f[] to check / correct them.
|
|
You could also tag the slow drives as \f[C]=NC\f[] though that\[aq]d
|
|
mean if the cache drives fill you\[aq]d get "out of space" errors.
|
|
.IP "4." 3
|
|
Enable \f[C]moveonenospc\f[] and set \f[C]minfreespace\f[]
|
|
appropriately.
|
|
Perhaps setting \f[C]minfreespace\f[] to the size of the largest cache
|
|
drive.
|
|
.IP "5." 3
|
|
Set your programs to use the cache pool.
|
|
.IP "6." 3
|
|
Save one of the below scripts or create your own.
|
|
.IP "7." 3
|
|
Use \f[C]cron\f[] (as root) to schedule the command at whatever
|
|
frequency is appropriate for your workflow.
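.PP
A possible \f[C]/etc/fstab\f[] layout for steps 1 through 4 is sketched
below.
The mount points, drive names, and the \f[C]minfreespace\f[] value are
assumptions chosen for illustration; adjust them to the actual system
and treat the option list as a starting point rather than a
recommendation.
.IP
.nf
\f[C]
```text
# slow drives only: the <backing-pool> given to the mover script
/mnt/hdd1:/mnt/hdd2  /mnt/backing  fuse.mergerfs  allow_other,use_ino  0 0

# cache drive listed first: the pool that applications should use
/mnt/ssd:/mnt/hdd1:/mnt/hdd2  /mnt/cache  fuse.mergerfs  allow_other,use_ino,category.create=ff,moveonenospc=true,minfreespace=100G  0 0
```
\f[]
.fi
.PP
With this layout the mover script would be given \f[C]/mnt/ssd\f[] as
the cache drive and \f[C]/mnt/backing\f[] as the backing pool.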
|
|
.SS time based expiring
|
|
.PP
|
|
Move files from cache to backing pool based only on the last time the
|
|
file was accessed.
|
|
Replace \f[C]\-atime\f[] with \f[C]\-amin\f[] if you want minutes rather
|
|
than days.
|
|
You may want to use the \f[C]fadvise\f[] / \f[C]\-\-drop\-cache\f[] version
|
|
of rsync or run rsync with the tool "nocache".
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
#!/bin/bash
|
|
|
|
if\ [\ $#\ !=\ 3\ ];\ then
|
|
\ \ echo\ "usage:\ $0\ <cache\-drive>\ <backing\-pool>\ <days\-old>"
|
|
\ \ exit\ 1
|
|
fi
|
|
|
|
CACHE="${1}"
|
|
BACKING="${2}"
|
|
N=${3}
|
|
|
|
find\ "${CACHE}"\ \-type\ f\ \-atime\ +${N}\ \-printf\ \[aq]%P\\n\[aq]\ |\ \\
|
|
\ \ rsync\ \-\-files\-from=\-\ \-axqHAXWES\ \-\-preallocate\ \-\-remove\-source\-files\ "${CACHE}/"\ "${BACKING}/"
|
|
\f[]
|
|
.fi
|
|
.SS percentage full expiring
|
|
.PP
|
|
Move the oldest file from the cache to the backing pool.
|
|
Continue till below percentage threshold.
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
#!/bin/bash
|
|
|
|
if\ [\ $#\ !=\ 3\ ];\ then
|
|
\ \ echo\ "usage:\ $0\ <cache\-drive>\ <backing\-pool>\ <percentage>"
|
|
\ \ exit\ 1
|
|
fi
|
|
|
|
CACHE="${1}"
|
|
BACKING="${2}"
|
|
PERCENTAGE=${3}
|
|
|
|
set\ \-o\ errexit
|
|
while\ [\ $(df\ \-\-output=pcent\ "${CACHE}"\ |\ grep\ \-v\ Use\ |\ cut\ \-d\[aq]%\[aq]\ \-f1)\ \-gt\ ${PERCENTAGE}\ ]
|
|
do
|
|
\ \ \ \ FILE=$(find\ "${CACHE}"\ \-type\ f\ \-printf\ \[aq]%A\@\ %P\\n\[aq]\ |\ \\
|
|
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ sort\ |\ \\
|
|
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ head\ \-n\ 1\ |\ \\
|
|
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ cut\ \-d\[aq]\ \[aq]\ \-f2\-)
|
|
\ \ \ \ test\ \-n\ "${FILE}"
|
|
\ \ \ \ rsync\ \-axqHAXWES\ \-\-preallocate\ \-\-remove\-source\-files\ "${CACHE}/./${FILE}"\ "${BACKING}/"
|
|
done
|
|
\f[]
|
|
.fi
|
|
.SH TIPS / NOTES
|
|
.IP \[bu] 2
|
|
\f[B]use_ino\f[] will only work when used with mergerfs 2.18.0 and
|
|
above.
|
|
.IP \[bu] 2
|
|
Run mergerfs as \f[C]root\f[] (with \f[B]allow_other\f[]) unless
|
|
you\[aq]re merging paths which are owned by the same user otherwise
|
|
strange permission issues may arise.
|
|
.IP \[bu] 2
|
|
https://github.com/trapexit/backup\-and\-recovery\-howtos : A set of
|
|
guides / howtos on creating a data storage system, backing it up,
|
|
maintaining it, and recovering from failure.
|
|
.IP \[bu] 2
|
|
If you don\[aq]t see some directories and files you expect in a merged
|
|
point or policies seem to skip drives be sure the user has permission to
|
|
all the underlying directories.
|
|
Use \f[C]mergerfs.fsck\f[] to audit the drive for out of sync
|
|
permissions.
|
|
.IP \[bu] 2
|
|
Do \f[B]not\f[] use \f[C]cache.files=off\f[] or \f[C]direct_io\f[] if
|
|
you expect applications (such as rtorrent) to
|
|
mmap (http://linux.die.net/man/2/mmap) files.
|
|
Shared mmap is not currently supported in FUSE w/ \f[C]direct_io\f[]
|
|
enabled.
|
|
Enabling \f[C]dropcacheonclose\f[] is recommended when
|
|
\f[C]cache.files=partial|full|auto\-full\f[] or
|
|
\f[C]direct_io=false\f[].
|
|
.IP \[bu] 2
|
|
Since POSIX functions give only a singular error or success it\[aq]s
|
|
difficult to determine the proper behavior when applying the function to
|
|
multiple targets.
|
|
\f[B]mergerfs\f[] will return an error only if all attempts of an action
|
|
fail.
|
|
Any success will lead to a success returned.
|
|
This means however that some odd situations may arise.
|
|
.IP \[bu] 2
|
|
Kodi (http://kodi.tv), Plex (http://plex.tv),
|
|
Subsonic (http://subsonic.org), etc.
|
|
can use directory mtime (http://linux.die.net/man/2/stat) to more
|
|
efficiently determine whether to scan for new content rather than simply
|
|
performing a full scan.
|
|
If using the default \f[B]getattr\f[] policy of \f[B]ff\f[] it\[aq]s possible
|
|
those programs will miss an update on account of it returning the first
|
|
directory found\[aq]s \f[B]stat\f[] info and it\[aq]s a later directory on
|
|
another mount which had the \f[B]mtime\f[] recently updated.
|
|
To fix this you will want to set \f[B]func.getattr=newest\f[].
|
|
Remember though that this is just \f[B]stat\f[].
|
|
If the file is later \f[B]open\f[]\[aq]ed or \f[B]unlink\f[]\[aq]ed and
|
|
the policy is different for those then a completely different file or
|
|
directory could be acted on.
|
|
.IP \[bu] 2
|
|
Some policies mixed with some functions may result in strange behaviors.
|
|
Not that some of these behaviors and race conditions couldn\[aq]t happen
|
|
outside \f[B]mergerfs\f[] but that they are far more likely to occur on
|
|
account of the attempt to merge together multiple sources of data which
|
|
could be out of sync due to the different policies.
|
|
.IP \[bu] 2
|
|
For consistency it\[aq]s generally best to set \f[B]category\f[] wide
|
|
policies rather than individual \f[B]func\f[]\[aq]s.
|
|
This will help limit the confusion of tools such as
|
|
rsync (http://linux.die.net/man/1/rsync).
|
|
However, the flexibility is there if needed.
|
|
.SH KNOWN ISSUES / BUGS
|
|
.SS directory mtime is not being updated
|
|
.PP
|
|
Remember that the default policy for \f[C]getattr\f[] is \f[C]ff\f[].
|
|
The information for the first directory found will be returned.
|
|
If it wasn\[aq]t the directory which had been updated then it will
|
|
appear outdated.
|
|
.PP
|
|
The reason this is the default is because any other policy would be more
|
|
expensive and for many applications it is unnecessary.
|
|
To always return the directory with the most recent mtime or a faked
|
|
value based on all found would require a scan of all drives.
|
|
.PP
|
|
If you always want the directory information from the one with the most
|
|
recent mtime then use the \f[C]newest\f[] policy for \f[C]getattr\f[].
|
|
.SS \f[C]mv\ /mnt/pool/foo\ /mnt/disk1/foo\f[] removes \f[C]foo\f[]
|
|
.PP
|
|
This is not a bug.
|
|
.PP
|
|
Run in verbose mode to better understand what\[aq]s happening:
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
$\ mv\ \-v\ /mnt/pool/foo\ /mnt/disk1/foo
|
|
copied\ \[aq]/mnt/pool/foo\[aq]\ \->\ \[aq]/mnt/disk1/foo\[aq]
|
|
removed\ \[aq]/mnt/pool/foo\[aq]
|
|
$\ ls\ /mnt/pool/foo
|
|
ls:\ cannot\ access\ \[aq]/mnt/pool/foo\[aq]:\ No\ such\ file\ or\ directory
|
|
\f[]
|
|
.fi
|
|
.PP
|
|
\f[C]mv\f[], when working across devices, is copying the source to
|
|
target and then removing the source.
|
|
Since the source \f[B]is\f[] the target in this case, depending on the
|
|
unlink policy, it will remove the just copied file and other files
|
|
across the branches.
|
|
.PP
|
|
If you want to move files to one drive just copy them there and use
|
|
mergerfs.dedup to clean up the old paths or manually remove them from
|
|
the branches directly.
|
|
.SS cached memory appears greater than it should be
|
|
.PP
|
|
Use \f[C]cache.files=off\f[] or \f[C]direct_io=true\f[].
|
|
See the section on page caching.
|
|
.SS NFS clients returning ESTALE / Stale file handle
|
|
.PP
|
|
Be sure to use \f[C]noforget\f[] and \f[C]use_ino\f[] arguments.
|
|
.SS NFS clients don\[aq]t work
|
|
.PP
|
|
Some NFS clients appear to fail when a mergerfs mount is exported.
|
|
Kodi in particular seems to have issues.
|
|
.PP
|
|
Try enabling the \f[C]use_ino\f[] option.
|
|
Some have reported that it fixes the issue.
|
|
.SS rtorrent fails with ENODEV (No such device)
|
|
.PP
|
|
Be sure to set \f[C]cache.files=partial|full|auto\-full\f[] or turn off
|
|
\f[C]direct_io\f[].
|
|
rtorrent and some other applications use
|
|
mmap (http://linux.die.net/man/2/mmap) to read and write to files and
|
|
offer no fallback to traditional methods.
|
|
FUSE does not currently support mmap while using \f[C]direct_io\f[].
|
|
There may be a performance penalty on writes with \f[C]direct_io\f[] off
|
|
as well as the problem of double caching but it\[aq]s the only way to
|
|
get such applications to work.
|
|
If the performance loss is too high for other apps you can mount
|
|
mergerfs twice.
|
|
Once with \f[C]direct_io\f[] enabled and once without it.
|
|
Be sure to set \f[C]dropcacheonclose=true\f[] if not using
|
|
\f[C]direct_io\f[].
|
|
.SS rtorrent fails with files >= 4GiB
|
|
.PP
|
|
This is a kernel bug with mmap and FUSE on 32bit platforms.
|
|
A fix should become available for all LTS releases.
|
|
.PP
|
|
https://marc.info/?l=linux\-fsdevel&m=155550785230874&w=2
|
|
.SS Plex doesn\[aq]t work with mergerfs
|
|
.PP
|
|
It does.
|
|
If you\[aq]re trying to put Plex\[aq]s config / metadata on mergerfs you
|
|
have to leave \f[C]direct_io\f[] off because Plex is using sqlite which
|
|
apparently needs mmap.
|
|
mmap doesn\[aq]t work with \f[C]direct_io\f[].
|
|
To fix this place the data elsewhere or disable \f[C]direct_io\f[] (with
|
|
\f[C]dropcacheonclose=true\f[]).
|
|
.PP
|
|
If the issue is that scanning doesn\[aq]t seem to pick up media then be
|
|
sure to set \f[C]func.getattr=newest\f[] as mentioned above.
|
|
.SS mmap performance is really bad
|
|
.PP
|
|
There is a bug (https://lkml.org/lkml/2016/3/16/260) in caching which
|
|
affects overall performance of mmap through FUSE in Linux 4.x kernels.
|
|
It is fixed in 4.4.10 and 4.5.4 (https://lkml.org/lkml/2016/5/11/59).
|
|
.SS When a program tries to move or rename a file it fails
|
|
.PP
|
|
Please read the section above regarding rename & link.
|
|
.PP
|
|
The problem is that many applications do not properly handle
|
|
\f[C]EXDEV\f[] errors which \f[C]rename\f[] and \f[C]link\f[] may return
|
|
even though they are perfectly valid situations which do not indicate
|
|
actual drive or OS errors.
|
|
The error will only be returned by mergerfs if using a path preserving
|
|
policy as described in the policy section above.
|
|
If you do not care about path preservation simply change the mergerfs
|
|
policy to the non\-path preserving version.
|
|
For example: \f[C]\-o\ category.create=mfs\f[]
|
|
.PP
|
|
Ideally the offending software would be fixed and it is recommended that
|
|
if you run into this problem you contact the software\[aq]s author and
|
|
request proper handling of \f[C]EXDEV\f[] errors.
|
|
.SS Samba: Moving files / directories fails
|
|
.PP
|
|
Workaround: Copy the file/directory and then remove the original rather
|
|
than move.
|
|
.PP
|
|
This isn\[aq]t an issue with Samba but some SMB clients.
|
|
GVFS\-fuse v1.20.3 and prior (found in Ubuntu 14.04 among others) failed
|
|
to handle certain error codes correctly.
|
|
Particularly \f[B]STATUS_NOT_SAME_DEVICE\f[] which comes from the
|
|
\f[B]EXDEV\f[] which is returned by \f[B]rename\f[] when the call is
|
|
crossing mount points.
|
|
When a program gets an \f[B]EXDEV\f[] it needs to explicitly take an
|
|
alternate action to accomplish its goal.
|
|
In the case of \f[B]mv\f[] or similar it tries \f[B]rename\f[] and on
|
|
\f[B]EXDEV\f[] falls back to a manual copying of data between the two
|
|
locations and unlinking the source.
|
|
In these older versions of GVFS\-fuse if it received \f[B]EXDEV\f[] it
|
|
would translate that into \f[B]EIO\f[].
|
|
This would cause \f[B]mv\f[] or most any application attempting to move
|
|
files around on that SMB share to fail with an I/O error.
|
|
.PP
|
|
GVFS\-fuse v1.22.0 (https://bugzilla.gnome.org/show_bug.cgi?id=734568)
|
|
and above fixed this issue but a large number of systems use the older
|
|
release.
|
|
On Ubuntu the version can be checked by issuing
|
|
\f[C]apt\-cache\ showpkg\ gvfs\-fuse\f[].
|
|
Most distros released in 2015 seem to have the updated release and will
|
|
work fine but older systems may not.
|
|
Upgrading gvfs\-fuse or the distro in general will address the problem.
|
|
.PP
|
|
In Apple\[aq]s MacOSX 10.9 they replaced Samba (client and server) with
|
|
their own product.
|
|
It appears their new client does not handle \f[B]EXDEV\f[] either and
|
|
responds similarly to older releases of gvfs on Linux.
|
|
.SS Trashing files occasionally fails
|
|
.PP
|
|
This is the same issue as with Samba.
|
|
\f[C]rename\f[] returns \f[C]EXDEV\f[] (in our case that will really
|
|
only happen with path preserving policies like \f[C]epmfs\f[]) and the
|
|
software doesn\[aq]t handle the situation well.
|
|
This is unfortunately a common failure of software which moves files
|
|
around.
|
|
The FreeDesktop.org Trash specification indicates that an implementation
\f[C]MAY\f[] choose to support trashing of files outside the user\[aq]s
home directory (support for home directory trashing is a \f[C]MUST\f[]).
|
|
The implementation \f[C]MAY\f[] also support "top directory trashes"
|
|
which many probably do.
|
|
.PP
|
|
To create a \f[C]$topdir/.Trash\f[] directory as defined in the standard
|
|
use the mergerfs\-tools (https://github.com/trapexit/mergerfs-tools)
|
|
tool \f[C]mergerfs.mktrash\f[].
|
|
.SS tar: Directory renamed before its status could be extracted
|
|
.PP
|
|
Make sure to use the \f[C]use_ino\f[] option.
|
|
.SS Supplemental user groups
|
|
.PP
|
|
Due to the overhead of
|
|
getgroups/setgroups (http://linux.die.net/man/2/setgroups) mergerfs
|
|
utilizes a cache.
|
|
This cache is opportunistic and per thread.
|
|
Each thread will query the supplemental groups for a user when that
|
|
particular thread needs to change credentials and will keep that data
|
|
for the lifetime of the thread.
|
|
This means that if a user is added to a group it may not be picked up
|
|
without restarting mergerfs.
|
|
However, since the high level FUSE API\[aq]s (at least the standard
|
|
version) thread pool dynamically grows and shrinks it\[aq]s possible
|
|
that over time a thread will be killed and later a new thread with no
|
|
cache will start and query the new data.
|
|
.PP
|
|
The gid cache uses fixed storage to simplify the design and be
|
|
compatible with older systems which may not have C++11 compilers.
|
|
There is enough storage for 256 users\[aq] supplemental groups.
|
|
Each user is allowed up to 32 supplemental groups.
|
|
Linux >= 2.6.3 allows up to 65535 groups per user but most other *nixes
|
|
allow far fewer.
|
|
NFS, for example, allows only 16.
|
|
The system does handle overflow gracefully.
|
|
If the user has more than 32 supplemental groups only the first 32 will
|
|
be used.
|
|
If more than 256 users are using the system when an uncached user is
|
|
found it will evict an existing user\[aq]s cache at random.
|
|
So long as there aren\[aq]t more than 256 active users this should be
|
|
fine.
|
|
If either value is too low for your needs you will have to modify
|
|
\f[C]gidcache.hpp\f[] to increase the values.
|
|
Note that doing so will increase the memory needed by each thread.
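.PP
The limits above can be pictured with a small Python sketch of a fixed size cache: at most 256 users, at most 32 groups each, and random eviction when full.
The class \f[C]GidCache\f[] and its \f[C]lookup\f[] callable are hypothetical names used for illustration; this is not the code in \f[C]gidcache.hpp\f[].

```python
import random

MAX_USERS = 256   # users whose groups can be cached (per the text above)
MAX_GROUPS = 32   # supplemental groups kept per user


class GidCache:
    """Hypothetical per-thread cache of supplemental groups."""

    def __init__(self, lookup):
        self._lookup = lookup  # callable: uid -> list of gids
        self._cache = {}

    def groups(self, uid):
        if uid not in self._cache:
            if len(self._cache) >= MAX_USERS:
                # Cache is full: evict one existing user at random.
                victim = random.choice(list(self._cache))
                del self._cache[victim]
            # Only the first MAX_GROUPS groups are kept.
            self._cache[uid] = tuple(self._lookup(uid)[:MAX_GROUPS])
        return self._cache[uid]

    def __len__(self):
        return len(self._cache)
```

.PP
Because an entry is only filled on a miss, a group added to a user after the first lookup is not seen until that entry is evicted or the cache (here, the thread) goes away.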
|
|
.SS mergerfs or libfuse crashing
|
|
.PP
|
|
\f[B]NOTE:\f[] as of mergerfs 2.22.0 it includes the most recent version
|
|
of libfuse (or requires libfuse\-2.9.7) so any crash should be reported.
|
|
For older releases continue reading...
|
|
.PP
|
|
If suddenly the mergerfs mount point disappears and
|
|
\f[C]Transport\ endpoint\ is\ not\ connected\f[] is returned when
|
|
attempting to perform actions within the mount directory \f[B]and\f[]
|
|
the version of libfuse (use \f[C]mergerfs\ \-v\f[] to find the version)
|
|
is older than \f[C]2.9.4\f[] it\[aq]s likely due to a bug in libfuse.
|
|
Affected versions of libfuse can be found in Debian Wheezy, Ubuntu
|
|
Precise and others.
|
|
.PP
|
|
In order to fix this please install newer versions of libfuse.
|
|
If using a Debian based distro (Debian, Ubuntu, Mint) you can likely just
|
|
install newer versions of
|
|
libfuse (https://packages.debian.org/unstable/libfuse2) and
|
|
fuse (https://packages.debian.org/unstable/fuse) from the repo of a
|
|
newer release.
|
|
.SS mergerfs appears to be crashing or exiting
|
|
.PP
|
|
There seems to be an issue with Linux version \f[C]4.9.0\f[] and above
|
|
in which an invalid message appears to be transmitted to libfuse (used
|
|
by mergerfs) causing it to exit.
|
|
No messages will be printed in any logs as it\[aq]s not a proper crash.
|
|
Debugging of the issue is still ongoing and can be followed via the
|
|
fuse\-devel
|
|
thread (https://sourceforge.net/p/fuse/mailman/message/35662577).
|
|
.SS mergerfs under heavy load and memory pressure leads to kernel panic
|
|
.PP
|
|
https://lkml.org/lkml/2016/9/14/527
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
[25192.515454]\ kernel\ BUG\ at\ /build/linux\-a2WvEb/linux\-4.4.0/mm/workingset.c:346!
|
|
[25192.517521]\ invalid\ opcode:\ 0000\ [#1]\ SMP
|
|
[25192.519602]\ Modules\ linked\ in:\ netconsole\ ip6t_REJECT\ nf_reject_ipv6\ ipt_REJECT\ nf_reject_ipv4\ configfs\ binfmt_misc\ veth\ bridge\ stp\ llc\ nf_conntrack_ipv6\ nf_defrag_ipv6\ xt_conntrack\ ip6table_filter\ ip6_tables\ xt_multiport\ iptable_filter\ ipt_MASQUERADE\ nf_nat_masquerade_ipv4\ xt_comment\ xt_nat\ iptable_nat\ nf_conntrack_ipv4\ nf_defrag_ipv4\ nf_nat_ipv4\ nf_nat\ nf_conntrack\ xt_CHECKSUM\ xt_tcpudp\ iptable_mangle\ ip_tables\ x_tables\ intel_rapl\ x86_pkg_temp_thermal\ intel_powerclamp\ eeepc_wmi\ asus_wmi\ coretemp\ sparse_keymap\ kvm_intel\ ppdev\ kvm\ irqbypass\ mei_me\ 8250_fintek\ input_leds\ serio_raw\ parport_pc\ tpm_infineon\ mei\ shpchp\ mac_hid\ parport\ lpc_ich\ autofs4\ drbg\ ansi_cprng\ dm_crypt\ algif_skcipher\ af_alg\ btrfs\ raid456\ async_raid6_recov\ async_memcpy\ async_pq\ async_xor\ async_tx\ xor\ raid6_pq\ libcrc32c\ raid0\ multipath\ linear\ raid10\ raid1\ i915\ crct10dif_pclmul\ crc32_pclmul\ aesni_intel\ i2c_algo_bit\ aes_x86_64\ drm_kms_helper\ lrw\ gf128mul\ glue_helper\ ablk_helper\ syscopyarea\ cryptd\ sysfillrect\ sysimgblt\ fb_sys_fops\ drm\ ahci\ r8169\ libahci\ mii\ wmi\ fjes\ video\ [last\ unloaded:\ netconsole]
|
|
[25192.540910]\ CPU:\ 2\ PID:\ 63\ Comm:\ kswapd0\ Not\ tainted\ 4.4.0\-36\-generic\ #55\-Ubuntu
|
|
[25192.543411]\ Hardware\ name:\ System\ manufacturer\ System\ Product\ Name/P8H67\-M\ PRO,\ BIOS\ 3904\ 04/27/2013
|
|
[25192.545840]\ task:\ ffff88040cae6040\ ti:\ ffff880407488000\ task.ti:\ ffff880407488000
|
|
[25192.548277]\ RIP:\ 0010:[<ffffffff811ba501>]\ \ [<ffffffff811ba501>]\ shadow_lru_isolate+0x181/0x190
|
|
[25192.550706]\ RSP:\ 0018:ffff88040748bbe0\ \ EFLAGS:\ 00010002
|
|
[25192.553127]\ RAX:\ 0000000000001c81\ RBX:\ ffff8802f91ee928\ RCX:\ ffff8802f91eeb38
|
|
[25192.555544]\ RDX:\ ffff8802f91ee938\ RSI:\ ffff8802f91ee928\ RDI:\ ffff8804099ba2c0
|
|
[25192.557914]\ RBP:\ ffff88040748bc08\ R08:\ 000000000001a7b6\ R09:\ 000000000000003f
|
|
[25192.560237]\ R10:\ 000000000001a750\ R11:\ 0000000000000000\ R12:\ ffff8804099ba2c0
|
|
[25192.562512]\ R13:\ ffff8803157e9680\ R14:\ ffff8803157e9668\ R15:\ ffff8804099ba2c8
|
|
[25192.564724]\ FS:\ \ 0000000000000000(0000)\ GS:ffff88041f280000(0000)\ knlGS:0000000000000000
|
|
[25192.566990]\ CS:\ \ 0010\ DS:\ 0000\ ES:\ 0000\ CR0:\ 0000000080050033
|
|
[25192.569201]\ CR2:\ 00007ffabb690000\ CR3:\ 0000000001e0a000\ CR4:\ 00000000000406e0
|
|
[25192.571419]\ Stack:
|
|
[25192.573550]\ \ ffff8804099ba2c0\ ffff88039e4f86f0\ ffff8802f91ee928\ ffff8804099ba2c8
|
|
[25192.575695]\ \ ffff88040748bd08\ ffff88040748bc58\ ffffffff811b99bf\ 0000000000000052
|
|
[25192.577814]\ \ 0000000000000000\ ffffffff811ba380\ 000000000000008a\ 0000000000000080
|
|
[25192.579947]\ Call\ Trace:
|
|
[25192.582022]\ \ [<ffffffff811b99bf>]\ __list_lru_walk_one.isra.3+0x8f/0x130
|
|
[25192.584137]\ \ [<ffffffff811ba380>]\ ?\ memcg_drain_all_list_lrus+0x190/0x190
|
|
[25192.586165]\ \ [<ffffffff811b9a83>]\ list_lru_walk_one+0x23/0x30
|
|
[25192.588145]\ \ [<ffffffff811ba544>]\ scan_shadow_nodes+0x34/0x50
|
|
[25192.590074]\ \ [<ffffffff811a0e9d>]\ shrink_slab.part.40+0x1ed/0x3d0
|
|
[25192.591985]\ \ [<ffffffff811a53da>]\ shrink_zone+0x2ca/0x2e0
|
|
[25192.593863]\ \ [<ffffffff811a64ce>]\ kswapd+0x51e/0x990
|
|
[25192.595737]\ \ [<ffffffff811a5fb0>]\ ?\ mem_cgroup_shrink_node_zone+0x1c0/0x1c0
|
|
[25192.597613]\ \ [<ffffffff810a0808>]\ kthread+0xd8/0xf0
|
|
[25192.599495]\ \ [<ffffffff810a0730>]\ ?\ kthread_create_on_node+0x1e0/0x1e0
|
|
[25192.601335]\ \ [<ffffffff8182e34f>]\ ret_from_fork+0x3f/0x70
|
|
[25192.603193]\ \ [<ffffffff810a0730>]\ ?\ kthread_create_on_node+0x1e0/0x1e0
|
|
\f[]
|
|
.fi
|
|
.PP
|
|
There is a bug in the kernel.
|
|
A workaround appears to be turning off \f[C]splice\f[].
|
|
Don\[aq]t add the \f[C]splice_*\f[] arguments or add
|
|
\f[C]no_splice_write,no_splice_move,no_splice_read\f[].
|
|
This, however, is not guaranteed to work.
|
|
.SS rm: fts_read failed: No such file or directory
|
|
.PP
|
|
NOTE: This is only relevant to mergerfs versions at or below v2.25.x and
|
|
should not occur in more recent versions.
|
|
See the notes on \f[C]unlink\f[].
|
|
.PP
|
|
Not \f[I]really\f[] a bug.
|
|
The FUSE library will move files when asked to delete them as a way to
|
|
deal with certain edge cases and then later delete that file when it\[aq]s
|
|
clear the file is no longer needed.
|
|
This however can lead to two issues.
|
|
One is that these hidden files are noticed by \f[C]rm\ \-rf\f[] or
|
|
\f[C]find\f[] when scanning directories and they may try to remove them
|
|
and they might have disappeared already.
|
|
There is nothing \f[I]wrong\f[] about this happening but it can be
|
|
annoying.
|
|
The second issue is that a directory might not be able to be removed on
|
|
account of the hidden file being still there.
|
|
.PP
|
|
Using the \f[B]hard_remove\f[] option will make it so these temporary
|
|
files are not used and files are deleted immediately.
|
|
That has a side effect however.
|
|
Files which are unlinked while they are still in use (in certain forms)
|
|
will result in an error (ENOENT).
|
|
.SH FAQ
|
|
.SS How well does mergerfs scale? Is it "production ready?"
|
|
.PP
|
|
Users have reported running mergerfs on everything from a Raspberry Pi
|
|
to dual socket Xeon systems with >20 cores.
|
|
I\[aq]m aware of at least a few companies which use mergerfs in
|
|
production.
|
|
Open Media Vault (https://www.openmediavault.org) includes mergerfs as
|
|
its sole solution for pooling drives.
|
|
.SS Can mergerfs be used with drives which already have data / are in use?
|
|
.PP
|
|
Yes.
|
|
MergerFS is a proxy and does \f[B]NOT\f[] interfere with the normal form
|
|
or function of the drives / mounts / paths it manages.
|
|
.PP
|
|
MergerFS is \f[B]not\f[] a traditional filesystem.
|
|
MergerFS is \f[B]not\f[] RAID.
|
|
It does \f[B]not\f[] manipulate the data that passes through it.
|
|
It does \f[B]not\f[] shard data across drives.
|
|
It merely shards some \f[B]behavior\f[] and aggregates others.
|
|
.SS Can mergerfs be removed without affecting the data?
|
|
.PP
|
|
See the previous question\[aq]s answer.
|
|
.SS Do hard links work?
|
|
.PP
|
|
Yes.
|
|
You need to use \f[C]use_ino\f[] to support proper reporting of inodes.
|
|
.PP
|
|
What mergerfs does not do is fake hard links across branches.
|
|
Read the section "rename & link" for how it works.
|
|
.SS Does mergerfs support CoW / copy\-on\-write?
|
|
.PP
|
|
Not in the sense of a filesystem like BTRFS or ZFS nor in the overlayfs
|
|
or aufs sense.
|
|
It does offer a
|
|
cow\-shell (http://manpages.ubuntu.com/manpages/bionic/man1/cow-shell.1.html)
|
|
like hard link breaking (copy to temp file then rename over original)
|
|
which can be useful when wanting to save space by hardlinking duplicate
|
|
files while still treating each name as if it were a unique and separate
|
|
file.
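.PP
The "copy to temp file then rename over original" step can be sketched in Python as follows.
The function name \f[C]break_hard_link\f[] is hypothetical; this illustrates the general technique and is not the mergerfs implementation.

```python
import os
import shutil
import tempfile


def break_hard_link(path):
    """Give `path` its own copy of the data if it has other names."""
    if os.stat(path).st_nlink < 2:
        return False  # nothing else links to this inode
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        shutil.copy2(path, tmp)   # copy data and metadata to the temp file
        os.replace(tmp, path)     # atomically rename over the original name
    except BaseException:
        os.unlink(tmp)
        raise
    return True
```

.PP
After the call, writes through \f[C]path\f[] no longer affect the other names that used to share its inode.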
|
|
.SS Why can\[aq]t I see my files / directories?
|
|
.PP
|
|
It\[aq]s almost always a permissions issue.
|
|
Unlike mhddfs, which runs as root and attempts to access content as
|
|
such, mergerfs always changes its credentials to that of the caller.
|
|
This means that if the user does not have access to a file or directory
|
|
then neither will mergerfs.
|
|
However, because mergerfs is creating a union of paths it may be able to
|
|
read some files and directories on one drive but not another resulting
|
|
in an incomplete set.
|
|
.PP
|
|
Whenever you run into a split permission issue (seeing some but not all
|
|
files) try using the
|
|
mergerfs.fsck (https://github.com/trapexit/mergerfs-tools) tool to check
|
|
for and fix the mismatch.
|
|
If you aren\[aq]t seeing anything at all be sure that the basic
|
|
permissions are correct.
|
|
Check that the user and group values are correct and that directories have their
|
|
executable bit set.
|
|
A common mistake by users new to Linux is to \f[C]chmod\ \-R\ 644\f[]
|
|
when they should have used \f[C]chmod\ \-R\ u=rwX,go=rX\f[].
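.PP
The difference matters because mode 644 removes the search (execute) bit from directories, while a capital \f[C]X\f[] grants it only to directories and to files that already have an execute bit.
A rough Python sketch of what \f[C]u=rwX,go=rX\f[] computes is below; the helper names are hypothetical and special bits (setuid, setgid, sticky) are ignored for brevity.

```python
import os
import stat


def rwX_rX(mode):
    """Permission bits that `u=rwX,go=rX` yields for a file with `mode`."""
    # X grants execute/search only to directories and to files that
    # already have at least one execute bit set.
    wants_exec = stat.S_ISDIR(mode) or bool(mode & 0o111)
    return 0o755 if wants_exec else 0o644


def fix_tree(root):
    """Apply rwX_rX to root and everything beneath it (symlinks skipped)."""
    os.chmod(root, rwX_rX(os.stat(root).st_mode))
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                os.chmod(path, rwX_rX(os.lstat(path).st_mode))
```

.PP
Directories end up 755, plain files 644, and files that were already executable stay executable.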
|
|
.PP
|
|
If using a network filesystem such as NFS, SMB, CIFS (Samba) be sure to
|
|
pay close attention to anything regarding permissioning and users.
|
|
Root squashing and user translation, for instance, have bitten a few
|
|
mergerfs users.
|
|
Some of these also affect the use of mergerfs from container platforms
|
|
such as Docker.
|
|
.SS Why is only one drive being used?
|
|
.PP
|
|
Are you using a path preserving policy?
|
|
The default policy for file creation is \f[C]epmfs\f[].
|
|
That means only the drives with the path preexisting will be considered
|
|
when creating a file.
|
|
If you don\[aq]t care about where files and directories are created you
|
|
likely shouldn\[aq]t be using a path preserving policy and should instead use
|
|
something like \f[C]mfs\f[].
|
|
.PP
|
|
This can be especially apparent when filling an empty pool from an
|
|
external source.
|
|
If you do want path preservation you\[aq]ll need to perform the manual
|
|
act of creating paths on the drives you want the data to land on before
|
|
transferring your data.
|
|
Setting \f[C]func.mkdir=epall\f[] can simplify managing path
|
|
preservation for \f[C]create\f[].
|
|
.SS Why was libfuse embedded into mergerfs?
|
|
.IP "1." 3
|
|
A significant number of users use mergerfs on distros with old versions
|
|
of libfuse which have serious bugs.
|
|
Requiring updated versions of libfuse on those distros isn\[aq]t
|
|
practical (no package offered, user inexperience, etc.).
|
|
The only practical way to provide a stable runtime on those systems was
|
|
to "vendor" / embed the library into the project.
|
|
.IP "2." 3
|
|
mergerfs was written to use the high level API.
|
|
There are a number of limitations in the HLAPI that make certain
|
|
features difficult or impossible to implement.
|
|
While some of these features could be patched into newer versions of
|
|
libfuse without breaking the public API some of them would require hacky
|
|
code to provide backwards compatibility.
|
|
While it may still be worth working with upstream to address these
|
|
issues in future versions, since the library needs to be vendored for
|
|
stability and compatibility reasons it is preferable / easier to modify
|
|
the API.
|
|
Longer term the plan is to rewrite mergerfs to use the low level API.
|
|
.SS Why did support for system libfuse get removed?
|
|
.PP
|
|
See above first.
|
|
.PP
|
|
If/when mergerfs is rewritten to use the low\-level API then it\[aq]ll
|
|
be plausible to support system libfuse but until then it\[aq]s simply too much
|
|
work to manage the differences across the versions.
|
|
.SS Why use mergerfs over mhddfs?
|
|
.PP
|
|
mhddfs is no longer maintained and has some known stability and security
|
|
issues (see below).
|
|
MergerFS provides a superset of mhddfs\[aq] features and should offer
|
|
the same or maybe better performance.
|
|
.PP
|
|
Below is an example of mhddfs and mergerfs set up to work similarly.
|
|
.PP
|
|
\f[C]mhddfs\ \-o\ mlimit=4G,allow_other\ /mnt/drive1,/mnt/drive2\ /mnt/pool\f[]
|
|
.PP
|
|
\f[C]mergerfs\ \-o\ minfreespace=4G,allow_other,category.create=ff\ /mnt/drive1:/mnt/drive2\ /mnt/pool\f[]
|
|
.SS Why use mergerfs over aufs?
|
|
.PP
|
|
aufs is mostly abandoned and no longer available in many distros.
|
|
.PP
|
|
While aufs can offer better peak performance mergerfs provides more
|
|
configurability and is generally easier to use.
|
|
mergerfs however does not offer the overlay / copy\-on\-write (CoW)
|
|
features which aufs and overlayfs have.
|
|
.SS Why use mergerfs over unionfs?
|
|
.PP
|
|
UnionFS is more like aufs than mergerfs in that it offers overlay / CoW
|
|
features.
|
|
If you\[aq]re just looking to create a union of drives and want
|
|
flexibility in file/directory placement then mergerfs offers that
|
|
whereas unionfs is more for overlaying RW filesystems over RO ones.
|
|
.SS Why use mergerfs over LVM/ZFS/BTRFS/RAID0 drive concatenation / striping?
|
|
.PP
|
|
With simple JBOD / drive concatenation / striping / RAID0 a single
|
|
drive failure will result in full pool failure.
|
|
mergerfs performs a similar behavior without the possibility of
|
|
catastrophic failure and the difficulties in recovery.
|
|
Drives may fail, however all other data will continue to be accessible.
|
|
.PP
|
|
When combined with something like SnapRaid (http://www.snapraid.it)
|
|
and/or an offsite backup solution you can have the flexibility of JBOD
|
|
without the single point of failure.
|
|
.SS Why use mergerfs over ZFS?
|
|
.PP
|
|
MergerFS is not intended to be a replacement for ZFS.
|
|
MergerFS is intended to provide flexible pooling of arbitrary drives
|
|
(local or remote), of arbitrary sizes, and arbitrary filesystems.
|
|
It is meant for \f[C]write\ once,\ read\ many\f[] usecases such as bulk media
storage, where data integrity and backup are managed in other ways.
|
|
In that situation ZFS can introduce major maintenance and cost burdens as
|
|
described
|
|
here (http://louwrentius.com/the-hidden-cost-of-using-zfs-for-your-home-nas.html).
|
|
.SS Can drives be written to directly? Outside of mergerfs while pooled?
|
|
.PP
|
|
Yes, however it\[aq]s not recommended to use the same file from within the
|
|
pool and from without at the same time.
|
|
Especially if using caching of any kind (cache.files, cache.entry,
|
|
cache.attr, cache.negative_entry, cache.symlinks, cache.readdir, etc.).
|
|
.SS Why do I get an "out of space" / "no space left on device" / ENOSPC error even though there appears to be lots of space available?
|
|
.PP
|
|
First make sure you\[aq]ve read the sections above about policies, path
|
|
preservation, branch filtering, and the options \f[B]minfreespace\f[],
|
|
\f[B]moveonenospc\f[], \f[B]statfs\f[], and \f[B]statfs_ignore\f[].
|
|
.PP
|
|
mergerfs is simply presenting a union of the content within multiple
|
|
branches.
|
|
The reported free space is an aggregate of space available within the
|
|
pool (behavior modified by \f[B]statfs\f[] and \f[B]statfs_ignore\f[]).
|
|
It does not represent a contiguous space, in the same way that
read\-only filesystems, those with quotas, or those with reserved space
report the full theoretical space available.
|
|
.PP
|
|
Due to path preservation, branch tagging, read\-only status, and
|
|
\f[B]minfreespace\f[] settings it is perfectly valid that
|
|
\f[C]ENOSPC\f[] / "out of space" / "no space left on device" be
|
|
returned.
|
|
It is doing what was asked of it: filtering possible branches due to
|
|
those settings.
|
|
Only one error can be returned and if one of the reasons for filtering a
|
|
branch was \f[B]minfreespace\f[] then it will be returned as such.
|
|
\f[B]moveonenospc\f[] is only relevant to writing a file which is too
|
|
large for the drive it\[aq]s currently on.
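.PP
A simplified Python sketch of how filtering can produce \f[C]ENOSPC\f[] even when the pool as a whole has plenty of space.
The \f[C]Branch\f[] type, the function name, and the choice of \f[C]ENOENT\f[] as the fallback error are assumptions made for illustration; the real policy code differs.

```python
import errno
from dataclasses import dataclass


@dataclass
class Branch:            # hypothetical stand-in for one pooled path
    path: str
    free: int            # bytes available
    read_only: bool = False
    has_parent: bool = True   # does the target directory exist here?


def pick_create_branch(branches, minfreespace):
    """Pick the eligible branch with the most free space, epmfs style."""
    hit_minfreespace = False
    eligible = []
    for b in branches:
        if not b.has_parent or b.read_only:
            continue                      # filtered: path or mode
        if b.free < minfreespace:
            hit_minfreespace = True       # filtered: too little space
            continue
        eligible.append(b)
    if eligible:
        return max(eligible, key=lambda b: b.free)
    # Assumption: ENOSPC wins if any branch was dropped for space,
    # otherwise fall back to ENOENT.
    code = errno.ENOSPC if hit_minfreespace else errno.ENOENT
    raise OSError(code, "no eligible branch")
```

.PP
With one branch holding 1 GiB free and the target directory, and another holding 500 GiB free but lacking the directory, a \f[C]minfreespace\f[] of 4 GiB leaves nothing eligible and the caller sees \f[C]ENOSPC\f[].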
|
|
.PP
|
|
It is also possible that the filesystem selected has run out of inodes.
|
|
Use \f[C]df\ \-i\f[] to list the total and available inodes per
|
|
filesystem.
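.PP
The same numbers can be read programmatically.
A small Python sketch using \f[C]os.statvfs\f[] follows; the function name \f[C]inode_usage\f[] is a hypothetical helper.

```python
import os


def inode_usage(path):
    """Return (total, free) inode counts for the filesystem holding path."""
    st = os.statvfs(path)
    return st.f_files, st.f_ffree
```

.PP
Calling it on each branch path shows which underlying filesystem, if any, has run out of inodes.
Some filesystems allocate inodes dynamically and may report zero for both values.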
|
|
.PP
|
|
If you don\[aq]t care about path preservation then simply change the
|
|
\f[C]create\f[] policy to one which isn\[aq]t path preserving.
|
|
\f[C]mfs\f[] is probably what most are looking for.
|
|
The reason it\[aq]s not the default is because it was originally set to
|
|
\f[C]epmfs\f[] and changing it now would change people\[aq]s setup.
|
|
Such a setting change will likely occur in mergerfs 3.
|
|
.SS Can mergerfs mounts be exported over NFS?
|
|
.PP
|
|
Yes.
|
|
Due to current usage of libfuse by mergerfs and how NFS interacts with
|
|
it, it is necessary to add \f[C]noforget\f[] to mergerfs options to keep
|
|
from getting "stale file handle" errors.
|
|
.PP
|
|
Some clients (Kodi) have issues in which the contents of the NFS mount
|
|
will not be presented but users have found that enabling the
|
|
\f[C]use_ino\f[] option often fixes that problem.
|
|
.SS Can mergerfs mounts be exported over Samba / SMB?
|
|
.PP
|
|
Yes.
|
|
While some users have reported problems it appears to always be related
|
|
to how Samba is setup in relation to permissions.
|
|
.SS How are inodes calculated?
|
|
.PP
|
|
mergerfs\-inode = (original\-inode | (device\-id << 32))
|
|
.PP
|
|
While \f[C]ino_t\f[] is 64 bits only a few filesystems use more than 32.
|
|
Similarly, while \f[C]dev_t\f[] is also 64 bits it was traditionally 16
|
|
bits.
|
|
Bitwise or\[aq]ing them together should work most of the time.
|
|
While totally unique inodes are preferred the overhead which would be
|
|
needed does not seem to be outweighed by the benefits.
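.PP
The formula can be written out in Python as a short sketch.
Truncating the result to 64 bits is an assumption made here to mirror a fixed width \f[C]ino_t\f[]; the function name is hypothetical.

```python
MASK64 = (1 << 64) - 1  # assumption: result is truncated to 64 bits


def mergerfs_inode(original_inode, device_id):
    """Combine inode and device id as in the formula above."""
    return (original_inode | (device_id << 32)) & MASK64
```

.PP
An inode that uses more than 32 bits can collide with a different inode on another device, for example inode \f[C](1\ <<\ 32)\ |\ 5\f[] on device 0 and inode 5 on device 1, which is why the text says this works "most of the time".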
|
|
.PP
|
|
While atypical, yes, inodes can be reused and not refer to the same
|
|
file.
|
|
The internal id used to reference a file in FUSE is different from the
|
|
inode value presented.
|
|
The former is the \f[C]nodeid\f[] and is actually a tuple of
|
|
(nodeid,generation).
|
|
That tuple is not user facing.
|
|
The inode is merely metadata passed through the kernel and found using
|
|
the \f[C]stat\f[] family of calls or \f[C]readdir\f[].
|
|
.PP
|
|
From FUSE docs regarding \f[C]use_ino\f[]:
|
|
.IP
|
|
.nf
|
|
\f[C]
|
|
Honor\ the\ st_ino\ field\ in\ the\ functions\ getattr()\ and
|
|
fill_dir().\ This\ value\ is\ used\ to\ fill\ in\ the\ st_ino\ field
|
|
in\ the\ stat(2),\ lstat(2),\ fstat(2)\ functions\ and\ the\ d_ino
|
|
field\ in\ the\ readdir(2)\ function.\ The\ filesystem\ does\ not
|
|
have\ to\ guarantee\ uniqueness,\ however\ some\ applications
|
|
rely\ on\ this\ value\ being\ unique\ for\ the\ whole\ filesystem.
|
|
Note\ that\ this\ does\ *not*\ affect\ the\ inode\ that\ libfuse
|
|
and\ the\ kernel\ use\ internally\ (also\ called\ the\ "nodeid").
|
|
\f[]
|
|
.fi
|
|
.SS I notice massive slowdowns of writes over NFS
|
|
.PP
|
|
Due to how NFS works and interacts with FUSE when not using
|
|
\f[C]cache.files=off\f[] or \f[C]direct_io\f[] it\[aq]s possible that a
|
|
getxattr for \f[C]security.capability\f[] will be issued prior to any
|
|
write.
|
|
This will usually result in a massive slowdown for writes.
|
|
Using \f[C]cache.files=off\f[] or \f[C]direct_io\f[] will keep this from
|
|
happening (and is generally good to enable unless you need the features it
|
|
disables) but the \f[C]security_capability\f[] option can also help by
|
|
short circuiting the call and returning \f[C]ENOATTR\f[].
|
|
.PP
|
|
You could also set \f[C]xattr\f[] to \f[C]noattr\f[] or \f[C]nosys\f[]
|
|
to short circuit or stop all xattr requests.
|
|
.SS What are these .fuse_hidden files?
|
|
.PP
|
|
NOTE: mergerfs >= 2.26.0 will not have these temporary files.
|
|
See the notes on \f[C]unlink\f[].
|
|
.PP
|
|
When not using \f[B]hard_remove\f[] libfuse will create
|
|
\&.fuse_hiddenXXXXXXXX files when an opened file is unlinked.
|
|
This is to simplify "use after unlink" usecases.
|
|
There is a possibility these files end up being picked up by software
|
|
scanning directories and not ignoring hidden files.
|
|
This is rarely a problem but a solution is in the works.
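.PP
The "use after unlink" pattern that these temporary files support can be seen in a short Python sketch, assuming a typical POSIX filesystem.
The function name is hypothetical and the example runs against an ordinary temporary file, not a mergerfs mount.

```python
import os
import tempfile


def read_after_unlink(payload):
    """Write payload, unlink the file while open, then read it back."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w+b") as handle:
        handle.write(payload)
        handle.flush()
        os.unlink(path)                 # the name is gone ...
        gone = not os.path.exists(path)
        handle.seek(0)
        return gone, handle.read()      # ... but the open handle still works
```

.PP
Applications rely on this behavior, so a filesystem that deletes the data at unlink time has to return an error for the later read instead.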
|
|
.PP
|
|
The files are cleaned up once the file is finally closed.
|
|
Only if mergerfs crashes or is killed would they be left around.
|
|
They are safe to remove as they are already unlinked files.
|
|
.SS It\[aq]s mentioned that there are some security issues with mhddfs. What are they? How does mergerfs address them?
|
|
.PP
|
|
mhddfs (https://github.com/trapexit/mhddfs) manages running as
|
|
\f[B]root\f[] by calling
|
|
getuid() (https://github.com/trapexit/mhddfs/blob/cae96e6251dd91e2bdc24800b4a18a74044f6672/src/main.c#L319)
|
|
and if it returns \f[B]0\f[] then it will
|
|
chown (http://linux.die.net/man/1/chown) the file.
|
|
Not only is that a race condition but it doesn\[aq]t handle other
|
|
situations.
|
|
Rather than attempting to simulate POSIX ACL behavior the proper way to
|
|
manage this is to use seteuid (http://linux.die.net/man/2/seteuid) and
|
|
setegid (http://linux.die.net/man/2/setegid), in effect becoming the
|
|
user making the original call, and perform the action as them.
|
|
This is what mergerfs does and why mergerfs should always run as root.
|
|
.PP
|
|
In Linux setreuid syscalls apply only to the thread.
|
|
GLIBC hides this away by using realtime signals to inform all threads to
|
|
change credentials.
|
|
Taking after \f[B]Samba\f[], mergerfs uses
|
|
\f[B]syscall(SYS_setreuid,...)\f[] to set the caller\[aq]s credentials for
|
|
that thread only.
|
|
It jumps back to \f[B]root\f[] as necessary should escalated privileges
|
|
be needed (for instance: to clone paths between drives).
|
|
.PP
|
|
For non\-Linux systems mergerfs uses a read\-write lock and changes
|
|
credentials only when necessary.
|
|
If multiple threads are to be user X then only the first one will need
|
|
to change the process\[aq]s credentials.
|
|
So long as the other threads need to be user X they will take a readlock
|
|
allowing multiple threads to share the credentials.
|
|
Once a request comes in to run as user Y that thread will attempt a
|
|
write lock and change to Y\[aq]s credentials when it can.
|
|
If the ability to give writers priority is supported then that flag will
|
|
be used so threads trying to change credentials don\[aq]t starve.
|
|
This isn\[aq]t the best solution but should work reasonably well
|
|
assuming there are few users.
|
|
.SH PERFORMANCE TWEAKING
|
|
.PP
|
|
NOTE: be sure to read about these features before changing them.
|
|
.IP \[bu] 2
|
|
enable (or disable) \f[C]splice_move\f[], \f[C]splice_read\f[], and
|
|
\f[C]splice_write\f[]
|
|
.IP \[bu] 2
|
|
increase cache timeouts \f[C]cache.attr\f[], \f[C]cache.entry\f[],
|
|
\f[C]cache.negative_entry\f[]
|
|
.IP \[bu] 2
|
|
enable (or disable) page caching (\f[C]cache.files\f[])
|
|
.IP \[bu] 2
|
|
enable \f[C]cache.open\f[]
|
|
.IP \[bu] 2
|
|
enable \f[C]cache.statfs\f[]
|
|
.IP \[bu] 2
|
|
enable \f[C]cache.symlinks\f[]
|
|
.IP \[bu] 2
|
|
enable \f[C]cache.readdir\f[]
|
|
.IP \[bu] 2
|
|
change the number of worker threads
|
|
.IP \[bu] 2
|
|
disable \f[C]security_capability\f[] and/or \f[C]xattr\f[]
|
|
.IP \[bu] 2
|
|
disable \f[C]posix_acl\f[]
|
|
.IP \[bu] 2
|
|
disable \f[C]async_read\f[]
|
|
.IP \[bu] 2
|
|
test theoretical performance using \f[C]nullrw\f[] or mounting a ram
|
|
disk
|
|
.IP \[bu] 2
|
|
use \f[C]symlinkify\f[] if your data is largely static
|
|
.IP \[bu] 2
|
|
use tiered cache drives
|
|
.IP \[bu] 2
|
|
use LVM and LVM cache to place an SSD in front of your HDDs (howto
|
|
coming)
|
|
.SH SUPPORT
|
|
.PP
|
|
Filesystems are very complex and difficult to debug.
|
|
mergerfs, while being just a proxy of sorts, is also very difficult to
|
|
debug given the large number of possible settings it can have itself and
|
|
the massive number of environments it can run in.
|
|
When reporting on a suspected issue \f[B]please, please\f[] include as
|
|
much of the below information as possible otherwise it will be difficult
|
|
or impossible to diagnose.
|
|
Also please make sure to read all of the above documentation as it
|
|
includes nearly every known system or user issue previously encountered.
|
|
.SS Information to include in bug reports
|
|
.IP \[bu] 2
|
|
Version of mergerfs: \f[C]mergerfs\ \-V\f[]
|
|
.IP \[bu] 2
|
|
mergerfs settings: from \f[C]/etc/fstab\f[] or command line execution
|
|
.IP \[bu] 2
|
|
Version of Linux: \f[C]uname\ \-a\f[]
|
|
.IP \[bu] 2
|
|
Versions of any additional software being used
|
|
.IP \[bu] 2
|
|
List of drives, their filesystems, and sizes (before and after issue):
|
|
\f[C]df\ \-h\f[]
|
|
.IP \[bu] 2
|
|
An \f[C]strace\f[] of the app having problems:
|
|
.IP \[bu] 2
|
|
\f[C]strace\ \-f\ \-o\ /tmp/app.strace.txt\ <cmd>\f[]
|
|
.IP \[bu] 2
|
|
An \f[C]strace\f[] of mergerfs while the program is trying to do whatever
|
|
it\[aq]s failing to do:
|
|
.IP \[bu] 2
|
|
\f[C]strace\ \-f\ \-p\ <mergerfsPID>\ \-o\ /tmp/mergerfs.strace.txt\f[]
|
|
.IP \[bu] 2
|
|
\f[B]Precise\f[] directions on replicating the issue.
|
|
Do not leave \f[B]anything\f[] out.
|
|
.IP \[bu] 2
|
|
Try to recreate the problem in the simplest way using standard programs.
|
|
.SS Contact / Issue submission
|
|
.IP \[bu] 2
|
|
github.com: https://github.com/trapexit/mergerfs/issues
|
|
.IP \[bu] 2
|
|
email: trapexit\@spawn.link
|
|
.IP \[bu] 2
|
|
twitter: https://twitter.com/_trapexit
|
|
.IP \[bu] 2
|
|
reddit: https://www.reddit.com/user/trapexit
|
|
.IP \[bu] 2
|
|
discord: https://discord.gg/MpAr69V
|
|
.SS Support development
|
|
.PP
|
|
This software is free to use and released under a very liberal license.
|
|
That said if you like this software and would like to support its
|
|
development donations are welcome.
|
|
.IP \[bu] 2
|
|
PayPal: https://paypal.me/trapexit
|
|
.IP \[bu] 2
|
|
Patreon: https://www.patreon.com/trapexit
|
|
.IP \[bu] 2
|
|
SubscribeStar: https://www.subscribestar.com/trapexit
|
|
.IP \[bu] 2
|
|
Bitcoin (BTC): 12CdMhEPQVmjz3SSynkAEuD5q9JmhTDCZA
|
|
.IP \[bu] 2
|
|
Bitcoin Cash (BCH): 1AjPqZZhu7GVEs6JFPjHmtsvmDL4euzMzp
|
|
.IP \[bu] 2
|
|
Ethereum (ETH): 0x09A166B11fCC127324C7fc5f1B572255b3046E94
|
|
.IP \[bu] 2
|
|
Litecoin (LTC): LXAsq6yc6zYU3EbcqyWtHBrH1Ypx4GjUjm
|
|
.SH LINKS
|
|
.IP \[bu] 2
|
|
https://spawn.link
|
|
.IP \[bu] 2
|
|
https://github.com/trapexit/mergerfs
|
|
.IP \[bu] 2
|
|
https://github.com/trapexit/mergerfs\-tools
|
|
.IP \[bu] 2
|
|
https://github.com/trapexit/scorch
|
|
.IP \[bu] 2
|
|
https://github.com/trapexit/bbf
|
|
.IP \[bu] 2
|
|
https://github.com/trapexit/backup\-and\-recovery\-howtos
|
|
.SH AUTHORS
|
|
Antonio SJ Musumeci <trapexit@spawn.link>.
|