.\"t
.\" Automatically generated by Pandoc 2.9.2.1
.\"
.TH "mergerfs" "1" "" "mergerfs user manual" ""
.hy
.SH NAME
.PP
mergerfs - a featureful union filesystem
.SH SYNOPSIS
.PP
mergerfs -o<options> <branches> <mountpoint>
.SH DESCRIPTION
.PP
\f[B]mergerfs\f[R] is a union filesystem geared towards simplifying
storage and management of files across numerous commodity storage
devices.
It is similar to \f[B]mhddfs\f[R], \f[B]unionfs\f[R], and
\f[B]aufs\f[R].
.SH FEATURES
.IP \[bu] 2
Configurable behaviors / file placement
.IP \[bu] 2
Ability to add or remove filesystems at will
.IP \[bu] 2
Resistance to individual filesystem failure
.IP \[bu] 2
Support for extended attributes (xattrs)
.IP \[bu] 2
Support for file attributes (chattr)
.IP \[bu] 2
Runtime configurable (via xattrs)
.IP \[bu] 2
Works with heterogeneous filesystem types
.IP \[bu] 2
Moving a file when its filesystem runs out of space while writing
.IP \[bu] 2
Ignore read-only filesystems when creating files
.IP \[bu] 2
Turn read-only files into symlinks to underlying file
.IP \[bu] 2
Hard link copy-on-write / CoW
.IP \[bu] 2
Support for POSIX ACLs
.IP \[bu] 2
Misc other things
.SH HOW IT WORKS
.PP
mergerfs logically merges multiple paths together.
Think of it as a union of sets.
The files or directories acted on or presented through mergerfs are
based on the policy chosen for that particular action.
Read more about policies below.
.IP
.nf
\f[C]
A         +      B        =       C
/disk1           /disk2           /merged
|                |                |
+-- /dir1        +-- /dir1        +-- /dir1
|   |            |   |            |   |
|   +-- file1    |   +-- file2    |   +-- file1
|                |   +-- file3    |   +-- file2
+-- /dir2        |                |   +-- file3
|   |            +-- /dir3        |
|   +-- file4        |            +-- /dir2
|                    +-- file5    |   |
+-- file6                         |   +-- file4
                                  |
                                  +-- /dir3
                                  |   |
                                  |   +-- file5
                                  |
                                  +-- file6
\f[R]
.fi
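.PP
The merged listing can be approximated with ordinary shell tools.
The sketch below is only an illustration, not how mergerfs builds
directory listings.
\f[C]union_ls\f[R] is a hypothetical helper that prints the
de-duplicated union of the names found under the same relative
directory on each branch.
Which branch a given name is actually served from is decided by policy,
and this sketch ignores that.
.IP
.nf
\f[C]
union_ls() {
  # $1 = directory relative to each branch, $2... = branch paths
  rel=$1
  shift
  for branch in "$@"; do
    # Branches that lack the directory simply contribute nothing.
    if [ -d "$branch/$rel" ]; then
      ls -A "$branch/$rel"
    fi
  done | sort -u    # union: each name appears once
}
\f[R]
.fi
.PP
For the layout above, \f[C]union_ls dir1 /disk1 /disk2\f[R] would print
\f[C]file1\f[R], \f[C]file2\f[R] and \f[C]file3\f[R] once each.
These are the same names shown under \f[C]/merged/dir1\f[R].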
.PP
mergerfs does \f[B]not\f[R] support the copy-on-write (CoW) or whiteout
behaviors found in \f[B]aufs\f[R] and \f[B]overlayfs\f[R].
You can \f[B]not\f[R] mount a read-only filesystem and write to it.
However, mergerfs will ignore read-only filesystems when creating new
files, so you can mix read-write and read-only filesystems.
It also does \f[B]not\f[R] split data across filesystems.
It is not RAID0 / striping.
It is simply a union of other filesystems.
.SH TERMINOLOGY
.IP \[bu] 2
branch: A base path used in the pool.
.IP \[bu] 2
pool: The mergerfs mount.
The union of the branches.
.IP \[bu] 2
relative path: The path in the pool relative to the branch and mount.
.IP \[bu] 2
function: A filesystem call (open, unlink, create, getattr, rmdir, etc.)
.IP \[bu] 2
category: A collection of functions based on basic behavior (action,
create, search).
.IP \[bu] 2
policy: The algorithm used to select a file when performing a function.
.IP \[bu] 2
path preservation: Aspect of some policies which includes checking the
path for which a file would be created.
.SH BASIC SETUP
.PP
If you don\[cq]t already know that you have a special use case then just
start with one of the following option sets.
.SS You need \f[C]mmap\f[R] (used by rtorrent and many sqlite3 based programs)
.PP
\f[C]cache.files=partial,dropcacheonclose=true,category.create=mfs\f[R]
.SS You don\[cq]t need \f[C]mmap\f[R]
.PP
\f[C]cache.files=off,dropcacheonclose=true,category.create=mfs\f[R]
.SS Command Line
.PP
\f[C]mergerfs -o cache.files=partial,dropcacheonclose=true,category.create=mfs /mnt/hdd0:/mnt/hdd1 /media\f[R]
.SS /etc/fstab
.PP
\f[C]/mnt/hdd0:/mnt/hdd1 /media fuse.mergerfs cache.files=partial,dropcacheonclose=true,category.create=mfs 0 0\f[R]
.SS systemd mount
.PP
https://github.com/trapexit/mergerfs/wiki/systemd
.IP
.nf
\f[C]
[Unit]
Description=mergerfs service

[Service]
Type=simple
KillMode=none
ExecStart=/usr/bin/mergerfs \[rs]
  -f \[rs]
  -o cache.files=partial \[rs]
  -o dropcacheonclose=true \[rs]
  -o category.create=mfs \[rs]
  /mnt/hdd0:/mnt/hdd1 \[rs]
  /media
ExecStop=/bin/fusermount -uz /media
Restart=on-failure

[Install]
WantedBy=default.target
\f[R]
.fi
.PP
See the mergerfs wiki for real world
deployments (https://github.com/trapexit/mergerfs/wiki/Real-World-Deployments)
for comparisons / ideas.
.SH OPTIONS
.PP
These options are the same regardless of whether you use them with the
\f[C]mergerfs\f[R] command line program, in fstab, or in a config file.
.SS mount options
.IP \[bu] 2
\f[B]config\f[R]: Path to a config file.
Same arguments as below in key=val / ini style format.
.IP \[bu] 2
\f[B]branches\f[R]: Colon delimited list of branches.
.IP \[bu] 2
\f[B]minfreespace=SIZE\f[R]: The minimum space value used for creation
policies.
Can be overridden by branch specific option.
Understands `K', `M', and `G' to represent kilobyte, megabyte, and
gigabyte respectively.
(default: 4G)
.IP \[bu] 2
\f[B]moveonenospc=BOOL|POLICY\f[R]: When enabled, if a \f[B]write\f[R]
fails with \f[B]ENOSPC\f[R] (no space left on device) or
\f[B]EDQUOT\f[R] (disk quota exceeded), the selected policy will run to
find a new location for the file.
An attempt to move the file to that branch will occur (keeping all
metadata possible) and if successful the original is unlinked and the
write retried.
(default: false, true = mfs)
.IP \[bu] 2
\f[B]inodecalc=passthrough|path-hash|devino-hash|hybrid-hash\f[R]:
Selects the inode calculation algorithm.
(default: hybrid-hash)
.IP \[bu] 2
\f[B]dropcacheonclose=BOOL\f[R]: When a file is requested to be closed,
call \f[C]posix_fadvise\f[R] on it first to instruct the kernel that we
no longer need the data and it can drop its cache.
Recommended when
\f[B]cache.files=partial|full|auto-full|per-process\f[R] to limit double
caching.
(default: false)
.IP \[bu] 2
\f[B]symlinkify=BOOL\f[R]: When enabled, and a file is not writable and
its mtime or ctime is older than \f[B]symlinkify_timeout\f[R], files will
be reported as symlinks to the original files.
Please read more below before using.
(default: false)
.IP \[bu] 2
\f[B]symlinkify_timeout=UINT\f[R]: Time to wait, in seconds, to activate
the \f[B]symlinkify\f[R] behavior.
(default: 3600)
.IP \[bu] 2
\f[B]nullrw=BOOL\f[R]: Turns reads and writes into no-ops.
The request will succeed but do nothing.
Useful for benchmarking mergerfs.
(default: false)
.IP \[bu] 2
\f[B]lazy-umount-mountpoint=BOOL\f[R]: mergerfs will attempt to
\[lq]lazy umount\[rq] the mountpoint before mounting itself.
Useful when performing live upgrades of mergerfs.
(default: false)
.IP \[bu] 2
\f[B]ignorepponrename=BOOL\f[R]: Ignore path preserving on rename.
Typically rename and link act differently depending on the policy of
\f[C]create\f[R] (read below).
Enabling this will cause rename and link to always use the non-path
preserving behavior.
This means files, when renamed or linked, will stay on the same
filesystem.
(default: false)
.IP \[bu] 2
\f[B]security_capability=BOOL\f[R]: If false, return ENOATTR when xattr
security.capability is queried.
(default: true)
.IP \[bu] 2
\f[B]xattr=passthrough|noattr|nosys\f[R]: Runtime control of xattrs.
Default is to pass through xattr requests.
`noattr' will short circuit as if nothing exists.
`nosys' will respond with ENOSYS as if xattrs are not supported or
disabled.
(default: passthrough)
.IP \[bu] 2
\f[B]link_cow=BOOL\f[R]: When enabled, if a regular file is opened which
has a link count > 1, it will copy the file to a temporary file and
rename over the original.
This breaks the link and provides a basic copy-on-write function similar
to cow-shell.
(default: false)
.IP \[bu] 2
\f[B]statfs=base|full\f[R]: Controls how statfs works.
`base' means it will always use all branches in statfs calculations.
`full' is in effect path preserving and only includes branches where the
path exists.
(default: base)
.IP \[bu] 2
\f[B]statfs_ignore=none|ro|nc\f[R]: `ro' will cause statfs calculations
to ignore available space for branches mounted or tagged as `read-only'
or `no create'.
`nc' will ignore available space for branches tagged as `no create'.
(default: none)
.IP \[bu] 2
\f[B]nfsopenhack=off|git|all\f[R]: A workaround for exporting mergerfs
over NFS where there are issues with creating files for write while
setting the mode to read-only.
(default: off)
.IP \[bu] 2
\f[B]branches-mount-timeout=UINT\f[R]: Number of seconds to wait at
startup for branches to be a mount other than the mountpoint\[cq]s
filesystem.
(default: 0)
.IP \[bu] 2
\f[B]follow-symlinks=never|directory|regular|all\f[R]: Turns symlinks
into what they point to.
(default: never)
.IP \[bu] 2
\f[B]link-exdev=passthrough|rel-symlink|abs-base-symlink|abs-pool-symlink\f[R]:
When a link fails with EXDEV, optionally create a symlink to the file
instead.
.IP \[bu] 2
\f[B]rename-exdev=passthrough|rel-symlink|abs-symlink\f[R]: When a
rename fails with EXDEV, optionally move the file to a special directory
and symlink to it.
.IP \[bu] 2
\f[B]readahead=UINT\f[R]: Set readahead (in kilobytes) for mergerfs and
branches if greater than 0.
(default: 0)
.IP \[bu] 2
\f[B]posix_acl=BOOL\f[R]: Enable POSIX ACL support (if supported by
kernel and underlying filesystem).
(default: false)
.IP \[bu] 2
\f[B]async_read=BOOL\f[R]: Perform reads asynchronously.
If disabled or unavailable, the kernel will ensure there is at most one
pending read request per file handle and will attempt to order requests
by offset.
(default: true)
.IP \[bu] 2
\f[B]fuse_msg_size=UINT\f[R]: Set the max number of pages per FUSE
message.
Only available on Linux >= 4.20 and ignored otherwise.
(min: 1; max: 256; default: 256)
.IP \[bu] 2
\f[B]threads=INT\f[R]: Number of threads to use.
When used alone (\f[C]process-thread-count=-1\f[R]) it sets the number
of threads reading and processing FUSE messages.
When used together with \f[C]process-thread-count\f[R] it sets the
number of threads reading from FUSE.
When set to zero it will attempt to discover and use the number of
logical cores.
If the thread count is set negative it will look up the number of cores
then divide by the absolute value.
For example, threads=-2 on an 8 core machine will result in 8 / 2 = 4 threads.
There will always be at least 1 thread.
If set to -1 in combination with \f[C]process-thread-count\f[R] then it
will try to pick reasonable values based on CPU thread count.
NOTE: a higher number of threads increases parallelism but usually
decreases throughput.
(default: 0)
.IP \[bu] 2
\f[B]read-thread-count=INT\f[R]: Alias for \f[C]threads\f[R].
.IP \[bu] 2
\f[B]process-thread-count=INT\f[R]: Enables a separate thread pool to
asynchronously process FUSE requests.
In this mode \f[C]read-thread-count\f[R] refers to the number of threads
reading FUSE messages which are dispatched to process threads.
-1 means disabled, otherwise it acts like \f[C]read-thread-count\f[R].
(default: -1)
.IP \[bu] 2
\f[B]process-thread-queue-depth=UINT\f[R]: Sets the number of requests
any single process thread can have queued up at one time.
This means the total memory usage of the queues is queue depth multiplied
by the number of process threads plus read thread count.
0 sets the depth to the same as the process thread count.
(default: 0)
.IP \[bu] 2
\f[B]pin-threads=STR\f[R]: Selects a strategy to pin threads to CPUs.
(default: unset)
.IP \[bu] 2
\f[B]flush-on-close=never|always|opened-for-write\f[R]: Flush data cache
on file close.
Mostly for when writeback is enabled or merging network filesystems.
(default: opened-for-write)
.IP \[bu] 2
\f[B]scheduling-priority=INT\f[R]: Set mergerfs\[cq] scheduling
priority.
Valid values range from -20 to 19.
See the \f[C]setpriority\f[R] man page for more details.
(default: -10)
.IP \[bu] 2
\f[B]fsname=STR\f[R]: Sets the name of the filesystem as seen in
\f[B]mount\f[R], \f[B]df\f[R], etc.
Defaults to a list of the source paths concatenated together with the
longest common prefix removed.
.IP \[bu] 2
\f[B]func.FUNC=POLICY\f[R]: Sets the specific FUSE function\[cq]s
policy.
See below for the list of value types.
Example: \f[B]func.getattr=newest\f[R]
.IP \[bu] 2
\f[B]func.readdir=seq|cosr|cor|cosr:INT|cor:INT\f[R]: Sets
\f[C]readdir\f[R] policy.
INT value sets the number of threads to use for concurrency.
(default: seq)
.IP \[bu] 2
\f[B]category.action=POLICY\f[R]: Sets policy of all FUSE functions in
the action category.
(default: epall)
.IP \[bu] 2
\f[B]category.create=POLICY\f[R]: Sets policy of all FUSE functions in
the create category.
(default: epmfs)
.IP \[bu] 2
\f[B]category.search=POLICY\f[R]: Sets policy of all FUSE functions in
the search category.
(default: ff)
.IP \[bu] 2
\f[B]cache.open=UINT\f[R]: `open' policy cache timeout in seconds.
(default: 0)
.IP \[bu] 2
\f[B]cache.statfs=UINT\f[R]: `statfs' cache timeout in seconds.
(default:
.IP \[bu] 2
\f[B]cache.attr=UINT\f[R]: File attribute cache timeout in seconds.
(default: 1)
.IP \[bu] 2
\f[B]cache.entry=UINT\f[R]: File name lookup cache timeout in seconds.
(default: 1)
.IP \[bu] 2
\f[B]cache.negative_entry=UINT\f[R]: Negative file name lookup cache
timeout in seconds.
(default: 0)
.IP \[bu] 2
\f[B]cache.files=libfuse|off|partial|full|auto-full|per-process\f[R]:
File page caching mode.
(default: libfuse)
.IP \[bu] 2
\f[B]cache.files.process-names=LIST\f[R]: A pipe | delimited list of
process comm (https://man7.org/linux/man-pages/man5/proc.5.html) names
to enable page caching for when \f[C]cache.files=per-process\f[R].
(default: \[lq]rtorrent|qbittorrent-nox\[rq])
.IP \[bu] 2
\f[B]cache.writeback=BOOL\f[R]: Enable kernel writeback caching.
(default: false)
.IP \[bu] 2
\f[B]cache.symlinks=BOOL\f[R]: Cache symlinks (if supported by kernel).
(default: false)
.IP \[bu] 2
\f[B]cache.readdir=BOOL\f[R]: Cache readdir (if supported by kernel).
(default: false)
.IP \[bu] 2
\f[B]parallel-direct-writes=BOOL\f[R]: Allow the kernel to dispatch
multiple, parallel (non-extending) write requests for files opened with
\f[C]cache.files=per-process\f[R] (if the process is not in
\f[C]process-names\f[R]) or \f[C]cache.files=off\f[R].
(This requires kernel support, which was added in Linux v6.2.)
.IP \[bu] 2
\f[B]direct_io\f[R]: deprecated - Bypass page cache.
Use \f[C]cache.files=off\f[R] instead.
(default: false)
.IP \[bu] 2
\f[B]kernel_cache\f[R]: deprecated - Do not invalidate data cache on
file open.
Use \f[C]cache.files=full\f[R] instead.
(default: false)
.IP \[bu] 2
\f[B]auto_cache\f[R]: deprecated - Invalidate data cache if file mtime
or size changes.
Use \f[C]cache.files=auto-full\f[R] instead.
(default: false)
.IP \[bu] 2
\f[B]async_read\f[R]: deprecated - Perform reads asynchronously.
Use \f[C]async_read=true\f[R] instead.
.IP \[bu] 2
\f[B]sync_read\f[R]: deprecated - Perform reads synchronously.
Use \f[C]async_read=false\f[R] instead.
.IP \[bu] 2
\f[B]splice_read\f[R]: deprecated - Does nothing.
.IP \[bu] 2
\f[B]splice_write\f[R]: deprecated - Does nothing.
.IP \[bu] 2
\f[B]splice_move\f[R]: deprecated - Does nothing.
.IP \[bu] 2
\f[B]allow_other\f[R]: deprecated - mergerfs always sets this FUSE
option as normal permissions can be used to limit access.
.IP \[bu] 2
\f[B]use_ino\f[R]: deprecated - mergerfs should always control inode
calculation so this is enabled all the time.
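.PP
The \f[C]threads\f[R] rules can be expressed as a small shell function.
This is a hypothetical model of the documented behavior, not
mergerfs\[cq] own code.
The function name \f[C]thread_count\f[R] is made up.
The number of logical cores is passed in rather than detected (for
example with \f[C]nproc\f[R]).
The \f[C]-1\f[R] special case used together with
\f[C]process-thread-count\f[R] is not modelled.
.IP
.nf
\f[C]
thread_count() {
  # $1 = threads value, $2 = number of logical cores
  t=$1
  cores=$2
  if [ "$t" -eq 0 ]; then
    n=$cores                 # 0: one thread per logical core
  elif [ "$t" -lt 0 ]; then
    n=$((cores / -t))        # negative: cores divided by |threads|
  else
    n=$t                     # positive: used as given
  fi
  if [ "$n" -lt 1 ]; then
    n=1                      # there is always at least one thread
  fi
  echo "$n"
}
\f[R]
.fi
.PP
With these rules \f[C]thread_count -2 8\f[R] prints 4, matching the 8
core example.
\f[C]thread_count -16 8\f[R] prints 1 because of the one thread minimum.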
.PP
\f[B]NOTE:\f[R] Options are evaluated in the order listed, so if the
options are \f[B]func.rmdir=rand,category.action=ff\f[R] the
\f[B]action\f[R] category setting will override the \f[B]rmdir\f[R]
setting.
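.PP
The evaluation order can be modelled as \[lq]last assignment wins\[rq].
The hypothetical \f[C]rmdir_policy\f[R] function below walks a comma
separated option string from left to right.
It then reports the policy \f[C]rmdir\f[R] ends up with.
As the note implies, it assumes that \f[C]rmdir\f[R] belongs to the
action category.
It ignores every other option.
.IP
.nf
\f[C]
rmdir_policy() {
  # $1 = comma separated option string
  policy=epall               # documented default of category.action
  old_ifs=$IFS
  IFS=,
  for opt in $1; do
    # Both options set the rmdir policy; whichever comes later wins.
    case $opt in
      func.rmdir=*|category.action=*) policy=${opt#*=} ;;
    esac
  done
  IFS=$old_ifs
  echo "$policy"
}
\f[R]
.fi
.PP
\f[C]rmdir_policy func.rmdir=rand,category.action=ff\f[R] prints
\f[C]ff\f[R].
Swapping the two options prints \f[C]rand\f[R].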
.PP
\f[B]NOTE:\f[R] Always look at the documentation for the version of
mergerfs you\[cq]re using.
Not all features are available in older releases.
Use \f[C]man mergerfs\f[R] or find the docs as linked in the release.
.SS Value Types
.IP \[bu] 2
BOOL = `true' | `false'
.IP \[bu] 2
INT = [MIN_INT,MAX_INT]
.IP \[bu] 2
UINT = [0,MAX_INT]
.IP \[bu] 2
SIZE = `NNM'; NN = INT, M = `K' | `M' | `G' | `T'
.IP \[bu] 2
STR = string (may refer to an enumerated value, see details of argument)
.IP \[bu] 2
FUNC = filesystem function
.IP \[bu] 2
CATEGORY = function category
.IP \[bu] 2
POLICY = mergerfs function policy
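.PP
A SIZE value can be converted to bytes as sketched below.
The function \f[C]size_to_bytes\f[R] is hypothetical.
It assumes binary multiples (1K = 1024 bytes).
Check the behavior of your mergerfs version before relying on exact
byte counts.
.IP
.nf
\f[C]
size_to_bytes() {
  # Assumption: binary multiples, 1K = 1024 bytes.
  case $1 in
    *K) n=${1%K}; m=1024 ;;
    *M) n=${1%M}; m=$((1024 * 1024)) ;;
    *G) n=${1%G}; m=$((1024 * 1024 * 1024)) ;;
    *T) n=${1%T}; m=$((1024 * 1024 * 1024 * 1024)) ;;
    *)  n=$1; m=1 ;;         # plain number of bytes
  esac
  echo $((n * m))
}
\f[R]
.fi
.PP
Under that assumption, \f[C]size_to_bytes 4G\f[R] (the
\f[B]minfreespace\f[R] default) prints 4294967296.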
.SS branches
.PP
The `branches' argument is a colon (`:') delimited list of paths to be
pooled together.
It does not matter whether the paths are on the same or different
filesystems, nor does the filesystem type matter (within reason).
Used and available space will not be duplicated for paths on the same
filesystem, and any features which aren\[cq]t supported by the underlying
filesystem (such as file attributes or extended attributes) will return
the appropriate errors.
.PP
Branches currently have two options which can be set: a type, which
impacts whether or not the branch is included in a policy calculation,
and an individual minfreespace value.
The values are set by appending an \f[C]=\f[R] to the end of a branch
designation and using commas as delimiters.
Example: \f[C]/mnt/drive=RW,1234\f[R]
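.PP
The branch syntax can be illustrated with a small parser.
\f[C]parse_branches\f[R] is a hypothetical helper, not part of
mergerfs.
It first splits the colon delimited list.
It then splits each entry into path, mode and minfreespace.
The mode defaults to \f[C]RW\f[R], and \f[C]global\f[R] is reported
when no branch specific minfreespace is given.
Modes and sizes are not validated.
.IP
.nf
\f[C]
parse_branches() {
  # $1 = branches string; prints "path mode minfreespace" per branch
  old_ifs=$IFS
  IFS=:
  set -f                     # split on ':' without glob expansion
  for spec in $1; do
    path=${spec%%=*}         # everything before the first '='
    mode=RW                  # RW is the default mode
    mfs=global               # no branch specific minfreespace
    case $spec in
      *=*)
        opts=${spec#*=}      # e.g. "RW,1234"
        mode=${opts%%,*}
        case $opts in
          *,*) mfs=${opts#*,} ;;
        esac
        ;;
    esac
    echo "$path $mode $mfs"
  done
  set +f
  IFS=$old_ifs
}
\f[R]
.fi
.PP
\f[C]parse_branches /mnt/drive=RW,1234:/mnt/ssd\f[R] prints
\f[C]/mnt/drive RW 1234\f[R] and \f[C]/mnt/ssd RW global\f[R] on
separate lines.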
.SS branch mode
.IP \[bu] 2
RW: (read/write) - Default behavior.
Will be eligible in all policy categories.
.IP \[bu] 2
RO: (read-only) - Will be excluded from \f[C]create\f[R] and
\f[C]action\f[R] policies.
Same as a read-only mounted filesystem would be (though faster to
process).
.IP \[bu] 2
NC: (no-create) - Will be excluded from \f[C]create\f[R] policies.
You can\[cq]t create on that branch but you can change or delete.
.SS minfreespace
.PP
Same purpose and syntax as the global option but specific to the branch.
If not set, the global value is used.
.SS globbing
.PP
To make it easier to include multiple branches, mergerfs supports
globbing (http://linux.die.net/man/7/glob).
\f[B]The globbing tokens MUST be escaped when used via the shell,
otherwise the shell will apply the glob itself.\f[R]
.IP
.nf
\f[C]
# mergerfs /mnt/hdd\[rs]*:/mnt/ssd /media
\f[R]
.fi
.PP
The above line will use all mount points in /mnt prefixed with
\f[B]hdd\f[R] as well as \f[B]/mnt/ssd\f[R].
.PP
To have the pool mounted at boot or otherwise accessible from related
tools, use \f[B]/etc/fstab\f[R].
.IP
.nf
\f[C]
# <file system>      <mount point>  <type>         <options>         <dump>  <pass>
/mnt/hdd*:/mnt/ssd   /media         fuse.mergerfs  minfreespace=16G  0       0
\f[R]
.fi
.PP
\f[B]NOTE:\f[R] the globbing is done at mount or when updated using the
runtime API.
If a new directory matching the glob is added after the fact, it will
not be automatically included.
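.PP
Mount time expansion can be pictured with the sketch below.
\f[C]expand_branches\f[R] is a hypothetical model, not mergerfs\[cq]
implementation.
It splits the branches string on colons and expands each element with
the shell\[cq]s own globbing.
Only existing directories are kept.
Because the expansion runs once, directories created later are not
picked up, which mirrors the note above.
Paths containing whitespace are not handled.
.IP
.nf
\f[C]
expand_branches() {
  # $1 = branches string, possibly containing glob patterns
  old_ifs=$IFS
  IFS=:
  set -f                      # split on ':' without expanding yet
  set -- $1
  set +f
  IFS=$old_ifs
  for pattern in "$@"; do
    for path in $pattern; do  # unquoted on purpose: glob expansion
      if [ -d "$path" ]; then # unmatched patterns stay literal, skip
        echo "$path"
      fi
    done
  done
}
\f[R]
.fi
.PP
Quote the argument so that the calling shell does not expand it.
\f[C]expand_branches "/mnt/hdd*:/mnt/ssd"\f[R] would then print each
matching \f[C]/mnt/hdd*\f[R] directory followed by \f[C]/mnt/ssd\f[R].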
.PP
\f[B]NOTE:\f[R] for mounting via \f[B]fstab\f[R] to work you must have
\f[B]mount.fuse\f[R] installed.
For Ubuntu/Debian it is included in the \f[B]fuse\f[R] package.
.SS inodecalc
.PP
Inodes (st_ino) are unique identifiers within a filesystem.
Each mounted filesystem also has a device ID (st_dev), and together they
can uniquely identify a file on the whole of the system.
Entries on the same device with the same inode are in fact references to
the same underlying file.
It is a many to one relationship between names and an inode.
Directories, however, do not have multiple links on most systems due to
the complexity they add.
.PP
FUSE allows the server (mergerfs) to set inode values but not device
IDs.
Creating an inode value is somewhat complex in mergerfs\[cq] case as
files aren\[cq]t really in its control.
If a policy changes what directory or file is to be selected, or
something changes out of band, it becomes unclear what value should be
used.
Most software does not care what the values are, but software that does
often breaks if a value changes unexpectedly.
The tool \f[C]find\f[R] will abort a directory walk if it sees a
directory inode change.
NFS will return stale handle errors if the inode changes out of band.
File dedup tools will usually leverage device IDs and inodes as a
shortcut in searching for duplicate files and would resort to full file
comparisons should they find different inode values.
.PP
mergerfs offers multiple ways to calculate the inode in hopes of
covering different use cases.
.IP \[bu] 2
passthrough: Passes through the underlying inode value.
Mostly intended for testing as using this does not address any of the
problems mentioned above and could confuse file deduplication software
as inodes from different filesystems can be the same.
.IP \[bu] 2
path-hash: Hashes the relative path of the entry in question.
The underlying file\[cq]s values are completely ignored.
This means the inode value will always be the same for that file path.
This is useful when using NFS and you make changes out of band, such as
copying data between branches.
This also means that entries that do point to the same file will not be
recognizable via inodes.
That \f[B]does not\f[R] mean hard links don\[cq]t work.
They will.
.IP \[bu] 2
path-hash32: 32bit version of path-hash.
.IP \[bu] 2
devino-hash: Hashes the device ID and inode of the underlying entry.
This won\[cq]t prevent issues with NFS should the policy pick a
different file or files move out of band, but it will present the same
inode for entries that refer to the same underlying file.
.IP \[bu] 2
devino-hash32: 32bit version of devino-hash.
.IP \[bu] 2
hybrid-hash: Performs \f[C]path-hash\f[R] on directories and
\f[C]devino-hash\f[R] on other file types.
Since directories can\[cq]t have hard links the static value won\[cq]t
make a difference and the files will get values useful for finding
duplicates.
Probably the best to use if not using NFS.
As such it is the default.
.IP \[bu] 2
hybrid-hash32: 32bit version of hybrid-hash.
.PP
32bit versions are provided as there is some software which does not
handle 64bit inodes well.
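.PP
The modes differ mainly in which inputs feed the hash.
The sketch below models that with the POSIX \f[C]cksum\f[R] CRC as a
stand-in.
mergerfs uses its own hash function, so real inode numbers will differ.
The function names are hypothetical.
The device ID and inode are passed in as plain numbers.
Because \f[C]cksum\f[R] yields 32 bits, it is closest in size to the
\f[C]*32\f[R] variants.
.IP
.nf
\f[C]
# path-hash model: depends only on the path relative to the branch.
path_hash() {
  printf '%s' "$1" | cksum | cut -d ' ' -f 1
}

# devino-hash model: depends only on the device ID and inode.
devino_hash() {
  printf '%s:%s' "$1" "$2" | cksum | cut -d ' ' -f 1
}

# hybrid-hash model: path-hash for directories, devino-hash otherwise.
# $1 = "dir" or "file", $2 = relative path, $3 = device ID, $4 = inode
hybrid_hash() {
  if [ "$1" = dir ]; then
    path_hash "$2"
  else
    devino_hash "$3" "$4"
  fi
}
\f[R]
.fi
.PP
Two names that are hard links to the same file share a device ID and
inode.
They therefore get the same \f[C]devino_hash\f[R] value but different
\f[C]path_hash\f[R] values.
This is why \f[C]hybrid-hash\f[R] suits duplicate finders while keeping
directory values static.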
.PP
While there is a risk of hash collision, tests of a couple million
entries found zero collisions.
Unlike a typical filesystem, FUSE filesystems can reuse inodes and not
refer to the same entry.
The internal identifier used to reference a file in FUSE is different
from the inode value presented.
The former is the \f[C]nodeid\f[R] and is actually a tuple of 2 64bit
values: \f[C]nodeid\f[R] and \f[C]generation\f[R].
This tuple is not client facing.
The inode that is presented to the client is passed through the kernel
uninterpreted.
.PP
From the FUSE docs for \f[C]use_ino\f[R]:
.IP
.nf
\f[C]
Honor the st_ino field in the functions getattr() and
fill_dir(). This value is used to fill in the st_ino field
in the stat(2), lstat(2), fstat(2) functions and the d_ino
field in the readdir(2) function. The filesystem does not
have to guarantee uniqueness, however some applications
rely on this value being unique for the whole filesystem.
Note that this does *not* affect the inode that libfuse
and the kernel use internally (also called the \[dq]nodeid\[dq]).
\f[R]
.fi
.PP
As of version 2.35.0 the \f[C]use_ino\f[R] option has been removed.
mergerfs should always be managing inode values.
.SS pin-threads
.PP
Simple strategies for pinning read and/or process threads.
If process threads are not enabled then the strategy simply works on the
read threads.
Invalid values are ignored.
.IP \[bu] 2
R1L: All read threads pinned to a single logical CPU.
.IP \[bu] 2
R1P: All read threads pinned to a single physical CPU.
.IP \[bu] 2
RP1L: All read and process threads pinned to a single logical CPU.
.IP \[bu] 2
RP1P: All read and process threads pinned to a single physical CPU.
.IP \[bu] 2
R1LP1L: All read threads pinned to a single logical CPU, all process
threads pinned to a different (if possible) logical CPU.
.IP \[bu] 2
R1PP1P: All read threads pinned to a single physical CPU, all process
threads pinned to a different (if possible) logical CPU.
.IP \[bu] 2
RPSL: All read and process threads are spread across all logical CPUs.
.IP \[bu] 2
RPSP: All read and process threads are spread across all physical CPUs.
.IP \[bu] 2
R1PPSP: All read threads are pinned to a single physical CPU while
process threads are spread across all other physical CPUs.
.SS fuse_msg_size
.PP
FUSE applications communicate with the kernel over a special character
device: \f[C]/dev/fuse\f[R].
A large portion of the overhead associated with FUSE is the cost of
going back and forth between user space and kernel space over that
device.
Generally speaking, the fewer trips needed, the better the performance
will be.
Reducing the number of trips can be done a number of ways.
Kernel level caching and increasing message sizes are two significant
ones.
When it comes to reads and writes, if the message size is doubled the
number of trips is approximately halved.
.PP
In Linux 4.20 a new feature was added allowing the negotiation of the
max message size.
Since the size is in multiples of
pages (https://en.wikipedia.org/wiki/Page_(computer_memory)) the feature
is called \f[C]max_pages\f[R].
There is a maximum \f[C]max_pages\f[R] value of 256 (1MiB) and minimum
of 1 (4KiB).
The default used by Linux >= 4.20, and the hardcoded value used before
4.20, is 32 (128KiB).
In mergerfs it is referred to as \f[C]fuse_msg_size\f[R] to make it
clear what it impacts and provide some abstraction.
.PP
Since there should be no downsides to increasing \f[C]fuse_msg_size\f[R]
/ \f[C]max_pages\f[R], outside a minor bump in RAM usage due to larger
message buffers, mergerfs defaults the value to 256.
On kernels before 4.20 the value has no effect.
The reason the value is configurable is to enable experimentation and
benchmarking.
See the BENCHMARKING section for examples.
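.PP
The arithmetic behind these figures can be checked with a short sketch.
It assumes 4KiB pages, as the figures above do; \f[C]getconf PAGESIZE\f[R]
reports the real value on a given system.
The helper names are hypothetical.
.IP
.nf
\f[C]
PAGE_SIZE=4096   # assumed page size, in bytes

# Bytes carried by one message for a fuse_msg_size (pages) value.
msg_bytes() {
  echo $(($1 * PAGE_SIZE))
}

# Approximate number of messages needed to move $1 bytes when each
# message carries fuse_msg_size=$2 pages (rounded up).
msg_count() {
  m=$(msg_bytes "$2")
  echo $((($1 + m - 1) / m))
}
\f[R]
.fi
.PP
Moving 1GiB (1073741824 bytes) takes about 8192 messages at the old
default of 32 pages.
At mergerfs\[cq] default of 256 pages it takes about 1024, an eightfold
reduction in trips.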
.SS follow-symlinks
.PP
This feature, when enabled, will cause symlinks to be interpreted by
mergerfs as their target (depending on the mode).
.PP
When there is a getattr/stat request for a file, mergerfs will check if
the file is a symlink and, depending on the \f[C]follow-symlinks\f[R]
setting, will replace the information about the symlink with that of
the file or directory it points to.
.PP
When unlink\[cq]ing or rmdir\[cq]ing the followed symlink it will remove
the symlink itself and not that which it points to.
.IP \[bu] 2
never: Behave as normal.
Symlinks are treated as such.
.IP \[bu] 2
directory: Resolve only symlinks which point to directories.
.IP \[bu] 2
regular: Resolve only symlinks which point to regular files.
.IP \[bu] 2
all: Resolve all symlinks to that which they point to.
.PP
Symlinks which do not point to anything are left as is.
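.PP
The mode rules can be summarized as a predicate.
\f[C]should_follow\f[R] below is a hypothetical model rather than
mergerfs code.
Given a mode and a path, it succeeds when the entry would be reported
as its target.
It relies on \f[C]test -d\f[R] and \f[C]test -f\f[R] looking through
symlinks, and on \f[C]test -e\f[R] failing for dangling ones.
.IP
.nf
\f[C]
should_follow() {
  # $1 = never|directory|regular|all, $2 = path to check
  mode=$1
  entry=$2
  [ -L "$entry" ] || return 1    # not a symlink: nothing to follow
  [ -e "$entry" ] || return 1    # dangling symlink: left as is
  case $mode in
    all)       return 0 ;;
    directory) [ -d "$entry" ] ;; # -d and -f look through the link
    regular)   [ -f "$entry" ] ;;
    *)         return 1 ;;        # never (or an unknown mode)
  esac
}
\f[R]
.fi
.PP
Take a symlink \f[C]ld\f[R] that points at a directory.
\f[C]should_follow directory ld\f[R] succeeds for it, while
\f[C]should_follow regular ld\f[R] fails.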
.PP
WARNING: This feature works but there might be edge cases yet to be
found.
If you find any odd behaviors please file a ticket on
github (https://github.com/trapexit/mergerfs/issues).
.SS link-exdev
.PP
If using path preservation and a \f[C]link\f[R] fails with EXDEV, make a
call to \f[C]symlink\f[R] where the \f[C]target\f[R] is the
\f[C]oldpath\f[R] and the \f[C]linkpath\f[R] is the \f[C]newpath\f[R].
The \f[C]target\f[R] value is determined by the value of
\f[C]link-exdev\f[R].
.IP \[bu] 2
passthrough: Return EXDEV as normal.
.IP \[bu] 2
rel-symlink: A relative path from the \f[C]newpath\f[R].
.IP \[bu] 2
abs-base-symlink: An absolute value using the underlying branch.
.IP \[bu] 2
abs-pool-symlink: An absolute value using the mergerfs mount point.
.PP
NOTE: It is possible that some applications check the file they link.
In those cases it is possible it will error or complain.
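.PP
For \f[C]rel-symlink\f[R], the target has to be expressed relative to
the directory that will contain \f[C]newpath\f[R].
The hypothetical \f[C]relpath\f[R] helper below shows one way to
compute such a value for absolute, already normalized paths.
It is not mergerfs\[cq] implementation, and it does not resolve
\f[C].\f[R], \f[C]..\f[R] or symlinks.
.IP
.nf
\f[C]
relpath() {
  # $1 = target (e.g. oldpath), $2 = directory containing newpath.
  # Both must be absolute and normalized, without a trailing "/".
  target=$1
  base=$2
  case $target in /*) ;; *) return 1 ;; esac
  case $base in /*) ;; *) return 1 ;; esac
  up=
  # Walk up from base until it is a directory prefix of target.
  while [ "${target#"$base"/}" = "$target" ]; do
    base=${base%/*}
    up="../$up"
  done
  echo "$up${target#"$base"/}"
}
\f[R]
.fi
.PP
Suppose \f[C]/mnt/pool/a/file\f[R] is linked to
\f[C]/mnt/pool/b/c/link\f[R].
The call would then be \f[C]relpath /mnt/pool/a/file /mnt/pool/b/c\f[R],
which prints \f[C]../../a/file\f[R].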
.SS rename-exdev
.PP
If using path preservation and a \f[C]rename\f[R] fails with EXDEV:
.IP "1." 3
Move file from \f[B]/branch/a/b/c\f[R] to
\f[B]/branch/.mergerfs_rename_exdev/a/b/c\f[R].
.IP "2." 3
Symlink the rename\[cq]s \f[C]newpath\f[R] to the moved file.
.PP
The \f[C]target\f[R] value is determined by the value of
\f[C]rename-exdev\f[R].
.IP \[bu] 2
passthrough: Return EXDEV as normal.
.IP \[bu] 2
rel-symlink: A relative path from the \f[C]newpath\f[R].
.IP \[bu] 2
abs-symlink: An absolute value using the mergerfs mount point.
.PP
NOTE: It is possible that some applications check the file they rename.
In those cases it is possible it will error or complain.
.PP
NOTE: \f[C]abs-symlink\f[R] is not split into two as in
\f[C]link-exdev\f[R] because of the complexities in managing absolute
base symlinks when multiple \f[C]oldpaths\f[R] exist.
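.PP
The path used in step 1 and the \f[C]abs-symlink\f[R] target can be
sketched as plain string manipulation.
\f[C]exdev_paths\f[R] is hypothetical, and it only prints paths without
moving anything.
The form of the \f[C]abs-symlink\f[R] target (the mount point followed
by the same hidden directory and relative path) is an assumption based
on the description above.
.IP
.nf
\f[C]
exdev_paths() {
  # $1 = branch, $2 = path of the file relative to the branch,
  # $3 = mergerfs mount point
  branch=$1
  rel=$2
  mnt=$3
  # Step 1: where the file is moved to on the same branch.
  echo "$branch/.mergerfs_rename_exdev/$rel"
  # abs-symlink: assumed target, the same file seen through the pool.
  echo "$mnt/.mergerfs_rename_exdev/$rel"
}
\f[R]
.fi
.PP
\f[C]exdev_paths /branch a/b/c /media\f[R] prints
\f[C]/branch/.mergerfs_rename_exdev/a/b/c\f[R].
It then prints \f[C]/media/.mergerfs_rename_exdev/a/b/c\f[R].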
.SS symlinkify
.PP
Due to the levels of indirection introduced by mergerfs and the
underlying technology FUSE there can be varying levels of performance
degradation.
This feature will turn non-directories which are not writable into
symlinks to the original file found by the \f[C]readlink\f[R] policy
after the mtime and ctime are older than the timeout.
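.PP
The conversion condition can be written as a predicate over the file
type, write permission and timestamps.
\f[C]can_symlinkify\f[R] is a hypothetical sketch that follows the
wording of this section: both mtime and ctime must be older than the
timeout.
The exact comparison mergerfs performs may differ.
Timestamps are passed in as seconds since the epoch.
.IP
.nf
\f[C]
can_symlinkify() {
  # $1 = "dir" or "file", $2 = 1 if any write bit is set else 0,
  # $3 = mtime, $4 = ctime, $5 = now, $6 = symlinkify_timeout
  [ "$1" != dir ] || return 1      # directories are never converted
  [ "$2" -eq 0 ] || return 1       # writable files are left alone
  age_m=$(($5 - $3))               # seconds since last modification
  age_c=$(($5 - $4))               # seconds since last status change
  [ "$age_m" -gt "$6" ] && [ "$age_c" -gt "$6" ]
}
\f[R]
.fi
.PP
Assume the default timeout of 3600 seconds.
A read-only file whose mtime and ctime are both 4000 seconds old
qualifies.
A file whose ctime is only 1000 seconds old does not.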
.PP
\f[B]WARNING:\f[R] The current implementation has a known issue in which
if the file is open and being used when the file is converted to a
symlink then the application which has that file open will receive an
error when using it.
This is unlikely to occur in practice but is something to keep in mind.
.PP
\f[B]WARNING:\f[R] Some backup solutions, such as CrashPlan, do not
back up the target of a symlink.
If using this feature it will be necessary to point any backup software
to the original filesystems or configure the software to follow symlinks
if such an option is available.
Alternatively, create two mounts: one for backup and one for general
consumption.
.SS nullrw
|
|
.PP
|
|
Due to how FUSE works there is an overhead to all requests made to a
|
|
FUSE filesystem that wouldn\[cq]t exist for an in kernel one.
|
|
Meaning that even a simple passthrough will have some slowdown.
|
|
However, generally the overhead is minimal in comparison to the cost of
|
|
the underlying I/O.
|
|
By disabling the underlying I/O we can test the theoretical performance
|
|
boundaries.
|
|
.PP
|
|
By enabling \f[C]nullrw\f[R] mergerfs will work as it always does
|
|
\f[B]except\f[R] that all reads and writes will be no-ops.
|
|
A write will succeed (the size of the write will be returned as if it
|
|
were successful) but mergerfs does nothing with the data it was given.
|
|
Similarly a read will return the size requested but won\[cq]t touch the
|
|
buffer.
|
|
.PP
|
|
See the BENCHMARKING section for suggestions on how to test.
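.PP
One simple way to measure throughput is to time a stream of fixed-size
writes.
The hypothetical C helper below (not part of mergerfs) does that.
Run it against a file on a mount with \f[C]nullrw\f[R] enabled, and again
with it disabled, to get a rough idea of FUSE overhead versus real I/O.
Block size and count are left to the caller.
.IP
.nf
\f[C]
```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical benchmarking helper: write `count` blocks of `bs` zero
   bytes to `path` and store the throughput in MiB/s in *mib_per_sec.
   Returns the number of bytes written, or -1 on error. */
static long long write_probe(const char *path, size_t bs, size_t count,
                             double *mib_per_sec)
{
    char *buf = malloc(bs);
    if (buf == NULL)
        return -1;
    memset(buf, 0, bs);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    long long total = 0;
    for (size_t i = 0; i < count; i++) {
        ssize_t n = write(fd, buf, bs);
        if (n < 0) {
            total = -1;
            break;
        }
        total += n;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);
    free(buf);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (mib_per_sec != NULL && total > 0 && secs > 0)
        *mib_per_sec = (total / (1024.0 * 1024.0)) / secs;
    return total;
}
```
\f[R]
.fi
.PP
For example,
\f[C]write_probe(\[dq]/mnt/mergerfs/probe\[dq], 1 << 20, 1024, &rate)\f[R]
writes 1 GiB in 1 MiB blocks.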
.SS xattr
.PP
Runtime extended attribute support can be managed via the
\f[C]xattr\f[R] option.
By default it will pass through any xattr calls.
Because xattr support is rarely used and can have significant performance
implications, mergerfs allows it to be disabled at runtime.
The performance problems mostly occur when file caching is enabled.
The kernel will send a \f[C]getxattr\f[R] for
\f[C]security.capability\f[R] \f[I]before every single write\f[R].
It does not cache the responses to any \f[C]getxattr\f[R].
This might be addressed in the future, but for now mergerfs can really
only offer the following workarounds.
.PP
\f[C]noattr\f[R] will cause mergerfs to short-circuit all xattr calls
and return ENOATTR where appropriate.
mergerfs still gets all the requests, but they will not be forwarded on
to the underlying filesystems.
The runtime control will still function in this mode.
.PP
\f[C]nosys\f[R] will cause mergerfs to return ENOSYS for any xattr call.
The difference from \f[C]noattr\f[R] is that the kernel will cache this
fact and itself short-circuit future calls.
This is more efficient than \f[C]noattr\f[R] but will cause
mergerfs\[cq] runtime control via the hidden file to stop working.
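.PP
The setting changes which error an application sees from
\f[C]getxattr(2)\f[R].
The C sketch below classifies a failed call; the enum and function names
are hypothetical.
On Linux a missing attribute is reported as ENODATA, for which ENOATTR is
an alias where defined.
Treating ENOTSUP as the visible result of \f[C]nosys\f[R] is an
assumption.
It is based on the FUSE kernel module reporting xattrs as unsupported to
callers once it has cached an ENOSYS reply.
.IP
.nf
\f[C]
```c
#include <errno.h>
#include <sys/types.h>
#include <sys/xattr.h>

#ifndef ENOATTR
#define ENOATTR ENODATA /* Linux reports a missing attribute as ENODATA */
#endif

/* Hypothetical outcome of probing one xattr on a mergerfs mount. */
enum xattr_result { XATTR_OK, XATTR_MISSING, XATTR_UNSUPPORTED, XATTR_ERROR };

static enum xattr_result classify_xattr_errno(int err)
{
    switch (err) {
    case ENOATTR:  /* e.g. xattr=noattr: attribute reported as missing */
        return XATTR_MISSING;
    case ENOSYS:   /* raw xattr=nosys reply, if it reaches the caller */
    case ENOTSUP:  /* assumed: what callers see once FUSE caches ENOSYS */
        return XATTR_UNSUPPORTED;
    default:       /* ENOENT, EACCES, ERANGE (value too large), ... */
        return XATTR_ERROR;
    }
}

static enum xattr_result probe_xattr(const char *path, const char *name)
{
    char value[256];
    ssize_t n = getxattr(path, name, value, sizeof value);
    return n >= 0 ? XATTR_OK : classify_xattr_errno(errno);
}
```
\f[R]
.fi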
.SS nfsopenhack
.PP
NFS is not fully POSIX compliant.
Historically, certain behaviors, such as opening files with O_EXCL, are
either unsupported or poorly supported.
When mergerfs (or any FUSE filesystem) is exported over NFS, some of
these issues come up due to how NFS and FUSE interact.
.PP
This hack addresses the case where a file is created with a read-only
mode but opened with a read/write or write-only flag.
Normally this is perfectly valid, but NFS splits the single open call into
multiple calls.
Exactly how it is translated depends on the configuration and versions
of the NFS server and clients.
Either way, it results in a permission error because a normal user is not
allowed to open a read-only file as writable.
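.PP
The open pattern in question is a single \f[C]open(2)\f[R] with
\f[C]O_CREAT\f[R], a writable access mode, and a read-only mode such as
0444.
The C sketch below, with a hypothetical function name, shows that this
succeeds on a local filesystem.
Permission bits are not checked when the call itself creates the file, so
the returned descriptor is writable even though the new file is read-only.
.IP
.nf
\f[C]
```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates `path` with a read-only mode (0444) while requesting a
   writable descriptor, then writes to it. Valid under POSIX because the
   permission bits only apply to later opens of the existing file.
   Returns 0 on success, -1 on failure. */
static int create_readonly_for_write(const char *path)
{
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0444);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, "data", 4);
    close(fd);
    return n == 4 ? 0 : -1;
}
```
\f[R]
.fi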
.PP
Even though this is a niche situation, the hack breaks normal security
and behavior and as such is \f[C]off\f[R] by default.
If set to \f[C]git\f[R], it will only perform the hack when the path in
question includes \f[C]/.git/\f[R].
If set to \f[C]all\f[R], it will apply any time an empty read-only file
is opened for writing.
.SH FUNCTIONS, CATEGORIES and POLICIES
.PP
The POSIX filesystem API is made up of a number of functions:
\f[B]creat\f[R], \f[B]stat\f[R], \f[B]chown\f[R], etc.
For ease of configuration in mergerfs, most of the core functions are
grouped into 3 categories: \f[B]action\f[R], \f[B]create\f[R], and
\f[B]search\f[R].
These functions and categories can be assigned a policy which dictates
which branch is chosen when performing that function.
.PP
Some functions, listed in the category \f[C]N/A\f[R] below, cannot be
assigned the normal policies.
These functions work with file handles, created by \f[C]open\f[R] or
\f[C]create\f[R], rather than file paths.
That said, the current FUSE kernel driver often does not provide the
file handle when a client calls \f[C]fgetattr\f[R],
\f[C]fchown\f[R], \f[C]fchmod\f[R], \f[C]futimens\f[R],
\f[C]ftruncate\f[R], etc.
In those cases it will call the regular, path-based versions.
\f[C]statfs\f[R]\[cq]s behavior can be modified via other options.
.PP
When using policies based on a branch\[cq]s available space, the base
path provided is used, not the full path to the file in question.
This means that filesystems mounted within the branch will not be
considered in the space calculations.
The reason is that doing so does not really work for non-path preserving
policies and can lead to non-obvious behaviors.
.PP
NOTE: While any policy can be assigned to a function or category, some
may not be very useful in practice.
For instance, \f[B]rand\f[R] (random) may be useful for file creation
(create) but could lead to very odd behavior if used for \f[C]chmod\f[R]
when there is more than one copy of the file.
.SS Functions and their Category classifications
.PP
.TS
tab(@);
lw(7.4n) lw(62.6n).
T{
Category
T}@T{
FUSE Functions
T}
_
T{
action
T}@T{
chmod, chown, link, removexattr, rename, rmdir, setxattr, truncate,
unlink, utimens
T}
T{
create
T}@T{
create, mkdir, mknod, symlink
T}
T{
search
T}@T{
access, getattr, getxattr, ioctl (directories), listxattr, open,
readlink
T}
T{
N/A
T}@T{
fchmod, fchown, futimens, ftruncate, fallocate, fgetattr, fsync, ioctl
(files), read, readdir, release, statfs, write, copy_file_range
T}
.TE
.PP
In cases where something may be searched for (such as a path to clone),
\f[B]getattr\f[R] will usually be used.
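.PP
The table can also be read as a lookup from function name to category.
The C sketch below encodes it as data; the names are hypothetical.
\f[C]ioctl\f[R] is left out because its category depends on whether it
targets a directory (search) or a file (N/A).
.IP
.nf
\f[C]
```c
#include <stddef.h>
#include <string.h>

/* The function-to-category table above, expressed as data. */
struct func_category {
    const char *func;
    const char *category;
};

static const struct func_category FUNC_CATEGORIES[] = {
    {"chmod", "action"}, {"chown", "action"}, {"link", "action"},
    {"removexattr", "action"}, {"rename", "action"}, {"rmdir", "action"},
    {"setxattr", "action"}, {"truncate", "action"}, {"unlink", "action"},
    {"utimens", "action"},
    {"create", "create"}, {"mkdir", "create"}, {"mknod", "create"},
    {"symlink", "create"},
    {"access", "search"}, {"getattr", "search"}, {"getxattr", "search"},
    {"listxattr", "search"}, {"open", "search"}, {"readlink", "search"},
};

/* Returns the category of a path-based function, or "N/A" for anything
   not in the table (handle-based functions such as read or write). */
static const char *category_of(const char *func)
{
    size_t n = sizeof FUNC_CATEGORIES / sizeof FUNC_CATEGORIES[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(FUNC_CATEGORIES[i].func, func) == 0)
            return FUNC_CATEGORIES[i].category;
    return "N/A";
}
```
\f[R]
.fi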
.SS Policies
.PP
A policy is the algorithm used to choose a branch or branches for a
function to work on, or more generally how the function behaves.
.PP
Any function in the \f[C]create\f[R] category will clone the relative
path if needed.
Some other functions (\f[C]rename\f[R], \f[C]link\f[R], \f[C]ioctl\f[R])
have special requirements or behaviors which you can read more about
below.
.SS Filtering
.PP
Most policies basically search branches and create a list of files /
paths for functions to work on.
The policy is responsible for filtering and sorting the branches.
Filters include \f[B]minfreespace\f[R], whether or not a branch is
mounted read-only, and the branch tagging (RO, NC, RW).
These filters are applied across most policies.
.IP \[bu] 2
No \f[B]search\f[R] function policies filter.
.IP \[bu] 2
All \f[B]action\f[R] function policies filter out branches which are
mounted \f[B]read-only\f[R] or tagged as \f[B]RO (read-only)\f[R].
.IP \[bu] 2
All \f[B]create\f[R] function policies filter out branches which are
mounted \f[B]read-only\f[R], tagged \f[B]RO (read-only)\f[R] or \f[B]NC
(no create)\f[R], or have available space less than
\f[C]minfreespace\f[R].
.PP
Policies may have their own additional filtering, such as those that
require existing paths to be present.
.PP
If all branches are filtered, an error will be returned.
Typically this is \f[B]EROFS\f[R] (read-only filesystem) or
\f[B]ENOSPC\f[R] (no space left on device), depending on the most recent
reason for filtering a branch.
\f[B]ENOENT\f[R] will be returned if no eligible branch is found.
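.PP
The \f[B]create\f[R] filters and the choice of error can be modelled as
below.
This is a hypothetical C sketch, not the mergerfs source.
Treating an \f[C]NC\f[R] branch as an EROFS reason is an assumption.
.IP
.nf
\f[C]
```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical branch model for illustration only. */
enum branch_mode { MODE_RW, MODE_RO, MODE_NC };

struct branch {
    const char *path;
    enum branch_mode mode;  /* tag from the branch list (=RW, =RO, =NC) */
    bool mounted_readonly;  /* the filesystem itself is mounted read-only */
    uint64_t avail;         /* available bytes */
};

/* Applies the create-category filters. Sets eligible[i] for each branch
   that survives and returns 0 if at least one does. Otherwise returns
   the errno for the most recent filtering reason (EROFS or ENOSPC), or
   ENOENT if no branch was filtered for either reason. */
static int filter_for_create(const struct branch *b, size_t n,
                             uint64_t minfreespace, bool *eligible)
{
    int err = ENOENT;
    bool any = false;
    for (size_t i = 0; i < n; i++) {
        eligible[i] = false;
        if (b[i].mounted_readonly || b[i].mode == MODE_RO ||
            b[i].mode == MODE_NC) {
            err = EROFS; /* NC mapped to EROFS: an assumption */
            continue;
        }
        if (b[i].avail < minfreespace) {
            err = ENOSPC;
            continue;
        }
        eligible[i] = true;
        any = true;
    }
    return any ? 0 : err;
}
```
\f[R]
.fi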
.PP
If \f[B]create\f[R], \f[B]mkdir\f[R], \f[B]mknod\f[R], or
\f[B]symlink\f[R] fail with \f[C]EROFS\f[R] or other fundamental errors,
mergerfs will mark any branch found to be read-only as such (i.e.,
will set the mode \f[C]RO\f[R]) and will rerun the policy and try again.
This is mostly for \f[C]ext4\f[R] filesystems that can suddenly become
read-only when they encounter an error.
.SS Path Preservation
.PP
Policies, as described below, fall into two basic classifications:
\f[C]path preserving\f[R] and \f[C]non-path preserving\f[R].
.PP
All policies which start with \f[C]ep\f[R] (\f[B]epff\f[R],
\f[B]eplfs\f[R], \f[B]eplus\f[R], \f[B]epmfs\f[R], \f[B]eprand\f[R]) are
\f[C]path preserving\f[R].
\f[C]ep\f[R] stands for \f[C]existing path\f[R].
.PP
A path preserving policy will only consider branches where the relative
path being accessed already exists.
.PP
When using non-path preserving policies, paths will be cloned to target
branches as necessary.
.PP
The \f[C]msp\f[R] or \f[C]most shared path\f[R] policies are defined as
\f[C]path preserving\f[R] for the purpose of controlling
\f[C]link\f[R] and \f[C]rename\f[R]\[cq]s behaviors, since
\f[C]ignorepponrename\f[R] is available to disable that behavior.
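.PP
The existence test behind path preservation can be sketched in C as
follows.
The helper name is hypothetical and the choice of \f[C]lstat(2)\f[R] is
an assumption.
For a create, the relative path tested would presumably be the parent
directory of the new file.
.IP
.nf
\f[C]
```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: does `relpath` exist on the branch rooted at
   `basepath`? A path preserving (ep*) policy would keep only branches for
   which this returns true. lstat(2) is used so that a dangling symlink
   still counts as existing (an assumption). */
static bool branch_has_relpath(const char *basepath, const char *relpath)
{
    char full[4096]; /* PATH_MAX on Linux */
    int len = snprintf(full, sizeof full, "%s/%s", basepath, relpath);
    if (len < 0 || (size_t)len >= sizeof full)
        return false;
    struct stat st;
    return lstat(full, &st) == 0;
}
```
\f[R]
.fi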
.SS Policy descriptions
.PP
A policy\[cq]s behavior differs, as mentioned above, based on the
function it is used with.
Sometimes it really might not make sense to even offer certain policies
because they are literally the same as others, but it makes things a bit
more uniform.
.PP
.TS
tab(@);
lw(16.2n) lw(53.8n).
T{
Policy
T}@T{
Description
T}
_
T{
all
T}@T{
Search: For \f[B]mkdir\f[R], \f[B]mknod\f[R], and \f[B]symlink\f[R] it
will apply to all branches.
\f[B]create\f[R] works like \f[B]ff\f[R].
T}
T{
epall (existing path, all)
T}@T{
For \f[B]mkdir\f[R], \f[B]mknod\f[R], and \f[B]symlink\f[R] it will
apply to all found.
\f[B]create\f[R] works like \f[B]epff\f[R] (but more expensive because
it does not stop after finding a valid branch).
T}
T{
epff (existing path, first found)
T}@T{
Given the order of the branches, as defined at mount time or configured
at runtime, act on the first one found where the relative path exists.
T}
T{
eplfs (existing path, least free space)
T}@T{
Of all the branches on which the relative path exists, choose the branch
with the least free space.
T}
T{
eplus (existing path, least used space)
T}@T{
Of all the branches on which the relative path exists, choose the branch
with the least used space.
T}
T{
epmfs (existing path, most free space)
T}@T{
Of all the branches on which the relative path exists, choose the branch
with the most free space.
T}
T{
eppfrd (existing path, percentage free random distribution)
T}@T{
Like \f[B]pfrd\f[R] but limited to existing paths.
T}
T{
eprand (existing path, random)
T}@T{
Calls \f[B]epall\f[R] and then randomizes.
Returns 1 branch.
T}
T{
ff (first found)
T}@T{
Given the order of the branches, as defined at mount time or configured
at runtime, act on the first one found.
T}
T{
lfs (least free space)
T}@T{
Pick the branch with the least available free space.
T}
T{
lus (least used space)
T}@T{
Pick the branch with the least used space.
T}
T{
mfs (most free space)
T}@T{
Pick the branch with the most available free space.
T}
T{
msplfs (most shared path, least free space)
T}@T{
Like \f[B]eplfs\f[R] but if it fails to find a branch it will try again
with the parent directory.
Continues this pattern until one is found.
T}
T{
msplus (most shared path, least used space)
T}@T{
Like \f[B]eplus\f[R] but if it fails to find a branch it will try again
with the parent directory.
Continues this pattern until one is found.
T}
T{
mspmfs (most shared path, most free space)
T}@T{
Like \f[B]epmfs\f[R] but if it fails to find a branch it will try again
with the parent directory.
Continues this pattern until one is found.
T}
T{
msppfrd (most shared path, percentage free random distribution)
T}@T{
Like \f[B]eppfrd\f[R] but if it fails to find a branch it will try again
with the parent directory.
Continues this pattern until one is found.
T}
T{
newest
T}@T{
Pick the file / directory with the largest mtime.
T}
T{
pfrd (percentage free random distribution)
T}@T{
Chooses a branch at random with the likelihood of selection based on a
branch\[cq]s available space relative to the total.
T}
T{
rand (random)
T}@T{
Calls \f[B]all\f[R] and then randomizes.
Returns 1 branch.
T}
.TE
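.PP
Most of the selection rules in the table reduce to a scan over the
branches that survived filtering.
\f[B]pfrd\f[R] is a weighted random pick.
The C sketch below is illustrative only: the names are hypothetical, ties
are assumed to go to the earlier branch, and the random number is passed
in so the result is deterministic.
The \f[C]ep\f[R] variants would apply the same scans only to the branches
where the relative path exists.
.IP
.nf
\f[C]
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-branch figures, e.g. derived from statvfs(2). */
struct branch_space {
    uint64_t avail; /* bytes available to unprivileged users */
    uint64_t used;  /* bytes in use */
};

/* Each function returns the index of the chosen branch among `n`
   branches that already passed filtering, or -1 if there are none.
   Ties go to the earlier branch (an assumption). */

static long pick_ff(size_t n)
{
    return n > 0 ? 0 : -1; /* branches are kept in their configured order */
}

static long pick_mfs(const struct branch_space *b, size_t n)
{
    long best = -1;
    for (size_t i = 0; i < n; i++)
        if (best < 0 || b[i].avail > b[best].avail)
            best = (long)i;
    return best;
}

static long pick_lfs(const struct branch_space *b, size_t n)
{
    long best = -1;
    for (size_t i = 0; i < n; i++)
        if (best < 0 || b[i].avail < b[best].avail)
            best = (long)i;
    return best;
}

static long pick_lus(const struct branch_space *b, size_t n)
{
    long best = -1;
    for (size_t i = 0; i < n; i++)
        if (best < 0 || b[i].used < b[best].used)
            best = (long)i;
    return best;
}

/* pfrd: random pick weighted by each branch's share of the total
   available space. `r` must be uniform in [0, 1). */
static long pick_pfrd(const struct branch_space *b, size_t n, double r)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += b[i].avail;
    if (total == 0)
        return -1;
    double target = r * (double)total;
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += (double)b[i].avail;
        if (target < acc)
            return (long)i;
    }
    return (long)n - 1; /* only reached through floating point rounding */
}
```
\f[R]
.fi
.PP
A caller could pass \f[C]rand() / (RAND_MAX + 1.0)\f[R] as \f[C]r\f[R]
to get a value in [0, 1).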
.PP
\f[B]NOTE:\f[R] If you are using an underlying filesystem that reserves
blocks, such as ext2, ext3, or ext4, be aware that mergerfs respects the
reservation by using \f[C]f_bavail\f[R] (number of free blocks for
unprivileged users) rather than \f[C]f_bfree\f[R] (number of free
blocks) in policy calculations.
\f[B]df\f[R] does NOT use \f[C]f_bavail\f[R]; it uses \f[C]f_bfree\f[R].
Direct comparisons between \f[B]df\f[R] output and mergerfs\[cq]
policies are therefore not appropriate.
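.PP
Both counts come from \f[C]statvfs(2)\f[R] in units of the fragment size
\f[C]f_frsize\f[R].
This C sketch, with a hypothetical function name, converts them to bytes
so the effect of reserved blocks on a branch can be inspected.
.IP
.nf
\f[C]
```c
#include <stdint.h>
#include <sys/statvfs.h>

/* Hypothetical helper: report in bytes the space that policies treat as
   free (f_bavail, excluding blocks reserved for root) and the raw free
   space (f_bfree). Returns 0 on success, -1 if statvfs(2) fails. */
static int branch_free_bytes(const char *path, uint64_t *avail,
                             uint64_t *bfree)
{
    struct statvfs sv;
    if (statvfs(path, &sv) != 0)
        return -1;
    *avail = (uint64_t)sv.f_bavail * sv.f_frsize;
    *bfree = (uint64_t)sv.f_bfree * sv.f_frsize;
    return 0;
}
```
\f[R]
.fi
.PP
On a branch with reserved blocks, \f[C]bfree - avail\f[R] is the size of
that reservation.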
.SS Defaults
.PP
.TS
tab(@);
l l.
T{
Category
T}@T{
Policy
T}
_
T{
action
T}@T{
epall
T}
T{
create
T}@T{
epmfs
T}
T{
search
T}@T{
ff
T}
.TE
.SS func.readdir
.PP
examples: \f[C]func.readdir=seq\f[R], \f[C]func.readdir=cor:4\f[R]
.PP
\f[C]readdir\f[R] has policies to control how it manages reading
directory content.
.PP
.TS
tab(@);
lw(26.7n) lw(43.3n).
T{
Policy
T}@T{
Description
T}
_
T{
seq
T}@T{
\[lq]sequential\[rq] : Iterate over branches in the order defined.
This is the default and traditional behavior found prior to the readdir
policy introduction.
T}
T{
cosr
T}@T{
\[lq]concurrent open, sequential read\[rq] : Concurrently open branch
directories using a thread pool and process them in order of definition.
This keeps memory and CPU usage low while also reducing the time spent
waiting on branches to respond.
Number of threads defaults to the number of logical cores.
Can be overridden via the syntax \f[C]func.readdir=cosr:N\f[R] where
\f[C]N\f[R] is the number of threads.
T}
T{
cor
T}@T{
\[lq]concurrent open and read\[rq] : Concurrently open branch
directories and immediately start reading their contents using a thread
pool.
This will result in slightly higher memory and CPU usage but reduced
latency, particularly when using higher latency / slower speed network
filesystem branches.
Unlike \f[C]seq\f[R] and \f[C]cosr\f[R], the order of files could change
due to the async nature of the thread pool.
Number of threads defaults to the number of logical cores.
Can be overridden via the syntax \f[C]func.readdir=cor:N\f[R] where
\f[C]N\f[R] is the number of threads.
T}
.TE
.PP
Keep in mind that \f[C]readdir\f[R] mostly just provides a list of file
names in a directory and possibly some basic metadata about said files.
To know details about the files, as one would see from commands like
\f[C]find\f[R] or \f[C]ls\f[R], it is necessary to call \f[C]stat\f[R] on
each file, which is controlled by \f[C]func.getattr\f[R].
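.PP
The \f[C]seq\f[R] ordering can be sketched as below.
In this hypothetical C example, branch directories are read in the order
given and a name already seen in an earlier branch is not listed again.
That de-duplication is an assumption about how the union listing is
presented.
\f[C]cosr\f[R] and \f[C]cor\f[R] would additionally open, or open and
read, the directories concurrently with a thread pool; that is omitted
for brevity.
.IP
.nf
\f[C]
```c
#define _GNU_SOURCE
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

static void free_names(char **names, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(names[i]);
    free(names);
}

/* Appends a copy of `name` unless it is already present. A linear search
   keeps the sketch short; a real implementation would use a hash set. */
static int add_unique(char ***names, size_t *count, size_t *cap,
                      const char *name)
{
    for (size_t i = 0; i < *count; i++)
        if (strcmp((*names)[i], name) == 0)
            return 0; /* already listed from an earlier branch */
    if (*count == *cap) {
        size_t newcap = *cap ? *cap * 2 : 16;
        char **tmp = realloc(*names, newcap * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        *names = tmp;
        *cap = newcap;
    }
    char *copy = strdup(name);
    if (copy == NULL)
        return -1;
    (*names)[(*count)++] = copy;
    return 0;
}

/* Hypothetical seq-style listing: read branch directories in order,
   skip branches lacking the directory, and merge entry names. Returns
   the number of names stored in *names_out, or -1 on allocation
   failure. */
static long merged_readdir(const char *const *dirs, size_t ndirs,
                           char ***names_out)
{
    char **names = NULL;
    size_t count = 0, cap = 0;
    for (size_t d = 0; d < ndirs; d++) {
        DIR *dp = opendir(dirs[d]);
        if (dp == NULL)
            continue;
        struct dirent *de;
        while ((de = readdir(dp)) != NULL) {
            if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
                continue;
            if (add_unique(&names, &count, &cap, de->d_name) != 0) {
                closedir(dp);
                free_names(names, count);
                return -1;
            }
        }
        closedir(dp);
    }
    *names_out = names;
    return (long)count;
}
```
\f[R]
.fi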
.SS ioctl
.PP
When \f[C]ioctl\f[R] is used with an open file, it will use the file
handle which was created at the original \f[C]open\f[R] call.
However, when using \f[C]ioctl\f[R] with a directory, mergerfs will use
the \f[C]open\f[R] policy to find the directory to act on.
.SS rename & link
.PP
\f[B]NOTE:\f[R] If you\[cq]re receiving errors from software when files
are moved / renamed / linked, consider one of the following:
changing the create policy to one which is \f[B]not\f[R] path preserving,
enabling \f[C]ignorepponrename\f[R], or contacting the author of the
offending software and requesting that \f[C]EXDEV\f[R] (cross device /
improper link) be properly handled.
.PP
\f[C]rename\f[R] and \f[C]link\f[R] are tricky functions in a union
filesystem.
\f[C]rename\f[R] only works within a single filesystem or device.
If a rename cannot be done atomically because the source and
destination paths exist on different mount points, it will return
\f[B]-1\f[R] with \f[B]errno = EXDEV\f[R] (cross device / improper
link).
So if a \f[C]rename\f[R]\[cq]s source and target are on different
filesystems within the pool, it creates an issue.
.PP
Originally mergerfs would return EXDEV whenever a rename was requested
which was cross directory in any way.
This made the code simple and was technically compliant with POSIX
requirements.
However, many applications fail to handle EXDEV at all and treat it as a
normal error or otherwise handle it poorly.
Such apps include: gvfsd-fuse v1.20.3 and prior, Finder / CIFS/SMB
client in Apple OSX 10.9+, NZBGet, and Samba\[cq]s recycling bin feature.
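.PP
Handling EXDEV properly usually means falling back to copying the file
and removing the source, which is what \f[C]mv\f[R] does.
The C sketch below, with hypothetical function names, shows the pattern.
It is deliberately minimal and does not preserve ownership, permissions,
timestamps, or xattrs.
The 0644 mode for the new file is an arbitrary choice.
.IP
.nf
\f[C]
```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Copies the bytes of `src` into `dst`, creating or truncating it.
   Returns 0 on success, -1 on any read, write, or close error. */
static int copy_file(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }
    char buf[65536];
    ssize_t n = 0;
    int rv = 0;
    while (rv == 0 && (n = read(in, buf, sizeof buf)) > 0) {
        for (ssize_t off = 0; off < n;) { /* handle short writes */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                rv = -1;
                break;
            }
            off += w;
        }
    }
    if (n < 0)
        rv = -1;
    close(in);
    if (close(out) != 0)
        rv = -1;
    return rv;
}

/* Moves `src` to `dst`: try rename(2) first and fall back to copy +
   unlink only when the rename fails with EXDEV. */
static int move_file(const char *src, const char *dst)
{
    if (rename(src, dst) == 0)
        return 0;
    if (errno != EXDEV)
        return -1;
    if (copy_file(src, dst) != 0)
        return -1; /* dst may be left partially written */
    return unlink(src);
}
```
\f[R]
.fi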
.PP
As a result, a compromise was made in order to get most software to work
while still obeying mergerfs\[cq] policies.
Below is the basic logic.
.IP \[bu] 2
If using a \f[B]create\f[R] policy which tries to preserve directory
paths (epff, eplfs, eplus, epmfs)
.RS 2
.IP \[bu] 2
Using the \f[B]rename\f[R] policy get the list of files to rename
.IP \[bu] 2
For each file attempt rename:
.RS 2
.IP \[bu] 2
If failure with ENOENT (no such file or directory) run \f[B]create\f[R]
policy
.IP \[bu] 2
If create policy returns the same branch as currently evaluating then
clone the path
.IP \[bu] 2
Re-attempt rename
.RE
.IP \[bu] 2
If \f[B]any\f[R] of the renames succeed the higher level rename is
considered a success
.IP \[bu] 2
If \f[B]no\f[R] renames succeed the first error encountered will be
returned
.IP \[bu] 2
On success:
.RS 2
.IP \[bu] 2
Remove the target from all branches with no source file
.IP \[bu] 2
Remove the source from all branches which failed to rename
.RE
.RE
.IP \[bu] 2
If using a \f[B]create\f[R] policy which does \f[B]not\f[R] try to
preserve directory paths
.RS 2
.IP \[bu] 2
Using the \f[B]rename\f[R] policy get the list of files to rename
.IP \[bu] 2
Using the \f[B]getattr\f[R] policy get the target path
.IP \[bu] 2
For each file attempt rename:
.RS 2
.IP \[bu] 2
If the source branch != target branch:
.RS 2
.IP \[bu] 2
Clone target path from target branch to source branch
.RE
.IP \[bu] 2
Rename
.RE
.IP \[bu] 2
If \f[B]any\f[R] of the renames succeed the higher level rename is
considered a success
.IP \[bu] 2
If \f[B]no\f[R] renames succeed the first error encountered will be
returned
.IP \[bu] 2
On success:
.RS 2
.IP \[bu] 2
Remove the target from all branches with no source file
.IP \[bu] 2
Remove the source from all branches which failed to rename
.RE
.RE
.PP
The removals are subject to normal entitlement checks.
.PP
The above behavior will help minimize the likelihood of EXDEV being
returned, but it will still be possible.
.PP
\f[B]link\f[R] uses the same strategy but without the removals.
.SS statfs / statvfs
.PP
statvfs (http://linux.die.net/man/2/statvfs) normalizes the source
filesystems based on the fragment size and sums the number of adjusted
blocks and inodes.
This means you will see the combined space of all sources: total, used,
and free.
The sources, however, are deduplicated based on the filesystem, so
multiple sources on the same drive will not result in double counting
its space.
Other filesystems mounted further down the tree of the branch will not
be included when checking the mount\[cq]s stats.
.PP
The options \f[C]statfs\f[R] and \f[C]statfs_ignore\f[R] can be used to
modify \f[C]statfs\f[R] behavior.
.SS flush-on-close
.PP
https://lkml.kernel.org/linux-fsdevel/20211024132607.1636952-1-amir73il\[at]gmail.com/T/
.PP
By default FUSE would issue a flush before the release of a file
descriptor.
This was considered a bit aggressive, so a feature was added to give the
FUSE server the ability to choose when that happens.
.PP
Options:
.IP \[bu] 2
always
.IP \[bu] 2
never
.IP \[bu] 2
opened-for-write
.PP
For now it defaults to \[lq]opened-for-write\[rq], which is less
aggressive than the behavior before this feature was added.
It should not be a problem because the flush is really only relevant
when a file is written to.
Because flush is irrelevant for many filesystems, a branch-specific flag
may be added in the future so that only files opened on a specific branch
would be flushed on close.
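.PP
The three settings amount to a decision made when a file descriptor is
closed.
In this hypothetical C sketch, treating \f[C]O_WRONLY\f[R] and
\f[C]O_RDWR\f[R] as \[lq]opened for write\[rq] is an assumption about
what the option checks.
.IP
.nf
\f[C]
```c
#include <fcntl.h>
#include <stdbool.h>

/* The three flush-on-close settings as a hypothetical enum. */
enum flush_on_close { FOC_ALWAYS, FOC_NEVER, FOC_OPENED_FOR_WRITE };

/* Decides whether a flush should be requested when a descriptor opened
   with `open_flags` is closed. */
static bool should_flush(enum flush_on_close mode, int open_flags)
{
    switch (mode) {
    case FOC_ALWAYS:
        return true;
    case FOC_NEVER:
        return false;
    case FOC_OPENED_FOR_WRITE:
    default:
        /* O_WRONLY and O_RDWR both permit writing. */
        return (open_flags & O_ACCMODE) != O_RDONLY;
    }
}
```
\f[R]
.fi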
.SH ERROR HANDLING
.PP
POSIX filesystem functions offer a single return code.
This complicates the handling of multiple branches as mergerfs does.
mergerfs tries to handle errors in a way that would generally return
meaningful values for that particular function.
.SS chmod, chown, removexattr, setxattr, truncate, utimens
.IP "1)" 3
if no error: return 0 (success)
.IP "2)" 3
if no successes: return first error
.IP "3)" 3
if one of the files acted on was the same as the related search
function: return its value
.IP "4)" 3
return 0 (success)
.PP
While doing this increases the complexity and cost of error handling,
particularly step 3, it probably provides the most reasonable return
value.
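.PP
The four steps above can be written as a small function over per-branch
results.
In this hypothetical C sketch, each result holds 0 or a positive errno
value, plus a flag marking the branch the related search policy would
have chosen (for example the \f[C]getattr\f[R] result for
\f[C]chmod\f[R]).
.IP
.nf
\f[C]
```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* One result per branch acted on: 0 on success or an errno value such as
   EACCES, and whether this is the branch the search policy would pick. */
struct branch_result {
    int err;
    bool is_search_branch;
};

static int aggregate_action(const struct branch_result *r, size_t n)
{
    int first_err = 0;
    bool any_success = false;
    for (size_t i = 0; i < n; i++) {
        if (r[i].err == 0)
            any_success = true;
        else if (first_err == 0)
            first_err = r[i].err;
    }
    if (first_err == 0)
        return 0;                 /* 1) no error */
    if (!any_success)
        return first_err;         /* 2) no successes */
    for (size_t i = 0; i < n; i++)
        if (r[i].is_search_branch)
            return r[i].err;      /* 3) mirror the search branch */
    return 0;                     /* 4) otherwise success */
}
```
\f[R]
.fi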
.SS unlink, rmdir
.IP "1)" 3
if no errors: return 0 (success)
.IP "2)" 3
return first error
.PP
Older versions of mergerfs would return success if any success occurred.
However, unlink and rmdir have downstream assumptions that, while not
impossible to violate, can confuse some software when they are.
.SS others
.PP
For search functions there is always a single thing acted on, and as
such whatever value comes from the single function call is returned.
.PP
The create functions \f[C]mkdir\f[R], \f[C]mknod\f[R], and
\f[C]symlink\f[R] do not return a file descriptor and therefore
can have \f[C]all\f[R] or \f[C]epall\f[R] policies.
For these, mergerfs returns success if any of the calls succeed and an
error otherwise.
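.PP
The two remaining rules differ only in whether one failure or one success
decides the outcome.
In this hypothetical C sketch, both return the first error when they fail.
For the create functions, returning the first error is an assumption,
since the text above only says \[lq]an error\[rq].
.IP
.nf
\f[C]
```c
#include <errno.h>
#include <stddef.h>

/* unlink, rmdir: success only if every branch succeeded; otherwise the
   first error (errs[i] is 0 or an errno value such as ENOENT). */
static int aggregate_all_must_succeed(const int *errs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (errs[i] != 0)
            return errs[i];
    return 0;
}

/* mkdir, mknod, symlink under all/epall: success if any branch
   succeeded; otherwise the first error. */
static int aggregate_any_success(const int *errs, size_t n)
{
    int first_err = 0;
    for (size_t i = 0; i < n; i++) {
        if (errs[i] == 0)
            return 0;
        if (first_err == 0)
            first_err = errs[i];
    }
    return first_err;
}
```
\f[R]
.fi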
.SH INSTALL
.PP
https://github.com/trapexit/mergerfs/releases
.PP
If your distribution\[cq]s package manager includes mergerfs, check if
the version is up to date.
If it is out of date, it is recommended to use the latest release found
on the release page.
Details for common distros are below.
.SS Debian
.PP
Most Debian installs are of a stable branch and therefore do not have
the most up to date software.
While mergerfs is available via \f[C]apt\f[R], it is suggested that users
install the most recent version available from the releases
page (https://github.com/trapexit/mergerfs/releases).
.SS prebuilt deb
.IP
.nf
\f[C]
wget https://github.com/trapexit/mergerfs/releases/download/<ver>/mergerfs_<ver>.debian-<rel>_<arch>.deb
sudo dpkg -i mergerfs_<ver>.debian-<rel>_<arch>.deb
\f[R]
.fi
.SS apt
.IP
.nf
\f[C]
sudo apt install -y mergerfs
\f[R]
.fi
.SS Ubuntu
.PP
Most Ubuntu installs are of a stable branch and therefore do not have
the most up to date software.
While mergerfs is available via \f[C]apt\f[R], it is suggested that users
install the most recent version available from the releases
page (https://github.com/trapexit/mergerfs/releases).
.SS prebuilt deb
.IP
.nf
\f[C]
wget https://github.com/trapexit/mergerfs/releases/download/<ver>/mergerfs_<ver>.ubuntu-<rel>_<arch>.deb
sudo dpkg -i mergerfs_<ver>.ubuntu-<rel>_<arch>.deb
\f[R]
.fi
.SS apt
.IP
.nf
\f[C]
sudo apt install -y mergerfs
\f[R]
.fi
.SS Raspberry Pi OS
.PP
Effectively the same as Debian or Ubuntu.
.SS Fedora
.IP
.nf
\f[C]
wget https://github.com/trapexit/mergerfs/releases/download/<ver>/mergerfs-<ver>.fc<rel>.<arch>.rpm
sudo rpm -i mergerfs-<ver>.fc<rel>.<arch>.rpm
\f[R]
.fi
.SS CentOS / Rocky
.IP
.nf
\f[C]
wget https://github.com/trapexit/mergerfs/releases/download/<ver>/mergerfs-<ver>.el<rel>.<arch>.rpm
sudo rpm -i mergerfs-<ver>.el<rel>.<arch>.rpm
\f[R]
.fi
.SS ArchLinux
.IP "1." 3
Set up AUR
.IP "2." 3
Install \f[C]mergerfs\f[R]
.SS Other
.PP
Static binaries are provided for situations where native packages are
unavailable.
.IP
.nf
\f[C]
wget https://github.com/trapexit/mergerfs/releases/download/<ver>/mergerfs-static-linux_<arch>.tar.gz
sudo tar xvf mergerfs-static-linux_<arch>.tar.gz -C /
\f[R]
.fi
.SH BUILD
.PP
\f[B]NOTE:\f[R] Prebuilt packages are recommended for most users and can
be found at: https://github.com/trapexit/mergerfs/releases
.PP
\f[B]NOTE:\f[R] Only tagged releases are supported.
\f[C]master\f[R] and other branches should be considered works in
progress.
.PP
First get the code from github (https://github.com/trapexit/mergerfs).
.IP
.nf
\f[C]
$ git clone https://github.com/trapexit/mergerfs.git
$ # or
$ wget https://github.com/trapexit/mergerfs/releases/download/<ver>/mergerfs-<ver>.tar.gz
\f[R]
.fi
.SS Debian / Ubuntu
.IP
.nf
\f[C]
$ cd mergerfs
$ sudo tools/install-build-pkgs
$ make deb
$ sudo dpkg -i ../mergerfs_<version>_<arch>.deb
\f[R]
.fi
.SS RHEL / CentOS / Rocky / Fedora
.IP
.nf
\f[C]
$ su -
# cd mergerfs
# tools/install-build-pkgs
# make rpm
# rpm -i rpmbuild/RPMS/<arch>/mergerfs-<version>.<arch>.rpm
\f[R]
.fi
.SS Generic
.PP
Have git, g++, make, and python installed.
.IP
.nf
\f[C]
$ cd mergerfs
$ make
$ sudo make install
\f[R]
.fi
.SS Build options
.IP
.nf
\f[C]
$ make help
usage: make

make USE_XATTR=0 - build program without xattrs functionality
make STATIC=1 - build static binary
make LTO=1 - build with link time optimization
\f[R]
.fi
.SH UPGRADE
.PP
mergerfs can be upgraded live by mounting on top of the previous
instance.
Simply install the new version of mergerfs and follow the instructions
below.
.PP
Run mergerfs again or, if using \f[C]/etc/fstab\f[R], call for it to
mount again.
Existing open files and such will continue to work fine, though they
will not see runtime changes since any such change would apply to the new
mount.
If you plan on changing settings with the new mount, you should apply
those before mounting the new version.
.IP
.nf
\f[C]
$ sudo mount /mnt/mergerfs
$ mount | grep mergerfs
media on /mnt/mergerfs type fuse.mergerfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other)
media on /mnt/mergerfs type fuse.mergerfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other)
\f[R]
.fi
.PP
A problem with this approach is that the underlying instance will
continue to run even if the software using it stops or is restarted.
To work around this you can use a \[lq]lazy umount\[rq].
Before mounting the new instance of mergerfs over the top of the mount
point, issue: \f[C]umount -l <mergerfs_mountpoint>\f[R].
Or you can let mergerfs do it by setting the option
\f[C]lazy-umount-mountpoint=true\f[R].
.SH RUNTIME INTERFACES
.SS RUNTIME CONFIG
.SS .mergerfs pseudo file
.IP
.nf
\f[C]
<mountpoint>/.mergerfs
\f[R]
.fi
.PP
There is a pseudo file available at the mount point which allows for the
runtime modification of certain \f[B]mergerfs\f[R] options.
The file will not show up in \f[B]readdir\f[R] but can be
\f[B]stat\f[R]\[cq]ed and manipulated via
{list,get,set}xattrs (http://linux.die.net/man/2/listxattr) calls.
.PP
Any changes made at runtime are \f[B]not\f[R] persisted.
If you wish for values to persist, they must be included as options
wherever you configure the mounting of mergerfs (/etc/fstab).
.SS Keys
.PP
Use \f[C]getfattr -d /mountpoint/.mergerfs\f[R] or
\f[C]xattr -l /mountpoint/.mergerfs\f[R] to see all supported keys.
Some are informational and therefore read-only.
\f[C]setxattr\f[R] will return EINVAL (invalid argument) on read-only
keys.
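.PP
Programs can read the same keys with \f[C]getxattr(2)\f[R] on the pseudo
file.
The following C sketch uses a hypothetical helper name.
.IP
.nf
\f[C]
```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical helper: read runtime option `key` (for example
   "user.mergerfs.branches") from <mountpoint>/.mergerfs into `value`
   and NUL-terminate it. Returns the value length, or -1 on failure;
   errno is set by getxattr(2), e.g. ENOENT if the file does not exist. */
static ssize_t mergerfs_get_option(const char *mountpoint, const char *key,
                                   char *value, size_t size)
{
    char ctl[4096];
    int len = snprintf(ctl, sizeof ctl, "%s/.mergerfs", mountpoint);
    if (size == 0 || len < 0 || (size_t)len >= sizeof ctl)
        return -1;
    ssize_t n = getxattr(ctl, key, value, size - 1);
    if (n >= 0)
        value[n] = 0; /* getxattr does not terminate the value */
    return n;
}
```
\f[R]
.fi
.PP
For example, with mergerfs mounted at \f[C]/mnt/mergerfs\f[R], reading
\f[C]user.mergerfs.category.search\f[R] would be expected to return a
policy name such as \f[C]ff\f[R].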
.SS Values
.PP
Same as the command line.
.SS user.mergerfs.branches
.PP
Used to query or modify the list of branches.
When modifying, there are several shortcuts to ease manipulation of the
list.
.PP
.TS
tab(@);
l l.
T{
Value
T}@T{
Description
T}
_
T{
[list]
T}@T{
set
T}
T{
+<[list]
T}@T{
prepend
T}
T{
+>[list]
T}@T{
append
T}
T{
-[list]
T}@T{
remove all values provided
T}
T{
-<
T}@T{
remove first in list
T}
T{
->
T}@T{
remove last in list
T}
.TE
.PP
\f[C]xattr -w user.mergerfs.branches +</mnt/drive3 /mnt/pool/.mergerfs\f[R]
.PP
The \f[C]=NC\f[R], \f[C]=RO\f[R], \f[C]=RW\f[R] syntax works just as on
the command line.
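.PP
From a program, the \f[C]xattr -w\f[R] command above corresponds to a
single \f[C]setxattr(2)\f[R] call on the pseudo file.
In this C sketch the helper name is hypothetical.
The value uses the same syntax as the table, so \f[C]+>/mnt/drive3\f[R]
would append that branch.
.IP
.nf
\f[C]
```c
#include <stdio.h>
#include <string.h>
#include <sys/xattr.h>

/* Hypothetical helper: set runtime option `key` through
   <mountpoint>/.mergerfs. For branches, the value uses the shortcut
   syntax, e.g. "+>/mnt/drive3" to append. Changes are not persisted.
   Returns 0 on success or -1 with errno set (EINVAL for read-only keys). */
static int mergerfs_set_option(const char *mountpoint, const char *key,
                               const char *value)
{
    char ctl[4096];
    int len = snprintf(ctl, sizeof ctl, "%s/.mergerfs", mountpoint);
    if (len < 0 || (size_t)len >= sizeof ctl)
        return -1;
    return setxattr(ctl, key, value, strlen(value), 0);
}
```
\f[R]
.fi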
.SS Example
.IP
.nf
\f[C]
[trapexit:/mnt/mergerfs] $ getfattr -d .mergerfs
user.mergerfs.branches=\[dq]/mnt/a=RW:/mnt/b=RW\[dq]
user.mergerfs.minfreespace=\[dq]4294967295\[dq]
user.mergerfs.moveonenospc=\[dq]false\[dq]
\&...

[trapexit:/mnt/mergerfs] $ getfattr -n user.mergerfs.category.search .mergerfs
user.mergerfs.category.search=\[dq]ff\[dq]

[trapexit:/mnt/mergerfs] $ setfattr -n user.mergerfs.category.search -v newest .mergerfs
[trapexit:/mnt/mergerfs] $ getfattr -n user.mergerfs.category.search .mergerfs
user.mergerfs.category.search=\[dq]newest\[dq]
\f[R]
.fi
.SS file / directory xattrs
.PP
While they won\[cq]t show up when using \f[C]getfattr\f[R],
\f[B]mergerfs\f[R] offers a number of special xattrs to query
information about the files served.
To access the values you will need to issue a
getxattr (http://linux.die.net/man/2/getxattr) for one of the following:
.IP \[bu] 2
\f[B]user.mergerfs.basepath\f[R]: the base mount point for the file
given the current getattr policy
.IP \[bu] 2
\f[B]user.mergerfs.relpath\f[R]: the relative path of the file from the
perspective of the mount point
.IP \[bu] 2
\f[B]user.mergerfs.fullpath\f[R]: the full path of the original file
given the getattr policy
.IP \[bu] 2
\f[B]user.mergerfs.allpaths\f[R]: a NUL (`\[rs]0') separated list of
full paths to all files found
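.PP
Because \f[B]user.mergerfs.allpaths\f[R] packs several paths into one
value, a caller has to split it.
The C sketch below, with hypothetical names, walks such a buffer.
It assumes the caller reserved one extra byte after the value and set it
to NUL.
.IP
.nf
\f[C]
```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: call `fn` for each path in a NUL-separated list
   such as the value of user.mergerfs.allpaths. `len` is the length
   returned by getxattr(2); buf[len] must be a NUL byte so the last entry
   is terminated even without a trailing separator. Empty entries are
   skipped. Returns the number of paths found; `fn` may be NULL. */
static size_t for_each_path(const char *buf, size_t len,
                            void (*fn)(const char *path, void *ctx),
                            void *ctx)
{
    size_t count = 0;
    for (size_t i = 0; i < len;) {
        size_t plen = strnlen(buf + i, len - i);
        if (plen > 0) {
            if (fn != NULL)
                fn(buf + i, ctx);
            count++;
        }
        i += plen + 1; /* skip the path and its separator */
    }
    return count;
}
```
\f[R]
.fi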
|
|
.SS SIGNALS
|
|
.IP \[bu] 2
|
|
USR1: This will cause mergerfs to send invalidation notifications to the
|
|
kernel for all files.
|
|
This will cause all unused files to be released from memory.
|
|
.IP \[bu] 2
|
|
USR2: Trigger a general cleanup of currently unused memory.
|
|
A more thorough version of what happens every \[ti]15 minutes.
.SS IOCTLS
.PP
Found in \f[C]fuse_ioctl.cpp\f[R]:
.IP
.nf
\f[C]
#include <sys/ioctl.h> /* _IO and _IOWR */
typedef char IOCTL_BUF[4096];
#define IOCTL_APP_TYPE 0xDF
#define IOCTL_FILE_INFO _IOWR(IOCTL_APP_TYPE,0,IOCTL_BUF)
#define IOCTL_GC _IO(IOCTL_APP_TYPE,1)
#define IOCTL_GC1 _IO(IOCTL_APP_TYPE,2)
#define IOCTL_INVALIDATE_ALL_NODES _IO(IOCTL_APP_TYPE,3)
\f[R]
.fi
.IP \[bu] 2
IOCTL_FILE_INFO: Returns the same information as the \[lq]file /
directory xattrs\[rq] described above.
Use a buffer size of 4096 bytes: pass in one of the strings
\[lq]basepath\[rq], \[lq]relpath\[rq], \[lq]fullpath\[rq], or
\[lq]allpaths\[rq] and the result is returned in the same buffer.
.IP \[bu] 2
IOCTL_GC: Triggers a thorough garbage collection of excess memory.
Same as SIGUSR2.
.IP \[bu] 2
IOCTL_GC1: Triggers a simple garbage collection of excess memory.
This is the same collection that normally happens every \[ti]15 minutes.
.IP \[bu] 2
IOCTL_INVALIDATE_ALL_NODES: Same as SIGUSR1.
Sends invalidation notifications to the kernel for all files, causing
unused files to be released from memory.
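.PP
The C sketch below shows one way to issue \f[C]IOCTL_FILE_INFO\f[R], assuming the request is made on a file descriptor opened on a file inside the mergerfs mount.
It repeats the relevant definitions from above so it compiles on its own; \f[C]mergerfs_file_info\f[R] is a hypothetical helper.
On a file outside mergerfs the \f[C]ioctl\f[R] call should simply fail, typically with ENOTTY.
.IP
.nf
\f[C]
```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Definitions repeated from the listing above. */
typedef char IOCTL_BUF[4096];
#define IOCTL_APP_TYPE 0xDF
#define IOCTL_FILE_INFO _IOWR(IOCTL_APP_TYPE,0,IOCTL_BUF)

/* Hypothetical helper: ask mergerfs for "basepath", "relpath", "fullpath"
   or "allpaths" of a file on a mergerfs mount. The raw reply (NUL separated
   for "allpaths") is copied to out, which is always NUL terminated.
   Returns 0 on success or -1 with errno set. */
int mergerfs_file_info(const char *path, const char *key,
                       char *out, size_t outsize)
{
    static const char *const keys[] = {"basepath", "relpath", "fullpath", "allpaths"};
    int known = 0;

    for (size_t i = 0; i < sizeof keys / sizeof keys[0]; i++)
        if (strcmp(key, keys[i]) == 0)
            known = 1;
    if (!known || outsize == 0) {
        errno = EINVAL;
        return -1;
    }

    IOCTL_BUF buf;
    memset(buf, 0, sizeof buf);
    strcpy(buf, key);                          /* the request goes in ... */

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int rv = ioctl(fd, IOCTL_FILE_INFO, buf);  /* ... the reply comes back */
    int saved_errno = errno;
    close(fd);
    if (rv < 0) {
        errno = saved_errno;
        return -1;
    }

    size_t n = outsize < sizeof buf ? outsize : sizeof buf;
    memcpy(out, buf, n);
    out[n - 1] = 0;
    return 0;
}
```
\f[R]
.fi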
.SH TOOLING
.IP \[bu] 2
https://github.com/trapexit/mergerfs-tools
.RS 2
.IP \[bu] 2
mergerfs.ctl: A tool to make it easier to query and configure mergerfs
at runtime
.IP \[bu] 2
mergerfs.fsck: Provides permissions and ownership auditing and the
ability to fix them
.IP \[bu] 2
mergerfs.dedup: Helps identify and optionally remove duplicate files
.IP \[bu] 2
mergerfs.dup: Ensures there are at least N copies of a file across the
pool
.IP \[bu] 2
mergerfs.balance: Rebalances files across filesystems by moving them
from the most filled to the least filled
.IP \[bu] 2
mergerfs.consolidate: Moves files within a single mergerfs directory to
the filesystem with the most free space
.RE
.IP \[bu] 2
https://github.com/trapexit/scorch
.RS 2
.IP \[bu] 2
scorch: A tool to help discover silent corruption of files and keep
track of files
.RE
.IP \[bu] 2
https://github.com/trapexit/bbf
.RS 2
.IP \[bu] 2
bbf (bad block finder): A tool to scan for and `fix' hard drive bad
blocks and find the files using those blocks
.RE
.SH CACHING
.SS page caching
.PP
https://en.wikipedia.org/wiki/Page_cache
.IP \[bu] 2
cache.files=off: Disables page caching.
Underlying files are cached, mergerfs files are not.
.IP \[bu] 2
cache.files=partial: Enables page caching.
Underlying files are cached, mergerfs files are cached while open.
.IP \[bu] 2
cache.files=full: Enables page caching.
Underlying files are cached, mergerfs files are cached across opens.
.IP \[bu] 2
cache.files=auto-full: Enables page caching.
Underlying files are cached, mergerfs files are cached across opens if
mtime and size are unchanged since the previous open.
.IP \[bu] 2
cache.files=libfuse: Follows the traditional libfuse
\f[C]direct_io\f[R], \f[C]kernel_cache\f[R], and \f[C]auto_cache\f[R]
arguments.
.IP \[bu] 2
cache.files=per-process: Enables page caching (equivalent to
\f[C]cache.files=partial\f[R]) only for processes whose `comm' name
matches one of the values defined in
\f[C]cache.files.process-names\f[R].
If the name does not match, the file open is equivalent to
\f[C]cache.files=off\f[R].
.PP
FUSE, which mergerfs uses, offers a number of page caching modes.
mergerfs tries to simplify their use via the \f[C]cache.files\f[R]
option.
It can and should replace usage of \f[C]direct_io\f[R],
\f[C]kernel_cache\f[R], and \f[C]auto_cache\f[R].
.PP
Because mergerfs uses FUSE, and is therefore a userland process proxying
existing filesystems, the kernel will double cache the content being
read and written through mergerfs: once from the underlying filesystem
and once from mergerfs (it sees them as two separate entities).
Using \f[C]cache.files=off\f[R] keeps the double caching from happening
by disabling caching of mergerfs, but this has side effects:
\f[I]all\f[R] read and write calls will be passed to mergerfs, which may
be slower than enabling caching; you lose shared \f[C]mmap\f[R] support,
which can affect apps such as rtorrent; and no read-ahead will take
place.
The kernel will still cache the underlying filesystem data, but that
only helps so much given mergerfs will still process all requests.
.PP
If you do enable file page caching
(\f[C]cache.files=partial|full|auto-full\f[R]), you should also enable
\f[C]dropcacheonclose\f[R], which will cause mergerfs to instruct the
kernel to flush the underlying file\[cq]s page cache when the file is
closed.
This behavior is the same as the rsync fadvise / drop cache patch and
Feh\[cq]s nocache project.
.PP
If most files are read once through and closed (like media), it is best
to enable \f[C]dropcacheonclose\f[R] regardless of caching mode in order
to minimize buffer bloat.
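.PP
The rsync drop cache patch and nocache are generally built on posix_fadvise(2) with \f[C]POSIX_FADV_DONTNEED\f[R].
The C sketch below shows that general technique applied when a file is closed; it is an illustration under that assumption rather than mergerfs\[cq]s own code, and \f[C]close_and_drop_cache\f[R] is a hypothetical name.
.IP
.nf
\f[C]
```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: advise the kernel that the cached pages of fd are
   no longer needed, then close it. posix_fadvise() returns an error number
   rather than setting errno. Pages that are still dirty may not be dropped
   until they have been written back. Returns 0 on success, -1 on failure. */
int close_and_drop_cache(int fd)
{
    int rv = 0;

    /* offset 0 with length 0 covers the whole file */
    if (posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) != 0)
        rv = -1;
    if (close(fd) != 0)
        rv = -1;
    return rv;
}
```
\f[R]
.fi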
.PP
It is difficult to balance memory usage, cache bloat & duplication, and
performance.
Ideally mergerfs would be able to disable caching for the files it
reads/writes but allow page caching for itself.
That would limit the FUSE overhead.
However, there isn\[cq]t a good way to achieve this.
It would need to open all files with O_DIRECT, which places limitations
on which underlying filesystems would be supported and complicates the
code.
.PP
Kernel documentation:
https://www.kernel.org/doc/Documentation/filesystems/fuse-io.txt
.SS entry & attribute caching
.PP
Given the relatively high cost of FUSE due to the kernel <-> userspace
round trips, there are kernel side caches for file entries and
attributes.
The entry cache limits the \f[C]lookup\f[R] calls to mergerfs, which ask
whether a file exists.
The attribute cache limits the need to make \f[C]getattr\f[R] calls to
mergerfs, which provide file attributes (mode, size, type, etc.).
As with the page cache, these should not be used if the underlying
filesystems are being manipulated at the same time, as doing so could
lead to odd behavior or data corruption.
The options for setting these are \f[C]cache.entry\f[R] and
\f[C]cache.negative_entry\f[R] for the entry cache and
\f[C]cache.attr\f[R] for the attributes cache.
\f[C]cache.negative_entry\f[R] refers to the timeout for negative
responses to lookups (non-existent files).
.SS writeback caching
.PP
When \f[C]cache.files\f[R] is enabled, the default is for it to perform
writethrough caching.
This behavior won\[cq]t help improve performance, as each write still
goes one for one through the filesystem.
By enabling the FUSE writeback cache, small writes may be aggregated by
the kernel and then sent to mergerfs as one larger request.
This can greatly improve the throughput for apps which write to files
inefficiently.
The amount the kernel can aggregate is limited by the size of a FUSE
message.
Read the \f[C]fuse_msg_size\f[R] section for more details.
.PP
There is a small side effect as a result of enabling writeback caching:
underlying files won\[cq]t ever be opened with O_APPEND or O_WRONLY.
The former because the kernel then manages append mode, and the latter
because the kernel may request file data from mergerfs to populate the
write cache.
The O_APPEND change means that if a file is changed outside of mergerfs,
it could lead to corruption, as the kernel won\[cq]t know the end of the
file has changed.
That said, any time you use caching you should avoid using the same file
outside of mergerfs at the same time.
.PP
Note that if an application is properly sizing writes, writeback
caching will have little or no effect.
It will only help with writes of sizes below the FUSE message size (128K
on older kernels, 1M on newer).
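.PP
To put rough numbers on this, the C sketch below counts the write requests mergerfs would receive for a given amount of data.
It assumes an idealized writeback cache that always fills each FUSE message completely, which real workloads will not achieve, so treat the writeback figure as a best case; \f[C]fuse_write_requests\f[R] is a hypothetical name.
.IP
.nf
\f[C]
```c
#include <stdint.h>

static uint64_t ceil_div(uint64_t a, uint64_t b)
{
    return a / b + (a % b != 0);
}

/* Requests needed to write `total` bytes when the application issues
   writes of `app_write` bytes and one FUSE message carries at most
   `max_msg` bytes.
   Without writeback: every application write is sent on its own, split
   into several requests if it exceeds max_msg.
   With (idealized) writeback: the kernel packs data into full messages. */
uint64_t fuse_write_requests(uint64_t total, uint64_t app_write,
                             uint64_t max_msg, int writeback)
{
    if (total == 0 || app_write == 0 || max_msg == 0)
        return 0;
    if (writeback)
        return ceil_div(total, max_msg);

    uint64_t full = total / app_write;  /* complete application writes */
    uint64_t rest = total % app_write;  /* trailing partial write */
    return full * ceil_div(app_write, max_msg)
         + (rest ? ceil_div(rest, max_msg) : 0);
}
```
\f[R]
.fi
.PP
For example, writing 1 GiB in 512 byte writes gives 2,097,152 requests without writeback, versus 8,192 with perfect aggregation into 128K messages, or 1,024 with 1M messages.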
.SS statfs caching
.PP
Of the syscalls used by mergerfs in policies, the \f[C]statfs\f[R] /
\f[C]statvfs\f[R] call is perhaps the most expensive.
It\[cq]s used to find out the available space of a filesystem and
whether it is mounted read-only.
Depending on the setup and usage pattern, these queries can be
relatively costly.
When \f[C]cache.statfs\f[R] is enabled, all calls to \f[C]statfs\f[R]
by a policy will be cached for the number of seconds it is set to.
.PP
Example: If the create policy is \f[C]mfs\f[R] and the timeout is 60,
then for those 60 seconds the same filesystem will be returned as the
target for creates because the available space won\[cq]t be updated for
that time.
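.PP
The idea can be sketched in C as a time limited cache in front of statvfs(3).
This single entry version with hypothetical names is only an illustration, not how mergerfs structures its cache; it shows why results can be up to the configured number of seconds stale.
.IP
.nf
\f[C]
```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/statvfs.h>
#include <time.h>

struct statfs_cache {
    char           path[256];
    struct statvfs st;
    time_t         expires;  /* CLOCK_MONOTONIC seconds */
    int            valid;
};

static unsigned long statfs_real_calls;  /* times statvfs() actually ran */

static time_t monotonic_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec;
}

/* Fills *out and returns 0. A cached result is served while it is younger
   than `timeout` seconds; otherwise statvfs() runs and the cache refreshes.
   Returns -1 if statvfs() fails. */
int cached_statvfs(struct statfs_cache *c, const char *path,
                   time_t timeout, struct statvfs *out)
{
    time_t now = monotonic_seconds();

    if (c->valid && now < c->expires && strcmp(c->path, path) == 0) {
        *out = c->st;
        return 0;
    }
    if (statvfs(path, &c->st) != 0) {
        c->valid = 0;
        return -1;
    }
    statfs_real_calls++;
    strncpy(c->path, path, sizeof c->path - 1);
    c->path[sizeof c->path - 1] = 0;
    c->expires = now + timeout;
    c->valid = 1;
    *out = c->st;
    return 0;
}
```
\f[R]
.fi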
.SS symlink caching
.PP
As of version 4.20, Linux supports symlink caching.
Significant performance increases can be had in workloads which use a
lot of symlinks.
Setting \f[C]cache.symlinks=true\f[R] will result in requesting symlink
caching from the kernel only if supported.
As a result, it\[cq]s safe to enable it on systems prior to 4.20.
That said, it is disabled by default for now.
You can see if caching is enabled by querying the xattr
\f[C]user.mergerfs.cache.symlinks\f[R], but since it must be requested
at startup you cannot change it at runtime.
.SS readdir caching
.PP
As of version 4.20, Linux supports readdir caching.
This can have a significant impact on directory traversal, especially
when combined with entry (\f[C]cache.entry\f[R]) and attribute
(\f[C]cache.attr\f[R]) caching.
Setting \f[C]cache.readdir=true\f[R] will result in requesting readdir
caching from the kernel on each \f[C]opendir\f[R].
If the kernel doesn\[cq]t support readdir caching, setting the option to
\f[C]true\f[R] has no effect.
This option is configurable at runtime via the xattr
\f[C]user.mergerfs.cache.readdir\f[R].
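.PP
For illustration, the C sketch below reads \f[C]user.mergerfs.cache.symlinks\f[R] and sets \f[C]user.mergerfs.cache.readdir\f[R] at runtime.
It assumes these options are exposed as xattrs on the \f[C].mergerfs\f[R] control file in the root of the mount, as described in the runtime configuration part of this manual; the helper names are hypothetical.
Only \f[C]cache.readdir\f[R] should be changed this way, since \f[C]cache.symlinks\f[R] is fixed at startup.
.IP
.nf
\f[C]
```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Builds "<mountpoint>/.mergerfs", the control file assumed to carry the
   runtime options. Returns 0 on success, -1 if buf is too small. */
int mergerfs_ctl_path(const char *mountpoint, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%s/.mergerfs", mountpoint);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Reads an option such as "user.mergerfs.cache.symlinks" into value
   (NUL terminated). Returns the value length or -1 on error. */
ssize_t mergerfs_get_option(const char *mountpoint, const char *name,
                            char *value, size_t size)
{
    char ctl[4096];

    if (size == 0 || mergerfs_ctl_path(mountpoint, ctl, sizeof ctl) != 0)
        return -1;
    ssize_t n = getxattr(ctl, name, value, size - 1);
    if (n >= 0)
        value[n] = 0;
    return n;
}

/* Sets an option, e.g. ("user.mergerfs.cache.readdir", "true").
   Returns 0 on success or -1 on error. */
int mergerfs_set_option(const char *mountpoint, const char *name,
                        const char *value)
{
    char ctl[4096];

    if (mergerfs_ctl_path(mountpoint, ctl, sizeof ctl) != 0)
        return -1;
    return setxattr(ctl, name, value, strlen(value), 0);
}
```
\f[R]
.fi
.PP
Usage would look like \f[C]mergerfs_set_option("/mnt/pool", "user.mergerfs.cache.readdir", "true")\f[R], with \f[C]/mnt/pool\f[R] standing in for your mount point.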
.SS tiered caching
.PP
Some storage technologies support what some call \[lq]tiered\[rq]
caching: the placing of usually smaller, faster storage as a transparent
cache in front of larger, slower storage, for instance NVMe, SSD, or
Optane in front of traditional HDDs.
.PP
mergerfs does not natively support any sort of tiered caching.
Most users have no use for such a feature, and its inclusion would
complicate the code.
However, there are a few situations where a cache filesystem could help
with a typical mergerfs setup.
.IP "1." 3
Fast network, slow filesystems, many readers: You have a 10+Gbps network
with many readers and your regular filesystems can\[cq]t keep up.
.IP "2." 3
Fast network, slow filesystems, small\[cq]ish bursty writes: You have a
10+Gbps network and wish to transfer amounts of data smaller than your
cache filesystem, but wish to do so quickly.
.PP
With #1 it\[cq]s arguable whether you should be using mergerfs at all;
RAID would probably be the better solution.
If you\[cq]re going to use mergerfs, there are other tactics that may
help: spreading the data across filesystems (see the mergerfs.dup tool)
and setting \f[C]func.open=rand\f[R], using \f[C]symlinkify\f[R], or
using dm-cache or a similar technology to add tiered cache to the
underlying device.
.PP
With #2 one could use dm-cache as well, but there is another solution
which requires only mergerfs and a cronjob.
.IP "1." 3
Create two mergerfs pools: one which includes just the slow devices and
one which has both the fast devices (SSD, NVMe, etc.) and the slow
devices.
.IP "2." 3
The `cache' pool should have the cache filesystems listed first.
.IP "3." 3
The best \f[C]create\f[R] policies to use for the `cache' pool would
probably be \f[C]ff\f[R], \f[C]epff\f[R], \f[C]lfs\f[R], or
\f[C]eplfs\f[R].
The latter two assume the cache filesystem(s) are far smaller than the
backing filesystems.
If using path preserving policies, remember that you\[cq]ll need to
manually create the core directories of those paths you wish to be
cached.
Be sure the permissions are in sync.
Use \f[C]mergerfs.fsck\f[R] to check / correct them.
You could also set the slow filesystems\[cq] mode to \f[C]NC\f[R],
though that would mean that if the cache filesystems fill you\[cq]d get
\[lq]out of space\[rq] errors.
.IP "4." 3
Enable \f[C]moveonenospc\f[R] and set \f[C]minfreespace\f[R]
appropriately.
To make sure there is enough room on the \[lq]slow\[rq] pool, you might
want to set \f[C]minfreespace\f[R] to at least the size of the largest
cache filesystem, if not larger.
This way, in the worst case, the whole of the cache filesystem(s) can be
moved to the other drives.
.IP "5." 3
Set your programs to use the cache pool.
.IP "6." 3
Save one of the below scripts or create your own.
.IP "7." 3
Use \f[C]cron\f[R] (as root) to schedule the command at whatever
frequency is appropriate for your workflow.
.SS time based expiring
.PP
Move files from cache to backing pool based only on the last time the
file was accessed.
Replace \f[C]-atime\f[R] with \f[C]-amin\f[R] if you want minutes rather
than days.
You may want to use the \f[C]fadvise\f[R] / \f[C]--drop-cache\f[R]
version of rsync or run rsync with the tool \[lq]nocache\[rq].
.PP
\f[I]NOTE:\f[R] The arguments to these scripts include the cache
\f[B]filesystem\f[R] itself, not the pool containing the cache
filesystem.
You could have data loss if the source is the cache pool.
.PP
\f[C]mergerfs.time-based-mover\f[R]
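.PP
The heart of a time based mover is deciding which files on the cache filesystem have not been accessed recently, much like \f[C]find -atime\f[R] does.
A minimal C sketch of that check follows; the names are hypothetical and it only selects candidates, leaving the actual move (for example with rsync) out.
Keep in mind that atime is only as good as the filesystem\[cq]s mount options allow: with \f[C]noatime\f[R] it is never updated and with \f[C]relatime\f[R] it is updated lazily.
.IP
.nf
\f[C]
```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <time.h>

/* 1 if a file last accessed at `atime` is more than `max_age` seconds
   older than `now`, else 0. */
int atime_expired(time_t atime, time_t now, time_t max_age)
{
    return (now - atime) > max_age;
}

/* Checks one path on the cache *filesystem* (never the pool).
   Returns 1 if it is a regular file that has expired, 0 if it has not,
   and -1 if it cannot be stat'ed or is not a regular file. */
int cache_file_expired(const char *path, time_t max_age)
{
    struct stat st;

    if (lstat(path, &st) != 0 || !S_ISREG(st.st_mode))
        return -1;
    return atime_expired(st.st_atime, time(NULL), max_age);
}
```
\f[R]
.fi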
.SS percentage full expiring
.PP
Move the oldest file from the cache to the backing pool, and continue
until usage is below the percentage threshold.
.PP
\f[I]NOTE:\f[R] The arguments to these scripts include the cache
\f[B]filesystem\f[R] itself, not the pool containing the cache
filesystem.
You could have data loss if the source is the cache pool.
.PP
\f[C]mergerfs.percent-full-mover\f[R]
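.PP
A percentage full mover needs a used space figure to compare against its threshold.
The C sketch below computes it from statvfs(3) using the same ratio df(1) reports in its Use% column (df additionally rounds up); the helper names are hypothetical, and selecting and moving the oldest files is left out.
.IP
.nf
\f[C]
```c
#include <sys/statvfs.h>

/* Used space as a percentage: used / (used + space available to
   unprivileged users). Blocks reserved for root are left out. */
double percent_used(const struct statvfs *st)
{
    double used  = (double)(st->f_blocks - st->f_bfree);
    double avail = (double)st->f_bavail;

    if (used + avail <= 0.0)
        return 0.0;
    return 100.0 * used / (used + avail);
}

/* 1 if the cache filesystem at `path` is above `threshold` percent full,
   0 if not, -1 on error. */
int cache_over_threshold(const char *path, double threshold)
{
    struct statvfs st;

    if (statvfs(path, &st) != 0)
        return -1;
    return percent_used(&st) > threshold;
}
```
\f[R]
.fi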
.SH PERFORMANCE
.PP
mergerfs is at its core just a proxy, and therefore its theoretical max
performance is that of the underlying devices.
However, given it is a FUSE filesystem working from userspace, there is
an increase in overhead relative to kernel based solutions.
That said, the performance can match the theoretical max, but it depends
greatly on the system\[cq]s configuration.
Especially when adding network filesystems into the mix, there are many
variables which can impact performance: device speeds and latency,
network speeds and latency, general concurrency, read/write sizes, etc.
Unfortunately, given the number of variables, it has been difficult to
find a single set of settings which provides optimal performance.
If you\[cq]re having performance issues, please look over the
suggestions below (including the benchmarking section).
.PP
NOTE: be sure to read about these features before changing them to
understand what behaviors they may impact.
.IP \[bu] 2
disable \f[C]security_capability\f[R] and/or \f[C]xattr\f[R]
.IP \[bu] 2
increase cache timeouts \f[C]cache.attr\f[R], \f[C]cache.entry\f[R],
\f[C]cache.negative_entry\f[R]
.IP \[bu] 2
enable (or disable) page caching (\f[C]cache.files\f[R])
.IP \[bu] 2
enable \f[C]parallel-direct-writes\f[R]
.IP \[bu] 2
enable \f[C]cache.writeback\f[R]
.IP \[bu] 2
enable \f[C]cache.statfs\f[R]
.IP \[bu] 2
enable \f[C]cache.symlinks\f[R]
.IP \[bu] 2
enable \f[C]cache.readdir\f[R]
.IP \[bu] 2
change the number of worker threads
.IP \[bu] 2
disable \f[C]posix_acl\f[R]
.IP \[bu] 2
disable \f[C]async_read\f[R]
.IP \[bu] 2
test theoretical performance using \f[C]nullrw\f[R] or mounting a RAM
disk
.IP \[bu] 2
use \f[C]symlinkify\f[R] if your data is largely static and read-only
.IP \[bu] 2
use tiered cache devices
.IP \[bu] 2
use LVM and LVM cache to place an SSD in front of your HDDs
.IP \[bu] 2
increase readahead: \f[C]readahead=1024\f[R]
.PP
If you come across a setting that significantly impacts performance,
please contact trapexit so he may investigate further.
Please test against your normal setup, a single branch, and with
\f[C]nullrw=true\f[R].
.SH BENCHMARKING
.PP
Filesystems are complicated.
They do many things and many of those are interconnected.
Additionally, the OS, drivers, hardware, etc. can all impact
performance.
Therefore, when benchmarking, it is \f[B]necessary\f[R] that the test
focus as narrowly as possible.
.PP
For most users, throughput is the key benchmark.
To test throughput, \f[C]dd\f[R] is useful but \f[B]must\f[R] be used
with the correct settings in order to ensure the filesystem or device is
actually being tested.
The OS can and will cache data.
Without forcing synchronous reads and writes and/or disabling caching,
the values returned will not be representative of the device\[cq]s true
performance.
.PP
When benchmarking through mergerfs, ensure you only use 1 branch to
remove any possibility of the policies complicating the situation.
Benchmark the underlying filesystem first and then mount mergerfs over
it and test again.
If you\[cq]re experiencing speeds below your expectations, you will need
to narrow down precisely which component is leading to the slowdown.
Preferably test the following in the order listed (but not combined).
.IP "1." 3
Enable \f[C]nullrw\f[R] mode with \f[C]nullrw=true\f[R].
This will effectively make reads and writes no-ops, removing the
underlying device / filesystem from the equation.
This will give us the top theoretical speeds.
.IP "2." 3
Mount mergerfs over \f[C]tmpfs\f[R].
\f[C]tmpfs\f[R] is a RAM disk with extremely high speed and very low
latency.
This is a more realistic best case scenario.
Example: \f[C]mount -t tmpfs -o size=2G tmpfs /tmp/tmpfs\f[R]
.IP "3." 3
Mount mergerfs over a local device (NVMe, SSD, HDD, etc.).
If you have more than one, I\[cq]d suggest testing each of them, as
drives and/or controllers (and their drivers) could impact performance.
.IP "4." 3
Finally, if you intend to use mergerfs with a network filesystem, either
as the source of data or to combine with another through mergerfs, test
each of those alone as above.
.PP
Once you find the component which has the performance issue, you can do
further testing with different options to see if they impact
performance.
For reads and writes the most relevant would be \f[C]cache.files\f[R]
and \f[C]async_read\f[R].
Less likely, but relevant when using NFS or with certain filesystems,
would be \f[C]security_capability\f[R], \f[C]xattr\f[R], and
\f[C]posix_acl\f[R].
If you find a specific system, device, filesystem, controller, etc.
that performs poorly, contact trapexit so he may investigate further.
.PP
Sometimes the problem is really the application accessing or writing
data through mergerfs.
Some software uses small buffer sizes, which can lead to more requests
and therefore greater overhead.
You can test this out yourself by replacing \f[C]bs=1M\f[R] in the
examples below with \f[C]ibs\f[R] or \f[C]obs\f[R] and using a size of
\f[C]512\f[R] instead of \f[C]1M\f[R].
In one example test using \f[C]nullrw\f[R], the write speed dropped from
4.9GB/s to 69.7MB/s when moving from \f[C]1M\f[R] to \f[C]512\f[R].
Similar results were seen when testing reads.
Small write overhead may be improved by leveraging a write cache, but in
casual tests little gain was found.
More tests will need to be done before this feature would become
available.
If you have an app that appears slow with mergerfs, it could be due to
this.
Contact trapexit so he may investigate further.
.SS write benchmark
.IP
.nf
\f[C]
$ dd if=/dev/zero of=/mnt/mergerfs/1GB.file bs=1M count=1024 oflag=nocache conv=fdatasync status=progress
\f[R]
.fi
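.PP
When an application rather than dd is suspected, it can help to reproduce its write pattern directly.
The C sketch below is a rough analogue of the write benchmark above: it writes zeros in blocks of a chosen size, calls fdatasync(2) at the end as \f[C]conv=fdatasync\f[R] does, and returns bytes per second.
It does not replicate \f[C]oflag=nocache\f[R], and \f[C]write_throughput\f[R] is a hypothetical name.
.IP
.nf
\f[C]
```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Writes `total` bytes of zeros to `path` in writes of `bs` bytes, syncs
   the data, and returns the throughput in bytes per second (-1.0 on error). */
double write_throughput(const char *path, size_t bs, size_t total)
{
    if (bs == 0 || total == 0)
        return -1.0;

    char *buf = calloc(1, bs);
    if (buf == NULL)
        return -1.0;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1.0;
    }

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    size_t done = 0;
    int ok = 1;
    while (done < total) {
        size_t want = total - done < bs ? total - done : bs;
        ssize_t n = write(fd, buf, want);
        if (n <= 0) {
            ok = 0;
            break;
        }
        done += (size_t)n;
    }
    if (ok && fdatasync(fd) != 0)  /* like dd conv=fdatasync */
        ok = 0;

    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);
    free(buf);

    if (!ok)
        return -1.0;
    double secs = (double)(end.tv_sec - start.tv_sec)
                + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return secs > 0.0 ? (double)done / secs : -1.0;
}
```
\f[R]
.fi
.PP
Running it once with a block size of 1 MiB and once with 512 bytes against the same mount mirrors the small buffer experiment described above.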
.SS read benchmark
.IP
.nf
\f[C]
$ dd if=/mnt/mergerfs/1GB.file of=/dev/null bs=1M count=1024 iflag=nocache conv=fdatasync status=progress
\f[R]
.fi
.SS other benchmarks
.PP
If you are attempting to benchmark other behaviors, you must ensure you
clear kernel caches before runs.
In fact, it would be a good idea to run this before the read and write
benchmarks as well, just in case.
.IP
.nf
\f[C]
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
\f[R]
.fi
.SH TIPS / NOTES
.IP \[bu] 2
This document is literal and thorough.
If a suspected feature isn\[cq]t mentioned, it doesn\[cq]t exist.
If certain libfuse arguments aren\[cq]t listed, they probably
shouldn\[cq]t be used.
.IP \[bu] 2
Ensure you\[cq]re using the latest version.
.IP \[bu] 2
Run mergerfs as \f[C]root\f[R].
mergerfs is designed and intended to be run as \f[C]root\f[R] and may
exhibit incorrect behavior if run otherwise.
.IP \[bu] 2
If you don\[cq]t see some directories and files you expect, policies
seem to skip branches, you get strange permission errors, etc., be sure
the underlying filesystems\[cq] permissions are all the same.
Use \f[C]mergerfs.fsck\f[R] to audit the filesystem for out of sync
permissions.
.IP \[bu] 2
If you still have permission issues, be sure you are using POSIX ACL
compliant filesystems.
mergerfs doesn\[cq]t generally make exceptions for FAT, NTFS, or other
non-POSIX filesystems.
.IP \[bu] 2
Do \f[B]not\f[R] use \f[C]cache.files=off\f[R] if you expect
applications (such as rtorrent) to use
mmap (http://linux.die.net/man/2/mmap) on files.
Shared mmap is not currently supported in FUSE with page caching
disabled.
Enabling \f[C]dropcacheonclose\f[R] is recommended when
\f[C]cache.files=partial|full|auto-full\f[R].
.IP \[bu] 2
Kodi (http://kodi.tv), Plex (http://plex.tv),
Subsonic (http://subsonic.org), etc.
can use directory mtime (http://linux.die.net/man/2/stat) to more
efficiently determine whether to scan for new content rather than simply
performing a full scan.
If using the default \f[B]getattr\f[R] policy of \f[B]ff\f[R], it\[cq]s
possible those programs will miss an update, because \f[B]ff\f[R]
returns the \f[B]stat\f[R] info of the first directory found while the
\f[B]mtime\f[R] may have been updated on a directory in a later branch.
To fix this you will want to set \f[B]func.getattr=newest\f[R].
Remember, though, that this only affects \f[B]stat\f[R].
If the file is later \f[B]open\f[R]\[cq]ed or \f[B]unlink\f[R]\[cq]ed
and the policy is different for those, then a completely different file
or directory could be acted on.
.IP \[bu] 2
Some policies mixed with some functions may result in strange behaviors.
Some of these behaviors and race conditions could happen outside
\f[B]mergerfs\f[R] too, but they are far more likely to occur on account
of the attempt to merge together multiple sources of data which could be
out of sync due to the different policies.
.IP \[bu] 2
For consistency it\[cq]s generally best to set \f[B]category\f[R] wide
policies rather than individual \f[B]func\f[R]\[cq]s.
This will help limit the confusion of tools such as
rsync (http://linux.die.net/man/1/rsync).
However, the flexibility is there if needed.
.SH KNOWN ISSUES / BUGS
.SS kernel issues & bugs
.PP
<https://github.com/trapexit/mergerfs/wiki/Kernel-Issues-&-Bugs>
.SS directory mtime is not being updated
.PP
Remember that the default policy for \f[C]getattr\f[R] is \f[C]ff\f[R].
The information for the first directory found will be returned.
If it wasn\[cq]t the directory which had been updated, then it will
appear outdated.
.PP
This is the default because any other policy would be more expensive,
and for many applications it is unnecessary.
To always return the directory with the most recent mtime, or a faked
value based on all found, would require a scan of all filesystems.
.PP
If you always want the directory information from the one with the most
recent mtime, then use the \f[C]newest\f[R] policy for
\f[C]getattr\f[R].
.SS `mv /mnt/pool/foo /mnt/disk1/foo' removes `foo'
.PP
This is not a bug.
.PP
Run in verbose mode to better understand what\[cq]s happening:
.IP
.nf
\f[C]
$ mv -v /mnt/pool/foo /mnt/disk1/foo
copied \[aq]/mnt/pool/foo\[aq] -> \[aq]/mnt/disk1/foo\[aq]
removed \[aq]/mnt/pool/foo\[aq]
$ ls /mnt/pool/foo
ls: cannot access \[aq]/mnt/pool/foo\[aq]: No such file or directory
\f[R]
.fi
.PP
\f[C]mv\f[R], when working across devices, copies the source to the
target and then removes the source.
Since the source \f[B]is\f[R] the target in this case, depending on the
unlink policy, it will remove the just copied file and other files
across the branches.
.PP
If you want to move files to one filesystem, just copy them there and
use mergerfs.dedup to clean up the old paths, or manually remove them
from the branches directly.
.SS cached memory appears greater than it should be
.PP
Use \f[C]cache.files=off\f[R] and/or \f[C]dropcacheonclose=true\f[R].
See the section on page caching.
.SS NFS clients returning ESTALE / Stale file handle
.PP
NFS does not like out of band changes.
That is especially true of inode values.
.PP
Be sure to use the following options:
.IP \[bu] 2
noforget
.IP \[bu] 2
inodecalc=path-hash
.SS rtorrent fails with ENODEV (No such device)
.PP
Be sure to set \f[C]cache.files=partial|full|auto-full|per-process\f[R]
or turn off \f[C]direct_io\f[R].
rtorrent and some other applications use
mmap (http://linux.die.net/man/2/mmap) to read and write to files and
offer no fallback to traditional methods.
FUSE does not currently support mmap while using \f[C]direct_io\f[R].
There may be a performance penalty on writes with \f[C]direct_io\f[R]
off, as well as the problem of double caching, but it\[cq]s the only way
to get such applications to work.
If the performance loss is too high for other apps, you can mount
mergerfs twice: once with \f[C]direct_io\f[R] enabled and once without
it.
Be sure to set \f[C]dropcacheonclose=true\f[R] if not using
\f[C]direct_io\f[R].
.SS Plex doesn\[cq]t work with mergerfs
.PP
It does.
If you\[cq]re trying to put Plex\[cq]s config / metadata / database on
mergerfs, you can\[cq]t set \f[C]cache.files=off\f[R] because Plex is
using sqlite3 with mmap enabled.
Shared mmap is not supported by Linux\[cq]s FUSE implementation when
page caching is disabled.
To fix this, place the data elsewhere (preferable) or enable
\f[C]cache.files\f[R] (with \f[C]dropcacheonclose=true\f[R]).
Sqlite3 does not need mmap, but the developer needs to fall back to
standard IO if mmap fails.
.PP
This applies to other software as well: Radarr, Sonarr, Lidarr,
Jellyfin, etc.
.PP
I would recommend reaching out to the developers of the software
you\[cq]re having trouble with and asking them to add a fallback to
regular file IO when mmap is unavailable.
.PP
If the issue is that scanning doesn\[cq]t seem to pick up media, then be
sure to set \f[C]func.getattr=newest\f[R], though generally a full scan
will pick up all media anyway.
.SS When a program tries to move or rename a file it fails
.PP
Please read the section above regarding rename & link.
.PP
The problem is that many applications do not properly handle
\f[C]EXDEV\f[R] errors, which \f[C]rename\f[R] and \f[C]link\f[R] may
return even though they reflect perfectly valid situations which do not
indicate actual device, filesystem, or OS errors.
The error will only be returned by mergerfs if using a path preserving
policy as described in the policy section above.
If you do not care about path preservation, simply change the mergerfs
policy to the non-path preserving version, for example:
\f[C]-o category.create=mfs\f[R].
Ideally the offending software would be fixed, and it is recommended
that if you run into this problem you contact the software\[cq]s author
and request proper handling of \f[C]EXDEV\f[R] errors.
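.PP
Proper handling looks roughly like what \f[C]mv\f[R] does: try rename(2) and, only on \f[C]EXDEV\f[R], fall back to copying the data and unlinking the source.
The C sketch below shows that pattern for regular files.
It is deliberately minimal (it copies only data and permission bits, refuses to overwrite an existing target in the fallback path, and does not handle directories), and the function names are hypothetical.
.IP
.nf
\f[C]
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copies the data and permission bits of src into a new file dst.
   Fails rather than overwrite an existing dst; a partial dst is removed.
   Returns 0 on success, -1 on error. */
int copy_file(const char *src, const char *dst)
{
    char buf[65536];
    struct stat st;
    ssize_t n;
    int ok = 1;

    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    if (fstat(in, &st) != 0) {
        close(in);
        return -1;
    }
    int out = open(dst, O_WRONLY | O_CREAT | O_EXCL, st.st_mode & 07777);
    if (out < 0) {
        close(in);
        return -1;
    }
    while ((n = read(in, buf, sizeof buf)) > 0) {
        if (write(out, buf, (size_t)n) != n) {  /* short write: give up */
            ok = 0;
            break;
        }
    }
    if (n < 0)
        ok = 0;
    if (close(out) != 0)
        ok = 0;
    close(in);
    if (!ok) {
        unlink(dst);
        return -1;
    }
    return 0;
}

/* Moves src to dst with rename(2), falling back to copy + unlink only
   when EXDEV is reported. */
int move_file(const char *src, const char *dst)
{
    if (rename(src, dst) == 0)
        return 0;
    if (errno != EXDEV)
        return -1;  /* a genuine error: report it */
    if (copy_file(src, dst) != 0)
        return -1;
    return unlink(src);
}
```
\f[R]
.fi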
.SS my 32bit software has problems
.PP
Some software has problems with 64bit inode values.
The symptoms can include EOVERFLOW errors when trying to list files.
You can address this by setting \f[C]inodecalc\f[R] to one of the 32bit
based algorithms as described in the relevant section.
.SS Samba: Moving files / directories fails
.PP
Workaround: Copy the file/directory and then remove the original rather
than move.
.PP
This isn\[cq]t an issue with Samba but with some SMB clients.
GVFS-fuse v1.20.3 and prior (found in Ubuntu 14.04 among others) failed
to handle certain error codes correctly.
Particularly \f[B]STATUS_NOT_SAME_DEVICE\f[R], which comes from the
\f[B]EXDEV\f[R] returned by \f[B]rename\f[R] when the call is crossing
mount points.
When a program gets an \f[B]EXDEV\f[R], it needs to explicitly take an
alternate action to accomplish its goal.
In the case of \f[B]mv\f[R] or similar, it tries \f[B]rename\f[R] and,
on \f[B]EXDEV\f[R], falls back to manually copying the data between the
two locations and unlinking the source.
In these older versions of GVFS-fuse, if it received \f[B]EXDEV\f[R] it
would translate that into \f[B]EIO\f[R].
This would cause \f[B]mv\f[R] or most any application attempting to move
files around on that SMB share to fail with an IO error.
.PP
GVFS-fuse v1.22.0 (https://bugzilla.gnome.org/show_bug.cgi?id=734568)
and above fixed this issue, but a large number of systems use the older
release.
On Ubuntu the version can be checked by issuing
\f[C]apt-cache showpkg gvfs-fuse\f[R].
Most distros released in 2015 seem to have the updated release and will
work fine, but older systems may not.
Upgrading gvfs-fuse or the distro in general will address the problem.
.PP
In Apple\[cq]s Mac OS X 10.9 they replaced Samba (client and server)
with their own product.
It appears their new client does not handle \f[B]EXDEV\f[R] either and
responds similarly to older releases of GVFS on Linux.
.SS Trashing files occasionally fails
.PP
This is the same issue as with Samba.
\f[C]rename\f[R] returns \f[C]EXDEV\f[R] (in our case that will really
only happen with path preserving policies like \f[C]epmfs\f[R]) and the
software doesn\[cq]t handle the situation well.
This is unfortunately a common failure of software which moves files
around.
The standard indicates that an implementation \f[C]MUST\f[R] support
trashing files in the user\[cq]s home directory but only \f[C]MAY\f[R]
choose to support trashing files outside of it.
The implementation \f[C]MAY\f[R] also support \[lq]top directory
trashes\[rq], which many probably do.
.PP
To create a \f[C]$topdir/.Trash\f[R] directory as defined in the
standard, use the
mergerfs-tools (https://github.com/trapexit/mergerfs-tools) tool
\f[C]mergerfs.mktrash\f[R].
.SS Supplemental user groups
.PP
Due to the overhead of
getgroups/setgroups (http://linux.die.net/man/2/setgroups), mergerfs
utilizes a cache.
This cache is opportunistic and per thread.
Each thread will query the supplemental groups for a user when that
particular thread needs to change credentials and will keep that data
for the lifetime of the thread.
This means that if a user is added to a group, it may not be picked up
without a restart of mergerfs.
However, since the high level FUSE API\[cq]s (at least the standard
version) thread pool dynamically grows and shrinks, it\[cq]s possible
that over time a thread will be killed and later a new thread with no
cache will start and query the new data.
.PP
The gid cache uses fixed storage to simplify the design and be
compatible with older systems which may not have C++11 compilers.
There is enough storage for 256 users\[cq] supplemental groups.
Each user is allowed up to 32 supplemental groups.
Linux >= 2.6.3 allows up to 65535 groups per user, but most other *nixes
allow far fewer, with NFS allowing only 16.
The system does handle overflow gracefully.
If the user has more than 32 supplemental groups, only the first 32 will
be used.
If more than 256 users are using the system when an uncached user is
found, it will evict an existing user\[cq]s cache at random.
So long as there aren\[cq]t more than 256 active users, this should be
fine.
If either value is too low for your needs, you will have to modify
\f[C]gidcache.hpp\f[R] to increase the values.
Note that doing so will increase the memory needed by each thread.
|
|
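.PP
To check whether a particular user is affected by the 32 group limit you
can count the groups the system reports for them.
The function below is a hypothetical helper, not part of mergerfs.
Note that \f[C]id -G\f[R] also lists the user\[cq]s primary group, so
the count it prints is an upper bound on the number of supplemental
groups.
.IP
.nf
\f[C]
# Hypothetical helper: count the groups a user belongs to and warn when
# the count exceeds the 32 groups per user kept in the mergerfs gid cache.
# Usage: check_group_count alice
check_group_count() {
  user="$1"
  # id -G prints every group ID of the user on one line, space separated;
  # the arithmetic expansion strips any padding wc may add.
  count=$(( $(id -G "$user" | wc -w) ))
  if [ "$count" -gt 32 ]; then
    echo "$user: $count groups, only the first 32 will be used"
  else
    echo "$user: $count groups"
  fi
}
\f[R]
.fi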
.PP
While not a bug, some users have found that when using containers,
supplemental groups defined inside the container don\[cq]t work properly
with regard to permissions.
This is expected as mergerfs lives outside the container and therefore
is querying the host\[cq]s group database.
There might be a hack to work around this (make mergerfs read the
/etc/group file in the container) but it is not yet implemented and
would be limited to Linux and the /etc/group DB.
Preferably users would bind mount the host\[cq]s group file into the
containers or use a standard shared user & groups technology like NIS or
LDAP.
.SH FAQ
.SS How well does mergerfs scale? Is it \[lq]production ready?\[rq]
.PP
Users have reported running mergerfs on everything from a Raspberry Pi
to dual socket Xeon systems with >20 cores.
I\[cq]m aware of at least a few companies which use mergerfs in
production.
Open Media Vault (https://www.openmediavault.org) includes mergerfs as
its sole solution for pooling filesystems.
The author of mergerfs had it running for over 300 days managing 16+
devices with reasonably heavy 24/7 read and write usage, stopping only
after the machine\[cq]s power supply died.
.PP
Most serious issues (crashes or data corruption) have been due to kernel
bugs (https://github.com/trapexit/mergerfs/wiki/Kernel-Issues-&-Bugs),
all of which are fixed in stable releases.
.SS Can mergerfs be used with filesystems which already have data / are in use?
.PP
Yes.
mergerfs is really just a proxy and does \f[B]NOT\f[R] interfere with
the normal form or function of the filesystems / mounts / paths it
manages.
It is just another userland application that is acting as a
man-in-the-middle.
It can\[cq]t do anything that any other random piece of software
can\[cq]t do.
.PP
mergerfs is \f[B]not\f[R] a traditional filesystem that takes control
over the underlying block device.
mergerfs is \f[B]not\f[R] RAID.
It does \f[B]not\f[R] manipulate the data that passes through it.
It does \f[B]not\f[R] shard data across filesystems.
It merely shards some \f[B]behavior\f[R] and aggregates others.
.SS Can drives/filesystems be removed from the pool at will?
.PP
Yes.
See the previous question\[cq]s answer.
.SS Can mergerfs be removed without affecting the data?
.PP
Yes.
See the previous question\[cq]s answer.
.SS Can drives/filesystems be moved to another pool?
.PP
Yes.
See the previous question\[cq]s answer.
.SS How do I migrate data into or out of the pool when adding/removing drives/filesystems?
.PP
You don\[cq]t need to.
See the previous question\[cq]s answer.
.SS How do I remove a drive/filesystem but keep the data in the pool?
.PP
Nothing special needs to be done.
Remove the branch from mergerfs\[cq] config and copy (rsync) the data
from the removed filesystem into the pool.
This is effectively the same as transferring data from one filesystem
to another.
.PP
If you wish to continue using the pool while performing the transfer,
simply create another, temporary pool without the filesystem in question
and then copy the data.
It would probably be a good idea to set the branch to \f[C]RO\f[R] prior
to doing this to ensure no new content is written to the filesystem
while performing the copy.
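.PP
A sketch of that procedure, assuming a pool made of \f[C]/mnt/disk1\f[R],
\f[C]/mnt/disk2\f[R] and \f[C]/mnt/disk3\f[R] where \f[C]/mnt/disk3\f[R]
is being removed and \f[C]/mnt/tmppool\f[R] is used as the temporary
pool (all paths are placeholders).
The \f[C]=RO\f[R] suffix in a branch list marks that branch read-only,
and the \f[C]rsync\f[R] flags shown also preserve hard links, ACLs and
xattrs.
.IP
.nf
\f[C]
# 1. Mark the outgoing branch read-only in the main pool, for example by
#    changing its branch list (fstab entry) to:
#      /mnt/disk1:/mnt/disk2:/mnt/disk3=RO
# 2. Create a temporary pool without the outgoing branch:
mkdir -p /mnt/tmppool
mergerfs -o category.create=mfs /mnt/disk1:/mnt/disk2 /mnt/tmppool
# 3. Copy the data from the outgoing filesystem into the pool:
rsync -avHAX /mnt/disk3/ /mnt/tmppool/
# 4. Unmount the temporary pool, then remove /mnt/disk3 from the
#    main pool config:
umount /mnt/tmppool
\f[R]
.fi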
.SS What policies should I use?
.PP
Unless you\[cq]re doing something more niche, the average user is
probably best off using \f[C]mfs\f[R] for \f[C]category.create\f[R].
It will spread files out across your branches based on available space.
Use \f[C]mspmfs\f[R] if you want to try to colocate the data a bit more.
You may want to use \f[C]lus\f[R] if you prefer a slightly different
distribution of data when you have a mix of smaller and larger
filesystems.
Generally though \f[C]mfs\f[R], \f[C]lus\f[R], or even \f[C]rand\f[R]
are good for the general use case.
If you are starting with an imbalanced pool you can use the tool
\f[B]mergerfs.balance\f[R] to redistribute files across the pool.
.PP
If you really wish to try to colocate files based on directory you can
set \f[C]func.create\f[R] to \f[C]epmfs\f[R] or similar and
\f[C]func.mkdir\f[R] to \f[C]rand\f[R] or \f[C]eprand\f[R] depending on
whether you just want to colocate generally or on specific branches.
Either way the \f[I]need\f[R] to colocate is rare.
For instance, you might need it if you wish to remove the device
regularly and want the data to predictably be on that device, or if you
don\[cq]t use backups at all and don\[cq]t wish to replace that data
piecemeal.
In those cases path preservation can help but will require some
manual attention.
Colocating after the fact can be accomplished using the
\f[B]mergerfs.consolidate\f[R] tool.
If you don\[cq]t need the strict colocation which the \f[C]ep\f[R]
policies provide, you can use the \f[C]msp\f[R] based policies, which
walk back up the path until they find a branch that works.
.PP
Ultimately there is no correct answer.
It is a preference or based on some particular need.
mergerfs is very easy to test and experiment with.
I suggest creating a test setup and experimenting to get a sense of what
you want.
.PP
\f[C]epmfs\f[R] is the default \f[C]category.create\f[R] policy because
\f[C]ep\f[R] policies are not going to change the general layout of the
branches.
It won\[cq]t place files/dirs on branches that don\[cq]t already have
the relative path.
So it keeps the system in a known state.
It\[cq]s much easier to stop using \f[C]epmfs\f[R] or redistribute files
around the filesystem than it is to consolidate them back.
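.PP
For example, assuming branches \f[C]/mnt/disk1\f[R] and
\f[C]/mnt/disk2\f[R] pooled at \f[C]/mnt/pool\f[R] (placeholder paths),
the create policy can be chosen at mount time or changed on a live pool
through the xattr based runtime configuration, which is exposed via the
\f[C].mergerfs\f[R] file at the root of the mount.
\f[C]setfattr\f[R] and \f[C]getfattr\f[R] are provided by the attr
package on most distributions.
.IP
.nf
\f[C]
# At mount time:
mergerfs -o category.create=mfs /mnt/disk1:/mnt/disk2 /mnt/pool

# On an already mounted pool:
setfattr -n user.mergerfs.category.create -v mfs /mnt/pool/.mergerfs
getfattr -n user.mergerfs.category.create /mnt/pool/.mergerfs

# Directory based colocation as described above:
setfattr -n user.mergerfs.func.create -v epmfs /mnt/pool/.mergerfs
setfattr -n user.mergerfs.func.mkdir -v rand /mnt/pool/.mergerfs
\f[R]
.fi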
.SS What settings should I use?
.PP
It depends on what features you want.
Generally speaking there are no \[lq]wrong\[rq] settings.
All settings are performance or feature related.
The best bet is to read over the available options and choose what fits
your situation.
If something isn\[cq]t clear from the documentation please reach out and
the documentation will be improved.
.PP
That said, for the average person, the following should be fine:
.PP
\f[C]cache.files=off,dropcacheonclose=true,category.create=mfs\f[R]
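.PP
As an \f[C]/etc/fstab\f[R] entry those options might look like the
following, where the \f[C]/mnt/disk*\f[R] branch glob and the
\f[C]/mnt/pool\f[R] mountpoint are placeholders for your own paths.
.IP
.nf
\f[C]
# <branches>  <mountpoint>  <type>    <options>                                                  <dump> <pass>
/mnt/disk*    /mnt/pool     mergerfs  cache.files=off,dropcacheonclose=true,category.create=mfs  0      0
\f[R]
.fi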
.SS Why are all my files ending up on 1 filesystem?!
.PP
Did you start with empty filesystems?
Did you explicitly configure a \f[C]category.create\f[R] policy?
Are you using an \f[C]existing path\f[R] / \f[C]path preserving\f[R]
policy?
.PP
The default create policy is \f[C]epmfs\f[R].
That is a path preserving algorithm.
With such a policy for \f[C]mkdir\f[R] and \f[C]create\f[R] and a set
of empty filesystems, only 1 filesystem will be selected when the first
directory is created.
Anything, files or directories, created in that first directory will be
placed on the same branch because it is preserving paths.
.PP
This catches a lot of new users off guard, but changing the default
would break the setup for many existing users.
If you do not care about path preservation and wish your files to be
spread across all your filesystems, change to \f[C]mfs\f[R] or a similar
policy as described above.
If you do want path preservation you\[cq]ll need to perform the manual
act of creating paths on the filesystems you want the data to land on
before transferring your data.
Setting \f[C]func.mkdir=epall\f[R] can simplify managing path
preservation for \f[C]create\f[R].
Or use \f[C]func.mkdir=rand\f[R] if you\[cq]re interested in just
grouping together directory content by filesystem.
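.PP
The filtering step behind this behavior can be approximated in shell.
The hypothetical function below prints the branches on which the parent
directory of a relative path already exists; an \f[C]ep\f[R] policy
only considers those branches (\f[C]epmfs\f[R] then picks the one with
the most free space).
It is a simplification for illustration, not mergerfs\[cq] code, and it
ignores \f[C]minfreespace\f[R] and branch modes.
.IP
.nf
\f[C]
# Hypothetical sketch of the "existing path" filter used by ep policies.
# Usage: ep_candidates RELPATH BRANCH...
ep_candidates() {
  relpath="$1"
  shift
  parent=$(dirname "$relpath")
  for branch in "$@"; do
    # Only branches which already contain the parent directory qualify.
    if [ -d "$branch/$parent" ]; then
      echo "$branch"
    fi
  done
}
\f[R]
.fi
.PP
With empty branches, only the branch where a top level directory was
first created qualifies for anything created below it, which is why
everything ends up on that one filesystem.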
.SS Do hardlinks work?
.PP
Yes.
See also the option \f[C]inodecalc\f[R] for how inode values are
calculated.
.PP
What mergerfs does not do is fake hard links across branches.
Read the section \[lq]rename & link\[rq] for how it works.
.PP
Remember that hardlinks will NOT work across devices.
That includes between the original filesystem and a mergerfs pool,
between two separate pools of the same underlying filesystems, or bind
mounts of paths within the mergerfs pool.
The latter is common when using Docker or Podman.
Multiple volumes (bind mounts) to the same underlying filesystem are
considered different devices.
There is no way to link between them.
You should bind mount the highest directory in the mergerfs pool that
includes all the paths you need if you want links to work.
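.PP
For example, with Docker (\f[C]<image>\f[R] and the paths are
placeholders), the first command below creates two separate bind mounts
between which links fail with \f[C]EXDEV\f[R], while the second bind
mounts a common parent so that links between the two directories can
work.
.IP
.nf
\f[C]
# Two separate volumes: linking /downloads/x to /movies/x fails (EXDEV).
docker run -v /mnt/pool/downloads:/downloads -v /mnt/pool/movies:/movies <image>

# One volume covering both: /data/downloads and /data/movies can be linked.
docker run -v /mnt/pool:/data <image>
\f[R]
.fi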
.SS Can I use mergerfs without SnapRAID? SnapRAID without mergerfs?
.PP
Yes.
They are completely unrelated pieces of software.
.SS Can mergerfs run via Docker, Podman, Kubernetes, etc.?
.PP
Yes.
With Docker you\[cq]ll need to include
\f[C]--cap-add=SYS_ADMIN --device=/dev/fuse --security-opt=apparmor:unconfined\f[R]
or similar with other container runtimes.
You should also run it as root or grant it sufficient capabilities to
change user and group identity as well as to have root-like filesystem
permissions.
.PP
Keep in mind that you \f[B]MUST\f[R] consider identity when using
containers.
For example, supplemental groups will be picked up from the container
unless you properly manage users and groups by sharing relevant /etc
files or by using some other means to share identity across containers.
Similarly, if you use \[lq]rootless\[rq] containers and user namespaces
to do uid/gid translations you \f[B]MUST\f[R] consider that while
managing shared files.
.PP
Also, as mentioned by hotio (https://hotio.dev/containers/mergerfs),
with Docker you should probably be mounting with
\f[C]bind-propagation\f[R] set to \f[C]slave\f[R].
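.PP
Putting the above together, running mergerfs in one container and
consuming the pool from another might look like the sketch below.
The image names and paths are placeholders, how the mergerfs image takes
its branches and options depends on that image, and the
\f[C]rshared\f[R] volume is a common way to make a mount created inside
the container visible on the host, though whether you need it depends on
your setup.
.IP
.nf
\f[C]
# Container running mergerfs, with the flags listed above:
docker run -d --cap-add=SYS_ADMIN --device=/dev/fuse --security-opt=apparmor:unconfined -v /mnt:/mnt:rshared <mergerfs-image>

# Container using the pool, with slave bind propagation:
docker run -d --mount type=bind,source=/mnt/pool,target=/data,bind-propagation=slave <app-image>
\f[R]
.fi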
.SS Does mergerfs support CoW / copy-on-write / writes to read-only filesystems?
.PP
Not in the sense of a filesystem like BTRFS or ZFS, nor in the overlayfs
or aufs sense.
It does offer a
cow-shell (http://manpages.ubuntu.com/manpages/bionic/man1/cow-shell.1.html)
like hard link breaking (copy to temp file then rename over original)
which can be useful when you want to save space by hardlinking duplicate
files but wish to treat each name as if it were a unique and separate
file.
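.PP
The hard link breaking described above can be approximated with
standard tools: copy the file to a temporary file in the same directory,
then rename the copy over the original name.
Since \f[C]rename\f[R] only replaces that one name, the other hard links
keep pointing at the original data.
The function below is a hypothetical illustration of the technique, not
mergerfs\[cq] implementation.
.IP
.nf
\f[C]
# Hypothetical sketch: give FILE its own copy of the data so that writes
# to it no longer affect other hard links to the same inode.
break_hardlink() {
  file="$1"
  # Temp file in the same directory, so the final mv is a rename(2)
  # within one filesystem and therefore atomic.
  tmp=$(mktemp "$(dirname "$file")/.cow.XXXXXX") || return 1
  # -p keeps mode, ownership (when permitted) and timestamps.
  if cp -p "$file" "$tmp"; then
    mv -f "$tmp" "$file"
  else
    rm -f "$tmp"
    return 1
  fi
}
\f[R]
.fi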
.PP
If you want to write to a read-only filesystem you should look at
overlayfs.
You can always include the overlayfs mount into a mergerfs pool.
.SS Why can\[cq]t I see my files / directories?
.PP
It\[cq]s almost always a permissions issue.
Unlike mhddfs and unionfs-fuse, which run as root and attempt to
access content as such, mergerfs always changes its credentials to that
of the caller.
This means that if the user does not have access to a file or directory
then neither will mergerfs.
However, because mergerfs is creating a union of paths it may be able to
read some files and directories on one filesystem but not another,
resulting in an incomplete set.
.PP
Whenever you run into a split permission issue (seeing some but not all
files) try using the
mergerfs.fsck (https://github.com/trapexit/mergerfs-tools) tool to check
for and fix the mismatch.
If you aren\[cq]t seeing anything at all, be sure that the basic
permissions are correct: that the user and group values are correct and
that directories have their executable bit set.
A common mistake by users new to Linux is to \f[C]chmod -R 644\f[R] when
they should have \f[C]chmod -R u=rwX,go=rX\f[R].
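.PP
The important difference between the two \f[C]chmod\f[R] invocations is
the capital \f[C]X\f[R], which grants execute (search) permission only
to directories and to files that already have an execute bit for some
user.
The hypothetical helper below applies the recommended mode recursively;
run it as the owner or root against each branch, or through the pool.
.IP
.nf
\f[C]
# Hypothetical helper: owner read/write, everyone else read, while
# keeping directories searchable. X adds execute only to directories
# and to files which were already executable.
fix_perms() {
  chmod -R u=rwX,go=rX "$@"
}
\f[R]
.fi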
.PP
If using a network filesystem such as NFS or SMB/CIFS (Samba), be sure
to pay close attention to anything regarding permissions and users.
Root squashing and user translation, for instance, have bitten a few
mergerfs users.
Some of these also affect the use of mergerfs from container platforms
such as Docker.
.SS Why use FUSE? Why not a kernel based solution?
.PP
As with any solution to a problem, there are advantages and
disadvantages to each approach.
.PP
A FUSE based solution has all the downsides of FUSE:
.IP \[bu] 2
Higher IO latency due to the trips in and out of kernel space
.IP \[bu] 2
Higher general overhead due to trips in and out of kernel space
.IP \[bu] 2
Double caching when using page caching
.IP \[bu] 2
Misc limitations due to FUSE\[cq]s design
.PP
But FUSE also has a lot of upsides:
.IP \[bu] 2
Easier to offer a cross platform solution
.IP \[bu] 2
Easier forward and backward compatibility
.IP \[bu] 2
Easier updates for users
.IP \[bu] 2
Easier and faster release cadence
.IP \[bu] 2
Allows more flexibility in design and features
.IP \[bu] 2
Overall easier to write, secure, and maintain
.IP \[bu] 2
Much lower barrier to entry (getting code into the kernel takes a lot of
time and effort initially)
.PP
FUSE was chosen because of all the advantages listed above.
The negatives of FUSE do not outweigh the positives.
.SS Is my OS\[cq]s libfuse needed for mergerfs to work?
.PP
No.\ Normally \f[C]mount.fuse\f[R] is needed to get mergerfs (or any
FUSE filesystem) to mount using the \f[C]mount\f[R] command, but in
vendoring the libfuse library the \f[C]mount.fuse\f[R] app has been
renamed to \f[C]mount.mergerfs\f[R], meaning the filesystem type in
\f[C]fstab\f[R] can simply be \f[C]mergerfs\f[R].
That said, there should be no harm in having it installed and continuing
to use \f[C]fuse.mergerfs\f[R] as the type in \f[C]/etc/fstab\f[R].
.PP
If \f[C]mergerfs\f[R] doesn\[cq]t work as a type it could be due to how
the \f[C]mount.mergerfs\f[R] tool was installed.
It must be in \f[C]/sbin/\f[R] with proper permissions.
.SS Why was splice support removed?
.PP
After a lot of testing over the years, splicing always appeared to, at
best, provide equivalent performance and in some cases worse
performance.
Splice is not supported on other platforms, forcing a traditional
read/write fallback to be provided.
The splice code was removed to simplify the codebase.
.SS Why use mergerfs over mhddfs?
.PP
mhddfs is no longer maintained and has some known stability and security
issues (see below).
mergerfs provides a superset of mhddfs\[cq] features and should offer
the same or maybe better performance.
.PP
Below is an example of mhddfs and mergerfs set up to work similarly.
.PP
\f[C]mhddfs -o mlimit=4G,allow_other /mnt/drive1,/mnt/drive2 /mnt/pool\f[R]
.PP
\f[C]mergerfs -o minfreespace=4G,category.create=ff /mnt/drive1:/mnt/drive2 /mnt/pool\f[R]
.SS Why use mergerfs over aufs?
.PP
aufs is mostly abandoned and no longer available in many distros.
.PP
While aufs can offer better peak performance, mergerfs provides more
configurability and is generally easier to use.
mergerfs however does not offer the overlay / copy-on-write (CoW)
features which aufs and overlayfs have.
.SS Why use mergerfs over unionfs?
.PP
UnionFS is more like aufs than mergerfs in that it offers overlay / CoW
features.
If you\[cq]re just looking to create a union of filesystems and want
flexibility in file/directory placement then mergerfs offers that
whereas unionfs is more for overlaying RW filesystems over RO ones.
.SS Why use mergerfs over overlayfs?
.PP
Same reasons as with unionfs.
.SS Why use mergerfs over LVM/ZFS/BTRFS/RAID0 drive concatenation / striping?
.PP
With simple JBOD / drive concatenation / striping / RAID0 a single
drive failure will result in full pool failure.
mergerfs performs a similar function without the possibility of
catastrophic failure and the difficulties in recovery.
Drives may fail; however, the data on the remaining drives will continue
to be accessible.
.PP
When combined with something like SnapRAID (http://www.snapraid.it)
and/or an offsite backup solution you can have the flexibility of JBOD
without the single point of failure.
.SS Why use mergerfs over ZFS?
.PP
mergerfs is not intended to be a replacement for ZFS.
mergerfs is intended to provide flexible pooling of arbitrary
filesystems (local or remote), of arbitrary sizes and types, for
\f[C]write once, read many\f[R] use cases such as bulk media storage,
where data integrity and backup are managed in other ways.
In that situation ZFS can introduce a number of costs and limitations as
described
here (http://louwrentius.com/the-hidden-cost-of-using-zfs-for-your-home-nas.html),
here (https://markmcb.com/2020/01/07/five-years-of-btrfs/), and
here (https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSWhyNoRealReshaping).
.SS Why use mergerfs over UnRAID?
.PP
UnRAID is a full OS and its storage layer, as I understand, is
proprietary and closed source.
Users who have experience with both have said they prefer the
flexibility offered by mergerfs and for some the fact it is free and
open source is important.
.PP
There are a number of UnRAID users who use mergerfs as well, though
I\[cq]m not entirely familiar with the use case.
.SS Why use mergerfs over StableBit\[cq]s DrivePool?
.PP
DrivePool works only on Windows, so it is not as common an alternative
as other Linux solutions.
If you want to use Windows then DrivePool is a good option.
Functionally the two projects work a bit differently.
DrivePool always writes to the filesystem with the most free space and
later rebalances.
mergerfs does not offer rebalancing but chooses a branch at
file/directory create time.
DrivePool\[cq]s rebalancing can be done differently in any directory and
has file pattern matching to further customize the behavior.
mergerfs, not having rebalancing, does not have these features, but
similar features are planned for mergerfs v3.
DrivePool has builtin file duplication which mergerfs does not natively
support (but it can be done via an external script).
.PP
There are a lot of misc differences between the two projects but most
features in DrivePool can be replicated with external tools in
combination with mergerfs.
.PP
Additionally, DrivePool is a closed source commercial product whereas
mergerfs is an ISC licensed OSS project.
.SS What should mergerfs NOT be used for?
.IP \[bu] 2
databases: Even if the database stored data in separate files (mergerfs
wouldn\[cq]t offer much otherwise) the higher latency of the indirection
will kill performance.
If it is a lightly used SQLite database then it may be fine but
you\[cq]ll need to test.
.IP \[bu] 2
VM images: For the same reasons as databases.
VM images are accessed very aggressively and mergerfs will introduce too
much latency (if it works at all).
.IP \[bu] 2
As a replacement for RAID: mergerfs is just for pooling branches.
If you need that kind of device performance aggregation or high
availability, you should stick with RAID.
.SS Can filesystems be written to directly? Outside of mergerfs while pooled?
.PP
Yes, however it\[cq]s not recommended to use the same file from within
the pool and from outside of it at the same time (particularly writing).
This is especially true if using caching of any kind (cache.files,
cache.entry, cache.attr, cache.negative_entry, cache.symlinks,
cache.readdir, etc.), as the cached data could conflict with what is
actually on the underlying filesystem.
.SS Why do I get an \[lq]out of space\[rq] / \[lq]no space left on device\[rq] / ENOSPC error even though there appears to be lots of space available?
.PP
First make sure you\[cq]ve read the sections above about policies, path
preservation, branch filtering, and the options \f[B]minfreespace\f[R],
\f[B]moveonenospc\f[R], \f[B]statfs\f[R], and \f[B]statfs_ignore\f[R].
.PP
mergerfs is simply presenting a union of the content within multiple
branches.
The reported free space is an aggregate of space available within the
pool (behavior modified by \f[B]statfs\f[R] and
\f[B]statfs_ignore\f[R]).
It does not represent a contiguous space.
This is similar to how read-only filesystems, or those with quotas or
reserved space, report the full theoretical space available.
.PP
Due to path preservation, branch tagging, read-only status, and
\f[B]minfreespace\f[R] settings it is perfectly valid that
\f[C]ENOSPC\f[R] / \[lq]out of space\[rq] / \[lq]no space left on
device\[rq] be returned.
It is doing what was asked of it: filtering possible branches due to
those settings.
Only one error can be returned, and if one of the reasons for filtering
a branch was \f[B]minfreespace\f[R] then it will be returned as such.
\f[B]moveonenospc\f[R] is only relevant to writing a file which is too
large for the filesystem it\[cq]s currently on.
.PP
It is also possible that the filesystem selected has run out of inodes.
Use \f[C]df -i\f[R] to list the total and available inodes per
filesystem.
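.PP
To check a set of branches for the two conditions above (available space
below a \f[C]minfreespace\f[R] style threshold, and exhausted inodes), a
hypothetical shell function like the one below can help.
It only reads \f[C]df\f[R] output, assumes GNU coreutils \f[C]df\f[R],
and does not reproduce mergerfs\[cq] actual branch filtering.
.IP
.nf
\f[C]
# Hypothetical helper: flag branches likely to be filtered out when
# creating files. Usage: check_branches MIN_BYTES BRANCH...
check_branches() {
  min_bytes="$1"
  shift
  for branch in "$@"; do
    # -P keeps each filesystem on one line; the last line holds the values.
    avail=$(df -P -B1 "$branch" | tail -n 1 | tr -s " " | cut -d " " -f 4)
    inodes=$(df -P -i "$branch" | tail -n 1 | tr -s " ")
    itotal=$(echo "$inodes" | cut -d " " -f 2)
    ifree=$(echo "$inodes" | cut -d " " -f 4)
    # Treat unknown or non-numeric inode counts as 0 (not applicable).
    case "$itotal" in ""|*[!0-9]*) itotal=0 ;; esac
    case "$ifree" in ""|*[!0-9]*) ifree=0 ;; esac
    status=ok
    if [ "$avail" -lt "$min_bytes" ]; then
      status=below-minfreespace
    fi
    # Filesystems such as btrfs report 0 total inodes, so skip those.
    if [ "$itotal" -gt 0 ] && [ "$ifree" -eq 0 ]; then
      status=out-of-inodes
    fi
    echo "$branch $status"
  done
}
\f[R]
.fi
.PP
For example, \f[C]check_branches 4294967296 /mnt/disk1 /mnt/disk2\f[R]
(placeholder paths) flags branches with less than 4 GiB available.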
.PP
If you don\[cq]t care about path preservation then simply change the
\f[C]create\f[R] policy to one which isn\[cq]t path preserving.
\f[C]mfs\f[R] is probably what most are looking for.
The reason it\[cq]s not the default is that the default was originally
set to \f[C]epmfs\f[R] and changing it now would change people\[cq]s
setup.
Such a setting change will likely occur in mergerfs 3.
.SS Why does the total available space in mergerfs not equal outside?
.PP
Are you using ext2/3/4?
With reserve for root?
mergerfs uses available space for statfs calculations.
If you\[cq]ve reserved space for root then it won\[cq]t show up.
.PP
You can remove the reserve by running: \f[C]tune2fs -m 0 <device>\f[R]
.SS Can mergerfs mounts be exported over NFS?
.PP
Yes, however if you do anything which may change files out of band
(including, for example, using the \f[C]newest\f[R] policy) it will
result in \[lq]stale file handle\[rq] errors unless properly set up.
.PP
Be sure to use the following options:
.IP \[bu] 2
noforget
.IP \[bu] 2
inodecalc=path-hash
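.PP
A sketch of the corresponding configuration, assuming a pool at
\f[C]/mnt/pool\f[R] exported to a hypothetical
\f[C]192.168.1.0/24\f[R] network.
FUSE filesystems are not backed by a block device, so
\f[C]exports(5)\f[R] generally requires an explicit \f[C]fsid\f[R] for
them; the value only needs to be unique among your exports.
.IP
.nf
\f[C]
# /etc/fstab: pool mounted with the options listed above
/mnt/disk*  /mnt/pool  mergerfs  noforget,inodecalc=path-hash  0  0

# /etc/exports: FUSE mounts usually need an explicit fsid
/mnt/pool  192.168.1.0/24(rw,fsid=1,no_subtree_check)
\f[R]
.fi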
.SS Can mergerfs mounts be exported over Samba / SMB?
.PP
Yes.
While some users have reported problems, they appear to always be
related to how Samba is set up in relation to permissions.
.SS Can mergerfs mounts be used over SSHFS?
.PP
Yes.
.SS I notice massive slowdowns of writes when enabling cache.files.
.PP
When file caching is enabled in any form (\f[C]cache.files!=off\f[R] or
\f[C]direct_io=false\f[R]) the kernel will issue \f[C]getxattr\f[R]
requests for \f[C]security.capability\f[R] prior to \f[I]every single
write\f[R].
This will usually result in a performance degradation, especially when
using a network filesystem (such as NFS or CIFS/SMB/Samba).
Unfortunately at this moment the kernel is not caching the response.
.PP
To work around this situation mergerfs offers a few solutions.
.IP "1." 3
Set \f[C]security_capability=false\f[R].
It will short circuit any call and return \f[C]ENOATTR\f[R].
This still means though that mergerfs will receive the request before
every write but at least it doesn\[cq]t get passed through to the
underlying filesystem.
.IP "2." 3
Set \f[C]xattr=noattr\f[R].
Same as above but applies to \f[I]all\f[R] calls to getxattr, not just
\f[C]security.capability\f[R].
This will not be cached by the kernel either but mergerfs\[cq] runtime
config system will still function.
.IP "3." 3
Set \f[C]xattr=nosys\f[R].
Results in mergerfs returning \f[C]ENOSYS\f[R] which \f[I]will\f[R] be
cached by the kernel.
No future xattr calls will be forwarded to mergerfs.
The downside is that also means the xattr based config and query
functionality won\[cq]t work either.
.IP "4." 3
Disable file caching.
If you aren\[cq]t using applications which use \f[C]mmap\f[R] it\[cq]s
probably simpler to just disable it altogether.
The kernel won\[cq]t send the requests when caching is disabled.
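.PP
For example, workarounds 1, 3 and 4 expressed as mount options.
The branch and mountpoint paths are placeholders, and
\f[C]cache.files=partial\f[R] merely stands in for whichever caching
mode you use.
.IP
.nf
\f[C]
# 1. Answer security.capability lookups in mergerfs itself
mergerfs -o cache.files=partial,security_capability=false /mnt/disk1:/mnt/disk2 /mnt/pool

# 3. Return ENOSYS so the kernel stops sending xattr requests
#    (this also disables the xattr based runtime configuration)
mergerfs -o cache.files=partial,xattr=nosys /mnt/disk1:/mnt/disk2 /mnt/pool

# 4. Disable file caching entirely
mergerfs -o cache.files=off /mnt/disk1:/mnt/disk2 /mnt/pool
\f[R]
.fi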
.SS It\[cq]s mentioned that there are some security issues with mhddfs. What are they? How does mergerfs address them?
.PP
mhddfs (https://github.com/trapexit/mhddfs) manages running as
\f[B]root\f[R] by calling
getuid() (https://github.com/trapexit/mhddfs/blob/cae96e6251dd91e2bdc24800b4a18a74044f6672/src/main.c#L319)
and if it returns \f[B]0\f[R] then it will
chown (http://linux.die.net/man/1/chown) the file.
Not only is that a race condition but it doesn\[cq]t handle other
situations.
Rather than attempting to simulate POSIX ACL behavior, the proper way to
manage this is to use seteuid (http://linux.die.net/man/2/seteuid) and
setegid (http://linux.die.net/man/2/setegid), in effect becoming the
user making the original call, and perform the action as them.
This is what mergerfs does and why mergerfs should always run as root.
.PP
In Linux setreuid syscalls apply only to the thread.
GLIBC hides this away by using realtime signals to inform all threads to
change credentials.
Taking after \f[B]Samba\f[R], mergerfs uses
\f[B]syscall(SYS_setreuid,\&...)\f[R] to set the caller\[cq]s
credentials for that thread only, jumping back to \f[B]root\f[R] as
necessary should escalated privileges be needed (for instance: to clone
paths between filesystems).
.PP
For non-Linux systems mergerfs uses a read-write lock and changes
credentials only when necessary.
If multiple threads are to be user X then only the first one will need
to change the process\[cq]s credentials.
So long as the other threads need to be user X they will take a readlock
allowing multiple threads to share the credentials.
Once a request comes in to run as user Y, that thread will attempt a
write lock and change to Y\[cq]s credentials when it can.
If the ability to give writers priority is supported then that flag will
be used so threads trying to change credentials don\[cq]t starve.
This isn\[cq]t the best solution but should work reasonably well
assuming there are few users.
.SH SUPPORT
.PP
Filesystems are complex and difficult to debug.
mergerfs, while being just a proxy of sorts, can be difficult to debug
given the large number of possible settings it can have itself and the
number of environments it can run in.
When reporting on a suspected issue \f[B]please\f[R] include as much of
the below information as possible, otherwise it will be difficult or
impossible to diagnose.
Also please read the above documentation as it provides details on many
previously encountered questions/issues.
.PP
\f[B]Please make sure you are using the latest
release (https://github.com/trapexit/mergerfs/releases) or have tried it
in comparison. Old versions, which are often included in distros like
Debian and Ubuntu, are never going to be updated and the issue you
are encountering may have been addressed already.\f[R]
.PP
\f[B]For commercial support or feature requests please contact me
directly. (mailto:support@spawn.link)\f[R]
.SS Information to include in bug reports
.IP \[bu] 2
Information about the broader problem along with any attempted
solutions. (https://xyproblem.info)
.IP \[bu] 2
Solutions already ruled out and why.
.IP \[bu] 2
Version of mergerfs: \f[C]mergerfs --version\f[R]
.IP \[bu] 2
mergerfs settings / arguments: from fstab, systemd unit, command line,
OMV plugin, etc.
.IP \[bu] 2
Version of the OS: \f[C]uname -a\f[R] and \f[C]lsb_release -a\f[R]
.IP \[bu] 2
List of branches, their filesystem types, sizes (before and after
issue): \f[C]df -h\f[R]
.IP \[bu] 2
\f[B]All\f[R] information about the relevant paths and files:
permissions, ownership, etc.
.IP \[bu] 2
\f[B]All\f[R] information about the client app making the requests:
version, uid/gid
.IP \[bu] 2
Runtime environment:
.RS 2
.IP \[bu] 2
Is mergerfs running within a container?
.IP \[bu] 2
Are the client apps using mergerfs running in a container?
.RE
.IP \[bu] 2
A \f[C]strace\f[R] of the app having problems:
.RS 2
.IP \[bu] 2
\f[C]strace -fvTtt -s 256 -o /tmp/app.strace.txt <cmd>\f[R]
.RE
.IP \[bu] 2
A \f[C]strace\f[R] of mergerfs while the program is trying to do
whatever it is failing to do:
.RS 2
.IP \[bu] 2
\f[C]strace -fvTtt -s 256 -p <mergerfsPID> -o /tmp/mergerfs.strace.txt\f[R]
.RE
.IP \[bu] 2
\f[B]Precise\f[R] directions on replicating the issue.
Do not leave \f[B]anything\f[R] out.
.IP \[bu] 2
Try to recreate the problem in the simplest way using standard programs:
\f[C]ln\f[R], \f[C]mv\f[R], \f[C]cp\f[R], \f[C]ls\f[R], \f[C]dd\f[R],
etc.
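.PP
Several of the items above can be gathered in one go.
The hypothetical helper below runs the version and system commands
listed and writes their output, including any errors, to a file you can
attach to a report.
It does not replace the path, permission, and \f[C]strace\f[R] details,
which still need to be collected by hand.
.IP
.nf
\f[C]
# Hypothetical helper: collect basic system details for a bug report.
# Usage: collect_report /tmp/mergerfs-report.txt
collect_report() {
  out="$1"
  for cmd in "mergerfs --version" "uname -a" "lsb_release -a" "df -h"; do
    echo "### $cmd"
    # Unquoted on purpose so the string splits into command + arguments.
    $cmd 2>&1 || echo "(command failed: $cmd)"
  done > "$out"
}
\f[R]
.fi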
.SS Contact / Issue submission
.IP \[bu] 2
github.com: https://github.com/trapexit/mergerfs/issues
.IP \[bu] 2
discord: https://discord.gg/MpAr69V
.IP \[bu] 2
reddit: https://www.reddit.com/r/mergerfs
.SS Donations
.PP
https://github.com/trapexit/support
.PP
Development and support of a project like mergerfs requires a
significant amount of time and effort.
The software is released under the very liberal ISC license and is
therefore free to use for personal or commercial purposes.
.PP
If you are a personal user and find mergerfs and its support valuable
and would like to support the project financially it would be very much
appreciated.
.PP
If you are using mergerfs commercially please consider sponsoring the
project to ensure it continues to be maintained and receive updates.
If custom features are needed feel free to contact me
directly (mailto:support@spawn.link).
.SH LINKS
.IP \[bu] 2
https://spawn.link
.IP \[bu] 2
https://github.com/trapexit/mergerfs
.IP \[bu] 2
https://github.com/trapexit/mergerfs/wiki
.IP \[bu] 2
https://github.com/trapexit/mergerfs-tools
.IP \[bu] 2
https://github.com/trapexit/scorch
.IP \[bu] 2
https://github.com/trapexit/bbf