mirror of https://github.com/trapexit/mergerfs.git
Antonio SJ Musumeci
9 years ago
2 changed files with 761 additions and 0 deletions
Makefile (2 additions)
mergerfs.1 (759 additions)
@ -0,0 +1,759 @@ |
|||||
|
.\"t |
||||
|
.TH "mergerfs" "1" "2015\-10\-11" "mergerfs user manual" "" |
||||
|
.SH NAME |
||||
|
.PP |
||||
|
mergerfs \- another FUSE union filesystem |
||||
|
.SH SYNOPSIS |
||||
|
.PP |
||||
|
mergerfs \-o<options> <srcpoints> <mountpoint> |
||||
|
.SH DESCRIPTION |
||||
|
.PP |
||||
|
\f[B]mergerfs\f[] is similar to \f[B]mhddfs\f[], \f[B]unionfs\f[], and |
||||
|
\f[B]aufs\f[]. |
||||
|
Like \f[B]mhddfs\f[] in that it too uses \f[B]FUSE\f[]. |
||||
|
Like \f[B]aufs\f[] in that it provides multiple policies for how to |
||||
|
handle behavior. |
||||
|
.PP |
||||
|
Why \f[B]mergerfs\f[] when those exist? |
||||
|
\f[B]mhddfs\f[] has not been updated in some time, nor is it very flexible.
There are also security issues when running it as root.
||||
|
\f[B]aufs\f[] is more flexible than \f[B]mhddfs\f[] but kernel based and |
||||
|
difficult to debug when problems arise. |
||||
|
Neither supports file attributes
||||
|
(chattr (http://linux.die.net/man/1/chattr)). |
||||
|
.SH FEATURES |
||||
|
.IP \[bu] 2 |
||||
|
Runs in userspace (FUSE) |
||||
|
.IP \[bu] 2 |
||||
|
Configurable behaviors |
||||
|
.IP \[bu] 2 |
||||
|
Supports extended attributes (xattrs) |
||||
|
.IP \[bu] 2 |
||||
|
Supports file attributes (chattr) |
||||
|
.IP \[bu] 2 |
||||
|
Dynamically configurable (via xattrs) |
||||
|
.IP \[bu] 2 |
||||
|
Safe to run as root |
||||
|
.IP \[bu] 2 |
||||
|
Opportunistic credential caching |
||||
|
.IP \[bu] 2 |
||||
|
Works with heterogeneous filesystem types |
||||
|
.SH OPTIONS |
||||
|
.SS options |
||||
|
.IP \[bu] 2 |
||||
|
\f[B]defaults\f[]: a shortcut for FUSE\[aq]s \f[B]atomic_o_trunc\f[], |
||||
|
\f[B]auto_cache\f[], \f[B]big_writes\f[], \f[B]default_permissions\f[], |
||||
|
\f[B]splice_move\f[], \f[B]splice_read\f[], and \f[B]splice_write\f[]. |
||||
|
These options seem to provide the best performance. |
||||
|
.IP \[bu] 2 |
||||
|
\f[B]direct_io\f[]: causes FUSE to bypass an additional caching step which
||||
|
can increase write speeds at the detriment of read speed. |
||||
|
.IP \[bu] 2 |
||||
|
\f[B]minfreespace\f[]: the minimum space value used for the |
||||
|
\f[B]lfs\f[], \f[B]fwfs\f[], and \f[B]epmfs\f[] policies. |
||||
|
Understands \[aq]K\[aq], \[aq]M\[aq], and \[aq]G\[aq] to represent |
||||
|
kilobyte, megabyte, and gigabyte respectively. |
||||
|
(default: 4G) |
||||
|
.IP \[bu] 2 |
||||
|
\f[B]moveonenospc\f[]: when enabled (set to \f[B]true\f[]) if a |
||||
|
\f[B]write\f[] fails with \f[B]ENOSPC\f[] a scan of all drives will be |
||||
|
done looking for the drive with most free space which is at least the |
||||
|
size of the file plus the amount which failed to write. |
||||
|
An attempt to move the file to that drive will occur (keeping all |
||||
|
metadata possible) and if successful the original is unlinked and the |
||||
|
write retried. |
||||
|
(default: false) |
||||
|
.IP \[bu] 2 |
||||
|
\f[B]func.<func>=<policy>\f[]: sets the specific FUSE function\[aq]s |
||||
|
policy. |
||||
|
See below for the list of value types. |
||||
|
Example: \f[B]func.getattr=newest\f[] |
||||
|
.IP \[bu] 2 |
||||
|
\f[B]category.<category>=<policy>\f[]: Sets policy of all FUSE functions |
||||
|
in the provided category. |
||||
|
Example: \f[B]category.create=mfs\f[] |
||||
|
.PP |
||||
|
\f[B]NOTE:\f[] Options are evaluated in the order listed so if the |
||||
|
options are \f[B]func.rmdir=rand,category.action=ff\f[] the |
||||
|
\f[B]action\f[] category setting will override the \f[B]rmdir\f[] |
||||
|
setting. |
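.PP
A minimal Python sketch of that ordering rule, not the actual option
parser: settings are applied left to right, and a category setting
overwrites every function in that category.
The names \f[C]CATEGORIES\f[] and \f[C]apply_options\f[] are
hypothetical; the function lists are copied from the classification
table in this manual.
.IP
.nf
```python
# Hypothetical names; only the option syntax comes from this manual.
CATEGORIES = {
    "action": ["chmod", "chown", "link", "removexattr", "rename",
               "rmdir", "setxattr", "truncate", "unlink", "utimens"],
    "create": ["create", "mkdir", "mknod", "symlink"],
    "search": ["access", "getattr", "getxattr", "ioctl", "listxattr",
               "open", "readlink"],
}


def apply_options(option_string, policies=None):
    """Apply func.* and category.* settings in the order listed."""
    policies = dict(policies or {})
    for option in option_string.split(","):
        key, _, value = option.partition("=")
        kind, _, name = key.partition(".")
        if kind == "func":
            policies[name] = value
        elif kind == "category":
            # A category setting overwrites every function in it.
            for func in CATEGORIES[name]:
                policies[func] = value
        # Anything else (defaults, allow_other, ...) is ignored here.
    return policies


result = apply_options("func.rmdir=rand,category.action=ff")
print(result["rmdir"])  # prints ff
```
.fi
.PP
Reversing the order to \f[B]category.action=ff,func.rmdir=rand\f[]
would leave \f[B]rmdir\f[] set to \f[B]rand\f[].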
||||
|
.SS srcpoints |
||||
|
.PP |
||||
|
The source points argument is a colon (\[aq]:\[aq]) delimited list of |
||||
|
paths. |
||||
|
To make it simpler to include multiple source points without having to |
||||
|
modify your fstab (http://linux.die.net/man/5/fstab) we also support |
||||
|
globbing (http://linux.die.net/man/7/glob). |
||||
|
\f[B]The globbing tokens MUST be escaped when using via the shell else |
||||
|
the shell itself will probably expand it.\f[] |
||||
|
.IP |
||||
|
.nf |
||||
|
\f[C] |
||||
|
$\ mergerfs\ /mnt/disk\\*:/mnt/cdrom\ /media/drives |
||||
|
\f[] |
||||
|
.fi |
||||
|
.PP |
||||
|
The above line will use all points in /mnt prefixed with \f[I]disk\f[] |
||||
|
and the directory \f[I]cdrom\f[]. |
||||
|
.PP |
||||
|
In /etc/fstab it\[aq]d look like the following: |
||||
|
.IP |
||||
|
.nf |
||||
|
\f[C] |
||||
|
#\ <file\ system>\ \ \ \ \ \ \ \ <mount\ point>\ \ <type>\ \ \ \ \ \ \ \ \ <options>\ \ \ \ \ \ \ \ \ \ \ \ \ <dump>\ \ <pass> |
||||
|
/mnt/disk*:/mnt/cdrom\ \ /media/drives\ \ fuse.mergerfs\ \ defaults,allow_other\ \ 0\ \ \ \ \ \ \ 0 |
||||
|
\f[] |
||||
|
.fi |
||||
|
.PP |
||||
|
\f[B]NOTE:\f[] the globbing is done at mount or xattr update time. |
||||
|
If a new directory is added matching the glob after the fact it will not |
||||
|
be included. |
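.PP
The expansion can be pictured with the following Python sketch, which
assumes the list is split on colons and each element is passed to a
glob(7) style matcher once.
\f[C]expand_srcpoints\f[] is an illustrative name, and keeping an
unmatched pattern verbatim is an assumption rather than documented
behavior.
.IP
.nf
```python
import glob
import os
import tempfile


def expand_srcpoints(srcpoints):
    """Expand a colon delimited list of path globs into concrete paths."""
    expanded = []
    for pattern in srcpoints.split(":"):
        matches = sorted(glob.glob(pattern))
        # Assumption: a pattern that matches nothing is kept as given.
        expanded.extend(matches or [pattern])
    return expanded


# Throwaway directory standing in for /mnt.
root = tempfile.mkdtemp()
for name in ("disk1", "disk2", "cdrom"):
    os.mkdir(os.path.join(root, name))

pattern = os.path.join(root, "disk*") + ":" + os.path.join(root, "cdrom")
sources = expand_srcpoints(pattern)
print([os.path.basename(p) for p in sources])
```
.fi
.PP
Because the result is a plain list computed once, a \f[I]disk3\f[]
created afterwards is absent until the expansion is run again.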
||||
|
.SH POLICIES |
||||
|
.PP |
||||
|
Filesystem calls are broken up into 3 categories: \f[B]action\f[], |
||||
|
\f[B]create\f[], \f[B]search\f[]. |
||||
|
There are also some calls which have no policy attached due to state |
||||
|
being kept between calls. |
||||
|
These categories can be assigned a policy which dictates how |
||||
|
\f[B]mergerfs\f[] behaves. |
||||
|
Any policy can be assigned to a category though some aren\[aq]t terribly |
||||
|
practical. |
||||
|
For instance: \f[B]rand\f[] (Random) may be useful for \f[B]create\f[] |
||||
|
but could lead to very odd behavior if used for \f[B]search\f[]. |
||||
|
.SS Functional classifications |
||||
|
.PP |
||||
|
.TS |
||||
|
tab(@); |
||||
|
l l. |
||||
|
T{ |
||||
|
Category |
||||
|
T}@T{ |
||||
|
FUSE Functions |
||||
|
T} |
||||
|
_ |
||||
|
T{ |
||||
|
action |
||||
|
T}@T{ |
||||
|
chmod, chown, link, removexattr, rename, rmdir, setxattr, truncate, |
||||
|
unlink, utimens |
||||
|
T} |
||||
|
T{ |
||||
|
create |
||||
|
T}@T{ |
||||
|
create, mkdir, mknod, symlink |
||||
|
T} |
||||
|
T{ |
||||
|
search |
||||
|
T}@T{ |
||||
|
access, getattr, getxattr, ioctl, listxattr, open, readlink |
||||
|
T} |
||||
|
T{ |
||||
|
N/A |
||||
|
T}@T{ |
||||
|
fallocate, fgetattr, fsync, ftruncate, ioctl, read, readdir, release, |
||||
|
statfs, write |
||||
|
T} |
||||
|
.TE |
||||
|
.PP |
||||
|
\f[B]ioctl\f[] behaves differently if it\[aq]s acting on a directory.
||||
|
It\[aq]ll use the \f[B]getattr\f[] policy to find and open the directory |
||||
|
before issuing the \f[B]ioctl\f[]. |
||||
|
In other cases where something may be searched (to confirm a directory |
||||
|
exists across all source mounts) then \f[B]getattr\f[] will be used. |
||||
|
.SS Policy descriptions |
||||
|
.PP |
||||
|
.TS |
||||
|
tab(@); |
||||
|
l l. |
||||
|
T{ |
||||
|
Policy |
||||
|
T}@T{ |
||||
|
Description |
||||
|
T} |
||||
|
_ |
||||
|
T{ |
||||
|
ff (first found) |
||||
|
T}@T{ |
||||
|
Given the order of the drives act on the first one found (regardless if |
||||
|
stat would return EACCES). |
||||
|
T} |
||||
|
T{ |
||||
|
ffwp (first found w/ permissions) |
||||
|
T}@T{ |
||||
|
Given the order of the drives act on the first one found which you have |
||||
|
access (stat does not error with EACCES). |
||||
|
T} |
||||
|
T{ |
||||
|
newest (newest file) |
||||
|
T}@T{ |
||||
|
If multiple files exist return the one with the most recent mtime. |
||||
|
T} |
||||
|
T{ |
||||
|
mfs (most free space) |
||||
|
T}@T{ |
||||
|
Use the drive with the most free space available. |
||||
|
T} |
||||
|
T{ |
||||
|
epmfs (existing path, most free space) |
||||
|
T}@T{ |
||||
|
If the path exists on multiple drives use the one with the most free |
||||
|
space and is greater than \f[B]minfreespace\f[]. |
||||
|
If no drive has at least \f[B]minfreespace\f[] then fallback to |
||||
|
\f[B]mfs\f[]. |
||||
|
T} |
||||
|
T{ |
||||
|
fwfs (first with free space) |
||||
|
T}@T{ |
||||
|
Pick the first drive which has at least \f[B]minfreespace\f[]. |
||||
|
T} |
||||
|
T{ |
||||
|
lfs (least free space) |
||||
|
T}@T{ |
||||
|
Pick the drive with least available space but more than |
||||
|
\f[B]minfreespace\f[]. |
||||
|
T} |
||||
|
T{ |
||||
|
rand (random) |
||||
|
T}@T{ |
||||
|
Pick an existing drive at random. |
||||
|
T} |
||||
|
T{ |
||||
|
all |
||||
|
T}@T{ |
||||
|
Applies action to all found. |
||||
|
For searches it will behave like first found \f[B]ff\f[]. |
||||
|
T} |
||||
|
T{ |
||||
|
enosys, einval, enotsup, exdev, erofs |
||||
|
T}@T{ |
||||
|
Exclusively return \f[C]\-1\f[] with \f[C]errno\f[] set to the |
||||
|
respective value. |
||||
|
Useful for debugging other applications\[aq] behavior to errors. |
||||
|
T} |
||||
|
.TE |
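.PP
As a rough illustration of the space based policies in the table, the
Python sketch below models each drive as a dictionary with a free byte
count and the set of paths it holds.
The data layout and function signatures are hypothetical, and returning
\f[C]None\f[] when no drive qualifies is an assumption; the table does
not say what happens in that case.
.IP
.nf
```python
G = 1024 ** 3


def ff(drives, path, minfree):
    """First found: first drive, in order, holding the path."""
    return next((d for d in drives if path in d["paths"]), None)


def mfs(drives, path, minfree):
    """Most free space across all drives."""
    return max(drives, key=lambda d: d["free"])


def lfs(drives, path, minfree):
    """Least free space that is still more than minfree."""
    ok = [d for d in drives if d["free"] > minfree]
    return min(ok, key=lambda d: d["free"]) if ok else None


def fwfs(drives, path, minfree):
    """First drive, in order, with at least minfree available."""
    return next((d for d in drives if d["free"] >= minfree), None)


def epmfs(drives, path, minfree):
    """Existing path, most free space; falls back to mfs."""
    ok = [d for d in drives
          if path in d["paths"] and d["free"] > minfree]
    if not ok:
        return mfs(drives, path, minfree)
    return max(ok, key=lambda d: d["free"])


drives = [
    {"name": "/mnt/a", "free": 2 * G, "paths": {"/music"}},
    {"name": "/mnt/b", "free": 10 * G, "paths": {"/music", "/video"}},
    {"name": "/mnt/c", "free": 50 * G, "paths": set()},
]
for policy in (ff, mfs, lfs, fwfs, epmfs):
    print(policy.__name__, policy(drives, "/music", 4 * G)["name"])
```
.fi
.PP
With a 4G minimum, \f[B]epmfs\f[] skips /mnt/a even though it holds the
path, and a path found on no drive falls through to \f[B]mfs\f[].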
||||
|
.SS Defaults |
||||
|
.PP |
||||
|
.TS |
||||
|
tab(@); |
||||
|
l l. |
||||
|
T{ |
||||
|
Category |
||||
|
T}@T{ |
||||
|
Policy |
||||
|
T} |
||||
|
_ |
||||
|
T{ |
||||
|
action |
||||
|
T}@T{ |
||||
|
all |
||||
|
T} |
||||
|
T{ |
||||
|
create |
||||
|
T}@T{ |
||||
|
epmfs |
||||
|
T} |
||||
|
T{ |
||||
|
search |
||||
|
T}@T{ |
||||
|
ff |
||||
|
T} |
||||
|
.TE |
||||
|
.SS rename |
||||
|
.PP |
||||
|
rename (http://man7.org/linux/man-pages/man2/rename.2.html) is a tricky |
||||
|
function in a merged system. |
||||
|
Normally if a rename can\[aq]t be done atomically due to the from and to |
||||
|
paths existing on different mount points it will return \f[C]\-1\f[] |
||||
|
with \f[C]errno\ =\ EXDEV\f[]. |
||||
|
The atomic rename is most critical for replacing files in place |
||||
|
atomically (such as securely writing to a temp file and then replacing a
||||
|
target). |
||||
|
The problem is that by merging multiple paths you can have N instances |
||||
|
of the source and destinations on different drives. |
||||
|
Meaning that if you just renamed each source locally you could end up |
||||
|
with the destination files not overwritten / replaced.
||||
|
To address this mergerfs works in the following way. |
||||
|
If the source and destination exist in different directories it will |
||||
|
immediately return \f[C]EXDEV\f[]. |
||||
|
Generally it\[aq]s not expected for cross directory renames to work so |
||||
|
it should be fine for most instances (mv,rsync,etc.). |
||||
|
If they do belong to the same directory it then runs the \f[C]rename\f[] |
||||
|
policy to get the files to rename. |
||||
|
It iterates through and renames each file while keeping track of those |
||||
|
paths which have not been renamed. |
||||
|
If all the renames succeed it will then \f[C]unlink\f[] or |
||||
|
\f[C]rmdir\f[] the other paths to clean up any preexisting target files. |
||||
|
This allows the new file to be found without the file itself ever |
||||
|
disappearing. |
||||
|
There may still be some issues with this behavior. |
||||
|
Particularly on error. |
||||
|
At the moment however this seems the best policy. |
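.PP
The sequence described above can be sketched in Python as follows.
This is an approximation under stated assumptions, not the mergerfs
implementation: \f[C]merged_rename\f[] is a hypothetical name, paths
are relative to each source directory, and the \f[C]rename\f[] policy
is taken to be \f[B]all\f[].
.IP
.nf
```python
import errno
import os
import tempfile


def merged_rename(branches, old, new):
    """Rename relative path old to new across every source directory."""
    if os.path.dirname(old) != os.path.dirname(new):
        # Cross directory renames are refused up front.
        raise OSError(errno.EXDEV, os.strerror(errno.EXDEV))

    # Stand-in for the rename policy: every branch holding the source.
    renamed = [b for b in branches
               if os.path.lexists(os.path.join(b, old))]
    if not renamed:
        raise OSError(errno.ENOENT, os.strerror(errno.ENOENT))
    for branch in renamed:
        os.rename(os.path.join(branch, old), os.path.join(branch, new))

    # Every rename succeeded: clean up targets on the other branches.
    for branch in branches:
        if branch in renamed:
            continue
        stale = os.path.join(branch, new)
        if os.path.isdir(stale) and not os.path.islink(stale):
            os.rmdir(stale)
        elif os.path.lexists(stale):
            os.unlink(stale)


a = tempfile.mkdtemp()
b = tempfile.mkdtemp()
with open(os.path.join(a, "new.tmp"), "w") as fh:
    fh.write("fresh")
with open(os.path.join(b, "target"), "w") as fh:
    fh.write("stale")

merged_rename([a, b], "new.tmp", "target")
print(sorted(os.listdir(a)), os.listdir(b))  # prints ['target'] []
```
.fi
.PP
If any single rename raises, the cleanup loop never runs, which mirrors
the "if all the renames succeed" condition but leaves the partially
renamed state for the caller to sort out.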
||||
|
.SS readdir |
||||
|
.PP |
||||
|
readdir (http://linux.die.net/man/3/readdir) is very different from most |
||||
|
functions in this realm. |
||||
|
It certainly could have its own set of policies to tweak its
||||
|
behavior. |
||||
|
At this time it provides a simple \f[B]first found\f[] merging of |
||||
|
directories and files found.
||||
|
That is: only the first file or directory found for a directory is |
||||
|
returned. |
||||
|
Given how FUSE works though the data representing the returned entry |
||||
|
comes from \f[B]getattr\f[]. |
||||
|
.PP |
||||
|
It could be extended to offer the ability to see all files found. |
||||
|
Perhaps concatenating \f[B]#\f[] and a number to the name. |
||||
|
But to really be useful you\[aq]d need to be able to access them which |
||||
|
would complicate file lookup. |
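.PP
A small Python sketch of the first found merge: each name is attributed
to the first source directory that contains it and later duplicates are
hidden.
\f[C]merged_readdir\f[] is an illustrative name, and the sketch ignores
the \f[B]getattr\f[] lookup that supplies the entry data.
.IP
.nf
```python
import os
import tempfile


def merged_readdir(branches, relpath=""):
    """Map each entry name to the first branch in which it was found."""
    seen = {}
    for branch in branches:
        directory = os.path.join(branch, relpath)
        if not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            # First found wins; later duplicates are hidden.
            seen.setdefault(name, branch)
    return seen


a = tempfile.mkdtemp()
b = tempfile.mkdtemp()
for name in ("A", "B"):
    open(os.path.join(a, name), "w").close()
for name in ("B", "C"):
    open(os.path.join(b, name), "w").close()

merged = merged_readdir([a, b])
print(sorted(merged))  # prints ['A', 'B', 'C']
```
.fi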
||||
|
.SS statvfs |
||||
|
.PP |
||||
|
statvfs (http://linux.die.net/man/2/statvfs) normalizes the source |
||||
|
drives based on the fragment size and sums the number of adjusted blocks |
||||
|
and inodes. |
||||
|
This means you will see the combined space of all sources. |
||||
|
Total, used, and free. |
||||
|
The sources however are deduplicated based on the drive so multiple points
on the same drive will not result in double counting its space.
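.PP
A worked Python sketch of that aggregation.
It assumes block counts are rescaled to the smallest fragment size among
the sources and that sources sharing a device ID are counted once; the
\f[C]Source\f[] record and \f[C]merged_statvfs\f[] are hypothetical
names.
.IP
.nf
```python
from collections import namedtuple

# dev identifies the underlying drive; the other fields mirror the
# statvfs members f_frsize, f_blocks and f_bfree.
Source = namedtuple("Source", "dev frsize blocks bfree")


def merged_statvfs(sources):
    """Sum block counts over unique drives, in the smallest frsize."""
    unique = {}
    for src in sources:
        unique.setdefault(src.dev, src)  # count each drive once
    frsize = min(s.frsize for s in unique.values())
    blocks = sum(s.blocks * s.frsize // frsize for s in unique.values())
    bfree = sum(s.bfree * s.frsize // frsize for s in unique.values())
    return frsize, blocks, bfree


sources = [
    Source(dev=1, frsize=4096, blocks=1000, bfree=400),
    Source(dev=1, frsize=4096, blocks=1000, bfree=400),  # same drive
    Source(dev=2, frsize=1024, blocks=8000, bfree=2000),
]
print(merged_statvfs(sources))  # prints (1024, 12000, 3600)
```
.fi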
||||
|
.PP |
||||
|
\f[B]NOTE:\f[] Since we can not (easily) replicate the atomicity of an |
||||
|
\f[B]mkdir\f[] or \f[B]mknod\f[] without side effects those calls will |
||||
|
first do a scan to see if the file exists and then attempts a create. |
||||
|
This means there is a slight race condition. |
||||
|
Worst case you\[aq]d end up with the directory or file on more than one
||||
|
mount. |
||||
|
.SH BUILDING |
||||
|
.PP |
||||
|
\f[B]NOTE:\f[] Prebuilt packages can be found at: |
||||
|
https://github.com/trapexit/mergerfs/releases |
||||
|
.PP |
||||
|
First get the code from github (http://github.com/trapexit/mergerfs). |
||||
|
.IP |
||||
|
.nf |
||||
|
\f[C] |
||||
|
$\ git\ clone\ https://github.com/trapexit/mergerfs.git |
||||
|
$\ #\ or |
||||
|
$\ wget\ https://github.com/trapexit/mergerfs/archive/master.zip |
||||
|
\f[] |
||||
|
.fi |
||||
|
.SS Debian / Ubuntu |
||||
|
.IP |
||||
|
.nf |
||||
|
\f[C] |
||||
|
$\ sudo\ apt\-get\ install\ g++\ pkg\-config\ git\ git\-buildpackage\ pandoc\ debhelper\ libfuse\-dev\ libattr1\-dev |
||||
|
$\ cd\ mergerfs |
||||
|
$\ make\ deb |
||||
|
$\ sudo\ dpkg\ \-i\ ../mergerfs_version_arch.deb |
||||
|
\f[] |
||||
|
.fi |
||||
|
.SS Fedora |
||||
|
.IP |
||||
|
.nf |
||||
|
\f[C] |
||||
|
$\ su\ \- |
||||
|
#\ dnf\ install\ rpm\-build\ fuse\-devel\ libattr\-devel\ pandoc\ gcc\-c++\ git\ make\ which |
||||
|
#\ cd\ mergerfs |
||||
|
#\ make\ rpm |
||||
|
#\ rpm\ \-i\ rpmbuild/RPMS/<arch>/mergerfs\-<version>.<arch>.rpm
||||
|
\f[] |
||||
|
.fi |
||||
|
.SS Generically |
||||
|
.PP |
||||
|
Have pkg\-config, pandoc, libfuse, libattr1 installed. |
||||
|
.IP |
||||
|
.nf |
||||
|
\f[C] |
||||
|
$\ cd\ mergerfs |
||||
|
$\ make |
||||
|
$\ make\ man |
||||
|
$\ sudo\ make\ install |
||||
|
\f[] |
||||
|
.fi |
||||
|
.SH RUNTIME |
||||
|
.SS \&.mergerfs pseudo file |
||||
|
.IP |
||||
|
.nf |
||||
|
\f[C] |
||||
|
<mountpoint>/.mergerfs |
||||
|
\f[] |
||||
|
.fi |
||||
|
.PP |
||||
|
There is a pseudo file available at the mount point which allows for the |
||||
|
runtime modification of certain \f[B]mergerfs\f[] options. |
||||
|
The file will not show up in \f[B]readdir\f[] but can be |
||||
|
\f[B]stat\f[]\[aq]ed and manipulated via |
||||
|
{list,get,set}xattrs (http://linux.die.net/man/2/listxattr) calls. |
||||
|
.PP |
||||
|
Even if xattrs are disabled the |
||||
|
{list,get,set}xattrs (http://linux.die.net/man/2/listxattr) calls will |
||||
|
still work. |
||||
|
.SS Keys |
||||
|
.PP |
||||
|
Use \f[C]xattr\ \-l\ /mount/point/.mergerfs\f[] to see all supported |
||||
|
keys. |
||||
|
.SS Example |
||||
|
.IP |
||||
|
.nf |
||||
|
\f[C] |
||||
|
[trapexit:/tmp/mount]\ $\ xattr\ \-l\ .mergerfs |
||||
|
user.mergerfs.srcmounts:\ /tmp/a:/tmp/b |
||||
|
user.mergerfs.minfreespace:\ 4294967295 |
||||
|
user.mergerfs.moveonenospc:\ false |
||||
|
user.mergerfs.policies:\ all,einval,enosys,enotsup,epmfs,erofs,exdev,ff,ffwp,fwfs,lfs,mfs,newest,rand |
||||
|
user.mergerfs.version:\ x.y.z |
||||
|
user.mergerfs.category.action:\ all |
||||
|
user.mergerfs.category.create:\ epmfs |
||||
|
user.mergerfs.category.search:\ ff |
||||
|
user.mergerfs.func.access:\ ff |
||||
|
user.mergerfs.func.chmod:\ all |
||||
|
user.mergerfs.func.chown:\ all |
||||
|
user.mergerfs.func.create:\ epmfs |
||||
|
user.mergerfs.func.getattr:\ ff |
||||
|
user.mergerfs.func.getxattr:\ ff |
||||
|
user.mergerfs.func.link:\ all |
||||
|
user.mergerfs.func.listxattr:\ ff |
||||
|
user.mergerfs.func.mkdir:\ epmfs |
||||
|
user.mergerfs.func.mknod:\ epmfs |
||||
|
user.mergerfs.func.open:\ ff |
||||
|
user.mergerfs.func.readlink:\ ff |
||||
|
user.mergerfs.func.removexattr:\ all |
||||
|
user.mergerfs.func.rename:\ all |
||||
|
user.mergerfs.func.rmdir:\ all |
||||
|
user.mergerfs.func.setxattr:\ all |
||||
|
user.mergerfs.func.symlink:\ epmfs |
||||
|
user.mergerfs.func.truncate:\ all |
||||
|
user.mergerfs.func.unlink:\ all |
||||
|
user.mergerfs.func.utimens:\ all |
||||
|
|
||||
|
[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.category.search\ .mergerfs |
||||
|
ff |
||||
|
|
||||
|
[trapexit:/tmp/mount]\ $\ xattr\ \-w\ user.mergerfs.category.search\ ffwp\ .mergerfs |
||||
|
[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.category.search\ .mergerfs |
||||
|
ffwp |
||||
|
|
||||
|
[trapexit:/tmp/mount]\ $\ xattr\ \-w\ user.mergerfs.srcmounts\ +/tmp/c\ .mergerfs |
||||
|
[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.srcmounts\ .mergerfs |
||||
|
/tmp/a:/tmp/b:/tmp/c |
||||
|
|
||||
|
[trapexit:/tmp/mount]\ $\ xattr\ \-w\ user.mergerfs.srcmounts\ =/tmp/c\ .mergerfs |
||||
|
[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.srcmounts\ .mergerfs |
||||
|
/tmp/c |
||||
|
|
||||
|
[trapexit:/tmp/mount]\ $\ xattr\ \-w\ user.mergerfs.srcmounts\ \[aq]+</tmp/a:/tmp/b\[aq]\ .mergerfs |
||||
|
[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.srcmounts\ .mergerfs |
||||
|
/tmp/a:/tmp/b:/tmp/c |
||||
|
\f[] |
||||
|
.fi |
||||
|
.SS user.mergerfs.srcmounts |
||||
|
.PP |
||||
|
For \f[B]user.mergerfs.srcmounts\f[] there are several instructions |
||||
|
available for manipulating the list. |
||||
|
The value provided is just as the value used at mount time. |
||||
|
A colon (\[aq]:\[aq]) delimited list of full path globs. |
||||
|
.PP |
||||
|
.TS |
||||
|
tab(@); |
||||
|
l l. |
||||
|
T{ |
||||
|
Instruction |
||||
|
T}@T{ |
||||
|
Description |
||||
|
T} |
||||
|
_ |
||||
|
T{ |
||||
|
[list] |
||||
|
T}@T{ |
||||
|
set |
||||
|
T} |
||||
|
T{ |
||||
|
+<[list] |
||||
|
T}@T{ |
||||
|
prepend |
||||
|
T} |
||||
|
T{ |
||||
|
+>[list] |
||||
|
T}@T{ |
||||
|
append |
||||
|
T} |
||||
|
T{ |
||||
|
\-[list] |
||||
|
T}@T{ |
||||
|
remove all values provided |
||||
|
T} |
||||
|
T{ |
||||
|
\-< |
||||
|
T}@T{ |
||||
|
remove first in list |
||||
|
T} |
||||
|
T{ |
||||
|
\-> |
||||
|
T}@T{ |
||||
|
remove last in list |
||||
|
T} |
||||
|
.TE |
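.PP
The instructions can be modelled on a plain list, as in the Python
sketch below.
\f[C]update_srcmounts\f[] is a hypothetical helper; the bare
\f[B]+\f[] (append) and \f[B]=\f[] (set) forms are inferred from the
example transcript rather than from the table, and globbing is left out.
.IP
.nf
```python
def update_srcmounts(current, value):
    """Return the new source list after applying one instruction."""
    if value == "-<":
        return current[1:]      # remove first in list
    if value == "->":
        return current[:-1]     # remove last in list
    for prefix in ("+<", "+>", "+", "-", "="):
        if value.startswith(prefix):
            paths = value[len(prefix):].split(":")
            break
    else:
        prefix, paths = "=", value.split(":")  # bare list: set
    if prefix == "+<":
        return paths + current  # prepend
    if prefix in ("+>", "+"):
        return current + paths  # append
    if prefix == "-":
        return [p for p in current if p not in paths]
    return paths                # set


mounts = ["/tmp/a", "/tmp/b"]
mounts = update_srcmounts(mounts, "+/tmp/c")
mounts = update_srcmounts(mounts, "=/tmp/c")
mounts = update_srcmounts(mounts, "+</tmp/a:/tmp/b")
print(":".join(mounts))  # prints /tmp/a:/tmp/b:/tmp/c
```
.fi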
||||
|
.SS minfreespace |
||||
|
.PP |
||||
|
Input: integer with an optional suffix.
||||
|
\f[B]K\f[], \f[B]M\f[], or \f[B]G\f[]. |
||||
|
Output: value in bytes |
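.PP
A sketch of the input to output conversion in Python, assuming the
suffixes are binary multiples (1K = 1024 bytes) and must be upper case.
Both points are assumptions, and \f[C]parse_minfreespace\f[] is an
illustrative name.
.IP
.nf
```python
def parse_minfreespace(text):
    """Convert a value such as 4G or 512M to a byte count."""
    multipliers = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    suffix = text[-1:]
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)


for value in ("100", "8K", "512M", "4G"):
    print(value, parse_minfreespace(value))
```
.fi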
||||
|
.SS moveonenospc |
||||
|
.PP |
||||
|
Input: \f[B]true\f[] or \f[B]false\f[].
Output: \f[B]true\f[] or \f[B]false\f[].
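.PP
When the setting is \f[B]true\f[], the drive scan described under
OPTIONS could look roughly like this Python sketch.
The mapping of drive to free bytes and the name
\f[C]pick_move_target\f[] are hypothetical, and skipping the drive that
just ran out of space is an assumption.
.IP
.nf
```python
def pick_move_target(drives, current, file_size, failed_bytes):
    """Return the drive to move the file to, or None if none fits."""
    needed = file_size + failed_bytes
    candidates = {name: free for name, free in drives.items()
                  if name != current and free >= needed}
    if not candidates:
        return None
    # Among the drives that fit, take the one with most free space.
    return max(candidates, key=candidates.get)


drives = {"/mnt/a": 10, "/mnt/b": 500, "/mnt/c": 900}
print(pick_move_target(drives, "/mnt/a", 300, 100))  # prints /mnt/c
```
.fi
.PP
A \f[C]None\f[] result corresponds to the case where the original
\f[B]ENOSPC\f[] has to be reported to the caller.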
||||
|
.SS categories / funcs |
||||
|
.PP |
||||
|
Input: short policy string as described elsewhere in this document |
||||
|
Output: the policy string except for categories where its funcs have |
||||
|
multiple types. |
||||
|
In that case it will be a comma separated list. |
||||
|
.SS mergerfs file xattrs |
||||
|
.PP |
||||
|
While they won\[aq]t show up when using |
||||
|
listxattr (http://linux.die.net/man/2/listxattr) \f[B]mergerfs\f[] |
||||
|
offers a number of special xattrs to query information about the files |
||||
|
served. |
||||
|
To access the values you will need to issue a |
||||
|
getxattr (http://linux.die.net/man/2/getxattr) for one of the following: |
||||
|
.IP \[bu] 2 |
||||
|
\f[B]user.mergerfs.basepath:\f[] the base mount point for the file given |
||||
|
the current search policy |
||||
|
.IP \[bu] 2 |
||||
|
\f[B]user.mergerfs.relpath:\f[] the relative path of the file from the |
||||
|
perspective of the mount point |
||||
|
.IP \[bu] 2 |
||||
|
\f[B]user.mergerfs.fullpath:\f[] the full path of the original file |
||||
|
given the search policy |
||||
|
.IP \[bu] 2 |
||||
|
\f[B]user.mergerfs.allpaths:\f[] a NUL (\[aq]\\0\[aq]) separated list of
||||
|
full paths to all files found |
||||
|
.IP |
||||
|
.nf |
||||
|
\f[C] |
||||
|
[trapexit:/tmp/mount]\ $\ ls |
||||
|
A\ B\ C |
||||
|
[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.fullpath\ A |
||||
|
/mnt/a/full/path/to/A |
||||
|
[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.basepath\ A |
||||
|
/mnt/a |
||||
|
[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.relpath\ A |
||||
|
/full/path/to/A |
||||
|
[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.allpaths\ A\ |\ tr\ \[aq]\\0\[aq]\ \[aq]\\n\[aq] |
||||
|
/mnt/a/full/path/to/A |
||||
|
/mnt/b/full/path/to/A |
||||
|
\f[] |
||||
|
.fi |
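.PP
The same queries can be made from a program.
The Python sketch below uses \f[C]os.getxattr\f[], which is available
on Linux only; \f[C]split_allpaths\f[] and \f[C]file_locations\f[] are
illustrative names, and the sample value merely imitates the transcript
above.
.IP
.nf
```python
import os


def split_allpaths(raw):
    """Split the NUL separated user.mergerfs.allpaths value."""
    nul = bytes([0])
    return [os.fsdecode(p) for p in raw.split(nul) if p]


def file_locations(path):
    """Query the special xattrs of a file on a mergerfs mount."""
    base = os.fsdecode(os.getxattr(path, "user.mergerfs.basepath"))
    rel = os.fsdecode(os.getxattr(path, "user.mergerfs.relpath"))
    every = split_allpaths(os.getxattr(path, "user.mergerfs.allpaths"))
    return base, rel, every


# Offline check with a value shaped like the transcript.
sample = b"/mnt/a/full/path/to/A" + bytes([0]) + b"/mnt/b/full/path/to/A"
print(split_allpaths(sample))
```
.fi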
||||
|
.SH TOOLING |
||||
|
.IP \[bu] 2 |
||||
|
/usr/sbin/fsck.mergerfs: Provides permissions and ownership auditing and |
||||
|
the ability to fix them. |
||||
|
.SH TIPS / NOTES |
||||
|
.IP \[bu] 2 |
||||
|
If you don\[aq]t see some directories / files you expect in a merged |
||||
|
point be sure the user has permission to all the underlying directories. |
||||
|
If \f[C]/drive0/a\f[] is owned by \f[C]root:root\f[] with ACLs set
||||
|
to \f[C]0700\f[] and \f[C]/drive1/a\f[] is \f[C]root:root\f[] and |
||||
|
\f[C]0755\f[] you\[aq]ll see only \f[C]/drive1/a\f[]. |
||||
|
Use \f[C]fsck.mergerfs\f[] to audit the drive for out of sync |
||||
|
permissions. |
||||
|
.IP \[bu] 2 |
||||
|
Since POSIX gives you only error or success on calls it\[aq]s difficult to
||||
|
determine the proper behavior when applying the behavior to multiple |
||||
|
targets. |
||||
|
Generally if something succeeds when reading it returns the data it can. |
||||
|
If something fails when making an action we continue on and return the |
||||
|
last error. |
||||
|
.IP \[bu] 2 |
||||
|
The recommended options are \f[B]defaults,allow_other\f[]. |
||||
|
The \f[B]allow_other\f[] is to allow users who are not the one which |
||||
|
executed mergerfs access to the mountpoint. |
||||
|
\f[B]defaults\f[] is described above and should offer the best |
||||
|
performance. |
||||
|
It\[aq]s possible that if you\[aq]re running on an older platform the |
||||
|
\f[B]splice\f[] features aren\[aq]t available and could error. |
||||
|
In that case simply use the other options manually. |
||||
|
.IP \[bu] 2 |
||||
|
If write performance is valued more than read it may be useful to enable |
||||
|
\f[B]direct_io\f[]. |
||||
|
.IP \[bu] 2 |
||||
|
Remember that some policies mixed with some functions may result in |
||||
|
strange behaviors. |
||||
|
Not that some of these behaviors and race conditions couldn\[aq]t happen |
||||
|
outside \f[B]mergerfs\f[] but that they are far more likely to occur on |
||||
|
account of attempting to merge together multiple sources of data which
||||
|
could be out of sync due to the different policies. |
||||
|
.IP \[bu] 2 |
||||
|
An example: Kodi (http://kodi.tv) and Plex (http://plex.tv) can |
||||
|
apparently use directory mtime (http://linux.die.net/man/2/stat) to more |
||||
|
efficiently determine whether or not to scan for new content rather than |
||||
|
simply performing a full scan. |
||||
|
If using the current default \f[B]getattr\f[] policy of \f[B]ff\f[] it\[aq]s
possible \f[B]Kodi\f[] will miss an update on account of it returning
the first directory found\[aq]s \f[B]stat\f[] info and it\[aq]s a later
||||
|
directory on another mount which had the \f[B]mtime\f[] recently |
||||
|
updated. |
||||
|
To fix this you will want to set \f[B]func.getattr=newest\f[]. |
||||
|
Remember though that this is just \f[B]stat\f[]. |
||||
|
If the file is later \f[B]open\f[]\[aq]ed or \f[B]unlink\f[]\[aq]ed and |
||||
|
the policy is different for those then a completely different file or |
||||
|
directory could be acted on. |
||||
|
.IP \[bu] 2 |
||||
|
Due to previously mentioned issues it\[aq]s generally best to set
||||
|
\f[B]category\f[] wide policies rather than individual |
||||
|
\f[B]func\f[]\[aq]s. |
||||
|
This will help limit the confusion of tools such as |
||||
|
rsync (http://linux.die.net/man/1/rsync). |
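.PP
To make the Kodi / Plex tip concrete, this Python sketch contrasts
\f[B]ff\f[] with \f[B]newest\f[] for two copies of one directory.
\f[C]newest\f[] here is a stand-in written for illustration, and the
mtimes are set by hand.
.IP
.nf
```python
import os
import tempfile


def newest(candidates):
    """Return the existing path with the most recent mtime, or None."""
    existing = [p for p in candidates if os.path.lexists(p)]
    return max(existing, key=lambda p: os.lstat(p).st_mtime,
               default=None)


# The same directory on two source mounts, the second updated later.
first = os.path.join(tempfile.mkdtemp(), "shows")
second = os.path.join(tempfile.mkdtemp(), "shows")
os.mkdir(first)
os.mkdir(second)
os.utime(first, (1000, 1000))
os.utime(second, (2000, 2000))

print(newest([first, second]) == second)  # prints True
```
.fi
.PP
A first found lookup would report the mtime of the first copy and so
hide the more recent change on the second mount.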
||||
|
.SH Known Issues / Bugs |
||||
|
.SS Samba |
||||
|
.IP \[bu] 2 |
||||
|
Moving files or directories between directories on a SMB share fails with
||||
|
IO errors. |
||||
|
.RS 2 |
||||
|
.PP |
||||
|
Workaround: Copy the file/directory and then remove the original rather |
||||
|
than move. |
||||
|
.PP |
||||
|
This isn\[aq]t an issue with Samba but some SMB clients. |
||||
|
GVFS\-fuse v1.20.3 and prior (found in Ubuntu 14.04 among others) failed |
||||
|
to handle certain error codes correctly. |
||||
|
Particularly \f[B]STATUS_NOT_SAME_DEVICE\f[] which comes from the |
||||
|
\f[B]EXDEV\f[] which is returned by \f[B]rename\f[] when the call is |
||||
|
crossing mountpoints. |
||||
|
When a program gets an \f[B]EXDEV\f[] it needs to explicitly take an |
||||
|
alternate action to accomplish its goal.
||||
|
In the case of \f[B]mv\f[] or similar it tries \f[B]rename\f[] and on |
||||
|
\f[B]EXDEV\f[] falls back to a manual copying of data between the two |
||||
|
locations and unlinking the source. |
||||
|
In these older versions of GVFS\-fuse if it received \f[B]EXDEV\f[] it |
||||
|
would translate that into \f[B]EIO\f[]. |
||||
|
This would cause \f[B]mv\f[] or most any application attempting to move |
||||
|
files around on that SMB share to fail with an IO error.
||||
|
.PP |
||||
|
GVFS\-fuse v1.22.0 (https://bugzilla.gnome.org/show_bug.cgi?id=734568) |
||||
|
and above fixed this issue but a large number of systems use the older |
||||
|
release. |
||||
|
On Ubuntu the version can be checked by issuing |
||||
|
\f[C]apt\-cache\ showpkg\ gvfs\-fuse\f[]. |
||||
|
Most distros released in 2015 seem to have the updated release and will |
||||
|
work fine but older systems may not. |
||||
|
Upgrading gvfs\-fuse or the distro in general will address the problem. |
||||
|
.PP |
||||
|
In Apple\[aq]s MacOSX 10.9 they replaced Samba (client and server) with |
||||
|
their own product. |
||||
|
It appears their new client does not handle \f[B]EXDEV\f[] either and |
||||
|
responds similarly to older releases of gvfs on Linux.
||||
|
.RE |
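.PP
The fallback a well behaved client performs on \f[B]EXDEV\f[] can be
sketched in Python as below.
It handles regular files only, and the \f[C]rename\f[] parameter exists
purely so the \f[B]EXDEV\f[] path can be exercised without two
filesystems; both \f[C]move\f[] and \f[C]fail_with_exdev\f[] are
hypothetical names.
.IP
.nf
```python
import errno
import os
import shutil
import tempfile


def move(src, dst, rename=os.rename):
    """Rename src to dst, copying and unlinking on EXDEV."""
    try:
        rename(src, dst)
    except OSError as err:
        if err.errno != errno.EXDEV:
            raise
        shutil.copy2(src, dst)  # copy data and metadata
        os.unlink(src)          # then remove the original


def fail_with_exdev(src, dst):
    """Stand-in for a rename that crosses mount points."""
    raise OSError(errno.EXDEV, os.strerror(errno.EXDEV))


workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "file.txt")
dst = os.path.join(workdir, "moved.txt")
with open(src, "w") as fh:
    fh.write("data")

move(src, dst, rename=fail_with_exdev)
print(os.listdir(workdir))  # prints ['moved.txt']
```
.fi
.PP
A client that instead maps \f[B]EXDEV\f[] to \f[B]EIO\f[] never reaches
the copy branch, which is the failure described above.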
||||
|
.SS Supplemental groups |
||||
|
.IP \[bu] 2 |
||||
|
Due to the overhead of |
||||
|
getgroups/setgroups (http://linux.die.net/man/2/setgroups) mergerfs |
||||
|
utilizes a cache. |
||||
|
This cache is opportunistic and per thread. |
||||
|
Each thread will query the supplemental groups for a user when that |
||||
|
particular thread needs to change credentials and will keep that data |
||||
|
for the lifetime of the mount or thread. |
||||
|
This means that if a user is added to a group it may not be picked up |
||||
|
without the restart of mergerfs. |
||||
|
However, since the high level FUSE API\[aq]s (at least the standard |
||||
|
version) thread pool dynamically grows and shrinks it\[aq]s possible |
||||
|
that over time a thread will be killed and later a new thread with no |
||||
|
cache will start and query the new data. |
||||
|
.RS 2 |
||||
|
.PP |
||||
|
The gid cache uses fixed storage to simplify the design and be |
||||
|
compatible with older systems which may not have C++11 compilers (as the |
||||
|
original design required). |
||||
|
There is enough storage for 256 users\[aq] supplemental groups. |
||||
|
Each user is allowed up to 32 supplemental groups.
Linux >= 2.6.3 allows up to 65535 groups per user but most other *nixs
||||
|
allow far less. |
||||
|
NFS allowing only 16. |
||||
|
The system does handle overflow gracefully. |
||||
|
If the user has more than 32 supplemental groups only the first 32 will |
||||
|
be used. |
||||
|
If more than 256 users are using the system when an uncached user is |
||||
|
found it will evict an existing user\[aq]s cache at random. |
||||
|
So long as there aren\[aq]t more than 256 active users this should be |
||||
|
fine. |
||||
|
If either value is too low for your needs you will have to modify |
||||
|
\f[C]gidcache.hpp\f[] to increase the values. |
||||
|
Note that doing so will increase the memory needed by each thread. |
||||
|
.RE |
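.PP
The limits and the eviction rule can be illustrated with a short Python
model.
It is not a port of \f[C]gidcache.hpp\f[]: the class layout and the
\f[C]lookup\f[] callable are assumptions, and only the figures 256 and
32 and the random eviction come from the text above.
.IP
.nf
```python
import random

MAX_USERS = 256   # cache slots
MAX_GROUPS = 32   # supplemental groups kept per user


class GidCache:
    """Per thread cache of supplemental groups with random eviction."""

    def __init__(self, lookup):
        self._lookup = lookup   # callable: uid -> list of gids
        self._entries = {}

    def groups(self, uid):
        if uid not in self._entries:
            if len(self._entries) >= MAX_USERS:
                # Cache full: evict an existing user at random.
                victim = random.choice(list(self._entries))
                del self._entries[victim]
            # Keep only the first MAX_GROUPS groups.
            self._entries[uid] = tuple(self._lookup(uid)[:MAX_GROUPS])
        return self._entries[uid]

    def __len__(self):
        return len(self._entries)


cache = GidCache(lambda uid: list(range(100)))
for uid in range(300):
    cache.groups(uid)
print(len(cache), len(cache.groups(299)))  # prints 256 32
```
.fi
.PP
Because a cached entry is never refreshed, a group added to a user after
the first lookup stays invisible to that thread, as noted above.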
||||
|
.SH FAQ |
||||
|
.PP |
||||
|
\f[I]It\[aq]s mentioned that there are some security issues with mhddfs. |
||||
|
What are they? How does mergerfs address them?\f[] |
||||
|
.PP |
||||
|
mhddfs (https://github.com/trapexit/mhddfs) tries to handle being run as |
||||
|
\f[B]root\f[] by calling |
||||
|
getuid() (https://github.com/trapexit/mhddfs/blob/cae96e6251dd91e2bdc24800b4a18a74044f6672/src/main.c#L319) |
||||
|
and if it returns \f[B]0\f[] then it will |
||||
|
chown (http://linux.die.net/man/1/chown) the file. |
||||
|
Not only is that a race condition but it doesn\[aq]t handle many other |
||||
|
situations. |
||||
|
Rather than attempting to simulate POSIX ACL behaviors the proper |
||||
|
behavior is to use seteuid (http://linux.die.net/man/2/seteuid) and |
||||
|
setegid (http://linux.die.net/man/2/setegid), become the user making the |
||||
|
original call and perform the action as them. |
||||
|
This is how mergerfs (https://github.com/trapexit/mergerfs) handles |
||||
|
things. |
||||
|
.PP |
||||
|
If you are familiar with POSIX standards you\[aq]ll know that this |
||||
|
behavior poses a problem. |
||||
|
\f[B]seteuid\f[] and \f[B]setegid\f[] affect the whole process and |
||||
|
\f[B]libfuse\f[] is multithreaded by default. |
||||
|
We\[aq]d need to lock access to \f[B]seteuid\f[] and \f[B]setegid\f[] |
||||
|
with a mutex so that the several threads aren\[aq]t stepping on one |
||||
|
another and files end up with weird permissions and ownership. |
||||
|
This however wouldn\[aq]t scale well. |
||||
|
With lots of calls the contention on that mutex would be extremely high. |
||||
|
Thankfully on Linux and OSX we have a better solution. |
||||
|
.PP |
||||
|
OSX has a non\-portable pthread |
||||
|
extension (https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/pthread_setugid_np.2.html) |
||||
|
for per\-thread user and group impersonation. |
||||
|
.PP |
||||
|
Linux does not support |
||||
|
pthread_setugid_np (https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/pthread_setugid_np.2.html) |
||||
|
but user and group IDs are a per\-thread attribute though documentation |
||||
|
on that fact or how to manipulate them is not well distributed. |
||||
|
From the \f[B]4.00\f[] release of the Linux man\-pages project for |
||||
|
setuid (http://man7.org/linux/man-pages/man2/setuid.2.html) |
||||
|
.RS |
||||
|
.PP |
||||
|
At the kernel level, user IDs and group IDs are a per\-thread attribute. |
||||
|
However, POSIX requires that all threads in a process share the same |
||||
|
credentials. |
||||
|
The NPTL threading implementation handles the POSIX requirements by |
||||
|
providing wrapper functions for the various system calls that change |
||||
|
process UIDs and GIDs. |
||||
|
These wrapper functions (including the one for setuid()) employ a |
||||
|
signal\-based technique to ensure that when one thread changes |
||||
|
credentials, all of the other threads in the process also change their |
||||
|
credentials. |
||||
|
For details, see nptl(7). |
||||
|
.RE |
||||
|
.PP |
||||
|
Turns out the setreuid syscalls apply only to the thread. |
||||
|
GLIBC hides this away using RT signals to inform all threads to change |
||||
|
credentials. |
||||
|
Taking after \f[B]Samba\f[] mergerfs uses |
||||
|
\f[B]syscall(SYS_setreuid,...)\f[] to set the caller\[aq]s credentials for
||||
|
that thread only. |
||||
|
Jumping back to \f[B]root\f[] as necessary should escalated privileges |
||||
|
be needed (for instance: to clone paths). |
||||
|
.PP |
||||
|
For non\-Linux systems mergerfs uses a read\-write lock and changes |
||||
|
credentials only when necessary. |
||||
|
If multiple threads are to be user X then only the first one will need |
||||
|
to change the process\[aq]s credentials.
So long as the other threads need to be user X they will take a read lock,
allowing multiple threads to share the credentials.
||||
|
Once a request comes in to run as user Y that thread will attempt a |
||||
|
write lock and change to Y\[aq]s credentials when it can. |
||||
|
If the ability to give writers priority is supported then that flag will |
||||
|
be used so threads trying to change credentials don\[aq]t starve. |
||||
|
This isn\[aq]t the best solution but should work reasonably well. |
||||
|
As new platforms are supported if they offer per thread credentials |
||||
|
those APIs will be adopted. |
||||
|
.SH AUTHORS |
||||
|
Antonio SJ Musumeci <trapexit@spawn.link>. |