From 30c29c9fae1bcbf240be9e60c409071d1173fa04 Mon Sep 17 00:00:00 2001
From: Antonio SJ Musumeci
Date: Sat, 17 Oct 2015 22:06:10 -0400
Subject: [PATCH] remove manpage from root directory

---
 mergerfs.1 | 759 -----------------------------------------------------
 1 file changed, 759 deletions(-)
 delete mode 100644 mergerfs.1

diff --git a/mergerfs.1 b/mergerfs.1
deleted file mode 100644
index b8795f27..00000000
--- a/mergerfs.1
+++ /dev/null
@@ -1,759 +0,0 @@
-.\"t
-.TH "mergerfs" "1" "2015\-10\-11" "mergerfs user manual" ""
-.SH NAME
-.PP
-mergerfs \- another FUSE union filesystem
-.SH SYNOPSIS
-.PP
-mergerfs \-o
-.SH DESCRIPTION
-.PP
-\f[B]mergerfs\f[] is similar to \f[B]mhddfs\f[], \f[B]unionfs\f[], and
-\f[B]aufs\f[].
-It is like \f[B]mhddfs\f[] in that it too uses \f[B]FUSE\f[].
-It is like \f[B]aufs\f[] in that it provides multiple policies for how to
-handle behavior.
-.PP
-Why \f[B]mergerfs\f[] when those exist?
-\f[B]mhddfs\f[] has not been updated in some time, nor is it very flexible.
-There are also security issues when running it as root.
-\f[B]aufs\f[] is more flexible than \f[B]mhddfs\f[] but kernel based and
-difficult to debug when problems arise.
-Neither supports file attributes
-(chattr (http://linux.die.net/man/1/chattr)).
-.SH FEATURES
-.IP \[bu] 2
-Runs in userspace (FUSE)
-.IP \[bu] 2
-Configurable behaviors
-.IP \[bu] 2
-Supports extended attributes (xattrs)
-.IP \[bu] 2
-Supports file attributes (chattr)
-.IP \[bu] 2
-Dynamically configurable (via xattrs)
-.IP \[bu] 2
-Safe to run as root
-.IP \[bu] 2
-Opportunistic credential caching
-.IP \[bu] 2
-Works with heterogeneous filesystem types
-.SH OPTIONS
-.SS options
-.IP \[bu] 2
-\f[B]defaults\f[]: a shortcut for FUSE\[aq]s \f[B]atomic_o_trunc\f[],
-\f[B]auto_cache\f[], \f[B]big_writes\f[], \f[B]default_permissions\f[],
-\f[B]splice_move\f[], \f[B]splice_read\f[], and \f[B]splice_write\f[].
-These options seem to provide the best performance.
-.IP \[bu] 2
-\f[B]direct_io\f[]: causes FUSE to bypass an additional caching step which
-can increase write speeds to the detriment of read speed.
-.IP \[bu] 2
-\f[B]minfreespace\f[]: the minimum space value used for the
-\f[B]lfs\f[], \f[B]fwfs\f[], and \f[B]epmfs\f[] policies.
-Understands \[aq]K\[aq], \[aq]M\[aq], and \[aq]G\[aq] to represent
-kilobyte, megabyte, and gigabyte respectively.
-(default: 4G)
-.IP \[bu] 2
-\f[B]moveonenospc\f[]: when enabled (set to \f[B]true\f[]), if a
-\f[B]write\f[] fails with \f[B]ENOSPC\f[] a scan of all drives will be
-done looking for the drive with the most free space which is at least the
-size of the file plus the amount which failed to write.
-An attempt to move the file to that drive will occur (keeping all
-metadata possible) and if successful the original is unlinked and the
-write retried.
-(default: false)
-.IP \[bu] 2
-\f[B]func.=\f[]: sets the specific FUSE function\[aq]s
-policy.
-See below for the list of value types.
-Example: \f[B]func.getattr=newest\f[]
-.IP \[bu] 2
-\f[B]category.=\f[]: sets the policy of all FUSE functions
-in the provided category.
-Example: \f[B]category.create=mfs\f[]
-.PP
-\f[B]NOTE:\f[] Options are evaluated in the order listed so if the
-options are \f[B]func.rmdir=rand,category.action=ff\f[] the
-\f[B]action\f[] category setting will override the \f[B]rmdir\f[]
-setting.
-.SS srcpoints
-.PP
-The source points argument is a colon (\[aq]:\[aq]) delimited list of
-paths.
-To make it simpler to include multiple source points without having to
-modify your fstab (http://linux.die.net/man/5/fstab) we also support
-globbing (http://linux.die.net/man/7/glob).
-\f[B]The globbing tokens MUST be escaped when used via the shell, else
-the shell itself will probably expand them.\f[]
-.IP
-.nf
-\f[C]
-$\ mergerfs\ /mnt/disk\\*:/mnt/cdrom\ /media/drives
-\f[]
-.fi
-.PP
-The above line will use all points in /mnt prefixed with \f[I]disk\f[]
-and the directory \f[I]cdrom\f[].
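As a rough illustration of the expansion described above, the following Python sketch splits a colon-delimited source point list and expands each glob pattern. It is not taken from mergerfs itself: the function name `expand_srcpoints` is hypothetical, and keeping a pattern that matches nothing exactly as written is an assumption.

```python
import glob


def expand_srcpoints(arg):
    """Split a colon-delimited srcpoints string and expand glob patterns.

    Hypothetical helper for illustration only; not part of mergerfs.
    """
    paths = []
    for pattern in arg.split(":"):
        matches = sorted(glob.glob(pattern))
        # Assumption: a pattern that matches nothing is kept as written.
        paths.extend(matches if matches else [pattern])
    return paths
```

Called with `/mnt/disk*:/mnt/cdrom`, this would return every path in /mnt whose name starts with disk, followed by /mnt/cdrom.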
-.PP
-In /etc/fstab it\[aq]d look like the following:
-.IP
-.nf
-\f[C]
-#
-/mnt/disk*:/mnt/cdrom\ \ /media/drives\ \ fuse.mergerfs\ \ defaults,allow_other\ \ 0\ \ \ \ \ \ \ 0
-\f[]
-.fi
-.PP
-\f[B]NOTE:\f[] the globbing is done at mount or xattr update time.
-If a new directory is added matching the glob after the fact it will not
-be included.
-.SH POLICIES
-.PP
-Filesystem calls are broken up into 3 categories: \f[B]action\f[],
-\f[B]create\f[], \f[B]search\f[].
-There are also some calls which have no policy attached due to state
-being kept between calls.
-These categories can be assigned a policy which dictates how
-\f[B]mergerfs\f[] behaves.
-Any policy can be assigned to a category though some aren\[aq]t terribly
-practical.
-For instance: \f[B]rand\f[] (Random) may be useful for \f[B]create\f[]
-but could lead to very odd behavior if used for \f[B]search\f[].
-.SS Functional classifications
-.PP
-.TS
-tab(@);
-l l.
-T{
-Category
-T}@T{
-FUSE Functions
-T}
-_
-T{
-action
-T}@T{
-chmod, chown, link, removexattr, rename, rmdir, setxattr, truncate,
-unlink, utimens
-T}
-T{
-create
-T}@T{
-create, mkdir, mknod, symlink
-T}
-T{
-search
-T}@T{
-access, getattr, getxattr, ioctl, listxattr, open, readlink
-T}
-T{
-N/A
-T}@T{
-fallocate, fgetattr, fsync, ftruncate, ioctl, read, readdir, release,
-statfs, write
-T}
-.TE
-.PP
-\f[B]ioctl\f[] behaves differently if it\[aq]s acting on a directory.
-It\[aq]ll use the \f[B]getattr\f[] policy to find and open the directory
-before issuing the \f[B]ioctl\f[].
-In other cases where something may be searched (to confirm a directory
-exists across all source mounts) then \f[B]getattr\f[] will be used.
-.SS Policy descriptions
-.PP
-.TS
-tab(@);
-l l.
-T{
-Policy
-T}@T{
-Description
-T}
-_
-T{
-ff (first found)
-T}@T{
-Given the order of the drives act on the first one found (regardless of
-whether stat would return EACCES).
-T}
-T{
-ffwp (first found w/ permissions)
-T}@T{
-Given the order of the drives act on the first one found to which you
-have access (stat does not error with EACCES).
-T}
-T{
-newest (newest file)
-T}@T{
-If multiple files exist return the one with the most recent mtime.
-T}
-T{
-mfs (most free space)
-T}@T{
-Use the drive with the most free space available.
-T}
-T{
-epmfs (existing path, most free space)
-T}@T{
-If the path exists on multiple drives use the one with the most free
-space which also has more than \f[B]minfreespace\f[] available.
-If no drive has at least \f[B]minfreespace\f[] then fall back to
-\f[B]mfs\f[].
-T}
-T{
-fwfs (first with free space)
-T}@T{
-Pick the first drive which has at least \f[B]minfreespace\f[].
-T}
-T{
-lfs (least free space)
-T}@T{
-Pick the drive with the least available space but more than
-\f[B]minfreespace\f[].
-T}
-T{
-rand (random)
-T}@T{
-Pick an existing drive at random.
-T}
-T{
-all
-T}@T{
-Applies action to all found.
-For searches it will behave like first found \f[B]ff\f[].
-T}
-T{
-enosys, einval, enotsup, exdev, erofs
-T}@T{
-Exclusively return \f[C]\-1\f[] with \f[C]errno\f[] set to the
-respective value.
-Useful for debugging how other applications behave on errors.
-T}
-.TE
-.SS Defaults
-.PP
-.TS
-tab(@);
-l l.
-T{
-Category
-T}@T{
-Policy
-T}
-_
-T{
-action
-T}@T{
-all
-T}
-T{
-create
-T}@T{
-epmfs
-T}
-T{
-search
-T}@T{
-ff
-T}
-.TE
-.SS rename
-.PP
-rename (http://man7.org/linux/man-pages/man2/rename.2.html) is a tricky
-function in a merged system.
-Normally if a rename can\[aq]t be done atomically due to the from and to
-paths existing on different mount points it will return \f[C]\-1\f[]
-with \f[C]errno\ =\ EXDEV\f[].
-The atomic rename is most critical for replacing files in place
-atomically (such as securely writing to a temp file and then replacing a
-target).
-The problem is that by merging multiple paths you can have N instances
-of the source and destinations on different drives.
-This means that if you just renamed each source locally you could end up
-with the destination files not overwritten / replaced.
-To address this mergerfs works in the following way.
-If the source and destination exist in different directories it will
-immediately return \f[C]EXDEV\f[].
-Generally it\[aq]s not expected for cross directory renames to work so
-it should be fine for most instances (mv, rsync, etc.).
-If they do belong to the same directory it then runs the \f[C]rename\f[]
-policy to get the files to rename.
-It iterates through and renames each file while keeping track of those
-paths which have not been renamed.
-If all the renames succeed it will then \f[C]unlink\f[] or
-\f[C]rmdir\f[] the other paths to clean up any preexisting target files.
-This allows the new file to be found without the file itself ever
-disappearing.
-There may still be some issues with this behavior, particularly on error.
-At the moment however this seems the best policy.
-.SS readdir
-.PP
-readdir (http://linux.die.net/man/3/readdir) is very different from most
-functions in this realm.
-It certainly could have its own set of policies to tweak its
-behavior.
-At this time it provides a simple \f[B]first found\f[] merging of
-directories and files found.
-That is: only the first file or directory found for a directory is
-returned.
-Given how FUSE works though the data representing the returned entry
-comes from \f[B]getattr\f[].
-.PP
-It could be extended to offer the ability to see all files found.
-Perhaps concatenating \f[B]#\f[] and a number to the name.
-But to really be useful you\[aq]d need to be able to access them which
-would complicate file lookup.
-.SS statvfs
-.PP
-statvfs (http://linux.die.net/man/2/statvfs) normalizes the source
-drives based on the fragment size and sums the number of adjusted blocks
-and inodes.
-This means you will see the combined space of all sources: total, used,
-and free.
-The sources however are deduplicated based on the drive so multiple points
-on the same drive will not result in double counting its space.
-.PP
-\f[B]NOTE:\f[] Since we can not (easily) replicate the atomicity of a
-\f[B]mkdir\f[] or \f[B]mknod\f[] without side effects those calls will
-first do a scan to see if the file exists and then attempt a create.
-This means there is a slight race condition.
-Worst case you\[aq]d end up with the directory or file on more than one
-mount.
-.SH BUILDING
-.PP
-\f[B]NOTE:\f[] Prebuilt packages can be found at:
-https://github.com/trapexit/mergerfs/releases
-.PP
-First get the code from GitHub (http://github.com/trapexit/mergerfs).
-.IP
-.nf
-\f[C]
-$\ git\ clone\ https://github.com/trapexit/mergerfs.git
-$\ #\ or
-$\ wget\ https://github.com/trapexit/mergerfs/archive/master.zip
-\f[]
-.fi
-.SS Debian / Ubuntu
-.IP
-.nf
-\f[C]
-$\ sudo\ apt\-get\ install\ g++\ pkg\-config\ git\ git\-buildpackage\ pandoc\ debhelper\ libfuse\-dev\ libattr1\-dev
-$\ cd\ mergerfs
-$\ make\ deb
-$\ sudo\ dpkg\ \-i\ ../mergerfs_version_arch.deb
-\f[]
-.fi
-.SS Fedora
-.IP
-.nf
-\f[C]
-$\ su\ \-
-#\ dnf\ install\ rpm\-build\ fuse\-devel\ libattr\-devel\ pandoc\ gcc\-c++\ git\ make\ which
-#\ cd\ mergerfs
-#\ make\ rpm
-#\ rpm\ \-i\ rpmbuild/RPMS//mergerfs\-..rpm
-\f[]
-.fi
-.SS Generically
-.PP
-Have pkg\-config, pandoc, libfuse, libattr1 installed.
-.IP
-.nf
-\f[C]
-$\ cd\ mergerfs
-$\ make
-$\ make\ man
-$\ sudo\ make\ install
-\f[]
-.fi
-.SH RUNTIME
-.SS \&.mergerfs pseudo file
-.IP
-.nf
-\f[C]
-/.mergerfs
-\f[]
-.fi
-.PP
-There is a pseudo file available at the mount point which allows for the
-runtime modification of certain \f[B]mergerfs\f[] options.
-The file will not show up in \f[B]readdir\f[] but can be
-\f[B]stat\f[]\[aq]ed and manipulated via
-{list,get,set}xattrs (http://linux.die.net/man/2/listxattr) calls.
-.PP
-Even if xattrs are disabled the
-{list,get,set}xattrs (http://linux.die.net/man/2/listxattr) calls will
-still work.
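The pseudo file can also be queried from a program rather than the shell. The sketch below is one possible way to do so with Python's `os.listxattr` and `os.getxattr`, which are available on Linux only. The helper name `read_settings` and its injectable parameters are hypothetical, and decoding the values as plain text is an assumption.

```python
import os


def read_settings(mountpoint, listxattr=None, getxattr=None):
    """Return {key: value} for each xattr on <mountpoint>/.mergerfs.

    Hypothetical helper; the two callables are injectable so the logic
    can be exercised without a real mergerfs mount.
    """
    listxattr = listxattr or os.listxattr  # Linux-only in the stdlib
    getxattr = getxattr or os.getxattr
    control = os.path.join(mountpoint, ".mergerfs")
    # Assumption: values are plain text rather than binary data.
    return {key: getxattr(control, key).decode() for key in listxattr(control)}
```

On a live mount, `read_settings("/mount/point")` would be expected to return the same keys that `xattr -l` prints for the pseudo file.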
-.SS Keys
-.PP
-Use \f[C]xattr\ \-l\ /mount/point/.mergerfs\f[] to see all supported
-keys.
-.SS Example
-.IP
-.nf
-\f[C]
-[trapexit:/tmp/mount]\ $\ xattr\ \-l\ .mergerfs
-user.mergerfs.srcmounts:\ /tmp/a:/tmp/b
-user.mergerfs.minfreespace:\ 4294967295
-user.mergerfs.moveonenospc:\ false
-user.mergerfs.policies:\ all,einval,enosys,enotsup,epmfs,erofs,exdev,ff,ffwp,fwfs,lfs,mfs,newest,rand
-user.mergerfs.version:\ x.y.z
-user.mergerfs.category.action:\ all
-user.mergerfs.category.create:\ epmfs
-user.mergerfs.category.search:\ ff
-user.mergerfs.func.access:\ ff
-user.mergerfs.func.chmod:\ all
-user.mergerfs.func.chown:\ all
-user.mergerfs.func.create:\ epmfs
-user.mergerfs.func.getattr:\ ff
-user.mergerfs.func.getxattr:\ ff
-user.mergerfs.func.link:\ all
-user.mergerfs.func.listxattr:\ ff
-user.mergerfs.func.mkdir:\ epmfs
-user.mergerfs.func.mknod:\ epmfs
-user.mergerfs.func.open:\ ff
-user.mergerfs.func.readlink:\ ff
-user.mergerfs.func.removexattr:\ all
-user.mergerfs.func.rename:\ all
-user.mergerfs.func.rmdir:\ all
-user.mergerfs.func.setxattr:\ all
-user.mergerfs.func.symlink:\ epmfs
-user.mergerfs.func.truncate:\ all
-user.mergerfs.func.unlink:\ all
-user.mergerfs.func.utimens:\ all
-
-[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.category.search\ .mergerfs
-ff
-
-[trapexit:/tmp/mount]\ $\ xattr\ \-w\ user.mergerfs.category.search\ ffwp\ .mergerfs
-[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.category.search\ .mergerfs
-ffwp
-
-[trapexit:/tmp/mount]\ $\ xattr\ \-w\ user.mergerfs.srcmounts\ +/tmp/c\ .mergerfs
-[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.srcmounts\ .mergerfs
-/tmp/a:/tmp/b:/tmp/c
-
-[trapexit:/tmp/mount]\ $\ xattr\ \-w\ user.mergerfs.srcmounts\ =/tmp/c\ .mergerfs
-[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.srcmounts\ .mergerfs
-/tmp/c
-
-[trapexit:/tmp/mount]\ $\ xattr\ \-w\ user.mergerfs.srcmounts\ \[aq]+[list]
-T}@T{
-append
-T}
-T{
-\-[list]
-T}@T{
-remove all values provided
-T}
-T{
-\-<
-T}@T{
-remove first in list
-T}
-T{
-\->
-T}@T{
-remove last in list
-T}
-.TE
-.SS minfreespace
-.PP
-Input: integer with an optional suffix: \f[B]K\f[], \f[B]M\f[], or
-\f[B]G\f[].
-Output: value in bytes
-.SS moveonenospc
-.PP
-Input: \f[B]true\f[] or \f[B]false\f[].
-Output: \f[B]true\f[] or \f[B]false\f[]
-.SS categories / funcs
-.PP
-Input: short policy string as described elsewhere in this document.
-Output: the policy string except for categories where its funcs have
-multiple types.
-In that case it will be a comma separated list.
-.SS mergerfs file xattrs
-.PP
-While they won\[aq]t show up when using
-listxattr (http://linux.die.net/man/2/listxattr) \f[B]mergerfs\f[]
-offers a number of special xattrs to query information about the files
-served.
-To access the values you will need to issue a
-getxattr (http://linux.die.net/man/2/getxattr) for one of the following:
-.IP \[bu] 2
-\f[B]user.mergerfs.basepath:\f[] the base mount point for the file given
-the current search policy
-.IP \[bu] 2
-\f[B]user.mergerfs.relpath:\f[] the relative path of the file from the
-perspective of the mount point
-.IP \[bu] 2
-\f[B]user.mergerfs.fullpath:\f[] the full path of the original file
-given the search policy
-.IP \[bu] 2
-\f[B]user.mergerfs.allpaths:\f[] a NUL (\[aq]\\0\[aq]) separated list of
-full paths to all files found
-.IP
-.nf
-\f[C]
-[trapexit:/tmp/mount]\ $\ ls
-A\ B\ C
-[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.fullpath\ A
-/mnt/a/full/path/to/A
-[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.basepath\ A
-/mnt/a
-[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.relpath\ A
-/full/path/to/A
-[trapexit:/tmp/mount]\ $\ xattr\ \-p\ user.mergerfs.allpaths\ A\ |\ tr\ \[aq]\\0\[aq]\ \[aq]\\n\[aq]
-/mnt/a/full/path/to/A
-/mnt/b/full/path/to/A
-\f[]
-.fi
-.SH TOOLING
-.IP \[bu] 2
-/usr/sbin/fsck.mergerfs: Provides permissions and ownership auditing and
-the ability to fix them.
-.SH TIPS / NOTES
-.IP \[bu] 2
-If you don\[aq]t see some directories / files you expect in a merged
-point be sure the user has permission to all the underlying directories.
-If \f[C]/drive0/a\f[] is owned by \f[C]root:root\f[] with permissions set
-to \f[C]0700\f[] and \f[C]/drive1/a\f[] is \f[C]root:root\f[] and
-\f[C]0755\f[] you\[aq]ll see only \f[C]/drive1/a\f[].
-Use \f[C]fsck.mergerfs\f[] to audit the drive for out of sync
-permissions.
-.IP \[bu] 2
-Since POSIX gives you only error or success on calls it\[aq]s difficult to
-determine the proper behavior when applying the behavior to multiple
-targets.
-Generally if something succeeds when reading it returns the data it can.
-If something fails when making an action we continue on and return the
-last error.
-.IP \[bu] 2
-The recommended options are \f[B]defaults,allow_other\f[].
-The \f[B]allow_other\f[] option allows users other than the one who
-executed mergerfs to access the mountpoint.
-\f[B]defaults\f[] is described above and should offer the best
-performance.
-It\[aq]s possible that if you\[aq]re running on an older platform the
-\f[B]splice\f[] features aren\[aq]t available and could error.
-In that case simply use the other options manually.
-.IP \[bu] 2
-If write performance is valued more than read it may be useful to enable
-\f[B]direct_io\f[].
-.IP \[bu] 2
-Remember that some policies mixed with some functions may result in
-strange behaviors.
-Not that some of these behaviors and race conditions couldn\[aq]t happen
-outside \f[B]mergerfs\f[] but that they are far more likely to occur on
-account of attempting to merge together multiple sources of data which
-could be out of sync due to the different policies.
-.IP \[bu] 2
-An example: Kodi (http://kodi.tv) and Plex (http://plex.tv) can
-apparently use directory mtime (http://linux.die.net/man/2/stat) to more
-efficiently determine whether or not to scan for new content rather than
-simply performing a full scan.
-If using the current default \f[B]getattr\f[] policy of \f[B]ff\f[] it\[aq]s
-possible \f[B]Kodi\f[] will miss an update on account of it returning
-the first directory found\[aq]s \f[B]stat\f[] info when it\[aq]s a later
-directory on another mount which had the \f[B]mtime\f[] recently
-updated.
-To fix this you will want to set \f[B]func.getattr=newest\f[].
-Remember though that this is just \f[B]stat\f[].
-If the file is later \f[B]open\f[]\[aq]ed or \f[B]unlink\f[]\[aq]ed and
-the policy is different for those then a completely different file or
-directory could be acted on.
-.IP \[bu] 2
-Due to previously mentioned issues it\[aq]s generally best to set
-\f[B]category\f[] wide policies rather than individual
-\f[B]func\f[]\[aq]s.
-This will help limit the confusion of tools such as
-rsync (http://linux.die.net/man/1/rsync).
-.SH Known Issues / Bugs
-.SS Samba
-.IP \[bu] 2
-Moving files or directories between directories on an SMB share fails with
-IO errors.
-.RS 2
-.PP
-Workaround: Copy the file/directory and then remove the original rather
-than move.
-.PP
-This isn\[aq]t an issue with Samba but with some SMB clients.
-GVFS\-fuse v1.20.3 and prior (found in Ubuntu 14.04 among others) failed
-to handle certain error codes correctly.
-Particularly \f[B]STATUS_NOT_SAME_DEVICE\f[] which comes from the
-\f[B]EXDEV\f[] which is returned by \f[B]rename\f[] when the call is
-crossing mountpoints.
-When a program gets an \f[B]EXDEV\f[] it needs to explicitly take an
-alternate action to accomplish its goal.
-In the case of \f[B]mv\f[] or similar it tries \f[B]rename\f[] and on
-\f[B]EXDEV\f[] falls back to a manual copying of data between the two
-locations and unlinking the source.
-In these older versions of GVFS\-fuse if it received \f[B]EXDEV\f[] it
-would translate that into \f[B]EIO\f[].
-This would cause \f[B]mv\f[] or most any application attempting to move
-files around on that SMB share to fail with an IO error.
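The fallback that mv performs on EXDEV can be sketched in Python as follows. This is a minimal illustration rather than how any particular tool is implemented: the helper name `move_file` is hypothetical, it handles regular files only, and the `rename` parameter exists solely so that an EXDEV failure can be simulated on a single filesystem.

```python
import errno
import os
import shutil


def move_file(src, dst, rename=os.rename):
    """Rename src to dst, copying and unlinking if rename reports EXDEV.

    Hypothetical helper for regular files only; `rename` is injectable
    so the EXDEV path can be simulated on a single filesystem.
    """
    try:
        rename(src, dst)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # any other error is a real failure
        shutil.copy2(src, dst)  # copy data and metadata where possible
        os.unlink(src)          # then remove the original
```

A client that maps EXDEV to EIO prevents this fallback from ever running, because the caller only sees a generic IO error. Python's own `shutil.move` uses a similar rename-then-copy approach.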
-.PP
-GVFS\-fuse v1.22.0 (https://bugzilla.gnome.org/show_bug.cgi?id=734568)
-and above fixed this issue but a large number of systems use the older
-release.
-On Ubuntu the version can be checked by issuing
-\f[C]apt\-cache\ showpkg\ gvfs\-fuse\f[].
-Most distros released in 2015 seem to have the updated release and will
-work fine but older systems may not.
-Upgrading gvfs\-fuse or the distro in general will address the problem.
-.PP
-In Mac OS X 10.9 Apple replaced Samba (client and server) with
-their own product.
-It appears their new client does not handle \f[B]EXDEV\f[] either and
-responds similarly to older releases of gvfs on Linux.
-.RE
-.SS Supplemental groups
-.IP \[bu] 2
-Due to the overhead of
-getgroups/setgroups (http://linux.die.net/man/2/setgroups) mergerfs
-utilizes a cache.
-This cache is opportunistic and per thread.
-Each thread will query the supplemental groups for a user when that
-particular thread needs to change credentials and will keep that data
-for the lifetime of the mount or thread.
-This means that if a user is added to a group it may not be picked up
-without the restart of mergerfs.
-However, since the high level FUSE API\[aq]s (at least the standard
-version) thread pool dynamically grows and shrinks it\[aq]s possible
-that over time a thread will be killed and later a new thread with no
-cache will start and query the new data.
-.RS 2
-.PP
-The gid cache uses fixed storage to simplify the design and be
-compatible with older systems which may not have C++11 compilers (as the
-original design required).
-There is enough storage for 256 users\[aq] supplemental groups.
-Each user is allowed up to 32 supplemental groups.
-Linux >= 2.6.3 allows up to 65535 groups per user but most other *nixes
-allow far fewer.
-NFS allows only 16.
-The system does handle overflow gracefully.
-If the user has more than 32 supplemental groups only the first 32 will
-be used.
-If more than 256 users are using the system when an uncached user is
-found it will evict an existing user\[aq]s cache at random.
-So long as there aren\[aq]t more than 256 active users this should be
-fine.
-If either value is too low for your needs you will have to modify
-\f[C]gidcache.hpp\f[] to increase the values.
-Note that doing so will increase the memory needed by each thread.
-.RE
-.SH FAQ
-.PP
-\f[I]It\[aq]s mentioned that there are some security issues with mhddfs.
-What are they? How does mergerfs address them?\f[]
-.PP
-mhddfs (https://github.com/trapexit/mhddfs) tries to handle being run as
-\f[B]root\f[] by calling
-getuid() (https://github.com/trapexit/mhddfs/blob/cae96e6251dd91e2bdc24800b4a18a74044f6672/src/main.c#L319)
-and if it returns \f[B]0\f[] then it will
-chown (http://linux.die.net/man/1/chown) the file.
-Not only is that a race condition but it doesn\[aq]t handle many other
-situations.
-Rather than attempting to simulate POSIX ACL behaviors the proper
-behavior is to use seteuid (http://linux.die.net/man/2/seteuid) and
-setegid (http://linux.die.net/man/2/setegid), become the user making the
-original call and perform the action as them.
-This is how mergerfs (https://github.com/trapexit/mergerfs) handles
-things.
-.PP
-If you are familiar with POSIX standards you\[aq]ll know that this
-behavior poses a problem.
-\f[B]seteuid\f[] and \f[B]setegid\f[] affect the whole process and
-\f[B]libfuse\f[] is multithreaded by default.
-We\[aq]d need to lock access to \f[B]seteuid\f[] and \f[B]setegid\f[]
-with a mutex so that the several threads aren\[aq]t stepping on one
-another and files end up with weird permissions and ownership.
-This however wouldn\[aq]t scale well.
-With lots of calls the contention on that mutex would be extremely high.
-Thankfully on Linux and OSX we have a better solution.
-.PP
-OSX has a non\-portable pthread
-extension (https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/pthread_setugid_np.2.html)
-for per\-thread user and group impersonation.
-.PP
-Linux does not support
-pthread_setugid_np (https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/pthread_setugid_np.2.html)
-but user and group IDs are a per\-thread attribute though documentation
-on that fact or how to manipulate them is not well distributed.
-From the \f[B]4.00\f[] release of the Linux man\-pages project for
-setuid (http://man7.org/linux/man-pages/man2/setuid.2.html):
-.RS
-.PP
-At the kernel level, user IDs and group IDs are a per\-thread attribute.
-However, POSIX requires that all threads in a process share the same
-credentials.
-The NPTL threading implementation handles the POSIX requirements by
-providing wrapper functions for the various system calls that change
-process UIDs and GIDs.
-These wrapper functions (including the one for setuid()) employ a
-signal\-based technique to ensure that when one thread changes
-credentials, all of the other threads in the process also change their
-credentials.
-For details, see nptl(7).
-.RE
-.PP
-It turns out the setreuid syscalls apply only to the thread.
-GLIBC hides this away using RT signals to inform all threads to change
-credentials.
-Taking after \f[B]Samba\f[], mergerfs uses
-\f[B]syscall(SYS_setreuid,...)\f[] to set the caller\[aq]s credentials for
-that thread only.
-It jumps back to \f[B]root\f[] as necessary should escalated privileges
-be needed (for instance: to clone paths).
-.PP
-For non\-Linux systems mergerfs uses a read\-write lock and changes
-credentials only when necessary.
-If multiple threads are to be user X then only the first one will need
-to change the process\[aq]s credentials.
-So long as the other threads need to be user X they will take a read lock,
-allowing multiple threads to share the credentials.
-Once a request comes in to run as user Y that thread will attempt a
-write lock and change to Y\[aq]s credentials when it can.
-If the ability to give writers priority is supported then that flag will
-be used so threads trying to change credentials don\[aq]t starve.
-This isn\[aq]t the best solution but should work reasonably well.
-As new platforms are supported if they offer per thread credentials
-those APIs will be adopted.
-.SH AUTHORS
-Antonio SJ Musumeci .