# Technical Behavior and Limitations

## Do hardlinks work?

Yes. See also the option `inodecalc` for how inode values are
calculated.

What mergerfs does not do is fake hard links across branches. Read
the section "rename & link" for how it works.

Remember that hardlinks will NOT work across devices. That includes
between the original filesystem and a mergerfs pool, between two
separate pools of the same underlying filesystems, or bind mounts of
paths within the mergerfs pool. The latter is common when using Docker
or Podman. Multiple volumes (bind mounts) to the same underlying
filesystem are considered different devices. There is no way to link
between them. You should mount in the highest directory in the
mergerfs pool that includes all the paths you need if you want links
to work.
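
The cross-device limit can be seen from an application's point of view with a small sketch. It is only an illustration: `try_hardlink` is a hypothetical helper name, and the code assumes a Linux-like system where `link` fails with `EXDEV` when the two paths are treated as different devices.

```python
import errno
import os
import tempfile


def try_hardlink(src, dst):
    """Try to hard link src to dst; return False if they are on different devices."""
    try:
        os.link(src, dst)
    except OSError as exc:
        if exc.errno == errno.EXDEV:
            # The kernel considers the two paths separate devices
            # (for example two bind mounts), so no link is possible.
            return False
        raise
    # After a successful link both names refer to the same device and inode.
    a, b = os.stat(src), os.stat(dst)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "original")
        with open(src, "w") as f:
            f.write("data")
        print("linked:", try_hardlink(src, os.path.join(d, "link")))
```

Both paths in the demo live under one directory, so the link is expected to succeed. Pointing `dst` at a path reached through a different bind mount is the case that would return `False`.
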

## How does mergerfs handle moving and copying of files?

This question rests on a _very_ common mistaken assumption regarding
how filesystems work. There is no such thing as "move" or "copy." These
concepts are high level behaviors made up of numerous independent
steps and _not_ individual filesystem functions.

A "move" can include a "copy" so let's describe copy first.

When an application copies a file from source to destination it can do
so in a number of ways but the basics are the following.

1. `open` the source file.
2. `create` the destination file.
3. `read` a chunk of data from source and `write` to
   destination. Continue till it runs out of data to copy.
4. Copy file metadata (`stat`) such as ownership (`chown`),
   permissions (`chmod`), timestamps (`utimes`), extended attributes
   (`getxattr`, `setxattr`), etc.
5. `close` source and destination files.
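
The steps above can be sketched as separate calls. This is a minimal illustration rather than how any particular tool copies files: `copy_file` is a hypothetical name, ownership and extended attributes are left out for brevity, and setting timestamps through a file descriptor assumes a platform such as Linux that supports it.

```python
import os
import stat
import tempfile


def copy_file(src, dst, chunk_size=1 << 20):
    """Copy src to dst using the discrete steps a typical application performs."""
    src_fd = os.open(src, os.O_RDONLY)  # 1. open the source
    try:
        st = os.fstat(src_fd)
        # 2. create the destination: only a name, flags and a mode are given
        dst_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        try:
            while True:  # 3. read a chunk, write a chunk
                chunk = os.read(src_fd, chunk_size)
                if not chunk:
                    break
                view = memoryview(chunk)
                while len(view) > 0:
                    view = view[os.write(dst_fd, view):]
            # 4. copy metadata (permissions and timestamps only in this sketch)
            os.fchmod(dst_fd, stat.S_IMODE(st.st_mode))
            os.utime(dst_fd, ns=(st.st_atime_ns, st.st_mtime_ns))
        finally:
            os.close(dst_fd)  # 5. close the destination
    finally:
        os.close(src_fd)  # 5. close the source


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src")
        with open(src, "wb") as f:
            f.write(b"hello")
        copy_file(src, os.path.join(d, "dst"))
        print(os.listdir(d))
```

Note that the size of the source never appears in the call that creates the destination. The filesystem only sees it arrive chunk by chunk afterwards.
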

"move" is typically a `rename(src,dst)` and if that errors with
`EXDEV` (meaning the source and destination are on different
filesystems) the application will "copy" the file as described above
and then it removes (`unlink`) the source.
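
A minimal sketch of that fallback, assuming a hypothetical helper named `move_file` and using `shutil.copy2` as a stand-in for the copy steps:

```python
import errno
import os
import shutil
import tempfile


def move_file(src, dst):
    """Rename src to dst, falling back to copy + unlink on EXDEV."""
    try:
        os.rename(src, dst)
        return "renamed"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
    # Different filesystems: copy data and metadata, then remove the source.
    shutil.copy2(src, dst)
    os.unlink(src)
    return "copied"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "a")
        with open(src, "w") as f:
            f.write("data")
        print(move_file(src, os.path.join(d, "b")))
```

From the filesystem's side the fallback path is indistinguishable from an unrelated `open`, `create`, series of writes, and `unlink`.
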

The `rename(src,dst)`, `open(src)`, `create(dst)`, data copying,
metadata copying, `unlink(src)`, etc. are entirely distinct and
separate events. There is really no practical way to know that what is
ultimately occurring is the "copying" of a file or what the source
file would be. Since the source is not known there is no way to know
how large a created file is destined to become. This is why it is
impossible for mergerfs to choose the branch for a `create` based on
file size. The only context provided when a file is created, besides
the name, is the permissions, whether it is to be read and/or written,
and some low level settings for the operating system.

All of this means that mergerfs cannot make decisions when a file is
created based on file size or the source of the data. That information
is simply not available. At best mergerfs could respond to files
reaching a certain size when writing data or when a file is closed.

Related: if a user wished to have mergerfs perform certain activities
based on the name of a file, note that it is common and even best
practice for a program to write to a temporary file first and then
rename it to its final destination. That temporary file name will
typically be random and have no indication of the type of file being
written.
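
The write-to-temporary-then-rename pattern can be sketched as follows. `atomic_write` is a hypothetical helper and the `.tmp-` prefix is an arbitrary choice for the example. Real programs use their own naming schemes.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write data to a randomly named temp file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # The file is created under a random name that says nothing about
    # the final name or type.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # rename to the final destination
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    return os.path.basename(tmp_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        tmp_name = atomic_write(os.path.join(d, "movie.mkv"), b"payload")
        print("created as", tmp_name, "then renamed to movie.mkv")
```

At the time of the `create` the filesystem only sees the temporary name, so a rule keyed on the final extension would not match.
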

## Does FICLONE or FICLONERANGE work?

Unfortunately not. FUSE, the technology mergerfs is based on, does not
support the `clone_file_range` feature needed for it to work. mergerfs
won't even know such a request is made. The kernel will simply return
an error back to the application making the request.

Should FUSE gain the ability mergerfs will be updated to support it.
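
An application that wants to clone when possible usually falls back to a plain copy when the request is refused. The sketch below shows that pattern under stated assumptions: `clone_or_copy` is a hypothetical helper, the numeric `FICLONE` value is the one used on x86 and ARM Linux, and because the exact error code is not specified here any `OSError` is treated as "cloning unavailable".

```python
import fcntl
import os
import shutil
import tempfile

# Assumption: _IOW(0x94, 9, int) as encoded on x86 and ARM Linux.
# Newer Python versions may expose the constant as fcntl.FICLONE.
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)


def clone_or_copy(src, dst):
    """Try a reflink clone of src into dst; fall back to copying the bytes."""
    with open(src, "rb") as s, open(dst, "wb") as d:
        try:
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
            return "cloned"
        except OSError:
            # Treated as "cloning is not available on this filesystem".
            pass
        shutil.copyfileobj(s, d)
        return "copied"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src")
        with open(src, "wb") as f:
            f.write(b"x" * 4096)
        print(clone_or_copy(src, os.path.join(d, "dst")))
```

On a mergerfs mount the result would be expected to be `copied`, since the clone request never reaches mergerfs.
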

## Why do I get an "out of space" / "no space left on device" / ENOSPC error even though there appears to be lots of space available?

First make sure you've read the sections about policies, path
preservation, branch filtering, and the options **minfreespace**,
**moveonenospc**, **statfs**, and **statfs_ignore**.

mergerfs is simply presenting a union of the content within multiple
branches. The reported free space is an aggregate of space available
within the pool (behavior modified by **statfs** and
**statfs_ignore**). It does not represent a contiguous space, in the
same way that read-only filesystems, or those with quotas or reserved
space, report the full theoretical space available.

Due to path preservation, branch tagging, read-only status, and
**minfreespace** settings it is perfectly valid that `ENOSPC` / "out
of space" / "no space left on device" be returned. mergerfs is doing
what was asked of it: filtering possible branches due to those
settings. Only one error can be returned and if one of the reasons for
filtering a branch was **minfreespace** then it will be returned as
such. **moveonenospc** is only relevant to writing a file which is too
large for the filesystem it's currently on.

It is also possible that the filesystem selected has run out of
inodes. Use `df -i` to list the total and available inodes per
filesystem.
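
The gap between the aggregate figure and what a single branch can accept can be made concrete with `statvfs`. This is a rough diagnostic sketch, not mergerfs's own filtering logic: `branch_status`, `summarize`, and the dictionary keys are hypothetical, and `minfreespace` here is just a byte threshold passed in by the caller.

```python
import os
import tempfile


def branch_status(path, minfreespace=0):
    """Collect per-filesystem facts that could rule a branch out."""
    st = os.statvfs(path)
    avail = st.f_bavail * st.f_frsize  # bytes usable by unprivileged users
    return {
        "path": path,
        "avail_bytes": avail,
        "read_only": bool(st.f_flag & os.ST_RDONLY),
        "too_full": avail < minfreespace,
        # Some filesystems report no inode counts at all (f_files == 0).
        "inodes_exhausted": st.f_files > 0 and st.f_favail == 0,
    }


def summarize(statuses):
    """Contrast the pooled free space with the largest single usable branch."""
    usable = [s for s in statuses
              if not (s["read_only"] or s["too_full"] or s["inodes_exhausted"])]
    return {
        "aggregate_bytes": sum(s["avail_bytes"] for s in statuses),
        "largest_usable_bytes": max((s["avail_bytes"] for s in usable), default=0),
        "usable_branches": [s["path"] for s in usable],
    }


if __name__ == "__main__":
    status = branch_status(tempfile.gettempdir(), minfreespace=4 * 1024**3)
    print(summarize([status]))
```

If `largest_usable_bytes` is far smaller than `aggregate_bytes`, or `usable_branches` is empty, an `ENOSPC` on create is consistent with the settings even though the pool as a whole looks roomy.
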

If you don't care about path preservation then simply change the
`create` policy to one which isn't path preserving. `mfs` is probably
what most are looking for. The reason it's not default is because it
was originally set to `epmfs` and changing it now would change
people's setup. Such a setting change will likely occur in mergerfs 3.

## Why does the total available space in mergerfs not equal outside?

Are you using ext2/3/4? With reserve for root? mergerfs uses available
space for statfs calculations. If you've reserved space for root then
it won't show up.

You can remove the reserve by running: `tune2fs -m 0 <device>`
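
The size of the reserve can be estimated from the difference between the "free" and "available" block counts that `statvfs` returns. This is a small sketch with a hypothetical helper name, `reserved_bytes`. It assumes the filesystem reports the reserve this way, as ext2/3/4 do.

```python
import os
import tempfile
from types import SimpleNamespace


def reserved_bytes(st):
    """Bytes that are free but withheld from unprivileged users."""
    return (st.f_bfree - st.f_bavail) * st.f_frsize


if __name__ == "__main__":
    st = os.statvfs(tempfile.gettempdir())
    print("reserved for root:", reserved_bytes(st), "bytes")
    # A made-up example: 50 reserved blocks of 4096 bytes each.
    example = SimpleNamespace(f_bfree=1000, f_bavail=950, f_frsize=4096)
    print("example:", reserved_bytes(example), "bytes")
```

A non-zero result on a branch is space that tools run as root may count but that does not show up in the available space mergerfs reports.
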

## I notice massive slowdowns of writes when enabling cache.files.

When file caching is enabled in any form (`cache.files!=off`) the
kernel will issue `getxattr` requests for `security.capability` prior
to _every single write_. This will usually result in performance
degradation, especially when using a network filesystem (such as NFS
or SMB). Unfortunately at this moment, the kernel is not caching the
response.

To work around this situation mergerfs offers a few solutions.

1. Set `security_capability=false`. It will short circuit any call and
   return `ENOATTR`. This still means though that mergerfs will
   receive the request before every write but at least it doesn't get
   passed through to the underlying filesystem.
2. Set `xattr=noattr`. Same as above but applies to _all_ calls to
   `getxattr`, not just `security.capability`. This will not be cached
   by the kernel either but mergerfs' runtime config system will still
   function.
3. Set `xattr=nosys`. Results in mergerfs returning `ENOSYS` which
   _will_ be cached by the kernel. No future xattr calls will be
   forwarded to mergerfs. The downside is that also means the xattr
   based config and query functionality won't work either.
4. Disable file caching. If you aren't using applications which use
   `mmap` it's probably simpler to just disable it altogether. The
   kernel won't send the requests when caching is disabled.
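
For reference, the same lookup can be issued from userspace to see how the different answers surface. This is a sketch with a hypothetical helper, `read_security_capability`. It assumes Linux, where Python exposes `os.getxattr`, and where `ENOATTR` is the same value as `ENODATA`. Treating `ENOTSUP` and `ENOSYS` as "no attribute" is a simplification chosen for the example.

```python
import errno
import os
import tempfile


def read_security_capability(path):
    """Return the raw security.capability value, or None if there is none."""
    try:
        return os.getxattr(path, "security.capability")
    except OSError as exc:
        # ENODATA: attribute not set (what ENOATTR maps to on Linux)
        # ENOTSUP / ENOSYS: xattrs unsupported or not implemented
        if exc.errno in (errno.ENODATA, errno.ENOTSUP, errno.ENOSYS):
            return None
        raise


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        print(read_security_capability(f.name))
```

Most regular files carry no capabilities, so the usual answer is "no attribute". That is why short circuiting the request loses little in practice.
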

## Why can't I see my files / directories?

It's almost always a permissions issue. Unlike mhddfs and
unionfs-fuse, which run as root and attempt to access content as
such, mergerfs always changes its credentials to that of the
caller. This means that if the user does not have access to a file or
directory then neither will mergerfs. However, because mergerfs is
creating a union of paths it may be able to read some files and
directories on one filesystem but not another resulting in an
incomplete set.

Whenever you run into a split permission issue (seeing some but not
all files) try using the
[mergerfs.fsck](https://github.com/trapexit/mergerfs-tools) tool to
check for and fix the mismatch. If you aren't seeing anything at all
be sure that the basic permissions are correct: that the user and
group values are correct and that directories have their executable
bit set. A common mistake by users new to Linux is to `chmod -R 644`
when they should have used `chmod -R u=rwX,go=rX`.
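
A quick way to spot the `chmod -R 644` mistake is to look for directories that have lost their executable (search) bit. The sketch below is a hypothetical checker, not a replacement for mergerfs.fsck, and it only inspects the owner's bit.

```python
import os
import stat
import tempfile


def dirs_missing_search_bit(root):
    """List directories under root whose owner lacks the execute/search bit."""
    missing = []
    for dirpath, dirnames, _ in os.walk(root):
        for name in dirnames:
            full = os.path.join(dirpath, name)
            mode = os.lstat(full).st_mode
            if stat.S_ISDIR(mode) and not mode & stat.S_IXUSR:
                missing.append(full)
    return sorted(missing)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        bad = os.path.join(d, "bad")
        os.mkdir(bad)
        os.chmod(bad, 0o644)  # the common mistake
        try:
            print(dirs_missing_search_bit(d))
        finally:
            os.chmod(bad, 0o755)  # restore so cleanup can proceed
```

Running it against each branch separately, rather than against the pool, helps show which underlying filesystem has the mismatched permissions.
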

If using a network filesystem such as NFS or SMB (Samba) be sure to
pay close attention to anything regarding permissioning and
users. Root squashing and user translation, for instance, have bitten
a few mergerfs users. Some of these also affect the use of mergerfs
from container platforms such as Docker.

## Why use FUSE? Why not a kernel based solution?

As with any solution to a problem, there are advantages and
disadvantages to each one.

A FUSE based solution has all the downsides of FUSE:

- Higher IO latency due to the trips in and out of kernel space
- Higher general overhead due to trips in and out of kernel space
- Double caching when using page caching
- Misc limitations due to FUSE's design

But FUSE also has a lot of upsides:

- Easier to offer a cross platform solution
- Easier forward and backward compatibility
- Easier updates for users
- Easier and faster release cadence
- Allows more flexibility in design and features
- Overall easier to write, secure, and maintain
- Much lower barrier to entry (getting code into the kernel takes a
  lot of time and effort initially)

## Is my OS's libfuse needed for mergerfs to work?

No. Normally `mount.fuse` is needed to get mergerfs (or any FUSE
filesystem) to mount using the `mount` command but in vendoring the
libfuse library the `mount.fuse` app has been renamed to
`mount.mergerfs` meaning the filesystem type in `fstab` can simply be
`mergerfs`. That said there should be no harm in having it installed
and continuing to use `fuse.mergerfs` as the type in `/etc/fstab`.

If `mergerfs` doesn't work as a type it could be due to how the
`mount.mergerfs` tool was installed. It must be in `/sbin/` with
proper permissions.

## Why was splice support removed?

After a lot of testing over the years, splicing always appeared to
provide equivalent performance at best and, in some cases, worse
performance. Splice is not supported on other platforms, forcing a
traditional read/write fallback to be provided. The splice code was
removed to simplify the codebase.

## How does mergerfs handle credentials?

mergerfs is a multithreaded application in order to handle requests
from the kernel concurrently. Each FUSE message has a header with
certain details about the request including the process ID (pid) of
the requesting application, the process' effective user id (uid), and
group id (gid). To ensure proper POSIX filesystem behavior and
security mergerfs must change its identity to match that of the
requester when performing the core filesystem function on the
underlying filesystem. On most Unix/POSIX based systems a process and
all its threads are under the same uid and gid. However, on Linux each
thread may have its own credentials. This allows mergerfs to be
multithreaded and for each thread to change to the credentials
(seteuid,setegid) as required by the incoming message it is
handling. However, on FreeBSD this is not possible at the moment
(though there have been
[discussions](https://wiki.freebsd.org/Per-Thread%20Credentials)) and
as such mergerfs must change the credentials of the whole application
when actioning messages. mergerfs does optimize this behavior by only
changing credentials, and locking the thread to do so, if the
process's current credentials differ from those required by the
incoming request.
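
The "only switch when necessary" idea can be modelled in a few lines. This is a toy model and not mergerfs's code: the class name `ProcessCredentials` and the `apply` callback (standing in for `seteuid`/`setegid`) are hypothetical. For simplicity the model always takes a lock, and it does not try to model how long the identity must be held while a request is served.

```python
import threading


class ProcessCredentials:
    """Toy model of process-wide credential switching guarded by a lock."""

    def __init__(self, uid, gid, apply):
        self._lock = threading.Lock()
        self._current = (uid, gid)
        self._apply = apply  # hypothetical stand-in for seteuid/setegid
        self.switches = 0

    def ensure(self, uid, gid):
        """Switch identity only if it differs; return True if a switch happened."""
        wanted = (uid, gid)
        with self._lock:
            if self._current == wanted:
                return False  # already the right identity, nothing to do
            self._apply(uid, gid)
            self._current = wanted
            self.switches += 1
            return True


if __name__ == "__main__":
    log = []
    creds = ProcessCredentials(0, 0, lambda uid, gid: log.append((uid, gid)))
    for uid, gid in [(1000, 1000), (1000, 1000), (1001, 1001)]:
        creds.ensure(uid, gid)
    print(log)
```

Consecutive requests from the same user cause a single switch in this model, which is the saving the optimization is after.
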