# Known Issues and Bugs

## mergerfs

### Supplemental user groups

Due to the overhead of
[getgroups/setgroups](http://linux.die.net/man/2/setgroups), mergerfs
utilizes a cache. This cache is opportunistic and per thread. Each
thread will query the supplemental groups for a user when that
particular thread needs to change credentials and will keep that data
for the lifetime of the thread. This means that if a user is added to
a group it may not be picked up without restarting mergerfs. In the
future this may be improved to allow a periodic or manual clearing of
the cache.

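
To make the consequence of a per-thread, never-expiring cache concrete, here is a minimal Python model. It is only an illustration of the behaviour described above, not mergerfs's implementation: `PerThreadGroupCache`, `group_db`, and the `lookup` callable are all hypothetical names.

```python
import threading


class PerThreadGroupCache:
    """Illustrative model only: caches group lookups per thread, never expires."""

    def __init__(self, lookup):
        self._lookup = lookup            # hypothetical callable: uid -> iterable of gids
        self._local = threading.local()  # each thread sees its own attributes

    def groups_for(self, uid):
        cache = getattr(self._local, "groups", None)
        if cache is None:
            cache = self._local.groups = {}
        if uid not in cache:
            # First request on this thread: query the group database once.
            cache[uid] = tuple(self._lookup(uid))
        return cache[uid]


# Stand-in for the host's group database.
group_db = {1000: [1000]}
cache = PerThreadGroupCache(lambda uid: group_db[uid])

before = cache.groups_for(1000)
group_db[1000] = [1000, 27]      # the user is added to another group
after = cache.groups_for(1000)   # same thread: still the cached answer

# A thread that has never looked the user up sees the new membership.
seen_by_new_thread = []
worker = threading.Thread(
    target=lambda: seen_by_new_thread.append(cache.groups_for(1000))
)
worker.start()
worker.join()
```

In this model the original thread keeps returning the old group list until it exits, which mirrors why a restart is currently needed for group changes to be picked up reliably.
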

While not a bug, some users have found when using containers that
supplemental groups defined inside the container don't work as
expected. Since mergerfs lives outside the container, it queries the
host's group database. Effectively, containers have their own user and
group definitions unless set up otherwise, just as different systems
would.

Users should mount the host's group file into the containers or use a
standard shared user and group technology like NIS or LDAP.

### directory mtime is not being updated

Remember that the default policy for `getattr` is `ff`. The
information for the first directory found will be returned. If it
wasn't the directory which had been updated then it will appear
outdated.

This is the default because any other policy would be more expensive
and for many applications it is unnecessary. Always returning the
directory with the most recent mtime, or a faked value based on all
directories found, would require a scan of all filesystems.

If you always want the directory information from the one with the
most recent mtime then use the `newest` policy for `getattr`.

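
The difference between the two policies can be sketched in Python. This is a simplified model under stated assumptions, not mergerfs code: branches are plain directories, and `getattr_ff` and `getattr_newest` are hypothetical helpers that mimic "first found" and "most recent mtime" respectively.

```python
import os
import tempfile


def getattr_ff(branches, relpath):
    """Return the stat of the first branch that contains relpath."""
    for branch in branches:
        full = os.path.join(branch, relpath)
        if os.path.lexists(full):
            return os.lstat(full)  # stops at the first hit
    raise FileNotFoundError(relpath)


def getattr_newest(branches, relpath):
    """Return the stat with the most recent mtime; every branch is checked."""
    found = []
    for branch in branches:
        full = os.path.join(branch, relpath)
        if os.path.lexists(full):
            found.append(os.lstat(full))
    if not found:
        raise FileNotFoundError(relpath)
    return max(found, key=lambda st: st.st_mtime_ns)


def demo():
    with tempfile.TemporaryDirectory() as root:
        branches = [os.path.join(root, name) for name in ("disk1", "disk2")]
        for branch in branches:
            os.makedirs(os.path.join(branch, "media"))
        # The copy on disk2 was modified later than the copy on disk1.
        os.utime(os.path.join(branches[0], "media"), ns=(1_000_000_000, 1_000_000_000))
        os.utime(os.path.join(branches[1], "media"), ns=(2_000_000_000, 2_000_000_000))
        ff = getattr_ff(branches, "media").st_mtime_ns
        newest = getattr_newest(branches, "media").st_mtime_ns
        return ff, newest
```

`getattr_ff` returns after the first match, while `getattr_newest` has to stat the path on every branch, which is the extra cost the text refers to.
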
### 'mv /mnt/pool/foo /mnt/disk1/foo' removes 'foo'

This is not a bug.

Run in verbose mode to better understand what's happening:

```
$ mv -v /mnt/pool/foo /mnt/disk1/foo
copied '/mnt/pool/foo' -> '/mnt/disk1/foo'
removed '/mnt/pool/foo'
$ ls /mnt/pool/foo
ls: cannot access '/mnt/pool/foo': No such file or directory
```

`mv`, when working across devices, copies the source to the target and
then removes the source. Since the source **is** the target in this
case, depending on the unlink policy, it will remove the just copied
file and other files across the branches.

If you want to move files to one filesystem, just copy them there and
use mergerfs.dedup to clean up the old paths or manually remove them
from the branches directly.

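
The sequence above can be reproduced with a small Python model. It is a sketch under assumptions, not mergerfs behaviour verbatim: the branches are temporary directories, and the hypothetical `pool_unlink` assumes an unlink policy that removes the path from every branch that has it.

```python
import os
import shutil
import tempfile


def pool_find(branches, relpath):
    """Resolve a pool path to the first branch that contains it."""
    for branch in branches:
        full = os.path.join(branch, relpath)
        if os.path.lexists(full):
            return full
    raise FileNotFoundError(relpath)


def pool_unlink(branches, relpath):
    """Assumed unlink policy: remove the path from every branch that has it."""
    removed = False
    for branch in branches:
        full = os.path.join(branch, relpath)
        if os.path.lexists(full):
            os.unlink(full)
            removed = True
    if not removed:
        raise FileNotFoundError(relpath)


def cross_device_mv(branches, relpath, dest):
    """What `mv` does across devices: copy to dest, then unlink the source."""
    shutil.copyfile(pool_find(branches, relpath), dest)
    pool_unlink(branches, relpath)  # the source is the pool, so dest goes too


def demo():
    with tempfile.TemporaryDirectory() as root:
        disk1 = os.path.join(root, "disk1")
        disk2 = os.path.join(root, "disk2")
        os.makedirs(disk1)
        os.makedirs(disk2)
        with open(os.path.join(disk2, "foo"), "w") as f:
            f.write("data")
        dest = os.path.join(disk1, "foo")  # a branch of the same pool
        cross_device_mv([disk1, disk2], "foo", dest)
        return os.path.exists(dest), os.path.exists(os.path.join(disk2, "foo"))
```

After `demo()` neither copy exists: the file that was just written to `disk1` is itself part of the pool, so the final unlink removes it along with the original.
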
### cached memory appears greater than it should be

Use `cache.files=off` and/or `dropcacheonclose=true`. See the section
on [page caching](config/cache.md).

### NFS clients returning ESTALE / Stale file handle

NFS generally does not like out of band changes. Take a look at the
section on NFS in the [remote filesystems](remote_filesystems.md) docs
for more details.

### rtorrent fails with ENODEV (No such device)

Be sure to set
[cache.files=partial|full|auto-full|per-process](config/cache.md)
or use Linux kernel v6.6 or above. rtorrent and some other
applications use [mmap](http://linux.die.net/man/2/mmap) to read and
write to files and offer no fallback to traditional methods.

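
For application developers, a fallback from `mmap` to ordinary reads can be quite small. The following Python sketch shows one way to do it; `read_file`, `refuse_mmap`, and `demo` are hypothetical names, and `refuse_mmap` is only a stand-in that simulates a filesystem rejecting the mapping with `ENODEV`.

```python
import errno
import mmap
import os
import tempfile


def read_file(path, mapper=mmap.mmap):
    """Read a whole file via mmap, falling back to read() if mapping is refused."""
    with open(path, "rb") as f:
        try:
            with mapper(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
                return mapped[:]
        except OSError as exc:
            if exc.errno != errno.ENODEV:
                raise  # a real error: do not hide it
        except ValueError:
            # Python refuses to map an empty file; plain read() handles it.
            pass
        # Fallback path: traditional file IO.
        f.seek(0)
        return f.read()


def refuse_mmap(*args, **kwargs):
    """Test double standing in for a filesystem that rejects mmap."""
    raise OSError(errno.ENODEV, "No such device")


def demo():
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello")
    try:
        return read_file(tmp.name), read_file(tmp.name, mapper=refuse_mmap)
    finally:
        os.unlink(tmp.name)
```

Both calls in `demo()` return the same bytes; only the second one takes the fallback path.
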
### Plex / Jellyfin doesn't work with mergerfs

It does. If you're trying to put the software's config / metadata /
database on mergerfs you can't set
[cache.files=off](config/cache.md) (unless you use Linux v6.6 or
above) because Plex is using **sqlite3** with **mmap** enabled.

That said, it is recommended that config and runtime files be stored
on SSDs on a regular filesystem for performance reasons and, if you
are using HDDs in your pool, to help limit spinup.

Other software that uses **sqlite3** with **mmap** includes Radarr,
Sonarr, and Lidarr.

It is recommended that you reach out to the developers of the software
you're having trouble with and ask them to add a fallback to regular
file IO when **mmap** is unavailable. It is not only more compatible
and resilient but can also be more performant in certain situations.

If the issue is that quick scanning doesn't seem to pick up media then
be sure to set `func.getattr=newest`, though generally a full scan
will pick up all media anyway.

### When a program tries to move or rename a file it fails

Please read the docs regarding [rename and
link](config/rename_and_link.md).

The problem is that many applications do not properly handle `EXDEV`
errors which `rename` and `link` may return even though they are
perfectly valid situations which do not indicate actual device,
filesystem, or OS errors. The error will only be returned by mergerfs
if using a path preserving policy as described in the policy
section. If you do not care about path preservation simply change the
mergerfs policy to the non-path preserving version. For example: `-o
category.create=mfs`.

Ideally the offending software would be fixed, and it is recommended
that if you run into this problem you contact the software's author
and request proper handling of `EXDEV` errors.

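
As a reference for what "proper handling" can look like, here is a minimal Python sketch for regular files. `move_file`, `rename_exdev`, and `demo` are hypothetical names; `rename_exdev` only simulates a `rename` that fails with `EXDEV`. Python's own `shutil.move` already falls back to copying in this situation, so the sketch is mainly to show the pattern.

```python
import errno
import os
import shutil
import tempfile


def move_file(src, dst, rename=os.rename):
    """Move a regular file, treating EXDEV as 'copy instead', not as a failure."""
    try:
        rename(src, dst)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # genuine errors still propagate
        # EXDEV only means the move cannot be done as a rename.
        shutil.copy2(src, dst)  # copy data and metadata
        os.unlink(src)          # then remove the source


def rename_exdev(src, dst):
    """Test double: behaves like rename(2) crossing a device boundary."""
    raise OSError(errno.EXDEV, "Invalid cross-device link")


def demo():
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "a")
        dst = os.path.join(root, "b")
        with open(src, "w") as f:
            f.write("payload")
        move_file(src, dst, rename=rename_exdev)
        with open(dst) as f:
            content = f.read()
        return os.path.exists(src), content
```

Directories would need a recursive copy and removal, which is omitted here for brevity.
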
### my 32bit software has problems

Some software has problems with 64bit inode values. The symptoms can
include EOVERFLOW errors when trying to list files. You can address
this by setting `inodecalc` to one of the 32bit based algos as
described in the relevant section.

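
To check whether a tree actually exposes inode values that do not fit in 32 bits, a small diagnostic like the following can help. It is a hypothetical helper for illustration, not part of mergerfs or its tools.

```python
import os

UINT32_MAX = 0xFFFFFFFF


def fits_in_32bit(inode):
    """True if the inode number can be stored in an unsigned 32bit field."""
    return 0 <= inode <= UINT32_MAX


def find_large_inodes(root):
    """Yield (path, inode) for entries under root whose inode exceeds 32 bits."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            inode = os.lstat(path).st_ino
            if not fits_in_32bit(inode):
                yield path, inode
```

Iterating over `find_large_inodes("/mnt/pool")` lists the entries a 32bit program may fail on; an empty result suggests the inode values are not the cause.
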
### Moving files and directories fails with Samba

Workaround: Copy the file/directory and then remove the original
rather than move.

This isn't an issue with Samba but with some SMB clients. GVFS-fuse
v1.20.3 and prior (found in Ubuntu 14.04 among others) failed to
handle certain error codes correctly, particularly
`STATUS_NOT_SAME_DEVICE`, which comes from the `EXDEV` that is
returned by `rename` when the call is crossing mount points. When a
program gets an `EXDEV` it needs to explicitly take an alternate
action to accomplish its goal. In the case of `mv` or similar it tries
`rename` and on `EXDEV` falls back to copying the file to the
destination and deleting the source. In these older versions of
GVFS-fuse, if it received `EXDEV` it would translate that into
`EIO`. This would cause `mv` or almost any application attempting to
move files around on that SMB share to fail with a generic IO error.

[GVFS-fuse v1.22.0](https://bugzilla.gnome.org/show_bug.cgi?id=734568)
and above fixed this issue but a large number of systems use the older
release. On Ubuntu, the version can be checked by issuing `apt-cache
showpkg gvfs-fuse`. Most distros released in 2015 seem to have the
updated release and will work fine but older systems may
not. Upgrading gvfs-fuse or the distro in general will address the
problem.

In Apple's MacOSX 10.9 they replaced Samba (client and server) with
their own product. It appears their new client does not handle
`EXDEV` either and responds similarly to older releases of gvfs on
Linux.

### Trashing files occasionally fails

This is the same issue as with Samba. `rename` returns `EXDEV` (in our
case that will really only happen with path preserving policies like
`epmfs`) and the software doesn't handle the situation well. This is
unfortunately a common failure of software which moves files
around. The standard indicates that an implementation **MUST** support
trashing files into the user's home trash directory and **MAY** choose
to support trashing files outside the user's home directory. The
implementation **MAY** also support "top directory trashes", which
many probably do.

To create a `$topdir/.Trash` directory as defined in the standard use
the [mergerfs-tools](https://github.com/trapexit/mergerfs-tools) tool
`mergerfs.mktrash`.

## FUSE and Linux kernel

There have been a number of kernel issues / bugs over the years which
mergerfs has run into. Here is a list of them for reference and
posterity.

### NFS and EIO errors

[https://lore.kernel.org/linux-fsdevel/20240228160213.1988854-1-mszeredi@redhat.com/T/](https://lore.kernel.org/linux-fsdevel/20240228160213.1988854-1-mszeredi@redhat.com/T/)

Over the years some users have reported that while exporting mergerfs
via NFS, after significant filesystem activity, not only will the NFS
client start returning ESTALE and EIO errors but mergerfs itself would
start returning EIO errors. The problem was that no one could reliably
reproduce the issue. After a string of reports in late 2023 and early
2024 more investigation was done.

In Linux 5.14 new validation was put into FUSE which caught a few
invalid situations and would tag a FUSE node as invalid if a check
failed. Such checks include invalid file type, changing of type from
one request to another, a size greater than 63bit, and the generation
of an inode changing while in use.

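
The kinds of checks listed above can be modelled roughly as follows. This Python sketch is only an illustration built from that list, not the kernel's code: `Attr` and `node_is_bad` are hypothetical names, and the real validation lives in the kernel's FUSE driver.

```python
import stat
from dataclasses import dataclass

MAX_SIZE = 2**63 - 1  # largest size that fits in 63 bits

VALID_TYPES = (
    stat.S_IFREG, stat.S_IFDIR, stat.S_IFLNK, stat.S_IFCHR,
    stat.S_IFBLK, stat.S_IFIFO, stat.S_IFSOCK,
)


@dataclass(frozen=True)
class Attr:
    mode: int
    size: int
    generation: int


def node_is_bad(cached, reply):
    """Return True if a reply would fail the modelled checks.

    `cached` is what is already known about the node (or None for a
    node seen for the first time); `reply` is what the filesystem sent.
    """
    if stat.S_IFMT(reply.mode) not in VALID_TYPES:
        return True  # invalid file type
    if reply.size > MAX_SIZE:
        return True  # size greater than 63bit
    if cached is not None:
        if stat.S_IFMT(cached.mode) != stat.S_IFMT(reply.mode):
            return True  # type changed from one request to another
        if cached.generation != reply.generation:
            return True  # generation changed while in use
    return False


# The root node is known with generation 0; a reply carrying a fixed
# non-zero generation for it trips the last check.
root = Attr(stat.S_IFDIR | 0o755, 4096, 0)
root_reply = Attr(stat.S_IFDIR | 0o755, 4096, 1)
```

In this model `node_is_bad(root, root_reply)` is true, which corresponds to the situation described next.
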

What happened was that mergerfs was using a different fixed, non-zero
value for the generation of all nodes as it was suggested that unique
inode + generation pairs are needed for proper integration with
NFS. That non-zero value was being sent back to the kernel when a
lookup request was made for root. The reason this was hard to track
down was because NFS almost uniquely uses an API which can lead to a
lookup of the root node that simply won't happen under normal
workloads and usage. And that lookup will only happen if child nodes
of the root were forgotten but NFS still had a handle to that node and
later asked for details about it. It would trigger a set of requests
to lookup info on those nodes.

This wasn't a bug in FUSE but in mergerfs. However, the incorrect
behavior of mergerfs led FUSE to behave in an unexpected and incorrect
manner. It would issue a lookup of the "parent of a child of the root"
and mergerfs would send the invalid generation value. As a result the
kernel would mark the root node as "bad" which would then trigger the
kernel to issue a "forget root" message. In between those it would
issue a request for the parent of the root... which doesn't exist.

So the kernel was doing two invalid things: requesting the parent of
the root and then, when that failed, issuing a forget for the
root. These led to chasing after the wrong possible causes.

The change was for FUSE to revert the marking of the root node as bad
if the generation is non-zero and warn about it. It will mark the node
bad but not unhash/forget/remove it.

mergerfs in v2.40.1 ensures that generation for root is always 0 on
lookup which should work across any kernel version.

### Truncated files

This was a bug with `mmap` and `FUSE` on 32bit platforms. Should be fixed in all LTS releases.

* [https://marc.info/?l=linux-fsdevel&m=155550785230874&w=2](https://marc.info/?l=linux-fsdevel&m=155550785230874&w=2)

### Crashing on OpenVZ

There was a bug in the OpenVZ kernel with regard to how it handles `ioctl` calls. It was making invalid requests which would lead to crashes due to mergerfs not expecting them.

* [https://bugs.openvz.org/browse/OVZ-7145](https://bugs.openvz.org/browse/OVZ-7145)
* [https://www.mail-archive.com/devel@openvz.org/msg37096.html](https://www.mail-archive.com/devel@openvz.org/msg37096.html)

### Really bad mmap performance

There was a bug in caching which affected overall performance of `mmap` through `FUSE` in Linux 4.x kernels. It is fixed in 4.4.10 and 4.5.4.

* [https://lkml.org/lkml/2016/3/16/260](https://lkml.org/lkml/2016/3/16/260)
* [https://lkml.org/lkml/2016/5/11/59](https://lkml.org/lkml/2016/5/11/59)

### Heavy load and memory pressure leads to kernel panic

* [https://lkml.org/lkml/2016/9/14/527](https://lkml.org/lkml/2016/9/14/527)
* [https://lkml.org/lkml/2016/10/4/1](https://lkml.org/lkml/2016/10/4/1)
* [https://www.theregister.com/2016/10/05/linus_torvalds_admits_buggy_crap_made_it_into_linux_48/](https://www.theregister.com/2016/10/05/linus_torvalds_admits_buggy_crap_made_it_into_linux_48/)