
Misc documentation updates (#1505)

pull/1506/head
trapexit 2 months ago
committed by GitHub
parent
commit
6ae9641bcc
No known key found for this signature in database GPG Key ID: B5690EEEBB952194
  1. buildtools/build-mergerfs (1 line changed)
  2. buildtools/build-release (2 lines changed)
  3. mkdocs/docs/extended_usage_patterns.md (2 lines changed)
  4. mkdocs/docs/faq/recommendations_and_warnings.md (10 lines changed)
  5. mkdocs/docs/faq/technical_behavior_and_limitations.md (30 lines changed)
  6. mkdocs/docs/index.md (46 lines changed)
  7. mkdocs/docs/known_issues_bugs.md (56 lines changed)
  8. mkdocs/docs/performance.md (2 lines changed)
  9. mkdocs/docs/project_comparisons.md (70 lines changed)
  10. mkdocs/docs/resource_usage.md (31 lines changed)
  11. mkdocs/docs/support.md (3 lines changed)
  12. mkdocs/docs/tooling.md (101 lines changed)
  13. mkdocs/mkdocs.yml (3 lines changed)

1
buildtools/build-mergerfs

@ -8,7 +8,6 @@ git \
clone \
--single-branch \
--branch="${BRANCH}" \
--depth=1 \
"${REPO_URL}" \
"${SRCDIR}"

2
buildtools/build-release

@ -23,6 +23,8 @@ def build(containerfile,
# TODO: Capture output and write to log
print(args)
rv = subprocess.run(args)
os.makedirs(pkgdirpath,exist_ok=True)
report_filepath = os.path.join(pkgdirpath,"build-report.txt")
with open(report_filepath,"a+") as f:
build = os.path.basename(containerfile)

2
mkdocs/docs/usage_patterns.md → mkdocs/docs/extended_usage_patterns.md

@ -1,4 +1,4 @@
# Usage Patterns
# Extended Usage Patterns
## tiered cache

10
mkdocs/docs/faq/recommendations_and_warnings.md

@ -2,16 +2,20 @@
## What should mergerfs NOT be used for?
- databases: Even if the database stored data in separate files
* databases: Even if the database stored data in separate files
(mergerfs wouldn't offer much otherwise) the higher latency of the
indirection will really harm performance. If it is a lightly used
sqlite3 database then it should be fine.
- VM images: For the same reasons as databases. VM images are accessed
* VM images: For the same reasons as databases. VM images are accessed
very aggressively and mergerfs will introduce a lot of extra latency.
- As replacement for RAID: mergerfs is just for pooling branches. If
* As replacement for RAID: mergerfs is just for pooling branches. If
you need that kind of device performance aggregation or high
availability you should stick with RAID. However, it is fine to put
a filesystem which is on a RAID setup in mergerfs.
**However, if using [passthrough](../config/passthrough.md) the above
situations are less likely to be a concern. Best to do testing for
your specific use case.**
## It's mentioned that there are some security issues with mhddfs. What are they? How does mergerfs address them?

30
mkdocs/docs/faq/technical_behavior_and_limitations.md

@ -172,20 +172,26 @@ removed to simplify the codebase.
mergerfs is a multithreaded application in order to handle requests
from the kernel concurrently. Each FUSE message has a header with
certain details about the request include the process ID (pid) of the
requesting application, the process' effective user id (uid), and
certain details about the request including the process ID (pid) of
the requesting application, the process' effective user id (uid), and
group id (gid). To ensure proper POSIX filesystem behavior and
security mergerfs must change its identity to match that of the
requester when performing the core filesystem function on the
underlying filesystem. On most Unix/POSIX based system a process and
all its threads are under the same uid and gid. However, on Linux each
thread may have its own credentials. This allows mergerfs to be
multithreaded and for each thread to change to the credentials
(seteuid,setegid) as required by the incoming message it is
handling. However, on FreeBSD this is not possible at the moment
(though there has been
[discussions](https://wiki.freebsd.org/Per-Thread%20Credentials) and
requester when performing certain functions on the underlying
filesystem. As required by standards, on most Unix/POSIX based systems
a process and all its threads are under the same uid and gid. However,
on Linux each thread **may** have its own credentials. This allows
mergerfs to be multithreaded and for each thread to change to the
credentials as required by the incoming message it is
handling. However, currently on FreeBSD this is not possible (though
there have been
[discussions](https://wiki.freebsd.org/Per-Thread%20Credentials)) and
as such must change the credentials of the whole application when
actioning messages. mergerfs does optimize this behavior by only
changing credentials and locking the thread to do so if the process is
currently not the same as what is necessary by the incoming request.
currently not the same as what is necessary by the incoming
request. As a result of this design FreeBSD may experience more
contention and therefore lower performance than Linux.
Additionally, mergerfs [utilizes a cache for supplemental
groups](../known_issues_bugs.md#supplemental-user-groups) due to the
high cost of querying that information.
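
The "only change credentials when they differ" optimization described above can be modeled in a few lines. This is a simplified sketch, not mergerfs code: `CredentialSwitcher` and its `apply` callback are hypothetical names, the callback stands in for the real credential-changing system calls, and holding the lock for the whole request is an assumption made to keep the model small.

```python
import threading
from typing import Callable, Tuple

Creds = Tuple[int, int]  # (uid, gid)


class CredentialSwitcher:
    """Model of a process-wide identity shared by all threads."""

    def __init__(self, initial: Creds, apply: Callable[[Creds], None]) -> None:
        self._lock = threading.Lock()
        self._current = initial
        self._apply = apply  # hypothetical stand-in for the real syscalls
        self.switches = 0    # how many times the identity actually changed

    def run_as(self, wanted: Creds, func: Callable[[], object]) -> object:
        # One identity for the whole process means requests are serialized.
        with self._lock:
            if self._current != wanted:
                # Only pay for the switch when the identity differs.
                self._apply(wanted)
                self._current = wanted
                self.switches += 1
            return func()
```

Consecutive requests from the same user skip the switch, which is where the saving comes from. The serialization under the lock is the contention that per-thread credentials on Linux avoid.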

46
mkdocs/docs/index.md

@ -4,23 +4,27 @@
[FUSE](https://en.wikipedia.org/wiki/Filesystem_in_Userspace) based
[union filesystem](https://en.wikipedia.org/wiki/Union_mount) geared
towards simplifying storage and management of files across numerous
commodity storage devices. It is similar to **mhddfs**, **unionfs**,
and **aufs**.
commodity storage devices. It is similar to [**mhddfs**, **unionfs**,
**aufs**, **DrivePool**, etc.](project_comparisons.md).
## Features
* Logically combine numerous filesystems/paths into a single
mount point
mount point (JBOFS: Just a Bunch of FileSystems)
* Combine paths of the same or different filesystems
* Ability to add or remove filesystems/paths without impacting the
rest of the data
* Unaffected by individual filesystem failure
* Configurable file selection and creation placement
* File IO [passthrough](config/passthrough.md) for near native IO
performance (where supported)
* Works with filesystems of any size
* Works with filesystems of almost any type
* Works with filesystems of [almost any
type](faq/compatibility_and_integration.md#what-filesystems-can-be-used-as-branches)
* Ignore read-only filesystems when creating files
* Hard link copy-on-write / CoW
* Runtime configurable
* Hard link [copy-on-write / CoW](config/link_cow.md)
* [Runtime configurable](runtime_interface.md)
* Support for extended attributes (xattrs)
* Support for file attributes (chattr)
* Support for POSIX ACLs
@ -30,7 +34,7 @@ and **aufs**.
* Read/write overlay on top of read-only filesystem like OverlayFS
* File whiteout
* RAID like parity calculation
* RAID like parity calculation (see [SnapRAID](https://www.snapraid.it))
* Redundancy
* Splitting of files across branches
@ -64,21 +68,23 @@ A + B = C
| | |
+-- /dir1 +-- /dir1 +-- /dir1
| | | | | |
| +-- file1 | +-- file2 | +-- file1
| | +-- file3 | +-- file2
+-- /dir2 | | +-- file3
| | +-- /dir3 |
| +-- file4 | +-- /dir2
| +-- file5 | |
+-- file6 | +-- file4
|
+-- /dir3
| |
| +-- file5
|
+-- file6
| +-- file1 | | | +-- file1
| | +-- file2 | +-- file2
| | +-- file3 | +-- file3
| | |
+-- /dir2 | +-- /dir2
| | | | |
| +-- file4 | | +-- file4
| | |
| +-- /dir3 +-- /dir3
| | | | |
| | +-- file5 | +-- file5
| | |
+-- file6 | +-- file6
+-- file7 +-- file7 +-- file7
```
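
The merged view in the diagram can be modeled as a union of the relative paths found on each branch. A minimal sketch, assuming branches are given as plain lists of paths as read from the updated diagram; the `merge_branches` helper is hypothetical and ignores mergerfs policies entirely:

```python
from typing import Dict, Iterable, List


def merge_branches(branches: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each relative path to the branches that contain it."""
    merged: Dict[str, List[str]] = {}
    for branch, paths in branches.items():
        for path in paths:
            merged.setdefault(path, []).append(branch)
    return dict(sorted(merged.items()))


# Branch contents as read from the diagram (an interpretation, not real data).
pool = merge_branches({
    "A": ["dir1/file1", "dir2/file4", "file6", "file7"],
    "B": ["dir1/file2", "dir1/file3", "dir3/file5", "file7"],
})
```

Here `file7` exists on both branches but appears once in the pool. Which copy is used for a given operation is decided by the configured policies.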
## Getting Started
Head to the [quick start guide](quickstart.md).

56
mkdocs/docs/known_issues_bugs.md

@ -2,6 +2,23 @@
## mergerfs
### FreeBSD version
* FreeBSD doesn't have per thread credentials meaning threads must
block to change credentials as required by numerous filesystem
functions. This impacts performance.
* FreeBSD's FUSE implementation is lacking many features of Linux.
* passthrough
* statx
* lazy umount
* oom_score_adj
* fuse_msg_size
* kernel symlink caching
* kernel readdir caching
* writeback caching
* ...
### Supplemental user groups
#### Supplemental group caching
@ -83,28 +100,30 @@ more details.
### SQLite3, Plex, Jellyfin do not work with mergerfs
It does. If you're trying to put the software's config / metadata /
database on mergerfs you can't set
[cache.files=off](config/cache.md) (unless you use Linux v6.6 or
above) because they are using **sqlite3** with **mmap** enabled.
It does. If you're trying to put the software's config / metadata /
database on mergerfs you can't set [cache.files=off](config/cache.md)
(unless you use Linux v6.6 or above and
[direct-io-allow-mmap](config/options.md) is enabled) because they are
using **sqlite3** with **mmap** enabled and have failed to properly
handle the situation where **mmap** may not be available.
That said it is recommended that config and runtime files be stored on
SSDs on a regular filesystem for performance reasons. See [What should
mergerfs NOT be used for?](faq/recommendations_and_warnings.md).
mergerfs NOT be used
for?](faq/recommendations_and_warnings.md#what-should-mergerfs-not-be-used-for).
Other software that leverages **sqlite3** which require **mmap**
includes Radarr, Sonarr, and Lidarr. That said many programs use
includes Radarr, Sonarr, and Lidarr. However, many programs use
**sqlite3** and do not require **mmap**.
It is recommended that you reach out to the developers of the software
you are having troubles with and asking them to add a fallback to
regular file IO when **mmap** is unavailable. It is not only more
compatible and resilient but also can be more performant in certain
situations.
If the issue is that quick scanning doesn't seem to pick up media then
be sure to set `func.getattr=newest`. That said a full scan will pick
up all media and it will put less load on the host to use time based
you are having troubles with and ask them to add a fallback to regular
file IO when **mmap** is unavailable. It is not only more compatible
but also can be more performant in certain situations.
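
The kind of fallback being requested can be sketched as below. This is an illustration using Python's standard `mmap` module, not code from any of the applications mentioned; the `read_bytes` helper is a hypothetical name.

```python
import mmap


def read_bytes(path: str) -> bytes:
    """Read a file via mmap, falling back to regular reads if mmap fails."""
    with open(path, "rb") as f:
        try:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                return m[:]
        except (OSError, ValueError):
            # mmap is unavailable on this file (or the file is empty):
            # fall back to ordinary file IO instead of failing.
            f.seek(0)
            return f.read()
```

The caller gets the same bytes either way, so the application keeps working on filesystems where **mmap** is refused.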
If the issue is that quick scans do not seem to pick up media then be
sure to set `func.getattr=newest`. That said a full scan will pick up
all media and it will put less load on the host to use time based
library scans or to configure downloading software to trigger a scan
when files are added to the pool. See [Does inotify and fanotify
work?](faq/compatibility_and_integration.md#does-inotify-and-fanotify-work)
@ -169,7 +188,7 @@ the [mergerfs-tools](https://github.com/trapexit/mergerfs-tools) tool
## FUSE and Linux kernel
There have been a number of kernel issues / bugs over the years which
mergerfs has run into. Here is a list of them for reference and
mergerfs users have run into. Here is a list of them for reference and
posterity.
@ -225,14 +244,17 @@ lookup which should work across any kernel version.
### Truncated files
This was a bug with `mmap` and `FUSE` on 32bit platforms. Should be fixed in all LTS releases.
This was a bug with `mmap` and `FUSE` on 32bit platforms. Should be
fixed in all LTS releases.
* [https://marc.info/?l=linux-fsdevel&m=155550785230874&w=2](https://marc.info/?l=linux-fsdevel&m=155550785230874&w=2)
### Crashing on OpenVZ
There was a bug in the OpenVZ kernel with regard to how it handles `ioctl` calls. It was making invalid requests which would lead to crashes due to mergerfs not expecting them.
There was a bug in the OpenVZ kernel with regard to how it handles
`ioctl` calls. It was making invalid requests which would lead to
crashes due to mergerfs not expecting them.
* [https://bugs.openvz.org/browse/OVZ-7145](https://bugs.openvz.org/browse/OVZ-7145)
* [https://www.mail-archive.com/devel@openvz.org/msg37096.html](https://www.mail-archive.com/devel@openvz.org/msg37096.html)

2
mkdocs/docs/performance.md

@ -43,7 +43,7 @@ before changing them to understand how functionality will change.
* disable `async_read`
* use [symlinkify](config/symlinkify.md) if your data is largely
static and read-only
* use [tiered cache](usage_patterns.md) devices
* use [tiered cache](extended_usage_patterns.md) devices
* use LVM and LVM cache to place a SSD in front of your HDDs

70
mkdocs/docs/project_comparisons.md

@ -163,7 +163,7 @@ AnyRAID](https://hexos.com/blog/introducing-zfs-anyraid-sponsored-by-eshtek)
is a feature being developed for ZFS which is intended to provide more
flexibility in ZFS pools. Allowing a mix of capacity disks to have
greater capacity than traditional RAID and would allow for partial
upgrades while keeping live redundency.
upgrades while keeping live redundancy.
This ZFS feature, as of mid-2025, is extremely early in its development
and there are no timelines or estimates for when it may be released.
@ -226,3 +226,71 @@ mergerfs has the feature [symlinkify](config/symlinkify.md) which
provides a similar behavior but is more flexible in that it is not
read-only. That said there can still be some software that won't like
that kind of setup.
## rclone union
rclone's [union](https://rclone.org/union) backend allows you to
create a union of multiple rclone backends and was inspired by
[mergerfs](https://rclone.org/union/#behavior-policies). Given rclone
knows more about the underlying backend than mergerfs could it can be
more efficient than creating a similar union with `rclone mount` and
mergerfs.
However, it is not uncommon to see users set up rclone mounts and
combine them with local or other remote filesystems using mergerfs
given the differing feature sets and focuses of the two projects.
## distributed filesystems
* AFS
* Ceph/CephFS
* GlusterFS
* LizardFS
* MooseFS
* etc.
Distributed remote filesystems come in many forms. Some offering POSIX
filesystem compliance and some not. Some providing remote block
devices or object stores on top of which a POSIX or POSIX-like
filesystem is built. Some which are effectively distributed union
filesystems with duplication.
These filesystems almost always require a significant amount of
compute to run well and are typically deployed on their own
hardware. Often in an "orchestrators" + "workers" configuration across
numerous nodes. This limits their usefulness for casual and homelab
users. There could also be issues with network congestion and general
performance if using a single network and that network is slower than
the storage devices.
While possible to use a distributed filesystem to combine storage
devices (and typically to provide redundancy) it will require a more
complicated setup and more compute resources than mergerfs (while also
offering a different set of capabilities.)
## 9P
[9P, the Plan 9 Filesystem
Protocol,](https://en.wikipedia.org/wiki/9P_(protocol)) is a protocol
developed for the Plan 9 operating system to help expand on the Unix
idea that everything should be a file. The protocol made its way to
other systems and is still widely used. As such 9P is not directly
comparable to mergerfs but more so to FUSE which mergerfs uses. FUSE
is also a filesystem protocol (though designed for kernel <->
userspace communication rather than over a network). FUSE, even more
than the
[9P2000.L](https://github.com/chaos/diod/blob/master/protocol.md)
variant of 9P, is focused primarily on supporting Linux filesystem
features.
mergerfs leverages FUSE but could have in theory leveraged 9P with a
reduction in features.
While 9P has [extensive
usage](https://docs.kernel.org/filesystems/9p.html) in certain
situations its use in modern userland Linux systems is limited. FUSE
has largely replaced use cases that may have been implemented with 9P
servers in the past.

31
mkdocs/docs/resource_usage.md

@ -0,0 +1,31 @@
# Resource Usage and Management
## Usage
* threads
* configurable number of [threads](config/threads.md)
* reading from kernel
* processing messages from kernel
* readdir concurrency
* memory
* 1MB+ per reader thread + inflight processing for messages
depending on [fuse_msg_size](config/fuse_msg_size.md)
* buffers allocated temporarily for reading directories
* [gidcache](faq/technical_behavior_and_limitations.md#how-does-mergerfs-handle-credentials)
* FUSE nodes
* noforget forgotten nodes
## Management
* To limit the risk of the Linux kernel's OOM Killer targeting
mergerfs it sets its
[oom_score_adj](https://man7.org/linux/man-pages/man5/proc_pid_oom_score_adj.5.html)
value to -990.
* mergerfs increases [its available file descriptor and file size
limit.](https://www.man7.org/linux/man-pages/man3/setrlimit.3p.html)
* mergerfs raises its [scheduling
priority](https://man7.org/linux/man-pages/man3/setpriority.3p.html)
to a nice value of -10 ([by default](config/options.md))
* The [readahead](config/readahead.md) values of mergerfs itself and
managed filesystems can be modified.
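
The settings listed above can be inspected from outside the process. The following is a rough sketch for Linux using only the Python standard library; `read_oom_score_adj` and `describe_self` are hypothetical helper names, and the second function reports on the calling process purely to show which values are involved.

```python
import os
import resource
from typing import Optional


def read_oom_score_adj(pid: int) -> Optional[int]:
    """Return /proc/<pid>/oom_score_adj, or None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/oom_score_adj") as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def describe_self() -> dict:
    """Collect the same kinds of settings for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return {
        "oom_score_adj": read_oom_score_adj(os.getpid()),
        "nofile_soft": soft,
        "nofile_hard": hard,
        "nice": os.getpriority(os.PRIO_PROCESS, 0),
    }
```

Passing the pid of a running mergerfs process to `read_oom_score_adj` should return the value described above, assuming the process was able to set it.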

3
mkdocs/docs/support.md

@ -22,7 +22,8 @@ directly.](mailto:support@spawn.link)**
* [Information about the broader problem along with any attempted
solutions.](https://xyproblem.info)
* Solution already ruled out and why.
* The details from the output of the `mergerfs.collect-info` tool.
* The details from the output of the
[mergerfs.collect-info](tooling.md#mergerfscollect-info) tool.
* Alternatively:
* Version of mergerfs: `mergerfs --version`
* mergerfs settings / arguments: from fstab, systemd unit, command

101
mkdocs/docs/tooling.md

@ -1,17 +1,64 @@
# Tooling
## mergerfs.collect-info
A tool included in recent releases of `mergerfs` which collects
details about your system and configuration to help with providing
[support](support.md).
```text
$ mergerfs.collect-info
* Please have mergerfs mounted before running this tool.
* Upload the following file to your GitHub ticket or put on https://pastebin.com when requesting support.
* /tmp/mergerfs.info.txt
```
## fsck.mergerfs
A tool to help diagnose and solve mergerfs pool issues. Primarily
related to mismatched permissions and ownership.
```text
$ fsck.mergerfs --help
fsck.mergerfs: A tool to help diagnose and solve mergerfs pool issues
USAGE: fsck.mergerfs [OPTIONS] path
POSITIONALS:
path TEXT:DIR REQUIRED mergerfs path
OPTIONS:
-h, --help Print this help message and exit
--fix TEXT:{none,manual,newest,largest} [none]
Will attempt to 'fix' the problem by chown+chmod or copying files
based on a selected file.
* none: Do nothing. Just print details.
* manual: User selects source file.
* newest: Use file with most recent mtime.
* largest: Use file with largest size.
--check-size BOOLEAN [false]
Considers file size in calculating differences
--copy-file BOOLEAN [false]
Copy file rather than chown/chmod to fix
```
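
The `newest` and `largest` choices in the help text above amount to picking one copy of a file as the reference. A minimal sketch of that selection, assuming the copies are given as a list of paths; `choose_source` is a hypothetical helper and not how `fsck.mergerfs` is necessarily implemented:

```python
import os
from typing import Sequence


def choose_source(paths: Sequence[str], strategy: str) -> str:
    """Pick the copy that the other copies would be fixed to match."""
    if strategy == "newest":
        # File with the most recent modification time.
        return max(paths, key=lambda p: os.stat(p).st_mtime)
    if strategy == "largest":
        # File with the largest size in bytes.
        return max(paths, key=lambda p: os.stat(p).st_size)
    raise ValueError(f"unsupported strategy: {strategy}")
```

The `none` and `manual` modes are left out because they involve no automatic selection.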
## preload.so
EXPERIMENTAL
For some time there has been work to enable passthrough IO in
FUSE. Passthrough IO would allow for near native performance with
regards to reads and writes (at the expense of certain mergerfs
features.) In Linux v6.9 that feature made its way into the kernel
however in a somewhat limited form which is incompatible with aspects
of how mergerfs currently functions. While work will continue to
support passthrough IO in mergerfs this library was created to offer
similar functionality in a more limited way.
For some time there has been work to enable
[passthrough](config/passthrough.md) IO in FUSE. Passthrough IO would
allow for near native performance with regards to reads and writes (at
the expense of certain mergerfs features.) In Linux v6.9 that feature
made its way into the kernel however in a somewhat limited form which
is incompatible with aspects of how mergerfs functions and took longer
to implement as a result. This preloadable library was created as an
alternative to the FUSE [passthrough](config/passthrough.md)
integration.
`/usr/lib/mergerfs/preload.so`
@ -24,11 +71,13 @@ on, reopens the file on the underlying filesystem and returns that
instead. Meaning that you will get native read/write performance
because mergerfs is no longer part of the workflow. Keep in mind that
this also means certain mergerfs features that work by interrupting
the read/write workflow, such as `moveonenospc`, will no longer work.
the read/write workflow, such as
[moveonenospc](config/moveonenospc.md), will no longer work.
Also, understand that this will only work on dynamically linked
software. Anything statically compiled will not work. Many GoLang and
Rust apps are statically compiled.
software (specifically, software dynamically linked against the same
general libc version as the library itself). Anything statically
compiled will not work. Many GoLang and Rust apps are statically compiled.
The library will not interfere with non-mergerfs filesystems. The
library is written to always fallback to returning the mergerfs opened
@ -41,6 +90,7 @@ Thank you to
[nohajc](https://github.com/nohajc/mergerfs-io-passthrough) for
prototyping the idea.
### casual usage
```sh
@ -49,7 +99,8 @@ LD_PRELOAD=/usr/lib/mergerfs/preload.so touch /mnt/mergerfs/filename
Or run `export LD_PRELOAD=/usr/lib/mergerfs/preload.so` in your shell
or place it in your shell config file to have it be picked up by all
software ran from your shell.
software run from your shell. For instance, in `bash`, add the export
to `.bashrc`.
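
When launching software from a script rather than a shell, the same variable can be set per process. A small sketch in Python, assuming the library path shown above; `preload_env` and `run_with_preload` are hypothetical helper names:

```python
import os
import subprocess
from typing import Dict, List, Mapping

PRELOAD_LIB = "/usr/lib/mergerfs/preload.so"


def preload_env(base: Mapping[str, str], lib: str = PRELOAD_LIB) -> Dict[str, str]:
    """Return a copy of the environment with lib first in LD_PRELOAD."""
    env = dict(base)
    existing = env.get("LD_PRELOAD", "").split()
    if lib not in existing:
        existing.insert(0, lib)
    env["LD_PRELOAD"] = " ".join(existing)
    return env


def run_with_preload(args: List[str]) -> subprocess.CompletedProcess:
    """Run a command with the preload library added to its environment."""
    return subprocess.run(args, env=preload_env(os.environ), check=False)
```

For example, `run_with_preload(["touch", "/mnt/mergerfs/filename"])` mirrors the one-off shell invocation without affecting other processes.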
### Docker and Podman usage
@ -104,14 +155,18 @@ Environment=LD_PRELOAD=/usr/lib/mergerfs/preload.so
## Misc
- https://github.com/trapexit/mergerfs-tools
- mergerfs.ctl: A tool to make it easier to query and configure mergerfs at runtime
- mergerfs.fsck: Provides permissions and ownership auditing and the ability to fix them
- mergerfs.dedup: Will help identify and optionally remove duplicate files
- mergerfs.dup: Ensure there are at least N copies of a file across the pool
- mergerfs.balance: Rebalance files across filesystems by moving them from the most filled to the least filled
- mergerfs.consolidate: move files within a single mergerfs directory to the filesystem with most free space
- https://github.com/trapexit/scorch
- scorch: A tool to help discover silent corruption of files and keep track of files
- https://github.com/trapexit/bbf
- bbf (bad block finder): a tool to scan for and 'fix' hard drive bad blocks and find the files using those blocks
* https://github.com/trapexit/mergerfs-tools
* **Keep in mind these tools are more to provide examples of custom
behaviors that can be built on top of mergerfs. They may not
have all the features you are looking for.**
* mergerfs.ctl: A tool to make it easier to query and configure mergerfs at runtime
* mergerfs.fsck: Provides permissions and ownership auditing and
the ability to fix them (should use `fsck.mergerfs` instead)
* mergerfs.dedup: Will help identify and optionally remove duplicate files
* mergerfs.dup: Ensure there are at least N copies of a file across the pool
* mergerfs.balance: Rebalance files across filesystems by moving them from the most filled to the least filled
* mergerfs.consolidate: move files within a single mergerfs directory to the filesystem with most free space
* https://github.com/trapexit/scorch
* scorch: A tool to help discover silent corruption of files and keep track of files
* https://github.com/trapexit/bbf
* bbf (bad block finder): a tool to scan for and 'fix' hard drive bad blocks and find the files using those blocks

3
mkdocs/mkdocs.yml

@ -95,6 +95,7 @@ nav:
- config/export-support.md
- config/kernel-permissions-check.md
- error_handling_and_logging.md
- resource_usage.md
- runtime_interface.md
- remote_filesystems.md
- tips_notes.md
@ -103,7 +104,7 @@ nav:
- benchmarking.md
- performance.md
- tooling.md
- usage_patterns.md
- extended_usage_patterns.md
- FAQ:
- faq/why_isnt_it_working.md
- faq/reliability_and_scalability.md
