
Misc updates of docs (#1518)

pull/1519/head
trapexit committed 1 month ago via GitHub
commit 5becb334cd
1. mkdocs/docs/config/cache.md (8 changes)
2. mkdocs/docs/config/passthrough.md (26 changes)
3. mkdocs/docs/extended_usage_patterns.md (17 changes)
4. mkdocs/docs/intro_to_filesystems.md (2 changes)
5. mkdocs/docs/quickstart.md (9 changes)
6. tools/mergerfs.percent-full-mover (10 changes)

mkdocs/docs/config/cache.md (8 changes)

@@ -11,7 +11,8 @@ works for mergerfs itself. Not the underlying filesystems.
 * `cache.files=full`: Enables page caching. Files are cached across
   opens.
 * `cache.files=auto-full`: Enables page caching. Files are cached
-  across opens if mtime and size are unchanged since previous open.
+  across opens if mtime and size are unchanged since previous
+  open. Cache is dropped if mtime or size change on open.
 * `cache.files=per-process`: Enable page caching (equivalent to
   `cache.files=partial`) only for processes whose 'comm' name matches
   one of the values defined in cache.files.process-names. If the name
@@ -40,6 +41,11 @@ transparently enable page caching when mmap is requested. This means
 it should be safe to set `cache.files=off`. However, on Linux v6.5 and
 below you will need to configure `cache.files` as you need.
+
+If [passthrough](passthrough.md) is enabled so must be page
+caching. mergerfs will set `cache.files=auto-full` if `passthrough` is
+enabled. And when using `passthrough` there is no double page
+caching since it is in fact passing through the IO.
+
 [^1]: This is not unique to mergerfs and affects all FUSE
     filesystems. It is something that the FUSE community hopes to
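
The added paragraph describes behavior that can be seen at mount time. A minimal sketch, assuming hypothetical branch and mount paths; `passthrough` and `cache.files` are real mergerfs options, but the values shown are only an example:

    # Hypothetical mount: enabling passthrough requires page caching,
    # so mergerfs will force cache.files=auto-full if it is not set.
    mergerfs -o passthrough=rw,cache.files=auto-full \
        /mnt/disk1:/mnt/disk2 /mnt/pool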

mkdocs/docs/config/passthrough.md (26 changes)

@@ -3,10 +3,10 @@
 * default: `off`
 * arguments:
     * `off`: Passthrough is never enabled.
-    * `ro`: Only enable passthrough when file opened for reading only.
-    * `wo`: Only enable passthrough when file opened for writing only.
-    * `rw`: Enable passthrough when file opened for reading, writing,
-      or both.
+    * `ro`: Only enable IO passthrough when file opened for reading only.
+    * `wo`: Only enable IO passthrough when file opened for writing only.
+    * `rw`: Enable IO passthrough when file opened for reading, writing,
+      or both.

 In [Linux 6.9](https://kernelnewbies.org/Linux_6.9#Faster_FUSE_I.2FO)
 an IO passthrough feature was added to FUSE. Typically `mergerfs` has
@@ -57,7 +57,7 @@ file opened. However, at the moment there is no use case for picking
 and choosing which to enable outside `cache.files=per-process` (which
 is largely unnecessary on Linux v6.6 and above. See
 [direct-io-allow-mmap](options.md)) If such a use case arises please
-reach out to the author to discuss.
+[reach out to the author](../support.md) to discuss.

 Unlike [preload.so](../tooling.md#preloadso), `passthrough` will work for
 any software interacting with `mergerfs`. However, `passthrough`
@@ -67,10 +67,18 @@ requires Linux v6.9 or above to work.
 `root` as currently only `root` is allowed to leverage the kernel
 feature.

-**NOTE:** If a file has been opened and passthrough enabled, while that
-file is open, if another open request is made `mergerfs` must also
-enable `passthrough` for the second open request. This is a limitation
-of how the passthrough feature works.
+**NOTE:** If a file has been opened and `passthrough` enabled, while
+that file is open, if another open request is made `mergerfs` must
+also enable `passthrough` for the second open request. This is a
+limitation of how the passthrough feature works, though there is no
+known use case where this is useful.
+
+**NOTE:** In order to add the `passthrough` feature to `mergerfs` it
+was necessary to remove the "feature" where mergerfs could open the
+same file on different branches, such as using `func.open=rand` and
+having multiple files at the same relative path across different
+branches. This "feature" was very rarely used and it was impossible
+to support `passthrough` without changing the behavior.

 ## Alternatives
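
As a usage illustration for the `ro`/`wo`/`rw` arguments above, a minimal sketch of a mount entry; the branch and mount paths are hypothetical, and per the doc passthrough requires Linux v6.9+ with mergerfs running as `root`:

    # Hypothetical /etc/fstab entry enabling IO passthrough for
    # read-only opens; page caching must also be enabled.
    /mnt/hdd0:/mnt/hdd1  /mnt/pool  fuse.mergerfs  passthrough=ro,cache.files=auto-full  0 0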

mkdocs/docs/extended_usage_patterns.md (17 changes)

@@ -14,11 +14,11 @@ bottlenecked by their network, internet connection, or limited size of
 the cache. However, there are a few situations where a tiered cache
 setup could help.

-1. Fast network, slow filesystems, many readers: You've a 10+Gbps
+1. Fast network, slow filesystems, many readers: You've a 10Gbps+
    network with many readers and your regular filesystems can't keep
    up.
 2. Fast network, slow filesystems, small'ish bursty writes: You have
-   a 10+Gbps network and wish to transfer amounts of data less than
+   a 10Gbps+ network and wish to transfer amounts of data less than
    your cache filesystem but wish to do so quickly and the time
    between bursts is long enough to migrate data.
@@ -27,8 +27,8 @@ level that can aggregate performance or using higher performance
 storage would probably be the better solution. If you're going to use
 mergerfs there are other tactics that may help: spreading the data
 across filesystems (see the mergerfs.dup tool) and setting
-`func.open=rand`, using `symlinkify`, or using dm-cache or a similar
-technology to add tiered cache to the underlying device itself.
+`func.open=rand` or using dm-cache or a similar technology to add
+tiered cache to the underlying device itself.

 With #2 one could use a block cache solution as available via LVM and
 dm-cache but there is another solution which requires only mergerfs, a
@@ -52,12 +52,19 @@ script to move files around, and a cron job to run said script.
 * Set your programs to use the **cache** pool.
 * Configure the **base** pool with the `create` policy you would like
   to lay out files as you like.
-* Save one of the below scripts or create your own. The script's
+* Use monstermuffin's
+  [mergerfs-cache-mover](https://github.com/monstermuffin/mergerfs-cache-mover),
+  one of the scripts below, or create your own. The script's
   responsibility is to move files from the **cache** branches (not
   pool) to the **base** pool.
 * Use `cron` (as root) to schedule the command at whatever frequency
   is appropriate for your workflow.
+
+**NOTE:** Due to the additional overhead it is not recommended to nest
+or otherwise create hierarchies of mergerfs pools. It will work but
+the latency increases will further harm performance, even when using
+passthrough IO or other features.

 ### time based expiring
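
The `cron` step described above might look like the following sketch, wired to the mergerfs.percent-full-mover script updated later in this commit; the install path, schedule, and 70% threshold are hypothetical:

    # Hypothetical root crontab entry: hourly, move the least recently
    # accessed files from the cache filesystem (a branch, not the pool)
    # into the base pool until usage is at or below 70%.
    0 * * * *  /usr/local/bin/mergerfs.percent-full-mover /mnt/cache-fs /mnt/pool-base 70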

mkdocs/docs/intro_to_filesystems.md (2 changes)

@@ -51,6 +51,8 @@ those needing that knowledge.
 * [file descriptor](https://en.wikipedia.org/wiki/File_descriptor): A
   handle used by software, provided by the operating system, to
   reference open files.
+* [mmap](https://en.wikipedia.org/wiki/Mmap): A way to abstract access
+  to a file by making it appear as a region of memory.

 ## Files

mkdocs/docs/quickstart.md (9 changes)

@@ -30,10 +30,11 @@ caching](config/cache.md) was disabled (ie:
 `cache.files=off`). However, it now will enable page caching if needed
 for a particular file if `mmap` is requested.

-`mmap` is needed by certain software to read and write to a
-file. However, many software could work without it and fail to have
-proper error handling. Many programs that use sqlite3 will require
-`mmap` despite [sqlite3 working perfectly
+[mmap](https://en.wikipedia.org/wiki/Mmap) is needed by certain
+software to read and write to a file. However, much software could
+work without it but fails to have proper error handling for when it
+is unavailable. Many programs that use **sqlite3** will require `mmap`
+despite [sqlite3 working perfectly
 fine](known_issues_bugs.md#sqlite3-plex-jellyfin-do-not-work-with-mergerfs)
 without it (and in some cases can be more performant with regular file
 IO.)
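
Tied to the paragraph above: on kernels where mergerfs cannot transparently satisfy `mmap` with page caching off (per the cache doc, Linux v6.5 and below), a common workaround sketch is to enable page caching at mount time; the option name is a real mergerfs option, the paths are hypothetical:

    # Hypothetical mount for older kernels so mmap-dependent software
    # (e.g. sqlite3-backed programs) works: enable page caching.
    mergerfs -o cache.files=auto-full /mnt/disk1:/mnt/disk2 /mnt/pool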

tools/mergerfs.percent-full-mover (10 changes)

@@ -7,13 +7,13 @@ fi
 CACHEFS="${1}"
 BASEPOOL="${2}"
-PERCENTAGE=${3}
+PERCENTAGE="${3}"

 set -o errexit

-while [ $(df "${CACHE}" | tail -n1 | awk '{print $5}' | cut -d'%' -f1) -gt ${PERCENTAGE} ]
+while [ $(df "${CACHEFS}" | tail -n1 | awk '{print $5}' | cut -d'%' -f1) -gt ${PERCENTAGE} ]
 do
     # Find the file with the oldest access time
-    FILE=$(find "${CACHE}" -type f -printf '%A@ %P\n' | \
+    FILE=$(find "${CACHEFS}" -type f -printf '%A@ %P\n' | \
               sort | \
               head -n 1 | \
               cut -d' ' -f2-)
@@ -32,6 +32,6 @@ do
         --remove-source-files \
         --relative \
         --log-file=/tmp/mergerfs-cache-rsync.log \
-        "${CACHE}/./${FILE}" \
-        "${BACKING}/"
+        "${CACHEFS}/./${FILE}" \
+        "${BASEPOOL}/"
 done
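
After this change the script's positional arguments are the cache filesystem, the base pool, and the target fill percentage. A hedged invocation example with hypothetical paths:

    # Move the oldest-accessed files out of /mnt/cache-fs until it is
    # at or below 70% full, rsync'ing them into the base pool.
    sudo /usr/local/bin/mergerfs.percent-full-mover /mnt/cache-fs /mnt/pool-base 70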