# CACHING

#### page caching

https://en.wikipedia.org/wiki/Page_cache

- cache.files=off: Disables page caching. Underlying files cached,
  mergerfs files are not.
- cache.files=partial: Enables page caching. Underlying files cached,
  mergerfs files cached while open.
- cache.files=full: Enables page caching. Underlying files cached,
  mergerfs files cached across opens.
- cache.files=auto-full: Enables page caching. Underlying files
  cached, mergerfs files cached across opens if mtime and size are
  unchanged since previous open.
- cache.files=libfuse: follow traditional libfuse `direct_io`,
  `kernel_cache`, and `auto_cache` arguments.
- cache.files=per-process: Enable page caching (equivalent to
  `cache.files=partial`) only for processes whose 'comm' name matches
  one of the values defined in `cache.files.process-names`. If the
  name does not match, the file open is equivalent to
  `cache.files=off`.

FUSE, which mergerfs uses, offers a number of page caching
modes. mergerfs tries to simplify their use via the `cache.files`
option. It can and should replace usage of `direct_io`,
`kernel_cache`, and `auto_cache`.

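For example, a hypothetical mount using the per-process mode; the
mount point, branch paths, and process names below are placeholders:

```sh
# Only processes whose 'comm' name matches an entry in the
# pipe-separated cache.files.process-names list get page caching;
# opens by all other processes behave as cache.files=off.
mergerfs -o cache.files=per-process,cache.files.process-names='rtorrent|qbittorrent-nox' \
         /mnt/hdd0:/mnt/hdd1 /media/pool
```
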
Because mergerfs uses FUSE, and is therefore a userland process
proxying existing filesystems, the kernel will double cache the
content being read and written through mergerfs: once for the
underlying filesystem and once for mergerfs (the kernel sees them as
two separate entities). Using `cache.files=off` will keep the double
caching from happening by disabling caching of mergerfs, but this has
side effects: _all_ read and write calls will be passed to mergerfs,
which may be slower than enabling caching; you lose shared `mmap`
support, which can affect apps such as rtorrent; and no read-ahead
will take place. The kernel will still cache the underlying
filesystem data but that only helps so much given mergerfs will still
process all requests.

If you do enable file page caching,
`cache.files=partial|full|auto-full`, you should also enable
`dropcacheonclose`, which will cause mergerfs to instruct the kernel
to flush the underlying file's page cache when the file is
closed. This behavior is the same as the rsync fadvise / drop cache
patch and Feh's nocache project.

If most files are read once through and then closed (like media), it
is best to enable `dropcacheonclose` regardless of caching mode in
order to minimize buffer bloat.

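As a sketch, using the same placeholder paths as above:

```sh
# Cache mergerfs files across opens, but have the kernel drop the
# underlying file's page cache when each file is closed.
mergerfs -o cache.files=auto-full,dropcacheonclose=true \
         /mnt/hdd0:/mnt/hdd1 /media/pool
```
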
It is difficult to balance memory usage, cache bloat & duplication,
and performance. Ideally, mergerfs would be able to disable caching
for the files it reads/writes but allow page caching for itself. That
would limit the FUSE overhead. However, there isn't a good way to
achieve this. It would need to open all files with O_DIRECT, which
places limitations on which underlying filesystems could be supported
and complicates the code.

kernel documentation: https://www.kernel.org/doc/Documentation/filesystems/fuse-io.txt

#### entry & attribute caching

Given the relatively high cost of FUSE due to the kernel <-> userspace
round trips, there are kernel-side caches for file entries and
attributes. The entry cache limits the `lookup` calls to mergerfs,
which ask whether a file exists. The attribute cache limits the need
to make `getattr` calls to mergerfs, which provide file attributes
(mode, size, type, etc.). As with the page cache, these should not be
used if the underlying filesystems are being manipulated at the same
time, as it could lead to odd behavior or data corruption. The options
for setting these are `cache.entry` and `cache.negative_entry` for the
entry cache and `cache.attr` for the attribute
cache. `cache.negative_entry` refers to the timeout for negative
responses to lookups (non-existent files).

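A minimal sketch with placeholder paths; the timeout values (in
seconds) are arbitrary examples, not recommendations:

```sh
# Cache positive lookups and attributes for 1 second; do not cache
# negative (non-existent file) lookup responses.
mergerfs -o cache.entry=1,cache.attr=1,cache.negative_entry=0 \
         /mnt/hdd0:/mnt/hdd1 /media/pool
```
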
#### writeback caching

When `cache.files` is enabled the default is for it to perform
writethrough caching. This behavior won't help improve performance as
each write still goes one for one through the filesystem. By enabling
the FUSE writeback cache, small writes may be aggregated by the kernel
and then sent to mergerfs as one larger request. This can greatly
improve the throughput for apps which write to files
inefficiently. The amount the kernel can aggregate is limited by the
size of a FUSE message. Read the `fuse_msg_size` section for more
details.

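A sketch of enabling it, with placeholder paths; writeback caching
only takes effect when page caching is enabled and the kernel
supports it:

```sh
# Page caching plus kernel writeback caching to aggregate small writes.
mergerfs -o cache.files=full,cache.writeback=true \
         /mnt/hdd0:/mnt/hdd1 /media/pool
```
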
There is a small side effect as a result of enabling writeback
caching. Underlying files won't ever be opened with O_APPEND or
O_WRONLY. The former because the kernel then manages append mode and
the latter because the kernel may request file data from mergerfs to
populate the write cache. The O_APPEND change means that if a file is
changed outside of mergerfs it could lead to corruption as the kernel
won't know the end of the file has changed. That said, any time you
use caching you should refrain from using the same file outside of
mergerfs at the same time.

Note that if an application is properly sizing writes then writeback
caching will have little or no effect. It will only help with writes
of sizes below the FUSE message size (128K on older kernels, 1M on
newer).

#### statfs caching

Of the syscalls used by mergerfs in policies, the `statfs` /
`statvfs` call is perhaps the most expensive. It's used to find out
the available space of a filesystem and whether it is mounted
read-only. Depending on the setup and usage pattern these queries can
be relatively costly. When `cache.statfs` is enabled, all calls to
`statfs` by a policy will be cached for the number of seconds it is
set to.

Example: If the create policy is `mfs` and the timeout is 60 then for
that 60 seconds the same filesystem will be returned as the target for
creates because the available space won't be updated for that time.

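A sketch matching that example, with placeholder paths:

```sh
# Cache statfs/statvfs results for 60 seconds.
mergerfs -o category.create=mfs,cache.statfs=60 \
         /mnt/hdd0:/mnt/hdd1 /media/pool
```
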
#### symlink caching

As of version 4.20 Linux supports symlink caching. Significant
performance increases can be had in workloads which use a lot of
symlinks. Setting `cache.symlinks=true` will result in requesting
symlink caching from the kernel only if supported. As a result it's
safe to enable it on systems prior to 4.20. That said, it is disabled
by default for now. You can see if caching is enabled by querying the
xattr `user.mergerfs.cache.symlinks`, but given it must be requested
at startup you cannot change it at runtime.

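Assuming a mount at the placeholder path /media/pool and that runtime
settings are exposed through mergerfs' control file, the value can be
read with getfattr:

```sh
# Read-only at runtime; reflects whether the kernel granted symlink
# caching at startup.
getfattr -n user.mergerfs.cache.symlinks /media/pool/.mergerfs
```
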
#### readdir caching

As of version 4.20 Linux supports readdir caching. This can have a
significant impact on directory traversal, especially when combined
with entry (`cache.entry`) and attribute (`cache.attr`)
caching. Setting `cache.readdir=true` will result in requesting
readdir caching from the kernel on each `opendir`. If the kernel
doesn't support readdir caching, setting the option to `true` has no
effect. This option is configurable at runtime via the xattr
`user.mergerfs.cache.readdir`.

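For example, assuming the same placeholder mount point, it can be
toggled and checked through the control file:

```sh
# Enable readdir caching at runtime, then confirm the setting.
setfattr -n user.mergerfs.cache.readdir -v true /media/pool/.mergerfs
getfattr -n user.mergerfs.cache.readdir /media/pool/.mergerfs
```
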
#### tiered caching

Some storage technologies support what some call "tiered" caching:
the placing of usually smaller, faster storage as a transparent cache
in front of larger, slower storage. NVMe, SSD, or Optane in front of
traditional HDDs, for instance.

mergerfs does not natively support any sort of tiered caching. Most
users have no use for such a feature and its inclusion would
complicate the code. However, there are a few situations where a
cache filesystem could help with a typical mergerfs setup.

1. Fast network, slow filesystems, many readers: You have a 10+Gbps
   network with many readers and your regular filesystems can't keep
   up.
2. Fast network, slow filesystems, small'ish bursty writes: You have
   a 10+Gbps network and wish to transfer amounts of data smaller
   than your cache filesystem but wish to do so quickly.

With #1 it's arguable whether you should be using mergerfs at
all. RAID would probably be the better solution. If you're going to
use mergerfs there are other tactics that may help: spreading the
data across filesystems (see the mergerfs.dup tool) and setting
`func.open=rand`, using `symlinkify`, or using dm-cache or a similar
technology to add a tiered cache to the underlying device.

With #2 one could use dm-cache as well, but there is another solution
which requires only mergerfs and a cronjob.

1. Create 2 mergerfs pools. One which includes just the slow devices
   and one which has both the fast devices (SSD, NVMe, etc.) and slow
   devices. (A sketch of possible mounts follows this list.)
2. The 'cache' pool should have the cache filesystems listed first.
3. The best `create` policies to use for the 'cache' pool would
   probably be `ff`, `epff`, `lfs`, or `eplfs`. The latter two under
   the assumption that the cache filesystem(s) are far smaller than
   the backing filesystems. If using path preserving policies
   remember that you'll need to manually create the core directories
   of those paths you wish to be cached. Be sure the permissions are
   in sync. Use `mergerfs.fsck` to check / correct them. You could
   also set the slow filesystems' mode to `NC`, though that'd mean if
   the cache filesystems fill you'd get "out of space" errors.
4. Enable `moveonenospc` and set `minfreespace` appropriately. To
   make sure there is enough room on the "slow" pool you might want
   to set `minfreespace` to at least as large as the size of the
   largest cache filesystem if not larger. This way in the worst case
   the whole of the cache filesystem(s) can be moved to the other
   drives.
5. Set your programs to use the cache pool.
6. Save one of the below scripts or create your own.
7. Use `cron` (as root) to schedule the command at whatever frequency
   is appropriate for your workflow.

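As a sketch of steps 1-4, assuming /mnt/ssd0 is the cache filesystem
and /mnt/hdd0 and /mnt/hdd1 are the slow ones (all paths and the
minfreespace value are placeholders):

```sh
# /etc/fstab (illustrative)
# "slow" pool: backing filesystems only
/mnt/hdd0:/mnt/hdd1 /media/slow fuse.mergerfs category.create=mfs 0 0
# "cache" pool: cache filesystem listed first so `ff` targets it;
# moveonenospc/minfreespace sized per step 4
/mnt/ssd0:/mnt/hdd0:/mnt/hdd1 /media/cache fuse.mergerfs category.create=ff,moveonenospc=true,minfreespace=512G 0 0
```
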
##### time based expiring

Move files from the cache to the backing pool based only on the last
time the file was accessed. Replace `-atime` with `-amin` if you want
minutes rather than days. You may want to use the `fadvise` /
`--drop-cache` version of rsync or run rsync with the tool "nocache".

_NOTE:_ The arguments to these scripts include the cache
**filesystem** itself, not the pool containing the cache
filesystem. You could have data loss if the source is the cache pool.

[mergerfs.time-based-mover](https://raw.githubusercontent.com/trapexit/mergerfs/refs/heads/latest-release/tools/mergerfs.time-based-mover)

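The linked tool is the reference; a minimal sketch of the same idea,
with placeholder paths and an arbitrary 30 day threshold, might look
like:

```sh
#!/usr/bin/env sh
# Sketch only: move files not accessed in 30+ days from the cache
# *filesystem* (not the cache pool!) into the slow pool.
CACHE_FS='/mnt/ssd0'     # the cache filesystem itself
SLOW_POOL='/media/slow'  # the pool of just the slow filesystems
N_DAYS=30

# %P prints paths relative to CACHE_FS, which rsync --files-from
# resolves against the source directory.
find "${CACHE_FS}" -type f -atime +"${N_DAYS}" -printf '%P\n' | \
  rsync --files-from=- -axqHAXWES --preallocate --remove-source-files \
        "${CACHE_FS}/" "${SLOW_POOL}/"
```

A root crontab entry running it hourly or nightly (step 7 above)
completes the setup.
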
##### percentage full expiring

Move the oldest file from the cache to the backing pool. Continue
until below the percentage threshold.

_NOTE:_ The arguments to these scripts include the cache
**filesystem** itself, not the pool containing the cache
filesystem. You could have data loss if the source is the cache pool.

[mergerfs.percent-full-mover](https://raw.githubusercontent.com/trapexit/mergerfs/refs/heads/latest-release/tools/mergerfs.percent-full-mover)
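
Likewise, a minimal sketch of this approach, assuming a GNU userland
and the same placeholder paths (the threshold is arbitrary):

```sh
#!/usr/bin/env sh
# Sketch only: while the cache *filesystem* is above PERCENT full,
# move its least recently accessed file into the slow pool.
CACHE_FS='/mnt/ssd0'
SLOW_POOL='/media/slow'
PERCENT=80

while [ "$(df --output=pcent "${CACHE_FS}" | tail -n1 | tr -d ' %')" -gt "${PERCENT}" ]
do
  # Oldest access time first; %A@ is the epoch atime, %P the relative path.
  FILE=$(find "${CACHE_FS}" -type f -printf '%A@ %P\n' | \
         sort -n | head -n1 | cut -d' ' -f2-)
  [ -z "${FILE}" ] && break
  # The /./ marker tells rsync --relative where the preserved path begins.
  rsync -axqHAXWES --preallocate --remove-source-files --relative \
        "${CACHE_FS}/./${FILE}" "${SLOW_POOL}/"
done
```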