Browse Source

Add "per-process" file caching feature

Allows chosing page caching based on process name.
pull/1139/head
Antonio SJ Musumeci 2 years ago
parent
commit
9effeb6869
  1. 80
      README.md
  2. 8
      src/config.cpp
  3. 2
      src/config.hpp
  4. 4
      src/config_cachefiles.cpp
  5. 3
      src/config_cachefiles.hpp
  6. 43
      src/config_set.cpp
  7. 35
      src/config_set.hpp
  8. 41
      src/fs_openat.hpp
  9. 34
      src/fuse_create.cpp
  10. 35
      src/fuse_open.cpp
  11. 3
      src/mergerfs.cpp
  12. 54
      src/procfs_get_name.cpp
  13. 27
      src/procfs_get_name.hpp
  14. 32
      src/str.cpp
  15. 9
      src/str.hpp

80
README.md

@ -1,6 +1,6 @@
% mergerfs(1) mergerfs user manual
% Antonio SJ Musumeci <trapexit@spawn.link>
% 2023-01-29
% 2023-02-16
# NAME
@ -124,7 +124,7 @@ These options are the same regardless of whether you use them with the `mergerfs
* **dropcacheonclose=BOOL**: When a file is requested to be closed
call `posix_fadvise` on it first to instruct the kernel that we no
longer need the data and it can drop its cache. Recommended when
**cache.files=partial|full|auto-full** to limit double
**cache.files=partial|full|auto-full|per-process** to limit double
caching. (default: false)
* **symlinkify=BOOL**: When enabled and a file is not writable and its
mtime or ctime is older than **symlinkify_timeout** files will be
@ -224,8 +224,12 @@ These options are the same regardless of whether you use them with the `mergerfs
seconds. (default: 1)
* **cache.negative_entry=UINT**: Negative file name lookup cache
timeout in seconds. (default: 0)
* **cache.files=libfuse|off|partial|full|auto-full**: File page
caching mode (default: libfuse)
* **cache.files=libfuse|off|partial|full|auto-full|per-process**: File
page caching mode (default: libfuse)
* **cache.files.process-names=LIST**: A pipe | delimited list of
process [comm](https://man7.org/linux/man-pages/man5/proc.5.html)
names to enable page caching for when
`cache.files=per-process`. (default: "rtorrent|qbittorrent-nox")
* **cache.writeback=BOOL**: Enable kernel writeback caching (default:
false)
* **cache.symlinks=BOOL**: Cache symlinks (if supported by kernel)
@ -771,23 +775,57 @@ While they won't show up when using `getfattr` **mergerfs** offers a number of s
https://en.wikipedia.org/wiki/Page_cache
tl;dr:
* cache.files=off: Disables page caching. Underlying files cached, mergerfs files are not.
* cache.files=partial: Enables page caching. Underlying files cached, mergerfs files cached while open.
* cache.files=full: Enables page caching. Underlying files cached, mergerfs files cached across opens.
* cache.files=auto-full: Enables page caching. Underlying files cached, mergerfs files cached across opens if mtime and size are unchanged since previous open.
* cache.files=libfuse: follow traditional libfuse `direct_io`, `kernel_cache`, and `auto_cache` arguments.
FUSE, which mergerfs uses, offers a number of page caching modes. mergerfs tries to simplify their use via the `cache.files` option. It can and should replace usage of `direct_io`, `kernel_cache`, and `auto_cache`.
Due to mergerfs using FUSE and therefore being a userland process proxying existing filesystems the kernel will double cache the content being read and written through mergerfs. Once from the underlying filesystem and once from mergerfs (it sees them as two separate entities). Using `cache.files=off` will keep the double caching from happening by disabling caching of mergerfs but this has the side effect that *all* read and write calls will be passed to mergerfs which may be slower than enabling caching, you lose shared `mmap` support which can affect apps such as rtorrent, and no read-ahead will take place. The kernel will still cache the underlying filesystem data but that only helps so much given mergerfs will still process all requests.
If you do enable file page caching, `cache.files=partial|full|auto-full`, you should also enable `dropcacheonclose` which will cause mergerfs to instruct the kernel to flush the underlying file's page cache when the file is closed. This behavior is the same as the rsync fadvise / drop cache patch and Feh's nocache project.
If most files are read once through and closed (like media) it is best to enable `dropcacheonclose` regardless of caching mode in order to minimize buffer bloat.
It is difficult to balance memory usage, cache bloat & duplication, and performance. Ideally mergerfs would be able to disable caching for the files it reads/writes but allow page caching for itself. That would limit the FUSE overhead. However, there isn't a good way to achieve this. It would need to open all files with O_DIRECT which places limitations on the what underlying filesystems would be supported and complicates the code.
* cache.files=off: Disables page caching. Underlying files cached,
mergerfs files are not.
* cache.files=partial: Enables page caching. Underlying files cached,
mergerfs files cached while open.
* cache.files=full: Enables page caching. Underlying files cached,
mergerfs files cached across opens.
* cache.files=auto-full: Enables page caching. Underlying files
cached, mergerfs files cached across opens if mtime and size are
unchanged since previous open.
* cache.files=libfuse: follow traditional libfuse `direct_io`,
`kernel_cache`, and `auto_cache` arguments.
* cache.files=per-process: Enable page caching only for processes
which 'comm' name matches one of the values defined in
`cache.files.process-names`.
FUSE, which mergerfs uses, offers a number of page caching
modes. mergerfs tries to simplify their use via the `cache.files`
option. It can and should replace usage of `direct_io`,
`kernel_cache`, and `auto_cache`.
Due to mergerfs using FUSE and therefore being a userland process
proxying existing filesystems the kernel will double cache the content
being read and written through mergerfs. Once from the underlying
filesystem and once from mergerfs (it sees them as two separate
entities). Using `cache.files=off` will keep the double caching from
happening by disabling caching of mergerfs but this has the side
effect that *all* read and write calls will be passed to mergerfs
which may be slower than enabling caching, you lose shared `mmap`
support which can affect apps such as rtorrent, and no read-ahead will
take place. The kernel will still cache the underlying filesystem data
but that only helps so much given mergerfs will still process all
requests.
If you do enable file page caching,
`cache.files=partial|full|auto-full`, you should also enable
`dropcacheonclose` which will cause mergerfs to instruct the kernel to
flush the underlying file's page cache when the file is closed. This
behavior is the same as the rsync fadvise / drop cache patch and Feh's
nocache project.
If most files are read once through and closed (like media) it is best
to enable `dropcacheonclose` regardless of caching mode in order to
minimize buffer bloat.
It is difficult to balance memory usage, cache bloat & duplication,
and performance. Ideally mergerfs would be able to disable caching for
the files it reads/writes but allow page caching for itself. That
would limit the FUSE overhead. However, there isn't a good way to
achieve this. It would need to open all files with O_DIRECT which
places limitations on the what underlying filesystems would be
supported and complicates the code.
kernel documentation: https://www.kernel.org/doc/Documentation/filesystems/fuse-io.txt

8
src/config.cpp

@ -37,11 +37,13 @@
#define MINFREESPACE_DEFAULT (4294967295ULL)
using std::string;
#define IFERT(S) if(S == s_) return true
const std::string CONTROLFILE = "/.mergerfs";
constexpr static const char CACHE_FILES_PROCESS_NAMES_DEFAULT[] =
"rtorrent|"
"qbittorrent-nox";
Config Config::_singleton;
@ -79,6 +81,7 @@ Config::Config()
cache_attr(1),
cache_entry(1),
cache_files(CacheFiles::ENUM::LIBFUSE),
cache_files_process_names(CACHE_FILES_PROCESS_NAMES_DEFAULT),
cache_negative_entry(0),
cache_readdir(false),
cache_statfs(0),
@ -125,6 +128,7 @@ Config::Config()
_map["cache.attr"] = &cache_attr;
_map["cache.entry"] = &cache_entry;
_map["cache.files"] = &cache_files;
_map["cache.files.process-names"] = &cache_files_process_names;
_map["cache.negative_entry"] = &cache_negative_entry;
_map["cache.readdir"] = &cache_readdir;
_map["cache.statfs"] = &cache_statfs;

2
src/config.hpp

@ -30,6 +30,7 @@
#include "config_statfs.hpp"
#include "config_statfsignore.hpp"
#include "config_xattr.hpp"
#include "config_set.hpp"
#include "enum.hpp"
#include "errno.hpp"
#include "funcs.hpp"
@ -107,6 +108,7 @@ public:
ConfigUINT64 cache_attr;
ConfigUINT64 cache_entry;
CacheFiles cache_files;
ConfigSet cache_files_process_names;
ConfigUINT64 cache_negative_entry;
ConfigBOOL cache_readdir;
ConfigUINT64 cache_statfs;

4
src/config_cachefiles.cpp

@ -36,6 +36,8 @@ CacheFiles::to_string() const
return "full";
case CacheFiles::ENUM::AUTO_FULL:
return "auto-full";
case CacheFiles::ENUM::PER_PROCESS:
return "per-process";
}
return "invalid";
@ -55,6 +57,8 @@ CacheFiles::from_string(const std::string &s_)
_data = CacheFiles::ENUM::FULL;
ef(s_ == "auto-full")
_data = CacheFiles::ENUM::AUTO_FULL;
ef(s_ == "per-process")
_data = CacheFiles::ENUM::PER_PROCESS;
else
return -EINVAL;

3
src/config_cachefiles.hpp

@ -27,7 +27,8 @@ enum class CacheFilesEnum
OFF,
PARTIAL,
FULL,
AUTO_FULL
AUTO_FULL,
PER_PROCESS
};
typedef Enum<CacheFilesEnum> CacheFiles;

43
src/config_set.cpp

@ -0,0 +1,43 @@
/*
ISC License
Copyright (c) 2023, Antonio SJ Musumeci <trapexit@spawn.link>
Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
#include "config_set.hpp"
#include "str.hpp"
ConfigSet::ConfigSet(const std::string &str_)
{
from_string(str_);
}
std::string
ConfigSet::to_string(void) const
{
return str::join(*this,'|');
}
int
ConfigSet::from_string(const std::string &str_)
{
this->clear();
str::split(str_,'|',this);
return 0;
}

35
src/config_set.hpp

@ -0,0 +1,35 @@
/*
ISC License
Copyright (c) 2023, Antonio SJ Musumeci <trapexit@spawn.link>
Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
#pragma once
#include "tofrom_string.hpp"
#include <set>
#include <string>
class ConfigSet : public std::set<std::string>, public ToFromString
{
public:
ConfigSet(const std::string &str);
public:
std::string to_string(void) const final;
int from_string(const std::string &) final;
};

41
src/fs_openat.hpp

@ -0,0 +1,41 @@
/*
ISC License
Copyright (c) 2023, Antonio SJ Musumeci <trapexit@spawn.link>
Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
#pragma once
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
namespace fs
{
static
inline
int
openat(const int dirfd_,
const char *pathname_,
const int flags_)
{
int rv;
rv = ::openat(dirfd_,pathname_,flags_);
return ((rv == -1) ? -errno : rv);
}
}

34
src/fuse_create.cpp

@ -21,6 +21,7 @@
#include "fs_clonepath.hpp"
#include "fs_open.hpp"
#include "fs_path.hpp"
#include "procfs_get_name.hpp"
#include "ugid.hpp"
#include "fuse.h"
@ -28,9 +29,6 @@
#include <string>
#include <vector>
using std::string;
using std::vector;
namespace l
{
@ -56,6 +54,7 @@ namespace l
static
void
config_to_ffi_flags(Config::Read &cfg_,
const int tid_,
fuse_file_info_t *ffi_)
{
switch(cfg_->cache_files)
@ -85,12 +84,29 @@ namespace l
ffi_->keep_cache = 0;
ffi_->auto_cache = 1;
break;
case CacheFiles::ENUM::PER_PROCESS:
std::string proc_name;
proc_name = procfs::get_name(tid_);
if(cfg_->cache_files_process_names.count(proc_name) == 0)
{
ffi_->direct_io = 1;
ffi_->keep_cache = 0;
ffi_->auto_cache = 0;
}
else
{
ffi_->direct_io = 0;
ffi_->keep_cache = 0;
ffi_->auto_cache = 0;
}
break;
}
}
static
int
create_core(const string &fullpath_,
create_core(const std::string &fullpath_,
mode_t mode_,
const mode_t umask_,
const int flags_)
@ -103,7 +119,7 @@ namespace l
static
int
create_core(const string &createpath_,
create_core(const std::string &createpath_,
const char *fusepath_,
const mode_t mode_,
const mode_t umask_,
@ -111,7 +127,7 @@ namespace l
uint64_t *fh_)
{
int rv;
string fullpath;
std::string fullpath;
fullpath = fs::path::make(createpath_,fusepath_);
@ -136,8 +152,8 @@ namespace l
uint64_t *fh_)
{
int rv;
string fullpath;
string fusedirpath;
std::string fullpath;
std::string fusedirpath;
StrVec createpaths;
StrVec existingpaths;
@ -175,7 +191,7 @@ namespace FUSE
const fuse_context *fc = fuse_get_context();
const ugid::Set ugid(fc->uid,fc->gid);
l::config_to_ffi_flags(cfg,ffi_);
l::config_to_ffi_flags(cfg,fc->pid,ffi_);
if(cfg->writeback_cache)
l::tweak_flags_writeback_cache(&ffi_->flags);

35
src/fuse_open.cpp

@ -17,23 +17,22 @@
#include "config.hpp"
#include "errno.hpp"
#include "fileinfo.hpp"
#include "fs_lchmod.hpp"
#include "fs_cow.hpp"
#include "fs_fchmod.hpp"
#include "fs_lchmod.hpp"
#include "fs_open.hpp"
#include "fs_path.hpp"
#include "fs_stat.hpp"
#include "procfs_get_name.hpp"
#include "stat_util.hpp"
#include "ugid.hpp"
#include "fuse.h"
#include <set>
#include <string>
#include <vector>
using std::string;
using std::vector;
namespace l
{
@ -46,7 +45,7 @@ namespace l
static
int
lchmod_and_open_if_not_writable_and_empty(const string &fullpath_,
lchmod_and_open_if_not_writable_and_empty(const std::string &fullpath_,
const int flags_)
{
int rv;
@ -86,7 +85,7 @@ namespace l
case NFSOpenHack::ENUM::GIT:
if(l::rdonly(flags_))
return (errno=EACCES,-1);
if(fullpath_.find("/.git/") == string::npos)
if(fullpath_.find("/.git/") == std::string::npos)
return (errno=EACCES,-1);
return l::lchmod_and_open_if_not_writable_and_empty(fullpath_,flags_);
case NFSOpenHack::ENUM::ALL:
@ -118,6 +117,7 @@ namespace l
static
void
config_to_ffi_flags(Config::Read &cfg_,
const int tid_,
fuse_file_info_t *ffi_)
{
switch(cfg_->cache_files)
@ -147,12 +147,29 @@ namespace l
ffi_->keep_cache = 0;
ffi_->auto_cache = 1;
break;
case CacheFiles::ENUM::PER_PROCESS:
std::string proc_name;
proc_name = procfs::get_name(tid_);
if(cfg_->cache_files_process_names.count(proc_name) == 0)
{
ffi_->direct_io = 1;
ffi_->keep_cache = 0;
ffi_->auto_cache = 0;
}
else
{
ffi_->direct_io = 0;
ffi_->keep_cache = 0;
ffi_->auto_cache = 0;
}
break;
}
}
static
int
open_core(const string &basepath_,
open_core(const std::string &basepath_,
const char *fusepath_,
const int flags_,
const bool link_cow_,
@ -160,7 +177,7 @@ namespace l
uint64_t *fh_)
{
int fd;
string fullpath;
std::string fullpath;
fullpath = fs::path::make(basepath_,fusepath_);
@ -209,7 +226,7 @@ namespace FUSE
const fuse_context *fc = fuse_get_context();
const ugid::Set ugid(fc->uid,fc->gid);
l::config_to_ffi_flags(cfg,ffi_);
l::config_to_ffi_flags(cfg,fc->pid,ffi_);
if(cfg->writeback_cache)
l::tweak_flags_writeback_cache(&ffi_->flags);

3
src/mergerfs.cpp

@ -21,6 +21,7 @@
#include "fs_umount2.hpp"
#include "mergerfs.hpp"
#include "option_parser.hpp"
#include "procfs_get_name.hpp"
#include "resources.hpp"
#include "strvec.hpp"
@ -235,6 +236,8 @@ namespace l
if(cfg->lazy_umount_mountpoint)
l::lazy_umount(cfg->mountpoint);
procfs::init();
return fuse_main(args.argc,
args.argv,
&ops);

54
src/procfs_get_name.cpp

@ -0,0 +1,54 @@
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include "procfs_get_name.hpp"
#include "errno.hpp"
#include "fs_close.hpp"
#include "fs_open.hpp"
#include "fs_openat.hpp"
#include "fs_read.hpp"
#include <pthread.h>
static int g_PROCFS_DIR_FD = -1;
const char PROCFS_PATH[] = "/proc";
int
procfs::init()
{
if(g_PROCFS_DIR_FD != -1)
return 0;
g_PROCFS_DIR_FD = fs::open(PROCFS_PATH,O_PATH|O_DIRECTORY);
if(g_PROCFS_DIR_FD == -1)
return -errno;
return 0;
}
std::string
procfs::get_name(const int tid_)
{
int fd;
int rv;
char commpath[256];
snprintf(commpath,sizeof(commpath),"%d/comm",tid_);
fd = fs::openat(g_PROCFS_DIR_FD,commpath,O_RDONLY);
if(fd < 0)
return {};
rv = fs::read(fd,commpath,sizeof(commpath));
if(rv == -1)
return {};
// Overwrite the newline with NUL
commpath[rv-1] = '\0';
fs::close(fd);
return commpath;
}

27
src/procfs_get_name.hpp

@ -0,0 +1,27 @@
/*
ISC License
Copyright (c) 2023, Antonio SJ Musumeci <trapexit@spawn.link>
Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
#pragma once
#include <string>
namespace procfs
{
int init();
std::string get_name(const int tid);
}

32
src/str.cpp

@ -40,6 +40,18 @@ namespace str
result_->push_back(part);
}
void
split(const char *str_,
const char delimiter_,
set<string> *result_)
{
string part;
istringstream ss(str_);
while(std::getline(ss,part,delimiter_))
result_->insert(part);
}
void
split(const string &str_,
const char delimiter_,
@ -48,6 +60,14 @@ namespace str
return str::split(str_.c_str(),delimiter_,result_);
}
void
split(const string &str_,
const char delimiter_,
set<string> *result_)
{
return str::split(str_.c_str(),delimiter_,result_);
}
void
rsplit1(const string &str_,
const char delimiter_,
@ -112,16 +132,10 @@ namespace str
const char sep_)
{
string rv;
set<string>::iterator i;
if(set_.empty())
return string();
i = set_.begin();
rv += *i;
++i;
for(; i != set_.end(); ++i)
rv += sep_ + *i;
for(auto const &s : set_)
rv += s + sep_;
rv.pop_back();
return rv;
}

9
src/str.hpp

@ -31,6 +31,15 @@ namespace str
const char delimiter,
std::vector<std::string> *result);
void
split(const char *str,
const char delimiter,
std::set<std::string> *result);
void
split(const std::string &str,
const char delimiter,
std::set<std::string> *result);
void
rsplit1(const std::string &str,
const char delimiter,

Loading…
Cancel
Save