Add "per-process" file caching feature

Allows chosing page caching based on process name.
3 years ago · 9effeb6869
15 changed files with 368 additions and 60 deletions
--- a/README.md
+++ b/README.md
@ -1,6 +1,6 @@
 % mergerfs(1) mergerfs user manual
 % Antonio SJ Musumeci <trapexit@spawn.link>
-% 2023-01-29
+% 2023-02-16

 # NAME

@ -124,7 +124,7 @@ These options are the same regardless of whether you use them with the `mergerfs
 * **dropcacheonclose=BOOL**: When a file is requested to be closed
  call `posix_fadvise` on it first to instruct the kernel that we no
  longer need the data and it can drop its cache. Recommended when
-  **cache.files=partial|full|auto-full** to limit double
+  **cache.files=partial|full|auto-full|per-process** to limit double
  caching. (default: false)
 * **symlinkify=BOOL**: When enabled and a file is not writable and its
  mtime or ctime is older than **symlinkify_timeout** files will be
@ -224,8 +224,12 @@ These options are the same regardless of whether you use them with the `mergerfs
  seconds. (default: 1)
 * **cache.negative_entry=UINT**: Negative file name lookup cache
  timeout in seconds. (default: 0)
-* **cache.files=libfuse|off|partial|full|auto-full**: File page
-  caching mode (default: libfuse)
+* **cache.files=libfuse|off|partial|full|auto-full|per-process**: File
+  page caching mode (default: libfuse)
+* **cache.files.process-names=LIST**: A pipe | delimited list of
+  process [comm](https://man7.org/linux/man-pages/man5/proc.5.html)
+  names to enable page caching for when
+  `cache.files=per-process`. (default: "rtorrent|qbittorrent-nox")
 * **cache.writeback=BOOL**: Enable kernel writeback caching (default:
  false)
 * **cache.symlinks=BOOL**: Cache symlinks (if supported by kernel)
@ -771,23 +775,57 @@ While they won't show up when using `getfattr` **mergerfs** offers a number of s

 https://en.wikipedia.org/wiki/Page_cache

-tl;dr:
-* cache.files=off: Disables page caching. Underlying files cached, mergerfs files are not.
-* cache.files=partial: Enables page caching. Underlying files cached, mergerfs files cached while open.
-* cache.files=full: Enables page caching. Underlying files cached, mergerfs files cached across opens.
-* cache.files=auto-full: Enables page caching. Underlying files cached, mergerfs files cached across opens if mtime and size are unchanged since previous open.
-* cache.files=libfuse: follow traditional libfuse `direct_io`, `kernel_cache`, and `auto_cache` arguments.
-
-
-FUSE, which mergerfs uses, offers a number of page caching modes. mergerfs tries to simplify their use via the `cache.files` option. It can and should replace usage of `direct_io`, `kernel_cache`, and `auto_cache`.
-
-Due to mergerfs using FUSE and therefore being a userland process proxying existing filesystems the kernel will double cache the content being read and written through mergerfs. Once from the underlying filesystem and once from mergerfs (it sees them as two separate entities). Using `cache.files=off` will keep the double caching from happening by disabling caching of mergerfs but this has the side effect that *all* read and write calls will be passed to mergerfs which may be slower than enabling caching, you lose shared `mmap` support which can affect apps such as rtorrent, and no read-ahead will take place. The kernel will still cache the underlying filesystem data but that only helps so much given mergerfs will still process all requests.
-
-If you do enable file page caching, `cache.files=partial|full|auto-full`, you should also enable `dropcacheonclose` which will cause mergerfs to instruct the kernel to flush the underlying file's page cache when the file is closed. This behavior is the same as the rsync fadvise / drop cache patch and Feh's nocache project.
-
-If most files are read once through and closed (like media) it is best to enable `dropcacheonclose` regardless of caching mode in order to minimize buffer bloat.
-
-It is difficult to balance memory usage, cache bloat & duplication, and performance. Ideally mergerfs would be able to disable caching for the files it reads/writes but allow page caching for itself. That would limit the FUSE overhead. However, there isn't a good way to achieve this. It would need to open all files with O_DIRECT which places limitations on the what underlying filesystems would be supported and complicates the code.
+* cache.files=off: Disables page caching. Underlying files cached,
+  mergerfs files are not.
+* cache.files=partial: Enables page caching. Underlying files cached,
+  mergerfs files cached while open.
+* cache.files=full: Enables page caching. Underlying files cached,
+  mergerfs files cached across opens.
+* cache.files=auto-full: Enables page caching. Underlying files
+  cached, mergerfs files cached across opens if mtime and size are
+  unchanged since previous open.
+* cache.files=libfuse: follow traditional libfuse `direct_io`,
+  `kernel_cache`, and `auto_cache` arguments.
+* cache.files=per-process: Enable page caching only for processes
+  which 'comm' name matches one of the values defined in
+  `cache.files.process-names`.
+
+FUSE, which mergerfs uses, offers a number of page caching
+modes. mergerfs tries to simplify their use via the `cache.files`
+option. It can and should replace usage of `direct_io`,
+`kernel_cache`, and `auto_cache`.
+
+Due to mergerfs using FUSE and therefore being a userland process
+proxying existing filesystems the kernel will double cache the content
+being read and written through mergerfs. Once from the underlying
+filesystem and once from mergerfs (it sees them as two separate
+entities). Using `cache.files=off` will keep the double caching from
+happening by disabling caching of mergerfs but this has the side
+effect that *all* read and write calls will be passed to mergerfs
+which may be slower than enabling caching, you lose shared `mmap`
+support which can affect apps such as rtorrent, and no read-ahead will
+take place. The kernel will still cache the underlying filesystem data
+but that only helps so much given mergerfs will still process all
+requests.
+
+If you do enable file page caching,
+`cache.files=partial|full|auto-full`, you should also enable
+`dropcacheonclose` which will cause mergerfs to instruct the kernel to
+flush the underlying file's page cache when the file is closed. This
+behavior is the same as the rsync fadvise / drop cache patch and Feh's
+nocache project.
+
+If most files are read once through and closed (like media) it is best
+to enable `dropcacheonclose` regardless of caching mode in order to
+minimize buffer bloat.
+
+It is difficult to balance memory usage, cache bloat & duplication,
+and performance. Ideally mergerfs would be able to disable caching for
+the files it reads/writes but allow page caching for itself. That
+would limit the FUSE overhead. However, there isn't a good way to
+achieve this. It would need to open all files with O_DIRECT which
+places limitations on the what underlying filesystems would be
+supported and complicates the code.

 kernel documentation: https://www.kernel.org/doc/Documentation/filesystems/fuse-io.txt

--- a/src/config.cpp
+++ b/src/config.cpp
@ -37,11 +37,13 @@

 #define MINFREESPACE_DEFAULT (4294967295ULL)

-using std::string;
-
 #define IFERT(S) if(S == s_) return true

 const std::string CONTROLFILE = "/.mergerfs";
+constexpr static const char CACHE_FILES_PROCESS_NAMES_DEFAULT[] =
+  "rtorrent|"
+  "qbittorrent-nox";
+

 Config Config::_singleton;

@ -79,6 +81,7 @@ Config::Config()
    cache_attr(1),
    cache_entry(1),
    cache_files(CacheFiles::ENUM::LIBFUSE),
+    cache_files_process_names(CACHE_FILES_PROCESS_NAMES_DEFAULT),
    cache_negative_entry(0),
    cache_readdir(false),
    cache_statfs(0),
@ -125,6 +128,7 @@ Config::Config()
  _map["cache.attr"]             = &cache_attr;
  _map["cache.entry"]            = &cache_entry;
  _map["cache.files"]            = &cache_files;
+  _map["cache.files.process-names"] = &cache_files_process_names;
  _map["cache.negative_entry"]   = &cache_negative_entry;
  _map["cache.readdir"]          = &cache_readdir;
  _map["cache.statfs"]           = &cache_statfs;
--- a/src/config.hpp
+++ b/src/config.hpp
@ -30,6 +30,7 @@
 #include "config_statfs.hpp"
 #include "config_statfsignore.hpp"
 #include "config_xattr.hpp"
+#include "config_set.hpp"
 #include "enum.hpp"
 #include "errno.hpp"
 #include "funcs.hpp"
@ -107,6 +108,7 @@ public:
  ConfigUINT64   cache_attr;
  ConfigUINT64   cache_entry;
  CacheFiles     cache_files;
+  ConfigSet      cache_files_process_names;
  ConfigUINT64   cache_negative_entry;
  ConfigBOOL     cache_readdir;
  ConfigUINT64   cache_statfs;
--- a/src/config_cachefiles.cpp
+++ b/src/config_cachefiles.cpp
@ -36,6 +36,8 @@ CacheFiles::to_string() const
      return "full";
    case CacheFiles::ENUM::AUTO_FULL:
      return "auto-full";
+    case CacheFiles::ENUM::PER_PROCESS:
+      return "per-process";
    }

  return "invalid";
@ -55,6 +57,8 @@ CacheFiles::from_string(const std::string &s_)
    _data = CacheFiles::ENUM::FULL;
  ef(s_ == "auto-full")
    _data = CacheFiles::ENUM::AUTO_FULL;
+  ef(s_ == "per-process")
+    _data = CacheFiles::ENUM::PER_PROCESS;
  else
    return -EINVAL;

--- a/src/config_cachefiles.hpp
+++ b/src/config_cachefiles.hpp
@ -27,7 +27,8 @@ enum class CacheFilesEnum
    OFF,
    PARTIAL,
    FULL,
-    AUTO_FULL
+    AUTO_FULL,
+    PER_PROCESS
  };

 typedef Enum<CacheFilesEnum> CacheFiles;
--- a/src/config_set.cpp
+++ b/src/config_set.cpp
@ -0,0 +1,43 @@
+/*
+  ISC License
+
+  Copyright (c) 2023, Antonio SJ Musumeci <trapexit@spawn.link>
+
+  Permission to use, copy, modify, and/or distribute this software for any
+  purpose with or without fee is hereby granted, provided that the above
+  copyright notice and this permission notice appear in all copies.
+
+  THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+  WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+  MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+  ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+  WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+  ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+  OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+*/
+
+#include "config_set.hpp"
+
+#include "str.hpp"
+
+
+ConfigSet::ConfigSet(const std::string &str_)
+{
+  from_string(str_);
+}
+
+std::string
+ConfigSet::to_string(void) const
+{
+  return str::join(*this,'|');
+}
+
+int
+ConfigSet::from_string(const std::string &str_)
+{
+  this->clear();
+
+  str::split(str_,'|',this);
+
+  return 0;
+}
--- a/src/config_set.hpp
+++ b/src/config_set.hpp
@ -0,0 +1,35 @@
+/*
+  ISC License
+
+  Copyright (c) 2023, Antonio SJ Musumeci <trapexit@spawn.link>
+
+  Permission to use, copy, modify, and/or distribute this software for any
+  purpose with or without fee is hereby granted, provided that the above
+  copyright notice and this permission notice appear in all copies.
+
+  THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+  WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+  MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+  ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+  WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+  ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+  OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+*/
+
+#pragma once
+
+#include "tofrom_string.hpp"
+
+#include <set>
+#include <string>
+
+
+class ConfigSet : public std::set<std::string>, public ToFromString
+{
+public:
+  ConfigSet(const std::string &str);
+
+public:
+  std::string to_string(void) const final;
+  int from_string(const std::string &) final;
+};
--- a/src/fs_openat.hpp
+++ b/src/fs_openat.hpp
@ -0,0 +1,41 @@
+/*
+  ISC License
+
+  Copyright (c) 2023, Antonio SJ Musumeci <trapexit@spawn.link>
+
+  Permission to use, copy, modify, and/or distribute this software for any
+  purpose with or without fee is hereby granted, provided that the above
+  copyright notice and this permission notice appear in all copies.
+
+  THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+  WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+  MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+  ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+  WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+  ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+  OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+*/
+
+#pragma once
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <errno.h>
+
+namespace fs
+{
+  static
+  inline
+  int
+  openat(const int   dirfd_,
+         const char *pathname_,
+         const int   flags_)
+  {
+    int rv;
+
+    rv = ::openat(dirfd_,pathname_,flags_);
+
+    return ((rv == -1) ? -errno : rv);
+  }
+}
--- a/src/fuse_create.cpp
+++ b/src/fuse_create.cpp
@ -21,6 +21,7 @@
 #include "fs_clonepath.hpp"
 #include "fs_open.hpp"
 #include "fs_path.hpp"
+#include "procfs_get_name.hpp"
 #include "ugid.hpp"

 #include "fuse.h"
@ -28,9 +29,6 @@
 #include <string>
 #include <vector>

-using std::string;
-using std::vector;
-

 namespace l
 {
@ -56,6 +54,7 @@ namespace l
  static
  void
  config_to_ffi_flags(Config::Read     &cfg_,
+                      const int         tid_,
                      fuse_file_info_t *ffi_)
  {
    switch(cfg_->cache_files)
@ -85,12 +84,29 @@ namespace l
        ffi_->keep_cache = 0;
        ffi_->auto_cache = 1;
        break;
+      case CacheFiles::ENUM::PER_PROCESS:
+        std::string proc_name;
+
+        proc_name = procfs::get_name(tid_);
+        if(cfg_->cache_files_process_names.count(proc_name) == 0)
+          {
+            ffi_->direct_io  = 1;
+            ffi_->keep_cache = 0;
+            ffi_->auto_cache = 0;
+          }
+        else
+          {
+            ffi_->direct_io  = 0;
+            ffi_->keep_cache = 0;
+            ffi_->auto_cache = 0;
+          }
+        break;
      }
  }

  static
  int
-  create_core(const string &fullpath_,
+  create_core(const std::string &fullpath_,
              mode_t             mode_,
              const mode_t       umask_,
              const int          flags_)
@ -103,7 +119,7 @@ namespace l

  static
  int
-  create_core(const string &createpath_,
+  create_core(const std::string &createpath_,
              const char        *fusepath_,
              const mode_t       mode_,
              const mode_t       umask_,
@ -111,7 +127,7 @@ namespace l
              uint64_t          *fh_)
  {
    int rv;
-    string fullpath;
+    std::string fullpath;

    fullpath = fs::path::make(createpath_,fusepath_);

@ -136,8 +152,8 @@ namespace l
         uint64_t             *fh_)
  {
    int rv;
-    string fullpath;
-    string fusedirpath;
+    std::string fullpath;
+    std::string fusedirpath;
    StrVec createpaths;
    StrVec existingpaths;

@ -175,7 +191,7 @@ namespace FUSE
    const fuse_context *fc = fuse_get_context();
    const ugid::Set     ugid(fc->uid,fc->gid);

-    l::config_to_ffi_flags(cfg,ffi_);
+    l::config_to_ffi_flags(cfg,fc->pid,ffi_);

    if(cfg->writeback_cache)
      l::tweak_flags_writeback_cache(&ffi_->flags);
--- a/src/fuse_open.cpp
+++ b/src/fuse_open.cpp
@ -17,23 +17,22 @@
 #include "config.hpp"
 #include "errno.hpp"
 #include "fileinfo.hpp"
-#include "fs_lchmod.hpp"
 #include "fs_cow.hpp"
 #include "fs_fchmod.hpp"
+#include "fs_lchmod.hpp"
 #include "fs_open.hpp"
 #include "fs_path.hpp"
 #include "fs_stat.hpp"
+#include "procfs_get_name.hpp"
 #include "stat_util.hpp"
 #include "ugid.hpp"

 #include "fuse.h"

+#include <set>
 #include <string>
 #include <vector>

-using std::string;
-using std::vector;
-

 namespace l
 {
@ -46,7 +45,7 @@ namespace l

  static
  int
-  lchmod_and_open_if_not_writable_and_empty(const string &fullpath_,
+  lchmod_and_open_if_not_writable_and_empty(const std::string &fullpath_,
                                            const int          flags_)
  {
    int rv;
@ -86,7 +85,7 @@ namespace l
      case NFSOpenHack::ENUM::GIT:
        if(l::rdonly(flags_))
          return (errno=EACCES,-1);
-        if(fullpath_.find("/.git/") == string::npos)
+        if(fullpath_.find("/.git/") == std::string::npos)
          return (errno=EACCES,-1);
        return l::lchmod_and_open_if_not_writable_and_empty(fullpath_,flags_);
      case NFSOpenHack::ENUM::ALL:
@ -118,6 +117,7 @@ namespace l
  static
  void
  config_to_ffi_flags(Config::Read     &cfg_,
+                      const int         tid_,
                      fuse_file_info_t *ffi_)
  {
    switch(cfg_->cache_files)
@ -147,12 +147,29 @@ namespace l
        ffi_->keep_cache = 0;
        ffi_->auto_cache = 1;
        break;
+      case CacheFiles::ENUM::PER_PROCESS:
+        std::string proc_name;
+
+        proc_name = procfs::get_name(tid_);
+        if(cfg_->cache_files_process_names.count(proc_name) == 0)
+          {
+            ffi_->direct_io  = 1;
+            ffi_->keep_cache = 0;
+            ffi_->auto_cache = 0;
+          }
+        else
+          {
+            ffi_->direct_io  = 0;
+            ffi_->keep_cache = 0;
+            ffi_->auto_cache = 0;
+          }
+        break;
      }
  }

  static
  int
-  open_core(const string      &basepath_,
+  open_core(const std::string &basepath_,
            const char        *fusepath_,
            const int          flags_,
            const bool         link_cow_,
@ -160,7 +177,7 @@ namespace l
            uint64_t          *fh_)
  {
    int fd;
-    string fullpath;
+    std::string fullpath;

    fullpath = fs::path::make(basepath_,fusepath_);

@ -209,7 +226,7 @@ namespace FUSE
    const fuse_context *fc  = fuse_get_context();
    const ugid::Set     ugid(fc->uid,fc->gid);

-    l::config_to_ffi_flags(cfg,ffi_);
+    l::config_to_ffi_flags(cfg,fc->pid,ffi_);

    if(cfg->writeback_cache)
      l::tweak_flags_writeback_cache(&ffi_->flags);
--- a/src/mergerfs.cpp
+++ b/src/mergerfs.cpp
@ -21,6 +21,7 @@
 #include "fs_umount2.hpp"
 #include "mergerfs.hpp"
 #include "option_parser.hpp"
+#include "procfs_get_name.hpp"
 #include "resources.hpp"
 #include "strvec.hpp"

@ -235,6 +236,8 @@ namespace l
    if(cfg->lazy_umount_mountpoint)
      l::lazy_umount(cfg->mountpoint);

+    procfs::init();
+
    return fuse_main(args.argc,
                     args.argv,
                     &ops);
--- a/src/procfs_get_name.cpp
+++ b/src/procfs_get_name.cpp
@ -0,0 +1,54 @@
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+
+#include "procfs_get_name.hpp"
+
+#include "errno.hpp"
+#include "fs_close.hpp"
+#include "fs_open.hpp"
+#include "fs_openat.hpp"
+#include "fs_read.hpp"
+
+#include <pthread.h>
+
+static int g_PROCFS_DIR_FD = -1;
+const char PROCFS_PATH[] = "/proc";
+
+int
+procfs::init()
+{
+  if(g_PROCFS_DIR_FD != -1)
+    return 0;
+
+  g_PROCFS_DIR_FD = fs::open(PROCFS_PATH,O_PATH|O_DIRECTORY);
+  if(g_PROCFS_DIR_FD == -1)
+    return -errno;
+
+  return 0;
+}
+
+std::string
+procfs::get_name(const int tid_)
+{
+  int fd;
+  int rv;
+  char commpath[256];
+
+  snprintf(commpath,sizeof(commpath),"%d/comm",tid_);
+
+  fd = fs::openat(g_PROCFS_DIR_FD,commpath,O_RDONLY);
+  if(fd < 0)
+    return {};
+
+  rv = fs::read(fd,commpath,sizeof(commpath));
+  if(rv == -1)
+    return {};
+
+  // Overwrite the newline with NUL
+  commpath[rv-1] = '\0';
+
+  fs::close(fd);
+
+  return commpath;
+}
--- a/src/procfs_get_name.hpp
+++ b/src/procfs_get_name.hpp
@ -0,0 +1,27 @@
+/*
+  ISC License
+
+  Copyright (c) 2023, Antonio SJ Musumeci <trapexit@spawn.link>
+
+  Permission to use, copy, modify, and/or distribute this software for any
+  purpose with or without fee is hereby granted, provided that the above
+  copyright notice and this permission notice appear in all copies.
+
+  THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+  WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+  MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+  ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+  WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+  ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+  OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+*/
+
+#pragma once
+
+#include <string>
+
+namespace procfs
+{
+  int init();
+  std::string get_name(const int tid);
+}
--- a/src/str.cpp
+++ b/src/str.cpp
@ -40,6 +40,18 @@ namespace str
      result_->push_back(part);
  }

+  void
+  split(const char  *str_,
+        const char   delimiter_,
+        set<string> *result_)
+  {
+    string part;
+    istringstream ss(str_);
+
+    while(std::getline(ss,part,delimiter_))
+      result_->insert(part);
+  }
+
  void
  split(const string   &str_,
        const char      delimiter_,
@ -48,6 +60,14 @@ namespace str
    return str::split(str_.c_str(),delimiter_,result_);
  }

+  void
+  split(const string &str_,
+        const char    delimiter_,
+        set<string>  *result_)
+  {
+    return str::split(str_.c_str(),delimiter_,result_);
+  }
+
  void
  rsplit1(const string   &str_,
          const char      delimiter_,
@ -112,16 +132,10 @@ namespace str
       const char         sep_)
  {
    string rv;
-    set<string>::iterator i;
-
-    if(set_.empty())
-      return string();

-    i = set_.begin();
-    rv += *i;
-    ++i;
-    for(; i != set_.end(); ++i)
-      rv += sep_ + *i;
+    for(auto const &s : set_)
+      rv += s + sep_;
+    rv.pop_back();

    return rv;
  }
--- a/src/str.hpp
+++ b/src/str.hpp
@ -31,6 +31,15 @@ namespace str
        const char                delimiter,
        std::vector<std::string> *result);

+  void
+  split(const char            *str,
+        const char             delimiter,
+        std::set<std::string> *result);
+  void
+  split(const std::string     &str,
+        const char             delimiter,
+        std::set<std::string> *result);
+
  void
  rsplit1(const std::string        &str,
          const char                delimiter,