You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
|
|
# inodecalc
Inodes (`st_ino`) are unique identifiers within a filesystem. Each mounted filesystem has device ID (st_dev) as well and together they can uniquely identify a file on the whole of the system. Entries on the same device with the same inode are in fact references to the same underlying file. It is a many to one relationship between names and an inode. Directories, however, do not have multiple links on most systems due to the complexity they add.
FUSE allows the server (mergerfs) to set inode values but not device IDs. Creating an inode value is somewhat complex in mergerfs' case as files aren't really in its control. If a policy changes what directory or file is to be selected or something changes out of band it becomes unclear what value should be used. Most software does not to care what the values are but those that do often break if a value changes unexpectedly. The tool find will abort a directory walk if it sees a directory inode change. NFS can return stale handle errors if the inode changes out of band. File dedup tools will usually leverage device ids and inodes as a shortcut in searching for duplicate files and would resort to full file comparisons should it find different inode values.
mergerfs offers multiple ways to calculate the inode in hopes of covering different usecases.
* `passthrough`: Passes through the underlying inode value. Mostly intended for testing as using this does not address any of the problems mentioned above and could confuse file deduplication software as inodes from different filesystems can be the same. * `path-hash`: Hashes the relative path of the entry in question. The underlying file's values are completely ignored. This means the inode value will always be the same for that file path. This is useful when using NFS and you make changes out of band such as copy data between branches. This also means that entries that do point to the same file will not be recognizable via inodes. That does not mean hard links don't work. They will. * `path-hash32`: 32bit version of path-hash. * `devino-hash`: Hashes the device id and inode of the underlying entry. This won't prevent issues with NFS should the policy pick a different file or files move out of band but will present the same inode for underlying files that do too. * `devino-hash32`: 32bit version of devino-hash. * `hybrid-hash`: Performs path-hash on directories and devino-hash on other file types. Since directories can't have hard links the static value won't make a difference and the files will get values useful for finding duplicates. Probably the best to use if not using NFS. As such it is the default. * `hybrid-hash32`: 32bit version of hybrid-hash.
32bit versions are provided as there is some software which does not handle 64bit inodes well.
While there is a risk of hash collision in tests of a couple of million entries there were zero collisions. Unlike a typical filesystem FUSE filesystems can reuse inodes and not refer to the same entry. The internal identifier used to reference a file in FUSE is different from the inode value presented. The former is the nodeid and is actually a tuple of 2 64bit values: nodeid and generation. This tuple is not client facing. The inode that is presented to the client is passed through the kernel uninterpreted.
From FUSE docs for `use_ino`:
> Honor the st_ino field in the functions getattr() and
> fill_dir(). This value is used to fill in the st_ino field
> in the stat(2), lstat(2), fstat(2) functions and the d_ino
> field in the readdir(2) function. The filesystem does not
> have to guarantee uniqueness, however some applications
> rely on this value being unique for the whole filesystem.
> Note that this does *not* affect the inode that libfuse
> and the kernel use internally (also called the "nodeid").
**NOTE:** As of version 2.35.0 the use_ino option has been removed. mergerfs should always be managing inode values.
|