# inodecalc

Inodes (`st_ino`) are unique identifiers within a filesystem. Each
mounted filesystem also has a device ID (`st_dev`), and together the
two values uniquely identify a file across the whole system. Entries
on the same device with the same inode are in fact references to the
same underlying file: the relationship between names and an inode is
many-to-one. Directories, however, cannot have multiple hard links on
most systems because of the complexity that would add.
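
As a minimal sketch of the many-to-one relationship described above, the following Python creates a file, adds a hard link to it, and compares the `(st_dev, st_ino)` pair of both names. The helper names `file_identity` and `demo_hard_link` are made up for this illustration, and it assumes a filesystem that supports hard links.

```python
import os
import tempfile


def file_identity(path):
    """Return the (st_dev, st_ino) pair that identifies the file behind `path`."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def demo_hard_link():
    """Create a file, hard link it, and return both identities and the link count."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.txt")
        alias = os.path.join(tmp, "alias.txt")
        with open(original, "w") as f:
            f.write("same data, two names\n")
        os.link(original, alias)  # a second name for the same inode
        return (
            file_identity(original),
            file_identity(alias),
            os.stat(original).st_nlink,
        )


if __name__ == "__main__":
    first, second, links = demo_hard_link()
    print(first == second, links)
```

Both names report the same pair, and `st_nlink` counts how many names currently point at the inode.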

FUSE allows the server (mergerfs) to set inode values but not device
IDs. Creating an inode value is somewhat complex in mergerfs' case as
files aren't really in its control. If a policy changes which
directory or file is selected, or something changes out of band, it
becomes unclear what value should be used. Most software does not care
what the values are, but software that does often breaks if a value
changes unexpectedly. The tool `find` will abort a directory walk if
it sees a directory inode change. NFS can return stale handle errors
if the inode changes out of band. File dedup tools usually leverage
device IDs and inodes as a shortcut when searching for duplicate files
and resort to full file comparisons should they find different inode
values.
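
To illustrate why dedup tools care about these values, here is a rough sketch of the shortcut: group every regular file under a directory by `(st_dev, st_ino)`, and treat any group with more than one name as already known to be the same file. This is not how any particular dedup tool is implemented; the function names are hypothetical.

```python
import os
import stat
import tempfile
from collections import defaultdict


def group_by_identity(root):
    """Map (st_dev, st_ino) to the sorted regular-file paths found under `root`."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode):
                groups[(st.st_dev, st.st_ino)].append(path)
    return {key: sorted(paths) for key, paths in groups.items()}


def already_known_duplicates(root):
    """Return the groups with more than one name; no content comparison needed."""
    return [paths for paths in group_by_identity(root).values() if len(paths) > 1]


def demo():
    """Build a.bin, a hard link b.bin, and an independent copy c.bin."""
    with tempfile.TemporaryDirectory() as tmp:
        a, b, c = (os.path.join(tmp, n) for n in ("a.bin", "b.bin", "c.bin"))
        for path in (a, c):
            with open(path, "wb") as f:
                f.write(b"identical content")
        os.link(a, b)
        groups = already_known_duplicates(tmp)
        return [[os.path.basename(p) for p in group] for group in groups]


if __name__ == "__main__":
    print(demo())
```

Only the hard-linked pair is grouped. The independent copy has identical content but a different inode, so a tool would still have to compare its bytes, which is the fallback the paragraph above describes.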

mergerfs offers multiple ways to calculate the inode in hopes of
covering different use cases.

* `passthrough`: Passes through the underlying inode value. Mostly
  intended for testing, as using this does not address any of the
  problems mentioned above and could confuse file deduplication
  software because inodes from different filesystems can be the same.
* `path-hash`: Hashes the relative path of the entry in question. The
  underlying file's values are completely ignored. This means the
  inode value will always be the same for that file path. This is
  useful when using NFS and you make changes out of band, such as
  copying data between branches. This also means that entries that do
  point to the same file will not be recognizable via inodes. That
  does not mean hard links don't work. They will.
* `path-hash32`: 32-bit version of `path-hash`.
* `devino-hash`: Hashes the device ID and inode of the underlying
  entry. This won't prevent issues with NFS should the policy pick a
  different file or files move out of band, but it will present the
  same inode for entries whose underlying files share a device ID and
  inode.
* `devino-hash32`: 32-bit version of `devino-hash`.
* `hybrid-hash`: Performs `path-hash` on directories and `devino-hash`
  on other file types. Since directories can't have hard links the
  static value won't make a difference, and the files will get values
  useful for finding duplicates. Probably the best to use if not using
  NFS. As such it is the default.
* `hybrid-hash32`: 32-bit version of `hybrid-hash`.

32-bit versions are provided as there is some software which does not
handle 64-bit inodes well.
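
The modes above can be sketched roughly as follows. This is an illustration under stated assumptions, not mergerfs' implementation: the hash function (BLAKE2b truncated to 64 bits), the XOR fold used for the 32-bit variants, and all function names and signatures are stand-ins chosen for the example.

```python
import hashlib
import stat


def _hash64(data: bytes) -> int:
    """Stand-in 64-bit hash (assumption: not the hash mergerfs actually uses)."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def _fold32(value: int) -> int:
    """Reduce a 64-bit value to 32 bits by XORing its halves (assumed method)."""
    return (value ^ (value >> 32)) & 0xFFFFFFFF


def passthrough(relpath: str, mode: int, dev: int, ino: int) -> int:
    return ino  # underlying inode, unchanged


def path_hash(relpath: str, mode: int, dev: int, ino: int) -> int:
    return _hash64(relpath.encode())  # ignores the underlying dev/ino


def devino_hash(relpath: str, mode: int, dev: int, ino: int) -> int:
    return _hash64(dev.to_bytes(8, "little") + ino.to_bytes(8, "little"))


def hybrid_hash(relpath: str, mode: int, dev: int, ino: int) -> int:
    # Directories get a stable path-based value; everything else follows dev/ino.
    if stat.S_ISDIR(mode):
        return path_hash(relpath, mode, dev, ino)
    return devino_hash(relpath, mode, dev, ino)


def path_hash32(relpath: str, mode: int, dev: int, ino: int) -> int:
    return _fold32(path_hash(relpath, mode, dev, ino))


def devino_hash32(relpath: str, mode: int, dev: int, ino: int) -> int:
    return _fold32(devino_hash(relpath, mode, dev, ino))


def hybrid_hash32(relpath: str, mode: int, dev: int, ino: int) -> int:
    return _fold32(hybrid_hash(relpath, mode, dev, ino))
```

With these definitions, `path_hash("/a/b", mode, 1, 2)` and `path_hash("/a/b", mode, 9, 9)` return the same value, which is the stability NFS benefits from, while `devino_hash("/x", mode, 1, 2)` and `devino_hash("/y", mode, 1, 2)` match, which is what lets hard links be recognized.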

While there is a risk of hash collisions, tests with a couple of
million entries produced zero collisions. Unlike a typical filesystem,
a FUSE filesystem can reuse an inode value for entries that are not
the same. The internal identifier used to reference a file in FUSE is
different from the inode value presented. The former is the nodeid and
is actually a tuple of two 64-bit values: nodeid and generation. This
tuple is not client facing. The inode that is presented to the client
is passed through the kernel uninterpreted.
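
For a rough sense of the collision risk mentioned above, the standard birthday approximation can be evaluated directly. This sketch assumes hash outputs are uniformly distributed, which is an idealization and not a measured property of mergerfs; `collision_probability` is a name invented for the example.

```python
import math


def collision_probability(entries: int, bits: int) -> float:
    """Approximate P(at least one collision) among `entries` uniformly random
    `bits`-bit values, using the birthday bound 1 - exp(-n(n-1) / 2^(bits+1))."""
    pairs = entries * (entries - 1) / 2
    return -math.expm1(-pairs / 2.0**bits)


if __name__ == "__main__":
    for bits in (64, 32):
        print(bits, collision_probability(2_000_000, bits))
```

Under that assumption, two million entries give a probability on the order of 1e-7 for 64-bit values, while for 32-bit values a collision is close to certain at the same scale.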

From the FUSE docs for `use_ino`:

> Honor the st_ino field in the functions getattr() and
> fill_dir(). This value is used to fill in the st_ino field
> in the stat(2), lstat(2), fstat(2) functions and the d_ino
> field in the readdir(2) function. The filesystem does not
> have to guarantee uniqueness, however some applications
> rely on this value being unique for the whole filesystem.
> Note that this does *not* affect the inode that libfuse
> and the kernel use internally (also called the "nodeid").

**NOTE:** As of version 2.35.0 the `use_ino` option has been
removed. mergerfs should always be managing inode values.
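
The two client-facing places named in the quoted FUSE documentation, `st_ino` from `stat(2)` and `d_ino` from `readdir(2)`, can be inspected from Python. In this sketch `os.DirEntry.inode()` supplies the directory-entry value and `os.lstat()` supplies `st_ino`; the helper names are hypothetical, and pointing it at a mergerfs mount rather than a temporary directory is left to the reader.

```python
import os
import tempfile


def compare_stat_and_readdir(directory):
    """Return {name: (inode from the directory entry, st_ino from lstat)}."""
    result = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            result[entry.name] = (entry.inode(), os.lstat(entry.path).st_ino)
    return result


def demo():
    """Create one file and one subdirectory, then compare both inode sources."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "file.txt"), "w") as f:
            f.write("hello\n")
        os.mkdir(os.path.join(tmp, "subdir"))
        return compare_stat_and_readdir(tmp)


if __name__ == "__main__":
    for name, (d_ino, st_ino) in sorted(demo().items()):
        print(name, d_ino, st_ino)
```

On most local filesystems the two numbers agree for every entry. Running the comparison against a directory on a FUSE mount shows the values the server chose to present.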