Distributed Filer
===========================

The default weed filer runs in standalone mode and stores file metadata on
local disk. It traverses deep directory paths efficiently and can handle
millions of files.

However, having no single point of failure (SPOF) is a must-have requirement
for many projects.

Luckily, SeaweedFS is flexible enough that file metadata can be managed in a
completely different way.

This distributed filer uses Redis or Cassandra to store the metadata.

Redis Setup
#####################

No setup required.

Cassandra Setup
#####################

Here is the CQL to create the table used by CassandraStore.
Optionally, you can adjust the keyspace name and replication settings.
For production, set replication_factor to 3
if there are at least 3 Cassandra servers.

.. code-block:: bash

  create keyspace seaweed WITH replication = {
    'class':'SimpleStrategy',
    'replication_factor':1
  };

  use seaweed;

  CREATE TABLE seaweed_files (
    path varchar,
    fids list<varchar>,
    PRIMARY KEY (path)
  );

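As a minimal sketch of the production setting mentioned above, the keyspace can be created with three replicas directly from the shell. This assumes ``cqlsh`` is installed and a Cassandra node is reachable on ``localhost``; only the replication factor differs from the statement above.

.. code-block:: bash

  # Assumption: cqlsh is on the PATH and Cassandra listens on localhost.
  # Creates the keyspace with 3 replicas instead of 1.
  cqlsh localhost -e "CREATE KEYSPACE seaweed WITH replication = {'class':'SimpleStrategy', 'replication_factor':3};"
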
Sample usage
#####################

To start a weed filer in distributed mode with Redis:

.. code-block:: bash

  # assuming you already started weed master and weed volume
  weed filer -redis.server=localhost:6379

To start a weed filer in distributed mode with Cassandra:

.. code-block:: bash

  # assuming you already started weed master and weed volume
  weed filer -cassandra.server=localhost

Now you can add/delete files:

.. code-block:: bash

  # POST a file and read it back
  curl -F "filename=@README.md" "http://localhost:8888/path/to/sources/"
  curl "http://localhost:8888/path/to/sources/README.md"
  # POST a file with a new name and read it back
  curl -F "filename=@Makefile" "http://localhost:8888/path/to/sources/new_name"
  curl "http://localhost:8888/path/to/sources/new_name"

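Deleting a file should follow the same pattern with an HTTP DELETE request. This is only a sketch: it assumes the filer accepts DELETE on the same URL used to read the file, so verify it against your SeaweedFS version.

.. code-block:: bash

  # Assumption: the filer handles HTTP DELETE on a file's full path
  curl -X DELETE "http://localhost:8888/path/to/sources/README.md"
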
Limitation
############

Listing subfolders and files is not supported, because neither Redis nor
Cassandra supports efficient prefix search over keys.

Flat Namespace Design
#####################

Instead of using both directory and file metadata, this implementation uses
a flat namespace.

If each directory's metadata were stored separately, fetching directory
information for deep directories would take multiple network round trips,
impeding system performance.

A flat namespace takes more space because the parent directories are
stored repeatedly, once per file path. But disk space is a lesser concern,
especially for distributed systems.

So either Redis or Cassandra holds a simple file_full_path ~ file_id mapping.
(Actually, Cassandra holds a file_full_path ~ list_of_file_ids mapping,
in the hope of supporting easy file appending for streaming files.)

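To make the round-trip argument concrete, here is a small illustrative sketch that counts metadata lookups for a path. It assumes a per-directory layout needs one lookup per path component, while the flat namespace needs a single lookup on the full path. The function names ``hierarchical_lookups`` and ``flat_lookups`` are hypothetical and not part of SeaweedFS.

.. code-block:: bash

  # Hypothetical helper: lookups needed if every directory level
  # is stored as a separate metadata entry (one per path component).
  hierarchical_lookups() {
    path=${1#/}          # drop the leading slash
    old_ifs=$IFS
    IFS=/                # split the path on slashes
    set -f               # disable globbing while splitting
    set -- $path
    set +f
    IFS=$old_ifs
    echo "$#"            # number of path components
  }

  # Hypothetical helper: a flat namespace always needs one lookup,
  # keyed by the full path.
  flat_lookups() {
    echo 1
  }

  hierarchical_lookups /path/to/sources/README.md
  flat_lookups /path/to/sources/README.md

The deeper the path, the larger the gap between the two counts, which is the cost the flat namespace avoids.
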
Complexity
###################

For one file retrieval, the full_filename => file_id lookup will be O(logN)
using Redis or Cassandra. But very likely the one additional network hop will
take longer than the actual lookup.

Use Cases
#########################

Clients can access one "weed filer" via HTTP, create files via HTTP POST,
and read files via HTTP GET directly.

Future
###################

SeaweedFS can support other distributed databases. It would be better
if the database supported prefix search, so that files under a directory
could be listed.

Help Wanted
########################

Please implement your preferred metadata store!

Just follow the cassandra_store/cassandra_store.go file and send me a pull
request. I will handle the rest.