Distributed Filer
===========================

The default weed filer runs in standalone mode and stores file metadata on local disk.
It traverses deep directory paths quite efficiently and can handle millions of files.

However, having no single point of failure (SPOF) is a must-have requirement for many projects.
Luckily, SeaweedFS is flexible enough to manage file metadata in a completely different way.
This distributed filer uses Cassandra to store the metadata.

Cassandra Setup
#####################

Here is the CQL to create the keyspace and the table used by ``CassandraStore``.
You can optionally adjust the keyspace name and the replication settings.
For a production server, you would want to set ``replication_factor`` to 3.

.. code-block:: sql

  CREATE KEYSPACE seaweed WITH replication = {
    'class': 'SimpleStrategy',
    'replication_factor': 1
  };

  USE seaweed;

  CREATE TABLE seaweed_files (
    path varchar,
    fids list<varchar>,
    PRIMARY KEY (path)
  );

Sample Usage
#####################

To start a weed filer in distributed mode:

.. code-block:: bash

  # assuming you already started weed master and weed volume
  weed filer -cassandra.server=localhost

Now you can add and delete files, and read them back by their full path.
Browsing subdirectories is not supported in this mode (see Limitation below).

.. code-block:: bash

  # POST a file and read it back
  curl -F "filename=@README.md" "http://localhost:8888/path/to/sources/"
  curl "http://localhost:8888/path/to/sources/README.md"

  # POST a file with a new name and read it back
  curl -F "filename=@Makefile" "http://localhost:8888/path/to/sources/new_name"
  curl "http://localhost:8888/path/to/sources/new_name"

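
Programs can make the same calls without curl. The sketch below is a minimal Go example that uses only the standard library. It assumes the filer listens on ``http://localhost:8888`` as in the commands above. The helper names ``newUploadRequest``, ``uploadFile`` and ``readFile`` are hypothetical and are not part of SeaweedFS. The multipart form field name ``filename`` mirrors the ``-F "filename=@..."`` argument.

.. code-block:: go

  package main

  import (
      "bytes"
      "fmt"
      "io"
      "mime/multipart"
      "net/http"
  )

  // newUploadRequest (hypothetical helper) builds a POST equivalent to
  // `curl -F "filename=@<name>" <url>`: a multipart/form-data body with one
  // file part in the form field "filename".
  func newUploadRequest(url, name string, content []byte) (*http.Request, error) {
      var body bytes.Buffer
      w := multipart.NewWriter(&body)
      part, err := w.CreateFormFile("filename", name)
      if err != nil {
          return nil, err
      }
      if _, err := part.Write(content); err != nil {
          return nil, err
      }
      // Close writes the terminating boundary; it must run before sending.
      if err := w.Close(); err != nil {
          return nil, err
      }
      req, err := http.NewRequest(http.MethodPost, url, &body)
      if err != nil {
          return nil, err
      }
      req.Header.Set("Content-Type", w.FormDataContentType())
      return req, nil
  }

  // uploadFile sends the upload and treats any non-2xx status as an error.
  func uploadFile(url, name string, content []byte) error {
      req, err := newUploadRequest(url, name, content)
      if err != nil {
          return err
      }
      resp, err := http.DefaultClient.Do(req)
      if err != nil {
          return err
      }
      defer resp.Body.Close()
      if resp.StatusCode < 200 || resp.StatusCode >= 300 {
          return fmt.Errorf("upload %s: unexpected status %s", url, resp.Status)
      }
      return nil
  }

  // readFile GETs a stored file back by its full path.
  func readFile(url string) ([]byte, error) {
      resp, err := http.Get(url)
      if err != nil {
          return nil, err
      }
      defer resp.Body.Close()
      if resp.StatusCode != http.StatusOK {
          return nil, fmt.Errorf("read %s: unexpected status %s", url, resp.Status)
      }
      return io.ReadAll(resp.Body)
  }

  func main() {
      base := "http://localhost:8888/path/to/sources/"
      // As with curl, posting to a URL ending in "/" keeps the part's file name.
      if err := uploadFile(base, "README.md", []byte("# hello\n")); err != nil {
          fmt.Println("upload failed:", err)
          return
      }
      data, err := readFile(base + "README.md")
      if err != nil {
          fmt.Println("read failed:", err)
          return
      }
      fmt.Printf("read back %d bytes\n", len(data))
  }

Building the request in a separate function keeps the multipart encoding testable without a running filer.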
Limitation
############

Listing the files and subfolders of a directory is not supported, because
Cassandra does not support prefix search on the ``path`` partition key.

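
Over a flat namespace, every child of a directory has the directory path as a key prefix. Listing a directory is therefore a range scan over keys kept in sorted order. The sketch below is a hypothetical in-memory model, not SeaweedFS code, that performs this scan with ``sort.SearchStrings``; the name ``listDir`` is made up for illustration. Cassandra's default partitioner instead distributes rows by a hash of the partition key (here ``path``), so paths sharing a prefix are not stored next to each other and there is no contiguous range to scan.

.. code-block:: go

  package main

  import (
      "fmt"
      "sort"
      "strings"
  )

  // listDir (hypothetical) returns the direct children of dir, given all full
  // paths of a flat namespace in sorted order. dir must end with "/".
  // Subdirectories are reported once, with a trailing "/".
  func listDir(sortedPaths []string, dir string) []string {
      var children []string
      seen := map[string]bool{}
      // Binary search to the first key >= dir. Keys with prefix dir follow
      // contiguously, which holds only because the keys are sorted.
      for i := sort.SearchStrings(sortedPaths, dir); i < len(sortedPaths); i++ {
          p := sortedPaths[i]
          if !strings.HasPrefix(p, dir) {
              break // past the contiguous prefix range
          }
          rest := p[len(dir):]
          if rest == "" {
              continue // the directory key itself, not a child
          }
          if j := strings.Index(rest, "/"); j >= 0 {
              rest = rest[:j+1] // deeper path: report its top-level subdirectory
          }
          if !seen[rest] {
              seen[rest] = true
              children = append(children, rest)
          }
      }
      return children
  }

  func main() {
      paths := []string{
          "/path/to/sources/Makefile",
          "/path/to/sources/README.md",
          "/path/to/sources/lib/a.go",
          "/path/to/sources/lib/b.go",
          "/path/to/other/x.txt",
      }
      sort.Strings(paths)
      fmt.Println(listDir(paths, "/path/to/sources/")) // [Makefile README.md lib/]
  }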
Flat Namespace Design
#####################

Instead of storing separate directory and file metadata, this implementation
uses a flat namespace keyed by each file's full path.

If the metadata of each directory were stored separately, resolving a deep
path would take multiple network round trips to fetch the directory
information, impeding system performance.

A flat namespace takes more space, because the parent directory names are
stored repeatedly. But disk space is a lesser concern, especially for
distributed systems.

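
Counting metadata lookups makes the trade-off concrete. The sketch below is a hypothetical model, not SeaweedFS code. Each call to ``lookup`` stands for one network round trip to the metadata store, and the directory ids and the file id ``fid-1`` are placeholders. A hierarchical layout resolves ``/path/to/sources/README.md`` one path component at a time, while the flat layout needs a single lookup on the full path.

.. code-block:: go

  package main

  import (
      "fmt"
      "strings"
  )

  // countingStore stands in for a remote metadata store; every lookup is
  // counted as one network round trip.
  type countingStore struct {
      data       map[string]string
      roundTrips int
  }

  func (s *countingStore) lookup(key string) (string, bool) {
      s.roundTrips++
      v, ok := s.data[key]
      return v, ok
  }

  // resolveHierarchical walks the path one component at a time, as a store
  // with separate directory entries would (key = parentID + "/" + name).
  func resolveHierarchical(s *countingStore, fullPath string) (string, bool) {
      id := "root"
      for _, name := range strings.Split(strings.Trim(fullPath, "/"), "/") {
          next, ok := s.lookup(id + "/" + name)
          if !ok {
              return "", false
          }
          id = next
      }
      return id, true
  }

  // resolveFlat fetches the file id with a single lookup on the full path.
  func resolveFlat(s *countingStore, fullPath string) (string, bool) {
      return s.lookup(fullPath)
  }

  func main() {
      const p = "/path/to/sources/README.md"
      hier := &countingStore{data: map[string]string{
          "root/path":       "dir-1",
          "dir-1/to":        "dir-2",
          "dir-2/sources":   "dir-3",
          "dir-3/README.md": "fid-1",
      }}
      flat := &countingStore{data: map[string]string{p: "fid-1"}}

      hFid, _ := resolveHierarchical(hier, p)
      fFid, _ := resolveFlat(flat, p)
      fmt.Println(hFid, hier.roundTrips) // fid-1 4
      fmt.Println(fFid, flat.roundTrips) // fid-1 1
  }

The hierarchical cost grows with path depth, while the flat lookup stays at one round trip. The flat layout pays for this by repeating ``/path/to/sources/`` in every key under that directory.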
Complexity
###################

Retrieving one file takes a single ``full_filename => file_id`` lookup, which
is O(log N) in Cassandra. In practice, the additional network hop to Cassandra
very likely takes longer than Cassandra's internal lookup.

Use Cases
#########################

Clients can access a weed filer via HTTP: they create files with HTTP POST
and read files directly with HTTP GET. Listing the files under a directory is
not supported in this mode (see Limitation).

Although one weed filer runs on a single machine, you can start multiple weed
filer instances on several machines. Each instance runs in its own collection
and has its own namespace, while all instances share the same SeaweedFS storage.

Future
###################

The Cassandra implementation can be switched to another distributed hash table.

Help Wanted
########################

Please implement your preferred metadata store!
Just follow the ``cassandra_store/cassandra_store.go`` file and send me a pull
request. I will handle the rest.
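
For orientation only, the sketch below shows one possible shape for such a store. It maps a full file name to a file id, matching the ``full_filename => file_id`` lookup described under Complexity. It also includes an in-memory implementation that can be handy in tests. The interface name ``MetadataStore``, its ``Put``/``Get``/``Delete`` method set and ``ErrNotFound`` are assumptions made for illustration. The authoritative contract is whatever ``cassandra_store/cassandra_store.go`` implements, so check that file before writing a real store.

.. code-block:: go

  package main

  import (
      "errors"
      "fmt"
      "sync"
  )

  // ErrNotFound (hypothetical) is returned when no metadata exists for a path.
  var ErrNotFound = errors.New("metadata not found")

  // MetadataStore is a HYPOTHETICAL flat-namespace interface:
  // full file name => file id. Check cassandra_store.go for the real contract.
  type MetadataStore interface {
      Put(fullFileName, fid string) error
      Get(fullFileName string) (string, error)
      Delete(fullFileName string) error
  }

  // memoryStore keeps the mapping in a map guarded by a RWMutex, so it is
  // safe for concurrent use by multiple HTTP handlers.
  type memoryStore struct {
      mu   sync.RWMutex
      fids map[string]string
  }

  func newMemoryStore() *memoryStore {
      return &memoryStore{fids: make(map[string]string)}
  }

  func (m *memoryStore) Put(fullFileName, fid string) error {
      m.mu.Lock()
      defer m.mu.Unlock()
      m.fids[fullFileName] = fid
      return nil
  }

  func (m *memoryStore) Get(fullFileName string) (string, error) {
      m.mu.RLock()
      defer m.mu.RUnlock()
      fid, ok := m.fids[fullFileName]
      if !ok {
          return "", ErrNotFound
      }
      return fid, nil
  }

  // Delete is idempotent here: removing a missing path is not an error.
  func (m *memoryStore) Delete(fullFileName string) error {
      m.mu.Lock()
      defer m.mu.Unlock()
      delete(m.fids, fullFileName)
      return nil
  }

  // Compile-time check that memoryStore satisfies MetadataStore.
  var _ MetadataStore = (*memoryStore)(nil)

  func main() {
      var s MetadataStore = newMemoryStore()
      _ = s.Put("/path/to/sources/README.md", "fid-1")
      fid, err := s.Get("/path/to/sources/README.md")
      fmt.Println(fid, err) // fid-1 <nil>
  }

Whether ``Delete`` should report a missing path is a design choice. A real store should match the behavior of the existing Cassandra store.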