A common file system would use inode to store meta data for each folder and file. The folder tree structure are usually linked. And sub folders and files are usually organized as an on-disk b+tree or similar variations. This scales well in terms of storage, but not well for fast file retrieval due to multiple disk access just for the file meta data, before even trying to get the file content.
WeedFS wants to make as small number of disk access as possible, yet still be able to store a lot of file metadata. So we need to think very differently.
Seaweed-FS wants to make as small number of disk access as possible, yet still be able to store a lot of file metadata. So we need to think very differently.
From a full file path to get to the file content, there are several steps:
@ -48,7 +48,7 @@ From a full file path to get to the file content, there are several steps:
file_id => data_block
Because default WeedFS only provides file_id=>data_block mapping, the first 2 steps need to be implemented.
Because default Seaweed-FS only provides file_id=>data_block mapping, the first 2 steps need to be implemented.
There are several data features I noticed:
@ -122,7 +122,7 @@ The LevelDB implementation may be switched underneath to external data storage,
Also, a HA feature will be added, so that multiple "weed filer" instance can share the same set of view of files.
Later, FUSE or HCFS plugins will be created, to really integrate WeedFS to existing systems.
Later, FUSE or HCFS plugins will be created, to really integrate Seaweed-FS to existing systems.
Here are the strategies or best ways to optimize WeedFS.
Here are the strategies or best ways to optimize Seaweed-FS.
Increase concurrent writes
################################
By default, WeedFS grows the volumes automatically. For example, for no-replication volumes, there will be concurrently 7 writable volumes allocated.
By default, Seaweed-FS grows the volumes automatically. For example, for no-replication volumes, there will be concurrently 7 writable volumes allocated.
If you want to distribute writes to more volumes, you can do so by instructing WeedFS master via this URL.
If you want to distribute writes to more volumes, you can do so by instructing Seaweed-FS master via this URL.
..code-block:: bash
@ -31,14 +31,14 @@ More hard drives will give you better write/read throughput.
Gzip content
################################
WeedFS determines the file can be gzipped based on the file name extension. So if you submit a textual file, it's better to use an common file name extension, like ".txt", ".html", ".js", ".css", etc. If the name is unknown, like ".go", WeedFS will not gzip the content, but just save the content as is.
Seaweed-FS determines the file can be gzipped based on the file name extension. So if you submit a textual file, it's better to use an common file name extension, like ".txt", ".html", ".js", ".css", etc. If the name is unknown, like ".go", Seaweed-FS will not gzip the content, but just save the content as is.
You can also manually gzip content before submission. If you do so, make sure the submitted file has file name with ends with ".gz". For example, "my.css" can be gzipped to "my.css.gz" and sent to WeedFS. When retrieving the content, if the http client supports "gzip" encoding, the gzipped content would be sent back. Otherwise, the unzipped content would be sent back.
You can also manually gzip content before submission. If you do so, make sure the submitted file has file name with ends with ".gz". For example, "my.css" can be gzipped to "my.css.gz" and sent to Seaweed-FS. When retrieving the content, if the http client supports "gzip" encoding, the gzipped content would be sent back. Otherwise, the unzipped content would be sent back.
Memory consumption
#################################
For volume servers, the memory consumption is tightly related to the number of files. For example, one 32G volume can easily have 1.5 million files if each file is only 20KB. To store the 1.5 million entries of meta data in memory, currently WeedFS consumes 36MB memory, about 24bytes per entry in memory. So if you allocate 64 volumes(2TB), you would need 2~3GB memory. However, if the average file size is larger, say 200KB, only 200~300MB memory is needed.
For volume servers, the memory consumption is tightly related to the number of files. For example, one 32G volume can easily have 1.5 million files if each file is only 20KB. To store the 1.5 million entries of meta data in memory, currently Seaweed-FS consumes 36MB memory, about 24bytes per entry in memory. So if you allocate 64 volumes(2TB), you would need 2~3GB memory. However, if the average file size is larger, say 200KB, only 200~300MB memory is needed.
Theoretically the memory consumption can go even lower by compacting since the file ids are mostly monotonically increasing. I did not invest time on that yet since the memory consumption, 24bytes/entry(including uncompressed 8bytes file id, 4 bytes file size, plus additional map data structure cost) is already pretty low. But I welcome any one to compact these data in memory even more efficiently.
@ -106,7 +106,7 @@ In case you need to delete them later, you can go to the volume servers and dele
Logging
##############################
When going to production, you will want to collect the logs. WeedFS uses glog. Here are some examples:
When going to production, you will want to collect the logs. Seaweed-FS uses glog. Here are some examples:
Weed-FS can support replication. The replication is implemented not on file level, but on volume level.
Seaweed-FS can support replication. The replication is implemented not on file level, but on volume level.
How to use
###################################
@ -58,7 +58,7 @@ Each replication type will physically create x+y+z+1 copies of volume data files
Example topology configuration
###################################
The WeedFS master server tries to read the default topology configuration file are read from /etc/weedfs/weedfs.conf, if it exists. The topology setting to configure data center and racks file format is as this.
The Seaweed-FS master server tries to read the default topology configuration file are read from /etc/weedfs/weedfs.conf, if it exists. The topology setting to configure data center and racks file format is as this.