Stuart P. Bentley 10 years ago
parent
commit
79127b267b
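  1. README.md (16 lines changed)
  2. docs/Makefile (177 lines changed)
  3. docs/api.rst (293 lines changed)
  4. docs/benchmarks.rst (169 lines changed)
  5. docs/changelist.rst (278 lines changed)
  6. docs/clients.rst (44 lines changed)
  7. docs/conf.py (266 lines changed)
  8. docs/directories.rst (136 lines changed)
  9. docs/distributed_filer.rst (118 lines changed)
  10. docs/failover.rst (82 lines changed)
  11. docs/gettingstarted.rst (140 lines changed)
  12. docs/index.rst (32 lines changed)
  13. docs/make.bat (242 lines changed)
  14. docs/optimization.rst (114 lines changed)
  15. docs/replication.rst (98 lines changed)
  16. docs/requirements.txt (1 line changed)
  17. docs/ttl.rst (85 lines changed)
  18. docs/usecases.rst (114 lines changed)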

16
README.md

@ -3,7 +3,7 @@ Seaweed File System
[![Build Status](https://travis-ci.org/chrislusf/weed-fs.svg?branch=master)](https://travis-ci.org/chrislusf/weed-fs)
[![GoDoc](https://godoc.org/github.com/chrislusf/weed-fs/go?status.svg)](https://godoc.org/github.com/chrislusf/weed-fs/go)
[![RTD](https://readthedocs.org/projects/weed-fs/badge/?version=latest)](http://weed-fs.readthedocs.org/en/latest/)
[![Wiki](https://img.shields.io/badge/docs-wiki-blue.svg)](https://github.com/chrislusf/weed-fs/wiki)
## Introduction
@ -17,9 +17,9 @@ Instead of supporting full POSIX file system semantics, Seaweed-FS choose to imp
Instead of managing all file metadata in a central master, Seaweed-FS chooses to manage file volumes in the central master, and lets volume servers manage files and their metadata. This relieves concurrency pressure from the central master and spreads file metadata into the volume servers' memories, allowing faster file access with just one disk read operation!
Seaweed-FS is modeled after [Facebook's Haystack design paper](http://www.usenix.org/event/osdi10/tech/full_papers/Beaver.pdf).
Seaweed-FS uses only 40 bytes of disk storage for each file's metadata. It is so simple, with O(1) disk reads, that you are welcome to challenge the performance with your actual use cases.
![](https://api.bintray.com/packages/chrislusf/Weed-FS/seaweed/images/download.png)
@ -39,7 +39,7 @@ http://groups.google.com/group/weed-file-system Seaweed File System Discussion G
* Support Etag, Accept-Range, Last-Modified, etc.
## Example Usage
By default, the master node runs on port 9333, and the volume nodes run on port 8080.
Here I will start one master node, and two volume nodes on ports 8080 and 8081. Ideally, they should be started on different machines. Here I just use localhost as an example.
Seaweed-FS uses HTTP REST operations to write, read, and delete. The results are returned in JSON or JSONP format.
@ -55,7 +55,7 @@ Seaweed-FS uses HTTP REST operations to write, read, delete. The return results
```
> weed volume -dir="/tmp/data1" -max=5 -mserver="localhost:9333" -port=8080 &
> weed volume -dir="/tmp/data2" -max=10 -mserver="localhost:9333" -port=8081 &
> weed volume -dir="/tmp/data2" -max=10 -mserver="localhost:9333" -port=8081 &
```
@ -75,7 +75,7 @@ Second, to store the file content, send a HTTP multipart PUT or POST request to
{"size": 43234}
```
To update a file, send another PUT or POST request with the updated file content.
For deletion, send an HTTP DELETE request to the same `url + '/' + fid` URL:
@ -89,7 +89,7 @@ The number 3 here, is a volume id. After the comma, it's one file key, 01, and a
The volume id is an unsigned 32-bit integer. The file key is an unsigned 64-bit integer. The file cookie is an unsigned 32-bit integer, used to prevent URL guessing.
The file key and file cookie are both coded in hex. You can store the <volume id, file key, file cookie> tuple in your own format, or simply store the fid as a string.
If stored as a string, in theory, you would need 8+1+16+8=33 bytes. A char(33) would be enough, if not more than enough, since most usage would not need 2^32 volumes.
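As a rough illustration of the fid layout described above, the following Python sketch splits a fid string into its three parts. The function name `parse_fid` is hypothetical. Treating the volume id as decimal and the cookie as the last 8 hex digits are assumptions derived from the sizes stated above, not taken from the Seaweed-FS source.

```python
def parse_fid(fid):
    """Split a fid such as "3,01637037d6" into (volume_id, file_key, cookie)."""
    volume_part, _, rest = fid.partition(",")
    # The cookie is a 32-bit value, i.e. 8 hex digits; the key must be non-empty.
    if len(rest) <= 8:
        raise ValueError("fid must look like '<volume id>,<key hex><cookie hex>'")
    volume_id = int(volume_part)      # assumed to be written in decimal
    file_key = int(rest[:-8], 16)     # everything before the cookie, in hex
    cookie = int(rest[-8:], 16)       # last 8 hex digits
    return volume_id, file_key, cookie


volume_id, file_key, cookie = parse_fid("3,01637037d6")
print(volume_id, file_key, format(cookie, "08x"))
```

Storing the three integers separately is more compact than the string form, at the cost of having to re-assemble the fid before each request.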
@ -269,7 +269,7 @@ step 2: also you may need to install Mercurial by following the instructions bel
http://mercurial.selenic.com/downloads
step 3: download, compile, and install the project by executing the following command:
go get github.com/chrislusf/weed-fs/go/weed

177
docs/Makefile

@ -1,177 +0,0 @@
# Makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build
# User-friendly check for sphinx-build
ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
endif
# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext
help:
@echo "Please use \`make <target>' where <target> is one of"
@echo " html to make standalone HTML files"
@echo " dirhtml to make HTML files named index.html in directories"
@echo " singlehtml to make a single large HTML file"
@echo " pickle to make pickle files"
@echo " json to make JSON files"
@echo " htmlhelp to make HTML files and a HTML help project"
@echo " qthelp to make HTML files and a qthelp project"
@echo " devhelp to make HTML files and a Devhelp project"
@echo " epub to make an epub"
@echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
@echo " latexpdf to make LaTeX files and run them through pdflatex"
@echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
@echo " text to make text files"
@echo " man to make manual pages"
@echo " texinfo to make Texinfo files"
@echo " info to make Texinfo files and run them through makeinfo"
@echo " gettext to make PO message catalogs"
@echo " changes to make an overview of all changed/added/deprecated items"
@echo " xml to make Docutils-native XML files"
@echo " pseudoxml to make pseudoxml-XML files for display purposes"
@echo " linkcheck to check all external links for integrity"
@echo " doctest to run all doctests embedded in the documentation (if enabled)"
clean:
rm -rf $(BUILDDIR)/*
html:
$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
dirhtml:
$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
singlehtml:
$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
@echo
@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
pickle:
$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
@echo
@echo "Build finished; now you can process the pickle files."
json:
$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
@echo
@echo "Build finished; now you can process the JSON files."
htmlhelp:
$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
@echo
@echo "Build finished; now you can run HTML Help Workshop with the" \
".hhp project file in $(BUILDDIR)/htmlhelp."
qthelp:
$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
@echo
@echo "Build finished; now you can run "qcollectiongenerator" with the" \
".qhcp project file in $(BUILDDIR)/qthelp, like this:"
@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/weed-fs.qhcp"
@echo "To view the help file:"
@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/weed-fs.qhc"
devhelp:
$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
@echo
@echo "Build finished."
@echo "To view the help file:"
@echo "# mkdir -p $$HOME/.local/share/devhelp/weed-fs"
@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/weed-fs"
@echo "# devhelp"
epub:
$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
@echo
@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
latex:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo
@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
@echo "Run \`make' in that directory to run these through (pdf)latex" \
"(use \`make latexpdf' here to do that automatically)."
latexpdf:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo "Running LaTeX files through pdflatex..."
$(MAKE) -C $(BUILDDIR)/latex all-pdf
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
latexpdfja:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo "Running LaTeX files through platex and dvipdfmx..."
$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
text:
$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
@echo
@echo "Build finished. The text files are in $(BUILDDIR)/text."
man:
$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
@echo
@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
texinfo:
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
@echo
@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
@echo "Run \`make' in that directory to run these through makeinfo" \
"(use \`make info' here to do that automatically)."
info:
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
@echo "Running Texinfo files through makeinfo..."
make -C $(BUILDDIR)/texinfo info
@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
gettext:
$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
@echo
@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
changes:
$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
@echo
@echo "The overview file is in $(BUILDDIR)/changes."
linkcheck:
$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
@echo
@echo "Link check complete; look for any errors in the above output " \
"or in $(BUILDDIR)/linkcheck/output.txt."
doctest:
$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
@echo "Testing of doctests in the sources finished, look at the " \
"results in $(BUILDDIR)/doctest/output.txt."
xml:
$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
@echo
@echo "Build finished. The XML files are in $(BUILDDIR)/xml."
pseudoxml:
$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
@echo
@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."

293
docs/api.rst

@ -1,293 +0,0 @@
API
===================================
Master server
###################################
You can append &pretty=y to any HTTP API request to see formatted JSON output.
Assign a file key
***********************************
.. code-block:: bash
# Basic Usage:
curl http://localhost:9333/dir/assign
{"count":1,"fid":"3,01637037d6","url":"127.0.0.1:8080",
"publicUrl":"localhost:8080"}
# To assign with a specific replication type:
curl "http://localhost:9333/dir/assign?replication=001"
# To specify how many file ids to reserve
curl "http://localhost:9333/dir/assign?count=5"
# To assign a specific data center
curl "http://localhost:9333/dir/assign?dataCenter=dc1"
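A client usually combines the `url` and `fid` fields of the assign response into the address it uploads to. The sketch below does this for the sample response shown above. The helper name `upload_url` and the `http://` scheme prefix are assumptions made for illustration.

```python
import json

# Sample response copied from the basic /dir/assign example above.
assign_response = (
    '{"count":1,"fid":"3,01637037d6","url":"127.0.0.1:8080",'
    '"publicUrl":"localhost:8080"}'
)


def upload_url(response_text):
    """Build the volume-server URL for the file id assigned by the master."""
    assignment = json.loads(response_text)
    return "http://" + assignment["url"] + "/" + assignment["fid"]


print(upload_url(assign_response))
```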
Lookup volume
***********************************
Look up a volume to find out whether it has moved.
.. code-block:: bash
curl "http://localhost:9333/dir/lookup?volumeId=3&pretty=y"
{
"locations": [
{
"publicUrl": "localhost:8080",
"url": "localhost:8080"
}
]
}
# Other usages:
# You can actually use the file id to lookup
curl "http://localhost:9333/dir/lookup?volumeId=3,01637037d6"
# If you know the collection, specify it since it will be a little faster
curl "http://localhost:9333/dir/lookup?volumeId=3&collection=turbo"
Force garbage collection
***********************************
If your system has many deletions, the disk space of deleted files is not reclaimed synchronously. A background job checks volume disk usage. If the empty space exceeds the threshold (0.3 by default), the vacuum job makes the volume read-only, creates a new volume containing only the existing files, and switches over to the new volume. If you are impatient or doing some testing, you can vacuum the unused space this way:
.. code-block:: bash
curl "http://localhost:9333/vol/vacuum"
curl "http://localhost:9333/vol/vacuum?garbageThreshold=0.4"
The garbageThreshold=0.4 parameter is optional and does not change the default threshold. You can start the master server with a different default garbageThreshold.
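The threshold check can be pictured with a small Python sketch. This is only an illustration: the function name `needs_vacuum` is hypothetical, and computing the ratio as deleted bytes divided by volume size is an assumption about how the garbage share is measured.

```python
DEFAULT_GARBAGE_THRESHOLD = 0.3  # default value stated in the text above


def needs_vacuum(volume_size, deleted_bytes, threshold=DEFAULT_GARBAGE_THRESHOLD):
    """Return True when the garbage share of a volume exceeds the threshold."""
    if volume_size <= 0:
        return False  # an empty volume has nothing to reclaim
    # Assumed definition of the garbage ratio: deleted bytes / total bytes.
    return deleted_bytes / volume_size > threshold


print(needs_vacuum(1000, 350))                 # 35% garbage, default threshold
print(needs_vacuum(1000, 350, threshold=0.4))  # same volume, stricter threshold
```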
Pre-Allocate Volumes
***********************************
One volume serves one write at a time. If you need to increase concurrency, you can pre-allocate lots of volumes.
.. code-block:: bash
curl "http://localhost:9333/vol/grow?replication=000&count=4"
{"count":4}
# specify a collection
curl "http://localhost:9333/vol/grow?collection=turbo&count=4"
# specify data center
curl "http://localhost:9333/vol/grow?dataCenter=dc1&count=4"
This generates 4 empty volumes.
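When these requests are issued from a script, the query string can be assembled instead of typed by hand. The sketch below rebuilds the first request above. The helper name `grow_url` is hypothetical; only the `/vol/grow` path and the parameter names come from the examples.

```python
from urllib.parse import urlencode


def grow_url(master, **params):
    """Build a /vol/grow request URL for the given master address."""
    return "http://" + master + "/vol/grow?" + urlencode(params)


print(grow_url("localhost:9333", replication="000", count=4))
print(grow_url("localhost:9333", collection="turbo", count=4))
```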
Check System Status
***********************************
.. code-block:: bash
curl "http://10.0.2.15:9333/cluster/status?pretty=y"
{
"IsLeader": true,
"Leader": "10.0.2.15:9333",
"Peers": [
"10.0.2.15:9334",
"10.0.2.15:9335"
]
}
curl "http://localhost:9333/dir/status?pretty=y"
{
"Topology": {
"DataCenters": [
{
"Free": 3,
"Id": "dc1",
"Max": 7,
"Racks": [
{
"DataNodes": [
{
"Free": 3,
"Max": 7,
"PublicUrl": "localhost:8080",
"Url": "localhost:8080",
"Volumes": 4
}
],
"Free": 3,
"Id": "DefaultRack",
"Max": 7
}
]
},
{
"Free": 21,
"Id": "dc3",
"Max": 21,
"Racks": [
{
"DataNodes": [
{
"Free": 7,
"Max": 7,
"PublicUrl": "localhost:8081",
"Url": "localhost:8081",
"Volumes": 0
}
],
"Free": 7,
"Id": "rack1",
"Max": 7
},
{
"DataNodes": [
{
"Free": 7,
"Max": 7,
"PublicUrl": "localhost:8082",
"Url": "localhost:8082",
"Volumes": 0
},
{
"Free": 7,
"Max": 7,
"PublicUrl": "localhost:8083",
"Url": "localhost:8083",
"Volumes": 0
}
],
"Free": 14,
"Id": "DefaultRack",
"Max": 14
}
]
}
],
"Free": 24,
"Max": 28,
"layouts": [
{
"collection": "",
"replication": "000",
"writables": [
1,
2,
3,
4
]
}
]
},
"Version": "0.47"
}
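The topology in the `/dir/status` output is nested as data centers, racks, and data nodes. The following sketch walks that structure to list the free volume slots per node. It uses a trimmed copy of the sample output above, and the function name `free_slots_by_node` is hypothetical.

```python
import json

# Trimmed version of the /dir/status sample above (two of the four nodes).
status_text = """
{"Topology": {"DataCenters": [
  {"Id": "dc1", "Racks": [
    {"Id": "DefaultRack", "DataNodes": [
      {"Url": "localhost:8080", "Free": 3, "Max": 7, "Volumes": 4}]}]},
  {"Id": "dc3", "Racks": [
    {"Id": "rack1", "DataNodes": [
      {"Url": "localhost:8081", "Free": 7, "Max": 7, "Volumes": 0}]}]}
]}}
"""


def free_slots_by_node(status):
    """Map each data node URL to its number of free volume slots."""
    slots = {}
    for data_center in status["Topology"]["DataCenters"]:
        for rack in data_center["Racks"]:
            for node in rack["DataNodes"]:
                slots[node["Url"]] = node["Free"]
    return slots


slots = free_slots_by_node(json.loads(status_text))
print(slots)
print(sum(slots.values()))
```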
Volume Server
###################################
Upload File
***********************************
.. code-block:: bash
curl -F file=@/home/chris/myphoto.jpg http://127.0.0.1:8080/3,01637037d6
{"size": 43234}
The size returned is the size stored on Seaweed-FS. Sometimes the file is automatically gzipped based on the MIME type.
Upload File Directly
***********************************
.. code-block:: bash
curl -F file=@/home/chris/myphoto.jpg http://localhost:9333/submit
{"fid":"3,01fbe0dc6f1f38","fileName":"myphoto.jpg","fileUrl":"localhost:8080/3,01fbe0dc6f1f38","size":68231}
This API is just for convenience. The master server gets a file id and stores the file on the right volume server.
It is a convenience API and does not support different parameters when assigning the file id. (Or you can add the support and send a pull request.)
Delete File
***********************************
.. code-block:: bash
curl -X DELETE http://127.0.0.1:8080/3,01637037d6
Create a specific volume on a specific volume server
*****************************************************
.. code-block:: bash
curl "http://localhost:8080/admin/assign_volume?replication=000&volume=3"
This generates volume 3 on this volume server.
If you use another replication type, e.g. 001, you need to do the same on the other volume servers to create the mirrored volumes.
Check Volume Server Status
***********************************
.. code-block:: bash
curl "http://localhost:8080/status?pretty=y"
{
"Version": "0.34",
"Volumes": [
{
"Id": 1,
"Size": 1319688,
"RepType": "000",
"Version": 2,
"FileCount": 276,
"DeleteCount": 0,
"DeletedByteCount": 0,
"ReadOnly": false
},
{
"Id": 2,
"Size": 1040962,
"RepType": "000",
"Version": 2,
"FileCount": 291,
"DeleteCount": 0,
"DeletedByteCount": 0,
"ReadOnly": false
},
{
"Id": 3,
"Size": 1486334,
"RepType": "000",
"Version": 2,
"FileCount": 301,
"DeleteCount": 2,
"DeletedByteCount": 0,
"ReadOnly": false
},
{
"Id": 4,
"Size": 8953592,
"RepType": "000",
"Version": 2,
"FileCount": 320,
"DeleteCount": 2,
"DeletedByteCount": 0,
"ReadOnly": false
},
{
"Id": 5,
"Size": 70815851,
"RepType": "000",
"Version": 2,
"FileCount": 309,
"DeleteCount": 1,
"DeletedByteCount": 0,
"ReadOnly": false
},
{
"Id": 6,
"Size": 1483131,
"RepType": "000",
"Version": 2,
"FileCount": 301,
"DeleteCount": 1,
"DeletedByteCount": 0,
"ReadOnly": false
},
{
"Id": 7,
"Size": 46797832,
"RepType": "000",
"Version": 2,
"FileCount": 292,
"DeleteCount": 0,
"DeletedByteCount": 0,
"ReadOnly": false
}
]
}

169
docs/benchmarks.rst

@ -1,169 +0,0 @@
Benchmarks
======================
Do we really need benchmarks? People always use benchmarks to compare systems.
But benchmarks are misleading. The resources, e.g., CPU, disk, memory, and network,
all matter a lot. And with Seaweed File System, single node vs. multiple nodes and
benchmarking on one machine vs. several machines all matter a lot.
Here are the steps to run a benchmark if you really need some numbers.
Unscientific Single machine benchmarking
##################################################
I start the weed servers in one console for simplicity. It is better to run the servers in different consoles.
For more realistic tests, please start them on different machines.
.. code-block:: bash
# prepare directories
mkdir 3 4 5
# start 3 servers
./weed server -dir=./3 -master.port=9333 -volume.port=8083 &
./weed volume -dir=./4 -port=8084 &
./weed volume -dir=./5 -port=8085 &
./weed benchmark -server=localhost:9333
What does the test do?
#############################
By default, the benchmark command starts by writing 1 million files, each 1KB in size, uncompressed.
For each file, one request is sent to assign a file key, and a second request is sent to post the file to the volume server.
The written file keys are stored in a temp file.
Then the benchmark command reads the list of file keys and randomly reads 1 million files.
For each volume, the volume id lookup is cached, so only a few requests are needed to look up volume ids,
and all the remaining requests fetch the file content.
Many options are configurable. Please check the help content:
.. code-block:: bash
./weed benchmark -h
Different Benchmark Target
###############################
The default "weed benchmark" uses 1 million 1KB files. This stresses the number of files per second.
Increasing the file size to 100KB or more can show a much higher I/O throughput in KB/second.
My own unscientific single machine results
###################################################
My Own Results on Mac Book with Solid State Disk, CPU: 1 Intel Core i7 at 2.2GHz.
.. code-block:: bash
Write 1 million 1KB file:
Concurrency Level: 64
Time taken for tests: 182.456 seconds
Complete requests: 1048576
Failed requests: 0
Total transferred: 1073741824 bytes
Requests per second: 5747.01 [#/sec]
Transfer rate: 5747.01 [Kbytes/sec]
Connection Times (ms)
min avg max std
Total: 0.3 10.9 430.9 5.7
Percentage of the requests served within a certain time (ms)
50% 10.2 ms
66% 12.0 ms
75% 12.6 ms
80% 12.9 ms
90% 14.0 ms
95% 14.9 ms
98% 16.2 ms
99% 17.3 ms
100% 430.9 ms
Randomly read 1 million files:
Concurrency Level: 64
Time taken for tests: 80.732 seconds
Complete requests: 1048576
Failed requests: 0
Total transferred: 1073741824 bytes
Requests per second: 12988.37 [#/sec]
Transfer rate: 12988.37 [Kbytes/sec]
Connection Times (ms)
min avg max std
Total: 0.0 4.7 254.3 6.3
Percentage of the requests served within a certain time (ms)
50% 2.6 ms
66% 2.9 ms
75% 3.7 ms
80% 4.7 ms
90% 10.3 ms
95% 16.6 ms
98% 26.3 ms
99% 34.8 ms
100% 254.3 ms
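The two rate figures in these reports follow from the totals and the elapsed time. The sketch below recomputes them for the write run above. The function names are hypothetical, and a KB is assumed to be 1024 bytes. Because the reported time is rounded, recomputed values may differ from the printed ones in the last digit.

```python
def requests_per_second(complete_requests, seconds):
    """Completed requests divided by the elapsed time."""
    return complete_requests / seconds


def transfer_rate_kb(total_bytes, seconds):
    """Transferred kilobytes per second, assuming 1 KB = 1024 bytes."""
    return total_bytes / 1024 / seconds


# Figures taken from the "Write 1 million 1KB file" report above.
write_rps = requests_per_second(1048576, 182.456)
write_rate = transfer_rate_kb(1073741824, 182.456)
print(round(write_rps, 2), round(write_rate, 2))
```

With 1KB files, both numbers coincide, which is why the report shows the same value for requests per second and transfer rate.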
My own replication 001 single machine results
##############################################
Create benchmark volumes directly
.. code-block:: bash
curl "http://localhost:9333/vol/grow?collection=benchmark&count=3&replication=001&pretty=y"
# Later, after finishing the test, remove the benchmark collection
curl "http://localhost:9333/col/delete?collection=benchmark&pretty=y"
Write 1 million 1KB files results:
Concurrency Level: 64
Time taken for tests: 174.949 seconds
Complete requests: 1048576
Failed requests: 0
Total transferred: 1073741824 bytes
Requests per second: 5993.62 [#/sec]
Transfer rate: 5993.62 [Kbytes/sec]
Connection Times (ms)
min avg max std
Total: 0.3 10.4 296.6 4.4
Percentage of the requests served within a certain time (ms)
50% 9.7 ms
66% 11.5 ms
75% 12.1 ms
80% 12.4 ms
90% 13.4 ms
95% 14.3 ms
98% 15.5 ms
99% 16.7 ms
100% 296.6 ms
Randomly read results:
Concurrency Level: 64
Time taken for tests: 53.987 seconds
Complete requests: 1048576
Failed requests: 0
Total transferred: 1073741824 bytes
Requests per second: 19422.81 [#/sec]
Transfer rate: 19422.81 [Kbytes/sec]
Connection Times (ms)
min avg max std
Total: 0.0 3.0 256.9 3.8
Percentage of the requests served within a certain time (ms)
50% 2.7 ms
66% 2.9 ms
75% 3.2 ms
80% 3.5 ms
90% 4.4 ms
95% 5.6 ms
98% 7.4 ms
99% 9.4 ms
100% 256.9 ms
How can replication 001 write faster than no replication?
I could not tell. Very likely, the computer was in turbo mode.
I cannot reproduce it consistently either. I posted the numbers here just to illustrate that numbers lie.
Don't quote the exact numbers; just getting an idea of the performance should be good enough.

278
docs/changelist.rst

@ -1,278 +0,0 @@
Change List
===================================
Introduction
############
This file contains a list of recent changes, important features, usage changes, data format changes, etc. Do read this if you upgrade.
v0.68
#####
1. The filer supports storing the file-to-file_id mapping in remote key-value stores (Redis, Cassandra), so multiple filers are supported.
v0.67
#####
1. Increase "weed benchmark" performance to pump in more data. The bottleneck is on the client side. Duh...
v0.65
#####
1. Reset the cluster configuration if "-peers" is not empty.
v0.64
#####
1. Add TTL support!
2. filer: resolve directory log file error, avoid possible race condition
v0.63
#####
1. Compiled with Go 1.3.1 to fix a rare crashing issue.
v0.62
#####
1. Add support for Etag.
2. Add /admin/mv to move a file or a folder.
3. Add client Go API to pre-process the images.
v0.61
#####
1. Reduce memory requirements for "weed fix"
2. Guess mime type by file name extensions when stored mime type is "application/octstream"
3. Added simple volume id lookup caching expiring by time.
v0.60
#####
Fix a missing-file error caused by the .idx file being overwritten. The problem shows up after the weed volume server has been restarted twice, although the actual .idx file may already have been overwritten on the second restart.
To fix this issue, please run "weed fix -dir=... -volumeId=..." to re-generate the .idx file.
v0.59
#####
1. Add option to automatically fix jpeg picture orientation.
2. Add volume id lookup caching
3. Support Partial Content and Range Requests. http status code == 206.
v0.57
#####
Add hidden dynamic image resizing feature
Add a hidden feature: for images (jpg/png/gif), if you append the URL parameters &width=xxx, &height=xxx, or both, the image will be dynamically resized. However, resizing images causes high CPU and memory usage, so it is not recommended except for special use cases. For that reason it is not documented anywhere else.
v0.56 Major Command line options change
#####
Adjust command line options.
1. switch to use -publicIp instead of -publicUrl
2. -ip can be empty. It will listen to all available interfaces.
3. For "weed server", these options are changed:
- -masterPort => -master.port
- -peers => -master.peers
- -mdir => -master.dir
- -volumeSizeLimitMB => -master.volumeSizeLimitMB
- -conf => -master.conf
- -defaultReplicaPlacement => -master.defaultReplicaPlacement
- -port => -volume.port
- -max => -volume.max
v0.55 Recursive folder deletion for Filer
#####
Now folders with sub folders or files can be deleted recursively.
Also, for filer, avoid showing files under the first created directory when listing the root directory.
v0.54 Misc improvements
#####
No need to persist metadata for master sequence number generation. This avoids possible issues where files are lost due to duplicate sequence numbers generated in rare cases.
More robust handling of "peers" in master node clustering mode.
Added logging instructions.
v0.53 Miscellaneous improvements
#####
Added retry logic to wait for cluster peers during cluster bootstrapping. Previously, cluster bootstrapping was ordered, which made it tricky to deploy automatically and repeatedly. The fix makes the commands repeatable.
Also, when growing volumes, additional preferred "rack" and "dataNode" parameters are provided, which work together with the existing "dataCenter" parameter.
Fix an important bug where settings for non-"000" replications were read back wrong if the volume server was restarted.
v0.52 Added "filer" server
#####
A "weed filer" server is added, to provide more "common" file storage. Currently the fullFileName-to-fileId mapping is stored with an efficient embedded leveldb. So it's not linearly scalable yet. But it can handle LOTS of files.
.. code-block:: bash
# POST a file and read it back
curl -F "filename=@README.md" "http://localhost:8888/path/to/sources/"
curl "http://localhost:8888/path/to/sources/README.md"
# POST a file with a new name and read it back
curl -F "filename=@Makefile" "http://localhost:8888/path/to/sources/new_name"
curl "http://localhost:8888/path/to/sources/new_name"
# list sub folders and files
curl "http://localhost:8888/path/to/sources/?pretty=y"
v0.51 Idle Timeout
#####
Previously the timeout setting was "-readTimeout", which was the time limit for the whole HTTP connection. This was inconvenient for large files or slow internet connections. Now this option is replaced with "-idleTimeout", which defaults to 10 seconds. Ideally, you should not need to tweak it based on your use case.
v0.50 Improved Locking
#####
1. All read operations switched to thread-safe pread; there are no read locks now.
2. When vacuuming large volumes, a lock was preventing heartbeats to the master node. This is fixed now.
3. Fix volume compaction error for collections.
v0.49 Bug Fixes
#####
With the new benchmark tool to bombard the system, many bugs were found and fixed, especially in clustering and HTTP connection reuse.
v0.48 added benchmark command!
#####
Benchmark! Enough said.
v0.47 Improving replication
#####
Support more replication types.
v0.46 Adding failover master server
#####
Automatically fail over master servers!
v0.46 Add "weed server" command
#####
Now you can start one master server and one volume server in just one command!
.. code-block:: bash
weed server
v0.45 Add support for extra large file
#####
For extra large files, this example will split the file into 100MB chunks.
.. code-block:: bash
weed upload -maxMB=100 the_file_name
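To make the chunking concrete, here is a minimal Python sketch of how a file of a given size could be divided into pieces of at most `maxMB` megabytes. The function name `chunk_ranges` is hypothetical, and 1 MB is assumed to be 1024 * 1024 bytes. This illustrates the idea and is not the actual `weed upload` implementation.

```python
def chunk_ranges(file_size, max_mb=100):
    """Yield (offset, length) pairs covering file_size bytes in chunks of at most max_mb MB."""
    chunk_size = max_mb * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
    offset = 0
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        yield offset, length
        offset += length


# A 250 MB file splits into two full chunks and one 50 MB remainder.
ranges = list(chunk_ranges(250 * 1024 * 1024))
print(len(ranges))
print(ranges[-1])
```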
Also added a "download" command, for simple files or chunked files.
.. code-block:: bash
weed download file_id [file_id3](file_id2)
v0.34 Add support for multiple directories on volume server
#####
For volume server, add support for multiple folders and multiple max limit. For example:
.. code-block:: bash
weed volume -dir=folder1,folder2,folder3 -max=7,8,9
v0.33 Add Nicer URL support
#####
For HTTP GET request
.. code-block:: bash
http://localhost:8080/3,01637037d6
Can also be retrieved by
.. code-block:: bash
http://localhost:8080/3/01637037d6/my_preferred_name.jpg
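The two URL forms differ only in how the fid is written, so one can be derived from the other. The sketch below converts a comma-style fid into the path-style URL shown above. The helper name `nicer_url` is hypothetical.

```python
def nicer_url(host, fid, file_name):
    """Rewrite a "3,01637037d6" style fid into the /volume/key/name URL form."""
    volume_id, _, key_and_cookie = fid.partition(",")
    return "http://" + host + "/" + volume_id + "/" + key_and_cookie + "/" + file_name


print(nicer_url("localhost:8080", "3,01637037d6", "my_preferred_name.jpg"))
```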
v0.32 Add support for Last-Modified header
#####
The last modified timestamp is stored with 5 additional bytes.
Return http code 304 if the file is not modified.
Also, writes are more robust with the fix for issue #26.
v0.31 Allocate File Key on specific data center
#####
Volume servers can start with a specific data center name.
.. code-block:: bash
weed volume -dir=/tmp/1 -port=8080 -dataCenter=dc1
weed volume -dir=/tmp/2 -port=8081 -dataCenter=dc2
Or the master server can determine the data center via volume server's IP address and settings in weed.conf file.
Now when requesting a file key, an optional "dataCenter" parameter can limit the assigned volume to the specific data center. For example, this specif
.. code-block:: bash
http://localhost:9333/dir/assign?dataCenter=dc1
v0.26 Storing File Name and Mime Type
#####
In order to keep one single disk read for each file, a new storage format is implemented to store: is gzipped or not, file name and mime type (used when downloading files), and possibly other future new attributes. The volumes with old storage format are treated as read only and deprecated.
Also, you can pre-gzip and submit your file directly, for example, gzip "my.css" into "my.css.gz", and submit. In this case, "my.css" will be stored as the file name. This should save some transmission time, and allow you to force gzipped storage or customize the gzip compression level.
v0.25 Adding reclaiming garbage spaces
#####
Garbage spaces are reclaimed by an automatic compacting process. Garbage spaces are generated when updating or deleting files. If they exceed a configurable threshold, 0.3 by default (meaning 30% of the used disk space is garbage), the volume will be marked as readonly, compacted and garbage spaces are reclaimed, and then marked as writable.
v0.19 Adding rack and data center aware replication
#####
Now when you have one rack, or multiple racks, or multiple data centers, you can choose your own replication strategy.
v0.18 Detect disconnected volume servers
#####
Disconnected volume servers are not assigned when generating file keys. By default, volume servers send a heartbeat to the master server every 5~10 seconds. The master considers a volume server disconnected after 5 heartbeat intervals, or 25 seconds by default.
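The timeout rule can be sketched in a few lines of Python. The constant and function names are hypothetical, and the 5-second interval is an assumption chosen so that 5 intervals give the 25 seconds mentioned above.

```python
HEARTBEAT_INTERVAL_SECONDS = 5  # assumed: lower end of the 5~10 second range
MISSED_INTERVALS_ALLOWED = 5    # 5 heartbeat intervals, as stated above


def is_disconnected(last_heartbeat, now,
                    interval=HEARTBEAT_INTERVAL_SECONDS,
                    allowed=MISSED_INTERVALS_ALLOWED):
    """Return True when no heartbeat arrived within interval * allowed seconds."""
    return now - last_heartbeat > interval * allowed


print(is_disconnected(last_heartbeat=0, now=20))  # still within 25 seconds
print(is_disconnected(last_heartbeat=0, now=26))  # past the 25 second limit
```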
v0.16 Change to single executable file to do everything
#####
If you are using v0.15 or earlier, you would use
.. code-block:: bash
>weedvolume -dir="/tmp" -volumes=0-4 -mserver="localhost:9333" -port=8080 -publicUrl="localhost:8080"
With v0.16 or later, you would need to do this instead:
.. code-block:: bash
>weed volume -dir="/tmp" -volumes=0-4 -mserver="localhost:9333" -port=8080 -publicUrl="localhost:8080"
More new commands, in addition to "server", "volume", "fix", etc., will be added.
This provides a single deliverable file, and the file size is much smaller since Go statically compiles the commands. Combining the commands into one file avoids lots of duplication.

44
docs/clients.rst

@ -1,44 +0,0 @@
Client libraries
=====================
Clients
###################################
+---------------------------------------------------------------------------------+--------------+-----------+
| Name | Author | Language |
+=================================================================================+==============+===========+
| `WeedPHP <https://github.com/micjohnson/weed-php/>`_ | Mic Johnson | PHP |
+---------------------------------------------------------------------------------+--------------+-----------+
| `Seaweed-FS Symfony bundle <https://github.com/micjohnson/weed-php-bundle>`_ | Mic Johnson | PHP |
+---------------------------------------------------------------------------------+--------------+-----------+
| `Seaweed-FS Node.js client <https://github.com/cruzrr/node-weedfs>`_ | Aaron Blakely| Javascript|
+---------------------------------------------------------------------------------+--------------+-----------+
| `Amazon S3 API for Seaweed-FS <https://github.com/tgulacsi/s3weed>`_ | Tamás Gulácsi| Go |
+---------------------------------------------------------------------------------+--------------+-----------+
| `File store upload test <https://github.com/tgulacsi/filestore-upload-test>`_ | Tamás Gulácsi| Go |
+---------------------------------------------------------------------------------+--------------+-----------+
| `Java Seaweed-FS client <https://github.com/simplebread/WeedFSClient>`_ | Xu Zhang | Java |
+---------------------------------------------------------------------------------+--------------+-----------+
| `Java Seaweed-FS client 2 <https://github.com/zenria/Weed-FS-Java-Client>`_ | Zenria | Java |
+---------------------------------------------------------------------------------+--------------+-----------+
| `Python-weed <https://github.com/darkdarkfruit/python-weed>`_ | Darkdarkfruit| Python |
+---------------------------------------------------------------------------------+--------------+-----------+
| `Pyweed <https://github.com/utek/pyweed>`_ | Utek | Python |
+---------------------------------------------------------------------------------+--------------+-----------+
| `Camlistore blobserver Storage <https://github.com/tgulacsi/camli-weed>`_ | Tamás Gulácsi| Go |
+---------------------------------------------------------------------------------+--------------+-----------+
| `Scala Seaweed-FS client <https://github.com/chiradip/WeedFsScalaClient>`_ | Chiradip | Scala |
+---------------------------------------------------------------------------------+--------------+-----------+
| `Module for kohana <https://github.com/bububa/kohanaphp-weedfs>`_ | Bububa | PHP |
+---------------------------------------------------------------------------------+--------------+-----------+
| `Weedo <https://github.com/ginuerzh/weedo>`_ | Ginuerzh | Go |
+---------------------------------------------------------------------------------+--------------+-----------+
| `Django-weed <https://github.com/ProstoKSI/django-weed>`_ | ProstoKSI | Python |
+---------------------------------------------------------------------------------+--------------+-----------+
Projects using Seaweed-FS
###################################
* An `email River Plugin <https://github.com/medcl/elasticsearch-river-email/>`_ for Elasticsearch uses Seaweed-FS server to save attachments
Websites using Seaweed-FS
###################################
* `Email to create Web Pages <http://mailp.in/>`_ uses Seaweed-FS to save email attachments.

266
docs/conf.py

@ -1,266 +0,0 @@
# -*- coding: utf-8 -*-
#
# weed-fs documentation build configuration file, created by
# sphinx-quickstart on Wed Jul 23 12:13:18 2014.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
import sys
import os
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#sys.path.insert(0, os.path.abspath('.'))
# -- General configuration ------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = []
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix of source filenames.
source_suffix = '.rst'
# The encoding of source files.
#source_encoding = 'utf-8-sig'
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = u'seaweed-fs'
copyright = u'2015, chrislusf, ernado'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = '0.67'
# The full version, including alpha/beta/rc tags.
release = '0.67'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#language = None
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
#today = ''
# Else, today_fmt is used as the format for a strftime call.
#today_fmt = '%B %d, %Y'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ['_build']
# The reST default role (used for this markup: `text`) to use for all
# documents.
#default_role = None
# If true, '()' will be appended to :func: etc. cross-reference text.
#add_function_parentheses = True
# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
#add_module_names = True
# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
#show_authors = False
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# A list of ignored prefixes for module index sorting.
#modindex_common_prefix = []
# If true, keep warnings as "system message" paragraphs in the built documents.
#keep_warnings = False
# -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'default'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#html_theme_options = {}
# Add any paths that contain custom themes here, relative to this directory.
#html_theme_path = []
# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
#html_title = None
# A shorter title for the navigation bar. Default is the same as html_title.
#html_short_title = None
# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
#html_logo = None
# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
#html_favicon = None
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# Add any extra paths that contain custom files (such as robots.txt or
# .htaccess) here, relative to this directory. These files are copied
# directly to the root of the documentation.
#html_extra_path = []
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
#html_last_updated_fmt = '%b %d, %Y'
# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
#html_use_smartypants = True
# Custom sidebar templates, maps document names to template names.
#html_sidebars = {}
# Additional templates that should be rendered to pages, maps page names to
# template names.
#html_additional_pages = {}
# If false, no module index is generated.
#html_domain_indices = True
# If false, no index is generated.
#html_use_index = True
# If true, the index is split into individual pages for each letter.
#html_split_index = False
# If true, links to the reST sources are added to the pages.
#html_show_sourcelink = True
# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
#html_show_sphinx = True
# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
#html_show_copyright = True
# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
#html_use_opensearch = ''
# This is the file name suffix for HTML files (e.g. ".xhtml").
#html_file_suffix = None
# Output file base name for HTML help builder.
htmlhelp_basename = 'weed-fsdoc'
# -- Options for LaTeX output ---------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#'preamble': '',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
('index', 'weed-fs.tex', u'weed-fs Documentation',
u'chrislusf, ernado', 'manual'),
]
# The name of an image file (relative to this directory) to place at the top of
# the title page.
#latex_logo = None
# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
#latex_use_parts = False
# If true, show page references after internal links.
#latex_show_pagerefs = False
# If true, show URL addresses after external links.
#latex_show_urls = False
# Documents to append as an appendix to all manuals.
#latex_appendices = []
# If false, no module index is generated.
#latex_domain_indices = True
# -- Options for manual page output ---------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
('index', 'weed-fs', u'weed-fs Documentation',
[u'chrislusf, ernado'], 1)
]
# If true, show URL addresses after external links.
#man_show_urls = False
# -- Options for Texinfo output -------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
('index', 'seaweed-fs', u'seaweed-fs Documentation',
u'chrislusf, ernado', 'seaweed-fs', 'One line description of project.',
'Miscellaneous'),
]
# Documents to append as an appendix to all manuals.
#texinfo_appendices = []
# If false, no module index is generated.
#texinfo_domain_indices = True
# How to display URL addresses: 'footnote', 'no', or 'inline'.
#texinfo_show_urls = 'footnote'
# If true, do not generate a @detailmenu in the "Top" node's menu.
#texinfo_no_detailmenu = False
# on_rtd is whether we are on readthedocs.org, this line of code grabbed from docs.readthedocs.org
on_rtd = os.environ.get('READTHEDOCS', None) == 'True'
if not on_rtd: # only import and set the theme if we're building docs locally
import sphinx_rtd_theme
html_theme = 'sphinx_rtd_theme'
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]

136
docs/directories.rst

@ -1,136 +0,0 @@
Directories and files
===========================
When talking about file systems, many people would assume directories,
list files under a directory, etc. These are expected if we want to hook up
Seaweed File System with linux by FUSE, or with Hadoop, etc.
Sample usage
#####################
Two ways to start a weed filer in standalone mode:
.. code-block:: bash
# assuming you already started weed master and weed volume
weed filer
# Or assuming you have nothing started yet,
# this command starts master server, volume server, and filer in one shot.
# It's strictly the same as starting them separately.
weed server -filer=true
Now you can add/delete files, and even browse the sub directories and files
.. code-block:: bash
# POST a file and read it back
curl -F "filename=@README.md" "http://localhost:8888/path/to/sources/"
curl "http://localhost:8888/path/to/sources/README.md"
# POST a file with a new name and read it back
curl -F "filename=@Makefile" "http://localhost:8888/path/to/sources/new_name"
curl "http://localhost:8888/path/to/sources/new_name"
# list sub folders and files
curl "http://localhost:8888/path/to/sources/?pretty=y"
# if lots of files under this folder, here is a way to efficiently paginate through all of them
curl "http://localhost:8888/path/to/sources/?lastFileName=abc.txt&limit=50&pretty=y"
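The pagination parameters used in the last command can also be assembled programmatically. This is a small sketch that only builds the request URL; the helper name ``listing_url`` is made up for illustration, and the response format is not shown because it is not described here.

.. code-block:: python

    from urllib.parse import quote, urlencode


    def listing_url(filer, directory, last_file_name=None, limit=None, pretty=False):
        """Build a filer directory-listing URL with optional pagination parameters."""
        params = {}
        if last_file_name is not None:
            params["lastFileName"] = last_file_name  # resume after this file name
        if limit is not None:
            params["limit"] = limit                  # maximum entries per page
        if pretty:
            params["pretty"] = "y"
        # Directory listings are requested on a path that ends with "/".
        stripped = directory.strip("/")
        path = "/" + quote(stripped) + "/" if stripped else "/"
        query = "?" + urlencode(params) if params else ""
        return filer.rstrip("/") + path + query

To walk a large folder, pass the last file name of each page as ``last_file_name`` for the next request.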
Design
############
A common file system would use inodes to store metadata for each folder and file. The folder tree structure is usually linked, and sub folders and files are usually organized as an on-disk B+tree or a similar variation. This scales well in terms of storage, but not for fast file retrieval, because multiple disk accesses are needed just for the file metadata before even trying to get the file content.
Seaweed-FS wants to make as few disk accesses as possible, yet still be able to store a lot of file metadata. So we need to think very differently.
We can take the following steps to map a full file path to the actual data block:
.. code-block:: bash
file_parent_directory => directory_id
directory_id+fileName => file_id
file_id => data_block
Because default Seaweed-FS only provides file_id=>data_block mapping, only the first 2 steps need to be implemented.
There are several data features I noticed:
* the number of directories usually is small, or very small
* the number of files can be small, medium, large, or very large
This leads to a novel (as far as I know now) approach to organize the meta data for the directories and files separately.
A "weed filer" server provides the two missing mappings, parent_directory=>directory_id and directory_id+filename=>file_id, completing the "common" file storage interface.
Assumptions
###############
I believe these are reasonable assumptions:
* The number of directories is smaller than the number of files by one or more orders of magnitude.
* Very likely for big systems, the number of files under one particular directory can be very high, ideally unlimited, far exceeding the number of directories.
* Directory meta data is accessed very often.
Data structure
#################
These assumed differences between directories and files lead to the design decision that the metadata for directories and files should use different data structures.
* Store directories in memory
* all directories should hopefully fit in memory
* efficient to move/rename/list_directories
* Store files in a sorted string table in <dir_id/filename, file_id> format
* efficient to list_files, just simple iterator
* efficient to locate files, binary search
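The two structures above can be sketched as follows. This is a toy model under stated assumptions, not the filer's implementation: a plain dict stands in for the in-memory directory tree, a sorted Python list stands in for the on-disk sorted string table, and the class, method names, and file ids are invented for illustration.

.. code-block:: python

    import bisect


    def split_path(path):
        """Split "/a/b/" into ["a", "b"], ignoring empty segments."""
        return [part for part in path.split("/") if part]


    class FilerSketch:
        def __init__(self):
            self.dir_ids = {"/": 0}  # in memory: directory path -> directory_id
            self.keys = []           # sorted "dir_id/filename" keys
            self.file_ids = {}       # "dir_id/filename" -> file_id

        def mkdirs(self, path):
            """Create the directory and any missing parents; return its id."""
            current = ""
            for name in split_path(path):
                current = current + "/" + name
                if current not in self.dir_ids:
                    self.dir_ids[current] = len(self.dir_ids)
            return self.dir_ids[current or "/"]

        def dir_id(self, path):
            return self.dir_ids.get("/" + "/".join(split_path(path)))

        def put(self, directory, filename, file_id):
            key = "%d/%s" % (self.mkdirs(directory), filename)
            if key not in self.file_ids:
                bisect.insort(self.keys, key)  # keep the keys sorted
            self.file_ids[key] = file_id

        def find(self, directory, filename):
            """Locate one file with a binary search over the sorted keys."""
            dir_id = self.dir_id(directory)
            if dir_id is None:
                return None
            key = "%d/%s" % (dir_id, filename)
            i = bisect.bisect_left(self.keys, key)
            if i < len(self.keys) and self.keys[i] == key:
                return self.file_ids[key]
            return None

        def list_files(self, directory, last_file_name="", limit=100):
            """List files by iterating the sorted keys that share the dir_id prefix."""
            dir_id = self.dir_id(directory)
            if dir_id is None:
                return []
            prefix = "%d/" % dir_id
            start = bisect.bisect_right(self.keys, prefix + last_file_name)
            names = []
            for key in self.keys[start:]:
                if not key.startswith(prefix) or len(names) >= limit:
                    break
                names.append(key[len(prefix):])
            return names

Because all keys of one directory share the same ``dir_id/`` prefix, they sit next to each other in sorted order, which is what makes both the listing scan and the pagination by last file name cheap.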
Complexity
###################
For one file retrieval, if the file's parent directory is n folders deep, it will take n steps to navigate from the root to the file's folder. However, these O(n) steps are all in memory, so in practice it will be very fast.
For one file retrieval, the dir_id+filename=>file_id lookup will be O(logN) using LevelDB, a log-structured-merge (LSM) tree implementation. The complexity is the same as for a B-Tree.
For file listing under a particular directory, the listing in LevelDB is just a simple scan, since the records in LevelDB are already sorted. For a B-Tree, this may involve multiple disk seeks to jump through.
For directory renaming, it is just a trivial change of the name or parent of the directory. Since the directory_id stays the same, there is no change to the files' metadata.
For file renaming, it is just a trivial delete and then add of a row in LevelDB.
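Why a directory rename leaves the file metadata untouched can be shown with a small sketch. The record layout and names below are assumptions made for illustration only; the point is that files are keyed by directory_id, not by path.

.. code-block:: python

    class DirectoryTable:
        """Hypothetical in-memory directory records keyed by directory_id."""

        def __init__(self):
            self.dirs = {0: {"name": "", "parent": None}}  # id 0 is the root
            self.next_id = 1

        def add(self, parent_id, name):
            dir_id = self.next_id
            self.next_id += 1
            self.dirs[dir_id] = {"name": name, "parent": parent_id}
            return dir_id

        def rename(self, dir_id, new_name=None, new_parent=None):
            """Rename or move a directory by editing only its own record."""
            record = self.dirs[dir_id]
            if new_name is not None:
                record["name"] = new_name
            if new_parent is not None:
                record["parent"] = new_parent

        def path(self, dir_id):
            """Rebuild the full path by walking up the parents, all in memory."""
            parts = []
            while self.dirs[dir_id]["parent"] is not None:
                parts.append(self.dirs[dir_id]["name"])
                dir_id = self.dirs[dir_id]["parent"]
            return "/" + "/".join(reversed(parts))

A file entry such as ``(docs_id, "index.rst") -> file_id`` keeps resolving after any ancestor is renamed, because ``docs_id`` itself never changes.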
Details
########################
In the current first version, the path_to_file=>file_id mapping is stored in an efficient embedded LevelDB. Being embedded, it runs on a single machine, so it is not linearly scalable yet. However, it can handle LOTS AND LOTS of files stored in Seaweed-FS on other master/volume servers.
Switching from the embedded LevelDB to an external distributed database is very feasible. Your contribution is welcome!
The in-memory directory structure can be made more memory-efficient. The current simple in-memory map works when the number of directories is less than 1 million, which will use about 500MB of memory. But I would expect the common use case to have only a few directories, not even more than 100.
Use Cases
#########################
Clients can access one "weed filer" via HTTP: list files under a directory, create files via HTTP POST, and read files via HTTP GET directly.
Although one "weed filer" can only sit on one machine, you can start multiple "weed filer" instances on several machines, each instance running in its own collection and having its own namespace, but sharing the same Seaweed-FS storage.
Future
###################
In a future version, the parent_directory=>directory_id and directory_id+filename=>file_id mappings will be refactored to support different storage systems.
The directory meta data may be switched to some other in-memory database.
The LevelDB implementation may be switched underneath to external data storage, e.g. MySQL, TokyoCabinet, etc. Preferably some pure-go implementation.
Also, an HA feature will be added, so that multiple "weed filer" instances can share the same view of the files.
Later, FUSE or HCFS plugins will be created, to really integrate Seaweed-FS into existing systems.
Help Wanted
########################
This is a big step towards more interesting Seaweed-FS usage and integration with existing systems.
If you can help refactor and implement other storage for directory metadata or file metadata, please do so.

118
docs/distributed_filer.rst

@ -1,118 +0,0 @@
Distributed Filer
===========================
The default weed filer is in standalone mode, storing file metadata on disk.
It is quite efficient at traversing deep directory paths and can handle
millions of files.
However, no SPOF is a must-have requirement for many projects.
Luckily, SeaweedFS is so flexible that we can use a completely different way
to manage file metadata.
This distributed filer uses Redis or Cassandra to store the metadata.
Redis Setup
#####################
No setup required.
Cassandra Setup
#####################
Here is the CQL to create the table for CassandraStore.
Optionally you can adjust the keyspace name and replication settings.
For production, you would want to set replication_factor to 3
if there are at least 3 Cassandra servers.
.. code-block:: bash
create keyspace seaweed WITH replication = {
'class':'SimpleStrategy',
'replication_factor':1
};
use seaweed;
CREATE TABLE seaweed_files (
path varchar,
fids list<varchar>,
PRIMARY KEY (path)
);
Sample usage
#####################
To start a weed filer in distributed mode with Redis:
.. code-block:: bash
# assuming you already started weed master and weed volume
weed filer -redis.server=localhost:6379
To start a weed filer in distributed mode with Cassandra:
.. code-block:: bash
# assuming you already started weed master and weed volume
weed filer -cassandra.server=localhost
Now you can add/delete files
.. code-block:: bash
# POST a file and read it back
curl -F "filename=@README.md" "http://localhost:8888/path/to/sources/"
curl "http://localhost:8888/path/to/sources/README.md"
# POST a file with a new name and read it back
curl -F "filename=@Makefile" "http://localhost:8888/path/to/sources/new_name"
curl "http://localhost:8888/path/to/sources/new_name"
Limitation
############
Listing sub folders and files is not supported because Redis and Cassandra
do not support prefix search.
Flat Namespace Design
#####################
Instead of using both directory and file metadata, this implementation uses
a flat namespace.
If each directory's metadata were stored separately, there would be multiple
network round trips to fetch directory information for deep directories,
impeding system performance.
A flat namespace would take more space because the parent directories are
repeatedly stored. But disk space is a lesser concern especially for
distributed systems.
So the data in either Redis or Cassandra is a simple file_full_path ~ file_id mapping.
(Actually, in Cassandra it is a file_full_path ~ list_of_file_ids mapping,
in the hope of supporting easy file appending for streaming files.)
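The flat mapping can be sketched with an in-memory stand-in. This is an illustration only: a Python dict replaces Redis or Cassandra, and the class name, method names, and file ids are invented.

.. code-block:: python

    class FlatNamespaceStore:
        """Hypothetical stand-in for a key-value store holding filer metadata."""

        def __init__(self):
            self.rows = {}  # full file path -> list of file ids

        def append(self, full_path, file_id):
            """Add a file id; repeated calls model appending to a streaming file."""
            self.rows.setdefault(full_path, []).append(file_id)

        def lookup(self, full_path):
            """One key lookup, no matter how deep the path is."""
            return self.rows.get(full_path, [])

        def list_directory(self, directory):
            """Without prefix search, listing means scanning every stored key."""
            prefix = directory.rstrip("/") + "/"
            names = []
            for path in self.rows:
                rest = path[len(prefix):]
                if path.startswith(prefix) and "/" not in rest:
                    names.append(rest)
            return sorted(names)

``lookup`` shows the benefit of the flat design, a single round trip per file, while ``list_directory`` shows its cost: every key repeats its parent directories, and listing has to visit all of them.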
Complexity
###################
For one file retrieval, the full_filename=>file_id lookup will be O(logN)
using Redis or Cassandra. But very likely the one additional network hop would
take longer than the actual lookup.
Use Cases
#########################
Clients can access one "weed filer" via HTTP, create files via HTTP POST,
and read files via HTTP GET directly.
Future
###################
SeaweedFS can support other distributed databases. It will be better
if that database can support prefix search, in order to list files
under a directory.
Help Wanted
########################
Please implement your preferred metadata store!
Just follow the cassandra_store/cassandra_store.go file and send me a pull
request. I will handle the rest.

82
docs/failover.rst

@ -1,82 +0,0 @@
Failover master server
=======================
Introduction
###################################
Some users will ask for no single point of failure (SPOF). Although Google ran its file system with a single master for years, having no SPOF seems to be becoming a criterion for architects when picking solutions.
Luckily, it's not too difficult to enable Seaweed File System with failover master servers.
Cheat Sheet: Startup multiple servers
########################################
This section is a quick way to start 3 master servers and 3 volume servers. All done!
.. code-block:: bash
weed server -master.port=9333 -dir=./1 -volume.port=8080 \
-master.peers=localhost:9333,localhost:9334,localhost:9335
weed server -master.port=9334 -dir=./2 -volume.port=8081 \
-master.peers=localhost:9333,localhost:9334,localhost:9335
weed server -master.port=9335 -dir=./3 -volume.port=8082 \
-master.peers=localhost:9333,localhost:9334,localhost:9335
Or, you can use the "-master.peers" setting to add master servers one by one.
.. code-block:: bash
weed server -master.port=9333 -dir=./1 -volume.port=8080
weed server -master.port=9334 -dir=./2 -volume.port=8081 -master.peers=localhost:9333
weed server -master.port=9335 -dir=./3 -volume.port=8082 -master.peers=localhost:9334
How it works
##########################
The master servers are coordinated by the Raft protocol to elect a leader. The leader takes over all the work of managing volumes and assigning file ids. All other master servers simply forward requests to the leader.
If the leader dies, another leader will be elected. And all the volume servers will send their heartbeat together with their volumes information to the new leader. The new leader will take the full responsibility.
During the transition, there could be moments where the new leader has partial information about all volume servers. This just means those yet-to-heartbeat volume servers will not be writable temporarily.
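As a rule of thumb for sizing the master cluster: Raft needs a majority of the servers to agree before a leader can be elected. This is a general property of Raft rather than something specific to this project, and the function names below are made up for illustration.

.. code-block:: python

    def majority(masters):
        """Smallest number of masters that forms a Raft majority."""
        return masters // 2 + 1


    def tolerated_failures(masters):
        """How many masters can be down while a leader can still be elected."""
        return masters - majority(masters)

With 3 masters one failure is tolerated, and with 5 masters two are. An even count adds no tolerance (4 masters still tolerate only one failure), which is why odd sizes such as 3 or 5 are the usual choice.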
Startup multiple master servers
###############################################
Now let's start the master and volume servers separately, the usual way.
Usually you would start several (3 or 5) master servers, then start the volume servers:
.. code-block:: bash
weed master -port=9333 -mdir=./1
weed master -port=9334 -mdir=./2 -peers=localhost:9333
weed master -port=9335 -mdir=./3 -peers=localhost:9334
# now start the volume servers, specifying any one of the master server
weed volume -dir=./1 -port=8080
weed volume -dir=./2 -port=8081 -mserver=localhost:9334
weed volume -dir=./3 -port=8082 -mserver=localhost:9335
These 6 commands will actually function the same as the previous 3 commands from the cheat sheet.
Even though we only specified one peer in the "-peers" option to bootstrap, each master server will get to know all the other master servers in the cluster, and store this information in its local directory.
When you need to restart
If you need to restart the master servers, just run the master servers WITHOUT the "-peers" option.
.. code-block:: bash
weed master -port=9333 -mdir=./1
weed master -port=9334 -mdir=./2
weed master -port=9335 -mdir=./3
To understand why, remember that the cluster information is "sticky", meaning it is stored on disk. If you restart the server, the cluster information stays the same, so the "-peers" option is not needed again.
Common Problem
############################
This "sticky" cluster information can cause some misunderstandings. For example, here is one:
https://code.google.com/p/weed-fs/issues/detail?id=70
The previously used value "localhost" would come up even when not specified. This could cost you some time to figure out.

140
docs/gettingstarted.rst

@ -1,140 +0,0 @@
Getting started
===================================
Installing Seaweed-FS
###################################
Download a proper version from `Seaweed-FS download page <https://bintray.com/chrislusf/Weed-FS/weed/>`_.
Decompress the downloaded file. You will find only one executable file, either "weed" on most systems or "weed.exe" on Windows.
Put the file "weed" on all related computers, in any folder you want. Use
.. code-block:: bash
./weed -h # to check available options
Set up Weed Master
*********************************
.. code-block:: bash
./weed master -h # to check available options
If no replication is required, this will be enough. The "mdir" option is to configure a folder where the generated sequence file ids are saved.
.. code-block:: bash
./weed master -mdir="."
If you need replication, you would also set the configuration file. By default it is "/etc/weedfs/weedfs.conf" file. The example can be found in RackDataCenterAwareReplication
Set up Weed Volume Server
*********************************
.. code-block:: bash
./weed volume -h # to check available options
Usually volume servers are spread on different computers. They can have different disk space, or even different operating system.
Usually you would need to specify the available disk space, the Weed Master location, and the storage folder.
.. code-block:: bash
./weed volume -max=100 -mserver="localhost:9333" -dir="./data"
Cheat Sheet: Setup One Master Server and One Volume Server
**************************************************************
Actually, forget about the previous commands. You can set up one master server and one volume server in one shot:
.. code-block:: bash
./weed server -dir="./data"
# same, just specifying the default values
# use "weed server -h" to find out more
./weed server -master.port=9333 -volume.port=8080 -dir="./data"
Testing Seaweed-FS
###################################
With the master and volume server up, now what? Let's pump in a lot of files into the system!
.. code-block:: bash
./weed upload -dir="/some/big/folder"
This command would recursively upload all files. Or you can specify what files you want to include.
.. code-block:: bash
./weed upload -dir="/some/big/folder" -include=*.txt
Then, you can simply check "du -m -s /some/big/folder" to see the actual disk usage by OS, and compare it with the file size under "/data". Usually if you are uploading a lot of textual files, the consumed disk size would be much smaller since textual files are gzipped automatically.
Now you can use your tools to hit weed-fs as hard as you can.
Using Seaweed-FS in docker
####################################
You can use image "cydev/weed" or build your own with `dockerfile <https://github.com/chrislusf/weed-fs/blob/master/Dockerfile>`_ in the root of repo.
Using pre-built Docker image
**************************************************************
.. code-block:: bash
docker run --name weed cydev/weed server
And in another terminal
.. code-block:: bash
IP=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' weed)
curl "http://$IP:9333/cluster/status?pretty=y"
{
"IsLeader": true,
"Leader": "localhost:9333"
}
# use $IP as host for api queries
Building image from dockerfile
**************************************************************
Make a local copy of weed-fs from github
.. code-block:: bash
git clone https://github.com/chrislusf/weed-fs.git
Minimal Image (~19.6 MB)
.. code-block:: bash
docker build --no-cache -t 'cydev/weed' .
Go-Build Docker Image (~764 MB)
.. code-block:: bash
mv Dockerfile Dockerfile.minimal
mv Dockerfile.go_build Dockerfile
docker build --no-cache -t 'cydev/weed' .
In production
**************************************************************
To persist data, you can use Docker volumes.
.. code-block:: bash
# start our weed server daemonized
docker run --name weed -d -p 9333:9333 -p 8080:8080 \
-v /opt/weedfs/data:/data cydev/weed server -dir="/data" \
-publicIp="$(curl -s cydev.ru/ip)"
Now our weed-fs server will be persistent and accessible at localhost:9333 and :8080 on the host machine.
Don't forget to specify "-publicIp" for correct connectivity.

32
docs/index.rst

@ -1,32 +0,0 @@
.. weed-fs documentation master file, created by
sphinx-quickstart on Wed Jul 23 12:13:18 2014.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to Seaweed-FS documentation!
====================================
This is the official site for Seaweed-FS.
The one on google code is deprecated.
For pre-compiled releases,
https://bintray.com/chrislusf/Weed-FS/seaweed
Contents:
.. toctree::
:maxdepth: 2
gettingstarted
clients
api
replication
ttl
failover
directories
distributed_filer
usecases
optimization
benchmarks
changelist

242
docs/make.bat

@ -1,242 +0,0 @@
@ECHO OFF
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set BUILDDIR=_build
set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% .
set I18NSPHINXOPTS=%SPHINXOPTS% .
if NOT "%PAPER%" == "" (
set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS%
set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS%
)
if "%1" == "" goto help
if "%1" == "help" (
:help
echo.Please use `make ^<target^>` where ^<target^> is one of
echo. html to make standalone HTML files
echo. dirhtml to make HTML files named index.html in directories
echo. singlehtml to make a single large HTML file
echo. pickle to make pickle files
echo. json to make JSON files
echo. htmlhelp to make HTML files and a HTML help project
echo. qthelp to make HTML files and a qthelp project
echo. devhelp to make HTML files and a Devhelp project
echo. epub to make an epub
echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter
echo. text to make text files
echo. man to make manual pages
echo. texinfo to make Texinfo files
echo. gettext to make PO message catalogs
echo. changes to make an overview over all changed/added/deprecated items
echo. xml to make Docutils-native XML files
echo. pseudoxml to make pseudoxml-XML files for display purposes
echo. linkcheck to check all external links for integrity
echo. doctest to run all doctests embedded in the documentation if enabled
goto end
)
if "%1" == "clean" (
for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i
del /q /s %BUILDDIR%\*
goto end
)
%SPHINXBUILD% 2> nul
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)
if "%1" == "html" (
%SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The HTML pages are in %BUILDDIR%/html.
goto end
)
if "%1" == "dirhtml" (
%SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml.
goto end
)
if "%1" == "singlehtml" (
%SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml.
goto end
)
if "%1" == "pickle" (
%SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle
if errorlevel 1 exit /b 1
echo.
echo.Build finished; now you can process the pickle files.
goto end
)
if "%1" == "json" (
%SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json
if errorlevel 1 exit /b 1
echo.
echo.Build finished; now you can process the JSON files.
goto end
)
if "%1" == "htmlhelp" (
%SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp
if errorlevel 1 exit /b 1
echo.
echo.Build finished; now you can run HTML Help Workshop with the ^
.hhp project file in %BUILDDIR%/htmlhelp.
goto end
)
if "%1" == "qthelp" (
%SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp
if errorlevel 1 exit /b 1
echo.
echo.Build finished; now you can run "qcollectiongenerator" with the ^
.qhcp project file in %BUILDDIR%/qthelp, like this:
echo.^> qcollectiongenerator %BUILDDIR%\qthelp\weed-fs.qhcp
echo.To view the help file:
echo.^> assistant -collectionFile %BUILDDIR%\qthelp\weed-fs.qhc
goto end
)
if "%1" == "devhelp" (
%SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp
if errorlevel 1 exit /b 1
echo.
echo.Build finished.
goto end
)
if "%1" == "epub" (
%SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The epub file is in %BUILDDIR%/epub.
goto end
)
if "%1" == "latex" (
%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
if errorlevel 1 exit /b 1
echo.
echo.Build finished; the LaTeX files are in %BUILDDIR%/latex.
goto end
)
if "%1" == "latexpdf" (
%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
cd %BUILDDIR%/latex
make all-pdf
cd %BUILDDIR%/..
echo.
echo.Build finished; the PDF files are in %BUILDDIR%/latex.
goto end
)
if "%1" == "latexpdfja" (
%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
cd %BUILDDIR%/latex
make all-pdf-ja
cd %BUILDDIR%/..
echo.
echo.Build finished; the PDF files are in %BUILDDIR%/latex.
goto end
)
if "%1" == "text" (
%SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The text files are in %BUILDDIR%/text.
goto end
)
if "%1" == "man" (
%SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The manual pages are in %BUILDDIR%/man.
goto end
)
if "%1" == "texinfo" (
%SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo.
goto end
)
if "%1" == "gettext" (
%SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The message catalogs are in %BUILDDIR%/locale.
goto end
)
if "%1" == "changes" (
%SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes
if errorlevel 1 exit /b 1
echo.
echo.The overview file is in %BUILDDIR%/changes.
goto end
)
if "%1" == "linkcheck" (
%SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck
if errorlevel 1 exit /b 1
echo.
echo.Link check complete; look for any errors in the above output ^
or in %BUILDDIR%/linkcheck/output.txt.
goto end
)
if "%1" == "doctest" (
%SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest
if errorlevel 1 exit /b 1
echo.
echo.Testing of doctests in the sources finished, look at the ^
results in %BUILDDIR%/doctest/output.txt.
goto end
)
if "%1" == "xml" (
%SPHINXBUILD% -b xml %ALLSPHINXOPTS% %BUILDDIR%/xml
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The XML files are in %BUILDDIR%/xml.
goto end
)
if "%1" == "pseudoxml" (
%SPHINXBUILD% -b pseudoxml %ALLSPHINXOPTS% %BUILDDIR%/pseudoxml
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The pseudo-XML files are in %BUILDDIR%/pseudoxml.
goto end
)
:end

114
docs/optimization.rst

@ -1,114 +0,0 @@
Optimization
==============
Here are the strategies or best ways to optimize Seaweed-FS.
Increase concurrent writes
################################
By default, Seaweed-FS grows volumes automatically. For example, for no-replication volumes, 7 writable volumes are allocated concurrently.
If you want to distribute writes to more volumes, you can do so by instructing the Seaweed-FS master via this URL:
.. code-block:: bash
curl "http://localhost:9333/vol/grow?count=12&replication=001"
This will assign 12 volumes with 001 replication. Since 001 replication means 2 copies of the same data, this will actually consume 24 physical volumes.
Increase concurrent reads
################################
Same as above, more volumes will increase read concurrency.
In addition, increasing the replication level will also help. Having the same data stored on multiple servers increases read concurrency.
Add more hard drives
################################
More hard drives will give you better write/read throughput.
Gzip content
################################
Seaweed-FS decides whether a file can be gzipped based on the file name extension. So if you submit a textual file, it's better to use a common file name extension, like ".txt", ".html", ".js", ".css", etc. If the extension is unknown, like ".go", Seaweed-FS will not gzip the content, but just save the content as is.
You can also manually gzip content before submission. If you do so, make sure the submitted file has a file name that ends with ".gz". For example, "my.css" can be gzipped to "my.css.gz" and sent to Seaweed-FS. When retrieving the content, if the HTTP client supports "gzip" encoding, the gzipped content is sent back. Otherwise, the unzipped content is sent back.
Memory consumption
#################################
For volume servers, the memory consumption is tightly related to the number of files. For example, one 32GB volume can easily have 1.5 million files if each file is only 20KB. To store the 1.5 million metadata entries in memory, Seaweed-FS currently consumes 36MB of memory, about 24 bytes per entry. So if you allocate 64 volumes (2TB), you would need 2~3GB of memory. However, if the average file size is larger, say 200KB, only 200~300MB of memory is needed.
Theoretically the memory consumption can go even lower by compacting, since the file ids are mostly monotonically increasing. I have not invested time in that yet, since the memory consumption of 24 bytes per entry (including an uncompressed 8-byte file id, a 4-byte file size, plus additional map data structure cost) is already pretty low. But I welcome anyone to compact this data in memory even more efficiently.
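The estimate above can be reproduced with a few lines of arithmetic. This is only a rough sketch based on the figures quoted in this section (24 bytes per entry, 32GB volumes); the function name is made up for illustration, and real memory usage will also depend on the Go runtime and the map implementation.

```python
# Rough estimate of volume-server index memory, using the figures quoted above.
BYTES_PER_ENTRY = 24           # in-memory cost per file entry, from the text
VOLUME_SIZE = 32 * 1024 ** 3   # one 32GB volume, in bytes


def index_memory_bytes(volume_count: int, avg_file_size: int) -> int:
    """Estimate the index memory needed when `volume_count` volumes are full."""
    files_per_volume = VOLUME_SIZE // avg_file_size
    return volume_count * files_per_volume * BYTES_PER_ENTRY


if __name__ == "__main__":
    for size_kb in (20, 200):
        mem = index_memory_bytes(64, size_kb * 1024)
        print(f"{size_kb}KB files, 64 volumes: about {mem / 1024 ** 2:.0f}MB")
```

With 64 volumes the result lands in the 2~3GB range for 20KB files and in the 200~300MB range for 200KB files, matching the numbers above.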
Insert with your own keys
################################
The file id generation is actually pretty trivial and you could use your own way to generate the file keys.
A file key has 3 parts:
* volume id: a volume with free space
* file id: a monotonically increasing and unique number
* file cookie: a random number, you can customize it in whichever way you want
You can directly ask the master server to assign a file key, and replace the file id part with your own unique id, e.g., a user id.
Also you can get each volume's free space from the server status.
.. code-block:: bash
curl "http://localhost:9333/dir/status?pretty=y"
Once you are sure about the free space of the volumes, you can use your own file ids. You just need to ensure the file key format is compatible.
The assigned file cookie can also be customized.
Customizing the file id and/or file cookie is acceptable. Strictly monotonic increase is not necessary, but file ids are expected to stay in "mostly" increasing order to keep the in-memory data structure efficient.
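As an illustration, a file key such as "3,01637037d6" can be split into its three parts and reassembled with a custom file id. This is a minimal sketch, not the project's own code. It assumes the layout described in the project README: the volume id before the comma, then the file id in hex, with the last 8 hex digits being the cookie. It also assumes the file id is written with an even number of hex digits. The names `FileKey`, `parse_fid` and `format_fid` are hypothetical.

```python
from typing import NamedTuple


class FileKey(NamedTuple):
    volume_id: int
    file_id: int
    cookie: int


def parse_fid(fid: str) -> FileKey:
    """Split "volume,<file id hex><8 hex digit cookie>" into its parts (assumed layout)."""
    volume, sep, rest = fid.partition(",")
    if not sep or len(rest) <= 8:
        raise ValueError(f"unexpected file key format: {fid!r}")
    return FileKey(int(volume), int(rest[:-8], 16), int(rest[-8:], 16))


def format_fid(key: FileKey) -> str:
    """Reassemble a file key; the file id is padded to whole bytes (assumption)."""
    id_hex = f"{key.file_id:x}"
    if len(id_hex) % 2:
        id_hex = "0" + id_hex
    return f"{key.volume_id},{id_hex}{key.cookie:08x}"


if __name__ == "__main__":
    assigned = parse_fid("3,01637037d6")
    # Keep the assigned volume and cookie, but use our own id (e.g. a user id).
    print(format_fid(assigned._replace(file_id=0x2A)))
```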
Upload large files
###################################
If files are large and the network is slow, the server will take time to read the file. Please increase the "-readTimeout=3" limit setting for the volume server. The volume server cuts off the connection if the upload takes longer than this limit.
Upload large files with Auto Split/Merge
If the file is large, it's better to upload this way:
.. code-block:: bash
weed upload -maxMB=64 the_file_name
This will split the file into data chunks of 64MB each, and upload them separately. The file ids of all the data chunks are saved into an additional meta chunk. The meta chunk's file id is returned.
To download the file, just run:
.. code-block:: bash
weed download the_meta_chunk_file_id
The meta chunk has the list of file ids, one file id per line. So if you want to process them in parallel, you can download the meta chunk and deal with each data chunk directly.
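The parallel processing mentioned above could look roughly like the following. This is a sketch under assumptions: the meta chunk content is taken to be plain text with one file id per line, and `fetch` is a hypothetical callable supplied by the caller (for example, a function that performs an HTTP GET against the right volume server). Neither helper is part of Seaweed-FS.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def chunk_file_ids(meta_chunk: str) -> List[str]:
    """Return the data chunk file ids listed in the meta chunk, skipping blank lines."""
    return [line.strip() for line in meta_chunk.splitlines() if line.strip()]


def fetch_chunks(meta_chunk: str, fetch: Callable[[str], bytes], workers: int = 4) -> bytes:
    """Fetch all data chunks in parallel and join them in their original order."""
    ids = chunk_file_ids(meta_chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(fetch, ids))  # map() keeps the input order
    return b"".join(parts)


if __name__ == "__main__":
    # Stand-in for real downloads: a dict acting as the volume servers.
    fake_store = {"3,01637037d6": b"hello ", "4,02a1b2c3d4": b"world"}
    meta = "3,01637037d6\n4,02a1b2c3d4\n"
    print(fetch_chunks(meta, fake_store.__getitem__))
```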
Collection as a Simple Name Space
When assigning file ids,
.. code-block:: bash
curl http://master:9333/dir/assign?collection=pictures
curl http://master:9333/dir/assign?collection=documents
will also generate a "pictures" collection and a "documents" collection if they have not been created already. Each collection has its own dedicated volumes, and collections do not share volumes.
In fact, the data files have the collection name as a prefix, e.g., "pictures_1.dat", "documents_3.dat".
In case you need to delete a collection later, for now you can go to the volume servers and delete the data files directly. A deleteCollection command may be implemented later, if someone asks...
Logging
##############################
When going to production, you will want to collect the logs. Seaweed-FS uses glog. Here are some examples:
.. code-block:: bash
weed -v=2 master
weed -log_dir=. volume

98
docs/replication.rst

@ -1,98 +0,0 @@
Replication
===================================
Seaweed-FS supports replication. Replication is implemented at the volume level, not at the file level.
How to use
###################################
Basically, the way it works is:
1. start weed master, and optionally specify the default replication type
.. code-block:: bash
./weed master -defaultReplication=001
2. start volume servers like this:
.. code-block:: bash
./weed volume -port=8081 -dir=/tmp/1 -max=100
./weed volume -port=8082 -dir=/tmp/2 -max=100
./weed volume -port=8083 -dir=/tmp/3 -max=100
Submitting, reading, and deleting files follow the same steps as without replication.
The meaning of replication type
###################################
*Note: This is subject to change.*
+-----+---------------------------------------------------------------------------+
|000  |no replication, just one copy                                              |
+-----+---------------------------------------------------------------------------+
|001  |replicate once on the same rack                                            |
+-----+---------------------------------------------------------------------------+
|010  |replicate once on a different rack in the same data center                 |
+-----+---------------------------------------------------------------------------+
|100  |replicate once on a different data center                                  |
+-----+---------------------------------------------------------------------------+
|200  |replicate twice on two other data centers                                  |
+-----+---------------------------------------------------------------------------+
|110  |replicate once on a different rack, and once on a different data center    |
+-----+---------------------------------------------------------------------------+
|...  |...                                                                        |
+-----+---------------------------------------------------------------------------+
So if the replication type is xyz, each digit means:
+-------+----------------------------------------------------------+
|**x**  |number of replicas in other data centers                  |
+-------+----------------------------------------------------------+
|**y**  |number of replicas in other racks in the same data center |
+-------+----------------------------------------------------------+
|**z**  |number of replicas in other servers in the same rack      |
+-------+----------------------------------------------------------+
x, y, and z can each be 0, 1, or 2, which gives 27 possible combinations, and the scheme can be easily extended.
Each replication type will physically create x+y+z+1 copies of volume data files.
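To make the xyz rule concrete, here is a small sketch that decodes a replication type string and counts the physical copies. The function name and the returned dictionary keys are made up for illustration and are not part of Seaweed-FS.

```python
from typing import Dict


def parse_replication(code: str) -> Dict[str, int]:
    """Decode a replication type like "001" into replica counts and total copies."""
    if len(code) != 3 or any(c not in "012" for c in code):
        raise ValueError(f"replication type must be three digits 0-2: {code!r}")
    x, y, z = (int(c) for c in code)
    return {
        "other_data_centers": x,   # x: replicas in other data centers
        "other_racks": y,          # y: replicas in other racks, same data center
        "same_rack": z,            # z: replicas on other servers, same rack
        "copies": x + y + z + 1,   # the original plus all replicas
    }


if __name__ == "__main__":
    for code in ("000", "001", "110", "200"):
        print(code, parse_replication(code))
```

For example, "001" yields 2 copies, which is why growing 12 volumes with 001 replication consumes 24 physical volumes.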
Example topology configuration
###################################
The Seaweed-FS master server tries to read the default topology configuration file from /etc/weedfs/weedfs.conf, if it exists. The file configures data centers and racks in the following format.
.. code-block:: xml
<Configuration>
<Topology>
<DataCenter name="dc1">
<Rack name="rack1">
<Ip>192.168.1.1</Ip>
</Rack>
</DataCenter>
<DataCenter name="dc2">
<Rack name="rack1">
<Ip>192.168.1.2</Ip>
</Rack>
<Rack name="rack2">
<Ip>192.168.1.3</Ip>
<Ip>192.168.1.4</Ip>
</Rack>
</DataCenter>
</Topology>
</Configuration>
Allocate File Key on specific data center
Volume servers can start with a specific data center name.
.. code-block:: bash
weed volume -dir=/tmp/1 -port=8080 -dataCenter=dc1
weed volume -dir=/tmp/2 -port=8081 -dataCenter=dc2
Or the master server can determine the data center from the volume server's IP address and the settings in the weedfs.conf file.
Now when requesting a file key, an optional "dataCenter" parameter can limit the assigned volume to a specific data center. For example, this request specifies dc1:
.. code-block:: bash
curl "http://localhost:9333/dir/assign?dataCenter=dc1"

1
docs/requirements.txt

@ -1 +0,0 @@
sphinx_rtd_theme

85
docs/ttl.rst

@ -1,85 +0,0 @@
Store file with a Time To Live
==============================
Introduction
#############################
Seaweed-FS is a key-to-file store, and files can optionally expire with a Time To Live (TTL).
How to use it?
#############################
Assume we want to store a file with TTL of 3 minutes.
First, ask the master to assign a file id to a volume with a 3-minute TTL:
.. code-block:: bash
> curl http://localhost:9333/dir/assign?ttl=3m
{"count":1,"fid":"5,01637037d6","url":"127.0.0.1:8080","publicUrl":"localhost:8080"}
Second, use the file id to store the file on the volume server:
.. code-block:: bash
> curl -F "file=@x.go" http://127.0.0.1:8080/5,01637037d6?ttl=3m
After writing, the file content is returned as usual if it is read before the TTL expires. If it is read after the TTL expires, the file is reported as missing and the HTTP response status is "not found".
For subsequent writes with ttl=3m, the same set of ttl=3m volumes will be used until:
1. the ttl=3m volumes are full. If so, new volumes will be created.
2. there are no write activities for 3 minutes. If so, these volumes will be stopped and deleted.
Advanced Usage
#############################
As you may have noticed, "ttl=3m" is used twice: once when assigning the file id, and once when uploading the actual file. The first one lets the master pick a matching volume, while the second one is written together with the file.
These two TTL values are not required to be the same. As long as the volume TTL is larger than the file TTL, it should be OK.
This gives some flexibility to fine-tune the file TTL, while reducing the number of volume TTL variations, which simplifies managing the TTL volumes.
Supported TTL format
#############################
The TTL is in the format of one integer number followed by one unit. The unit can be 'm', 'h', 'd', 'w', 'M', 'y'.
Supported TTL format examples:
- 3m: 3 minutes
- 4h: 4 hours
- 5d: 5 days
- 6w: 6 weeks
- 7M: 7 months
- 8y: 8 years
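A TTL string in this format can be parsed with a short regular expression. The sketch below is illustrative only. The lengths used for a month (31 days) and a year (365 days) are assumptions made for the example, and the helper names are hypothetical. The second helper checks the rule that the volume TTL should not be smaller than the file TTL.

```python
import re

# Minutes per unit. Month and year lengths are assumptions for this sketch.
_UNIT_MINUTES = {
    "m": 1,
    "h": 60,
    "d": 24 * 60,
    "w": 7 * 24 * 60,
    "M": 31 * 24 * 60,
    "y": 365 * 24 * 60,
}
_TTL_RE = re.compile(r"^(\d+)([mhdwMy])$")


def ttl_minutes(ttl: str) -> int:
    """Convert a TTL like "3m" or "4h" into minutes."""
    match = _TTL_RE.match(ttl)
    if match is None:
        raise ValueError(f"unsupported TTL format: {ttl!r}")
    count, unit = match.groups()
    return int(count) * _UNIT_MINUTES[unit]


def file_ttl_fits_volume(file_ttl: str, volume_ttl: str) -> bool:
    """True if a file with `file_ttl` can live in a volume with `volume_ttl`."""
    return ttl_minutes(file_ttl) <= ttl_minutes(volume_ttl)


if __name__ == "__main__":
    for ttl in ("3m", "4h", "5d", "6w"):
        print(ttl, "=", ttl_minutes(ttl), "minutes")
```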
How efficient is it?
#############################
TTL seems easy to implement since we just need to report the file as missing if the time is over the TTL. However, the real difficulty is to efficiently reclaim disk space from expired files, similar to JVM memory garbage collection, which is a sophisticated piece of work with many man-years of effort.
Memcached also supports TTL. It gets around this problem by putting entries into fixed-size slabs. If one slab is expired, no work is required and the slab can be overwritten right away. However, this fixed-size slab approach is not applicable to files, since the file contents rarely fit in slabs exactly.
Seaweed-FS resolves this disk space garbage collection problem efficiently and with great simplicity. One of the key differences from a "normal" implementation is that the TTL is associated with the volume as well as with each specific file.
During the file id assigning step, the file id is assigned to a volume with a matching TTL. The volumes are checked periodically (every 5~10 seconds by default). Once the latest expiration time has been reached, all the files in the volume have expired, and the volume can be safely deleted.
Implementation Details
#############################
1. When assigning a file key, the master picks one volume with a matching TTL. If no such volume exists, it creates a few.
2. Volume servers write the file together with its expiration time. When serving a file, if the file has expired, it is reported as not found.
3. Volume servers track each volume's largest expiration time, and stop reporting expired volumes to the master server.
4. The master server considers the volumes that are no longer reported to be dead, and stops assigning write requests to them.
5. After about 10% of the TTL time, or at most 10 minutes, the volume servers will delete the expired volume.
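The timing rules in the steps above can be sketched as plain functions over timestamps in seconds. This is a simplified model for illustration, not the actual volume server code; all names are hypothetical, and the deletion delay simply encodes "10% of the TTL, capped at 10 minutes" as stated in step 5.

```python
MAX_REMOVAL_DELAY = 10 * 60  # "at most 10 minutes", in seconds


def is_file_expired(written_at: float, ttl_seconds: float, now: float) -> bool:
    """A file with a TTL is reported as not found once its TTL has passed."""
    return ttl_seconds > 0 and now >= written_at + ttl_seconds


def removal_delay(ttl_seconds: float) -> float:
    """Grace period before an expired volume is deleted: 10% of the TTL, capped."""
    return min(ttl_seconds / 10, MAX_REMOVAL_DELAY)


def volume_can_be_deleted(latest_expiry: float, ttl_seconds: float, now: float) -> bool:
    """A volume can go once its latest expiration time plus the grace period has passed."""
    return now >= latest_expiry + removal_delay(ttl_seconds)


if __name__ == "__main__":
    ttl = 3 * 60  # a ttl=3m volume
    print("grace period for 3m:", removal_delay(ttl), "seconds")
    print("grace period for 1 day:", removal_delay(24 * 3600), "seconds")
```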
Deployment
#############################
When deploying to production, take the maximum TTL volume size into consideration. If writes are frequent, a TTL volume will grow to the maximum volume size. So when disk space is limited, it's better to reduce the maximum volume size.
It's recommended not to mix TTL volumes and non-TTL volumes in the same cluster. This is because the maximum volume size, which defaults to 30GB, is configured on the master at the cluster level.
We could implement a per-TTL configuration for the maximum volume size. However, it could get fairly verbose. Maybe later, if it is strongly desired.

114
docs/usecases.rst

@ -1,114 +0,0 @@
Use cases
===================
Saving image with different sizes
#############################
Usually each image stores one file key in the database. However, one image can have several versions, e.g., thumbnail, small, medium, large, original, and each version of the same image has its own file key. It's not ideal to store all the keys.
Here is one way to resolve this.
Reserve a set of file keys, for example, 5:
.. code-block:: bash
curl "http://<host>:<port>/dir/assign?count=5"
{"fid":"3,01637037d6","url":"127.0.0.1:8080","publicUrl":"localhost:8080","count":5}
Save the 5 versions of the image to the volume server. The urls for each image can be:
.. code-block:: bash
http://<url>:<port>/3,01637037d6
http://<url>:<port>/3,01637037d6_1
http://<url>:<port>/3,01637037d6_2
http://<url>:<port>/3,01637037d6_3
http://<url>:<port>/3,01637037d6_4
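The list of reserved file ids follows directly from the assigned fid and the count. A minimal sketch of that derivation is shown here; the helper name is hypothetical, and the version labels in the usage example are only an illustration of how an application might map image sizes to ids.

```python
from typing import List


def reserved_fids(fid: str, count: int) -> List[str]:
    """Return the file ids covered by an assignment made with count=<count>."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # The first id is the fid itself; the rest carry a _1 ... _(count-1) suffix.
    return [fid] + [f"{fid}_{i}" for i in range(1, count)]


if __name__ == "__main__":
    versions = ["original", "large", "medium", "small", "thumbnail"]
    for name, fid in zip(versions, reserved_fids("3,01637037d6", len(versions))):
        print(f"{name}: http://127.0.0.1:8080/{fid}")
```

With this scheme only the first file key needs to be stored in the database, since the other ids can be recomputed from it.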
Overwriting mime types
#############################
The correct way to send mime type:
.. code-block:: bash
curl -F "file=@myImage.png;type=image/png" http://127.0.0.1:8081/5,2730a7f18b44
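The wrong way to send it (this replaces the Content-Type header of the whole multipart request instead of setting the type of the file part):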
.. code-block:: bash
curl -H "Content-Type:image/png" -F file=@myImage.png http://127.0.0.1:8080/5,2730a7f18b44
Securing Seaweed-FS
#############################
The simple way is to put all master and volume servers behind a firewall.
However, if blocking the service ports is not feasible or not trivial, a white list option can be used. Only traffic from the white-listed IP addresses has write permission.
.. code-block:: bash
weed master -whiteList="::1,127.0.0.1"
weed volume -whiteList="::1,127.0.0.1"
# "::1" is the IPv6 localhost address.
Data Migration Example
#############################
.. code-block:: bash
weed master -mdir="/tmp/mdata" -defaultReplication="001" -ip="localhost" -port=9334
weed volume -dir=/tmp/vol1/ -mserver="localhost:9334" -ip="localhost" -port=8081
weed volume -dir=/tmp/vol2/ -mserver="localhost:9334" -ip="localhost" -port=8082
weed volume -dir=/tmp/vol3/ -mserver="localhost:9334" -ip="localhost" -port=8083
.. code-block:: bash
ls vol1 vol2 vol3
vol1:
1.dat 1.idx 2.dat 2.idx 3.dat 3.idx 5.dat 5.idx
vol2:
2.dat 2.idx 3.dat 3.idx 4.dat 4.idx 6.dat 6.idx
vol3:
1.dat 1.idx 4.dat 4.idx 5.dat 5.idx 6.dat 6.idx
Stop all of them, then move vol3/* to vol1 and vol2.
It is OK to move x.dat and x.idx from one volume server to another volume server,
because the replicas are exactly the same.
This can be checked with md5.
.. code-block:: bash
md5 vol1/1.dat vol2/1.dat
MD5 (vol1/1.dat) = c1a49a0ee550b44fef9f8ae9e55215c7
MD5 (vol2/1.dat) = c1a49a0ee550b44fef9f8ae9e55215c7
md5 vol1/1.idx vol2/1.idx
MD5 (vol1/1.idx) = b9edc95795dfb3b0f9063c9cc9ba8095
MD5 (vol2/1.idx) = b9edc95795dfb3b0f9063c9cc9ba8095
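On systems without an md5 command, the same check can be done with a few lines of Python. This is just a sketch using the standard hashlib module; the function names are made up, and the paths in the usage example are the ones from the listing above.

```python
import hashlib


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def replicas_identical(path_a: str, path_b: str) -> bool:
    """True if the two volume files have the same MD5 digest."""
    return md5_of(path_a) == md5_of(path_b)


# Example usage, with the volume directories from this walkthrough:
#   replicas_identical("vol1/1.dat", "vol2/1.dat")
#   replicas_identical("vol1/1.idx", "vol2/1.idx")
```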
.. code-block:: bash
ls vol1 vol2 vol3
vol1:
1.dat 1.idx 2.dat 2.idx 3.dat 3.idx 4.dat 4.idx 5.dat 5.idx 6.dat 6.idx
vol2:
1.dat 1.idx 2.dat 2.idx 3.dat 3.idx 4.dat 4.idx 5.dat 5.idx 6.dat 6.idx
vol3:
Start the master and the two remaining volume servers again:
.. code-block:: bash
weed master -mdir="/tmp/mdata" -defaultReplication="001" -ip="localhost" -port=9334
weed volume -dir=/tmp/vol1/ -mserver="localhost:9334" -ip="localhost" -port=8081
weed volume -dir=/tmp/vol2/ -mserver="localhost:9334" -ip="localhost" -port=8082
This completes moving the data of localhost:8083 to localhost:8081 and localhost:8082.