SeaweedFS is an independent Apache-licensed open source project with its ongoing development made
possible entirely thanks to the support of these awesome [backers](https://github.com/chrislusf/seaweedfs/blob/master/backers.md).
If you'd like to grow SeaweedFS even stronger, please consider joining our
<a href="https://www.patreon.com/seaweedfs">sponsors on Patreon</a>.
Platinum ($2500/month), Gold ($500/month): put your company logo on the SeaweedFS GitHub page.
Generous Backer ($50/month), Backer ($10/month): put your name on the SeaweedFS backer page.
Your support will be really appreciated by me and other supporters!
<h3 align="center"><a href="https://www.patreon.com/seaweedfs">Sponsor SeaweedFS via Patreon</a></h3>
<!--
<h4 align="center">Platinum</h4>
<p align="center">
@ -45,6 +43,8 @@ Your support will be really appreciated by me and other supporters!
</tbody>
</table>
-->
---
@ -52,9 +52,29 @@ Your support will be really appreciated by me and other supporters!
- [SeaweedFS on Slack](https://join.slack.com/t/seaweedfs/shared_invite/enQtMzI4MTMwMjU2MzA3LTc4MmVlYmFlNjBmZTgzZmJlYmI1MDE1YzkyNWYyZjkwZDFiM2RlMDdjNjVlNjdjYzc4NGFhZGIyYzEyMzJkYTA)
* [Compared to Other File Systems](#compared-to-other-file-systems)
* [Compared to HDFS](#compared-to-hdfs)
* [Compared to GlusterFS, Ceph](#compared-to-glusterfs-ceph)
* [Compared to GlusterFS](#compared-to-glusterfs)
* [Compared to Ceph](#compared-to-ceph)
* [Dev Plan](#dev-plan)
* [Installation Guide](#installation-guide)
* [Disk Related Topics](#disk-related-topics)
* [Benchmark](#benchmark)
* [License](#license)
## Introduction ##
SeaweedFS is a simple and highly scalable distributed file system. There are two objectives:
@ -65,41 +85,57 @@ SeaweedFS started as an Object Store to handle small files efficiently. Instead
There are only 40 bytes of disk storage overhead for each file's metadata. It is so simple with O(1) disk reads that you are welcome to challenge the performance with your actual use cases.
SeaweedFS started by implementing [Facebook's Haystack design paper](http://www.usenix.org/event/osdi10/tech/full_papers/Beaver.pdf). Also, SeaweedFS implements erasure coding with ideas from [f4: Facebook’s Warm BLOB Storage System](https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-muralidhar.pdf).
SeaweedFS can work very well with just the object store. [[Filer]] can then be added later to support directories and POSIX attributes. Filer is a separate linearly-scalable stateless server with customizable metadata stores, e.g., MySql/Postgres/Redis/Cassandra/LevelDB.
[Back to TOC](#table-of-contents)
## Features ##
[Back to TOC](#table-of-contents)
## Additional Features ##
* Can choose no replication or different replication levels, rack and data center aware.
* Automatic master servers failover - no single point of failure (SPOF).
* Automatic Gzip compression depending on file mime type.
* Automatic compaction to reclaim disk space after deletion or update.
* Servers in the same cluster can have different disk spaces, file systems, OS etc.
* Adding/Removing servers does **not** cause any data re-balancing.
* Optionally fix the orientation for jpeg pictures.
* Support ETag, Accept-Range, Last-Modified, etc.
* Support in-memory/leveldb/boltdb/btree mode tuning for memory/performance balance.
* Support rebalancing the writable and readonly volumes.
[Back to TOC](#table-of-contents)
## Filer Features ##
* [filer server][Filer] provides "normal" directories and files via HTTP.
* [mount filer][Mount] to read and write files directly as a local directory via FUSE.
* [Amazon S3 compatible API][AmazonS3API] to access files with S3 tooling.
* [Erasure Coding for warm storage][ErasureCoding] Rack-Aware 10.4 erasure coding reduces storage cost and increases availability.
* [Hadoop Compatible File System][Hadoop] to access files from Hadoop/Spark/Flink/etc jobs.
* [Async Backup To Cloud][BackupToCloud] has extremely fast local access and backups to Amazon S3, Google Cloud Storage, Azure, BackBlaze.
* [WebDAV] access as a mapped drive on Mac and Windows, or from mobile devices.
By default, the master node runs on port 9333, and the volume nodes run on port 8080.
Let's start one master node, and two volume nodes on port 8080 and 8081. Ideally, they should be started from different machines. We'll use localhost as an example.
SeaweedFS uses HTTP REST operations to read, write, and delete. The responses are in JSON or JSONP format.
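
As an illustration of handling such a JSON response, here is a minimal Go sketch that decodes a file-key assignment reply. The sample body and its field names (`count`, `fid`, `url`, `publicUrl`) are assumptions based on the upstream README and are not shown in this excerpt; `AssignResult`, `parseAssign`, and `volumeID` are hypothetical helper names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// AssignResult mirrors the JSON fields an assign reply is assumed to carry.
type AssignResult struct {
	Count     int    `json:"count"`
	Fid       string `json:"fid"`
	URL       string `json:"url"`
	PublicURL string `json:"publicUrl"`
}

// parseAssign decodes a JSON body into an AssignResult.
func parseAssign(body []byte) (AssignResult, error) {
	var r AssignResult
	err := json.Unmarshal(body, &r)
	return r, err
}

// volumeID extracts the numeric volume id that precedes the comma in a fid.
func volumeID(fid string) (uint32, error) {
	parts := strings.SplitN(fid, ",", 2)
	if len(parts) != 2 {
		return 0, fmt.Errorf("fid %q has no comma", fid)
	}
	id, err := strconv.ParseUint(parts[0], 10, 32)
	if err != nil {
		return 0, err
	}
	return uint32(id), nil
}

func main() {
	// Sample body only; a real client would read this from an HTTP response.
	body := []byte(`{"count":1,"fid":"3,01637037d6","url":"127.0.0.1:8080","publicUrl":"localhost:8080"}`)
	r, err := parseAssign(body)
	if err != nil {
		panic(err)
	}
	vid, err := volumeID(r.Fid)
	if err != nil {
		panic(err)
	}
	fmt.Printf("volume %d, upload to http://%s/%s\n", vid, r.URL, r.Fid)
}
```

The volume id is kept separate from the rest of the fid because it is the part a client needs when looking up where a volume lives.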
### Start Master Server ###
```
> ./weed master
@ -125,7 +161,7 @@ Second, to store the file content, send a HTTP multi-part POST request to `url +
Since (usually) there are not too many volume servers, and volumes don't move often, you can cache the results most of the time. Depending on the replication type, one volume can have multiple replica locations. Just randomly pick one location to read.
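
The caching advice above can be sketched in Go as follows. This is only an illustration under stated assumptions: `LocationCache`, `NewLocationCache`, and the `lookup` callback are hypothetical names standing in for whatever client code asks the master for a volume's locations, and the sketch does no expiry or invalidation.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// LocationCache remembers which servers hold each volume,
// so most reads can skip the lookup against the master.
type LocationCache struct {
	mu     sync.RWMutex
	byVol  map[uint32][]string
	lookup func(volumeID uint32) ([]string, error) // hypothetical master lookup
}

// NewLocationCache builds an empty cache around the given lookup function.
func NewLocationCache(lookup func(uint32) ([]string, error)) *LocationCache {
	return &LocationCache{byVol: make(map[uint32][]string), lookup: lookup}
}

// Pick returns one replica location chosen at random,
// calling lookup only when the volume is not cached yet.
func (c *LocationCache) Pick(volumeID uint32) (string, error) {
	c.mu.RLock()
	locs, ok := c.byVol[volumeID]
	c.mu.RUnlock()
	if !ok {
		var err error
		locs, err = c.lookup(volumeID)
		if err != nil {
			return "", err
		}
		if len(locs) == 0 {
			return "", fmt.Errorf("no locations for volume %d", volumeID)
		}
		c.mu.Lock()
		c.byVol[volumeID] = locs
		c.mu.Unlock()
	}
	return locs[rand.Intn(len(locs))], nil
}

func main() {
	calls := 0
	cache := NewLocationCache(func(uint32) ([]string, error) {
		calls++ // count how often the "master" is consulted
		return []string{"localhost:8080", "localhost:8081"}, nil
	})
	for i := 0; i < 5; i++ {
		loc, err := cache.Pick(3)
		if err != nil {
			panic(err)
		}
		fmt.Println("read from", loc)
	}
	fmt.Println("master lookups:", calls)
}
```

A production client would also need to drop a cached entry when a read fails, since volumes can occasionally move.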
@ -213,7 +250,7 @@ More details about replication can be found [on the wiki][Replication].
You can also set the default replication strategy when starting the master server.
### Allocate File Key on Specific Data Center ###
Volume servers can be started with a specific data center name:
@ -239,6 +276,8 @@ When requesting a file key, an optional "dataCenter" parameter can limit the ass
Distributed file systems usually split each file into chunks. A central master keeps a mapping from filenames and chunk indices to chunk handles, and also tracks which chunks each chunk server has.
@ -279,12 +318,16 @@ Each individual file size is limited to the volume size.
All file meta information stored on a volume server is readable from memory without disk access. Each file takes just a 16-byte map entry of <64bit key, 32bit offset, 32bit size>. Of course, each map entry has its own space cost for the map. But usually the disk space runs out before the memory does.
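
To make the 16-byte figure concrete, here is a small Go sketch. `NeedleEntry` is an illustrative type, not necessarily the layout SeaweedFS uses internally, and the estimate deliberately ignores the per-entry overhead of the map itself.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// NeedleEntry is an illustrative index entry: 64-bit key, 32-bit offset, 32-bit size.
type NeedleEntry struct {
	Key    uint64
	Offset uint32
	Size   uint32
}

// entryBytes reports the fixed encoded size of one entry.
func entryBytes() int {
	return binary.Size(NeedleEntry{})
}

// indexBytes estimates the raw entry memory for n files,
// ignoring whatever overhead the surrounding map adds.
func indexBytes(n int) int {
	return n * entryBytes()
}

func main() {
	fmt.Println(entryBytes())        // 16
	fmt.Println(indexBytes(1000000)) // 16000000, i.e. about 16 MB per million files
}
```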
[Back to TOC](#table-of-contents)
## Compared to Other File Systems ##
Most other distributed file systems seem more complicated than necessary.
SeaweedFS is meant to be fast and simple, in both setup and operation. If you do not understand how it works when you reach here, we've failed! Please raise an issue with any questions or update this file with clarifications.
[Back to TOC](#table-of-contents)
### Compared to HDFS ###
HDFS uses the chunk approach for each file, and is ideal for storing large files.
@ -293,6 +336,7 @@ SeaweedFS is ideal for serving relatively smaller files quickly and concurrently
SeaweedFS can also store extra large files by splitting them into manageable data chunks and storing the file ids of the data chunks in a meta chunk. This is managed by the "weed upload/download" tool, and the weed master and volume servers are agnostic about it.
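
The chunking idea can be sketched in Go as below. This is not the `weed upload` implementation: `ChunkInfo`, `Manifest`, `splitAndStore`, and the `store` callback are hypothetical names, and the fake `store` in `main` only invents file ids for the demonstration.

```go
package main

import "fmt"

// ChunkInfo records where one piece of a large file was stored.
type ChunkInfo struct {
	Fid    string
	Offset int64
	Size   int64
}

// Manifest is the content of the hypothetical "meta chunk".
type Manifest struct {
	Name   string
	Size   int64
	Chunks []ChunkInfo
}

// splitAndStore cuts data into pieces of at most chunkSize bytes,
// stores each piece via store, and returns the resulting manifest.
func splitAndStore(name string, data []byte, chunkSize int, store func([]byte) (string, error)) (Manifest, error) {
	if chunkSize <= 0 {
		return Manifest{}, fmt.Errorf("chunkSize must be positive, got %d", chunkSize)
	}
	m := Manifest{Name: name, Size: int64(len(data))}
	for off := 0; off < len(data); off += chunkSize {
		end := off + chunkSize
		if end > len(data) {
			end = len(data)
		}
		fid, err := store(data[off:end])
		if err != nil {
			return Manifest{}, err
		}
		m.Chunks = append(m.Chunks, ChunkInfo{Fid: fid, Offset: int64(off), Size: int64(end - off)})
	}
	return m, nil
}

func main() {
	next := 0
	// Fake store: a real one would upload the piece and return its file id.
	store := func(piece []byte) (string, error) {
		next++
		return fmt.Sprintf("chunk-%d", next), nil
	}
	m, err := splitAndStore("big.bin", make([]byte, 10), 4, store)
	if err != nil {
		panic(err)
	}
	for _, c := range m.Chunks {
		fmt.Printf("%s offset=%d size=%d\n", c.Fid, c.Offset, c.Size)
	}
}
```

Because the servers only ever see ordinary files, the manifest is the single place that knows the pieces belong together.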
[Back to TOC](#table-of-contents)
### Compared to GlusterFS, Ceph ###
@ -310,17 +354,21 @@ The architectures are mostly the same. SeaweedFS aims to store and read files fa
| GlusterFS | hashing | | FUSE, NFS | | |
| Ceph | hashing + rules | | FUSE | Yes | |
[Back to TOC](#table-of-contents)
### Compared to GlusterFS ###
GlusterFS stores files, both directories and content, in configurable volumes called "bricks".
GlusterFS hashes the path and filename into ids, which are assigned to virtual volumes and then mapped to "bricks".
[Back to TOC](#table-of-contents)
### Compared to Ceph ###
Ceph can be set up similarly to SeaweedFS as a key->blob store. It is much more complicated, with the need to support layers on top of it. [Here is a more detailed comparison](https://github.com/chrislusf/seaweedfs/issues/120).
SeaweedFS has a centralized master group to look up free volumes, while Ceph uses hashing and metadata servers to locate its objects. Having a centralized master makes it easy to code and manage.
Like SeaweedFS, Ceph is built on top of an object store, which in Ceph's case is RADOS. Ceph is rather complicated with mixed reviews.
@ -336,16 +384,26 @@ SeaweedFS Filer uses off-the-shelf stores, such as MySql, Postgres, Redis, Cassa
More tools and documentation, on how to maintain and scale the system. For example, how to move volumes, automatically balancing data, how to grow volumes, how to check system status, etc.
Other key features include: Erasure Coding, JWT security.
This is a super exciting project! And we need helpers and [support](https://www.patreon.com/seaweedfs)!
BTW, we suggest running the code style check script `util/gostd` before you push your branch to the remote; it makes SeaweedFS easier to review, maintain, and develop:
```
$ ./util/gostd
```
[Back to TOC](#table-of-contents)
## Installation Guide ##
> Installation guide for users who are not familiar with golang
Step 1: install go on your machine and set up the environment by following the instructions at:
@ -366,23 +424,27 @@ go get github.com/chrislusf/seaweedfs/weed
Once this is done, you will find the executable "weed" in your `$GOPATH/bin` directory.
Step 4: after you modify your code locally, you could start a local build by calling `go install` under
```
$GOPATH/src/github.com/chrislusf/seaweedfs/weed
```
[Back to TOC](#table-of-contents)
## Disk Related Topics ##
### Hard Drive Performance ###
When testing read performance on SeaweedFS, it basically becomes a performance test of your hard drive's random read speed. Hard drives usually get 100MB/s~200MB/s.
### Solid State Disk ###
To modify or delete small files, SSD must delete a whole block at a time, and move content in existing blocks to a new block. SSD is fast when brand new, but will get fragmented over time and you have to garbage collect, compacting blocks. SeaweedFS is friendly to SSD since it is append-only. Deletion and compaction are done on volume level in the background, not slowing reading and not causing fragmentation.
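
The append-only behavior described above can be illustrated with a toy in-memory volume in Go. This is a simplified sketch, not SeaweedFS's on-disk format: `Volume` and its methods are hypothetical, and compaction here runs synchronously rather than in the background.

```go
package main

import "fmt"

// span locates one stored value inside the volume's byte log.
type span struct{ offset, size int }

// Volume is a toy append-only store: writes and deletes never modify existing bytes.
type Volume struct {
	data  []byte
	index map[uint64]span
}

// NewVolume returns an empty volume.
func NewVolume() *Volume {
	return &Volume{index: make(map[uint64]span)}
}

// Put appends content; updating a key appends a new copy and repoints the index.
func (v *Volume) Put(key uint64, content []byte) {
	v.index[key] = span{offset: len(v.data), size: len(content)}
	v.data = append(v.data, content...)
}

// Delete only drops the index entry; the bytes stay until compaction.
func (v *Volume) Delete(key uint64) {
	delete(v.index, key)
}

// Get returns the live content for key, if any.
func (v *Volume) Get(key uint64) ([]byte, bool) {
	s, ok := v.index[key]
	if !ok {
		return nil, false
	}
	return v.data[s.offset : s.offset+s.size], true
}

// Compact copies live entries into a fresh log and swaps it in,
// reclaiming the space held by deleted or overwritten content.
func (v *Volume) Compact() {
	var fresh []byte
	idx := make(map[uint64]span, len(v.index))
	for key, s := range v.index {
		idx[key] = span{offset: len(fresh), size: s.size}
		fresh = append(fresh, v.data[s.offset:s.offset+s.size]...)
	}
	v.data, v.index = fresh, idx
}

// Len reports how many bytes the log currently occupies.
func (v *Volume) Len() int { return len(v.data) }

func main() {
	v := NewVolume()
	v.Put(1, []byte("hello"))
	v.Put(2, []byte("world!"))
	v.Delete(1)
	fmt.Println("before compaction:", v.Len()) // 11
	v.Compact()
	fmt.Println("after compaction:", v.Len()) // 6
}
```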
[Back to TOC](#table-of-contents)
## Benchmark ##
My Own Unscientific Single Machine Results on MacBook with Solid State Disk, CPU: 1 Intel Core i7 2.6GHz.
@ -435,8 +497,9 @@ Percentage of the requests served within a certain time (ms)
100% 20.7 ms
```
[Back to TOC](#table-of-contents)
## License ##
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@ -450,7 +513,8 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
[Back to TOC](#table-of-contents)
## Stargazers over time ##
[![Stargazers over time](https://starcharts.herokuapp.com/chrislusf/seaweedfs.svg)](https://starcharts.herokuapp.com/chrislusf/seaweedfs)
# Install SeaweedFS and Supercronic ( for cron job mode )
# Tried to use curl only (curl -o /tmp/linux_amd64.tar.gz ...), however it turned out that the following tar command failed with "gzip: stdin: not in gzip format"
// filerExportOutputFile = cmdFilerExport.Flag.String("output", "", "the output file. If empty, only list out the directory tree")
filerExportSourceStore = cmdFilerExport.Flag.String("sourceStore", "", "the source store name in filer.toml, default to currently enabled store")
filerExportTargetStore = cmdFilerExport.Flag.String("targetStore", "", "the target store name in filer.toml, or \"notification\" to export all files to message queue")
dir = cmdFilerExport.Flag.String("dir", "/", "only process files under this directory")
dirListLimit = cmdFilerExport.Flag.Int("dirListLimit", 100000, "limit directory list size")
masterOptions.metricsIntervalSec = cmdServer.Flag.Int("metrics.intervalSeconds", 15, "Prometheus push interval in seconds")
filerOptions.collection = cmdServer.Flag.String("filer.collection", "", "all data will be stored in this collection")
filerOptions.port = cmdServer.Flag.Int("filer.port", 8888, "filer server http listen port")
filerOptions.grpcPort = cmdServer.Flag.Int("filer.port.grpc", 0, "filer grpc server listen port, default to http port + 10000")
filerOptions.publicPort = cmdServer.Flag.Int("filer.port.public", 0, "filer server public http listen port")
filerOptions.defaultReplicaPlacement = cmdServer.Flag.String("filer.defaultReplicaPlacement", "", "Default replication type if not specified during runtime.")
filerOptions.redirectOnRead = cmdServer.Flag.Bool("filer.redirectOnRead", false, "whether proxy or redirect to volume server during file GET request")
@ -88,15 +96,25 @@ func init() {
serverOptions.v.port = cmdServer.Flag.Int("volume.port", 8080, "volume server http listen port")
serverOptions.v.publicPort = cmdServer.Flag.Int("volume.port.public", 0, "volume server public port")
serverOptions.v.indexType = cmdServer.Flag.String("volume.index", "memory", "Choose [memory|leveldb|boltdb|btree] mode for memory~performance balance.")
serverOptions.v.indexType = cmdServer.Flag.String("volume.index", "memory", "Choose [memory|leveldb|leveldbMedium|leveldbLarge] mode for memory~performance balance.")
serverOptions.v.fixJpgOrientation = cmdServer.Flag.Bool("volume.images.fix.orientation", false, "Adjust jpg orientation when uploading.")
serverOptions.v.readRedirect = cmdServer.Flag.Bool("volume.read.redirect", true, "Redirect moved or non-local volumes.")
serverOptions.v.compactionMBPerSecond = cmdServer.Flag.Int("volume.compactionMBps", 0, "limit compaction speed in mega bytes per second")
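
// The "default to http port + 10000" convention in the filer.port.grpc help text
// could be resolved after parsing roughly as sketched below. This is a standalone
// illustration using the standard library's flag.FlagSet, assuming cmdServer.Flag
// behaves the same way; filerPorts and parseFilerPorts are hypothetical names and
// not part of the SeaweedFS code.

```go
package main

import (
	"flag"
	"fmt"
)

// filerPorts holds the resolved values of the two port flags.
type filerPorts struct {
	HTTP int
	GRPC int
}

// parseFilerPorts parses args with a private FlagSet and applies
// the "http port + 10000" fallback when the gRPC port is left at 0.
func parseFilerPorts(args []string) (filerPorts, error) {
	fs := flag.NewFlagSet("server", flag.ContinueOnError)
	port := fs.Int("filer.port", 8888, "filer server http listen port")
	grpcPort := fs.Int("filer.port.grpc", 0, "filer grpc server listen port, default to http port + 10000")
	if err := fs.Parse(args); err != nil {
		return filerPorts{}, err
	}
	p := filerPorts{HTTP: *port, GRPC: *grpcPort}
	if p.GRPC == 0 {
		p.GRPC = p.HTTP + 10000
	}
	return p, nil
}

func main() {
	p, err := parseFilerPorts([]string{"-filer.port=9000"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("http=%d grpc=%d\n", p.HTTP, p.GRPC) // http=9000 grpc=19000
}
```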