
Merge pull request #3060 from natmaka/master

pull/3062/head
Chris Lu 3 years ago
committed by GitHub
parent
commit
72e7dcde51
No known key found for this signature in database GPG Key ID: 4AEE18F83AFDEB23
  1. README.md (26 changes)
  2. test/s3/compatibility/run.sh (4 changes)
  3. unmaintained/fix_dat/fix_dat.go (2 changes)
  4. weed/command/autocomplete.go (6 changes)
  5. weed/command/benchmark.go (6 changes)

26
README.md

@ -31,13 +31,11 @@ Your support will be really appreciated by me and other supporters!
</p> </p>
--> -->
### Gold Sponsors ### Gold Sponsors
- [![nodion](https://www.nodion.com/img/logo.svg)](https://www.nodion.com) - [![nodion](https://www.nodion.com/img/logo.svg)](https://www.nodion.com)
--- ---
- [Download Binaries for different platforms](https://github.com/chrislusf/seaweedfs/releases/latest) - [Download Binaries for different platforms](https://github.com/chrislusf/seaweedfs/releases/latest)
- [SeaweedFS on Slack](https://join.slack.com/t/seaweedfs/shared_invite/enQtMzI4MTMwMjU2MzA3LTEyYzZmZWYzOGQ3MDJlZWMzYmI0OTE4OTJiZjJjODBmMzUxNmYwODg0YjY3MTNlMjBmZDQ1NzQ5NDJhZWI2ZmY) - [SeaweedFS on Slack](https://join.slack.com/t/seaweedfs/shared_invite/enQtMzI4MTMwMjU2MzA3LTEyYzZmZWYzOGQ3MDJlZWMzYmI0OTE4OTJiZjJjODBmMzUxNmYwODg0YjY3MTNlMjBmZDQ1NzQ5NDJhZWI2ZmY)
- [SeaweedFS on Twitter](https://twitter.com/SeaweedFS) - [SeaweedFS on Twitter](https://twitter.com/SeaweedFS)
@ -61,7 +59,7 @@ Table of Contents
* [Additional Features](#additional-features) * [Additional Features](#additional-features)
* [Filer Features](#filer-features) * [Filer Features](#filer-features)
* [Example: Using Seaweed Object Store](#example-Using-Seaweed-Object-Store) * [Example: Using Seaweed Object Store](#example-Using-Seaweed-Object-Store)
* [Architecture](#architecture)
* [Architecture](#Object-Store-Architecture)
* [Compared to Other File Systems](#compared-to-other-file-systems) * [Compared to Other File Systems](#compared-to-other-file-systems)
* [Compared to HDFS](#compared-to-hdfs) * [Compared to HDFS](#compared-to-hdfs)
* [Compared to GlusterFS, Ceph](#compared-to-glusterfs-ceph) * [Compared to GlusterFS, Ceph](#compared-to-glusterfs-ceph)
@ -127,7 +125,7 @@ Faster and Cheaper than direct cloud storage!
## Additional Features ## ## Additional Features ##
* Can choose no replication or different replication levels, rack and data center aware. * Can choose no replication or different replication levels, rack and data center aware.
* Automatic master servers failover - no single point of failure (SPOF). * Automatic master servers failover - no single point of failure (SPOF).
* Automatic Gzip compression depending on file mime type.
* Automatic Gzip compression depending on file MIME type.
* Automatic compaction to reclaim disk space after deletion or update. * Automatic compaction to reclaim disk space after deletion or update.
* [Automatic entry TTL expiration][VolumeServerTTL]. * [Automatic entry TTL expiration][VolumeServerTTL].
* Any server with some disk spaces can add to the total storage space. * Any server with some disk spaces can add to the total storage space.
@ -206,7 +204,7 @@ SeaweedFS uses HTTP REST operations to read, write, and delete. The responses ar
### Write File ### ### Write File ###
To upload a file: first, send a HTTP POST, PUT, or GET request to `/dir/assign` to get an `fid` and a volume server url:
To upload a file: first, send a HTTP POST, PUT, or GET request to `/dir/assign` to get an `fid` and a volume server URL:
``` ```
> curl http://localhost:9333/dir/assign > curl http://localhost:9333/dir/assign
@ -255,7 +253,7 @@ First look up the volume server's URLs by the file's volumeId:
Since (usually) there are not too many volume servers, and volumes don't move often, you can cache the results most of the time. Depending on the replication type, one volume can have multiple replica locations. Just randomly pick one location to read. Since (usually) there are not too many volume servers, and volumes don't move often, you can cache the results most of the time. Depending on the replication type, one volume can have multiple replica locations. Just randomly pick one location to read.
Now you can take the public url, render the url or directly read from the volume server via url:
Now you can take the public URL, render the URL or directly read from the volume server via URL:
``` ```
http://localhost:8080/3,01637037d6.jpg http://localhost:8080/3,01637037d6.jpg
@ -356,9 +354,9 @@ On each write request, the master server also generates a file key, which is a g
### Write and Read files ### ### Write and Read files ###
When a client sends a write request, the master server returns (volume id, file key, file cookie, volume node url) for the file. The client then contacts the volume node and POSTs the file content.
When a client sends a write request, the master server returns (volume id, file key, file cookie, volume node URL) for the file. The client then contacts the volume node and POSTs the file content.
When a client needs to read a file based on (volume id, file key, file cookie), it asks the master server by the volume id for the (volume node url, volume node public url), or retrieves this from a cache. Then the client can GET the content, or just render the URL on web pages and let browsers fetch the content.
When a client needs to read a file based on (volume id, file key, file cookie), it asks the master server by the volume id for the (volume node URL, volume node public URL), or retrieves this from a cache. Then the client can GET the content, or just render the URL on web pages and let browsers fetch the content.
Please see the example for details on the write-read process. Please see the example for details on the write-read process.
@ -412,7 +410,7 @@ The architectures are mostly the same. SeaweedFS aims to store and read files fa
* SeaweedFS optimizes for small files, ensuring O(1) disk seek operation, and can also handle large files. * SeaweedFS optimizes for small files, ensuring O(1) disk seek operation, and can also handle large files.
* SeaweedFS statically assigns a volume id for a file. Locating file content becomes just a lookup of the volume id, which can be easily cached. * SeaweedFS statically assigns a volume id for a file. Locating file content becomes just a lookup of the volume id, which can be easily cached.
* SeaweedFS Filer metadata store can be any well-known and proven data stores, e.g., Redis, Cassandra, HBase, Mongodb, Elastic Search, MySql, Postgres, Sqlite, MemSql, TiDB, CockroachDB, Etcd, YDB etc, and is easy to customized.
* SeaweedFS Filer metadata store can be any well-known and proven data store, e.g., Redis, Cassandra, HBase, Mongodb, Elastic Search, MySql, Postgres, Sqlite, MemSql, TiDB, CockroachDB, Etcd, YDB etc, and is easy to customize.
* SeaweedFS Volume server also communicates directly with clients via HTTP, supporting range queries, direct uploads, etc. * SeaweedFS Volume server also communicates directly with clients via HTTP, supporting range queries, direct uploads, etc.
| System | File Metadata | File Content Read| POSIX | REST API | Optimized for large number of small files | | System | File Metadata | File Content Read| POSIX | REST API | Optimized for large number of small files |
@ -448,9 +446,9 @@ Ceph can be setup similar to SeaweedFS as a key->blob store. It is much more com
SeaweedFS has a centralized master group to look up free volumes, while Ceph uses hashing and metadata servers to locate its objects. Having a centralized master makes it easy to code and manage. SeaweedFS has a centralized master group to look up free volumes, while Ceph uses hashing and metadata servers to locate its objects. Having a centralized master makes it easy to code and manage.
Same as SeaweedFS, Ceph is also based on the object store RADOS. Ceph is rather complicated with mixed reviews.
Ceph, like SeaweedFS, is based on the object store RADOS. Ceph is rather complicated with mixed reviews.
Ceph uses CRUSH hashing to automatically manage the data placement, which is efficient to locate the data. But the data has to be placed according to the CRUSH algorithm. Any wrong configuration would cause data loss. Topology changes, such as adding new servers to increase capacity, will cause data migration with high IO cost to fit the CRUSH algorithm. SeaweedFS places data by assigning them to any writable volumes. If writes to one volume failed, just pick another volume to write. Adding more volumes are also as simple as it can be.
Ceph uses CRUSH hashing to automatically manage data placement, which is efficient to locate the data. But the data has to be placed according to the CRUSH algorithm. Any wrong configuration would cause data loss. Topology changes, such as adding new servers to increase capacity, will cause data migration with high IO cost to fit the CRUSH algorithm. SeaweedFS places data by assigning them to any writable volumes. If writes to one volume failed, just pick another volume to write. Adding more volumes is also as simple as it can be.
SeaweedFS is optimized for small files. Small files are stored as one continuous block of content, with at most 8 unused bytes between files. Small file access is O(1) disk read. SeaweedFS is optimized for small files. Small files are stored as one continuous block of content, with at most 8 unused bytes between files. Small file access is O(1) disk read.
@ -499,7 +497,7 @@ Step 1: install go on your machine and setup the environment by following the in
https://golang.org/doc/install https://golang.org/doc/install
make sure you set up your $GOPATH
make sure to define your $GOPATH
Step 2: checkout this repo: Step 2: checkout this repo:
@ -536,7 +534,7 @@ Write 1 million 1KB file:
``` ```
Concurrency Level: 16 Concurrency Level: 16
Time taken for tests: 66.753 seconds Time taken for tests: 66.753 seconds
Complete requests: 1048576
Completed requests: 1048576
Failed requests: 0 Failed requests: 0
Total transferred: 1106789009 bytes Total transferred: 1106789009 bytes
Requests per second: 15708.23 [#/sec] Requests per second: 15708.23 [#/sec]
@ -562,7 +560,7 @@ Randomly read 1 million files:
``` ```
Concurrency Level: 16 Concurrency Level: 16
Time taken for tests: 22.301 seconds Time taken for tests: 22.301 seconds
Complete requests: 1048576
Completed requests: 1048576
Failed requests: 0 Failed requests: 0
Total transferred: 1106812873 bytes Total transferred: 1106812873 bytes
Requests per second: 47019.38 [#/sec] Requests per second: 47019.38 [#/sec]

4
test/s3/compatibility/run.sh

@ -10,7 +10,7 @@ docker stop s3test-instance || echo "already stopped"
ulimit -n 10000 ulimit -n 10000
../../../weed/weed server -filer -s3 -volume.max 0 -master.volumeSizeLimitMB 5 -dir "$(pwd)/tmp" 1>&2>weed.log & ../../../weed/weed server -filer -s3 -volume.max 0 -master.volumeSizeLimitMB 5 -dir "$(pwd)/tmp" 1>&2>weed.log &
until $(curl --output /dev/null --silent --head --fail http://127.0.0.1:9333); do
until curl --output /dev/null --silent --head --fail http://127.0.0.1:9333; do
printf '.' printf '.'
sleep 5 sleep 5
done done
@ -18,7 +18,7 @@ sleep 3
rm -Rf logs-full.txt logs-summary.txt rm -Rf logs-full.txt logs-summary.txt
# docker run --name s3test-instance --rm -e S3TEST_CONF=s3tests.conf -v `pwd`/s3tests.conf:/s3-tests/s3tests.conf -it s3tests ./virtualenv/bin/nosetests s3tests_boto3/functional/test_s3.py:test_get_obj_tagging -v -a 'resource=object,!bucket-policy,!versioning,!encryption' # docker run --name s3test-instance --rm -e S3TEST_CONF=s3tests.conf -v `pwd`/s3tests.conf:/s3-tests/s3tests.conf -it s3tests ./virtualenv/bin/nosetests s3tests_boto3/functional/test_s3.py:test_get_obj_tagging -v -a 'resource=object,!bucket-policy,!versioning,!encryption'
docker run --name s3test-instance --rm -e S3TEST_CONF=s3tests.conf -v `pwd`/s3tests.conf:/s3-tests/s3tests.conf -it s3tests ./virtualenv/bin/nosetests s3tests_boto3/functional/test_s3.py -v -a 'resource=object,!bucket-policy,!versioning,!encryption' | sed -n -e '/botocore.hooks/!p;//q' | tee logs-summary.txt
docker run --name s3test-instance --rm -e S3TEST_CONF=s3tests.conf -v "$(pwd)"/s3tests.conf:/s3-tests/s3tests.conf -it s3tests ./virtualenv/bin/nosetests s3tests_boto3/functional/test_s3.py -v -a 'resource=object,!bucket-policy,!versioning,!encryption' | sed -n -e '/botocore.hooks/!p;//q' | tee logs-summary.txt
docker stop s3test-instance || echo "already stopped" docker stop s3test-instance || echo "already stopped"
killall -9 weed killall -9 weed

2
unmaintained/fix_dat/fix_dat.go

@ -24,7 +24,7 @@ var (
/* /*
This is to resolve an one-time issue that caused inconsistency with .dat and .idx files. This is to resolve an one-time issue that caused inconsistency with .dat and .idx files.
In this case, the .dat file contains all data, but some of deletion caused incorrect offset.
In this case, the .dat file contains all data, but some deletion caused incorrect offset.
The .idx has all correct offsets. The .idx has all correct offsets.
1. fix the .dat file, a new .dat_fixed file will be generated. 1. fix the .dat file, a new .dat_fixed file will be generated.

6
weed/command/autocomplete.go

@ -41,7 +41,7 @@ func AutocompleteMain(commands []*Command) bool {
func installAutoCompletion() bool { func installAutoCompletion() bool {
if runtime.GOOS == "windows" { if runtime.GOOS == "windows" {
fmt.Println("windows is not supported")
fmt.Println("Windows is not supported")
return false return false
} }
@ -56,7 +56,7 @@ func installAutoCompletion() bool {
func uninstallAutoCompletion() bool { func uninstallAutoCompletion() bool {
if runtime.GOOS == "windows" { if runtime.GOOS == "windows" {
fmt.Println("windows is not supported")
fmt.Println("Windows is not supported")
return false return false
} }
@ -65,7 +65,7 @@ func uninstallAutoCompletion() bool {
fmt.Printf("uninstall failed! %s\n", err) fmt.Printf("uninstall failed! %s\n", err)
return false return false
} }
fmt.Printf("autocompletion is disable. Please restart your shell.\n")
fmt.Printf("autocompletion is disabled. Please restart your shell.\n")
return true return true
} }

6
weed/command/benchmark.go

@ -74,14 +74,14 @@ func init() {
var cmdBenchmark = &Command{ var cmdBenchmark = &Command{
UsageLine: "benchmark -master=localhost:9333 -c=10 -n=100000", UsageLine: "benchmark -master=localhost:9333 -c=10 -n=100000",
Short: "benchmark on writing millions of files and read out",
Short: "benchmark by writing millions of files and reading them out",
Long: `benchmark on an empty SeaweedFS file system. Long: `benchmark on an empty SeaweedFS file system.
Two tests during benchmark: Two tests during benchmark:
1) write lots of small files to the system 1) write lots of small files to the system
2) read the files out 2) read the files out
The file content is mostly zero, but no compression is done.
The file content is mostly zeros, but no compression is done.
You can choose to only benchmark read or write. You can choose to only benchmark read or write.
During write, the list of uploaded file ids is stored in "-list" specified file. During write, the list of uploaded file ids is stored in "-list" specified file.
@ -468,7 +468,7 @@ func (s *stats) printStats() {
timeTaken := float64(int64(s.end.Sub(s.start))) / 1000000000 timeTaken := float64(int64(s.end.Sub(s.start))) / 1000000000
fmt.Printf("\nConcurrency Level: %d\n", *b.concurrency) fmt.Printf("\nConcurrency Level: %d\n", *b.concurrency)
fmt.Printf("Time taken for tests: %.3f seconds\n", timeTaken) fmt.Printf("Time taken for tests: %.3f seconds\n", timeTaken)
fmt.Printf("Complete requests: %d\n", completed)
fmt.Printf("Completed requests: %d\n", completed)
fmt.Printf("Failed requests: %d\n", failed) fmt.Printf("Failed requests: %d\n", failed)
fmt.Printf("Total transferred: %d bytes\n", transferred) fmt.Printf("Total transferred: %d bytes\n", transferred)
fmt.Printf("Requests per second: %.2f [#/sec]\n", float64(completed)/timeTaken) fmt.Printf("Requests per second: %.2f [#/sec]\n", float64(completed)/timeTaken)
