seaweedfs/note/replication.txt


								1. each file can choose the replication factor

								2. replication granularity is at the volume level

								3. if there is not enough space, we can automatically decrease the replication factor of some volumes, especially for cold data

								4. plan to support migrating data to cheaper storage

								5. plan to support manual volume placement, access-based volume placement, and auction-based volume placement


								When a new volume server is started, it reports

								  1. how many volumes it can hold

								  2. current list of existing volumes and each volume's replication type

								Each volume server remembers:

								  1. current volume ids

								  2. replica locations, which are read from the master


								The master assigns volume ids based on

								  1. replication factor

								     data center, rack

								  2. concurrent write support

								The master stores the replication configuration:

								{

								  "replication":[

								    {"type":"00", "min_volume_count":3, "weight":10},

								    {"type":"01", "min_volume_count":2, "weight":20},

								    {"type":"10", "min_volume_count":2, "weight":20},

								    {"type":"11", "min_volume_count":3, "weight":30},

								    {"type":"20", "min_volume_count":2, "weight":20}

								  ],

								  "port":9333

								}
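A minimal sketch of how such a configuration could be loaded, assuming it is stored as strict JSON (quoted keys, with the replication entries in a list). The type names `ReplicationRule` and `MasterConfig` and the function `parseMasterConfig` are hypothetical, and the note does not define what `min_volume_count` and `weight` mean, so the code only carries them through.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ReplicationRule mirrors one entry of the replication list in the note.
// Hypothetical type: the note does not define the meaning of the fields.
type ReplicationRule struct {
	Type           string `json:"type"`
	MinVolumeCount int    `json:"min_volume_count"`
	Weight         int    `json:"weight"`
}

// MasterConfig is a hypothetical container for the master's settings.
type MasterConfig struct {
	Replication []ReplicationRule `json:"replication"`
	Port        int               `json:"port"`
}

// parseMasterConfig decodes a strict-JSON configuration document.
func parseMasterConfig(data []byte) (MasterConfig, error) {
	var cfg MasterConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	raw := []byte(`{
	  "replication": [
	    {"type": "00", "min_volume_count": 3, "weight": 10},
	    {"type": "01", "min_volume_count": 2, "weight": 20}
	  ],
	  "port": 9333
	}`)
	cfg, err := parseMasterConfig(raw)
	if err != nil {
		panic(err)
	}
	for _, rule := range cfg.Replication {
		fmt.Println(rule.Type, rule.MinVolumeCount, rule.Weight)
	}
	fmt.Println("port:", cfg.Port)
}
```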

								Or manually via command line

								  1. add volume with specified replication factor

								  2. add volume with specified volume id


								If the same volume id is reported by different volume servers,

								the master looks up the replication factor of the volume and compares it with the number of reported copies:

								if there are fewer copies than the replication factor, the volume is put in read-only mode

								if there are more copies than the replication factor, the smallest/oldest copy is purged

								if equal, the volume functions as usual
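The three cases above can be sketched as a single comparison. This is an illustration only: `Copy`, `Action` and `reconcile` are hypothetical names, the wanted copy count is assumed to be at least 1, and the sketch picks the smallest copy for purging (the note also allows the oldest).

```go
package main

import "fmt"

// Copy is a hypothetical record of one reported copy of a volume.
type Copy struct {
	Server string
	Size   int64
}

// Action is a hypothetical name for the outcome of the comparison.
type Action string

const (
	ReadOnly Action = "readonly" // too few copies
	Purge    Action = "purge"    // too many copies
	Normal   Action = "normal"   // exactly enough copies
)

// reconcile compares the reported copies of one volume with the number of
// copies its replication factor calls for (wanted, assumed >= 1). For Purge
// it also returns the server holding the smallest copy.
func reconcile(copies []Copy, wanted int) (Action, string) {
	switch {
	case len(copies) < wanted:
		return ReadOnly, ""
	case len(copies) > wanted:
		smallest := copies[0]
		for _, c := range copies[1:] {
			if c.Size < smallest.Size {
				smallest = c
			}
		}
		return Purge, smallest.Server
	default:
		return Normal, ""
	}
}

func main() {
	copies := []Copy{{"s1", 10}, {"s2", 5}, {"s3", 7}}
	for n := 1; n <= len(copies); n++ {
		action, victim := reconcile(copies[:n], 2)
		fmt.Println(n, "copies:", action, victim)
	}
}
```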


								Use cases:

								  on volume server

								    1. weed volume -mserver="xx.xx.xx.xx:9333" -publicUrl="good.com:8080" -dir="/tmp" -volumes=50

								  on weed master

								    1. weed master -port=9333

								      generates a default JSON configuration file if one doesn't exist


								Bootstrap

								  1. at the very beginning, the system has no volumes at all.

								When data node starts:

								  1. each data node sends its existing volumes and max volume blocks to the master

								  2. master remembers the topology/data_center/rack/data_node/volumes

								     for each replication level, stores

								       volume id ~ data node

								       writable volume ids

								If any "assign" request comes in

								  1. find a writable volume with the right replication level

								  2. if not found, grow the volumes with the right replication level

								  3. return a writable volume to the user
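A rough sketch of the assign flow above, assuming a simplified in-memory master. `Master`, `VolumeLayout`, `grow` and `ErrNoFreeSlots` are hypothetical names; growing is reduced to consuming one free slot, whereas a real grow would also have to place the replicas, and the sketch returns the first writable volume instead of choosing among them.

```go
package main

import (
	"errors"
	"fmt"
)

// VolumeLayout tracks the writable volume ids for one replication level.
type VolumeLayout struct {
	writable []uint32
}

// Master is a hypothetical in-memory model of the master's state.
type Master struct {
	layouts   map[string]*VolumeLayout // keyed by replication level, e.g. "01"
	nextID    uint32
	freeSlots int // remaining volume slots across all data nodes (simplified)
}

// ErrNoFreeSlots is returned when no volume can be grown.
var ErrNoFreeSlots = errors.New("no free volume slots")

// grow creates one new writable volume for the given replication level.
func (m *Master) grow(level string) error {
	if m.freeSlots <= 0 {
		return ErrNoFreeSlots
	}
	m.freeSlots--
	m.nextID++
	layout := m.layouts[level]
	if layout == nil {
		layout = &VolumeLayout{}
		m.layouts[level] = layout
	}
	layout.writable = append(layout.writable, m.nextID)
	return nil
}

// Assign finds a writable volume, grows one if none exists, and returns it.
func (m *Master) Assign(level string) (uint32, error) {
	if l := m.layouts[level]; l == nil || len(l.writable) == 0 {
		if err := m.grow(level); err != nil {
			return 0, err
		}
	}
	return m.layouts[level].writable[0], nil
}

func main() {
	m := &Master{layouts: map[string]*VolumeLayout{}, freeSlots: 1}
	fmt.Println(m.Assign("01")) // grows a volume on first use
	fmt.Println(m.Assign("01")) // reuses the writable volume
	fmt.Println(m.Assign("10")) // fails: no free slots left
}
```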


								  for data node:

								    0. detect existing volumes DONE

								    1. onStartUp, and periodically, send existing volumes and maxVolumeCount  store.Join(), DONE

								    2. accept command to grow a volume (id + replication level)  DONE

								       /admin/assign_volume?volume=some_id&replicationType=01

								    3. accept setting volumeLocationList  DONE

								       /admin/set_volume_locations_list?volumeLocationsList=[{Vid:xxx,Locations:[loc1,loc2,loc3]}]

								    4. for each write, pass the write on to the next location (Step 2)

								       the POST method should accept an index that, like a TTL, is decremented at every hop
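The hop index in step 4 can be illustrated with a small simulation. This is a sketch under assumptions: `forwardWrite` is a hypothetical function, no HTTP is involved, and the index is taken to mean the number of forwards still allowed, so a write that arrives with index 0 is stored but not passed on.

```go
package main

import "fmt"

// forwardWrite simulates a chained write. locations lists the replica
// servers in forwarding order; hops is the TTL-like index carried with
// the POST (assumed here to be the number of forwards still allowed).
// It returns the servers that stored the data.
func forwardWrite(locations []string, hops int) []string {
	var stored []string
	for _, loc := range locations {
		stored = append(stored, loc) // this server writes locally
		if hops <= 0 {
			break // index exhausted: do not forward any further
		}
		hops-- // decremented once per hop
	}
	return stored
}

func main() {
	locs := []string{"loc1", "loc2", "loc3"}
	for hops := 0; hops <= 2; hops++ {
		fmt.Println(hops, forwardWrite(locs, hops))
	}
}
```

Carrying the index with the request means a misconfigured location list cannot make a write circulate indefinitely, which is the same role a TTL plays for network packets.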

								  for master:

								    1. accept data node's report of existing volumes and maxVolumeCount ALREADY EXISTS /dir/join

								    2. periodically refresh for active data nodes, and adjust writable volumes

								    3. send command to grow a volume (id + replication level)  DONE

								    4. accept lookup for volume locations    ALREADY EXISTS /dir/lookup

								    5. read topology/datacenter/rack layout


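								An algorithm to allocate volumes evenly, though it may be inefficient when free volumes are plentiful:

								input: replication=xyz (x: copies in other data centers, y: copies on other racks in the same data center, z: copies on other servers in the same rack)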

								algorithm:

								ret_dcs = []

								foreach dc that has y+z+1 volumes{

								  ret_racks = []

								  foreach rack with z+1 volumes{

								    ret = select z+1 servers with 1 volume

								    if ret.size()==z+1 {

								      ret_racks.append(ret)

								    }

								  }

								  ret = randomly pick one rack from ret_racks

								  ret += select y racks with 1 volume each

								  if ret.size()==y+z+1{

								    ret_dcs.append(ret)

								  }

								}

								ret = randomly pick one dc from ret_dcs

								ret += select x data centers with 1 volume each


								A simple replica placement algorithm, though it may fail when free volume slots are scarce:

								ret := []volumes

								dc = randomly pick 1 data center with y+z+1 volumes

								  rack = randomly pick 1 rack with z+1 volumes

								    ret = ret.append(randomly pick z+1 volumes)

								  ret = ret.append(randomly pick y racks with 1 volume)

								ret = ret.append(randomly pick x data centers with 1 volume)
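The simple algorithm above could look roughly like this in Go. It is a sketch under stated assumptions, not the project's implementation: the `Server`, `Rack` and `DataCenter` types and the helpers `slots`, `freeServers`, `pick` and `place` are hypothetical, "volumes" is read as free volume slots, and the z+1 copies on the main rack are placed on distinct servers.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

type Server struct {
	Name string
	Free int // free volume slots
}
type Rack struct{ Servers []Server }
type DataCenter struct{ Racks []Rack }

var ErrPlacement = errors.New("not enough free volume slots")

// slots counts the free volume slots in the given racks.
func slots(racks []Rack) (n int) {
	for _, r := range racks {
		for _, s := range r.Servers {
			n += s.Free
		}
	}
	return n
}

// freeServers lists the servers in the given racks with a free slot.
func freeServers(racks []Rack) (out []string) {
	for _, r := range racks {
		for _, s := range r.Servers {
			if s.Free > 0 {
				out = append(out, s.Name)
			}
		}
	}
	return out
}

// pick returns up to n random indexes in [0,total) accepted by ok.
func pick(total, n int, ok func(int) bool) (out []int) {
	for _, i := range rand.Perm(total) {
		if len(out) < n && ok(i) {
			out = append(out, i)
		}
	}
	return out
}

// place picks x+y+z+1 servers, or fails if a random choice runs out of slots.
func place(dcs []DataCenter, x, y, z int) ([]string, error) {
	d := pick(len(dcs), 1, func(i int) bool { return slots(dcs[i].Racks) >= y+z+1 })
	if len(d) == 0 {
		return nil, ErrPlacement
	}
	racks := dcs[d[0]].Racks
	r := pick(len(racks), 1, func(i int) bool { return slots(racks[i:i+1]) >= z+1 })
	if len(r) == 0 {
		return nil, ErrPlacement
	}
	// z+1 distinct servers on the main rack.
	ret := freeServers(racks[r[0] : r[0]+1])
	if len(ret) < z+1 {
		return nil, ErrPlacement
	}
	rand.Shuffle(len(ret), func(a, b int) { ret[a], ret[b] = ret[b], ret[a] })
	ret = ret[:z+1]
	// One server on each of y other racks in the main data center.
	for _, i := range pick(len(racks), y, func(i int) bool { return i != r[0] && slots(racks[i:i+1]) > 0 }) {
		fs := freeServers(racks[i : i+1])
		ret = append(ret, fs[rand.Intn(len(fs))])
	}
	// One server in each of x other data centers.
	for _, i := range pick(len(dcs), x, func(i int) bool { return i != d[0] && slots(dcs[i].Racks) > 0 }) {
		fs := freeServers(dcs[i].Racks)
		ret = append(ret, fs[rand.Intn(len(fs))])
	}
	if len(ret) != x+y+z+1 {
		return nil, ErrPlacement
	}
	return ret, nil
}

func main() {
	dcs := []DataCenter{
		{Racks: []Rack{{Servers: []Server{{"a1", 1}, {"a2", 1}}}, {Servers: []Server{{"b1", 1}}}}},
		{Racks: []Rack{{Servers: []Server{{"c1", 1}}}}},
	}
	fmt.Println(place(dcs, 1, 1, 1))
}
```

Because each level is chosen at random without backtracking, the sketch can report a failure even when another choice of main rack would have worked, which matches the caveat that this approach may fail when free slots are scarce.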


								TODO:

								  1. replicate content to the other server if the replication type needs replicas