1. each file can choose its replication factor
2. replication granularity is at the volume level
3. if there is not enough space, we can automatically decrease some volumes' replication factor, especially for cold data
4. plan to support migrating data to cheaper storage
5. plan to support manual volume placement, access-based volume placement, auction-based volume placement
When a new volume server is started, it reports:
1. how many volumes it can hold
2. the current list of existing volumes and each volume's replication type
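
As a sketch, that report could look like the following Go struct; the field names here are assumptions, not the actual wire format:

package main

// VolumeInfo describes one existing volume on a volume server.
// Field names are illustrative, not the actual weed-fs wire format.
type VolumeInfo struct {
	Id              uint32
	ReplicationType string // e.g. "00", "01", "10", "11", "20"
}

// JoinMessage is what a volume server could report to the master
// on startup and on periodic heartbeats.
type JoinMessage struct {
	PublicUrl      string
	MaxVolumeCount int          // how many volumes this server can hold
	Volumes        []VolumeInfo // existing volumes and their replication types
}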
Each volume server remembers:
1. its current volume ids
2. replica locations, which are read from the master
The master assigns volume ids based on:
1. replication factor (data center, rack)
2. concurrent write support
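
A hedged sketch of the id assignment itself, assuming a simple in-memory counter on the master; reserving several ids at once is one way multiple writable volumes per replication type could support concurrent writes:

package main

import "sync"

// VolumeIdSequencer hands out monotonically increasing volume ids.
// A plain in-memory counter is an assumption; a real master would
// likely persist this across restarts.
type VolumeIdSequencer struct {
	mu     sync.Mutex
	nextId uint32
}

// NextIds reserves n fresh volume ids, e.g. to grow several writable
// volumes of one replication type at once for concurrent writes.
func (s *VolumeIdSequencer) NextIds(n int) []uint32 {
	s.mu.Lock()
	defer s.mu.Unlock()
	ids := make([]uint32, n)
	for i := range ids {
		s.nextId++
		ids[i] = s.nextId
	}
	return ids
}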
The master stores the replication configuration:
{
  "replication": [
    {"type":"00", "min_volume_count":3, "weight":10},
    {"type":"01", "min_volume_count":2, "weight":20},
    {"type":"10", "min_volume_count":2, "weight":20},
    {"type":"11", "min_volume_count":3, "weight":30},
    {"type":"20", "min_volume_count":2, "weight":20}
  ],
  "port": 9333
}
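
A sketch of loading this configuration in Go, with a copyCount helper based on the assumption that a replication type's digit sum counts the extra copies (so "00" = 1 copy, "01"/"10" = 2 copies, "11"/"20" = 3 copies); the file name weed.json is hypothetical:

package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// ReplicationRule mirrors one entry of the replication configuration.
type ReplicationRule struct {
	Type           string `json:"type"`
	MinVolumeCount int    `json:"min_volume_count"`
	Weight         int    `json:"weight"`
}

// MasterConfig mirrors the JSON configuration sketched above.
type MasterConfig struct {
	Replication []ReplicationRule `json:"replication"`
	Port        int               `json:"port"`
}

// copyCount derives the number of copies from a replication type:
// 1 plus the sum of its digits (an assumption, see the lead-in).
func copyCount(replicationType string) int {
	count := 1
	for _, d := range replicationType {
		count += int(d - '0')
	}
	return count
}

func main() {
	data, err := os.ReadFile("weed.json") // hypothetical config file name
	if err != nil {
		panic(err)
	}
	var conf MasterConfig
	if err := json.Unmarshal(data, &conf); err != nil {
		panic(err)
	}
	for _, r := range conf.Replication {
		fmt.Printf("type %s: %d copies, weight %d\n", r.Type, copyCount(r.Type), r.Weight)
	}
}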
Or manually via the command line:
1. add a volume with a specified replication factor
2. add a volume with a specified volume id
If duplicated volume ids are reported from different volume servers,
the master determines the replication factor of the volume:
if fewer replicas than the replication factor, the volume goes into read-only mode;
if more than the replication factor, the master purges the smallest/oldest replica;
if equal, the volume functions as usual.
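
That rule as a small Go sketch; the Action names and the premise that the master already knows the reported location count are assumptions:

package main

// Action the master could take after comparing the reported replicas
// of one volume id against its expected copy count.
type Action int

const (
	MarkReadonly Action = iota // fewer replicas than required
	PurgeExtra                 // more replicas than required: drop the smallest/oldest
	NoChange                   // exactly the required number
)

// reconcile applies the rule from the notes: read-only when under-replicated,
// purge the surplus when over-replicated, otherwise leave the volume alone.
func reconcile(reportedLocations int, expectedCopies int) Action {
	switch {
	case reportedLocations < expectedCopies:
		return MarkReadonly
	case reportedLocations > expectedCopies:
		return PurgeExtra
	default:
		return NoChange
	}
}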
Use cases:
on the volume server
1. weed volume -mserver="xx.xx.xx.xx:9333" -publicUrl="good.com:8080" -dir="/tmp" -volumes=50
on the weed master
1. weed master -port=9333
   generates a default JSON configuration file if one doesn't exist
Bootstrap
1. at the very beginning, the system has no volumes at all.
When a data node starts:
1. each data node sends the master its existing volumes and max volume count
2. the master remembers the topology/data_center/rack/data_node/volumes
For each replication level, the master stores:
   volume id ~ data node
   writable volume ids
If an "assign" request comes in:
1. find a writable volume with the right replication level
2. if not found, grow new volumes at the right replication level
3. return a writable volume to the user
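
A minimal sketch of that per-replication-level bookkeeping and the assign path; the Layout type and the injected grow step are assumptions:

package main

import (
	"errors"
	"math/rand"
)

// Layout is the master's view for one replication level:
// where each volume lives, and which volumes still accept writes.
type Layout struct {
	locations map[uint32][]string // volume id -> data node urls
	writable  []uint32            // volume ids accepting writes
}

// Assign finds a writable volume at this replication level,
// growing a new volume first if none exists.
func (l *Layout) Assign(grow func() (uint32, error)) (uint32, error) {
	if len(l.writable) == 0 {
		vid, err := grow() // ask data nodes to create a volume at this level
		if err != nil {
			return 0, errors.New("no writable volume and grow failed")
		}
		l.writable = append(l.writable, vid)
	}
	// pick any writable volume; random choice spreads concurrent writes
	return l.writable[rand.Intn(len(l.writable))], nil
}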
Plan:
Step 1. implement one copy (no replication), automatically assign volume ids
Step 2. add replication
For the above operations, here is the todo list:
for the data node:
0. detect existing volumes DONE
1. on startup, and periodically, send existing volumes and maxVolumeCount via store.Join() DONE
2. accept a command to grow a volume (id + replication level) DONE
   /admin/assign_volume?volume=some_id&replicationType=01
3. accept setting the volumeLocationList DONE
   /admin/set_volume_locations_list?volumeLocationsList=[{Vid:xxx,Locations:[loc1,loc2,loc3]}]
4. for each write, pass the write to the next location (Step 2);
   the POST method should accept an index that, like a TTL, gets decremented every hop
   (see the sketch after this list)
for the master:
1. accept the data node's report of existing volumes and maxVolumeCount ALREADY EXISTS /dir/join
2. periodically refresh the set of active data nodes, and adjust writable volumes
3. send a command to grow a volume (id + replication level) DONE
4. NOT_IMPLEMENTING: if dead/stale data nodes are found, for the affected volumes, send stale info
   to the other data nodes, BECAUSE the master will stop sending writes to those data nodes
5. accept lookups for volume locations ALREADY EXISTS /dir/lookup
6. read the topology/datacenter/rack layout
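
For data node item 4 above, a sketch of the hop-counted forwarding: write locally, then re-post to the next location with the counter decremented so forwarding stops at zero. The handler shape, the hops query parameter name, and the location list are all assumptions:

package main

import (
	"bytes"
	"io"
	"net/http"
	"strconv"
)

// handleWrite stores the blob locally, then, like a TTL, decrements the
// hop index and forwards the write to the next replica location.
func handleWrite(w http.ResponseWriter, r *http.Request, locations []string) {
	body, _ := io.ReadAll(r.Body)
	// ... write body to the local volume here ...

	hops, _ := strconv.Atoi(r.URL.Query().Get("hops")) // hypothetical parameter
	if hops > 0 && len(locations) > 0 {
		next := "http://" + locations[0] + r.URL.Path + "?hops=" + strconv.Itoa(hops-1)
		resp, err := http.Post(next, r.Header.Get("Content-Type"), bytes.NewReader(body))
		if err == nil {
			resp.Body.Close()
		}
	}
	w.WriteHeader(http.StatusCreated)
}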
TODO:
1. replicate content to the other servers if the replication type needs replicas