* initial design
* added simulation as tests
* reorganized the codebase to move the simulation framework and tests into their own dedicated package
* integration test. ec worker task
* remove "enhanced" reference
* start master, volume servers, filer
Current Status
✅ Master: Healthy and running (port 9333)
✅ Filer: Healthy and running (port 8888)
✅ Volume Servers: All 6 servers running (ports 8080-8085)
🔄 Admin/Workers: Will start when dependencies are ready
* generate write load
* tasks are assigned
* admin start wtih grpc port. worker has its own working directory
* Update .gitignore
* working worker and admin. Task detection is not working yet.
* compiles, detection uses volumeSizeLimitMB from master
* compiles
* worker retries connecting to admin
* build and restart
* rendering pending tasks
* skip task ID column
* sticky worker id
* test canScheduleTaskNow
* worker reconnect to admin
* clean up logs
* worker register itself first
* worker can run ec work and report status
but:
1. one volume should not be repeatedly worked on.
2. ec shards needs to be distributed and source data should be deleted.
* move ec task logic
* listing ec shards
* local copy, ec. Need to distribute.
* ec is mostly working now
* distribution of ec shards needs improvement
* need configuration to enable ec
* show ec volumes
* interval field UI component
* rename
* integration test with vauuming
* garbage percentage threshold
* fix warning
* display ec shard sizes
* fix ec volumes list
* Update ui.go
* show default values
* ensure correct default value
* MaintenanceConfig use ConfigField
* use schema defined defaults
* config
* reduce duplication
* refactor to use BaseUIProvider
* each task register its schema
* checkECEncodingCandidate use ecDetector
* use vacuumDetector
* use volumeSizeLimitMB
* remove
remove
* remove unused
* refactor
* use new framework
* remove v2 reference
* refactor
* left menu can scroll now
* The maintenance manager was not being initialized when no data directory was configured for persistent storage.
* saving config
* Update task_config_schema_templ.go
* enable/disable tasks
* protobuf encoded task configurations
* fix system settings
* use ui component
* remove logs
* interface{} Reduction
* reduce interface{}
* reduce interface{}
* avoid from/to map
* reduce interface{}
* refactor
* keep it DRY
* added logging
* debug messages
* debug level
* debug
* show the log caller line
* use configured task policy
* log level
* handle admin heartbeat response
* Update worker.go
* fix EC rack and dc count
* Report task status to admin server
* fix task logging, simplify interface checking, use erasure_coding constants
* factor in empty volume server during task planning
* volume.list adds disk id
* track disk id also
* fix locking scheduled and manual scanning
* add active topology
* simplify task detector
* ec task completed, but shards are not showing up
* implement ec in ec_typed.go
* adjust log level
* dedup
* implementing ec copying shards and only ecx files
* use disk id when distributing ec shards
🎯 Planning: ActiveTopology creates DestinationPlan with specific TargetDisk
📦 Task Creation: maintenance_integration.go creates ECDestination with DiskId
🚀 Task Execution: EC task passes DiskId in VolumeEcShardsCopyRequest
💾 Volume Server: Receives disk_id and stores shards on specific disk (vs.store.Locations[req.DiskId])
📂 File System: EC shards and metadata land in the exact disk directory planned
* Delete original volume from all locations
* clean up existing shard locations
* local encoding and distributing
* Update docker/admin_integration/EC-TESTING-README.md
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* check volume id range
* simplify
* fix tests
* fix types
* clean up logs and tests
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* add a menu item "Message Queue"
* add a menu item "Message Queue"
* move the "brokers" link under it.
* add "topics", "subscribers". Add pages for them.
* refactor
* show topic details
* admin display publisher and subscriber info
* remove publisher and subscribers from the topic row pull down
* collecting more stats from publishers and subscribers
* fix layout
* fix publisher name
* add local listeners for mq broker and agent
* render consumer group offsets
* remove subscribers from left menu
* topic with retention
* support editing topic retention
* show retention when listing topics
* create bucket
* Update s3_buckets_templ.go
* embed the static assets into the binary
fix https://github.com/seaweedfs/seaweedfs/issues/6964
* add ui for maintenance
* valid config loading. fix workers page.
* refactor
* grpc between admin and workers
* add a long-running bidirectional grpc call between admin and worker
* use the grpc call to heartbeat
* use the grpc call to communicate
* worker can remove the http client
* admin uses http port + 10000 as its default grpc port
* one task one package
* handles connection failures gracefully with exponential backoff
* grpc with insecure tls
* grpc with optional tls
* fix detecting tls
* change time config from nano seconds to seconds
* add tasks with 3 interfaces
* compiles reducing hard coded
* remove a couple of tasks
* remove hard coded references
* reduce hard coded values
* remove hard coded values
* remove hard coded from templ
* refactor maintenance package
* fix import cycle
* simplify
* simplify
* auto register
* auto register factory
* auto register task types
* self register types
* refactor
* simplify
* remove one task
* register ui
* lazy init executor factories
* use registered task types
* DefaultWorkerConfig remove hard coded task types
* remove more hard coded
* implement get maintenance task
* dynamic task configuration
* "System Settings" should only have system level settings
* adjust menu for tasks
* ensure menu not collapsed
* render job configuration well
* use templ for ui of task configuration
* fix ordering
* fix bugs
* saving duration in seconds
* use value and unit for duration
* Delete WORKER_REFACTORING_PLAN.md
* Delete maintenance.json
* Delete custom_worker_example.go
* remove address from workers
* remove old code from ec task
* remove creating collection button
* reconnect with exponential backoff
* worker use security.toml
* start admin server with tls info from security.toml
* fix "weed admin" cli description
Changed the signal from SIGUSR1 to SIGTERM. This should fix the compilation error since SIGTERM is available on all Unix-like systems including macOS. The functionality remains similar - it will still allow the master process to wait for a signal before exiting, just using a different signal type.
This prevent crash filler with nil pointer dereference as s3 expect this
parameters.
New two parameters are added to filer command - copy of s3 parameters:
- s3.cacert.file - path to the TLS CA certificate file
- s3.tlsVerifyClientCert - whether to verify the client's certificate
* rename
* set agent address
* refactor
* add agent sub
* pub messages
* grpc new client
* can publish records via agent
* send init message with session id
* fmt
* check cancelled request while waiting
* use sessionId
* handle possible nil stream
* subscriber process messages
* separate debug port
* use atomic int64
* less logs
* minor
* skip io.EOF
* rename
* remove unused
* use saved offsets
* do not reuse session, since always session id is new after restart
remove last active ts from SessionEntry
* simplify printing
* purge unused
* just proxy the subscription, skipping the session step
* adjust offset types
* subscribe offset type and possible value
* start after the known tsns
* avoid wrongly set startPosition
* move
* remove
* refactor
* typo
* fix
* fix changed path
s3 command ignore tlsVerifyClientCert and cacert.file arguments from
command line. On startS3Server instead of use real values (in s3opt),
default values (from s3Options, always empty) are checked.
Now on right values are checked and if user provide this arguments
RequireAndVerifyClientCert is set and/or ca certificate is loaded.