Admin Worker Plugin System (Design)
This document describes the plugin system for admin-managed workers, implemented in parallel with the current maintenance/worker mechanism.
Scope
- Add a new plugin protocol and runtime model for multi-language workers.
- Keep all current admin + worker code paths untouched.
- Use gRPC for all admin-worker communication.
- Let workers describe job configuration UI declaratively via protobuf.
- Persist all job type configuration under the admin server data directory.
- Support detector workers and executor workers per job type.
- Add end-to-end workflow observability (activities, active jobs, progress).
New Contract
- Proto file: `weed/pb/plugin.proto`
- gRPC service: `PluginControlService.WorkerStream`
- Connection model: worker-initiated long-lived bidirectional stream.
Why this model:
- Works for workers in any language with gRPC support.
- Avoids admin dialing constraints in NAT/private networks.
- Allows command/response, progress streaming, and heartbeat over one channel.
Core Runtime Components (Admin Side)
PluginRegistry
- Tracks connected workers and their per-job-type capabilities.
- Maintains liveness via heartbeat timeout.
SchemaCoordinator
- For each job type, asks one capable worker for `JobTypeDescriptor`.
- Caches descriptor version and refresh timestamp.
ConfigStore
- Persists descriptor + saved config values in `dataDir`.
- Stores both:
- Admin-owned runtime config (detection interval, dispatch concurrency, retry).
- Worker-owned config values (plugin-specific detection/execution knobs).
DetectorScheduler
- Per job type, chooses one detector worker (`can_detect=true`).
- Sends `RunDetectionRequest` with saved configs + cluster context.
- Accepts `DetectionProposals`, dedupes by `dedupe_key`, inserts jobs.
JobDispatcher
- Chooses executor worker (`can_execute=true`) for each pending job.
- Sends `ExecuteJobRequest`.
- Consumes `JobProgressUpdate` and `JobCompleted`.
WorkflowMonitor
- Builds live counters and timeline from events:
- activities per job type,
- active jobs,
- per-job progress/state,
- worker health/load.
Worker Responsibilities
- Register capabilities on connect (`WorkerHello`).
- Expose job type descriptor (`ConfigSchemaResponse`) including UI schemas:
- admin config form,
- worker config form,
- defaults.
- Run detection on demand (`RunDetectionRequest`) and return proposals.
- Execute assigned jobs (`ExecuteJobRequest`) and stream progress.
- Heartbeat regularly with slot usage and running work.
- Handle cancellation requests (`CancelRequest`) for in-flight detection/execution.
Declarative UI Model
UI is fully derived from protobuf schema:
- `ConfigForm`
- `ConfigSection`
- `ConfigField`
- `ConfigOption`
- `ValidationRule`
- `ConfigValue` (typed scalar/list/map/object value container)
Result:
- Admin can render forms without hardcoded task structs.
- New job types can ship UI schema from worker binary alone.
- Worker language is irrelevant as long as it can emit protobuf messages.
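To illustrate how the admin can handle config generically, here is a minimal sketch of schema-driven validation. The Go types are simplified, hypothetical mirrors of the messages named above: the real definitions live in `plugin.proto` and may differ in every field. The `Min`/`Max` pair stands in for `ValidationRule`, and the sketch handles numeric values only.

```go
package main

import "fmt"

// ConfigField is a hypothetical, simplified mirror of the schema message.
type ConfigField struct {
	Name     string
	Label    string
	Required bool
	Min, Max *float64 // stand-in for ValidationRule; nil means "no bound"
}

// ConfigSection groups fields under a title, as a form would render them.
type ConfigSection struct {
	Title  string
	Fields []ConfigField
}

// ConfigForm is the root of the declarative form description.
type ConfigForm struct {
	Sections []ConfigSection
}

// Validate checks submitted values against the form without any knowledge
// of the job type that produced it. It returns one message per problem.
func Validate(form ConfigForm, values map[string]float64) []string {
	var problems []string
	for _, s := range form.Sections {
		for _, f := range s.Fields {
			v, ok := values[f.Name]
			switch {
			case !ok && f.Required:
				problems = append(problems, fmt.Sprintf("%s: required", f.Name))
			case ok && f.Min != nil && v < *f.Min:
				problems = append(problems, fmt.Sprintf("%s: below minimum %g", f.Name, *f.Min))
			case ok && f.Max != nil && v > *f.Max:
				problems = append(problems, fmt.Sprintf("%s: above maximum %g", f.Name, *f.Max))
			}
		}
	}
	return problems
}
```

The point of the sketch is that `Validate` contains no job-type-specific code, so a new job type would only need to ship a different form description.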
Detection and Dispatch Flow
- Worker connects and registers capabilities.
- Admin requests descriptor per job type.
- Admin persists descriptor and editable config values.
- On detection interval (admin-owned setting):
- Admin chooses one detector worker for that job type.
- Sends `RunDetectionRequest` with: `AdminRuntimeConfig`, `admin_config_values`, `worker_config_values`, `ClusterContext` (master/filer/volume grpc locations, metadata).
- Detector emits `DetectionProposals` and `DetectionComplete`.
- Admin dedupes and enqueues jobs.
- Dispatcher assigns jobs to any eligible executor worker.
- Executor emits `JobProgressUpdate` and `JobCompleted`.
- Monitor updates workflow UI in near-real-time.
Persistence Layout (Admin Data Dir)
Current layout under <admin-data-dir>/plugin/:
- `job_types/<job_type>/descriptor.pb`
- `job_types/<job_type>/descriptor.json`
- `job_types/<job_type>/config.pb`
- `job_types/<job_type>/config.json`
- `job_types/<job_type>/runs.json`
- `jobs/tracked_jobs.json`
- `activities/activities.json`
config.pb should use PersistedJobTypeConfig from plugin.proto.
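The layout above can be captured in a few path helpers. This is a sketch only: the function and type names are hypothetical, and it does not validate `jobType`, which a real implementation would need to do before using it as a directory name.

```go
package main

import "path/filepath"

// JobTypePaths lists the per-job-type files named in the layout.
type JobTypePaths struct {
	DescriptorPB   string
	DescriptorJSON string
	ConfigPB       string
	ConfigJSON     string
	Runs           string
}

// PluginRoot returns <admin-data-dir>/plugin.
func PluginRoot(adminDataDir string) string {
	return filepath.Join(adminDataDir, "plugin")
}

// PathsForJobType builds the file paths for a single job type.
// Note: jobType is used as-is; sanitizing it is out of scope here.
func PathsForJobType(adminDataDir, jobType string) JobTypePaths {
	dir := filepath.Join(PluginRoot(adminDataDir), "job_types", jobType)
	return JobTypePaths{
		DescriptorPB:   filepath.Join(dir, "descriptor.pb"),
		DescriptorJSON: filepath.Join(dir, "descriptor.json"),
		ConfigPB:       filepath.Join(dir, "config.pb"),
		ConfigJSON:     filepath.Join(dir, "config.json"),
		Runs:           filepath.Join(dir, "runs.json"),
	}
}

// TrackedJobsPath returns the path of the shared tracked-jobs file.
func TrackedJobsPath(adminDataDir string) string {
	return filepath.Join(PluginRoot(adminDataDir), "jobs", "tracked_jobs.json")
}

// ActivitiesPath returns the path of the shared activity log file.
func ActivitiesPath(adminDataDir string) string {
	return filepath.Join(PluginRoot(adminDataDir), "activities", "activities.json")
}
```

Keeping path construction in one place like this would make it easier to change the layout later without touching the store logic.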
Admin UI
- Route: `/plugin`
- Includes:
- runtime status,
- workers/capabilities,
- declarative descriptor-driven config forms,
- run history (last 10 success + last 10 errors),
- tracked jobs and activity stream,
- manual actions for schema refresh, detection, and detect+execute workflow.
Scheduling Policy (Initial)
Detector selection per job type:
- only workers with `can_detect=true`.
- prefer the healthy worker with the most free detection slots.
- lease ends on heartbeat timeout or stream drop.
Execution dispatch:
- only workers with `can_execute=true`.
- select by available execution slots and least active jobs.
- retry on failure using admin runtime retry config.
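The two selection rules above could look roughly like the following. The `WorkerLoad` snapshot and both function names are hypothetical. The sketch also makes three assumptions the policy does not spell out: workers with no free slots are skipped, executors must be healthy too, and executor ties on free slots are broken by fewer active jobs.

```go
package main

// WorkerLoad is a hypothetical snapshot assembled from heartbeats.
type WorkerLoad struct {
	ID                 string
	Healthy            bool
	CanDetect          bool
	CanExecute         bool
	FreeDetectionSlots int
	FreeExecutionSlots int
	ActiveJobs         int
}

// PickDetector returns the healthy can_detect worker with the most free
// detection slots. The bool is false when no worker qualifies.
func PickDetector(ws []WorkerLoad) (WorkerLoad, bool) {
	var best WorkerLoad
	found := false
	for _, w := range ws {
		if !w.Healthy || !w.CanDetect || w.FreeDetectionSlots <= 0 {
			continue
		}
		if !found || w.FreeDetectionSlots > best.FreeDetectionSlots {
			best, found = w, true
		}
	}
	return best, found
}

// PickExecutor prefers more free execution slots and, on a tie, fewer
// active jobs (assumed ordering). The bool is false when no worker qualifies.
func PickExecutor(ws []WorkerLoad) (WorkerLoad, bool) {
	var best WorkerLoad
	found := false
	for _, w := range ws {
		if !w.Healthy || !w.CanExecute || w.FreeExecutionSlots <= 0 {
			continue
		}
		better := !found ||
			w.FreeExecutionSlots > best.FreeExecutionSlots ||
			(w.FreeExecutionSlots == best.FreeExecutionSlots && w.ActiveJobs < best.ActiveJobs)
		if better {
			best, found = w, true
		}
	}
	return best, found
}
```

Both functions are pure and take a snapshot, so the scheduler could call them under its own lock without the selection logic needing to know about streams or heartbeats.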
Safety and Reliability
- Idempotency: dedupe proposals by (`job_type`, `dedupe_key`).
- Backpressure: enforce max jobs per detection run.
- Timeouts: detection and execution timeout from admin runtime config.
- Replay-safe persistence: write job state changes before emitting UI events.
- Heartbeat-based failover for detector/executor reassignment.
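The idempotency and backpressure rules can be combined in a single enqueue step, sketched below. `Proposal`, `Queue`, and `Enqueue` are hypothetical names, and `Proposal` is only a stand-in for an entry in `DetectionProposals`. Releasing a dedupe key once its job finishes is deliberately left out.

```go
package main

// Proposal is a hypothetical stand-in for one detection proposal.
type Proposal struct {
	JobType   string
	DedupeKey string
}

// dedupeID is the (job_type, dedupe_key) identity used for idempotency.
type dedupeID struct {
	jobType string
	key     string
}

// Queue holds accepted jobs and the identities already seen.
type Queue struct {
	seen map[dedupeID]bool
	Jobs []Proposal
}

func NewQueue() *Queue {
	return &Queue{seen: make(map[dedupeID]bool)}
}

// Enqueue inserts unseen proposals, accepting at most maxJobsPerRun from
// this detection run. It reports how many were accepted, how many were
// duplicates, and how many were dropped by the per-run cap.
func (q *Queue) Enqueue(proposals []Proposal, maxJobsPerRun int) (accepted, duplicates, overflow int) {
	for _, p := range proposals {
		id := dedupeID{jobType: p.JobType, key: p.DedupeKey}
		if q.seen[id] {
			duplicates++
			continue
		}
		if accepted >= maxJobsPerRun {
			overflow++
			continue
		}
		q.seen[id] = true
		q.Jobs = append(q.Jobs, p)
		accepted++
	}
	return accepted, duplicates, overflow
}
```

Proposals dropped by the cap are not marked as seen, so in this sketch a later detection run can propose them again.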
Backward Compatibility
- Legacy `worker.proto` runtime remains internally available where still referenced.
- The external CLI worker path now uses the plugin runtime.
- Runtime is enabled by default on admin worker gRPC server.
Incremental Rollout Plan
Phase 1
- Introduce protocol and storage models only.
Phase 2
- Build admin registry/scheduler/dispatcher behind feature flag.
Phase 3
- Add dedicated plugin UI pages and metrics.
Phase 4
- Port one existing job type (e.g. vacuum) as external worker plugin.
Phase 4 status (starter)
- Added `weed worker` command as an external `plugin.proto` worker process.
- Initial handler implements `vacuum` job type with:
  - declarative descriptor/config form response (`ConfigSchemaResponse`),
  - detection via master topology scan (`RunDetectionRequest`),
  - execution via existing vacuum task logic (`ExecuteJobRequest`),
  - heartbeat/load reporting for monitor UI.
- Legacy maintenance-worker-specific CLI path is removed.
Run example:
- Start admin: `weed admin -master=localhost:9333`
- Start worker: `weed worker -admin=localhost:23646`
- Optional explicit job type: `weed worker -admin=localhost:23646 -jobType=vacuum`
- Optional stable worker ID persistence: `weed worker -admin=localhost:23646 -workingDir=/var/lib/seaweedfs-plugin`
Phase 5
- Migrate remaining job types and deprecate old mechanism.
Agreed Defaults
- Detector multiplicity
- Exactly one detector worker per job type at a time. Admin selects one worker and runs detection there.
- Secret handling
- No encryption at rest required for plugin config in this phase.
- Schema compatibility
- No migration policy required yet; this is a new system.
- Execution ownership
- Same worker is allowed to do both detection and execution.
- Retention
- Keep last 10 successful runs and last 10 error runs per job type.
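The retention default can be sketched as a capped history per outcome. `RunRecord` and `History` are hypothetical; the actual fields stored in `runs.json` are not specified in this document.

```go
package main

// keepPerOutcome mirrors the agreed default of 10 runs per outcome.
const keepPerOutcome = 10

// RunRecord is a hypothetical run history entry.
type RunRecord struct {
	ID      int
	Success bool
}

// History keeps the most recent runs for one job type, oldest first.
type History struct {
	Successes []RunRecord
	Errors    []RunRecord
}

// Add records a run and trims the matching list to the retention limit.
func (h *History) Add(r RunRecord) {
	if r.Success {
		h.Successes = appendCapped(h.Successes, r)
	} else {
		h.Errors = appendCapped(h.Errors, r)
	}
}

// appendCapped appends r and drops the oldest entries beyond the limit,
// shifting in place so the backing array does not keep growing.
func appendCapped(rs []RunRecord, r RunRecord) []RunRecord {
	rs = append(rs, r)
	if len(rs) > keepPerOutcome {
		copy(rs, rs[len(rs)-keepPerOutcome:])
		rs = rs[:keepPerOutcome]
	}
	return rs
}
```

Successes and errors are trimmed independently, so a burst of failures never pushes the last successful runs out of the history.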