docs(volume-server): refocus plan on native rust parity

codex-rust-volume-server-bootstrap
Chris Lu 4 weeks ago
parent
commit
14c863dbff
  1. 183
      rust/volume_server/DEV_PLAN.md
  2. 13
      test/volume_server/DEV_PLAN.md

183
rust/volume_server/DEV_PLAN.md

@ -1,72 +1,147 @@
# Rust Volume Server Rewrite Dev Plan
# Rust Volume Server Parity Implementation Plan
## Goal
Build a Rust implementation of SeaweedFS volume server that is behavior-compatible with the current Go implementation and can pass the existing integration suites under `/Users/chris/dev/seaweedfs2/test/volume_server/http` and `/Users/chris/dev/seaweedfs2/test/volume_server/grpc`.
## Objective
Implement a native Rust volume server that replicates Go volume-server behavior for HTTP and gRPC APIs, so it can become a drop-in replacement validated by existing integration suites.
## Compatibility Target
- CLI compatibility for volume-server startup flags used by integration harness.
- HTTP and gRPC behavioral parity for tested paths.
- Drop-in process integration with current Go master in transition phases.
## Current Focus (2026-02-16)
- Program focus is now Rust implementation parity, not broad test expansion.
- `test/volume_server` is treated as the parity gate.
- Existing Rust launcher modes (`exec`, `proxy`) are transition tools; they are not the final target.
## Phases
### Phase 0: Bootstrap and Harness Integration
- [x] Add Rust volume-server crate.
- [x] Implement Rust launcher that can run as a volume-server process entrypoint.
- [x] Add launcher execution modes (`exec` and `proxy`) behind `VOLUME_SERVER_RUST_MODE`.
- [x] Add integration harness switches so tests can run with:
## Current Status
- Rust crate and launcher are in place.
- Integration harness can run:
- Go master + Go volume (default)
- Go master + Rust volume (`VOLUME_SERVER_IMPL=rust` or `VOLUME_SERVER_BINARY=...`)
- [x] Add CI smoke coverage for Rust volume-server mode.
- Go master + Rust launcher (`VOLUME_SERVER_IMPL=rust`)
- Rust launcher `proxy` mode has full-suite integration pass while delegating backend handlers to Go.
- Native Rust API/storage logic is not implemented yet.
## Parity Exit Criteria
1. Native mode passes:
- `env VOLUME_SERVER_IMPL=rust VOLUME_SERVER_RUST_MODE=native go test -count=1 ./test/volume_server/http`
- `env VOLUME_SERVER_IMPL=rust VOLUME_SERVER_RUST_MODE=native go test -count=1 ./test/volume_server/grpc`
2. CI runs native Rust mode integration coverage (at least smoke, then expanded shards).
3. Rust mode defaults to native behavior for integration harness.
4. Go-backend delegation is removed (or retained only as explicit fallback mode).
## Architecture Workstreams
### Phase 1: Native Rust Control Plane Skeleton
- [ ] Native Rust HTTP server with admin endpoints:
### A. Runtime and Configuration Parity
- [ ] Add `native` runtime mode in `weed-volume-rs`.
- [ ] Parse and honor volume-server CLI/config flags used by integration harness:
- [ ] network/bind ports (`-ip`, `-port`, `-port.grpc`, `-port.public`)
- [ ] master target/config dir/read mode/throttling/JWT-related config
- [ ] size/timeout controls and maintenance state defaults
- [ ] Implement graceful lifecycle behavior (signals, shutdown, readiness).
### B. Native HTTP Surface
- [ ] Admin/control endpoints:
- [ ] `GET /status`
- [ ] `GET /healthz`
- [ ] static/UI endpoints used by tests
- [ ] Native Rust gRPC server with basic lifecycle/state RPCs:
- [ ] static/UI endpoints currently exercised
- [ ] Data read path parity:
- [ ] fid parsing/path variants
- [ ] conditional headers (`If-Modified-Since`, `If-None-Match`)
- [ ] range handling (single/multi/invalid)
- [ ] deleted reads, auth checks, read-mode branches
- [ ] chunk-manifest and compression/image transformation branches
- [ ] Data write/delete parity:
- [ ] write success/unchanged/error paths
- [ ] replication and file-size-limit paths
- [ ] delete and chunk-manifest delete branches
- [ ] Method/CORS/public-port parity for split admin/public behavior.
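As an illustration of the "range handling (single/multi/invalid)" item above, here is a minimal sketch of a `Range` header parser. It follows the general RFC 9110 rules (ignore a malformed header, report unsatisfiable when no range overlaps the body); it is not taken from the Go volume server, whose exact behavior in each branch is what the integration suite would have to confirm. The names `RangeOutcome` and `parse_range` are hypothetical.

```rust
/// Outcome of interpreting a `Range` header against a known body length.
/// Hypothetical type for illustration; not part of `weed-volume-rs`.
#[derive(Debug, PartialEq, Eq)]
pub enum RangeOutcome {
    /// No header, or a header this sketch treats as malformed: serve the whole body.
    Full,
    /// One or more satisfiable ranges as inclusive (start, end) byte offsets.
    Ranges(Vec<(u64, u64)>),
    /// Well-formed header, but no range overlaps the body (416 in RFC 9110 terms).
    Unsatisfiable,
}

/// Parse a `bytes=` range header for a body of `len` bytes.
/// Assumption: malformed input is ignored rather than rejected; the Go server may differ.
pub fn parse_range(header: Option<&str>, len: u64) -> RangeOutcome {
    let Some(spec) = header.and_then(|h| h.trim().strip_prefix("bytes=")) else {
        return RangeOutcome::Full;
    };
    let mut ranges = Vec::new();
    for part in spec.split(',') {
        let Some((first, last)) = part.trim().split_once('-') else {
            return RangeOutcome::Full;
        };
        let (first, last) = (first.trim(), last.trim());
        let candidate = if first.is_empty() {
            // Suffix form "-N": the last N bytes of the body.
            let Ok(n) = last.parse::<u64>() else {
                return RangeOutcome::Full;
            };
            if n == 0 || len == 0 {
                None
            } else {
                Some((len.saturating_sub(n), len - 1))
            }
        } else {
            let Ok(start) = first.parse::<u64>() else {
                return RangeOutcome::Full;
            };
            let end = if last.is_empty() {
                // Open-ended form "N-": from N to the end of the body.
                len.saturating_sub(1)
            } else {
                let Ok(end) = last.parse::<u64>() else {
                    return RangeOutcome::Full;
                };
                if end < start {
                    return RangeOutcome::Full;
                }
                end.min(len.saturating_sub(1))
            };
            if start >= len { None } else { Some((start, end)) }
        };
        if let Some(range) = candidate {
            ranges.push(range);
        }
    }
    if ranges.is_empty() {
        RangeOutcome::Unsatisfiable
    } else {
        RangeOutcome::Ranges(ranges)
    }
}
```

A handler would map `Full` to a plain 200 response, a single entry in `Ranges` to 206 with `Content-Range`, several entries to a multipart response, and `Unsatisfiable` to 416. Whether the Go server takes the same branch for each malformed input is something the parity suite has to settle.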
### C. Native gRPC Surface
- [ ] Control-plane RPCs:
- [ ] `GetState`, `SetState`, `VolumeServerStatus`, `Ping`, `VolumeServerLeave`
- [ ] Flag/config parser parity for currently exercised startup options.
### Phase 2: Native Data Path (HTTP + core gRPC)
- [ ] HTTP read/write/delete parity:
- [ ] path variants, conditional headers, ranges, auth, throttling
- [ ] chunk manifest read/delete behavior
- [ ] image and compression transform branches
- [ ] gRPC data RPC parity:
- [ ] admin lifecycle: allocate/mount/unmount/delete/configure/readonly/writable
- [ ] Data RPCs:
- [ ] `ReadNeedleBlob`, `ReadNeedleMeta`, `WriteNeedleBlob`
- [ ] `BatchDelete`, `ReadAllNeedles`
- [ ] copy/receive/sync baseline
### Phase 3: Advanced gRPC Surface
- [ ] Vacuum RPC family.
- [ ] Tail sender/receiver.
- [ ] Erasure coding family.
- [ ] Tiering/remote fetch family.
- [ ] Query/Scrub family.
### Phase 4: Hardening and Cutover
- [ ] Determinism/flake hardening in integration runtime.
- [ ] Performance and resource-baseline checks versus Go.
- [ ] Optional dual-run diff tooling for payload/header parity.
- [ ] Default harness/CI mode switch to Rust volume server once parity threshold is met.
## Integration Test Mapping
- HTTP suite: `/Users/chris/dev/seaweedfs2/test/volume_server/http`
- gRPC suite: `/Users/chris/dev/seaweedfs2/test/volume_server/grpc`
- Harness: `/Users/chris/dev/seaweedfs2/test/volume_server/framework`
- [ ] sync/copy/receive and status endpoints
- [ ] Stream RPCs:
- [ ] tail sender/receiver
- [ ] vacuum streams
- [ ] query streams
- [ ] Advanced families:
- [ ] erasure coding RPC set
- [ ] tiering/remote fetch
- [ ] scrub/query mode matrix
### D. Storage Compatibility Layer
- [ ] Implement volume data/index handling compatible with Go on-disk format.
- [ ] Preserve cookie/checksum/timestamp semantics used by tests.
- [ ] Match read/write/delete consistency and error mapping behavior.
- [ ] Ensure EC metadata/data-path compatibility with existing files.
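The cookie semantics listed above start at the file id, so a small parsing sketch may help. It assumes the publicly documented SeaweedFS fid shape `<volumeId>,<needleIdHex><cookieHex>`, where the cookie is the last 8 hex characters (for example `3,01637037d6`). The `FileId` type and `parse_fid` function are hypothetical, and other path variants (slash-separated forms, file extensions, chunk suffixes) are deliberately left out.

```rust
/// Parsed file id. Hypothetical type for illustration only.
#[derive(Debug, PartialEq, Eq)]
pub struct FileId {
    pub volume_id: u32,
    pub needle_id: u64,
    pub cookie: u32,
}

/// Number of hex characters encoding the 32-bit cookie.
const COOKIE_HEX_LEN: usize = 8;
/// Upper bound: 16 hex characters of needle id plus the cookie.
const MAX_KEY_COOKIE_HEX_LEN: usize = 16 + COOKIE_HEX_LEN;

/// Parse the comma form of a fid, e.g. "3,01637037d6".
/// Assumption: only this one variant is handled here.
pub fn parse_fid(fid: &str) -> Result<FileId, String> {
    let (vol, key_cookie) = fid
        .split_once(',')
        .ok_or_else(|| format!("missing ',' in fid {fid:?}"))?;
    let volume_id: u32 = vol
        .parse()
        .map_err(|e| format!("bad volume id {vol:?}: {e}"))?;

    // Restricting to ASCII hex digits also makes the byte-index slicing below safe.
    if !key_cookie.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(format!("non-hex characters in {key_cookie:?}"));
    }
    if key_cookie.len() <= COOKIE_HEX_LEN {
        return Err(format!("key/cookie part too short: {key_cookie:?}"));
    }
    if key_cookie.len() > MAX_KEY_COOKIE_HEX_LEN {
        return Err(format!("key/cookie part too long: {key_cookie:?}"));
    }

    // The cookie is the fixed-width tail; everything before it is the needle id.
    let split = key_cookie.len() - COOKIE_HEX_LEN;
    let needle_id = u64::from_str_radix(&key_cookie[..split], 16)
        .map_err(|e| format!("bad needle id: {e}"))?;
    let cookie = u32::from_str_radix(&key_cookie[split..], 16)
        .map_err(|e| format!("bad cookie: {e}"))?;

    Ok(FileId { volume_id, needle_id, cookie })
}
```

For instance, `parse_fid("3,01637037d6")` yields volume 3, needle id 1, and cookie `0x637037d6`. A read handler would then compare the parsed cookie with the one stored alongside the needle before serving data.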
### E. Operational Hardening
- [ ] Deterministic startup/readiness and shutdown semantics.
- [ ] Log/error parity sufficient for debugging and CI triage.
- [ ] Concurrency/timeout behavior alignment for throttling and streams.
- [ ] Performance baseline checks vs Go for key flows.
## Milestone Plan
### M0 (Completed): Harness + Launcher Transition
- [x] Rust launcher integrated into harness.
- [x] Proxy mode full-suite validation with Go backend delegation.
### M1: Native Skeleton (Control Plane First)
- [ ] `native` mode boots and serves:
- [ ] `/status`, `/healthz`
- [ ] `GetState`, `SetState`, `VolumeServerStatus`, `Ping`, `VolumeServerLeave`
- Gate:
- targeted HTTP/gRPC control tests pass in `native` mode.
### M2: Native Core Data Paths
- [ ] Native HTTP read/write/delete baseline parity.
- [ ] Native gRPC data baseline parity (`Read/WriteNeedle*`, `BatchDelete`, `ReadAllNeedles`).
- Gate:
- core HTTP and gRPC data suites pass in `native` mode.
### M3: Native Stream + Copy/Sync
- [ ] Tail/copy/receive/sync paths in native mode.
- Gate:
- stream/copy families pass in `native` mode.
### M4: Native Advanced Feature Families
- [ ] EC, tiering, scrub/query advanced branches.
- Gate:
- full `/test/volume_server/http` and `/test/volume_server/grpc` pass in `native` mode.
### M5: CI/Cutover
- [ ] Add/expand native-mode CI jobs.
- [ ] Make native mode default for Rust integration runs.
- [ ] Keep `exec`/`proxy` only as explicit fallback modes during rollout.
## Immediate Next Steps
1. Introduce `VOLUME_SERVER_RUST_MODE=native` and wire native server startup skeleton.
2. Implement `/status` and `/healthz` with parity headers/payload fields.
3. Implement minimal gRPC state/ping RPCs.
4. Run targeted integration tests in native mode and iterate on mismatches.
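Step 1 could start from something like the sketch below, which turns the value of `VOLUME_SERVER_RUST_MODE` into a typed mode. The mode names `exec`, `proxy`, and `native` come from this plan. The `RustMode` enum, the `resolve_mode` helper, case-insensitive matching, and the fallback for an unset variable are all assumptions, since the plan does not state the launcher's current default.

```rust
use std::str::FromStr;

/// Launcher modes named in the plan. Hypothetical type for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RustMode {
    /// Transition mode: hand off to the Go volume server.
    Exec,
    /// Transition mode: Rust front listeners, Go backend.
    Proxy,
    /// Target mode: native Rust handlers.
    Native,
}

impl FromStr for RustMode {
    type Err = String;

    /// Assumption: matching is case-insensitive and ignores surrounding whitespace.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "exec" => Ok(RustMode::Exec),
            "proxy" => Ok(RustMode::Proxy),
            "native" => Ok(RustMode::Native),
            other => Err(format!("unknown VOLUME_SERVER_RUST_MODE value: {other:?}")),
        }
    }
}

/// Resolve the mode from an optional raw value.
/// Assumption: an unset or empty value falls back to `default`, chosen by the caller.
pub fn resolve_mode(raw: Option<&str>, default: RustMode) -> Result<RustMode, String> {
    match raw {
        None => Ok(default),
        Some(v) if v.trim().is_empty() => Ok(default),
        Some(v) => v.parse(),
    }
}
```

A launcher could call `resolve_mode(std::env::var("VOLUME_SERVER_RUST_MODE").ok().as_deref(), RustMode::Exec)` at startup and branch on the result. Passing the default explicitly keeps the later cutover to native mode (milestone M5) a one-line change.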
## Risk Register
- On-disk format mismatch risk:
- Mitigation: implement format-level compatibility tests early (idx/dat/needle encoding).
- Behavioral drift in edge branches:
- Mitigation: use integration suite failures as primary truth; only add tests for newly discovered untracked branches.
- Stream/concurrency semantic mismatch:
- Mitigation: stabilize with focused interruption/timeout parity tests.
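A format-level compatibility test of the kind mentioned in the first mitigation could begin with a golden-bytes round trip for index entries. The sketch below assumes a 16-byte `.idx` entry made of an 8-byte needle id, a 4-byte offset, and a 4-byte size, all big-endian, with the offset counted in padding units rather than bytes. That layout is an assumption about the default Go build and should be checked against the Go source before anything relies on it. `IdxEntry` and its methods are hypothetical names.

```rust
/// Assumed size of one `.idx` entry in bytes (8 + 4 + 4).
pub const IDX_ENTRY_SIZE: usize = 16;

/// One index entry under the assumed layout. Hypothetical type for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IdxEntry {
    pub needle_id: u64,
    /// Offset into the `.dat` file, in padding units (assumed), not bytes.
    pub offset_units: u32,
    /// Raw 4-byte size field; any special values (e.g. tombstones) are not interpreted here.
    pub size: u32,
}

impl IdxEntry {
    /// Serialize to the assumed big-endian wire layout.
    pub fn encode(&self) -> [u8; IDX_ENTRY_SIZE] {
        let mut buf = [0u8; IDX_ENTRY_SIZE];
        buf[0..8].copy_from_slice(&self.needle_id.to_be_bytes());
        buf[8..12].copy_from_slice(&self.offset_units.to_be_bytes());
        buf[12..16].copy_from_slice(&self.size.to_be_bytes());
        buf
    }

    /// Deserialize from exactly `IDX_ENTRY_SIZE` bytes; returns `None` on a length mismatch.
    pub fn decode(buf: &[u8]) -> Option<Self> {
        if buf.len() != IDX_ENTRY_SIZE {
            return None;
        }
        let mut id = [0u8; 8];
        id.copy_from_slice(&buf[0..8]);
        let mut offset = [0u8; 4];
        offset.copy_from_slice(&buf[8..12]);
        let mut size = [0u8; 4];
        size.copy_from_slice(&buf[12..16]);
        Some(IdxEntry {
            needle_id: u64::from_be_bytes(id),
            offset_units: u32::from_be_bytes(offset),
            size: u32::from_be_bytes(size),
        })
    }
}
```

In a real compatibility test the golden bytes would come from an `.idx` file written by the Go server rather than from hand-written arrays, so a layout mismatch shows up as a failing decode instead of a silently wrong read.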
## Progress Log
- Date: 2026-02-15
- Change: Created Rust volume-server crate (`weed-volume-rs`) as compatibility launcher and wired harness binary selection (`VOLUME_SERVER_IMPL`/`VOLUME_SERVER_BINARY`).
- Validation: local Rust-mode smoke and full-suite runs passed:
- `VOLUME_SERVER_IMPL=rust go test ./test/volume_server/http ./test/volume_server/grpc`
- Change: Added Rust launcher integration (`exec`) and harness wiring.
- Validation: Rust launcher mode passed smoke and full integration suites while delegating to Go backend.
- Commits: `7beab85c2`, `880c2e1da`, `63d08e8a9`, `d402573ea`, `3bd20e6a1`, `6ce4d7ede`
- Date: 2026-02-15
- Change: Added Rust proxy supervisor mode (`VOLUME_SERVER_RUST_MODE=proxy`) with front-side TCP listeners for HTTP/public/gRPC and managed Go backend process.
- Change: Added Rust proxy supervisor mode and validated full integration suite.
- Validation:
- `env VOLUME_SERVER_IMPL=rust VOLUME_SERVER_RUST_MODE=proxy go test -count=1 -timeout=200m ./test/volume_server/http`
- `env VOLUME_SERVER_IMPL=rust VOLUME_SERVER_RUST_MODE=proxy go test -count=1 -timeout=240m ./test/volume_server/grpc`
- Result: both suites pass end-to-end in proxy mode.
- `env VOLUME_SERVER_IMPL=rust VOLUME_SERVER_RUST_MODE=proxy go test -count=1 ./test/volume_server/http`
- `env VOLUME_SERVER_IMPL=rust VOLUME_SERVER_RUST_MODE=proxy go test -count=1 ./test/volume_server/grpc`
- Commits: `a7f50d23b`, `548b3d9a3`
- Date: 2026-02-16
- Change: Re-focused plan from test expansion to native Rust implementation parity.
- Validation basis: latest Rust proxy full-suite pass keeps regression baseline stable while native implementation starts.
- Commits: pending

13
test/volume_server/DEV_PLAN.md

@ -3,6 +3,12 @@
## Goal
Create a Go integration test suite under `test/volume_server` that validates **drop-in behavior parity** for the Volume Server HTTP and gRPC APIs, so a Rust rewrite can be verified against the current Go behavior.
## Current Program Focus (2026-02-16)
- Primary execution focus has shifted to implementing native Rust volume-server parity.
- This integration suite is now the parity gate for Rust implementation work.
- New tests should be added only when native Rust implementation reveals uncovered Go behavior that is not yet captured.
- Rust implementation roadmap lives in `/Users/chris/dev/seaweedfs2/rust/volume_server/DEV_PLAN.md`.
## Hard Requirements
- Tests live under `test/volume_server`.
- Tests are written in Go.
@ -1260,3 +1266,10 @@ Update this section during implementation:
- Profiles covered: P1.
- Gaps introduced/remaining: deleted-read parity now covers both `GET` and `HEAD` semantics on local-volume path.
- Commit: `cc80ad364`
- Date: 2026-02-16
- Change: Shifted planning priority to native Rust implementation parity.
- APIs covered: no new API additions in this plan entry; integration suite remains the validation gate.
- Profiles covered: unchanged.
- Gaps introduced/remaining: primary remaining gap is native Rust handler/storage/RPC implementation replacing Go backend delegation.
- Commit: pending