* fix: prevent filer.backup stall in single-filer setups (#4977)
When MetaAggregator.MetaLogBuffer is empty (which happens in single-filer
setups with no peers), ReadFromBuffer was returning nil error, causing
LoopProcessLogData to enter an infinite wait loop on ListenersCond.
This fix returns ResumeFromDiskError instead, allowing SubscribeMetadata
to loop back and read from persisted logs on disk. This ensures filer.backup
continues processing events even when the in-memory aggregator buffer is empty.
Fixes #4977
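The control flow described above can be illustrated with a minimal, self-contained sketch. It is not the SeaweedFS code: `memBuffer`, `readFromBuffer`, `subscribe`, and `errResumeFromDisk` are hypothetical stand-ins for the log buffer, ReadFromBuffer, SubscribeMetadata, and ResumeFromDiskError. The point is only that an empty in-memory buffer should signal "go read from disk" rather than "nothing yet, keep waiting".

```go
package main

import (
	"errors"
	"fmt"
)

// errResumeFromDisk is a hypothetical sentinel standing in for ResumeFromDiskError.
var errResumeFromDisk = errors.New("resume from disk")

// memBuffer is a hypothetical stand-in for an in-memory metadata log buffer.
type memBuffer struct {
	entries []string
}

// readFromBuffer returns the entries at or after offset. When the buffer holds
// nothing at all, it returns errResumeFromDisk instead of (nil, nil), so the
// caller does not wait for a notification that may never arrive.
func (b *memBuffer) readFromBuffer(offset int) ([]string, error) {
	if len(b.entries) == 0 {
		return nil, errResumeFromDisk
	}
	if offset >= len(b.entries) {
		return nil, nil // caught up; the caller may wait for new entries
	}
	return b.entries[offset:], nil
}

// subscribe tries memory first and falls back to the persisted log on the sentinel.
func subscribe(b *memBuffer, disk []string) []string {
	got, err := b.readFromBuffer(0)
	if errors.Is(err, errResumeFromDisk) {
		return disk
	}
	return got
}

func main() {
	empty := &memBuffer{} // single-filer case: no peers, so nothing was aggregated
	fmt.Println(subscribe(empty, []string{"create /a", "create /b"}))
}
```

With a nil error on the empty buffer, the caller in this sketch would have no way to tell "empty forever" apart from "caught up", which is the ambiguity the fix removes.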
* test: add integration tests for metadata subscription
Add integration tests for metadata subscription functionality:
- TestMetadataSubscribeBasic: Tests basic subscription and event receiving
- TestMetadataSubscribeSingleFilerNoStall: Regression test for #4977,
verifies subscription doesn't stall under high load in single-filer setups
- TestMetadataSubscribeResumeFromDisk: Tests resuming subscription from disk
Related to #4977
* ci: add GitHub Actions workflow for metadata subscribe tests
Add a CI workflow that:
- Triggers on push/PR to master affecting filer, log_buffer, or metadata subscribe code
- Runs the integration tests for metadata subscription
- Uploads logs on failure for debugging
Related to #4977
* fix: use multipart form-data for file uploads in integration tests
The filer expects multipart/form-data for file uploads, not raw POST body.
This fixes the 'Content-Type isn't multipart/form-data' error.
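For reference, a multipart upload request can be built with the Go standard library roughly as follows. This is a sketch rather than the test's actual helper: `newUploadRequest` is a hypothetical name, and the form field name `file` and the URL path are assumptions.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// newUploadRequest builds a multipart/form-data POST carrying one file part.
// The field name "file" is an assumption made for this sketch.
func newUploadRequest(url, filename string, content []byte) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)

	part, err := w.CreateFormFile("file", filename)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(content); err != nil {
		return nil, err
	}
	// Close writes the trailing boundary; without it the body is incomplete.
	if err := w.Close(); err != nil {
		return nil, err
	}

	req, err := http.NewRequest(http.MethodPost, url, &body)
	if err != nil {
		return nil, err
	}
	// The header must carry the boundary generated by the writer.
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	req, err := newUploadRequest("http://localhost:8888/test/a.txt", "a.txt", []byte("hello"))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("Content-Type"))
}
```

Sending the raw bytes as the POST body without this wrapper would omit the multipart Content-Type header, which matches the error quoted above.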
* test: use -peers=none for faster master startup
* test: add -peers=none to remaining master startup in ec tests
* fix: use filer HTTP port 8888, WithFilerClient adds 10000 for gRPC
WithFilerClient calls ToGrpcAddress() which adds 10000 to the port.
Passing 18888 resulted in connecting to 28888. Use 8888 instead.
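The port arithmetic can be shown with a small sketch. `toGrpcAddress` below is a hypothetical re-implementation written for illustration, not the real ToGrpcAddress; it assumes only the "+10000" rule stated above.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// grpcPortOffset is the offset described in the commit message.
const grpcPortOffset = 10000

// toGrpcAddress derives a gRPC address from an HTTP host:port by adding the offset.
// Hypothetical helper for illustration only.
func toGrpcAddress(httpAddr string) (string, error) {
	host, portStr, err := net.SplitHostPort(httpAddr)
	if err != nil {
		return "", err
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return "", err
	}
	return net.JoinHostPort(host, strconv.Itoa(port+grpcPortOffset)), nil
}

func main() {
	for _, addr := range []string{"localhost:8888", "localhost:18888"} {
		g, err := toGrpcAddress(addr)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s\n", addr, g)
	}
	// Prints:
	// localhost:8888 -> localhost:18888
	// localhost:18888 -> localhost:28888
}
```

Passing the gRPC port where the HTTP port is expected applies the offset twice, which is how 18888 became 28888.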
* test: add concurrent writes and million updates tests
- TestMetadataSubscribeConcurrentWrites: 50 goroutines writing 20 files each
- TestMetadataSubscribeMillionUpdates: 1 million metadata entries via gRPC
(metadata only, no actual file content for speed)
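The shape of the concurrent-write test can be sketched as below. This is an assumption about its general structure, not the test itself: `runWriters` and the path layout are hypothetical, and the `write` callback stands in for whatever upload call the real test uses.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runWriters starts `writers` goroutines, each calling write filesPerWriter times,
// and returns how many writes succeeded. Hypothetical helper for illustration.
func runWriters(writers, filesPerWriter int, write func(path string) error) int64 {
	var wg sync.WaitGroup
	var ok int64
	for w := 0; w < writers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for f := 0; f < filesPerWriter; f++ {
				path := fmt.Sprintf("/concurrent/w%d/f%d.txt", w, f)
				if err := write(path); err == nil {
					atomic.AddInt64(&ok, 1)
				}
			}
		}(w)
	}
	wg.Wait()
	return atomic.LoadInt64(&ok)
}

func main() {
	// A no-op writer keeps the sketch runnable without a filer.
	n := runWriters(50, 20, func(string) error { return nil })
	fmt.Println(n) // 50 writers x 20 files = 1000
}
```

A subscriber-side check would then compare the number of received events against this count.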
* fix: address PR review comments
- Handle os.MkdirAll errors explicitly instead of ignoring
- Handle log file creation errors with proper error messages
- Replace silent event dropping with 100ms timeout and warning log
* Update metadata_subscribe_integration_test.go
Tests that metadata subscription doesn't stall in single-filer setups:
- Simulates high-load file uploads while a subscriber tries to keep up
- Verifies that events are received without significant stalling
The bug was that, in single-filer setups, SubscribeMetadata would block indefinitely
on MetaAggregator.MetaLogBuffer, which remains empty (no peers to aggregate from).
The fix ensures that when the buffer is empty, the subscription falls back to reading
persisted logs on disk.
TestMetadataSubscribeResumeFromDisk
Tests that subscription can resume from disk:
- Upload files before starting subscription
- Wait for logs to be flushed to disk
- Start subscription from the beginning
- Verify pre-uploaded files are received from disk
Running Tests
# Run all tests (requires weed binary in PATH or built)
go test -v ./test/metadata_subscribe/...
# Skip integration tests
go test -short ./test/metadata_subscribe/...
# Run with increased timeout for slow systems
go test -v -timeout 5m ./test/metadata_subscribe/...
Requirements
- weed binary must be available in PATH or in the parent directories
- Tests create temporary directories that are cleaned up after completion
- Tests use ports 9333 (master), 8080 (volume), 8888 (filer)