* fix: EC rebalance fails with replica placement 000
This PR fixes several issues with EC shard distribution:
1. Pre-flight check before EC encoding
- Verify target disk type has capacity before encoding starts
- Prevents encoding shards only to fail during rebalance
- Shows helpful error when wrong diskType is specified (e.g., ssd when volumes are on hdd)
2. Fix EC rebalance with replica placement 000
- When DiffRackCount=0, shards should be distributed freely across racks
- The '000' placement means 'no volume replication needed' because EC provides redundancy
- Previously all racks were skipped with error 'shards X > replica placement limit (0)'
3. Add unit tests for EC rebalance slot calculation
- TestECRebalanceWithLimitedSlots: documents the limited slots scenario
- TestECRebalanceZeroFreeSlots: reproduces the 0 free slots error
4. Add Makefile for manual EC testing
- make setup: start cluster and populate data
- make shell: open weed shell for EC commands
- make clean: stop cluster and cleanup
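
The rack check from item 2 can be sketched roughly as below. This is only an illustration: `rackAcceptsShard` and its parameters are hypothetical names, and the comparison used for a nonzero limit is an assumption inferred from the quoted error message, not a copy of the shell code.

```go
package main

import "fmt"

// rackAcceptsShard reports whether a rack that already holds shardsInRack
// shards of a volume is still a valid destination.
// Hypothetical helper: the name and signature are not from the PR.
func rackAcceptsShard(shardsInRack, diffRackCount int) bool {
	// DiffRackCount == 0 (replica placement "000") means no per-rack limit:
	// erasure coding already provides the redundancy.
	if diffRackCount == 0 {
		return true
	}
	// Assumption: with a nonzero limit, a rack is skipped once it holds
	// more shards than the limit ("shards X > replica placement limit").
	return shardsInRack <= diffRackCount
}

func main() {
	fmt.Println(rackAcceptsShard(3, 0)) // true: "000" no longer skips the rack
	fmt.Println(rackAcceptsShard(3, 2)) // false: over the nonzero limit
}
```

Before the fix, the zero limit was compared like any other value, so every rack holding at least one shard was rejected.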
* fix: default -rebalance to true for ec.encode
The -rebalance flag was defaulting to false, which meant ec.encode would
only print shard moves but not actually execute them. This is a poor
default since the whole point of EC encoding is to distribute shards
across servers for fault tolerance.
Now -rebalance defaults to true, so shards are actually distributed
after encoding. Users can use -rebalance=false if they only want to
see what would happen without making changes.
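
The effect of the new default can be sketched with the standard `flag` package. This is a simplified stand-in: the real ec.encode command defines more options, and `parseRebalance` is a hypothetical helper used only for this example.

```go
package main

import (
	"flag"
	"fmt"
)

// parseRebalance parses only the -rebalance option and returns its value.
// Hypothetical helper; it is not part of the actual command.
func parseRebalance(args []string) (bool, error) {
	fs := flag.NewFlagSet("ec.encode", flag.ContinueOnError)
	// The default is now true, so shard moves are applied unless the
	// user opts out explicitly.
	rebalance := fs.Bool("rebalance", true, "apply shard moves after encoding")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *rebalance, nil
}

func main() {
	applied, _ := parseRebalance(nil)
	fmt.Println(applied) // true

	dryRun, _ := parseRebalance([]string{"-rebalance=false"})
	fmt.Println(dryRun) // false
}
```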
* test/erasure_coding: improve Makefile safety and docs
- Narrow pkill pattern for volume servers to use TEST_DIR instead of
port pattern, avoiding accidental kills of unrelated SeaweedFS processes
- Document external dependencies (curl, jq) in header comments
* shell: refactor buildRackWithEcShards to reuse buildEcShards
Extract common shard bit construction logic to avoid duplication
between buildEcShards and buildRackWithEcShards helper functions.
* shell: update test for EC replication 000 behavior
When DiffRackCount=0 (replication "000"), EC shards should be
distributed freely across racks since erasure coding provides its
own redundancy. Update test expectation to reflect this behavior.
* erasure_coding: add distribution package for proportional EC shard placement
Add a new reusable package for EC shard distribution that:
- Supports configurable EC ratios (not hard-coded 10+4)
- Distributes shards proportionally based on replication policy
- Provides fault tolerance analysis
- Prefers moving parity shards to keep data shards spread out
Key components:
- ECConfig: Configurable data/parity shard counts
- ReplicationConfig: Parsed XYZ replication policy
- ECDistribution: Target shard counts per DC/rack/node
- Rebalancer: Plans shard moves with parity-first strategy
This enables custom EC ratios in seaweed-enterprise and integration
with weed worker, while keeping the architecture clean and testable.
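
As a rough illustration of what "parsed XYZ replication policy" means, the sketch below splits a three-digit policy into its parts (X: other data centers, Y: other racks, Z: same rack). The struct fields and `parseReplication` are assumptions made for this example; the package's actual ReplicationConfig may be shaped differently.

```go
package main

import (
	"errors"
	"fmt"
)

// ReplicationConfig here is an illustrative stand-in; the field names
// are assumptions, not the distribution package's API.
type ReplicationConfig struct {
	DiffDataCenters int // X: copies in other data centers
	DiffRacks       int // Y: copies in other racks of the same data center
	SameRack        int // Z: copies on other servers in the same rack
}

// parseReplication converts a policy string such as "010" into its parts.
func parseReplication(s string) (ReplicationConfig, error) {
	if len(s) != 3 {
		return ReplicationConfig{}, errors.New("replication must be three digits")
	}
	var d [3]int
	for i := 0; i < 3; i++ {
		c := s[i]
		if c < '0' || c > '9' {
			return ReplicationConfig{}, fmt.Errorf("invalid character %q in %q", c, s)
		}
		d[i] = int(c - '0')
	}
	return ReplicationConfig{DiffDataCenters: d[0], DiffRacks: d[1], SameRack: d[2]}, nil
}

func main() {
	rc, err := parseReplication("010")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", rc)
}
```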
* shell: integrate distribution package for EC rebalancing
Add shell wrappers around the distribution package:
- ProportionalECRebalancer: Plans moves using distribution.Rebalancer
- NewProportionalECRebalancerWithConfig: Supports custom EC configs
- GetDistributionSummary/GetFaultToleranceAnalysis: Helper functions
The shell layer converts between EcNode types and the generic
TopologyNode types used by the distribution package.
* test setup
* ec: improve data and parity shard distribution across racks
- Add shardsByTypePerRack helper to track data vs parity shards
- Rewrite doBalanceEcShardsAcrossRacks for two-pass balancing (the limits below assume 6 racks):
1. Balance data shards (0-9) evenly, max ceil(10/6)=2 per rack
2. Balance parity shards (10-13) evenly, max ceil(4/6)=1 per rack
- Add balanceShardTypeAcrossRacks for generic shard type balancing
- Add pickRackForShardType to select destination with room for type
- Add unit tests for even data/parity distribution verification
This ensures even read load during normal operation by spreading
both data and parity shards across all available racks.
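
The per-rack limits quoted above follow from a ceiling division of the shard count by the rack count. A minimal sketch, with `maxPerRack` as a hypothetical helper name:

```go
package main

import "fmt"

// maxPerRack returns ceil(shards / racks), the largest number of shards
// of one type a rack may hold when they are spread evenly.
// Hypothetical helper; not the function used in the shell code.
func maxPerRack(shards, racks int) int {
	if racks <= 0 {
		return 0 // no racks available, nothing can be placed
	}
	return (shards + racks - 1) / racks
}

func main() {
	fmt.Println(maxPerRack(10, 6)) // 2: data shards 0-9 across 6 racks
	fmt.Println(maxPerRack(4, 6))  // 1: parity shards 10-13 across 6 racks
}
```

Running the two passes separately keeps one shard type from crowding into a few racks even when the combined per-rack totals look even.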
* ec: make data/parity shard counts configurable in ecBalancer
- Add dataShardCount and parityShardCount fields to ecBalancer struct
- Add getDataShardCount() and getParityShardCount() methods with defaults
- Replace direct constant usage with configurable methods
- Fix unused variable warning for parityPerRack
This allows seaweed-enterprise to use custom EC ratios while
defaulting to the standard 10+4 scheme.
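
The getter-with-default pattern described here might look like the sketch below. The struct is trimmed to the two new fields, the constant names are hypothetical, and treating a zero field as "unset" is an assumption of this example.

```go
package main

import "fmt"

// Hypothetical constant names for the standard 10+4 scheme.
const (
	defaultDataShards   = 10
	defaultParityShards = 4
)

// ecBalancer is reduced to the two fields this commit adds.
type ecBalancer struct {
	dataShardCount   int
	parityShardCount int
}

// getDataShardCount returns the configured value, or the default when unset.
func (b *ecBalancer) getDataShardCount() int {
	if b.dataShardCount > 0 {
		return b.dataShardCount
	}
	return defaultDataShards
}

// getParityShardCount returns the configured value, or the default when unset.
func (b *ecBalancer) getParityShardCount() int {
	if b.parityShardCount > 0 {
		return b.parityShardCount
	}
	return defaultParityShards
}

func main() {
	std := &ecBalancer{}
	fmt.Println(std.getDataShardCount(), std.getParityShardCount()) // 10 4

	custom := &ecBalancer{dataShardCount: 8, parityShardCount: 3}
	fmt.Println(custom.getDataShardCount(), custom.getParityShardCount()) // 8 3
}
```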
* Address PR 7812 review comments
Makefile improvements:
- Save PIDs for each volume server for precise termination
- Use PID-based killing in stop target with pkill fallback
- Use more specific pkill patterns with TEST_DIR paths
Documentation:
- Document jq dependency in README.md
Rebalancer fix:
- Fix duplicate shard count updates in applyMovesToAnalysis
- All planners (DC/rack/node) update counts inline during planning
- Remove duplicate updates from applyMovesToAnalysis to avoid double-counting
* test/erasure_coding: use mktemp for test file template
Use mktemp instead of hardcoded /tmp/testfile_template.bin path
to provide better isolation for concurrent test runs.