# CRITICAL DISCOVERY: Chunk Count Is Irrelevant to the EOF Error
## Experiment Results
| Flush Strategy | Chunks Created | File Size | EOF Error (bytes missing) |
|---|---|---|---|
| Flush on every getPos() | 17 | 1260 bytes | 78 bytes |
| Flush every 5 calls | 10 | 1260 bytes | 78 bytes |
| Flush every 20 calls | 10 | 1260 bytes | 78 bytes |
| NO flushes (single chunk) | 1 | 1260 bytes | 78 bytes |
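The logic behind this table can be modeled directly. The sketch below is a hypothetical simulation, not SeaweedFS code: each flush cuts a new chunk, and the chunk sizes (75 and 126 bytes) are back-calculated from the chunk counts above so that a 1260-byte payload yields 17, 10, and 1 chunks. The point it demonstrates is the same as the experiment's: the assembled bytes are identical no matter where the chunk boundaries fall.

```python
PAYLOAD = b"\xab" * 1260  # same 1260-byte logical file in every run

def write_with_flushes(data, flush_every):
    """Model chunked writes: a flush cuts a new chunk every `flush_every`
    bytes; flush_every=None means one chunk (no intermediate flushes)."""
    if flush_every is None:
        return [data]
    return [data[i:i + flush_every] for i in range(0, len(data), flush_every)]

# Hypothetical per-flush byte counts chosen to reproduce the chunk counts
# observed in the experiment (17, 10, 10, 1).
for flush_every in (75, 126, 126, None):
    chunks = write_with_flushes(PAYLOAD, flush_every)
    assembled = b"".join(chunks)
    print(len(chunks), len(assembled), assembled == PAYLOAD)
# -> 17 1260 True / 10 1260 True / 10 1260 True / 1 1260 True
```

Reassembly is lossless in every configuration, which is why the EOF error cannot depend on the chunking strategy.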
## Conclusion
The 78-byte EOF error is CONSTANT regardless of chunking strategy.
This proves:
- The issue is NOT in SeaweedFS's chunked storage
- The issue is NOT in how we flush/write data
- The issue is NOT in chunk assembly during reads
- The file itself is COMPLETE and CORRECT (1260 bytes)
## What This Means
The problem is in Parquet's footer metadata calculation. Parquet is computing that the file should be 1338 bytes (1260 + 78) based on something in our file metadata structure, NOT based on how we chunk the data.
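To make the arithmetic concrete, here is a hedged illustration of how a reader can demand 78 bytes a file does not have. The footer-parsing helper follows the standard Parquet file layout (trailing 4-byte little-endian footer length followed by the `PAR1` magic); the toy file and the claimed column-chunk offset/size are invented numbers engineered to produce the same 78-byte shortfall we observe, not values read from our actual file.

```python
import struct

MAGIC = b"PAR1"

def footer_length(raw: bytes) -> int:
    """Standard Parquet layout: ... footer | 4-byte LE footer length | 'PAR1'."""
    assert raw[-4:] == MAGIC, "not a Parquet file"
    return struct.unpack("<I", raw[-8:-4])[0]

# Toy 132-byte file: magic + 100 data bytes + 20-byte fake footer + length + magic.
toy = MAGIC + b"D" * 100 + b"F" * 20 + struct.pack("<I", 20) + MAGIC
print(footer_length(toy))  # 20
print(len(toy))            # 132

# If metadata inside the footer claims a column chunk at offset 100 with
# size 110, the reader must read up to byte 210 of a 132-byte file:
claimed_offset, claimed_size = 100, 110
print(claimed_offset + claimed_size - len(toy))  # 78 bytes past EOF
```

The reader trusts the offsets and sizes recorded in `FileMetaData`, so a metadata value that overstates the data extent produces exactly this kind of fixed-size EOF error, independent of how the bytes are physically chunked.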
## Hypotheses
- FileMetaData size field: Parquet may be reading a size field from our entry metadata that doesn't match the actual chunk data
- Chunk offset interpretation: Parquet may be misinterpreting our chunk offset/size metadata
- Footer structure incompatibility: Our file format may not match what Parquet expects
## Next Steps
Need to examine:
- What metadata SeaweedFS stores in entry.attributes
- How SeaweedRead assembles visible intervals from chunks
- What Parquet reads from entry metadata vs actual file data
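A first cross-check for the examination above can be sketched as follows. The `Chunk` dataclass and the `entry_file_size` value are hypothetical stand-ins for the filer entry fields (the real field names in SeaweedFS's protobuf may differ); the check compares the file size recorded in entry metadata against the bytes the chunk list actually covers, which is exactly the mismatch hypothesis 1 predicts.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a SeaweedFS file chunk record.
@dataclass
class Chunk:
    offset: int
    size: int

def coverage(chunks):
    """Contiguous bytes the chunks cover from offset 0 (stops at a hole)."""
    end = 0
    for c in sorted(chunks, key=lambda c: c.offset):
        if c.offset > end:  # gap in the file: nothing past here is readable
            break
        end = max(end, c.offset + c.size)
    return end

entry_file_size = 1338        # hypothetical: size recorded in entry.attributes
chunks = [Chunk(0, 1260)]     # what the chunk list actually covers
print(entry_file_size - coverage(chunks))  # 78 -> metadata overstates the data
```

If the real entry reports 1338 bytes while the chunks cover only 1260, that alone would explain the constant 78-byte EOF error without any bug in chunk assembly.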