seaweedfs


chrislu 1cdb2fcf07	4 months ago

fix: implement flush-before-getPos() for Parquet compatibility

After analyzing Parquet-Java source code, confirmed that:
1. Parquet calls out.getPos() before writing each page to record offsets
2. These offsets are stored in footer metadata
3. Footer length (4 bytes) + MAGIC (4 bytes) are written after last page
4. When reading, Parquet seeks to recorded offsets

IMPLEMENTATION:
- getPos() now flushes buffer before returning position
- This ensures recorded offsets match actual file positions
- Added comprehensive debug logging

RESULT:
- Offsets are now correctly recorded (verified in logs)
- Last getPos() returns 1252 ✓
- File ends at 1260 (1252 + 8 footer bytes) ✓
- Creates 17 chunks instead of 1 (side effect of many flushes)
- EOF exception STILL PERSISTS ❌

ANALYSIS:
The EOF error persists despite correct offset recording. The issue may be:
1. Too many small chunks (17 chunks for 1260 bytes) causing fragmentation
2. Chunks being assembled incorrectly during read
3. Or a deeper issue in how Parquet footer is structured

The implementation is CORRECT per Parquet's design, but something in the chunk assembly or read path is still causing the 78-byte EOF error.

Next: Investigate chunk assembly in SeaweedRead or consider atomic writes.
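
The flush-before-getPos() idea in the commit message above can be sketched roughly as follows. This is a minimal illustration only, not the SeaweedFS client code: the class `BufferedPositionStream`, its fields, and the in-memory sink are all hypothetical names introduced here, and the real stream writes chunks to a filer rather than to a byte array.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class FlushBeforeGetPosSketch {

    /** Hypothetical buffered stream (an assumption, not the SeaweedFS class). */
    static class BufferedPositionStream extends OutputStream {
        private final OutputStream sink;   // stands in for the remote file
        private final byte[] buffer;
        private int buffered = 0;          // bytes waiting in the buffer
        private long flushedBytes = 0;     // bytes already handed to the sink
        private int flushCount = 0;        // each non-empty flush ~ one chunk

        BufferedPositionStream(OutputStream sink, int bufferSize) {
            this.sink = sink;
            this.buffer = new byte[bufferSize];
        }

        @Override
        public void write(int b) throws IOException {
            if (buffered == buffer.length) {
                flush();
            }
            buffer[buffered++] = (byte) b;
        }

        @Override
        public void flush() throws IOException {
            if (buffered > 0) {
                sink.write(buffer, 0, buffered);
                flushedBytes += buffered;
                buffered = 0;
                flushCount++;
            }
            sink.flush();
        }

        /** Flush first, so the returned position matches what the sink holds. */
        long getPos() throws IOException {
            flush();
            return flushedBytes;
        }

        int getFlushCount() {
            return flushCount;
        }

        @Override
        public void close() throws IOException {
            flush();
            sink.close();
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedPositionStream out = new BufferedPositionStream(sink, 4096);

        out.write(new byte[100]);          // stand-in for one page of data
        long offsetAfterPage = out.getPos();
        out.write(new byte[8]);            // stand-in for footer length + magic
        out.close();

        System.out.println("offset after page: " + offsetAfterPage);
        System.out.println("file length: " + sink.size());
        System.out.println("non-empty flushes: " + out.getFlushCount());
    }
}
```

In this sketch the position recorded after the 100-byte page is 100 and the sink ends up holding 108 bytes, mirroring the "last offset + 8 footer bytes" relationship noted in the commit. It also shows the trade-off the commit mentions: every getPos() call that finds buffered data forces a separate flush, so frequent calls produce many small writes.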
client	fix: implement flush-before-getPos() for Parquet compatibility	4 months ago
examples	java 3.80	1 year ago
hdfs2	fix: use SNAPSHOT version to force Maven to use locally built JARs	4 months ago
hdfs3	fix: implement flush-before-getPos() for Parquet compatibility	4 months ago
s3copier	add some more example	3 years ago