CRITICAL FIX for Parquet 78-byte EOF error!
Root Cause Analysis:
- Hadoop's FSDataOutputStream tracks position with an internal counter
- It does NOT call SeaweedOutputStream.getPos() by default
- When Parquet writes data and calls getPos() to record column chunk offsets,
it gets FSDataOutputStream's counter, not SeaweedOutputStream's actual position
- This creates a 78-byte mismatch between recorded offsets and actual file size
- Result: EOFException when reading (tries to read beyond file end)
The Fix:
- Override getPos() in the anonymous FSDataOutputStream subclass
- Delegate to SeaweedOutputStream.getPos() which returns 'position + buffer.position()'
- This ensures Parquet gets the correct position when recording metadata
- Column chunk offsets in footer will now match actual data positions
This should fix the consistent 78-byte discrepancy we've been seeing across
all Parquet file writes (regardless of file size: 684, 693, 1275 bytes, etc.)