CRITICAL FINDING: File is PERFECT but Spark fails to read it!
The downloaded Parquet file (1275 bytes):
- ✅ Valid header/trailer (PAR1)
- ✅ Complete metadata
- ✅ parquet-tools reads it successfully (all 4 rows)
- ❌ Spark gets 'Still have: 78 bytes left' EOF error
This proves the bug is in READING, not writing!
Hypothesis: SeaweedInputStream.contentLength is set to 1197 (1275 − 78)
instead of 1275 when the file is opened for reading.
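A minimal sketch of how this could happen, assuming contentLength is derived from the entry's chunk list rather than the file-size attribute: if one chunk's recorded size is 78 bytes short, the stream looks truncated even though the stored bytes are fine. All names here (Chunk, lengthFromChunks) are illustrative, not the actual SeaweedFS API.

```java
import java.util.List;

public class ContentLengthSketch {
    // Hypothetical chunk record: covers [offset, offset + size) of the file.
    record Chunk(long offset, long size) {}

    // Length derived from chunks: the largest end offset seen.
    static long lengthFromChunks(List<Chunk> chunks) {
        long max = 0;
        for (Chunk c : chunks) {
            max = Math.max(max, c.offset() + c.size());
        }
        return max;
    }

    public static void main(String[] args) {
        // The file is really 1275 bytes, but suppose the entry fetched at
        // read time records the (single) chunk as only 1197 bytes.
        List<Chunk> chunks = List.of(new Chunk(0, 1197));
        long contentLength = lengthFromChunks(chunks);
        System.out.println(contentLength);        // 1197, not 1275
        System.out.println(1275 - contentLength); // 78 — matching the EOF error
    }
}
```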
Adding WARN logs to track:
- When SeaweedInputStream is created
- What contentLength is calculated as
- How many chunks the entry has
These logs will show whether the entry metadata is read incorrectly when
Spark opens the file, leaving contentLength 78 bytes short.
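The planned WARN lines could look like the sketch below, which uses java.util.logging so the example is self-contained; the real client would use its own logger, and the parameter names (attribute file size, chunk-derived length, chunk count) are assumptions about what is worth logging, not the actual SeaweedInputStream fields.

```java
import java.util.logging.Logger;

public class OpenLogSketch {
    private static final Logger LOG = Logger.getLogger(OpenLogSketch.class.getName());

    // Hypothetical open path: log both candidate lengths so any mismatch
    // between entry attributes and chunk metadata is visible immediately.
    static long openForRead(String path, long attributeFileSize,
                            int chunkCount, long chunkDerivedLength) {
        LOG.warning(String.format("open %s: creating input stream", path));
        LOG.warning(String.format(
                "open %s: attr fileSize=%d, chunk-derived length=%d, chunks=%d",
                path, attributeFileSize, chunkDerivedLength, chunkCount));
        long contentLength = Math.max(attributeFileSize, chunkDerivedLength);
        if (attributeFileSize != chunkDerivedLength) {
            LOG.warning(String.format("open %s: length mismatch, using %d",
                    path, contentLength));
        }
        return contentLength;
    }

    public static void main(String[] args) {
        // With the hypothesized bug, the two lengths disagree by 78 bytes.
        long len = openForRead("/spark/part-00000.parquet", 1275, 1, 1197);
        System.out.println(len);
    }
}
```

Taking the max of the two lengths is one plausible mitigation, but the logs come first: they confirm which of the two values is wrong before any fix is chosen.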