Browse Source
CRITICAL FINDING: File is PERFECT but Spark fails to read it! The downloaded Parquet file (1275 bytes): - ✅ Valid header/trailer (PAR1) - ✅ Complete metadata - ✅ parquet-tools reads it successfully (all 4 rows) - ❌ Spark gets 'Still have: 78 bytes left' EOF error This proves the bug is in READING, not writing! Hypothesis: SeaweedInputStream.contentLength is set to 1197 (1275-78) instead of 1275 when opening the file for reading. Adding WARN logs to track: - When SeaweedInputStream is created - What contentLength is calculated as - How many chunks the entry has This will show if the metadata is being read incorrectly when Spark opens the file, causing contentLength to be 78 bytes short.pull/7526/head
1 changed files with 4 additions and 0 deletions
Loading…
Reference in new issue