Added comprehensive logging to identify why Parquet files fail with 'EOFException: Still have: 78 bytes left'.

Key additions:

1. SeaweedHadoopOutputStream constructor logging with 🔧 marker
   - Shows when output streams are created
   - Logs path, position, bufferSize, replication

2. totalBytesWritten counter in SeaweedOutputStream
   - Tracks cumulative bytes written via write() calls
   - Helps identify if Parquet wrote 762 bytes but only 684 reached chunks

3. Enhanced close() logging with 🔒 and ✅ markers
   - Shows totalBytesWritten vs position vs buffer.position()
   - If totalBytesWritten=762 but position=684, write submission failed
   - If buffer.position()=78 at close, buffer wasn't flushed

Expected scenarios in next run:

A) Stream never created → No 🔧 log for .parquet files
B) Write failed → totalBytesWritten=762 but position=684
C) Buffer not flushed → buffer.position()=78 at close
D) All correct → totalBytesWritten=position=684, but Parquet expects 762

This will pinpoint whether the issue is in:
- Stream creation/lifecycle
- Write submission
- Buffer flushing
- Or Parquet's internal state

pull/7526/head
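The close-time comparison of totalBytesWritten, position and buffer.position() described in the commit message can be sketched roughly as below. This is a minimal illustration, not the actual SeaweedOutputStream code: the class `CountingBufferedStream`, its `diagnose()` method and the in-memory sink are hypothetical, and the 684-byte buffer size is chosen only so that the numbers mirror the 762/684/78 figures from the message.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

// Hypothetical stand-in for a buffered output stream; NOT the real SeaweedOutputStream.
class CountingBufferedStream extends OutputStream {
    private final OutputStream sink;     // assumption: stands in for chunk submission
    private final ByteBuffer buffer;     // bytes accepted but not yet handed to the sink
    private long position = 0;           // bytes handed to the sink so far
    private long totalBytesWritten = 0;  // bytes received through write() calls

    CountingBufferedStream(OutputStream sink, int bufferSize) {
        this.sink = sink;
        this.buffer = ByteBuffer.allocate(bufferSize);
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] data, int off, int len) throws IOException {
        totalBytesWritten += len;
        while (len > 0) {
            int n = Math.min(len, buffer.remaining());
            buffer.put(data, off, n);
            off += n;
            len -= n;
            if (!buffer.hasRemaining()) {
                flushBuffer(); // buffer is full: hand it to the sink
            }
        }
    }

    private void flushBuffer() throws IOException {
        buffer.flip();
        sink.write(buffer.array(), 0, buffer.limit());
        position += buffer.limit();
        buffer.clear();
    }

    long getPosition() { return position; }
    long getTotalBytesWritten() { return totalBytesWritten; }

    // Classifies the stream state along the lines of scenarios B, C and D.
    String diagnose() {
        long buffered = buffer.position();
        if (totalBytesWritten == position && buffered == 0) {
            return "consistent";
        }
        if (totalBytesWritten == position + buffered) {
            return "unflushed buffer: " + buffered + " bytes";
        }
        return "write submission lost " + (totalBytesWritten - position - buffered) + " bytes";
    }

    @Override
    public void close() throws IOException {
        // Log the three counters before the final flush, as the commit describes.
        System.out.println("close: totalBytesWritten=" + totalBytesWritten
                + " position=" + position
                + " buffer.position()=" + buffer.position()
                + " -> " + diagnose());
        flushBuffer();
        sink.close();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        CountingBufferedStream out = new CountingBufferedStream(sink, 684);
        out.write(new byte[762], 0, 762);
        out.close();
        System.out.println("sink received " + sink.size() + " bytes");
    }
}
```

In this sketch, writing 762 bytes through a 684-byte buffer leaves 78 bytes buffered at close time, which is the shape of scenario C; a close() that skipped the final flush would deliver only 684 bytes to the sink.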
7 changed files with 22 additions and 7 deletions