
debug: add logging to SeaweedInputStream constructors to track contentLength

CRITICAL FINDING: File is PERFECT but Spark fails to read it!

The downloaded Parquet file (1275 bytes):
-  Valid header/trailer magic (PAR1); a standalone byte-level check is sketched after this list
-  Complete metadata
-  parquet-tools reads it successfully (all 4 rows)
-  Spark gets 'Still have: 78 bytes left' EOF error
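To reproduce the structural check without parquet-tools, here is a plain-Java sketch
that verifies the leading/trailing PAR1 magic and the 4-byte little-endian footer
length that sits just before the trailer (the file name is illustrative; point it
at the downloaded 1275-byte file):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.charset.StandardCharsets;

    public class ParquetMagicCheck {
        public static void main(String[] args) throws IOException {
            // File name is illustrative, not from the commit.
            try (RandomAccessFile f = new RandomAccessFile("part-00000.parquet", "r")) {
                long len = f.length();
                byte[] magic = new byte[4];

                f.seek(0);
                f.readFully(magic);   // leading "PAR1"
                System.out.println("header:  " + new String(magic, StandardCharsets.US_ASCII));

                f.seek(len - 4);
                f.readFully(magic);   // trailing "PAR1"
                System.out.println("trailer: " + new String(magic, StandardCharsets.US_ASCII));

                // 4-byte little-endian footer (metadata) length precedes the trailer.
                f.seek(len - 8);
                int footerLen = f.read() | (f.read() << 8) | (f.read() << 16) | (f.read() << 24);
                System.out.println("file=" + len + " bytes, footer=" + footerLen + " bytes");
            }
        }
    }

If both magics print PAR1 and the footer length fits inside the file, the bytes on
disk are intact, matching what parquet-tools reported.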

This proves the bug is in READING, not writing!

Hypothesis: SeaweedInputStream.contentLength is set to 1197 (1275-78)
instead of 1275 when opening the file for reading.
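A minimal client-side sketch for testing this hypothesis. SeaweedRead.fileSize and
entry.getChunksList() appear in the patch below; getAttributes().getFileSize() and
the chunks' getOffset()/getSize() accessors are assumptions about the FilerProto
schema, not confirmed by this commit:

    import seaweedfs.client.FilerProto;
    import seaweedfs.client.SeaweedRead;

    class SizeHypothesisCheck {
        static void reportSizes(FilerProto.Entry entry) {
            long readerSize = SeaweedRead.fileSize(entry);       // what the constructor stores
            long attrSize = entry.getAttributes().getFileSize(); // assumed accessor
            long chunkExtent = 0;
            for (FilerProto.FileChunk chunk : entry.getChunksList()) {
                chunkExtent = Math.max(chunkExtent,
                        chunk.getOffset() + chunk.getSize());    // assumed accessors
            }
            // readerSize == 1197 while attrSize/chunkExtent == 1275 would confirm
            // that the final 78 bytes are dropped at open time.
            System.out.println("readerSize=" + readerSize
                    + " attrSize=" + attrSize + " chunkExtent=" + chunkExtent);
        }
    }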

Adding WARN logs to track:
- When SeaweedInputStream is created
- What contentLength is calculated as
- How many chunks the entry has

This will show if the metadata is being read incorrectly when
Spark opens the file, causing contentLength to be 78 bytes short.
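With the format string used in the patch below, the hypothesis would show up
directly in the log output, e.g. (path and chunk count illustrative; the line
prefix depends on the logging backend):

    WARN [DEBUG-2024] SeaweedInputStream created (from fullpath): path=/buckets/spark/part-00000.parquet contentLength=1197 #chunks=1

A contentLength of 1275 here would instead rule the constructors out and move
suspicion further down the read path.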
Branch: pull/7526/head
Author: chrislu, 1 week ago
Parent commit: c10ae054b6
1 changed file, 4 additions:
other/java/client/src/main/java/seaweedfs/client/SeaweedInputStream.java

@@ -44,6 +44,8 @@ public class SeaweedInputStream extends InputStream {
         }
         this.contentLength = SeaweedRead.fileSize(entry);
+        LOG.warn("[DEBUG-2024] SeaweedInputStream created (from fullpath): path={} contentLength={} #chunks={}",
+                fullpath, this.contentLength, entry.getChunksCount());
         this.visibleIntervalList = SeaweedRead.nonOverlappingVisibleIntervals(filerClient, entry.getChunksList());
@@ -64,6 +66,8 @@ public class SeaweedInputStream extends InputStream {
         }
         this.contentLength = SeaweedRead.fileSize(entry);
+        LOG.warn("[DEBUG-2024] SeaweedInputStream created (from entry): path={} contentLength={} #chunks={}",
+                path, this.contentLength, entry.getChunksCount());
         this.visibleIntervalList = SeaweedRead.nonOverlappingVisibleIntervals(filerClient, entry.getChunksList());
