
Debugging Breakthrough: EOF Exception Analysis

Summary

After extensive debugging, we've identified and partially fixed the root cause of the "EOFException: Still have: 78 bytes left" error in Parquet file reads.

Root Cause Analysis

Initial Hypothesis (Incorrect)

  • Thought: File size calculation was wrong (contentLength off by 78 bytes)
  • Reality: contentLength was always correct at 1275 bytes

Second Hypothesis (Partially Correct)

  • Thought: FSDataOutputStream.getPos() wasn't delegating to SeaweedOutputStream.getPos()
  • Reality: The override was working, but there was a deeper issue

Third Hypothesis (ROOT CAUSE)

  • Problem: SeaweedInputStream.read(ByteBuffer buf) was returning 0 bytes for inline content
  • Location: Lines 127-129 in SeaweedInputStream.java
  • Bug: When copying inline content from protobuf entry, bytesRead was never updated

// BEFORE (BUGGY):
if (this.position < Integer.MAX_VALUE && (this.position + len) <= entry.getContent().size()) {
    entry.getContent().substring((int) this.position, (int) (this.position + len)).copyTo(buf);
    // bytesRead stays 0! <-- BUG
} else {
    bytesRead = SeaweedRead.read(...);
}
return (int) bytesRead; // Returns 0 when inline content was copied!

// AFTER (FIXED):
if (this.position < Integer.MAX_VALUE && (this.position + len) <= entry.getContent().size()) {
    entry.getContent().substring((int) this.position, (int) (this.position + len)).copyTo(buf);
    bytesRead = len; // FIX: Update bytesRead after inline copy
} else {
    bytesRead = SeaweedRead.read(...);
}
return (int) bytesRead; // Now returns correct value!

Why This Caused EOF Errors

  1. Parquet's readFully() loop (simplified):

    while (remaining > 0) {
        int read = inputStream.read(buffer, offset, remaining);
        if (read == -1 || read == 0) {
            throw new EOFException("Still have: " + remaining + " bytes left");
        }
        remaining -= read;
        offset += read; // advance past the bytes just read
    }

  2. Our bug: When read() returned 0 instead of the number of bytes actually copied, Parquet treated the stream as exhausted

  3. Result: EOF exception with exactly the number of bytes that weren't reported
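
The failure mode can be reproduced without a SeaweedFS server. The sketch below is a minimal simulation, not the client code: InlineReader is a hypothetical stand-in for the inline-content branch of read(ByteBuffer), and readFully mirrors the loop shown above rather than Parquet's actual source. The file size (1275), offset (1197), and length (78) are the values from this investigation.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;

final class InlineReadDemo {

    /** Hypothetical stand-in for the inline-content branch of a stream's read(ByteBuffer). */
    static final class InlineReader {
        private final byte[] content;
        private final boolean reportBytes; // false reproduces the bug
        private long position;

        InlineReader(byte[] content, boolean reportBytes) {
            this.content = content;
            this.reportBytes = reportBytes;
        }

        void seek(long pos) {
            this.position = pos;
        }

        int read(ByteBuffer buf) {
            int len = (int) Math.min(buf.remaining(), content.length - position);
            if (len <= 0) {
                return -1; // nothing left: genuine end of stream
            }
            buf.put(content, (int) position, len); // the copy happens in both variants
            position += len;
            return reportBytes ? len : 0; // the buggy variant never reports the copy
        }
    }

    /** Caller that, like the loop above, treats a return of 0 or -1 as end of stream. */
    static void readFully(InlineReader in, ByteBuffer buf) throws IOException {
        int remaining = buf.remaining();
        while (remaining > 0) {
            int read = in.read(buf);
            if (read <= 0) {
                throw new EOFException("Still have: " + remaining + " bytes left");
            }
            remaining -= read;
        }
    }

    /** Performs the 78-byte read at offset 1197 of a 1275-byte file and reports the outcome. */
    static String attempt(boolean reportBytes) {
        InlineReader in = new InlineReader(new byte[1275], reportBytes);
        in.seek(1197);
        try {
            readFully(in, ByteBuffer.allocate(78));
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + attempt(false)); // buggy: Still have: 78 bytes left
        System.out.println("fixed: " + attempt(true));  // fixed: ok
    }
}
```

In the buggy variant the data does reach the buffer; only the return value is wrong, which is why the content length looked correct throughout.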

Fixes Implemented

1. SeaweedInputStream.java (PRIMARY FIX)

  • File: other/java/client/src/main/java/seaweedfs/client/SeaweedInputStream.java
  • Change: Set bytesRead = len after inline content copy
  • Impact: Ensures read() reports the number of bytes copied when serving inline content

2. SeaweedOutputStream.java (DIAGNOSTIC)

  • File: other/java/client/src/main/java/seaweedfs/client/SeaweedOutputStream.java
  • Change: Added comprehensive logging to getPos() with stack traces
  • Purpose: Track who calls getPos() and what positions are returned
  • Finding: All positions appeared correct in tests
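
One way to record who calls getPos() is to capture the caller's stack frame at call time. The following is a minimal sketch of that idea, not the logging that was committed: TrackedStream, its position field, and lastCaller are hypothetical, and it uses java.util.logging, whereas the real client may use a different logging library.

```java
import java.util.logging.Logger;

final class GetPosTraceDemo {

    /** Hypothetical stream-like class; only the position bookkeeping matters here. */
    static final class TrackedStream {
        private static final Logger LOG = Logger.getLogger(TrackedStream.class.getName());
        private long position;
        String lastCaller = ""; // kept so the most recent caller can be inspected

        void write(byte[] data) {
            position += data.length;
        }

        long getPos() {
            // Frame 0 is getPos() itself; frame 1, when present, is whoever called it.
            StackTraceElement[] frames = new Throwable().getStackTrace();
            lastCaller = frames.length > 1
                    ? frames[1].getClassName() + "." + frames[1].getMethodName()
                    : "unknown";
            // FINE is below the default INFO level, so this stays silent unless enabled.
            LOG.fine(() -> "getPos() = " + position + " called from " + lastCaller);
            return position;
        }
    }

    /** Example caller whose name should show up in lastCaller. */
    static long writeAndQuery(TrackedStream out) {
        out.write(new byte[1275]);
        return out.getPos();
    }

    public static void main(String[] args) {
        TrackedStream out = new TrackedStream();
        long pos = writeAndQuery(out);
        System.out.println(pos + " requested by " + out.lastCaller);
    }
}
```

Capturing a stack trace on every call is expensive, so this kind of tracking suits a debugging session better than production use.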

3. SeaweedFileSystem.java (ALREADY FIXED)

  • File: other/java/hdfs3/src/main/java/seaweed/hdfs/SeaweedFileSystem.java
  • Change: Override FSDataOutputStream.getPos() to delegate to SeaweedOutputStream
  • Verification: Confirmed working with WARN logs

4. Unit Test Added

  • File: other/java/client/src/test/java/seaweedfs/client/SeaweedStreamIntegrationTest.java
  • Test: testRangeReads()
  • Coverage:
    • Range reads at specific offsets (like Parquet footer reads)
    • Sequential readFully() pattern that was failing
    • Multiple small reads vs. large reads
    • The exact 78-byte read at offset 1197 that was failing
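
The shape of these checks can be sketched in a self-contained form. This is not the actual testRangeReads() implementation: it reads from an in-memory ByteArrayInputStream instead of a SeaweedFS server, and the helpers readRange, readInChunks, and sampleFile are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class RangeReadSketch {

    /** Builds a deterministic byte pattern so range comparisons are meaningful. */
    static byte[] sampleFile(int size) {
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = (byte) (i % 251);
        }
        return data;
    }

    /** Reads exactly len bytes starting at offset, failing loudly on a short read. */
    static byte[] readRange(byte[] file, int offset, int len) {
        ByteArrayInputStream in = new ByteArrayInputStream(file);
        if (in.skip(offset) != offset) {
            throw new UncheckedIOException(new EOFException("Offset beyond end of file"));
        }
        byte[] out = new byte[len];
        int done = 0;
        while (done < len) {
            int n = in.read(out, done, len - done);
            if (n <= 0) {
                throw new UncheckedIOException(
                        new EOFException("Still have: " + (len - done) + " bytes left"));
            }
            done += n;
        }
        return out;
    }

    /** Reassembles the whole file from consecutive reads of at most chunk bytes. */
    static byte[] readInChunks(byte[] file, int chunk) {
        byte[] out = new byte[file.length];
        for (int off = 0; off < file.length; off += chunk) {
            int len = Math.min(chunk, file.length - off);
            System.arraycopy(readRange(file, off, len), 0, out, off, len);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] file = sampleFile(1275);

        // The read that used to fail: 78 bytes at offset 1197, ending exactly at 1275.
        byte[] tail = readRange(file, 1197, 78);
        System.out.println(Arrays.equals(tail, Arrays.copyOfRange(file, 1197, 1275))); // true

        // Many small reads must yield the same bytes as the original content.
        System.out.println(Arrays.equals(readInChunks(file, 100), file)); // true
    }
}
```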

Test Results

Before Fix

EOFException: Reached the end of stream. Still have: 78 bytes left
- contentLength: 1275 (correct!)
- reads: position=1197 len=78 bytesRead=0 ❌

After Fix

No EOF exceptions observed
- contentLength: 1275 (correct)
- reads: position=1197 len=78 bytesRead=78 ✅

Why The 78-Byte Offset Was Consistent

The "78 bytes" wasn't random: it was the length of the final read() call, which returned 0 instead of the number of bytes actually copied:

  • File size: 1275 bytes
  • Last read: position=1197, len=78
  • Expected: bytesRead=78
  • Actual (before fix): bytesRead=0
  • Parquet: "I need 78 more bytes but got EOF!" → EOFException

Commits

  1. e95f7061a: Fix inline content read bug + add unit test
  2. c10ae054b: Add SeaweedInputStream constructor logging
  3. 5c30bc8e7: Add detailed getPos() tracking with stack traces

Next Steps

  1. Push changes to your branch
  2. Run CI tests to verify fix works in GitHub Actions
  3. Monitor for any remaining edge cases
  4. Remove debug logging once confirmed stable (or reduce to DEBUG level)
  5. Backport to other SeaweedFS client versions if needed

Key Learnings

  1. Check the return value: Always ensure a function returns the correct value, not just that it performs its side effects
  2. Buffer operations need tracking: When copying data to buffers, track how much was copied
  3. Stack traces help: Knowing WHO calls a function helps understand WHEN bugs occur
  4. Consistent offsets = systematic bug: The 78-byte offset being consistent pointed to a logic error, not data corruption
  5. Downloaded file was perfect: The fact that parquet-tools could read the downloaded file proved the bug was in the read path, not write path
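
Learnings 1 and 2 can be turned into a reusable check: compare what a read reports with how far the buffer actually advanced. The sketch below is one possible form of such a check, under the assumption that the read can be expressed as a function from a ByteBuffer to an int; reportsWhatItCopied is a hypothetical helper, not part of the SeaweedFS client.

```java
import java.nio.ByteBuffer;
import java.util.function.ToIntFunction;

final class ReadContractCheck {

    /** Returns true when the reported byte count matches how far the buffer advanced. */
    static boolean reportsWhatItCopied(ToIntFunction<ByteBuffer> read, ByteBuffer buf) {
        int before = buf.position();
        int reported = read.applyAsInt(buf);
        int advanced = buf.position() - before;
        // -1 (end of stream) is acceptable only if nothing was written to the buffer.
        return reported == advanced || (reported == -1 && advanced == 0);
    }

    public static void main(String[] args) {
        byte[] data = new byte[78];

        // Copies 78 bytes but reports 0, like the inline-content bug.
        ToIntFunction<ByteBuffer> buggy = b -> { b.put(data); return 0; };
        // Copies 78 bytes and reports 78.
        ToIntFunction<ByteBuffer> fixed = b -> { b.put(data); return data.length; };

        System.out.println(reportsWhatItCopied(buggy, ByteBuffer.allocate(78))); // false
        System.out.println(reportsWhatItCopied(fixed, ByteBuffer.allocate(78))); // true
    }
}
```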

Files Modified

other/java/client/src/main/java/seaweedfs/client/SeaweedInputStream.java
other/java/client/src/main/java/seaweedfs/client/SeaweedOutputStream.java  
other/java/client/src/main/java/seaweedfs/client/SeaweedRead.java
other/java/client/src/test/java/seaweedfs/client/SeaweedStreamIntegrationTest.java
other/java/hdfs3/src/main/java/seaweed/hdfs/SeaweedFileSystem.java
other/java/hdfs3/src/main/java/seaweed/hdfs/SeaweedFileSystemStore.java
other/java/hdfs3/src/main/java/seaweed/hdfs/SeaweedHadoopOutputStream.java

References

  • Issue: Spark integration tests failing with EOF exception
  • Parquet version: 1.16.0
  • Spark version: 3.5.0
  • SeaweedFS client version: 3.80.1-SNAPSHOT