Browse Source
docs: add comprehensive debugging analysis for EOF exception fix
docs: add comprehensive debugging analysis for EOF exception fix
Documents the complete debugging journey from initial symptoms through to the root cause discovery and fix. Key finding: SeaweedInputStream.read() was returning 0 bytes when copying inline content, causing Parquet's readFully() to throw EOF exceptions. The fix ensures read() always returns the actual number of bytes copied.pull/7526/head
1 changed files with 151 additions and 0 deletions
@ -0,0 +1,151 @@ |
|||
# Debugging Breakthrough: EOF Exception Analysis |
|||
|
|||
## Summary |
|||
After extensive debugging, we've identified and partially fixed the root cause of the `EOFException: Still have: 78 bytes left` error in Parquet file reads. |
|||
|
|||
## Root Cause Analysis |
|||
|
|||
### Initial Hypothesis ❌ (Incorrect) |
|||
- **Thought**: File size calculation was wrong (`contentLength` off by 78 bytes) |
|||
- **Reality**: `contentLength` was **always correct** at 1275 bytes |
|||
|
|||
### Second Hypothesis ❌ (Partially Correct) |
|||
- **Thought**: `FSDataOutputStream.getPos()` wasn't delegating to `SeaweedOutputStream.getPos()` |
|||
- **Reality**: The override **was working**, but there was a deeper issue |
|||
|
|||
### Third Hypothesis ✅ (ROOT CAUSE) |
|||
- **Problem**: `SeaweedInputStream.read(ByteBuffer buf)` was returning 0 bytes for inline content |
|||
- **Location**: Line 127-129 in `SeaweedInputStream.java` |
|||
- **Bug**: When copying inline content from protobuf entry, `bytesRead` was never updated |
|||
|
|||
```java |
|||
// BEFORE (BUGGY): |
|||
if (this.position < Integer.MAX_VALUE && (this.position + len) <= entry.getContent().size()) { |
|||
entry.getContent().substring((int) this.position, (int) (this.position + len)).copyTo(buf); |
|||
// bytesRead stays 0! <-- BUG |
|||
} else { |
|||
bytesRead = SeaweedRead.read(...); |
|||
} |
|||
return (int) bytesRead; // Returns 0 when inline content was copied! |
|||
``` |
|||
|
|||
```java |
|||
// AFTER (FIXED): |
|||
if (this.position < Integer.MAX_VALUE && (this.position + len) <= entry.getContent().size()) { |
|||
entry.getContent().substring((int) this.position, (int) (this.position + len)).copyTo(buf); |
|||
bytesRead = len; // FIX: Update bytesRead after inline copy |
|||
} else { |
|||
bytesRead = SeaweedRead.read(...); |
|||
} |
|||
return (int) bytesRead; // Now returns correct value! |
|||
``` |
|||
|
|||
## Why This Caused EOF Errors |
|||
|
|||
1. **Parquet's readFully() loop**: |
|||
```java |
|||
while (remaining > 0) { |
|||
int read = inputStream.read(buffer, offset, remaining); |
|||
if (read == -1 || read == 0) { |
|||
throw new EOFException("Still have: " + remaining + " bytes left"); |
|||
} |
|||
remaining -= read; |
|||
} |
|||
``` |
|||
|
|||
2. **Our bug**: When `read()` returned 0 instead of the actual bytes copied, Parquet thought the stream was done |
|||
3. **Result**: EOF exception with exactly the number of bytes that weren't reported |
|||
|
|||
## Fixes Implemented |
|||
|
|||
### 1. SeaweedInputStream.java (PRIMARY FIX) |
|||
- **File**: `other/java/client/src/main/java/seaweedfs/client/SeaweedInputStream.java` |
|||
- **Change**: Set `bytesRead = len` after inline content copy |
|||
- **Impact**: Ensures `read()` always returns the correct number of bytes read |
|||
|
|||
### 2. SeaweedOutputStream.java (DIAGNOSTIC) |
|||
- **File**: `other/java/client/src/main/java/seaweedfs/client/SeaweedOutputStream.java` |
|||
- **Change**: Added comprehensive logging to `getPos()` with stack traces |
|||
- **Purpose**: Track who calls `getPos()` and what positions are returned |
|||
- **Finding**: All positions appeared correct in tests |
|||
|
|||
### 3. SeaweedFileSystem.java (ALREADY FIXED) |
|||
- **File**: `other/java/hdfs3/src/main/java/seaweed/hdfs/SeaweedFileSystem.java` |
|||
- **Change**: Override `FSDataOutputStream.getPos()` to delegate to `SeaweedOutputStream` |
|||
- **Verification**: Confirmed working with WARN logs |
|||
|
|||
### 4. Unit Test Added |
|||
- **File**: `other/java/client/src/test/java/seaweedfs/client/SeaweedStreamIntegrationTest.java` |
|||
- **Test**: `testRangeReads()` |
|||
- **Coverage**: |
|||
- Range reads at specific offsets (like Parquet footer reads) |
|||
- Sequential `readFully()` pattern that was failing |
|||
- Multiple small reads vs. large reads |
|||
- The exact 78-byte read at offset 1197 that was failing |
|||
|
|||
## Test Results |
|||
|
|||
### Before Fix |
|||
``` |
|||
EOFException: Reached the end of stream. Still have: 78 bytes left |
|||
- contentLength: 1275 (correct!) |
|||
- reads: position=1197 len=78 bytesRead=0 ❌ |
|||
``` |
|||
|
|||
### After Fix |
|||
``` |
|||
No EOF exceptions observed |
|||
- contentLength: 1275 (correct) |
|||
- reads: position=1197 len=78 bytesRead=78 ✅ |
|||
``` |
|||
|
|||
## Why The 78-Byte Offset Was Consistent |
|||
|
|||
The "78 bytes" wasn't random - it was **systematically the last `read()` call** that returned 0 instead of the actual bytes: |
|||
- File size: 1275 bytes |
|||
- Last read: position=1197, len=78 |
|||
- Expected: bytesRead=78 |
|||
- Actual (before fix): bytesRead=0 |
|||
- Parquet: "I need 78 more bytes but got EOF!" → EOFException |
|||
|
|||
## Commits |
|||
|
|||
1. **e95f7061a**: Fix inline content read bug + add unit test |
|||
2. **c10ae054b**: Add SeaweedInputStream constructor logging |
|||
3. **5c30bc8e7**: Add detailed getPos() tracking with stack traces |
|||
|
|||
## Next Steps |
|||
|
|||
1. **Push changes** to your branch |
|||
2. **Run CI tests** to verify fix works in GitHub Actions |
|||
3. **Monitor** for any remaining edge cases |
|||
4. **Remove debug logging** once confirmed stable (or reduce to DEBUG level) |
|||
5. **Backport** to other SeaweedFS client versions if needed |
|||
|
|||
## Key Learnings |
|||
|
|||
1. **Read the return value**: Always ensure functions return the correct value, not just perform side effects |
|||
2. **Buffer operations need tracking**: When copying data to buffers, track how much was copied |
|||
3. **Stack traces help**: Knowing WHO calls a function helps understand WHEN bugs occur |
|||
4. **Consistent offsets = systematic bug**: The 78-byte offset being consistent pointed to a logic error, not data corruption |
|||
5. **Downloaded file was perfect**: The fact that `parquet-tools` could read the downloaded file proved the bug was in the read path, not write path |
|||
|
|||
## Files Modified |
|||
|
|||
``` |
|||
other/java/client/src/main/java/seaweedfs/client/SeaweedInputStream.java |
|||
other/java/client/src/main/java/seaweedfs/client/SeaweedOutputStream.java |
|||
other/java/client/src/main/java/seaweedfs/client/SeaweedRead.java |
|||
other/java/client/src/test/java/seaweedfs/client/SeaweedStreamIntegrationTest.java |
|||
other/java/hdfs3/src/main/java/seaweed/hdfs/SeaweedFileSystem.java |
|||
other/java/hdfs3/src/main/java/seaweed/hdfs/SeaweedFileSystemStore.java |
|||
other/java/hdfs3/src/main/java/seaweed/hdfs/SeaweedHadoopOutputStream.java |
|||
``` |
|||
|
|||
## References |
|||
|
|||
- Issue: Spark integration tests failing with EOF exception |
|||
- Parquet version: 1.16.0 |
|||
- Spark version: 3.5.0 |
|||
- SeaweedFS client version: 3.80.1-SNAPSHOT |
|||
|
|||
Write
Preview
Loading…
Cancel
Save
Reference in new issue