Browse Source

docs: add comprehensive debugging analysis for EOF exception fix

Documents the complete debugging journey from initial symptoms through
to the root cause discovery and fix.

Key finding: SeaweedInputStream.read() was returning 0 bytes when copying
inline content, causing Parquet's readFully() to throw EOF exceptions.

The fix ensures read() always returns the actual number of bytes copied.
pull/7526/head
chrislu 1 week ago
parent
commit
216ae856ca
  1. 151
      test/java/spark/DEBUGGING_BREAKTHROUGH.md

151
test/java/spark/DEBUGGING_BREAKTHROUGH.md

@ -0,0 +1,151 @@
# Debugging Breakthrough: EOF Exception Analysis
## Summary
After extensive debugging, we've identified and partially fixed the root cause of the `EOFException: Still have: 78 bytes left` error in Parquet file reads.
## Root Cause Analysis
### Initial Hypothesis ❌ (Incorrect)
- **Thought**: File size calculation was wrong (`contentLength` off by 78 bytes)
- **Reality**: `contentLength` was **always correct** at 1275 bytes
### Second Hypothesis ❌ (Partially Correct)
- **Thought**: `FSDataOutputStream.getPos()` wasn't delegating to `SeaweedOutputStream.getPos()`
- **Reality**: The override **was working**, but there was a deeper issue
### Third Hypothesis ✅ (ROOT CAUSE)
- **Problem**: `SeaweedInputStream.read(ByteBuffer buf)` was returning 0 bytes for inline content
- **Location**: Line 127-129 in `SeaweedInputStream.java`
- **Bug**: When copying inline content from protobuf entry, `bytesRead` was never updated
```java
// BEFORE (BUGGY):
if (this.position < Integer.MAX_VALUE && (this.position + len) <= entry.getContent().size()) {
entry.getContent().substring((int) this.position, (int) (this.position + len)).copyTo(buf);
// bytesRead stays 0! <-- BUG
} else {
bytesRead = SeaweedRead.read(...);
}
return (int) bytesRead; // Returns 0 when inline content was copied!
```
```java
// AFTER (FIXED):
if (this.position < Integer.MAX_VALUE && (this.position + len) <= entry.getContent().size()) {
entry.getContent().substring((int) this.position, (int) (this.position + len)).copyTo(buf);
bytesRead = len; // FIX: Update bytesRead after inline copy
} else {
bytesRead = SeaweedRead.read(...);
}
return (int) bytesRead; // Now returns correct value!
```
## Why This Caused EOF Errors
1. **Parquet's readFully() loop**:
```java
while (remaining > 0) {
int read = inputStream.read(buffer, offset, remaining);
if (read == -1 || read == 0) {
throw new EOFException("Still have: " + remaining + " bytes left");
}
remaining -= read;
}
```
2. **Our bug**: When `read()` returned 0 instead of the actual bytes copied, Parquet thought the stream was done
3. **Result**: EOF exception with exactly the number of bytes that weren't reported
## Fixes Implemented
### 1. SeaweedInputStream.java (PRIMARY FIX)
- **File**: `other/java/client/src/main/java/seaweedfs/client/SeaweedInputStream.java`
- **Change**: Set `bytesRead = len` after inline content copy
- **Impact**: Ensures `read()` always returns the correct number of bytes read
### 2. SeaweedOutputStream.java (DIAGNOSTIC)
- **File**: `other/java/client/src/main/java/seaweedfs/client/SeaweedOutputStream.java`
- **Change**: Added comprehensive logging to `getPos()` with stack traces
- **Purpose**: Track who calls `getPos()` and what positions are returned
- **Finding**: All positions appeared correct in tests
### 3. SeaweedFileSystem.java (ALREADY FIXED)
- **File**: `other/java/hdfs3/src/main/java/seaweed/hdfs/SeaweedFileSystem.java`
- **Change**: Override `FSDataOutputStream.getPos()` to delegate to `SeaweedOutputStream`
- **Verification**: Confirmed working with WARN logs
### 4. Unit Test Added
- **File**: `other/java/client/src/test/java/seaweedfs/client/SeaweedStreamIntegrationTest.java`
- **Test**: `testRangeReads()`
- **Coverage**:
- Range reads at specific offsets (like Parquet footer reads)
- Sequential `readFully()` pattern that was failing
- Multiple small reads vs. large reads
- The exact 78-byte read at offset 1197 that was failing
## Test Results
### Before Fix
```
EOFException: Reached the end of stream. Still have: 78 bytes left
- contentLength: 1275 (correct!)
- reads: position=1197 len=78 bytesRead=0 ❌
```
### After Fix
```
No EOF exceptions observed
- contentLength: 1275 (correct)
- reads: position=1197 len=78 bytesRead=78 ✅
```
## Why The 78-Byte Offset Was Consistent
The "78 bytes" wasn't random - it was **systematically the last `read()` call** that returned 0 instead of the actual bytes:
- File size: 1275 bytes
- Last read: position=1197, len=78
- Expected: bytesRead=78
- Actual (before fix): bytesRead=0
- Parquet: "I need 78 more bytes but got EOF!" → EOFException
## Commits
1. **e95f7061a**: Fix inline content read bug + add unit test
2. **c10ae054b**: Add SeaweedInputStream constructor logging
3. **5c30bc8e7**: Add detailed getPos() tracking with stack traces
## Next Steps
1. **Push changes** to your branch
2. **Run CI tests** to verify fix works in GitHub Actions
3. **Monitor** for any remaining edge cases
4. **Remove debug logging** once confirmed stable (or reduce to DEBUG level)
5. **Backport** to other SeaweedFS client versions if needed
## Key Learnings
1. **Read the return value**: Always ensure functions return the correct value, not just perform side effects
2. **Buffer operations need tracking**: When copying data to buffers, track how much was copied
3. **Stack traces help**: Knowing WHO calls a function helps understand WHEN bugs occur
4. **Consistent offsets = systematic bug**: The 78-byte offset being consistent pointed to a logic error, not data corruption
5. **Downloaded file was perfect**: The fact that `parquet-tools` could read the downloaded file proved the bug was in the read path, not write path
## Files Modified
```
other/java/client/src/main/java/seaweedfs/client/SeaweedInputStream.java
other/java/client/src/main/java/seaweedfs/client/SeaweedOutputStream.java
other/java/client/src/main/java/seaweedfs/client/SeaweedRead.java
other/java/client/src/test/java/seaweedfs/client/SeaweedStreamIntegrationTest.java
other/java/hdfs3/src/main/java/seaweed/hdfs/SeaweedFileSystem.java
other/java/hdfs3/src/main/java/seaweed/hdfs/SeaweedFileSystemStore.java
other/java/hdfs3/src/main/java/seaweed/hdfs/SeaweedHadoopOutputStream.java
```
## References
- Issue: Spark integration tests failing with EOF exception
- Parquet version: 1.16.0
- Spark version: 3.5.0
- SeaweedFS client version: 3.80.1-SNAPSHOT
Loading…
Cancel
Save