# Virtual Position Fix: Status and Findings

## Implementation Complete
### Changes Made

- Added a `virtualPosition` field to `SeaweedOutputStream` (sketched below)
  - Tracks total bytes written, including bytes still in the buffer
  - Initialized to match `position` in the constructor
  - Incremented on every `write()` call
- Updated `getPos()` to return `virtualPosition`
  - Always returns the accurate total of bytes written
  - No longer depends on `position + buffer.position()`
  - Aligns with Hadoop `FSDataOutputStream` semantics
- Enhanced debug logging
  - All logs now show both `virtualPos` and `flushedPos`
  - Clear separation between virtual and physical positions
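
A minimal sketch of the change follows; the constructor signature is illustrative, and only `position`, `virtualPosition`, `write()`, and `getPos()` are taken from the description above (buffering and flush logic elided):

```java
import java.io.OutputStream;

// Sketch only: how virtualPosition is tracked alongside the physical position.
public class SeaweedOutputStream extends OutputStream {
    private long position;        // bytes already flushed to the filer (physical)
    private long virtualPosition; // total bytes accepted by write(), flushed or buffered

    public SeaweedOutputStream(long startPosition) { // illustrative constructor
        this.position = startPosition;
        this.virtualPosition = startPosition;        // initialized to match position
    }

    @Override
    public synchronized void write(byte[] data, int off, int len) {
        // ... existing code appends to the in-memory buffer and flushes full chunks ...
        virtualPosition += len; // incremented on every write
    }

    @Override
    public void write(int b) {
        write(new byte[]{(byte) b}, 0, 1);
    }

    public synchronized long getPos() {
        return virtualPosition; // total bytes written, independent of flush state
    }
}
```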
## Test Results

### ✅ What's Working
- Virtual position tracking is accurate:

  ```
  Last getPos() call: returns 1252 (writeCall #465)
  Final writes:       writeCalls 466-470 (8 bytes)
  close():            virtualPos=1260 ✓
  File written:       1260 bytes ✓
  Metadata:           fileSize=1260 ✓
  ```

- No more position discrepancy:
  - Before: `getPos()` returned `position + buffer.position()` = 1252
  - After: `getPos()` returns `virtualPosition` = 1260
  - The file size matches `virtualPosition`
### ❌ What's Still Failing

**EOF exception persists:** `EOFException: Still have: 78 bytes left`
## Root Cause Analysis

The virtual position fix ensures `getPos()` always returns the correct total, but it does not solve the fundamental timing issue (a code sketch of the sequence follows the list):
- **The Parquet write sequence:**
  1. Parquet writes column chunk data
  2. Parquet calls `getPos()` → gets 1252
  3. Parquet stores this value: `columnChunkOffset = 1252`
  4. Parquet writes footer metadata (8 bytes)
  5. Parquet writes the footer with `columnChunkOffset = 1252`
  6. Close → flushes all 1260 bytes
- **The problem:**
  - Parquet uses the `getPos()` value immediately when it is returned
  - It stores `columnChunkOffset = 1252` in memory
  - Then it writes more bytes (footer metadata)
  - Then it writes the footer containing `columnChunkOffset = 1252`
  - But by then, those 8 footer bytes have shifted everything
- **Why virtual position doesn't fix it:**
  - Even though `getPos()` now correctly returns 1260 at close time,
  - Parquet has already recorded offset = 1252 in its internal state
  - Those stale offsets get written into the Parquet footer
  - When reading, the Parquet footer says "seek to 1252" but the data is elsewhere
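
To make the sequence concrete, here is a simplified sketch of the interaction described above. The `write*` helpers are placeholders (not Parquet API), `out` stands for the buffered `SeaweedOutputStream`, and the numbers match the trace in this document:

```java
writeColumnChunkData(out);              // column chunk bytes may still sit in the buffer
long columnChunkOffset = out.getPos();  // returns 1252; the value is captured immediately
writeFooterMetadata(out);               // 8 more bytes are written afterwards
writeFooter(out, columnChunkOffset);    // the footer embeds the previously recorded 1252
out.close();                            // only now are all 1260 bytes flushed to the filer
```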
- Even though
## The Real Issue

The problem is not that `getPos()` returns the wrong value. The problem is that Parquet's write sequence is incompatible with buffered streams:

- Parquet assumes `getPos()` returns the position where the next byte will be written
- With buffering, bytes are written to the buffer first and flushed later
- Parquet records offsets based on `getPos()`, then writes more data
- Those additional bytes invalidate the recorded offsets
## Why This Works in HDFS/S3

HDFS and S3 implementations likely:

- Flush on every `getPos()` call, so the position is always up to date
- Use unbuffered streams for Parquet, so there is no offset drift
- Have different buffering semantics, with data committed immediately
## Next Steps: True Fix Options

### Option A: Flush on getPos() (Performance Hit)

```java
public synchronized long getPos() {
    if (buffer.position() > 0) {
        writeCurrentBufferToService(); // force a flush so position is current
    }
    return position; // now accurate
}
```

**Pros:** Guarantees correct offsets
**Cons:** Many small flushes, poor performance
### Option B: Detect Parquet and Flush (Targeted)

```java
public synchronized long getPos() {
    if (path.endsWith(".parquet") && buffer.position() > 0) {
        writeCurrentBufferToService(); // flush only for Parquet files
    }
    return virtualPosition;
}
```

**Pros:** Only affects Parquet files
**Cons:** Hacky; file-extension detection is brittle
### Option C: Implement Hadoop's Syncable (Proper)

Make `SeaweedOutputStream` implement `Syncable.hflush()`:

```java
@Override
public void hflush() throws IOException {
    writeCurrentBufferToService(); // flush buffered bytes to the service
    flushWrittenBytesToService();  // wait for the writes to complete
}
```

Let Parquet call `hflush()` when it needs guaranteed positions.

**Pros:** Clean, follows the Hadoop contract
**Cons:** Requires Parquet/Spark to actually use `hflush()`
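
For reference, this is what the caller side would look like with standard Hadoop APIs (`FSDataOutputStream.hflush()` delegates to the wrapped stream when it implements `Syncable`); the path and payload below are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HflushCallerSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        try (FSDataOutputStream out = fs.create(new Path("/tmp/example.parquet"))) {
            byte[] columnChunk = new byte[]{1, 2, 3, 4};     // placeholder payload
            out.write(columnChunk);
            out.hflush();               // drain buffers so the position below is stable
            long offset = out.getPos(); // now safe to record, e.g. in a file footer
            out.write(Long.toString(offset).getBytes());     // placeholder "footer"
        }
    }
}
```

Whether Parquet's writer actually calls `hflush()` before recording offsets is exactly the open question listed under Cons.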
### Option D: Buffer Size = 0 for Parquet (Workaround)

Detect Parquet writes and disable buffering:

```java
if (path.endsWith(".parquet")) {
    this.bufferSize = 0; // no buffering for Parquet files
}
```

**Pros:** Simple, no offset issues
**Cons:** Terrible performance for Parquet
### Recommended: Option C + Option A Hybrid

- Implement `Syncable.hflush()` properly (Option C)
- Make `getPos()` flush if the buffer is not empty (Option A)
- Together (see the sketch below) this ensures:
  - Correct offsets for Parquet
  - Compatibility with any client that calls `getPos()`
  - Adherence to Hadoop semantics
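
A sketch of the hybrid, reusing the method names from the option snippets above (`writeCurrentBufferToService()`, `flushWrittenBytesToService()`); exact signatures may differ in the real class:

```java
// Option C: honor the Syncable contract so position-sensitive writers can force a flush.
@Override
public synchronized void hflush() throws IOException {
    writeCurrentBufferToService(); // push buffered bytes to the service
    flushWrittenBytesToService();  // wait until the writes are acknowledged
}

// Option A: even clients that never call hflush() get stable offsets from getPos().
public synchronized long getPos() throws IOException {
    if (buffer.position() > 0) {
        hflush(); // make the physical position catch up with the virtual one
    }
    return virtualPosition;
}
```

The flush inside `getPos()` is the performance cost accepted in Option A; `hflush()` keeps the behavior aligned with the Hadoop contract.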
## Status

- ✅ Virtual position tracking implemented
- ✅ `getPos()` returns the accurate total
- ✅ File size metadata is correct
- ❌ Parquet EOF exception persists
- ⏭️ Need to implement flush-on-`getPos()` or `hflush()`
## Files Modified

- `other/java/client/src/main/java/seaweedfs/client/SeaweedOutputStream.java`
  - Added `virtualPosition` field
  - Updated `getPos()` to return `virtualPosition`
  - Enhanced debug logging
## Next Action

Implement flush-on-`getPos()` to guarantee correct offsets for Parquet.