CRITICAL DISCOVERY: Chunk Count is Irrelevant to EOF Error

Experiment Results

| Flush Strategy | Chunks Created | File Size | EOF Error (bytes past end) |
|---|---|---|---|
| Flush on every getPos() | 17 | 1260 bytes | 78 bytes |
| Flush every 5 calls | 10 | 1260 bytes | 78 bytes |
| Flush every 20 calls | 10 | 1260 bytes | 78 bytes |
| No flushes (single chunk) | 1 | 1260 bytes | 78 bytes |
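The experiment above can be sketched as a toy harness (all names hypothetical, not SeaweedFS code): a writer that buffers bytes and cuts a chunk every N getPos() calls, mimicking how a Parquet writer's position queries can drive chunk creation in the underlying store. Varying `flushEvery` changes the chunk count but never the total bytes written, which is the invariant the table demonstrates.

```go
package main

import (
	"bytes"
	"fmt"
)

// chunkingWriter is a hypothetical stand-in for the store-backed output stream:
// it buffers writes and "flushes" (records a chunk) every flushEvery getPos() calls.
type chunkingWriter struct {
	buf        bytes.Buffer
	flushEvery int   // 0 means never flush on getPos()
	posCalls   int   // how many times getPos() has been called
	chunks     []int // sizes of chunks cut so far
	pending    int   // bytes written since the last flush
}

func (w *chunkingWriter) Write(p []byte) (int, error) {
	w.pending += len(p)
	return w.buf.Write(p)
}

// getPos returns the logical position, optionally cutting a chunk first.
func (w *chunkingWriter) getPos() int {
	w.posCalls++
	if w.flushEvery > 0 && w.posCalls%w.flushEvery == 0 && w.pending > 0 {
		w.chunks = append(w.chunks, w.pending)
		w.pending = 0
	}
	return w.buf.Len()
}

// close flushes any remaining buffered bytes as a final chunk.
func (w *chunkingWriter) close() {
	if w.pending > 0 {
		w.chunks = append(w.chunks, w.pending)
		w.pending = 0
	}
}

func main() {
	// 20 writes of 63 bytes = 1260 bytes total, flushing every 5 getPos() calls.
	w := &chunkingWriter{flushEvery: 5}
	for i := 0; i < 20; i++ {
		w.Write(make([]byte, 63))
		w.getPos()
	}
	w.close()
	// Chunk count depends on flushEvery; total size is always 1260.
	fmt.Println(len(w.chunks), w.buf.Len())
}
```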

Conclusion

The 78-byte error is CONSTANT regardless of chunking strategy.

This proves:

  1. The issue is NOT in SeaweedFS's chunked storage
  2. The issue is NOT in how we flush/write data
  3. The issue is NOT in chunk assembly during reads
  4. The file itself is COMPLETE and CORRECT (1260 bytes)

What This Means

The problem is in Parquet's footer metadata calculation. Parquet is computing that the file should be 1338 bytes (1260 + 78) based on something in our file metadata structure, NOT based on how we chunk the data.

Hypotheses

  1. FileMetaData size field: Parquet may be reading a size field from our entry metadata that doesn't match the actual chunk data
  2. Chunk offset interpretation: Parquet may be misinterpreting our chunk offset/size metadata
  3. Footer structure incompatibility: Our file format may not match what Parquet expects
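Hypothesis 1 and 2 can be tested mechanically by comparing the size the entry metadata declares against the extent actually covered by the chunks. A toy sketch (types and function simplified and hypothetical, not the actual SeaweedFS structs) of that comparison:

```go
package main

import "fmt"

// Chunk is a simplified stand-in for a filer file chunk: an (offset, size) extent.
type Chunk struct {
	Offset int64
	Size   int64
}

// declaredVsActual returns the declared file size, the highest byte offset
// actually covered by chunks, and the gap between them. A nonzero gap means
// a reader trusting the declared size will eventually read past real data.
func declaredVsActual(declared int64, chunks []Chunk) (int64, int64, int64) {
	var end int64
	for _, c := range chunks {
		if e := c.Offset + c.Size; e > end {
			end = e
		}
	}
	return declared, end, declared - end
}

func main() {
	// Hypothetical numbers matching the observed error: metadata claims 1338,
	// chunks only cover 1260, leaving the constant 78-byte shortfall.
	d, a, gap := declaredVsActual(1338, []Chunk{{Offset: 0, Size: 1260}})
	fmt.Println(d, a, gap)
}
```

If the real entry's declared size and chunk extents disagree by exactly 78 bytes, hypotheses 1/2 are confirmed; if they agree, the problem shifts to hypothesis 3.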

Next Steps

Need to examine:

  1. What metadata SeaweedFS stores in entry.attributes
  2. How SeaweedRead assembles visible intervals from chunks
  3. What Parquet reads from entry metadata vs actual file data
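For step 2, the property worth verifying is that interval assembly neither loses nor invents bytes: however the chunks overlap, the visible byte range should cover exactly the written data. A toy coverage check (simplified and hypothetical; the real SeaweedRead logic additionally tracks which chunk owns each byte, newest write winning):

```go
package main

import "fmt"

// chunk is a simplified file chunk: an extent plus a modification time.
type chunk struct {
	offset, size, mtime int64
}

// visibleLength returns the total number of distinct bytes covered by the
// chunks. For a pure length check, union coverage is enough; ownership
// (newest-wins shadowing) only matters for contents, not for total extent.
func visibleLength(chunks []chunk) int64 {
	covered := map[int64]bool{} // byte-granular union; fine for toy sizes
	for _, c := range chunks {
		for b := c.offset; b < c.offset+c.size; b++ {
			covered[b] = true
		}
	}
	return int64(len(covered))
}

func main() {
	// One 1260-byte chunk and many overlapping chunks should cover the same extent.
	fmt.Println(visibleLength([]chunk{{0, 1260, 1}}))
	fmt.Println(visibleLength([]chunk{{0, 700, 1}, {500, 760, 2}}))
}
```

If the assembled visible intervals for the real file also total 1260 bytes, step 2 is ruled out and the remaining suspect is what Parquet reads from entry metadata (step 3).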