Added targeted logging to answer the key question: "Are the missing 78 bytes the Parquet footer that never got written?"

Changes:
1. Log ALL writes after call 220 (likely footer-related)
   - Previous: only logged writes >= 20 bytes
   - Now: also log small writes near end marked [FOOTER?]
2. Enhanced getPos() logging with writeCalls context
   - Shows relationship between getPos() and actual writes
   - Helps identify if Parquet calculates size before writing footer

This will reveal:
A) What the last ~14 write calls contain (footer structure)
B) If getPos() is called before/during footer writes
C) If there's a mismatch between calculated size and actual writes

Expected pattern if footer is missing:
- Large writes up to ~600 bytes (data pages)
- Small writes for metadata
- getPos() called to calculate footer offset
- Footer writes (78 bytes) that either:
  * Never happen (bug in Parquet)
  * Get lost in FSDataOutputStream
  * Are written but lost in flush

Next run will show the exact write sequence!

pull/7526/head
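The logging rules described in this message can be sketched roughly as follows. This is only an illustration, not the code changed in this commit: the class name `WriteTracingStream`, its fields, and the log line formats are hypothetical, and it wraps a plain `java.io.OutputStream` rather than the real Hadoop stream. Only the two thresholds (call 220 and 20 bytes) and the `[FOOTER?]` marker come from the message above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tracing wrapper; names and log formats are assumptions. */
class WriteTracingStream extends OutputStream {
    // Thresholds taken from the commit message.
    static final int FOOTER_CALL_THRESHOLD = 220;
    static final int MIN_LOGGED_BYTES = 20;

    private final OutputStream delegate;
    private final List<String> log = new ArrayList<>();
    private long position = 0;   // bytes passed to write() so far
    private int writeCalls = 0;  // number of write() invocations so far

    WriteTracingStream(OutputStream delegate) {
        this.delegate = delegate;
    }

    @Override
    public void write(int b) throws IOException {
        delegate.write(b);
        record(1);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        delegate.write(buf, off, len);
        record(len);
    }

    private void record(int len) {
        writeCalls++;
        position += len;
        if (len >= MIN_LOGGED_BYTES) {
            // Previous behaviour: only writes of 20 bytes or more were logged.
            log.add("write #" + writeCalls + ": " + len + " bytes, pos=" + position);
        } else if (writeCalls > FOOTER_CALL_THRESHOLD) {
            // New behaviour: small writes late in the sequence are logged too.
            log.add("write #" + writeCalls + ": " + len + " bytes, pos=" + position
                    + " [FOOTER?]");
        }
    }

    /** Reports the position together with the write-call count for context. */
    long getPos() {
        log.add("getPos() -> " + position + " after " + writeCalls + " write calls");
        return position;
    }

    List<String> getLog() {
        return log;
    }
}
```

With a wrapper like this, comparing the last `getPos()` value in the log against the sum of the logged write sizes would show whether the writer computed a file length that includes bytes it never actually wrote.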
1 changed file with 16 additions and 7 deletions