12244 Commits (d4d6836139b7973ce689ff982e948d242af9521e)
 

Author SHA1 Message Date
chrislu d4d6836139 test: prove Spark CAN read Parquet files (both direct and Spark-written) 1 week ago
chrislu 1d78409440 test: prove Parquet works perfectly when written directly (not via Spark) 1 week ago
chrislu fba35124af experiment: prove chunk count irrelevant to 78-byte EOF error 1 week ago
chrislu f6b0c1e216 docs: comprehensive recommendation for Parquet EOF fix 1 week ago
chrislu 1cdb2fcf07 fix: implement flush-before-getPos() for Parquet compatibility 1 week ago
chrislu b019ec8f08 feat: comprehensive Parquet EOF debugging with multiple fix attempts 1 week ago
chrislu 2bf6e814f0 docs: complete debug session summary and findings 1 week ago
chrislu 9eb71466d8 feat: implement flush-on-getPos() to ensure accurate offsets 1 week ago
chrislu c1b0aa6611 feat: implement virtual position tracking in SeaweedOutputStream 1 week ago
chrislu 2d6b571120 docs: comprehensive analysis of Parquet EOF root cause and fix strategies 1 week ago
chrislu 3e754792a5 feat: add comprehensive debug logging to track Parquet write sequence 1 week ago
chrislu 7d601191a5 docs: complete local reproduction analysis with detailed findings 1 week ago
chrislu 852ca41928 docs: BREAKTHROUGH - found the bug in Spark local reproduction! 1 week ago
chrislu 50a8a3eb11 docs: comprehensive test results showing unit tests PASS but Spark fails 1 week ago
chrislu 80b463b7e4 test: add GetPosBufferTest to reproduce Parquet issue - ALL TESTS PASS! 1 week ago
chrislu 4faa6d55f6 docs: comprehensive issue summary - getPos() buffer flush timing issue 1 week ago
chrislu 8f33f5240d debug: confirmed root cause - Parquet tries to read 78 bytes past EOF 1 week ago
chrislu 16b8cf3e52 debug: add logging to EOF return path - FOUND ROOT CAUSE! 1 week ago
chrislu 216ae856ca docs: add comprehensive debugging analysis for EOF exception fix 1 week ago
chrislu 5c30bc8e7b debug: add detailed getPos() tracking with caller stack trace 1 week ago
chrislu e95f7061a4 fix: SeaweedInputStream returning 0 bytes for inline content reads 1 week ago
chrislu c10ae054b6 debug: add logging to SeaweedInputStream constructor to track contentLength 1 week ago
chrislu 9bb000e150 Update SeaweedOutputStream.java 1 week ago
chrislu d7d4d97098 debug: verify JARs contain latest code before running tests 1 week ago
chrislu 4936f733d1 debug: add WARN logging to SeaweedOutputStream base constructor 1 week ago
chrislu c834e30a72 debug: add logging to SeaweedFileSystemStore.createFile() 1 week ago
chrislu aed16ca9d7 fix: enable DEBUG logging for seaweed.hdfs package 1 week ago
chrislu 6fe5c372ee debug: change logs to WARN level to ensure visibility 1 week ago
chrislu c91175cb97 fix: make path variable final for anonymous inner class 1 week ago
chrislu d6f9234cea debug: add aggressive logging to FSDataOutputStream getPos() override 1 week ago
chrislu 58d4d61f89 docs: push instructions for Parquet EOF fix 1 week ago
chrislu 90aa83dbe4 docs: add detailed analysis of Parquet EOF fix 1 week ago
chrislu 9e7ed48688 fix: Override FSDataOutputStream.getPos() to use SeaweedOutputStream position 1 week ago
chrislu a8491ecd3f Update SeaweedOutputStream.java 1 week ago
chrislu 16bd118125 fix: don't split chunk ID on comma - comma is PART of the ID! 1 week ago
chrislu a1fa949221 feat: extract chunk IDs from write log and download from volume 1 week ago
chrislu c774b807e1 fix: search temporary directories for Parquet files 1 week ago
chrislu 7b9b04cd59 feat: add explicit logging when employees Parquet file is written 1 week ago
chrislu 09b0a2505c fix: poll for files to appear instead of fixed sleep 1 week ago
chrislu 64357e73bf feat: proactive download - grab files BEFORE Spark deletes them 1 week ago
chrislu 8e0635b8ba fix: search for filename in 'Encountered error' message 1 week ago
chrislu c5c29bc820 fix: search for failing file in read context (SeaweedInputStream) 1 week ago
chrislu e76107c22e fix: extract chunk ID for the EXACT file causing EOF error 1 week ago
chrislu 0afe330b4e feat: add detailed offset analysis for 78-byte discrepancy 1 week ago
chrislu 72b4bf9098 fix: extract correct chunk ID (not source_file_id) 1 week ago
chrislu 4ec6fbcdc7 fix: download Parquet data directly from volume server 1 week ago
chrislu 4224fcf4f8 chore: trigger new workflow run with real-time monitoring 1 week ago
chrislu a4af6d880d fix: download Parquet file in real-time when EOF error occurs 1 week ago
chrislu 09384e41e3 fix: add comprehensive diagnostics for file location 1 week ago
chrislu 8ea2646084 fix: keep containers running during file download 1 week ago