Ready to Push - Comprehensive Diagnostics
Current Status
Branch: java-client-replication-configuration
Commits ahead of origin: 1 (revert of documentation file)
All diagnostic code is already in place from previous pushes
What This Push Contains
Commit: afce69db1
Revert "docs: comprehensive analysis of persistent 78-byte Parquet issue"
Removes the PARQUET_ISSUE_SUMMARY.md documentation file (cleanup).
What's Already Pushed and Active
The following diagnostic features are already in origin and will run on next CI trigger:
1. Enhanced Write Logging (Commits: 48a2ddf, 885354b, 65c3ead)
- Tracks every write with totalBytesWritten counter
- Logs footer-related writes (marked [FOOTER?])
- Shows write call count for pattern analysis
2. Parquet 1.16.0 Upgrade (Commit: 12504dc1a)
- Upgraded from 1.13.1 to 1.16.0
- All Parquet dependencies coordinated
- Result: Changed file sizes but error persists
3. File Download & Inspection (Commit: b767825ba) ⭐
- name: Download and examine Parquet files
  if: failure()
  working-directory: test/java/spark
  run: |
    # Install parquet-tools
    pip3 install parquet-tools
    # Download failing Parquet file
    curl -o test.parquet "http://localhost:8888/test-spark/employees/..."
    # Check magic bytes (PAR1)
    # Hex dump header and footer
    # Run parquet-tools inspect/show
    # Upload as artifact
This will definitively show if the file is valid!
What Will Happen After Push
- GitHub Actions triggers automatically
- All diagnostics run (already in place)
- Test fails (expected - 78-byte error persists)
- File download step executes (on failure)
- Detailed file analysis printed to logs:
  - File size (should be 693 or 705 bytes)
  - PAR1 magic bytes check (header + trailer)
  - Hex dump of footer (last 200 bytes)
  - parquet-tools inspection output
- Artifact uploaded: failed-parquet-file (test.parquet)
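The magic-bytes check can also be reproduced locally on the downloaded artifact. The following is a minimal sketch, not the CI step itself: the helper names `check_magic` and `report` are hypothetical, and the only format fact it relies on is that a Parquet file begins and ends with the 4-byte marker `PAR1`.

```python
import os

MAGIC = b"PAR1"  # 4-byte marker at both the start and the end of a Parquet file


def check_magic(data):
    """Return (has_header_magic, has_trailer_magic) for the raw file bytes."""
    has_header = data[:4] == MAGIC
    # Require at least 8 bytes so the trailer is not the header counted twice.
    has_trailer = len(data) >= 8 and data[-4:] == MAGIC
    return has_header, has_trailer


def report(path):
    """Print a short summary for a file on disk (hypothetical helper)."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        data = f.read()
    has_header, has_trailer = check_magic(data)
    print(("✓" if has_header else "✗") + " PAR1 magic at start")
    print(("✓" if has_trailer else "✗") + " PAR1 magic at end")
    print(f"Size: {size} bytes")
    return has_header, has_trailer
```

Calling `report("test.parquet")` on the artifact prints the header, trailer, and size lines for that file.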
Expected Output from File Analysis
If File is Valid:
✓ PAR1 magic at start
✓ PAR1 magic at end
✓ Size: 693 bytes
parquet-tools inspect: [metadata displayed]
parquet-tools show: [can or cannot read data]
If File is Incomplete:
✓ PAR1 magic at start
✗ No PAR1 magic at end
✓ Size: 693 bytes
Footer appears truncated
Key Questions This Will Answer
- Is the file structurally complete?
  - Has PAR1 header? ✓ or ✗
  - Has PAR1 trailer? ✓ or ✗
- Can standard Parquet tools read it?
  - If YES: Spark/SeaweedFS integration issue
  - If NO with same error: Footer metadata wrong
  - If NO with different error: New clue
- What does the footer actually contain?
  - Hex dump will show raw footer bytes
  - Can manually decode to see column offsets
- Where should we focus next?
  - File format (if incomplete)
  - Parquet writer bug (if wrong metadata)
  - SeaweedFS read path (if file is valid)
  - Spark integration (if tools can read it)
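The questions above amount to a small decision procedure. As a rough sketch, it can be written down as a function. The name `next_focus` and its boolean parameters are hypothetical, and the mapping is only one reading of the list above.

```python
def next_focus(has_header, has_trailer, tools_can_read, same_error):
    """Map the file-analysis results to the area to investigate next.

    has_header / has_trailer: PAR1 magic found at start / end of the file.
    tools_can_read: a standard Parquet tool read the file successfully.
    same_error: the tool failed with the same 78-byte error as Spark.
    """
    if not (has_header and has_trailer):
        # Structurally incomplete file: look at the file format first.
        return "file format"
    if tools_can_read:
        # File is valid for standard tools, so the problem is on the read side.
        return "SeaweedFS read path / Spark integration"
    if same_error:
        # Standard tools hit the same error: the footer metadata is suspect.
        return "Parquet writer (footer metadata)"
    # Standard tools fail differently: treat the new error as a fresh lead.
    return "new clue"
```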
Artifacts Available After Run
- Test results: spark-test-results (surefire reports)
- Parquet file: failed-parquet-file (test.parquet)
  - Download and analyze locally
  - Use parquet-tools, pyarrow, or hex editor
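If no hex editor is at hand, the tail of the downloaded file can be dumped with a few lines of standard-library Python. This is a minimal sketch: `hex_dump` and `dump_tail` are hypothetical helper names, and the 200-byte default simply mirrors the footer dump size used in the CI step.

```python
import os


def hex_dump(data, start_offset=0, width=16):
    """Format bytes as offset, hex column, and printable-ASCII column."""
    pad = width * 3 - 1  # width of a full hex column, used to align short rows
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{start_offset + i:08x}  {hex_part:<{pad}}  {ascii_part}")
    return "\n".join(lines)


def dump_tail(path, n=200):
    """Return a hex dump of the last n bytes of the file at path."""
    size = os.path.getsize(path)
    start = max(0, size - n)
    with open(path, "rb") as f:
        f.seek(start)
        return hex_dump(f.read(), start_offset=start)
```

For example, `print(dump_tail("test.parquet"))` shows the footer region with real file offsets in the left column, which helps when decoding column offsets by hand.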
Commands to Push
# Simple push (recommended)
git push origin java-client-replication-configuration
# Or with verbose output
git push -v origin java-client-replication-configuration
# To force push (NOT NEEDED - history is clean)
# git push --force origin java-client-replication-configuration
After CI Completes
- Check Actions tab for workflow run
- Look for "Download and examine Parquet files" step
- Read the output to see file analysis
- Download failed-parquet-file artifact for local inspection
- Based on results, proceed with:
  - Option A: Fix Parquet footer generation
  - Option B: Try uncompressed Parquet
  - Option C: Investigate SeaweedFS read path
  - Option D: Update Spark/Parquet version
Current Understanding
From logs, we know:
- ✅ All 693 bytes are written
- ✅ Footer trailer is written (last 6 bytes)
- ✅ Buffer is fully flushed
- ✅ File metadata shows 693 bytes
- ❌ Parquet reader expects 771 bytes (693 + 78)
- ❌ Consistent 78-byte discrepancy across all files
Next step after download: Determine whether the 78 bytes are actually missing from the file, or whether the footer only claims they should exist.
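A first local check toward that question is whether the footer length recorded at the end of the file is consistent with the file size. In the Parquet format, the file ends with the serialized metadata, then a 4-byte little-endian footer length, then the `PAR1` magic. The sketch below covers only that layout check. The function name `footer_layout` is hypothetical, and deciding whether column offsets inside the metadata point past byte 693 still requires decoding the metadata itself with parquet-tools or pyarrow.

```python
import struct

MAGIC = b"PAR1"


def footer_layout(data):
    """Inspect the tail of a Parquet file given its full contents as bytes.

    Returns None if the file is too short or lacks the trailing magic,
    otherwise a dict describing where the footer metadata should sit.
    """
    size = len(data)
    # Smallest possible layout: header magic + footer length + trailer magic.
    if size < 12 or data[-4:] != MAGIC:
        return None
    # The 4 bytes before the trailing magic hold the metadata length.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    footer_start = size - 8 - footer_len
    return {
        "size": size,
        "footer_len": footer_len,
        "footer_start": footer_start,
        # The metadata must begin after the 4-byte header magic.
        "fits": footer_start >= 4,
    }
```

If `fits` is true for the 693-byte file, the footer itself is intact at the layout level, and the reader's expectation of 771 bytes (693 + 78) more likely comes from offsets recorded inside the metadata than from a truncated trailer.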
Timeline
- Push now → ~2 minutes
- CI starts → ~30 seconds
- Build & test → ~5-10 minutes
- Test fails → File download executes
- Results available → ~15 minutes total