You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
 
 
 
 
 
 

4.5 KiB

Ready to Push - Comprehensive Diagnostics

Current Status

Branch: java-client-replication-configuration
Commits ahead of origin: 1 (revert of documentation file)
All diagnostic code is already in place from previous pushes

What This Push Contains

Commit: afce69db1

Revert "docs: comprehensive analysis of persistent 78-byte Parquet issue"

Removes the PARQUET_ISSUE_SUMMARY.md documentation file (cleanup).

What's Already Pushed and Active

The following diagnostic features are already in origin and will run on next CI trigger:

1. Enhanced Write Logging (Commits: 48a2ddf, 885354b, 65c3ead)

  • Tracks every write with totalBytesWritten counter
  • Logs footer-related writes (marked [FOOTER?])
  • Shows write call count for pattern analysis

2. Parquet 1.16.0 Upgrade (Commit: 12504dc1a)

  • Upgraded from 1.13.1 to 1.16.0
  • All Parquet dependencies coordinated
  • Result: Changed file sizes but error persists

3. File Download & Inspection (Commit: b767825ba)

- name: Download and examine Parquet files
  if: failure()
  working-directory: test/java/spark
  run: |
    # Install parquet-tools
    pip3 install parquet-tools
    
    # Download failing Parquet file
    curl -o test.parquet "http://localhost:8888/test-spark/employees/..."
    
    # Check magic bytes (PAR1)
    # Hex dump header and footer  
    # Run parquet-tools inspect/show
    # Upload as artifact    

This will definitively show if the file is valid!

What Will Happen After Push

  1. GitHub Actions triggers automatically
  2. All diagnostics run (already in place)
  3. Test fails (expected - 78-byte error persists)
  4. File download step executes (on failure)
  5. Detailed file analysis printed to logs:
    • File size (should be 693 or 705 bytes)
    • PAR1 magic bytes check (header + trailer)
    • Hex dump of footer (last 200 bytes)
    • parquet-tools inspection output
  6. Artifact uploaded: failed-parquet-file (test.parquet)

Expected Output from File Analysis

If File is Valid:

✓ PAR1 magic at start
✓ PAR1 magic at end
✓ Size: 693 bytes
parquet-tools inspect: [metadata displayed]
parquet-tools show: [can or cannot read data]

If File is Incomplete:

✓ PAR1 magic at start
✗ No PAR1 magic at end
✓ Size: 693 bytes
Footer appears truncated

Key Questions This Will Answer

  1. Is the file structurally complete?

    • Has PAR1 header? ✓ or ✗
    • Has PAR1 trailer? ✓ or ✗
  2. Can standard Parquet tools read it?

    • If YES: Spark/SeaweedFS integration issue
    • If NO with same error: Footer metadata wrong
    • If NO with different error: New clue
  3. What does the footer actually contain?

    • Hex dump will show raw footer bytes
    • Can manually decode to see column offsets
  4. Where should we focus next?

    • File format (if incomplete)
    • Parquet writer bug (if wrong metadata)
    • SeaweedFS read path (if file is valid)
    • Spark integration (if tools can read it)

Artifacts Available After Run

  1. Test results: spark-test-results (surefire reports)
  2. Parquet file: failed-parquet-file (test.parquet)
    • Download and analyze locally
    • Use parquet-tools, pyarrow, or hex editor

Commands to Push

# Simple push (recommended)
git push origin java-client-replication-configuration

# Or with verbose output
git push -v origin java-client-replication-configuration

# To force push (NOT NEEDED - history is clean)
# git push --force origin java-client-replication-configuration

After CI Completes

  1. Check Actions tab for workflow run
  2. Look for "Download and examine Parquet files" step
  3. Read the output to see file analysis
  4. Download failed-parquet-file artifact for local inspection
  5. Based on results, proceed with:
    • Option A: Fix Parquet footer generation
    • Option B: Try uncompressed Parquet
    • Option C: Investigate SeaweedFS read path
    • Option D: Update Spark/Parquet version

Current Understanding

From logs, we know:

  • All 693 bytes are written
  • Footer trailer is written (last 6 bytes)
  • Buffer is fully flushed
  • File metadata shows 693 bytes
  • Parquet reader expects 771 bytes (693 + 78)
  • Consistent 78-byte discrepancy across all files

Next step after download: See if the 78 bytes are actually missing, or if footer just claims they should exist.

Timeline

  • Push now → ~2 minutes
  • CI starts → ~30 seconds
  • Build & test → ~5-10 minutes
  • Test fails → File download executes
  • Results available → ~15 minutes total