Browse Source
Issue: Files written successfully but truncated when read back Error: 'EOFException: Reached the end of stream. Still have: 78 bytes left' Root cause: Potential race condition between write completion and read - File metadata updated before all chunks fully flushed - Spark immediately reads after write without ensuring sync - Parquet reader gets incomplete file Solutions applied: 1. Disable filesystem cache to avoid stale file handles - spark.hadoop.fs.seaweedfs.impl.disable.cache=true 2. Enable explicit flush/sync on write (if supported by client) - spark.hadoop.fs.seaweed.write.flush.sync=true 3. Add SPARK_SUBMIT_OPTS for cache disabling These settings ensure: - Files are fully flushed before close() returns - No cached file handles with stale metadata - Fresh reads always get current file state Note: If issue persists, may need to add explicit delay between write and read, or investigate seaweedfs-hadoop3-client flush behavior.pull/7526/head
2 changed files with 7 additions and 1 deletions
Loading…
Reference in new issue