Browse Source
The issue: Files written to employees/ but immediately moved/deleted by Spark Spark's file commit process: 1. Write to: employees/_temporary/0/_temporary/attempt_xxx/part-xxx.parquet 2. Commit/rename to: employees/part-xxx.parquet 3. Read and delete (on failure) By the time we check employees/, the file is already gone! Solution: Search multiple locations - employees/ (final location) - employees/_temporary/ (intermediate) - employees/_temporary/0/_temporary/ (write location) - Recursive search as fallback Also: - Extract exact filename from write log - Try all locations until we find the file - Show directory listings for debugging This should catch files in their temporary location before Spark moves them!pull/7526/head
1 changed files with 33 additions and 19 deletions
Loading…
Reference in new issue