The issue: files are written under employees/ but are immediately moved or deleted by Spark
Spark's file commit process:
1. Write to: employees/_temporary/0/_temporary/attempt_xxx/part-xxx.parquet
2. Commit/rename to: employees/part-xxx.parquet
3. Clean up: _temporary/ is deleted once the job commits, or when the attempt is aborted on failure
By the time we check employees/, the file is already gone!
Solution: Search multiple locations
- employees/ (final location)
- employees/_temporary/ (intermediate)
- employees/_temporary/0/_temporary/ (write location, parent of the attempt_xxx/ directories)
- Recursive search as fallback
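
A minimal sketch of the search order above, assuming the output sits on a local filesystem reachable through pathlib (HDFS or S3 would need a different client). The function name find_output_file and its parameters are hypothetical, not part of Spark. Because the real write path contains an attempt_xxx/ subdirectory, the recursive fallback is the step most likely to find a file that has not been committed yet.

```python
from pathlib import Path
from typing import Optional


def find_output_file(table_dir: Path, filename: str) -> Optional[Path]:
    """Look for `filename` in the known commit locations, then recursively.

    Hypothetical helper: `table_dir` is the output directory (e.g. employees/).
    """
    # Known locations, checked in order: final, intermediate, write location.
    candidates = [
        table_dir,
        table_dir / "_temporary",
        table_dir / "_temporary" / "0" / "_temporary",
    ]
    for directory in candidates:
        candidate = directory / filename
        if candidate.is_file():
            return candidate

    # Fallback: recursive search, which also covers attempt_xxx/ subdirectories.
    if table_dir.is_dir():
        for match in table_dir.rglob(filename):
            if match.is_file():
                return match
    return None
```

The lookup is still racy: a file found in a temporary location can be renamed or deleted between the check and any later read, so callers should treat the returned path as a hint.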
Also:
- Extract exact filename from write log
- Try all locations until we find the file
- Show directory listings for debugging
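
A rough sketch of the filename extraction and the debug listing. The write log format is not shown here, so the regular expression is an assumption: it simply picks out the first token that looks like part-....parquet. The names extract_part_filename and list_tree are hypothetical, and the listing again assumes a local filesystem.

```python
import re
from pathlib import Path
from typing import List, Optional

# Assumed pattern: any token of the form part-<something>.parquet.
PART_FILE_RE = re.compile(r"part-[\w.\-]+\.parquet")


def extract_part_filename(log_text: str) -> Optional[str]:
    """Return the first part-file name found in the log text, or None."""
    match = PART_FILE_RE.search(log_text)
    return match.group(0) if match else None


def list_tree(root: Path) -> List[str]:
    """Return every path under `root`, relative to it, sorted, for debugging."""
    if not root.is_dir():
        return []
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
```

Printing list_tree(Path("employees")) before and after each lookup attempt shows which stage of the commit the files were in when the check ran.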
This should catch files in their temporary location before Spark moves them!