
spark: fix flaky test by sorting DataFrame before first()

- In testLargeDataset(), add orderBy("value") before calling first()
- Parquet files don't guarantee row order, so first() on an unordered
  DataFrame can return any row, making assertions flaky
- Sorting by 'value' ensures the first row is always the one with
  value=0, making the test deterministic and reliable
pull/7526/head
chrislu 1 week ago
parent commit b35463c8b4
1 changed file (4 changed lines):
test/java/spark/src/test/java/seaweed/spark/SparkReadWriteTest.java

@@ -168,8 +168,8 @@ public class SparkReadWriteTest extends SparkTestBase {
         Dataset<Row> readDf = spark.read().parquet(outputPath);
         assertEquals(10000, readDf.count());

-        // Verify some data
-        Row firstRow = readDf.first();
+        // Verify some data (sort to ensure deterministic order)
+        Row firstRow = readDf.orderBy("value").first();
         assertEquals(0L, firstRow.getLong(0));
         assertEquals(0L, firstRow.getLong(1));
     }
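The idea behind the fix can be illustrated outside Spark. The sketch below is a minimal stand-in, not the actual test: it models a Parquet read as a shuffle of rows (since read order is not guaranteed), shows why taking the first row of unordered data is flaky, and why sorting first makes the assertion deterministic. The row data and shuffle are hypothetical.

```python
import random

# Model rows read back from Parquet as (id, value) pairs. Parquet does
# not guarantee row order across partitions, so simulate an arbitrary
# read order with a shuffle.
rows = [(i, i) for i in range(10000)]
random.shuffle(rows)  # stand-in for nondeterministic read order

# Flaky pattern: first() on unordered data can return any row,
# so asserting it equals (0, 0) fails intermittently.
flaky_first = rows[0]

# Deterministic pattern, mirroring readDf.orderBy("value").first():
# sorting by 'value' guarantees the first row is the one with value=0.
stable_first = sorted(rows, key=lambda r: r[1])[0]
assert stable_first == (0, 0)
print(stable_first)
```

After sorting, the assertion holds on every run regardless of how the shuffle (or the real Parquet reader) ordered the rows.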
