GitHub Actions CI/CD Setup
Overview
The Spark integration tests are now configured to run automatically via GitHub Actions.
Workflow File
Location: .github/workflows/spark-integration-tests.yml
Triggers
The workflow runs automatically on:
- Push to master/main - When code is pushed to main branches
- Pull Requests - When PRs target master/main
- Manual Trigger - Via workflow_dispatch in GitHub UI
The workflow only runs when changes are detected in:
- test/java/spark/**
- other/java/hdfs2/**
- other/java/hdfs3/**
- other/java/client/**
- The workflow file itself
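Assuming standard GitHub Actions syntax, the trigger section of such a workflow might look like the sketch below; the path filters mirror the list above, but the actual file at .github/workflows/spark-integration-tests.yml is authoritative:

```yaml
# Illustrative sketch, not the real workflow file.
on:
  push:
    branches: [master, main]
    paths:
      - 'test/java/spark/**'
      - 'other/java/hdfs2/**'
      - 'other/java/hdfs3/**'
      - 'other/java/client/**'
      - '.github/workflows/spark-integration-tests.yml'
  pull_request:
    branches: [master, main]
    paths:
      - 'test/java/spark/**'
      - 'other/java/hdfs2/**'
      - 'other/java/hdfs3/**'
      - 'other/java/client/**'
      - '.github/workflows/spark-integration-tests.yml'
  workflow_dispatch:
```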
Jobs
Job 1: spark-tests (Required)
Duration: ~5-10 minutes
Steps:
- ✓ Checkout code
- ✓ Setup JDK 11
- ✓ Start SeaweedFS (master, volume, filer)
- ✓ Build project
- ✓ Run all integration tests (10 tests)
- ✓ Upload test results
- ✓ Publish test report
- ✓ Cleanup
Test Coverage:
- SparkReadWriteTest: 6 tests
- SparkSQLTest: 4 tests
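The steps above can be sketched as a job definition roughly like the following; step names, action versions, and the docker-compose invocation are illustrative, not copied from the real workflow:

```yaml
# Hypothetical outline of the spark-tests job.
jobs:
  spark-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: '11'
          cache: maven
      - name: Start SeaweedFS (master, volume, filer)
        working-directory: test/java/spark
        run: docker-compose up -d seaweedfs-master seaweedfs-volume seaweedfs-filer
      - name: Run integration tests
        working-directory: test/java/spark
        run: mvn test -B
        env:
          SEAWEEDFS_TEST_ENABLED: true
```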
Job 2: spark-example (Optional)
Duration: ~5 minutes
Runs: Only on push/manual trigger (not on PRs)
Steps:
- ✓ Checkout code
- ✓ Setup JDK 11
- ✓ Download Apache Spark 3.5.0 (cached)
- ✓ Start SeaweedFS
- ✓ Build project
- ✓ Run example Spark application
- ✓ Verify output
- ✓ Cleanup
Job 3: summary (Status Check)
Duration: < 1 minute
Provides overall test status summary.
Viewing Results
In GitHub UI
- Go to the Actions tab in your GitHub repository
- Click on Spark Integration Tests workflow
- View individual workflow runs
- Check test reports and logs
Status Badge
Add this badge to your README.md to show the workflow status:
[![Spark Integration Tests](https://github.com/seaweedfs/seaweedfs/actions/workflows/spark-integration-tests.yml/badge.svg)](https://github.com/seaweedfs/seaweedfs/actions/workflows/spark-integration-tests.yml)
Test Reports
After each run:
- Test results are uploaded as artifacts (retained for 30 days)
- Detailed JUnit reports are published
- Logs are available for each step
Configuration
Environment Variables
Set in the workflow:
```yaml
env:
  SEAWEEDFS_TEST_ENABLED: true
  SEAWEEDFS_FILER_HOST: localhost
  SEAWEEDFS_FILER_PORT: 8888
  SEAWEEDFS_FILER_GRPC_PORT: 18888
```
Timeout
- spark-tests job: 30 minutes max
- spark-example job: 20 minutes max
Troubleshooting CI Failures
SeaweedFS Connection Issues
Symptom: Tests fail with connection refused
Check:
- View SeaweedFS logs in the workflow output
- Look for "Display SeaweedFS logs on failure" step
- Verify health check succeeded
Solution: The workflow already includes retry logic and health checks
Test Failures
Symptom: Tests pass locally but fail in CI
Check:
- Download test artifacts from the workflow run
- Review detailed surefire reports
- Check for timing issues or resource constraints
Common Issues:
- Docker startup timing (already handled with 30 retries)
- Network issues (retry logic included)
- Resource limits (CI has sufficient memory)
Build Failures
Symptom: Maven build fails
Check:
- Verify dependencies are available
- Check Maven cache
- Review build logs
Example Application Failures
Note: This job is optional and only runs on push/manual trigger
Check:
- Verify Spark was downloaded and cached correctly
- Check spark-submit logs
- Verify SeaweedFS output directory
Manual Workflow Trigger
To manually run the workflow:
- Go to Actions tab
- Select Spark Integration Tests
- Click Run workflow button
- Select branch
- Click Run workflow
This is useful for:
- Testing changes before pushing
- Re-running failed tests
- Testing with different configurations
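Manual runs are enabled by the workflow_dispatch trigger. To support "different configurations", such a trigger can declare optional inputs; the sketch below uses a hypothetical spark_version input for illustration (the real workflow may not define any inputs):

```yaml
on:
  workflow_dispatch:
    inputs:
      spark_version:
        description: 'Spark version to test against (hypothetical input)'
        required: false
        default: '3.5.0'
```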
Local Testing Matching CI
To run tests locally that match the CI environment:
```shell
# Use the same Docker setup as CI
cd test/java/spark
docker-compose up -d seaweedfs-master seaweedfs-volume seaweedfs-filer

# Wait for services (same as CI)
for i in {1..30}; do
  curl -f http://localhost:8888/ && break
  sleep 2
done

# Run tests (same environment variables as CI)
export SEAWEEDFS_TEST_ENABLED=true
export SEAWEEDFS_FILER_HOST=localhost
export SEAWEEDFS_FILER_PORT=8888
export SEAWEEDFS_FILER_GRPC_PORT=18888
mvn test -B

# Cleanup
docker-compose down -v
```
Maintenance
Updating Spark Version
To update to a newer Spark version:
- Update `pom.xml`: change `<spark.version>`
- Update workflow: change the Spark download URL
- Test locally first
- Create PR to test in CI
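The pom.xml change is a single property; a sketch of the relevant fragment (assuming the project follows the common Maven property convention implied by the `<spark.version>` reference above):

```xml
<!-- Illustrative fragment only; the real pom.xml may define more properties. -->
<properties>
  <spark.version>3.5.0</spark.version>
</properties>
```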
Updating Java Version
- Update `pom.xml`: change `<maven.compiler.source>` and `<maven.compiler.target>`
- Update workflow: change the JDK version in the `setup-java` steps
- Test locally
- Update README with new requirements
Adding New Tests
New test classes are automatically discovered and run by the workflow. Just ensure they:
- Extend `SparkTestBase`
- Use `skipIfTestsDisabled()`
- Are in the correct package
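A new test class following these conventions could look like the sketch below. To keep the example self-contained, the project's SparkTestBase is stubbed here as SparkTestBaseStub; the real base class lives in the test sources and uses JUnit's Assume mechanism to skip rather than throwing, so treat this only as an illustration of the pattern:

```java
// Stand-in for the project's SparkTestBase (hypothetical stub for illustration).
abstract class SparkTestBaseStub {
    protected boolean testsEnabled() {
        // Mirrors the gate the CI workflow sets via SEAWEEDFS_TEST_ENABLED.
        return "true".equals(System.getenv("SEAWEEDFS_TEST_ENABLED"));
    }

    protected void skipIfTestsDisabled() {
        // The real base class skips via JUnit Assume; an exception stands in here.
        if (!testsEnabled()) {
            throw new IllegalStateException("SEAWEEDFS_TEST_ENABLED != true");
        }
    }
}

public class MyNewSparkTest extends SparkTestBaseStub {
    // In the real suite this would be a @Test method with Spark assertions.
    public void testSomething() {
        skipIfTestsDisabled();
        // ... Spark read/write assertions would go here ...
    }

    public static void main(String[] args) {
        MyNewSparkTest t = new MyNewSparkTest();
        try {
            t.testSomething();
            System.out.println("ran");
        } catch (IllegalStateException e) {
            System.out.println("skipped");
        }
    }
}
```

Because the base-class gate runs first, the class is safe to add even for contributors without a local SeaweedFS: with the environment variable unset, the test simply skips.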
CI Performance
Typical Run Times
| Job | Duration | Can Fail Build? |
|---|---|---|
| spark-tests | 5-10 min | Yes |
| spark-example | 5 min | No (optional) |
| summary | < 1 min | Only if tests fail |
Optimizations
The workflow includes:
- ✓ Maven dependency caching
- ✓ Spark binary caching
- ✓ Parallel job execution
- ✓ Smart path filtering
- ✓ Docker layer caching
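The two caching optimizations can be expressed with standard actions; the cache key and path below are illustrative, not taken from the real workflow:

```yaml
# Maven dependency caching via setup-java's built-in cache option.
- uses: actions/setup-java@v4
  with:
    distribution: temurin
    java-version: '11'
    cache: maven

# Spark binary caching so the 3.5.0 tarball is only downloaded once.
- uses: actions/cache@v4
  with:
    path: spark-3.5.0-bin-hadoop3.tgz
    key: spark-3.5.0-bin-hadoop3
```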
Resource Usage
- Memory: ~4GB per job
- Disk: ~2GB (cached)
- Network: ~500MB (first run)
Security Considerations
- No secrets required (tests use default ports)
- Runs in isolated Docker environment
- Clean up removes all test data
- No external services accessed
Future Enhancements
Potential improvements:
- Matrix testing (multiple Spark versions)
- Performance benchmarking
- Code coverage reporting
- Integration with larger datasets
- Multi-node Spark cluster testing
Support
If CI tests fail:
- Check workflow logs in GitHub Actions
- Download test artifacts for detailed reports
- Try reproducing locally using the "Local Testing" section above
- Review recent changes in the failing paths
- Check SeaweedFS logs in the workflow output
For persistent issues:
- Open an issue with workflow run link
- Include test failure logs
- Note if it passes locally