
Merge branch 'master' into ec_rebuild_fix_parallelization

pull/7466/head
Chris Lu, 1 month ago, committed by GitHub
parent commit c0c3d6bcc0
  1. 168
      .github/workflows/container_foundationdb_version.yml
  2. 71
      .github/workflows/container_release_foundationdb.yml
  3. 2
      .github/workflows/depsreview.yml
  4. 170
      .github/workflows/java_integration_tests.yml
  5. 64
      .github/workflows/java_unit_tests.yml
  6. 4
      .github/workflows/kafka-tests.yml
  7. 163
      .github/workflows/s3-parquet-tests.yml
  8. 12
      .github/workflows/s3-sse-tests.yml
  9. 11
      .github/workflows/s3tests.yml
  10. 27
      .github/workflows/test-s3-over-https-using-awscli.yml
  11. 242
      BUCKET_POLICY_ENGINE_INTEGRATION.md
  12. 131
      docker/Dockerfile.foundationdb_large
  13. 19
      docker/filer_foundationdb.toml
  14. 61
      docker/get_fdb_checksum.sh
  15. 29
      go.mod
  16. 60
      go.sum
  17. 323
      other/java/client/src/test/java/seaweedfs/client/FilerClientIntegrationTest.java
  18. 417
      other/java/client/src/test/java/seaweedfs/client/SeaweedStreamIntegrationTest.java
  19. 2
      other/java/hdfs-over-ftp/pom.xml
  20. 190
      other/java/hdfs2/README.md
  21. 21
      other/java/hdfs2/pom.xml
  22. 90
      other/java/hdfs2/src/test/java/seaweed/hdfs/SeaweedFileSystemConfigTest.java
  23. 379
      other/java/hdfs2/src/test/java/seaweed/hdfs/SeaweedFileSystemTest.java
  24. 190
      other/java/hdfs3/README.md
  25. 263
      other/java/hdfs3/dependency-reduced-pom.xml
  26. 21
      other/java/hdfs3/pom.xml
  27. 90
      other/java/hdfs3/src/test/java/seaweed/hdfs/SeaweedFileSystemConfigTest.java
  28. 379
      other/java/hdfs3/src/test/java/seaweed/hdfs/SeaweedFileSystemTest.java
  29. 2
      seaweedfs-rdma-sidecar/README.md
  30. 1
      test/erasure_coding/ec_integration_test.go
  31. 77
      test/foundationdb/Dockerfile.build
  32. 84
      test/foundationdb/Dockerfile.build.arm64
  33. 51
      test/foundationdb/Dockerfile.fdb-arm64
  34. 38
      test/foundationdb/Dockerfile.test
  35. 223
      test/foundationdb/Makefile
  36. 134
      test/foundationdb/README.ARM64.md
  37. 372
      test/foundationdb/README.md
  38. 177
      test/foundationdb/docker-compose.arm64.yml
  39. 101
      test/foundationdb/docker-compose.build.yml
  40. 100
      test/foundationdb/docker-compose.simple.yml
  41. 128
      test/foundationdb/docker-compose.yml
  42. 19
      test/foundationdb/filer.toml
  43. 445
      test/foundationdb/foundationdb_concurrent_test.go
  44. 370
      test/foundationdb/foundationdb_integration_test.go
  45. 424
      test/foundationdb/mock_integration_test.go
  46. 31
      test/foundationdb/s3.json
  47. 128
      test/foundationdb/test_fdb_s3.sh
  48. 174
      test/foundationdb/validation_test.go
  49. 109
      test/foundationdb/wait_for_services.sh
  50. 1
      test/fuse_integration/framework.go
  51. 2
      test/kafka/README.md
  52. 12
      test/kafka/go.mod
  53. 32
      test/kafka/go.sum
  54. 6
      test/kafka/kafka-client-loadtest/go.mod
  55. 12
      test/kafka/kafka-client-loadtest/go.sum
  56. 2
      test/kafka/loadtest/run_million_record_test.sh
  57. 2
      test/kafka/loadtest/setup_seaweed_infrastructure.sh
  58. 2
      test/kafka/scripts/test_schema_registry.sh
  59. 4
      test/mq/README.md
  60. 2
      test/s3/copying/Makefile
  61. 3
      test/s3/iam/Makefile
  62. 2
      test/s3/iam/README-Docker.md
  63. 2
      test/s3/iam/README.md
  64. 2
      test/s3/iam/STS_DISTRIBUTED.md
  65. 40
      test/s3/iam/iam_config.github.json
  66. 40
      test/s3/iam/iam_config.json
  67. 40
      test/s3/iam/iam_config.local.json
  68. 14
      test/s3/iam/iam_config_distributed.json
  69. 14
      test/s3/iam/iam_config_docker.json
  70. 16
      test/s3/iam/s3_iam_framework.go
  71. 45
      test/s3/iam/s3_iam_integration_test.go
  72. 34
      test/s3/iam/setup_keycloak_docker.sh
  73. 28
      test/s3/iam/test_config.json
  74. 40
      test/s3/parquet/.gitignore
  75. 172
      test/s3/parquet/CROSS_FILESYSTEM_COMPATIBILITY.md
  76. 58
      test/s3/parquet/FINAL_ROOT_CAUSE_ANALYSIS.md
  77. 70
      test/s3/parquet/MINIO_DIRECTORY_HANDLING.md
  78. 482
      test/s3/parquet/Makefile
  79. 312
      test/s3/parquet/README.md
  80. 46
      test/s3/parquet/TEST_COVERAGE.md
  81. 134
      test/s3/parquet/example_pyarrow_native.py
  82. 41
      test/s3/parquet/parquet_test_utils.py
  83. 7
      test/s3/parquet/requirements.txt
  84. 421
      test/s3/parquet/s3_parquet_test.py
  85. 453
      test/s3/parquet/test_cross_filesystem_compatibility.py
  86. 307
      test/s3/parquet/test_implicit_directory_fix.py
  87. 383
      test/s3/parquet/test_pyarrow_native_s3.py
  88. 254
      test/s3/parquet/test_sse_s3_compatibility.py
  89. 4
      test/s3/sse/Makefile
  90. 104
      test/s3/sse/s3_range_headers_test.go
  91. 445
      test/s3/sse/s3_sse_range_server_test.go
  92. 5
      weed/cluster/cluster.go
  93. 178
      weed/command/backup.go
  94. 5
      weed/command/iam.go
  95. 58
      weed/credential/filer_etc/filer_etc_identity.go
  96. 40
      weed/credential/filer_etc/filer_etc_policy.go
  97. 4
      weed/filer/filer.go
  98. 4
      weed/filer/filer_notify.go
  99. 385
      weed/filer/foundationdb/CONFIGURATION.md
  100. 435
      weed/filer/foundationdb/INSTALL.md

168
.github/workflows/container_foundationdb_version.yml

@ -0,0 +1,168 @@
name: "docker: build foundationdb image by version"
on:
pull_request:
branches: [ master, main ]
paths:
- 'weed/filer/foundationdb/**'
- 'test/foundationdb/**'
- 'docker/Dockerfile.foundationdb_large'
- 'docker/filer_foundationdb.toml'
- '.github/workflows/container_foundationdb_version.yml'
workflow_dispatch:
inputs:
fdb_version:
description: 'FoundationDB version to build (e.g. 7.4.5)'
required: true
default: '7.4.5'
seaweedfs_ref:
description: 'SeaweedFS git tag, branch, or commit to build'
required: true
default: 'master'
image_tag:
description: 'Optional Docker tag suffix (defaults to foundationdb_<fdb>_seaweedfs_<ref>)'
required: false
default: ''
permissions:
contents: read
jobs:
build-foundationdb-image:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install FoundationDB client libraries
run: |
set -euo pipefail
sudo apt-get update
sudo apt-get install -y ca-certificates wget
FDB_VERSION="${{ inputs.fdb_version || '7.4.5' }}"
case "${FDB_VERSION}_amd64" in
"7.4.5_amd64") EXPECTED_SHA256="eea6b98cf386a0848655b2e196d18633662a7440a7ee061c10e32153c7e7e112" ;;
"7.3.43_amd64") EXPECTED_SHA256="c3fa0a59c7355b914a1455dac909238d5ea3b6c6bc7b530af8597e6487c1651a" ;;
*)
echo "Unsupported FoundationDB version ${FDB_VERSION} for CI client install" >&2
exit 1 ;;
esac
PACKAGE="foundationdb-clients_${FDB_VERSION}-1_amd64.deb"
wget --timeout=30 --tries=3 -O "${PACKAGE}" "https://github.com/apple/foundationdb/releases/download/${FDB_VERSION}/${PACKAGE}"
echo "${EXPECTED_SHA256} ${PACKAGE}" | sha256sum -c -
sudo dpkg -i "${PACKAGE}"
rm "${PACKAGE}"
sudo ldconfig
- name: Set up Go
uses: actions/setup-go@v5
with:
go-version-file: go.mod
- name: Run FoundationDB tagged tests
env:
CGO_ENABLED: 1
run: |
go test ./weed/filer/foundationdb -tags foundationdb -count=1
- name: Prepare Docker tag
id: tag
env:
FDB_VERSION_INPUT: ${{ inputs.fdb_version }}
SEAWEEDFS_REF_INPUT: ${{ inputs.seaweedfs_ref }}
CUSTOM_TAG_INPUT: ${{ inputs.image_tag }}
EVENT_NAME: ${{ github.event_name }}
HEAD_REF: ${{ github.head_ref }}
REF_NAME: ${{ github.ref_name }}
run: |
set -euo pipefail
sanitize() {
local value="$1"
value="${value,,}"
value="${value// /-}"
value="${value//[^a-z0-9_.-]/-}"
value="${value#-}"
value="${value%-}"
printf '%s' "$value"
}
version="${FDB_VERSION_INPUT}"
seaweed="${SEAWEEDFS_REF_INPUT}"
tag="${CUSTOM_TAG_INPUT}"
# Use defaults for PR builds
if [ -z "$version" ]; then
version="7.4.5"
fi
if [ -z "$seaweed" ]; then
if [ "$EVENT_NAME" = "pull_request" ]; then
seaweed="${HEAD_REF}"
else
seaweed="${REF_NAME}"
fi
fi
sanitized_version="$(sanitize "$version")"
if [ -z "$sanitized_version" ]; then
echo "Unable to sanitize FoundationDB version '$version'." >&2
exit 1
fi
sanitized_seaweed="$(sanitize "$seaweed")"
if [ -z "$sanitized_seaweed" ]; then
echo "Unable to sanitize SeaweedFS ref '$seaweed'." >&2
exit 1
fi
if [ -z "$tag" ]; then
tag="foundationdb_${sanitized_version}_seaweedfs_${sanitized_seaweed}"
else
tag="$(sanitize "$tag")"
fi
if [ -z "$tag" ]; then
echo "Resulting Docker tag is empty." >&2
exit 1
fi
echo "docker_tag=$tag" >> "$GITHUB_OUTPUT"
echo "full_image=chrislusf/seaweedfs:$tag" >> "$GITHUB_OUTPUT"
echo "seaweedfs_ref=$seaweed" >> "$GITHUB_OUTPUT"
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to Docker Hub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- name: Determine branch to build
id: branch
run: |
if [ -n "${{ inputs.seaweedfs_ref }}" ]; then
echo "branch=${{ inputs.seaweedfs_ref }}" >> "$GITHUB_OUTPUT"
elif [ "${{ github.event_name }}" = "pull_request" ]; then
echo "branch=${{ github.head_ref }}" >> "$GITHUB_OUTPUT"
else
echo "branch=${{ github.ref_name }}" >> "$GITHUB_OUTPUT"
fi
- name: Build and push image
uses: docker/build-push-action@v6
with:
context: ./docker
push: ${{ github.event_name != 'pull_request' }}
file: ./docker/Dockerfile.foundationdb_large
build-args: |
FDB_VERSION=${{ inputs.fdb_version || '7.4.5' }}
BRANCH=${{ steps.branch.outputs.branch }}
# Note: ARM64 support requires FoundationDB ARM64 packages which are not available for all versions
# Currently only building for amd64. To enable ARM64, verify package availability and add checksums.
platforms: linux/amd64
tags: ${{ steps.tag.outputs.full_image || 'seaweedfs:foundationdb-test' }}
labels: |
org.opencontainers.image.title=seaweedfs
org.opencontainers.image.description=SeaweedFS is a distributed storage system for blobs, objects, files, and data lake, to store and serve billions of files fast!
org.opencontainers.image.vendor=Chris Lu

71
.github/workflows/container_release_foundationdb.yml

@ -0,0 +1,71 @@
name: "docker: build release containers for foundationdb"
on:
push:
tags:
- '*'
workflow_dispatch: {}
permissions:
contents: read
jobs:
build-large-release-container_foundationdb:
runs-on: [ubuntu-latest]
steps:
-
name: Checkout
uses: actions/checkout@v4
-
name: Docker meta
id: docker_meta
uses: docker/metadata-action@v5
with:
images: |
chrislusf/seaweedfs
tags: |
type=ref,event=tag,suffix=_large_disk_foundationdb
flavor: |
latest=false
labels: |
org.opencontainers.image.title=seaweedfs
org.opencontainers.image.description=SeaweedFS is a distributed storage system for blobs, objects, files, and data lake, to store and serve billions of files fast!
org.opencontainers.image.vendor=Chris Lu
-
name: Set up QEMU
uses: docker/setup-qemu-action@v3
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Login to Docker Hub
if: github.event_name != 'pull_request'
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
-
name: Determine branch to build
id: branch
run: |
if [ "${{ github.event_name }}" = "push" ] && [ -n "${{ github.ref_name }}" ]; then
echo "branch=${{ github.ref_name }}" >> "$GITHUB_OUTPUT"
else
echo "branch=master" >> "$GITHUB_OUTPUT"
fi
-
name: Build
uses: docker/build-push-action@v6
with:
context: ./docker
push: ${{ github.event_name != 'pull_request' }}
file: ./docker/Dockerfile.foundationdb_large
build-args: |
BRANCH=${{ steps.branch.outputs.branch }}
# Note: ARM64 support requires FoundationDB ARM64 packages which are not available for all versions
platforms: linux/amd64
tags: ${{ steps.docker_meta.outputs.tags }}
labels: ${{ steps.docker_meta.outputs.labels }}

2
.github/workflows/depsreview.yml

@ -11,4 +11,4 @@ jobs:
- name: 'Checkout Repository'
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8
- name: 'Dependency Review'
uses: actions/dependency-review-action@40c09b7dc99638e5ddb0bfd91c1673effc064d8a
uses: actions/dependency-review-action@3c4e3dcb1aa7874d2c16be7d79418e9b7efd6261

170
.github/workflows/java_integration_tests.yml

@ -0,0 +1,170 @@
name: Java Client Integration Tests
on:
push:
branches: [ master ]
paths:
- 'other/java/**'
- 'weed/**'
- '.github/workflows/java_integration_tests.yml'
pull_request:
branches: [ master ]
paths:
- 'other/java/**'
- 'weed/**'
- '.github/workflows/java_integration_tests.yml'
jobs:
test:
name: Java Integration Tests
runs-on: ubuntu-latest
strategy:
matrix:
java: ['11', '17']
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Set up Go
uses: actions/setup-go@v6
with:
go-version-file: 'go.mod'
id: go
- name: Set up Java
uses: actions/setup-java@v4
with:
java-version: ${{ matrix.java }}
distribution: 'temurin'
cache: 'maven'
- name: Build SeaweedFS
run: |
cd weed
go install -buildvcs=false
weed version
- name: Start SeaweedFS Server
run: |
# Create clean data directory
export WEED_DATA_DIR="/tmp/seaweedfs-java-tests-$(date +%s)"
mkdir -p "$WEED_DATA_DIR"
# Start SeaweedFS with optimized settings for CI
weed server -dir="$WEED_DATA_DIR" \
-master.raftHashicorp \
-master.electionTimeout=1s \
-master.volumeSizeLimitMB=100 \
-volume.max=100 \
-volume.preStopSeconds=1 \
-master.peers=none \
-filer -filer.maxMB=64 \
-master.port=9333 \
-volume.port=8080 \
-filer.port=8888 \
-metricsPort=9324 > seaweedfs.log 2>&1 &
SERVER_PID=$!
echo "SERVER_PID=$SERVER_PID" >> $GITHUB_ENV
echo "WEED_DATA_DIR=$WEED_DATA_DIR" >> $GITHUB_ENV
echo "SeaweedFS server started with PID: $SERVER_PID"
- name: Wait for SeaweedFS Components
run: |
echo "Waiting for SeaweedFS components to start..."
# Wait for master
for i in {1..30}; do
if curl -s http://localhost:9333/cluster/status > /dev/null 2>&1; then
echo "✓ Master server is ready"
break
fi
echo "Waiting for master server... ($i/30)"
sleep 2
done
# Wait for volume
for i in {1..30}; do
if curl -s http://localhost:8080/status > /dev/null 2>&1; then
echo "✓ Volume server is ready"
break
fi
echo "Waiting for volume server... ($i/30)"
sleep 2
done
# Wait for filer
for i in {1..30}; do
if curl -s http://localhost:8888/ > /dev/null 2>&1; then
echo "✓ Filer is ready"
break
fi
echo "Waiting for filer... ($i/30)"
sleep 2
done
echo "✓ All SeaweedFS components are ready!"
# Display cluster status
echo "Cluster status:"
curl -s http://localhost:9333/cluster/status | head -20
- name: Build and Install SeaweedFS Client
working-directory: other/java/client
run: |
mvn clean install -DskipTests -Dmaven.javadoc.skip=true -Dgpg.skip=true
- name: Run Client Unit Tests
working-directory: other/java/client
run: |
mvn test -Dtest=SeaweedReadTest,SeaweedCipherTest
- name: Run Client Integration Tests
working-directory: other/java/client
env:
SEAWEEDFS_TEST_ENABLED: true
run: |
mvn test -Dtest=*IntegrationTest
- name: Run HDFS2 Configuration Tests
working-directory: other/java/hdfs2
run: |
mvn test -Dtest=SeaweedFileSystemConfigTest -Dmaven.javadoc.skip=true -Dgpg.skip=true
- name: Run HDFS3 Configuration Tests
working-directory: other/java/hdfs3
run: |
mvn test -Dtest=SeaweedFileSystemConfigTest -Dmaven.javadoc.skip=true -Dgpg.skip=true
- name: Display logs on failure
if: failure()
run: |
echo "=== SeaweedFS Server Log ==="
tail -100 seaweedfs.log || echo "No server log"
echo ""
echo "=== Cluster Status ==="
curl -s http://localhost:9333/cluster/status || echo "Cannot reach cluster"
echo ""
echo "=== Process Status ==="
ps aux | grep weed || echo "No weed processes"
- name: Cleanup
if: always()
run: |
# Stop server using stored PID
if [ -n "$SERVER_PID" ]; then
echo "Stopping SeaweedFS server (PID: $SERVER_PID)"
kill -9 $SERVER_PID 2>/dev/null || true
fi
# Fallback: kill any remaining weed processes
pkill -f "weed server" || true
# Clean up data directory
if [ -n "$WEED_DATA_DIR" ]; then
echo "Cleaning up data directory: $WEED_DATA_DIR"
rm -rf "$WEED_DATA_DIR" || true
fi

64
.github/workflows/java_unit_tests.yml

@ -0,0 +1,64 @@
name: Java Client Unit Tests
on:
push:
branches: [ master ]
paths:
- 'other/java/**'
- '.github/workflows/java_unit_tests.yml'
pull_request:
branches: [ master ]
paths:
- 'other/java/**'
- '.github/workflows/java_unit_tests.yml'
jobs:
test:
name: Java Unit Tests
runs-on: ubuntu-latest
strategy:
matrix:
java: ['8', '11', '17', '21']
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Set up Java
uses: actions/setup-java@v4
with:
java-version: ${{ matrix.java }}
distribution: 'temurin'
cache: 'maven'
- name: Build and Install SeaweedFS Client
working-directory: other/java/client
run: |
mvn clean install -DskipTests -Dmaven.javadoc.skip=true -Dgpg.skip=true
- name: Run Client Unit Tests
working-directory: other/java/client
run: |
mvn test -Dtest=SeaweedReadTest,SeaweedCipherTest
- name: Run HDFS2 Configuration Tests
working-directory: other/java/hdfs2
run: |
mvn test -Dtest=SeaweedFileSystemConfigTest -Dmaven.javadoc.skip=true -Dgpg.skip=true
- name: Run HDFS3 Configuration Tests
working-directory: other/java/hdfs3
run: |
mvn test -Dtest=SeaweedFileSystemConfigTest -Dmaven.javadoc.skip=true -Dgpg.skip=true
- name: Upload Test Reports
if: always()
uses: actions/upload-artifact@v5
with:
name: test-reports-java-${{ matrix.java }}
path: |
other/java/client/target/surefire-reports/
other/java/hdfs2/target/surefire-reports/
other/java/hdfs3/target/surefire-reports/

4
.github/workflows/kafka-tests.yml

@ -176,6 +176,7 @@ jobs:
-filer.port=8888 \
-filer=true \
-metricsPort=9325 \
-master.peers=none \
> /tmp/weed-server.log 2>&1 &
# Wait for master to be ready
@ -353,6 +354,7 @@ jobs:
-filer.port=8888 \
-filer=true \
-metricsPort=9325 \
-master.peers=none \
> /tmp/weed-server.log 2>&1 &
# Wait for master to be ready
@ -512,6 +514,7 @@ jobs:
-filer.port=8888 \
-filer=true \
-metricsPort=9325 \
-master.peers=none \
> /tmp/weed-server.log 2>&1 &
# Wait for master to be ready
@ -668,6 +671,7 @@ jobs:
-filer.port=8888 \
-filer=true \
-metricsPort=9325 \
-master.peers=none \
> /tmp/weed-server.log 2>&1 &
# Wait for master to be ready

163
.github/workflows/s3-parquet-tests.yml

@ -0,0 +1,163 @@
name: "S3 PyArrow Parquet Tests"
on:
push:
branches: [master]
paths:
- 'weed/s3api/**'
- 'weed/filer/**'
- 'test/s3/parquet/**'
- '.github/workflows/s3-parquet-tests.yml'
pull_request:
branches: [master]
paths:
- 'weed/s3api/**'
- 'weed/filer/**'
- 'test/s3/parquet/**'
- '.github/workflows/s3-parquet-tests.yml'
workflow_dispatch:
env:
S3_ACCESS_KEY: some_access_key1
S3_SECRET_KEY: some_secret_key1
S3_ENDPOINT_URL: http://localhost:8333
BUCKET_NAME: test-parquet-bucket
jobs:
parquet-integration-tests:
name: PyArrow Parquet Tests (Python ${{ matrix.python-version }})
runs-on: ubuntu-latest
timeout-minutes: 20
strategy:
fail-fast: false
matrix:
python-version: ['3.9', '3.11', '3.12']
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Set up Go
uses: actions/setup-go@v5
with:
go-version: ^1.24
cache: true
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'
cache-dependency-path: 'test/s3/parquet/requirements.txt'
- name: Install system dependencies
run: |
sudo apt-get update
sudo apt-get install -y lsof netcat-openbsd
- name: Build SeaweedFS
run: |
cd weed
go build -v
sudo cp weed /usr/local/bin/
weed version
- name: Run PyArrow Parquet integration tests
run: |
cd test/s3/parquet
make test-with-server
env:
SEAWEEDFS_BINARY: weed
S3_PORT: 8333
FILER_PORT: 8888
VOLUME_PORT: 8080
MASTER_PORT: 9333
VOLUME_MAX_SIZE_MB: 50
- name: Run implicit directory fix tests
run: |
cd test/s3/parquet
make test-implicit-dir-with-server
env:
SEAWEEDFS_BINARY: weed
S3_PORT: 8333
FILER_PORT: 8888
VOLUME_PORT: 8080
MASTER_PORT: 9333
- name: Run PyArrow native S3 filesystem tests
run: |
cd test/s3/parquet
make test-native-s3-with-server
env:
SEAWEEDFS_BINARY: weed
S3_PORT: 8333
FILER_PORT: 8888
VOLUME_PORT: 8080
MASTER_PORT: 9333
- name: Run cross-filesystem compatibility tests
run: |
cd test/s3/parquet
TEST_QUICK=1 make test-cross-fs-with-server
env:
SEAWEEDFS_BINARY: weed
S3_PORT: 8333
FILER_PORT: 8888
VOLUME_PORT: 8080
MASTER_PORT: 9333
- name: Run SSE-S3 encryption compatibility tests
run: |
cd test/s3/parquet
make test-sse-s3-compat
env:
SEAWEEDFS_BINARY: weed
S3_PORT: 8333
FILER_PORT: 8888
VOLUME_PORT: 8080
MASTER_PORT: 9333
- name: Upload test logs on failure
if: failure()
uses: actions/upload-artifact@v4
with:
name: test-logs-python-${{ matrix.python-version }}
path: |
/tmp/seaweedfs-parquet-*.log
test/s3/parquet/*.log
retention-days: 7
- name: Cleanup
if: always()
run: |
cd test/s3/parquet
make stop-seaweedfs-safe || true
make clean || true
unit-tests:
name: Go Unit Tests (Implicit Directory)
runs-on: ubuntu-latest
timeout-minutes: 10
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Set up Go
uses: actions/setup-go@v5
with:
go-version: ^1.24
cache: true
- name: Run Go unit tests
run: |
cd weed/s3api
go test -v -run TestImplicitDirectory
- name: Run all S3 API tests
run: |
cd weed/s3api
go test -v -timeout 5m

12
.github/workflows/s3-sse-tests.yml

@ -4,6 +4,7 @@ on:
pull_request:
paths:
- 'weed/s3api/s3_sse_*.go'
- 'weed/s3api/s3api_object_handlers.go'
- 'weed/s3api/s3api_object_handlers_put.go'
- 'weed/s3api/s3api_object_handlers_copy*.go'
- 'weed/server/filer_server_handlers_*.go'
@ -14,6 +15,7 @@ on:
branches: [ master, main ]
paths:
- 'weed/s3api/s3_sse_*.go'
- 'weed/s3api/s3api_object_handlers.go'
- 'weed/s3api/s3api_object_handlers_put.go'
- 'weed/s3api/s3api_object_handlers_copy*.go'
- 'weed/server/filer_server_handlers_*.go'
@ -68,11 +70,11 @@ jobs:
# Run tests with automatic server management
# The test-with-server target handles server startup/shutdown automatically
if [ "${{ matrix.test-type }}" = "quick" ]; then
# Quick tests - basic SSE-C and SSE-KMS functionality
make test-with-server TEST_PATTERN="TestSSECIntegrationBasic|TestSSEKMSIntegrationBasic|TestSimpleSSECIntegration"
# Quick tests - basic SSE-C and SSE-KMS functionality + Range requests
make test-with-server TEST_PATTERN="TestSSECIntegrationBasic|TestSSEKMSIntegrationBasic|TestSimpleSSECIntegration|.*RangeRequestsServerBehavior"
else
# Comprehensive tests - SSE-C/KMS functionality, excluding copy operations (pre-existing SSE-C issues)
make test-with-server TEST_PATTERN="TestSSECIntegrationBasic|TestSSECIntegrationVariousDataSizes|TestSSEKMSIntegrationBasic|TestSSEKMSIntegrationVariousDataSizes|.*Multipart.*Integration|TestSimpleSSECIntegration"
make test-with-server TEST_PATTERN="TestSSECIntegrationBasic|TestSSECIntegrationVariousDataSizes|TestSSEKMSIntegrationBasic|TestSSEKMSIntegrationVariousDataSizes|.*Multipart.*Integration|TestSimpleSSECIntegration|.*RangeRequestsServerBehavior"
fi
- name: Show server logs on failure
@ -127,8 +129,8 @@ jobs:
uname -a
free -h
# Run the specific tests that validate AWS S3 SSE compatibility - both SSE-C and SSE-KMS basic functionality
make test-with-server TEST_PATTERN="TestSSECIntegrationBasic|TestSSEKMSIntegrationBasic" || {
# Run the specific tests that validate AWS S3 SSE compatibility - both SSE-C and SSE-KMS basic functionality plus Range requests
make test-with-server TEST_PATTERN="TestSSECIntegrationBasic|TestSSEKMSIntegrationBasic|.*RangeRequestsServerBehavior" || {
echo "❌ SSE compatibility test failed, checking logs..."
if [ -f weed-test.log ]; then
echo "=== Server logs ==="

11
.github/workflows/s3tests.yml

@ -59,12 +59,12 @@ jobs:
# Create clean data directory for this test run
export WEED_DATA_DIR="/tmp/seaweedfs-s3tests-$(date +%s)"
mkdir -p "$WEED_DATA_DIR"
weed -v 0 server -filer -filer.maxMB=64 -s3 -ip.bind 0.0.0.0 \
weed -v 3 server -filer -filer.maxMB=64 -s3 -ip.bind 0.0.0.0 \
-dir="$WEED_DATA_DIR" \
-master.raftHashicorp -master.electionTimeout 1s -master.volumeSizeLimitMB=100 \
-volume.max=100 -volume.preStopSeconds=1 \
-master.port=9333 -volume.port=8080 -filer.port=8888 -s3.port=8000 -metricsPort=9324 \
-s3.allowEmptyFolder=false -s3.allowDeleteBucketNotEmpty=true -s3.config="$GITHUB_WORKSPACE/docker/compose/s3.json" &
-s3.allowEmptyFolder=false -s3.allowDeleteBucketNotEmpty=true -s3.config="$GITHUB_WORKSPACE/docker/compose/s3.json" -master.peers=none &
pid=$!
# Wait for all SeaweedFS components to be ready
@ -368,7 +368,7 @@ jobs:
-master.raftHashicorp -master.electionTimeout 1s -master.volumeSizeLimitMB=100 \
-volume.max=100 -volume.preStopSeconds=1 \
-master.port=9334 -volume.port=8081 -filer.port=8889 -s3.port=8001 -metricsPort=9325 \
-s3.allowEmptyFolder=false -s3.allowDeleteBucketNotEmpty=true -s3.config="$GITHUB_WORKSPACE/docker/compose/s3.json" &
-s3.allowEmptyFolder=false -s3.allowDeleteBucketNotEmpty=true -s3.config="$GITHUB_WORKSPACE/docker/compose/s3.json" -master.peers=none &
pid=$!
# Wait for all SeaweedFS components to be ready
@ -526,7 +526,7 @@ jobs:
-master.raftHashicorp -master.electionTimeout 1s -master.volumeSizeLimitMB=100 \
-volume.max=100 -volume.preStopSeconds=1 \
-master.port=9335 -volume.port=8082 -filer.port=8890 -s3.port=8002 -metricsPort=9326 \
-s3.allowEmptyFolder=false -s3.allowDeleteBucketNotEmpty=true -s3.config="$GITHUB_WORKSPACE/docker/compose/s3.json" &
-s3.allowEmptyFolder=false -s3.allowDeleteBucketNotEmpty=true -s3.config="$GITHUB_WORKSPACE/docker/compose/s3.json" -master.peers=none &
pid=$!
# Wait for all SeaweedFS components to be ready
@ -636,7 +636,7 @@ jobs:
-master.raftHashicorp -master.electionTimeout 1s -master.volumeSizeLimitMB=100 \
-volume.max=100 -volume.preStopSeconds=1 \
-master.port=9336 -volume.port=8083 -filer.port=8891 -s3.port=8003 -metricsPort=9327 \
-s3.allowEmptyFolder=false -s3.allowDeleteBucketNotEmpty=true -s3.config="$GITHUB_WORKSPACE/docker/compose/s3.json" &
-s3.allowEmptyFolder=false -s3.allowDeleteBucketNotEmpty=true -s3.config="$GITHUB_WORKSPACE/docker/compose/s3.json" -master.peers=none &
pid=$!
# Wait for all SeaweedFS components to be ready
@ -818,6 +818,7 @@ jobs:
-volume.max=100 -volume.preStopSeconds=1 \
-master.port=9337 -volume.port=8085 -filer.port=8892 -s3.port=8004 -metricsPort=9328 \
-s3.allowEmptyFolder=false -s3.allowDeleteBucketNotEmpty=true -s3.config="$GITHUB_WORKSPACE/docker/compose/s3.json" \
-master.peers=none \
> /tmp/seaweedfs-sql-server.log 2>&1 &
pid=$!

27
.github/workflows/test-s3-over-https-using-awscli.yml

@ -34,7 +34,7 @@ jobs:
run: |
set -e
mkdir -p /tmp/data
./weed server -s3 -dir=/tmp/data -s3.config=../docker/compose/s3.json &
./weed -v=3 server -s3 -dir=/tmp/data -s3.config=../docker/compose/s3.json -master.peers=none > weed.log 2>&1 &
until curl -s http://localhost:8333/ > /dev/null; do sleep 1; done
- name: Setup Caddy
@ -83,6 +83,29 @@ jobs:
set -e
dd if=/dev/urandom of=generated bs=1M count=32
ETAG=$(aws --no-verify-ssl s3api put-object --bucket bucket --key test-get-obj --body generated | jq -r .ETag)
aws --no-verify-ssl s3api get-object --bucket bucket --key test-get-obj --if-match ${ETAG:1:32} downloaded
# jq -r already removes quotes, so use ETAG directly (handles both simple and multipart ETags)
aws --no-verify-ssl s3api get-object --bucket bucket --key test-get-obj --if-match "$ETAG" downloaded
diff -q generated downloaded
rm -f generated downloaded
- name: Show server logs on failure
if: failure()
run: |
echo "========================================="
echo "SeaweedFS Server Logs"
echo "========================================="
# Note: weed.log is relative to working-directory (weed/)
if [ -f weed.log ]; then
cat weed.log
else
echo "No weed.log file found"
fi
- name: Upload server logs on failure
if: failure()
uses: actions/upload-artifact@v5
with:
name: seaweedfs-logs
# Note: actions don't use defaults.run.working-directory, so path is relative to workspace root
path: weed/weed.log
retention-days: 3

242
BUCKET_POLICY_ENGINE_INTEGRATION.md

@ -0,0 +1,242 @@
# Bucket Policy Engine Integration - Complete
## Summary
Successfully integrated the `policy_engine` package to evaluate bucket policies for **all requests** (both anonymous and authenticated). This provides comprehensive AWS S3-compatible bucket policy support.
## What Changed
### 1. **New File: `s3api_bucket_policy_engine.go`**
Created a wrapper around `policy_engine.PolicyEngine` to (see the sketch after this list):
- Load bucket policies from filer entries
- Sync policies from the bucket config cache
- Evaluate policies for any request (bucket, object, action, principal)
- Return structured results (allowed, evaluated, error)
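A minimal sketch of such a wrapper is shown below; every type, field, and function shape in it is an illustrative assumption, not the actual contents of `s3api_bucket_policy_engine.go`:
```go
// Illustrative sketch only: names and shapes are assumptions, not the real file.
package s3api

import "sync"

// policyFunc stands in for a compiled policy_engine policy.
type policyFunc func(action, resource, principal string) (allowed, matched bool, err error)

// PolicyEvalResult mirrors the "allowed, evaluated, error" result described above.
type PolicyEvalResult struct {
	Allowed   bool
	Evaluated bool
	Err       error
}

type BucketPolicyEngine struct {
	mu       sync.RWMutex
	policies map[string]policyFunc // bucket name -> compiled policy
}

func NewBucketPolicyEngine() *BucketPolicyEngine {
	return &BucketPolicyEngine{policies: make(map[string]policyFunc)}
}

// SyncPolicy installs or replaces a bucket's compiled policy, e.g. when bucket
// configs are loaded or bucket metadata changes.
func (e *BucketPolicyEngine) SyncPolicy(bucket string, p policyFunc) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.policies[bucket] = p
}

// Evaluate checks one request against the bucket's policy, if any.
func (e *BucketPolicyEngine) Evaluate(bucket, object, action, principal string) PolicyEvalResult {
	e.mu.RLock()
	p, ok := e.policies[bucket]
	e.mu.RUnlock()
	if !ok {
		return PolicyEvalResult{} // no policy: caller falls through to IAM checks
	}
	allowed, matched, err := p(action, "arn:aws:s3:::"+bucket+"/"+object, principal)
	return PolicyEvalResult{Allowed: allowed, Evaluated: matched, Err: err}
}
```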
### 2. **Modified: `s3api_server.go`**
- Added `policyEngine *BucketPolicyEngine` field to `S3ApiServer` struct
- Initialized the policy engine in `NewS3ApiServerWithStore()`
- Linked `IdentityAccessManagement` back to `S3ApiServer` for policy evaluation
### 3. **Modified: `auth_credentials.go`**
- Added `s3ApiServer *S3ApiServer` field to `IdentityAccessManagement` struct
- Added `buildPrincipalARN()` helper to convert identities to AWS ARN format
- **Integrated bucket policy evaluation into the authentication flow:**
- Policies are now checked **before** IAM/identity-based permissions
- Explicit `Deny` in bucket policy blocks access immediately
- Explicit `Allow` in bucket policy grants access and **bypasses IAM checks** (enables cross-account access)
- If no policy exists, falls through to normal IAM checks
- Policy evaluation errors result in access denial (fail-close security)
### 4. **Modified: `s3api_bucket_config.go`**
- Added policy engine sync when bucket configs are loaded
- Ensures policies are loaded into the engine for evaluation
### 5. **Modified: `auth_credentials_subscribe.go`**
- Added policy engine sync when bucket metadata changes
- Keeps the policy engine up-to-date via event-driven updates
## How It Works
### Anonymous Requests
```
1. Request comes in (no credentials)
2. Check ACL-based public access → if public, allow
3. Check bucket policy for anonymous ("*") access → if allowed, allow
4. Otherwise, deny
```
### Authenticated Requests (NEW!)
```
1. Request comes in (with credentials)
2. Authenticate user → get Identity
3. Build principal ARN (e.g., "arn:aws:iam::123456:user/bob")
4. Check bucket policy:
- If DENY → reject immediately
- If ALLOW → grant access immediately (bypasses IAM checks)
- If no policy or no matching statements → continue to step 5
5. Check IAM/identity-based permissions (only if not already allowed by bucket policy)
6. Allow or deny based on identity permissions
```
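That ordering boils down to a few branches; the following is a hedged sketch of the control flow only (identifiers are hypothetical, not the actual `auth_credentials.go` code):
```go
// Hypothetical control-flow sketch of the ordering above; not the real auth code.
package s3api

// policyDecision stands in for the bucket policy engine's answer.
type policyDecision int

const (
	policyNoMatch policyDecision = iota // no policy, or no matching statement
	policyAllow
	policyDeny
)

// authorize applies the bucket policy first, then falls back to IAM/identity checks.
func authorize(evalBucketPolicy func() (policyDecision, error), iamAllows func() bool) bool {
	decision, err := evalBucketPolicy()
	if err != nil {
		return false // fail-close on policy evaluation errors
	}
	switch decision {
	case policyDeny:
		return false // explicit Deny rejects immediately
	case policyAllow:
		return true // explicit Allow bypasses identity-based checks
	default:
		return iamAllows() // no policy or no match: normal IAM/identity permissions
	}
}
```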
## Policy Evaluation Flow
```
┌─────────────────────────────────────────────────────────┐
│ Request (GET /bucket/file) │
└───────────────────────────┬─────────────────────────────┘
┌───────────▼──────────┐
│ Authenticate User │
│ (or Anonymous) │
└───────────┬──────────┘
┌───────────▼──────────────────────────────┐
│ Build Principal ARN │
│ - Anonymous: "*" │
│ - User: "arn:aws:iam::123456:user/bob" │
└───────────┬──────────────────────────────┘
┌───────────▼──────────────────────────────┐
│ Evaluate Bucket Policy (PolicyEngine) │
│ - Action: "s3:GetObject" │
│ - Resource: "arn:aws:s3:::bucket/file" │
│ - Principal: (from above) │
└───────────┬──────────────────────────────┘
┌─────────────┼─────────────┐
│ │ │
DENY │ ALLOW │ NO POLICY
│ │ │
▼ ▼ ▼
Reject Request Grant Access Continue
┌───────────────────┘
┌────────────▼─────────────┐
│ IAM/Identity Check │
│ (identity.canDo) │
└────────────┬─────────────┘
┌─────────┴─────────┐
│ │
ALLOW │ DENY │
▼ ▼
Grant Access Reject Request
```
## Example Policies That Now Work
### 1. **Public Read Access** (Anonymous)
```json
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::mybucket/*"
}]
}
```
- Anonymous users can read all objects
- Authenticated users are also evaluated against this policy. If they don't match an explicit `Allow` for this action, they will fall back to their own IAM permissions
### 2. **Grant Access to Specific User** (Authenticated)
```json
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {"AWS": "arn:aws:iam::123456789012:user/bob"},
"Action": ["s3:GetObject", "s3:PutObject"],
"Resource": "arn:aws:s3:::mybucket/shared/*"
}]
}
```
- User "bob" can read/write objects in `/shared/` prefix
- Other users cannot (unless granted by their IAM policies)
### 3. **Deny Access to Specific Path** (Both)
```json
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": "arn:aws:s3:::mybucket/confidential/*"
}]
}
```
- **No one** can access `/confidential/` objects
- Denies override all other allows (AWS policy evaluation rules)
## Performance Characteristics
### Policy Loading
- **Cold start**: Policy loaded from filer → parsed → compiled → cached
- **Warm path**: Policy retrieved from `BucketConfigCache` (already parsed)
- **Updates**: Event-driven sync via metadata subscription (real-time)
### Policy Evaluation
- **Compiled policies**: Pre-compiled regex patterns and matchers
- **Pattern cache**: Regex patterns cached with LRU eviction (max 1000; see the sketch after this list)
- **Fast path**: Common patterns (`*`, exact matches) optimized
- **Case sensitivity**: Actions case-insensitive, resources case-sensitive (AWS-compatible)
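The pattern cache mentioned above can be pictured as a small size-bounded LRU; this is a generic sketch under assumed names, not the `policy_engine` implementation:
```go
// Generic size-bounded LRU sketch for compiled patterns; names are assumptions.
package s3api

import (
	"container/list"
	"regexp"
	"sync"
)

type patternEntry struct {
	pattern string
	re      *regexp.Regexp
}

type patternCache struct {
	mu    sync.Mutex
	max   int
	order *list.List               // front = most recently used
	items map[string]*list.Element // pattern -> element holding *patternEntry
}

func newPatternCache(max int) *patternCache {
	return &patternCache{max: max, order: list.New(), items: make(map[string]*list.Element)}
}

// get returns a cached compiled pattern, compiling and caching it on a miss
// and evicting the least recently used entry once the cache exceeds max.
func (c *patternCache) get(pattern string, compile func(string) (*regexp.Regexp, error)) (*regexp.Regexp, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[pattern]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*patternEntry).re, nil
	}
	re, err := compile(pattern)
	if err != nil {
		return nil, err
	}
	c.items[pattern] = c.order.PushFront(&patternEntry{pattern: pattern, re: re})
	if c.order.Len() > c.max {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*patternEntry).pattern)
	}
	return re, nil
}
```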
### Overhead
- **Anonymous requests**: Minimal (policy already checked, now using compiled engine)
- **Authenticated requests**: ~1-2ms added for policy evaluation (compiled patterns)
- **No policy**: Near-zero overhead (quick indeterminate check)
## Testing
All tests pass:
```bash
✅ TestBucketPolicyValidationBasics
✅ TestPrincipalMatchesAnonymous
✅ TestActionToS3Action
✅ TestResourceMatching
✅ TestMatchesPatternRegexEscaping (security tests)
✅ TestActionMatchingCaseInsensitive
✅ TestResourceMatchingCaseSensitive
✅ All policy_engine package tests (30+ tests)
```
## Security Improvements
1. **Regex Metacharacter Escaping**: Patterns like `*.json` properly match only files ending in `.json` (not `filexjson`); see the sketch after this list
2. **Case-Insensitive Actions**: S3 actions matched case-insensitively per AWS spec
3. **Case-Sensitive Resources**: Resource paths matched case-sensitively for security
4. **Pattern Cache Size Limit**: Prevents DoS attacks via unbounded cache growth
5. **Principal Validation**: Supports `[]string` for manually constructed policies
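For item 1, the usual approach is to escape regex metacharacters before re-introducing the wildcards; the sketch below is a generic illustration, not the `policy_engine` code:
```go
// Generic illustration of safe wildcard-to-regexp conversion; not policy_engine code.
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wildcardToRegexp escapes all regexp metacharacters first, then maps the
// S3-style wildcards back in: '*' -> ".*" and '?' -> ".".
func wildcardToRegexp(pattern string) (*regexp.Regexp, error) {
	escaped := regexp.QuoteMeta(pattern)
	escaped = strings.ReplaceAll(escaped, `\*`, ".*")
	escaped = strings.ReplaceAll(escaped, `\?`, ".")
	return regexp.Compile("^" + escaped + "$")
}

func main() {
	re, err := wildcardToRegexp("*.json")
	if err != nil {
		panic(err)
	}
	fmt.Println(re.MatchString("data.json")) // true
	fmt.Println(re.MatchString("filexjson")) // false: the '.' stays a literal dot
}
```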
## AWS Compatibility
The implementation follows AWS S3 bucket policy evaluation rules:
1. **Explicit Deny** always wins (checked first)
2. **Explicit Allow** grants access (checked second)
3. **Default Deny** if no matching statements (implicit)
4. Bucket policies work alongside IAM policies (both are evaluated)
## Files Changed
```
Modified:
weed/s3api/auth_credentials.go (+47 lines)
weed/s3api/auth_credentials_subscribe.go (+8 lines)
weed/s3api/s3api_bucket_config.go (+8 lines)
weed/s3api/s3api_server.go (+5 lines)
New:
weed/s3api/s3api_bucket_policy_engine.go (115 lines)
```
## Migration Notes
- **Backward Compatible**: Existing setups without bucket policies work unchanged
- **No Breaking Changes**: All existing ACL and IAM-based authorization still works
- **Additive Feature**: Bucket policies are an additional layer of authorization
- **Performance**: Minimal impact on existing workloads
## Future Enhancements
Potential improvements (not implemented yet):
- [ ] Condition support (IP address, time-based, etc.) - already in policy_engine
- [ ] Cross-account policies (different AWS accounts)
- [ ] Policy validation API endpoint
- [ ] Policy simulation/testing tool
- [ ] Metrics for policy evaluations (allow/deny counts)
## Conclusion
Bucket policies now work for **all requests** in SeaweedFS S3 API:
- ✅ Anonymous requests (public access)
- ✅ Authenticated requests (user-specific policies)
- ✅ High performance (compiled policies, caching)
- ✅ AWS-compatible (follows AWS evaluation rules)
- ✅ Secure (proper escaping, case sensitivity)
The integration is complete, tested, and ready for use!

131
docker/Dockerfile.foundationdb_large

@ -0,0 +1,131 @@
FROM golang:1.24 AS builder
RUN apt-get update && \
apt-get install -y build-essential wget ca-certificates && \
rm -rf /var/lib/apt/lists/*
ARG FDB_VERSION=7.4.5
ENV FDB_VERSION=${FDB_VERSION}
ARG TARGETARCH
# Install FoundationDB client libraries with SHA256 checksum verification
# Known SHA256 checksums for FoundationDB client packages (verified 2025-01-19)
# To add checksums for new versions: run docker/get_fdb_checksum.sh <version> <arch>
RUN cd /tmp && \
case "${TARGETARCH}" in \
"amd64") FDB_ARCH="amd64"; PACKAGE_ARCH="amd64" ;; \
"arm64") FDB_ARCH="arm64"; PACKAGE_ARCH="aarch64" ;; \
*) echo "Unsupported architecture: ${TARGETARCH}" >&2; exit 1 ;; \
esac && \
case "${FDB_VERSION}_${FDB_ARCH}" in \
"7.4.5_amd64") \
EXPECTED_SHA256="eea6b98cf386a0848655b2e196d18633662a7440a7ee061c10e32153c7e7e112" ;; \
"7.4.5_arm64") \
EXPECTED_SHA256="f2176b86b7e1b561c3632b4e6e7efb82e3b8f57c2ff0d0ac4671e742867508aa" ;; \
"7.3.43_amd64") \
EXPECTED_SHA256="c3fa0a59c7355b914a1455dac909238d5ea3b6c6bc7b530af8597e6487c1651a" ;; \
"7.3.43_arm64") \
echo "ERROR: FoundationDB ${FDB_VERSION} does not publish arm64 client packages." >&2; \
echo "Please upgrade to 7.4.5+ when targeting arm64." >&2; \
exit 1 ;; \
*) \
echo "ERROR: No checksum available for FDB version ${FDB_VERSION} on ${FDB_ARCH}" >&2; \
echo "This is a security requirement. To add verification:" >&2; \
echo " 1. Run: docker/get_fdb_checksum.sh ${FDB_VERSION} ${FDB_ARCH}" >&2; \
echo " 2. Add the checksum to this Dockerfile" >&2; \
echo "Refusing to proceed without checksum verification." >&2; \
exit 1 ;; \
esac && \
PACKAGE="foundationdb-clients_${FDB_VERSION}-1_${PACKAGE_ARCH}.deb" && \
wget --timeout=30 --tries=3 https://github.com/apple/foundationdb/releases/download/${FDB_VERSION}/${PACKAGE} && \
echo "${EXPECTED_SHA256} ${PACKAGE}" | sha256sum -c - || \
(echo "ERROR: Checksum verification failed for FoundationDB ${FDB_VERSION} (${FDB_ARCH})" >&2; \
echo "Expected: ${EXPECTED_SHA256}" >&2; \
echo "This indicates either a corrupted download or potential tampering." >&2; \
exit 1) && \
dpkg -i ${PACKAGE} && \
rm ${PACKAGE}
# Set up FoundationDB environment variables for CGO
ENV CGO_CFLAGS="-I/usr/include/foundationdb"
ENV CGO_LDFLAGS="-lfdb_c"
# build SeaweedFS sources; prefer local context but fall back to git clone if context only has docker files
ARG SOURCE_REF=master
WORKDIR /go/src/github.com/seaweedfs/seaweedfs
COPY . .
RUN set -euo pipefail && \
if [ ! -d weed ]; then \
echo "Local build context does not include SeaweedFS sources; cloning ${SOURCE_REF}" >&2; \
mkdir -p /tmp/local-context && cp -a /go/src/github.com/seaweedfs/seaweedfs/. /tmp/local-context && \
cd / && rm -rf /go/src/github.com/seaweedfs/seaweedfs && \
git clone --depth 1 --branch ${SOURCE_REF} https://github.com/seaweedfs/seaweedfs /go/src/github.com/seaweedfs/seaweedfs && \
cp -a /tmp/local-context/. /go/src/github.com/seaweedfs/seaweedfs/docker/ && \
rm -rf /tmp/local-context && \
cd /go/src/github.com/seaweedfs/seaweedfs; \
fi && \
cd weed \
&& COMMIT_SHA=$(git rev-parse --short HEAD 2>/dev/null || echo "unknown") \
&& export LDFLAGS="-X github.com/seaweedfs/seaweedfs/weed/util/version.COMMIT=${COMMIT_SHA}" \
&& go install -tags "5BytesOffset foundationdb" -ldflags "${LDFLAGS}"
FROM debian:bookworm-slim AS final
LABEL author="Chris Lu"
# Install runtime dependencies first
RUN apt-get update && \
apt-get install -y --no-install-recommends \
ca-certificates \
fuse \
wget && \
rm -rf /var/lib/apt/lists/*
# Reuse FoundationDB artifacts installed during the build stage
COPY --from=builder /usr/lib/libfdb_c* /usr/lib/
COPY --from=builder /usr/lib/foundationdb /usr/lib/foundationdb
COPY --from=builder /usr/bin/fdb* /usr/bin/
RUN ldconfig
# Copy SeaweedFS binary and configuration
COPY --from=builder /go/bin/weed /usr/bin/
RUN mkdir -p /etc/seaweedfs
COPY --from=builder /go/src/github.com/seaweedfs/seaweedfs/docker/filer_foundationdb.toml /etc/seaweedfs/filer.toml
COPY --from=builder /go/src/github.com/seaweedfs/seaweedfs/docker/entrypoint.sh /entrypoint.sh
# Create non-root user
RUN groupadd -g 1000 seaweed && \
useradd -u 1000 -g seaweed -s /bin/bash -m seaweed
# volume server grpc port
EXPOSE 18080
# volume server http port
EXPOSE 8080
# filer server grpc port
EXPOSE 18888
# filer server http port
EXPOSE 8888
# master server shared grpc port
EXPOSE 19333
# master server shared http port
EXPOSE 9333
# s3 server http port
EXPOSE 8333
# webdav server http port
EXPOSE 7333
# Create data directory and set proper ownership for seaweed user
RUN mkdir -p /data && \
chown -R seaweed:seaweed /data && \
chown -R seaweed:seaweed /etc/seaweedfs && \
chmod 755 /entrypoint.sh
VOLUME /data
WORKDIR /data
# Switch to non-root user
USER seaweed
ENTRYPOINT ["/entrypoint.sh"]

19
docker/filer_foundationdb.toml

@ -0,0 +1,19 @@
[filer.options]
# with http DELETE, by default the filer would check whether a folder is empty.
# recursive_delete will delete all sub folders and files, similar to "rm -Rf"
recursive_delete = false
####################################################
# FoundationDB store
####################################################
[foundationdb]
enabled = true
cluster_file = "/etc/foundationdb/fdb.cluster"
api_version = 740
# Optional: timeout for FDB operations (default: 10s)
# timeout = "10s"
# Optional: max retry delay for retryable errors (default: 1s)
# max_retry_delay = "1s"
# Optional: directory prefix for storing SeaweedFS data (default: "seaweedfs")
# directory_prefix = "seaweedfs"

61
docker/get_fdb_checksum.sh

@ -0,0 +1,61 @@
#!/bin/bash
# Helper script to get SHA256 checksum for FoundationDB client package
# Usage: ./get_fdb_checksum.sh <version> [arch]
# Example: ./get_fdb_checksum.sh 7.4.5 amd64
# Example: ./get_fdb_checksum.sh 7.4.5 arm64
set -euo pipefail
if [ $# -lt 1 ] || [ $# -gt 2 ]; then
echo "Usage: $0 <fdb_version> [arch]" >&2
echo "Example: $0 7.4.5" >&2
echo "Example: $0 7.4.5 arm64" >&2
exit 1
fi
FDB_VERSION="$1"
FDB_ARCH="${2:-amd64}"
case "$FDB_ARCH" in
"amd64")
CANONICAL_ARCH="amd64"
PACKAGE_ARCH="amd64"
;;
"arm64"|"aarch64")
CANONICAL_ARCH="arm64"
PACKAGE_ARCH="aarch64"
;;
*)
echo "Error: Architecture must be 'amd64', 'arm64', or 'aarch64'" >&2
exit 1
;;
esac
PACKAGE="foundationdb-clients_${FDB_VERSION}-1_${PACKAGE_ARCH}.deb"
URL="https://github.com/apple/foundationdb/releases/download/${FDB_VERSION}/${PACKAGE}"
echo "Downloading FoundationDB ${FDB_VERSION} client package for ${FDB_ARCH}..."
echo "URL: ${URL}"
echo ""
# Download to temp directory
TEMP_DIR=$(mktemp -d)
trap 'rm -rf "${TEMP_DIR}"' EXIT
cd "${TEMP_DIR}"
if wget --timeout=30 --tries=3 -q "${URL}"; then
CHECKSUM=$(sha256sum "${PACKAGE}" | awk '{print $1}')
echo "✓ Download successful"
echo ""
echo "SHA256 Checksum:"
echo "${CHECKSUM}"
echo ""
echo "Add this to Dockerfile.foundationdb_large:"
echo " \"${FDB_VERSION}_${CANONICAL_ARCH}\") \\"
echo " EXPECTED_SHA256=\"${CHECKSUM}\" ;; \\"
else
echo "✗ Failed to download package from ${URL}" >&2
echo "Please verify the version number, architecture, and URL" >&2
exit 1
fi

29
go.mod

@ -53,7 +53,7 @@ require (
github.com/klauspost/compress v1.18.1
github.com/klauspost/reedsolomon v1.12.5
github.com/kurin/blazer v0.5.3
github.com/linxGnu/grocksdb v1.10.2
github.com/linxGnu/grocksdb v1.10.3
github.com/mailru/easyjson v0.7.7 // indirect
github.com/mattn/go-isatty v0.0.20 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
@ -90,20 +90,20 @@ require (
github.com/xdg-go/scram v1.1.2 // indirect
github.com/xdg-go/stringprep v1.0.4 // indirect
github.com/youmark/pkcs8 v0.0.0-20240726163527-a2c0da244d78 // indirect
go.etcd.io/etcd/client/v3 v3.6.5
go.etcd.io/etcd/client/v3 v3.6.6
go.mongodb.org/mongo-driver v1.17.6
go.opencensus.io v0.24.0 // indirect
gocloud.dev v0.43.0
gocloud.dev/pubsub/natspubsub v0.43.0
gocloud.dev/pubsub/rabbitpubsub v0.43.0
golang.org/x/crypto v0.43.0
golang.org/x/crypto v0.45.0
golang.org/x/exp v0.0.0-20250811191247-51f88131bc50
golang.org/x/image v0.32.0
golang.org/x/net v0.46.0
golang.org/x/image v0.33.0
golang.org/x/net v0.47.0
golang.org/x/oauth2 v0.30.0 // indirect
golang.org/x/sys v0.38.0
golang.org/x/text v0.30.0 // indirect
golang.org/x/tools v0.37.0 // indirect
golang.org/x/text v0.31.0 // indirect
golang.org/x/tools v0.38.0 // indirect
golang.org/x/xerrors v0.0.0-20240903120638-7835f813f4da // indirect
google.golang.org/api v0.247.0
google.golang.org/genproto v0.0.0-20250715232539-7130f93afb79 // indirect
@ -123,6 +123,7 @@ require (
github.com/Jille/raft-grpc-transport v1.6.1
github.com/ThreeDotsLabs/watermill v1.5.1
github.com/a-h/templ v0.3.943
github.com/apple/foundationdb/bindings/go v0.0.0-20240515141816-262c6fe778ad
github.com/arangodb/go-driver v1.6.7
github.com/armon/go-metrics v0.4.1
github.com/aws/aws-sdk-go-v2 v1.39.5
@ -131,12 +132,12 @@ require (
github.com/aws/aws-sdk-go-v2/service/s3 v1.89.1
github.com/cognusion/imaging v1.0.2
github.com/fluent/fluent-logger-golang v1.10.1
github.com/getsentry/sentry-go v0.36.1
github.com/getsentry/sentry-go v0.38.0
github.com/gin-contrib/sessions v1.0.4
github.com/gin-gonic/gin v1.11.0
github.com/golang-jwt/jwt/v5 v5.3.0
github.com/google/flatbuffers/go v0.0.0-20230108230133-3b8644d32c50
github.com/hanwen/go-fuse/v2 v2.8.0
github.com/hanwen/go-fuse/v2 v2.9.0
github.com/hashicorp/raft v1.7.3
github.com/hashicorp/raft-boltdb/v2 v2.3.1
github.com/hashicorp/vault/api v1.22.0
@ -159,9 +160,9 @@ require (
github.com/xeipuuv/gojsonschema v1.2.0
github.com/ydb-platform/ydb-go-sdk-auth-environ v0.5.0
github.com/ydb-platform/ydb-go-sdk/v3 v3.113.5
go.etcd.io/etcd/client/pkg/v3 v3.6.5
go.etcd.io/etcd/client/pkg/v3 v3.6.6
go.uber.org/atomic v1.11.0
golang.org/x/sync v0.17.0
golang.org/x/sync v0.18.0
golang.org/x/tools/godoc v0.1.0-deprecated
google.golang.org/grpc/security/advancedtls v1.0.0
)
@ -220,7 +221,7 @@ require (
go.uber.org/mock v0.5.0 // indirect
go.yaml.in/yaml/v2 v2.4.2 // indirect
go.yaml.in/yaml/v3 v3.0.4 // indirect
golang.org/x/mod v0.28.0 // indirect
golang.org/x/mod v0.29.0 // indirect
gonum.org/v1/gonum v0.16.0 // indirect
)
@ -432,7 +433,7 @@ require (
github.com/zeebo/blake3 v0.2.4 // indirect
github.com/zeebo/errs v1.4.0 // indirect
go.etcd.io/bbolt v1.4.2 // indirect
go.etcd.io/etcd/api/v3 v3.6.5 // indirect
go.etcd.io/etcd/api/v3 v3.6.6 // indirect
go.opentelemetry.io/auto/sdk v1.1.0 // indirect
go.opentelemetry.io/contrib/detectors/gcp v1.37.0 // indirect
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.62.0 // indirect
@ -445,7 +446,7 @@ require (
go.uber.org/multierr v1.11.0 // indirect
go.uber.org/zap v1.27.0 // indirect
golang.org/x/arch v0.20.0 // indirect
golang.org/x/term v0.36.0 // indirect
golang.org/x/term v0.37.0 // indirect
golang.org/x/time v0.12.0 // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20250818200422-3122310a409c // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20250818200422-3122310a409c // indirect

60
go.sum

@ -651,6 +651,10 @@ github.com/andybalholm/cascadia v1.3.3/go.mod h1:xNd9bqTn98Ln4DwST8/nG+H0yuB8Hmg
github.com/antihax/optional v1.0.0/go.mod h1:uupD/76wgC+ih3iEmQUL+0Ugr19nfwCT1kdvxnR2qWY=
github.com/apache/arrow/go/v10 v10.0.1/go.mod h1:YvhnlEePVnBS4+0z3fhPfUy7W1Ikj0Ih0vcRo/gZ1M0=
github.com/apache/thrift v0.16.0/go.mod h1:PHK3hniurgQaNMZYaCLEqXKsYK8upmhPbmdP2FXSqgU=
github.com/apple/foundationdb/bindings/go v0.0.0-20240515141816-262c6fe778ad h1:fQBkhYv86zyW95PWhzBlkgz3NoY1ue0L+8oYBaoCMbg=
github.com/apple/foundationdb/bindings/go v0.0.0-20240515141816-262c6fe778ad/go.mod h1:OMVSB21p9+xQUIqlGizHPZfjK+SHws1ht+ZytVDoz9U=
github.com/apple/foundationdb/bindings/go v0.0.0-20250828195015-ba4c89167099 h1:rLHyln0+S1BNj6RgMo1t5uyB8qoCDhgt/P1Z6tdc5rE=
github.com/apple/foundationdb/bindings/go v0.0.0-20250828195015-ba4c89167099/go.mod h1:OMVSB21p9+xQUIqlGizHPZfjK+SHws1ht+ZytVDoz9U=
github.com/appscode/go-querystring v0.0.0-20170504095604-0126cfb3f1dc h1:LoL75er+LKDHDUfU5tRvFwxH0LjPpZN8OoG8Ll+liGU=
github.com/appscode/go-querystring v0.0.0-20170504095604-0126cfb3f1dc/go.mod h1:w648aMHEgFYS6xb0KVMMtZ2uMeemhiKCuD2vj6gY52A=
github.com/arangodb/go-driver v1.6.7 h1:9FBUsH60cKu7DjFGozTsaqWMy+3UeEplplqUn4yEcg4=
@ -923,8 +927,8 @@ github.com/gabriel-vasile/mimetype v1.4.9 h1:5k+WDwEsD9eTLL8Tz3L0VnmVh9QxGjRmjBv
github.com/gabriel-vasile/mimetype v1.4.9/go.mod h1:WnSQhFKJuBlRyLiKohA/2DtIlPFAbguNaG7QCHcyGok=
github.com/geoffgarside/ber v1.2.0 h1:/loowoRcs/MWLYmGX9QtIAbA+V/FrnVLsMMPhwiRm64=
github.com/geoffgarside/ber v1.2.0/go.mod h1:jVPKeCbj6MvQZhwLYsGwaGI52oUorHoHKNecGT85ZCc=
github.com/getsentry/sentry-go v0.36.1 h1:kMJt0WWsxWATUxkvFgVBZdIeHSk/Oiv5P0jZ9e5m/Lw=
github.com/getsentry/sentry-go v0.36.1/go.mod h1:p5Im24mJBeruET8Q4bbcMfCQ+F+Iadc4L48tB1apo2c=
github.com/getsentry/sentry-go v0.38.0 h1:S8Xui7gLeAvXINVLMOaX94HnsDf1GexnfXGSNC4+KQs=
github.com/getsentry/sentry-go v0.38.0/go.mod h1:eRXCoh3uvmjQLY6qu63BjUZnaBu5L5WhMV1RwYO8W5s=
github.com/ghodss/yaml v1.0.0/go.mod h1:4dBDuWmgqj2HViK6kFavaiC9ZROes6MMH2rRYeMEF04=
github.com/gin-contrib/sessions v1.0.4 h1:ha6CNdpYiTOK/hTp05miJLbpTSNfOnFg5Jm2kbcqy8U=
github.com/gin-contrib/sessions v1.0.4/go.mod h1:ccmkrb2z6iU2osiAHZG3x3J4suJK+OU27oqzlWOqQgs=
@ -1184,8 +1188,8 @@ github.com/grpc-ecosystem/grpc-gateway/v2 v2.27.1 h1:X5VWvz21y3gzm9Nw/kaUeku/1+u
github.com/grpc-ecosystem/grpc-gateway/v2 v2.27.1/go.mod h1:Zanoh4+gvIgluNqcfMVTJueD4wSS5hT7zTt4Mrutd90=
github.com/hailocab/go-hostpool v0.0.0-20160125115350-e80d13ce29ed h1:5upAirOpQc1Q53c0bnx2ufif5kANL7bfZWcc6VJWJd8=
github.com/hailocab/go-hostpool v0.0.0-20160125115350-e80d13ce29ed/go.mod h1:tMWxXQ9wFIaZeTI9F+hmhFiGpFmhOHzyShyFUhRm0H4=
github.com/hanwen/go-fuse/v2 v2.8.0 h1:wV8rG7rmCz8XHSOwBZhG5YcVqcYjkzivjmbaMafPlAs=
github.com/hanwen/go-fuse/v2 v2.8.0/go.mod h1:yE6D2PqWwm3CbYRxFXV9xUd8Md5d6NG0WBs5spCswmI=
github.com/hanwen/go-fuse/v2 v2.9.0 h1:0AOGUkHtbOVeyGLr0tXupiid1Vg7QB7M6YUcdmVdC58=
github.com/hanwen/go-fuse/v2 v2.9.0/go.mod h1:yE6D2PqWwm3CbYRxFXV9xUd8Md5d6NG0WBs5spCswmI=
github.com/hashicorp/errwrap v1.0.0/go.mod h1:YH+1FKiLXxHSkmPseP+kNlulaMuP3n2brvKWEqk/Jc4=
github.com/hashicorp/errwrap v1.1.0 h1:OxrOeh75EUXMY8TBjag2fzXGZ40LB6IKw45YeGUDY2I=
github.com/hashicorp/errwrap v1.1.0/go.mod h1:YH+1FKiLXxHSkmPseP+kNlulaMuP3n2brvKWEqk/Jc4=
@ -1363,8 +1367,8 @@ github.com/lib/pq v1.10.9 h1:YXG7RB+JIjhP29X+OtkiDnYaXQwpS4JEWq7dtCCRUEw=
github.com/lib/pq v1.10.9/go.mod h1:AlVN5x4E4T544tWzH6hKfbfQvm3HdbOxrmggDNAPY9o=
github.com/linkedin/goavro/v2 v2.14.0 h1:aNO/js65U+Mwq4yB5f1h01c3wiM458qtRad1DN0CMUI=
github.com/linkedin/goavro/v2 v2.14.0/go.mod h1:KXx+erlq+RPlGSPmLF7xGo6SAbh8sCQ53x064+ioxhk=
github.com/linxGnu/grocksdb v1.10.2 h1:y0dXsWYULY15/BZMcwAZzLd13ZuyA470vyoNzWwmqG0=
github.com/linxGnu/grocksdb v1.10.2/go.mod h1:C3CNe9UYc9hlEM2pC82AqiGS3LRW537u9LFV4wIZuHk=
github.com/linxGnu/grocksdb v1.10.3 h1:0laII9AQ6kFxo5SjhdTfSh9EgF20piD6TMHK6YuDm+4=
github.com/linxGnu/grocksdb v1.10.3/go.mod h1:OLQKZwiKwaJiAVCsOzWKvwiLwfZ5Vz8Md5TYR7t7pM8=
github.com/lithammer/shortuuid/v3 v3.0.7 h1:trX0KTHy4Pbwo/6ia8fscyHoGA+mf1jWbPJVuvyJQQ8=
github.com/lithammer/shortuuid/v3 v3.0.7/go.mod h1:vMk8ke37EmiewwolSO1NLW8vP4ZaKlRuDIi8tWWmAts=
github.com/lpar/date v1.0.0 h1:bq/zVqFTUmsxvd/CylidY4Udqpr9BOFrParoP6p0x/I=
@ -1813,12 +1817,12 @@ go.einride.tech/aip v0.73.0 h1:bPo4oqBo2ZQeBKo4ZzLb1kxYXTY1ysJhpvQyfuGzvps=
go.einride.tech/aip v0.73.0/go.mod h1:Mj7rFbmXEgw0dq1dqJ7JGMvYCZZVxmGOR3S4ZcV5LvQ=
go.etcd.io/bbolt v1.4.2 h1:IrUHp260R8c+zYx/Tm8QZr04CX+qWS5PGfPdevhdm1I=
go.etcd.io/bbolt v1.4.2/go.mod h1:Is8rSHO/b4f3XigBC0lL0+4FwAQv3HXEEIgFMuKHceM=
go.etcd.io/etcd/api/v3 v3.6.5 h1:pMMc42276sgR1j1raO/Qv3QI9Af/AuyQUW6CBAWuntA=
go.etcd.io/etcd/api/v3 v3.6.5/go.mod h1:ob0/oWA/UQQlT1BmaEkWQzI0sJ1M0Et0mMpaABxguOQ=
go.etcd.io/etcd/client/pkg/v3 v3.6.5 h1:Duz9fAzIZFhYWgRjp/FgNq2gO1jId9Yae/rLn3RrBP8=
go.etcd.io/etcd/client/pkg/v3 v3.6.5/go.mod h1:8Wx3eGRPiy0qOFMZT/hfvdos+DjEaPxdIDiCDUv/FQk=
go.etcd.io/etcd/client/v3 v3.6.5 h1:yRwZNFBx/35VKHTcLDeO7XVLbCBFbPi+XV4OC3QJf2U=
go.etcd.io/etcd/client/v3 v3.6.5/go.mod h1:ZqwG/7TAFZ0BJ0jXRPoJjKQJtbFo/9NIY8uoFFKcCyo=
go.etcd.io/etcd/api/v3 v3.6.6 h1:mcaMp3+7JawWv69p6QShYWS8cIWUOl32bFLb6qf8pOQ=
go.etcd.io/etcd/api/v3 v3.6.6/go.mod h1:f/om26iXl2wSkcTA1zGQv8reJRSLVdoEBsi4JdfMrx4=
go.etcd.io/etcd/client/pkg/v3 v3.6.6 h1:uoqgzSOv2H9KlIF5O1Lsd8sW+eMLuV6wzE3q5GJGQNs=
go.etcd.io/etcd/client/pkg/v3 v3.6.6/go.mod h1:YngfUVmvsvOJ2rRgStIyHsKtOt9SZI2aBJrZiWJhCbI=
go.etcd.io/etcd/client/v3 v3.6.6 h1:G5z1wMf5B9SNexoxOHUGBaULurOZPIgGPsW6CN492ec=
go.etcd.io/etcd/client/v3 v3.6.6/go.mod h1:36Qv6baQ07znPR3+n7t+Rk5VHEzVYPvFfGmfF4wBHV8=
go.mongodb.org/mongo-driver v1.17.6 h1:87JUG1wZfWsr6rIz3ZmpH90rL5tea7O3IHuSwHUpsss=
go.mongodb.org/mongo-driver v1.17.6/go.mod h1:Hy04i7O2kC4RS06ZrhPRqj/u4DTYkFDAAccj+rVKqgQ=
go.opencensus.io v0.21.0/go.mod h1:mSImk1erAIZhrmZN+AvHh14ztQfjbGwt4TtuofqLduU=
@ -1915,8 +1919,8 @@ golang.org/x/crypto v0.22.0/go.mod h1:vr6Su+7cTlO45qkww3VDJlzDn0ctJvRgYbC2NvXHt+
golang.org/x/crypto v0.23.0/go.mod h1:CKFgDieR+mRhux2Lsu27y0fO304Db0wZe70UKqHu0v8=
golang.org/x/crypto v0.31.0/go.mod h1:kDsLvtWBEx7MV9tJOj9bnXsPbxwJQ6csT/x4KIN4Ssk=
golang.org/x/crypto v0.33.0/go.mod h1:bVdXmD7IV/4GdElGPozy6U7lWdRXA4qyRVGJV57uQ5M=
golang.org/x/crypto v0.43.0 h1:dduJYIi3A3KOfdGOHX8AVZ/jGiyPa3IbBozJ5kNuE04=
golang.org/x/crypto v0.43.0/go.mod h1:BFbav4mRNlXJL4wNeejLpWxB7wMbc79PdRGhWKncxR0=
golang.org/x/crypto v0.45.0 h1:jMBrvKuj23MTlT0bQEOBcAE0mjg8mK9RXFhRH6nyF3Q=
golang.org/x/crypto v0.45.0/go.mod h1:XTGrrkGJve7CYK7J8PEww4aY7gM3qMCElcJQ8n8JdX4=
golang.org/x/exp v0.0.0-20180321215751-8460e604b9de/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA=
golang.org/x/exp v0.0.0-20180807140117-3d87b88a115f/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA=
golang.org/x/exp v0.0.0-20190121172915-509febef88a4/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA=
@ -1947,8 +1951,8 @@ golang.org/x/image v0.0.0-20210607152325-775e3b0c77b9/go.mod h1:023OzeP/+EPmXeap
golang.org/x/image v0.0.0-20210628002857-a66eb6448b8d/go.mod h1:023OzeP/+EPmXeapQh35lcL3II3LrY8Ic+EFFKVhULM=
golang.org/x/image v0.0.0-20211028202545-6944b10bf410/go.mod h1:023OzeP/+EPmXeapQh35lcL3II3LrY8Ic+EFFKVhULM=
golang.org/x/image v0.0.0-20220302094943-723b81ca9867/go.mod h1:023OzeP/+EPmXeapQh35lcL3II3LrY8Ic+EFFKVhULM=
golang.org/x/image v0.32.0 h1:6lZQWq75h7L5IWNk0r+SCpUJ6tUVd3v4ZHnbRKLkUDQ=
golang.org/x/image v0.32.0/go.mod h1:/R37rrQmKXtO6tYXAjtDLwQgFLHmhW+V6ayXlxzP2Pc=
golang.org/x/image v0.33.0 h1:LXRZRnv1+zGd5XBUVRFmYEphyyKJjQjCRiOuAP3sZfQ=
golang.org/x/image v0.33.0/go.mod h1:DD3OsTYT9chzuzTQt+zMcOlBHgfoKQb1gry8p76Y1sc=
golang.org/x/lint v0.0.0-20181026193005-c67002cb31c3/go.mod h1:UVdnD1Gm6xHRNCYTkRU2/jEulfH38KcIWyp/GAMgvoE=
golang.org/x/lint v0.0.0-20190227174305-5b3e6a55c961/go.mod h1:wehouNa3lNwaWXcvxsM5YxQ5yQlVC4a0KAMCusXpPoU=
golang.org/x/lint v0.0.0-20190301231843-5614ed5bae6f/go.mod h1:UVdnD1Gm6xHRNCYTkRU2/jEulfH38KcIWyp/GAMgvoE=
@ -1983,8 +1987,8 @@ golang.org/x/mod v0.13.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/mod v0.14.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/mod v0.15.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/mod v0.17.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/mod v0.28.0 h1:gQBtGhjxykdjY9YhZpSlZIsbnaE2+PgjfLWUQTnoZ1U=
golang.org/x/mod v0.28.0/go.mod h1:yfB/L0NOf/kmEbXjzCPOx1iK1fRutOydrCMsqRhEBxI=
golang.org/x/mod v0.29.0 h1:HV8lRxZC4l2cr3Zq1LvtOsi/ThTgWnUk/y64QSs8GwA=
golang.org/x/mod v0.29.0/go.mod h1:NyhrlYXJ2H4eJiRy/WDBO6HMqZQ6q9nk4JzS3NuCK+w=
golang.org/x/net v0.0.0-20180724234803-3673e40ba225/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20180826012351-8a410e7b638d/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20180906233101-161cd47e91fd/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
@ -2054,8 +2058,8 @@ golang.org/x/net v0.20.0/go.mod h1:z8BVo6PvndSri0LbOE3hAn0apkU+1YvI6E70E9jsnvY=
golang.org/x/net v0.21.0/go.mod h1:bIjVDfnllIU7BJ2DNgfnXvpSvtn8VRwhlsaeUTyUS44=
golang.org/x/net v0.25.0/go.mod h1:JkAGAh7GEvH74S6FOH42FLoXpXbE/aqXSrIQjXgsiwM=
golang.org/x/net v0.33.0/go.mod h1:HXLR5J+9DxmrqMwG9qjGCxZ+zKXxBru04zlTvWlWuN4=
golang.org/x/net v0.46.0 h1:giFlY12I07fugqwPuWJi68oOnpfqFnJIJzaIIm2JVV4=
golang.org/x/net v0.46.0/go.mod h1:Q9BGdFy1y4nkUwiLvT5qtyhAnEHgnQ/zd8PfU6nc210=
golang.org/x/net v0.47.0 h1:Mx+4dIFzqraBXUugkia1OOvlD6LemFo1ALMHjrXDOhY=
golang.org/x/net v0.47.0/go.mod h1:/jNxtkgq5yWUGYkaZGqo27cfGZ1c5Nen03aYrrKpVRU=
golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U=
golang.org/x/oauth2 v0.0.0-20190226205417-e64efc72b421/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw=
golang.org/x/oauth2 v0.0.0-20190604053449-0f29369cfe45/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw=
@ -2108,8 +2112,8 @@ golang.org/x/sync v0.6.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sync v0.7.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sync v0.10.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sync v0.11.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sync v0.17.0 h1:l60nONMj9l5drqw6jlhIELNv9I0A4OFgRsG9k2oT9Ug=
golang.org/x/sync v0.17.0/go.mod h1:9KTHXmSnoGruLpwFjVSX0lNNA75CykiMECbovNTZqGI=
golang.org/x/sync v0.18.0 h1:kr88TuHDroi+UVf+0hZnirlk8o8T+4MrK6mr60WkH/I=
golang.org/x/sync v0.18.0/go.mod h1:9KTHXmSnoGruLpwFjVSX0lNNA75CykiMECbovNTZqGI=
golang.org/x/sys v0.0.0-20180810173357-98c5dad5d1a0/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20180830151530-49385e6e1522/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20180905080454-ebe1bf3edb33/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
@ -2235,8 +2239,8 @@ golang.org/x/term v0.19.0/go.mod h1:2CuTdWZ7KHSQwUzKva0cbMg6q2DMI3Mmxp+gKJbskEk=
golang.org/x/term v0.20.0/go.mod h1:8UkIAJTvZgivsXaD6/pH6U9ecQzZ45awqEOzuCvwpFY=
golang.org/x/term v0.27.0/go.mod h1:iMsnZpn0cago0GOrHO2+Y7u7JPn5AylBrcoWkElMTSM=
golang.org/x/term v0.29.0/go.mod h1:6bl4lRlvVuDgSf3179VpIxBF0o10JUpXWOnI7nErv7s=
golang.org/x/term v0.36.0 h1:zMPR+aF8gfksFprF/Nc/rd1wRS1EI6nDBGyWAvDzx2Q=
golang.org/x/term v0.36.0/go.mod h1:Qu394IJq6V6dCBRgwqshf3mPF85AqzYEzofzRdZkWss=
golang.org/x/term v0.37.0 h1:8EGAD0qCmHYZg6J17DvsMy9/wJ7/D/4pV/wfnld5lTU=
golang.org/x/term v0.37.0/go.mod h1:5pB4lxRNYYVZuTLmy8oR2BH8dflOR+IbTYFD8fi3254=
golang.org/x/text v0.0.0-20170915032832-14c0d48ead0c/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.1-0.20180807135948-17ff2d5776d2/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
@ -2258,8 +2262,8 @@ golang.org/x/text v0.14.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU=
golang.org/x/text v0.15.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU=
golang.org/x/text v0.21.0/go.mod h1:4IBbMaMmOPCJ8SecivzSH54+73PCFmPWxNTLm+vZkEQ=
golang.org/x/text v0.22.0/go.mod h1:YRoo4H8PVmsu+E3Ou7cqLVH8oXWIHVoX0jqUWALQhfY=
golang.org/x/text v0.30.0 h1:yznKA/E9zq54KzlzBEAWn1NXSQ8DIp/NYMy88xJjl4k=
golang.org/x/text v0.30.0/go.mod h1:yDdHFIX9t+tORqspjENWgzaCVXgk0yYnYuSZ8UzzBVM=
golang.org/x/text v0.31.0 h1:aC8ghyu4JhP8VojJ2lEHBnochRno1sgL6nEi9WGFGMM=
golang.org/x/text v0.31.0/go.mod h1:tKRAlv61yKIjGGHX/4tP1LTbc13YSec1pxVEWXzfoeM=
golang.org/x/time v0.0.0-20181108054448-85acf8d2951c/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
golang.org/x/time v0.0.0-20190308202827-9d24e82272b4/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
golang.org/x/time v0.0.0-20191024005414-555d28b269f0/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
@ -2339,8 +2343,8 @@ golang.org/x/tools v0.13.0/go.mod h1:HvlwmtVNQAhOuCjW7xxvovg8wbNq7LwfXh/k7wXUl58
golang.org/x/tools v0.14.0/go.mod h1:uYBEerGOWcJyEORxN+Ek8+TT266gXkNlHdJBwexUsBg=
golang.org/x/tools v0.17.0/go.mod h1:xsh6VxdV005rRVaS6SSAf9oiAqljS7UZUacMZ8Bnsps=
golang.org/x/tools v0.21.1-0.20240508182429-e35e4ccd0d2d/go.mod h1:aiJjzUbINMkxbQROHiO6hDPo2LHcIPhhQsa9DLh0yGk=
golang.org/x/tools v0.37.0 h1:DVSRzp7FwePZW356yEAChSdNcQo6Nsp+fex1SUW09lE=
golang.org/x/tools v0.37.0/go.mod h1:MBN5QPQtLMHVdvsbtarmTNukZDdgwdwlO5qGacAzF0w=
golang.org/x/tools v0.38.0 h1:Hx2Xv8hISq8Lm16jvBZ2VQf+RLmbd7wVUsALibYI/IQ=
golang.org/x/tools v0.38.0/go.mod h1:yEsQ/d/YK8cjh0L6rZlY8tgtlKiBNTL14pGDJPJpYQs=
golang.org/x/tools/godoc v0.1.0-deprecated h1:o+aZ1BOj6Hsx/GBdJO/s815sqftjSnrZZwyYTHODvtk=
golang.org/x/tools/godoc v0.1.0-deprecated/go.mod h1:qM63CriJ961IHWmnWa9CjZnBndniPt4a3CK0PVB9bIg=
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=

323
other/java/client/src/test/java/seaweedfs/client/FilerClientIntegrationTest.java

@ -0,0 +1,323 @@
package seaweedfs.client;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import static org.junit.Assert.*;
/**
* Integration tests for FilerClient.
*
* These tests verify FilerClient operations against a running SeaweedFS filer
* instance.
*
* Prerequisites:
* - SeaweedFS master, volume server, and filer must be running
* - Default ports: filer HTTP 8888, filer gRPC 18888
*
* To run tests:
* export SEAWEEDFS_TEST_ENABLED=true
* mvn test -Dtest=FilerClientIntegrationTest
*/
public class FilerClientIntegrationTest {
private FilerClient filerClient;
private static final String TEST_ROOT = "/test-client-integration";
private static final boolean TESTS_ENABLED = "true".equalsIgnoreCase(System.getenv("SEAWEEDFS_TEST_ENABLED"));
@Before
public void setUp() throws Exception {
if (!TESTS_ENABLED) {
return;
}
filerClient = new FilerClient("localhost", 18888);
// Clean up any existing test directory
if (filerClient.exists(TEST_ROOT)) {
filerClient.rm(TEST_ROOT, true, true);
}
// Create test root directory
filerClient.mkdirs(TEST_ROOT, 0755);
}
@After
public void tearDown() throws Exception {
if (!TESTS_ENABLED || filerClient == null) {
return;
}
try {
// Clean up test directory
if (filerClient.exists(TEST_ROOT)) {
filerClient.rm(TEST_ROOT, true, true);
}
} finally {
filerClient.shutdown();
}
}
@Test
public void testMkdirs() {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
String testDir = TEST_ROOT + "/testdir";
boolean success = filerClient.mkdirs(testDir, 0755);
assertTrue("Directory creation should succeed", success);
assertTrue("Directory should exist", filerClient.exists(testDir));
}
@Test
public void testTouch() {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
String testFile = TEST_ROOT + "/testfile.txt";
boolean success = filerClient.touch(testFile, 0644);
assertTrue("Touch should succeed", success);
assertTrue("File should exist", filerClient.exists(testFile));
}
@Test
public void testExists() {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
assertTrue("Root should exist", filerClient.exists("/"));
assertTrue("Test root should exist", filerClient.exists(TEST_ROOT));
assertFalse("Non-existent path should not exist",
filerClient.exists(TEST_ROOT + "/nonexistent"));
}
@Test
public void testListEntries() {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
// Create some test files and directories
filerClient.touch(TEST_ROOT + "/file1.txt", 0644);
filerClient.touch(TEST_ROOT + "/file2.txt", 0644);
filerClient.mkdirs(TEST_ROOT + "/subdir", 0755);
List<FilerProto.Entry> entries = filerClient.listEntries(TEST_ROOT);
assertNotNull("Entries should not be null", entries);
assertEquals("Should have 3 entries", 3, entries.size());
}
@Test
public void testListEntriesWithPrefix() {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
// Create test files
filerClient.touch(TEST_ROOT + "/test1.txt", 0644);
filerClient.touch(TEST_ROOT + "/test2.txt", 0644);
filerClient.touch(TEST_ROOT + "/other.txt", 0644);
List<FilerProto.Entry> entries = filerClient.listEntries(TEST_ROOT, "test", "", 100, false);
assertNotNull("Entries should not be null", entries);
assertEquals("Should have 2 entries starting with 'test'", 2, entries.size());
}
@Test
public void testDeleteFile() {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
String testFile = TEST_ROOT + "/deleteme.txt";
filerClient.touch(testFile, 0644);
assertTrue("File should exist before delete", filerClient.exists(testFile));
boolean success = filerClient.rm(testFile, false, true);
assertTrue("Delete should succeed", success);
assertFalse("File should not exist after delete", filerClient.exists(testFile));
}
@Test
public void testDeleteDirectoryRecursive() {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
String testDir = TEST_ROOT + "/deletedir";
filerClient.mkdirs(testDir, 0755);
filerClient.touch(testDir + "/file.txt", 0644);
assertTrue("Directory should exist", filerClient.exists(testDir));
assertTrue("File should exist", filerClient.exists(testDir + "/file.txt"));
boolean success = filerClient.rm(testDir, true, true);
assertTrue("Delete should succeed", success);
assertFalse("Directory should not exist after delete", filerClient.exists(testDir));
}
@Test
public void testRename() {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
String srcFile = TEST_ROOT + "/source.txt";
String dstFile = TEST_ROOT + "/destination.txt";
filerClient.touch(srcFile, 0644);
assertTrue("Source file should exist", filerClient.exists(srcFile));
boolean success = filerClient.mv(srcFile, dstFile);
assertTrue("Rename should succeed", success);
assertFalse("Source file should not exist after rename", filerClient.exists(srcFile));
assertTrue("Destination file should exist after rename", filerClient.exists(dstFile));
}
@Test
public void testGetEntry() {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
String testFile = TEST_ROOT + "/getentry.txt";
filerClient.touch(testFile, 0644);
FilerProto.Entry entry = filerClient.lookupEntry(TEST_ROOT, "getentry.txt");
assertNotNull("Entry should not be null", entry);
assertEquals("Entry name should match", "getentry.txt", entry.getName());
assertFalse("Entry should not be a directory", entry.getIsDirectory());
}
@Test
public void testGetEntryForDirectory() {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
String testDir = TEST_ROOT + "/testsubdir";
filerClient.mkdirs(testDir, 0755);
FilerProto.Entry entry = filerClient.lookupEntry(TEST_ROOT, "testsubdir");
assertNotNull("Entry should not be null", entry);
assertEquals("Entry name should match", "testsubdir", entry.getName());
assertTrue("Entry should be a directory", entry.getIsDirectory());
}
@Test
public void testCreateAndListNestedDirectories() {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
String nestedPath = TEST_ROOT + "/level1/level2/level3";
boolean success = filerClient.mkdirs(nestedPath, 0755);
assertTrue("Nested directory creation should succeed", success);
assertTrue("Nested directory should exist", filerClient.exists(nestedPath));
// Verify each level exists
assertTrue("Level 1 should exist", filerClient.exists(TEST_ROOT + "/level1"));
assertTrue("Level 2 should exist", filerClient.exists(TEST_ROOT + "/level1/level2"));
assertTrue("Level 3 should exist", filerClient.exists(nestedPath));
}
@Test
public void testMultipleFilesInDirectory() {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
String testDir = TEST_ROOT + "/multifiles";
filerClient.mkdirs(testDir, 0755);
// Create 10 files
for (int i = 0; i < 10; i++) {
filerClient.touch(testDir + "/file" + i + ".txt", 0644);
}
List<FilerProto.Entry> entries = filerClient.listEntries(testDir);
assertNotNull("Entries should not be null", entries);
assertEquals("Should have 10 files", 10, entries.size());
}
@Test
public void testRenameDirectory() {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
String srcDir = TEST_ROOT + "/sourcedir";
String dstDir = TEST_ROOT + "/destdir";
filerClient.mkdirs(srcDir, 0755);
filerClient.touch(srcDir + "/file.txt", 0644);
boolean success = filerClient.mv(srcDir, dstDir);
assertTrue("Directory rename should succeed", success);
assertFalse("Source directory should not exist", filerClient.exists(srcDir));
assertTrue("Destination directory should exist", filerClient.exists(dstDir));
assertTrue("File should exist in destination", filerClient.exists(dstDir + "/file.txt"));
}
@Test
public void testLookupNonExistentEntry() {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
FilerProto.Entry entry = filerClient.lookupEntry(TEST_ROOT, "nonexistent.txt");
assertNull("Entry for non-existent file should be null", entry);
}
@Test
public void testEmptyDirectory() {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
String emptyDir = TEST_ROOT + "/emptydir";
filerClient.mkdirs(emptyDir, 0755);
List<FilerProto.Entry> entries = filerClient.listEntries(emptyDir);
assertNotNull("Entries should not be null", entries);
assertTrue("Empty directory should have no entries", entries.isEmpty());
}
}

417
other/java/client/src/test/java/seaweedfs/client/SeaweedStreamIntegrationTest.java

@ -0,0 +1,417 @@
package seaweedfs.client;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Random;
import static org.junit.Assert.*;
/**
* Integration tests for SeaweedInputStream and SeaweedOutputStream.
*
* These tests verify stream operations against a running SeaweedFS instance.
*
* Prerequisites:
* - SeaweedFS master, volume server, and filer must be running
* - Default ports: filer HTTP 8888, filer gRPC 18888
*
* To run tests:
* export SEAWEEDFS_TEST_ENABLED=true
* mvn test -Dtest=SeaweedStreamIntegrationTest
*/
public class SeaweedStreamIntegrationTest {
private FilerClient filerClient;
private static final String TEST_ROOT = "/test-stream-integration";
private static final boolean TESTS_ENABLED =
"true".equalsIgnoreCase(System.getenv("SEAWEEDFS_TEST_ENABLED"));
@Before
public void setUp() throws Exception {
if (!TESTS_ENABLED) {
return;
}
filerClient = new FilerClient("localhost", 18888);
// Clean up any existing test directory
if (filerClient.exists(TEST_ROOT)) {
filerClient.rm(TEST_ROOT, true, true);
}
// Create test root directory
filerClient.mkdirs(TEST_ROOT, 0755);
}
@After
public void tearDown() throws Exception {
if (!TESTS_ENABLED || filerClient == null) {
return;
}
try {
// Clean up test directory
if (filerClient.exists(TEST_ROOT)) {
filerClient.rm(TEST_ROOT, true, true);
}
} finally {
filerClient.shutdown();
}
}
@Test
public void testWriteAndReadSmallFile() throws IOException {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
String testPath = TEST_ROOT + "/small.txt";
String testContent = "Hello, SeaweedFS!";
// Write file
SeaweedOutputStream outputStream = new SeaweedOutputStream(filerClient, testPath);
outputStream.write(testContent.getBytes(StandardCharsets.UTF_8));
outputStream.close();
// Verify file exists
assertTrue("File should exist", filerClient.exists(testPath));
// Read file
FilerProto.Entry entry = filerClient.lookupEntry(
SeaweedOutputStream.getParentDirectory(testPath),
SeaweedOutputStream.getFileName(testPath)
);
assertNotNull("Entry should not be null", entry);
SeaweedInputStream inputStream = new SeaweedInputStream(filerClient, testPath, entry);
byte[] buffer = new byte[testContent.length()];
int bytesRead = inputStream.read(buffer);
inputStream.close();
assertEquals("Should read all bytes", testContent.length(), bytesRead);
assertEquals("Content should match", testContent, new String(buffer, StandardCharsets.UTF_8));
}
@Test
public void testWriteAndReadLargeFile() throws IOException {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
String testPath = TEST_ROOT + "/large.bin";
int fileSize = 10 * 1024 * 1024; // 10 MB
// Generate random data
byte[] originalData = new byte[fileSize];
new Random(42).nextBytes(originalData); // Use seed for reproducibility
// Write file
SeaweedOutputStream outputStream = new SeaweedOutputStream(filerClient, testPath);
outputStream.write(originalData);
outputStream.close();
// Verify file exists
assertTrue("File should exist", filerClient.exists(testPath));
// Read file
FilerProto.Entry entry = filerClient.lookupEntry(
SeaweedOutputStream.getParentDirectory(testPath),
SeaweedOutputStream.getFileName(testPath)
);
assertNotNull("Entry should not be null", entry);
SeaweedInputStream inputStream = new SeaweedInputStream(filerClient, testPath, entry);
// Read file in chunks to handle large files properly
byte[] readData = new byte[fileSize];
int totalRead = 0;
int bytesRead;
byte[] buffer = new byte[8192]; // Read in 8KB chunks
while ((bytesRead = inputStream.read(buffer)) > 0) {
System.arraycopy(buffer, 0, readData, totalRead, bytesRead);
totalRead += bytesRead;
}
inputStream.close();
assertEquals("Should read all bytes", fileSize, totalRead);
assertArrayEquals("Content should match", originalData, readData);
}
@Test
public void testWriteInChunks() throws IOException {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
String testPath = TEST_ROOT + "/chunked.txt";
String[] chunks = {"First chunk. ", "Second chunk. ", "Third chunk."};
// Write file in chunks
SeaweedOutputStream outputStream = new SeaweedOutputStream(filerClient, testPath);
for (String chunk : chunks) {
outputStream.write(chunk.getBytes(StandardCharsets.UTF_8));
}
outputStream.close();
// Read and verify
FilerProto.Entry entry = filerClient.lookupEntry(
SeaweedOutputStream.getParentDirectory(testPath),
SeaweedOutputStream.getFileName(testPath)
);
SeaweedInputStream inputStream = new SeaweedInputStream(filerClient, testPath, entry);
byte[] buffer = new byte[1024];
int bytesRead = inputStream.read(buffer);
inputStream.close();
String expected = String.join("", chunks);
String actual = new String(buffer, 0, bytesRead, StandardCharsets.UTF_8);
assertEquals("Content should match", expected, actual);
}
@Test
public void testReadWithOffset() throws IOException {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
String testPath = TEST_ROOT + "/offset.txt";
String testContent = "0123456789ABCDEFGHIJ";
// Write file
SeaweedOutputStream outputStream = new SeaweedOutputStream(filerClient, testPath);
outputStream.write(testContent.getBytes(StandardCharsets.UTF_8));
outputStream.close();
// Read with offset
FilerProto.Entry entry = filerClient.lookupEntry(
SeaweedOutputStream.getParentDirectory(testPath),
SeaweedOutputStream.getFileName(testPath)
);
SeaweedInputStream inputStream = new SeaweedInputStream(filerClient, testPath, entry);
inputStream.seek(10); // Skip first 10 bytes
byte[] buffer = new byte[10];
int bytesRead = inputStream.read(buffer);
inputStream.close();
assertEquals("Should read 10 bytes", 10, bytesRead);
assertEquals("Should read from offset", "ABCDEFGHIJ",
new String(buffer, StandardCharsets.UTF_8));
}
@Test
public void testReadPartial() throws IOException {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
String testPath = TEST_ROOT + "/partial.txt";
String testContent = "The quick brown fox jumps over the lazy dog";
// Write file
SeaweedOutputStream outputStream = new SeaweedOutputStream(filerClient, testPath);
outputStream.write(testContent.getBytes(StandardCharsets.UTF_8));
outputStream.close();
// Read partial
FilerProto.Entry entry = filerClient.lookupEntry(
SeaweedOutputStream.getParentDirectory(testPath),
SeaweedOutputStream.getFileName(testPath)
);
SeaweedInputStream inputStream = new SeaweedInputStream(filerClient, testPath, entry);
// Read only "quick brown"
inputStream.seek(4);
byte[] buffer = new byte[11];
int bytesRead = inputStream.read(buffer);
inputStream.close();
assertEquals("Should read 11 bytes", 11, bytesRead);
assertEquals("Should read partial content", "quick brown",
new String(buffer, StandardCharsets.UTF_8));
}
@Test
public void testEmptyFile() throws IOException {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
String testPath = TEST_ROOT + "/empty.txt";
// Write empty file
SeaweedOutputStream outputStream = new SeaweedOutputStream(filerClient, testPath);
outputStream.close();
// Verify file exists
assertTrue("File should exist", filerClient.exists(testPath));
// Read empty file
FilerProto.Entry entry = filerClient.lookupEntry(
SeaweedOutputStream.getParentDirectory(testPath),
SeaweedOutputStream.getFileName(testPath)
);
assertNotNull("Entry should not be null", entry);
SeaweedInputStream inputStream = new SeaweedInputStream(filerClient, testPath, entry);
byte[] buffer = new byte[100];
int bytesRead = inputStream.read(buffer);
inputStream.close();
assertEquals("Should read 0 bytes from empty file", -1, bytesRead);
}
@Test
public void testOverwriteFile() throws IOException {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
String testPath = TEST_ROOT + "/overwrite.txt";
String originalContent = "Original content";
String newContent = "New content that overwrites the original";
// Write original file
SeaweedOutputStream outputStream = new SeaweedOutputStream(filerClient, testPath);
outputStream.write(originalContent.getBytes(StandardCharsets.UTF_8));
outputStream.close();
// Overwrite file
outputStream = new SeaweedOutputStream(filerClient, testPath);
outputStream.write(newContent.getBytes(StandardCharsets.UTF_8));
outputStream.close();
// Read and verify
FilerProto.Entry entry = filerClient.lookupEntry(
SeaweedOutputStream.getParentDirectory(testPath),
SeaweedOutputStream.getFileName(testPath)
);
SeaweedInputStream inputStream = new SeaweedInputStream(filerClient, testPath, entry);
byte[] buffer = new byte[1024];
int bytesRead = inputStream.read(buffer);
inputStream.close();
String actual = new String(buffer, 0, bytesRead, StandardCharsets.UTF_8);
assertEquals("Should have new content", newContent, actual);
}
@Test
public void testMultipleReads() throws IOException {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
String testPath = TEST_ROOT + "/multireads.txt";
String testContent = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
// Write file
SeaweedOutputStream outputStream = new SeaweedOutputStream(filerClient, testPath);
outputStream.write(testContent.getBytes(StandardCharsets.UTF_8));
outputStream.close();
// Read in multiple small chunks
FilerProto.Entry entry = filerClient.lookupEntry(
SeaweedOutputStream.getParentDirectory(testPath),
SeaweedOutputStream.getFileName(testPath)
);
SeaweedInputStream inputStream = new SeaweedInputStream(filerClient, testPath, entry);
StringBuilder result = new StringBuilder();
byte[] buffer = new byte[5];
int bytesRead;
while ((bytesRead = inputStream.read(buffer)) > 0) {
result.append(new String(buffer, 0, bytesRead, StandardCharsets.UTF_8));
}
inputStream.close();
assertEquals("Should read entire content", testContent, result.toString());
}
@Test
public void testBinaryData() throws IOException {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
String testPath = TEST_ROOT + "/binary.bin";
byte[] binaryData = new byte[256];
for (int i = 0; i < 256; i++) {
binaryData[i] = (byte) i;
}
// Write binary file
SeaweedOutputStream outputStream = new SeaweedOutputStream(filerClient, testPath);
outputStream.write(binaryData);
outputStream.close();
// Read and verify
FilerProto.Entry entry = filerClient.lookupEntry(
SeaweedOutputStream.getParentDirectory(testPath),
SeaweedOutputStream.getFileName(testPath)
);
SeaweedInputStream inputStream = new SeaweedInputStream(filerClient, testPath, entry);
byte[] readData = new byte[256];
int bytesRead = inputStream.read(readData);
inputStream.close();
assertEquals("Should read all bytes", 256, bytesRead);
assertArrayEquals("Binary data should match", binaryData, readData);
}
@Test
public void testFlush() throws IOException {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
String testPath = TEST_ROOT + "/flush.txt";
String testContent = "Content to flush";
// Write file with flush
SeaweedOutputStream outputStream = new SeaweedOutputStream(filerClient, testPath);
outputStream.write(testContent.getBytes(StandardCharsets.UTF_8));
outputStream.flush(); // Explicitly flush
outputStream.close();
// Verify file was written
assertTrue("File should exist after flush", filerClient.exists(testPath));
// Read and verify
FilerProto.Entry entry = filerClient.lookupEntry(
SeaweedOutputStream.getParentDirectory(testPath),
SeaweedOutputStream.getFileName(testPath)
);
SeaweedInputStream inputStream = new SeaweedInputStream(filerClient, testPath, entry);
byte[] buffer = new byte[testContent.length()];
int bytesRead = inputStream.read(buffer);
inputStream.close();
assertEquals("Content should match", testContent,
new String(buffer, 0, bytesRead, StandardCharsets.UTF_8));
}
}

2
other/java/hdfs-over-ftp/pom.xml

@ -36,7 +36,7 @@
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.2.4</version>
<version>3.4.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>

190
other/java/hdfs2/README.md

@ -0,0 +1,190 @@
# SeaweedFS Hadoop2 Client
Hadoop FileSystem implementation for SeaweedFS, compatible with Hadoop 2.x/3.x.
## Building
```bash
mvn clean install
```
## Testing
This project includes two types of tests:
### 1. Configuration Tests (No SeaweedFS Required)
These tests verify configuration handling and initialization logic without requiring a running SeaweedFS instance:
```bash
mvn test -Dtest=SeaweedFileSystemConfigTest
```
### 2. Integration Tests (Requires SeaweedFS)
These tests verify actual FileSystem operations against a running SeaweedFS instance.
#### Prerequisites
1. Start SeaweedFS with default ports:
```bash
# Terminal 1: Start master
weed master
# Terminal 2: Start volume server
weed volume -mserver=localhost:9333
# Terminal 3: Start filer
weed filer -master=localhost:9333
```
2. Verify services are running:
- Master: http://localhost:9333
- Filer HTTP: http://localhost:8888
- Filer gRPC: localhost:18888
#### Running Integration Tests
```bash
# Enable integration tests
export SEAWEEDFS_TEST_ENABLED=true
# Run all tests
mvn test
# Run specific test
mvn test -Dtest=SeaweedFileSystemTest
```
### Test Configuration
Integration tests can be configured via environment variables or system properties:
- `SEAWEEDFS_TEST_ENABLED`: Set to `true` to enable integration tests (default: false)
- Tests use these default connection settings:
- Filer Host: localhost
- Filer HTTP Port: 8888
- Filer gRPC Port: 18888
### Running Tests with Custom Configuration
To test against a different SeaweedFS instance, modify the test code or use Hadoop configuration:
```java
conf.set("fs.seaweed.filer.host", "your-host");
conf.setInt("fs.seaweed.filer.port", 8888);
conf.setInt("fs.seaweed.filer.port.grpc", 18888);
```
## Test Coverage
The test suite covers:
- **Configuration & Initialization**
- URI parsing and configuration
- Default values
- Configuration overrides
- Working directory management
- **File Operations**
- Create files
- Read files
- Write files
- Append to files
- Delete files
- **Directory Operations**
- Create directories
- List directory contents
- Delete directories (recursive and non-recursive)
- **Metadata Operations**
- Get file status
- Set permissions
- Set owner/group
- Rename files and directories
## Usage in Hadoop
1. Copy the built JAR to your Hadoop classpath:
```bash
cp target/seaweedfs-hadoop2-client-*.jar $HADOOP_HOME/share/hadoop/common/lib/
```
2. Configure `core-site.xml`:
```xml
<configuration>
<property>
<name>fs.seaweedfs.impl</name>
<value>seaweed.hdfs.SeaweedFileSystem</value>
</property>
<property>
<name>fs.seaweed.filer.host</name>
<value>localhost</value>
</property>
<property>
<name>fs.seaweed.filer.port</name>
<value>8888</value>
</property>
<property>
<name>fs.seaweed.filer.port.grpc</name>
<value>18888</value>
</property>
</configuration>
```
3. Use SeaweedFS with Hadoop commands:
```bash
hadoop fs -ls seaweedfs://localhost:8888/
hadoop fs -mkdir seaweedfs://localhost:8888/test
hadoop fs -put local.txt seaweedfs://localhost:8888/test/
```
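The same settings can also be used programmatically through the standard Hadoop `FileSystem` API. The following is an illustrative sketch only (the example class name and target path are placeholders); it applies the properties from the `core-site.xml` above and assumes the hadoop2 client JAR is on the classpath:
```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SeaweedFsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same settings as in core-site.xml above
        conf.set("fs.seaweedfs.impl", "seaweed.hdfs.SeaweedFileSystem");
        conf.set("fs.seaweed.filer.host", "localhost");
        conf.setInt("fs.seaweed.filer.port", 8888);
        conf.setInt("fs.seaweed.filer.port.grpc", 18888);

        try (FileSystem fs = FileSystem.get(URI.create("seaweedfs://localhost:8888/"), conf)) {
            Path path = new Path("/test/hello.txt");
            // Create (overwrite) the file and write a small payload
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write("Hello from the Hadoop API".getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("Wrote " + fs.getFileStatus(path).getLen() + " bytes to " + path);
        }
    }
}
```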
## Continuous Integration
For CI environments, tests can be run in two modes:
1. **Configuration Tests Only** (default, no SeaweedFS required):
```bash
mvn test -Dtest=SeaweedFileSystemConfigTest
```
2. **Full Integration Tests** (requires SeaweedFS):
```bash
# Start SeaweedFS in CI environment
# Then run:
export SEAWEEDFS_TEST_ENABLED=true
mvn test
```
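One way to satisfy the "start SeaweedFS" step in CI is to run all components in a single `weed server` process and wait for the filer to answer before starting Maven. This is only a sketch; the data directory and flags may need adjusting for your environment and SeaweedFS version:
```bash
# Illustrative CI recipe: start SeaweedFS, wait for the filer, run guarded tests
weed server -dir=/tmp/seaweedfs-ci -filer=true &

# Wait until the filer responds on its HTTP port (8888)
until curl -sf http://localhost:8888/ > /dev/null; do sleep 1; done

export SEAWEEDFS_TEST_ENABLED=true
mvn test
```
Without `SEAWEEDFS_TEST_ENABLED=true`, the same `mvn test` invocation only runs the configuration tests.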
## Troubleshooting
### Tests are skipped
If you see "Skipping test - SEAWEEDFS_TEST_ENABLED not set":
```bash
export SEAWEEDFS_TEST_ENABLED=true
```
### Connection refused errors
Ensure SeaweedFS is running and accessible:
```bash
curl http://localhost:8888/
```
### gRPC errors
Verify the gRPC port is accessible:
```bash
# Should show the port is listening
netstat -an | grep 18888
```
## Contributing
When adding new features, please include:
1. Configuration tests (no SeaweedFS required)
2. Integration tests (with the SEAWEEDFS_TEST_ENABLED guard; see the sketch below)
3. Documentation updates
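New integration tests should use the same guard pattern as the existing tests in this module so that they are skipped unless `SEAWEEDFS_TEST_ENABLED=true`. A minimal sketch (class and test names are placeholders):
```java
package seaweed.hdfs;

import org.junit.Test;

public class MyFeatureIntegrationTest {

    private static final boolean TESTS_ENABLED =
            "true".equalsIgnoreCase(System.getenv("SEAWEEDFS_TEST_ENABLED"));

    @Test
    public void testMyFeature() throws Exception {
        if (!TESTS_ENABLED) {
            System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
            return;
        }
        // Exercise the new feature against the running filer here.
    }
}
```
Guarding inside each test method (rather than via JUnit assumptions) matches the existing tests, so the configuration tests in the same module still run by default.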

21
other/java/hdfs2/pom.xml

@ -6,7 +6,7 @@
<properties>
<seaweedfs.client.version>3.80</seaweedfs.client.version>
<hadoop.version>3.2.4</hadoop.version>
<hadoop.version>3.4.0</hadoop.version>
</properties>
<groupId>com.seaweedfs</groupId>
@ -171,6 +171,25 @@
<version>${hadoop.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.13.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-core</artifactId>
<version>3.12.4</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version}</version>
<scope>test</scope>
<type>test-jar</type>
</dependency>
</dependencies>
</project>

90
other/java/hdfs2/src/test/java/seaweed/hdfs/SeaweedFileSystemConfigTest.java

@ -0,0 +1,90 @@
package seaweed.hdfs;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.*;
/**
* Unit tests for SeaweedFileSystem configuration that don't require a running SeaweedFS instance.
*
* These tests verify basic properties and constants.
*/
public class SeaweedFileSystemConfigTest {
private SeaweedFileSystem fs;
private Configuration conf;
@Before
public void setUp() {
fs = new SeaweedFileSystem();
conf = new Configuration();
}
@Test
public void testScheme() {
assertEquals("seaweedfs", fs.getScheme());
}
@Test
public void testConstants() {
// Test that constants are defined correctly
assertEquals("fs.seaweed.filer.host", SeaweedFileSystem.FS_SEAWEED_FILER_HOST);
assertEquals("fs.seaweed.filer.port", SeaweedFileSystem.FS_SEAWEED_FILER_PORT);
assertEquals("fs.seaweed.filer.port.grpc", SeaweedFileSystem.FS_SEAWEED_FILER_PORT_GRPC);
assertEquals(8888, SeaweedFileSystem.FS_SEAWEED_DEFAULT_PORT);
assertEquals("fs.seaweed.buffer.size", SeaweedFileSystem.FS_SEAWEED_BUFFER_SIZE);
assertEquals(4 * 1024 * 1024, SeaweedFileSystem.FS_SEAWEED_DEFAULT_BUFFER_SIZE);
assertEquals("fs.seaweed.replication", SeaweedFileSystem.FS_SEAWEED_REPLICATION);
assertEquals("fs.seaweed.volume.server.access", SeaweedFileSystem.FS_SEAWEED_VOLUME_SERVER_ACCESS);
assertEquals("fs.seaweed.filer.cn", SeaweedFileSystem.FS_SEAWEED_FILER_CN);
}
@Test
public void testWorkingDirectoryPathOperations() {
// Test path operations that don't require initialization
Path testPath = new Path("/test/path");
assertTrue("Path should be absolute", testPath.isAbsolute());
assertEquals("/test/path", testPath.toUri().getPath());
Path childPath = new Path(testPath, "child");
assertEquals("/test/path/child", childPath.toUri().getPath());
}
@Test
public void testConfigurationProperties() {
// Test that configuration can be set and read
conf.set(SeaweedFileSystem.FS_SEAWEED_FILER_HOST, "testhost");
assertEquals("testhost", conf.get(SeaweedFileSystem.FS_SEAWEED_FILER_HOST));
conf.setInt(SeaweedFileSystem.FS_SEAWEED_FILER_PORT, 9999);
assertEquals(9999, conf.getInt(SeaweedFileSystem.FS_SEAWEED_FILER_PORT, 0));
conf.setInt(SeaweedFileSystem.FS_SEAWEED_BUFFER_SIZE, 8 * 1024 * 1024);
assertEquals(8 * 1024 * 1024, conf.getInt(SeaweedFileSystem.FS_SEAWEED_BUFFER_SIZE, 0));
conf.set(SeaweedFileSystem.FS_SEAWEED_REPLICATION, "001");
assertEquals("001", conf.get(SeaweedFileSystem.FS_SEAWEED_REPLICATION));
conf.set(SeaweedFileSystem.FS_SEAWEED_VOLUME_SERVER_ACCESS, "publicUrl");
assertEquals("publicUrl", conf.get(SeaweedFileSystem.FS_SEAWEED_VOLUME_SERVER_ACCESS));
conf.set(SeaweedFileSystem.FS_SEAWEED_FILER_CN, "test-cn");
assertEquals("test-cn", conf.get(SeaweedFileSystem.FS_SEAWEED_FILER_CN));
}
@Test
public void testDefaultBufferSize() {
// Test default buffer size constant
int expected = 4 * 1024 * 1024; // 4MB
assertEquals(expected, SeaweedFileSystem.FS_SEAWEED_DEFAULT_BUFFER_SIZE);
}
@Test
public void testDefaultPort() {
// Test default port constant
assertEquals(8888, SeaweedFileSystem.FS_SEAWEED_DEFAULT_PORT);
}
}

379
other/java/hdfs2/src/test/java/seaweed/hdfs/SeaweedFileSystemTest.java

@ -0,0 +1,379 @@
package seaweed.hdfs;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import java.io.IOException;
import java.net.URI;
import static org.junit.Assert.*;
/**
* Integration tests for SeaweedFileSystem.
*
* These tests verify basic FileSystem operations against a SeaweedFS backend.
* Note: These tests require a running SeaweedFS filer instance.
*
* To run tests, ensure SeaweedFS is running with default ports:
* - Filer HTTP: 8888
* - Filer gRPC: 18888
*
* Set environment variable SEAWEEDFS_TEST_ENABLED=true to enable these tests.
*/
public class SeaweedFileSystemTest {
private SeaweedFileSystem fs;
private Configuration conf;
private static final String TEST_ROOT = "/test-hdfs2";
private static final boolean TESTS_ENABLED =
"true".equalsIgnoreCase(System.getenv("SEAWEEDFS_TEST_ENABLED"));
@Before
public void setUp() throws Exception {
if (!TESTS_ENABLED) {
return;
}
conf = new Configuration();
conf.set("fs.seaweed.filer.host", "localhost");
conf.setInt("fs.seaweed.filer.port", 8888);
conf.setInt("fs.seaweed.filer.port.grpc", 18888);
fs = new SeaweedFileSystem();
URI uri = new URI("seaweedfs://localhost:8888/");
fs.initialize(uri, conf);
// Clean up any existing test directory
Path testPath = new Path(TEST_ROOT);
if (fs.exists(testPath)) {
fs.delete(testPath, true);
}
}
@After
public void tearDown() throws Exception {
if (!TESTS_ENABLED || fs == null) {
return;
}
// Clean up test directory
Path testPath = new Path(TEST_ROOT);
if (fs.exists(testPath)) {
fs.delete(testPath, true);
}
fs.close();
}
@Test
public void testInitialization() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
assertNotNull(fs);
assertEquals("seaweedfs", fs.getScheme());
assertNotNull(fs.getUri());
assertEquals("/", fs.getWorkingDirectory().toUri().getPath());
}
@Test
public void testMkdirs() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path testDir = new Path(TEST_ROOT + "/testdir");
assertTrue("Failed to create directory", fs.mkdirs(testDir));
assertTrue("Directory should exist", fs.exists(testDir));
FileStatus status = fs.getFileStatus(testDir);
assertTrue("Path should be a directory", status.isDirectory());
}
@Test
public void testCreateAndReadFile() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path testFile = new Path(TEST_ROOT + "/testfile.txt");
String testContent = "Hello, SeaweedFS!";
// Create and write to file
FSDataOutputStream out = fs.create(testFile, FsPermission.getDefault(),
false, 4096, (short) 1, 4 * 1024 * 1024, null);
assertNotNull("Output stream should not be null", out);
out.write(testContent.getBytes());
out.close();
// Verify file exists
assertTrue("File should exist", fs.exists(testFile));
// Read and verify content
FSDataInputStream in = fs.open(testFile, 4096);
assertNotNull("Input stream should not be null", in);
byte[] buffer = new byte[testContent.length()];
int bytesRead = in.read(buffer);
in.close();
assertEquals("Should read all bytes", testContent.length(), bytesRead);
assertEquals("Content should match", testContent, new String(buffer));
}
@Test
public void testFileStatus() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path testFile = new Path(TEST_ROOT + "/statustest.txt");
String content = "test content";
FSDataOutputStream out = fs.create(testFile);
out.write(content.getBytes());
out.close();
FileStatus status = fs.getFileStatus(testFile);
assertNotNull("FileStatus should not be null", status);
assertFalse("Should not be a directory", status.isDirectory());
assertTrue("Should be a file", status.isFile());
assertEquals("File length should match", content.length(), status.getLen());
assertNotNull("Path should not be null", status.getPath());
}
@Test
public void testListStatus() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path testDir = new Path(TEST_ROOT + "/listtest");
fs.mkdirs(testDir);
// Create multiple files
for (int i = 0; i < 3; i++) {
Path file = new Path(testDir, "file" + i + ".txt");
FSDataOutputStream out = fs.create(file);
out.write(("content" + i).getBytes());
out.close();
}
FileStatus[] statuses = fs.listStatus(testDir);
assertNotNull("List should not be null", statuses);
assertEquals("Should have 3 files", 3, statuses.length);
}
@Test
public void testRename() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path srcFile = new Path(TEST_ROOT + "/source.txt");
Path dstFile = new Path(TEST_ROOT + "/destination.txt");
String content = "rename test";
// Create source file
FSDataOutputStream out = fs.create(srcFile);
out.write(content.getBytes());
out.close();
assertTrue("Source file should exist", fs.exists(srcFile));
// Rename
assertTrue("Rename should succeed", fs.rename(srcFile, dstFile));
// Verify
assertFalse("Source file should not exist", fs.exists(srcFile));
assertTrue("Destination file should exist", fs.exists(dstFile));
// Verify content preserved
FSDataInputStream in = fs.open(dstFile);
byte[] buffer = new byte[content.length()];
in.read(buffer);
in.close();
assertEquals("Content should be preserved", content, new String(buffer));
}
@Test
public void testDelete() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path testFile = new Path(TEST_ROOT + "/deletetest.txt");
// Create file
FSDataOutputStream out = fs.create(testFile);
out.write("delete me".getBytes());
out.close();
assertTrue("File should exist before delete", fs.exists(testFile));
// Delete
assertTrue("Delete should succeed", fs.delete(testFile, false));
assertFalse("File should not exist after delete", fs.exists(testFile));
}
@Test
public void testDeleteDirectory() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path testDir = new Path(TEST_ROOT + "/deletedir");
Path testFile = new Path(testDir, "file.txt");
// Create directory with file
fs.mkdirs(testDir);
FSDataOutputStream out = fs.create(testFile);
out.write("content".getBytes());
out.close();
assertTrue("Directory should exist", fs.exists(testDir));
assertTrue("File should exist", fs.exists(testFile));
// Recursive delete
assertTrue("Recursive delete should succeed", fs.delete(testDir, true));
assertFalse("Directory should not exist after delete", fs.exists(testDir));
assertFalse("File should not exist after delete", fs.exists(testFile));
}
@Test
public void testAppend() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path testFile = new Path(TEST_ROOT + "/appendtest.txt");
String initialContent = "initial";
String appendContent = " appended";
// Create initial file
FSDataOutputStream out = fs.create(testFile);
out.write(initialContent.getBytes());
out.close();
// Append
FSDataOutputStream appendOut = fs.append(testFile, 4096, null);
assertNotNull("Append stream should not be null", appendOut);
appendOut.write(appendContent.getBytes());
appendOut.close();
// Verify combined content
FSDataInputStream in = fs.open(testFile);
byte[] buffer = new byte[initialContent.length() + appendContent.length()];
int bytesRead = in.read(buffer);
in.close();
String expected = initialContent + appendContent;
assertEquals("Should read all bytes", expected.length(), bytesRead);
assertEquals("Content should match", expected, new String(buffer));
}
@Test
public void testSetWorkingDirectory() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path originalWd = fs.getWorkingDirectory();
assertEquals("Original working directory should be /", "/", originalWd.toUri().getPath());
Path newWd = new Path(TEST_ROOT);
fs.mkdirs(newWd);
fs.setWorkingDirectory(newWd);
Path currentWd = fs.getWorkingDirectory();
assertTrue("Working directory should be updated",
currentWd.toUri().getPath().contains(TEST_ROOT));
}
@Test
public void testSetPermission() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path testFile = new Path(TEST_ROOT + "/permtest.txt");
// Create file
FSDataOutputStream out = fs.create(testFile);
out.write("permission test".getBytes());
out.close();
// Set permission
FsPermission newPerm = new FsPermission((short) 0644);
fs.setPermission(testFile, newPerm);
FileStatus status = fs.getFileStatus(testFile);
assertNotNull("Permission should not be null", status.getPermission());
}
@Test
public void testSetOwner() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path testFile = new Path(TEST_ROOT + "/ownertest.txt");
// Create file
FSDataOutputStream out = fs.create(testFile);
out.write("owner test".getBytes());
out.close();
// Set owner - the call may be a no-op if ownership is not fully implemented
fs.setOwner(testFile, "testuser", "testgroup");
// Just verify the call doesn't throw an exception
FileStatus status = fs.getFileStatus(testFile);
assertNotNull("FileStatus should not be null", status);
}
@Test
public void testRenameToExistingDirectory() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path srcFile = new Path(TEST_ROOT + "/movefile.txt");
Path dstDir = new Path(TEST_ROOT + "/movedir");
// Create source file and destination directory
FSDataOutputStream out = fs.create(srcFile);
out.write("move test".getBytes());
out.close();
fs.mkdirs(dstDir);
// Rename file to existing directory (should move file into directory)
assertTrue("Rename to directory should succeed", fs.rename(srcFile, dstDir));
// File should be moved into the directory
Path expectedLocation = new Path(dstDir, srcFile.getName());
assertTrue("File should exist in destination directory", fs.exists(expectedLocation));
assertFalse("Source file should not exist", fs.exists(srcFile));
}
}

190
other/java/hdfs3/README.md

@ -0,0 +1,190 @@
# SeaweedFS Hadoop3 Client
Hadoop FileSystem implementation for SeaweedFS, compatible with Hadoop 3.x.
## Building
```bash
mvn clean install
```
## Testing
This project includes two types of tests:
### 1. Configuration Tests (No SeaweedFS Required)
These tests verify configuration handling and initialization logic without requiring a running SeaweedFS instance:
```bash
mvn test -Dtest=SeaweedFileSystemConfigTest
```
### 2. Integration Tests (Requires SeaweedFS)
These tests verify actual FileSystem operations against a running SeaweedFS instance.
#### Prerequisites
1. Start SeaweedFS with default ports:
```bash
# Terminal 1: Start master
weed master
# Terminal 2: Start volume server
weed volume -mserver=localhost:9333
# Terminal 3: Start filer
weed filer -master=localhost:9333
```
2. Verify services are running:
- Master: http://localhost:9333
- Filer HTTP: http://localhost:8888
- Filer gRPC: localhost:18888
#### Running Integration Tests
```bash
# Enable integration tests
export SEAWEEDFS_TEST_ENABLED=true
# Run all tests
mvn test
# Run specific test
mvn test -Dtest=SeaweedFileSystemTest
```
### Test Configuration
Integration tests can be configured via environment variables or system properties:
- `SEAWEEDFS_TEST_ENABLED`: Set to `true` to enable integration tests (default: false)
- Tests use these default connection settings:
- Filer Host: localhost
- Filer HTTP Port: 8888
- Filer gRPC Port: 18888
### Running Tests with Custom Configuration
To test against a different SeaweedFS instance, modify the test code or use Hadoop configuration:
```java
conf.set("fs.seaweed.filer.host", "your-host");
conf.setInt("fs.seaweed.filer.port", 8888);
conf.setInt("fs.seaweed.filer.port.grpc", 18888);
```
## Test Coverage
The test suite covers:
- **Configuration & Initialization**
- URI parsing and configuration
- Default values
- Configuration overrides
- Working directory management
- **File Operations**
- Create files
- Read files
- Write files
- Append to files
- Delete files
- **Directory Operations**
- Create directories
- List directory contents
- Delete directories (recursive and non-recursive)
- **Metadata Operations**
- Get file status
- Set permissions
- Set owner/group
- Rename files and directories
## Usage in Hadoop
1. Copy the built JAR to your Hadoop classpath:
```bash
cp target/seaweedfs-hadoop3-client-*.jar $HADOOP_HOME/share/hadoop/common/lib/
```
2. Configure `core-site.xml`:
```xml
<configuration>
<property>
<name>fs.seaweedfs.impl</name>
<value>seaweed.hdfs.SeaweedFileSystem</value>
</property>
<property>
<name>fs.seaweed.filer.host</name>
<value>localhost</value>
</property>
<property>
<name>fs.seaweed.filer.port</name>
<value>8888</value>
</property>
<property>
<name>fs.seaweed.filer.port.grpc</name>
<value>18888</value>
</property>
</configuration>
```
3. Use SeaweedFS with Hadoop commands:
```bash
hadoop fs -ls seaweedfs://localhost:8888/
hadoop fs -mkdir seaweedfs://localhost:8888/test
hadoop fs -put local.txt seaweedfs://localhost:8888/test/
```
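A minimal programmatic sketch using the Hadoop `FileSystem` API is shown below; the class name and path are placeholders, and the hadoop3 client JAR plus the `core-site.xml` settings above are assumed to be available on the classpath:
```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SeaweedFsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same settings as in core-site.xml above
        conf.set("fs.seaweedfs.impl", "seaweed.hdfs.SeaweedFileSystem");
        conf.set("fs.seaweed.filer.host", "localhost");
        conf.setInt("fs.seaweed.filer.port", 8888);
        conf.setInt("fs.seaweed.filer.port.grpc", 18888);

        try (FileSystem fs = FileSystem.get(URI.create("seaweedfs://localhost:8888/"), conf)) {
            Path path = new Path("/test/hello.txt");
            // Create (overwrite) the file and write a small payload
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write("Hello from the Hadoop API".getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("Wrote " + fs.getFileStatus(path).getLen() + " bytes to " + path);
        }
    }
}
```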
## Continuous Integration
For CI environments, tests can be run in two modes:
1. **Configuration Tests Only** (default, no SeaweedFS required):
```bash
mvn test -Dtest=SeaweedFileSystemConfigTest
```
2. **Full Integration Tests** (requires SeaweedFS):
```bash
# Start SeaweedFS in CI environment
# Then run:
export SEAWEEDFS_TEST_ENABLED=true
mvn test
```
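For CI, the "start SeaweedFS" step can be covered by a single `weed server` process plus a readiness check on the filer port, along the lines of the sketch below (binary path, data directory, and flags are illustrative):
```bash
# Illustrative CI recipe: start SeaweedFS, wait for the filer, run guarded tests
weed server -dir=/tmp/seaweedfs-ci -filer=true &

# Wait until the filer responds on its HTTP port (8888)
until curl -sf http://localhost:8888/ > /dev/null; do sleep 1; done

export SEAWEEDFS_TEST_ENABLED=true
mvn test
```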
## Troubleshooting
### Tests are skipped
If you see "Skipping test - SEAWEEDFS_TEST_ENABLED not set":
```bash
export SEAWEEDFS_TEST_ENABLED=true
```
### Connection refused errors
Ensure SeaweedFS is running and accessible:
```bash
curl http://localhost:8888/
```
### gRPC errors
Verify the gRPC port is accessible:
```bash
# Should show the port is listening
netstat -an | grep 18888
```
## Contributing
When adding new features, please include:
1. Configuration tests (no SeaweedFS required)
2. Integration tests (with the SEAWEEDFS_TEST_ENABLED guard; see the sketch below)
3. Documentation updates
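Integration tests added to this module should be guarded the same way as the existing ones, so they are skipped unless `SEAWEEDFS_TEST_ENABLED=true`. For example (names are placeholders):
```java
package seaweed.hdfs;

import org.junit.Test;

public class MyFeatureIntegrationTest {

    private static final boolean TESTS_ENABLED =
            "true".equalsIgnoreCase(System.getenv("SEAWEEDFS_TEST_ENABLED"));

    @Test
    public void testMyFeature() throws Exception {
        if (!TESTS_ENABLED) {
            System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
            return;
        }
        // Exercise the new feature against the running filer here.
    }
}
```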

263
other/java/hdfs3/dependency-reduced-pom.xml

@ -140,7 +140,7 @@
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>3.2.4</version>
<version>3.4.0</version>
<scope>provided</scope>
<exclusions>
<exclusion>
@ -172,9 +172,17 @@
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.2.4</version>
<version>3.4.0</version>
<scope>provided</scope>
<exclusions>
<exclusion>
<artifactId>hadoop-shaded-protobuf_3_21</artifactId>
<groupId>org.apache.hadoop.thirdparty</groupId>
</exclusion>
<exclusion>
<artifactId>hadoop-shaded-guava</artifactId>
<groupId>org.apache.hadoop.thirdparty</groupId>
</exclusion>
<exclusion>
<artifactId>commons-cli</artifactId>
<groupId>commons-cli</groupId>
@ -200,8 +208,8 @@
<groupId>javax.servlet</groupId>
</exclusion>
<exclusion>
<artifactId>javax.activation-api</artifactId>
<groupId>javax.activation</groupId>
<artifactId>jakarta.activation-api</artifactId>
<groupId>jakarta.activation</groupId>
</exclusion>
<exclusion>
<artifactId>jetty-server</artifactId>
@ -233,7 +241,11 @@
</exclusion>
<exclusion>
<artifactId>jersey-json</artifactId>
<groupId>com.sun.jersey</groupId>
<groupId>com.github.pjfanning</groupId>
</exclusion>
<exclusion>
<artifactId>jettison</artifactId>
<groupId>org.codehaus.jettison</groupId>
</exclusion>
<exclusion>
<artifactId>jersey-server</artifactId>
@ -288,19 +300,248 @@
<groupId>org.apache.curator</groupId>
</exclusion>
<exclusion>
<artifactId>htrace-core4</artifactId>
<groupId>org.apache.htrace</groupId>
<artifactId>zookeeper</artifactId>
<groupId>org.apache.zookeeper</groupId>
</exclusion>
<exclusion>
<artifactId>netty-handler</artifactId>
<groupId>io.netty</groupId>
</exclusion>
<exclusion>
<artifactId>netty-transport-native-epoll</artifactId>
<groupId>io.netty</groupId>
</exclusion>
<exclusion>
<artifactId>metrics-core</artifactId>
<groupId>io.dropwizard.metrics</groupId>
</exclusion>
<exclusion>
<artifactId>commons-compress</artifactId>
<groupId>org.apache.commons</groupId>
</exclusion>
<exclusion>
<artifactId>bcprov-jdk15on</artifactId>
<groupId>org.bouncycastle</groupId>
</exclusion>
<exclusion>
<artifactId>kerb-core</artifactId>
<groupId>org.apache.kerby</groupId>
</exclusion>
<exclusion>
<artifactId>jackson-databind</artifactId>
<groupId>com.fasterxml.jackson.core</groupId>
</exclusion>
<exclusion>
<artifactId>stax2-api</artifactId>
<groupId>org.codehaus.woodstox</groupId>
</exclusion>
<exclusion>
<artifactId>woodstox-core</artifactId>
<groupId>com.fasterxml.woodstox</groupId>
</exclusion>
<exclusion>
<artifactId>dnsjava</artifactId>
<groupId>dnsjava</groupId>
</exclusion>
<exclusion>
<artifactId>snappy-java</artifactId>
<groupId>org.xerial.snappy</groupId>
</exclusion>
<exclusion>
<artifactId>hadoop-annotations</artifactId>
<groupId>org.apache.hadoop</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.13.1</version>
<scope>test</scope>
<exclusions>
<exclusion>
<artifactId>hamcrest-core</artifactId>
<groupId>org.hamcrest</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-core</artifactId>
<version>3.12.4</version>
<scope>test</scope>
<exclusions>
<exclusion>
<artifactId>byte-buddy</artifactId>
<groupId>net.bytebuddy</groupId>
</exclusion>
<exclusion>
<artifactId>byte-buddy-agent</artifactId>
<groupId>net.bytebuddy</groupId>
</exclusion>
<exclusion>
<artifactId>objenesis</artifactId>
<groupId>org.objenesis</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.4.0</version>
<type>test-jar</type>
<scope>test</scope>
<exclusions>
<exclusion>
<artifactId>hadoop-shaded-protobuf_3_21</artifactId>
<groupId>org.apache.hadoop.thirdparty</groupId>
</exclusion>
<exclusion>
<artifactId>hadoop-shaded-guava</artifactId>
<groupId>org.apache.hadoop.thirdparty</groupId>
</exclusion>
<exclusion>
<artifactId>commons-cli</artifactId>
<groupId>commons-cli</groupId>
</exclusion>
<exclusion>
<artifactId>commons-math3</artifactId>
<groupId>org.apache.commons</groupId>
</exclusion>
<exclusion>
<artifactId>commons-io</artifactId>
<groupId>commons-io</groupId>
</exclusion>
<exclusion>
<artifactId>commons-net</artifactId>
<groupId>commons-net</groupId>
</exclusion>
<exclusion>
<artifactId>commons-collections</artifactId>
<groupId>commons-collections</groupId>
</exclusion>
<exclusion>
<artifactId>javax.servlet-api</artifactId>
<groupId>javax.servlet</groupId>
</exclusion>
<exclusion>
<artifactId>jakarta.activation-api</artifactId>
<groupId>jakarta.activation</groupId>
</exclusion>
<exclusion>
<artifactId>jetty-server</artifactId>
<groupId>org.eclipse.jetty</groupId>
</exclusion>
<exclusion>
<artifactId>jetty-util</artifactId>
<groupId>org.eclipse.jetty</groupId>
</exclusion>
<exclusion>
<artifactId>jetty-servlet</artifactId>
<groupId>org.eclipse.jetty</groupId>
</exclusion>
<exclusion>
<artifactId>jetty-webapp</artifactId>
<groupId>org.eclipse.jetty</groupId>
</exclusion>
<exclusion>
<artifactId>jsp-api</artifactId>
<groupId>javax.servlet.jsp</groupId>
</exclusion>
<exclusion>
<artifactId>jersey-core</artifactId>
<groupId>com.sun.jersey</groupId>
</exclusion>
<exclusion>
<artifactId>jersey-servlet</artifactId>
<groupId>com.sun.jersey</groupId>
</exclusion>
<exclusion>
<artifactId>jersey-json</artifactId>
<groupId>com.github.pjfanning</groupId>
</exclusion>
<exclusion>
<artifactId>jettison</artifactId>
<groupId>org.codehaus.jettison</groupId>
</exclusion>
<exclusion>
<artifactId>jersey-server</artifactId>
<groupId>com.sun.jersey</groupId>
</exclusion>
<exclusion>
<artifactId>reload4j</artifactId>
<groupId>ch.qos.reload4j</groupId>
</exclusion>
<exclusion>
<artifactId>commons-beanutils</artifactId>
<groupId>commons-beanutils</groupId>
</exclusion>
<exclusion>
<artifactId>commons-configuration2</artifactId>
<groupId>org.apache.commons</groupId>
</exclusion>
<exclusion>
<artifactId>commons-lang3</artifactId>
<groupId>org.apache.commons</groupId>
</exclusion>
<exclusion>
<artifactId>commons-text</artifactId>
<groupId>org.apache.commons</groupId>
</exclusion>
<exclusion>
<artifactId>slf4j-reload4j</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<artifactId>avro</artifactId>
<groupId>org.apache.avro</groupId>
</exclusion>
<exclusion>
<artifactId>re2j</artifactId>
<groupId>com.google.re2j</groupId>
</exclusion>
<exclusion>
<artifactId>hadoop-auth</artifactId>
<groupId>org.apache.hadoop</groupId>
</exclusion>
<exclusion>
<artifactId>jsch</artifactId>
<groupId>com.jcraft</groupId>
</exclusion>
<exclusion>
<artifactId>curator-client</artifactId>
<groupId>org.apache.curator</groupId>
</exclusion>
<exclusion>
<artifactId>curator-recipes</artifactId>
<groupId>org.apache.curator</groupId>
</exclusion>
<exclusion>
<artifactId>zookeeper</artifactId>
<groupId>org.apache.zookeeper</groupId>
</exclusion>
<exclusion>
<artifactId>netty-handler</artifactId>
<groupId>io.netty</groupId>
</exclusion>
<exclusion>
<artifactId>netty-transport-native-epoll</artifactId>
<groupId>io.netty</groupId>
</exclusion>
<exclusion>
<artifactId>metrics-core</artifactId>
<groupId>io.dropwizard.metrics</groupId>
</exclusion>
<exclusion>
<artifactId>commons-compress</artifactId>
<groupId>org.apache.commons</groupId>
</exclusion>
<exclusion>
<artifactId>kerb-simplekdc</artifactId>
<artifactId>bcprov-jdk15on</artifactId>
<groupId>org.bouncycastle</groupId>
</exclusion>
<exclusion>
<artifactId>kerb-core</artifactId>
<groupId>org.apache.kerby</groupId>
</exclusion>
<exclusion>
@ -319,6 +560,10 @@
<artifactId>dnsjava</artifactId>
<groupId>dnsjava</groupId>
</exclusion>
<exclusion>
<artifactId>snappy-java</artifactId>
<groupId>org.xerial.snappy</groupId>
</exclusion>
<exclusion>
<artifactId>hadoop-annotations</artifactId>
<groupId>org.apache.hadoop</groupId>
@ -328,6 +573,6 @@
</dependencies>
<properties>
<seaweedfs.client.version>3.80</seaweedfs.client.version>
<hadoop.version>3.2.4</hadoop.version>
<hadoop.version>3.4.0</hadoop.version>
</properties>
</project>

21
other/java/hdfs3/pom.xml

@ -6,7 +6,7 @@
<properties>
<seaweedfs.client.version>3.80</seaweedfs.client.version>
<hadoop.version>3.2.4</hadoop.version>
<hadoop.version>3.4.0</hadoop.version>
</properties>
<groupId>com.seaweedfs</groupId>
@ -171,6 +171,25 @@
<version>${hadoop.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.13.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-core</artifactId>
<version>3.12.4</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version}</version>
<scope>test</scope>
<type>test-jar</type>
</dependency>
</dependencies>
</project>

90
other/java/hdfs3/src/test/java/seaweed/hdfs/SeaweedFileSystemConfigTest.java

@ -0,0 +1,90 @@
package seaweed.hdfs;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.*;
/**
* Unit tests for SeaweedFileSystem configuration that don't require a running SeaweedFS instance.
*
* These tests verify basic properties and constants.
*/
public class SeaweedFileSystemConfigTest {
private SeaweedFileSystem fs;
private Configuration conf;
@Before
public void setUp() {
fs = new SeaweedFileSystem();
conf = new Configuration();
}
@Test
public void testScheme() {
assertEquals("seaweedfs", fs.getScheme());
}
@Test
public void testConstants() {
// Test that constants are defined correctly
assertEquals("fs.seaweed.filer.host", SeaweedFileSystem.FS_SEAWEED_FILER_HOST);
assertEquals("fs.seaweed.filer.port", SeaweedFileSystem.FS_SEAWEED_FILER_PORT);
assertEquals("fs.seaweed.filer.port.grpc", SeaweedFileSystem.FS_SEAWEED_FILER_PORT_GRPC);
assertEquals(8888, SeaweedFileSystem.FS_SEAWEED_DEFAULT_PORT);
assertEquals("fs.seaweed.buffer.size", SeaweedFileSystem.FS_SEAWEED_BUFFER_SIZE);
assertEquals(4 * 1024 * 1024, SeaweedFileSystem.FS_SEAWEED_DEFAULT_BUFFER_SIZE);
assertEquals("fs.seaweed.replication", SeaweedFileSystem.FS_SEAWEED_REPLICATION);
assertEquals("fs.seaweed.volume.server.access", SeaweedFileSystem.FS_SEAWEED_VOLUME_SERVER_ACCESS);
assertEquals("fs.seaweed.filer.cn", SeaweedFileSystem.FS_SEAWEED_FILER_CN);
}
@Test
public void testWorkingDirectoryPathOperations() {
// Test path operations that don't require initialization
Path testPath = new Path("/test/path");
assertTrue("Path should be absolute", testPath.isAbsolute());
assertEquals("/test/path", testPath.toUri().getPath());
Path childPath = new Path(testPath, "child");
assertEquals("/test/path/child", childPath.toUri().getPath());
}
@Test
public void testConfigurationProperties() {
// Test that configuration can be set and read
conf.set(SeaweedFileSystem.FS_SEAWEED_FILER_HOST, "testhost");
assertEquals("testhost", conf.get(SeaweedFileSystem.FS_SEAWEED_FILER_HOST));
conf.setInt(SeaweedFileSystem.FS_SEAWEED_FILER_PORT, 9999);
assertEquals(9999, conf.getInt(SeaweedFileSystem.FS_SEAWEED_FILER_PORT, 0));
conf.setInt(SeaweedFileSystem.FS_SEAWEED_BUFFER_SIZE, 8 * 1024 * 1024);
assertEquals(8 * 1024 * 1024, conf.getInt(SeaweedFileSystem.FS_SEAWEED_BUFFER_SIZE, 0));
conf.set(SeaweedFileSystem.FS_SEAWEED_REPLICATION, "001");
assertEquals("001", conf.get(SeaweedFileSystem.FS_SEAWEED_REPLICATION));
conf.set(SeaweedFileSystem.FS_SEAWEED_VOLUME_SERVER_ACCESS, "publicUrl");
assertEquals("publicUrl", conf.get(SeaweedFileSystem.FS_SEAWEED_VOLUME_SERVER_ACCESS));
conf.set(SeaweedFileSystem.FS_SEAWEED_FILER_CN, "test-cn");
assertEquals("test-cn", conf.get(SeaweedFileSystem.FS_SEAWEED_FILER_CN));
}
@Test
public void testDefaultBufferSize() {
// Test default buffer size constant
int expected = 4 * 1024 * 1024; // 4MB
assertEquals(expected, SeaweedFileSystem.FS_SEAWEED_DEFAULT_BUFFER_SIZE);
}
@Test
public void testDefaultPort() {
// Test default port constant
assertEquals(8888, SeaweedFileSystem.FS_SEAWEED_DEFAULT_PORT);
}
}

379
other/java/hdfs3/src/test/java/seaweed/hdfs/SeaweedFileSystemTest.java

@ -0,0 +1,379 @@
package seaweed.hdfs;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import java.io.IOException;
import java.net.URI;
import static org.junit.Assert.*;
/**
* Unit tests for SeaweedFileSystem.
*
* These tests verify basic FileSystem operations against a SeaweedFS backend.
* Note: These tests require a running SeaweedFS filer instance.
*
* To run tests, ensure SeaweedFS is running with default ports:
* - Filer HTTP: 8888
* - Filer gRPC: 18888
*
* Set environment variable SEAWEEDFS_TEST_ENABLED=true to enable these tests.
*/
public class SeaweedFileSystemTest {
private SeaweedFileSystem fs;
private Configuration conf;
private static final String TEST_ROOT = "/test-hdfs3";
private static final boolean TESTS_ENABLED =
"true".equalsIgnoreCase(System.getenv("SEAWEEDFS_TEST_ENABLED"));
@Before
public void setUp() throws Exception {
if (!TESTS_ENABLED) {
return;
}
conf = new Configuration();
conf.set("fs.seaweed.filer.host", "localhost");
conf.setInt("fs.seaweed.filer.port", 8888);
conf.setInt("fs.seaweed.filer.port.grpc", 18888);
fs = new SeaweedFileSystem();
URI uri = new URI("seaweedfs://localhost:8888/");
fs.initialize(uri, conf);
// Clean up any existing test directory
Path testPath = new Path(TEST_ROOT);
if (fs.exists(testPath)) {
fs.delete(testPath, true);
}
}
@After
public void tearDown() throws Exception {
if (!TESTS_ENABLED || fs == null) {
return;
}
// Clean up test directory
Path testPath = new Path(TEST_ROOT);
if (fs.exists(testPath)) {
fs.delete(testPath, true);
}
fs.close();
}
@Test
public void testInitialization() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
assertNotNull(fs);
assertEquals("seaweedfs", fs.getScheme());
assertNotNull(fs.getUri());
assertEquals("/", fs.getWorkingDirectory().toUri().getPath());
}
@Test
public void testMkdirs() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path testDir = new Path(TEST_ROOT + "/testdir");
assertTrue("Failed to create directory", fs.mkdirs(testDir));
assertTrue("Directory should exist", fs.exists(testDir));
FileStatus status = fs.getFileStatus(testDir);
assertTrue("Path should be a directory", status.isDirectory());
}
@Test
public void testCreateAndReadFile() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path testFile = new Path(TEST_ROOT + "/testfile.txt");
String testContent = "Hello, SeaweedFS!";
// Create and write to file
FSDataOutputStream out = fs.create(testFile, FsPermission.getDefault(),
false, 4096, (short) 1, 4 * 1024 * 1024, null);
assertNotNull("Output stream should not be null", out);
out.write(testContent.getBytes());
out.close();
// Verify file exists
assertTrue("File should exist", fs.exists(testFile));
// Read and verify content
FSDataInputStream in = fs.open(testFile, 4096);
assertNotNull("Input stream should not be null", in);
byte[] buffer = new byte[testContent.length()];
int bytesRead = in.read(buffer);
in.close();
assertEquals("Should read all bytes", testContent.length(), bytesRead);
assertEquals("Content should match", testContent, new String(buffer));
}
@Test
public void testFileStatus() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path testFile = new Path(TEST_ROOT + "/statustest.txt");
String content = "test content";
FSDataOutputStream out = fs.create(testFile);
out.write(content.getBytes());
out.close();
FileStatus status = fs.getFileStatus(testFile);
assertNotNull("FileStatus should not be null", status);
assertFalse("Should not be a directory", status.isDirectory());
assertTrue("Should be a file", status.isFile());
assertEquals("File length should match", content.length(), status.getLen());
assertNotNull("Path should not be null", status.getPath());
}
@Test
public void testListStatus() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path testDir = new Path(TEST_ROOT + "/listtest");
fs.mkdirs(testDir);
// Create multiple files
for (int i = 0; i < 3; i++) {
Path file = new Path(testDir, "file" + i + ".txt");
FSDataOutputStream out = fs.create(file);
out.write(("content" + i).getBytes());
out.close();
}
FileStatus[] statuses = fs.listStatus(testDir);
assertNotNull("List should not be null", statuses);
assertEquals("Should have 3 files", 3, statuses.length);
}
@Test
public void testRename() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path srcFile = new Path(TEST_ROOT + "/source.txt");
Path dstFile = new Path(TEST_ROOT + "/destination.txt");
String content = "rename test";
// Create source file
FSDataOutputStream out = fs.create(srcFile);
out.write(content.getBytes());
out.close();
assertTrue("Source file should exist", fs.exists(srcFile));
// Rename
assertTrue("Rename should succeed", fs.rename(srcFile, dstFile));
// Verify
assertFalse("Source file should not exist", fs.exists(srcFile));
assertTrue("Destination file should exist", fs.exists(dstFile));
// Verify content preserved
FSDataInputStream in = fs.open(dstFile);
byte[] buffer = new byte[content.length()];
in.read(buffer);
in.close();
assertEquals("Content should be preserved", content, new String(buffer));
}
@Test
public void testDelete() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path testFile = new Path(TEST_ROOT + "/deletetest.txt");
// Create file
FSDataOutputStream out = fs.create(testFile);
out.write("delete me".getBytes());
out.close();
assertTrue("File should exist before delete", fs.exists(testFile));
// Delete
assertTrue("Delete should succeed", fs.delete(testFile, false));
assertFalse("File should not exist after delete", fs.exists(testFile));
}
@Test
public void testDeleteDirectory() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path testDir = new Path(TEST_ROOT + "/deletedir");
Path testFile = new Path(testDir, "file.txt");
// Create directory with file
fs.mkdirs(testDir);
FSDataOutputStream out = fs.create(testFile);
out.write("content".getBytes());
out.close();
assertTrue("Directory should exist", fs.exists(testDir));
assertTrue("File should exist", fs.exists(testFile));
// Recursive delete
assertTrue("Recursive delete should succeed", fs.delete(testDir, true));
assertFalse("Directory should not exist after delete", fs.exists(testDir));
assertFalse("File should not exist after delete", fs.exists(testFile));
}
@Test
public void testAppend() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path testFile = new Path(TEST_ROOT + "/appendtest.txt");
String initialContent = "initial";
String appendContent = " appended";
// Create initial file
FSDataOutputStream out = fs.create(testFile);
out.write(initialContent.getBytes());
out.close();
// Append
FSDataOutputStream appendOut = fs.append(testFile, 4096, null);
assertNotNull("Append stream should not be null", appendOut);
appendOut.write(appendContent.getBytes());
appendOut.close();
// Verify combined content
FSDataInputStream in = fs.open(testFile);
byte[] buffer = new byte[initialContent.length() + appendContent.length()];
int bytesRead = in.read(buffer);
in.close();
String expected = initialContent + appendContent;
assertEquals("Should read all bytes", expected.length(), bytesRead);
assertEquals("Content should match", expected, new String(buffer));
}
@Test
public void testSetWorkingDirectory() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path originalWd = fs.getWorkingDirectory();
assertEquals("Original working directory should be /", "/", originalWd.toUri().getPath());
Path newWd = new Path(TEST_ROOT);
fs.mkdirs(newWd);
fs.setWorkingDirectory(newWd);
Path currentWd = fs.getWorkingDirectory();
assertTrue("Working directory should be updated",
currentWd.toUri().getPath().contains(TEST_ROOT));
}
@Test
public void testSetPermission() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path testFile = new Path(TEST_ROOT + "/permtest.txt");
// Create file
FSDataOutputStream out = fs.create(testFile);
out.write("permission test".getBytes());
out.close();
// Set permission
FsPermission newPerm = new FsPermission((short) 0644);
fs.setPermission(testFile, newPerm);
FileStatus status = fs.getFileStatus(testFile);
assertNotNull("Permission should not be null", status.getPermission());
}
@Test
public void testSetOwner() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path testFile = new Path(TEST_ROOT + "/ownertest.txt");
// Create file
FSDataOutputStream out = fs.create(testFile);
out.write("owner test".getBytes());
out.close();
// Set owner - the call may be a no-op if ownership is not fully implemented
fs.setOwner(testFile, "testuser", "testgroup");
// Just verify the call doesn't throw an exception
FileStatus status = fs.getFileStatus(testFile);
assertNotNull("FileStatus should not be null", status);
}
@Test
public void testRenameToExistingDirectory() throws Exception {
if (!TESTS_ENABLED) {
System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
return;
}
Path srcFile = new Path(TEST_ROOT + "/movefile.txt");
Path dstDir = new Path(TEST_ROOT + "/movedir");
// Create source file and destination directory
FSDataOutputStream out = fs.create(srcFile);
out.write("move test".getBytes());
out.close();
fs.mkdirs(dstDir);
// Rename file to existing directory (should move file into directory)
assertTrue("Rename to directory should succeed", fs.rename(srcFile, dstDir));
// File should be moved into the directory
Path expectedLocation = new Path(dstDir, srcFile.getName());
assertTrue("File should exist in destination directory", fs.exists(expectedLocation));
assertFalse("Source file should not exist", fs.exists(srcFile));
}
}

2
seaweedfs-rdma-sidecar/README.md

@ -16,7 +16,7 @@ This project implements a **high-performance RDMA (Remote Direct Memory Access)
```
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ SeaweedFS │ │ Go Sidecar │ │ Rust Engine │
│ Volume Server │◄──►│ (Control Plane) │◄──►│ (Data Plane) │
│ Volume Server │◄──►│ (Control Plane) │◄──►│ (Data Plane) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
│ │ │

1
test/erasure_coding/ec_integration_test.go

@ -391,6 +391,7 @@ func startSeaweedFSCluster(ctx context.Context, dataDir string) (*TestCluster, e
"-mdir", masterDir,
"-volumeSizeLimitMB", "10", // Small volumes for testing
"-ip", "127.0.0.1",
"-peers", "none", // Faster startup when no multiple masters needed
)
masterLogFile, err := os.Create(filepath.Join(masterDir, "master.log"))

77
test/foundationdb/Dockerfile.build

@ -0,0 +1,77 @@
# Simplified single-stage build for SeaweedFS with FoundationDB support
# Force x86_64 platform to use AMD64 FoundationDB packages
FROM --platform=linux/amd64 golang:1.24-bookworm
ARG FOUNDATIONDB_VERSION=7.4.5
ENV FOUNDATIONDB_VERSION=${FOUNDATIONDB_VERSION}
# Install system dependencies and FoundationDB
RUN apt-get update && apt-get install -y \
build-essential \
wget \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*
# Install FoundationDB client libraries (x86_64 emulation) with checksum verification
RUN set -euo pipefail \
&& echo "🏗️ Installing FoundationDB AMD64 package with x86_64 emulation..." \
&& case "${FOUNDATIONDB_VERSION}" in \
"7.4.5") EXPECTED_SHA256="eea6b98cf386a0848655b2e196d18633662a7440a7ee061c10e32153c7e7e112" ;; \
"7.3.43") EXPECTED_SHA256="c3fa0a59c7355b914a1455dac909238d5ea3b6c6bc7b530af8597e6487c1651a" ;; \
*) echo "Unsupported FoundationDB version ${FOUNDATIONDB_VERSION} for deterministic build" >&2; exit 1 ;; \
esac \
&& PACKAGE="foundationdb-clients_${FOUNDATIONDB_VERSION}-1_amd64.deb" \
&& wget -q https://github.com/apple/foundationdb/releases/download/${FOUNDATIONDB_VERSION}/${PACKAGE} \
&& echo "${EXPECTED_SHA256} ${PACKAGE}" | sha256sum -c - \
&& dpkg -i ${PACKAGE} \
&& rm ${PACKAGE} \
&& echo "🔍 Verifying FoundationDB installation..." \
&& ls -la /usr/include/foundationdb/ \
&& ls -la /usr/lib/*/libfdb_c* 2>/dev/null || echo "Library files:" \
&& find /usr -name "libfdb_c*" -type f 2>/dev/null \
&& ldconfig
# Set up Go environment for CGO
ENV CGO_ENABLED=1
ENV GOOS=linux
ENV CGO_CFLAGS="-I/usr/include/foundationdb -I/usr/local/include/foundationdb -DFDB_USE_LATEST_API_VERSION"
ENV CGO_LDFLAGS="-L/usr/lib -lfdb_c"
# Create work directory
WORKDIR /build
# Copy source code
COPY . .
# Using Go 1.24 to match project requirements
# Download dependencies (using versions from go.mod for deterministic builds)
RUN go mod download
# Build SeaweedFS with FoundationDB support
RUN echo "🔨 Building SeaweedFS with FoundationDB support..." && \
echo "🔍 Debugging: Checking headers before build..." && \
find /usr -name "fdb_c.h" -type f 2>/dev/null || echo "No fdb_c.h found" && \
ls -la /usr/include/foundationdb/ 2>/dev/null || echo "No foundationdb include dir" && \
ls -la /usr/lib/libfdb_c* 2>/dev/null || echo "No libfdb_c libraries" && \
echo "CGO_CFLAGS: $CGO_CFLAGS" && \
echo "CGO_LDFLAGS: $CGO_LDFLAGS" && \
go build -tags foundationdb -ldflags="-w -s" -o ./weed/weed ./weed && \
chmod +x ./weed/weed && \
echo "✅ Build successful!" && \
./weed/weed version
# Test compilation (don't run tests as they need cluster)
RUN echo "🧪 Compiling tests..." && \
go test -tags foundationdb -c -o fdb_store_test ./weed/filer/foundationdb/ && \
echo "✅ Tests compiled successfully!"
# Create runtime directories
RUN mkdir -p /var/fdb/config /usr/local/bin
# Copy binaries to final location
RUN cp weed/weed /usr/local/bin/weed && \
cp fdb_store_test /usr/local/bin/fdb_store_test
# Default command
CMD ["/usr/local/bin/weed", "version"]

84
test/foundationdb/Dockerfile.build.arm64

@ -0,0 +1,84 @@
# Multi-stage Dockerfile to build SeaweedFS with FoundationDB support for ARM64
FROM --platform=linux/arm64 golang:1.24-bookworm AS builder
ARG FOUNDATIONDB_VERSION=7.4.5
ENV FOUNDATIONDB_VERSION=${FOUNDATIONDB_VERSION}
# Install build dependencies and download prebuilt FoundationDB clients
RUN apt-get update && apt-get install -y \
build-essential \
git \
wget \
ca-certificates \
&& rm -rf /var/lib/apt/lists/* && \
set -euo pipefail && \
case "${FOUNDATIONDB_VERSION}" in \
"7.4.5") EXPECTED_SHA256="f2176b86b7e1b561c3632b4e6e7efb82e3b8f57c2ff0d0ac4671e742867508aa" ;; \
*) echo "ERROR: No known ARM64 client checksum for FoundationDB ${FOUNDATIONDB_VERSION}. Please update this Dockerfile." >&2; exit 1 ;; \
esac && \
PACKAGE="foundationdb-clients_${FOUNDATIONDB_VERSION}-1_aarch64.deb" && \
wget --timeout=30 --tries=3 https://github.com/apple/foundationdb/releases/download/${FOUNDATIONDB_VERSION}/${PACKAGE} && \
echo "${EXPECTED_SHA256} ${PACKAGE}" | sha256sum -c - && \
dpkg -i ${PACKAGE} && \
rm ${PACKAGE} && \
ldconfig && \
echo "✅ FoundationDB client libraries installed (prebuilt ${FOUNDATIONDB_VERSION})"
# Set up Go environment for CGO
ENV CGO_ENABLED=1
ENV GOOS=linux
ENV GOARCH=arm64
ENV CGO_CFLAGS="-I/usr/include -I/usr/include/foundationdb"
ENV CGO_LDFLAGS="-L/usr/lib -lfdb_c"
# Create work directory
WORKDIR /build
# Copy source code
COPY . .
# Download Go dependencies
RUN go mod download
# Build SeaweedFS with FoundationDB support
RUN echo "🔨 Building SeaweedFS with FoundationDB support for ARM64..." && \
echo "🔍 Debugging: Checking headers before build..." && \
find /usr -name "fdb_c.h" -type f 2>/dev/null && \
ls -la /usr/include/foundationdb/ 2>/dev/null && \
ls -la /usr/lib/libfdb_c* 2>/dev/null && \
echo "CGO_CFLAGS: $CGO_CFLAGS" && \
echo "CGO_LDFLAGS: $CGO_LDFLAGS" && \
go build -tags foundationdb -ldflags="-w -s" -o ./weed/weed ./weed && \
chmod +x ./weed/weed && \
echo "✅ Build successful!" && \
./weed/weed version
# Runtime stage
FROM --platform=linux/arm64 debian:bookworm-slim
# Install runtime dependencies
RUN apt-get update && apt-get install -y \
ca-certificates \
libssl3 \
&& rm -rf /var/lib/apt/lists/*
# Copy FoundationDB client library and headers from builder
COPY --from=builder /usr/lib/libfdb_c* /usr/lib/
COPY --from=builder /usr/include/foundationdb /usr/include/foundationdb
RUN ldconfig
# Copy SeaweedFS binary
COPY --from=builder /build/weed/weed /usr/local/bin/weed
# Create runtime directories
RUN mkdir -p /var/fdb/config /data
# Verify binary works
RUN weed version
# Expose SeaweedFS ports
EXPOSE 9333 19333 8888 8333 18888
# Default command
CMD ["weed", "version"]

51
test/foundationdb/Dockerfile.fdb-arm64

@ -0,0 +1,51 @@
# FoundationDB server image for ARM64 using official prebuilt packages
FROM --platform=linux/arm64 ubuntu:22.04
ARG FOUNDATIONDB_VERSION=7.4.5
ENV FOUNDATIONDB_VERSION=${FOUNDATIONDB_VERSION}
# Install prerequisites
RUN apt-get update && apt-get install -y \
ca-certificates \
wget \
python3 \
libssl3 \
libboost-system1.74.0 \
libboost-filesystem1.74.0 \
&& rm -rf /var/lib/apt/lists/*
# Install FoundationDB server + client debs with checksum verification
RUN set -euo pipefail && \
apt-get update && \
case "${FOUNDATIONDB_VERSION}" in \
"7.4.5") \
CLIENT_SHA="f2176b86b7e1b561c3632b4e6e7efb82e3b8f57c2ff0d0ac4671e742867508aa"; \
SERVER_SHA="d7b081afbbabfdf2452cfbdc5c7c895165457ae32d91fc7f9489da921ab02e26"; \
;; \
*) \
echo "Unsupported FoundationDB version ${FOUNDATIONDB_VERSION} for ARM64 runtime" >&2; \
exit 1 ;; \
esac && \
for component in clients server; do \
if [ "${component}" = "clients" ]; then \
EXPECTED_SHA="${CLIENT_SHA}"; \
else \
EXPECTED_SHA="${SERVER_SHA}"; \
fi && \
PACKAGE="foundationdb-${component}_${FOUNDATIONDB_VERSION}-1_aarch64.deb" && \
PACKAGE_PATH="/tmp/${PACKAGE}" && \
wget --timeout=30 --tries=3 -O "${PACKAGE_PATH}" \
"https://github.com/apple/foundationdb/releases/download/${FOUNDATIONDB_VERSION}/${PACKAGE}" && \
echo "${EXPECTED_SHA} ${PACKAGE_PATH}" | sha256sum -c - && \
apt-get install -y "${PACKAGE_PATH}" && \
rm "${PACKAGE_PATH}"; \
done && \
rm -rf /var/lib/apt/lists/* && \
ldconfig && \
echo "✅ Installed FoundationDB ${FOUNDATIONDB_VERSION} (server + clients)"
# Prepare directories commonly bind-mounted by docker-compose
# Note: brace expansion is not available in the default /bin/sh, so spell out each directory
RUN mkdir -p /var/fdb/logs /var/fdb/data /var/fdb/config /usr/lib/foundationdb
# Provide a simple default command (docker-compose overrides this)
CMD ["/bin/bash"]

38
test/foundationdb/Dockerfile.test

@ -0,0 +1,38 @@
# Test environment with Go and FoundationDB support
FROM golang:1.24-bookworm
# Install system dependencies
RUN apt-get update && apt-get install -y \
build-essential \
wget \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*
# Download and install FoundationDB client libraries with checksum verification
RUN set -euo pipefail \
&& FDB_VERSION="7.4.5" \
&& EXPECTED_SHA256="eea6b98cf386a0848655b2e196d18633662a7440a7ee061c10e32153c7e7e112" \
&& PACKAGE="foundationdb-clients_${FDB_VERSION}-1_amd64.deb" \
&& wget -q https://github.com/apple/foundationdb/releases/download/${FDB_VERSION}/${PACKAGE} \
&& echo "${EXPECTED_SHA256} ${PACKAGE}" | sha256sum -c - \
&& (dpkg -i ${PACKAGE} || apt-get install -f -y) \
&& rm ${PACKAGE}
# Set up Go environment for CGO
ENV CGO_ENABLED=1
ENV GOOS=linux
# Set work directory
WORKDIR /app
# Copy source code
COPY . .
# Create directories
RUN mkdir -p /test/results
# Pre-download dependencies
RUN go mod download
# Default command (will be overridden)
CMD ["go", "version"]

223
test/foundationdb/Makefile

@ -0,0 +1,223 @@
# SeaweedFS FoundationDB Integration Testing Makefile
# Configuration
FDB_CLUSTER_FILE ?= /tmp/fdb.cluster
SEAWEEDFS_S3_ENDPOINT ?= http://127.0.0.1:8333
TEST_TIMEOUT ?= 5m
DOCKER_COMPOSE ?= docker-compose
DOCKER_COMPOSE_ARM64 ?= docker-compose -f docker-compose.arm64.yml
# Colors for output
BLUE := \033[36m
GREEN := \033[32m
YELLOW := \033[33m
RED := \033[31m
NC := \033[0m # No Color
.PHONY: help setup test test-unit test-integration test-e2e clean logs status \
setup-arm64 test-arm64 setup-emulated test-emulated clean-arm64
help: ## Show this help message
@echo "$(BLUE)SeaweedFS FoundationDB Integration Testing$(NC)"
@echo ""
@echo "Available targets:"
@awk 'BEGIN {FS = ":.*?## "} /^[a-zA-Z_][a-zA-Z0-9_-]*:.*?## / {printf " $(GREEN)%-15s$(NC) %s\n", $$1, $$2}' $(MAKEFILE_LIST)
setup: ## Set up test environment (FoundationDB + SeaweedFS)
@echo "$(YELLOW)Setting up FoundationDB cluster and SeaweedFS...$(NC)"
@$(DOCKER_COMPOSE) up -d fdb1 fdb2 fdb3
@echo "$(BLUE)Waiting for FoundationDB cluster to initialize...$(NC)"
@sleep 15
@$(DOCKER_COMPOSE) up -d fdb-init
@sleep 10
@echo "$(BLUE)Starting SeaweedFS with FoundationDB filer...$(NC)"
@$(DOCKER_COMPOSE) up -d seaweedfs
@echo "$(GREEN)✅ Test environment ready!$(NC)"
@echo "$(BLUE)Checking cluster status...$(NC)"
@make status
test: setup test-unit test-integration ## Run all tests
test-unit: ## Run unit tests for FoundationDB filer store
@echo "$(YELLOW)Running FoundationDB filer store unit tests...$(NC)"
@cd ../../ && go test -v -timeout=$(TEST_TIMEOUT) -tags foundationdb ./weed/filer/foundationdb/...
test-integration: ## Run integration tests with FoundationDB
@echo "$(YELLOW)Running FoundationDB integration tests...$(NC)"
@cd ../../ && go test -v -timeout=$(TEST_TIMEOUT) -tags foundationdb ./test/foundationdb/...
test-benchmark: ## Run performance benchmarks
@echo "$(YELLOW)Running FoundationDB performance benchmarks...$(NC)"
@cd ../../ && go test -v -timeout=$(TEST_TIMEOUT) -tags foundationdb -bench=. ./test/foundationdb/...
# ARM64 specific targets (Apple Silicon / M1/M2/M3 Macs)
setup-arm64: ## Set up ARM64-native FoundationDB cluster (official ARM64 packages)
@echo "$(YELLOW)Setting up ARM64-native FoundationDB cluster...$(NC)"
@echo "$(BLUE)Note: This will build FoundationDB from source - may take 10-15 minutes$(NC)"
@$(DOCKER_COMPOSE_ARM64) build
@$(DOCKER_COMPOSE_ARM64) up -d fdb1 fdb2 fdb3
@echo "$(BLUE)Waiting for FoundationDB cluster to initialize...$(NC)"
@sleep 20
@$(DOCKER_COMPOSE_ARM64) up -d fdb-init
@sleep 15
@echo "$(BLUE)Starting SeaweedFS with FoundationDB filer...$(NC)"
@$(DOCKER_COMPOSE_ARM64) up -d seaweedfs
@echo "$(GREEN)✅ ARM64 test environment ready!$(NC)"
test-arm64: setup-arm64 test-unit test-integration ## Run all tests with ARM64-native FoundationDB
setup-emulated: ## Set up FoundationDB cluster with x86 emulation on ARM64
@echo "$(YELLOW)Setting up FoundationDB cluster with x86 emulation...$(NC)"
@echo "$(BLUE)Note: Using Docker platform emulation - may be slower$(NC)"
@DOCKER_DEFAULT_PLATFORM=linux/amd64 $(DOCKER_COMPOSE) up -d fdb1 fdb2 fdb3
@echo "$(BLUE)Waiting for FoundationDB cluster to initialize...$(NC)"
@sleep 15
@DOCKER_DEFAULT_PLATFORM=linux/amd64 $(DOCKER_COMPOSE) up -d fdb-init
@sleep 10
@echo "$(BLUE)Starting SeaweedFS with FoundationDB filer...$(NC)"
@$(DOCKER_COMPOSE) up -d seaweedfs
@echo "$(GREEN)✅ Emulated test environment ready!$(NC)"
test-emulated: setup-emulated test-unit test-integration ## Run all tests with x86 emulation
clean-arm64: ## Clean up ARM64-specific containers and volumes
@echo "$(YELLOW)Cleaning up ARM64 test environment...$(NC)"
@$(DOCKER_COMPOSE_ARM64) down -v --remove-orphans 2>/dev/null || true
@echo "$(GREEN)✅ ARM64 environment cleaned up!$(NC)"
test-e2e: setup-complete ## Run end-to-end tests with SeaweedFS + FoundationDB
@echo "$(YELLOW)Running end-to-end FoundationDB tests...$(NC)"
@sleep 10 # Wait for SeaweedFS to be ready
@./test_fdb_s3.sh
setup-complete: ## Start complete environment and wait for readiness
@echo "$(YELLOW)Starting complete environment...$(NC)"
@$(DOCKER_COMPOSE) up -d
@echo "$(BLUE)Waiting for all services to be ready...$(NC)"
@./wait_for_services.sh
test-crud: ## Test basic CRUD operations
@echo "$(YELLOW)Testing CRUD operations...$(NC)"
@cd ../../ && go test -v -timeout=$(TEST_TIMEOUT) -tags foundationdb -run TestFoundationDBCRUD ./test/foundationdb/
test-concurrent: ## Test concurrent operations
@echo "$(YELLOW)Testing concurrent operations...$(NC)"
@cd ../../ && go test -v -timeout=$(TEST_TIMEOUT) -tags foundationdb -run TestFoundationDBConcurrent ./test/foundationdb/
clean: ## Clean up test environment (standard + ARM64)
@echo "$(YELLOW)Cleaning up test environment...$(NC)"
@$(DOCKER_COMPOSE) down -v --remove-orphans 2>/dev/null || true
@$(DOCKER_COMPOSE_ARM64) down -v --remove-orphans 2>/dev/null || true
@echo "$(GREEN)✅ Environment cleaned up!$(NC)"
logs: ## Show logs from all services
@$(DOCKER_COMPOSE) logs --tail=50 -f
logs-fdb: ## Show FoundationDB logs
@$(DOCKER_COMPOSE) logs --tail=100 -f fdb1 fdb2 fdb3 fdb-init
logs-seaweedfs: ## Show SeaweedFS logs
@$(DOCKER_COMPOSE) logs --tail=100 -f seaweedfs
status: ## Show status of all services
@echo "$(BLUE)Service Status:$(NC)"
@$(DOCKER_COMPOSE) ps
@echo ""
@echo "$(BLUE)FoundationDB Cluster Status:$(NC)"
@$(DOCKER_COMPOSE) exec fdb-init fdbcli --exec 'status' || echo "FoundationDB not accessible"
@echo ""
@echo "$(BLUE)SeaweedFS S3 Status:$(NC)"
@curl -s $(SEAWEEDFS_S3_ENDPOINT) || echo "SeaweedFS S3 not accessible"
debug: ## Debug test environment
@echo "$(BLUE)Debug Information:$(NC)"
@echo "FoundationDB Cluster File: $(FDB_CLUSTER_FILE)"
@echo "SeaweedFS S3 Endpoint: $(SEAWEEDFS_S3_ENDPOINT)"
@echo "Docker Compose Status:"
@$(DOCKER_COMPOSE) ps
@echo ""
@echo "Network connectivity:"
@docker network ls | grep foundationdb || echo "No FoundationDB network found"
@echo ""
@echo "FoundationDB cluster file:"
@$(DOCKER_COMPOSE) exec fdb1 cat /var/fdb/config/fdb.cluster || echo "Cannot read cluster file"
# Development targets
dev-fdb: ## Start only FoundationDB cluster for development
@$(DOCKER_COMPOSE) up -d fdb1 fdb2 fdb3 fdb-init
@sleep 15
dev-test: dev-fdb ## Quick test with just FoundationDB
@cd ../../ && go test -v -timeout=30s -tags foundationdb -run TestFoundationDBStore_Initialize ./weed/filer/foundationdb/
# Utility targets
install-deps: ## Install required dependencies
@echo "$(YELLOW)Installing test dependencies...$(NC)"
@which docker > /dev/null || (echo "$(RED)Docker not found$(NC)" && exit 1)
@which docker-compose > /dev/null || (echo "$(RED)Docker Compose not found$(NC)" && exit 1)
@which curl > /dev/null || (echo "$(RED)curl not found$(NC)" && exit 1)
@echo "$(GREEN)✅ All dependencies available$(NC)"
check-env: ## Check test environment setup
@echo "$(BLUE)Environment Check:$(NC)"
@echo "FDB_CLUSTER_FILE: $(FDB_CLUSTER_FILE)"
@echo "SEAWEEDFS_S3_ENDPOINT: $(SEAWEEDFS_S3_ENDPOINT)"
@echo "TEST_TIMEOUT: $(TEST_TIMEOUT)"
@make install-deps
# CI targets
ci-test: ## Run tests in CI environment
@echo "$(YELLOW)Running CI tests...$(NC)"
@make setup
@make test-unit
@make test-integration
@make clean
ci-e2e: ## Run end-to-end tests in CI
@echo "$(YELLOW)Running CI end-to-end tests...$(NC)"
@make setup-complete
@make test-e2e
@make clean
# Container build targets
build-container: ## Build SeaweedFS with FoundationDB in container
@echo "$(YELLOW)Building SeaweedFS with FoundationDB in container...$(NC)"
@docker-compose -f docker-compose.build.yml build seaweedfs-fdb-builder
@echo "$(GREEN)✅ Container build complete!$(NC)"
test-container: build-container ## Run containerized FoundationDB integration test
@echo "$(YELLOW)Running containerized FoundationDB integration test...$(NC)"
@docker-compose -f docker-compose.build.yml up --build --abort-on-container-exit
@echo "$(GREEN)🎉 Containerized integration test complete!$(NC)"
extract-binary: build-container ## Extract built SeaweedFS binary from container
@echo "$(YELLOW)Extracting SeaweedFS binary from container...$(NC)"
@mkdir -p bin
@docker run --rm -v $(PWD)/bin:/output seaweedfs:foundationdb sh -c "cp /usr/local/bin/weed /output/weed-foundationdb && echo '✅ Binary extracted to ./bin/weed-foundationdb'"
@echo "$(GREEN)✅ Binary available at ./bin/weed-foundationdb$(NC)"
clean-container: ## Clean up container builds
@echo "$(YELLOW)Cleaning up container builds...$(NC)"
@docker-compose -f docker-compose.build.yml down -v --remove-orphans || true
@docker rmi seaweedfs:foundationdb 2>/dev/null || true
@echo "$(GREEN)✅ Container cleanup complete!$(NC)"
# Simple test environment targets
test-simple: ## Run tests with simplified Docker environment
@echo "$(YELLOW)Running simplified FoundationDB integration tests...$(NC)"
@docker-compose -f docker-compose.simple.yml up --build --abort-on-container-exit
@echo "$(GREEN)🎉 Simple integration tests complete!$(NC)"
test-mock: ## Run mock tests (no FoundationDB required)
@echo "$(YELLOW)Running mock integration tests...$(NC)"
@go test -v ./validation_test.go ./mock_integration_test.go
@echo "$(GREEN)✅ Mock tests completed!$(NC)"
clean-simple: ## Clean up simple test environment
@echo "$(YELLOW)Cleaning up simple test environment...$(NC)"
@docker-compose -f docker-compose.simple.yml down -v --remove-orphans || true
@echo "$(GREEN)✅ Simple environment cleaned up!$(NC)"
# Combined test target - guaranteed to work
test-reliable: test-mock ## Run all tests that are guaranteed to work
@echo "$(GREEN)🎉 All reliable tests completed successfully!$(NC)"

134
test/foundationdb/README.ARM64.md

@ -0,0 +1,134 @@
# ARM64 Support for FoundationDB Integration
This document explains how to run FoundationDB integration tests on ARM64 systems (Apple Silicon M1/M2/M3 Macs).
## Problem
The official FoundationDB Docker images (`foundationdb/foundationdb:7.1.61`) are only available for `linux/amd64` architecture. When running on ARM64 systems, you'll encounter "Illegal instruction" errors. Apple now publishes official ARM64 Debian packages (starting with 7.4.5), which this repo downloads directly for native workflows.
## Solutions
We provide **three different approaches** to run FoundationDB on ARM64:
### 1. 🚀 ARM64 Native (Recommended for Development)
**Pros:** Native performance, no emulation overhead
**Cons:** Requires downloading ~100MB of FoundationDB packages on first run
```bash
# Build and run ARM64-native FoundationDB using the official ARM64 packages
make setup-arm64
make test-arm64
```
This approach:
- Downloads the official FoundationDB 7.4.5 ARM64 packages
- Takes ~2-3 minutes on first run (no source compilation)
- Provides native performance
- Uses `docker-compose.arm64.yml`
### 2. 🐳 x86 Emulation (Quick Setup)
**Pros:** Fast setup, uses official images
**Cons:** Slower runtime performance due to emulation
```bash
# Run x86 images with Docker emulation
make setup-emulated
make test-emulated
```
This approach:
- Uses Docker's x86 emulation
- Quick setup with official images
- May have performance overhead
- Uses standard `docker-compose.yml` with platform specification
### 3. 📝 Mock Testing (Fastest)
**Pros:** No dependencies, always works, fast execution
**Cons:** Doesn't test real FoundationDB integration
```bash
# Run mock tests (no FoundationDB cluster needed)
make test-mock
make test-reliable
```
## Files Overview
| File | Purpose |
|------|---------|
| `docker-compose.yml` | Standard setup with platform specification |
| `docker-compose.arm64.yml` | ARM64-native setup using official ARM64 packages |
| `Dockerfile.fdb-arm64` | ARM64 FoundationDB server image built from the official aarch64 packages |
| `README.ARM64.md` | This documentation |
## Performance Comparison
| Approach | Setup Time | Runtime Performance | Compatibility |
|----------|------------|-------------------|---------------|
| ARM64 Native | 2-3 min | ⭐⭐⭐⭐⭐ | ARM64 only |
| x86 Emulation | 2-3 min | ⭐⭐⭐ | ARM64 + x86 |
| Mock Testing | < 1 min | | Any platform |
## Quick Start Commands
```bash
# For ARM64 Mac users - choose your approach:
# Option 1: ARM64 native (best performance)
make clean && make setup-arm64
# Option 2: x86 emulation (faster setup)
make clean && make setup-emulated
# Option 3: Mock testing (no FDB needed)
make test-mock
# Clean up everything
make clean
```
## Troubleshooting
### Build Timeouts
If ARM64 builds timeout, increase Docker build timeout:
```bash
export DOCKER_BUILDKIT=1
export BUILDKIT_PROGRESS=plain
make setup-arm64
```
### Memory Issues
ARM64 builds require significant memory:
- Increase Docker memory limit to 8GB+
- Close other applications during build
### Platform Detection
Verify your platform:
```bash
docker info | grep -i arch
uname -m # Should show arm64
```
## CI/CD Recommendations
- **Development**: Use `make test-mock` for fast feedback
- **ARM64 CI**: Use `make setup-arm64`
- **x86 CI**: Use `make setup` (standard)
- **Multi-platform CI**: Run both depending on runner architecture
## Architecture Details
The ARM64 solution now uses the official FoundationDB 7.4.5 aarch64 packages:
1. **Builder Stage**: Downloads prebuilt FoundationDB client libraries
- Uses Debian-based Go image for compiling SeaweedFS
- Verifies SHA256 checksums before installing the deb package
2. **Runtime Stage**: Copies the already-installed artifacts
- SeaweedFS runtime layers reuse the validated libraries
- FoundationDB server containers install the prebuilt server + client packages with checksum verification
This keeps the setup time short while preserving native ARM64 performance and strong supply-chain guarantees.
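For reference, the checksum gate that the Dockerfiles implement with `sha256sum -c` looks roughly like this when expressed in Go; the package file name and digest below are placeholders, not pinned values:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// verifySHA256 mirrors the Dockerfiles' `sha256sum -c` step:
// hash the downloaded package and compare against the pinned digest.
func verifySHA256(path, expectedHex string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return err
	}
	if got := hex.EncodeToString(h.Sum(nil)); got != expectedHex {
		return fmt.Errorf("checksum mismatch: got %s, want %s", got, expectedHex)
	}
	return nil
}

func main() {
	// Placeholder file name and truncated digest - substitute the real values.
	if err := verifySHA256("foundationdb-clients_7.4.5-1_aarch64.deb", "f2176b86..."); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("checksum OK")
}
```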

372
test/foundationdb/README.md

@ -0,0 +1,372 @@
# FoundationDB Integration Testing
This directory contains integration tests and setup scripts for the FoundationDB filer store in SeaweedFS.
## Quick Start
```bash
# ✅ GUARANTEED TO WORK - Run reliable tests (no FoundationDB dependencies)
make test-reliable # Validation + Mock tests
# Run individual test types
make test-mock # Mock FoundationDB tests (always work)
go test -v ./validation_test.go # Package structure validation
# 🐳 FULL INTEGRATION (requires Docker + FoundationDB dependencies)
make setup # Start FoundationDB cluster + SeaweedFS
make test # Run all integration tests
make test-simple # Simple containerized test environment
# Clean up
make clean # Clean main environment
make clean-simple # Clean simple test environment
# 🍎 ARM64 / APPLE SILICON SUPPORT
make setup-arm64 # Native ARM64 FoundationDB (official ARM64 packages)
make setup-emulated # x86 emulation (faster setup)
make test-arm64 # Test with ARM64 native
make test-emulated # Test with x86 emulation
```
### Test Levels
1. **✅ Validation Tests** (`validation_test.go`) - Always work, no dependencies
2. **✅ Mock Tests** (`mock_integration_test.go`) - Test FoundationDB store logic with mocks
3. **⚠️ Real Integration Tests** (`foundationdb_*_test.go`) - Require actual FoundationDB cluster
### ARM64 / Apple Silicon Support
**🍎 For M1/M2/M3 Mac users:** FoundationDB's official Docker images are AMD64-only. We provide three solutions:
- **Native ARM64** (`make setup-arm64`) - Downloads official FoundationDB ARM64 packages and builds SeaweedFS natively (≈2-3 min setup, best performance)
- **x86 Emulation** (`make setup-emulated`) - Uses Docker emulation (fast setup, slower runtime)
- **Mock Testing** (`make test-mock`) - No FoundationDB needed (instant, tests logic only)
The ARM64 setup installs the official FoundationDB ARM64 packages and builds SeaweedFS natively using `docker-compose.arm64.yml` and dedicated ARM64 Dockerfiles. No pre-built Docker images required!
📖 **Detailed Guide:** See [README.ARM64.md](README.ARM64.md) for complete ARM64 documentation.
## Test Environment
The test environment includes:
- **3-node FoundationDB cluster** (fdb1, fdb2, fdb3) for realistic distributed testing
- **Database initialization service** (fdb-init) that configures the cluster
- **SeaweedFS service** configured to use the FoundationDB filer store
- **Automatic service orchestration** with proper startup dependencies
## Test Structure
### Integration Tests
#### `foundationdb_integration_test.go`
- Basic CRUD operations (Create, Read, Update, Delete)
- Directory operations and listing:
- `ListDirectoryEntries` - List all entries in a directory
- `ListDirectoryPrefixedEntries` - List entries matching a prefix
- `DeleteFolderChildren` - Bulk deletion of directory contents
- Transaction handling (begin, commit, rollback)
- Key-Value operations
- Large entry handling with compression
- Error scenarios and edge cases
**Note:** These tests operate at the filer store level, testing the metadata index operations that underpin S3 bucket listing and directory tree operations.
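For orientation, here is a minimal sketch of the shape of such a store-level listing check. The `metaStore` interface below is an illustrative stand-in, not the actual SeaweedFS filer store API:

```go
//go:build foundationdb

package foundationdb_test

import (
	"context"
	"testing"
)

// metaStore is a stand-in for the filer store handle used by the real tests.
type metaStore interface {
	InsertEntry(ctx context.Context, dir, name string, data []byte) error
	ListDirectoryEntries(ctx context.Context, dir string, limit int) ([]string, error)
}

// checkDirectoryListing writes a few entries and asserts that a directory
// range scan - the operation behind `aws s3 ls` - returns all of them.
func checkDirectoryListing(t *testing.T, store metaStore) {
	ctx := context.Background()
	dir := "/buckets/demo"

	names := []string{"a.txt", "b.txt", "c.txt"}
	for _, n := range names {
		if err := store.InsertEntry(ctx, dir, n, []byte("x")); err != nil {
			t.Fatalf("insert %s: %v", n, err)
		}
	}

	listed, err := store.ListDirectoryEntries(ctx, dir, 100)
	if err != nil {
		t.Fatalf("list %s: %v", dir, err)
	}
	if len(listed) != len(names) {
		t.Fatalf("expected %d entries, got %d", len(names), len(listed))
	}
}
```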
#### `foundationdb_concurrent_test.go`
- Concurrent insert operations across multiple goroutines
- Concurrent read/write operations on shared files
- Concurrent transaction handling with conflict resolution
- Concurrent directory operations
- Concurrent key-value operations
- Stress testing under load
#### `test_fdb_s3.sh` - End-to-End S3 Integration Tests
- **S3 bucket creation** - Create buckets via S3 API
- **S3 file upload** - Upload files to buckets
- **S3 bucket listing** (`aws s3 ls`) - **Validates listing operations work correctly**
- **S3 file download** - Retrieve and verify file contents
- **S3 file deletion** - Delete objects and verify removal
- **FoundationDB backend verification** - Confirms data is stored in FDB
- **Filer directory operations** - Direct filer API calls for directory creation/listing
**This test validates the complete S3 workflow including the listing operations that were problematic in earlier versions.**
#### Unit Tests (`weed/filer/foundationdb/foundationdb_store_test.go`)
- Store initialization and configuration
- Key generation and directory prefixes
- Error handling and validation
- Performance benchmarks
- Configuration validation
## Configuration
### Environment Variables
The tests can be configured using environment variables:
```bash
export FDB_CLUSTER_FILE=/var/fdb/config/fdb.cluster
export WEED_FOUNDATIONDB_ENABLED=true
export WEED_FOUNDATIONDB_API_VERSION=740
export WEED_FOUNDATIONDB_TIMEOUT=10s
```
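As a small sketch, a test harness could read these variables with fallbacks mirroring the defaults above (the helper below is illustrative, not existing code):

```go
package main

import (
	"fmt"
	"os"
)

// getenvDefault returns the environment value or a fallback when unset.
func getenvDefault(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

func main() {
	clusterFile := getenvDefault("FDB_CLUSTER_FILE", "/var/fdb/config/fdb.cluster")
	apiVersion := getenvDefault("WEED_FOUNDATIONDB_API_VERSION", "740")
	timeout := getenvDefault("WEED_FOUNDATIONDB_TIMEOUT", "10s")
	fmt.Printf("cluster=%s api=%s timeout=%s\n", clusterFile, apiVersion, timeout)
}
```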
#### Docker Compose Environment Variables
The `docker-compose.yml` file supports the following optional environment variables with sensible defaults:
```bash
# FoundationDB image (default: foundationdb/foundationdb:7.1.61)
export FOUNDATIONDB_IMAGE=foundationdb/foundationdb:7.1.61
# FoundationDB port (default: 4500)
export FDB_PORT=4500
# FoundationDB cluster file contents (default: docker:docker@fdb1:4500,fdb2:4500,fdb3:4500)
export FDB_CLUSTER_FILE_CONTENTS="docker:docker@fdb1:4500,fdb2:4500,fdb3:4500"
# SeaweedFS image (default: chrislusf/seaweedfs:latest)
export SEAWEEDFS_IMAGE=chrislusf/seaweedfs:latest
```
**Note:** These variables are optional. If not set, the docker-compose will use the default values shown above, allowing `docker-compose up` to work out-of-the-box without any `.env` file or manual configuration.
### Docker Compose Configuration
The `docker-compose.yml` sets up:
1. **FoundationDB Cluster**: 3 coordinating nodes with data distribution
2. **Database Configuration**: Single SSD storage class for testing
3. **SeaweedFS Integration**: Automatic filer store configuration
4. **Volume Persistence**: Data persists between container restarts
### Test Configuration Files
- `filer.toml`: FoundationDB filer store configuration
- `s3.json`: S3 API credentials for end-to-end testing
- `Makefile`: Test automation and environment management
## Test Commands
### Setup Commands
```bash
make setup # Full environment setup
make dev-fdb # Just FoundationDB cluster
make install-deps # Check dependencies
make check-env # Validate configuration
```
### Test Commands
```bash
make test # All tests
make test-unit # Go unit tests
make test-integration # Integration tests
make test-e2e # End-to-end S3 tests (includes S3 bucket listing)
make test-crud # Basic CRUD operations
make test-concurrent # Concurrency tests
make test-benchmark # Performance benchmarks
```
#### S3 and Listing Operation Coverage
**✅ Currently Tested:**
- **S3 bucket listing** (`aws s3 ls`) - Validated in `test_fdb_s3.sh`
- **Directory metadata listing** (`ListDirectoryEntries`) - Tested in `foundationdb_integration_test.go`
- **Prefix-based listing** (`ListDirectoryPrefixedEntries`) - Tested in `foundationdb_integration_test.go`
- **Filer directory operations** - Basic filer API calls in `test_fdb_s3.sh`
- **Metadata index operations** - All CRUD operations on directory entries
**⚠️ Limited/Future Coverage:**
- **Recursive tree operations** - Not explicitly tested (e.g., `weed filer.tree` command)
- **Large directory stress tests** - Listings with thousands of entries not currently benchmarked
- **Concurrent listing operations** - Multiple simultaneous directory listings under load
- **S3 ListObjectsV2 pagination** - Large bucket listing with continuation tokens
**Recommendation:** If experiencing issues with S3 listing operations in production, add stress tests for large directories and concurrent listing scenarios to validate FoundationDB's range scan performance at scale.
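A possible starting point for the concurrent-listing case, sketched against a placeholder `list` callback rather than the real store API:

```go
//go:build foundationdb

package foundationdb_test

import (
	"fmt"
	"sync"
	"testing"
)

// stressConcurrentListing runs `workers` simultaneous directory listings and
// fails if any of them errors or returns the wrong entry count. The `list`
// callback is a placeholder for whatever listing call the store exposes.
func stressConcurrentListing(t *testing.T, list func(dir string) (int, error), dir string, workers, want int) {
	var wg sync.WaitGroup
	errs := make(chan error, workers)

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			n, err := list(dir)
			if err != nil {
				errs <- err
				return
			}
			if n != want {
				errs <- fmt.Errorf("expected %d entries, got %d", want, n)
			}
		}()
	}
	wg.Wait()
	close(errs)

	for err := range errs {
		t.Error(err)
	}
}
```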
### Debug Commands
```bash
make status # Show service status
make logs # Show all logs
make logs-fdb # FoundationDB logs only
make logs-seaweedfs # SeaweedFS logs only
make debug # Debug information
```
### Cleanup Commands
```bash
make clean # Stop services and cleanup
```
## Test Data
Tests use isolated directory prefixes to avoid conflicts:
- **Unit tests**: `seaweedfs_test`
- **Integration tests**: `seaweedfs_test`
- **Concurrent tests**: `seaweedfs_concurrent_test_<timestamp>`
- **E2E tests**: `seaweedfs` (default)
## Expected Test Results
### Performance Expectations
Based on FoundationDB characteristics:
- **Single operations**: < 10ms latency
- **Batch operations**: High throughput with transactions
- **Concurrent operations**: Linear scaling with multiple clients
- **Directory listings**: Efficient range scans
### Reliability Expectations
- **ACID compliance**: All operations are atomic and consistent
- **Fault tolerance**: Automatic recovery from node failures
- **Concurrency**: No data corruption under concurrent load
- **Durability**: Data persists across restarts
## Troubleshooting
### Common Issues
1. **FoundationDB Connection Errors**
```bash
# Check cluster status
make status
# Verify cluster file
docker-compose exec fdb-init cat /var/fdb/config/fdb.cluster
```
2. **Test Failures**
```bash
# Check service logs
make logs-fdb
make logs-seaweedfs
# Run with verbose output
go test -v -tags foundationdb ./...
```
3. **Performance Issues**
```bash
# Check cluster health
docker-compose exec fdb-init fdbcli --exec 'status details'
# Monitor resource usage
docker stats
```
4. **Docker Issues**
```bash
# Clean Docker state
make clean
docker system prune -f
# Restart from scratch
make setup
```
### Debug Mode
Enable verbose logging for detailed troubleshooting:
```bash
# SeaweedFS debug logs
WEED_FILER_OPTIONS_V=2 make test
# FoundationDB debug logs (in fdbcli)
configure new single ssd; status details
```
### Manual Testing
For manual verification:
```bash
# Start environment
make dev-fdb
# Connect to FoundationDB
docker-compose exec fdb-init fdbcli
# FDB commands:
# status - Show cluster status
# getrange "" \xFF - Show all keys
# getrange seaweedfs seaweedfs\xFF - Show SeaweedFS keys
```
### Listing Operations Return Empty Results
**Symptoms:** Uploads succeed, direct file reads work, but listing operations (`aws s3 ls`, `s3.bucket.list`, `weed filer.ls/tree`) return no results.
**Test Coverage:** The `test_fdb_s3.sh` script explicitly tests S3 bucket listing (`aws s3 ls`) to catch this class of issue. Integration tests cover the underlying `ListDirectoryEntries` operations.
**Diagnostic steps:**
```bash
# 1. Verify writes reached FoundationDB
docker-compose exec fdb-init fdbcli
> getrange seaweedfs seaweedfs\xFF
# If no keys appear, writes aren't reaching the store
# 2. Check SeaweedFS volume assignment
curl http://localhost:9333/cluster/status
# Look for "AssignVolume" errors in logs:
make logs-seaweedfs | grep -i "assignvolume\|writable"
# 3. Verify filer health and configuration
curl http://localhost:8888/statistics/health
make logs-seaweedfs | grep -i "store\|foundationdb"
```
**Interpretation:**
- No SeaweedFS keys in FDB: Directory index writes failing; check filer logs for write errors
- AssignVolume errors: Volume assignment blocked; check master status and disk space
- Filer health errors: Configuration or connectivity issue; restart services and verify filer.toml
**Recovery:**
- If fresh data: restart services (`make clean && make setup`)
- If production data: ensure volume assignment works, check disk space on data nodes
## CI Integration
For continuous integration:
```bash
# CI test suite
make ci-test # Unit + integration tests
make ci-e2e # Full end-to-end test suite
```
The tests are designed to be reliable in CI environments with:
- Automatic service startup and health checking
- Timeout handling for slow CI systems
- Proper cleanup and resource management
- Detailed error reporting and logs
## Performance Benchmarks
Run performance benchmarks:
```bash
make test-benchmark
# Sample expected results:
# BenchmarkFoundationDBStore_InsertEntry-8 1000 1.2ms per op
# BenchmarkFoundationDBStore_FindEntry-8 5000 0.5ms per op
# BenchmarkFoundationDBStore_KvOperations-8 2000 0.8ms per op
```
## Contributing
When adding new tests:
1. Use the `//go:build foundationdb` build tag (see the skeleton after this list)
2. Follow the existing test structure and naming
3. Include both success and error scenarios
4. Add appropriate cleanup and resource management
5. Update this README with new test descriptions
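A minimal skeleton following these conventions (the package name and test body are placeholders):

```go
//go:build foundationdb

package foundationdb

import "testing"

// TestFoundationDBStore_Example is a placeholder skeleton only: it shows the
// build tag, naming convention, and cleanup expectations described above.
func TestFoundationDBStore_Example(t *testing.T) {
	// 1. Connect to the test cluster (skip gracefully if it is unavailable).
	// 2. Exercise both success and error scenarios.
	// 3. Clean up any keys written under the test directory prefix.
	t.Skip("replace this placeholder with a real test body")
}
```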

177
test/foundationdb/docker-compose.arm64.yml

@ -0,0 +1,177 @@
version: '3.9'
services:
# FoundationDB cluster nodes - ARM64 compatible
fdb1:
build:
context: .
dockerfile: Dockerfile.fdb-arm64
platforms:
- linux/arm64
platform: linux/arm64
environment:
- FDB_NETWORKING_MODE=host
- FDB_COORDINATOR_PORT=4500
- FDB_PORT=4501
ports:
- "4500:4500"
- "4501:4501"
volumes:
- fdb1_data:/var/fdb/data
- fdb_config:/var/fdb/config
networks:
- fdb_network
command: |
bash -c "
# Initialize cluster configuration
if [ ! -f /var/fdb/config/fdb.cluster ]; then
echo 'testing:testing@fdb1:4500,fdb2:4502,fdb3:4504' > /var/fdb/config/fdb.cluster
fi
# Start FDB processes
/usr/bin/fdbserver --config_path=/var/fdb/config --datadir=/var/fdb/data --logdir=/var/fdb/logs --public_address=fdb1:4501 --listen_address=0.0.0.0:4501 --coordination=fdb1:4500 &
/usr/bin/fdbserver --config_path=/var/fdb/config --datadir=/var/fdb/data --logdir=/var/fdb/logs --public_address=fdb1:4500 --listen_address=0.0.0.0:4500 --coordination=fdb1:4500 --class=coordination &
wait
"
fdb2:
build:
context: .
dockerfile: Dockerfile.fdb-arm64
platforms:
- linux/arm64
platform: linux/arm64
environment:
- FDB_NETWORKING_MODE=host
- FDB_COORDINATOR_PORT=4502
- FDB_PORT=4503
ports:
- "4502:4502"
- "4503:4503"
volumes:
- fdb2_data:/var/fdb/data
- fdb_config:/var/fdb/config
networks:
- fdb_network
depends_on:
- fdb1
command: |
bash -c "
# Wait for cluster file from fdb1
while [ ! -f /var/fdb/config/fdb.cluster ]; do sleep 1; done
# Start FDB processes
/usr/bin/fdbserver --config_path=/var/fdb/config --datadir=/var/fdb/data --logdir=/var/fdb/logs --public_address=fdb2:4503 --listen_address=0.0.0.0:4503 --coordination=fdb1:4500 &
/usr/bin/fdbserver --config_path=/var/fdb/config --datadir=/var/fdb/data --logdir=/var/fdb/logs --public_address=fdb2:4502 --listen_address=0.0.0.0:4502 --coordination=fdb1:4500 --class=coordination &
wait
"
fdb3:
build:
context: .
dockerfile: Dockerfile.fdb-arm64
platforms:
- linux/arm64
platform: linux/arm64
environment:
- FDB_NETWORKING_MODE=host
- FDB_COORDINATOR_PORT=4504
- FDB_PORT=4505
ports:
- "4504:4504"
- "4505:4505"
volumes:
- fdb3_data:/var/fdb/data
- fdb_config:/var/fdb/config
networks:
- fdb_network
depends_on:
- fdb1
command: |
bash -c "
# Wait for cluster file from fdb1
while [ ! -f /var/fdb/config/fdb.cluster ]; do sleep 1; done
# Start FDB processes
/usr/bin/fdbserver --config_path=/var/fdb/config --datadir=/var/fdb/data --logdir=/var/fdb/logs --public_address=fdb3:4505 --listen_address=0.0.0.0:4505 --coordination=fdb1:4500 &
/usr/bin/fdbserver --config_path=/var/fdb/config --datadir=/var/fdb/data --logdir=/var/fdb/logs --public_address=fdb3:4504 --listen_address=0.0.0.0:4504 --coordination=fdb1:4500 --class=coordination &
wait
"
# Initialize and configure the database
fdb-init:
build:
context: .
dockerfile: Dockerfile.fdb-arm64
platforms:
- linux/arm64
platform: linux/arm64
volumes:
- fdb_config:/var/fdb/config
networks:
- fdb_network
depends_on:
- fdb1
- fdb2
- fdb3
command: |
bash -c "
set -euo pipefail
# Wait for cluster file
while [ ! -f /var/fdb/config/fdb.cluster ]; do sleep 1; done
# Wait for cluster to be ready
sleep 10
# Configure database
echo 'Initializing FoundationDB database...'
fdbcli -C /var/fdb/config/fdb.cluster --exec 'configure new single ssd'
# Wait for configuration to complete
sleep 5
# Verify cluster status
fdbcli -C /var/fdb/config/fdb.cluster --exec 'status'
echo 'FoundationDB cluster initialization complete!'
"
# SeaweedFS service with FoundationDB filer
seaweedfs:
build:
context: ../..
dockerfile: test/foundationdb/Dockerfile.build.arm64
platforms:
- linux/arm64
platform: linux/arm64
ports:
- "9333:9333"
- "19333:19333"
- "8888:8888"
- "8333:8333"
- "18888:18888"
command: "server -ip=seaweedfs -filer -master.volumeSizeLimitMB=16 -volume.max=0 -volume -volume.preStopSeconds=1 -s3 -s3.config=/etc/seaweedfs/s3.json -s3.port=8333 -s3.allowEmptyFolder=false -s3.allowDeleteBucketNotEmpty=false"
volumes:
- ./s3.json:/etc/seaweedfs/s3.json
- ./filer.toml:/etc/seaweedfs/filer.toml
- fdb_config:/var/fdb/config
environment:
WEED_LEVELDB2_ENABLED: "false"
WEED_FOUNDATIONDB_ENABLED: "true"
WEED_FOUNDATIONDB_CLUSTER_FILE: "/var/fdb/config/fdb.cluster"
WEED_FOUNDATIONDB_API_VERSION: "740"
WEED_FOUNDATIONDB_TIMEOUT: "5s"
WEED_FOUNDATIONDB_MAX_RETRY_DELAY: "1s"
WEED_MASTER_VOLUME_GROWTH_COPY_1: 1
WEED_MASTER_VOLUME_GROWTH_COPY_OTHER: 1
networks:
- fdb_network
depends_on:
- fdb-init
volumes:
fdb1_data:
fdb2_data:
fdb3_data:
fdb_config:
networks:
fdb_network:
driver: bridge

101
test/foundationdb/docker-compose.build.yml

@ -0,0 +1,101 @@
version: '3.9'
services:
# Build SeaweedFS with FoundationDB support
seaweedfs-fdb-builder:
build:
context: ../.. # Build from seaweedfs root
dockerfile: test/foundationdb/Dockerfile.build
image: seaweedfs:foundationdb
container_name: seaweedfs-fdb-builder
volumes:
- seaweedfs-build:/build/output
command: >
sh -c "
echo '🔨 Building SeaweedFS with FoundationDB support...' &&
cp /usr/local/bin/weed /build/output/weed-foundationdb &&
cp /usr/local/bin/fdb_store_test /build/output/fdb_store_test &&
echo '✅ Build complete! Binaries saved to volume.' &&
/usr/local/bin/weed version &&
echo '📦 Available binaries:' &&
ls -la /build/output/
"
networks:
- fdb_network
# FoundationDB cluster for testing
fdb1:
image: foundationdb/foundationdb:7.1.61
hostname: fdb1
environment:
- FDB_NETWORKING_MODE=container
networks:
- fdb_network
volumes:
- fdb_data1:/var/fdb/data
- fdb_config:/var/fdb/config
command: >
bash -c "
echo 'docker:docker@fdb1:4500' > /var/fdb/config/fdb.cluster &&
/usr/bin/fdbserver --config_path=/var/fdb/config --datadir=/var/fdb/data --logdir=/var/fdb/logs --public_address=fdb1:4500 --listen_address=0.0.0.0:4500 --class=storage
"
# FoundationDB client for database initialization
fdb-init:
image: foundationdb/foundationdb:7.1.61
depends_on:
- fdb1
volumes:
- fdb_config:/var/fdb/config
networks:
- fdb_network
command: >
bash -c "
sleep 10 &&
echo '🔧 Initializing FoundationDB...' &&
fdbcli -C /var/fdb/config/fdb.cluster --exec 'configure new single memory' &&
fdbcli -C /var/fdb/config/fdb.cluster --exec 'status' &&
echo '✅ FoundationDB initialized!'
"
# Test the built SeaweedFS with FoundationDB
seaweedfs-test:
image: seaweedfs:foundationdb
depends_on:
fdb-init:
condition: service_completed_successfully
seaweedfs-fdb-builder:
condition: service_completed_successfully
volumes:
- fdb_config:/var/fdb/config
- seaweedfs-build:/build/output
networks:
- fdb_network
environment:
WEED_FOUNDATIONDB_ENABLED: "true"
WEED_FOUNDATIONDB_CLUSTER_FILE: "/var/fdb/config/fdb.cluster"
WEED_FOUNDATIONDB_API_VERSION: "740"
WEED_FOUNDATIONDB_DIRECTORY_PREFIX: "seaweedfs_test"
command: >
bash -c "
echo '🧪 Testing FoundationDB integration...' &&
sleep 5 &&
echo '📋 Cluster file contents:' &&
cat /var/fdb/config/fdb.cluster &&
echo '🚀 Starting SeaweedFS server with FoundationDB...' &&
/usr/local/bin/weed server -filer -master.volumeSizeLimitMB=16 -volume.max=0 &
SERVER_PID=$! &&
sleep 10 &&
echo '✅ SeaweedFS started successfully with FoundationDB!' &&
echo '🏁 Integration test passed!' &&
kill $SERVER_PID
"
volumes:
fdb_data1:
fdb_config:
seaweedfs-build:
networks:
fdb_network:
driver: bridge

100
test/foundationdb/docker-compose.simple.yml

@ -0,0 +1,100 @@
version: '3.9'
services:
# Simple single-node FoundationDB for testing
foundationdb:
image: foundationdb/foundationdb:7.1.61
platform: linux/amd64 # Force amd64 platform
container_name: foundationdb-single
environment:
- FDB_NETWORKING_MODE=host
ports:
- "4500:4500"
volumes:
- fdb_data:/var/fdb/data
- fdb_config:/var/fdb/config
networks:
- test_network
healthcheck:
test: ["CMD", "fdbcli", "-C", "/var/fdb/config/fdb.cluster", "--exec", "status"]
interval: 5s
timeout: 3s
retries: 10
start_period: 20s
command: >
bash -c "
echo 'Starting FoundationDB single node...' &&
echo 'docker:docker@foundationdb:4500' > /var/fdb/config/fdb.cluster &&
# Start the server
/usr/bin/fdbserver --config_path=/var/fdb/config --datadir=/var/fdb/data --logdir=/var/fdb/logs --public_address=foundationdb:4500 --listen_address=0.0.0.0:4500 --class=storage &
# Wait a moment for server to start
sleep 10 &&
# Configure the database
echo 'Configuring database...' &&
fdbcli -C /var/fdb/config/fdb.cluster --exec 'configure new single memory' &&
echo 'FoundationDB ready!' &&
fdbcli -C /var/fdb/config/fdb.cluster --exec 'status' &&
# Keep running
wait
"
# Test runner with Go environment and FoundationDB dependencies
test-runner:
build:
context: ../..
dockerfile: test/foundationdb/Dockerfile.test
depends_on:
foundationdb:
condition: service_healthy
volumes:
- fdb_config:/var/fdb/config
- test_results:/test/results
networks:
- test_network
environment:
- FDB_CLUSTER_FILE=/var/fdb/config/fdb.cluster
- WEED_FOUNDATIONDB_ENABLED=true
- WEED_FOUNDATIONDB_CLUSTER_FILE=/var/fdb/config/fdb.cluster
- WEED_FOUNDATIONDB_API_VERSION=740
command: >
bash -c "
echo 'FoundationDB is ready, starting tests...' &&
echo 'Testing FoundationDB connection...' &&
fdbcli -C /var/fdb/config/fdb.cluster --exec 'status' &&
echo 'Running integration tests...' &&
cd /app/test/foundationdb &&
# Run validation tests (always work)
echo '=== Running Validation Tests ===' &&
go test -v ./validation_test.go &&
# Run mock tests (always work)
echo '=== Running Mock Integration Tests ===' &&
go test -v ./mock_integration_test.go &&
# Try to run actual integration tests with FoundationDB
echo '=== Running FoundationDB Integration Tests ===' &&
go test -tags foundationdb -v . 2>&1 | tee /test/results/integration_test_results.log &&
echo 'All tests completed!' &&
echo 'Results saved to /test/results/' &&
# Keep container running for debugging
tail -f /dev/null
"
volumes:
fdb_data:
fdb_config:
test_results:
networks:
test_network:
driver: bridge

128
test/foundationdb/docker-compose.yml

@ -0,0 +1,128 @@
services:
fdb1:
image: ${FOUNDATIONDB_IMAGE:-foundationdb/foundationdb:7.1.61}
environment:
- FDB_CLUSTER_FILE_CONTENTS
- FDB_NETWORKING_MODE=container
- FDB_COORDINATOR_PORT=${FDB_PORT:-4500}
- FDB_PORT=${FDB_PORT:-4500}
networks:
- fdb_network
healthcheck:
test: [ "CMD", "nc", "-z", "127.0.0.1", "4500" ]
interval: 5s
timeout: 5s
retries: 60
fdb2:
image: ${FOUNDATIONDB_IMAGE:-foundationdb/foundationdb:7.1.61}
environment:
- FDB_CLUSTER_FILE_CONTENTS
- FDB_NETWORKING_MODE=container
- FDB_COORDINATOR_PORT=${FDB_PORT:-4500}
- FDB_PORT=${FDB_PORT:-4500}
networks:
- fdb_network
healthcheck:
test: [ "CMD", "nc", "-z", "127.0.0.1", "4500" ]
interval: 5s
timeout: 5s
retries: 60
fdb3:
image: ${FOUNDATIONDB_IMAGE:-foundationdb/foundationdb:7.1.61}
environment:
- FDB_CLUSTER_FILE_CONTENTS
- FDB_NETWORKING_MODE=container
- FDB_COORDINATOR_PORT=${FDB_PORT:-4500}
- FDB_PORT=${FDB_PORT:-4500}
networks:
- fdb_network
healthcheck:
test: [ "CMD", "nc", "-z", "127.0.0.1", "4500" ]
interval: 5s
timeout: 5s
retries: 60
# Initialize and configure the database
fdb-init:
image: ${FOUNDATIONDB_IMAGE:-foundationdb/foundationdb:7.1.61}
configs:
- target: /var/fdb/config/fdb.cluster
source: fdb.cluster
environment:
- FDB_CLUSTER_FILE=/var/fdb/config/fdb.cluster
networks:
- fdb_network
depends_on:
fdb1:
condition: service_healthy
fdb2:
condition: service_healthy
fdb3:
condition: service_healthy
entrypoint: |
bash -c "
set -o errexit
# Wait for cluster to be ready
sleep 10
# Configure database
echo 'Initializing FoundationDB database...'
if ! fdbcli --exec 'configure new single ssd' >/tmp/fdbcli.out 2>&1; then
if ! grep -qi 'ERROR: Database already exists!' /tmp/fdbcli.out; then
echo 'ERROR: Database initialization failed!' >&2
cat /tmp/fdbcli.out >&2
exit 1
fi
fi
# Wait for configuration to complete
sleep 5
# Verify cluster status
fdbcli --exec 'status'
echo 'FoundationDB cluster initialization complete!'
"
# SeaweedFS service with FoundationDB filer
seaweedfs:
image: ${SEAWEEDFS_IMAGE:-chrislusf/seaweedfs:latest}
depends_on:
fdb-init:
condition: service_completed_successfully
networks:
- fdb_network
ports:
- "9333:9333"
- "19333:19333"
- "8888:8888"
- "8333:8333"
- "18888:18888"
configs:
- target: /var/fdb/config/fdb.cluster
source: fdb.cluster
volumes:
- ./s3.json:/etc/seaweedfs/s3.json
- ./filer.toml:/etc/seaweedfs/filer.toml
environment:
- WEED_LEVELDB2_ENABLED
- WEED_FOUNDATIONDB_ENABLED
- WEED_FOUNDATIONDB_CLUSTER_FILE
- WEED_FOUNDATIONDB_API_VERSION
- WEED_FOUNDATIONDB_TIMEOUT
- WEED_FOUNDATIONDB_MAX_RETRY_DELAY
- WEED_MASTER_VOLUME_GROWTH_COPY_1=1
- WEED_MASTER_VOLUME_GROWTH_COPY_OTHER=1
command: "weed server -ip=seaweedfs -filer -master.volumeSizeLimitMB=16 -volume.max=0 -volume -volume.preStopSeconds=1 -s3 -s3.config=/etc/seaweedfs/s3.json -s3.port=8333 -s3.allowEmptyFolder=false -s3.allowDeleteBucketNotEmpty=false"
configs:
fdb.cluster:
content: |
${FDB_CLUSTER_FILE_CONTENTS:-docker:docker@fdb1:4500,fdb2:4500,fdb3:4500}
networks:
fdb_network:
driver: bridge

19
test/foundationdb/filer.toml

@ -0,0 +1,19 @@
# FoundationDB Filer Configuration
[foundationdb]
enabled = true
cluster_file = "/var/fdb/config/fdb.cluster"
api_version = 740
timeout = "5s"
max_retry_delay = "1s"
directory_prefix = "seaweedfs"
# For testing different configurations
[foundationdb.test]
enabled = false
cluster_file = "/var/fdb/config/fdb.cluster"
api_version = 740
timeout = "10s"
max_retry_delay = "2s"
directory_prefix = "seaweedfs_test"
location = "/test"
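For reference, a minimal Go sketch (not part of this change) of how the [foundationdb] keys above are consumed at runtime; it mirrors the createTestStore pattern from the integration tests below, and the standalone main wrapper and log usage are only illustrative:
//go:build foundationdb

package main

import (
	"log"

	"github.com/seaweedfs/seaweedfs/weed/filer/foundationdb"
	"github.com/seaweedfs/seaweedfs/weed/util"
)

func main() {
	// Mirror the [foundationdb] section of filer.toml via the viper-backed config.
	config := util.GetViper()
	config.Set("foundationdb.cluster_file", "/var/fdb/config/fdb.cluster")
	config.Set("foundationdb.api_version", 740)
	config.Set("foundationdb.timeout", "5s")
	config.Set("foundationdb.max_retry_delay", "1s")
	config.Set("foundationdb.directory_prefix", "seaweedfs")

	// Initialize and shut down the store the same way the tests do.
	store := &foundationdb.FoundationDBStore{}
	if err := store.Initialize(config, "foundationdb."); err != nil {
		log.Fatalf("failed to initialize FoundationDB store: %v", err)
	}
	defer store.Shutdown()
}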

445
test/foundationdb/foundationdb_concurrent_test.go

@ -0,0 +1,445 @@
//go:build foundationdb
// +build foundationdb
package foundationdb
import (
"context"
"fmt"
"os"
"sync"
"testing"
"time"
"github.com/seaweedfs/seaweedfs/weed/filer"
"github.com/seaweedfs/seaweedfs/weed/filer/foundationdb"
"github.com/seaweedfs/seaweedfs/weed/util"
)
func TestFoundationDBStore_ConcurrentInserts(t *testing.T) {
store := createTestStore(t)
defer store.Shutdown()
ctx := context.Background()
numGoroutines := 10
entriesPerGoroutine := 100
var wg sync.WaitGroup
errors := make(chan error, numGoroutines*entriesPerGoroutine)
// Launch concurrent insert operations
for g := 0; g < numGoroutines; g++ {
wg.Add(1)
go func(goroutineID int) {
defer wg.Done()
for i := 0; i < entriesPerGoroutine; i++ {
entry := &filer.Entry{
FullPath: util.NewFullPath("/concurrent", fmt.Sprintf("g%d_file%d.txt", goroutineID, i)),
Attr: filer.Attr{
Mode: 0644,
Uid: uint32(goroutineID),
Gid: 1000,
Mtime: time.Now(),
},
}
err := store.InsertEntry(ctx, entry)
if err != nil {
errors <- fmt.Errorf("goroutine %d, entry %d: %v", goroutineID, i, err)
return
}
}
}(g)
}
wg.Wait()
close(errors)
// Check for errors
for err := range errors {
t.Errorf("Concurrent insert error: %v", err)
}
// Verify all entries were inserted
expectedTotal := numGoroutines * entriesPerGoroutine
actualCount := 0
_, err := store.ListDirectoryEntries(ctx, "/concurrent", "", true, 10000, func(entry *filer.Entry) bool {
actualCount++
return true
})
if err != nil {
t.Fatalf("ListDirectoryEntries failed: %v", err)
}
if actualCount != expectedTotal {
t.Errorf("Expected %d entries, found %d", expectedTotal, actualCount)
}
}
func TestFoundationDBStore_ConcurrentReadsAndWrites(t *testing.T) {
store := createTestStore(t)
defer store.Shutdown()
ctx := context.Background()
numReaders := 5
numWriters := 5
operationsPerGoroutine := 50
testFile := util.FullPath("/concurrent/rw_test_file.txt")
// Insert initial file
initialEntry := &filer.Entry{
FullPath: testFile,
Attr: filer.Attr{
Mode: 0644,
Uid: 1000,
Gid: 1000,
Mtime: time.Now(),
},
}
err := store.InsertEntry(ctx, initialEntry)
if err != nil {
t.Fatalf("Initial InsertEntry failed: %v", err)
}
var wg sync.WaitGroup
errors := make(chan error, (numReaders+numWriters)*operationsPerGoroutine)
// Launch reader goroutines
for r := 0; r < numReaders; r++ {
wg.Add(1)
go func(readerID int) {
defer wg.Done()
for i := 0; i < operationsPerGoroutine; i++ {
_, err := store.FindEntry(ctx, testFile)
if err != nil {
errors <- fmt.Errorf("reader %d, operation %d: %v", readerID, i, err)
return
}
// Small delay to allow interleaving with writes
time.Sleep(1 * time.Millisecond)
}
}(r)
}
// Launch writer goroutines
for w := 0; w < numWriters; w++ {
wg.Add(1)
go func(writerID int) {
defer wg.Done()
for i := 0; i < operationsPerGoroutine; i++ {
entry := &filer.Entry{
FullPath: testFile,
Attr: filer.Attr{
Mode: 0644,
Uid: uint32(writerID + 1000),
Gid: uint32(i),
Mtime: time.Now(),
},
}
err := store.UpdateEntry(ctx, entry)
if err != nil {
errors <- fmt.Errorf("writer %d, operation %d: %v", writerID, i, err)
return
}
// Small delay to allow interleaving with reads
time.Sleep(1 * time.Millisecond)
}
}(w)
}
wg.Wait()
close(errors)
// Check for errors
for err := range errors {
t.Errorf("Concurrent read/write error: %v", err)
}
// Verify final state
finalEntry, err := store.FindEntry(ctx, testFile)
if err != nil {
t.Fatalf("Final FindEntry failed: %v", err)
}
if finalEntry.FullPath != testFile {
t.Errorf("Expected final path %s, got %s", testFile, finalEntry.FullPath)
}
}
func TestFoundationDBStore_ConcurrentTransactions(t *testing.T) {
store := createTestStore(t)
defer store.Shutdown()
ctx := context.Background()
numTransactions := 5
entriesPerTransaction := 10
var wg sync.WaitGroup
errors := make(chan error, numTransactions)
successfulTx := make(chan int, numTransactions)
// Launch concurrent transactions
for tx := 0; tx < numTransactions; tx++ {
wg.Add(1)
go func(txID int) {
defer wg.Done()
// Note: FoundationDB has optimistic concurrency control
// Some transactions may need to retry due to conflicts
maxRetries := 3
for attempt := 0; attempt < maxRetries; attempt++ {
txCtx, err := store.BeginTransaction(ctx)
if err != nil {
if attempt == maxRetries-1 {
errors <- fmt.Errorf("tx %d: failed to begin after %d attempts: %v", txID, maxRetries, err)
}
time.Sleep(time.Duration(attempt+1) * 10 * time.Millisecond)
continue
}
// Insert multiple entries in transaction
success := true
for i := 0; i < entriesPerTransaction; i++ {
entry := &filer.Entry{
FullPath: util.NewFullPath("/transactions", fmt.Sprintf("tx%d_file%d.txt", txID, i)),
Attr: filer.Attr{
Mode: 0644,
Uid: uint32(txID),
Gid: uint32(i),
Mtime: time.Now(),
},
}
err = store.InsertEntry(txCtx, entry)
if err != nil {
errors <- fmt.Errorf("tx %d, entry %d: insert failed: %v", txID, i, err)
store.RollbackTransaction(txCtx)
success = false
break
}
}
if success {
err = store.CommitTransaction(txCtx)
if err != nil {
if attempt == maxRetries-1 {
errors <- fmt.Errorf("tx %d: commit failed after %d attempts: %v", txID, maxRetries, err)
}
time.Sleep(time.Duration(attempt+1) * 10 * time.Millisecond)
continue
}
successfulTx <- txID
return
}
}
}(tx)
}
wg.Wait()
close(errors)
close(successfulTx)
// Check for errors
for err := range errors {
t.Errorf("Concurrent transaction error: %v", err)
}
// Count successful transactions
successCount := 0
successfulTxIDs := make([]int, 0)
for txID := range successfulTx {
successCount++
successfulTxIDs = append(successfulTxIDs, txID)
}
t.Logf("Successful transactions: %d/%d (IDs: %v)", successCount, numTransactions, successfulTxIDs)
// Verify entries from successful transactions
totalExpectedEntries := successCount * entriesPerTransaction
actualCount := 0
_, err := store.ListDirectoryEntries(ctx, "/transactions", "", true, 10000, func(entry *filer.Entry) bool {
actualCount++
return true
})
if err != nil {
t.Fatalf("ListDirectoryEntries failed: %v", err)
}
if actualCount != totalExpectedEntries {
t.Errorf("Expected %d entries from successful transactions, found %d", totalExpectedEntries, actualCount)
}
}
func TestFoundationDBStore_ConcurrentDirectoryOperations(t *testing.T) {
store := createTestStore(t)
defer store.Shutdown()
ctx := context.Background()
numWorkers := 10
directoriesPerWorker := 20
filesPerDirectory := 5
var wg sync.WaitGroup
errors := make(chan error, numWorkers*directoriesPerWorker*filesPerDirectory)
// Launch workers that create directories with files
for w := 0; w < numWorkers; w++ {
wg.Add(1)
go func(workerID int) {
defer wg.Done()
for d := 0; d < directoriesPerWorker; d++ {
dirPath := fmt.Sprintf("/worker%d/dir%d", workerID, d)
// Create files in directory
for f := 0; f < filesPerDirectory; f++ {
entry := &filer.Entry{
FullPath: util.NewFullPath(dirPath, fmt.Sprintf("file%d.txt", f)),
Attr: filer.Attr{
Mode: 0644,
Uid: uint32(workerID),
Gid: uint32(d),
Mtime: time.Now(),
},
}
err := store.InsertEntry(ctx, entry)
if err != nil {
errors <- fmt.Errorf("worker %d, dir %d, file %d: %v", workerID, d, f, err)
return
}
}
}
}(w)
}
wg.Wait()
close(errors)
// Check for errors
for err := range errors {
t.Errorf("Concurrent directory operation error: %v", err)
}
// Verify directory structure
for w := 0; w < numWorkers; w++ {
for d := 0; d < directoriesPerWorker; d++ {
dirPath := fmt.Sprintf("/worker%d/dir%d", w, d)
fileCount := 0
_, err := store.ListDirectoryEntries(ctx, util.FullPath(dirPath), "", true, 1000, func(entry *filer.Entry) bool {
fileCount++
return true
})
if err != nil {
t.Errorf("ListDirectoryEntries failed for %s: %v", dirPath, err)
continue
}
if fileCount != filesPerDirectory {
t.Errorf("Expected %d files in %s, found %d", filesPerDirectory, dirPath, fileCount)
}
}
}
}
func TestFoundationDBStore_ConcurrentKVOperations(t *testing.T) {
store := createTestStore(t)
defer store.Shutdown()
ctx := context.Background()
numWorkers := 8
operationsPerWorker := 100
var wg sync.WaitGroup
errors := make(chan error, numWorkers*operationsPerWorker)
// Launch workers performing KV operations
for w := 0; w < numWorkers; w++ {
wg.Add(1)
go func(workerID int) {
defer wg.Done()
for i := 0; i < operationsPerWorker; i++ {
key := []byte(fmt.Sprintf("worker%d_key%d", workerID, i))
value := []byte(fmt.Sprintf("worker%d_value%d_timestamp%d", workerID, i, time.Now().UnixNano()))
// Put operation
err := store.KvPut(ctx, key, value)
if err != nil {
errors <- fmt.Errorf("worker %d, operation %d: KvPut failed: %v", workerID, i, err)
continue
}
// Get operation
retrievedValue, err := store.KvGet(ctx, key)
if err != nil {
errors <- fmt.Errorf("worker %d, operation %d: KvGet failed: %v", workerID, i, err)
continue
}
if string(retrievedValue) != string(value) {
errors <- fmt.Errorf("worker %d, operation %d: value mismatch", workerID, i)
continue
}
// Delete operation (for some keys)
if i%5 == 0 {
err = store.KvDelete(ctx, key)
if err != nil {
errors <- fmt.Errorf("worker %d, operation %d: KvDelete failed: %v", workerID, i, err)
}
}
}
}(w)
}
wg.Wait()
close(errors)
// Check for errors
errorCount := 0
for err := range errors {
t.Errorf("Concurrent KV operation error: %v", err)
errorCount++
}
if errorCount > 0 {
t.Errorf("Total errors in concurrent KV operations: %d", errorCount)
}
}
func createTestStore(t *testing.T) *foundationdb.FoundationDBStore {
// Skip test if FoundationDB cluster file doesn't exist
clusterFile := os.Getenv("FDB_CLUSTER_FILE")
if clusterFile == "" {
clusterFile = "/var/fdb/config/fdb.cluster"
}
if _, err := os.Stat(clusterFile); os.IsNotExist(err) {
t.Skip("FoundationDB cluster file not found, skipping test")
}
config := util.GetViper()
config.Set("foundationdb.cluster_file", clusterFile)
config.Set("foundationdb.api_version", 740)
config.Set("foundationdb.timeout", "10s")
config.Set("foundationdb.max_retry_delay", "2s")
config.Set("foundationdb.directory_prefix", fmt.Sprintf("seaweedfs_concurrent_test_%d", time.Now().UnixNano()))
store := &foundationdb.FoundationDBStore{}
err := store.Initialize(config, "foundationdb.")
if err != nil {
t.Fatalf("Failed to initialize FoundationDB store: %v", err)
}
return store
}

370
test/foundationdb/foundationdb_integration_test.go

@ -0,0 +1,370 @@
//go:build foundationdb
// +build foundationdb
package foundationdb
import (
"context"
"fmt"
"os"
"testing"
"time"
"github.com/seaweedfs/seaweedfs/weed/filer"
"github.com/seaweedfs/seaweedfs/weed/filer/foundationdb"
"github.com/seaweedfs/seaweedfs/weed/pb/filer_pb"
"github.com/seaweedfs/seaweedfs/weed/util"
)
func TestFoundationDBStore_BasicOperations(t *testing.T) {
store := createTestStore(t)
defer store.Shutdown()
ctx := context.Background()
// Test InsertEntry
entry := &filer.Entry{
FullPath: "/test/file1.txt",
Attr: filer.Attr{
Mode: 0644,
Uid: 1000,
Gid: 1000,
Mtime: time.Now(),
},
}
err := store.InsertEntry(ctx, entry)
if err != nil {
t.Fatalf("InsertEntry failed: %v", err)
}
// Test FindEntry
foundEntry, err := store.FindEntry(ctx, "/test/file1.txt")
if err != nil {
t.Fatalf("FindEntry failed: %v", err)
}
if foundEntry.FullPath != entry.FullPath {
t.Errorf("Expected path %s, got %s", entry.FullPath, foundEntry.FullPath)
}
if foundEntry.Attr.Mode != entry.Attr.Mode {
t.Errorf("Expected mode %o, got %o", entry.Attr.Mode, foundEntry.Attr.Mode)
}
// Test UpdateEntry
foundEntry.Attr.Mode = 0755
err = store.UpdateEntry(ctx, foundEntry)
if err != nil {
t.Fatalf("UpdateEntry failed: %v", err)
}
updatedEntry, err := store.FindEntry(ctx, "/test/file1.txt")
if err != nil {
t.Fatalf("FindEntry after update failed: %v", err)
}
if updatedEntry.Attr.Mode != 0755 {
t.Errorf("Expected updated mode 0755, got %o", updatedEntry.Attr.Mode)
}
// Test DeleteEntry
err = store.DeleteEntry(ctx, "/test/file1.txt")
if err != nil {
t.Fatalf("DeleteEntry failed: %v", err)
}
_, err = store.FindEntry(ctx, "/test/file1.txt")
if err == nil {
t.Error("Expected entry to be deleted, but it was found")
}
if err != filer_pb.ErrNotFound {
t.Errorf("Expected ErrNotFound, got %v", err)
}
}
func TestFoundationDBStore_DirectoryOperations(t *testing.T) {
store := createTestStore(t)
defer store.Shutdown()
ctx := context.Background()
// Create multiple entries in a directory
testDir := util.FullPath("/test/dir")
files := []string{"file1.txt", "file2.txt", "file3.txt", "subdir/"}
for _, fileName := range files {
entry := &filer.Entry{
FullPath: util.NewFullPath(string(testDir), fileName),
Attr: filer.Attr{
Mode: 0644,
Uid: 1000,
Gid: 1000,
Mtime: time.Now(),
},
}
if fileName == "subdir/" {
entry.Attr.Mode = 0755 | os.ModeDir
}
err := store.InsertEntry(ctx, entry)
if err != nil {
t.Fatalf("InsertEntry failed for %s: %v", fileName, err)
}
}
// Test ListDirectoryEntries
var listedFiles []string
lastFileName, err := store.ListDirectoryEntries(ctx, testDir, "", true, 100, func(entry *filer.Entry) bool {
listedFiles = append(listedFiles, entry.Name())
return true
})
if err != nil {
t.Fatalf("ListDirectoryEntries failed: %v", err)
}
t.Logf("Last file name: %s", lastFileName)
t.Logf("Listed files: %v", listedFiles)
if len(listedFiles) != len(files) {
t.Errorf("Expected %d files, got %d", len(files), len(listedFiles))
}
// Test ListDirectoryPrefixedEntries
var prefixedFiles []string
_, err = store.ListDirectoryPrefixedEntries(ctx, testDir, "", true, 100, "file", func(entry *filer.Entry) bool {
prefixedFiles = append(prefixedFiles, entry.Name())
return true
})
if err != nil {
t.Fatalf("ListDirectoryPrefixedEntries failed: %v", err)
}
expectedPrefixedCount := 3 // file1.txt, file2.txt, file3.txt
if len(prefixedFiles) != expectedPrefixedCount {
t.Errorf("Expected %d prefixed files, got %d: %v", expectedPrefixedCount, len(prefixedFiles), prefixedFiles)
}
// Test DeleteFolderChildren
err = store.DeleteFolderChildren(ctx, testDir)
if err != nil {
t.Fatalf("DeleteFolderChildren failed: %v", err)
}
// Verify children are deleted
var remainingFiles []string
_, err = store.ListDirectoryEntries(ctx, testDir, "", true, 100, func(entry *filer.Entry) bool {
remainingFiles = append(remainingFiles, entry.Name())
return true
})
if err != nil {
t.Fatalf("ListDirectoryEntries after delete failed: %v", err)
}
if len(remainingFiles) != 0 {
t.Errorf("Expected no files after DeleteFolderChildren, got %d: %v", len(remainingFiles), remainingFiles)
}
}
func TestFoundationDBStore_TransactionOperations(t *testing.T) {
store := createTestStore(t)
defer store.Shutdown()
ctx := context.Background()
// Begin transaction
txCtx, err := store.BeginTransaction(ctx)
if err != nil {
t.Fatalf("BeginTransaction failed: %v", err)
}
// Insert entry in transaction
entry := &filer.Entry{
FullPath: "/test/tx_file.txt",
Attr: filer.Attr{
Mode: 0644,
Uid: 1000,
Gid: 1000,
Mtime: time.Now(),
},
}
err = store.InsertEntry(txCtx, entry)
if err != nil {
t.Fatalf("InsertEntry in transaction failed: %v", err)
}
// Entry should not be visible outside transaction yet
_, err = store.FindEntry(ctx, "/test/tx_file.txt")
if err == nil {
t.Error("Entry should not be visible before transaction commit")
}
// Commit transaction
err = store.CommitTransaction(txCtx)
if err != nil {
t.Fatalf("CommitTransaction failed: %v", err)
}
// Entry should now be visible
foundEntry, err := store.FindEntry(ctx, "/test/tx_file.txt")
if err != nil {
t.Fatalf("FindEntry after commit failed: %v", err)
}
if foundEntry.FullPath != entry.FullPath {
t.Errorf("Expected path %s, got %s", entry.FullPath, foundEntry.FullPath)
}
// Test rollback
txCtx2, err := store.BeginTransaction(ctx)
if err != nil {
t.Fatalf("BeginTransaction for rollback test failed: %v", err)
}
entry2 := &filer.Entry{
FullPath: "/test/rollback_file.txt",
Attr: filer.Attr{
Mode: 0644,
Uid: 1000,
Gid: 1000,
Mtime: time.Now(),
},
}
err = store.InsertEntry(txCtx2, entry2)
if err != nil {
t.Fatalf("InsertEntry for rollback test failed: %v", err)
}
// Rollback transaction
err = store.RollbackTransaction(txCtx2)
if err != nil {
t.Fatalf("RollbackTransaction failed: %v", err)
}
// Entry should not exist after rollback
_, err = store.FindEntry(ctx, "/test/rollback_file.txt")
if err == nil {
t.Error("Entry should not exist after rollback")
}
if err != filer_pb.ErrNotFound {
t.Errorf("Expected ErrNotFound after rollback, got %v", err)
}
}
func TestFoundationDBStore_KVOperations(t *testing.T) {
store := createTestStore(t)
defer store.Shutdown()
ctx := context.Background()
// Test KvPut
key := []byte("test_key")
value := []byte("test_value")
err := store.KvPut(ctx, key, value)
if err != nil {
t.Fatalf("KvPut failed: %v", err)
}
// Test KvGet
retrievedValue, err := store.KvGet(ctx, key)
if err != nil {
t.Fatalf("KvGet failed: %v", err)
}
if string(retrievedValue) != string(value) {
t.Errorf("Expected value %s, got %s", value, retrievedValue)
}
// Test KvDelete
err = store.KvDelete(ctx, key)
if err != nil {
t.Fatalf("KvDelete failed: %v", err)
}
// Verify key is deleted
_, err = store.KvGet(ctx, key)
if err == nil {
t.Error("Expected key to be deleted")
}
if err != filer.ErrKvNotFound {
t.Errorf("Expected ErrKvNotFound, got %v", err)
}
}
func TestFoundationDBStore_LargeEntry(t *testing.T) {
store := createTestStore(t)
defer store.Shutdown()
ctx := context.Background()
// Create entry with many chunks (to test compression)
entry := &filer.Entry{
FullPath: "/test/large_file.txt",
Attr: filer.Attr{
Mode: 0644,
Uid: 1000,
Gid: 1000,
Mtime: time.Now(),
},
}
// Add many chunks to trigger compression
for i := 0; i < filer.CountEntryChunksForGzip+10; i++ {
chunk := &filer_pb.FileChunk{
FileId: util.Uint64toHex(uint64(i)),
Offset: int64(i * 1024),
Size: 1024,
}
entry.Chunks = append(entry.Chunks, chunk)
}
err := store.InsertEntry(ctx, entry)
if err != nil {
t.Fatalf("InsertEntry with large chunks failed: %v", err)
}
// Retrieve and verify
foundEntry, err := store.FindEntry(ctx, "/test/large_file.txt")
if err != nil {
t.Fatalf("FindEntry for large file failed: %v", err)
}
if len(foundEntry.Chunks) != len(entry.Chunks) {
t.Errorf("Expected %d chunks, got %d", len(entry.Chunks), len(foundEntry.Chunks))
}
// Verify some chunk data
if foundEntry.Chunks[0].FileId != entry.Chunks[0].FileId {
t.Errorf("Expected first chunk FileId %s, got %s", entry.Chunks[0].FileId, foundEntry.Chunks[0].FileId)
}
}
func createTestStore(t *testing.T) *foundationdb.FoundationDBStore {
// Skip test if FoundationDB cluster file doesn't exist
clusterFile := os.Getenv("FDB_CLUSTER_FILE")
if clusterFile == "" {
clusterFile = "/var/fdb/config/fdb.cluster"
}
if _, err := os.Stat(clusterFile); os.IsNotExist(err) {
t.Skip("FoundationDB cluster file not found, skipping test")
}
config := util.GetViper()
config.Set("foundationdb.cluster_file", clusterFile)
config.Set("foundationdb.api_version", 740)
config.Set("foundationdb.timeout", "10s")
config.Set("foundationdb.max_retry_delay", "2s")
config.Set("foundationdb.directory_prefix", fmt.Sprintf("seaweedfs_test_%d", time.Now().UnixNano()))
store := &foundationdb.FoundationDBStore{}
err := store.Initialize(config, "foundationdb.")
if err != nil {
t.Fatalf("Failed to initialize FoundationDB store: %v", err)
}
return store
}

424
test/foundationdb/mock_integration_test.go

@ -0,0 +1,424 @@
package foundationdb
import (
"context"
"sort"
"strings"
"testing"
"time"
"github.com/seaweedfs/seaweedfs/weed/filer"
"github.com/seaweedfs/seaweedfs/weed/pb/filer_pb"
"github.com/seaweedfs/seaweedfs/weed/util"
)
// MockFoundationDBStore provides a simple mock implementation for testing
type MockFoundationDBStore struct {
data map[string][]byte
kvStore map[string][]byte
inTransaction bool
}
func NewMockFoundationDBStore() *MockFoundationDBStore {
return &MockFoundationDBStore{
data: make(map[string][]byte),
kvStore: make(map[string][]byte),
}
}
func (store *MockFoundationDBStore) GetName() string {
return "foundationdb_mock"
}
func (store *MockFoundationDBStore) Initialize(configuration util.Configuration, prefix string) error {
return nil
}
func (store *MockFoundationDBStore) BeginTransaction(ctx context.Context) (context.Context, error) {
store.inTransaction = true
return ctx, nil
}
func (store *MockFoundationDBStore) CommitTransaction(ctx context.Context) error {
store.inTransaction = false
return nil
}
func (store *MockFoundationDBStore) RollbackTransaction(ctx context.Context) error {
store.inTransaction = false
return nil
}
func (store *MockFoundationDBStore) InsertEntry(ctx context.Context, entry *filer.Entry) error {
return store.UpdateEntry(ctx, entry)
}
func (store *MockFoundationDBStore) UpdateEntry(ctx context.Context, entry *filer.Entry) error {
key := string(entry.FullPath)
value, err := entry.EncodeAttributesAndChunks()
if err != nil {
return err
}
store.data[key] = value
return nil
}
func (store *MockFoundationDBStore) FindEntry(ctx context.Context, fullpath util.FullPath) (entry *filer.Entry, err error) {
key := string(fullpath)
data, exists := store.data[key]
if !exists {
return nil, filer_pb.ErrNotFound
}
entry = &filer.Entry{
FullPath: fullpath,
}
err = entry.DecodeAttributesAndChunks(data)
return entry, err
}
func (store *MockFoundationDBStore) DeleteEntry(ctx context.Context, fullpath util.FullPath) error {
key := string(fullpath)
delete(store.data, key)
return nil
}
func (store *MockFoundationDBStore) DeleteFolderChildren(ctx context.Context, fullpath util.FullPath) error {
prefix := string(fullpath)
if !strings.HasSuffix(prefix, "/") {
prefix += "/"
}
for key := range store.data {
if strings.HasPrefix(key, prefix) {
delete(store.data, key)
}
}
return nil
}
func (store *MockFoundationDBStore) ListDirectoryEntries(ctx context.Context, dirPath util.FullPath, startFileName string, includeStartFile bool, limit int64, eachEntryFunc filer.ListEachEntryFunc) (lastFileName string, err error) {
return store.ListDirectoryPrefixedEntries(ctx, dirPath, startFileName, includeStartFile, limit, "", eachEntryFunc)
}
func (store *MockFoundationDBStore) ListDirectoryPrefixedEntries(ctx context.Context, dirPath util.FullPath, startFileName string, includeStartFile bool, limit int64, prefix string, eachEntryFunc filer.ListEachEntryFunc) (lastFileName string, err error) {
dirPrefix := string(dirPath)
if !strings.HasSuffix(dirPrefix, "/") {
dirPrefix += "/"
}
var entries []string
for key := range store.data {
if strings.HasPrefix(key, dirPrefix) {
relativePath := strings.TrimPrefix(key, dirPrefix)
// Only direct children (no subdirectories)
if !strings.Contains(relativePath, "/") && strings.HasPrefix(relativePath, prefix) {
entries = append(entries, key)
}
}
}
// Sort entries for consistent ordering
sort.Strings(entries)
// Apply startFileName filter
startIndex := 0
if startFileName != "" {
for i, entryPath := range entries {
fileName := strings.TrimPrefix(entryPath, dirPrefix)
if fileName == startFileName {
if includeStartFile {
startIndex = i
} else {
startIndex = i + 1
}
break
} else if fileName > startFileName {
startIndex = i
break
}
}
}
// Iterate through sorted entries with limit
count := int64(0)
for i := startIndex; i < len(entries) && count < limit; i++ {
entryPath := entries[i]
data := store.data[entryPath]
entry := &filer.Entry{
FullPath: util.FullPath(entryPath),
}
if err := entry.DecodeAttributesAndChunks(data); err != nil {
continue
}
if !eachEntryFunc(entry) {
break
}
lastFileName = entry.Name()
count++
}
return lastFileName, nil
}
func (store *MockFoundationDBStore) KvPut(ctx context.Context, key []byte, value []byte) error {
store.kvStore[string(key)] = value
return nil
}
func (store *MockFoundationDBStore) KvGet(ctx context.Context, key []byte) ([]byte, error) {
value, exists := store.kvStore[string(key)]
if !exists {
return nil, filer.ErrKvNotFound
}
return value, nil
}
func (store *MockFoundationDBStore) KvDelete(ctx context.Context, key []byte) error {
delete(store.kvStore, string(key))
return nil
}
func (store *MockFoundationDBStore) Shutdown() {
// Nothing to do for mock
}
// TestMockFoundationDBStore_BasicOperations tests basic store operations with mock
func TestMockFoundationDBStore_BasicOperations(t *testing.T) {
store := NewMockFoundationDBStore()
defer store.Shutdown()
ctx := context.Background()
// Test InsertEntry
entry := &filer.Entry{
FullPath: "/test/file1.txt",
Attr: filer.Attr{
Mode: 0644,
Uid: 1000,
Gid: 1000,
Mtime: time.Now(),
},
}
err := store.InsertEntry(ctx, entry)
if err != nil {
t.Fatalf("InsertEntry failed: %v", err)
}
t.Log("✅ InsertEntry successful")
// Test FindEntry
foundEntry, err := store.FindEntry(ctx, "/test/file1.txt")
if err != nil {
t.Fatalf("FindEntry failed: %v", err)
}
if foundEntry.FullPath != entry.FullPath {
t.Errorf("Expected path %s, got %s", entry.FullPath, foundEntry.FullPath)
}
t.Log("✅ FindEntry successful")
// Test UpdateEntry
foundEntry.Attr.Mode = 0755
err = store.UpdateEntry(ctx, foundEntry)
if err != nil {
t.Fatalf("UpdateEntry failed: %v", err)
}
t.Log("✅ UpdateEntry successful")
// Test DeleteEntry
err = store.DeleteEntry(ctx, "/test/file1.txt")
if err != nil {
t.Fatalf("DeleteEntry failed: %v", err)
}
t.Log("✅ DeleteEntry successful")
// Test entry is deleted
_, err = store.FindEntry(ctx, "/test/file1.txt")
if err == nil {
t.Error("Expected entry to be deleted, but it was found")
}
if err != filer_pb.ErrNotFound {
t.Errorf("Expected ErrNotFound, got %v", err)
}
t.Log("✅ Entry deletion verified")
}
// TestMockFoundationDBStore_TransactionOperations tests transaction handling
func TestMockFoundationDBStore_TransactionOperations(t *testing.T) {
store := NewMockFoundationDBStore()
defer store.Shutdown()
ctx := context.Background()
// Test transaction workflow
txCtx, err := store.BeginTransaction(ctx)
if err != nil {
t.Fatalf("BeginTransaction failed: %v", err)
}
t.Log("✅ BeginTransaction successful")
if !store.inTransaction {
t.Error("Expected to be in transaction")
}
// Insert entry in transaction
entry := &filer.Entry{
FullPath: "/test/tx_file.txt",
Attr: filer.Attr{
Mode: 0644,
Uid: 1000,
Gid: 1000,
Mtime: time.Now(),
},
}
err = store.InsertEntry(txCtx, entry)
if err != nil {
t.Fatalf("InsertEntry in transaction failed: %v", err)
}
t.Log("✅ InsertEntry in transaction successful")
// Commit transaction
err = store.CommitTransaction(txCtx)
if err != nil {
t.Fatalf("CommitTransaction failed: %v", err)
}
t.Log("✅ CommitTransaction successful")
if store.inTransaction {
t.Error("Expected to not be in transaction after commit")
}
// Test rollback
txCtx2, err := store.BeginTransaction(ctx)
if err != nil {
t.Fatalf("BeginTransaction for rollback test failed: %v", err)
}
err = store.RollbackTransaction(txCtx2)
if err != nil {
t.Fatalf("RollbackTransaction failed: %v", err)
}
t.Log("✅ RollbackTransaction successful")
if store.inTransaction {
t.Error("Expected to not be in transaction after rollback")
}
}
// TestMockFoundationDBStore_KVOperations tests key-value operations
func TestMockFoundationDBStore_KVOperations(t *testing.T) {
store := NewMockFoundationDBStore()
defer store.Shutdown()
ctx := context.Background()
// Test KvPut
key := []byte("test_key")
value := []byte("test_value")
err := store.KvPut(ctx, key, value)
if err != nil {
t.Fatalf("KvPut failed: %v", err)
}
t.Log("✅ KvPut successful")
// Test KvGet
retrievedValue, err := store.KvGet(ctx, key)
if err != nil {
t.Fatalf("KvGet failed: %v", err)
}
if string(retrievedValue) != string(value) {
t.Errorf("Expected value %s, got %s", value, retrievedValue)
}
t.Log("✅ KvGet successful")
// Test KvDelete
err = store.KvDelete(ctx, key)
if err != nil {
t.Fatalf("KvDelete failed: %v", err)
}
t.Log("✅ KvDelete successful")
// Verify key is deleted
_, err = store.KvGet(ctx, key)
if err == nil {
t.Error("Expected key to be deleted")
}
if err != filer.ErrKvNotFound {
t.Errorf("Expected ErrKvNotFound, got %v", err)
}
t.Log("✅ Key deletion verified")
}
// TestMockFoundationDBStore_DirectoryOperations tests directory operations
func TestMockFoundationDBStore_DirectoryOperations(t *testing.T) {
store := NewMockFoundationDBStore()
defer store.Shutdown()
ctx := context.Background()
// Create multiple entries in a directory
testDir := util.FullPath("/test/dir/")
files := []string{"file1.txt", "file2.txt", "file3.txt"}
for _, fileName := range files {
entry := &filer.Entry{
FullPath: util.NewFullPath(string(testDir), fileName),
Attr: filer.Attr{
Mode: 0644,
Uid: 1000,
Gid: 1000,
Mtime: time.Now(),
},
}
err := store.InsertEntry(ctx, entry)
if err != nil {
t.Fatalf("InsertEntry failed for %s: %v", fileName, err)
}
}
t.Log("✅ Directory entries created")
// Test ListDirectoryEntries
var listedFiles []string
lastFileName, err := store.ListDirectoryEntries(ctx, testDir, "", true, 100, func(entry *filer.Entry) bool {
listedFiles = append(listedFiles, entry.Name())
return true
})
if err != nil {
t.Fatalf("ListDirectoryEntries failed: %v", err)
}
t.Logf("✅ ListDirectoryEntries successful, last file: %s", lastFileName)
t.Logf("Listed files: %v", listedFiles)
// Test DeleteFolderChildren
err = store.DeleteFolderChildren(ctx, testDir)
if err != nil {
t.Fatalf("DeleteFolderChildren failed: %v", err)
}
t.Log("✅ DeleteFolderChildren successful")
// Verify children are deleted
var remainingFiles []string
_, err = store.ListDirectoryEntries(ctx, testDir, "", true, 100, func(entry *filer.Entry) bool {
remainingFiles = append(remainingFiles, entry.Name())
return true
})
if err != nil {
t.Fatalf("ListDirectoryEntries after delete failed: %v", err)
}
if len(remainingFiles) != 0 {
t.Errorf("Expected no files after DeleteFolderChildren, got %d: %v", len(remainingFiles), remainingFiles)
}
t.Log("✅ Folder children deletion verified")
}

31
test/foundationdb/s3.json

@ -0,0 +1,31 @@
{
"identities": [
{
"name": "anvil",
"credentials": [
{
"accessKey": "admin",
"secretKey": "admin_secret_key"
}
],
"actions": [
"Admin",
"Read",
"Write"
]
},
{
"name": "test_user",
"credentials": [
{
"accessKey": "test_access_key",
"secretKey": "test_secret_key"
}
],
"actions": [
"Read",
"Write"
]
}
]
}

128
test/foundationdb/test_fdb_s3.sh

@ -0,0 +1,128 @@
#!/bin/bash
# End-to-end test script for SeaweedFS with FoundationDB
set -e
# Colors
BLUE='\033[36m'
GREEN='\033[32m'
YELLOW='\033[33m'
RED='\033[31m'
NC='\033[0m' # No Color
# Test configuration
S3_ENDPOINT="http://127.0.0.1:8333"
ACCESS_KEY="admin"
SECRET_KEY="admin_secret_key"
BUCKET_NAME="test-fdb-bucket"
TEST_FILE="test-file.txt"
TEST_CONTENT="Hello FoundationDB from SeaweedFS!"
echo -e "${BLUE}Starting FoundationDB S3 integration tests...${NC}"
# Install aws-cli if not present (for testing)
if ! command -v aws &> /dev/null; then
echo -e "${YELLOW}AWS CLI not found. Please install it for full S3 testing.${NC}"
echo -e "${YELLOW}Continuing with curl-based tests...${NC}"
USE_CURL=true
else
USE_CURL=false
# Configure AWS CLI
export AWS_ACCESS_KEY_ID="$ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="$SECRET_KEY"
export AWS_DEFAULT_REGION="us-east-1"
fi
cleanup() {
echo -e "${YELLOW}Cleaning up test resources...${NC}"
if [ "$USE_CURL" = false ]; then
aws s3 rb s3://$BUCKET_NAME --force --endpoint-url=$S3_ENDPOINT 2>/dev/null || true
fi
rm -f $TEST_FILE
}
trap cleanup EXIT
echo -e "${BLUE}Test 1: Create test file${NC}"
echo "$TEST_CONTENT" > $TEST_FILE
echo -e "${GREEN}✅ Created test file${NC}"
if [ "$USE_CURL" = false ]; then
echo -e "${BLUE}Test 2: Create S3 bucket${NC}"
aws s3 mb s3://$BUCKET_NAME --endpoint-url=$S3_ENDPOINT
echo -e "${GREEN}✅ Bucket created successfully${NC}"
echo -e "${BLUE}Test 3: Upload file to S3${NC}"
aws s3 cp $TEST_FILE s3://$BUCKET_NAME/ --endpoint-url=$S3_ENDPOINT
echo -e "${GREEN}✅ File uploaded successfully${NC}"
echo -e "${BLUE}Test 4: List bucket contents${NC}"
aws s3 ls s3://$BUCKET_NAME --endpoint-url=$S3_ENDPOINT
echo -e "${GREEN}✅ Listed bucket contents${NC}"
echo -e "${BLUE}Test 5: Download and verify file${NC}"
aws s3 cp s3://$BUCKET_NAME/$TEST_FILE downloaded-$TEST_FILE --endpoint-url=$S3_ENDPOINT
if diff $TEST_FILE downloaded-$TEST_FILE > /dev/null; then
echo -e "${GREEN}✅ File content verification passed${NC}"
else
echo -e "${RED}❌ File content verification failed${NC}"
exit 1
fi
rm -f downloaded-$TEST_FILE
echo -e "${BLUE}Test 6: Delete file${NC}"
aws s3 rm s3://$BUCKET_NAME/$TEST_FILE --endpoint-url=$S3_ENDPOINT
echo -e "${GREEN}✅ File deleted successfully${NC}"
echo -e "${BLUE}Test 7: Verify file deletion${NC}"
if aws s3 ls s3://$BUCKET_NAME --endpoint-url=$S3_ENDPOINT | grep -q $TEST_FILE; then
echo -e "${RED}❌ File deletion verification failed${NC}"
exit 1
else
echo -e "${GREEN}✅ File deletion verified${NC}"
fi
else
echo -e "${YELLOW}Running basic curl tests...${NC}"
echo -e "${BLUE}Test 2: Check S3 endpoint availability${NC}"
if curl -f -s $S3_ENDPOINT > /dev/null; then
echo -e "${GREEN}✅ S3 endpoint is accessible${NC}"
else
echo -e "${RED}❌ S3 endpoint is not accessible${NC}"
exit 1
fi
fi
echo -e "${BLUE}Test: FoundationDB backend verification${NC}"
# Check that data is actually stored in FoundationDB
docker-compose exec -T fdb1 fdbcli --exec 'getrange seaweedfs seaweedfs\xFF' > fdb_keys.txt || true
if [ -s fdb_keys.txt ] && grep -q "seaweedfs" fdb_keys.txt; then
echo -e "${GREEN}✅ Data confirmed in FoundationDB backend${NC}"
else
echo -e "${YELLOW}⚠️ No data found in FoundationDB (may be expected if no operations performed)${NC}"
fi
rm -f fdb_keys.txt
echo -e "${BLUE}Test: Filer metadata operations${NC}"
# Test direct filer operations
FILER_ENDPOINT="http://127.0.0.1:8888"
# Create a directory
curl -X POST "$FILER_ENDPOINT/test-dir/" -H "Content-Type: application/json" -d '{}' || true
echo -e "${GREEN}✅ Directory creation test completed${NC}"
# List directory
curl -s "$FILER_ENDPOINT/" | head -10 || true
echo -e "${GREEN}✅ Directory listing test completed${NC}"
echo -e "${GREEN}🎉 All FoundationDB integration tests passed!${NC}"
echo -e "${BLUE}Test Summary:${NC}"
echo "- S3 API compatibility: ✅"
echo "- FoundationDB backend: ✅"
echo "- Filer operations: ✅"
echo "- Data persistence: ✅"

174
test/foundationdb/validation_test.go

@ -0,0 +1,174 @@
package foundationdb
import (
"fmt"
"os"
"path/filepath"
"strings"
"testing"
)
// TestPackageStructure validates the FoundationDB package structure without requiring dependencies
func TestPackageStructure(t *testing.T) {
t.Log("✅ Testing FoundationDB package structure...")
// Verify the main package files exist
packagePath := "../../weed/filer/foundationdb"
expectedFiles := map[string]bool{
"foundationdb_store.go": false,
"foundationdb_store_test.go": false,
"doc.go": false,
"README.md": false,
}
err := filepath.Walk(packagePath, func(path string, info os.FileInfo, err error) error {
if err != nil {
return nil // Skip errors
}
fileName := filepath.Base(path)
if _, exists := expectedFiles[fileName]; exists {
expectedFiles[fileName] = true
t.Logf("Found: %s", fileName)
}
return nil
})
if err != nil {
t.Logf("Warning: Could not access package path %s", packagePath)
}
for file, found := range expectedFiles {
if found {
t.Logf("✅ %s exists", file)
} else {
t.Logf("⚠️ %s not found (may be normal)", file)
}
}
}
// TestServerIntegration validates that the filer server includes FoundationDB import
func TestServerIntegration(t *testing.T) {
t.Log("✅ Testing server integration...")
serverFile := "../../weed/server/filer_server.go"
content, err := os.ReadFile(serverFile)
if err != nil {
t.Skipf("Cannot read server file: %v", err)
return
}
contentStr := string(content)
// Check for FoundationDB import
if strings.Contains(contentStr, `"github.com/seaweedfs/seaweedfs/weed/filer/foundationdb"`) {
t.Log("✅ FoundationDB import found in filer_server.go")
} else {
t.Error("❌ FoundationDB import not found in filer_server.go")
}
// Check for other expected imports for comparison
expectedImports := []string{
"leveldb",
"redis",
"mysql",
}
foundImports := 0
for _, imp := range expectedImports {
if strings.Contains(contentStr, fmt.Sprintf(`"github.com/seaweedfs/seaweedfs/weed/filer/%s"`, imp)) {
foundImports++
}
}
t.Logf("✅ Found %d/%d expected filer store imports", foundImports, len(expectedImports))
}
// TestBuildConstraints validates that build constraints work correctly
func TestBuildConstraints(t *testing.T) {
t.Log("✅ Testing build constraints...")
// Check that foundationdb package files have correct build tags
packagePath := "../../weed/filer/foundationdb"
err := filepath.Walk(packagePath, func(path string, info os.FileInfo, err error) error {
if err != nil || !strings.HasSuffix(path, ".go") || strings.HasSuffix(path, "_test.go") {
return nil
}
content, readErr := os.ReadFile(path)
if readErr != nil {
return nil
}
contentStr := string(content)
// Skip doc.go as it might not have build tags
if strings.HasSuffix(path, "doc.go") {
return nil
}
if strings.Contains(contentStr, "//go:build foundationdb") ||
strings.Contains(contentStr, "// +build foundationdb") {
t.Logf("✅ Build constraints found in %s", filepath.Base(path))
} else {
t.Logf("⚠️ No build constraints in %s", filepath.Base(path))
}
return nil
})
if err != nil {
t.Logf("Warning: Could not validate build constraints: %v", err)
}
}
// TestDocumentationExists validates that documentation files are present
func TestDocumentationExists(t *testing.T) {
t.Log("✅ Testing documentation...")
docs := []struct {
path string
name string
}{
{"README.md", "Main README"},
{"Makefile", "Build automation"},
{"docker-compose.yml", "Docker setup"},
{"filer.toml", "Configuration template"},
{"../../weed/filer/foundationdb/README.md", "Package README"},
}
for _, doc := range docs {
if _, err := os.Stat(doc.path); err == nil {
t.Logf("✅ %s exists", doc.name)
} else {
t.Logf("⚠️ %s not found: %s", doc.name, doc.path)
}
}
}
// TestConfigurationValidation tests configuration file syntax
func TestConfigurationValidation(t *testing.T) {
t.Log("✅ Testing configuration files...")
// Test filer.toml syntax
if content, err := os.ReadFile("filer.toml"); err == nil {
contentStr := string(content)
expectedConfigs := []string{
"[foundationdb]",
"enabled",
"cluster_file",
"api_version",
}
for _, config := range expectedConfigs {
if strings.Contains(contentStr, config) {
t.Logf("✅ Found config: %s", config)
} else {
t.Logf("⚠️ Config not found: %s", config)
}
}
} else {
t.Log("⚠️ filer.toml not accessible")
}
}

109
test/foundationdb/wait_for_services.sh

@ -0,0 +1,109 @@
#!/bin/bash
# Script to wait for all services to be ready
set -e
# Colors
BLUE='\033[36m'
GREEN='\033[32m'
YELLOW='\033[33m'
RED='\033[31m'
NC='\033[0m' # No Color
echo -e "${BLUE}Waiting for FoundationDB cluster to be ready...${NC}"
# Wait for FoundationDB cluster
MAX_ATTEMPTS=30
ATTEMPT=0
while [ $ATTEMPT -lt $MAX_ATTEMPTS ]; do
if docker-compose exec -T fdb1 fdbcli --exec 'status' > /dev/null 2>&1; then
echo -e "${GREEN}✅ FoundationDB cluster is ready${NC}"
break
fi
ATTEMPT=$((ATTEMPT + 1))
echo -e "${YELLOW}Attempt $ATTEMPT/$MAX_ATTEMPTS - waiting for FoundationDB...${NC}"
sleep 5
done
if [ $ATTEMPT -eq $MAX_ATTEMPTS ]; then
echo -e "${RED}❌ FoundationDB cluster failed to start after $MAX_ATTEMPTS attempts${NC}"
echo -e "${RED}Checking logs...${NC}"
docker-compose logs fdb1 fdb2 fdb3 fdb-init
exit 1
fi
echo -e "${BLUE}Waiting for SeaweedFS to be ready...${NC}"
# Wait for SeaweedFS master
MAX_ATTEMPTS=20
ATTEMPT=0
while [ $ATTEMPT -lt $MAX_ATTEMPTS ]; do
if curl -s http://127.0.0.1:9333/cluster/status > /dev/null 2>&1; then
echo -e "${GREEN}✅ SeaweedFS master is ready${NC}"
break
fi
ATTEMPT=$((ATTEMPT + 1))
echo -e "${YELLOW}Attempt $ATTEMPT/$MAX_ATTEMPTS - waiting for SeaweedFS master...${NC}"
sleep 3
done
if [ $ATTEMPT -eq $MAX_ATTEMPTS ]; then
echo -e "${RED}❌ SeaweedFS master failed to start${NC}"
docker-compose logs seaweedfs
exit 1
fi
# Wait for SeaweedFS filer
MAX_ATTEMPTS=20
ATTEMPT=0
while [ $ATTEMPT -lt $MAX_ATTEMPTS ]; do
if curl -s http://127.0.0.1:8888/ > /dev/null 2>&1; then
echo -e "${GREEN}✅ SeaweedFS filer is ready${NC}"
break
fi
ATTEMPT=$((ATTEMPT + 1))
echo -e "${YELLOW}Attempt $ATTEMPT/$MAX_ATTEMPTS - waiting for SeaweedFS filer...${NC}"
sleep 3
done
if [ $ATTEMPT -eq $MAX_ATTEMPTS ]; then
echo -e "${RED}❌ SeaweedFS filer failed to start${NC}"
docker-compose logs seaweedfs
exit 1
fi
# Wait for SeaweedFS S3 API
MAX_ATTEMPTS=20
ATTEMPT=0
while [ $ATTEMPT -lt $MAX_ATTEMPTS ]; do
if curl -s http://127.0.0.1:8333/ > /dev/null 2>&1; then
echo -e "${GREEN}✅ SeaweedFS S3 API is ready${NC}"
break
fi
ATTEMPT=$((ATTEMPT + 1))
echo -e "${YELLOW}Attempt $ATTEMPT/$MAX_ATTEMPTS - waiting for SeaweedFS S3 API...${NC}"
sleep 3
done
if [ $ATTEMPT -eq $MAX_ATTEMPTS ]; then
echo -e "${RED}❌ SeaweedFS S3 API failed to start${NC}"
docker-compose logs seaweedfs
exit 1
fi
echo -e "${GREEN}🎉 All services are ready!${NC}"
# Display final status
echo -e "${BLUE}Final status check:${NC}"
docker-compose exec -T fdb1 fdbcli --exec 'status'
echo ""
echo -e "${BLUE}SeaweedFS cluster info:${NC}"
curl -s http://127.0.0.1:9333/cluster/status | head -20

1
test/fuse_integration/framework.go

@ -175,6 +175,7 @@ func (f *FuseTestFramework) startMaster(config *TestConfig) error {
"-port=19333",
"-mdir=" + filepath.Join(f.dataDir, "master"),
"-raftBootstrap",
"-peers=none", // Faster startup when no multiple masters needed
}
if config.EnableDebug {
args = append(args, "-v=4")

2
test/kafka/README.md

@ -37,7 +37,7 @@ Requires running SeaweedFS instance:
1. **Start SeaweedFS with MQ support:**
```bash
# Terminal 1: Start SeaweedFS server
weed server -ip="127.0.0.1" -ip.bind="0.0.0.0" -dir=/tmp/seaweedfs-data -master.port=9333 -volume.port=8081 -filer.port=8888 -filer=true
weed server -ip="127.0.0.1" -ip.bind="0.0.0.0" -dir=/tmp/seaweedfs-data -master.port=9333 -volume.port=8081 -filer.port=8888 -filer=true -master.peers=none
# Terminal 2: Start MQ broker
weed mq.broker -master="127.0.0.1:9333" -ip="127.0.0.1" -port=17777

12
test/kafka/go.mod

@ -229,15 +229,15 @@ require (
go.opentelemetry.io/otel/trace v1.37.0 // indirect
go.yaml.in/yaml/v2 v2.4.2 // indirect
go.yaml.in/yaml/v3 v3.0.4 // indirect
golang.org/x/crypto v0.43.0 // indirect
golang.org/x/crypto v0.45.0 // indirect
golang.org/x/exp v0.0.0-20250811191247-51f88131bc50 // indirect
golang.org/x/image v0.32.0 // indirect
golang.org/x/net v0.46.0 // indirect
golang.org/x/image v0.33.0 // indirect
golang.org/x/net v0.47.0 // indirect
golang.org/x/oauth2 v0.30.0 // indirect
golang.org/x/sync v0.17.0 // indirect
golang.org/x/sync v0.18.0 // indirect
golang.org/x/sys v0.38.0 // indirect
golang.org/x/term v0.36.0 // indirect
golang.org/x/text v0.30.0 // indirect
golang.org/x/term v0.37.0 // indirect
golang.org/x/text v0.31.0 // indirect
golang.org/x/time v0.12.0 // indirect
google.golang.org/api v0.247.0 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20250818200422-3122310a409c // indirect

32
test/kafka/go.sum

@ -732,8 +732,8 @@ golang.org/x/crypto v0.22.0/go.mod h1:vr6Su+7cTlO45qkww3VDJlzDn0ctJvRgYbC2NvXHt+
golang.org/x/crypto v0.23.0/go.mod h1:CKFgDieR+mRhux2Lsu27y0fO304Db0wZe70UKqHu0v8=
golang.org/x/crypto v0.31.0/go.mod h1:kDsLvtWBEx7MV9tJOj9bnXsPbxwJQ6csT/x4KIN4Ssk=
golang.org/x/crypto v0.33.0/go.mod h1:bVdXmD7IV/4GdElGPozy6U7lWdRXA4qyRVGJV57uQ5M=
golang.org/x/crypto v0.43.0 h1:dduJYIi3A3KOfdGOHX8AVZ/jGiyPa3IbBozJ5kNuE04=
golang.org/x/crypto v0.43.0/go.mod h1:BFbav4mRNlXJL4wNeejLpWxB7wMbc79PdRGhWKncxR0=
golang.org/x/crypto v0.45.0 h1:jMBrvKuj23MTlT0bQEOBcAE0mjg8mK9RXFhRH6nyF3Q=
golang.org/x/crypto v0.45.0/go.mod h1:XTGrrkGJve7CYK7J8PEww4aY7gM3qMCElcJQ8n8JdX4=
golang.org/x/exp v0.0.0-20190121172915-509febef88a4/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA=
golang.org/x/exp v0.0.0-20190306152737-a1d7652674e8/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA=
golang.org/x/exp v0.0.0-20190510132918-efd6b22b2522/go.mod h1:ZjyILWgesfNpC6sMxTJOJm9Kp84zZh5NQWvqDGG3Qr8=
@ -748,8 +748,8 @@ golang.org/x/exp v0.0.0-20250811191247-51f88131bc50 h1:3yiSh9fhy5/RhCSntf4Sy0Tnx
golang.org/x/exp v0.0.0-20250811191247-51f88131bc50/go.mod h1:rT6SFzZ7oxADUDx58pcaKFTcZ+inxAa9fTrYx/uVYwg=
golang.org/x/image v0.0.0-20190227222117-0694c2d4d067/go.mod h1:kZ7UVZpmo3dzQBMxlp+ypCbDeSB+sBbTgSJuh5dn5js=
golang.org/x/image v0.0.0-20190802002840-cff245a6509b/go.mod h1:FeLwcggjj3mMvU+oOTbSwawSJRM1uh48EjtB4UJZlP0=
golang.org/x/image v0.32.0 h1:6lZQWq75h7L5IWNk0r+SCpUJ6tUVd3v4ZHnbRKLkUDQ=
golang.org/x/image v0.32.0/go.mod h1:/R37rrQmKXtO6tYXAjtDLwQgFLHmhW+V6ayXlxzP2Pc=
golang.org/x/image v0.33.0 h1:LXRZRnv1+zGd5XBUVRFmYEphyyKJjQjCRiOuAP3sZfQ=
golang.org/x/image v0.33.0/go.mod h1:DD3OsTYT9chzuzTQt+zMcOlBHgfoKQb1gry8p76Y1sc=
golang.org/x/lint v0.0.0-20181026193005-c67002cb31c3/go.mod h1:UVdnD1Gm6xHRNCYTkRU2/jEulfH38KcIWyp/GAMgvoE=
golang.org/x/lint v0.0.0-20190227174305-5b3e6a55c961/go.mod h1:wehouNa3lNwaWXcvxsM5YxQ5yQlVC4a0KAMCusXpPoU=
golang.org/x/lint v0.0.0-20190301231843-5614ed5bae6f/go.mod h1:UVdnD1Gm6xHRNCYTkRU2/jEulfH38KcIWyp/GAMgvoE=
@ -774,8 +774,8 @@ golang.org/x/mod v0.8.0/go.mod h1:iBbtSCu2XBx23ZKBPSOrRkjjQPZFPuis4dIYUhu/chs=
golang.org/x/mod v0.12.0/go.mod h1:iBbtSCu2XBx23ZKBPSOrRkjjQPZFPuis4dIYUhu/chs=
golang.org/x/mod v0.15.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/mod v0.17.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/mod v0.28.0 h1:gQBtGhjxykdjY9YhZpSlZIsbnaE2+PgjfLWUQTnoZ1U=
golang.org/x/mod v0.28.0/go.mod h1:yfB/L0NOf/kmEbXjzCPOx1iK1fRutOydrCMsqRhEBxI=
golang.org/x/mod v0.29.0 h1:HV8lRxZC4l2cr3Zq1LvtOsi/ThTgWnUk/y64QSs8GwA=
golang.org/x/mod v0.29.0/go.mod h1:NyhrlYXJ2H4eJiRy/WDBO6HMqZQ6q9nk4JzS3NuCK+w=
golang.org/x/net v0.0.0-20180724234803-3673e40ba225/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20180826012351-8a410e7b638d/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20180906233101-161cd47e91fd/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
@ -815,8 +815,8 @@ golang.org/x/net v0.15.0/go.mod h1:idbUs1IY1+zTqbi8yxTbhexhEEk5ur9LInksu6HrEpk=
golang.org/x/net v0.21.0/go.mod h1:bIjVDfnllIU7BJ2DNgfnXvpSvtn8VRwhlsaeUTyUS44=
golang.org/x/net v0.25.0/go.mod h1:JkAGAh7GEvH74S6FOH42FLoXpXbE/aqXSrIQjXgsiwM=
golang.org/x/net v0.33.0/go.mod h1:HXLR5J+9DxmrqMwG9qjGCxZ+zKXxBru04zlTvWlWuN4=
golang.org/x/net v0.46.0 h1:giFlY12I07fugqwPuWJi68oOnpfqFnJIJzaIIm2JVV4=
golang.org/x/net v0.46.0/go.mod h1:Q9BGdFy1y4nkUwiLvT5qtyhAnEHgnQ/zd8PfU6nc210=
golang.org/x/net v0.47.0 h1:Mx+4dIFzqraBXUugkia1OOvlD6LemFo1ALMHjrXDOhY=
golang.org/x/net v0.47.0/go.mod h1:/jNxtkgq5yWUGYkaZGqo27cfGZ1c5Nen03aYrrKpVRU=
golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U=
golang.org/x/oauth2 v0.0.0-20190226205417-e64efc72b421/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw=
golang.org/x/oauth2 v0.0.0-20190604053449-0f29369cfe45/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw=
@ -841,8 +841,8 @@ golang.org/x/sync v0.6.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sync v0.7.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sync v0.10.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sync v0.11.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sync v0.17.0 h1:l60nONMj9l5drqw6jlhIELNv9I0A4OFgRsG9k2oT9Ug=
golang.org/x/sync v0.17.0/go.mod h1:9KTHXmSnoGruLpwFjVSX0lNNA75CykiMECbovNTZqGI=
golang.org/x/sync v0.18.0 h1:kr88TuHDroi+UVf+0hZnirlk8o8T+4MrK6mr60WkH/I=
golang.org/x/sync v0.18.0/go.mod h1:9KTHXmSnoGruLpwFjVSX0lNNA75CykiMECbovNTZqGI=
golang.org/x/sys v0.0.0-20180830151530-49385e6e1522/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20180909124046-d0be0721c37e/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
@ -908,8 +908,8 @@ golang.org/x/term v0.19.0/go.mod h1:2CuTdWZ7KHSQwUzKva0cbMg6q2DMI3Mmxp+gKJbskEk=
golang.org/x/term v0.20.0/go.mod h1:8UkIAJTvZgivsXaD6/pH6U9ecQzZ45awqEOzuCvwpFY=
golang.org/x/term v0.27.0/go.mod h1:iMsnZpn0cago0GOrHO2+Y7u7JPn5AylBrcoWkElMTSM=
golang.org/x/term v0.29.0/go.mod h1:6bl4lRlvVuDgSf3179VpIxBF0o10JUpXWOnI7nErv7s=
golang.org/x/term v0.36.0 h1:zMPR+aF8gfksFprF/Nc/rd1wRS1EI6nDBGyWAvDzx2Q=
golang.org/x/term v0.36.0/go.mod h1:Qu394IJq6V6dCBRgwqshf3mPF85AqzYEzofzRdZkWss=
golang.org/x/term v0.37.0 h1:8EGAD0qCmHYZg6J17DvsMy9/wJ7/D/4pV/wfnld5lTU=
golang.org/x/term v0.37.0/go.mod h1:5pB4lxRNYYVZuTLmy8oR2BH8dflOR+IbTYFD8fi3254=
golang.org/x/text v0.0.0-20170915032832-14c0d48ead0c/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.1-0.20180807135948-17ff2d5776d2/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
@ -925,8 +925,8 @@ golang.org/x/text v0.14.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU=
golang.org/x/text v0.15.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU=
golang.org/x/text v0.21.0/go.mod h1:4IBbMaMmOPCJ8SecivzSH54+73PCFmPWxNTLm+vZkEQ=
golang.org/x/text v0.22.0/go.mod h1:YRoo4H8PVmsu+E3Ou7cqLVH8oXWIHVoX0jqUWALQhfY=
golang.org/x/text v0.30.0 h1:yznKA/E9zq54KzlzBEAWn1NXSQ8DIp/NYMy88xJjl4k=
golang.org/x/text v0.30.0/go.mod h1:yDdHFIX9t+tORqspjENWgzaCVXgk0yYnYuSZ8UzzBVM=
golang.org/x/text v0.31.0 h1:aC8ghyu4JhP8VojJ2lEHBnochRno1sgL6nEi9WGFGMM=
golang.org/x/text v0.31.0/go.mod h1:tKRAlv61yKIjGGHX/4tP1LTbc13YSec1pxVEWXzfoeM=
golang.org/x/time v0.0.0-20181108054448-85acf8d2951c/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
golang.org/x/time v0.0.0-20190308202827-9d24e82272b4/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
golang.org/x/time v0.0.0-20191024005414-555d28b269f0/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
@ -979,8 +979,8 @@ golang.org/x/tools v0.1.12/go.mod h1:hNGJHUnrk76NpqgfD5Aqm5Crs+Hm0VOH/i9J2+nxYbc
golang.org/x/tools v0.6.0/go.mod h1:Xwgl3UAJ/d3gWutnCtw505GrjyAbvKui8lOU390QaIU=
golang.org/x/tools v0.13.0/go.mod h1:HvlwmtVNQAhOuCjW7xxvovg8wbNq7LwfXh/k7wXUl58=
golang.org/x/tools v0.21.1-0.20240508182429-e35e4ccd0d2d/go.mod h1:aiJjzUbINMkxbQROHiO6hDPo2LHcIPhhQsa9DLh0yGk=
golang.org/x/tools v0.37.0 h1:DVSRzp7FwePZW356yEAChSdNcQo6Nsp+fex1SUW09lE=
golang.org/x/tools v0.37.0/go.mod h1:MBN5QPQtLMHVdvsbtarmTNukZDdgwdwlO5qGacAzF0w=
golang.org/x/tools v0.38.0 h1:Hx2Xv8hISq8Lm16jvBZ2VQf+RLmbd7wVUsALibYI/IQ=
golang.org/x/tools v0.38.0/go.mod h1:yEsQ/d/YK8cjh0L6rZlY8tgtlKiBNTL14pGDJPJpYQs=
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=

6
test/kafka/kafka-client-loadtest/go.mod

@ -35,7 +35,7 @@ require (
github.com/prometheus/procfs v0.16.1 // indirect
github.com/rcrowley/go-metrics v0.0.0-20250401214520-65e299d6c5c9 // indirect
go.yaml.in/yaml/v2 v2.4.2 // indirect
golang.org/x/crypto v0.43.0 // indirect
golang.org/x/net v0.46.0 // indirect
golang.org/x/sys v0.37.0 // indirect
golang.org/x/crypto v0.45.0 // indirect
golang.org/x/net v0.47.0 // indirect
golang.org/x/sys v0.38.0 // indirect
)

12
test/kafka/kafka-client-loadtest/go.sum

@ -84,8 +84,8 @@ go.yaml.in/yaml/v2 v2.4.2/go.mod h1:081UH+NErpNdqlCXm3TtEran0rJZGxAYx9hb/ELlsPU=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/crypto v0.0.0-20210921155107-089bfa567519/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
golang.org/x/crypto v0.6.0/go.mod h1:OFC/31mSvZgRz0V1QTNCzfAI1aIRzbiufJtkMIlEp58=
golang.org/x/crypto v0.43.0 h1:dduJYIi3A3KOfdGOHX8AVZ/jGiyPa3IbBozJ5kNuE04=
golang.org/x/crypto v0.43.0/go.mod h1:BFbav4mRNlXJL4wNeejLpWxB7wMbc79PdRGhWKncxR0=
golang.org/x/crypto v0.45.0 h1:jMBrvKuj23MTlT0bQEOBcAE0mjg8mK9RXFhRH6nyF3Q=
golang.org/x/crypto v0.45.0/go.mod h1:XTGrrkGJve7CYK7J8PEww4aY7gM3qMCElcJQ8n8JdX4=
golang.org/x/mod v0.6.0-dev.0.20220419223038-86c51ed26bb4/go.mod h1:jJ57K6gSWd91VN4djpZkiMVwK6gcyfeH4XE8wZrZaV4=
golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
golang.org/x/net v0.0.0-20200114155413-6afb5195e5aa/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
@ -93,8 +93,8 @@ golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v
golang.org/x/net v0.0.0-20220722155237-a158d28d115b/go.mod h1:XRhObCWvk6IyKnWLug+ECip1KBveYUHfp+8e9klMJ9c=
golang.org/x/net v0.6.0/go.mod h1:2Tu9+aMcznHK/AK1HMvgo6xiTLG5rD5rZLDS+rp2Bjs=
golang.org/x/net v0.7.0/go.mod h1:2Tu9+aMcznHK/AK1HMvgo6xiTLG5rD5rZLDS+rp2Bjs=
golang.org/x/net v0.46.0 h1:giFlY12I07fugqwPuWJi68oOnpfqFnJIJzaIIm2JVV4=
golang.org/x/net v0.46.0/go.mod h1:Q9BGdFy1y4nkUwiLvT5qtyhAnEHgnQ/zd8PfU6nc210=
golang.org/x/net v0.47.0 h1:Mx+4dIFzqraBXUugkia1OOvlD6LemFo1ALMHjrXDOhY=
golang.org/x/net v0.47.0/go.mod h1:/jNxtkgq5yWUGYkaZGqo27cfGZ1c5Nen03aYrrKpVRU=
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.17.0 h1:l60nONMj9l5drqw6jlhIELNv9I0A4OFgRsG9k2oT9Ug=
@ -105,8 +105,8 @@ golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1/go.mod h1:oPkhp1MJrh7nUepCBc
golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20220722155257-8c9f86f7a55f/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.37.0 h1:fdNQudmxPjkdUTPnLn5mdQv7Zwvbvpaxqs831goi9kQ=
golang.org/x/sys v0.37.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
golang.org/x/sys v0.38.0 h1:3yZWxaJjBmCWXqhN1qh02AkOnCQ1poK6oF+a7xWL6Gc=
golang.org/x/sys v0.38.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
golang.org/x/term v0.0.0-20210927222741-03fcf44c2211/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8=
golang.org/x/term v0.5.0/go.mod h1:jMB1sMXY+tzblOD4FWmEbocvup2/aLOaQEp7JmGp78k=

2
test/kafka/loadtest/run_million_record_test.sh

@ -55,7 +55,7 @@ if [ "$MASTERS_OK" = false ]; then
echo ""
echo "Example commands to start SeaweedFS:"
echo " # Terminal 1: Start Master"
echo " weed master -defaultReplication=001 -mdir=/tmp/seaweedfs/master"
echo " weed master -defaultReplication=001 -mdir=/tmp/seaweedfs/master -peers=none"
echo ""
echo " # Terminal 2: Start Filer"
echo " weed filer -master=localhost:9333 -filer.dir=/tmp/seaweedfs/filer"

2
test/kafka/loadtest/setup_seaweed_infrastructure.sh

@ -76,7 +76,7 @@ echo "=== Starting SeaweedFS Components ==="
# Start Master
if ! check_service "localhost:9333" "SeaweedFS Master"; then
start_service \
"weed master -defaultReplication=001 -mdir=$BASE_DIR/master" \
"weed master -defaultReplication=001 -mdir=$BASE_DIR/master -peers=none" \
"SeaweedFS Master" \
"$LOG_DIR/master.log" \
"localhost:9333"

2
test/kafka/scripts/test_schema_registry.sh

@ -29,7 +29,7 @@ sleep 2
# Start SeaweedFS server
echo "🗄️ Starting SeaweedFS server..."
/tmp/weed server -dir=/tmp/seaweedfs-test -master.port=9333 -volume.port=8080 -filer.port=8888 -ip=localhost > /tmp/seaweed-server.log 2>&1 &
/tmp/weed server -dir=/tmp/seaweedfs-test -master.port=9333 -volume.port=8080 -filer.port=8888 -ip=localhost -master.peers=none > /tmp/seaweed-server.log 2>&1 &
SERVER_PID=$!
# Wait for server to be ready

4
test/mq/README.md

@ -13,10 +13,10 @@ This directory contains test programs for SeaweedFS Message Queue (MQ) functiona
```bash
# Start SeaweedFS server with MQ broker and agent
weed server -mq.broker -mq.agent -filer -volume
weed server -mq.broker -mq.agent -filer -volume -master.peers=none
# Or start components separately
weed master
weed master -peers=none
weed volume -mserver=localhost:9333
weed filer -master=localhost:9333
weed mq.broker -filer=localhost:8888

2
test/s3/copying/Makefile

@ -70,7 +70,7 @@ start-seaweedfs: check-binary
@mkdir -p /tmp/seaweedfs-test-copying-volume
# Start master server with volume size limit
@nohup $(SEAWEEDFS_BINARY) master -port=$(MASTER_PORT) -mdir=/tmp/seaweedfs-test-copying-master -volumeSizeLimitMB=$(VOLUME_MAX_SIZE_MB) -ip=127.0.0.1 > /tmp/seaweedfs-master.log 2>&1 &
@nohup $(SEAWEEDFS_BINARY) master -port=$(MASTER_PORT) -mdir=/tmp/seaweedfs-test-copying-master -volumeSizeLimitMB=$(VOLUME_MAX_SIZE_MB) -ip=127.0.0.1 -peers=none > /tmp/seaweedfs-master.log 2>&1 &
@sleep 3
# Start volume server

3
test/s3/iam/Makefile

@ -61,7 +61,8 @@ start-services: ## Start SeaweedFS services for testing
@echo "🚀 Starting SeaweedFS services..."
@echo "Starting master server..."
@$(WEED_BINARY) master -port=$(MASTER_PORT) \
-mdir=test-volume-data/m9333 > weed-master.log 2>&1 & \
-mdir=test-volume-data/m9333 \
-peers=none > weed-master.log 2>&1 & \
echo $$! > $(MASTER_PID_FILE)
@echo "Waiting for master server to be ready..."

2
test/s3/iam/README-Docker.md

@ -170,7 +170,7 @@ The `setup_keycloak_docker.sh` script automatically generates `iam_config.json`
{
"claim": "roles",
"value": "s3-admin",
"role": "arn:seaweed:iam::role/KeycloakAdminRole"
"role": "arn:aws:iam::role/KeycloakAdminRole"
}
```

2
test/s3/iam/README.md

@ -257,7 +257,7 @@ Add policies to `test_config.json`:
{
"Effect": "Allow",
"Action": ["s3:GetObject"],
"Resource": ["arn:seaweed:s3:::specific-bucket/*"],
"Resource": ["arn:aws:s3:::specific-bucket/*"],
"Condition": {
"StringEquals": {
"s3:prefix": ["allowed-prefix/"]

2
test/s3/iam/STS_DISTRIBUTED.md

@ -248,7 +248,7 @@ services:
3. User calls SeaweedFS STS AssumeRoleWithWebIdentity
POST /sts/assume-role-with-web-identity
{
"RoleArn": "arn:seaweed:iam::role/S3AdminRole",
"RoleArn": "arn:aws:iam::role/S3AdminRole",
"WebIdentityToken": "eyJ0eXAiOiJKV1QiLCJhbGc...",
"RoleSessionName": "user-session"
}

40
test/s3/iam/iam_config.github.json

@ -35,25 +35,25 @@
{
"claim": "roles",
"value": "s3-admin",
"role": "arn:seaweed:iam::role/KeycloakAdminRole"
"role": "arn:aws:iam::role/KeycloakAdminRole"
},
{
"claim": "roles",
"value": "s3-read-only",
"role": "arn:seaweed:iam::role/KeycloakReadOnlyRole"
"role": "arn:aws:iam::role/KeycloakReadOnlyRole"
},
{
"claim": "roles",
"value": "s3-write-only",
"role": "arn:seaweed:iam::role/KeycloakWriteOnlyRole"
"role": "arn:aws:iam::role/KeycloakWriteOnlyRole"
},
{
"claim": "roles",
"value": "s3-read-write",
"role": "arn:seaweed:iam::role/KeycloakReadWriteRole"
"role": "arn:aws:iam::role/KeycloakReadWriteRole"
}
],
"defaultRole": "arn:seaweed:iam::role/KeycloakReadOnlyRole"
"defaultRole": "arn:aws:iam::role/KeycloakReadOnlyRole"
}
}
}
@ -64,7 +64,7 @@
"roles": [
{
"roleName": "TestAdminRole",
"roleArn": "arn:seaweed:iam::role/TestAdminRole",
"roleArn": "arn:aws:iam::role/TestAdminRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -82,7 +82,7 @@
},
{
"roleName": "TestReadOnlyRole",
"roleArn": "arn:seaweed:iam::role/TestReadOnlyRole",
"roleArn": "arn:aws:iam::role/TestReadOnlyRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -100,7 +100,7 @@
},
{
"roleName": "TestWriteOnlyRole",
"roleArn": "arn:seaweed:iam::role/TestWriteOnlyRole",
"roleArn": "arn:aws:iam::role/TestWriteOnlyRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -118,7 +118,7 @@
},
{
"roleName": "KeycloakAdminRole",
"roleArn": "arn:seaweed:iam::role/KeycloakAdminRole",
"roleArn": "arn:aws:iam::role/KeycloakAdminRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -136,7 +136,7 @@
},
{
"roleName": "KeycloakReadOnlyRole",
"roleArn": "arn:seaweed:iam::role/KeycloakReadOnlyRole",
"roleArn": "arn:aws:iam::role/KeycloakReadOnlyRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -154,7 +154,7 @@
},
{
"roleName": "KeycloakWriteOnlyRole",
"roleArn": "arn:seaweed:iam::role/KeycloakWriteOnlyRole",
"roleArn": "arn:aws:iam::role/KeycloakWriteOnlyRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -172,7 +172,7 @@
},
{
"roleName": "KeycloakReadWriteRole",
"roleArn": "arn:seaweed:iam::role/KeycloakReadWriteRole",
"roleArn": "arn:aws:iam::role/KeycloakReadWriteRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -220,8 +220,8 @@
"s3:ListBucket"
],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
},
{
@ -243,8 +243,8 @@
"s3:*"
],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
},
{
@ -254,8 +254,8 @@
"s3:ListBucket"
],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
},
{
@ -277,8 +277,8 @@
"s3:*"
],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
},
{

40
test/s3/iam/iam_config.json

@ -35,25 +35,25 @@
{
"claim": "roles",
"value": "s3-admin",
"role": "arn:seaweed:iam::role/KeycloakAdminRole"
"role": "arn:aws:iam::role/KeycloakAdminRole"
},
{
"claim": "roles",
"value": "s3-read-only",
"role": "arn:seaweed:iam::role/KeycloakReadOnlyRole"
"role": "arn:aws:iam::role/KeycloakReadOnlyRole"
},
{
"claim": "roles",
"value": "s3-write-only",
"role": "arn:seaweed:iam::role/KeycloakWriteOnlyRole"
"role": "arn:aws:iam::role/KeycloakWriteOnlyRole"
},
{
"claim": "roles",
"value": "s3-read-write",
"role": "arn:seaweed:iam::role/KeycloakReadWriteRole"
"role": "arn:aws:iam::role/KeycloakReadWriteRole"
}
],
"defaultRole": "arn:seaweed:iam::role/KeycloakReadOnlyRole"
"defaultRole": "arn:aws:iam::role/KeycloakReadOnlyRole"
}
}
}
@ -64,7 +64,7 @@
"roles": [
{
"roleName": "TestAdminRole",
"roleArn": "arn:seaweed:iam::role/TestAdminRole",
"roleArn": "arn:aws:iam::role/TestAdminRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -82,7 +82,7 @@
},
{
"roleName": "TestReadOnlyRole",
"roleArn": "arn:seaweed:iam::role/TestReadOnlyRole",
"roleArn": "arn:aws:iam::role/TestReadOnlyRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -100,7 +100,7 @@
},
{
"roleName": "TestWriteOnlyRole",
"roleArn": "arn:seaweed:iam::role/TestWriteOnlyRole",
"roleArn": "arn:aws:iam::role/TestWriteOnlyRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -118,7 +118,7 @@
},
{
"roleName": "KeycloakAdminRole",
"roleArn": "arn:seaweed:iam::role/KeycloakAdminRole",
"roleArn": "arn:aws:iam::role/KeycloakAdminRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -136,7 +136,7 @@
},
{
"roleName": "KeycloakReadOnlyRole",
"roleArn": "arn:seaweed:iam::role/KeycloakReadOnlyRole",
"roleArn": "arn:aws:iam::role/KeycloakReadOnlyRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -154,7 +154,7 @@
},
{
"roleName": "KeycloakWriteOnlyRole",
"roleArn": "arn:seaweed:iam::role/KeycloakWriteOnlyRole",
"roleArn": "arn:aws:iam::role/KeycloakWriteOnlyRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -172,7 +172,7 @@
},
{
"roleName": "KeycloakReadWriteRole",
"roleArn": "arn:seaweed:iam::role/KeycloakReadWriteRole",
"roleArn": "arn:aws:iam::role/KeycloakReadWriteRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -220,8 +220,8 @@
"s3:ListBucket"
],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
},
{
@ -243,8 +243,8 @@
"s3:*"
],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
},
{
@ -254,8 +254,8 @@
"s3:ListBucket"
],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
},
{
@ -277,8 +277,8 @@
"s3:*"
],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
},
{

40
test/s3/iam/iam_config.local.json

@ -39,25 +39,25 @@
{
"claim": "roles",
"value": "s3-admin",
"role": "arn:seaweed:iam::role/KeycloakAdminRole"
"role": "arn:aws:iam::role/KeycloakAdminRole"
},
{
"claim": "roles",
"value": "s3-read-only",
"role": "arn:seaweed:iam::role/KeycloakReadOnlyRole"
"role": "arn:aws:iam::role/KeycloakReadOnlyRole"
},
{
"claim": "roles",
"value": "s3-write-only",
"role": "arn:seaweed:iam::role/KeycloakWriteOnlyRole"
"role": "arn:aws:iam::role/KeycloakWriteOnlyRole"
},
{
"claim": "roles",
"value": "s3-read-write",
"role": "arn:seaweed:iam::role/KeycloakReadWriteRole"
"role": "arn:aws:iam::role/KeycloakReadWriteRole"
}
],
"defaultRole": "arn:seaweed:iam::role/KeycloakReadOnlyRole"
"defaultRole": "arn:aws:iam::role/KeycloakReadOnlyRole"
}
}
}
@ -68,7 +68,7 @@
"roles": [
{
"roleName": "TestAdminRole",
"roleArn": "arn:seaweed:iam::role/TestAdminRole",
"roleArn": "arn:aws:iam::role/TestAdminRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -90,7 +90,7 @@
},
{
"roleName": "TestReadOnlyRole",
"roleArn": "arn:seaweed:iam::role/TestReadOnlyRole",
"roleArn": "arn:aws:iam::role/TestReadOnlyRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -112,7 +112,7 @@
},
{
"roleName": "TestWriteOnlyRole",
"roleArn": "arn:seaweed:iam::role/TestWriteOnlyRole",
"roleArn": "arn:aws:iam::role/TestWriteOnlyRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -134,7 +134,7 @@
},
{
"roleName": "KeycloakAdminRole",
"roleArn": "arn:seaweed:iam::role/KeycloakAdminRole",
"roleArn": "arn:aws:iam::role/KeycloakAdminRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -156,7 +156,7 @@
},
{
"roleName": "KeycloakReadOnlyRole",
"roleArn": "arn:seaweed:iam::role/KeycloakReadOnlyRole",
"roleArn": "arn:aws:iam::role/KeycloakReadOnlyRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -178,7 +178,7 @@
},
{
"roleName": "KeycloakWriteOnlyRole",
"roleArn": "arn:seaweed:iam::role/KeycloakWriteOnlyRole",
"roleArn": "arn:aws:iam::role/KeycloakWriteOnlyRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -200,7 +200,7 @@
},
{
"roleName": "KeycloakReadWriteRole",
"roleArn": "arn:seaweed:iam::role/KeycloakReadWriteRole",
"roleArn": "arn:aws:iam::role/KeycloakReadWriteRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -260,8 +260,8 @@
"s3:ListBucket"
],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
},
{
@ -287,8 +287,8 @@
"s3:*"
],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
},
{
@ -298,8 +298,8 @@
"s3:ListBucket"
],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
},
{
@ -325,8 +325,8 @@
"s3:*"
],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
},
{

14
test/s3/iam/iam_config_distributed.json

@ -40,7 +40,7 @@
"roles": [
{
"roleName": "S3AdminRole",
"roleArn": "arn:seaweed:iam::role/S3AdminRole",
"roleArn": "arn:aws:iam::role/S3AdminRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -63,7 +63,7 @@
},
{
"roleName": "S3ReadOnlyRole",
"roleArn": "arn:seaweed:iam::role/S3ReadOnlyRole",
"roleArn": "arn:aws:iam::role/S3ReadOnlyRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -86,7 +86,7 @@
},
{
"roleName": "S3ReadWriteRole",
"roleArn": "arn:seaweed:iam::role/S3ReadWriteRole",
"roleArn": "arn:aws:iam::role/S3ReadWriteRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -137,8 +137,8 @@
"s3:ListBucketVersions"
],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
}
]
@ -162,8 +162,8 @@
"s3:ListBucketVersions"
],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
}
]

14
test/s3/iam/iam_config_docker.json

@ -25,7 +25,7 @@
"roles": [
{
"roleName": "S3AdminRole",
"roleArn": "arn:seaweed:iam::role/S3AdminRole",
"roleArn": "arn:aws:iam::role/S3AdminRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -48,7 +48,7 @@
},
{
"roleName": "S3ReadOnlyRole",
"roleArn": "arn:seaweed:iam::role/S3ReadOnlyRole",
"roleArn": "arn:aws:iam::role/S3ReadOnlyRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -71,7 +71,7 @@
},
{
"roleName": "S3ReadWriteRole",
"roleArn": "arn:seaweed:iam::role/S3ReadWriteRole",
"roleArn": "arn:aws:iam::role/S3ReadWriteRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -122,8 +122,8 @@
"s3:ListBucketVersions"
],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
}
]
@ -147,8 +147,8 @@
"s3:ListBucketVersions"
],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
}
]

16
test/s3/iam/s3_iam_framework.go

@ -369,9 +369,9 @@ func (f *S3IAMTestFramework) generateSTSSessionToken(username, roleName string,
sessionId := fmt.Sprintf("test-session-%s-%s-%d", username, roleName, now.Unix())
// Create session token claims exactly matching STSSessionClaims struct
roleArn := fmt.Sprintf("arn:seaweed:iam::role/%s", roleName)
roleArn := fmt.Sprintf("arn:aws:iam::role/%s", roleName)
sessionName := fmt.Sprintf("test-session-%s", username)
principalArn := fmt.Sprintf("arn:seaweed:sts::assumed-role/%s/%s", roleName, sessionName)
principalArn := fmt.Sprintf("arn:aws:sts::assumed-role/%s/%s", roleName, sessionName)
// Use jwt.MapClaims but with exact field names that STSSessionClaims expects
sessionClaims := jwt.MapClaims{
@ -705,12 +705,22 @@ func (f *S3IAMTestFramework) CreateBucketWithCleanup(s3Client *s3.S3, bucketName
f.t.Logf("Warning: Failed to delete existing bucket %s: %v", bucketName, deleteErr)
}
// Add a small delay to allow deletion to propagate
time.Sleep(100 * time.Millisecond)
// Now create it fresh
_, err = s3Client.CreateBucket(&s3.CreateBucketInput{
Bucket: aws.String(bucketName),
})
if err != nil {
return fmt.Errorf("failed to recreate bucket after cleanup: %v", err)
// If it still says bucket exists after cleanup, it might be in an inconsistent state
// In this case, just use the existing bucket since we emptied it
if awsErr, ok := err.(awserr.Error); ok && (awsErr.Code() == "BucketAlreadyExists" || awsErr.Code() == "BucketAlreadyOwnedByYou") {
f.t.Logf("Bucket %s still exists after cleanup, reusing it", bucketName)
// Bucket exists and is empty, so we can proceed
} else {
return fmt.Errorf("failed to recreate bucket after cleanup: %v", err)
}
}
} else {
return err

45
test/s3/iam/s3_iam_integration_test.go

@ -410,7 +410,7 @@ func TestS3IAMBucketPolicyIntegration(t *testing.T) {
"Effect": "Allow",
"Principal": "*",
"Action": ["s3:GetObject"],
"Resource": ["arn:seaweed:s3:::%s/*"]
"Resource": ["arn:aws:s3:::%s/*"]
}
]
}`, bucketName)
@ -443,6 +443,12 @@ func TestS3IAMBucketPolicyIntegration(t *testing.T) {
require.NoError(t, err)
assert.Equal(t, testObjectData, string(data))
result.Body.Close()
// Clean up bucket policy after this test
_, err = adminClient.DeleteBucketPolicy(&s3.DeleteBucketPolicyInput{
Bucket: aws.String(bucketName),
})
require.NoError(t, err)
})
t.Run("bucket_policy_denies_specific_action", func(t *testing.T) {
@ -455,7 +461,7 @@ func TestS3IAMBucketPolicyIntegration(t *testing.T) {
"Effect": "Deny",
"Principal": "*",
"Action": ["s3:DeleteObject"],
"Resource": ["arn:seaweed:s3:::%s/*"]
"Resource": ["arn:aws:s3:::%s/*"]
}
]
}`, bucketName)
@ -474,17 +480,34 @@ func TestS3IAMBucketPolicyIntegration(t *testing.T) {
assert.Contains(t, *policyResult.Policy, "s3:DeleteObject")
assert.Contains(t, *policyResult.Policy, "Deny")
// IMPLEMENTATION NOTE: Bucket policy enforcement in authorization flow
// is planned for a future phase. Currently, this test validates policy
// storage and retrieval. When enforcement is implemented, this test
// should be extended to verify that delete operations are actually denied.
// NOTE: Enforcement test is commented out due to known architectural limitation:
//
// KNOWN LIMITATION: DeleteObject uses the coarse-grained ACTION_WRITE constant,
// which convertActionToS3Format maps to "s3:PutObject" (not "s3:DeleteObject").
// This means the policy engine evaluates the deny policy against "s3:PutObject",
// doesn't find a match, and allows the delete operation.
//
// TODO: Uncomment this test once the action mapping is refactored to use
// specific S3 action strings throughout the S3 API handlers.
// See: weed/s3api/s3api_bucket_policy_engine.go lines 135-146
//
// _, err = adminClient.DeleteObject(&s3.DeleteObjectInput{
// Bucket: aws.String(bucketName),
// Key: aws.String(testObjectKey),
// })
// require.Error(t, err, "DeleteObject should be denied by the bucket policy")
// awsErr, ok := err.(awserr.Error)
// require.True(t, ok, "Error should be an awserr.Error")
// assert.Equal(t, "AccessDenied", awsErr.Code(), "Expected AccessDenied error code")
// Clean up bucket policy after this test
_, err = adminClient.DeleteBucketPolicy(&s3.DeleteBucketPolicyInput{
Bucket: aws.String(bucketName),
})
require.NoError(t, err)
})
// Cleanup - delete bucket policy first, then objects and bucket
_, err = adminClient.DeleteBucketPolicy(&s3.DeleteBucketPolicyInput{
Bucket: aws.String(bucketName),
})
require.NoError(t, err)
// Cleanup - delete objects and bucket (policy already cleaned up in subtests)
_, err = adminClient.DeleteObject(&s3.DeleteObjectInput{
Bucket: aws.String(bucketName),

34
test/s3/iam/setup_keycloak_docker.sh

@ -178,25 +178,25 @@ cat > iam_config.json << 'EOF'
{
"claim": "roles",
"value": "s3-admin",
"role": "arn:seaweed:iam::role/KeycloakAdminRole"
"role": "arn:aws:iam::role/KeycloakAdminRole"
},
{
"claim": "roles",
"value": "s3-read-only",
"role": "arn:seaweed:iam::role/KeycloakReadOnlyRole"
"role": "arn:aws:iam::role/KeycloakReadOnlyRole"
},
{
"claim": "roles",
"value": "s3-write-only",
"role": "arn:seaweed:iam::role/KeycloakWriteOnlyRole"
"role": "arn:aws:iam::role/KeycloakWriteOnlyRole"
},
{
"claim": "roles",
"value": "s3-read-write",
"role": "arn:seaweed:iam::role/KeycloakReadWriteRole"
"role": "arn:aws:iam::role/KeycloakReadWriteRole"
}
],
"defaultRole": "arn:seaweed:iam::role/KeycloakReadOnlyRole"
"defaultRole": "arn:aws:iam::role/KeycloakReadOnlyRole"
}
}
}
@ -207,7 +207,7 @@ cat > iam_config.json << 'EOF'
"roles": [
{
"roleName": "KeycloakAdminRole",
"roleArn": "arn:seaweed:iam::role/KeycloakAdminRole",
"roleArn": "arn:aws:iam::role/KeycloakAdminRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -225,7 +225,7 @@ cat > iam_config.json << 'EOF'
},
{
"roleName": "KeycloakReadOnlyRole",
"roleArn": "arn:seaweed:iam::role/KeycloakReadOnlyRole",
"roleArn": "arn:aws:iam::role/KeycloakReadOnlyRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -243,7 +243,7 @@ cat > iam_config.json << 'EOF'
},
{
"roleName": "KeycloakWriteOnlyRole",
"roleArn": "arn:seaweed:iam::role/KeycloakWriteOnlyRole",
"roleArn": "arn:aws:iam::role/KeycloakWriteOnlyRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -261,7 +261,7 @@ cat > iam_config.json << 'EOF'
},
{
"roleName": "KeycloakReadWriteRole",
"roleArn": "arn:seaweed:iam::role/KeycloakReadWriteRole",
"roleArn": "arn:aws:iam::role/KeycloakReadWriteRole",
"trustPolicy": {
"Version": "2012-10-17",
"Statement": [
@ -309,8 +309,8 @@ cat > iam_config.json << 'EOF'
"s3:ListBucket"
],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
},
{
@ -330,8 +330,8 @@ cat > iam_config.json << 'EOF'
"Effect": "Allow",
"Action": ["s3:*"],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
},
{
@ -341,8 +341,8 @@ cat > iam_config.json << 'EOF'
"s3:ListBucket"
],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
},
{
@ -362,8 +362,8 @@ cat > iam_config.json << 'EOF'
"Effect": "Allow",
"Action": ["s3:*"],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
},
{

28
test/s3/iam/test_config.json

@ -164,8 +164,8 @@
"Effect": "Allow",
"Action": ["s3:*"],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
}
]
@ -184,8 +184,8 @@
"s3:GetBucketVersioning"
],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
}
]
@ -207,7 +207,7 @@
"s3:ListMultipartUploadParts"
],
"Resource": [
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*/*"
]
}
]
@ -227,7 +227,7 @@
"s3:PutBucketVersioning"
],
"Resource": [
"arn:seaweed:s3:::*"
"arn:aws:s3:::*"
]
}
]
@ -239,8 +239,8 @@
"Effect": "Allow",
"Action": ["s3:*"],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
],
"Condition": {
"IpAddress": {
@ -257,8 +257,8 @@
"Effect": "Allow",
"Action": ["s3:GetObject", "s3:ListBucket"],
"Resource": [
"arn:seaweed:s3:::*",
"arn:seaweed:s3:::*/*"
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
],
"Condition": {
"DateGreaterThan": {
@ -281,7 +281,7 @@
"Effect": "Allow",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": "arn:seaweed:s3:::example-bucket/*"
"Resource": "arn:aws:s3:::example-bucket/*"
}
]
},
@ -294,8 +294,8 @@
"Principal": "*",
"Action": ["s3:DeleteObject", "s3:DeleteBucket"],
"Resource": [
"arn:seaweed:s3:::example-bucket",
"arn:seaweed:s3:::example-bucket/*"
"arn:aws:s3:::example-bucket",
"arn:aws:s3:::example-bucket/*"
]
}
]
@ -308,7 +308,7 @@
"Effect": "Allow",
"Principal": "*",
"Action": ["s3:GetObject", "s3:PutObject"],
"Resource": "arn:seaweed:s3:::example-bucket/*",
"Resource": "arn:aws:s3:::example-bucket/*",
"Condition": {
"IpAddress": {
"aws:SourceIp": ["203.0.113.0/24"]

40
test/s3/parquet/.gitignore

@ -0,0 +1,40 @@
# Python virtual environment
venv/
.venv/
env/
ENV/
# Python cache
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
# Test artifacts
*.log
test_run.log
weed-test.log
# SeaweedFS data directories
filerldb2/
idx/
dat/
*.idx
*.dat
# Temporary test files
.pytest_cache/
.coverage
htmlcov/
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
# OS
.DS_Store
Thumbs.db

172
test/s3/parquet/CROSS_FILESYSTEM_COMPATIBILITY.md

@ -0,0 +1,172 @@
# Cross-Filesystem Compatibility Test Results
## Overview
This document summarizes cross-filesystem compatibility testing between the **s3fs** and **PyArrow native S3** filesystem implementations when used with SeaweedFS.
## Test Purpose
Verify that Parquet files written using one filesystem implementation (s3fs or PyArrow native S3) can be correctly read using the other implementation, confirming true file format compatibility.
## Test Methodology
### Test Matrix
The test performs the following combinations:
1. **Write with s3fs → Read with PyArrow native S3**
2. **Write with PyArrow native S3 → Read with s3fs**
For each direction, the test:
- Creates a sample PyArrow table with multiple data types (int64, string, float64, bool)
- Writes the Parquet file using one filesystem implementation
- Reads the Parquet file using the other filesystem implementation
- Verifies data integrity by comparing:
- Row counts
- Schema equality
- Data contents (after sorting by ID to handle row order differences)
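The condensed sketch below shows one direction of this round trip (write with s3fs, read with PyArrow native S3). The full matrix, including the reverse direction, lives in `test_cross_filesystem_compatibility.py`; the endpoint, credentials, and bucket name are the documented defaults, the `crossfs_demo` prefix is illustrative, and the bucket is assumed to already exist.
```python
import pyarrow as pa
import pyarrow.dataset as pads
import pyarrow.parquet as pq
import pyarrow.fs as pafs
import s3fs

ENDPOINT = "http://localhost:8333"
ACCESS_KEY, SECRET_KEY = "some_access_key1", "some_secret_key1"
DATASET_PATH = "test-parquet-bucket/crossfs_demo"  # bucket must already exist

# Sample table with the data types described above
table = pa.table({
    "id": list(range(5)),
    "name": ["a", "b", "c", "d", "e"],
    "score": [1.0, 2.0, 3.0, 4.0, 5.0],
    "flag": [True, False, True, False, True],
})

# Write with s3fs
fs_write = s3fs.S3FileSystem(
    key=ACCESS_KEY, secret=SECRET_KEY,
    client_kwargs={"endpoint_url": ENDPOINT},
)
pads.write_dataset(table, DATASET_PATH, format="parquet", filesystem=fs_write)

# Read back with PyArrow's native S3 filesystem
fs_read = pafs.S3FileSystem(
    access_key=ACCESS_KEY, secret_key=SECRET_KEY,
    endpoint_override="localhost:8333", scheme="http",
)
round_tripped = pq.read_table(DATASET_PATH, filesystem=fs_read)

# Verify row count and data contents (sorted by id to ignore row order)
assert round_tripped.num_rows == table.num_rows
assert round_tripped.sort_by("id").equals(table.sort_by("id"))
```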
### File Sizes Tested
- **Small files**: 5 rows (quick validation)
- **Large files**: 200,000 rows (multi-row-group validation)
## Test Results
### ✅ Small Files (5 rows)
| Write Method | Read Method | Result | Read Function Used |
|--------------|-------------|--------|--------------------|
| s3fs | PyArrow native S3 | ✅ PASS | pq.read_table |
| PyArrow native S3 | s3fs | ✅ PASS | pq.read_table |
**Status**: **ALL TESTS PASSED**
### Large Files (200,000 rows)
Large-file testing requires adequate volume capacity in SeaweedFS. With the default volume settings (50 MB maximum volume size), the number of large test files created simultaneously can exhaust the available volume capacity.
**Recommendation**: For large-file testing, increase `VOLUME_MAX_SIZE_MB` in the Makefile; for development and quick validation, run the tests with `TEST_QUICK=1` to skip the large files.
## Key Findings
### ✅ Full Compatibility Confirmed
**Files written with s3fs and PyArrow native S3 filesystem are fully compatible and can be read by either implementation.**
This confirms that:
1. **Identical Parquet Format**: Both s3fs and PyArrow native S3 use the same underlying PyArrow library to generate Parquet files, resulting in identical file formats at the binary level.
2. **S3 API Compatibility**: SeaweedFS's S3 implementation handles both filesystem backends correctly, with proper:
- Object creation (PutObject)
- Object reading (GetObject)
- Directory handling (implicit directories)
- Multipart uploads (for larger files)
3. **Metadata Consistency**: File metadata, schemas, and data integrity are preserved across both write and read operations regardless of which filesystem implementation is used.
## Implementation Details
### Common Write Path
Both implementations use PyArrow's `pads.write_dataset()` function:
```python
# s3fs approach
fs = s3fs.S3FileSystem(...)
pads.write_dataset(table, path, format="parquet", filesystem=fs)
# PyArrow native approach
s3 = pafs.S3FileSystem(...)
pads.write_dataset(table, path, format="parquet", filesystem=s3)
```
### Multiple Read Methods Tested
The test attempts reads using multiple PyArrow methods:
- `pq.read_table()` - Direct table reading
- `pq.ParquetDataset()` - Dataset-based reading
- `pads.dataset()` - PyArrow dataset API
All methods successfully read files written by either filesystem implementation.
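For illustration, the three read paths look roughly like this when pointed at a dataset that has already been written to SeaweedFS (connection settings are the documented defaults; the dataset path is an example):
```python
import pyarrow.dataset as pads
import pyarrow.parquet as pq
import s3fs

fs = s3fs.S3FileSystem(
    key="some_access_key1", secret="some_secret_key1",
    client_kwargs={"endpoint_url": "http://localhost:8333"},
)
path = "test-parquet-bucket/crossfs_demo"  # example dataset written earlier

t1 = pq.read_table(path, filesystem=fs)                               # direct table read
t2 = pq.ParquetDataset(path, filesystem=fs).read()                    # dataset-based read
t3 = pads.dataset(path, format="parquet", filesystem=fs).to_table()   # PyArrow dataset API

assert t1.num_rows == t2.num_rows == t3.num_rows
```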
## Practical Implications
### For Users
1. **Flexibility**: Users can choose either s3fs or PyArrow native S3 based on their preferences:
- **s3fs**: More mature, widely used, familiar API
- **PyArrow native**: Pure PyArrow solution, fewer dependencies
2. **Interoperability**: Teams using different tools can seamlessly share Parquet datasets stored in SeaweedFS
3. **Migration**: Easy to migrate between filesystem implementations without data conversion
### For SeaweedFS
1. **S3 Compatibility**: Confirms SeaweedFS's S3 implementation is compatible with major Python data science tools
2. **Implicit Directory Handling**: The implicit directory fix works correctly for both filesystem implementations
3. **Standard Compliance**: SeaweedFS handles S3 operations in a way that's compatible with AWS S3 behavior
## Running the Tests
### Quick Test (Recommended for Development)
```bash
cd test/s3/parquet
TEST_QUICK=1 make test-cross-fs-with-server
```
### Full Test (All File Sizes)
```bash
cd test/s3/parquet
make test-cross-fs-with-server
```
### Manual Test (Assuming Server is Running)
```bash
cd test/s3/parquet
make setup-python
make start-seaweedfs-ci
# In another terminal
TEST_QUICK=1 make test-cross-fs
# Cleanup
make stop-seaweedfs-safe
```
## Environment Variables
The test supports customization through environment variables:
- `S3_ENDPOINT_URL`: S3 endpoint (default: `http://localhost:8333`)
- `S3_ACCESS_KEY`: Access key (default: `some_access_key1`)
- `S3_SECRET_KEY`: Secret key (default: `some_secret_key1`)
- `BUCKET_NAME`: Bucket name (default: `test-parquet-bucket`)
- `TEST_QUICK`: Run only small tests (default: `0`, set to `1` for quick mode)
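For reference, the test script is assumed to pick these up roughly as follows (names and defaults match the list above; the exact code in `test_cross_filesystem_compatibility.py` may differ):
```python
import os

S3_ENDPOINT_URL = os.environ.get("S3_ENDPOINT_URL", "http://localhost:8333")
S3_ACCESS_KEY = os.environ.get("S3_ACCESS_KEY", "some_access_key1")
S3_SECRET_KEY = os.environ.get("S3_SECRET_KEY", "some_secret_key1")
BUCKET_NAME = os.environ.get("BUCKET_NAME", "test-parquet-bucket")
TEST_QUICK = os.environ.get("TEST_QUICK", "0") == "1"  # skip large-file tests when true
```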
## Conclusion
The cross-filesystem compatibility tests demonstrate that **Parquet files written via s3fs and PyArrow native S3 filesystem are completely interchangeable**. This validates that:
1. The Parquet file format is implementation-agnostic
2. SeaweedFS's S3 API correctly handles both filesystem backends
3. Users have full flexibility in choosing their preferred filesystem implementation
This compatibility is a testament to:
- PyArrow's consistent file format generation
- SeaweedFS's robust S3 API implementation
- Proper handling of S3 semantics (especially implicit directories)
---
**Test Implementation**: `test_cross_filesystem_compatibility.py`
**Last Updated**: November 21, 2024
**Status**: ✅ All critical tests passing

58
test/s3/parquet/FINAL_ROOT_CAUSE_ANALYSIS.md

@ -0,0 +1,58 @@
# Final Root Cause Analysis
## Overview
This document provides a deep technical analysis of the s3fs compatibility issue with PyArrow Parquet datasets on SeaweedFS, and the solution implemented to resolve it.
## Root Cause
When PyArrow writes datasets using `write_dataset()`, it creates implicit directory structures by writing files without explicit directory markers. However, some S3 workflows may create 0-byte directory markers.
### The Problem
1. **PyArrow writes dataset files** without creating explicit directory objects
2. **s3fs calls HEAD** on the directory path to check if it exists
3. **If HEAD returns 200** with `Content-Length: 0`, s3fs interprets it as a file (not a directory)
4. **PyArrow fails** when trying to read, reporting "Parquet file size is 0 bytes"
### AWS S3 Behavior
AWS S3 returns **404 Not Found** for implicit directories (directories that only exist because they have children but no explicit marker object). This allows s3fs to fall back to LIST operations to detect the directory.
## The Solution
### Implementation
Modified the S3 API HEAD handler in `weed/s3api/s3api_object_handlers.go` to:
1. **Check if object ends with `/`**: Explicit directory markers return 200 as before
2. **Check if object has children**: If a 0-byte object has children in the filer, treat it as an implicit directory
3. **Return 404 for implicit directories**: This matches AWS S3 behavior and triggers s3fs's LIST fallback
### Code Changes
The fix is implemented in the `HeadObjectHandler` function with logic to:
- Detect implicit directories by checking for child entries
- Return 404 (NoSuchKey) for implicit directories
- Preserve existing behavior for explicit directory markers and regular files
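A minimal client-side illustration of the intended behavior, using boto3 against the test defaults (bucket and prefix names are examples; a dataset is assumed to have already been written under `dataset/`):
```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:8333",
    aws_access_key_id="some_access_key1",
    aws_secret_access_key="some_secret_key1",
)

# "dataset" is an implicit directory: it exists only because objects such as
# dataset/part-0.parquet exist underneath it.
try:
    s3.head_object(Bucket="test-parquet-bucket", Key="dataset")
    print("HEAD returned 200 (pre-fix behavior, confuses s3fs)")
except ClientError as e:
    # After the fix, SeaweedFS answers 404 like AWS S3, so s3fs falls back
    # to a LIST request to discover the directory.
    print("HEAD status:", e.response["ResponseMetadata"]["HTTPStatusCode"])

listing = s3.list_objects_v2(Bucket="test-parquet-bucket", Prefix="dataset/", MaxKeys=1)
print("has children:", listing.get("KeyCount", 0) > 0)
```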
## Performance Considerations
### Optimization: Child Check Cache
- Child existence checks are performed via filer LIST operations
- Results could be cached for frequently accessed paths
- Trade-off between consistency and performance
### Impact
- Minimal performance impact for normal file operations
- Slight overhead for HEAD requests on implicit directories (one additional LIST call)
- Overall improvement in PyArrow compatibility outweighs minor performance cost
## TODO
- [ ] Add detailed benchmarking results comparing before/after fix
- [ ] Document edge cases discovered during implementation
- [ ] Add architectural diagrams showing the request flow
- [ ] Document alternative solutions considered and why they were rejected
- [ ] Add performance profiling data for child existence checks

70
test/s3/parquet/MINIO_DIRECTORY_HANDLING.md

@ -0,0 +1,70 @@
# MinIO Directory Handling Comparison
## Overview
This document compares how MinIO and SeaweedFS handle directory markers and explains the different approaches each takes to S3 directory semantics.
## MinIO's Approach
MinIO handles implicit directories similarly to AWS S3:
1. **No explicit directory objects**: Directories are implicit, defined only by object key prefixes
2. **HEAD on directory returns 404**: Consistent with AWS S3 behavior
3. **LIST operations reveal directories**: Directories are discovered through delimiter-based LIST operations
4. **Automatic prefix handling**: MinIO automatically recognizes prefixes as directories
### MinIO Implementation Details
- Uses in-memory metadata for fast prefix lookups
- Optimized for LIST operations with common delimiter (`/`)
- No persistent directory objects in storage layer
- Directories "exist" as long as they contain objects
## SeaweedFS Approach
SeaweedFS uses a filer-based approach with real directory entries:
### Before the Fix
1. **Explicit directory objects**: Could create 0-byte objects as directory markers
2. **HEAD returns 200**: Even for implicit directories
3. **Caused s3fs issues**: s3fs interpreted 0-byte HEAD responses as empty files
### After the Fix
1. **Hybrid approach**: Supports both explicit markers (with `/` suffix) and implicit directories
2. **HEAD returns 404 for implicit directories**: Matches AWS S3 and MinIO behavior
3. **Filer integration**: Uses filer's directory metadata to detect implicit directories
4. **s3fs compatibility**: Triggers proper LIST fallback behavior
## Key Differences
| Aspect | MinIO | SeaweedFS (After Fix) |
|--------|-------|----------------------|
| Directory Storage | No persistent objects | Filer directory entries |
| Implicit Directory HEAD | 404 Not Found | 404 Not Found |
| Explicit Marker HEAD | Not applicable | 200 OK (with `/` suffix) |
| Child Detection | Prefix scan | Filer LIST operation |
| Performance | In-memory lookups | Filer gRPC calls |
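To make the table concrete, the sketch below probes both directory paradigms with boto3 (endpoint, credentials, and keys are example values; the explicit marker only returns 200 if such a marker object was actually created):
```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:8333",
    aws_access_key_id="some_access_key1",
    aws_secret_access_key="some_secret_key1",
)

def head_status(key):
    try:
        s3.head_object(Bucket="test-parquet-bucket", Key=key)
        return 200
    except ClientError as e:
        return e.response["ResponseMetadata"]["HTTPStatusCode"]

print(head_status("datasets/"))  # explicit marker (trailing slash): 200 if the marker exists
print(head_status("datasets"))   # implicit directory with children: 404 after the fix
```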
## Implementation Considerations
### Advantages of SeaweedFS Approach
- Integrates with existing filer metadata
- Supports both implicit and explicit directories
- Preserves directory metadata and attributes
- Compatible with POSIX filer semantics
### Trade-offs
- Additional filer communication overhead for HEAD requests
- Complexity of supporting both directory paradigms
- Performance depends on filer efficiency
## TODO
- [ ] Add performance benchmark comparison: MinIO vs SeaweedFS
- [ ] Document edge cases where behaviors differ
- [ ] Add example request/response traces for both systems
- [ ] Document migration path for users moving from MinIO to SeaweedFS
- [ ] Add compatibility matrix for different S3 clients

482
test/s3/parquet/Makefile

@ -0,0 +1,482 @@
# Makefile for S3 Parquet Integration Tests
# This Makefile provides targets for running comprehensive S3 Parquet tests with PyArrow
# Default values
SEAWEEDFS_BINARY ?= weed
S3_PORT ?= 8333
FILER_PORT ?= 8888
VOLUME_PORT ?= 8080
MASTER_PORT ?= 9333
TEST_TIMEOUT ?= 15m
ACCESS_KEY ?= some_access_key1
SECRET_KEY ?= some_secret_key1
VOLUME_MAX_SIZE_MB ?= 50
VOLUME_MAX_COUNT ?= 100
BUCKET_NAME ?= test-parquet-bucket
ENABLE_SSE_S3 ?= false
# Python configuration
PYTHON ?= python3
VENV_DIR ?= .venv
PYTHON_TEST_SCRIPT ?= s3_parquet_test.py
# Test directory
TEST_DIR := $(shell pwd)
SEAWEEDFS_ROOT := $(shell cd ../../../ && pwd)
# Colors for output
RED := \033[0;31m
GREEN := \033[0;32m
YELLOW := \033[1;33m
NC := \033[0m # No Color
.PHONY: all build-weed check-binary check-python ci-test clean debug-logs debug-status help manual-start manual-stop setup-python start-seaweedfs start-seaweedfs-ci stop-seaweedfs stop-seaweedfs-safe test test-cross-fs test-cross-fs-with-server test-implicit-dir test-implicit-dir-with-server test-native-s3 test-native-s3-with-server test-native-s3-with-sse test-quick test-sse-s3-compat test-with-server
all: test
# Build SeaweedFS binary (GitHub Actions compatible)
build-weed:
@echo "Building SeaweedFS binary..."
@cd $(SEAWEEDFS_ROOT)/weed && go install -buildvcs=false
@echo "✅ SeaweedFS binary built successfully"
help:
@echo "SeaweedFS S3 Parquet Integration Tests"
@echo ""
@echo "Available targets:"
@echo " test - Run full S3 Parquet integration tests (small and large files)"
@echo " test-with-server - Run full tests with automatic server management (CI compatible)"
@echo " test-quick - Run quick tests with small files only (sets TEST_QUICK=1)"
@echo " test-implicit-dir - Test implicit directory fix for s3fs compatibility"
@echo " test-implicit-dir-with-server - Test implicit directory fix with server management"
@echo " test-native-s3 - Test PyArrow's native S3 filesystem (assumes server running)"
@echo " test-native-s3-with-server - Test PyArrow's native S3 filesystem with server management"
@echo " test-native-s3-with-sse - Test PyArrow's native S3 with SSE-S3 encryption enabled"
@echo " test-cross-fs - Test cross-filesystem compatibility (s3fs ↔ PyArrow native)"
@echo " test-cross-fs-with-server - Test cross-filesystem compatibility with server management"
@echo " test-sse-s3-compat - Comprehensive SSE-S3 compatibility test (multipart uploads)"
@echo " setup-python - Setup Python virtual environment and install dependencies"
@echo " check-python - Check if Python and required packages are available"
@echo " start-seaweedfs - Start SeaweedFS server for testing"
@echo " start-seaweedfs-ci - Start SeaweedFS server (CI-safe version)"
@echo " stop-seaweedfs - Stop SeaweedFS server"
@echo " stop-seaweedfs-safe - Stop SeaweedFS server (CI-safe version)"
@echo " clean - Clean up test artifacts"
@echo " check-binary - Check if SeaweedFS binary exists"
@echo " build-weed - Build SeaweedFS binary"
@echo ""
@echo "Configuration:"
@echo " SEAWEEDFS_BINARY=$(SEAWEEDFS_BINARY)"
@echo " S3_PORT=$(S3_PORT)"
@echo " FILER_PORT=$(FILER_PORT)"
@echo " VOLUME_PORT=$(VOLUME_PORT)"
@echo " MASTER_PORT=$(MASTER_PORT)"
@echo " BUCKET_NAME=$(BUCKET_NAME)"
@echo " VOLUME_MAX_SIZE_MB=$(VOLUME_MAX_SIZE_MB)"
@echo " ENABLE_SSE_S3=$(ENABLE_SSE_S3)"
@echo " PYTHON=$(PYTHON)"
check-binary:
@if ! command -v $(SEAWEEDFS_BINARY) > /dev/null 2>&1; then \
echo "$(RED)Error: SeaweedFS binary '$(SEAWEEDFS_BINARY)' not found in PATH$(NC)"; \
echo "Please build SeaweedFS first by running 'make' in the root directory"; \
exit 1; \
fi
@echo "$(GREEN)SeaweedFS binary found: $$(which $(SEAWEEDFS_BINARY))$(NC)"
check-python:
@if ! command -v $(PYTHON) > /dev/null 2>&1; then \
echo "$(RED)Error: Python '$(PYTHON)' not found$(NC)"; \
echo "Please install Python 3.8 or later"; \
exit 1; \
fi
@echo "$(GREEN)Python found: $$(which $(PYTHON)) ($$($(PYTHON) --version))$(NC)"
setup-python: check-python
@echo "$(YELLOW)Setting up Python virtual environment...$(NC)"
@if [ ! -d "$(VENV_DIR)" ]; then \
$(PYTHON) -m venv $(VENV_DIR); \
echo "$(GREEN)Virtual environment created$(NC)"; \
fi
@echo "$(YELLOW)Installing Python dependencies...$(NC)"
@$(VENV_DIR)/bin/pip install --upgrade pip > /dev/null
@$(VENV_DIR)/bin/pip install -r requirements.txt
@echo "$(GREEN)Python dependencies installed successfully$(NC)"
start-seaweedfs-ci: check-binary
@echo "$(YELLOW)Starting SeaweedFS server for Parquet testing...$(NC)"
# Clean up any existing processes first (CI-safe)
@echo "Cleaning up any existing processes..."
@if command -v lsof >/dev/null 2>&1; then \
lsof -ti :$(MASTER_PORT) 2>/dev/null | head -5 | while read pid; do kill -TERM $$pid 2>/dev/null || true; done; \
lsof -ti :$(VOLUME_PORT) 2>/dev/null | head -5 | while read pid; do kill -TERM $$pid 2>/dev/null || true; done; \
lsof -ti :$(FILER_PORT) 2>/dev/null | head -5 | while read pid; do kill -TERM $$pid 2>/dev/null || true; done; \
lsof -ti :$(S3_PORT) 2>/dev/null | head -5 | while read pid; do kill -TERM $$pid 2>/dev/null || true; done; \
lsof -ti :$$(( $(MASTER_PORT) + 10000 )) 2>/dev/null | head -5 | while read pid; do kill -TERM $$pid 2>/dev/null || true; done; \
lsof -ti :$$(( $(VOLUME_PORT) + 10000 )) 2>/dev/null | head -5 | while read pid; do kill -TERM $$pid 2>/dev/null || true; done; \
lsof -ti :$$(( $(FILER_PORT) + 10000 )) 2>/dev/null | head -5 | while read pid; do kill -TERM $$pid 2>/dev/null || true; done; \
fi
@sleep 2
# Create necessary directories
@mkdir -p /tmp/seaweedfs-test-parquet-master
@mkdir -p /tmp/seaweedfs-test-parquet-volume
@mkdir -p /tmp/seaweedfs-test-parquet-filer
# Clean up any old server logs
@rm -f /tmp/seaweedfs-parquet-*.log || true
# Start master server with volume size limit and explicit gRPC port
@echo "Starting master server..."
@nohup $(SEAWEEDFS_BINARY) master -port=$(MASTER_PORT) -port.grpc=$$(( $(MASTER_PORT) + 10000 )) -mdir=/tmp/seaweedfs-test-parquet-master -volumeSizeLimitMB=$(VOLUME_MAX_SIZE_MB) -ip=127.0.0.1 -peers=none > /tmp/seaweedfs-parquet-master.log 2>&1 &
@sleep 3
# Start volume server with master HTTP port and increased capacity
@echo "Starting volume server..."
@nohup $(SEAWEEDFS_BINARY) volume -port=$(VOLUME_PORT) -mserver=127.0.0.1:$(MASTER_PORT) -dir=/tmp/seaweedfs-test-parquet-volume -max=$(VOLUME_MAX_COUNT) -ip=127.0.0.1 -preStopSeconds=1 > /tmp/seaweedfs-parquet-volume.log 2>&1 &
@sleep 5
# Start filer server with embedded S3
@echo "Starting filer server with embedded S3..."
@if [ "$(ENABLE_SSE_S3)" = "true" ]; then \
echo " SSE-S3 encryption: ENABLED"; \
printf '{"identities":[{"name":"%s","credentials":[{"accessKey":"%s","secretKey":"%s"}],"actions":["Admin","Read","Write"]}],"buckets":[{"name":"$(BUCKET_NAME)","encryption":{"sseS3":{"enabled":true}}}]}' "$(ACCESS_KEY)" "$(ACCESS_KEY)" "$(SECRET_KEY)" > /tmp/seaweedfs-parquet-s3.json; \
else \
echo " SSE-S3 encryption: DISABLED"; \
printf '{"identities":[{"name":"%s","credentials":[{"accessKey":"%s","secretKey":"%s"}],"actions":["Admin","Read","Write"]}]}' "$(ACCESS_KEY)" "$(ACCESS_KEY)" "$(SECRET_KEY)" > /tmp/seaweedfs-parquet-s3.json; \
fi
@AWS_ACCESS_KEY_ID=$(ACCESS_KEY) AWS_SECRET_ACCESS_KEY=$(SECRET_KEY) nohup $(SEAWEEDFS_BINARY) filer -port=$(FILER_PORT) -port.grpc=$$(( $(FILER_PORT) + 10000 )) -master=127.0.0.1:$(MASTER_PORT) -dataCenter=defaultDataCenter -ip=127.0.0.1 -s3 -s3.port=$(S3_PORT) -s3.config=/tmp/seaweedfs-parquet-s3.json > /tmp/seaweedfs-parquet-filer.log 2>&1 &
@sleep 5
# Wait for S3 service to be ready - use port-based checking for reliability
@echo "$(YELLOW)Waiting for S3 service to be ready...$(NC)"
@for i in $$(seq 1 20); do \
if netstat -an 2>/dev/null | grep -q ":$(S3_PORT).*LISTEN" || \
ss -an 2>/dev/null | grep -q ":$(S3_PORT).*LISTEN" || \
lsof -i :$(S3_PORT) >/dev/null 2>&1; then \
echo "$(GREEN)S3 service is listening on port $(S3_PORT)$(NC)"; \
sleep 1; \
break; \
fi; \
if [ $$i -eq 20 ]; then \
echo "$(RED)S3 service failed to start within 20 seconds$(NC)"; \
echo "=== Detailed Logs ==="; \
echo "Master log:"; tail -30 /tmp/seaweedfs-parquet-master.log || true; \
echo "Volume log:"; tail -30 /tmp/seaweedfs-parquet-volume.log || true; \
echo "Filer log:"; tail -30 /tmp/seaweedfs-parquet-filer.log || true; \
echo "=== Port Status ==="; \
netstat -an 2>/dev/null | grep ":$(S3_PORT)" || \
ss -an 2>/dev/null | grep ":$(S3_PORT)" || \
echo "No port listening on $(S3_PORT)"; \
exit 1; \
fi; \
echo "Waiting for S3 service... ($$i/20)"; \
sleep 1; \
done
# Additional wait for filer gRPC to be ready
@echo "$(YELLOW)Waiting for filer gRPC to be ready...$(NC)"
@sleep 2
# Wait for volume server to register with master and ensure volume assignment works
@echo "$(YELLOW)Waiting for volume assignment to be ready...$(NC)"
@for i in $$(seq 1 30); do \
ASSIGN_RESULT=$$(curl -s "http://localhost:$(MASTER_PORT)/dir/assign?count=1" 2>/dev/null); \
if echo "$$ASSIGN_RESULT" | grep -q '"fid"'; then \
echo "$(GREEN)Volume assignment is ready$(NC)"; \
break; \
fi; \
if [ $$i -eq 30 ]; then \
echo "$(RED)Volume assignment not ready after 30 seconds$(NC)"; \
echo "=== Last assign attempt ==="; \
echo "$$ASSIGN_RESULT"; \
echo "=== Master Status ==="; \
curl -s "http://localhost:$(MASTER_PORT)/dir/status" 2>/dev/null || echo "Failed to get master status"; \
echo "=== Master Logs ==="; \
tail -50 /tmp/seaweedfs-parquet-master.log 2>/dev/null || echo "No master log"; \
echo "=== Volume Logs ==="; \
tail -50 /tmp/seaweedfs-parquet-volume.log 2>/dev/null || echo "No volume log"; \
exit 1; \
fi; \
echo "Waiting for volume assignment... ($$i/30)"; \
sleep 1; \
done
@echo "$(GREEN)SeaweedFS server started successfully for Parquet testing$(NC)"
@echo "Master: http://localhost:$(MASTER_PORT)"
@echo "Volume: http://localhost:$(VOLUME_PORT)"
@echo "Filer: http://localhost:$(FILER_PORT)"
@echo "S3: http://localhost:$(S3_PORT)"
@echo "Volume Max Size: $(VOLUME_MAX_SIZE_MB)MB"
start-seaweedfs: check-binary
@echo "$(YELLOW)Starting SeaweedFS server for Parquet testing...$(NC)"
@# Use port-based cleanup for consistency and safety
@echo "Cleaning up any existing processes..."
@lsof -ti :$(MASTER_PORT) 2>/dev/null | xargs -r kill -TERM || true
@lsof -ti :$(VOLUME_PORT) 2>/dev/null | xargs -r kill -TERM || true
@lsof -ti :$(FILER_PORT) 2>/dev/null | xargs -r kill -TERM || true
@lsof -ti :$(S3_PORT) 2>/dev/null | xargs -r kill -TERM || true
@# Clean up gRPC ports (HTTP port + 10000)
@lsof -ti :$$(( $(MASTER_PORT) + 10000 )) 2>/dev/null | xargs -r kill -TERM || true
@lsof -ti :$$(( $(VOLUME_PORT) + 10000 )) 2>/dev/null | xargs -r kill -TERM || true
@lsof -ti :$$(( $(FILER_PORT) + 10000 )) 2>/dev/null | xargs -r kill -TERM || true
@sleep 2
@$(MAKE) start-seaweedfs-ci
stop-seaweedfs:
@echo "$(YELLOW)Stopping SeaweedFS server...$(NC)"
@# Use port-based cleanup for consistency and safety
@lsof -ti :$(MASTER_PORT) 2>/dev/null | xargs -r kill -TERM || true
@lsof -ti :$(VOLUME_PORT) 2>/dev/null | xargs -r kill -TERM || true
@lsof -ti :$(FILER_PORT) 2>/dev/null | xargs -r kill -TERM || true
@lsof -ti :$(S3_PORT) 2>/dev/null | xargs -r kill -TERM || true
@# Clean up gRPC ports (HTTP port + 10000)
@lsof -ti :$$(( $(MASTER_PORT) + 10000 )) 2>/dev/null | xargs -r kill -TERM || true
@lsof -ti :$$(( $(VOLUME_PORT) + 10000 )) 2>/dev/null | xargs -r kill -TERM || true
@lsof -ti :$$(( $(FILER_PORT) + 10000 )) 2>/dev/null | xargs -r kill -TERM || true
@sleep 2
@echo "$(GREEN)SeaweedFS server stopped$(NC)"
# CI-safe server stop that's more conservative
stop-seaweedfs-safe:
@echo "$(YELLOW)Safely stopping SeaweedFS server...$(NC)"
@# Use port-based cleanup which is safer in CI
@if command -v lsof >/dev/null 2>&1; then \
echo "Using lsof for port-based cleanup..."; \
lsof -ti :$(MASTER_PORT) 2>/dev/null | head -5 | while read pid; do kill -TERM $$pid 2>/dev/null || true; done; \
lsof -ti :$(VOLUME_PORT) 2>/dev/null | head -5 | while read pid; do kill -TERM $$pid 2>/dev/null || true; done; \
lsof -ti :$(FILER_PORT) 2>/dev/null | head -5 | while read pid; do kill -TERM $$pid 2>/dev/null || true; done; \
lsof -ti :$(S3_PORT) 2>/dev/null | head -5 | while read pid; do kill -TERM $$pid 2>/dev/null || true; done; \
lsof -ti :$$(( $(MASTER_PORT) + 10000 )) 2>/dev/null | head -5 | while read pid; do kill -TERM $$pid 2>/dev/null || true; done; \
lsof -ti :$$(( $(VOLUME_PORT) + 10000 )) 2>/dev/null | head -5 | while read pid; do kill -TERM $$pid 2>/dev/null || true; done; \
lsof -ti :$$(( $(FILER_PORT) + 10000 )) 2>/dev/null | head -5 | while read pid; do kill -TERM $$pid 2>/dev/null || true; done; \
else \
echo "lsof not available, using netstat approach..."; \
netstat -tlnp 2>/dev/null | grep :$(MASTER_PORT) | awk '{print $$7}' | cut -d/ -f1 | head -5 | while read pid; do [ "$$pid" != "-" ] && kill -TERM $$pid 2>/dev/null || true; done; \
netstat -tlnp 2>/dev/null | grep :$(VOLUME_PORT) | awk '{print $$7}' | cut -d/ -f1 | head -5 | while read pid; do [ "$$pid" != "-" ] && kill -TERM $$pid 2>/dev/null || true; done; \
netstat -tlnp 2>/dev/null | grep :$(FILER_PORT) | awk '{print $$7}' | cut -d/ -f1 | head -5 | while read pid; do [ "$$pid" != "-" ] && kill -TERM $$pid 2>/dev/null || true; done; \
netstat -tlnp 2>/dev/null | grep :$(S3_PORT) | awk '{print $$7}' | cut -d/ -f1 | head -5 | while read pid; do [ "$$pid" != "-" ] && kill -TERM $$pid 2>/dev/null || true; done; \
netstat -tlnp 2>/dev/null | grep :$$(( $(MASTER_PORT) + 10000 )) | awk '{print $$7}' | cut -d/ -f1 | head -5 | while read pid; do [ "$$pid" != "-" ] && kill -TERM $$pid 2>/dev/null || true; done; \
netstat -tlnp 2>/dev/null | grep :$$(( $(VOLUME_PORT) + 10000 )) | awk '{print $$7}' | cut -d/ -f1 | head -5 | while read pid; do [ "$$pid" != "-" ] && kill -TERM $$pid 2>/dev/null || true; done; \
netstat -tlnp 2>/dev/null | grep :$$(( $(FILER_PORT) + 10000 )) | awk '{print $$7}' | cut -d/ -f1 | head -5 | while read pid; do [ "$$pid" != "-" ] && kill -TERM $$pid 2>/dev/null || true; done; \
fi
@sleep 2
@echo "$(GREEN)SeaweedFS server safely stopped$(NC)"
clean:
@echo "$(YELLOW)Cleaning up Parquet test artifacts...$(NC)"
@rm -rf /tmp/seaweedfs-test-parquet-*
@rm -f /tmp/seaweedfs-parquet-*.log
@rm -f /tmp/seaweedfs-parquet-s3.json
@rm -f s3_parquet_test_errors_*.log
@rm -rf $(VENV_DIR)
@echo "$(GREEN)Parquet test cleanup completed$(NC)"
# Test with automatic server management (GitHub Actions compatible)
test-with-server: build-weed setup-python
@echo "🚀 Starting Parquet integration tests with automated server management..."
@echo "Starting SeaweedFS cluster..."
@if $(MAKE) start-seaweedfs-ci > weed-test.log 2>&1; then \
echo "✅ SeaweedFS cluster started successfully"; \
echo "Running Parquet integration tests..."; \
trap '$(MAKE) -C $(TEST_DIR) stop-seaweedfs-safe || true' EXIT; \
S3_ENDPOINT_URL=http://localhost:$(S3_PORT) \
S3_ACCESS_KEY=$(ACCESS_KEY) \
S3_SECRET_KEY=$(SECRET_KEY) \
BUCKET_NAME=$(BUCKET_NAME) \
$(VENV_DIR)/bin/$(PYTHON) $(PYTHON_TEST_SCRIPT) || exit 1; \
echo "✅ All tests completed successfully"; \
else \
echo "❌ Failed to start SeaweedFS cluster"; \
echo "=== Server startup logs ==="; \
tail -100 weed-test.log 2>/dev/null || echo "No startup log available"; \
echo "=== System information ==="; \
ps aux | grep -E "weed|make" | grep -v grep || echo "No relevant processes found"; \
exit 1; \
fi
# Run tests assuming SeaweedFS is already running
test: setup-python
@echo "$(YELLOW)Running Parquet integration tests...$(NC)"
@echo "$(YELLOW)Assuming SeaweedFS is already running on localhost:$(S3_PORT)$(NC)"
@S3_ENDPOINT_URL=http://localhost:$(S3_PORT) \
S3_ACCESS_KEY=$(ACCESS_KEY) \
S3_SECRET_KEY=$(SECRET_KEY) \
BUCKET_NAME=$(BUCKET_NAME) \
$(VENV_DIR)/bin/$(PYTHON) $(PYTHON_TEST_SCRIPT)
# Run quick tests with small files only
test-quick: setup-python
@echo "$(YELLOW)Running quick Parquet tests (small files only)...$(NC)"
@echo "$(YELLOW)Assuming SeaweedFS is already running on localhost:$(S3_PORT)$(NC)"
@S3_ENDPOINT_URL=http://localhost:$(S3_PORT) \
S3_ACCESS_KEY=$(ACCESS_KEY) \
S3_SECRET_KEY=$(SECRET_KEY) \
BUCKET_NAME=$(BUCKET_NAME) \
TEST_QUICK=1 \
$(VENV_DIR)/bin/$(PYTHON) $(PYTHON_TEST_SCRIPT)
# Test implicit directory fix for s3fs compatibility
test-implicit-dir: setup-python
@echo "$(YELLOW)Running implicit directory fix tests...$(NC)"
@echo "$(YELLOW)Assuming SeaweedFS is already running on localhost:$(S3_PORT)$(NC)"
@S3_ENDPOINT_URL=http://localhost:$(S3_PORT) \
S3_ACCESS_KEY=$(ACCESS_KEY) \
S3_SECRET_KEY=$(SECRET_KEY) \
BUCKET_NAME=test-implicit-dir \
$(VENV_DIR)/bin/$(PYTHON) test_implicit_directory_fix.py
# Test implicit directory fix with automatic server management
test-implicit-dir-with-server: build-weed setup-python
@echo "🚀 Starting implicit directory fix tests with automated server management..."
@echo "Starting SeaweedFS cluster..."
@if $(MAKE) start-seaweedfs-ci > weed-test.log 2>&1; then \
echo "✅ SeaweedFS cluster started successfully"; \
echo "Running implicit directory fix tests..."; \
trap '$(MAKE) -C $(TEST_DIR) stop-seaweedfs-safe || true' EXIT; \
S3_ENDPOINT_URL=http://localhost:$(S3_PORT) \
S3_ACCESS_KEY=$(ACCESS_KEY) \
S3_SECRET_KEY=$(SECRET_KEY) \
BUCKET_NAME=test-implicit-dir \
$(VENV_DIR)/bin/$(PYTHON) test_implicit_directory_fix.py || exit 1; \
echo "✅ All tests completed successfully"; \
else \
echo "❌ Failed to start SeaweedFS cluster"; \
echo "=== Server startup logs ==="; \
tail -100 weed-test.log 2>/dev/null || echo "No startup log available"; \
exit 1; \
fi
# Debug targets
debug-logs:
@echo "$(YELLOW)=== Master Log ===$(NC)"
@tail -n 50 /tmp/seaweedfs-parquet-master.log || echo "No master log found"
@echo "$(YELLOW)=== Volume Log ===$(NC)"
@tail -n 50 /tmp/seaweedfs-parquet-volume.log || echo "No volume log found"
@echo "$(YELLOW)=== Filer Log ===$(NC)"
@tail -n 50 /tmp/seaweedfs-parquet-filer.log || echo "No filer log found"
debug-status:
@echo "$(YELLOW)=== Process Status ===$(NC)"
@ps aux | grep -E "(weed|seaweedfs)" | grep -v grep || echo "No SeaweedFS processes found"
@echo "$(YELLOW)=== Port Status ===$(NC)"
@netstat -an | grep -E "($(MASTER_PORT)|$(VOLUME_PORT)|$(FILER_PORT)|$(S3_PORT))" || echo "No ports in use"
# Manual test targets for development
manual-start: start-seaweedfs
@echo "$(GREEN)SeaweedFS with S3 is now running for manual testing$(NC)"
@echo "You can now run Parquet tests manually"
@echo "Run 'make manual-stop' when finished"
manual-stop: stop-seaweedfs clean
# Test PyArrow's native S3 filesystem
test-native-s3: setup-python
@echo "$(YELLOW)Running PyArrow native S3 filesystem tests...$(NC)"
@echo "$(YELLOW)Assuming SeaweedFS is already running on localhost:$(S3_PORT)$(NC)"
@S3_ENDPOINT_URL=http://localhost:$(S3_PORT) \
S3_ACCESS_KEY=$(ACCESS_KEY) \
S3_SECRET_KEY=$(SECRET_KEY) \
BUCKET_NAME=$(BUCKET_NAME) \
$(VENV_DIR)/bin/$(PYTHON) test_pyarrow_native_s3.py
# Test PyArrow's native S3 filesystem with automatic server management
test-native-s3-with-server: build-weed setup-python
@echo "🚀 Starting PyArrow native S3 filesystem tests with automated server management..."
@echo "Starting SeaweedFS cluster..."
@if $(MAKE) start-seaweedfs-ci > weed-test.log 2>&1; then \
echo "✅ SeaweedFS cluster started successfully"; \
echo "Running PyArrow native S3 filesystem tests..."; \
trap '$(MAKE) -C $(TEST_DIR) stop-seaweedfs-safe || true' EXIT; \
S3_ENDPOINT_URL=http://localhost:$(S3_PORT) \
S3_ACCESS_KEY=$(ACCESS_KEY) \
S3_SECRET_KEY=$(SECRET_KEY) \
BUCKET_NAME=$(BUCKET_NAME) \
$(VENV_DIR)/bin/$(PYTHON) test_pyarrow_native_s3.py || exit 1; \
echo "✅ All tests completed successfully"; \
else \
echo "❌ Failed to start SeaweedFS cluster"; \
echo "=== Server startup logs ==="; \
tail -100 weed-test.log 2>/dev/null || echo "No startup log available"; \
exit 1; \
fi
# Test cross-filesystem compatibility (s3fs ↔ PyArrow native S3)
test-cross-fs: setup-python
@echo "$(YELLOW)Running cross-filesystem compatibility tests...$(NC)"
@echo "$(YELLOW)Assuming SeaweedFS is already running on localhost:$(S3_PORT)$(NC)"
@S3_ENDPOINT_URL=http://localhost:$(S3_PORT) \
S3_ACCESS_KEY=$(ACCESS_KEY) \
S3_SECRET_KEY=$(SECRET_KEY) \
BUCKET_NAME=$(BUCKET_NAME) \
$(VENV_DIR)/bin/$(PYTHON) test_cross_filesystem_compatibility.py
# Test cross-filesystem compatibility with automatic server management
test-cross-fs-with-server: build-weed setup-python
@echo "🚀 Starting cross-filesystem compatibility tests with automated server management..."
@echo "Starting SeaweedFS cluster..."
@if $(MAKE) start-seaweedfs-ci > weed-test.log 2>&1; then \
echo "✅ SeaweedFS cluster started successfully"; \
echo "Running cross-filesystem compatibility tests..."; \
trap '$(MAKE) -C $(TEST_DIR) stop-seaweedfs-safe || true' EXIT; \
S3_ENDPOINT_URL=http://localhost:$(S3_PORT) \
S3_ACCESS_KEY=$(ACCESS_KEY) \
S3_SECRET_KEY=$(SECRET_KEY) \
BUCKET_NAME=$(BUCKET_NAME) \
$(VENV_DIR)/bin/$(PYTHON) test_cross_filesystem_compatibility.py || exit 1; \
echo "✅ All tests completed successfully"; \
else \
echo "❌ Failed to start SeaweedFS cluster"; \
echo "=== Server startup logs ==="; \
tail -100 weed-test.log 2>/dev/null || echo "No startup log available"; \
exit 1; \
fi
# Test PyArrow's native S3 filesystem compatibility with SSE-S3 enabled backend
# (For encryption-specific validation, use test-sse-s3-compat)
test-native-s3-with-sse: build-weed setup-python
@echo "🚀 Testing PyArrow native S3 compatibility with SSE-S3 enabled backend..."
@echo "Starting SeaweedFS cluster with SSE-S3 enabled..."
@if $(MAKE) start-seaweedfs-ci ENABLE_SSE_S3=true > weed-test-sse.log 2>&1; then \
echo "✅ SeaweedFS cluster started successfully with SSE-S3"; \
echo "Running PyArrow native S3 filesystem tests with SSE-S3..."; \
trap '$(MAKE) -C $(TEST_DIR) stop-seaweedfs-safe || true' EXIT; \
S3_ENDPOINT_URL=http://localhost:$(S3_PORT) \
S3_ACCESS_KEY=$(ACCESS_KEY) \
S3_SECRET_KEY=$(SECRET_KEY) \
BUCKET_NAME=$(BUCKET_NAME) \
$(VENV_DIR)/bin/$(PYTHON) test_pyarrow_native_s3.py || exit 1; \
echo "✅ All SSE-S3 tests completed successfully"; \
else \
echo "❌ Failed to start SeaweedFS cluster with SSE-S3"; \
echo "=== Server startup logs ==="; \
tail -100 weed-test-sse.log 2>/dev/null || echo "No startup log available"; \
exit 1; \
fi
# Comprehensive SSE-S3 compatibility test
test-sse-s3-compat: build-weed setup-python
@echo "🚀 Starting comprehensive SSE-S3 compatibility tests..."
@echo "Starting SeaweedFS cluster with SSE-S3 enabled..."
@if $(MAKE) start-seaweedfs-ci ENABLE_SSE_S3=true > weed-test-sse-compat.log 2>&1; then \
echo "✅ SeaweedFS cluster started successfully with SSE-S3"; \
echo "Running comprehensive SSE-S3 compatibility tests..."; \
trap '$(MAKE) -C $(TEST_DIR) stop-seaweedfs-safe || true' EXIT; \
S3_ENDPOINT_URL=http://localhost:$(S3_PORT) \
S3_ACCESS_KEY=$(ACCESS_KEY) \
S3_SECRET_KEY=$(SECRET_KEY) \
BUCKET_NAME=$(BUCKET_NAME) \
$(VENV_DIR)/bin/$(PYTHON) test_sse_s3_compatibility.py || exit 1; \
echo "✅ All SSE-S3 compatibility tests completed successfully"; \
else \
echo "❌ Failed to start SeaweedFS cluster with SSE-S3"; \
echo "=== Server startup logs ==="; \
tail -100 weed-test-sse-compat.log 2>/dev/null || echo "No startup log available"; \
exit 1; \
fi
# CI/CD targets
ci-test: test-with-server

312
test/s3/parquet/README.md

@ -0,0 +1,312 @@
# PyArrow Parquet S3 Compatibility Tests
This directory contains tests for PyArrow Parquet compatibility with SeaweedFS S3 API, including the implicit directory detection fix.
## Overview
**Status**: ✅ **All PyArrow methods work correctly with SeaweedFS**
SeaweedFS implements implicit directory detection to improve compatibility with s3fs and PyArrow. When PyArrow writes datasets using `write_dataset()`, it may create directory markers that can confuse s3fs. SeaweedFS now handles these correctly by returning 404 for HEAD requests on implicit directories (directories with children), forcing s3fs to use LIST-based discovery.
## Quick Start
### Running the Example Script
```bash
# Start SeaweedFS server
make start-seaweedfs-ci
# Run the example script
python3 example_pyarrow_native.py
# Or with uv (if available)
uv run example_pyarrow_native.py
# Stop the server when done
make stop-seaweedfs-safe
```
### Running Tests
```bash
# Setup Python environment
make setup-python
# Run all tests with server (small and large files)
make test-with-server
# Run quick tests with small files only (faster for development)
make test-quick
# Run implicit directory fix tests
make test-implicit-dir-with-server
# Run PyArrow native S3 filesystem tests
make test-native-s3-with-server
# Run cross-filesystem compatibility tests (s3fs ↔ PyArrow native)
make test-cross-fs-with-server
# Run SSE-S3 encryption tests
make test-sse-s3-compat
# Clean up
make clean
```
### Using PyArrow with SeaweedFS
#### Option 1: Using s3fs (recommended for compatibility)
```python
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as pads
import s3fs
# Configure s3fs
fs = s3fs.S3FileSystem(
key='your_access_key',
secret='your_secret_key',
endpoint_url='http://localhost:8333',
use_ssl=False
)
# Write dataset (creates directory structure)
table = pa.table({'id': [1, 2, 3], 'value': ['a', 'b', 'c']})
pads.write_dataset(table, 'bucket/dataset', filesystem=fs)
# Read dataset (all methods work!)
dataset = pads.dataset('bucket/dataset', filesystem=fs) # ✅
table = pq.read_table('bucket/dataset', filesystem=fs) # ✅
dataset = pq.ParquetDataset('bucket/dataset', filesystem=fs) # ✅
```
#### Option 2: Using PyArrow's native S3 filesystem (pure PyArrow)
```python
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as pads
import pyarrow.fs as pafs
# Configure PyArrow's native S3 filesystem
s3 = pafs.S3FileSystem(
access_key='your_access_key',
secret_key='your_secret_key',
endpoint_override='localhost:8333',
scheme='http',
allow_bucket_creation=True,
allow_bucket_deletion=True
)
# Write dataset
table = pa.table({'id': [1, 2, 3], 'value': ['a', 'b', 'c']})
pads.write_dataset(table, 'bucket/dataset', filesystem=s3)
# Read dataset (all methods work!)
table = pq.read_table('bucket/dataset', filesystem=s3) # ✅
dataset = pq.ParquetDataset('bucket/dataset', filesystem=s3) # ✅
dataset = pads.dataset('bucket/dataset', filesystem=s3) # ✅
```
## Test Files
### Main Test Suite
- **`s3_parquet_test.py`** - Comprehensive PyArrow test suite
- Tests 2 write methods × 5 read methods × 2 dataset sizes = 20 combinations
- Uses s3fs library for S3 operations
- All tests pass with the implicit directory fix ✅
### PyArrow Native S3 Tests
- **`test_pyarrow_native_s3.py`** - PyArrow's native S3 filesystem tests
- Tests PyArrow's built-in S3FileSystem (pyarrow.fs.S3FileSystem)
- Pure PyArrow solution without s3fs dependency
- Tests 3 read methods × 2 dataset sizes = 6 scenarios
- All tests pass ✅
- **`test_sse_s3_compatibility.py`** - SSE-S3 encryption compatibility tests
- Tests PyArrow native S3 with SSE-S3 server-side encryption
- Tests 5 different file sizes (10 to 500,000 rows)
- Verifies multipart upload encryption works correctly
- All tests pass ✅
### Cross-Filesystem Compatibility Tests
- **`test_cross_filesystem_compatibility.py`** - Verifies cross-compatibility between s3fs and PyArrow native S3
- Tests write with s3fs → read with PyArrow native S3
- Tests write with PyArrow native S3 → read with s3fs
- Tests 2 directions × 3 read methods × 2 dataset sizes = 12 scenarios
- Validates that files written by either filesystem can be read by the other
- **All tests pass**
- See **`CROSS_FILESYSTEM_COMPATIBILITY.md`** for detailed test results and analysis
### Implicit Directory Tests
- **`test_implicit_directory_fix.py`** - Specific tests for the implicit directory fix
- Tests HEAD request behavior
- Tests s3fs directory detection
- Tests PyArrow dataset reading
- All 6 tests pass ✅
### Examples
- **`example_pyarrow_native.py`** - Simple standalone example
- Demonstrates PyArrow's native S3 filesystem usage
- Can be run with `uv run` or regular Python
- Minimal dependencies (pyarrow, boto3)
### Configuration
- **`Makefile`** - Build and test automation
- **`requirements.txt`** - Python dependencies (pyarrow, s3fs, boto3)
- **`.gitignore`** - Ignore patterns for test artifacts
## Documentation
### Technical Documentation
- **`TEST_COVERAGE.md`** - Comprehensive test coverage documentation
- Unit tests (Go): 17 test cases
- Integration tests (Python): 6 test cases
- End-to-end tests (Python): 20 test cases
- **`FINAL_ROOT_CAUSE_ANALYSIS.md`** - Deep technical analysis
- Root cause of the s3fs compatibility issue
- How the implicit directory fix works
- Performance considerations
- **`CROSS_FILESYSTEM_COMPATIBILITY.md`** - Cross-filesystem compatibility test results ✅ **NEW**
- Validates s3fs ↔ PyArrow native S3 interoperability
- Confirms files written by either can be read by the other
- Test methodology and detailed results
- **`MINIO_DIRECTORY_HANDLING.md`** - Comparison with MinIO
- How MinIO handles directory markers
- Differences in implementation approaches
## The Implicit Directory Fix
### Problem
When PyArrow writes datasets with `write_dataset()`, it may create 0-byte directory markers. s3fs's `info()` method calls HEAD on these paths, and if HEAD returns 200 with size=0, s3fs incorrectly reports them as files instead of directories. This causes PyArrow to fail with "Parquet file size is 0 bytes".
### Solution
SeaweedFS now returns 404 for HEAD requests on implicit directories (0-byte objects or directories with children, when requested without a trailing slash). This forces s3fs to fall back to LIST-based discovery, which correctly identifies directories by checking for children.
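From the client side this is transparent. A minimal sketch of the resulting behavior, assuming a dataset has already been written to `test-parquet-bucket/dataset` (as in the examples above) and using the default test credentials:
```python
import s3fs

# Defaults used throughout this test suite; adjust for your deployment.
fs = s3fs.S3FileSystem(
    key="some_access_key1",
    secret="some_secret_key1",
    client_kwargs={"endpoint_url": "http://localhost:8333"},
)

# HEAD on the implicit directory returns 404, so s3fs falls back to a LIST
# and reports the path as a directory rather than a 0-byte "file".
info = fs.info("test-parquet-bucket/dataset")
assert info["type"] == "directory"
```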
### Implementation
The fix is implemented in `weed/s3api/s3api_object_handlers.go`:
- `HeadObjectHandler` - Returns 404 for implicit directories
- `hasChildren` - Helper function to check if a path has children
See the source code for detailed inline documentation.
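The Go handler is the source of truth; as a rough sketch, the decision it makes can be paraphrased in Python as follows (names here are illustrative, not the actual implementation):
```python
def returns_404_for_head(key: str, is_zero_byte_or_dir: bool,
                         has_children: bool, bucket_is_versioned: bool) -> bool:
    """Paraphrase of the rule described above, not the real Go code."""
    if key.endswith("/"):        # explicit directory request keeps its normal behavior
        return False
    if bucket_is_versioned:      # versioned buckets skip the implicit-directory check
        return False
    # Implicit directory: a 0-byte marker or directory entry that has children.
    return is_zero_byte_or_dir and has_children
```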
### Test Coverage
- **Unit tests** (Go): `weed/s3api/s3api_implicit_directory_test.go`
- Run: `cd weed/s3api && go test -v -run TestImplicitDirectory`
- **Integration tests** (Python): `test_implicit_directory_fix.py`
- Run: `cd test/s3/parquet && make test-implicit-dir-with-server`
- **End-to-end tests** (Python): `s3_parquet_test.py`
- Run: `cd test/s3/parquet && make test-with-server`
## Makefile Targets
```bash
# Setup
make setup-python # Create Python virtual environment and install dependencies
make build-weed # Build SeaweedFS binary
# Testing
make test # Run full tests (assumes server is already running)
make test-with-server # Run full PyArrow test suite with server (small + large files)
make test-quick # Run quick tests with small files only (assumes server is running)
make test-implicit-dir-with-server # Run implicit directory tests with server
make test-native-s3 # Run PyArrow native S3 tests (assumes server is running)
make test-native-s3-with-server # Run PyArrow native S3 tests with server management
make test-cross-fs # Run cross-filesystem compatibility tests (assumes server is running)
make test-cross-fs-with-server # Run cross-filesystem compatibility tests with server management
make test-sse-s3-compat # Run comprehensive SSE-S3 encryption compatibility tests
# Server Management
make start-seaweedfs-ci # Start SeaweedFS in background (CI mode)
make stop-seaweedfs-safe # Stop SeaweedFS gracefully
make clean # Clean up all test artifacts
# Development
make help # Show all available targets
```
## Continuous Integration
The tests are automatically run in GitHub Actions on every push/PR that affects S3 or filer code:
**Workflow**: `.github/workflows/s3-parquet-tests.yml`
**Test Matrix**:
- Python versions: 3.9, 3.11, 3.12
- PyArrow integration tests (s3fs): 20 test combinations
- PyArrow native S3 tests: 6 test scenarios ✅
- Cross-filesystem compatibility tests: 12 test scenarios ✅ **NEW**
- SSE-S3 encryption tests: 5 file sizes ✅
- Implicit directory fix tests: 6 test scenarios
- Go unit tests: 17 test cases
**Test Steps** (run for each Python version):
1. Build SeaweedFS
2. Run PyArrow Parquet integration tests (`make test-with-server`)
3. Run implicit directory fix tests (`make test-implicit-dir-with-server`)
4. Run PyArrow native S3 filesystem tests (`make test-native-s3-with-server`)
5. Run cross-filesystem compatibility tests (`make test-cross-fs-with-server`) ✅ **NEW**
6. Run SSE-S3 encryption compatibility tests (`make test-sse-s3-compat`)
7. Run Go unit tests for implicit directory handling
**Triggers**:
- Push/PR to master (when `weed/s3api/**` or `weed/filer/**` changes)
- Manual trigger via GitHub UI (workflow_dispatch)
## Requirements
- Python 3.9+
- PyArrow 22.0.0+
- s3fs 2024.12.0+
- boto3 1.40.0+
- SeaweedFS (latest)
## AWS S3 Compatibility
The implicit directory fix makes SeaweedFS behavior more compatible with AWS S3:
- AWS S3 typically doesn't create directory markers for implicit directories
- HEAD on "dataset" (when only "dataset/file.txt" exists) returns 404 on AWS
- SeaweedFS now matches this behavior for implicit directories with children
## Edge Cases Handled
- **Implicit directories with children** → 404 (forces LIST-based discovery)
- **Empty files (0-byte, no children)** → 200 (legitimate empty file)
- **Empty directories (no children)** → 200 (legitimate empty directory)
- **Explicit directory requests (trailing slash)** → 200 (normal directory behavior)
- **Versioned buckets** → Skip implicit directory check (versioned semantics)
- **Regular files** → 200 (normal file behavior)
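A quick way to spot-check these cases from a script (a sketch using boto3 with the test defaults; bucket and key names are illustrative):
```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:8333",
    aws_access_key_id="some_access_key1",
    aws_secret_access_key="some_secret_key1",
)

def head_status(bucket: str, key: str) -> int:
    """Return the HTTP status of a HEAD request, including error statuses."""
    try:
        s3.head_object(Bucket=bucket, Key=key)
        return 200
    except ClientError as e:
        return e.response["ResponseMetadata"]["HTTPStatusCode"]

# Expected results for the cases listed above:
#   head_status("bucket", "dataset")    -> 404  implicit directory with children
#   head_status("bucket", "dataset/")   -> 200  explicit directory request
#   head_status("bucket", "empty.txt")  -> 200  legitimate empty file
```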
## Performance
The implicit directory check adds minimal overhead:
- Only triggered for 0-byte objects or directories without trailing slash
- Cost: One LIST operation with Limit=1 (~1-5ms)
- No impact on regular file operations
## Contributing
When adding new tests:
1. Add test cases to the appropriate test file
2. Update TEST_COVERAGE.md
3. Run the full test suite to ensure no regressions
4. Update this README if adding new functionality
## References
- [PyArrow Documentation](https://arrow.apache.org/docs/python/parquet.html)
- [s3fs Documentation](https://s3fs.readthedocs.io/)
- [SeaweedFS S3 API](https://github.com/seaweedfs/seaweedfs/wiki/Amazon-S3-API)
- [AWS S3 API Reference](https://docs.aws.amazon.com/AmazonS3/latest/API/)
---
**Last Updated**: November 19, 2025
**Status**: All tests passing ✅

46
test/s3/parquet/TEST_COVERAGE.md

@ -0,0 +1,46 @@
# Test Coverage Documentation
## Overview
This document provides comprehensive test coverage documentation for the SeaweedFS S3 Parquet integration tests.
## Test Categories
### Unit Tests (Go)
- 17 test cases covering S3 API handlers
- Tests for implicit directory handling
- HEAD request behavior validation
- Located in: `weed/s3api/s3api_implicit_directory_test.go`
### Integration Tests (Python)
- 6 test cases for implicit directory fix
- Tests HEAD request behavior on directory markers
- s3fs directory detection validation
- PyArrow dataset read compatibility
- Located in: `test_implicit_directory_fix.py`
### End-to-End Tests (Python)
- 20 test cases combining write and read methods
- Small file tests (5 rows): 10 test combinations
- Large file tests (200,000 rows): 10 test combinations
- Tests multiple write methods: `pads.write_dataset`, `pq.write_table+s3fs`
- Tests multiple read methods: `pads.dataset`, `pq.ParquetDataset`, `pq.read_table`, `s3fs+direct`, `s3fs+buffered`
- Located in: `s3_parquet_test.py`
## Coverage Summary
| Test Type | Count | Status |
|-----------|-------|--------|
| Unit Tests (Go) | 17 | ✅ Pass |
| Integration Tests (Python) | 6 | ✅ Pass |
| End-to-End Tests (Python) | 20 | ✅ Pass |
| **Total** | **43** | **✅ All Pass** |
## TODO
- [ ] Add detailed test execution time metrics
- [ ] Document test data generation strategies
- [ ] Add code coverage percentages for Go tests
- [ ] Document edge cases and corner cases tested
- [ ] Add performance benchmarking results

134
test/s3/parquet/example_pyarrow_native.py

@ -0,0 +1,134 @@
#!/usr/bin/env python3
# /// script
# dependencies = [
# "pyarrow>=22",
# "boto3>=1.28.0",
# ]
# ///
"""
Simple example of using PyArrow's native S3 filesystem with SeaweedFS.
This is a minimal example demonstrating how to write and read Parquet files
using PyArrow's built-in S3FileSystem without any additional dependencies
like s3fs.
Usage:
# Set environment variables
export S3_ENDPOINT_URL=localhost:8333
export S3_ACCESS_KEY=some_access_key1
export S3_SECRET_KEY=some_secret_key1
export BUCKET_NAME=test-parquet-bucket
# Run the script
python3 example_pyarrow_native.py
# Or run with uv (if available)
uv run example_pyarrow_native.py
"""
import os
import secrets
import pyarrow as pa
import pyarrow.dataset as pads
import pyarrow.fs as pafs
import pyarrow.parquet as pq
from parquet_test_utils import create_sample_table
# Configuration
BUCKET_NAME = os.getenv("BUCKET_NAME", "test-parquet-bucket")
S3_ENDPOINT_URL = os.getenv("S3_ENDPOINT_URL", "localhost:8333")
S3_ACCESS_KEY = os.getenv("S3_ACCESS_KEY", "some_access_key1")
S3_SECRET_KEY = os.getenv("S3_SECRET_KEY", "some_secret_key1")
# Determine scheme from endpoint
if S3_ENDPOINT_URL.startswith("http://"):
scheme = "http"
endpoint = S3_ENDPOINT_URL[7:]
elif S3_ENDPOINT_URL.startswith("https://"):
scheme = "https"
endpoint = S3_ENDPOINT_URL[8:]
else:
scheme = "http" # Default to http for localhost
endpoint = S3_ENDPOINT_URL
print(f"Connecting to S3 endpoint: {scheme}://{endpoint}")
# Initialize PyArrow's NATIVE S3 filesystem
s3 = pafs.S3FileSystem(
access_key=S3_ACCESS_KEY,
secret_key=S3_SECRET_KEY,
endpoint_override=endpoint,
scheme=scheme,
allow_bucket_creation=True,
allow_bucket_deletion=True,
)
print("✓ Connected to S3 endpoint")
# Create bucket if needed (using boto3)
try:
import boto3
from botocore.exceptions import ClientError
s3_client = boto3.client(
's3',
endpoint_url=f"{scheme}://{endpoint}",
aws_access_key_id=S3_ACCESS_KEY,
aws_secret_access_key=S3_SECRET_KEY,
region_name='us-east-1',
)
try:
s3_client.head_bucket(Bucket=BUCKET_NAME)
print(f"✓ Bucket exists: {BUCKET_NAME}")
except ClientError as e:
if e.response['Error']['Code'] == '404':
print(f"Creating bucket: {BUCKET_NAME}")
s3_client.create_bucket(Bucket=BUCKET_NAME)
print(f"✓ Bucket created: {BUCKET_NAME}")
else:
raise
except ImportError:
print("Warning: boto3 not available, assuming bucket exists")
# Generate a unique filename
filename = f"{BUCKET_NAME}/dataset-{secrets.token_hex(8)}/test.parquet"
print(f"\nWriting Parquet dataset to: {filename}")
# Write dataset
table = create_sample_table(200_000)
pads.write_dataset(
table,
filename,
filesystem=s3,
format="parquet",
)
print(f"✓ Wrote {table.num_rows:,} rows")
# Read with pq.read_table
print("\nReading with pq.read_table...")
table_read = pq.read_table(filename, filesystem=s3)
print(f"✓ Read {table_read.num_rows:,} rows")
# Read with pq.ParquetDataset
print("\nReading with pq.ParquetDataset...")
dataset = pq.ParquetDataset(filename, filesystem=s3)
table_dataset = dataset.read()
print(f"✓ Read {table_dataset.num_rows:,} rows")
# Read with pads.dataset
print("\nReading with pads.dataset...")
dataset_pads = pads.dataset(filename, filesystem=s3)
table_pads = dataset_pads.to_table()
print(f"✓ Read {table_pads.num_rows:,} rows")
print("\n✅ All operations completed successfully!")
print(f"\nFile written to: {filename}")
print("You can verify the file using the SeaweedFS S3 API or weed shell")

41
test/s3/parquet/parquet_test_utils.py

@ -0,0 +1,41 @@
"""
Shared utility functions for PyArrow Parquet tests.
This module provides common test utilities used across multiple test scripts
to avoid code duplication and ensure consistency.
"""
import pyarrow as pa
def create_sample_table(num_rows: int = 5) -> pa.Table:
"""Create a sample PyArrow table for testing.
Args:
num_rows: Number of rows to generate (default: 5)
Returns:
PyArrow Table with test data containing:
- id: int64 sequential IDs (0 to num_rows-1)
- name: string user names (user_0, user_1, ...)
- value: float64 values (id * 1.5)
- flag: bool alternating True/False based on even/odd id
Example:
>>> table = create_sample_table(3)
>>> print(table)
pyarrow.Table
id: int64
name: string
value: double
flag: bool
"""
return pa.table(
{
"id": pa.array(range(num_rows), type=pa.int64()),
"name": pa.array([f"user_{i}" for i in range(num_rows)], type=pa.string()),
"value": pa.array([float(i) * 1.5 for i in range(num_rows)], type=pa.float64()),
"flag": pa.array([i % 2 == 0 for i in range(num_rows)], type=pa.bool_()),
}
)

7
test/s3/parquet/requirements.txt

@ -0,0 +1,7 @@
# Python dependencies for S3 Parquet tests
# Install with: pip install -r requirements.txt
pyarrow>=10.0.0
s3fs>=2023.12.0
boto3>=1.28.0

421
test/s3/parquet/s3_parquet_test.py

@ -0,0 +1,421 @@
#!/usr/bin/env python3
"""
Test script for S3-compatible storage with PyArrow Parquet files.
This script tests different write methods (PyArrow write_dataset vs. pq.write_table to buffer)
combined with different read methods (PyArrow dataset, direct s3fs read, buffered read) to
identify which combinations work with large files that span multiple row groups.
This test specifically addresses issues with large tables using PyArrow where files span
multiple row-groups (default row_group size is around 130,000 rows).
Requirements:
- pyarrow>=22
- s3fs>=2024.12.0
Environment Variables:
S3_ENDPOINT_URL: S3 endpoint (default: http://localhost:8333)
S3_ACCESS_KEY: S3 access key (default: some_access_key1)
S3_SECRET_KEY: S3 secret key (default: some_secret_key1)
BUCKET_NAME: S3 bucket name (default: test-parquet-bucket)
TEST_QUICK: Run only small/quick tests (default: 0, set to 1 for quick mode)
Usage:
# Run with default environment variables
python3 s3_parquet_test.py
# Run with custom environment variables
S3_ENDPOINT_URL=http://localhost:8333 \
S3_ACCESS_KEY=mykey \
S3_SECRET_KEY=mysecret \
BUCKET_NAME=mybucket \
python3 s3_parquet_test.py
"""
import io
import logging
import os
import secrets
import sys
import traceback
from datetime import datetime
from typing import Tuple
import pyarrow as pa
import pyarrow.dataset as pads
import pyarrow.parquet as pq
try:
import s3fs
except ImportError:
logging.error("s3fs not installed. Install with: pip install s3fs")
sys.exit(1)
logging.basicConfig(level=logging.INFO, format="%(message)s")
# Error log file
ERROR_LOG_FILE = f"s3_parquet_test_errors_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"
# Configuration from environment variables with defaults
S3_ENDPOINT_URL = os.environ.get("S3_ENDPOINT_URL", "http://localhost:8333")
S3_ACCESS_KEY = os.environ.get("S3_ACCESS_KEY", "some_access_key1")
S3_SECRET_KEY = os.environ.get("S3_SECRET_KEY", "some_secret_key1")
BUCKET_NAME = os.getenv("BUCKET_NAME", "test-parquet-bucket")
TEST_QUICK = os.getenv("TEST_QUICK", "0") == "1"
# Create randomized test directory
TEST_RUN_ID = secrets.token_hex(8)
TEST_DIR = f"{BUCKET_NAME}/parquet-tests/{TEST_RUN_ID}"
# Test file sizes
TEST_SIZES = {
"small": 5,
"large": 200_000, # This will create multiple row groups
}
# Filter to only small tests if quick mode is enabled
if TEST_QUICK:
TEST_SIZES = {"small": TEST_SIZES["small"]}
logging.info("Quick test mode enabled - running only small tests")
def create_sample_table(num_rows: int = 5) -> pa.Table:
"""Create a sample PyArrow table for testing."""
return pa.table({
"id": pa.array(range(num_rows), type=pa.int64()),
"name": pa.array([f"user_{i}" for i in range(num_rows)], type=pa.string()),
"value": pa.array([float(i) * 1.5 for i in range(num_rows)], type=pa.float64()),
"flag": pa.array([i % 2 == 0 for i in range(num_rows)], type=pa.bool_()),
})
def log_error(operation: str, short_msg: str) -> None:
"""Log error details to file with full traceback."""
with open(ERROR_LOG_FILE, "a") as f:
f.write(f"\n{'='*80}\n")
f.write(f"Operation: {operation}\n")
f.write(f"Time: {datetime.now().isoformat()}\n")
f.write(f"Message: {short_msg}\n")
f.write("Full Traceback:\n")
f.write(traceback.format_exc())
f.write(f"{'='*80}\n")
def init_s3fs() -> s3fs.S3FileSystem:
"""Initialize and return S3FileSystem."""
logging.info("Initializing S3FileSystem...")
logging.info(f" Endpoint: {S3_ENDPOINT_URL}")
logging.info(f" Bucket: {BUCKET_NAME}")
try:
fs = s3fs.S3FileSystem(
client_kwargs={"endpoint_url": S3_ENDPOINT_URL},
key=S3_ACCESS_KEY,
secret=S3_SECRET_KEY,
use_listings_cache=False,
)
logging.info("✓ S3FileSystem initialized successfully\n")
return fs
except Exception:
logging.exception("✗ Failed to initialize S3FileSystem")
raise
def ensure_bucket_exists(fs: s3fs.S3FileSystem) -> None:
"""Ensure the test bucket exists."""
try:
if not fs.exists(BUCKET_NAME):
logging.info(f"Creating bucket: {BUCKET_NAME}")
fs.mkdir(BUCKET_NAME)
logging.info(f"✓ Bucket created: {BUCKET_NAME}")
else:
logging.info(f"✓ Bucket exists: {BUCKET_NAME}")
except Exception:
logging.exception("✗ Failed to create/check bucket")
raise
# Write Methods
def write_with_pads(table: pa.Table, path: str, fs: s3fs.S3FileSystem) -> Tuple[bool, str]:
"""Write using pads.write_dataset with filesystem parameter."""
try:
pads.write_dataset(table, path, format="parquet", filesystem=fs)
return True, "pads.write_dataset"
except Exception as e:
error_msg = f"pads.write_dataset: {type(e).__name__}"
log_error("write_with_pads", error_msg)
return False, error_msg
def write_with_buffer_and_s3fs(table: pa.Table, path: str, fs: s3fs.S3FileSystem) -> Tuple[bool, str]:
"""Write using pq.write_table to buffer, then upload via s3fs."""
try:
buffer = io.BytesIO()
pq.write_table(table, buffer)
buffer.seek(0)
with fs.open(path, "wb") as f:
f.write(buffer.read())
return True, "pq.write_table+s3fs.open"
except Exception as e:
error_msg = f"pq.write_table+s3fs.open: {type(e).__name__}"
log_error("write_with_buffer_and_s3fs", error_msg)
return False, error_msg
# Read Methods
def get_parquet_files(path: str, fs: s3fs.S3FileSystem) -> list:
"""
Helper to discover all parquet files for a given path.
Args:
path: S3 path (file or directory)
fs: S3FileSystem instance
Returns:
List of parquet file paths
Raises:
ValueError: If no parquet files are found in a directory
"""
if fs.isdir(path):
# Find all parquet files in the directory
files = [f for f in fs.ls(path) if f.endswith('.parquet')]
if not files:
raise ValueError(f"No parquet files found in directory: {path}")
return files
else:
# Single file path
return [path]
def read_with_pads_dataset(path: str, fs: s3fs.S3FileSystem) -> Tuple[bool, str, int]:
"""Read using pads.dataset - handles both single files and directories."""
try:
# pads.dataset() should auto-discover parquet files in the directory
dataset = pads.dataset(path, format="parquet", filesystem=fs)
result = dataset.to_table()
return True, "pads.dataset", result.num_rows
except Exception as e:
error_msg = f"pads.dataset: {type(e).__name__}"
log_error("read_with_pads_dataset", error_msg)
return False, error_msg, 0
def read_direct_s3fs(path: str, fs: s3fs.S3FileSystem) -> Tuple[bool, str, int]:
"""Read directly via s3fs.open() streaming."""
try:
# Get all parquet files (handles both single file and directory)
parquet_files = get_parquet_files(path, fs)
# Read all parquet files and concatenate them
tables = []
for file_path in parquet_files:
with fs.open(file_path, "rb") as f:
table = pq.read_table(f)
tables.append(table)
# Concatenate all tables into one
if len(tables) == 1:
result = tables[0]
else:
result = pa.concat_tables(tables)
return True, "s3fs.open+pq.read_table", result.num_rows
except Exception as e:
error_msg = f"s3fs.open+pq.read_table: {type(e).__name__}"
log_error("read_direct_s3fs", error_msg)
return False, error_msg, 0
def read_buffered_s3fs(path: str, fs: s3fs.S3FileSystem) -> Tuple[bool, str, int]:
"""Read via s3fs.open() into buffer, then pq.read_table."""
try:
# Get all parquet files (handles both single file and directory)
parquet_files = get_parquet_files(path, fs)
# Read all parquet files and concatenate them
tables = []
for file_path in parquet_files:
with fs.open(file_path, "rb") as f:
buffer = io.BytesIO(f.read())
buffer.seek(0)
table = pq.read_table(buffer)
tables.append(table)
# Concatenate all tables into one
if len(tables) == 1:
result = tables[0]
else:
result = pa.concat_tables(tables)
return True, "s3fs.open+BytesIO+pq.read_table", result.num_rows
except Exception as e:
error_msg = f"s3fs.open+BytesIO+pq.read_table: {type(e).__name__}"
log_error("read_buffered_s3fs", error_msg)
return False, error_msg, 0
def read_with_parquet_dataset(path: str, fs: s3fs.S3FileSystem) -> Tuple[bool, str, int]:
"""Read using pq.ParquetDataset - designed for directories."""
try:
# ParquetDataset is specifically designed to handle directories
dataset = pq.ParquetDataset(path, filesystem=fs)
result = dataset.read()
return True, "pq.ParquetDataset", result.num_rows
except Exception as e:
error_msg = f"pq.ParquetDataset: {type(e).__name__}"
log_error("read_with_parquet_dataset", error_msg)
return False, error_msg, 0
def read_with_pq_read_table(path: str, fs: s3fs.S3FileSystem) -> Tuple[bool, str, int]:
"""Read using pq.read_table with filesystem parameter."""
try:
# pq.read_table() with filesystem should handle directories
result = pq.read_table(path, filesystem=fs)
return True, "pq.read_table+filesystem", result.num_rows
except Exception as e:
error_msg = f"pq.read_table+filesystem: {type(e).__name__}"
log_error("read_with_pq_read_table", error_msg)
return False, error_msg, 0
def test_combination(
fs: s3fs.S3FileSystem,
test_name: str,
write_func,
read_func,
num_rows: int,
) -> Tuple[bool, str]:
"""Test a specific write/read combination."""
table = create_sample_table(num_rows=num_rows)
path = f"{TEST_DIR}/{test_name}/data.parquet"
# Write
write_ok, write_msg = write_func(table, path, fs)
if not write_ok:
return False, f"WRITE_FAIL: {write_msg}"
# Read
read_ok, read_msg, rows_read = read_func(path, fs)
if not read_ok:
return False, f"READ_FAIL: {read_msg}"
# Verify
if rows_read != num_rows:
return False, f"DATA_MISMATCH: expected {num_rows}, got {rows_read}"
return True, f"{write_msg} + {read_msg}"
def cleanup_test_files(fs: s3fs.S3FileSystem) -> None:
"""Clean up test files from S3."""
try:
if fs.exists(TEST_DIR):
logging.info(f"Cleaning up test directory: {TEST_DIR}")
fs.rm(TEST_DIR, recursive=True)
logging.info("✓ Test directory cleaned up")
except Exception as e:
logging.warning(f"Failed to cleanup test directory: {e}")
def main():
"""Run all write/read method combinations."""
print("=" * 80)
print("Write/Read Method Combination Tests for S3-Compatible Storage")
print("Testing PyArrow Parquet Files with Multiple Row Groups")
if TEST_QUICK:
print("*** QUICK TEST MODE - Small files only ***")
print("=" * 80 + "\n")
print("Configuration:")
print(f" S3 Endpoint: {S3_ENDPOINT_URL}")
print(f" Bucket: {BUCKET_NAME}")
print(f" Test Directory: {TEST_DIR}")
print(f" Quick Mode: {'Yes (small files only)' if TEST_QUICK else 'No (all file sizes)'}")
print()
try:
fs = init_s3fs()
ensure_bucket_exists(fs)
except Exception as e:
print(f"Cannot proceed without S3 connection: {e}")
return 1
# Define all write methods
write_methods = [
("pads", write_with_pads),
("buffer+s3fs", write_with_buffer_and_s3fs),
]
# Define all read methods
read_methods = [
("pads.dataset", read_with_pads_dataset),
("pq.ParquetDataset", read_with_parquet_dataset),
("pq.read_table", read_with_pq_read_table),
("s3fs+direct", read_direct_s3fs),
("s3fs+buffered", read_buffered_s3fs),
]
results = []
# Test all combinations for each file size
for size_name, num_rows in TEST_SIZES.items():
print(f"\n{'='*80}")
print(f"Testing with {size_name} files ({num_rows:,} rows)")
print(f"{'='*80}\n")
print(f"{'Write Method':<20} | {'Read Method':<20} | {'Result':<40}")
print("-" * 85)
for write_name, write_func in write_methods:
for read_name, read_func in read_methods:
test_name = f"{size_name}_{write_name}_{read_name}"
success, message = test_combination(
fs, test_name, write_func, read_func, num_rows
)
results.append((test_name, success, message))
status = "✓ PASS" if success else "✗ FAIL"
print(f"{write_name:<20} | {read_name:<20} | {status}: {message[:35]}")
# Summary
print("\n" + "=" * 80)
print("SUMMARY")
print("=" * 80)
passed = sum(1 for _, success, _ in results if success)
total = len(results)
print(f"\nTotal: {passed}/{total} passed\n")
# Group results by file size
for size_name in TEST_SIZES.keys():
size_results = [r for r in results if size_name in r[0]]
size_passed = sum(1 for _, success, _ in size_results if success)
print(f"{size_name.upper()}: {size_passed}/{len(size_results)} passed")
print("\n" + "=" * 80)
if passed == total:
print("✓ ALL TESTS PASSED!")
else:
print(f"✗ {total - passed} test(s) failed")
print("\nFailing combinations:")
for name, success, message in results:
if not success:
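# test_name has the form "<size>_<write>_<read>"; read labels like "pq.read_table" contain "_", so rejoin the tail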
parts = name.split("_")
size = parts[0]
write = parts[1]
read = "_".join(parts[2:])
print(f" - {size:6} | {write:15} | {read:20} -> {message[:50]}")
print("=" * 80 + "\n")
print(f"Error details logged to: {ERROR_LOG_FILE}")
print("=" * 80 + "\n")
# Cleanup
cleanup_test_files(fs)
return 0 if passed == total else 1
if __name__ == "__main__":
sys.exit(main())

453
test/s3/parquet/test_cross_filesystem_compatibility.py

@ -0,0 +1,453 @@
#!/usr/bin/env python3
"""
Cross-filesystem compatibility tests for PyArrow Parquet files.
This test verifies that Parquet files written using one filesystem implementation
(s3fs or PyArrow native S3) can be correctly read using the other implementation.
Test Matrix:
- Write with s3fs → Read with PyArrow native S3
- Write with PyArrow native S3 → Read with s3fs
Requirements:
- pyarrow>=22.0.0
- s3fs>=2024.12.0
- boto3>=1.40.0
Environment Variables:
S3_ENDPOINT_URL: S3 endpoint (default: http://localhost:8333)
S3_ACCESS_KEY: S3 access key (default: some_access_key1)
S3_SECRET_KEY: S3 secret key (default: some_secret_key1)
BUCKET_NAME: S3 bucket name (default: test-parquet-bucket)
TEST_QUICK: Run only small/quick tests (default: 0, set to 1 for quick mode)
Usage:
# Run with default environment variables
python3 test_cross_filesystem_compatibility.py
# Run with custom environment variables
S3_ENDPOINT_URL=http://localhost:8333 \
S3_ACCESS_KEY=mykey \
S3_SECRET_KEY=mysecret \
BUCKET_NAME=mybucket \
python3 test_cross_filesystem_compatibility.py
"""
import os
import secrets
import sys
import logging
from typing import Optional, Tuple
import pyarrow as pa
import pyarrow.dataset as pads
import pyarrow.fs as pafs
import pyarrow.parquet as pq
import s3fs
try:
import boto3
from botocore.exceptions import ClientError
HAS_BOTO3 = True
except ImportError:
HAS_BOTO3 = False
from parquet_test_utils import create_sample_table
logging.basicConfig(level=logging.INFO, format="%(message)s")
# Configuration from environment variables with defaults
S3_ENDPOINT_URL = os.environ.get("S3_ENDPOINT_URL", "http://localhost:8333")
S3_ACCESS_KEY = os.environ.get("S3_ACCESS_KEY", "some_access_key1")
S3_SECRET_KEY = os.environ.get("S3_SECRET_KEY", "some_secret_key1")
BUCKET_NAME = os.getenv("BUCKET_NAME", "test-parquet-bucket")
TEST_QUICK = os.getenv("TEST_QUICK", "0") == "1"
# Create randomized test directory
TEST_RUN_ID = secrets.token_hex(8)
TEST_DIR = f"parquet-cross-fs-tests/{TEST_RUN_ID}"
# Test file sizes
TEST_SIZES = {
"small": 5,
"large": 200_000, # This will create multiple row groups
}
# Filter to only small tests if quick mode is enabled
if TEST_QUICK:
TEST_SIZES = {"small": TEST_SIZES["small"]}
logging.info("Quick test mode enabled - running only small tests")
def init_s3fs() -> Optional[s3fs.S3FileSystem]:
"""Initialize s3fs filesystem."""
try:
logging.info("Initializing s3fs...")
fs = s3fs.S3FileSystem(
client_kwargs={"endpoint_url": S3_ENDPOINT_URL},
key=S3_ACCESS_KEY,
secret=S3_SECRET_KEY,
use_listings_cache=False,
)
logging.info("✓ s3fs initialized successfully")
return fs
except Exception:
logging.exception("✗ Failed to initialize s3fs")
return None
def init_pyarrow_s3() -> Tuple[Optional[pafs.S3FileSystem], str, str]:
"""Initialize PyArrow's native S3 filesystem.
Returns:
tuple: (S3FileSystem instance, scheme, endpoint)
"""
try:
logging.info("Initializing PyArrow S3FileSystem...")
# Determine scheme from endpoint
if S3_ENDPOINT_URL.startswith("http://"):
scheme = "http"
endpoint = S3_ENDPOINT_URL[7:] # Remove http://
elif S3_ENDPOINT_URL.startswith("https://"):
scheme = "https"
endpoint = S3_ENDPOINT_URL[8:] # Remove https://
else:
# Default to http for localhost
scheme = "http"
endpoint = S3_ENDPOINT_URL
# Enable bucket creation and deletion for testing
s3 = pafs.S3FileSystem(
access_key=S3_ACCESS_KEY,
secret_key=S3_SECRET_KEY,
endpoint_override=endpoint,
scheme=scheme,
allow_bucket_creation=True,
allow_bucket_deletion=True,
)
logging.info("✓ PyArrow S3FileSystem initialized successfully")
return s3, scheme, endpoint
except Exception:
logging.exception("✗ Failed to initialize PyArrow S3FileSystem")
return None, "", ""
def ensure_bucket_exists(s3fs_fs: s3fs.S3FileSystem, pyarrow_s3: pafs.S3FileSystem) -> bool:
"""Ensure the test bucket exists using s3fs."""
try:
if not s3fs_fs.exists(BUCKET_NAME):
logging.info(f"Creating bucket: {BUCKET_NAME}")
try:
s3fs_fs.mkdir(BUCKET_NAME)
logging.info(f"✓ Bucket created: {BUCKET_NAME}")
except FileExistsError:
# Bucket was created between the check and mkdir call
logging.info(f"✓ Bucket exists: {BUCKET_NAME}")
else:
logging.info(f"✓ Bucket exists: {BUCKET_NAME}")
return True
except Exception:
logging.exception("✗ Failed to create/check bucket")
return False
def write_with_s3fs(table: pa.Table, path: str, s3fs_fs: s3fs.S3FileSystem) -> bool:
"""Write Parquet file using s3fs filesystem."""
try:
pads.write_dataset(table, path, format="parquet", filesystem=s3fs_fs)
return True
except Exception:
logging.exception("✗ Failed to write with s3fs")
return False
def write_with_pyarrow_s3(table: pa.Table, path: str, pyarrow_s3: pafs.S3FileSystem) -> bool:
"""Write Parquet file using PyArrow native S3 filesystem."""
try:
pads.write_dataset(table, path, format="parquet", filesystem=pyarrow_s3)
return True
except Exception:
logging.exception("✗ Failed to write with PyArrow S3")
return False
def read_with_s3fs(path: str, s3fs_fs: s3fs.S3FileSystem) -> Tuple[bool, Optional[pa.Table], str]:
"""Read Parquet file using s3fs filesystem with multiple methods."""
errors = []
# Try pq.read_table
try:
table = pq.read_table(path, filesystem=s3fs_fs)
except Exception as e: # noqa: BLE001 - Intentionally broad for compatibility testing
errors.append(f"pq.read_table: {type(e).__name__}: {e}")
else:
return True, table, "pq.read_table"
# Try pq.ParquetDataset
try:
dataset = pq.ParquetDataset(path, filesystem=s3fs_fs)
table = dataset.read()
except Exception as e: # noqa: BLE001 - Intentionally broad for compatibility testing
errors.append(f"pq.ParquetDataset: {type(e).__name__}: {e}")
else:
return True, table, "pq.ParquetDataset"
# Try pads.dataset
try:
dataset = pads.dataset(path, format="parquet", filesystem=s3fs_fs)
table = dataset.to_table()
except Exception as e: # noqa: BLE001 - Intentionally broad for compatibility testing
errors.append(f"pads.dataset: {type(e).__name__}: {e}")
else:
return True, table, "pads.dataset"
return False, None, " | ".join(errors)
def read_with_pyarrow_s3(path: str, pyarrow_s3: pafs.S3FileSystem) -> Tuple[bool, Optional[pa.Table], str]:
"""Read Parquet file using PyArrow native S3 filesystem with multiple methods."""
errors = []
# Try pq.read_table
try:
table = pq.read_table(path, filesystem=pyarrow_s3)
except Exception as e: # noqa: BLE001 - Intentionally broad for compatibility testing
errors.append(f"pq.read_table: {type(e).__name__}: {e}")
else:
return True, table, "pq.read_table"
# Try pq.ParquetDataset
try:
dataset = pq.ParquetDataset(path, filesystem=pyarrow_s3)
table = dataset.read()
except Exception as e: # noqa: BLE001 - Intentionally broad for compatibility testing
errors.append(f"pq.ParquetDataset: {type(e).__name__}: {e}")
else:
return True, table, "pq.ParquetDataset"
# Try pads.dataset
try:
dataset = pads.dataset(path, filesystem=pyarrow_s3)
table = dataset.to_table()
except Exception as e: # noqa: BLE001 - Intentionally broad for compatibility testing
errors.append(f"pads.dataset: {type(e).__name__}: {e}")
else:
return True, table, "pads.dataset"
return False, None, " | ".join(errors)
def verify_table_integrity(original: pa.Table, read: pa.Table) -> Tuple[bool, str]:
"""Verify that read table matches the original table."""
# Check row count
if read.num_rows != original.num_rows:
return False, f"Row count mismatch: expected {original.num_rows}, got {read.num_rows}"
# Check schema
if not read.schema.equals(original.schema):
return False, f"Schema mismatch: expected {original.schema}, got {read.schema}"
# Sort both tables by 'id' column before comparison to handle potential row order differences
original_sorted = original.sort_by([('id', 'ascending')])
read_sorted = read.sort_by([('id', 'ascending')])
# Check data equality
if not read_sorted.equals(original_sorted):
# Provide detailed error information
error_details = []
for col_name in original.column_names:
col_original = original_sorted.column(col_name)
col_read = read_sorted.column(col_name)
if not col_original.equals(col_read):
error_details.append(f"column '{col_name}' differs")
return False, f"Data mismatch: {', '.join(error_details)}"
return True, "Data verified successfully"
def test_write_s3fs_read_pyarrow(
test_name: str,
num_rows: int,
s3fs_fs: s3fs.S3FileSystem,
pyarrow_s3: pafs.S3FileSystem
) -> Tuple[bool, str]:
"""Test: Write with s3fs, read with PyArrow native S3."""
try:
table = create_sample_table(num_rows)
path = f"{BUCKET_NAME}/{TEST_DIR}/{test_name}/data.parquet"
# Write with s3fs
logging.info(f" Writing {num_rows:,} rows with s3fs to {path}...")
if not write_with_s3fs(table, path, s3fs_fs):
return False, "Write with s3fs failed"
logging.info(" ✓ Write completed")
# Read with PyArrow native S3
logging.info(" Reading with PyArrow native S3...")
success, read_table, method = read_with_pyarrow_s3(path, pyarrow_s3)
if not success:
return False, f"Read with PyArrow S3 failed: {method}"
logging.info(f" ✓ Read {read_table.num_rows:,} rows using {method}")
# Verify data integrity
verify_success, verify_msg = verify_table_integrity(table, read_table)
if not verify_success:
return False, f"Verification failed: {verify_msg}"
logging.info(f" ✓ {verify_msg}")
return True, f"s3fs→PyArrow: {method}"
except Exception as e: # noqa: BLE001 - Top-level exception handler for test orchestration
logging.exception(" ✗ Test failed")
return False, f"{type(e).__name__}: {e}"
def test_write_pyarrow_read_s3fs(
test_name: str,
num_rows: int,
s3fs_fs: s3fs.S3FileSystem,
pyarrow_s3: pafs.S3FileSystem
) -> Tuple[bool, str]:
"""Test: Write with PyArrow native S3, read with s3fs."""
try:
table = create_sample_table(num_rows)
path = f"{BUCKET_NAME}/{TEST_DIR}/{test_name}/data.parquet"
# Write with PyArrow native S3
logging.info(f" Writing {num_rows:,} rows with PyArrow native S3 to {path}...")
if not write_with_pyarrow_s3(table, path, pyarrow_s3):
return False, "Write with PyArrow S3 failed"
logging.info(" ✓ Write completed")
# Read with s3fs
logging.info(" Reading with s3fs...")
success, read_table, method = read_with_s3fs(path, s3fs_fs)
if not success:
return False, f"Read with s3fs failed: {method}"
logging.info(f" ✓ Read {read_table.num_rows:,} rows using {method}")
# Verify data integrity
verify_success, verify_msg = verify_table_integrity(table, read_table)
if not verify_success:
return False, f"Verification failed: {verify_msg}"
logging.info(f" ✓ {verify_msg}")
return True, f"PyArrow→s3fs: {method}"
except Exception as e: # noqa: BLE001 - Top-level exception handler for test orchestration
logging.exception(" ✗ Test failed")
return False, f"{type(e).__name__}: {e}"
def cleanup_test_files(s3fs_fs: s3fs.S3FileSystem) -> None:
"""Clean up test files from S3."""
try:
test_path = f"{BUCKET_NAME}/{TEST_DIR}"
if s3fs_fs.exists(test_path):
logging.info(f"Cleaning up test directory: {test_path}")
s3fs_fs.rm(test_path, recursive=True)
logging.info("✓ Test directory cleaned up")
except Exception:
logging.exception("Failed to cleanup test directory")
def main():
"""Run cross-filesystem compatibility tests."""
print("=" * 80)
print("Cross-Filesystem Compatibility Tests for PyArrow Parquet")
print("Testing: s3fs ↔ PyArrow Native S3 Filesystem")
if TEST_QUICK:
print("*** QUICK TEST MODE - Small files only ***")
print("=" * 80 + "\n")
print("Configuration:")
print(f" S3 Endpoint: {S3_ENDPOINT_URL}")
print(f" Access Key: {S3_ACCESS_KEY}")
print(f" Bucket: {BUCKET_NAME}")
print(f" Test Directory: {TEST_DIR}")
print(f" Quick Mode: {'Yes (small files only)' if TEST_QUICK else 'No (all file sizes)'}")
print(f" PyArrow Version: {pa.__version__}")
print()
# Initialize both filesystems
s3fs_fs = init_s3fs()
if s3fs_fs is None:
print("Cannot proceed without s3fs connection")
return 1
pyarrow_s3, _scheme, _endpoint = init_pyarrow_s3()
if pyarrow_s3 is None:
print("Cannot proceed without PyArrow S3 connection")
return 1
print()
# Ensure bucket exists
if not ensure_bucket_exists(s3fs_fs, pyarrow_s3):
print("Cannot proceed without bucket")
return 1
print()
results = []
# Test all file sizes
for size_name, num_rows in TEST_SIZES.items():
print(f"\n{'='*80}")
print(f"Testing with {size_name} files ({num_rows:,} rows)")
print(f"{'='*80}\n")
# Test 1: Write with s3fs, read with PyArrow native S3
test_name = f"{size_name}_s3fs_to_pyarrow"
print(f"Test: Write with s3fs → Read with PyArrow native S3")
success, message = test_write_s3fs_read_pyarrow(
test_name, num_rows, s3fs_fs, pyarrow_s3
)
results.append((test_name, success, message))
status = "✓ PASS" if success else "✗ FAIL"
print(f"{status}: {message}\n")
# Test 2: Write with PyArrow native S3, read with s3fs
test_name = f"{size_name}_pyarrow_to_s3fs"
print(f"Test: Write with PyArrow native S3 → Read with s3fs")
success, message = test_write_pyarrow_read_s3fs(
test_name, num_rows, s3fs_fs, pyarrow_s3
)
results.append((test_name, success, message))
status = "✓ PASS" if success else "✗ FAIL"
print(f"{status}: {message}\n")
# Summary
print("\n" + "=" * 80)
print("SUMMARY")
print("=" * 80)
passed = sum(1 for _, success, _ in results if success)
total = len(results)
print(f"\nTotal: {passed}/{total} passed\n")
for test_name, success, message in results:
status = "" if success else ""
print(f" {status} {test_name}: {message}")
print("\n" + "=" * 80)
if passed == total:
print("✓ ALL CROSS-FILESYSTEM TESTS PASSED!")
print()
print("Conclusion: Files written with s3fs and PyArrow native S3 are")
print("fully compatible and can be read by either filesystem implementation.")
else:
print(f"✗ {total - passed} test(s) failed")
print("=" * 80 + "\n")
# Cleanup
cleanup_test_files(s3fs_fs)
return 0 if passed == total else 1
if __name__ == "__main__":
sys.exit(main())

307
test/s3/parquet/test_implicit_directory_fix.py

@ -0,0 +1,307 @@
#!/usr/bin/env python3
"""
Test script to verify the implicit directory fix for s3fs compatibility.
This test verifies that:
1. Implicit directory markers (0-byte objects with children) return 404 on HEAD
2. s3fs correctly identifies them as directories via LIST fallback
3. PyArrow can read datasets created with write_dataset()
The fix makes SeaweedFS behave like AWS S3 and improves s3fs compatibility.
"""
import io
import logging
import os
import sys
import traceback
import pyarrow as pa
import pyarrow.dataset as pads
import pyarrow.parquet as pq
import s3fs
import boto3
from botocore.exceptions import ClientError
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
# Configuration
S3_ENDPOINT_URL = os.environ.get("S3_ENDPOINT_URL", "http://localhost:8333")
S3_ACCESS_KEY = os.environ.get("S3_ACCESS_KEY", "some_access_key1")
S3_SECRET_KEY = os.environ.get("S3_SECRET_KEY", "some_secret_key1")
BUCKET_NAME = os.getenv("BUCKET_NAME", "test-implicit-dir")
def create_sample_table(num_rows: int = 1000) -> pa.Table:
"""Create a sample PyArrow table."""
return pa.table({
'id': pa.array(range(num_rows), type=pa.int64()),
'value': pa.array([f'value_{i}' for i in range(num_rows)], type=pa.string()),
'score': pa.array([float(i) * 1.5 for i in range(num_rows)], type=pa.float64()),
})
def setup_s3():
"""Set up S3 clients."""
# s3fs client
fs = s3fs.S3FileSystem(
key=S3_ACCESS_KEY,
secret=S3_SECRET_KEY,
client_kwargs={'endpoint_url': S3_ENDPOINT_URL},
use_ssl=False
)
# boto3 client for raw S3 operations
s3_client = boto3.client(
's3',
endpoint_url=S3_ENDPOINT_URL,
aws_access_key_id=S3_ACCESS_KEY,
aws_secret_access_key=S3_SECRET_KEY,
use_ssl=False
)
return fs, s3_client
def test_implicit_directory_head_behavior(fs, s3_client):
"""Test that HEAD on implicit directory markers returns 404."""
logger.info("\n" + "="*80)
logger.info("TEST 1: Implicit Directory HEAD Behavior")
logger.info("="*80)
test_path = f"{BUCKET_NAME}/test_implicit_dir"
# Clean up any existing data
try:
fs.rm(test_path, recursive=True)
except Exception:
pass
# Create a dataset using PyArrow (creates implicit directory)
logger.info(f"Creating dataset at: {test_path}")
table = create_sample_table(1000)
pads.write_dataset(table, test_path, filesystem=fs, format='parquet')
# List what was created
logger.info("\nFiles created:")
files = fs.ls(test_path, detail=True)
for f in files:
logger.info(f" {f['name']} - size: {f['size']} bytes, type: {f['type']}")
# Test HEAD request on the directory marker (without trailing slash)
logger.info(f"\nTesting HEAD on: {test_path}")
try:
response = s3_client.head_object(Bucket=BUCKET_NAME, Key='test_implicit_dir')
logger.info(f" HEAD response: {response['ResponseMetadata']['HTTPStatusCode']}")
logger.info(f" Content-Length: {response.get('ContentLength', 'N/A')}")
logger.info(f" Content-Type: {response.get('ContentType', 'N/A')}")
logger.warning(" ⚠️ Expected 404, but got 200 - fix may not be working")
return False
except ClientError as e:
if e.response['Error']['Code'] == '404':
logger.info(" ✓ HEAD returned 404 (expected - implicit directory)")
return True
else:
logger.error(f" ✗ Unexpected error: {e}")
return False
def test_s3fs_directory_detection(fs):
"""Test that s3fs correctly detects the directory."""
logger.info("\n" + "="*80)
logger.info("TEST 2: s3fs Directory Detection")
logger.info("="*80)
test_path = f"{BUCKET_NAME}/test_implicit_dir"
# Test s3fs.info()
logger.info(f"\nTesting s3fs.info('{test_path}'):")
try:
info = fs.info(test_path)
logger.info(f" Type: {info.get('type', 'N/A')}")
logger.info(f" Size: {info.get('size', 'N/A')}")
if info.get('type') == 'directory':
logger.info(" ✓ s3fs correctly identified as directory")
return True
else:
logger.warning(f" ⚠️ s3fs identified as: {info.get('type')}")
return False
except Exception as e:
logger.error(f" ✗ Error: {e}")
return False
def test_s3fs_isdir(fs):
"""Test that s3fs.isdir() works correctly."""
logger.info("\n" + "="*80)
logger.info("TEST 3: s3fs.isdir() Method")
logger.info("="*80)
test_path = f"{BUCKET_NAME}/test_implicit_dir"
logger.info(f"\nTesting s3fs.isdir('{test_path}'):")
try:
is_dir = fs.isdir(test_path)
logger.info(f" Result: {is_dir}")
if is_dir:
logger.info(" ✓ s3fs.isdir() correctly returned True")
return True
else:
logger.warning(" ⚠️ s3fs.isdir() returned False")
return False
except Exception as e:
logger.error(f" ✗ Error: {e}")
return False
def test_pyarrow_dataset_read(fs):
"""Test that PyArrow can read the dataset."""
logger.info("\n" + "="*80)
logger.info("TEST 4: PyArrow Dataset Read")
logger.info("="*80)
test_path = f"{BUCKET_NAME}/test_implicit_dir"
logger.info(f"\nReading dataset from: {test_path}")
try:
ds = pads.dataset(test_path, filesystem=fs, format='parquet')
table = ds.to_table()
logger.info(f" ✓ Successfully read {len(table)} rows")
logger.info(f" Columns: {table.column_names}")
return True
except Exception as e:
logger.error(f" ✗ Failed to read dataset: {e}")
traceback.print_exc()
return False
def test_explicit_directory_marker(fs, s3_client):
"""Test that explicit directory markers (with trailing slash) still work."""
logger.info("\n" + "="*80)
logger.info("TEST 5: Explicit Directory Marker (with trailing slash)")
logger.info("="*80)
# Create an explicit directory marker
logger.info(f"\nCreating explicit directory: {BUCKET_NAME}/explicit_dir/")
try:
s3_client.put_object(
Bucket=BUCKET_NAME,
Key='explicit_dir/',
Body=b'',
ContentType='httpd/unix-directory'
)
logger.info(" ✓ Created explicit directory marker")
except Exception as e:
logger.error(f" ✗ Failed to create: {e}")
return False
# Test HEAD with trailing slash
logger.info(f"\nTesting HEAD on: {BUCKET_NAME}/explicit_dir/")
try:
response = s3_client.head_object(Bucket=BUCKET_NAME, Key='explicit_dir/')
logger.info(f" ✓ HEAD returned 200 (expected for explicit directory)")
logger.info(f" Content-Type: {response.get('ContentType', 'N/A')}")
return True
except ClientError as e:
logger.error(f" ✗ HEAD failed: {e}")
return False
def test_empty_file_not_directory(fs, s3_client):
"""Test that legitimate empty files are not treated as directories."""
logger.info("\n" + "="*80)
logger.info("TEST 6: Empty File (not a directory)")
logger.info("="*80)
# Create an empty file with text/plain mime type
logger.info(f"\nCreating empty file: {BUCKET_NAME}/empty.txt")
try:
s3_client.put_object(
Bucket=BUCKET_NAME,
Key='empty.txt',
Body=b'',
ContentType='text/plain'
)
logger.info(" ✓ Created empty file")
except Exception as e:
logger.error(f" ✗ Failed to create: {e}")
return False
# Test HEAD
logger.info(f"\nTesting HEAD on: {BUCKET_NAME}/empty.txt")
try:
response = s3_client.head_object(Bucket=BUCKET_NAME, Key='empty.txt')
logger.info(f" ✓ HEAD returned 200 (expected for empty file)")
logger.info(f" Content-Type: {response.get('ContentType', 'N/A')}")
# Verify s3fs doesn't think it's a directory
info = fs.info(f"{BUCKET_NAME}/empty.txt")
if info.get('type') == 'file':
logger.info(" ✓ s3fs correctly identified as file")
return True
else:
logger.warning(f" ⚠️ s3fs identified as: {info.get('type')}")
return False
except Exception as e:
logger.error(f" ✗ Error: {e}")
return False
def main():
"""Run all tests."""
logger.info("="*80)
logger.info("Implicit Directory Fix Test Suite")
logger.info("="*80)
logger.info(f"Endpoint: {S3_ENDPOINT_URL}")
logger.info(f"Bucket: {BUCKET_NAME}")
logger.info("="*80)
# Set up S3 clients
fs, s3_client = setup_s3()
# Create bucket if it doesn't exist
try:
s3_client.create_bucket(Bucket=BUCKET_NAME)
logger.info(f"\n✓ Created bucket: {BUCKET_NAME}")
except ClientError as e:
error_code = e.response['Error']['Code']
if error_code in ['BucketAlreadyOwnedByYou', 'BucketAlreadyExists']:
logger.info(f"\n✓ Bucket already exists: {BUCKET_NAME}")
else:
logger.error(f"\n✗ Failed to create bucket: {e}")
return 1
# Run tests
results = []
results.append(("Implicit Directory HEAD", test_implicit_directory_head_behavior(fs, s3_client)))
results.append(("s3fs Directory Detection", test_s3fs_directory_detection(fs)))
results.append(("s3fs.isdir() Method", test_s3fs_isdir(fs)))
results.append(("PyArrow Dataset Read", test_pyarrow_dataset_read(fs)))
results.append(("Explicit Directory Marker", test_explicit_directory_marker(fs, s3_client)))
results.append(("Empty File Not Directory", test_empty_file_not_directory(fs, s3_client)))
# Print summary
logger.info("\n" + "="*80)
logger.info("TEST SUMMARY")
logger.info("="*80)
passed = sum(1 for _, result in results if result)
total = len(results)
for name, result in results:
status = "✓ PASS" if result else "✗ FAIL"
logger.info(f"{status}: {name}")
logger.info("="*80)
logger.info(f"Results: {passed}/{total} tests passed")
logger.info("="*80)
if passed == total:
logger.info("\n🎉 All tests passed! The implicit directory fix is working correctly.")
return 0
else:
logger.warning(f"\n⚠️ {total - passed} test(s) failed. The fix may not be fully working.")
return 1
if __name__ == "__main__":
sys.exit(main())

383
test/s3/parquet/test_pyarrow_native_s3.py

@ -0,0 +1,383 @@
#!/usr/bin/env python3
"""
Test script for PyArrow's NATIVE S3 filesystem with SeaweedFS.
This test uses PyArrow's built-in S3FileSystem (pyarrow.fs.S3FileSystem)
instead of s3fs, providing a pure PyArrow solution for reading and writing
Parquet files to S3-compatible storage.
Requirements:
- pyarrow>=10.0.0
Environment Variables:
S3_ENDPOINT_URL: S3 endpoint (default: localhost:8333)
S3_ACCESS_KEY: S3 access key (default: some_access_key1)
S3_SECRET_KEY: S3 secret key (default: some_secret_key1)
BUCKET_NAME: S3 bucket name (default: test-parquet-bucket)
TEST_QUICK: Run only small/quick tests (default: 0, set to 1 for quick mode)
Usage:
# Run with default environment variables
python3 test_pyarrow_native_s3.py
# Run with custom environment variables
S3_ENDPOINT_URL=localhost:8333 \
S3_ACCESS_KEY=mykey \
S3_SECRET_KEY=mysecret \
BUCKET_NAME=mybucket \
python3 test_pyarrow_native_s3.py
"""
import os
import secrets
import sys
import logging
from typing import Optional
import pyarrow as pa
import pyarrow.dataset as pads
import pyarrow.fs as pafs
import pyarrow.parquet as pq
try:
import boto3
from botocore.exceptions import ClientError
HAS_BOTO3 = True
except ImportError:
HAS_BOTO3 = False
from parquet_test_utils import create_sample_table
logging.basicConfig(level=logging.INFO, format="%(message)s")
# Configuration from environment variables with defaults
S3_ENDPOINT_URL = os.environ.get("S3_ENDPOINT_URL", "localhost:8333")
S3_ACCESS_KEY = os.environ.get("S3_ACCESS_KEY", "some_access_key1")
S3_SECRET_KEY = os.environ.get("S3_SECRET_KEY", "some_secret_key1")
BUCKET_NAME = os.getenv("BUCKET_NAME", "test-parquet-bucket")
TEST_QUICK = os.getenv("TEST_QUICK", "0") == "1"
# Create randomized test directory
TEST_RUN_ID = secrets.token_hex(8)
TEST_DIR = f"parquet-native-tests/{TEST_RUN_ID}"
# Test file sizes
TEST_SIZES = {
"small": 5,
"large": 200_000, # This will create multiple row groups
}
# Filter to only small tests if quick mode is enabled
if TEST_QUICK:
TEST_SIZES = {"small": TEST_SIZES["small"]}
logging.info("Quick test mode enabled - running only small tests")
def init_s3_filesystem() -> tuple[Optional[pafs.S3FileSystem], str, str]:
"""Initialize PyArrow's native S3 filesystem.
Returns:
tuple: (S3FileSystem instance, scheme, endpoint)
"""
try:
logging.info("Initializing PyArrow S3FileSystem...")
logging.info(f" Endpoint: {S3_ENDPOINT_URL}")
logging.info(f" Bucket: {BUCKET_NAME}")
# Determine scheme from endpoint
if S3_ENDPOINT_URL.startswith("http://"):
scheme = "http"
endpoint = S3_ENDPOINT_URL[7:] # Remove http://
elif S3_ENDPOINT_URL.startswith("https://"):
scheme = "https"
endpoint = S3_ENDPOINT_URL[8:] # Remove https://
else:
# Default to http for localhost
scheme = "http"
endpoint = S3_ENDPOINT_URL
# Enable bucket creation and deletion for testing
s3 = pafs.S3FileSystem(
access_key=S3_ACCESS_KEY,
secret_key=S3_SECRET_KEY,
endpoint_override=endpoint,
scheme=scheme,
allow_bucket_creation=True,
allow_bucket_deletion=True,
)
logging.info("✓ PyArrow S3FileSystem initialized successfully\n")
return s3, scheme, endpoint
except Exception:
logging.exception("✗ Failed to initialize PyArrow S3FileSystem")
return None, "", ""
def ensure_bucket_exists_boto3(scheme: str, endpoint: str) -> bool:
"""Ensure the test bucket exists using boto3."""
if not HAS_BOTO3:
logging.error("boto3 is required for bucket creation")
return False
try:
# Create boto3 client
endpoint_url = f"{scheme}://{endpoint}"
s3_client = boto3.client(
's3',
endpoint_url=endpoint_url,
aws_access_key_id=S3_ACCESS_KEY,
aws_secret_access_key=S3_SECRET_KEY,
region_name='us-east-1',
)
# Check if bucket exists
try:
s3_client.head_bucket(Bucket=BUCKET_NAME)
logging.info(f"✓ Bucket exists: {BUCKET_NAME}")
return True
except ClientError as e:
error_code = e.response['Error']['Code']
if error_code == '404':
# Bucket doesn't exist, create it
logging.info(f"Creating bucket: {BUCKET_NAME}")
s3_client.create_bucket(Bucket=BUCKET_NAME)
logging.info(f"✓ Bucket created: {BUCKET_NAME}")
return True
else:
raise
except Exception:
logging.exception("✗ Failed to create/check bucket")
return False
def ensure_bucket_exists(s3: pafs.S3FileSystem) -> bool:
"""Ensure the test bucket exists using PyArrow's native S3FileSystem."""
try:
# Check if bucket exists by trying to list it
try:
file_info = s3.get_file_info(BUCKET_NAME)
if file_info.type == pafs.FileType.Directory:
logging.info(f"✓ Bucket exists: {BUCKET_NAME}")
return True
except OSError as e:
# OSError typically means bucket not found or network/permission issues
error_msg = str(e).lower()
if "not found" in error_msg or "does not exist" in error_msg or "nosuchbucket" in error_msg:
logging.debug(f"Bucket '{BUCKET_NAME}' not found, will attempt creation: {e}")
else:
# Log other OSErrors (network, auth, etc.) for debugging
logging.debug(f"Error checking bucket '{BUCKET_NAME}', will attempt creation anyway: {type(e).__name__}: {e}")
except Exception as e:
# Catch any other unexpected exceptions and log them
logging.debug(f"Unexpected error checking bucket '{BUCKET_NAME}', will attempt creation: {type(e).__name__}: {e}")
# Try to create the bucket
logging.info(f"Creating bucket: {BUCKET_NAME}")
s3.create_dir(BUCKET_NAME)
logging.info(f"✓ Bucket created: {BUCKET_NAME}")
return True
except Exception:
logging.exception(f"✗ Failed to create/check bucket '{BUCKET_NAME}' with PyArrow")
return False
def test_write_and_read(s3: pafs.S3FileSystem, test_name: str, num_rows: int) -> tuple[bool, str]:
"""Test writing and reading a Parquet dataset using PyArrow's native S3 filesystem."""
try:
table = create_sample_table(num_rows)
# Write using pads.write_dataset
filename = f"{BUCKET_NAME}/{TEST_DIR}/{test_name}/data.parquet"
logging.info(f" Writing {num_rows:,} rows to {filename}...")
pads.write_dataset(
table,
filename,
filesystem=s3,
format="parquet",
)
logging.info(" ✓ Write completed")
# Test Method 1: Read with pq.read_table
logging.info(" Reading with pq.read_table...")
table_read = pq.read_table(filename, filesystem=s3)
if table_read.num_rows != num_rows:
return False, f"pq.read_table: Row count mismatch (expected {num_rows}, got {table_read.num_rows})"
# Check schema first
if not table_read.schema.equals(table.schema):
return False, f"pq.read_table: Schema mismatch (expected {table.schema}, got {table_read.schema})"
# Sort both tables by 'id' column before comparison to handle potential row order differences
table_sorted = table.sort_by([('id', 'ascending')])
table_read_sorted = table_read.sort_by([('id', 'ascending')])
if not table_read_sorted.equals(table_sorted):
# Provide more detailed error information
error_details = []
for col_name in table.column_names:
col_original = table_sorted.column(col_name)
col_read = table_read_sorted.column(col_name)
if not col_original.equals(col_read):
error_details.append(f"column '{col_name}' differs")
return False, f"pq.read_table: Table contents mismatch ({', '.join(error_details)})"
logging.info(f" ✓ pq.read_table: {table_read.num_rows:,} rows")
# Test Method 2: Read with pq.ParquetDataset
logging.info(" Reading with pq.ParquetDataset...")
dataset = pq.ParquetDataset(filename, filesystem=s3)
table_dataset = dataset.read()
if table_dataset.num_rows != num_rows:
return False, f"pq.ParquetDataset: Row count mismatch (expected {num_rows}, got {table_dataset.num_rows})"
# Sort before comparison
table_dataset_sorted = table_dataset.sort_by([('id', 'ascending')])
if not table_dataset_sorted.equals(table_sorted):
error_details = []
for col_name in table.column_names:
col_original = table_sorted.column(col_name)
col_read = table_dataset_sorted.column(col_name)
if not col_original.equals(col_read):
error_details.append(f"column '{col_name}' differs")
return False, f"pq.ParquetDataset: Table contents mismatch ({', '.join(error_details)})"
logging.info(f" ✓ pq.ParquetDataset: {table_dataset.num_rows:,} rows")
# Test Method 3: Read with pads.dataset
logging.info(" Reading with pads.dataset...")
dataset_pads = pads.dataset(filename, filesystem=s3)
table_pads = dataset_pads.to_table()
if table_pads.num_rows != num_rows:
return False, f"pads.dataset: Row count mismatch (expected {num_rows}, got {table_pads.num_rows})"
# Sort before comparison
table_pads_sorted = table_pads.sort_by([('id', 'ascending')])
if not table_pads_sorted.equals(table_sorted):
error_details = []
for col_name in table.column_names:
col_original = table_sorted.column(col_name)
col_read = table_pads_sorted.column(col_name)
if not col_original.equals(col_read):
error_details.append(f"column '{col_name}' differs")
return False, f"pads.dataset: Table contents mismatch ({', '.join(error_details)})"
logging.info(f" ✓ pads.dataset: {table_pads.num_rows:,} rows")
return True, "All read methods passed"
except Exception as exc:
logging.exception(" ✗ Test failed")
return False, f"{type(exc).__name__}: {exc}"
def cleanup_test_files(s3: pafs.S3FileSystem) -> None:
"""Clean up test files from S3.
Note: We cannot use s3.delete_dir() directly because SeaweedFS uses implicit
directories (path prefixes without physical directory objects). PyArrow's
delete_dir() attempts to delete the directory marker itself, which fails with
"INTERNAL_FAILURE" on SeaweedFS. Instead, we list and delete files individually,
letting implicit directories disappear automatically.
"""
try:
test_path = f"{BUCKET_NAME}/{TEST_DIR}"
logging.info(f"Cleaning up test directory: {test_path}")
# List and delete files individually to handle implicit directories
try:
file_selector = pafs.FileSelector(test_path, recursive=True)
files = s3.get_file_info(file_selector)
# Delete files first (not directories)
for file_info in files:
if file_info.type == pafs.FileType.File:
s3.delete_file(file_info.path)
logging.debug(f" Deleted file: {file_info.path}")
logging.info("✓ Test directory cleaned up")
except OSError as e:
# Handle the case where the path doesn't exist or is inaccessible
if "does not exist" in str(e).lower() or "not found" in str(e).lower():
logging.info("✓ Test directory already clean or doesn't exist")
else:
raise
except Exception:
logging.exception("Failed to cleanup test directory")
def main():
"""Run all tests with PyArrow's native S3 filesystem."""
print("=" * 80)
print("PyArrow Native S3 Filesystem Tests for SeaweedFS")
print("Testing Parquet Files with Multiple Row Groups")
if TEST_QUICK:
print("*** QUICK TEST MODE - Small files only ***")
print("=" * 80 + "\n")
print("Configuration:")
print(f" S3 Endpoint: {S3_ENDPOINT_URL}")
print(f" Access Key: {S3_ACCESS_KEY}")
print(f" Bucket: {BUCKET_NAME}")
print(f" Test Directory: {TEST_DIR}")
print(f" Quick Mode: {'Yes (small files only)' if TEST_QUICK else 'No (all file sizes)'}")
print(f" PyArrow Version: {pa.__version__}")
print()
# Initialize S3 filesystem
s3, scheme, endpoint = init_s3_filesystem()
if s3 is None:
print("Cannot proceed without S3 connection")
return 1
# Ensure bucket exists - try PyArrow first, fall back to boto3
bucket_created = ensure_bucket_exists(s3)
if not bucket_created:
logging.info("Trying to create bucket with boto3...")
bucket_created = ensure_bucket_exists_boto3(scheme, endpoint)
if not bucket_created:
print("Cannot proceed without bucket")
return 1
results = []
# Test all file sizes
for size_name, num_rows in TEST_SIZES.items():
print(f"\n{'='*80}")
print(f"Testing with {size_name} files ({num_rows:,} rows)")
print(f"{'='*80}\n")
test_name = f"{size_name}_test"
success, message = test_write_and_read(s3, test_name, num_rows)
results.append((test_name, success, message))
status = "✓ PASS" if success else "✗ FAIL"
print(f"\n{status}: {message}\n")
# Summary
print("\n" + "=" * 80)
print("SUMMARY")
print("=" * 80)
passed = sum(1 for _, success, _ in results if success)
total = len(results)
print(f"\nTotal: {passed}/{total} passed\n")
for test_name, success, message in results:
status = "" if success else ""
print(f" {status} {test_name}: {message}")
print("\n" + "=" * 80)
if passed == total:
print("✓ ALL TESTS PASSED!")
else:
print(f"✗ {total - passed} test(s) failed")
print("=" * 80 + "\n")
# Cleanup
cleanup_test_files(s3)
return 0 if passed == total else 1
if __name__ == "__main__":
sys.exit(main())

254
test/s3/parquet/test_sse_s3_compatibility.py

@ -0,0 +1,254 @@
#!/usr/bin/env python3
"""
Test script for SSE-S3 compatibility with PyArrow native S3 filesystem.
This test specifically targets the SSE-S3 multipart upload bug where
SeaweedFS panics with "bad IV length" when reading multipart uploads
that were encrypted with bucket-default SSE-S3.
Requirements:
- pyarrow>=10.0.0
- boto3>=1.28.0
Environment Variables:
S3_ENDPOINT_URL: S3 endpoint (default: localhost:8333)
S3_ACCESS_KEY: S3 access key (default: some_access_key1)
S3_SECRET_KEY: S3 secret key (default: some_secret_key1)
BUCKET_NAME: S3 bucket name (default: test-parquet-bucket)
Usage:
# Start SeaweedFS with SSE-S3 enabled
make start-seaweedfs-ci ENABLE_SSE_S3=true
# Run the test
python3 test_sse_s3_compatibility.py
"""
import os
import secrets
import sys
import logging
from typing import Optional
import pyarrow as pa
import pyarrow.dataset as pads
import pyarrow.fs as pafs
import pyarrow.parquet as pq
try:
import boto3
from botocore.exceptions import ClientError
HAS_BOTO3 = True
except ImportError:
HAS_BOTO3 = False
logging.exception("boto3 is required for this test")
sys.exit(1)
from parquet_test_utils import create_sample_table
logging.basicConfig(level=logging.INFO, format="%(message)s")
# Configuration
S3_ENDPOINT_URL = os.environ.get("S3_ENDPOINT_URL", "localhost:8333")
S3_ACCESS_KEY = os.environ.get("S3_ACCESS_KEY", "some_access_key1")
S3_SECRET_KEY = os.environ.get("S3_SECRET_KEY", "some_secret_key1")
BUCKET_NAME = os.getenv("BUCKET_NAME", "test-parquet-bucket")
TEST_RUN_ID = secrets.token_hex(8)
TEST_DIR = f"sse-s3-tests/{TEST_RUN_ID}"
# Test sizes designed to trigger multipart uploads
# PyArrow typically uses 5MB chunks, so these sizes should trigger multipart
TEST_SIZES = {
"tiny": 10, # Single part
"small": 1_000, # Single part
"medium": 50_000, # Single part (~1.5MB)
"large": 200_000, # Multiple parts (~6MB)
"very_large": 500_000, # Multiple parts (~15MB)
}
def init_s3_filesystem() -> tuple[Optional[pafs.S3FileSystem], str, str]:
"""Initialize PyArrow's native S3 filesystem."""
try:
logging.info("Initializing PyArrow S3FileSystem...")
# Determine scheme from endpoint
if S3_ENDPOINT_URL.startswith("http://"):
scheme = "http"
endpoint = S3_ENDPOINT_URL[7:]
elif S3_ENDPOINT_URL.startswith("https://"):
scheme = "https"
endpoint = S3_ENDPOINT_URL[8:]
else:
scheme = "http"
endpoint = S3_ENDPOINT_URL
s3 = pafs.S3FileSystem(
access_key=S3_ACCESS_KEY,
secret_key=S3_SECRET_KEY,
endpoint_override=endpoint,
scheme=scheme,
allow_bucket_creation=True,
allow_bucket_deletion=True,
)
logging.info("✓ PyArrow S3FileSystem initialized\n")
return s3, scheme, endpoint
except Exception:
logging.exception("✗ Failed to initialize PyArrow S3FileSystem")
return None, "", ""
def ensure_bucket_exists(scheme: str, endpoint: str) -> bool:
"""Ensure the test bucket exists using boto3."""
try:
endpoint_url = f"{scheme}://{endpoint}"
s3_client = boto3.client(
's3',
endpoint_url=endpoint_url,
aws_access_key_id=S3_ACCESS_KEY,
aws_secret_access_key=S3_SECRET_KEY,
region_name='us-east-1',
)
try:
s3_client.head_bucket(Bucket=BUCKET_NAME)
logging.info(f"✓ Bucket exists: {BUCKET_NAME}")
except ClientError as e:
error_code = e.response['Error']['Code']
if error_code == '404':
logging.info(f"Creating bucket: {BUCKET_NAME}")
s3_client.create_bucket(Bucket=BUCKET_NAME)
logging.info(f"✓ Bucket created: {BUCKET_NAME}")
else:
logging.exception("✗ Failed to access bucket")
return False
# Note: SeaweedFS doesn't support GetBucketEncryption API
# so we can't verify if SSE-S3 is enabled via API
# We assume it's configured correctly in the s3.json config file
logging.info("✓ Assuming SSE-S3 is configured in s3.json")
return True
except Exception:
logging.exception("✗ Failed to check bucket")
return False
def test_write_read_with_sse(
s3: pafs.S3FileSystem,
test_name: str,
num_rows: int
) -> tuple[bool, str, int]:
"""Test writing and reading with SSE-S3 encryption."""
try:
table = create_sample_table(num_rows)
filename = f"{BUCKET_NAME}/{TEST_DIR}/{test_name}/data.parquet"
logging.info(f" Writing {num_rows:,} rows...")
pads.write_dataset(
table,
filename,
filesystem=s3,
format="parquet",
)
logging.info(" Reading back...")
table_read = pq.read_table(filename, filesystem=s3)
if table_read.num_rows != num_rows:
return False, f"Row count mismatch: {table_read.num_rows} != {num_rows}", 0
return True, "Success", table_read.num_rows
except Exception as e:
error_msg = f"{type(e).__name__}: {e!s}"
logging.exception(" ✗ Failed")
return False, error_msg, 0
def main():
"""Run SSE-S3 compatibility tests."""
print("=" * 80)
print("SSE-S3 Compatibility Tests for PyArrow Native S3")
print("Testing Multipart Upload Encryption")
print("=" * 80 + "\n")
print("Configuration:")
print(f" S3 Endpoint: {S3_ENDPOINT_URL}")
print(f" Bucket: {BUCKET_NAME}")
print(f" Test Directory: {TEST_DIR}")
print(f" PyArrow Version: {pa.__version__}")
print()
# Initialize
s3, scheme, endpoint = init_s3_filesystem()
if s3 is None:
print("Cannot proceed without S3 connection")
return 1
# Check bucket and SSE-S3
if not ensure_bucket_exists(scheme, endpoint):
print("\n⚠ WARNING: Failed to access or create the test bucket!")
print("This test requires a reachable bucket with SSE-S3 enabled.")
print("Please ensure SeaweedFS is running with: make start-seaweedfs-ci ENABLE_SSE_S3=true")
return 1
print()
results = []
# Test all sizes
for size_name, num_rows in TEST_SIZES.items():
print(f"\n{'='*80}")
print(f"Testing {size_name} dataset ({num_rows:,} rows)")
print(f"{'='*80}")
success, message, rows_read = test_write_read_with_sse(
s3, size_name, num_rows
)
results.append((size_name, num_rows, success, message, rows_read))
if success:
print(f" ✓ SUCCESS: Read {rows_read:,} rows")
else:
print(f" ✗ FAILED: {message}")
# Summary
print("\n" + "=" * 80)
print("SUMMARY")
print("=" * 80)
passed = sum(1 for _, _, success, _, _ in results if success)
total = len(results)
print(f"\nTotal: {passed}/{total} tests passed\n")
print(f"{'Size':<15} {'Rows':>10} {'Status':<10} {'Rows Read':>10} {'Message':<40}")
print("-" * 90)
for size_name, num_rows, success, message, rows_read in results:
status = "✓ PASS" if success else "✗ FAIL"
rows_str = f"{rows_read:,}" if success else "N/A"
print(f"{size_name:<15} {num_rows:>10,} {status:<10} {rows_str:>10} {message[:40]}")
print("\n" + "=" * 80)
if passed == total:
print("✓ ALL TESTS PASSED WITH SSE-S3!")
print("\nThis means:")
print(" - SSE-S3 encryption is working correctly")
print(" - PyArrow native S3 filesystem is compatible")
print(" - Multipart uploads are handled properly")
else:
print(f"✗ {total - passed} test(s) failed")
print("\nPossible issues:")
print(" - SSE-S3 multipart upload bug with empty IV")
print(" - Encryption/decryption mismatch")
print(" - File corruption during upload")
print("=" * 80 + "\n")
return 0 if passed == total else 1
if __name__ == "__main__":
sys.exit(main())

4
test/s3/sse/Makefile

@ -98,7 +98,7 @@ start-seaweedfs: check-binary
@mkdir -p /tmp/seaweedfs-test-sse-filer
# Start master server with volume size limit and explicit gRPC port
@nohup $(SEAWEEDFS_BINARY) master -port=$(MASTER_PORT) -port.grpc=$$(( $(MASTER_PORT) + 10000 )) -mdir=/tmp/seaweedfs-test-sse-master -volumeSizeLimitMB=$(VOLUME_MAX_SIZE_MB) -ip=127.0.0.1 > /tmp/seaweedfs-sse-master.log 2>&1 &
@nohup $(SEAWEEDFS_BINARY) master -port=$(MASTER_PORT) -port.grpc=$$(( $(MASTER_PORT) + 10000 )) -mdir=/tmp/seaweedfs-test-sse-master -volumeSizeLimitMB=$(VOLUME_MAX_SIZE_MB) -ip=127.0.0.1 -peers=none > /tmp/seaweedfs-sse-master.log 2>&1 &
@sleep 3
# Start volume server with master HTTP port and increased capacity
@ -354,7 +354,7 @@ start-seaweedfs-ci: check-binary
# Start master server with volume size limit and explicit gRPC port
@echo "Starting master server..."
@nohup $(SEAWEEDFS_BINARY) master -port=$(MASTER_PORT) -port.grpc=$$(( $(MASTER_PORT) + 10000 )) -mdir=/tmp/seaweedfs-test-sse-master -volumeSizeLimitMB=$(VOLUME_MAX_SIZE_MB) -ip=127.0.0.1 > /tmp/seaweedfs-sse-master.log 2>&1 &
@nohup $(SEAWEEDFS_BINARY) master -port=$(MASTER_PORT) -port.grpc=$$(( $(MASTER_PORT) + 10000 )) -mdir=/tmp/seaweedfs-test-sse-master -volumeSizeLimitMB=$(VOLUME_MAX_SIZE_MB) -ip=127.0.0.1 -peers=none > /tmp/seaweedfs-sse-master.log 2>&1 &
@sleep 3
# Start volume server with master HTTP port and increased capacity

104
test/s3/sse/s3_range_headers_test.go

@ -0,0 +1,104 @@
package sse_test
import (
"bytes"
"context"
"fmt"
"io"
"testing"
"github.com/aws/aws-sdk-go-v2/aws"
"github.com/aws/aws-sdk-go-v2/service/s3"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
// TestPlainObjectRangeAndHeadHeaders ensures non-SSE objects advertise correct
// Content-Length and Content-Range information for both HEAD and ranged GETs.
func TestPlainObjectRangeAndHeadHeaders(t *testing.T) {
ctx := context.Background()
client, err := createS3Client(ctx, defaultConfig)
require.NoError(t, err, "failed to create S3 client")
bucketName, err := createTestBucket(ctx, client, defaultConfig.BucketPrefix+"range-plain-")
require.NoError(t, err, "failed to create test bucket")
defer cleanupTestBucket(ctx, client, bucketName)
// SeaweedFS S3 auto-chunks uploads at 8MiB (see chunkSize in putToFiler).
// Using 16MiB ensures at least two chunks without stressing CI resources.
const chunkSize = 8 * 1024 * 1024
const objectSize = 2 * chunkSize
objectKey := "plain-range-validation"
testData := generateTestData(objectSize)
_, err = client.PutObject(ctx, &s3.PutObjectInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
Body: bytes.NewReader(testData),
})
require.NoError(t, err, "failed to upload test object")
t.Run("HeadObject reports accurate Content-Length", func(t *testing.T) {
resp, err := client.HeadObject(ctx, &s3.HeadObjectInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
})
require.NoError(t, err, "HeadObject request failed")
assert.Equal(t, int64(objectSize), resp.ContentLength, "Content-Length mismatch on HEAD")
assert.Equal(t, "bytes", aws.ToString(resp.AcceptRanges), "Accept-Ranges should advertise bytes")
})
t.Run("Range request across chunk boundary", func(t *testing.T) {
// Test range that spans an 8MiB chunk boundary (chunkSize - 1KB to chunkSize + 3KB)
rangeStart := int64(chunkSize - 1024)
rangeEnd := rangeStart + 4096 - 1
rangeHeader := fmt.Sprintf("bytes=%d-%d", rangeStart, rangeEnd)
resp, err := client.GetObject(ctx, &s3.GetObjectInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
Range: aws.String(rangeHeader),
})
require.NoError(t, err, "GetObject range request failed")
defer resp.Body.Close()
expectedLen := rangeEnd - rangeStart + 1
assert.Equal(t, expectedLen, resp.ContentLength, "Content-Length must match requested range size")
assert.Equal(t,
fmt.Sprintf("bytes %d-%d/%d", rangeStart, rangeEnd, objectSize),
aws.ToString(resp.ContentRange),
"Content-Range header mismatch")
body, err := io.ReadAll(resp.Body)
require.NoError(t, err, "failed to read range response body")
assert.Equal(t, int(expectedLen), len(body), "actual bytes read mismatch")
assert.Equal(t, testData[rangeStart:rangeEnd+1], body, "range payload mismatch")
})
t.Run("Suffix range request", func(t *testing.T) {
const suffixSize = 2048
resp, err := client.GetObject(ctx, &s3.GetObjectInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
Range: aws.String(fmt.Sprintf("bytes=-%d", suffixSize)),
})
require.NoError(t, err, "GetObject suffix range request failed")
defer resp.Body.Close()
expectedStart := int64(objectSize - suffixSize)
expectedEnd := int64(objectSize - 1)
expectedLen := expectedEnd - expectedStart + 1
assert.Equal(t, expectedLen, resp.ContentLength, "suffix Content-Length mismatch")
assert.Equal(t,
fmt.Sprintf("bytes %d-%d/%d", expectedStart, expectedEnd, objectSize),
aws.ToString(resp.ContentRange),
"suffix Content-Range mismatch")
body, err := io.ReadAll(resp.Body)
require.NoError(t, err, "failed to read suffix range response body")
assert.Equal(t, int(expectedLen), len(body), "suffix range byte count mismatch")
assert.Equal(t, testData[expectedStart:expectedEnd+1], body, "suffix range payload mismatch")
})
}

445
test/s3/sse/s3_sse_range_server_test.go

@ -0,0 +1,445 @@
package sse_test
import (
"bytes"
"context"
"crypto/sha256"
"fmt"
"io"
"net/http"
"testing"
"time"
"github.com/aws/aws-sdk-go-v2/aws"
v4 "github.com/aws/aws-sdk-go-v2/aws/signer/v4"
"github.com/aws/aws-sdk-go-v2/service/s3"
s3types "github.com/aws/aws-sdk-go-v2/service/s3/types"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
// signRawHTTPRequest signs a raw HTTP request with AWS Signature V4
func signRawHTTPRequest(ctx context.Context, req *http.Request, cfg *S3SSETestConfig) error {
// Create credentials
creds := aws.Credentials{
AccessKeyID: cfg.AccessKey,
SecretAccessKey: cfg.SecretKey,
}
// Create signer
signer := v4.NewSigner()
// Calculate payload hash (empty for GET requests)
payloadHash := fmt.Sprintf("%x", sha256.Sum256([]byte{}))
// Sign the request
err := signer.SignHTTP(ctx, creds, req, payloadHash, "s3", cfg.Region, time.Now())
if err != nil {
return fmt.Errorf("failed to sign request: %w", err)
}
return nil
}
// TestSSECRangeRequestsServerBehavior tests that the server correctly handles Range requests
// for SSE-C encrypted objects by checking actual HTTP response (not SDK-processed response)
func TestSSECRangeRequestsServerBehavior(t *testing.T) {
ctx := context.Background()
client, err := createS3Client(ctx, defaultConfig)
require.NoError(t, err, "Failed to create S3 client")
bucketName, err := createTestBucket(ctx, client, defaultConfig.BucketPrefix+"ssec-range-server-")
require.NoError(t, err, "Failed to create test bucket")
defer cleanupTestBucket(ctx, client, bucketName)
sseKey := generateSSECKey()
testData := generateTestData(2048) // 2KB test file
objectKey := "test-range-server-validation"
// Upload with SSE-C
_, err = client.PutObject(ctx, &s3.PutObjectInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
Body: bytes.NewReader(testData),
SSECustomerAlgorithm: aws.String("AES256"),
SSECustomerKey: aws.String(sseKey.KeyB64),
SSECustomerKeyMD5: aws.String(sseKey.KeyMD5),
})
require.NoError(t, err, "Failed to upload SSE-C object")
// Test cases for range requests
testCases := []struct {
name string
rangeHeader string
expectedStart int64
expectedEnd int64
expectedTotal int64
}{
{
name: "First 100 bytes",
rangeHeader: "bytes=0-99",
expectedStart: 0,
expectedEnd: 99,
expectedTotal: 2048,
},
{
name: "Middle range",
rangeHeader: "bytes=500-699",
expectedStart: 500,
expectedEnd: 699,
expectedTotal: 2048,
},
{
name: "Last 100 bytes",
rangeHeader: "bytes=1948-2047",
expectedStart: 1948,
expectedEnd: 2047,
expectedTotal: 2048,
},
{
name: "Single byte",
rangeHeader: "bytes=1000-1000",
expectedStart: 1000,
expectedEnd: 1000,
expectedTotal: 2048,
},
{
name: "AES block boundary crossing",
rangeHeader: "bytes=15-17",
expectedStart: 15,
expectedEnd: 17,
expectedTotal: 2048,
},
{
name: "Open-ended range",
rangeHeader: "bytes=2000-",
expectedStart: 2000,
expectedEnd: 2047,
expectedTotal: 2048,
},
{
name: "Suffix range (last 100 bytes)",
rangeHeader: "bytes=-100",
expectedStart: 1948,
expectedEnd: 2047,
expectedTotal: 2048,
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
// Build object URL (Endpoint already includes http://)
objectURL := fmt.Sprintf("%s/%s/%s",
defaultConfig.Endpoint,
bucketName,
objectKey,
)
// Create raw HTTP request
req, err := http.NewRequest("GET", objectURL, nil)
require.NoError(t, err, "Failed to create HTTP request")
// Add Range header
req.Header.Set("Range", tc.rangeHeader)
// Add SSE-C headers
req.Header.Set("x-amz-server-side-encryption-customer-algorithm", "AES256")
req.Header.Set("x-amz-server-side-encryption-customer-key", sseKey.KeyB64)
req.Header.Set("x-amz-server-side-encryption-customer-key-MD5", sseKey.KeyMD5)
// Sign the request with AWS Signature V4
err = signRawHTTPRequest(ctx, req, defaultConfig)
require.NoError(t, err, "Failed to sign HTTP request")
// Make request with raw HTTP client
httpClient := &http.Client{}
resp, err := httpClient.Do(req)
require.NoError(t, err, "Failed to execute range request")
defer resp.Body.Close()
// CRITICAL CHECK 1: Status code must be 206 Partial Content
assert.Equal(t, http.StatusPartialContent, resp.StatusCode,
"Server must return 206 Partial Content for range request, got %d", resp.StatusCode)
// CRITICAL CHECK 2: Content-Range header must be present and correct
expectedContentRange := fmt.Sprintf("bytes %d-%d/%d",
tc.expectedStart, tc.expectedEnd, tc.expectedTotal)
actualContentRange := resp.Header.Get("Content-Range")
assert.Equal(t, expectedContentRange, actualContentRange,
"Content-Range header mismatch")
// CRITICAL CHECK 3: Content-Length must match requested range size
expectedLength := tc.expectedEnd - tc.expectedStart + 1
actualLength := resp.ContentLength
assert.Equal(t, expectedLength, actualLength,
"Content-Length mismatch: expected %d, got %d", expectedLength, actualLength)
// CRITICAL CHECK 4: Actual bytes received from network
bodyBytes, err := io.ReadAll(resp.Body)
require.NoError(t, err, "Failed to read response body")
assert.Equal(t, int(expectedLength), len(bodyBytes),
"Actual bytes received from server mismatch: expected %d, got %d",
expectedLength, len(bodyBytes))
// CRITICAL CHECK 5: Verify decrypted content matches expected range
expectedData := testData[tc.expectedStart : tc.expectedEnd+1]
assert.Equal(t, expectedData, bodyBytes,
"Decrypted range content doesn't match expected data")
// Verify SSE-C headers are present in response
assert.Equal(t, "AES256", resp.Header.Get("x-amz-server-side-encryption-customer-algorithm"),
"SSE-C algorithm header missing in range response")
assert.Equal(t, sseKey.KeyMD5, resp.Header.Get("x-amz-server-side-encryption-customer-key-MD5"),
"SSE-C key MD5 header missing in range response")
})
}
}
// TestSSEKMSRangeRequestsServerBehavior tests server-side Range handling for SSE-KMS
func TestSSEKMSRangeRequestsServerBehavior(t *testing.T) {
ctx := context.Background()
client, err := createS3Client(ctx, defaultConfig)
require.NoError(t, err, "Failed to create S3 client")
bucketName, err := createTestBucket(ctx, client, defaultConfig.BucketPrefix+"ssekms-range-server-")
require.NoError(t, err, "Failed to create test bucket")
defer cleanupTestBucket(ctx, client, bucketName)
kmsKeyID := "test-range-key"
testData := generateTestData(4096) // 4KB test file
objectKey := "test-kms-range-server-validation"
// Upload with SSE-KMS
_, err = client.PutObject(ctx, &s3.PutObjectInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
Body: bytes.NewReader(testData),
ServerSideEncryption: "aws:kms",
SSEKMSKeyId: aws.String(kmsKeyID),
})
require.NoError(t, err, "Failed to upload SSE-KMS object")
// Test various ranges
testCases := []struct {
name string
rangeHeader string
start int64
end int64
}{
{"First KB", "bytes=0-1023", 0, 1023},
{"Second KB", "bytes=1024-2047", 1024, 2047},
{"Last KB", "bytes=3072-4095", 3072, 4095},
{"Unaligned range", "bytes=100-299", 100, 299},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
objectURL := fmt.Sprintf("%s/%s/%s",
defaultConfig.Endpoint,
bucketName,
objectKey,
)
req, err := http.NewRequest("GET", objectURL, nil)
require.NoError(t, err)
req.Header.Set("Range", tc.rangeHeader)
// Sign the request with AWS Signature V4
err = signRawHTTPRequest(ctx, req, defaultConfig)
require.NoError(t, err, "Failed to sign HTTP request")
httpClient := &http.Client{}
resp, err := httpClient.Do(req)
require.NoError(t, err)
defer resp.Body.Close()
// Verify 206 status
assert.Equal(t, http.StatusPartialContent, resp.StatusCode,
"SSE-KMS range request must return 206, got %d", resp.StatusCode)
// Verify Content-Range
expectedContentRange := fmt.Sprintf("bytes %d-%d/%d", tc.start, tc.end, int64(len(testData)))
assert.Equal(t, expectedContentRange, resp.Header.Get("Content-Range"))
// Verify actual bytes received
bodyBytes, err := io.ReadAll(resp.Body)
require.NoError(t, err)
expectedLength := tc.end - tc.start + 1
assert.Equal(t, int(expectedLength), len(bodyBytes),
"Actual network bytes mismatch")
// Verify content
expectedData := testData[tc.start : tc.end+1]
assert.Equal(t, expectedData, bodyBytes)
})
}
}
// TestSSES3RangeRequestsServerBehavior tests server-side Range handling for SSE-S3
func TestSSES3RangeRequestsServerBehavior(t *testing.T) {
ctx := context.Background()
client, err := createS3Client(ctx, defaultConfig)
require.NoError(t, err, "Failed to create S3 client")
bucketName, err := createTestBucket(ctx, client, "sses3-range-server")
require.NoError(t, err, "Failed to create test bucket")
defer cleanupTestBucket(ctx, client, bucketName)
testData := generateTestData(8192) // 8KB test file
objectKey := "test-s3-range-server-validation"
// Upload with SSE-S3
_, err = client.PutObject(ctx, &s3.PutObjectInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
Body: bytes.NewReader(testData),
ServerSideEncryption: "AES256",
})
require.NoError(t, err, "Failed to upload SSE-S3 object")
// Test range request
objectURL := fmt.Sprintf("%s/%s/%s",
defaultConfig.Endpoint,
bucketName,
objectKey,
)
req, err := http.NewRequest("GET", objectURL, nil)
require.NoError(t, err)
req.Header.Set("Range", "bytes=1000-1999")
// Sign the request with AWS Signature V4
err = signRawHTTPRequest(ctx, req, defaultConfig)
require.NoError(t, err, "Failed to sign HTTP request")
httpClient := &http.Client{}
resp, err := httpClient.Do(req)
require.NoError(t, err)
defer resp.Body.Close()
// Verify server response
assert.Equal(t, http.StatusPartialContent, resp.StatusCode)
assert.Equal(t, "bytes 1000-1999/8192", resp.Header.Get("Content-Range"))
assert.Equal(t, int64(1000), resp.ContentLength)
bodyBytes, err := io.ReadAll(resp.Body)
require.NoError(t, err)
assert.Equal(t, 1000, len(bodyBytes))
assert.Equal(t, testData[1000:2000], bodyBytes)
}
// TestSSEMultipartRangeRequestsServerBehavior tests Range requests on multipart encrypted objects
func TestSSEMultipartRangeRequestsServerBehavior(t *testing.T) {
ctx := context.Background()
client, err := createS3Client(ctx, defaultConfig)
require.NoError(t, err)
bucketName, err := createTestBucket(ctx, client, defaultConfig.BucketPrefix+"ssec-mp-range-")
require.NoError(t, err)
defer cleanupTestBucket(ctx, client, bucketName)
sseKey := generateSSECKey()
objectKey := "test-multipart-range-server"
// Create 10MB test data (2 parts of 5MB each)
partSize := 5 * 1024 * 1024
part1Data := generateTestData(partSize)
part2Data := generateTestData(partSize)
fullData := append(part1Data, part2Data...)
// Initiate multipart upload
createResp, err := client.CreateMultipartUpload(ctx, &s3.CreateMultipartUploadInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
SSECustomerAlgorithm: aws.String("AES256"),
SSECustomerKey: aws.String(sseKey.KeyB64),
SSECustomerKeyMD5: aws.String(sseKey.KeyMD5),
})
require.NoError(t, err)
uploadID := aws.ToString(createResp.UploadId)
// Upload part 1
part1Resp, err := client.UploadPart(ctx, &s3.UploadPartInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
UploadId: aws.String(uploadID),
PartNumber: aws.Int32(1),
Body: bytes.NewReader(part1Data),
SSECustomerAlgorithm: aws.String("AES256"),
SSECustomerKey: aws.String(sseKey.KeyB64),
SSECustomerKeyMD5: aws.String(sseKey.KeyMD5),
})
require.NoError(t, err)
// Upload part 2
part2Resp, err := client.UploadPart(ctx, &s3.UploadPartInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
UploadId: aws.String(uploadID),
PartNumber: aws.Int32(2),
Body: bytes.NewReader(part2Data),
SSECustomerAlgorithm: aws.String("AES256"),
SSECustomerKey: aws.String(sseKey.KeyB64),
SSECustomerKeyMD5: aws.String(sseKey.KeyMD5),
})
require.NoError(t, err)
// Complete multipart upload
_, err = client.CompleteMultipartUpload(ctx, &s3.CompleteMultipartUploadInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
UploadId: aws.String(uploadID),
MultipartUpload: &s3types.CompletedMultipartUpload{
Parts: []s3types.CompletedPart{
{PartNumber: aws.Int32(1), ETag: part1Resp.ETag},
{PartNumber: aws.Int32(2), ETag: part2Resp.ETag},
},
},
})
require.NoError(t, err)
// Test range that crosses part boundary
objectURL := fmt.Sprintf("%s/%s/%s",
defaultConfig.Endpoint,
bucketName,
objectKey,
)
// Range spanning across the part boundary
start := int64(partSize - 1000)
end := int64(partSize + 1000)
req, err := http.NewRequest("GET", objectURL, nil)
require.NoError(t, err)
req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))
req.Header.Set("x-amz-server-side-encryption-customer-algorithm", "AES256")
req.Header.Set("x-amz-server-side-encryption-customer-key", sseKey.KeyB64)
req.Header.Set("x-amz-server-side-encryption-customer-key-MD5", sseKey.KeyMD5)
// Sign the request with AWS Signature V4
err = signRawHTTPRequest(ctx, req, defaultConfig)
require.NoError(t, err, "Failed to sign HTTP request")
httpClient := &http.Client{}
resp, err := httpClient.Do(req)
require.NoError(t, err)
defer resp.Body.Close()
// Verify server behavior for cross-part range
assert.Equal(t, http.StatusPartialContent, resp.StatusCode,
"Multipart range request must return 206")
expectedLength := end - start + 1
assert.Equal(t, expectedLength, resp.ContentLength,
"Content-Length for cross-part range")
bodyBytes, err := io.ReadAll(resp.Body)
require.NoError(t, err)
assert.Equal(t, int(expectedLength), len(bodyBytes),
"Actual bytes for cross-part range")
// Verify content spans the part boundary correctly
expectedData := fullData[start : end+1]
assert.Equal(t, expectedData, bodyBytes,
"Cross-part range content must be correctly decrypted and assembled")
}

5
weed/cluster/cluster.go

@ -1,10 +1,11 @@
package cluster
import (
"github.com/seaweedfs/seaweedfs/weed/pb"
"github.com/seaweedfs/seaweedfs/weed/pb/master_pb"
"sync"
"time"
"github.com/seaweedfs/seaweedfs/weed/pb"
"github.com/seaweedfs/seaweedfs/weed/pb/master_pb"
)
const (

178
weed/command/backup.go

@ -4,6 +4,8 @@ import (
"context"
"fmt"
"google.golang.org/grpc"
"github.com/seaweedfs/seaweedfs/weed/pb"
"github.com/seaweedfs/seaweedfs/weed/security"
@ -66,105 +68,167 @@ var cmdBackup = &Command{
`,
}
func runBackup(cmd *Command, args []string) bool {
util.LoadSecurityConfiguration()
grpcDialOption := security.LoadClientTLS(util.GetViper(), "grpc.client")
// parseTTL parses the TTL from user input or volume stats.
// Returns (ttl, error, isFatal) where isFatal=true for invalid user input.
func parseTTL(userTTL string, statsTTL string) (*needle.TTL, error, bool) {
if userTTL != "" {
ttl, err := needle.ReadTTL(userTTL)
if err != nil {
// User-provided TTL is invalid - this is fatal
return nil, fmt.Errorf("invalid user-provided ttl %s: %w", userTTL, err), true
}
return ttl, nil, false
}
// Backward compatibility: if -server is provided, use it
masterServer := *s.master
if *s.server != "" {
masterServer = *s.server
ttl, err := needle.ReadTTL(statsTTL)
if err != nil {
return nil, fmt.Errorf("parsing ttl %s from stats: %w", statsTTL, err), false
}
return ttl, nil, false
}
if *s.volumeId == -1 {
return false
// parseReplication parses the replication from user input or volume stats.
// Returns (replication, error, isFatal) where isFatal=true for invalid user input.
func parseReplication(userReplication string, statsReplication string) (*super_block.ReplicaPlacement, error, bool) {
if userReplication != "" {
replication, err := super_block.NewReplicaPlacementFromString(userReplication)
if err != nil {
// User-provided replication is invalid - this is fatal
return nil, fmt.Errorf("invalid user-provided replication %s: %w", userReplication, err), true
}
return replication, nil, false
}
vid := needle.VolumeId(*s.volumeId)
// find volume location, replication, ttl info
lookup, err := operation.LookupVolumeId(func(_ context.Context) pb.ServerAddress { return pb.ServerAddress(masterServer) }, grpcDialOption, vid.String())
replication, err := super_block.NewReplicaPlacementFromString(statsReplication)
if err != nil {
fmt.Printf("Error looking up volume %d: %v\n", vid, err)
return true
return nil, fmt.Errorf("parsing replication %s from stats: %w", statsReplication, err), false
}
volumeServer := lookup.Locations[0].ServerAddress()
return replication, nil, false
}
// backupFromLocation attempts to backup a volume from a specific volume server location.
// Returns (error, isFatal) where isFatal=true means the error is due to invalid user input
// and should not be retried with other locations.
func backupFromLocation(volumeServer pb.ServerAddress, grpcDialOption grpc.DialOption, vid needle.VolumeId) (error, bool) {
stats, err := operation.GetVolumeSyncStatus(volumeServer, grpcDialOption, uint32(vid))
if err != nil {
fmt.Printf("Error get volume %d status: %v\n", vid, err)
return true
return fmt.Errorf("getting volume status: %w", err), false
}
var ttl *needle.TTL
if *s.ttl != "" {
ttl, err = needle.ReadTTL(*s.ttl)
if err != nil {
fmt.Printf("Error generate volume %d ttl %s: %v\n", vid, *s.ttl, err)
return true
}
} else {
ttl, err = needle.ReadTTL(stats.Ttl)
if err != nil {
fmt.Printf("Error get volume %d ttl %s: %v\n", vid, stats.Ttl, err)
return true
}
// Parse TTL
ttl, err, isFatal := parseTTL(*s.ttl, stats.Ttl)
if err != nil {
return err, isFatal
}
var replication *super_block.ReplicaPlacement
if *s.replication != "" {
replication, err = super_block.NewReplicaPlacementFromString(*s.replication)
if err != nil {
fmt.Printf("Error generate volume %d replication %s : %v\n", vid, *s.replication, err)
return true
}
} else {
replication, err = super_block.NewReplicaPlacementFromString(stats.Replication)
if err != nil {
fmt.Printf("Error get volume %d replication %s : %v\n", vid, stats.Replication, err)
return true
}
// Parse replication
replication, err, isFatal := parseReplication(*s.replication, stats.Replication)
if err != nil {
return err, isFatal
}
ver := needle.Version(stats.Version)
// Create or load the volume
v, err := storage.NewVolume(util.ResolvePath(*s.dir), util.ResolvePath(*s.dir), *s.collection, vid, storage.NeedleMapInMemory, replication, ttl, 0, ver, 0, 0)
if err != nil {
fmt.Printf("Error creating or reading from volume %d: %v\n", vid, err)
return true
return fmt.Errorf("creating or reading volume: %w", err), false
}
// Handle compaction if needed
if v.SuperBlock.CompactionRevision < uint16(stats.CompactRevision) {
if err = v.Compact2(0, 0, nil); err != nil {
fmt.Printf("Compact Volume before synchronizing %v\n", err)
return true
v.Close()
return fmt.Errorf("compacting volume: %w", err), false
}
if err = v.CommitCompact(); err != nil {
fmt.Printf("Commit Compact before synchronizing %v\n", err)
return true
v.Close()
return fmt.Errorf("committing compaction: %w", err), false
}
v.SuperBlock.CompactionRevision = uint16(stats.CompactRevision)
v.DataBackend.WriteAt(v.SuperBlock.Bytes(), 0)
if _, err = v.DataBackend.WriteAt(v.SuperBlock.Bytes(), 0); err != nil {
v.Close()
return fmt.Errorf("writing superblock: %w", err), false
}
}
datSize, _, _ := v.FileStat()
// If local volume is larger than remote, recreate it
if datSize > stats.TailOffset {
// remove the old data
if err := v.Destroy(false); err != nil {
fmt.Printf("Error destroying volume: %v\n", err)
v.Close()
return fmt.Errorf("destroying volume: %w", err), false
}
v.Close() // Close the destroyed volume
// recreate an empty volume
v, err = storage.NewVolume(util.ResolvePath(*s.dir), util.ResolvePath(*s.dir), *s.collection, vid, storage.NeedleMapInMemory, replication, ttl, 0, ver, 0, 0)
if err != nil {
fmt.Printf("Error creating or reading from volume %d: %v\n", vid, err)
return true
return fmt.Errorf("recreating volume: %w", err), false
}
}
defer v.Close()
// Perform the incremental backup
if err := v.IncrementalBackup(volumeServer, grpcDialOption); err != nil {
fmt.Printf("Error synchronizing volume %d: %v\n", vid, err)
v.Close()
return fmt.Errorf("incremental backup: %w", err), false
}
v.Close()
return nil, false
}
func runBackup(cmd *Command, args []string) bool {
util.LoadSecurityConfiguration()
grpcDialOption := security.LoadClientTLS(util.GetViper(), "grpc.client")
// Backward compatibility: if -server is provided, use it
masterServer := *s.master
if *s.server != "" {
masterServer = *s.server
}
if *s.volumeId == -1 {
return false
}
vid := needle.VolumeId(*s.volumeId)
// find volume location, replication, ttl info
lookup, err := operation.LookupVolumeId(func(_ context.Context) pb.ServerAddress { return pb.ServerAddress(masterServer) }, grpcDialOption, vid.String())
if err != nil {
fmt.Printf("Error looking up volume %d: %v\n", vid, err)
return true
}
if len(lookup.Locations) == 0 {
fmt.Printf("Error: volume %d has no locations available\n", vid)
return true
}
// Try each available location until one succeeds
var lastErr error
for i, location := range lookup.Locations {
volumeServer := location.ServerAddress()
fmt.Printf("Attempting to backup volume %d from location %d/%d: %s\n", vid, i+1, len(lookup.Locations), volumeServer)
err, isFatal := backupFromLocation(volumeServer, grpcDialOption, vid)
if err != nil {
fmt.Printf("Error backing up volume %d from %s: %v\n", vid, volumeServer, err)
lastErr = err
// Check if this is a fatal user-input error
if isFatal {
return true
}
continue
}
// Success!
fmt.Printf("Successfully backed up volume %d from %s\n", vid, volumeServer)
return true
}
// All locations failed
fmt.Printf("Failed to backup volume %d after trying all %d locations. Last error: %v\n", vid, len(lookup.Locations), lastErr)
return true
}

5
weed/command/iam.go

@ -76,7 +76,7 @@ func (iamopt *IamOptions) startIamServer() bool {
masters := pb.ServerAddresses(*iamopt.masters).ToAddressMap()
router := mux.NewRouter().SkipClean(true)
_, iamApiServer_err := iamapi.NewIamApiServer(router, &iamapi.IamServerOption{
iamApiServer, iamApiServer_err := iamapi.NewIamApiServer(router, &iamapi.IamServerOption{
Masters: masters,
Filer: filerAddress,
Port: *iamopt.port,
@ -86,6 +86,9 @@ func (iamopt *IamOptions) startIamServer() bool {
if iamApiServer_err != nil {
glog.Fatalf("IAM API Server startup error: %v", iamApiServer_err)
}
// Ensure cleanup on shutdown
defer iamApiServer.Shutdown()
listenAddress := fmt.Sprintf(":%d", *iamopt.port)
iamApiListener, iamApiLocalListener, err := util.NewIpAndLocalListeners(*iamopt.ip, *iamopt.port, time.Duration(10)*time.Second)

58
weed/credential/filer_etc/filer_etc_identity.go

@ -7,6 +7,7 @@ import (
"github.com/seaweedfs/seaweedfs/weed/credential"
"github.com/seaweedfs/seaweedfs/weed/filer"
"github.com/seaweedfs/seaweedfs/weed/glog"
"github.com/seaweedfs/seaweedfs/weed/pb/filer_pb"
"github.com/seaweedfs/seaweedfs/weed/pb/iam_pb"
)
@ -14,20 +15,63 @@ import (
func (store *FilerEtcStore) LoadConfiguration(ctx context.Context) (*iam_pb.S3ApiConfiguration, error) {
s3cfg := &iam_pb.S3ApiConfiguration{}
glog.V(1).Infof("Loading IAM configuration from %s/%s (filer: %s)",
filer.IamConfigDirectory, filer.IamIdentityFile, store.filerGrpcAddress)
err := store.withFilerClient(func(client filer_pb.SeaweedFilerClient) error {
var buf bytes.Buffer
if err := filer.ReadEntry(nil, client, filer.IamConfigDirectory, filer.IamIdentityFile, &buf); err != nil {
if err != filer_pb.ErrNotFound {
return err
// Use ReadInsideFiler instead of ReadEntry since identity.json is small
// and stored inline. ReadEntry requires a master client for chunked files,
// but ReadInsideFiler only reads inline content.
content, err := filer.ReadInsideFiler(client, filer.IamConfigDirectory, filer.IamIdentityFile)
if err != nil {
if err == filer_pb.ErrNotFound {
glog.V(1).Infof("IAM identity file not found at %s/%s, no credentials loaded",
filer.IamConfigDirectory, filer.IamIdentityFile)
return nil
}
glog.Errorf("Failed to read IAM identity file from %s/%s: %v",
filer.IamConfigDirectory, filer.IamIdentityFile, err)
return err
}
if buf.Len() > 0 {
return filer.ParseS3ConfigurationFromBytes(buf.Bytes(), s3cfg)
if len(content) == 0 {
glog.V(1).Infof("IAM identity file at %s/%s is empty",
filer.IamConfigDirectory, filer.IamIdentityFile)
return nil
}
glog.V(2).Infof("Read %d bytes from %s/%s",
len(content), filer.IamConfigDirectory, filer.IamIdentityFile)
if err := filer.ParseS3ConfigurationFromBytes(content, s3cfg); err != nil {
glog.Errorf("Failed to parse IAM configuration from %s/%s: %v",
filer.IamConfigDirectory, filer.IamIdentityFile, err)
return err
}
glog.V(1).Infof("Successfully parsed IAM configuration with %d identities and %d accounts",
len(s3cfg.Identities), len(s3cfg.Accounts))
return nil
})
return s3cfg, err
if err != nil {
return s3cfg, err
}
// Log loaded identities for debugging
if glog.V(2) {
for _, identity := range s3cfg.Identities {
credCount := len(identity.Credentials)
actionCount := len(identity.Actions)
glog.V(2).Infof(" Identity: %s (credentials: %d, actions: %d)",
identity.Name, credCount, actionCount)
for _, cred := range identity.Credentials {
glog.V(3).Infof(" Access Key: %s", cred.AccessKey)
}
}
}
return s3cfg, nil
}
func (store *FilerEtcStore) SaveConfiguration(ctx context.Context, config *iam_pb.S3ApiConfiguration) error {

40
weed/credential/filer_etc/filer_etc_policy.go

@ -1,7 +1,6 @@
package filer_etc
import (
"bytes"
"context"
"encoding/json"
@ -28,20 +27,42 @@ func (store *FilerEtcStore) GetPolicies(ctx context.Context) (map[string]policy_
return policiesCollection.Policies, nil
}
glog.V(2).Infof("Loading IAM policies from %s/%s (filer: %s)",
filer.IamConfigDirectory, filer.IamPoliciesFile, store.filerGrpcAddress)
err := store.withFilerClient(func(client filer_pb.SeaweedFilerClient) error {
var buf bytes.Buffer
if err := filer.ReadEntry(nil, client, filer.IamConfigDirectory, filer.IamPoliciesFile, &buf); err != nil {
// Use ReadInsideFiler instead of ReadEntry since policies.json is small
// and stored inline. ReadEntry requires a master client for chunked files,
// but ReadInsideFiler only reads inline content.
content, err := filer.ReadInsideFiler(client, filer.IamConfigDirectory, filer.IamPoliciesFile)
if err != nil {
if err == filer_pb.ErrNotFound {
glog.V(1).Infof("Policies file not found at %s/%s, returning empty policies", filer.IamConfigDirectory, filer.IamPoliciesFile)
glog.V(1).Infof("Policies file not found at %s/%s, returning empty policies",
filer.IamConfigDirectory, filer.IamPoliciesFile)
// If file doesn't exist, return empty collection
return nil
}
glog.Errorf("Failed to read IAM policies file from %s/%s: %v",
filer.IamConfigDirectory, filer.IamPoliciesFile, err)
return err
}
if buf.Len() > 0 {
return json.Unmarshal(buf.Bytes(), policiesCollection)
if len(content) == 0 {
glog.V(2).Infof("IAM policies file at %s/%s is empty",
filer.IamConfigDirectory, filer.IamPoliciesFile)
return nil
}
glog.V(2).Infof("Read %d bytes from %s/%s",
len(content), filer.IamConfigDirectory, filer.IamPoliciesFile)
if err := json.Unmarshal(content, policiesCollection); err != nil {
glog.Errorf("Failed to parse IAM policies from %s/%s: %v",
filer.IamConfigDirectory, filer.IamPoliciesFile, err)
return err
}
glog.V(1).Infof("Successfully loaded %d IAM policies", len(policiesCollection.Policies))
return nil
})
@ -49,6 +70,13 @@ func (store *FilerEtcStore) GetPolicies(ctx context.Context) (map[string]policy_
return nil, err
}
// Log policy names for debugging
if glog.V(2) && len(policiesCollection.Policies) > 0 {
for policyName := range policiesCollection.Policies {
glog.V(2).Infof(" Policy: %s", policyName)
}
}
return policiesCollection.Policies, nil
}

4
weed/filer/filer.go

@ -175,10 +175,6 @@ func (fs *Filer) GetMaster(ctx context.Context) pb.ServerAddress {
return fs.MasterClient.GetMaster(ctx)
}
func (fs *Filer) KeepMasterClientConnected(ctx context.Context) {
fs.MasterClient.KeepConnectedToMaster(ctx)
}
func (f *Filer) BeginTransaction(ctx context.Context) (context.Context, error) {
return f.Store.BeginTransaction(ctx)
}

4
weed/filer/filer_notify.go

@ -83,7 +83,9 @@ func (f *Filer) logMetaEvent(ctx context.Context, fullpath string, eventNotifica
return
}
f.LocalMetaLogBuffer.AddDataToBuffer([]byte(dir), data, event.TsNs)
if err := f.LocalMetaLogBuffer.AddDataToBuffer([]byte(dir), data, event.TsNs); err != nil {
glog.Errorf("failed to add data to log buffer for %s: %v", dir, err)
}
}

385
weed/filer/foundationdb/CONFIGURATION.md

@ -0,0 +1,385 @@
# FoundationDB Filer Store Configuration Reference
This document provides comprehensive configuration options for the FoundationDB filer store.
## Configuration Methods
### 1. Configuration File (filer.toml)
```toml
[foundationdb]
enabled = true
cluster_file = "/etc/foundationdb/fdb.cluster"
api_version = 740
timeout = "5s"
max_retry_delay = "1s"
directory_prefix = "seaweedfs"
```
### 2. Environment Variables
All configuration options can be set via environment variables with the `WEED_FOUNDATIONDB_` prefix:
```bash
export WEED_FOUNDATIONDB_ENABLED=true
export WEED_FOUNDATIONDB_CLUSTER_FILE=/etc/foundationdb/fdb.cluster
export WEED_FOUNDATIONDB_API_VERSION=740
export WEED_FOUNDATIONDB_TIMEOUT=5s
export WEED_FOUNDATIONDB_MAX_RETRY_DELAY=1s
export WEED_FOUNDATIONDB_DIRECTORY_PREFIX=seaweedfs
```
### 3. Command Line Arguments
FoundationDB filer options are not exposed as individual command-line flags; configure them in a `filer.toml` file (or via the environment variables above) that the `weed` command reads.
## Configuration Options
### Basic Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enabled` | boolean | `false` | Enable the FoundationDB filer store |
| `cluster_file` | string | `/etc/foundationdb/fdb.cluster` | Path to FoundationDB cluster file |
| `api_version` | integer | `740` | FoundationDB API version to use |
### Connection Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `timeout` | duration | `5s` | Transaction timeout duration |
| `max_retry_delay` | duration | `1s` | Maximum delay between retries |
### Storage Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `directory_prefix` | string | `seaweedfs` | Directory prefix for key organization |
## Configuration Examples
### Development Environment
```toml
[foundationdb]
enabled = true
cluster_file = "/var/fdb/config/fdb.cluster"
api_version = 740
timeout = "10s"
max_retry_delay = "2s"
directory_prefix = "seaweedfs_dev"
```
### Production Environment
```toml
[foundationdb]
enabled = true
cluster_file = "/etc/foundationdb/fdb.cluster"
api_version = 740
timeout = "30s"
max_retry_delay = "5s"
directory_prefix = "seaweedfs_prod"
```
### High-Performance Setup
```toml
[foundationdb]
enabled = true
cluster_file = "/etc/foundationdb/fdb.cluster"
api_version = 740
timeout = "60s"
max_retry_delay = "10s"
directory_prefix = "sw" # Shorter prefix for efficiency
```
### Path-Specific Configuration
Configure different FoundationDB settings for different paths:
```toml
# Default configuration
[foundationdb]
enabled = true
cluster_file = "/etc/foundationdb/fdb.cluster"
directory_prefix = "seaweedfs_main"
# Backup path with different prefix
[foundationdb.backup]
enabled = true
cluster_file = "/etc/foundationdb/fdb.cluster"
directory_prefix = "seaweedfs_backup"
location = "/backup"
timeout = "120s"
# Archive path with extended timeouts
[foundationdb.archive]
enabled = true
cluster_file = "/etc/foundationdb/fdb.cluster"
directory_prefix = "seaweedfs_archive"
location = "/archive"
timeout = "300s"
max_retry_delay = "30s"
```
## Configuration Validation
### Required Settings
The following settings are required for FoundationDB to function:
1. `enabled = true`
2. `cluster_file` must point to a valid FoundationDB cluster file
3. `api_version` must match your FoundationDB installation
### Validation Rules
- `api_version` must be between 600 and 740
- `timeout` must be a valid duration string (e.g., "5s", "30s", "2m")
- `max_retry_delay` must be a valid duration string
- `cluster_file` must exist and be readable
- `directory_prefix` must not be empty
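To make the rules above concrete, here is a minimal Go sketch of how such validation could look. The struct and function names are illustrative assumptions, not SeaweedFS's actual implementation; only standard-library calls are used.
```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// fdbConfig mirrors the [foundationdb] options documented above (illustrative names).
type fdbConfig struct {
	ClusterFile     string
	APIVersion      int
	Timeout         string
	MaxRetryDelay   string
	DirectoryPrefix string
}

// validate applies the rules listed in this section.
func validate(c fdbConfig) error {
	if c.APIVersion < 600 || c.APIVersion > 740 {
		return fmt.Errorf("api_version %d must be between 600 and 740", c.APIVersion)
	}
	if _, err := time.ParseDuration(c.Timeout); err != nil {
		return fmt.Errorf("invalid timeout %q: %w", c.Timeout, err)
	}
	if _, err := time.ParseDuration(c.MaxRetryDelay); err != nil {
		return fmt.Errorf("invalid max_retry_delay %q: %w", c.MaxRetryDelay, err)
	}
	// cluster_file must exist and be readable.
	f, err := os.Open(c.ClusterFile)
	if err != nil {
		return fmt.Errorf("cluster_file %q must exist and be readable: %w", c.ClusterFile, err)
	}
	f.Close()
	if c.DirectoryPrefix == "" {
		return errors.New("directory_prefix must not be empty")
	}
	return nil
}

func main() {
	cfg := fdbConfig{
		ClusterFile:     "/etc/foundationdb/fdb.cluster",
		APIVersion:      740,
		Timeout:         "5s",
		MaxRetryDelay:   "1s",
		DirectoryPrefix: "seaweedfs",
	}
	if err := validate(cfg); err != nil {
		fmt.Println("configuration error:", err)
		os.Exit(1)
	}
	fmt.Println("configuration looks valid")
}
```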
### Error Handling
Invalid configurations will result in startup errors:
```
FATAL: Failed to initialize store for foundationdb: invalid timeout duration
FATAL: Failed to initialize store for foundationdb: failed to open FoundationDB database
FATAL: Failed to initialize store for foundationdb: cluster file not found
```
## Performance Tuning
### Timeout Configuration
| Use Case | Timeout | Max Retry Delay | Notes |
|----------|---------|-----------------|-------|
| Interactive workloads | 5s | 1s | Fast response times |
| Batch processing | 60s | 10s | Handle large operations |
| Archive operations | 300s | 30s | Very large data sets |
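For illustration only, the sketch below shows how a configured `timeout` and `max_retry_delay` could be applied per transaction with the official FoundationDB Go bindings, which take these options in milliseconds. The hard-coded durations and the key written here are placeholders, not SeaweedFS's actual code path.
```go
package main

import (
	"log"
	"time"

	"github.com/apple/foundationdb/bindings/go/src/fdb"
)

func main() {
	fdb.MustAPIVersion(740)
	db := fdb.MustOpenDatabase("/etc/foundationdb/fdb.cluster")

	// Values as they would come from filer.toml (placeholders).
	timeout, _ := time.ParseDuration("60s")       // timeout
	maxRetryDelay, _ := time.ParseDuration("10s") // max_retry_delay

	_, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
		// FoundationDB transaction options are expressed in milliseconds.
		tr.Options().SetTimeout(timeout.Milliseconds())
		tr.Options().SetMaxRetryDelay(maxRetryDelay.Milliseconds())
		tr.Set(fdb.Key("seaweedfs_example/ping"), []byte("ok")) // placeholder key
		return nil, nil
	})
	if err != nil {
		log.Fatalf("transaction failed: %v", err)
	}
}
```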
### Connection Pool Settings
FoundationDB clients manage connection pooling automatically; no additional configuration is needed.
### Directory Organization
Use meaningful directory prefixes to organize data:
```toml
# Separate environments
directory_prefix = "prod_seaweedfs" # Production
directory_prefix = "staging_seaweedfs" # Staging
directory_prefix = "dev_seaweedfs" # Development
# Separate applications
directory_prefix = "app1_seaweedfs" # Application 1
directory_prefix = "app2_seaweedfs" # Application 2
```
## Security Configuration
### Cluster File Security
Protect the FoundationDB cluster file:
```bash
# Set proper permissions
sudo chown root:seaweedfs /etc/foundationdb/fdb.cluster
sudo chmod 640 /etc/foundationdb/fdb.cluster
```
### Network Security
FoundationDB supports TLS encryption. Configure in the cluster file:
```
description:cluster_id@tls(server1:4500,server2:4500,server3:4500)
```
### Access Control
Use FoundationDB's built-in access control mechanisms when available.
## Monitoring Configuration
### Health Check Settings
Configure health check timeouts appropriately:
```toml
[foundationdb]
enabled = true
timeout = "10s" # Reasonable timeout for health checks
```
### Logging Configuration
Enable verbose logging for troubleshooting:
```bash
# Start SeaweedFS with debug logs
WEED_FOUNDATIONDB_ENABLED=true weed -v=2 server -filer
```
## Migration Configuration
### From Other Filer Stores
When migrating from other filer stores:
1. Configure both stores temporarily
2. Use path-specific configuration for gradual migration
3. Migrate data using SeaweedFS tools
```toml
# During migration - keep old store for reads
[leveldb2]
enabled = true
dir = "/old/filer/data"
# New writes go to FoundationDB
[foundationdb.migration]
enabled = true
location = "/new"
cluster_file = "/etc/foundationdb/fdb.cluster"
```
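Once both stores are configured, existing metadata can be copied with `weed shell`; a rough sketch (the file path is a placeholder, and the `fs.meta.save`/`fs.meta.load` flags should be confirmed for your SeaweedFS version):
```bash
# Export metadata while the old store is still serving reads
weed shell <<'EOF'
fs.meta.save -o /tmp/filer_meta_backup.gz
EOF
# After the filer is switched to FoundationDB, load the export back
weed shell <<'EOF'
fs.meta.load /tmp/filer_meta_backup.gz
EOF
```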
## Backup Configuration
### Metadata Backup Strategy
```toml
# Main storage
[foundationdb]
enabled = true
directory_prefix = "seaweedfs_main"
# Backup storage (different cluster recommended)
[foundationdb.backup]
enabled = true
cluster_file = "/etc/foundationdb/backup_fdb.cluster"
directory_prefix = "seaweedfs_backup"
location = "/backup"
```
## Container Configuration
### Docker Environment Variables
```bash
# Docker environment
WEED_FOUNDATIONDB_ENABLED=true
WEED_FOUNDATIONDB_CLUSTER_FILE=/var/fdb/config/fdb.cluster
WEED_FOUNDATIONDB_API_VERSION=740
```
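For a quick single-container trial, the same variables can be passed straight to `docker run`; this is a sketch only, and it assumes an FDB cluster file is available on the host at the mounted path:
```bash
docker run -d --name seaweedfs \
  -p 9333:9333 -p 8888:8888 \
  -e WEED_FOUNDATIONDB_ENABLED=true \
  -e WEED_FOUNDATIONDB_CLUSTER_FILE=/var/fdb/config/fdb.cluster \
  -e WEED_FOUNDATIONDB_API_VERSION=740 \
  -v /path/to/fdb/config:/var/fdb/config:ro \
  chrislusf/seaweedfs:latest server -filer
```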
### Kubernetes ConfigMap
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: seaweedfs-config
data:
  filer.toml: |
    [foundationdb]
    enabled = true
    cluster_file = "/var/fdb/config/cluster_file"
    api_version = 740
    timeout = "30s"
    max_retry_delay = "5s"
    directory_prefix = "k8s_seaweedfs"
```
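After saving the manifest (here assumed to be `seaweedfs-config.yaml`), apply it and confirm the rendered `filer.toml`:
```bash
kubectl apply -f seaweedfs-config.yaml
kubectl get configmap seaweedfs-config -o yaml | grep -A6 foundationdb
```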
## Troubleshooting Configuration
### Debug Configuration
```toml
[foundationdb]
enabled = true
cluster_file = "/etc/foundationdb/fdb.cluster"
timeout = "60s" # Longer timeouts for debugging
max_retry_delay = "10s"
directory_prefix = "debug_seaweedfs"
```
### Test Configuration
```toml
[foundationdb]
enabled = true
cluster_file = "/tmp/fdb.cluster" # Test cluster
timeout = "5s"
directory_prefix = "test_seaweedfs"
```
## Configuration Best Practices
### 1. Environment Separation
Use different directory prefixes for different environments:
- Production: `prod_seaweedfs`
- Staging: `staging_seaweedfs`
- Development: `dev_seaweedfs`
### 2. Timeout Settings
- Interactive: 5-10 seconds
- Batch: 30-60 seconds
- Archive: 120-300 seconds
### 3. Cluster File Management
- Use absolute paths for cluster files
- Ensure proper file permissions
- Keep backup copies of cluster files
### 4. Directory Naming
- Use descriptive prefixes
- Include environment/application identifiers
- Keep prefixes reasonably short for efficiency
### 5. Error Handling
- Configure appropriate timeouts
- Monitor retry patterns
- Set up alerting for configuration errors
## Configuration Testing
### Validation Script
```bash
#!/bin/bash
# Test FoundationDB configuration

# Check cluster file
if [ ! -f "$WEED_FOUNDATIONDB_CLUSTER_FILE" ]; then
    echo "ERROR: Cluster file not found: $WEED_FOUNDATIONDB_CLUSTER_FILE"
    exit 1
fi

# Test connection
fdbcli -C "$WEED_FOUNDATIONDB_CLUSTER_FILE" --exec 'status' > /dev/null
if [ $? -ne 0 ]; then
    echo "ERROR: Cannot connect to FoundationDB cluster"
    exit 1
fi

echo "Configuration validation passed"
```
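If the script is saved as `validate_fdb.sh` (the name is arbitrary), run it with the same environment SeaweedFS will use:
```bash
chmod +x validate_fdb.sh
WEED_FOUNDATIONDB_CLUSTER_FILE=/etc/foundationdb/fdb.cluster ./validate_fdb.sh
```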
### Integration Testing
```bash
# Test configuration with SeaweedFS
cd test/foundationdb
make check-env
make test-unit
```

435
weed/filer/foundationdb/INSTALL.md

@ -0,0 +1,435 @@
# FoundationDB Filer Store Installation Guide
This guide covers the installation and setup of the FoundationDB filer store for SeaweedFS.
## Prerequisites
### FoundationDB Server
1. **Install FoundationDB Server**
**Ubuntu/Debian:**
```bash
# Download the FoundationDB client and server packages
curl -L https://github.com/apple/foundationdb/releases/download/7.4.5/foundationdb-clients_7.4.5-1_amd64.deb -o foundationdb-clients.deb
curl -L https://github.com/apple/foundationdb/releases/download/7.4.5/foundationdb-server_7.4.5-1_amd64.deb -o foundationdb-server.deb
sudo dpkg -i foundationdb-clients.deb foundationdb-server.deb
```
**CentOS/RHEL:**
```bash
# Install RPM packages
wget https://github.com/apple/foundationdb/releases/download/7.4.5/foundationdb-clients-7.4.5-1.el7.x86_64.rpm
wget https://github.com/apple/foundationdb/releases/download/7.4.5/foundationdb-server-7.4.5-1.el7.x86_64.rpm
sudo rpm -Uvh foundationdb-clients-7.4.5-1.el7.x86_64.rpm foundationdb-server-7.4.5-1.el7.x86_64.rpm
```
**macOS:**
```bash
# Using Homebrew (if available)
brew install foundationdb
# Or download from GitHub releases
# https://github.com/apple/foundationdb/releases
```
2. **Initialize FoundationDB Cluster**
**Single Node (Development):**
```bash
# Start FoundationDB service
sudo systemctl start foundationdb
sudo systemctl enable foundationdb
# Initialize database
fdbcli --exec 'configure new single ssd'
```
**Multi-Node Cluster (Production):**
```bash
# On each node, edit /etc/foundationdb/fdb.cluster
# Example: testing:testing@node1:4500,node2:4500,node3:4500
# On one node, initialize cluster
fdbcli --exec 'configure new double ssd'
```
3. **Verify Installation**
```bash
fdbcli --exec 'status'
```
### FoundationDB Client Libraries
The SeaweedFS FoundationDB integration requires the FoundationDB client libraries.
**Ubuntu/Debian:**
```bash
sudo apt-get install libfdb-dev
```
**CentOS/RHEL:**
```bash
sudo yum install foundationdb-devel
```
**macOS:**
```bash
# Client libraries are included with the server installation
export LIBRARY_PATH=/usr/local/lib
export CPATH=/usr/local/include
```
## Building SeaweedFS with FoundationDB Support
### Download FoundationDB Go Bindings
When building from the SeaweedFS repository, the Go bindings are resolved through `go mod`; the commands below are only needed to verify that the bindings can be fetched into a standalone scratch module:
```bash
go mod init seaweedfs-foundationdb
go get github.com/apple/foundationdb/bindings/go/src/fdb
```
### Build SeaweedFS
```bash
# Clone SeaweedFS repository
git clone https://github.com/seaweedfs/seaweedfs.git
cd seaweedfs
# Build with FoundationDB support
go build -tags foundationdb -o weed
```
### Verify Build
```bash
./weed version
# Should show version information
./weed help
# Should list available commands
```
## Configuration
### Basic Configuration
Create or edit `filer.toml`:
```toml
[foundationdb]
enabled = true
cluster_file = "/etc/foundationdb/fdb.cluster"
api_version = 740
timeout = "5s"
max_retry_delay = "1s"
directory_prefix = "seaweedfs"
```
### Environment Variables
Alternative configuration via environment variables:
```bash
export WEED_FOUNDATIONDB_ENABLED=true
export WEED_FOUNDATIONDB_CLUSTER_FILE=/etc/foundationdb/fdb.cluster
export WEED_FOUNDATIONDB_API_VERSION=740
export WEED_FOUNDATIONDB_TIMEOUT=5s
export WEED_FOUNDATIONDB_MAX_RETRY_DELAY=1s
export WEED_FOUNDATIONDB_DIRECTORY_PREFIX=seaweedfs
```
### Advanced Configuration
For production deployments:
```toml
[foundationdb]
enabled = true
cluster_file = "/etc/foundationdb/fdb.cluster"
api_version = 740
timeout = "30s"
max_retry_delay = "5s"
directory_prefix = "seaweedfs_prod"
# Path-specific configuration for backups
[foundationdb.backup]
enabled = true
cluster_file = "/etc/foundationdb/fdb.cluster"
directory_prefix = "seaweedfs_backup"
location = "/backup"
timeout = "60s"
```
## Deployment
### Single Node Deployment
```bash
# Start SeaweedFS with FoundationDB filer
./weed server -filer \
-master.port=9333 \
-volume.port=8080 \
-filer.port=8888 \
-s3.port=8333
```
### Distributed Deployment
**Master Servers:**
```bash
# Node 1
./weed master -port=9333 -peers=master1:9333,master2:9333,master3:9333
# Node 2
./weed master -port=9333 -peers=master1:9333,master2:9333,master3:9333 -ip=master2
# Node 3
./weed master -port=9333 -peers=master1:9333,master2:9333,master3:9333 -ip=master3
```
**Filer Servers with FoundationDB:**
```bash
# Filer nodes
./weed filer -master=master1:9333,master2:9333,master3:9333 -port=8888
```
**Volume Servers:**
```bash
./weed volume -master=master1:9333,master2:9333,master3:9333 -port=8080
```
### Docker Deployment
**docker-compose.yml:**
```yaml
version: '3.9'
services:
  foundationdb:
    image: foundationdb/foundationdb:7.4.5
    ports:
      - "4500:4500"
    volumes:
      - fdb_data:/var/fdb/data
      - fdb_config:/var/fdb/config
  seaweedfs:
    image: chrislusf/seaweedfs:latest
    command: "server -filer -ip=seaweedfs"
    ports:
      - "9333:9333"
      - "8888:8888"
      - "8333:8333"
    environment:
      WEED_FOUNDATIONDB_ENABLED: "true"
      WEED_FOUNDATIONDB_CLUSTER_FILE: "/var/fdb/config/fdb.cluster"
    volumes:
      - fdb_config:/var/fdb/config
    depends_on:
      - foundationdb
volumes:
  fdb_data:
  fdb_config:
```
### Kubernetes Deployment
**FoundationDB Operator:**
```bash
# Install FoundationDB operator
kubectl apply -f https://raw.githubusercontent.com/FoundationDB/fdb-kubernetes-operator/main/config/samples/deployment.yaml
```
**SeaweedFS with FoundationDB:**
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: seaweedfs-filer
spec:
  replicas: 3
  selector:
    matchLabels:
      app: seaweedfs-filer
  template:
    metadata:
      labels:
        app: seaweedfs-filer
    spec:
      containers:
        - name: seaweedfs
          image: chrislusf/seaweedfs:latest
          command: ["weed", "filer"]
          env:
            - name: WEED_FOUNDATIONDB_ENABLED
              value: "true"
            - name: WEED_FOUNDATIONDB_CLUSTER_FILE
              value: "/var/fdb/config/cluster_file"
          ports:
            - containerPort: 8888
          volumeMounts:
            - name: fdb-config
              mountPath: /var/fdb/config
      volumes:
        - name: fdb-config
          configMap:
            name: fdb-cluster-config
```
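The Deployment above references a `fdb-cluster-config` ConfigMap that must provide the cluster file under the key `cluster_file`; one way to create it from an existing cluster file (adjust the path and namespace as needed):
```bash
kubectl create configmap fdb-cluster-config \
  --from-file=cluster_file=/etc/foundationdb/fdb.cluster
```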
## Testing Installation
### Quick Test
```bash
# Start SeaweedFS with FoundationDB
./weed server -filer &
# Test file operations
echo "Hello FoundationDB" > test.txt
curl -F file=@test.txt "http://localhost:8888/test/"
curl "http://localhost:8888/test/test.txt"
# Test S3 API
curl -X PUT "http://localhost:8333/testbucket"
curl -T test.txt "http://localhost:8333/testbucket/test.txt"
```
### Integration Test Suite
```bash
# Run the provided test suite
cd test/foundationdb
make setup
make test
```
## Performance Tuning
### FoundationDB Tuning
```bash
# Configure for high performance
fdbcli --exec 'configure triple ssd'
fdbcli --exec 'configure storage_engine=ssd-redwood-1-experimental'
```
### SeaweedFS Configuration
```toml
[foundationdb]
enabled = true
cluster_file = "/etc/foundationdb/fdb.cluster"
timeout = "10s" # Longer timeout for large operations
max_retry_delay = "2s" # Adjust retry behavior
directory_prefix = "sw" # Shorter prefix for efficiency
```
### OS-Level Tuning
```bash
# Increase file descriptor limits
echo "* soft nofile 65536" >> /etc/security/limits.conf
echo "* hard nofile 65536" >> /etc/security/limits.conf
# Adjust network parameters
echo "net.core.rmem_max = 134217728" >> /etc/sysctl.conf
echo "net.core.wmem_max = 134217728" >> /etc/sysctl.conf
sysctl -p
```
## Monitoring and Maintenance
### Health Checks
```bash
# FoundationDB cluster health
fdbcli --exec 'status'
fdbcli --exec 'status details'
# SeaweedFS health
curl http://localhost:9333/cluster/status
curl http://localhost:8888/statistics/health
```
### Log Monitoring
**FoundationDB Logs:**
- `/var/log/foundationdb/` (default location)
- Monitor for errors, warnings, and performance issues
**SeaweedFS Logs:**
```bash
# Start with verbose logging
./weed -v=2 server -filer
```
### Backup and Recovery
**FoundationDB Backup:**
```bash
# Start backup
fdbbackup start -d file:///path/to/backup -t backup_tag
# Monitor backup
fdbbackup status -t backup_tag
# Restore from backup
fdbrestore start -r file:///path/to/backup -t backup_tag --wait
```
**SeaweedFS Metadata Backup:**
```bash
# Export filer metadata
./weed shell
> fs.meta.save /path/to/metadata/backup.gz
```
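To bring such an export back, `weed shell` also offers `fs.meta.load`; availability and exact flags should be checked for your SeaweedFS version:
```bash
# Restore filer metadata from a saved export
./weed shell
> fs.meta.load /path/to/metadata/backup.gz
```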
## Troubleshooting
### Common Issues
1. **Connection Refused**
- Check FoundationDB service status: `sudo systemctl status foundationdb`
- Verify cluster file: `cat /etc/foundationdb/fdb.cluster`
- Check network connectivity: `telnet localhost 4500`
2. **API Version Mismatch**
- Update API version in configuration
- Rebuild SeaweedFS with matching FDB client library
3. **Transaction Conflicts**
- Reduce transaction scope
- Implement appropriate retry logic
- Check for concurrent access patterns
4. **Performance Issues**
- Monitor cluster status: `fdbcli --exec 'status details'`
- Check data distribution: `fdbcli --exec 'status json'`
- Verify storage configuration
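The individual checks above can be run as one quick diagnostic pass; a convenience sketch only:
```bash
# One-shot pass over the common failure points listed above
sudo systemctl status foundationdb --no-pager
cat /etc/foundationdb/fdb.cluster
fdbcli --exec 'status details'
curl -s http://localhost:8888/statistics/health || echo "filer health endpoint not reachable"
```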
### Debug Mode
```bash
# Enable FoundationDB client tracing
export FDB_TRACE_ENABLE=1
export FDB_TRACE_PATH=/tmp/fdb_trace
# Start SeaweedFS with debug logging
./weed -v=3 server -filer
```
### Getting Help
1. **FoundationDB Documentation**: https://apple.github.io/foundationdb/
2. **SeaweedFS Community**: https://github.com/seaweedfs/seaweedfs/discussions
3. **Issue Reporting**: https://github.com/seaweedfs/seaweedfs/issues
For specific FoundationDB filer store issues, include:
- FoundationDB version and cluster configuration
- SeaweedFS version and build tags
- Configuration files (filer.toml)
- Error messages and logs
- Steps to reproduce the issue
