From 40cc0e04a65cdf3fd1c5f3d9c52ee4d843971b02 Mon Sep 17 00:00:00 2001
From: Chris Lu
Date: Fri, 20 Feb 2026 00:35:42 -0800
Subject: [PATCH] docker: fix entrypoint chown guard; helm: add openshift-values.yaml (#8390)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Enforce IAM for s3tables bucket creation

* Prefer IAM path when policies exist

* Ensure IAM enforcement honors default allow

* address comments

* Reused the precomputed principal when setting tableBucketMetadata.OwnerAccountID, avoiding the redundant getAccountID call.

* get identity

* fix

* dedup

* fix

* comments

* fix tests

* update iam config

* go fmt

* fix ports

* fix flags

* mini clean shutdown

* Revert "update iam config"

This reverts commit ca48fdbb0afa45657823d98657556c0bbf24f239.

Revert "mini clean shutdown"

This reverts commit 9e17f6baffd5dd7cc404d831d18dd618b9fe5049.

Revert "fix flags"

This reverts commit e9e7b29d2f77ee5cb82147d50621255410695ee3.

Revert "go fmt"

This reverts commit bd3241960b1d9484b7900190773b0ecb3f762c9a.

* test/s3tables: share single weed mini per test package via TestMain

Previously each top-level test function in the catalog and s3tables
packages started and stopped its own weed mini instance. This caused
failures when a prior instance wasn't cleanly stopped before the next
one started (port conflicts, leaked global state).

Changes (see the TestMain sketch at the end of this message):

- catalog/iceberg_catalog_test.go: introduce TestMain that starts one
  shared TestEnvironment (external weed binary) before all tests and
  tears it down after. All individual test functions now use sharedEnv.
  Added randomSuffix() for unique resource names across tests.

- catalog/pyiceberg_test.go: updated to use sharedEnv instead of
  per-test environments.

- catalog/pyiceberg_test_helpers.go -> pyiceberg_test_helpers_test.go:
  renamed to a _test.go file so it can access TestEnvironment, which is
  defined in a test file.

- table-buckets/setup.go: add package-level sharedCluster variable.

- table-buckets/s3tables_integration_test.go: introduce TestMain that
  starts one shared TestCluster before all tests. TestS3TablesIntegration
  now uses sharedCluster. Extract startMiniClusterInDir (no *testing.T)
  for TestMain use. TestS3TablesCreateBucketIAMPolicy keeps its own
  cluster (different IAM config). Remove miniClusterMutex (no longer
  needed). Fix Stop() to not panic when t is nil.

* delete

* parse

* default allow should work with anonymous

* fix port

* iceberg route

The failures came from Iceberg REST using the default bucket
(warehouse) when no prefix is provided. The tests create random
buckets, so /v1/namespaces was looking in warehouse and failing.
Updated the tests to use the prefixed Iceberg routes (/v1/{bucket}/...)
via a small helper, sketched just below.
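A sketch of what such a helper can look like (the name icebergPath and
its exact shape are assumptions, not the literal code in this change):

    // icebergPath builds a bucket-prefixed Iceberg REST path, e.g.
    // icebergPath("b1", "namespaces") -> "/v1/b1/namespaces", so the
    // request targets the test's random bucket instead of the default
    // "warehouse" bucket.
    func icebergPath(bucket, suffix string) string {
        return "/v1/" + bucket + "/" + suffix
    }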
* test(s3tables): fix port conflicts and IAM ARN matching in integration tests

- Pass -master.dir explicitly to prevent filer store directory
  collision between the shared cluster and per-test clusters running
  in the same process

- Pass -volume.port.public and -volume.publicUrl to prevent the global
  publicPort flag (mutated from 0 → concrete port by the first cluster)
  from being reused by a second cluster, causing 'address already in
  use'

- Remove the flag-reset loop in Stop() that reset global flag values
  while other goroutines were reading them (race → panic)

- Fix the IAM policy Resource ARN in TestS3TablesCreateBucketIAMPolicy
  to use wildcards (arn:aws:s3tables:*:*:bucket/) because the handler
  generates ARNs with its own DefaultRegion (us-east-1) and principal
  name ('admin'), not the test constants testRegion/testAccountID

* docker: fix entrypoint chown guard; helm: add openshift-values.yaml

Fix a regression in entrypoint.sh where the DATA_UID/DATA_GID
ownership comparison was dropped, causing chown -R /data to run
unconditionally on every container start, even when ownership was
already correct. Restore the guard so the recursive chown is skipped
when the seaweed user already owns /data, making startup faster on
subsequent runs and a no-op on OpenShift/PVC deployments where fsGroup
has already set correct ownership.

Add k8s/charts/seaweedfs/openshift-values.yaml: an example Helm
overrides file for deploying SeaweedFS on OpenShift (or any cluster
enforcing the Kubernetes restricted Pod Security Standard). It
replaces hostPath volumes with PVCs, sets runAsUser/fsGroup to 1000
(the seaweed user baked into the image), drops all capabilities,
disables privilege escalation, and enables the RuntimeDefault seccomp
profile, satisfying OpenShift's default restricted SCC without needing
a custom SCC or root access.

Fixes #8381
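For reference, a minimal sketch of the shared-environment TestMain
pattern described above, as applied in the catalog package.
TestEnvironment, sharedEnv, and randomSuffix are named in the change;
startTestEnvironment, TearDown, and the struct body are illustrative
assumptions, not the literal code:

    package catalog

    import (
        "fmt"
        "math/rand"
        "os"
        "testing"
    )

    // TestEnvironment is stubbed here; in the real tests it wraps the
    // external weed binary (process handle, ports, temp dirs).
    type TestEnvironment struct {
        stop func()
    }

    // startTestEnvironment stands in for whatever boots the external
    // weed binary and waits for it to become ready.
    func startTestEnvironment() (*TestEnvironment, error) {
        return &TestEnvironment{stop: func() {}}, nil
    }

    // TearDown must not call t.Fatal or panic: TestMain has no *testing.T.
    func (e *TestEnvironment) TearDown() {
        if e.stop != nil {
            e.stop()
        }
    }

    // sharedEnv is the single weed instance reused by every test in
    // the package.
    var sharedEnv *TestEnvironment

    func TestMain(m *testing.M) {
        env, err := startTestEnvironment()
        if err != nil {
            fmt.Fprintln(os.Stderr, "starting shared weed environment:", err)
            os.Exit(1)
        }
        sharedEnv = env

        code := m.Run() // every test in the package reuses sharedEnv

        sharedEnv.TearDown()
        os.Exit(code)
    }

    // randomSuffix keeps resource names unique so tests sharing one
    // cluster do not collide on bucket or table names.
    func randomSuffix() string {
        return fmt.Sprintf("%06d", rand.Intn(1000000))
    }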
---
 docker/entrypoint.sh                       |   8 +-
 k8s/charts/seaweedfs/openshift-values.yaml | 131 +++++++++++++++++++++
 2 files changed, 137 insertions(+), 2 deletions(-)
 create mode 100644 k8s/charts/seaweedfs/openshift-values.yaml

diff --git a/docker/entrypoint.sh b/docker/entrypoint.sh
index 822f2fa6e..d5ef16be1 100755
--- a/docker/entrypoint.sh
+++ b/docker/entrypoint.sh
@@ -20,13 +20,17 @@ if [ "$(id -u)" = "0" ]; then
 
   DATA_UID=$(stat -c '%u' /data 2>/dev/null)
   DATA_GID=$(stat -c '%g' /data 2>/dev/null)
-
-  # Only run chown -R if ownership doesn't match (much faster for subsequent starts)
+
+  # Only run chown -R if ownership doesn't already match (avoids expensive
+  # recursive chown on subsequent starts, and is a no-op on OpenShift when
+  # fsGroup has already set correct ownership on the PVC).
+  if [ "$DATA_UID" != "$SEAWEED_UID" ] || [ "$DATA_GID" != "$SEAWEED_GID" ]; then
     echo "Fixing /data ownership for seaweed user (uid=$SEAWEED_UID, gid=$SEAWEED_GID)"
     if ! chown -R seaweed:seaweed /data; then
       echo "Warning: Failed to change ownership of /data. This may cause permission errors." >&2
       echo "If /data is read-only or has mount issues, the application may fail to start." >&2
     fi
+  fi
 
   # Use su-exec to drop privileges and run as seaweed user
   exec su-exec seaweed "$0" "$@"
diff --git a/k8s/charts/seaweedfs/openshift-values.yaml b/k8s/charts/seaweedfs/openshift-values.yaml
new file mode 100644
index 000000000..1fd540d13
--- /dev/null
+++ b/k8s/charts/seaweedfs/openshift-values.yaml
@@ -0,0 +1,131 @@
+# openshift-values.yaml
+#
+# Example overrides for deploying SeaweedFS on OpenShift (or any cluster
+# enforcing the Kubernetes "restricted" Pod Security Standard).
+#
+# OpenShift's default "restricted" SCC blocks containers that:
+#   - Run as UID 0 (root)
+#   - Request privilege escalation
+#   - Use hostPath volumes
+#   - Omit a seccompProfile
+#
+# These overrides satisfy all four requirements by:
+#   1. Replacing hostPath volumes with PersistentVolumeClaims (or emptyDir for logs)
+#   2. Setting runAsUser: 1000 (the "seaweed" user baked into the image)
+#   3. Setting fsGroup: 1000 so Kubernetes pre-sets PVC ownership before the
+#      container starts — the entrypoint's chown -R is then skipped entirely
+#   4. Dropping all Linux capabilities and setting allowPrivilegeEscalation: false
+#   5. Enabling RuntimeDefault seccompProfile
+#
+# Usage:
+#   helm install seaweedfs seaweedfs/seaweedfs \
+#     -n seaweedfs --create-namespace \
+#     -f openshift-values.yaml
+#
+# Adjust storageClass and sizes to match your cluster's available StorageClasses.
+# On OpenShift you can discover them with: oc get storageclass
+
+# ── Shared security context helpers ──────────────────────────────────────────
+# These are referenced in the per-component sections below.
+# If your OpenShift cluster assigns an arbitrary UID (as most do with the
+# "restricted" SCC), replace 1000 with the numeric UID in the range shown by:
+#   oc get project -o jsonpath='{.metadata.annotations.openshift\.io/sa\.scc\.uid-range}'
+# and set the same value for runAsUser across all components.
+
+master:
+  data:
+    type: "persistentVolumeClaim"
+    size: "10Gi"
+    storageClass: ""  # leave empty to use the cluster default StorageClass
+
+  logs:
+    type: "emptyDir"  # avoids hostPath; use persistentVolumeClaim if you need log persistence
+
+  podSecurityContext:
+    enabled: true
+    runAsUser: 1000
+    runAsGroup: 1000
+    fsGroup: 1000  # Kubernetes sets PVC ownership to this GID before container start
+    runAsNonRoot: true
+
+  containerSecurityContext:
+    enabled: true
+    allowPrivilegeEscalation: false
+    capabilities:
+      drop: ["ALL"]
+    runAsNonRoot: true
+    runAsUser: 1000
+    seccompProfile:
+      type: RuntimeDefault
+
+volume:
+  dataDirs:
+    - name: data1
+      type: "persistentVolumeClaim"
+      size: "100Gi"
+      storageClass: ""  # leave empty to use the cluster default StorageClass
+      maxVolumes: 0
+
+  logs: {}  # emptyDir by default (no logs section means no log volume)
+
+  podSecurityContext:
+    enabled: true
+    runAsUser: 1000
+    runAsGroup: 1000
+    fsGroup: 1000
+    runAsNonRoot: true
+
+  containerSecurityContext:
+    enabled: true
+    allowPrivilegeEscalation: false
+    capabilities:
+      drop: ["ALL"]
+    runAsNonRoot: true
+    runAsUser: 1000
+    seccompProfile:
+      type: RuntimeDefault
+
+filer:
+  data:
+    type: "persistentVolumeClaim"
+    size: "25Gi"
+    storageClass: ""  # leave empty to use the cluster default StorageClass
+
+  logs:
+    type: "emptyDir"
+
+  podSecurityContext:
+    enabled: true
+    runAsUser: 1000
+    runAsGroup: 1000
+    fsGroup: 1000
+    runAsNonRoot: true
+
+  containerSecurityContext:
+    enabled: true
+    allowPrivilegeEscalation: false
+    capabilities:
+      drop: ["ALL"]
+    runAsNonRoot: true
+    runAsUser: 1000
+    seccompProfile:
+      type: RuntimeDefault
+
+# S3 gateway (if enabled)
+s3:
+  podSecurityContext:
+    enabled: true
+    runAsUser: 1000
+    runAsGroup: 1000
+    fsGroup: 1000
+    runAsNonRoot: true
+
+  containerSecurityContext:
+    enabled: true
+    allowPrivilegeEscalation: false
+    capabilities:
+      drop: ["ALL"]
+    runAsNonRoot: true
+    runAsUser: 1000
+    seccompProfile:
+      type: RuntimeDefault