
fix: reduce N+1 queries in S3 versioned object list operations (#7814)

* fix: achieve single-scan efficiency for S3 versioned object listing

When listing objects in a versioning-enabled bucket, the original code
triggered multiple getEntry calls per versioned object (up to 12 with
retries), causing excessive 'find' operations visible in Grafana and
leading to high memory usage.

This fix achieves single-scan efficiency by caching list metadata
(size, ETag, mtime, owner) directly in the .versions directory:

1. Add new Extended keys for caching list metadata in .versions dir
2. Update upload/copy/multipart paths to cache metadata when creating versions
3. Update getLatestVersionEntryFromDirectoryEntry to use cached metadata
   (zero getEntry calls when cache is available)
4. Update updateLatestVersionAfterDeletion to maintain cache consistency

Performance improvement for N versioned objects:
- Before: N×1 to N×12 find operations per list request
- After: 0 extra find operations (all metadata from single scan)

This matches the efficiency of normal (non-versioned) object listing.
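
The caching idea above can be sketched as a round trip through a directory's extended attributes. This is a minimal illustration, not the code from this commit: the key names are taken from the extend_key.go change, while `listMeta`, `cacheListMeta`, `readListMeta`, and the decimal-string encoding of size and mtime are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strconv"
)

// Key names as added to extend_key.go in this commit.
const (
	ExtLatestVersionSizeKey  = "Seaweed-X-Amz-Latest-Version-Size"
	ExtLatestVersionETagKey  = "Seaweed-X-Amz-Latest-Version-ETag"
	ExtLatestVersionMtimeKey = "Seaweed-X-Amz-Latest-Version-Mtime"
	ExtLatestVersionOwnerKey = "Seaweed-X-Amz-Latest-Version-Owner"
)

// listMeta is a hypothetical holder for the fields a list response needs.
type listMeta struct {
	Size  uint64
	ETag  string
	Mtime int64
	Owner string
}

// cacheListMeta stores the metadata in the directory's extended attributes.
// Encoding numbers as decimal strings is an assumption of this sketch.
func cacheListMeta(ext map[string][]byte, m listMeta) {
	ext[ExtLatestVersionSizeKey] = []byte(strconv.FormatUint(m.Size, 10))
	ext[ExtLatestVersionETagKey] = []byte(m.ETag)
	ext[ExtLatestVersionMtimeKey] = []byte(strconv.FormatInt(m.Mtime, 10))
	if m.Owner != "" {
		ext[ExtLatestVersionOwnerKey] = []byte(m.Owner)
	}
}

// readListMeta rebuilds the metadata from the attributes alone. ok is false
// when the cache is missing or malformed, in which case a caller would fall
// back to looking up the version entry.
func readListMeta(ext map[string][]byte) (m listMeta, ok bool) {
	sizeBytes, hasSize := ext[ExtLatestVersionSizeKey]
	mtimeBytes, hasMtime := ext[ExtLatestVersionMtimeKey]
	etagBytes, hasETag := ext[ExtLatestVersionETagKey]
	if !hasSize || !hasMtime || !hasETag {
		return listMeta{}, false
	}
	size, err := strconv.ParseUint(string(sizeBytes), 10, 64)
	if err != nil {
		return listMeta{}, false
	}
	mtime, err := strconv.ParseInt(string(mtimeBytes), 10, 64)
	if err != nil {
		return listMeta{}, false
	}
	return listMeta{
		Size:  size,
		ETag:  string(etagBytes),
		Mtime: mtime,
		Owner: string(ext[ExtLatestVersionOwnerKey]),
	}, true
}

func main() {
	ext := map[string][]byte{}
	cacheListMeta(ext, listMeta{Size: 10485760, ETag: "\"abc-2\"", Mtime: 1700000000, Owner: "acct"})
	m, ok := readListMeta(ext)
	fmt.Println(m, ok)
}
```

With the cache populated, a listing needs only the directory entry it has already scanned; a cache miss is the only case that requires an extra lookup.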

* Update s3api_object_versioning.go

* s3api: fix ETag handling for versioned objects and simplify delete marker creation

- Add Md5 attribute to synthetic logicalEntry for single-part uploads to ensure
  filer.ETag() returns correct value in ListObjects response
- Simplify delete marker creation by initializing entry directly in mkFile callback
- Add bytes and encoding/hex imports for ETag parsing

* s3api: preserve default attributes in delete marker mkFile callback

Only modify Mtime field instead of replacing the entire Attributes struct,
preserving default values like Crtime, FileMode, Uid, and Gid that mkFile
initializes.
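
A small sketch of why patching one field differs from replacing the struct. The `attrs` and `entry` types, the `mkFile` stand-in, and the default values are hypothetical and only mirror the shape described above.

```go
package main

import "fmt"

// attrs is a hypothetical stand-in for the entry attributes.
type attrs struct {
	Mtime    int64
	Crtime   int64
	FileMode uint32
	Uid      uint32
	Gid      uint32
}

type entry struct{ Attributes *attrs }

// mkFile is a hypothetical stand-in: it fills in defaults first, then lets
// the callback adjust the entry. The default values are made up.
func mkFile(now int64, fn func(*entry)) *entry {
	e := &entry{Attributes: &attrs{Mtime: now, Crtime: now, FileMode: 0660, Uid: 1000, Gid: 1000}}
	if fn != nil {
		fn(e)
	}
	return e
}

func main() {
	// Replacing the struct silently resets every default to its zero value.
	replaced := mkFile(100, func(e *entry) { e.Attributes = &attrs{Mtime: 200} })
	// Patching only Mtime keeps Crtime, FileMode, Uid, and Gid intact.
	patched := mkFile(100, func(e *entry) { e.Attributes.Mtime = 200 })
	fmt.Println(replaced.Attributes.Crtime, patched.Attributes.Crtime)
}
```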

* s3api: fix ETag handling in newListEntry for multipart uploads

Prioritize ExtETagKey from Extended attributes before falling back to
filer.ETag(). This properly handles multipart upload ETags (format: md5-parts)
for versioned objects, where the synthetic entry has cached ETag metadata
but no chunks to calculate from.
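
For context, the md5-parts format is commonly computed as the MD5 of the concatenated binary MD5 digests of the parts, followed by a dash and the part count. The sketch below shows that convention only; whether filer.ETagChunks follows it exactly is not shown in this commit, and the function names here are made up for the example.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// multipartETag hashes the concatenated per-part MD5 digests and appends
// "-<number of parts>". It cannot be recomputed from the object bytes alone
// without knowing the part boundaries, which is why a cached value is needed.
func multipartETag(parts [][]byte) string {
	var digests []byte
	for _, p := range parts {
		sum := md5.Sum(p)
		digests = append(digests, sum[:]...)
	}
	final := md5.Sum(digests)
	return fmt.Sprintf("%s-%d", hex.EncodeToString(final[:]), len(parts))
}

// singlePartETag is the plain hex MD5 used for single-part uploads.
func singlePartETag(data []byte) string {
	sum := md5.Sum(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	parts := [][]byte{[]byte("part one"), []byte("part two")}
	fmt.Println(multipartETag(parts))
	fmt.Println(singlePartETag([]byte("part onepart two")))
}
```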

* s3api: reduce code duplication in delete marker creation

Extract deleteMarkerExtended map to be reused in both mkFile callback
and deleteMarkerEntry construction.

* test: add multipart upload versioning tests for ETag verification

Add tests to verify that multipart uploaded objects in versioned buckets
have correct ETags when listed:

- TestMultipartUploadVersioningListETag: Basic multipart upload with 2 parts
- TestMultipartUploadMultipleVersionsListETag: Multiple multipart versions
- TestMixedSingleAndMultipartVersionsListETag: Mix of single-part and multipart

These tests cover a bug where synthetic entries for versioned objects
didn't include proper ETag handling for multipart uploads.

* test: add delete marker test for multipart uploaded versioned objects

TestMultipartUploadDeleteMarkerListBehavior verifies:
- Delete marker creation hides object from ListObjectsV2
- ListObjectVersions shows both version and delete marker
- Version ETag (multipart format) is preserved after delete marker
- Object can be accessed by version ID after delete marker
- Removing delete marker restores object visibility

* refactor: address code review feedback

- test: use assert.ElementsMatch for ETag verification (more idiomatic)
- s3api: optimize newListEntry ETag logic (check ExtETagKey first)
- s3api: fix edge case in ETag parsing (>= 2 instead of > 2)

* s3api: prevent stale cached metadata and preserve existing extended attrs

- setCachedListMetadata: clear old cached keys before setting new values
  to prevent stale data when new version lacks certain fields (e.g., owner)
- createDeleteMarker: merge extended attributes instead of overwriting
  to preserve any existing metadata on the entry
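
The clear-then-set pattern can be sketched as follows. The key names come from extend_key.go; `cachedMetadataKeys` and `setCachedMetadata` are hypothetical names for this illustration and are not the functions in the commit.

```go
package main

import "fmt"

// Key names as added to extend_key.go in this commit.
const (
	ExtLatestVersionSizeKey  = "Seaweed-X-Amz-Latest-Version-Size"
	ExtLatestVersionETagKey  = "Seaweed-X-Amz-Latest-Version-ETag"
	ExtLatestVersionMtimeKey = "Seaweed-X-Amz-Latest-Version-Mtime"
	ExtLatestVersionOwnerKey = "Seaweed-X-Amz-Latest-Version-Owner"
)

// cachedMetadataKeys lists the keys owned by the cache (hypothetical helper).
var cachedMetadataKeys = []string{
	ExtLatestVersionSizeKey,
	ExtLatestVersionETagKey,
	ExtLatestVersionMtimeKey,
	ExtLatestVersionOwnerKey,
}

// setCachedMetadata removes every cache-owned key before writing the fresh
// values, so a field the new version lacks cannot survive from the old one.
// Keys outside the cache are left untouched.
func setCachedMetadata(ext map[string][]byte, fresh map[string][]byte) {
	for _, k := range cachedMetadataKeys {
		delete(ext, k)
	}
	for k, v := range fresh {
		ext[k] = v
	}
}

func main() {
	ext := map[string][]byte{
		ExtLatestVersionOwnerKey: []byte("old-owner"),
		"unrelated":              []byte("kept"),
	}
	// The new version has no owner; the old owner must not leak through.
	setCachedMetadata(ext, map[string][]byte{ExtLatestVersionSizeKey: []byte("42")})
	_, hasOwner := ext[ExtLatestVersionOwnerKey]
	fmt.Println(hasOwner, string(ext["unrelated"]))
}
```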

* s3api: extract clearCachedVersionMetadata to reduce code duplication

- clearCachedVersionMetadata: clears only metadata fields (size, mtime, etag, owner, deleteMarker)
- clearCachedListMetadata: now reuses clearCachedVersionMetadata + clears ID/filename
- setCachedListMetadata: uses clearCachedVersionMetadata (not clearCachedListMetadata
  because caller has already set ID/filename)

* s3api: share timestamp between version entry and cache entry

Capture versionMtime once before mkFile and reuse for both:
- versionEntry.Attributes.Mtime in the mkFile callback
- versionEntryForCache.Attributes.Mtime for list caching

This keeps list vs. HEAD LastModified timestamps aligned.
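
A minimal sketch of the capture-once pattern, using a hypothetical `entry` type and an injected clock so the behavior is observable. In the commit the value comes from time.Now().Unix().

```go
package main

import "fmt"

// entry is a hypothetical stand-in carrying only the timestamp.
type entry struct{ Mtime int64 }

// buildVersionAndCache reads the clock once and stamps both entries with
// the same value, so the two can never disagree across a second boundary.
func buildVersionAndCache(now func() int64) (version, cache entry) {
	mtime := now()
	version = entry{Mtime: mtime}
	cache = entry{Mtime: mtime}
	return version, cache
}

func main() {
	tick := int64(0)
	clock := func() int64 { tick++; return tick } // advances on every read
	v, c := buildVersionAndCache(clock)
	fmt.Println(v.Mtime == c.Mtime, tick)
}
```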

* s3api: remove amzAccountId variable shadowing in multipart upload

Extract amzAccountId before mkFile callback and reuse in both places,
similar to how versionMtime is handled. Avoids confusion from
redeclaring the same variable.
pull/7817/head
Chris Lu 2 days ago
committed by GitHub
parent
commit
bccef78082
  1. test/s3/versioning/s3_versioning_multipart_test.go (521 lines changed)
  2. weed/s3api/filer_multipart.go (25 lines changed)
  3. weed/s3api/s3_constants/extend_key.go (6 lines changed)
  4. weed/s3api/s3api_object_handlers.go (13 lines changed)
  5. weed/s3api/s3api_object_handlers_copy.go (3 lines changed)
  6. weed/s3api/s3api_object_handlers_list.go (54 lines changed)
  7. weed/s3api/s3api_object_handlers_put.go (9 lines changed)
  8. weed/s3api/s3api_object_versioning.go (214 lines changed)

521
test/s3/versioning/s3_versioning_multipart_test.go

@@ -0,0 +1,521 @@
package s3api
import (
"bytes"
"context"
"crypto/md5"
"fmt"
"strings"
"testing"
"github.com/aws/aws-sdk-go-v2/aws"
"github.com/aws/aws-sdk-go-v2/service/s3"
"github.com/aws/aws-sdk-go-v2/service/s3/types"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
// TestMultipartUploadVersioningListETag tests that multipart uploaded objects
// in versioned buckets have correct ETags when listed.
// This covers a bug where synthetic entries for versioned objects didn't include
// proper ETag handling for multipart uploads (ETags with format "<md5>-<parts>").
func TestMultipartUploadVersioningListETag(t *testing.T) {
client := getS3Client(t)
bucketName := getNewBucketName()
// Create bucket
createBucket(t, client, bucketName)
defer deleteBucket(t, client, bucketName)
// Enable versioning
_, err := client.PutBucketVersioning(context.TODO(), &s3.PutBucketVersioningInput{
Bucket: aws.String(bucketName),
VersioningConfiguration: &types.VersioningConfiguration{
Status: types.BucketVersioningStatusEnabled,
},
})
require.NoError(t, err, "Failed to enable versioning")
// Create multipart upload
objectKey := "multipart-test-object"
createResp, err := client.CreateMultipartUpload(context.TODO(), &s3.CreateMultipartUploadInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
})
require.NoError(t, err, "Failed to create multipart upload")
uploadId := *createResp.UploadId
// Upload 2 parts (minimum 5MB per part except last)
partSize := 5 * 1024 * 1024 // 5MB
part1Data := bytes.Repeat([]byte("a"), partSize)
part2Data := bytes.Repeat([]byte("b"), partSize)
// Calculate MD5 for each part
part1MD5 := md5.Sum(part1Data)
part2MD5 := md5.Sum(part2Data)
// Upload part 1
uploadPart1Resp, err := client.UploadPart(context.TODO(), &s3.UploadPartInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
UploadId: aws.String(uploadId),
PartNumber: aws.Int32(1),
Body: bytes.NewReader(part1Data),
})
require.NoError(t, err, "Failed to upload part 1")
// Upload part 2
uploadPart2Resp, err := client.UploadPart(context.TODO(), &s3.UploadPartInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
UploadId: aws.String(uploadId),
PartNumber: aws.Int32(2),
Body: bytes.NewReader(part2Data),
})
require.NoError(t, err, "Failed to upload part 2")
// Complete multipart upload
completeResp, err := client.CompleteMultipartUpload(context.TODO(), &s3.CompleteMultipartUploadInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
UploadId: aws.String(uploadId),
MultipartUpload: &types.CompletedMultipartUpload{
Parts: []types.CompletedPart{
{
ETag: uploadPart1Resp.ETag,
PartNumber: aws.Int32(1),
},
{
ETag: uploadPart2Resp.ETag,
PartNumber: aws.Int32(2),
},
},
},
})
require.NoError(t, err, "Failed to complete multipart upload")
// Verify the ETag from CompleteMultipartUpload has the multipart format (md5-parts)
completeETag := strings.Trim(*completeResp.ETag, "\"")
assert.Contains(t, completeETag, "-", "Multipart ETag should contain '-' (format: md5-parts)")
assert.True(t, strings.HasSuffix(completeETag, "-2"), "Multipart ETag should end with '-2' for 2 parts")
t.Logf("CompleteMultipartUpload ETag: %s", completeETag)
t.Logf("Part 1 MD5: %x", part1MD5)
t.Logf("Part 2 MD5: %x", part2MD5)
// HeadObject should return the same ETag
headResp, err := client.HeadObject(context.TODO(), &s3.HeadObjectInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
})
require.NoError(t, err, "Failed to head object")
headETag := strings.Trim(*headResp.ETag, "\"")
assert.Equal(t, completeETag, headETag, "HeadObject ETag should match CompleteMultipartUpload ETag")
// ListObjectsV2 should return the same ETag
listResp, err := client.ListObjectsV2(context.TODO(), &s3.ListObjectsV2Input{
Bucket: aws.String(bucketName),
Prefix: aws.String(objectKey),
})
require.NoError(t, err, "Failed to list objects")
require.Len(t, listResp.Contents, 1, "Should have exactly one object")
listETag := strings.Trim(*listResp.Contents[0].ETag, "\"")
assert.Equal(t, completeETag, listETag, "ListObjectsV2 ETag should match CompleteMultipartUpload ETag")
assert.NotEmpty(t, listETag, "ListObjectsV2 ETag should not be empty")
t.Logf("ListObjectsV2 ETag: %s", listETag)
// ListObjectVersions should also return the correct ETag
versionsResp, err := client.ListObjectVersions(context.TODO(), &s3.ListObjectVersionsInput{
Bucket: aws.String(bucketName),
Prefix: aws.String(objectKey),
})
require.NoError(t, err, "Failed to list object versions")
require.Len(t, versionsResp.Versions, 1, "Should have exactly one version")
versionETag := strings.Trim(*versionsResp.Versions[0].ETag, "\"")
assert.Equal(t, completeETag, versionETag, "ListObjectVersions ETag should match CompleteMultipartUpload ETag")
assert.NotEmpty(t, versionETag, "ListObjectVersions ETag should not be empty")
t.Logf("ListObjectVersions ETag: %s", versionETag)
}
// TestMultipartUploadMultipleVersionsListETag tests that multiple versions
// of multipart uploaded objects all have correct ETags when listed.
func TestMultipartUploadMultipleVersionsListETag(t *testing.T) {
client := getS3Client(t)
bucketName := getNewBucketName()
// Create bucket
createBucket(t, client, bucketName)
defer deleteBucket(t, client, bucketName)
// Enable versioning
_, err := client.PutBucketVersioning(context.TODO(), &s3.PutBucketVersioningInput{
Bucket: aws.String(bucketName),
VersioningConfiguration: &types.VersioningConfiguration{
Status: types.BucketVersioningStatusEnabled,
},
})
require.NoError(t, err, "Failed to enable versioning")
objectKey := "multipart-multi-version-object"
partSize := 5 * 1024 * 1024 // 5MB
var expectedETags []string
// Create 3 versions using multipart upload
for version := 1; version <= 3; version++ {
// Create multipart upload
createResp, err := client.CreateMultipartUpload(context.TODO(), &s3.CreateMultipartUploadInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
})
require.NoError(t, err, "Failed to create multipart upload for version %d", version)
uploadId := *createResp.UploadId
// Create unique data for each version
partData := bytes.Repeat([]byte(fmt.Sprintf("%d", version)), partSize)
// Upload single part (still results in multipart ETag format)
uploadPartResp, err := client.UploadPart(context.TODO(), &s3.UploadPartInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
UploadId: aws.String(uploadId),
PartNumber: aws.Int32(1),
Body: bytes.NewReader(partData),
})
require.NoError(t, err, "Failed to upload part for version %d", version)
// Complete multipart upload
completeResp, err := client.CompleteMultipartUpload(context.TODO(), &s3.CompleteMultipartUploadInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
UploadId: aws.String(uploadId),
MultipartUpload: &types.CompletedMultipartUpload{
Parts: []types.CompletedPart{
{
ETag: uploadPartResp.ETag,
PartNumber: aws.Int32(1),
},
},
},
})
require.NoError(t, err, "Failed to complete multipart upload for version %d", version)
etag := strings.Trim(*completeResp.ETag, "\"")
expectedETags = append(expectedETags, etag)
t.Logf("Version %d ETag: %s", version, etag)
}
// ListObjectVersions should return all versions with correct ETags
versionsResp, err := client.ListObjectVersions(context.TODO(), &s3.ListObjectVersionsInput{
Bucket: aws.String(bucketName),
Prefix: aws.String(objectKey),
})
require.NoError(t, err, "Failed to list object versions")
require.Len(t, versionsResp.Versions, 3, "Should have exactly 3 versions")
// Collect ETags from the listing
var listedETags []string
for _, v := range versionsResp.Versions {
etag := strings.Trim(*v.ETag, "\"")
listedETags = append(listedETags, etag)
assert.NotEmpty(t, etag, "Version ETag should not be empty")
assert.Contains(t, etag, "-", "Multipart ETag should contain '-'")
}
t.Logf("Expected ETags: %v", expectedETags)
t.Logf("Listed ETags: %v", listedETags)
// Verify all expected ETags are present (order may differ due to version ordering)
assert.ElementsMatch(t, expectedETags, listedETags, "Listed ETags should match all expected ETags")
// Regular ListObjectsV2 should return only the latest version with correct ETag
listResp, err := client.ListObjectsV2(context.TODO(), &s3.ListObjectsV2Input{
Bucket: aws.String(bucketName),
Prefix: aws.String(objectKey),
})
require.NoError(t, err, "Failed to list objects")
require.Len(t, listResp.Contents, 1, "Should have exactly one object in regular listing")
listETag := strings.Trim(*listResp.Contents[0].ETag, "\"")
// The latest version (version 3) should be the one shown
assert.Equal(t, expectedETags[2], listETag, "ListObjectsV2 should show latest version's ETag")
}
// TestMixedSingleAndMultipartVersionsListETag tests that a mix of
// single-part and multipart uploaded versions all have correct ETags.
func TestMixedSingleAndMultipartVersionsListETag(t *testing.T) {
client := getS3Client(t)
bucketName := getNewBucketName()
// Create bucket
createBucket(t, client, bucketName)
defer deleteBucket(t, client, bucketName)
// Enable versioning
_, err := client.PutBucketVersioning(context.TODO(), &s3.PutBucketVersioningInput{
Bucket: aws.String(bucketName),
VersioningConfiguration: &types.VersioningConfiguration{
Status: types.BucketVersioningStatusEnabled,
},
})
require.NoError(t, err, "Failed to enable versioning")
objectKey := "mixed-upload-versions"
// Version 1: Regular PutObject (single-part, pure MD5 ETag)
content1 := []byte("This is version 1 content - single part upload")
putResp1, err := client.PutObject(context.TODO(), &s3.PutObjectInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
Body: bytes.NewReader(content1),
})
require.NoError(t, err, "Failed to put version 1")
etag1 := strings.Trim(*putResp1.ETag, "\"")
assert.NotContains(t, etag1, "-", "Single-part ETag should not contain '-'")
t.Logf("Version 1 (PutObject) ETag: %s", etag1)
// Version 2: Multipart upload
partSize := 5 * 1024 * 1024
partData := bytes.Repeat([]byte("x"), partSize)
createResp, err := client.CreateMultipartUpload(context.TODO(), &s3.CreateMultipartUploadInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
})
require.NoError(t, err, "Failed to create multipart upload")
uploadPartResp, err := client.UploadPart(context.TODO(), &s3.UploadPartInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
UploadId: createResp.UploadId,
PartNumber: aws.Int32(1),
Body: bytes.NewReader(partData),
})
require.NoError(t, err, "Failed to upload part")
completeResp, err := client.CompleteMultipartUpload(context.TODO(), &s3.CompleteMultipartUploadInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
UploadId: createResp.UploadId,
MultipartUpload: &types.CompletedMultipartUpload{
Parts: []types.CompletedPart{
{
ETag: uploadPartResp.ETag,
PartNumber: aws.Int32(1),
},
},
},
})
require.NoError(t, err, "Failed to complete multipart upload")
etag2 := strings.Trim(*completeResp.ETag, "\"")
assert.Contains(t, etag2, "-", "Multipart ETag should contain '-'")
t.Logf("Version 2 (Multipart) ETag: %s", etag2)
// Version 3: Another regular PutObject
content3 := []byte("This is version 3 content - another single part upload")
putResp3, err := client.PutObject(context.TODO(), &s3.PutObjectInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
Body: bytes.NewReader(content3),
})
require.NoError(t, err, "Failed to put version 3")
etag3 := strings.Trim(*putResp3.ETag, "\"")
assert.NotContains(t, etag3, "-", "Single-part ETag should not contain '-'")
t.Logf("Version 3 (PutObject) ETag: %s", etag3)
// ListObjectVersions should return all 3 versions with correct ETags
versionsResp, err := client.ListObjectVersions(context.TODO(), &s3.ListObjectVersionsInput{
Bucket: aws.String(bucketName),
Prefix: aws.String(objectKey),
})
require.NoError(t, err, "Failed to list object versions")
require.Len(t, versionsResp.Versions, 3, "Should have exactly 3 versions")
var listedETags []string
for _, v := range versionsResp.Versions {
etag := strings.Trim(*v.ETag, "\"")
assert.NotEmpty(t, etag, "Version ETag should not be empty")
listedETags = append(listedETags, etag)
t.Logf("Listed version %s ETag: %s, IsLatest: %v", *v.VersionId, etag, *v.IsLatest)
}
// Verify all ETags were found
assert.ElementsMatch(t, []string{etag1, etag2, etag3}, listedETags, "Listed ETags should match all expected ETags")
// Regular ListObjectsV2 should return only the latest (version 3)
listResp, err := client.ListObjectsV2(context.TODO(), &s3.ListObjectsV2Input{
Bucket: aws.String(bucketName),
Prefix: aws.String(objectKey),
})
require.NoError(t, err, "Failed to list objects")
require.Len(t, listResp.Contents, 1, "Should have exactly one object")
listETag := strings.Trim(*listResp.Contents[0].ETag, "\"")
assert.Equal(t, etag3, listETag, "ListObjectsV2 should show latest version's ETag (version 3)")
}
// TestMultipartUploadDeleteMarkerListBehavior tests that delete markers work correctly
// with multipart uploaded objects in versioned buckets.
func TestMultipartUploadDeleteMarkerListBehavior(t *testing.T) {
client := getS3Client(t)
bucketName := getNewBucketName()
// Create bucket
createBucket(t, client, bucketName)
defer deleteBucket(t, client, bucketName)
// Enable versioning
_, err := client.PutBucketVersioning(context.TODO(), &s3.PutBucketVersioningInput{
Bucket: aws.String(bucketName),
VersioningConfiguration: &types.VersioningConfiguration{
Status: types.BucketVersioningStatusEnabled,
},
})
require.NoError(t, err, "Failed to enable versioning")
objectKey := "multipart-delete-marker-test"
partSize := 5 * 1024 * 1024 // 5MB
// Create multipart upload
createResp, err := client.CreateMultipartUpload(context.TODO(), &s3.CreateMultipartUploadInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
})
require.NoError(t, err, "Failed to create multipart upload")
// Upload 2 parts
part1Data := bytes.Repeat([]byte("a"), partSize)
part2Data := bytes.Repeat([]byte("b"), partSize)
uploadPart1Resp, err := client.UploadPart(context.TODO(), &s3.UploadPartInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
UploadId: createResp.UploadId,
PartNumber: aws.Int32(1),
Body: bytes.NewReader(part1Data),
})
require.NoError(t, err, "Failed to upload part 1")
uploadPart2Resp, err := client.UploadPart(context.TODO(), &s3.UploadPartInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
UploadId: createResp.UploadId,
PartNumber: aws.Int32(2),
Body: bytes.NewReader(part2Data),
})
require.NoError(t, err, "Failed to upload part 2")
// Complete multipart upload
completeResp, err := client.CompleteMultipartUpload(context.TODO(), &s3.CompleteMultipartUploadInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
UploadId: createResp.UploadId,
MultipartUpload: &types.CompletedMultipartUpload{
Parts: []types.CompletedPart{
{ETag: uploadPart1Resp.ETag, PartNumber: aws.Int32(1)},
{ETag: uploadPart2Resp.ETag, PartNumber: aws.Int32(2)},
},
},
})
require.NoError(t, err, "Failed to complete multipart upload")
multipartETag := strings.Trim(*completeResp.ETag, "\"")
multipartVersionId := *completeResp.VersionId
t.Logf("Multipart upload completed: ETag=%s, VersionId=%s", multipartETag, multipartVersionId)
// Verify object is visible in ListObjectsV2
listBeforeDelete, err := client.ListObjectsV2(context.TODO(), &s3.ListObjectsV2Input{
Bucket: aws.String(bucketName),
Prefix: aws.String(objectKey),
})
require.NoError(t, err, "Failed to list objects before delete")
require.Len(t, listBeforeDelete.Contents, 1, "Object should be visible before delete")
assert.Equal(t, multipartETag, strings.Trim(*listBeforeDelete.Contents[0].ETag, "\""),
"Listed ETag should match multipart ETag before delete")
// Delete object (creates delete marker)
deleteResp, err := client.DeleteObject(context.TODO(), &s3.DeleteObjectInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
})
require.NoError(t, err, "Failed to delete object")
require.NotNil(t, deleteResp.DeleteMarker, "Should create delete marker")
assert.True(t, *deleteResp.DeleteMarker, "DeleteMarker should be true")
require.NotNil(t, deleteResp.VersionId, "Delete marker should have version ID")
deleteMarkerVersionId := *deleteResp.VersionId
t.Logf("Delete marker created: VersionId=%s", deleteMarkerVersionId)
// ListObjectsV2 should NOT show the object anymore
listAfterDelete, err := client.ListObjectsV2(context.TODO(), &s3.ListObjectsV2Input{
Bucket: aws.String(bucketName),
Prefix: aws.String(objectKey),
})
require.NoError(t, err, "Failed to list objects after delete")
assert.Empty(t, listAfterDelete.Contents, "Object should NOT be visible after delete marker")
// ListObjectVersions should show both the original version AND the delete marker
versionsResp, err := client.ListObjectVersions(context.TODO(), &s3.ListObjectVersionsInput{
Bucket: aws.String(bucketName),
Prefix: aws.String(objectKey),
})
require.NoError(t, err, "Failed to list object versions")
// Should have 1 version (the multipart object)
require.Len(t, versionsResp.Versions, 1, "Should have exactly 1 version (the multipart object)")
version := versionsResp.Versions[0]
assert.Equal(t, multipartVersionId, *version.VersionId, "Version ID should match")
assert.Equal(t, multipartETag, strings.Trim(*version.ETag, "\""), "Version ETag should match multipart ETag")
assert.False(t, *version.IsLatest, "Multipart version should NOT be latest (delete marker is latest)")
// Should have 1 delete marker
require.Len(t, versionsResp.DeleteMarkers, 1, "Should have exactly 1 delete marker")
deleteMarker := versionsResp.DeleteMarkers[0]
assert.Equal(t, deleteMarkerVersionId, *deleteMarker.VersionId, "Delete marker version ID should match")
assert.True(t, *deleteMarker.IsLatest, "Delete marker should be latest")
t.Logf("ListObjectVersions: 1 version (ETag=%s), 1 delete marker", multipartETag)
// Access the specific version by version ID - should still work
getResp, err := client.GetObject(context.TODO(), &s3.GetObjectInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
VersionId: aws.String(multipartVersionId),
})
require.NoError(t, err, "Should be able to get object by version ID after delete marker")
defer getResp.Body.Close()
assert.Equal(t, multipartETag, strings.Trim(*getResp.ETag, "\""),
"GetObject with version ID should return correct ETag")
assert.Equal(t, int64(partSize*2), *getResp.ContentLength,
"GetObject with version ID should return correct size")
t.Logf("Successfully retrieved version %s after delete marker", multipartVersionId)
// Delete the delete marker to "undelete" the object
_, err = client.DeleteObject(context.TODO(), &s3.DeleteObjectInput{
Bucket: aws.String(bucketName),
Key: aws.String(objectKey),
VersionId: aws.String(deleteMarkerVersionId),
})
require.NoError(t, err, "Failed to delete the delete marker")
// ListObjectsV2 should show the object again
listAfterUndelete, err := client.ListObjectsV2(context.TODO(), &s3.ListObjectsV2Input{
Bucket: aws.String(bucketName),
Prefix: aws.String(objectKey),
})
require.NoError(t, err, "Failed to list objects after undelete")
require.Len(t, listAfterUndelete.Contents, 1, "Object should be visible again after removing delete marker")
assert.Equal(t, multipartETag, strings.Trim(*listAfterUndelete.Contents[0].ETag, "\""),
"Undeleted object should have correct multipart ETag")
t.Logf("Object restored after delete marker removal, ETag=%s", multipartETag)
}

25
weed/s3api/filer_multipart.go

@@ -367,6 +367,10 @@ func (s3a *S3ApiServer) completeMultipartUpload(r *http.Request, input *s3.Compl
 versionFileName := s3a.getVersionFileName(versionId)
 versionDir := dirName + "/" + entryName + s3_constants.VersionsFolder
+// Capture timestamp and owner once for consistency between version entry and cache entry
+versionMtime := time.Now().Unix()
+amzAccountId := r.Header.Get(s3_constants.AmzAccountId)
 // Create the version file in the .versions directory
 err = s3a.mkFile(versionDir, versionFileName, finalParts, func(versionEntry *filer_pb.Entry) {
 if versionEntry.Extended == nil {
@@ -382,7 +386,6 @@ func (s3a *S3ApiServer) completeMultipartUpload(r *http.Request, input *s3.Compl
 }
 // Set object owner for versioned multipart objects
-amzAccountId := r.Header.Get(s3_constants.AmzAccountId)
 if amzAccountId != "" {
 versionEntry.Extended[s3_constants.ExtAmzOwnerKey] = []byte(amzAccountId)
 }
@@ -405,6 +408,7 @@ func (s3a *S3ApiServer) completeMultipartUpload(r *http.Request, input *s3.Compl
 versionEntry.Attributes.Mime = mime
 }
 versionEntry.Attributes.FileSize = uint64(offset)
+versionEntry.Attributes.Mtime = versionMtime
 })
 if err != nil {
@@ -412,8 +416,25 @@ func (s3a *S3ApiServer) completeMultipartUpload(r *http.Request, input *s3.Compl
 return nil, s3err.ErrInternalError
 }
+// Construct entry with metadata for caching in .versions directory
+// Reuse versionMtime to keep list vs. HEAD timestamps aligned
+etag := "\"" + filer.ETagChunks(finalParts) + "\""
+versionEntryForCache := &filer_pb.Entry{
+Attributes: &filer_pb.FuseAttributes{
+FileSize: uint64(offset),
+Mtime: versionMtime,
+},
+Extended: map[string][]byte{
+s3_constants.ExtETagKey: []byte(etag),
+},
+}
+if amzAccountId != "" {
+versionEntryForCache.Extended[s3_constants.ExtAmzOwnerKey] = []byte(amzAccountId)
+}
 // Update the .versions directory metadata to indicate this is the latest version
-err = s3a.updateLatestVersionInDirectory(*input.Bucket, *input.Key, versionId, versionFileName)
+// Pass entry to cache its metadata for single-scan list efficiency
+err = s3a.updateLatestVersionInDirectory(*input.Bucket, *input.Key, versionId, versionFileName, versionEntryForCache)
 if err != nil {
 glog.Errorf("completeMultipartUpload: failed to update latest version in directory: %v", err)
 return nil, s3err.ErrInternalError

6
weed/s3api/s3_constants/extend_key.go

@@ -11,6 +11,12 @@ const (
 ExtETagKey = "Seaweed-X-Amz-ETag"
 ExtLatestVersionIdKey = "Seaweed-X-Amz-Latest-Version-Id"
 ExtLatestVersionFileNameKey = "Seaweed-X-Amz-Latest-Version-File-Name"
+// Cached list metadata in .versions directory for single-scan efficiency
+ExtLatestVersionSizeKey = "Seaweed-X-Amz-Latest-Version-Size"
+ExtLatestVersionETagKey = "Seaweed-X-Amz-Latest-Version-ETag"
+ExtLatestVersionMtimeKey = "Seaweed-X-Amz-Latest-Version-Mtime"
+ExtLatestVersionOwnerKey = "Seaweed-X-Amz-Latest-Version-Owner"
+ExtLatestVersionIsDeleteMarker = "Seaweed-X-Amz-Latest-Version-Is-Delete-Marker"
 ExtMultipartObjectKey = "key"
 // Bucket Policy

13
weed/s3api/s3api_object_handlers.go

@@ -386,10 +386,21 @@ func newListEntry(entry *filer_pb.Entry, key string, dir string, name string, bu
 if encodingTypeUrl {
 key = urlPathEscape(key)
 }
+// Determine ETag: prioritize ExtETagKey for versioned objects (supports multipart ETags),
+// then fall back to filer.ETag() which uses Md5 attribute or calculates from chunks
+var etag string
+if entry.Extended != nil {
+if etagBytes, hasETag := entry.Extended[s3_constants.ExtETagKey]; hasETag {
+etag = string(etagBytes)
+}
+}
+if etag == "" {
+etag = "\"" + filer.ETag(entry) + "\""
+}
 listEntry = ListEntry{
 Key: key,
 LastModified: time.Unix(entry.Attributes.Mtime, 0).UTC(),
-ETag: "\"" + filer.ETag(entry) + "\"",
+ETag: etag,
 Size: int64(filer.FileSize(entry)),
 StorageClass: StorageClass(storageClass),
 }

3
weed/s3api/s3api_object_handlers_copy.go

@@ -313,7 +313,8 @@ func (s3a *S3ApiServer) CopyObjectHandler(w http.ResponseWriter, r *http.Request
 }
 // Update the .versions directory metadata
-err = s3a.updateLatestVersionInDirectory(dstBucket, dstObject, dstVersionId, versionFileName)
+// Pass dstEntry to cache its metadata for single-scan list efficiency
+err = s3a.updateLatestVersionInDirectory(dstBucket, dstObject, dstVersionId, versionFileName, dstEntry)
 if err != nil {
 glog.Errorf("CopyObjectHandler: failed to update latest version in directory: %v", err)
 s3err.WriteErrorResponse(w, r, s3err.ErrInternalError)

54
weed/s3api/s3api_object_handlers_list.go

@@ -3,11 +3,11 @@ package s3api
 import (
 "context"
 "encoding/xml"
+"errors"
 "fmt"
 "io"
 "net/http"
 "net/url"
-"path"
 "sort"
 "strconv"
 "strings"
@@ -491,7 +491,8 @@ func (s3a *S3ApiServer) doListFilerEntries(client filer_pb.SeaweedFilerClient, d
 }
 // Track .versions directories found in this directory for later processing
-var versionsDirs []string
+// Store the full entry to avoid additional getEntry calls (N+1 query optimization)
+var versionsDirs []*filer_pb.Entry
 for {
 resp, recvErr := stream.Recv()
@@ -528,9 +529,10 @@ func (s3a *S3ApiServer) doListFilerEntries(client filer_pb.SeaweedFilerClient, d
 }
 // Skip .versions directories in regular list operations but track them for logical object creation
+// Store the full entry to avoid additional getEntry calls later
 if strings.HasSuffix(entry.Name, s3_constants.VersionsFolder) {
 glog.V(4).Infof("Found .versions directory: %s", entry.Name)
-versionsDirs = append(versionsDirs, entry.Name)
+versionsDirs = append(versionsDirs, entry)
 continue
 }
@@ -568,6 +570,7 @@ func (s3a *S3ApiServer) doListFilerEntries(client filer_pb.SeaweedFilerClient, d
 // After processing all regular entries, handle versioned objects
 // Create logical entries for objects that have .versions directories
+// OPTIMIZATION: Use the already-fetched .versions directory entry to avoid N+1 queries
 for _, versionsDir := range versionsDirs {
 if cursor.maxKeys <= 0 {
 cursor.isTruncated = true
@@ -576,10 +579,10 @@ func (s3a *S3ApiServer) doListFilerEntries(client filer_pb.SeaweedFilerClient, d
 // Update nextMarker to ensure pagination advances past this .versions directory
 // This is critical to prevent infinite loops when results are truncated
-nextMarker = versionsDir
+nextMarker = versionsDir.Name
 // Extract object name from .versions directory name (remove .versions suffix)
-baseObjectName := strings.TrimSuffix(versionsDir, s3_constants.VersionsFolder)
+baseObjectName := strings.TrimSuffix(versionsDir.Name, s3_constants.VersionsFolder)
 // Construct full object path relative to bucket
 // dir is something like "/buckets/sea-test-1/Veeam/Backup/vbr/Config"
@@ -602,12 +605,17 @@ func (s3a *S3ApiServer) doListFilerEntries(client filer_pb.SeaweedFilerClient, d
 glog.V(4).Infof("Processing versioned object: baseObjectName=%s, bucketRelativePath=%s, fullObjectPath=%s",
baseObjectName, bucketRelativePath, fullObjectPath) baseObjectName, bucketRelativePath, fullObjectPath)
// Get the latest version information for this object
if latestVersionEntry, latestVersionErr := s3a.getLatestVersionEntryForListOperation(bucketName, fullObjectPath); latestVersionErr == nil {
// OPTIMIZATION: Use metadata from the already-fetched .versions directory entry
// This avoids additional getEntry calls which cause high "find" usage
if latestVersionEntry, err := s3a.getLatestVersionEntryFromDirectoryEntry(bucketName, fullObjectPath, versionsDir); err == nil {
glog.V(4).Infof("Creating logical entry for versioned object: %s", fullObjectPath) glog.V(4).Infof("Creating logical entry for versioned object: %s", fullObjectPath)
eachEntryFn(dir, latestVersionEntry) eachEntryFn(dir, latestVersionEntry)
} else if errors.Is(err, ErrDeleteMarker) {
// Expected: latest version is a delete marker, object should not appear in list
glog.V(4).Infof("Skipping versioned object %s: delete marker", fullObjectPath)
} else { } else {
glog.V(4).Infof("Failed to get latest version for %s: %v", fullObjectPath, latestVersionErr)
// Unexpected failure: missing metadata, fetch error, etc.
glog.V(3).Infof("Skipping versioned object %s due to error: %v", fullObjectPath, err)
} }
} }
@ -712,36 +720,6 @@ func (s3a *S3ApiServer) ensureDirectoryAllEmpty(filerClient filer_pb.SeaweedFile
return true, nil return true, nil
} }
// getLatestVersionEntryForListOperation gets the latest version of an object and creates a logical entry for list operations
// This is used to show versioned objects as logical object names in regular list operations
func (s3a *S3ApiServer) getLatestVersionEntryForListOperation(bucket, object string) (*filer_pb.Entry, error) {
// Get the latest version entry
latestVersionEntry, err := s3a.getLatestObjectVersion(bucket, object)
if err != nil {
return nil, fmt.Errorf("failed to get latest version: %w", err)
}
// Check if this is a delete marker (should not be shown in regular list)
if latestVersionEntry.Extended != nil {
if deleteMarker, exists := latestVersionEntry.Extended[s3_constants.ExtDeleteMarkerKey]; exists && string(deleteMarker) == "true" {
return nil, fmt.Errorf("latest version is a delete marker")
}
}
// Create a logical entry that appears to be stored at the object path (not the versioned path)
// This allows the list operation to show the logical object name while preserving all metadata
// Use path.Base to get just the filename, since the entry.Name should be the local name only
// (the directory path is already included in the 'dir' parameter passed to eachEntryFn)
logicalEntry := &filer_pb.Entry{
Name: path.Base(object),
IsDirectory: false,
Attributes: latestVersionEntry.Attributes,
Extended: latestVersionEntry.Extended,
Chunks: latestVersionEntry.Chunks,
}
return logicalEntry, nil
}
// compareWithDelimiter compares two strings for sorting, treating the delimiter character // compareWithDelimiter compares two strings for sorting, treating the delimiter character
// as having lower precedence than other characters to match AWS S3 behavior. // as having lower precedence than other characters to match AWS S3 behavior.
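The list-handler change above separates an expected condition (the latest version is a delete marker) from an unexpected failure by comparing against a sentinel error with `errors.Is`. The following is a minimal sketch of that pattern only. `lookupLatest`, `classify`, and the map keys are hypothetical stand-ins for illustration and are not SeaweedFS APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// errDeleteMarker plays the role of ErrDeleteMarker in the diff:
// an expected condition rather than a real failure.
var errDeleteMarker = errors.New("latest version is a delete marker")

// lookupLatest is a hypothetical stand-in for the directory-entry lookup.
// It reads only an in-memory map, which stands for the Extended attributes
// of an already-fetched .versions directory entry. The key names are made up.
func lookupLatest(extended map[string]string) (string, error) {
	id, ok := extended["latestVersionId"]
	if !ok {
		return "", fmt.Errorf("missing latest version ID metadata")
	}
	if extended["isDeleteMarker"] == "true" {
		// Wrapping with %w keeps the sentinel visible to errors.Is.
		return "", fmt.Errorf("object hidden: %w", errDeleteMarker)
	}
	return id, nil
}

// classify mirrors the three branches in the list loop:
// emit the entry, skip quietly, or skip and log the error.
func classify(extended map[string]string) string {
	_, err := lookupLatest(extended)
	switch {
	case err == nil:
		return "listed"
	case errors.Is(err, errDeleteMarker):
		return "skipped-delete-marker"
	default:
		return "skipped-error"
	}
}

func main() {
	fmt.Println(classify(map[string]string{"latestVersionId": "v1"}))
	fmt.Println(classify(map[string]string{"latestVersionId": "v2", "isDeleteMarker": "true"}))
	fmt.Println(classify(map[string]string{}))
}
```

Using a sentinel instead of matching on the error string lets the caller log delete markers at a quieter level than genuine failures, which is what the new `else if` branch in the diff does.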

9
weed/s3api/s3api_object_handlers_put.go

@ -1035,7 +1035,8 @@ func (s3a *S3ApiServer) putVersionedObject(r *http.Request, bucket, object strin
} }
// Update the .versions directory metadata to indicate this is the latest version // Update the .versions directory metadata to indicate this is the latest version
err = s3a.updateLatestVersionInDirectory(bucket, normalizedObject, versionId, versionFileName)
// Pass versionEntry to cache its metadata for single-scan list efficiency
err = s3a.updateLatestVersionInDirectory(bucket, normalizedObject, versionId, versionFileName, versionEntry)
if err != nil { if err != nil {
glog.Errorf("putVersionedObject: failed to update latest version in directory: %v", err) glog.Errorf("putVersionedObject: failed to update latest version in directory: %v", err)
return "", "", s3err.ErrInternalError, SSEResponseMetadata{} return "", "", s3err.ErrInternalError, SSEResponseMetadata{}
@ -1045,7 +1046,8 @@ func (s3a *S3ApiServer) putVersionedObject(r *http.Request, bucket, object strin
} }
// updateLatestVersionInDirectory updates the .versions directory metadata to indicate the latest version // updateLatestVersionInDirectory updates the .versions directory metadata to indicate the latest version
func (s3a *S3ApiServer) updateLatestVersionInDirectory(bucket, object, versionId, versionFileName string) error {
// versionEntry contains the metadata (size, ETag, mtime, owner) to cache for single-scan list efficiency
func (s3a *S3ApiServer) updateLatestVersionInDirectory(bucket, object, versionId, versionFileName string, versionEntry *filer_pb.Entry) error {
bucketDir := s3a.option.BucketsPath + "/" + bucket bucketDir := s3a.option.BucketsPath + "/" + bucket
versionsObjectPath := object + s3_constants.VersionsFolder versionsObjectPath := object + s3_constants.VersionsFolder
@ -1078,6 +1080,9 @@ func (s3a *S3ApiServer) updateLatestVersionInDirectory(bucket, object, versionId
versionsEntry.Extended[s3_constants.ExtLatestVersionIdKey] = []byte(versionId) versionsEntry.Extended[s3_constants.ExtLatestVersionIdKey] = []byte(versionId)
versionsEntry.Extended[s3_constants.ExtLatestVersionFileNameKey] = []byte(versionFileName) versionsEntry.Extended[s3_constants.ExtLatestVersionFileNameKey] = []byte(versionFileName)
// Cache list metadata for single-scan efficiency (avoids extra getEntry per object during list)
setCachedListMetadata(versionsEntry, versionEntry)
// Update the .versions directory entry with metadata // Update the .versions directory entry with metadata
err = s3a.mkFile(bucketDir, versionsObjectPath, versionsEntry.Chunks, func(updatedEntry *filer_pb.Entry) { err = s3a.mkFile(bucketDir, versionsObjectPath, versionsEntry.Chunks, func(updatedEntry *filer_pb.Entry) {
updatedEntry.Extended = versionsEntry.Extended updatedEntry.Extended = versionsEntry.Extended
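The caching step added to `updateLatestVersionInDirectory` stores the latest version's size, mtime, and ETag as byte strings next to the version ID, so a later list can read them from the directory entry alone. A rough, self-contained sketch of that write-then-read round trip is shown below. The key names, the `versionMeta` type, and both helper functions are assumptions made for this example. The real constants live in `s3_constants` and their string values are not shown in this diff.

```go
package main

import (
	"fmt"
	"strconv"
)

// Hypothetical key names, chosen only for this sketch.
const (
	keySize  = "example-latest-size"
	keyMtime = "example-latest-mtime"
	keyETag  = "example-latest-etag"
)

// versionMeta is a simplified stand-in for the fields the diff caches.
type versionMeta struct {
	Size  uint64
	Mtime int64
	ETag  string // empty means the version carries no ETag
}

// cacheMeta clears any previously cached values first, so a field that the
// new version lacks cannot survive from an older version, then writes the
// new values as decimal strings.
func cacheMeta(extended map[string][]byte, m versionMeta) {
	delete(extended, keySize)
	delete(extended, keyMtime)
	delete(extended, keyETag)
	extended[keySize] = []byte(strconv.FormatUint(m.Size, 10))
	extended[keyMtime] = []byte(strconv.FormatInt(m.Mtime, 10))
	if m.ETag != "" {
		extended[keyETag] = []byte(m.ETag)
	}
}

// readMeta reports ok=false when any field is missing or unparsable,
// which is the point where a caller would fall back to fetching the file.
func readMeta(extended map[string][]byte) (versionMeta, bool) {
	sizeBytes, hasSize := extended[keySize]
	mtimeBytes, hasMtime := extended[keyMtime]
	etagBytes, hasETag := extended[keyETag]
	if !hasSize || !hasMtime || !hasETag {
		return versionMeta{}, false
	}
	size, sizeErr := strconv.ParseUint(string(sizeBytes), 10, 64)
	mtime, mtimeErr := strconv.ParseInt(string(mtimeBytes), 10, 64)
	if sizeErr != nil || mtimeErr != nil {
		return versionMeta{}, false
	}
	return versionMeta{Size: size, Mtime: mtime, ETag: string(etagBytes)}, true
}

func main() {
	extended := map[string][]byte{}
	cacheMeta(extended, versionMeta{Size: 1024, Mtime: 1700000000, ETag: "\"abc\""})
	meta, ok := readMeta(extended)
	fmt.Println(meta.Size, meta.Mtime, meta.ETag, ok)
}
```

Clearing before writing is the detail that matters: without it, an ETag cached for an earlier version could be served alongside the size and mtime of a newer one.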

214
weed/s3api/s3api_object_versioning.go

@ -4,7 +4,10 @@ package s3api
// Version ID format handling is in s3api_version_id.go // Version ID format handling is in s3api_version_id.go
import ( import (
"bytes"
"encoding/hex"
"encoding/xml" "encoding/xml"
"errors"
"fmt" "fmt"
"net/http" "net/http"
"path" "path"
@ -20,6 +23,65 @@ import (
"github.com/seaweedfs/seaweedfs/weed/s3api/s3err" "github.com/seaweedfs/seaweedfs/weed/s3api/s3err"
) )
// ErrDeleteMarker is returned when the latest version is a delete marker (expected condition)
var ErrDeleteMarker = errors.New("latest version is a delete marker")
// clearCachedVersionMetadata clears only the version metadata fields (not ID/filename).
// Used by setCachedListMetadata to prevent stale values when updating.
func clearCachedVersionMetadata(extended map[string][]byte) {
delete(extended, s3_constants.ExtLatestVersionSizeKey)
delete(extended, s3_constants.ExtLatestVersionMtimeKey)
delete(extended, s3_constants.ExtLatestVersionETagKey)
delete(extended, s3_constants.ExtLatestVersionOwnerKey)
delete(extended, s3_constants.ExtLatestVersionIsDeleteMarker)
}
// setCachedListMetadata caches list metadata in the .versions directory entry for single-scan efficiency
func setCachedListMetadata(versionsEntry, versionEntry *filer_pb.Entry) {
if versionEntry == nil || versionsEntry == nil {
return
}
if versionsEntry.Extended == nil {
versionsEntry.Extended = make(map[string][]byte)
}
// Clear old cached metadata to prevent stale values
// Note: We don't use clearCachedListMetadata here because it also clears
// ExtLatestVersionIdKey and ExtLatestVersionFileNameKey, which are set by the caller
clearCachedVersionMetadata(versionsEntry.Extended)
// Size and Mtime
if versionEntry.Attributes != nil {
versionsEntry.Extended[s3_constants.ExtLatestVersionSizeKey] = []byte(strconv.FormatUint(versionEntry.Attributes.FileSize, 10))
versionsEntry.Extended[s3_constants.ExtLatestVersionMtimeKey] = []byte(strconv.FormatInt(versionEntry.Attributes.Mtime, 10))
}
// ETag, Owner, DeleteMarker from Extended
if versionEntry.Extended != nil {
if etag, ok := versionEntry.Extended[s3_constants.ExtETagKey]; ok {
versionsEntry.Extended[s3_constants.ExtLatestVersionETagKey] = etag
}
if owner, ok := versionEntry.Extended[s3_constants.ExtAmzOwnerKey]; ok {
versionsEntry.Extended[s3_constants.ExtLatestVersionOwnerKey] = owner
}
if deleteMarker, ok := versionEntry.Extended[s3_constants.ExtDeleteMarkerKey]; ok {
versionsEntry.Extended[s3_constants.ExtLatestVersionIsDeleteMarker] = deleteMarker
} else {
versionsEntry.Extended[s3_constants.ExtLatestVersionIsDeleteMarker] = []byte("false")
}
}
}
// clearCachedListMetadata removes all cached list metadata from the .versions directory entry
func clearCachedListMetadata(extended map[string][]byte) {
if extended == nil {
return
}
delete(extended, s3_constants.ExtLatestVersionIdKey)
delete(extended, s3_constants.ExtLatestVersionFileNameKey)
clearCachedVersionMetadata(extended)
}
// S3ListObjectVersionsResult - Custom struct for S3 list-object-versions response // S3ListObjectVersionsResult - Custom struct for S3 list-object-versions response
// This avoids conflicts with the XSD generated ListVersionsResult struct // This avoids conflicts with the XSD generated ListVersionsResult struct
// and ensures proper separation of versions and delete markers into arrays // and ensures proper separation of versions and delete markers into arrays
@ -93,25 +155,40 @@ func (s3a *S3ApiServer) createDeleteMarker(bucket, object string) (string, error
versionsDir := bucketDir + "/" + cleanObject + s3_constants.VersionsFolder versionsDir := bucketDir + "/" + cleanObject + s3_constants.VersionsFolder
// Create the delete marker entry in the .versions directory // Create the delete marker entry in the .versions directory
deleteMarkerMtime := time.Now().Unix()
deleteMarkerExtended := map[string][]byte{
s3_constants.ExtVersionIdKey: []byte(versionId),
s3_constants.ExtDeleteMarkerKey: []byte("true"),
}
err := s3a.mkFile(versionsDir, versionFileName, nil, func(entry *filer_pb.Entry) { err := s3a.mkFile(versionsDir, versionFileName, nil, func(entry *filer_pb.Entry) {
entry.Name = versionFileName
entry.IsDirectory = false entry.IsDirectory = false
if entry.Attributes == nil { if entry.Attributes == nil {
entry.Attributes = &filer_pb.FuseAttributes{} entry.Attributes = &filer_pb.FuseAttributes{}
} }
entry.Attributes.Mtime = time.Now().Unix()
entry.Attributes.Mtime = deleteMarkerMtime
if entry.Extended == nil { if entry.Extended == nil {
entry.Extended = make(map[string][]byte) entry.Extended = make(map[string][]byte)
} }
entry.Extended[s3_constants.ExtVersionIdKey] = []byte(versionId)
entry.Extended[s3_constants.ExtDeleteMarkerKey] = []byte("true")
for k, v := range deleteMarkerExtended {
entry.Extended[k] = v
}
}) })
if err != nil { if err != nil {
return "", fmt.Errorf("failed to create delete marker in .versions directory: %w", err) return "", fmt.Errorf("failed to create delete marker in .versions directory: %w", err)
} }
// Update the .versions directory metadata to indicate this delete marker is the latest version // Update the .versions directory metadata to indicate this delete marker is the latest version
err = s3a.updateLatestVersionInDirectory(bucket, cleanObject, versionId, versionFileName)
// Pass deleteMarkerEntry to cache its metadata for single-scan list efficiency
deleteMarkerEntry := &filer_pb.Entry{
Name: versionFileName,
IsDirectory: false,
Attributes: &filer_pb.FuseAttributes{
Mtime: deleteMarkerMtime,
},
Extended: deleteMarkerExtended,
}
err = s3a.updateLatestVersionInDirectory(bucket, cleanObject, versionId, versionFileName, deleteMarkerEntry)
if err != nil { if err != nil {
glog.Errorf("createDeleteMarker: failed to update latest version in directory: %v", err) glog.Errorf("createDeleteMarker: failed to update latest version in directory: %v", err)
return "", fmt.Errorf("failed to update latest version in directory: %w", err) return "", fmt.Errorf("failed to update latest version in directory: %w", err)
@ -827,6 +904,7 @@ func (s3a *S3ApiServer) updateLatestVersionAfterDeletion(bucket, object string)
// Find the most recent remaining version (latest timestamp in version ID) // Find the most recent remaining version (latest timestamp in version ID)
var latestVersionId string var latestVersionId string
var latestVersionFileName string var latestVersionFileName string
var latestVersionEntry *filer_pb.Entry
for _, entry := range entries { for _, entry := range entries {
if entry.Extended == nil { if entry.Extended == nil {
@ -852,6 +930,7 @@ func (s3a *S3ApiServer) updateLatestVersionAfterDeletion(bucket, object string)
glog.V(1).Infof("updateLatestVersionAfterDeletion: found newer version %s (file: %s)", versionId, entry.Name) glog.V(1).Infof("updateLatestVersionAfterDeletion: found newer version %s (file: %s)", versionId, entry.Name)
latestVersionId = versionId latestVersionId = versionId
latestVersionFileName = entry.Name latestVersionFileName = entry.Name
latestVersionEntry = entry
} else { } else {
glog.V(1).Infof("updateLatestVersionAfterDeletion: skipping older or equal version %s", versionId) glog.V(1).Infof("updateLatestVersionAfterDeletion: skipping older or equal version %s", versionId)
} }
@ -871,11 +950,14 @@ func (s3a *S3ApiServer) updateLatestVersionAfterDeletion(bucket, object string)
// Update metadata to point to new latest version // Update metadata to point to new latest version
versionsEntry.Extended[s3_constants.ExtLatestVersionIdKey] = []byte(latestVersionId) versionsEntry.Extended[s3_constants.ExtLatestVersionIdKey] = []byte(latestVersionId)
versionsEntry.Extended[s3_constants.ExtLatestVersionFileNameKey] = []byte(latestVersionFileName) versionsEntry.Extended[s3_constants.ExtLatestVersionFileNameKey] = []byte(latestVersionFileName)
// Update cached list metadata from the new latest version entry
setCachedListMetadata(versionsEntry, latestVersionEntry)
glog.V(2).Infof("updateLatestVersionAfterDeletion: new latest version for %s/%s is %s", bucket, object, latestVersionId) glog.V(2).Infof("updateLatestVersionAfterDeletion: new latest version for %s/%s is %s", bucket, object, latestVersionId)
} else { } else {
// No versions left, remove latest version metadata
delete(versionsEntry.Extended, s3_constants.ExtLatestVersionIdKey)
delete(versionsEntry.Extended, s3_constants.ExtLatestVersionFileNameKey)
// No versions left, remove all cached metadata
clearCachedListMetadata(versionsEntry.Extended)
glog.V(2).Infof("updateLatestVersionAfterDeletion: no versions left for %s/%s", bucket, object) glog.V(2).Infof("updateLatestVersionAfterDeletion: no versions left for %s/%s", bucket, object)
} }
@ -1043,6 +1125,121 @@ func (s3a *S3ApiServer) getLatestObjectVersion(bucket, object string) (*filer_pb
return latestVersionEntry, nil return latestVersionEntry, nil
} }
// getLatestVersionEntryFromDirectoryEntry creates a logical entry for list operations using cached metadata
// from the .versions directory entry. This achieves SINGLE-SCAN efficiency - no additional getEntry calls needed.
//
// For N versioned objects:
// - Before: N×1 to N×12 find operations per list
// - After: 0 extra find operations (all metadata cached in .versions directory)
//
// Returns ErrDeleteMarker if the latest version is a delete marker (expected condition, not an error).
func (s3a *S3ApiServer) getLatestVersionEntryFromDirectoryEntry(bucket, object string, versionsDirEntry *filer_pb.Entry) (*filer_pb.Entry, error) {
// Defensive nil check
if versionsDirEntry == nil {
return nil, fmt.Errorf("nil .versions directory entry")
}
normalizedObject := removeDuplicateSlashes(object)
// Check if the directory entry has latest version metadata
if versionsDirEntry.Extended == nil {
return nil, fmt.Errorf("no Extended metadata in .versions directory entry")
}
latestVersionIdBytes, hasLatestVersionId := versionsDirEntry.Extended[s3_constants.ExtLatestVersionIdKey]
if !hasLatestVersionId {
return nil, fmt.Errorf("missing latest version ID metadata in .versions directory entry")
}
// Check if this is a delete marker (should not be shown in regular list)
if isDeleteMarker, exists := versionsDirEntry.Extended[s3_constants.ExtLatestVersionIsDeleteMarker]; exists && string(isDeleteMarker) == "true" {
return nil, ErrDeleteMarker
}
latestVersionId := string(latestVersionIdBytes)
// Try to use cached metadata for zero-copy list (single-scan efficiency)
sizeBytes, hasSize := versionsDirEntry.Extended[s3_constants.ExtLatestVersionSizeKey]
mtimeBytes, hasMtime := versionsDirEntry.Extended[s3_constants.ExtLatestVersionMtimeKey]
etagBytes, hasEtag := versionsDirEntry.Extended[s3_constants.ExtLatestVersionETagKey]
if hasSize && hasMtime && hasEtag {
size, sizeErr := strconv.ParseUint(string(sizeBytes), 10, 64)
mtime, mtimeErr := strconv.ParseInt(string(mtimeBytes), 10, 64)
if sizeErr == nil && mtimeErr == nil {
// Use cached metadata - no getEntry call needed!
glog.V(3).Infof("getLatestVersionEntryFromDirectoryEntry: using cached metadata for %s/%s (size=%d, mtime=%d)", bucket, normalizedObject, size, mtime)
logicalEntry := &filer_pb.Entry{
Name: path.Base(normalizedObject),
IsDirectory: false,
Attributes: &filer_pb.FuseAttributes{
FileSize: size,
Mtime: mtime,
},
Extended: map[string][]byte{
s3_constants.ExtVersionIdKey: []byte(latestVersionId),
s3_constants.ExtETagKey: etagBytes,
},
}
// Attempt to parse the ETag and set it as Md5 attribute for compatibility with filer.ETag().
// This is a partial fix for single-part uploads. Multipart ETags will still use ExtETagKey.
if len(etagBytes) >= 2 && etagBytes[0] == '"' && etagBytes[len(etagBytes)-1] == '"' {
unquotedEtag := etagBytes[1 : len(etagBytes)-1]
if !bytes.Contains(unquotedEtag, []byte("-")) {
if md5bytes, err := hex.DecodeString(string(unquotedEtag)); err == nil {
logicalEntry.Attributes.Md5 = md5bytes
}
}
}
// Add owner if cached
if ownerBytes, hasOwner := versionsDirEntry.Extended[s3_constants.ExtLatestVersionOwnerKey]; hasOwner {
logicalEntry.Extended[s3_constants.ExtAmzOwnerKey] = ownerBytes
}
return logicalEntry, nil
}
glog.Warningf("getLatestVersionEntryFromDirectoryEntry: failed to parse cached metadata for %s/%s, falling back. sizeErr:%v, mtimeErr:%v", bucket, normalizedObject, sizeErr, mtimeErr)
}
// Fallback: fetch version file if cached metadata not available (for older versions)
latestVersionFileBytes, hasLatestVersionFile := versionsDirEntry.Extended[s3_constants.ExtLatestVersionFileNameKey]
if !hasLatestVersionFile {
return nil, fmt.Errorf("missing latest version file name metadata in .versions directory entry")
}
latestVersionFile := string(latestVersionFileBytes)
glog.V(3).Infof("getLatestVersionEntryFromDirectoryEntry: fetching version file for %s/%s (no cached metadata)", bucket, normalizedObject)
bucketDir := path.Join(s3a.option.BucketsPath, bucket)
versionsObjectPath := path.Join(normalizedObject, s3_constants.VersionsFolder)
latestVersionPath := path.Join(versionsObjectPath, latestVersionFile)
latestVersionEntry, err := s3a.getEntry(bucketDir, latestVersionPath)
if err != nil {
return nil, fmt.Errorf("failed to get latest version file %s: %v", latestVersionPath, err)
}
// Check if this is a delete marker (should not be shown in regular list)
if latestVersionEntry.Extended != nil {
if deleteMarker, exists := latestVersionEntry.Extended[s3_constants.ExtDeleteMarkerKey]; exists && string(deleteMarker) == "true" {
return nil, ErrDeleteMarker
}
}
// Create a logical entry that appears at the object path (not the versioned path)
logicalEntry := &filer_pb.Entry{
Name: path.Base(normalizedObject),
IsDirectory: false,
Attributes: latestVersionEntry.Attributes,
Extended: latestVersionEntry.Extended,
Chunks: latestVersionEntry.Chunks,
}
return logicalEntry, nil
}
// getObjectOwnerFromVersion extracts object owner information from version metadata // getObjectOwnerFromVersion extracts object owner information from version metadata
func (s3a *S3ApiServer) getObjectOwnerFromVersion(version *ObjectVersion, bucket, objectKey string) CanonicalUser { func (s3a *S3ApiServer) getObjectOwnerFromVersion(version *ObjectVersion, bucket, objectKey string) CanonicalUser {
// First try to get owner from the version's OwnerID field (extracted during listing) // First try to get owner from the version's OwnerID field (extracted during listing)
@ -1078,4 +1275,3 @@ func (s3a *S3ApiServer) getObjectOwnerFromEntry(entry *filer_pb.Entry) Canonical
// Fallback: return anonymous if no owner found // Fallback: return anonymous if no owner found
return CanonicalUser{ID: s3_constants.AccountAnonymousId, DisplayName: "anonymous"} return CanonicalUser{ID: s3_constants.AccountAnonymousId, DisplayName: "anonymous"}
} }
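The cached-metadata path in `getLatestVersionEntryFromDirectoryEntry` converts a quoted, single-part ETag back into raw MD5 bytes and leaves multipart ETags (which contain a `-`) alone. The sketch below isolates that parsing step as a standalone function so it can be tried out directly. `md5FromETag` is a hypothetical name for this example and does not exist in the commit.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
)

// md5FromETag returns the decoded bytes of a quoted single-part ETag.
// It returns nil when the value is unquoted, looks like a multipart ETag
// (contains '-'), or is not valid hex. Like the code in the diff, it does
// not check that the decoded length is exactly 16 bytes.
func md5FromETag(etag []byte) []byte {
	if len(etag) < 2 || etag[0] != '"' || etag[len(etag)-1] != '"' {
		return nil
	}
	unquoted := etag[1 : len(etag)-1]
	if bytes.Contains(unquoted, []byte("-")) {
		return nil // multipart format "md5-parts": not a plain MD5
	}
	decoded, err := hex.DecodeString(string(unquoted))
	if err != nil {
		return nil
	}
	return decoded
}

func main() {
	single := []byte("\"d41d8cd98f00b204e9800998ecf8427e\"")
	multipart := []byte("\"d41d8cd98f00b204e9800998ecf8427e-3\"")
	fmt.Println(len(md5FromETag(single)))
	fmt.Println(md5FromETag(multipart) == nil)
}
```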