6 changed files with 100 additions and 528 deletions
-
364COMBINED_TTFB_PROFILING.md
-
14weed/s3api/s3api_bucket_config.go
-
4weed/s3api/s3api_bucket_handlers.go
-
148weed/s3api/s3api_object_handlers.go
-
33weed/s3api/s3api_object_handlers_multipart.go
-
65weed/s3api/s3api_object_handlers_put.go
@ -1,364 +0,0 @@ |
|||||
# Combined TTFB Profiling - Complete Request Breakdown |
|
||||
|
|
||||
## Overview |
|
||||
Integrated profiling that shows complete GET request timing from entry to completion, including both high-level TTFB stages and detailed streaming metrics. |
|
||||
|
|
||||
## Profiling Output Format |
|
||||
|
|
||||
When running with `-v=2`, you'll see hierarchical profiling logs for each GET request: |
|
||||
|
|
||||
### Example 1: Unencrypted Object GET |
|
||||
|
|
||||
``` |
|
||||
GET TTFB PROFILE mybucket/myobject.txt: total=52ms | conditional=0.5ms, versioning=8ms, entryFetch=1ms, stream=42ms |
|
||||
└─ streamFromVolumeServers: total=42ms, rangeParse=0.2ms, headerSet=0.4ms, chunkResolve=9ms, streamPrep=2ms, streamExec=30ms |
|
||||
``` |
|
||||
|
|
||||
**Breakdown**: |
|
||||
``` |
|
||||
Total Request: 52ms |
|
||||
├─ Conditional Headers: 0.5ms (check If-Match, If-None-Match, etc.) |
|
||||
├─ Versioning Check: 8ms (check bucket versioning config + fetch version) |
|
||||
├─ Entry Fetch: 1ms (reuse or fetch metadata for SSE detection) |
|
||||
└─ Stream: 42ms (direct volume read - MAIN TIME) |
|
||||
├─ Range Parse: 0.2ms (parse Range header if present) |
|
||||
├─ Header Set: 0.4ms (set response headers) |
|
||||
├─ Chunk Resolve: 9ms (lookup volume locations) |
|
||||
├─ Stream Prep: 2ms (prepare streaming function) |
|
||||
└─ Stream Exec: 30ms (actual data transfer) ⚡ |
|
||||
``` |
|
||||
|
|
||||
### Example 2: SSE-C Encrypted Object GET |
|
||||
|
|
||||
``` |
|
||||
GET TTFB PROFILE mybucket/encrypted.bin: total=68ms | conditional=0.4ms, versioning=7ms, entryFetch=2ms, stream=58ms |
|
||||
└─ streamFromVolumeServersWithSSE (SSE-C): total=58ms, keyValidate=4ms, headerSet=0.5ms, streamFetch=14ms, decryptSetup=1ms, copy=38ms |
|
||||
``` |
|
||||
|
|
||||
**Breakdown**: |
|
||||
``` |
|
||||
Total Request: 68ms |
|
||||
├─ Conditional Headers: 0.4ms |
|
||||
├─ Versioning Check: 7ms |
|
||||
├─ Entry Fetch: 2ms |
|
||||
└─ Stream (SSE-C): 58ms (encrypted object with inline decryption) |
|
||||
├─ Key Validate: 4ms (validate SSE-C customer key) |
|
||||
├─ Header Set: 0.5ms (set SSE response headers) |
|
||||
├─ Stream Fetch: 14ms (fetch encrypted data from volumes) |
|
||||
├─ Decrypt Setup: 1ms (setup AES-256 decryption wrapper) |
|
||||
└─ Copy: 38ms (decrypt + stream to client) ⚡ |
|
||||
``` |
|
||||
|
|
||||
### Example 3: Range Request |
|
||||
|
|
||||
``` |
|
||||
GET TTFB PROFILE mybucket/video.mp4: total=28ms | conditional=0.3ms, versioning=0.1ms, entryFetch=0.5ms, stream=27ms |
|
||||
└─ streamFromVolumeServers: total=27ms, rangeParse=0.4ms, headerSet=0.3ms, chunkResolve=7ms, streamPrep=1ms, streamExec=18ms |
|
||||
``` |
|
||||
|
|
||||
**Breakdown**: |
|
||||
``` |
|
||||
Total Request: 28ms (faster due to partial content) |
|
||||
├─ Conditional Headers: 0.3ms |
|
||||
├─ Versioning Check: 0.1ms (non-versioned bucket) |
|
||||
├─ Entry Fetch: 0.5ms (reused from cache) |
|
||||
└─ Stream: 27ms |
|
||||
├─ Range Parse: 0.4ms (parse "bytes=0-1023") |
|
||||
├─ Header Set: 0.3ms (set 206 Partial Content) |
|
||||
├─ Chunk Resolve: 7ms (resolve for range only) |
|
||||
├─ Stream Prep: 1ms |
|
||||
└─ Stream Exec: 18ms (stream only requested range) |
|
||||
``` |
|
||||
|
|
||||
## Performance Metrics |
|
||||
|
|
||||
### Typical Performance (10MB object) |
|
||||
|
|
||||
| Stage | Unencrypted | SSE-C | SSE-KMS | SSE-S3 | |
|
||||
|-------|-------------|-------|---------|--------| |
|
||||
| **Conditional Headers** | 0.3-1ms | 0.3-1ms | 0.3-1ms | 0.3-1ms | |
|
||||
| **Versioning Check** | 0.1-10ms* | 0.1-10ms* | 0.1-10ms* | 0.1-10ms* | |
|
||||
| **Entry Fetch** | 0.5-2ms** | 0.5-2ms** | 0.5-2ms** | 0.5-2ms** | |
|
||||
| **Stream Total** | 40-50ms | 55-65ms | 55-65ms | 55-65ms | |
|
||||
| └─ Range Parse | 0.1-0.5ms | - | - | - | |
|
||||
| └─ Key Validate | - | 2-5ms | 3-8ms | 2-4ms | |
|
||||
| └─ Header Set | 0.3-1ms | 0.5-1ms | 0.5-1ms | 0.5-1ms | |
|
||||
| └─ Chunk Resolve | 6-12ms | - | - | - | |
|
||||
| └─ Stream Fetch | - | 12-18ms | 12-18ms | 12-18ms | |
|
||||
| └─ Stream Prep | 1-3ms | - | - | - | |
|
||||
| └─ Decrypt Setup | - | 0.5-2ms | 0.5-2ms | 0.5-2ms | |
|
||||
| └─ Stream Exec / Copy | 30-40ms | 35-45ms | 35-45ms | 35-45ms | |
|
||||
| **TOTAL** | **42-63ms** | **57-78ms** | **58-82ms** | **57-78ms** | |
|
||||
|
|
||||
\* Higher if versioned bucket with many versions |
|
||||
\** Higher if entry not cached, near 0 if reused |
|
||||
|
|
||||
### Overhead Analysis |
|
||||
|
|
||||
**SSE Encryption Overhead**: |
|
||||
- SSE-C: +12-20ms (~20-30% overhead) |
|
||||
- SSE-KMS: +15-25ms (~25-35% overhead) |
|
||||
- SSE-S3: +12-20ms (~20-30% overhead) |
|
||||
|
|
||||
**Main Cost**: Decryption during streaming (`copy` stage) |
|
||||
|
|
||||
## How to Enable |
|
||||
|
|
||||
```bash |
|
||||
# Start SeaweedFS S3 API with verbose logging |
|
||||
weed server -s3 -v=2 |
|
||||
|
|
||||
# Or standalone S3 server |
|
||||
weed s3 -filer=localhost:8888 -v=2 |
|
||||
``` |
|
||||
|
|
||||
## Filtering Logs |
|
||||
|
|
||||
```bash |
|
||||
# View complete TTFB profiles |
|
||||
grep "GET TTFB PROFILE" /path/to/logs |
|
||||
|
|
||||
# View with detailed streaming breakdown |
|
||||
grep -A1 "GET TTFB PROFILE" /path/to/logs |
|
||||
|
|
||||
# View only SSE requests |
|
||||
grep "SSE-C\|SSE-KMS\|SSE-S3" /path/to/logs | grep "PROFILE" |
|
||||
|
|
||||
# Extract timing statistics |
|
||||
grep "GET TTFB PROFILE" logs.txt | \ |
|
||||
awk -F'total=' '{print $2}' | \ |
|
||||
awk '{print $1}' | \ |
|
||||
sed 's/ms.*//' | \ |
|
||||
sort -n | \ |
|
||||
awk ' |
|
||||
{sum+=$1; count++; vals[count]=$1} |
|
||||
END { |
|
||||
print "Requests:", count |
|
||||
print "Average:", sum/count "ms" |
|
||||
print "Median:", vals[int(count/2)] "ms" |
|
||||
print "P90:", vals[int(count*0.9)] "ms" |
|
||||
print "P99:", vals[int(count*0.99)] "ms" |
|
||||
} |
|
||||
' |
|
||||
``` |
|
||||
|
|
||||
## Performance Analysis Guide |
|
||||
|
|
||||
### ✅ Healthy Performance Indicators |
|
||||
|
|
||||
1. **High-Level (TTFB)**: |
|
||||
- `conditional` < 2ms |
|
||||
- `versioning` < 15ms |
|
||||
- `entryFetch` < 3ms |
|
||||
- `stream` dominates total time (>70%) |
|
||||
|
|
||||
2. **Low-Level (Stream)**: |
|
||||
- Unencrypted: `streamExec` dominates |
|
||||
- Encrypted: `copy` dominates |
|
||||
- `chunkResolve` < 15ms |
|
||||
- `streamPrep` < 5ms |
|
||||
|
|
||||
### ⚠️ Performance Issues |
|
||||
|
|
||||
#### 1. Slow Versioning (`versioning` > 20ms) |
|
||||
**Symptom**: |
|
||||
``` |
|
||||
GET TTFB PROFILE: total=85ms | conditional=0.5ms, versioning=45ms, entryFetch=1ms, stream=38ms |
|
||||
``` |
|
||||
**Causes**: |
|
||||
- Filer slow to check versioning config |
|
||||
- Many versions to iterate through |
|
||||
- Filer under heavy load |
|
||||
|
|
||||
**Solutions**: |
|
||||
- Check filer CPU/memory |
|
||||
- Review filer metadata store performance |
|
||||
- Consider filer caching improvements |
|
||||
|
|
||||
#### 2. Slow Chunk Resolution (`chunkResolve` > 20ms) |
|
||||
**Symptom**: |
|
||||
``` |
|
||||
└─ streamFromVolumeServers: total=65ms, rangeParse=0.2ms, headerSet=0.4ms, chunkResolve=35ms, streamPrep=2ms, streamExec=27ms |
|
||||
``` |
|
||||
**Causes**: |
|
||||
- Filer LookupVolume RPC slow |
|
||||
- Network latency to filer |
|
||||
- Many chunks to resolve |
|
||||
|
|
||||
**Solutions**: |
|
||||
- Check network latency between S3 API and filer |
|
||||
- Review filer volume location cache |
|
||||
- Consider volume location caching in S3 API |
|
||||
|
|
||||
#### 3. Slow SSE Key Validation (`keyValidate` > 10ms) |
|
||||
**Symptom**: |
|
||||
``` |
|
||||
└─ streamFromVolumeServersWithSSE (SSE-KMS): total=75ms, keyValidate=25ms, headerSet=0.5ms, streamFetch=14ms, decryptSetup=1ms, copy=34ms |
|
||||
``` |
|
||||
**Causes**: |
|
||||
- KMS external service slow |
|
||||
- Complex key derivation |
|
||||
- Network latency to KMS |
|
||||
|
|
||||
**Solutions**: |
|
||||
- Review KMS service performance |
|
||||
- Consider key caching (if secure) |
|
||||
- Check network to KMS endpoint |
|
||||
|
|
||||
#### 4. Slow Data Transfer (`streamExec` or `copy` >> expected) |
|
||||
**Symptom**: |
|
||||
``` |
|
||||
└─ streamFromVolumeServers: total=180ms, rangeParse=0.2ms, headerSet=0.4ms, chunkResolve=8ms, streamPrep=2ms, streamExec=169ms |
|
||||
``` |
|
||||
**Causes**: |
|
||||
- Network bandwidth constrained |
|
||||
- Volume server disk I/O slow |
|
||||
- Volume server CPU overloaded |
|
||||
- Large object size |
|
||||
|
|
||||
**Solutions**: |
|
||||
- Check network throughput |
|
||||
- Review volume server metrics (CPU, disk I/O) |
|
||||
- Verify object size expectations |
|
||||
- Consider volume server scaling |
|
||||
|
|
||||
#### 5. High Decryption Overhead (SSE `copy` >> unencrypted `streamExec`) |
|
||||
**Symptom**: |
|
||||
``` |
|
||||
# Unencrypted: streamExec=30ms |
|
||||
# SSE-C: copy=85ms (283% overhead, should be ~20-30%) |
|
||||
``` |
|
||||
**Causes**: |
|
||||
- CPU lacking AES-NI instructions |
|
||||
- S3 API server CPU constrained |
|
||||
- Inefficient cipher implementation |
|
||||
|
|
||||
**Solutions**: |
|
||||
- Verify CPU has AES-NI support |
|
||||
- Scale S3 API server horizontally |
|
||||
- Review Go crypto performance |
|
||||
|
|
||||
## Comparison with Old Architecture |
|
||||
|
|
||||
### Before: Filer Proxy Path |
|
||||
``` |
|
||||
Total TTFB: ~75ms |
|
||||
├─ Entry Fetch: 8ms |
|
||||
├─ Proxy Setup: 19ms ⚠️ WASTED TIME |
|
||||
├─ Filer Proxy: 5ms ⚠️ EXTRA HOP |
|
||||
└─ Volume Stream: 43ms |
|
||||
``` |
|
||||
|
|
||||
### After: Direct Volume Reads |
|
||||
``` |
|
||||
Total TTFB: ~52ms (31% faster!) |
|
||||
├─ Conditional: 0.5ms |
|
||||
├─ Versioning: 8ms |
|
||||
├─ Entry Fetch: 1ms |
|
||||
└─ Stream: 42ms |
|
||||
├─ Chunk Resolve: 9ms |
|
||||
├─ Stream Prep: 2ms |
|
||||
└─ Stream Exec: 30ms ✅ DIRECT |
|
||||
``` |
|
||||
|
|
||||
**Improvements**: |
|
||||
- ✅ Eliminated 19ms proxy setup overhead |
|
||||
- ✅ Eliminated extra filer hop |
|
||||
- ✅ Reduced total TTFB by ~31% |
|
||||
- ✅ Detailed visibility into all stages |
|
||||
|
|
||||
## Integration with Load Testing |
|
||||
|
|
||||
### With warp |
|
||||
```bash |
|
||||
# Run load test |
|
||||
warp get --host localhost:8333 --access-key ... --secret-key ... \ |
|
||||
--obj.size 10MiB --duration 5m --concurrent 20 2>&1 | tee warp.log & |
|
||||
|
|
||||
# Monitor profiling in real-time |
|
||||
tail -f /path/to/s3.log | grep "GET TTFB PROFILE" |
|
||||
|
|
||||
# After test completes, analyze |
|
||||
grep "GET TTFB PROFILE" /path/to/s3.log > ttfb_results.txt |
|
||||
|
|
||||
# Calculate average per stage |
|
||||
echo "=== Conditional Headers ===" |
|
||||
grep "conditional=" ttfb_results.txt | grep -o "conditional=[^,]*" | \ |
|
||||
cut -d'=' -f2 | sed 's/ms//' | \ |
|
||||
awk '{sum+=$1; count++} END {print "Avg:", sum/count "ms"}' |
|
||||
|
|
||||
echo "=== Versioning Check ===" |
|
||||
grep "versioning=" ttfb_results.txt | grep -o "versioning=[^,]*" | \ |
|
||||
cut -d'=' -f2 | sed 's/ms//' | \ |
|
||||
awk '{sum+=$1; count++} END {print "Avg:", sum/count "ms"}' |
|
||||
|
|
||||
echo "=== Stream Time ===" |
|
||||
grep "stream=" ttfb_results.txt | grep -o "stream=[^,]*$" | \ |
|
||||
cut -d'=' -f2 | sed 's/ms//' | \ |
|
||||
awk '{sum+=$1; count++} END {print "Avg:", sum/count "ms"}' |
|
||||
``` |
|
||||
|
|
||||
### With s3-tests |
|
||||
```bash |
|
||||
# Run specific test with profiling |
|
||||
cd /path/to/s3-tests |
|
||||
tox -e py -- s3tests/functional/test_s3.py::test_object_read_not_exist -vv |
|
||||
|
|
||||
# Check profiling for that test |
|
||||
grep "GET TTFB PROFILE.*test-bucket" /tmp/s3.log |
|
||||
|
|
||||
# Example output: |
|
||||
# GET TTFB PROFILE test-bucket/nonexistent: total=12ms | conditional=0.3ms, versioning=8ms, entryFetch=3ms, stream=0ms |
|
||||
# (note: stream=0ms because object didn't exist) |
|
||||
``` |
|
||||
|
|
||||
## Real-World Example Analysis |
|
||||
|
|
||||
### Example: Slow Overall Performance |
|
||||
``` |
|
||||
GET TTFB PROFILE videos/movie.mp4: total=245ms | conditional=0.5ms, versioning=2ms, entryFetch=1ms, stream=241ms |
|
||||
└─ streamFromVolumeServers: total=241ms, rangeParse=0.3ms, headerSet=0.4ms, chunkResolve=8ms, streamPrep=2ms, streamExec=230ms |
|
||||
``` |
|
||||
|
|
||||
**Analysis**: |
|
||||
- ✅ Conditional/Versioning/Entry fast (good) |
|
||||
- ⚠️ `streamExec` = 230ms for unknown size object |
|
||||
- **Action**: Check object size. If 10MB, this is slow (expected ~30ms). If 100MB, this is reasonable. |
|
||||
|
|
||||
### Example: Versioning Bottleneck |
|
||||
``` |
|
||||
GET TTFB PROFILE docs/policy.pdf: total=95ms | conditional=0.4ms, versioning=78ms, entryFetch=1ms, stream=15ms |
|
||||
└─ streamFromVolumeServers: total=15ms, rangeParse=0.2ms, headerSet=0.3ms, chunkResolve=4ms, streamPrep=1ms, streamExec=9ms |
|
||||
``` |
|
||||
|
|
||||
**Analysis**: |
|
||||
- ⚠️ `versioning` = 78ms (very slow!) |
|
||||
- ✅ Stream fast (15ms) |
|
||||
- **Action**: Investigate filer versioning performance. Possibly many versions or slow filer. |
|
||||
|
|
||||
### Example: Perfect Performance |
|
||||
``` |
|
||||
GET TTFB PROFILE images/logo.png: total=18ms | conditional=0.2ms, versioning=0.1ms, entryFetch=0.5ms, stream=17ms |
|
||||
└─ streamFromVolumeServers: total=17ms, rangeParse=0.1ms, headerSet=0.2ms, chunkResolve=5ms, streamPrep=1ms, streamExec=10ms |
|
||||
``` |
|
||||
|
|
||||
**Analysis**: |
|
||||
- ✅ All stages optimal |
|
||||
- ✅ Small object streamed quickly |
|
||||
- ✅ No bottlenecks |
|
||||
- **Result**: Operating at peak performance! |
|
||||
|
|
||||
## Summary |
|
||||
|
|
||||
**Integrated TTFB profiling provides**: |
|
||||
- ✅ Complete visibility from request entry to completion |
|
||||
- ✅ High-level breakdown (conditional, versioning, entry, stream) |
|
||||
- ✅ Low-level breakdown (parse, resolve, prep, exec/decrypt) |
|
||||
- ✅ SSE-specific metrics (key validation, decryption) |
|
||||
- ✅ Easy identification of bottlenecks |
|
||||
- ✅ Hierarchical output for clarity |
|
||||
|
|
||||
**Enable with**: `-v=2` flag |
|
||||
**View with**: `grep "GET TTFB PROFILE" /path/to/logs` |
|
||||
**Analyze**: Use provided scripts to calculate statistics per stage |
|
||||
|
|
||||
Write
Preview
Loading…
Cancel
Save
Reference in new issue