2 changed files with 439 additions and 1 deletions
@ -0,0 +1,364 @@ |
|||||
|
# Combined TTFB Profiling - Complete Request Breakdown |
||||
|
|
||||
|
## Overview |
||||
|
Integrated profiling that shows complete GET request timing from entry to completion, including both high-level TTFB stages and detailed streaming metrics. |
||||
|
|
||||
|
## Profiling Output Format |
||||
|
|
||||
|
When running with `-v=2`, you'll see hierarchical profiling logs for each GET request: |
||||
|
|
||||
|
### Example 1: Unencrypted Object GET |
||||
|
|
||||
|
``` |
||||
|
GET TTFB PROFILE mybucket/myobject.txt: total=52ms | conditional=0.5ms, versioning=8ms, entryFetch=1ms, stream=42ms |
||||
|
└─ streamFromVolumeServers: total=42ms, rangeParse=0.2ms, headerSet=0.4ms, chunkResolve=9ms, streamPrep=2ms, streamExec=30ms |
||||
|
``` |
||||
|
|
||||
|
**Breakdown**: |
||||
|
``` |
||||
|
Total Request: 52ms |
||||
|
├─ Conditional Headers: 0.5ms (check If-Match, If-None-Match, etc.) |
||||
|
├─ Versioning Check: 8ms (check bucket versioning config + fetch version) |
||||
|
├─ Entry Fetch: 1ms (reuse or fetch metadata for SSE detection) |
||||
|
└─ Stream: 42ms (direct volume read - MAIN TIME) |
||||
|
├─ Range Parse: 0.2ms (parse Range header if present) |
||||
|
├─ Header Set: 0.4ms (set response headers) |
||||
|
├─ Chunk Resolve: 9ms (lookup volume locations) |
||||
|
├─ Stream Prep: 2ms (prepare streaming function) |
||||
|
└─ Stream Exec: 30ms (actual data transfer) ⚡ |
||||
|
``` |
||||
|
|
||||
|
### Example 2: SSE-C Encrypted Object GET |
||||
|
|
||||
|
``` |
||||
|
GET TTFB PROFILE mybucket/encrypted.bin: total=68ms | conditional=0.4ms, versioning=7ms, entryFetch=2ms, stream=58ms |
||||
|
└─ streamFromVolumeServersWithSSE (SSE-C): total=58ms, keyValidate=4ms, headerSet=0.5ms, streamFetch=14ms, decryptSetup=1ms, copy=38ms |
||||
|
``` |
||||
|
|
||||
|
**Breakdown**: |
||||
|
``` |
||||
|
Total Request: 68ms |
||||
|
├─ Conditional Headers: 0.4ms |
||||
|
├─ Versioning Check: 7ms |
||||
|
├─ Entry Fetch: 2ms |
||||
|
└─ Stream (SSE-C): 58ms (encrypted object with inline decryption) |
||||
|
├─ Key Validate: 4ms (validate SSE-C customer key) |
||||
|
├─ Header Set: 0.5ms (set SSE response headers) |
||||
|
├─ Stream Fetch: 14ms (fetch encrypted data from volumes) |
||||
|
├─ Decrypt Setup: 1ms (setup AES-256 decryption wrapper) |
||||
|
└─ Copy: 38ms (decrypt + stream to client) ⚡ |
||||
|
``` |
||||
|
|
||||
|
### Example 3: Range Request |
||||
|
|
||||
|
``` |
||||
|
GET TTFB PROFILE mybucket/video.mp4: total=28ms | conditional=0.3ms, versioning=0.1ms, entryFetch=0.5ms, stream=27ms |
||||
|
└─ streamFromVolumeServers: total=27ms, rangeParse=0.4ms, headerSet=0.3ms, chunkResolve=7ms, streamPrep=1ms, streamExec=18ms |
||||
|
``` |
||||
|
|
||||
|
**Breakdown**: |
||||
|
``` |
||||
|
Total Request: 28ms (faster due to partial content) |
||||
|
├─ Conditional Headers: 0.3ms |
||||
|
├─ Versioning Check: 0.1ms (non-versioned bucket) |
||||
|
├─ Entry Fetch: 0.5ms (reused from cache) |
||||
|
└─ Stream: 27ms |
||||
|
├─ Range Parse: 0.4ms (parse "bytes=0-1023") |
||||
|
├─ Header Set: 0.3ms (set 206 Partial Content) |
||||
|
├─ Chunk Resolve: 7ms (resolve for range only) |
||||
|
├─ Stream Prep: 1ms |
||||
|
└─ Stream Exec: 18ms (stream only requested range) |
||||
|
``` |
||||
|
|
||||
|
## Performance Metrics |
||||
|
|
||||
|
### Typical Performance (10MB object) |
||||
|
|
||||
|
| Stage | Unencrypted | SSE-C | SSE-KMS | SSE-S3 | |
||||
|
|-------|-------------|-------|---------|--------| |
||||
|
| **Conditional Headers** | 0.3-1ms | 0.3-1ms | 0.3-1ms | 0.3-1ms | |
||||
|
| **Versioning Check** | 0.1-10ms* | 0.1-10ms* | 0.1-10ms* | 0.1-10ms* | |
||||
|
| **Entry Fetch** | 0.5-2ms** | 0.5-2ms** | 0.5-2ms** | 0.5-2ms** | |
||||
|
| **Stream Total** | 40-50ms | 55-65ms | 55-65ms | 55-65ms | |
||||
|
| └─ Range Parse | 0.1-0.5ms | - | - | - | |
||||
|
| └─ Key Validate | - | 2-5ms | 3-8ms | 2-4ms | |
||||
|
| └─ Header Set | 0.3-1ms | 0.5-1ms | 0.5-1ms | 0.5-1ms | |
||||
|
| └─ Chunk Resolve | 6-12ms | - | - | - | |
||||
|
| └─ Stream Fetch | - | 12-18ms | 12-18ms | 12-18ms | |
||||
|
| └─ Stream Prep | 1-3ms | - | - | - | |
||||
|
| └─ Decrypt Setup | - | 0.5-2ms | 0.5-2ms | 0.5-2ms | |
||||
|
| └─ Stream Exec / Copy | 30-40ms | 35-45ms | 35-45ms | 35-45ms | |
||||
|
| **TOTAL** | **42-63ms** | **57-78ms** | **58-82ms** | **57-78ms** | |
||||
|
|
||||
|
\* Higher if versioned bucket with many versions |
||||
|
\** Higher if entry not cached, near 0 if reused |
||||
|
|
||||
|
### Overhead Analysis |
||||
|
|
||||
|
**SSE Encryption Overhead**: |
||||
|
- SSE-C: +12-20ms (~20-30% overhead) |
||||
|
- SSE-KMS: +15-25ms (~25-35% overhead) |
||||
|
- SSE-S3: +12-20ms (~20-30% overhead) |
||||
|
|
||||
|
**Main Cost**: Decryption during streaming (`copy` stage) |
||||
|
|
||||
|
## How to Enable |
||||
|
|
||||
|
```bash |
||||
|
# Start SeaweedFS S3 API with verbose logging |
||||
|
weed server -s3 -v=2 |
||||
|
|
||||
|
# Or standalone S3 server |
||||
|
weed s3 -filer=localhost:8888 -v=2 |
||||
|
``` |
||||
|
|
||||
|
## Filtering Logs |
||||
|
|
||||
|
```bash |
||||
|
# View complete TTFB profiles |
||||
|
grep "GET TTFB PROFILE" /path/to/logs |
||||
|
|
||||
|
# View with detailed streaming breakdown |
||||
|
grep -A1 "GET TTFB PROFILE" /path/to/logs |
||||
|
|
||||
|
# View only SSE requests |
||||
|
grep "SSE-C\|SSE-KMS\|SSE-S3" /path/to/logs | grep "PROFILE" |
||||
|
|
||||
|
# Extract timing statistics |
||||
|
grep "GET TTFB PROFILE" logs.txt | \ |
||||
|
awk -F'total=' '{print $2}' | \ |
||||
|
awk '{print $1}' | \ |
||||
|
sed 's/ms.*//' | \ |
||||
|
sort -n | \ |
||||
|
awk ' |
||||
|
{sum+=$1; count++; vals[count]=$1} |
||||
|
END { |
||||
|
print "Requests:", count |
||||
|
print "Average:", sum/count "ms" |
||||
|
print "Median:", vals[int(count/2)] "ms" |
||||
|
print "P90:", vals[int(count*0.9)] "ms" |
||||
|
print "P99:", vals[int(count*0.99)] "ms" |
||||
|
} |
||||
|
' |
||||
|
``` |
||||
|
|
||||
|
## Performance Analysis Guide |
||||
|
|
||||
|
### ✅ Healthy Performance Indicators |
||||
|
|
||||
|
1. **High-Level (TTFB)**: |
||||
|
- `conditional` < 2ms |
||||
|
- `versioning` < 15ms |
||||
|
- `entryFetch` < 3ms |
||||
|
- `stream` dominates total time (>70%) |
||||
|
|
||||
|
2. **Low-Level (Stream)**: |
||||
|
- Unencrypted: `streamExec` dominates |
||||
|
- Encrypted: `copy` dominates |
||||
|
- `chunkResolve` < 15ms |
||||
|
- `streamPrep` < 5ms |
||||
|
|
||||
|
### ⚠️ Performance Issues |
||||
|
|
||||
|
#### 1. Slow Versioning (`versioning` > 20ms) |
||||
|
**Symptom**: |
||||
|
``` |
||||
|
GET TTFB PROFILE: total=85ms | conditional=0.5ms, versioning=45ms, entryFetch=1ms, stream=38ms |
||||
|
``` |
||||
|
**Causes**: |
||||
|
- Filer slow to check versioning config |
||||
|
- Many versions to iterate through |
||||
|
- Filer under heavy load |
||||
|
|
||||
|
**Solutions**: |
||||
|
- Check filer CPU/memory |
||||
|
- Review filer metadata store performance |
||||
|
- Consider filer caching improvements |
||||
|
|
||||
|
#### 2. Slow Chunk Resolution (`chunkResolve` > 20ms) |
||||
|
**Symptom**: |
||||
|
``` |
||||
|
└─ streamFromVolumeServers: total=65ms, rangeParse=0.2ms, headerSet=0.4ms, chunkResolve=35ms, streamPrep=2ms, streamExec=27ms |
||||
|
``` |
||||
|
**Causes**: |
||||
|
- Filer LookupVolume RPC slow |
||||
|
- Network latency to filer |
||||
|
- Many chunks to resolve |
||||
|
|
||||
|
**Solutions**: |
||||
|
- Check network latency between S3 API and filer |
||||
|
- Review filer volume location cache |
||||
|
- Consider volume location caching in S3 API |
||||
|
|
||||
|
#### 3. Slow SSE Key Validation (`keyValidate` > 10ms) |
||||
|
**Symptom**: |
||||
|
``` |
||||
|
└─ streamFromVolumeServersWithSSE (SSE-KMS): total=75ms, keyValidate=25ms, headerSet=0.5ms, streamFetch=14ms, decryptSetup=1ms, copy=34ms |
||||
|
``` |
||||
|
**Causes**: |
||||
|
- KMS external service slow |
||||
|
- Complex key derivation |
||||
|
- Network latency to KMS |
||||
|
|
||||
|
**Solutions**: |
||||
|
- Review KMS service performance |
||||
|
- Consider key caching (if secure) |
||||
|
- Check network to KMS endpoint |
||||
|
|
||||
|
#### 4. Slow Data Transfer (`streamExec` or `copy` >> expected) |
||||
|
**Symptom**: |
||||
|
``` |
||||
|
└─ streamFromVolumeServers: total=180ms, rangeParse=0.2ms, headerSet=0.4ms, chunkResolve=8ms, streamPrep=2ms, streamExec=169ms |
||||
|
``` |
||||
|
**Causes**: |
||||
|
- Network bandwidth constrained |
||||
|
- Volume server disk I/O slow |
||||
|
- Volume server CPU overloaded |
||||
|
- Large object size |
||||
|
|
||||
|
**Solutions**: |
||||
|
- Check network throughput |
||||
|
- Review volume server metrics (CPU, disk I/O) |
||||
|
- Verify object size expectations |
||||
|
- Consider volume server scaling |
||||
|
|
||||
|
#### 5. High Decryption Overhead (SSE `copy` >> unencrypted `streamExec`) |
||||
|
**Symptom**: |
||||
|
``` |
||||
|
# Unencrypted: streamExec=30ms |
||||
|
# SSE-C: copy=85ms (283% overhead, should be ~20-30%) |
||||
|
``` |
||||
|
**Causes**: |
||||
|
- CPU lacking AES-NI instructions |
||||
|
- S3 API server CPU constrained |
||||
|
- Inefficient cipher implementation |
||||
|
|
||||
|
**Solutions**: |
||||
|
- Verify CPU has AES-NI support |
||||
|
- Scale S3 API server horizontally |
||||
|
- Review Go crypto performance |
||||
|
|
||||
|
## Comparison with Old Architecture |
||||
|
|
||||
|
### Before: Filer Proxy Path |
||||
|
``` |
||||
|
Total TTFB: ~75ms |
||||
|
├─ Entry Fetch: 8ms |
||||
|
├─ Proxy Setup: 19ms ⚠️ WASTED TIME |
||||
|
├─ Filer Proxy: 5ms ⚠️ EXTRA HOP |
||||
|
└─ Volume Stream: 43ms |
||||
|
``` |
||||
|
|
||||
|
### After: Direct Volume Reads |
||||
|
``` |
||||
|
Total TTFB: ~52ms (31% faster!) |
||||
|
├─ Conditional: 0.5ms |
||||
|
├─ Versioning: 8ms |
||||
|
├─ Entry Fetch: 1ms |
||||
|
└─ Stream: 42ms |
||||
|
├─ Chunk Resolve: 9ms |
||||
|
├─ Stream Prep: 2ms |
||||
|
└─ Stream Exec: 30ms ✅ DIRECT |
||||
|
``` |
||||
|
|
||||
|
**Improvements**: |
||||
|
- ✅ Eliminated 19ms proxy setup overhead |
||||
|
- ✅ Eliminated extra filer hop |
||||
|
- ✅ Reduced total TTFB by ~31% |
||||
|
- ✅ Detailed visibility into all stages |
||||
|
|
||||
|
## Integration with Load Testing |
||||
|
|
||||
|
### With warp |
||||
|
```bash |
||||
|
# Run load test |
||||
|
warp get --host localhost:8333 --access-key ... --secret-key ... \ |
||||
|
--obj.size 10MiB --duration 5m --concurrent 20 2>&1 | tee warp.log & |
||||
|
|
||||
|
# Monitor profiling in real-time |
||||
|
tail -f /path/to/s3.log | grep "GET TTFB PROFILE" |
||||
|
|
||||
|
# After test completes, analyze |
||||
|
grep "GET TTFB PROFILE" /path/to/s3.log > ttfb_results.txt |
||||
|
|
||||
|
# Calculate average per stage |
||||
|
echo "=== Conditional Headers ===" |
||||
|
grep "conditional=" ttfb_results.txt | grep -o "conditional=[^,]*" | \ |
||||
|
cut -d'=' -f2 | sed 's/ms//' | \ |
||||
|
awk '{sum+=$1; count++} END {print "Avg:", sum/count "ms"}' |
||||
|
|
||||
|
echo "=== Versioning Check ===" |
||||
|
grep "versioning=" ttfb_results.txt | grep -o "versioning=[^,]*" | \ |
||||
|
cut -d'=' -f2 | sed 's/ms//' | \ |
||||
|
awk '{sum+=$1; count++} END {print "Avg:", sum/count "ms"}' |
||||
|
|
||||
|
echo "=== Stream Time ===" |
||||
|
grep "stream=" ttfb_results.txt | grep -o "stream=[^,]*$" | \ |
||||
|
cut -d'=' -f2 | sed 's/ms//' | \ |
||||
|
awk '{sum+=$1; count++} END {print "Avg:", sum/count "ms"}' |
||||
|
``` |
||||
|
|
||||
|
### With s3-tests |
||||
|
```bash |
||||
|
# Run specific test with profiling |
||||
|
cd /path/to/s3-tests |
||||
|
tox -e py -- s3tests/functional/test_s3.py::test_object_read_not_exist -vv |
||||
|
|
||||
|
# Check profiling for that test |
||||
|
grep "GET TTFB PROFILE.*test-bucket" /tmp/s3.log |
||||
|
|
||||
|
# Example output: |
||||
|
# GET TTFB PROFILE test-bucket/nonexistent: total=12ms | conditional=0.3ms, versioning=8ms, entryFetch=3ms, stream=0ms |
||||
|
# (note: stream=0ms because object didn't exist) |
||||
|
``` |
||||
|
|
||||
|
## Real-World Example Analysis |
||||
|
|
||||
|
### Example: Slow Overall Performance |
||||
|
``` |
||||
|
GET TTFB PROFILE videos/movie.mp4: total=245ms | conditional=0.5ms, versioning=2ms, entryFetch=1ms, stream=241ms |
||||
|
└─ streamFromVolumeServers: total=241ms, rangeParse=0.3ms, headerSet=0.4ms, chunkResolve=8ms, streamPrep=2ms, streamExec=230ms |
||||
|
``` |
||||
|
|
||||
|
**Analysis**: |
||||
|
- ✅ Conditional/Versioning/Entry fast (good) |
||||
|
- ⚠️ `streamExec` = 230ms for unknown size object |
||||
|
- **Action**: Check object size. If 10MB, this is slow (expected ~30ms). If 100MB, this is reasonable. |
||||
|
|
||||
|
### Example: Versioning Bottleneck |
||||
|
``` |
||||
|
GET TTFB PROFILE docs/policy.pdf: total=95ms | conditional=0.4ms, versioning=78ms, entryFetch=1ms, stream=15ms |
||||
|
└─ streamFromVolumeServers: total=15ms, rangeParse=0.2ms, headerSet=0.3ms, chunkResolve=4ms, streamPrep=1ms, streamExec=9ms |
||||
|
``` |
||||
|
|
||||
|
**Analysis**: |
||||
|
- ⚠️ `versioning` = 78ms (very slow!) |
||||
|
- ✅ Stream fast (15ms) |
||||
|
- **Action**: Investigate filer versioning performance. Possibly many versions or slow filer. |
||||
|
|
||||
|
### Example: Perfect Performance |
||||
|
``` |
||||
|
GET TTFB PROFILE images/logo.png: total=18ms | conditional=0.2ms, versioning=0.1ms, entryFetch=0.5ms, stream=17ms |
||||
|
└─ streamFromVolumeServers: total=17ms, rangeParse=0.1ms, headerSet=0.2ms, chunkResolve=5ms, streamPrep=1ms, streamExec=10ms |
||||
|
``` |
||||
|
|
||||
|
**Analysis**: |
||||
|
- ✅ All stages optimal |
||||
|
- ✅ Small object streamed quickly |
||||
|
- ✅ No bottlenecks |
||||
|
- **Result**: Operating at peak performance! |
||||
|
|
||||
|
## Summary |
||||
|
|
||||
|
**Integrated TTFB profiling provides**: |
||||
|
- ✅ Complete visibility from request entry to completion |
||||
|
- ✅ High-level breakdown (conditional, versioning, entry, stream) |
||||
|
- ✅ Low-level breakdown (parse, resolve, prep, exec/decrypt) |
||||
|
- ✅ SSE-specific metrics (key validation, decryption) |
||||
|
- ✅ Easy identification of bottlenecks |
||||
|
- ✅ Hierarchical output for clarity |
||||
|
|
||||
|
**Enable with**: `-v=2` flag |
||||
|
**View with**: `grep "GET TTFB PROFILE" /path/to/logs` |
||||
|
**Analyze**: Use provided scripts to calculate statistics per stage |
||||
|
|
||||
Write
Preview
Loading…
Cancel
Save
Reference in new issue