Performance Benchmarks
Real-world performance metrics and optimization techniques to maximize the efficiency of your ETLs.
Reference Benchmarks
Baseline Performance by Connection Type
| Connection | Records/sec | Avg latency | Peak memory | Use cases |
|---|---|---|---|---|
| MySQL | 850-1,200 | 45ms | 128MB | OLTP, transactional |
| PostgreSQL | 750-1,000 | 52ms | 145MB | Analytics, reporting |
| Fracttal API | 320-480 | 180ms | 89MB | Maintenance data |
| Google Sheets | 45-85 | 1,200ms | 67MB | Manual reports |
| HTTP REST | 200-800 | 250ms | 95MB | External integrations |
| Email SMTP | 15-25 | 2,500ms | 45MB | Notifications |
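As a rough planning aid, the throughput ranges in the table can be turned into execution-time estimates before a job is scheduled. A minimal sketch (the figures come from the table above; the helper itself is illustrative, not part of the product):

```python
# Throughput ranges (records/sec) taken from the benchmark table above.
THROUGHPUT = {
    "mysql": (850, 1200),
    "postgresql": (750, 1000),
    "fracttal_api": (320, 480),
    "google_sheets": (45, 85),
    "http_rest": (200, 800),
    "email_smtp": (15, 25),
}

def estimate_seconds(connection: str, records: int) -> tuple[float, float]:
    """Return (best_case, worst_case) execution time in seconds."""
    low, high = THROUGHPUT[connection]
    return records / high, records / low

best, worst = estimate_seconds("mysql", 50_000)
print(f"50K records over MySQL: {best:.0f}s to {worst:.0f}s")
```

Estimates like these assume the source is the bottleneck; transform complexity and target latency can dominate, as the case studies below show.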
Performance Factors
```mermaid
graph LR
    A[ETL Performance] --> B[Source Performance]
    A --> C[Transform Complexity]
    A --> D[Target Performance]
    B --> E[Query Optimization]
    C --> F[JSON Logic Steps]
    D --> G[Network Latency]
    D --> D1[Batch processing]
    D --> D2[Connection pooling]
    D --> D3[Rate limiting]

    classDef main fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#0d47a1
    classDef factor fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#4a148c
    classDef metric fill:#e8f5e8,stroke:#388e3c,stroke-width:2px,color:#1b5e20

    class A main
    class B,C,D factor
    class E,F,G metric
```
Real-World Case Studies
Case 1: Order Synchronization (DB → ERP)
Scenario: 50K orders per day from MySQL to an external API
Before optimization:

```json
{
  "benchmark": {
    "records": 50000,
    "execution_time": "18m 45s",
    "throughput": "44 records/sec",
    "memory_peak": "2.1GB",
    "cpu_avg": "78%",
    "issues": [
      "N+1 queries",
      "No batch processing",
      "Full in-memory transform"
    ]
  }
}
```
After optimization:

```json
{
  "config": {
    "source": {
      "parameters": {
        "query_string": {
          "table": "orders",
          "where": "status IN ('completed','shipped') AND updated_at >= CURRENT_DATE - INTERVAL 1 DAY",
          "index_hint": "USE INDEX (idx_status_updated)",
          "limit": 10000
        }
      }
    },
    "transform": {
      "streaming": true,
      "chunk_size": 500
    },
    "target": {
      "parameters": {
        "batch": true,
        "batch_size": 100,
        "parallel_workers": 4
      }
    }
  },
  "benchmark": {
    "records": 50000,
    "execution_time": "3m 12s",
    "throughput": "260 records/sec",
    "memory_peak": "245MB",
    "cpu_avg": "32%",
    "improvement": "83% faster, 88% less memory"
  }
}
```
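The before/after numbers above come down to two changes: reading in pages instead of one giant result set, and writing in batches instead of row by row. A minimal sketch of that loop, with hypothetical `fetch_page`/`post_batch` callables standing in for the real source and target connectors:

```python
from typing import Callable

def run_batched_etl(
    fetch_page: Callable[[int, int], list[dict]],  # (limit, offset) -> rows
    post_batch: Callable[[list[dict]], None],      # sends one batch to the target
    page_size: int = 10_000,
    batch_size: int = 100,
) -> int:
    """Stream pages from the source and flush fixed-size batches to the target."""
    sent = 0
    offset = 0
    buffer: list[dict] = []
    while True:
        page = fetch_page(page_size, offset)
        if not page:
            break
        offset += len(page)
        for row in page:
            buffer.append(row)
            if len(buffer) == batch_size:
                post_batch(buffer)
                sent += len(buffer)
                buffer = []  # start a fresh batch; memory stays bounded
    if buffer:  # flush the tail batch
        post_batch(buffer)
        sent += len(buffer)
    return sent
```

With `page_size=10000` and `batch_size=100` this mirrors the `limit` and `batch_size` values in the config above; memory is bounded by one page plus one batch, never the full 50K rows.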
Case 2: Asset Report (Fracttal → Google Sheets)
Scenario: 15K assets for a monthly report
Before optimization:

```json
{
  "issues": {
    "execution_time": "25m 14s",
    "memory_usage": "1.8GB",
    "api_calls": 15000,
    "rate_limit_hits": 47,
    "errors": [
      "Google Sheets quota exceeded",
      "Memory overflow",
      "Individual API calls"
    ]
  }
}
```
After optimization:

```json
{
  "optimizations": {
    "source": {
      "feature": "list_assets_batch",
      "parameters": {
        "fields": "id,name,code,location,status,last_maintenance",
        "batch_size": 2000,
        "apply_jsonpath": "$.data[*]"
      }
    },
    "target": {
      "feature": "batch_update",
      "parameters": {
        "range": "Assets!A2:F15001",
        "value_input_option": "RAW",
        "batch_size": 1000
      }
    }
  },
  "results": {
    "execution_time": "4m 38s",
    "memory_usage": "185MB",
    "api_calls": 23,
    "improvement": "82% faster, 90% less memory, 99% fewer API calls"
  }
}
```
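The drop from 15,000 API calls to 23 is pure batching arithmetic: 15K assets read in pages of 2,000 plus written in blocks of 1,000. A quick check (illustrative only):

```python
import math

assets = 15_000
read_calls = math.ceil(assets / 2_000)    # batch_size on the source side
write_calls = math.ceil(assets / 1_000)   # batch_size on the Sheets target
print(read_calls + write_calls)  # 8 + 15 = 23 total calls, down from 15,000
```

Fewer calls also means far fewer chances to hit the rate-limit errors listed in the "before" block.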
Optimization Techniques
1. Source Optimization
Recommended practices
```json
{
  "good_practices": {
    "use_indexes": {
      "query_string": {
        "table": "work_orders",
        "where": "status_id = 3 AND creation_date >= '2024-01-01'",
        "index_hint": "USE INDEX (idx_status_creation)"
      }
    },
    "limit_fields": {
      "query_string": {
        "fields": "id,code,description,creation_date",
        "table": "work_orders"
      }
    },
    "pagination": {
      "query_string": {
        "limit": 5000,
        "offset": 0
      }
    }
  }
}
```
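One caveat on the pagination pattern above: `OFFSET` cost grows with the offset, because the database still walks past the skipped rows. For large tables, keyset pagination on an indexed column keeps every page equally cheap. A sketch, where the hypothetical `fetch_after(last_seen, limit)` callable stands in for a query such as `SELECT ... FROM work_orders WHERE id > :last_seen ORDER BY id LIMIT :limit`:

```python
def keyset_pages(fetch_after, page_size: int = 5000, key: str = "id"):
    """Keyset pagination: request rows with key > last_seen instead of a growing OFFSET."""
    last_seen = 0
    while True:
        rows = fetch_after(last_seen, page_size)
        if not rows:
            return
        last_seen = rows[-1][key]  # rows come back ordered by the key
        yield rows
```

This requires a monotonically increasing, indexed key (an auto-increment id works well), which the `idx_status_creation`-style indexes above already hint at.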
Anti-patterns to avoid
```json
{
  "avoid": {
    "select_all": "SELECT * FROM large_table",
    "no_where": "SELECT id FROM users",
    "no_indexes": "WHERE UPPER(name) LIKE '%SEARCH%'",
    "large_results": "SELECT * FROM orders WHERE year = 2024"
  }
}
```
2. Transform Optimization
Efficient JSON Logic
```json
{
  "efficient": {
    "single_pass": {
      "map": [
        {"var": ""},
        {
          "merge": [
            {"var": ""},
            {
              "formatted_date": {"format_date": [{"var": "date"}, "%Y-%m-%d", "%d/%m/%Y"]},
              "clean_name": {"trim": [{"upper": [{"var": "name"}]}]},
              "calculated_total": {"+": [{"var": "subtotal"}, {"*": [{"var": "subtotal"}, 0.16]}]}
            }
          ]
        }
      ]
    }
  },
  "inefficient": {
    "multiple_passes": [
      {"map": [{"var": ""}, {"merge": [{"var": ""}, {"formatted_date": "..."}]}]},
      {"map": [{"var": ""}, {"merge": [{"var": ""}, {"clean_name": "..."}]}]},
      {"map": [{"var": ""}, {"merge": [{"var": ""}, {"calculated_total": "..."}]}]}
    ]
  }
}
```
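The same single-pass principle, expressed in plain Python: compute all derived fields in one traversal rather than re-scanning the dataset once per field. The field names mirror the JSON Logic example above; the function itself is illustrative:

```python
from datetime import datetime

def transform(rows: list[dict]) -> list[dict]:
    """One pass over the data, all derived fields computed together."""
    out = []
    for row in rows:
        out.append({
            **row,
            # %Y-%m-%d -> %d/%m/%Y, as in the format_date step above
            "formatted_date": datetime.strptime(row["date"], "%Y-%m-%d").strftime("%d/%m/%Y"),
            "clean_name": row["name"].upper().strip(),
            "calculated_total": row["subtotal"] * 1.16,  # subtotal + 16% tax
        })
    return out
```

Three separate map passes would read every row three times and allocate two intermediate result sets; the single pass reads each row once.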
Memory Management
```json
{
  "streaming_transform": {
    "transform": {
      "streaming": true,
      "chunk_size": 1000,
      "parallel_chunks": true
    }
  },
  "garbage_collection": {
    "transform": {
      "gc_frequency": 100,
      "memory_limit": "512MB"
    }
  }
}
```
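Streaming with a fixed `chunk_size` keeps memory proportional to the chunk, not the dataset. A generator-based sketch of what the streaming flag amounts to under the hood (illustrative, not the product's actual implementation):

```python
from typing import Iterable, Iterator

def chunked(rows: Iterable[dict], chunk_size: int = 1000) -> Iterator[list[dict]]:
    """Group an arbitrary row stream into fixed-size chunks."""
    chunk: list[dict] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []  # drop the reference so the old chunk can be collected
    if chunk:
        yield chunk  # tail chunk

# Only one chunk is resident at a time, regardless of total row count.
```

This is why the Case 1 rewrite dropped peak memory from 2.1GB to 245MB: the full result set never exists in memory at once.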
3. Target Optimization
Batch Processing
```json
{
  "database": {
    "parameters": {
      "batch": true,
      "batch_size": 500,
      "transaction_isolation": "READ_COMMITTED",
      "connection_pool": {
        "size": 5,
        "max_overflow": 10
      }
    }
  },
  "api": {
    "parameters": {
      "batch": true,
      "batch_size": 50,
      "parallel_workers": 3,
      "rate_limit": {
        "requests_per_minute": 120,
        "burst_limit": 20
      }
    }
  }
}
```
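On the database side, `batch: true` boils down to `executemany` inside a transaction instead of one statement and one commit per row. A runnable sqlite3 sketch of the idea (the table name and columns are hypothetical; the product's batch flag would do this internally):

```python
import sqlite3

def insert_batched(conn: sqlite3.Connection, rows: list[tuple], batch_size: int = 500) -> None:
    """Insert rows in slices of batch_size, one transaction per batch."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        with conn:  # commits the whole batch at once
            conn.executemany("INSERT INTO orders (id, total) VALUES (?, ?)", batch)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
insert_batched(conn, [(i, i * 1.5) for i in range(1, 1201)], batch_size=500)
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1200
```

One commit per 500 rows instead of per row removes most of the fsync and round-trip overhead, which is where the throughput gains in the case studies come from.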
Connection Pooling
```json
{
  "connection_optimization": {
    "pool_size": 5,
    "max_overflow": 10,
    "pool_timeout": 30,
    "pool_recycle": 3600,
    "keepalive": true
  }
}
```
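The pool parameters above map onto a simple mechanism: a fixed set of reusable connections that are handed out and returned rather than opened per request. A toy pool to make `pool_size` and `pool_timeout` concrete (real deployments should use the driver's or ORM's built-in pool, which also handles overflow and recycling):

```python
import queue

class ConnectionPool:
    """Minimal fixed-size pool: acquire blocks up to pool_timeout seconds."""

    def __init__(self, factory, pool_size: int = 5, pool_timeout: float = 30.0):
        self._pool: queue.Queue = queue.Queue(maxsize=pool_size)
        self._timeout = pool_timeout
        for _ in range(pool_size):
            self._pool.put(factory())  # pre-open pool_size connections

    def acquire(self):
        # Raises queue.Empty if the pool stays exhausted past the timeout.
        return self._pool.get(timeout=self._timeout)

    def release(self, conn) -> None:
        self._pool.put(conn)
```

`max_overflow` and `pool_recycle` add temporary extra connections and periodic reconnection on top of this core; they are omitted here to keep the sketch short.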
Monitoring and Metrics
Key KPIs
```json
{
  "performance_kpis": {
    "throughput": {
      "metric": "records_per_second",
      "target": "> 100",
      "alert_threshold": "< 50"
    },
    "latency": {
      "metric": "avg_processing_time_ms",
      "target": "< 5000",
      "alert_threshold": "> 10000"
    },
    "memory": {
      "metric": "peak_memory_mb",
      "target": "< 500",
      "alert_threshold": "> 1000"
    },
    "error_rate": {
      "metric": "failed_records_percentage",
      "target": "< 1%",
      "alert_threshold": "> 5%"
    }
  }
}
```
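These KPIs can be checked mechanically after every run. A sketch that flags metrics crossing the alert thresholds defined above (threshold values copied from the JSON block; the helper itself is illustrative):

```python
# (alert bound, direction in which the alert fires), per the KPI table above.
ALERTS = {
    "records_per_second": (50, "below"),
    "avg_processing_time_ms": (10_000, "above"),
    "peak_memory_mb": (1_000, "above"),
    "failed_records_percentage": (5.0, "above"),
}

def check_kpis(run: dict) -> list[str]:
    """Return the names of metrics that should raise an alert."""
    fired = []
    for metric, (bound, direction) in ALERTS.items():
        value = run.get(metric)
        if value is None:
            continue  # metric not reported for this run
        if (direction == "below" and value < bound) or (
            direction == "above" and value > bound
        ):
            fired.append(metric)
    return fired
```

Wiring this into the scheduler turns the static KPI table into an actual alerting rule.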
Example Dashboard
```json
{
  "etl_dashboard": {
    "execution_stats": {
      "total_records": 45678,
      "processed_successfully": 45421,
      "failed_records": 257,
      "execution_time": "4m 23s",
      "throughput": "174 records/sec"
    },
    "resource_usage": {
      "peak_memory": "328MB",
      "avg_cpu": "23%",
      "network_io": "45MB",
      "disk_io": "12MB"
    },
    "bottlenecks": [
      {
        "component": "target_api",
        "issue": "rate_limiting",
        "impact": "15% slowdown"
      }
    ]
  }
}
```
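The derived figures in the dashboard (throughput, error rate) follow directly from the raw counters. Verifying them with the numbers from the example above:

```python
total = 45_678
failed = 257
seconds = 4 * 60 + 23  # "4m 23s" = 263 seconds

throughput = total / seconds          # ~174 records/sec, as shown
error_rate = failed / total * 100     # ~0.56%, under the 1% KPI target
print(f"{throughput:.0f} records/sec, {error_rate:.2f}% errors")
```

Cross-checking derived metrics against raw counters like this is a cheap sanity test for any monitoring pipeline.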
Performance Tuning Checklist
Pre-execution
- Source queries optimized with indexes
- WHERE filters reduce data to the minimum needed
- SELECT specific fields, never SELECT *
- Batch sizes configured appropriately
- Connection pools sized correctly
During execution
- Memory usage monitored, < 500MB
- CPU usage balanced, < 70%
- Network latency optimized
- Error rate < 1%
- Throughput within targets
Post-execution
- Logs analyzed for bottlenecks
- Metrics compared against benchmarks
- Alerts configured for degradation
- Optimizations documented for future runs
Benchmarks by Data Volume
Small Scale (< 10K records)
| Metric | Target | Typical | Excellent |
|---|---|---|---|
| Execution Time | < 2 min | 30-90s | < 30s |
| Throughput | > 100/s | 150-300/s | > 500/s |
| Memory | < 200MB | 50-150MB | < 100MB |
| CPU | < 50% | 20-40% | < 30% |
Medium Scale (10K - 100K records)
| Metric | Target | Typical | Excellent |
|---|---|---|---|
| Execution Time | < 15 min | 5-12 min | < 5 min |
| Throughput | > 150/s | 200-400/s | > 600/s |
| Memory | < 500MB | 200-400MB | < 300MB |
| CPU | < 60% | 30-50% | < 40% |
Large Scale (> 100K records)
| Metric | Target | Typical | Excellent |
|---|---|---|---|
| Execution Time | < 60 min | 20-45 min | < 20 min |
| Throughput | > 200/s | 300-600/s | > 800/s |
| Memory | < 1GB | 400-800MB | < 600MB |
| CPU | < 70% | 40-60% | < 50% |
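The tiers above can also be applied programmatically, so a run is judged against the targets for its own scale. A small helper with the bounds copied from the tables (illustrative only):

```python
def scale_tier(records: int) -> str:
    """Map a record count to the benchmark tier used above."""
    if records < 10_000:
        return "small"
    if records <= 100_000:
        return "medium"
    return "large"

# Minimum acceptable throughput (records/sec) per tier, from the "Target" column.
THROUGHPUT_TARGET = {"small": 100, "medium": 150, "large": 200}

def meets_target(records: int, throughput: float) -> bool:
    return throughput > THROUGHPUT_TARGET[scale_tier(records)]
```

For example, the optimized Case 1 run (50K records at 260 records/sec) lands in the medium tier and clears its 150 records/sec target.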
Additional Resources
- Troubleshooting Guide - resolving performance problems
- Advanced Configuration - advanced settings
- Monitoring Setup - setting up full monitoring
- Scaling Strategies - scaling approaches