Performance Benchmarks

Real-world performance metrics and optimization techniques to maximize the efficiency of your ETLs.

Reference Benchmarks

Baseline Performance by Connection Type

| Connection | Records/sec | Avg latency | Peak memory | Use cases |
|---------------|-------------|-------------|-------------|------------------------|
| MySQL | 850-1,200 | 45 ms | 128 MB | OLTP, transactional |
| PostgreSQL | 750-1,000 | 52 ms | 145 MB | Analytics, reporting |
| Fracttal API | 320-480 | 180 ms | 89 MB | Maintenance data |
| Google Sheets | 45-85 | 1,200 ms | 67 MB | Manual reports |
| HTTP REST | 200-800 | 250 ms | 95 MB | External integrations |
| Email SMTP | 15-25 | 2,500 ms | 45 MB | Notifications |
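
These baselines support rough capacity planning: dividing the expected record volume by the low end of a connection's throughput range gives a conservative runtime floor. A quick sketch of the arithmetic, using figures from the table above (the helper name is illustrative):

```python
def estimate_runtime_seconds(records: int, throughput_low: float) -> float:
    """Conservative runtime estimate: record count / worst-case throughput."""
    return records / throughput_low

# 50,000 records over MySQL at the low end of its 850-1,200 records/sec range
print(estimate_runtime_seconds(50_000, 850))  # ~58.8 s of pure transfer time
# the same volume through Google Sheets at its 45 records/sec low end
print(estimate_runtime_seconds(50_000, 45))   # ~1,111 s, i.e. ~18.5 min
```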

Performance Factors

```mermaid
graph LR
    A[ETL Performance] --> B[Source Performance]
    A --> C[Transform Complexity]
    A --> D[Target Performance]
    B --> E[Query Optimization]
    C --> F[JSON Logic Steps]
    D --> G[Network Latency]
    D --> D1[Batch processing]
    D --> D2[Connection pooling]
    D --> D3[Rate limiting]

    classDef main fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#0d47a1
    classDef factor fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#4a148c
    classDef metric fill:#e8f5e8,stroke:#388e3c,stroke-width:2px,color:#1b5e20

    class A main
    class B,C,D factor
    class E,F,G,D1,D2,D3 metric
```


Real-World Case Studies

Case 1: Order Synchronization (DB → ERP)

Scenario: 50K orders per day from MySQL to an external API

Before optimization:

```json
{
  "benchmark": {
    "records": 50000,
    "execution_time": "18m 45s",
    "throughput": "44 records/sec",
    "memory_peak": "2.1GB",
    "cpu_avg": "78%",
    "issues": [
      "N+1 queries",
      "No batch processing",
      "Full in-memory transform"
    ]
  }
}
```

After optimization:

```json
{
  "config": {
    "source": {
      "parameters": {
        "query_string": {
          "table": "orders",
          "where": "status IN ('completed','shipped') AND updated_at >= CURRENT_DATE - INTERVAL 1 DAY",
          "index_hint": "USE INDEX (idx_status_updated)",
          "limit": 10000
        }
      }
    },
    "transform": {
      "streaming": true,
      "chunk_size": 500
    },
    "target": {
      "parameters": {
        "batch": true,
        "batch_size": 100,
        "parallel_workers": 4
      }
    }
  },
  "benchmark": {
    "records": 50000,
    "execution_time": "3m 12s",
    "throughput": "260 records/sec",
    "memory_peak": "245MB",
    "cpu_avg": "32%",
    "improvement": "83% faster, 88% less memory"
  }
}
```
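
Most of the memory win comes from replacing the full in-memory transform with streaming over fixed-size chunks, so peak usage is bounded by chunk_size rather than by the total record count. A minimal sketch of the pattern, not the platform's internal implementation (transform_chunk and send_batch are hypothetical stand-ins):

```python
from typing import Iterable, Iterator

def chunked(rows: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield fixed-size chunks so only `size` records sit in memory at once."""
    chunk: list[dict] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

# transform_chunk and send_batch are hypothetical stand-ins for the
# transform step and the batched target writer.
def run_streaming(rows, transform_chunk, send_batch, chunk_size=500):
    for chunk in chunked(rows, chunk_size):
        send_batch(transform_chunk(chunk))
```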

Case 2: Asset Report (Fracttal → Google Sheets)

Scenario: 15K assets for a monthly report

Before optimization:

```json
{
  "issues": {
    "execution_time": "25m 14s",
    "memory_usage": "1.8GB",
    "api_calls": 15000,
    "rate_limit_hits": 47,
    "errors": [
      "Google Sheets quota exceeded",
      "Memory overflow",
      "Individual API calls"
    ]
  }
}
```

After optimization:

```json
{
  "optimizations": {
    "source": {
      "feature": "list_assets_batch",
      "parameters": {
        "fields": "id,name,code,location,status,last_maintenance",
        "batch_size": 2000,
        "apply_jsonpath": "$.data[*]"
      }
    },
    "target": {
      "feature": "batch_update",
      "parameters": {
        "range": "Assets!A2:F15001",
        "value_input_option": "RAW",
        "batch_size": 1000
      }
    }
  },
  "results": {
    "execution_time": "4m 38s",
    "memory_usage": "185MB",
    "api_calls": 23,
    "improvement": "82% faster, 90% less memory, 99% fewer API calls"
  }
}
```
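
The API call count is worth verifying: reading 15,000 assets in source batches of 2,000 takes ceil(15000 / 2000) = 8 calls, and writing them in target batches of 1,000 takes ceil(15000 / 1000) = 15 calls, which accounts for the 23 total calls reported, down from one call per asset.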


Optimization Techniques

1. Source Optimization

Recommended practices

```json
{
  "good_practices": {
    "use_indexes": {
      "query_string": {
        "table": "work_orders",
        "where": "status_id = 3 AND creation_date >= '2024-01-01'",
        "index_hint": "USE INDEX (idx_status_creation)"
      }
    },
    "limit_fields": {
      "query_string": {
        "fields": "id,code,description,creation_date",
        "table": "work_orders"
      }
    },
    "pagination": {
      "query_string": {
        "limit": 5000,
        "offset": 0
      }
    }
  }
}
```
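
The pagination block caps how much a single query can materialize; advancing the offset between requests keeps each round trip bounded. A hypothetical driver loop for that pattern (fetch_page stands in for whatever executes the query_string):

```python
# Hypothetical sketch: drive limit/offset pagination so that no single
# query materializes the full result set.
def fetch_all(fetch_page, page_size=5000):
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)  # one bounded query
        if not page:
            break
        yield from page
        offset += page_size
```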

Anti-patterns to avoid

```json
{
  "avoid": {
    "select_all": "SELECT * FROM large_table",
    "no_where": "SELECT id FROM users",
    "no_indexes": "WHERE UPPER(name) LIKE '%SEARCH%'",
    "large_results": "SELECT * FROM orders WHERE year = 2024"
  }
}
```
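
The third anti-pattern deserves a note: wrapping an indexed column in a function, as in WHERE UPPER(name) LIKE '%SEARCH%', makes the predicate non-sargable, so an index on name cannot be used, and the leading wildcard forces a full scan regardless. Where case-insensitive search is genuinely needed, a case-insensitive collation, a functional index, or a full-text index keeps the query index-friendly.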

2. Transform Optimization

Efficient JSON Logic

```json
{
  "efficient": {
    "single_pass": {
      "map": [
        {"var": ""},
        {
          "merge": [
            {"var": ""},
            {
              "formatted_date": {"format_date": [{"var": "date"}, "%Y-%m-%d", "%d/%m/%Y"]},
              "clean_name": {"trim": [{"upper": [{"var": "name"}]}]},
              "calculated_total": {"+": [{"var": "subtotal"}, {"*": [{"var": "subtotal"}, 0.16]}]}
            }
          ]
        }
      ]
    }
  },
  "inefficient": {
    "multiple_passes": [
      {"map": [{"var": ""}, {"merge": [{"var": ""}, {"formatted_date": "..."}]}]},
      {"map": [{"var": ""}, {"merge": [{"var": ""}, {"clean_name": "..."}]}]},
      {"map": [{"var": ""}, {"merge": [{"var": ""}, {"calculated_total": "..."}]}]}
    ]
  }
}
```
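
The difference is one traversal versus three: the single-pass form computes every derived field while the record is already in hand, while the multi-pass form re-walks the entire dataset once per field. An illustrative Python equivalent of both shapes (using the same uppercase-trim and 16% total calculation as above):

```python
# Single pass: every derived field is computed in one traversal.
def enrich_single_pass(records):
    return [
        {**r,
         "clean_name": r["name"].upper().strip(),
         "calculated_total": r["subtotal"] * 1.16}
        for r in records
    ]

# Multiple passes: the whole list is rebuilt once per derived field.
def enrich_multi_pass(records):
    records = [{**r, "clean_name": r["name"].upper().strip()} for r in records]
    records = [{**r, "calculated_total": r["subtotal"] * 1.16} for r in records]
    return records
```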

Memory Management

```json
{
  "streaming_transform": {
    "transform": {
      "streaming": true,
      "chunk_size": 1000,
      "parallel_chunks": true
    }
  },
  "garbage_collection": {
    "transform": {
      "gc_frequency": 100,
      "memory_limit": "512MB"
    }
  }
}
```

3. Target Optimization

Batch Processing

```json
{
  "database": {
    "parameters": {
      "batch": true,
      "batch_size": 500,
      "transaction_isolation": "READ_COMMITTED",
      "connection_pool": {
        "size": 5,
        "max_overflow": 10
      }
    }
  },
  "api": {
    "parameters": {
      "batch": true,
      "batch_size": 50,
      "parallel_workers": 3,
      "rate_limit": {
        "requests_per_minute": 120,
        "burst_limit": 20
      }
    }
  }
}
```
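
The rate_limit pair, a sustained requests_per_minute plus a burst_limit, is the classic shape of a token bucket. A minimal illustrative sketch of that mechanism, not the platform's actual limiter:

```python
import time

class TokenBucket:
    """Allows `burst` immediate requests, refilling at `rate_per_min` per minute."""
    def __init__(self, rate_per_min: float, burst: int):
        self.rate = rate_per_min / 60.0   # tokens added per second
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def acquire(self) -> None:
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)  # wait for the next token

# limiter = TokenBucket(rate_per_min=120, burst=20)
# limiter.acquire()  # call before each API request
```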

Connection Pooling

```json
{
  "connection_optimization": {
    "pool_size": 5,
    "max_overflow": 10,
    "pool_timeout": 30,
    "pool_recycle": 3600,
    "keepalive": true
  }
}
```
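
These parameter names match SQLAlchemy's connection-pool options, so if the engine is SQLAlchemy-based (an assumption, not confirmed by this page) the equivalent direct configuration would be:

```python
from sqlalchemy import create_engine

# Assumes a SQLAlchemy-backed connection; pool_recycle=3600 discards
# connections older than an hour, avoiding "MySQL server has gone away".
engine = create_engine(
    "mysql+pymysql://user:password@host/db",  # placeholder DSN
    pool_size=5,
    max_overflow=10,
    pool_timeout=30,
    pool_recycle=3600,
    pool_pre_ping=True,  # cheap liveness check before each checkout
)
```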

Monitoring and Metrics

Key KPIs

```json
{
  "performance_kpis": {
    "throughput": {
      "metric": "records_per_second",
      "target": "> 100",
      "alert_threshold": "< 50"
    },
    "latency": {
      "metric": "avg_processing_time_ms",
      "target": "< 5000",
      "alert_threshold": "> 10000"
    },
    "memory": {
      "metric": "peak_memory_mb",
      "target": "< 500",
      "alert_threshold": "> 1000"
    },
    "error_rate": {
      "metric": "failed_records_percentage",
      "target": "< 1%",
      "alert_threshold": "> 5%"
    }
  }
}
```
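
Thresholds like these are easy to enforce mechanically after each run; a small illustrative check (the stat keys follow the metric names in the KPI block above):

```python
# Illustrative post-run check against the alert thresholds above.
def check_kpis(stats: dict) -> list[str]:
    alerts = []
    if stats["records_per_second"] < 50:
        alerts.append("throughput below 50 records/sec")
    if stats["avg_processing_time_ms"] > 10_000:
        alerts.append("average latency above 10 s")
    if stats["peak_memory_mb"] > 1_000:
        alerts.append("peak memory above 1 GB")
    if stats["failed_records_percentage"] > 5:
        alerts.append("error rate above 5%")
    return alerts
```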

Example Dashboard

```json
{
  "etl_dashboard": {
    "execution_stats": {
      "total_records": 45678,
      "processed_successfully": 45421,
      "failed_records": 257,
      "execution_time": "4m 23s",
      "throughput": "174 records/sec"
    },
    "resource_usage": {
      "peak_memory": "328MB",
      "avg_cpu": "23%",
      "network_io": "45MB",
      "disk_io": "12MB"
    },
    "bottlenecks": [
      {
        "component": "target_api",
        "issue": "rate_limiting",
        "impact": "15% slowdown"
      }
    ]
  }
}
```

Performance Tuning Checklist

Pre-execution

- Source queries optimized with indexes
- WHERE filters reduce data to the minimum needed
- Specific SELECT fields, never SELECT *
- Batch sizes configured appropriately
- Connection pools sized correctly

During execution

- Memory usage monitored (< 500 MB)
- CPU usage balanced (< 70%)
- Network latency optimized
- Error rate < 1%
- Throughput within targets

Post-execution

- Logs analyzed for bottlenecks
- Metrics compared against benchmarks
- Alerts configured for degradation
- Optimizations documented for future runs

Benchmarks by Data Volume

Small Scale (< 10K records)

| Metric | Target | Typical | Excellent |
|----------------|----------|------------|-----------|
| Execution time | < 2 min | 30-90 s | < 30 s |
| Throughput | > 100/s | 150-300/s | > 500/s |
| Memory | < 200 MB | 50-150 MB | < 100 MB |
| CPU | < 50% | 20-40% | < 30% |

Medium Scale (10K - 100K records)

| Metric | Target | Typical | Excellent |
|----------------|----------|------------|-----------|
| Execution time | < 15 min | 5-12 min | < 5 min |
| Throughput | > 150/s | 200-400/s | > 600/s |
| Memory | < 500 MB | 200-400 MB | < 300 MB |
| CPU | < 60% | 30-50% | < 40% |

Large Scale (> 100K records)

| Metric | Target | Typical | Excellent |
|----------------|----------|------------|-----------|
| Execution time | < 60 min | 20-45 min | < 20 min |
| Throughput | > 200/s | 300-600/s | > 800/s |
| Memory | < 1 GB | 400-800 MB | < 600 MB |
| CPU | < 70% | 40-60% | < 50% |

Additional Resources