
Troubleshooting Hub

A complete guide to diagnosing and resolving common issues in Fracttal ETL Hub.

Quick Diagnosis

What's your problem?

  • Connection Issues: timeouts, authentication failures, refused connections
  • Data Errors: invalid schemas, failed transformations, incorrect data types
  • Performance Issues: slow ETLs, timeouts, high memory or CPU usage
  • JSON Configuration: syntax errors, missing parameters, failed validation


Connection Issues

Connection Timeout

Symptoms:

{
  "error": "Connection timeout after 30 seconds",
  "code": "TIMEOUT_ERROR",
  "connection": "my_database"
}

Common Causes:

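Cause                 | Diagnosis                | Solution
--------------------- | ------------------------ | -------------------------------------------------
Firewall blocking     | No network connectivity  | Open the necessary ports
Incorrect credentials | Authentication error     | Verify the username and password
Overloaded server     | Slow server response     | Increase the timeout or run during off-peak hours
Slow network          | High latency             | Optimize the query or use batch processing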

Solutions:

Increase the timeouts:

{
  "source": {
    "id_type": 1,
    "host": "your-database.com",
    "timeout": 60,
    "connection_timeout": 30,
    "read_timeout": 120
  }
}

Use connection pooling:

{
  "source": {
    "id_type": 1,
    "pool_size": 5,
    "max_overflow": 10,
    "pool_timeout": 30
  }
}

Retry with backoff:

{
  "source": {
    "id_type": 1,
    "retry_attempts": 3,
    "retry_delay": 5,
    "backoff_factor": 2
  }
}
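The retry settings above can be reproduced outside the Hub to test a flaky connection. This is a minimal Python sketch. It assumes that `retry_attempts` counts retries after the first try and that the wait grows as `retry_delay * backoff_factor ** n`, which gives 5, 10 and 20 seconds for the values shown. The Hub's exact semantics are not documented here, and `backoff_delays` and `call_with_retries` are hypothetical helpers.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(retry_attempts: int, retry_delay: float, backoff_factor: float) -> List[float]:
    """Wait before each retry: retry_delay * backoff_factor ** n (assumed semantics)."""
    return [retry_delay * backoff_factor ** n for n in range(retry_attempts)]


def call_with_retries(
    connect: Callable[[], T],
    retry_attempts: int = 3,
    retry_delay: float = 5,
    backoff_factor: float = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect(); on TimeoutError or ConnectionError, wait and try again.

    Makes one initial attempt plus up to retry_attempts retries (hypothetical helper).
    """
    for delay in backoff_delays(retry_attempts, retry_delay, backoff_factor):
        try:
            return connect()
        except (TimeoutError, ConnectionError):
            sleep(delay)
    # Final attempt: any error now propagates to the caller.
    return connect()
```

With the default `sleep=time.sleep` the function waits for real. Injecting a recording function instead makes the schedule easy to inspect.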

Authentication Failed

Symptoms:

{
  "error": "Access denied for user 'etl_user'@'host'",
  "code": "AUTH_ERROR"
}

Solutions:

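Database Connections (MySQL syntax; replace database with your schema name):

-- Grant the ETL user the permissions it needs
GRANT SELECT, INSERT, UPDATE ON database.* TO 'etl_user'@'%';
-- Not strictly required after GRANT, but harmless
FLUSH PRIVILEGES;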

API Connections:

{
  "source": {
    "id_type": 2,
    "authentication": {
      "type": "bearer",
      "token": "your-valid-token"
    }
  }
}
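Before you edit the ETL, you can confirm that a token is still accepted by sending a request with the same `Authorization: Bearer` header yourself. This is a minimal sketch that uses only Python's standard library. The helper names are hypothetical, and the URL you pass should be an endpoint of your own API.

```python
import urllib.error
import urllib.request


def build_bearer_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries the token as a Bearer credential."""
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Bearer {token}")
    return request


def token_status(url: str, token: str, timeout: float = 10) -> int:
    """Return the HTTP status code; 401 or 403 usually means the token was rejected."""
    try:
        with urllib.request.urlopen(build_bearer_request(url, token), timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status.
        return error.code
```

A `401` or `403` from `token_status` points to an expired or under-privileged token rather than a network problem. Network failures raise `urllib.error.URLError` instead.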

Connection Refused

Symptoms:

{
  "error": "Connection refused to localhost:3306",
  "code": "CONNECTION_REFUSED"
}

Diagnostic Steps:

  1. Check service status:

    # MySQL
    sudo systemctl status mysql
    
    # PostgreSQL
    sudo systemctl status postgresql
    

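  2. Verify that the port is listening and reachable (3306 is the MySQL default; PostgreSQL uses 5432):

    # On the database server (use ss -tlnp where netstat is not installed)
    sudo netstat -tlnp | grep :3306
    # From the ETL host
    telnet your-host 3306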
    

  3. Check firewall rules:

    # Ubuntu/Debian
    sudo ufw status
    
    # CentOS/RHEL
    sudo firewall-cmd --list-all
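If `telnet` is not installed on the ETL host, you can script the same reachability check. This is a minimal Python sketch that uses the standard `socket` module. `port_is_open` is a hypothetical helper name.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

For example, an immediate `False` from `port_is_open("your-host", 3306)` usually means nothing is listening on that port. A `False` that only arrives after the full timeout suggests a firewall silently dropping packets.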
    


Data Errors

Schema Validation Failed

Symptoms:

{
  "error": "Column 'expected_field' not found in source data",
  "code": "SCHEMA_ERROR",
  "expected": ["id", "name", "email"],
  "received": ["id", "full_name", "email_address"]
}

Solutions:

Rename the fields in a transform:

{
  "transform": [
    {
      "rename": {
        "full_name": "name",
        "email_address": "email"
      }
    }
  ]
}

Or alias the columns in the source query:

{
  "source": {
    "id_type": 1,
    "form": {
      "sql": "SELECT id, full_name as name, email_address as email FROM users"
    }
  }
}

Or validate the schema and fill defaults for missing fields:

{
  "transform": [
    {
      "validate_schema": {
        "required_fields": ["id", "name", "email"],
        "action_on_missing": "fill_default"
      }
    }
  ]
}
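You can check a sample of source rows against the expected schema in plain Python before changing the ETL. This sketch assumes that `rename` maps old names to new ones and that `fill_default` inserts a default (here `None`) for each missing required field. The Hub may use different defaults, and all helper names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional

Record = Dict[str, Any]


def rename_fields(record: Record, mapping: Dict[str, str]) -> Record:
    """Return a copy of record with keys renamed according to mapping (old -> new)."""
    return {mapping.get(key, key): value for key, value in record.items()}


def missing_fields(record: Record, required: Iterable[str]) -> List[str]:
    """List required fields absent from record (the 'expected' vs 'received' check)."""
    return [field for field in required if field not in record]


def fill_missing_fields(record: Record, required: Iterable[str], default: Optional[Any] = None) -> Record:
    """Return a copy of record in which every required field is present (assumed fill_default)."""
    filled = dict(record)
    for field in required:
        filled.setdefault(field, default)
    return filled
```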

Data Type Mismatch

Symptoms:

{
  "error": "Cannot convert 'abc' to integer",
  "code": "TYPE_ERROR",
  "field": "age",
  "value": "abc",
  "expected_type": "integer"
}

Solutions:

Convert the type, falling back to a default value:

{
  "transform": [
    {
      "convert_type": {
        "field": "age",
        "from_type": "string",
        "to_type": "integer",
        "default_value": 0
      }
    }
  ]
}

Clean the data before converting it:

{
  "transform": [
    {
      "clean_data": {
        "field": "phone",
        "remove_chars": ["-", "(", ")", " "],
        "format": "numbers_only"
      }
    }
  ]
}
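A plain-Python sketch of the two transforms helps you check which values they would affect. It assumes that `default_value` replaces values that cannot be parsed and that `numbers_only` keeps only the digits 0-9. `to_int` and `clean_phone` are hypothetical names, not Hub functions.

```python
import string
from typing import Any, Iterable


def to_int(value: Any, default: int = 0) -> int:
    """Convert value to int; return default when it cannot be parsed (e.g. 'abc')."""
    try:
        return int(str(value).strip())
    except ValueError:
        return default


def clean_phone(
    value: str,
    remove_chars: Iterable[str] = ("-", "(", ")", " "),
    numbers_only: bool = True,
) -> str:
    """Remove the listed single characters; optionally keep only the digits 0-9."""
    cleaned = value.translate({ord(char): None for char in remove_chars})
    if numbers_only:
        cleaned = "".join(char for char in cleaned if char in string.digits)
    return cleaned
```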

Missing Required Fields

Symptoms:

{
  "error": "Required field 'customer_id' is null or missing",
  "code": "REQUIRED_FIELD_ERROR"
}

Solutions:

Fill missing fields with defaults:

{
  "transform": [
    {
      "fill_missing": {
        "customer_id": "UNKNOWN",
        "order_date": "1900-01-01",
        "status": "PENDING"
      }
    }
  ]
}

Or filter out records without the field:

{
  "transform": [
    {
      "filter": {
        "condition": {
          "and": [
            {"customer_id": {"!=": null}},
            {"customer_id": {"!=": ""}}
          ]
        }
      }
    }
  ]
}
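Empty strings are where the two approaches differ. This sketch assumes that `fill_missing` replaces only null or absent values. Under that assumption, an empty `customer_id` survives filling but is removed by the filter. The helper names are hypothetical.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def fill_missing(records: List[Record], defaults: Dict[str, Any]) -> List[Record]:
    """Replace absent or None values with the given defaults (assumed fill_missing behaviour)."""
    return [
        {**record, **{key: value for key, value in defaults.items() if record.get(key) is None}}
        for record in records
    ]


def drop_without(records: List[Record], field: str) -> List[Record]:
    """Keep only records where field is neither null nor an empty string (like the filter above)."""
    return [record for record in records if record.get(field) not in (None, "")]
```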

Performance Issues

Slow ETL Execution

Symptoms:

{
  "execution_time": "45 minutes",
  "records_processed": 10000,
  "avg_records_per_second": 3.7
}

Optimization Strategies:

Batch and parallelize the extraction:

{
  "source": {
    "id_type": 1,
    "form": {
      "sql": "SELECT * FROM large_table WHERE id BETWEEN ? AND ?",
      "batch_size": 1000,
      "parallel_batches": 4
    }
  }
}

Select only the columns you need and filter on indexed columns:

{
  "source": {
    "id_type": 1,
    "form": {
      "sql": "SELECT id, name, email FROM users WHERE created_date >= '2024-01-01' AND status = 'active'",
      "use_index": "idx_created_status"
    }
  }
}

Compress the target output:

{
  "target": {
    "id_type": 3,
    "compression": "gzip",
    "compression_level": 6
  }
}
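Here is a sketch of how the `BETWEEN ? AND ?` placeholders could be filled. It assumes that `batch_size` is the number of ids per query and that `parallel_batches` is the number of queries run at once. `fetch` stands in for whatever executes the query against your database and returns a row count. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Tuple

IdRange = Tuple[int, int]


def id_batches(min_id: int, max_id: int, batch_size: int) -> Iterator[IdRange]:
    """Yield inclusive (start, end) pairs for 'WHERE id BETWEEN ? AND ?'."""
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        yield start, end
        start = end + 1


def run_batches(
    fetch: Callable[[int, int], int],
    batches: List[IdRange],
    parallel_batches: int = 4,
) -> int:
    """Run fetch(start, end) for each batch on a thread pool; return the total row count."""
    with ThreadPoolExecutor(max_workers=parallel_batches) as pool:
        return sum(pool.map(lambda batch: fetch(*batch), batches))
```

Gaps in the id sequence make some batches return fewer than `batch_size` rows. For reference, the symptom above works out to 10,000 records / 2,700 seconds ≈ 3.7 records per second.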

High Memory Usage

Symptoms:

ETL Process: 8.2GB RAM usage
Available: 2.1GB

Solutions:

Stream the data instead of loading it all at once:

{
  "processing": {
    "mode": "streaming",
    "buffer_size": 1000,
    "memory_limit": "2GB"
  }
}

Or process the source in chunks:

{
  "source": {
    "id_type": 1,
    "form": {
      "sql": "SELECT * FROM users",
      "chunk_size": 5000,
      "process_chunks_separately": true
    }
  }
}
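Chunked reading keeps only one chunk of rows in memory at a time. This is a minimal sketch that uses Python's built-in `sqlite3` module and the standard DB-API `fetchmany` call. It assumes that `chunk_size` plays the same role as in the configuration above. `iter_chunks` is a hypothetical helper.

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_chunks(cursor: sqlite3.Cursor, sql: str, chunk_size: int = 5000) -> Iterator[List[Tuple]]:
    """Execute sql and yield lists of at most chunk_size rows."""
    cursor.execute(sql)
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield rows
```

Some drivers buffer the whole result set on the client unless you request a server-side cursor (for example, a named cursor in psycopg2). With those drivers, `fetchmany` alone does not reduce memory use.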

JSON Configuration

Syntax Errors

Symptoms:

{
  "error": "Invalid JSON syntax at line 15, column 23",
  "code": "JSON_SYNTAX_ERROR"
}

Common Mistakes:

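Error                    | Example           | Fix
------------------------ | ----------------- | ----------------
Trailing comma           | {"a": 1,}         | {"a": 1}
Unquoted key             | {key: "value"}    | {"key": "value"}
Single instead of double | {'key': 'value'}  | {"key": "value"}
Missing closing brace    | {"a": 1           | {"a": 1}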

Validation Tools:

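# Python's built-in validator (prints the error location on invalid JSON)
python -m json.tool config.json

# jq (also pretty-prints valid JSON)
jq . config.json

# VS Code validates .json files automatically as you edit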
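You can also get the line and column of a syntax error programmatically, for example in a pre-deployment check. This is a minimal sketch that uses Python's `json` module, whose `JSONDecodeError` exposes `lineno`, `colno` and `msg`. `find_json_error` and the script name `check_json.py` are hypothetical.

```python
import json
import sys
from typing import Optional, Tuple


def find_json_error(text: str) -> Optional[Tuple[int, int, str]]:
    """Return (line, column, message) for the first syntax error, or None if text is valid JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError as error:
        return error.lineno, error.colno, error.msg
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_json.py config.json
    with open(sys.argv[1], encoding="utf-8") as handle:
        problem = find_json_error(handle.read())
    print("OK" if problem is None else "line {}, column {}: {}".format(*problem))
```

Exact messages and column numbers vary slightly between Python versions. For example, newer releases report a trailing comma at the comma itself.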

Missing Required Parameters

Symptoms:

{
  "error": "Missing required parameter 'host' in source configuration",
  "code": "MISSING_PARAMETER"
}

Required Parameters by Connection Type:

Database source (id_type 1):

{
  "source": {
    "id_type": 1,
    "host": "required",
    "port": "required",
    "database": "required",
    "username": "required",
    "password": "required"
  }
}

API source (id_type 2):

{
  "source": {
    "id_type": 2,
    "url": "required",
    "method": "required"
  }
}

Spreadsheet target (id_type 10):

{
  "target": {
    "id_type": 10,
    "credentials_path": "required",
    "form": {
      "spreadsheet_id": "required",
      "range": "required"
    }
  }
}
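Here is a quick pre-flight check for the parameters listed above. The sketch hard-codes only the keys shown on this page, with nested `form` keys written as dotted paths. The Hub may require others, and `missing_parameters` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Required keys per id_type, copied from the examples on this page; the Hub may require more.
REQUIRED_BY_ID_TYPE: Dict[int, List[str]] = {
    1: ["host", "port", "database", "username", "password"],
    2: ["url", "method"],
    10: ["credentials_path", "form.spreadsheet_id", "form.range"],
}


def missing_parameters(section: Dict[str, Any]) -> List[str]:
    """Return required keys (dotted paths for nested keys) that are absent or empty."""
    missing = []
    for path in REQUIRED_BY_ID_TYPE.get(section.get("id_type"), []):
        node: Any = section
        for key in path.split("."):
            # Walk one level down; a non-dict in the middle of the path counts as missing.
            node = node.get(key) if isinstance(node, dict) else None
        if node in (None, ""):
            missing.append(path)
    return missing
```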

Advanced Debugging

Enable Debug Logging

{
  "config": {
    "debug": {
      "enabled": true,
      "level": "DEBUG",
      "log_sql_queries": true,
      "log_data_samples": true,
      "max_sample_records": 5
    }
  }
}

Performance Monitoring

{
  "config": {
    "monitoring": {
      "track_performance": true,
      "log_execution_time": true,
      "log_memory_usage": true,
      "alert_thresholds": {
        "execution_time_minutes": 30,
        "memory_usage_gb": 4
      }
    }
  }
}
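To see locally whether a step would trip the `alert_thresholds` above, you can time it and record its peak memory. This is a minimal Python sketch that uses `time.perf_counter` and `tracemalloc`. Note that `tracemalloc` counts only memory allocated by Python objects, not the whole process. Treating 1 GB as 1024³ bytes is an assumption, and the helper names are hypothetical.

```python
import time
import tracemalloc
from typing import Any, Callable, List, Tuple


def measure(step: Callable[[], Any]) -> Tuple[Any, float, int]:
    """Run step() and return (result, elapsed seconds, peak traced Python memory in bytes)."""
    tracemalloc.start()
    started = time.perf_counter()
    try:
        result = step()
    finally:
        elapsed = time.perf_counter() - started
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


def threshold_alerts(
    elapsed_s: float,
    peak_bytes: int,
    execution_time_minutes: float = 30,
    memory_usage_gb: float = 4,
) -> List[str]:
    """Return the names of the alert thresholds that were exceeded."""
    alerts = []
    if elapsed_s > execution_time_minutes * 60:
        alerts.append("execution_time_minutes")
    if peak_bytes > memory_usage_gb * 1024 ** 3:
        alerts.append("memory_usage_gb")
    return alerts
```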

Error Recovery

{
  "config": {
    "error_handling": {
      "on_connection_error": "retry",
      "on_data_error": "skip_record",
      "on_transform_error": "log_and_continue",
      "max_errors_per_batch": 10
    }
  }
}
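The following Python sketch shows one plausible reading of these settings. Failing records are skipped and logged, and the batch is aborted once more than `max_errors_per_batch` errors occur. Whether the Hub aborts at the limit or just past it is an assumption here, as are the names `process_batch` and `TooManyErrors`.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


class TooManyErrors(Exception):
    """Raised when a batch exceeds max_errors_per_batch."""


def process_batch(
    records: List[Record],
    transform: Callable[[Record], Record],
    max_errors_per_batch: int = 10,
) -> Tuple[List[Record], List[str]]:
    """Apply transform to each record, skipping and logging records that fail."""
    output: List[Record] = []
    errors: List[str] = []
    for index, record in enumerate(records):
        try:
            output.append(transform(record))
        except (KeyError, TypeError, ValueError) as error:
            errors.append(f"record {index}: {error!r}")
            if len(errors) > max_errors_per_batch:
                raise TooManyErrors(f"{len(errors)} errors in batch; last: {errors[-1]}")
    return output, errors
```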

Getting Help

Support Channels

Information to Include

When reporting issues, include:

  1. ETL Configuration (sanitized)
  2. Error logs (last 50 lines)
  3. Environment details (OS, Python version)
  4. Data samples (anonymized)
  5. Steps to reproduce
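You can prepare items 1 and 2 with a short script. In this minimal Python sketch, `sanitize` masks values under key names assumed to be secrets (extend `SECRET_KEYS` for your setup), and `last_lines` returns the tail of a log file. Both helpers are hypothetical.

```python
from collections import deque
from typing import Any, List

# Key names treated as secrets (an assumption; extend for your own configuration).
SECRET_KEYS = {"password", "token", "api_key", "secret"}


def sanitize(config: Any) -> Any:
    """Return a copy of config with values under secret-looking keys replaced by '***'."""
    if isinstance(config, dict):
        return {
            key: "***" if key.lower() in SECRET_KEYS else sanitize(value)
            for key, value in config.items()
        }
    if isinstance(config, list):
        return [sanitize(item) for item in config]
    return config


def last_lines(path: str, count: int = 50) -> List[str]:
    """Return the last count lines of a log file without keeping the whole file in memory."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(deque(handle, maxlen=count))
```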

Remember: Most ETL issues are configuration-related. Double-check your JSON syntax and required parameters before seeking support.