Getting Started

Installation

Prerequisites

  • Python 3.11+
  • Apache Airflow 2.9.3
  • Connections to source and target systems

Package Installation

pip install fracttal-etl-hub

Initial Setup

  1. Configure Apache Airflow

    export AIRFLOW_HOME=/path/to/airflow
    airflow db migrate
    

  2. Configure Environment Variables

    export FRACTTAL_API_KEY=your_api_key
    export DATABASE_URL=postgresql://user:pass@localhost/db
    

Basic Configuration

ETL Structure

Every ETL in Fracttal follows this JSON-RPC 2.0 structure:

{
  "id": "unique-request-id",
  "jsonrpc": "2.0",
  "method": "etl.etl_update",
  "params": {
    "id": "unique-config-id",
    "config": {
      "source": { /* source configuration */ },
      "transform": { /* transformation logic */ },
      "target": { /* target configuration */ },
      "settings": { /* additional settings */ }
    },
    "environment": "production"
  }
}
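
The envelope above can also be assembled in Python rather than written by hand. This is a sketch under stated assumptions: build_etl_request is a hypothetical helper, not a function of the package, and the default environment of "develop" is only a choice made for this example. Whether all four config sections are mandatory is not stated in this guide.

```python
import json
import uuid


def build_etl_request(config_id, config, environment="develop", request_id=None):
    """Wrap an ETL config in the JSON-RPC 2.0 envelope shown above."""
    return {
        # Generate a request id when the caller does not supply one.
        "id": request_id or str(uuid.uuid4()),
        "jsonrpc": "2.0",
        "method": "etl.etl_update",
        "params": {
            "id": config_id,
            "config": config,
            "environment": environment,
        },
    }


request = build_etl_request(
    "unique-config-id",
    {"source": {}, "transform": {}, "target": {}, "settings": {}},
    environment="production",
    request_id="unique-request-id",
)
print(json.dumps(request, indent=2))
```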

Basic Connection

"source": {
  "connection": {
    "id_type": 1,
    "name": "Database"
  },
  "operation": "list_table",
  "parameters": {
    "table": "users"
  }
}
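
A quick shape check can catch typos in a source block before it is submitted. The sketch below only compares a block against the keys used in the example above; source_problems is a hypothetical helper, and this guide does not say which keys the hub itself treats as required.

```python
def source_problems(source):
    """List the ways a source block differs from the shape of the example above."""
    problems = []
    connection = source.get("connection")
    if not isinstance(connection, dict):
        problems.append("connection must be an object")
    else:
        if not isinstance(connection.get("id_type"), int):
            problems.append("connection.id_type must be an integer")
        if not isinstance(connection.get("name"), str):
            problems.append("connection.name must be a string")
    if not isinstance(source.get("operation"), str):
        problems.append("operation must be a string")
    if not isinstance(source.get("parameters"), dict):
        problems.append("parameters must be an object")
    return problems


example_source = {
    "connection": {"id_type": 1, "name": "Database"},
    "operation": "list_table",
    "parameters": {"table": "users"},
}
print(source_problems(example_source))
```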

First ETL

Let's create a simple ETL that reads the customers table from a PostgreSQL database, renames three fields, and appends the rows to a Google Sheets spreadsheet.

1. Prepare the Configuration

{
  "id": "etl-example-001",
  "jsonrpc": "2.0",
  "method": "etl.etl_update",
  "params": {
    "id": "config-example-001",
    "config": {
      "source": {
        "connection": {
          "id_type": 1,
          "name": "PostgreSQL",
          "parameters": {
            "host": "localhost",
            "port": 5432,
            "database": "myapp",
            "username": "etl_user",
            "password": "secure_password"
          }
        },
        "operation": "list_table",
        "parameters": {
          "table": "customers"
        }
      },
      "transform": {
        "rename": [
          [{"var": "first_name"}, "name", "string"],
          [{"var": "last_name"}, "surname", "string"],
          [{"var": "email"}, "email", "string"]
        ]
      },
      "target": {
        "connection": {
          "id_type": 3,
          "name": "Google Sheets",
          "parameters": {
            "credentials_file": "/path/to/service-account.json",
            "spreadsheet_id": "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms"
          }
        },
        "operation": "append",
        "parameters": {
          "sheet_name": "Customers"
        }
      }
    },
    "environment": "develop"
  }
}
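
To make the rename block in this configuration concrete, here is a plain-Python sketch of what it appears to describe. It assumes each rule is a triple of [source expression, target field name, type], that {"var": "..."} selects a column from the input row, and that columns without a rule are dropped; none of this is confirmed by this guide. The apply_rename helper and the sample row are invented for illustration.

```python
def apply_rename(row, rename_rules):
    """Build a new row from [source_expression, target_name, type_name] rules."""
    casts = {"string": str}  # only the type used in the example is handled
    out = {}
    for expression, target_name, type_name in rename_rules:
        value = row.get(expression["var"])  # assumed: "var" names a source column
        cast = casts[type_name]
        out[target_name] = None if value is None else cast(value)
    return out


rules = [
    [{"var": "first_name"}, "name", "string"],
    [{"var": "last_name"}, "surname", "string"],
    [{"var": "email"}, "email", "string"],
]
# Invented sample row standing in for one record of the customers table.
row = {"id": 7, "first_name": "Ada", "last_name": "Lovelace", "email": "ada@example.com"}
print(apply_rename(row, rules))
# {'name': 'Ada', 'surname': 'Lovelace', 'email': 'ada@example.com'}
```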

2. Execute the ETL

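import json

from fracttal_etl import ETLHub

# Load the configuration from step 1 (the file name is only an example).
with open("etl_config.json") as f:
    etl_config = json.load(f)

hub = ETLHub()
result = hub.execute_etl(etl_config)
print(f"ETL executed: {result}")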

3. Verify Results

Once the run completes, the renamed rows appear in the "Customers" sheet of the spreadsheet identified by spreadsheet_id.

Next Steps