Getting Started
Installation
Prerequisites
- Python 3.11+
- Apache Airflow 2.9.3
- Connections to source and target systems
Package Installation
pip install fracttal-etl-hub
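After installing, a quick preflight check can confirm the prerequisites listed above. The following is a minimal sketch using only the standard library; the helper names `python_ok` and `installed_version` are illustrative and not part of fracttal-etl-hub.

```python
import sys
from importlib import metadata


def python_ok(version_info=sys.version_info, minimum=(3, 11)):
    """Return True when the interpreter meets the documented minimum (3.11)."""
    return tuple(version_info[:2]) >= minimum


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Report the state of each prerequisite without raising.
print("Python 3.11+:", python_ok())
print("fracttal-etl-hub:", installed_version("fracttal-etl-hub") or "not installed")
print("apache-airflow:", installed_version("apache-airflow") or "not installed")
```

Comparing the reported `apache-airflow` version against 2.9.3 is left to the reader, since this page does not say whether newer releases are supported.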
Initial Setup
- Configure Apache Airflow
  export AIRFLOW_HOME=/path/to/airflow
  airflow db init  # deprecated since Airflow 2.7; `airflow db migrate` is the replacement
- Configure Environment Variables
  export FRACTTAL_API_KEY=your_api_key
  export DATABASE_URL=postgresql://user:pass@localhost/db
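Before running an ETL, it can help to confirm that both variables are actually visible to the Python process. This is a small sketch, assuming the two variable names shown in this section are the only required ones; `missing_settings` is a hypothetical helper, and how the library itself reads these variables is not documented here.

```python
import os

# Variable names taken from the setup step in this guide.
REQUIRED_VARS = ("FRACTTAL_API_KEY", "DATABASE_URL")


def missing_settings(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_settings()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```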
Basic Configuration
ETL Structure
Every ETL in Fracttal follows this JSON-RPC 2.0 structure:
{
  "id": "unique-request-id",
  "jsonrpc": "2.0",
  "method": "etl.etl_update",
  "params": {
    "id": "unique-config-id",
    "config": {
      "source": { /* source configuration */ },
      "transform": { /* transformation logic */ },
      "target": { /* target configuration */ },
      "settings": { /* additional settings */ }
    },
    "environment": "production"
  }
}
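The envelope above can be assembled programmatically so that the fixed fields are not retyped for every ETL. The sketch below is one possible way to do it; `build_etl_request` is a hypothetical helper, and generating the request id with `uuid4` is an assumption (the structure only requires that the id be unique).

```python
import json
import uuid


def build_etl_request(config_id, config, environment="production",
                      method="etl.etl_update"):
    """Wrap an ETL config in the JSON-RPC 2.0 envelope shown above."""
    return {
        "id": str(uuid.uuid4()),  # assumption: any unique string is acceptable
        "jsonrpc": "2.0",
        "method": method,
        "params": {
            "id": config_id,
            "config": config,
            "environment": environment,
        },
    }


# Empty sections stand in for real source/transform/target/settings blocks.
empty_config = {"source": {}, "transform": {}, "target": {}, "settings": {}}
request = build_etl_request("unique-config-id", empty_config)
print(json.dumps(request, indent=2))
```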
Basic Connection
"source": {
  "connection": {
    "id_type": 1,
    "name": "Database"
  },
  "operation": "list_table",
  "parameters": {
    "table": "users"
  }
}
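A source or target block can be checked for the keys that appear in the examples on this page before it is submitted. This is only a sketch: treating `connection`, `operation`, `parameters`, `id_type`, and `name` as required is an assumption drawn from the examples, not a documented schema, and `missing_endpoint_keys` is a hypothetical helper.

```python
def missing_endpoint_keys(section):
    """List the keys seen in the examples that are absent from a source/target block."""
    missing = [key for key in ("connection", "operation", "parameters")
               if key not in section]
    connection = section.get("connection", {})
    missing += [f"connection.{key}" for key in ("id_type", "name")
                if key not in connection]
    return missing


source = {
    "connection": {"id_type": 1, "name": "Database"},
    "operation": "list_table",
    "parameters": {"table": "users"},
}
print(missing_endpoint_keys(source))
print(missing_endpoint_keys({"operation": "list_table"}))
```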
First ETL
Let's create a simple ETL that reads the customers table from a PostgreSQL database, maps three fields, and appends the rows to a Google Sheets worksheet.
1. Prepare the Configuration
{
  "id": "etl-example-001",
  "jsonrpc": "2.0",
  "method": "etl.etl_update",
  "params": {
    "id": "config-example-001",
    "config": {
      "source": {
        "connection": {
          "id_type": 1,
          "name": "PostgreSQL",
          "parameters": {
            "host": "localhost",
            "port": 5432,
            "database": "myapp",
            "username": "etl_user",
            "password": "secure_password"
          }
        },
        "operation": "list_table",
        "parameters": {
          "table": "customers"
        }
      },
      "transform": {
        "rename": [
          [{"var": "first_name"}, "name", "string"],
          [{"var": "last_name"}, "surname", "string"],
          [{"var": "email"}, "email", "string"]
        ]
      },
      "target": {
        "connection": {
          "id_type": 3,
          "name": "Google Sheets",
          "parameters": {
            "credentials_file": "/path/to/service-account.json",
            "spreadsheet_id": "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms"
          }
        },
        "operation": "append",
        "parameters": {
          "sheet_name": "Customers"
        }
      }
    },
    "environment": "develop"
  }
}
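To make the `rename` block in this configuration concrete, the sketch below shows one plausible reading of each rule as `[{"var": source_field}, target_field, type]`. The semantics are assumptions, not documented behavior: it assumes that fields without a rule are dropped and that `"string"` means a cast to `str`. `apply_rename` is a hypothetical stand-in, not the library's implementation.

```python
def apply_rename(rows, rename_rules):
    """Apply [{"var": source}, target, type] rules to a list of row dicts."""
    casts = {"string": str}  # only the type that appears in the example
    renamed = []
    for row in rows:
        new_row = {}
        for selector, target, type_name in rename_rules:
            value = row.get(selector["var"])
            cast = casts.get(type_name)
            # Leave missing values as None instead of casting them to "None".
            new_row[target] = cast(value) if cast and value is not None else value
        renamed.append(new_row)
    return renamed


rules = [
    [{"var": "first_name"}, "name", "string"],
    [{"var": "last_name"}, "surname", "string"],
    [{"var": "email"}, "email", "string"],
]
rows = [{"id": 7, "first_name": "Ada", "last_name": "Lovelace",
         "email": "ada@example.com"}]
print(apply_rename(rows, rules))
# prints [{'name': 'Ada', 'surname': 'Lovelace', 'email': 'ada@example.com'}]
```

Under these assumptions, the "Customers" sheet would receive the columns name, surname, and email.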
2. Execute the ETL
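import json
from fracttal_etl import ETLHub
# Load the configuration from step 1 (assumes it was saved as etl_config.json)
with open("etl_config.json") as f:
    etl_config = json.load(f)
hub = ETLHub()
result = hub.execute_etl(etl_config)
print(f"ETL executed: {result}")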
3. Verify Results
After a successful run, the customer rows appear in the "Customers" sheet of the spreadsheet identified by spreadsheet_id.
Next Steps
- Explore all available connections
- Learn about advanced transformations
- Check the API reference