Apache Airflow
The batch orchestrator that moved 44 background jobs off the paiddaily.io API event loop. Self-hosted, DAG-based, Python-native. Runs every data pipeline, price refresh, and scoring job on a cron — so the API serves pure reads from Postgres with zero blocking I/O.
Apache Airflow is the scheduler that keeps the API fast. Every background job that used to block the paiddaily.io event loop now runs as an Airflow DAG on a cron. This page covers how it fits the GL stack, not the full Airflow ecosystem — official docs at airflow.apache.org own the reference.
What it is
A platform for programmatically authoring, scheduling, and monitoring workflows. Workflows are defined as Python DAGs (directed acyclic graphs) — each node is a task, each edge is a dependency. Apache 2.0 license. Self-hosted on Docker Compose or Kubernetes; managed offerings from Astronomer and GCP Cloud Composer exist but aren't used here.
The pitch: Airflow is the right tool when the workload is batch, the schedule is cron-shaped, and the orchestration logic is "run these steps in order on a timer." It's not a streaming engine, not a real-time event bus, not a task queue. It's a scheduler that knows about retries, dependencies, backfills, and failure callbacks.
At a glance
Core concepts
- DAG — a Python file defining a workflow as a directed acyclic graph. Tasks run in dependency order.
- Task — a single unit of work. Usually a Python function decorated with
@task. - Operator — a pre-built task template (PythonOperator, BashOperator, etc.). The
@taskdecorator is syntactic sugar for PythonOperator. - Schedule — cron expression or preset (
@daily,@hourly). Defines when the DAG runs. - XCom — cross-communication between tasks. Return values from
@taskfunctions are automatically pushed/pulled as XComs. - Connection — a stored credential reference (database DSN, API key). Managed through the Airflow UI or env vars.
- Callback — function called on task/DAG success or failure. Used for alerting.
The GL deployment shape
Self-hosted on Docker Compose alongside the paiddaily.io stack. Single-node setup: webserver (UI on port 8080), scheduler, triggerer, and Postgres metadata database. DAGs live in airflow/dags/, shared code in airflow/include/gl_paiddaily_common/.
44 DAGs total as of 2026-05-27. Two patterns:
- Direct domain calls (16 DAGs) — import and call the same async domain functions the API uses, via a
run_domain()sync wrapper that manages the asyncpg pool lifecycle. - Inline HTTP/RPC (11 DAGs) — fetch data from external APIs (Boros, Pendle, DeFiLlama), transform, and write to Postgres via stored procs.
- Hybrid (remaining) — mix of both patterns depending on the pipeline shape.
How to integrate
The paiddaily.io pattern for adding a new scheduled job:
- Write the domain function in
features/. Pure async Python, takes a db pool, returns a result. Test it in isolation. - Create a DAG in
airflow/dags/. Import_pathfor sys.path setup. Use@dagand@taskdecorators. Callrun_domain("features.module.path", "function_name")from the task. - Set the schedule. Cron expression in
schedule=. Think about ordering — if this DAG depends on data from another DAG, setstart_dateand schedule accordingly. - Wire failure alerting. Pass
on_failure_callback=telegram_on_failureindefault_argsor DAG-level kwargs. - Remove the old in-process job. Delete the APScheduler / background-loop code from the API. The API should only serve reads.
The domain_runner pattern
from gl_paiddaily_common.domain_runner import run_domain
result = run_domain(
"features.aerodrome.jobs.veaero_lock_sync",
"sync_lock",
wallet_lc,
)
run_domain initializes an asyncpg pool, calls the async function, tears down. This lets DAGs call the exact same code the API was calling — no HTTP round-trip, no separate client, no API auth.
Gotchas
- DAGs are parsed on every scheduler heartbeat. Heavy imports at module level slow down the scheduler. Keep DAG files thin — import the real logic inside the
@taskfunction body, not at the top of the file. run_domain()creates a new connection pool per task invocation. Fine for cron-cadence jobs (minutes/hours). Would be wasteful for sub-second scheduling — but that's not what Airflow is for.- XCom serializes to JSON by default. Don't pass large DataFrames or binary blobs between tasks via return values. Write intermediate results to the database or object store.
- Backfill runs every missed interval. Set
catchup=Falseon DAGs that don't need historical backfills, or you'll get a flood of runs on first deploy. - The webserver is not hardened for public exposure. Keep it behind a VPN or localhost. The GL setup binds to
127.0.0.1:8080.
Risks
- Operational surface. Self-hosted Airflow is a real service to maintain — scheduler, webserver, metadata DB, log rotation. More moving parts than a cron job in a shell script. Worth it at 44 DAGs; questionable at 3.
- Python-only DAGs. The workflow definition language is Python. If the team writes Go or TypeScript, Airflow is a context switch. Not an issue for the GL stack (Python backend), but worth noting.
- Scheduler is single-point. Single-node setup means scheduler downtime = no DAG runs. Acceptable for a trading dashboard; not acceptable for a billing pipeline. HA requires CeleryExecutor or KubernetesExecutor — more infra.
