Engineering · 7 min read

Why We Chose Temporal Over Airflow

By Cupel Team
Tags: temporal, airflow, orchestration, architecture

When we started building Cupel's pipeline orchestration layer, the obvious default was Apache Airflow. It is the de facto standard for data pipeline scheduling, has massive community adoption, and most data engineers have worked with it at some point in their careers. We chose Temporal instead. This post explains the reasoning behind ADR-001, the tradeoffs involved, and why durable execution fundamentally changes what is possible in a data pipeline platform.

The Problem Airflow Solves Well

Airflow was designed to schedule and monitor workflows defined as directed acyclic graphs (DAGs). It excels at cron-scheduled batch jobs: extract data from source A, transform it, load it into warehouse B. For that use case, Airflow's scheduler-worker architecture is straightforward and battle-tested.

If Cupel were a simple ETL scheduler, Airflow would have been the right choice. But Cupel pipelines have requirements that push well beyond batch scheduling.

Where Airflow Falls Short for Cupel

Stateless Task Execution

Airflow tasks are stateless by design. Each task is an independent unit of work that runs in an isolated worker process. When a task fails, Airflow can retry it from the beginning, but it cannot resume from where it left off within a task. Consider a pipeline with 100 stages. If it fails at stage 47, Airflow can retry the entire pipeline or retry stage 47 from scratch, but it cannot resume stage 47 from the exact instruction where the failure occurred.

This matters for long-running data processing jobs. A transform that takes 45 minutes to process a large partition should not restart from row zero because of a transient network timeout at minute 40.
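
Temporal closes this gap with activity heartbeats: a long-running activity periodically records progress, and on retry the SDK hands the last recorded heartbeat payload back to the activity (via activity.info().heartbeat_details) so it can resume mid-stream. Here is a minimal sketch of the pattern, with the SDK call replaced by a plain callback so the resume logic stands alone; process_rows and the row-offset checkpoint are illustrative, not Cupel's actual code:

```python
def process_rows(rows, checkpoint, last_heartbeat=None):
    """Process a partition, checkpointing progress after each row.

    In a real Temporal activity, `checkpoint` would be activity.heartbeat()
    and `last_heartbeat` would come from activity.info().heartbeat_details,
    so a retry resumes at the failed row instead of row zero.
    """
    start = last_heartbeat if last_heartbeat is not None else 0
    out = []
    for i in range(start, len(rows)):
        out.append(rows[i] * 2)  # stand-in transform
        checkpoint(i + 1)        # record how far we got
    return out
```

On the first attempt last_heartbeat is None and processing starts at row zero; after a transient failure at minute 40, the retry receives the last checkpoint and skips the rows already completed.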

The Scheduler Bottleneck

Airflow's centralized scheduler polls the metadata database to determine which tasks are ready to run. At scale, this polling loop becomes a bottleneck. Organizations with thousands of DAGs and tens of thousands of daily task instances frequently encounter scheduler lag, where tasks sit in a "queued" state waiting for the scheduler to pick them up.

Limited Retry Semantics

Airflow provides per-task retry knobs: a retry count, a delay between retries, and optional exponential backoff. But enterprise data pipelines need more nuanced failure handling. Different stages require different failure strategies. A source extraction might need exponential backoff with jitter. A compliance scan might need to quarantine partial results and alert a human. A downstream aggregation might need to skip and continue with a degraded dataset. Airflow's retry model is too coarse to express these distinctions.
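
To make "exponential backoff with jitter" concrete, here is the delay schedule such a strategy produces. This is a hedged sketch: the parameter names mirror Temporal's RetryPolicy fields, and backoff_delay itself is not any Cupel or Temporal API.

```python
import random

def backoff_delay(attempt: int, initial: float = 5.0, coefficient: float = 2.0,
                  max_delay: float = 300.0, jitter: float = 0.2) -> float:
    """Seconds to wait before retry `attempt` (1-based): the delay grows
    geometrically, is capped at max_delay, and gets +/-20% jitter so a
    fleet of failing tasks does not retry in lockstep."""
    base = min(initial * coefficient ** (attempt - 1), max_delay)
    return base * random.uniform(1 - jitter, 1 + jitter)
```

With the defaults, attempts wait roughly 5s, 10s, 20s, 40s, and so on up to the 300-second cap, each nudged by jitter.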

No Mid-Workflow Human Approval

Financial services pipelines frequently require human approval gates. A data steward must review and approve data before it moves from Silver to Gold. An analyst must sign off on a reconciliation before it publishes to regulatory reports. Airflow has no native mechanism for pausing a workflow, waiting for human input, and resuming exactly where it left off. External tools and polling hacks are required.

What Temporal Provides

Durable Execution

Temporal's core abstraction is the durable workflow. Every step in a workflow is recorded in Temporal's event history. If a worker crashes, restarts, or is replaced entirely, the workflow resumes from the exact point of failure. It does not re-execute completed steps. It does not lose intermediate state.

Here is what a simplified Cupel pipeline workflow looks like in Temporal:

from datetime import timedelta

from temporalio import workflow
from temporalio.common import RetryPolicy


@workflow.defn
class PipelineWorkflow:
    """Durable pipeline execution — resumes at exact failure point."""

    def __init__(self) -> None:
        self._approval_received = False
        self._approved = False

    @workflow.signal
    def approve(self, approved: bool) -> None:
        # Signal handler: the data steward's decision arrives here
        self._approval_received = True
        self._approved = approved

    @workflow.run
    async def run(self, pipeline_config: PipelineConfig) -> PipelineResult:
        # Step 1: Extract — if this completed before crash, it is skipped on replay
        raw_data_uri = await workflow.execute_activity(
            extract_source,
            pipeline_config.source,
            start_to_close_timeout=timedelta(hours=2),
            retry_policy=RetryPolicy(
                initial_interval=timedelta(seconds=10),
                backoff_coefficient=2.0,
                maximum_attempts=5,
            ),
        )

        # Step 2: Quality gate — human approval waits indefinitely
        qg_result = await workflow.execute_activity(
            run_quality_gate,
            QualityGateInput(data_uri=raw_data_uri, profile="financial"),
            start_to_close_timeout=timedelta(minutes=30),
        )

        if qg_result.requires_review:
            # Pause and wait for the approve signal — no polling, no hacks.
            # wait_condition raises asyncio.TimeoutError if the 7 days elapse.
            await workflow.wait_condition(
                lambda: self._approval_received,
                timeout=timedelta(days=7),
            )
            if not self._approved:
                return PipelineResult(status="rejected")

        # Step 3: Transform
        transformed_uri = await workflow.execute_activity(
            apply_transforms,
            TransformInput(data_uri=raw_data_uri, transforms=pipeline_config.transforms),
            start_to_close_timeout=timedelta(hours=4),
        )

        return PipelineResult(status="completed", output_uri=transformed_uri)

Compare this with the Airflow equivalent, which requires external state management for approvals:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.http.sensors.http import HttpSensor

# Airflow — human approval requires external polling
with DAG("pipeline", schedule_interval="@daily", start_date=datetime(2024, 1, 1)) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_fn)
    quality_gate = PythonOperator(task_id="quality_gate", python_callable=qg_fn)

    # No native way to pause and wait for human input.
    # Common workaround: poll an external database or API in a sensor.
    wait_for_approval = HttpSensor(
        task_id="wait_approval",
        http_conn_id="approval_api",
        endpoint="/approvals/{{ run_id }}",
        poke_interval=60,        # Poll every 60 seconds
        timeout=7 * 24 * 3600,   # For up to 7 days
    )

    transform = PythonOperator(task_id="transform", python_callable=transform_fn)

    extract >> quality_gate >> wait_for_approval >> transform

The Airflow version wastes resources polling an external endpoint, and in the default poke mode the sensor occupies a worker slot the entire time. The Temporal version simply waits, consuming no compute, and resumes instantly when the approval signal arrives.
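
The semantics of that wait can be modeled with a plain asyncio event: the coroutine parks until the signal fires, burning no CPU in between. This is an illustrative model of the behavior only, not how Temporal implements it; Temporal additionally persists the wait in event history, so it survives process restarts.

```python
import asyncio

async def approval_gate(signal: asyncio.Event, decision: dict,
                        timeout: float = 1.0) -> str:
    """Suspend until the approval signal arrives or the window closes."""
    try:
        await asyncio.wait_for(signal.wait(), timeout)  # parked, not polling
    except asyncio.TimeoutError:
        return "timed_out"
    return "approved" if decision.get("approved") else "rejected"

async def demo() -> str:
    signal, decision = asyncio.Event(), {}
    gate = asyncio.create_task(approval_gate(signal, decision))
    await asyncio.sleep(0.01)    # the workflow sits idle here
    decision["approved"] = True  # the steward's decision arrives...
    signal.set()                 # ...delivered as a signal
    return await gate
```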

Per-Activity Failure Strategies

In Temporal, every activity (the equivalent of an Airflow task) has its own retry policy, timeout configuration, and failure handling logic. The pipeline compiler in Cupel maps each component's failure strategy to a Temporal retry policy:

from datetime import timedelta

from temporalio.common import RetryPolicy

# Per-component failure strategies compiled to Temporal retry policies
FAILURE_STRATEGY_MAP = {
    "retry_with_backoff": RetryPolicy(
        initial_interval=timedelta(seconds=5),
        backoff_coefficient=2.0,
        maximum_attempts=10,
        non_retryable_error_types=["DataValidationError"],
    ),
    # A single attempt disables retries; the "skip" half is handled in the
    # workflow, which catches the failure and continues downstream.
    "skip_and_continue": RetryPolicy(maximum_attempts=1),
    "quarantine_and_alert": RetryPolicy(
        maximum_attempts=3,
        initial_interval=timedelta(seconds=30),
    ),
}

Temporal's per-activity retry policies mean that a source extraction can retry with exponential backoff while a downstream compliance scan uses a quarantine-and-alert strategy, all within the same workflow. In Airflow, you would need to implement this logic inside each task's Python callable.
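
One subtlety: a retry policy alone cannot express "skip". The skip half of skip_and_continue lives in the workflow, which catches the activity failure and proceeds with a degraded dataset. Here is a pure-Python model of that dispatch; in the real workflow the step wraps workflow.execute_activity and the caught exception is the SDK's activity failure, not a bare Exception.

```python
def run_with_strategy(step, strategy: str):
    """Execute one compiled pipeline step under its failure strategy.

    Returns (result, status). Under skip_and_continue a failure is
    swallowed so downstream stages still run; other strategies let the
    failure propagate once retries are exhausted.
    """
    try:
        return step(), "ok"
    except Exception:
        if strategy == "skip_and_continue":
            return None, "skipped"  # continue with a degraded dataset
        raise
```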

Workflow Versioning

Temporal supports workflow versioning natively. When a workflow definition changes, in-flight executions continue using the version they started with, while new executions use the updated definition. This is essential for a platform like Cupel where pipeline definitions evolve continuously and long-running pipelines may execute for hours or days.
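
In the Python SDK this is done with patch markers: workflow.patched("some-patch-id") returns True for new executions, recording a marker in history, and False when replaying histories that predate the change. The branching semantics reduce to the following toy model of the marker check (not the SDK internals):

```python
def choose_code_path(is_new_execution: bool, history_has_marker: bool) -> str:
    """Model of Temporal patch-based versioning: new executions take the
    new path and record a marker; replays of old histories see no marker
    and deterministically re-run the original path."""
    if is_new_execution or history_has_marker:
        return "transform_v2"
    return "transform_v1"
```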

Airflow DAG files, by contrast, are continuously re-parsed by the scheduler's DAG file processor. Changing a DAG definition affects all future task executions, including those belonging to currently running DAG runs. This can cause subtle bugs when a structural change to a DAG invalidates the execution plan of an in-progress run.

Code-Native Workflows

Temporal workflows are standard Python code. Control flow, conditionals, loops, error handling, and data passing all use familiar Python constructs. There is no DSL to learn, no YAML to debug, no operator abstraction layer between the developer and the logic.

This matters for Cupel's pipeline compiler. The compiler generates Python code from the visual DAG definition. Generating idiomatic Python that uses Temporal's SDK is far simpler than generating Airflow DAG files, which require specific operator instantiation patterns, XCom for data passing, and trigger rules for conditional execution.
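
As a purely hypothetical illustration of why this is simpler, a visual component can be rendered as a single activity invocation; the component dict and its field names are invented for this example and are not Cupel's actual compiler schema.

```python
def compile_step(component: dict) -> str:
    """Render one visual-DAG component as a Temporal activity call."""
    return (
        "await workflow.execute_activity(\n"
        f"    {component['activity']},\n"
        f"    start_to_close_timeout=timedelta(minutes={component['timeout_minutes']}),\n"
        ")"
    )
```

The equivalent Airflow generator would have to pick an operator class, wire XCom keys for data passing, and emit trigger rules for any conditional edge.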

The Tradeoffs

Temporal is not without costs. The operational surface area is larger than Airflow's: you need a Temporal server cluster (or Temporal Cloud), and the mental model of durable execution takes time for engineers accustomed to stateless task execution. The community is smaller, the ecosystem of pre-built integrations is narrower, and hiring engineers with Temporal experience is harder than finding Airflow engineers.

We accepted these tradeoffs because Cupel's core value proposition — visual pipeline building with durable execution, human-in-the-loop approval gates, and enterprise-grade failure handling — requires capabilities that Airflow cannot provide without significant custom engineering on top.

For teams evaluating orchestration engines, the decision depends on your use case. If you run batch ETL jobs on a daily schedule with no human interaction, Airflow is a proven choice. If you need durable execution, mid-workflow approvals, per-step failure strategies, and long-running stateful workflows, Temporal is the stronger foundation.

What This Means for Cupel Users

This architectural decision is invisible to most Cupel users. Pipeline builders drag components onto the canvas, configure failure strategies and approval gates through the UI, and click deploy. The pipeline compiler handles the translation to Temporal workflow code. But the benefits surface immediately: pipelines that resume from failure without re-processing completed work, approval gates that pause cleanly without consuming compute, and failure strategies that are granular to the individual component. These capabilities are fundamental to building data pipelines that financial services teams can trust for production regulatory workloads.
