Merged
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,7 @@
### Added

- Add a flag to determine if database initialization steps should be executed ([#669]).
- Add new roles for dag-processor and triggerer processes ([#679]).

### Fixed

@@ -19,6 +20,7 @@
[#668]: https://github.com/stackabletech/airflow-operator/pull/668
[#669]: https://github.com/stackabletech/airflow-operator/pull/669
[#678]: https://github.com/stackabletech/airflow-operator/pull/678
[#679]: https://github.com/stackabletech/airflow-operator/pull/679
[#683]: https://github.com/stackabletech/airflow-operator/pull/683

## [25.7.0] - 2025-07-23
860 changes: 845 additions & 15 deletions deploy/helm/airflow-operator/crds/crds.yaml

Large diffs are not rendered by default.

10 changes: 9 additions & 1 deletion docs/modules/airflow/examples/getting_started/code/airflow.yaml
@@ -19,8 +19,16 @@ spec:
celeryExecutors:
roleGroups:
default:
replicas: 2
replicas: 1
schedulers:
roleGroups:
default:
replicas: 1
dagProcessors:
roleGroups:
default:
replicas: 1
triggerers:
roleGroups:
default:
replicas: 1
@@ -88,6 +88,8 @@ echo "Awaiting Airflow rollout finish ..."
kubectl rollout status --watch --timeout=5m statefulset/airflow-webserver-default
kubectl rollout status --watch --timeout=5m statefulset/airflow-worker-default
kubectl rollout status --watch --timeout=5m statefulset/airflow-scheduler-default
kubectl rollout status --watch --timeout=5m statefulset/airflow-dagprocessor-default
kubectl rollout status --watch --timeout=5m statefulset/airflow-triggerer-default
# end::watch-airflow-rollout[]

echo "Starting port-forwarding of port 8080"
@@ -88,6 +88,8 @@ echo "Awaiting Airflow rollout finish ..."
kubectl rollout status --watch --timeout=5m statefulset/airflow-webserver-default
kubectl rollout status --watch --timeout=5m statefulset/airflow-worker-default
kubectl rollout status --watch --timeout=5m statefulset/airflow-scheduler-default
kubectl rollout status --watch --timeout=5m statefulset/airflow-dagprocessor-default
kubectl rollout status --watch --timeout=5m statefulset/airflow-triggerer-default
# end::watch-airflow-rollout[]

echo "Starting port-forwarding of port 8080"
10 changes: 8 additions & 2 deletions docs/modules/airflow/pages/getting_started/first_steps.adoc
@@ -38,11 +38,15 @@ NOTE: The admin user is disabled if you use a non-default authentication mechani

== Airflow

An Airflow cluster is made of up three components:
An Airflow cluster is made up of several components, two of which are optional:

* `webserver`: this provides the main UI for user-interaction
* `executors`: the CeleryExecutor or KubernetesExecutor nodes over which the job workload is distributed by the scheduler
* `scheduler`: responsible for triggering jobs and persisting their metadata to the backend database
* `dagProcessors`: (Optional) responsible for monitoring, parsing and preparing DAGs for processing.
If this role is not specified, the process is started as a scheduler subprocess (Airflow 2.x) or as a standalone process in the same container as the scheduler (Airflow 3.x+).
* `triggerers`: (Optional) DAGs that use deferrable operators can hand off waiting work to one or more triggerer processes, freeing up worker slots.
This deferral mechanism also provides a measure of high availability.

Create a file named `airflow.yaml` with the following contents:

@@ -92,7 +96,9 @@ airflow-redis-master 1/1 16m
airflow-redis-replicas 1/1 16m
airflow-scheduler-default 1/1 11m
airflow-webserver-default 1/1 11m
airflow-celery-executor-default 2/2 11m
airflow-celery-executor-default 1/1 11m
airflow-dagprocessor-default 1/1 11m
airflow-triggerer-default 1/1 11m
----

When the Airflow cluster has been created and the database is initialized, Airflow can be opened in the
16 changes: 14 additions & 2 deletions docs/modules/airflow/pages/index.adoc
@@ -3,11 +3,11 @@
:keywords: Stackable Operator, Apache Airflow, Kubernetes, k8s, operator, job pipeline, scheduler, ETL
:airflow: https://airflow.apache.org/
:dags: https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html
:k8s-crs: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/
:github: https://github.com/stackabletech/airflow-operator/
:crd: {crd-docs-base-url}/airflow-operator/{crd-docs-version}/
:crd-airflowcluster: {crd-docs}/airflow.stackable.tech/airflowcluster/v1alpha1/
:feature-tracker: https://features.stackable.tech/unified
:deferrable-operators: https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/deferring.html#deferrable-operators-triggers

[.link-bar]
* {github}[GitHub {external-link-icon}^]
@@ -27,7 +27,8 @@ It guides you through installing the operator alongside a PostgreSQL database an
=== Custom resources

The AirflowCluster is the resource for the configuration of the Airflow instance.
The resource defines three xref:concepts:roles-and-role-groups.adoc[roles]: `webserver`, `worker` and `scheduler` (the `worker` role is embedded within `spec.celeryExecutors`: this is described in the next section).
The custom resource defines the following xref:concepts:roles-and-role-groups.adoc[roles]: `webserver`, `worker`, `scheduler`, `dagProcessor` and `triggerer` (the `worker` role is embedded within `spec.celeryExecutors`: this is described in the next section).
The `dagProcessor` and `triggerer` roles are optional.
The various configuration options are explained in the xref:usage-guide/index.adoc[].
It helps you tune your cluster to your needs by configuring xref:usage-guide/storage-resources.adoc[resource usage], xref:usage-guide/security.adoc[security], xref:usage-guide/logging.adoc[logging] and more.

@@ -70,6 +71,17 @@ kubernetesExecutors:
...
----

=== DAG-Processors

In Airflow 2.x, a DAG-Processor can be started either as a standalone process or as a subprocess within the scheduler component.
In Airflow 3.x+ it _must_ be started as a standalone process, either in a separate container or in the scheduler container.
If the role is not specified, the respective default applies: a scheduler subprocess in Airflow 2.x, or a standalone process in the scheduler container in Airflow 3.x+.
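
If the role is declared explicitly, the operator runs the DAG-Processor as its own StatefulSet.
A minimal sketch, matching the getting-started example:

[source,yaml]
----
spec:
  dagProcessors:
    roleGroups:
      default:
        replicas: 1
----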

=== Triggerers

DAGs using deferrable operators can be combined with the triggerer component to free up worker slots and/or provide high availability.
For more information, please refer to the {deferrable-operators}[documentation {external-link-icon}^].
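
Conceptually, a triggerer is a single process whose event loop multiplexes many pending waits, so that none of them occupies a worker slot.
A minimal, self-contained sketch of this idea using plain `asyncio` (the names are illustrative, not Airflow APIs):

[source,python]
----
import asyncio
import time

async def deferred_task(name: str, delay: float) -> str:
    # While awaiting, the event loop is free to service other triggers --
    # analogous to a deferred operator releasing its worker slot.
    await asyncio.sleep(delay)
    return f"{name}: resumed after {delay}s"

async def main() -> list[str]:
    # One "triggerer" process runs many waits concurrently instead of
    # each wait blocking a dedicated worker.
    start = time.monotonic()
    results = await asyncio.gather(
        *(deferred_task(f"dag-{i}", 0.2) for i in range(50))
    )
    assert time.monotonic() - start < 1.0  # 50 waits finish in ~0.2s, not 10s
    return results

if __name__ == "__main__":
    print(len(asyncio.run(main())))
----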

=== Kubernetes resources

Based on the custom resources you define, the operator creates ConfigMaps, StatefulSets and Services.
42 changes: 37 additions & 5 deletions docs/modules/airflow/pages/usage-guide/storage-resources.adoc
@@ -1,14 +1,15 @@
= Resource Requests
:description: Find out about minimal HA Airflow requirements for CPU and memory, with defaults for schedulers, Celery executors, webservers using Kubernetes resource limits.
:description: Find out about minimal HA Airflow requirements for CPU and memory, with defaults for schedulers, Celery executors, webservers, dagProcessors and triggerers using Kubernetes resource limits.

include::home:concepts:stackable_resource_requests.adoc[]

A minimal HA setup consisting of 2 schedulers, 2 workers and 2 webservers has the following https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/[resource requirements]:
A minimal HA setup consisting of 2 schedulers, 2 workers, 2 webservers, 2 dag-processors and 1 triggerer has the following https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/[resource requirements]:

* `8700m` CPU request
* `17400m` CPU limit
* `15872Mi` memory request and limit
* `11600m` CPU request
* `23200m` CPU limit
* `18432Mi` memory request and limit

This includes auxiliary containers for logging, metrics, and gitsync.
Corresponding to the values above, the operator uses the following resource defaults:

[source,yaml]
@@ -22,6 +23,9 @@ spec:
max: "2"
memory:
limit: 1Gi
roleGroups:
default:
replicas: 2
celeryExecutors:
config:
resources:
@@ -30,6 +34,9 @@ spec:
max: "2"
memory:
limit: 3Gi
roleGroups:
default:
replicas: 2
webservers:
config:
resources:
@@ -38,4 +45,29 @@ spec:
max: "2"
memory:
limit: 3Gi
roleGroups:
default:
replicas: 2
dagProcessors:
config:
resources:
cpu:
min: "1"
max: "2"
memory:
limit: 1Gi
roleGroups:
default:
replicas: 2
triggerers:
config:
resources:
cpu:
min: "1"
max: "2"
memory:
limit: 1Gi
roleGroups:
default:
replicas: 1
----
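
As a rough cross-check of the totals above, summing these main-container defaults over the HA replica counts gives the base footprint; the documented totals are higher because they also include the auxiliary containers for logging, metrics, and gitsync. A sketch, not operator output:

[source,python]
----
# Per-role defaults listed above: (replicas, cpu request, cpu limit, memory in Gi)
roles = {
    "schedulers":      (2, 1, 2, 1),
    "celeryExecutors": (2, 1, 2, 3),
    "webservers":      (2, 1, 2, 3),
    "dagProcessors":   (2, 1, 2, 1),
    "triggerers":      (1, 1, 2, 1),
}

cpu_request = sum(n * req for n, req, _, _ in roles.values())  # cores
cpu_limit   = sum(n * lim for n, _, lim, _ in roles.values())  # cores
memory_gi   = sum(n * mem for n, _, _, mem in roles.values())  # Gi

print(f"{cpu_request * 1000}m request, {cpu_limit * 1000}m limit, {memory_gi * 1024}Mi")
# 9000m request, 18000m limit, 17408Mi
----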
5 changes: 4 additions & 1 deletion rust/operator-binary/src/airflow_controller.rs
@@ -1156,7 +1156,10 @@ fn build_server_rolegroup_statefulset(
AirflowRole::Scheduler => {
"OrderedReady" // Scheduler pods should start after another, since part of their startup phase is initializing the database, see crd/src/lib.rs
}
AirflowRole::Webserver | AirflowRole::Worker => "Parallel",
AirflowRole::Webserver
| AirflowRole::Worker
| AirflowRole::DagProcessor
| AirflowRole::Triggerer => "Parallel",
}
.to_string(),
),