Skip to content

Commit 138f333

Browse files
adwk67maltesander
andauthored
feat: Add DagProcessor and Triggerer roles (#679)
* wip: for 3.x stabilisation changes * added roles for triggerer and dag-processor * added triggerer test * changelog, docs, getting-started * remove start & finish task * Update docs/modules/airflow/pages/getting_started/first_steps.adoc Co-authored-by: Malte Sander <[email protected]> * Update docs/modules/airflow/pages/index.adoc Co-authored-by: Malte Sander <[email protected]> * review feedback: resources clarification, corrected naming of test step * review feedback: re-worked role config --------- Co-authored-by: Malte Sander <[email protected]>
1 parent cd52d4a commit 138f333

38 files changed

+1775
-105
lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@
55
### Added
66

77
- Add a flag to determine if database initialization steps should be executed ([#669]).
8+
- Add new roles for dag-processor and triggerer processes ([#679]).
89

910
### Fixed
1011

@@ -19,6 +20,7 @@
1920
[#668]: https://github.com/stackabletech/airflow-operator/pull/668
2021
[#669]: https://github.com/stackabletech/airflow-operator/pull/669
2122
[#678]: https://github.com/stackabletech/airflow-operator/pull/678
23+
[#679]: https://github.com/stackabletech/airflow-operator/pull/679
2224
[#683]: https://github.com/stackabletech/airflow-operator/pull/683
2325

2426
## [25.7.0] - 2025-07-23

deploy/helm/airflow-operator/crds/crds.yaml

Lines changed: 845 additions & 15 deletions
Large diffs are not rendered by default.

docs/modules/airflow/examples/getting_started/code/airflow.yaml

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -19,8 +19,16 @@ spec:
1919
celeryExecutors:
2020
roleGroups:
2121
default:
22-
replicas: 2
22+
replicas: 1
2323
schedulers:
2424
roleGroups:
2525
default:
2626
replicas: 1
27+
dagProcessors:
28+
roleGroups:
29+
default:
30+
replicas: 1
31+
triggerers:
32+
roleGroups:
33+
default:
34+
replicas: 1

docs/modules/airflow/examples/getting_started/code/getting_started.sh

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -88,6 +88,8 @@ echo "Awaiting Airflow rollout finish ..."
8888
kubectl rollout status --watch --timeout=5m statefulset/airflow-webserver-default
8989
kubectl rollout status --watch --timeout=5m statefulset/airflow-worker-default
9090
kubectl rollout status --watch --timeout=5m statefulset/airflow-scheduler-default
91+
kubectl rollout status --watch --timeout=5m statefulset/airflow-dagprocessor-default
92+
kubectl rollout status --watch --timeout=5m statefulset/airflow-triggerer-default
9193
# end::watch-airflow-rollout[]
9294

9395
echo "Starting port-forwarding of port 8080"

docs/modules/airflow/examples/getting_started/code/getting_started.sh.j2

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -88,6 +88,8 @@ echo "Awaiting Airflow rollout finish ..."
8888
kubectl rollout status --watch --timeout=5m statefulset/airflow-webserver-default
8989
kubectl rollout status --watch --timeout=5m statefulset/airflow-worker-default
9090
kubectl rollout status --watch --timeout=5m statefulset/airflow-scheduler-default
91+
kubectl rollout status --watch --timeout=5m statefulset/airflow-dagprocessor-default
92+
kubectl rollout status --watch --timeout=5m statefulset/airflow-triggerer-default
9193
# end::watch-airflow-rollout[]
9294

9395
echo "Starting port-forwarding of port 8080"

docs/modules/airflow/pages/getting_started/first_steps.adoc

Lines changed: 8 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -38,11 +38,15 @@ NOTE: The admin user is disabled if you use a non-default authentication mechani
3838

3939
== Airflow
4040

41-
An Airflow cluster is made of up three components:
41+
An Airflow cluster is made up of several components, two of which are optional:
4242

4343
* `webserver`: this provides the main UI for user-interaction
4444
* `executors`: the CeleryExecutor or KubernetesExecutor nodes over which the job workload is distributed by the scheduler
4545
* `scheduler`: responsible for triggering jobs and persisting their metadata to the backend database
46+
* `dagProcessors`: (Optional) responsible for monitoring, parsing and preparing DAGs for processing.
47+
If this role is not specified then the process will be started as a scheduler subprocess (Airflow 2.x), or as a standalone process in the same container as the scheduler (Airflow 3.x+)
48+
* `triggerers`: (Optional) DAGs making use of deferrable operators can be used together with one or more triggerer processes to free up worker slots.
49+
This deferral process is also useful for providing a measure of high availability
4650

4751
Create a file named `airflow.yaml` with the following contents:
4852

@@ -92,7 +96,9 @@ airflow-redis-master 1/1 16m
9296
airflow-redis-replicas 1/1 16m
9397
airflow-scheduler-default 1/1 11m
9498
airflow-webserver-default 1/1 11m
95-
airflow-celery-executor-default 2/2 11m
99+
airflow-celery-executor-default 1/1 11m
100+
airflow-dagprocessor-default 1/1 11m
101+
airflow-triggerer-default 1/1 11m
96102
----
97103

98104
When the Airflow cluster has been created and the database is initialized, Airflow can be opened in the

docs/modules/airflow/pages/index.adoc

Lines changed: 14 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3,11 +3,11 @@
33
:keywords: Stackable Operator, Apache Airflow, Kubernetes, k8s, operator, job pipeline, scheduler, ETL
44
:airflow: https://airflow.apache.org/
55
:dags: https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html
6-
:k8s-crs: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/
76
:github: https://github.com/stackabletech/airflow-operator/
87
:crd: {crd-docs-base-url}/airflow-operator/{crd-docs-version}/
98
:crd-airflowcluster: {crd-docs}/airflow.stackable.tech/airflowcluster/v1alpha1/
109
:feature-tracker: https://features.stackable.tech/unified
10+
:deferrable-operators: https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/deferring.html#deferrable-operators-triggers
1111

1212
[.link-bar]
1313
* {github}[GitHub {external-link-icon}^]
@@ -27,7 +27,8 @@ It guides you through installing the operator alongside a PostgreSQL database an
2727
=== Custom resources
2828

2929
The AirflowCluster is the resource for the configuration of the Airflow instance.
30-
The resource defines three xref:concepts:roles-and-role-groups.adoc[roles]: `webserver`, `worker` and `scheduler` (the `worker` role is embedded within `spec.celeryExecutors`: this is described in the next section).
30+
The custom resource defines the following xref:concepts:roles-and-role-groups.adoc[roles]: `webserver`, `worker`, `scheduler`, `dagProcessor` and `triggerer` (the `worker` role is embedded within `spec.celeryExecutors`: this is described in the next section).
31+
The `dagProcessor` and `triggerer` roles are optional.
3132
The various configuration options are explained in the xref:usage-guide/index.adoc[].
3233
It helps you tune your cluster to your needs by configuring xref:usage-guide/storage-resources.adoc[resource usage], xref:usage-guide/security.adoc[security], xref:usage-guide/logging.adoc[logging] and more.
3334

@@ -70,6 +71,17 @@ kubernetesExecutors:
7071
...
7172
----
7273

74+
=== DAG-Processors
75+
76+
In Airflow 2.x, a DAG-Processor can be started either as a standalone process or a subprocess within the scheduler component.
77+
For Airflow 3.x+ it _must_ be started as a standalone process, either in a separate container or in the scheduler container.
78+
In each case the default will be applied (subprocess or combined in the scheduler container) if the role is not specified.
79+
80+
=== Triggerers
81+
82+
DAGs using deferrable operators can be combined with the triggerer component to free up worker slots and/or provide high availability.
83+
For more information, please refer to the {deferrable-operators}[documentation {external-link-icon}^].
84+
7385
=== Kubernetes resources
7486

7587
Based on the custom resources you define, the operator creates ConfigMaps, StatefulSets and Services.

docs/modules/airflow/pages/usage-guide/storage-resources.adoc

Lines changed: 37 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,14 +1,15 @@
11
= Resource Requests
2-
:description: Find out about minimal HA Airflow requirements for CPU and memory, with defaults for schedulers, Celery executors, webservers using Kubernetes resource limits.
2+
:description: Find out about minimal HA Airflow requirements for CPU and memory, with defaults for schedulers, Celery executors, webservers, dagProcessors and triggerers using Kubernetes resource limits.
33

44
include::home:concepts:stackable_resource_requests.adoc[]
55

6-
A minimal HA setup consisting of 2 schedulers, 2 workers and 2 webservers has the following https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/[resource requirements]:
6+
A minimal HA setup consisting of 2 schedulers, 2 workers, 2 webservers, 2 dag-processors and 1 triggerer has the following https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/[resource requirements]:
77

8-
* `8700m` CPU request
9-
* `17400m` CPU limit
10-
* `15872Mi` memory request and limit
8+
* `11600` CPU request
9+
* `23200` CPU limit
10+
* `18432Mi` memory request and limit
1111
12+
This includes auxiliary containers for logging, metrics, and gitsync.
1213
Corresponding to the values above, the operator uses the following resource defaults:
1314

1415
[source,yaml]
@@ -22,6 +23,9 @@ spec:
2223
max: "2"
2324
memory:
2425
limit: 1Gi
26+
roleGroups:
27+
default:
28+
replicas: 2
2529
celeryExecutors:
2630
config:
2731
resources:
@@ -30,6 +34,9 @@ spec:
3034
max: "2"
3135
memory:
3236
limit: 3Gi
37+
roleGroups:
38+
default:
39+
replicas: 2
3340
webservers:
3441
config:
3542
resources:
@@ -38,4 +45,29 @@ spec:
3845
max: "2"
3946
memory:
4047
limit: 3Gi
48+
roleGroups:
49+
default:
50+
replicas: 2
51+
dagProcessors:
52+
config:
53+
resources:
54+
cpu:
55+
min: "1"
56+
max: "2"
57+
memory:
58+
limit: 1Gi
59+
roleGroups:
60+
default:
61+
replicas: 2
62+
triggerers:
63+
config:
64+
resources:
65+
cpu:
66+
min: "1"
67+
max: "2"
68+
memory:
69+
limit: 1Gi
70+
roleGroups:
71+
default:
72+
replicas: 1
4173
----

rust/operator-binary/src/airflow_controller.rs

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1156,7 +1156,10 @@ fn build_server_rolegroup_statefulset(
11561156
AirflowRole::Scheduler => {
11571157
"OrderedReady" // Scheduler pods should start after another, since part of their startup phase is initializing the database, see crd/src/lib.rs
11581158
}
1159-
AirflowRole::Webserver | AirflowRole::Worker => "Parallel",
1159+
AirflowRole::Webserver
1160+
| AirflowRole::Worker
1161+
| AirflowRole::DagProcessor
1162+
| AirflowRole::Triggerer => "Parallel",
11601163
}
11611164
.to_string(),
11621165
),

0 commit comments

Comments
 (0)