Research Airflow 3.x support #618

Closed
@sbernauer

Description

@sbernauer

Relevant Slack thread: https://stackable-workspace.slack.com/archives/C071M36AF45/p1745911389456239

Which new version of Apache Airflow should we support?

3.0.x

Additional information

https://airflow.apache.org/blog/airflow-three-point-oh-is-here/
https://airflow.apache.org/docs/apache-airflow/stable/installation/upgrading_to_airflow3.html

Breaking changes

  • SubDAGs: Replaced by TaskGroups, Assets, and Data Aware Scheduling.
    • ✅ Nothing we can do here
  • Sequential Executor: Replaced by LocalExecutor, which can be used with SQLite for local development use cases.
    • ✅ Nothing we can do here
  • SLAs: Deprecated and removed; they will be replaced by the forthcoming Deadline Alerts.
    • ✅ Nothing we can do here
  • Subdir: --subdir (or -S), used as an argument on many CLI commands, has been superseded by DAG bundles.
    • 🔴 Unclear; probably nothing we can do here
  • Some Airflow context variables: The following keys are no longer available in a task instance's context, and DAGs that still use them will fail: tomorrow_ds, tomorrow_ds_nodash, yesterday_ds, yesterday_ds_nodash, prev_ds, prev_ds_nodash, prev_execution_date, prev_execution_date_success, next_execution_date, next_ds_nodash, next_ds, execution_date
    • ✅ Nothing we can do here
  • The catchup_by_default configuration option is now False by default.
    • ✅ Nothing we can do here
  • The create_cron_data_intervals configuration option is now False by default. This means that the CronTriggerTimetable is used by default instead of the CronDataIntervalTimetable.
    • ✅ Nothing we can do here
  • Simple Auth is now the default auth_manager. To continue using FAB as the auth manager, install the FAB provider and set auth_manager to FabAuthManager.
    • 🔴 We need to do something. Either keep using FabAuthManager (technical debt?) or switch to Simple Auth. Let's hope it can do OIDC properly ;)
    • 🔴 The OPA authorizer probably needs changes as well
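
To estimate how many existing DAGs are affected by the removed context variables, a plain text search for the removed keys is a cheap first pass. The following is a minimal sketch of such a check. It assumes DAGs are Python files in a single folder, and the helper names (find_removed_context_keys, scan_dag_folder) are hypothetical, not part of Airflow or the operator. Because it is a text search, it can also flag unrelated uses, such as a local variable that happens to be called execution_date.

```python
import re
import sys
from pathlib import Path

# Context keys listed in the Airflow 3 upgrade notes as no longer available.
REMOVED_CONTEXT_KEYS = (
    "tomorrow_ds", "tomorrow_ds_nodash",
    "yesterday_ds", "yesterday_ds_nodash",
    "prev_ds", "prev_ds_nodash",
    "prev_execution_date", "prev_execution_date_success",
    "next_execution_date", "next_ds_nodash", "next_ds",
    "execution_date",
)

# Longest keys first; \b boundaries keep "prev_ds" from matching inside
# "prev_ds_nodash" and "execution_date" from matching inside "prev_execution_date".
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(REMOVED_CONTEXT_KEYS, key=len, reverse=True)) + r")\b"
)


def find_removed_context_keys(source: str) -> list[tuple[int, str]]:
    """Return (line_number, key) pairs for every removed context key in `source`."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


def scan_dag_folder(folder: str) -> dict[str, list[tuple[int, str]]]:
    """Scan all *.py files below `folder` and report files that use removed keys."""
    results = {}
    for path in sorted(Path(folder).rglob("*.py")):
        hits = find_removed_context_keys(path.read_text(encoding="utf-8", errors="replace"))
        if hits:
            results[str(path)] = hits
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    for file, hits in scan_dag_folder(sys.argv[1]).items():
        for lineno, key in hits:
            print(f"{file}:{lineno}: '{key}' is not available in Airflow 3 task contexts")
```

Usage: run the script with the DAG folder as its only argument; each hit is printed in a file:line format so it can be pasted into an editor.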

Changes required

  • Consider whether we want some sort of automated backup mechanism. Probably not, as this would balloon the scope of this issue, but some shell scripts users can copy and paste would be awesome!
  • We are already using Python 3.12, so we are good to go
  • (possible change required) Deprecated environment variables should be replaced or updated. See "Update environment variable that is used for sql_alchemy_conn" #319
  • Start-up commands differ for 3.x. See here. Maybe we should move them into a start-up script, much like we do for HBase.
    comment(sbernauer): I don't like shell scripts in Docker images; they are hard to maintain and hard to keep in sync with the versions. In Hive, for example, I think we have an if {} else {} in the operator with different start commands. I don't think this is ugly at all.
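
A minimal sketch of the approach from the comment above: branch on the Airflow major version in the operator to pick the start-up commands, and rename deprecated environment variables while building the pod environment. It assumes that Airflow 3.x replaces airflow webserver with airflow api-server and requires the DAG processor to run as its own process (airflow dag-processor), and that [core] sql_alchemy_conn moved to [database] sql_alchemy_conn (Airflow 2.3). The role names, the function names and the decision to run the DAG processor next to the scheduler are hypothetical; Python is used only to illustrate the branching.

```python
# Hypothetical sketch: version-dependent start-up commands and env var renames,
# decided in the operator rather than in a start-up script inside the image.

# Role names are illustrative; each entry lists the processes to start for a role.
COMMANDS_2X = {
    "webserver": ["airflow webserver"],
    "scheduler": ["airflow scheduler"],
    "triggerer": ["airflow triggerer"],
    "worker": ["airflow celery worker"],
}

COMMANDS_3X = {
    # Airflow 3.x replaces the webserver with the API server.
    "webserver": ["airflow api-server"],
    # Airflow 3.x requires a standalone DAG processor. Assumption: it is run
    # next to the scheduler; a dedicated role would work as well.
    "scheduler": ["airflow dag-processor", "airflow scheduler"],
    "triggerer": ["airflow triggerer"],
    "worker": ["airflow celery worker"],
}

# [core] sql_alchemy_conn moved to [database] sql_alchemy_conn in Airflow 2.3.
DEPRECATED_ENV_VARS = {
    "AIRFLOW__CORE__SQL_ALCHEMY_CONN": "AIRFLOW__DATABASE__SQL_ALCHEMY_CONN",
}


def start_commands(airflow_version: str, role: str) -> list[str]:
    """Return the commands to start for `role` on the given Airflow version."""
    major = int(airflow_version.split(".", 1)[0])
    table = COMMANDS_3X if major >= 3 else COMMANDS_2X
    if role not in table:
        raise ValueError(f"unknown role: {role}")
    return list(table[role])  # copy so callers cannot mutate the table


def migrate_env(env: dict[str, str]) -> dict[str, str]:
    """Rename deprecated Airflow env vars; an explicit new-style variable wins."""
    migrated = {}
    for key, value in env.items():
        new_key = DEPRECATED_ENV_VARS.get(key, key)
        if new_key != key and new_key in env:
            continue  # keep the value already set under the new name
        migrated[new_key] = value
    return migrated
```

For example, start_commands("3.0.1", "webserver") returns ["airflow api-server"], while start_commands("2.10.5", "webserver") returns ["airflow webserver"]. Keeping both tables side by side in the operator keeps the version logic in one place, which is the point made in the comment.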

Implementation checklist

  • Update the Docker image
  • Update documentation to include supported version(s)
  • Update and test getting started guide with updated version(s)
  • Update operator to support the new version (if needed)
  • Update integration tests to use the new versions (in addition to or replacing old versions)
  • Update examples to use new versions

Metadata

Assignees

Labels

No labels

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests