Description
I have reproduced the behaviour that I was seeing when using airflow within bazel. It seems that the airflow package relies on the namespace packages feature: the whole airflow package is split into smaller distributions, and providers are optional installs, e.g. see the sqlite provider: https://pypi.org/project/apache-airflow-providers-sqlite/. airflow also lazy-loads some of the core classes from the main package, and this is where the problem appears: the lazy loading does not work if the sys.path entry for the provider comes before the airflow entry. This seems to be because Python uses the __getattr__ function declared in the main package only if the main package is the first one found on sys.path.
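To illustrate the mechanism, here is a minimal sketch rather than Airflow's actual code. It assumes pkgutil-style namespace packages, where every distribution ships its own `__init__.py` and only the first one found on sys.path is executed. The names `demo_pkg`, `main_dist`, `provider_dist` and `LazyThing` are hypothetical and exist only for this demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# pkgutil-style namespace boilerplate shared by both (hypothetical) distributions.
PKGUTIL_INIT = '__path__ = __import__("pkgutil").extend_path(__path__, __name__)\n'

# Only the "main" distribution defines a module-level __getattr__ (PEP 562)
# that stands in for the lazy loading of core classes.
MAIN_INIT = PKGUTIL_INIT + (
    "def __getattr__(name):\n"
    "    if name == 'LazyThing':\n"
    "        return 'lazily loaded'\n"
    "    raise AttributeError(name)\n"
)


def make_layout(root):
    """Create two sys.path roots that both contain a `demo_pkg` directory."""
    main = root / "main_dist" / "demo_pkg"
    provider = root / "provider_dist" / "demo_pkg"
    main.mkdir(parents=True)
    (provider / "providers").mkdir(parents=True)
    (main / "__init__.py").write_text(MAIN_INIT)
    (provider / "__init__.py").write_text(PKGUTIL_INIT)
    (provider / "providers" / "__init__.py").write_text("")
    return str(main.parent), str(provider.parent)


def lazy_attr_available(first, second):
    """Import demo_pkg with `first` ahead of `second` on sys.path."""
    # Forget any earlier import so the package is resolved from scratch.
    for name in [m for m in sys.modules if m == "demo_pkg" or m.startswith("demo_pkg.")]:
        del sys.modules[name]
    importlib.invalidate_caches()
    saved = sys.path[:]
    sys.path[:0] = [first, second]
    try:
        pkg = importlib.import_module("demo_pkg")
        return hasattr(pkg, "LazyThing")
    finally:
        sys.path[:] = saved


with tempfile.TemporaryDirectory() as tmp:
    main_dir, provider_dir = make_layout(Path(tmp))
    main_first = lazy_attr_available(main_dir, provider_dir)
    provider_first = lazy_attr_available(provider_dir, main_dir)
    print("main first:", main_first)          # True
    print("provider first:", provider_first)  # False
```

When the provider root comes first, its `__init__.py` is the one that gets executed, so the `__getattr__` hook from the main distribution is never installed and the lazy attribute is missing.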
My assumption is that the majority of installs out there do not hit this edge case because everybody works in a single virtualenv that contains both airflow and all of the airflow providers. I am wondering if there is something we should do to work around the issue. At the very least, this issue documents the problem and a possible workaround. For now, we sort the sys.path elements alphabetically before importing anything from airflow, and this achieves the effect that we want.
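The sorting workaround could look roughly like the sketch below. The directory names are hypothetical, since the real runfiles paths depend on the Bazel setup. The sketch assumes that the main package's path is a prefix of the provider's path up to a separator, so that it sorts first.

```python
import sys


def sort_path_entries(path_entries):
    """Return the entries sorted alphabetically (the workaround described above)."""
    return sorted(path_entries)


def apply_sorted_sys_path():
    """Reorder sys.path in place; meant to run before the first `import airflow`."""
    sys.path[:] = sort_path_entries(sys.path)


# Hypothetical runfiles-style entries, listed with the provider first.
example = [
    "/runfiles/pip_apache_airflow_providers_sqlite/site-packages",
    "/runfiles/pip_apache_airflow/site-packages",
]
ordered = sort_path_entries(example)
print(ordered[0])  # /runfiles/pip_apache_airflow/site-packages
```

With these names the main entry wins because `/` sorts before `_`. A different naming scheme could sort the other way, so this is an ordering trick rather than a general fix.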
@rickeylev, do you know how the order of sys.path elements within bazel/rules_python is determined? Most of the time buildifier sorts the dependencies, so could this affect the order of the sys.path elements, if I understand things correctly? Do you think a virtualenv layout of the runfiles would be something that rules_python could support?
Thinking about it more, I think it is a bug/limitation in Airflow, but we may notice more cases like this in the future, and I am wondering if having some predefined order of the sys.path entries within the entrypoint template would be a good thing here.
This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days.
Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!
🐞 bug report
Affected Rule
py_library, py_test, py_binary
Is this a regression?
It has likely always worked like this, so it's not a regression. What is more, I am wondering if this is actually a bug in how Airflow is packaged: the more I describe this issue, the more that seems to be the case.
🔬 Minimal Reproduction
A reproduction of the behaviour is documented in https://github.com/aignas/rules_python/tree/test/namespace_pkgs/examples/namespace_pkgs