Python runtime dependencies override user-provided libraries #8

Closed
mpszumowski opened this issue Feb 5, 2021 · 10 comments

mpszumowski commented Feb 5, 2021

Image used:
amazon/aws-lambda-python:3.7

Dockerfile:

FROM amazon/aws-lambda-python:3.7

RUN yum -y install gcc

COPY ./requirements.txt ./requirements.txt
RUN pip3 install -r requirements.txt
COPY ./lambda_function.py ./lambda_function.py

RUN python -c 'import sys; print(sys.path)'
RUN pip3 freeze | grep idna

CMD [ "lambda_function.lambda_handler" ]

requirements.txt

snowflake-connector-python==2.3.10
snowflake-sqlalchemy==1.2.4

lambda_function.py

import pkg_resources
import sys

from snowflake.sqlalchemy import URL
from sqlalchemy import create_engine


def lambda_handler(event, context):

    print(sys.path)
    print(pkg_resources.working_set.by_key['idna'])
    engine = create_engine(
        URL(account='account', user='user',
            database='"DATABASE"', warehouse='warehouse')
    )
    return {'success': 200}

What I would expect:
Dependencies are installed correctly, the lambda_handler imports them and executes properly on Lambda.

What is the case:
Log from Lambda:

[ERROR] ContextualVersionConflict: (idna 3.1 (/var/runtime), Requirement.parse('idna<3,>=2.5'), {'requests', 'snowflake-connector-python'})
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 17, in lambda_handler
    database='"DATABASE"', warehouse='warehouse')
  File "/var/lang/lib/python3.7/site-packages/sqlalchemy/engine/__init__.py", line 520, in create_engine
    return strategy.create(*args, **kwargs)
  File "/var/lang/lib/python3.7/site-packages/sqlalchemy/engine/strategies.py", line 61, in create
    entrypoint = u._get_entrypoint()
  File "/var/lang/lib/python3.7/site-packages/sqlalchemy/engine/url.py", line 172, in _get_entrypoint
    cls = registry.load(name)
  File "/var/lang/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 275, in load
    return impl.load()
  File "/var/lang/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2461, in load
    self.require(*args, **kwargs)
  File "/var/lang/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2484, in require
    items = working_set.resolve(reqs, env, installer, extras=self.extras)
  File "/var/lang/lib/python3.7/site-packages/pkg_resources/__init__.py", line 792, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)

snowflake-connector-python imports a different version of its dependency than the one pip installed during the Docker build. It then fails because idna 3.1 does not satisfy its requirement: idna<3,>=2.5.

The logging I added to the Dockerfile and to lambda_function suggests why it imports a different version.

Dockerfile: RUN python -c 'import sys; print(sys.path)'

---> Running in 5c925a9e311a
['', '/var/lang/lib/python37.zip', '/var/lang/lib/python3.7', '/var/lang/lib/python3.7/lib-dynload', '/var/lang/lib/python3.7/site-packages']

lambda_function.lambda_handler: print(sys.path)

['/var/task', '/opt/python/lib/python3.7/site-packages', '/opt/python', '/var/runtime', '/var/lang/lib/python37.zip', '/var/lang/lib/python3.7', '/var/lang/lib/python3.7/lib-dynload', '/var/lang/lib/python3.7/site-packages', '/opt/python/lib/python3.7/site-packages', '/opt/python']

Dockerfile: RUN pip3 freeze | grep idna
idna==2.10

lambda_function.lambda_handler: print(pkg_resources.working_set.by_key['idna'])
idna 3.1

What happens is that the Lambda runtime places the /var/runtime directory ahead of /var/lang/lib/python3.7 and populates the pkg_resources.WorkingSet with the distributions installed there (mostly boto3 + deps). This is carried over to the lambda handler, which is executed with the "overridden" libraries. Seeing what sys.path looks like at the moment the handler executes, I presume that it has been manually modified so that the user-provided code in /var/task, rather than the runtime path, comes first. Why is /opt/python/lib/python3.7/site-packages second if it is not the pip site-packages directory?
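The "first match on sys.path wins" rule behind this can be reproduced outside Lambda. The following is only a minimal sketch: the module name shadow_demo, the two temporary directories, and the helper names are hypothetical stand-ins for idna, /var/runtime and the pip site-packages directory, and it does not claim to mirror what the Lambda runtime actually does internally.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_fake_module(root: Path, version: str) -> None:
    """Write a tiny module named shadow_demo that only records a version string."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "shadow_demo.py").write_text(f"__version__ = {version!r}\n")


def import_with_path(entries):
    """Import shadow_demo with the given directories placed first on sys.path."""
    saved = list(sys.path)
    sys.modules.pop("shadow_demo", None)   # forget any previous import
    importlib.invalidate_caches()
    sys.path[:0] = [str(e) for e in entries]
    try:
        module = importlib.import_module("shadow_demo")
        return module.__version__
    finally:
        sys.path[:] = saved                # restore the original search path
        sys.modules.pop("shadow_demo", None)


with tempfile.TemporaryDirectory() as tmp:
    runtime_dir = Path(tmp) / "runtime"        # stands in for /var/runtime
    site_dir = Path(tmp) / "site-packages"     # stands in for the pip target
    make_fake_module(runtime_dir, "3.1")
    make_fake_module(site_dir, "2.10")

    runtime_first = import_with_path([runtime_dir, site_dir])
    site_first = import_with_path([site_dir, runtime_dir])

print("runtime dir first:", runtime_first)
print("site-packages first:", site_first)
```

Whichever directory comes earlier supplies the module, regardless of which copy pip reports during the build.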

The outcome is really confusing - using Docker I expect to be able to handle my runtime and (at least) my dependencies. I definitely expect the Lambda Runtime to be transparent and its dependencies not to impact my workload. Especially if the bug is this opaque and undocumented.

I was able to work my way around with the following hack.

import pkg_resources
import sys

from snowflake.sqlalchemy import URL
from sqlalchemy import create_engine


def lambda_handler(event, context):

    entry = '/var/lang/lib/python3.7/site-packages'
    sys.path = [entry] + sys.path
    for dist in pkg_resources.find_distributions(entry, True):
        pkg_resources.working_set.add(dist, entry, False, replace=True)

    [...]

It may be dangerous if the manually imported libraries in turn conflict with the downstream code in the runtime. I think, however, that something of this kind can be implemented in the runtime itself so that the handler uses only the environment libraries.

@denis-ryzhkov

The same workaround, just moved to top-level, compatible with running locally, and creating no duplicate entries in sys.path:

import pkg_resources
import sys

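# Directory pip installs into inside the Lambda base image.
site_packages = "/var/lang/lib/python3.8/site-packages"
try:
    sys.path.remove(site_packages)
except ValueError:
    # Not on sys.path (for example when running locally): change nothing.
    pass
else:
    # Put it back at the front so it takes precedence over /var/runtime.
    sys.path.insert(0, site_packages)
    # Re-register its distributions so pkg_resources sees the pip-installed versions.
    for dist in pkg_resources.find_distributions(site_packages, True):
        pkg_resources.working_set.add(dist, site_packages, False, replace=True)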

@ilias-at-adarma

This behavior is terrible, and I am surprised this hasn't been fixed. If you install your own boto3/botocore or any other library that is shadowed by /var/runtime, you will think you are running on your pinned requirement, but nope, you rely completely on the shadowed version. A ticking time bomb in production. Even worse, if you haven't pinned to a dated image tag and your image is cached, your dependencies slowly grow out of date without you knowing.

There is also not much control over this; removing the sys.path entry is a rather hacky solution in my view.

Does anyone know if those existing boto3/botocore libraries shipped with the lambda are actually used by the runtime? Could we just wipe them from our Dockerfile?
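One way to notice this kind of shadowing is to list every distribution that appears in more than one sys.path entry. The sketch below uses the standard-library importlib.metadata; the helper names find_shadowed and write_dist_info and the fake idna metadata are made up for the demonstration, and nothing here is confirmed to match how the Lambda image lays out its packages.

```python
import tempfile
from collections import defaultdict
from importlib import metadata
from pathlib import Path


def find_shadowed(paths):
    """Return {name: [(version, path entry), ...]} for distributions found in 2+ entries.

    Entries are scanned in order, so the first tuple comes from the earliest path entry.
    """
    seen = defaultdict(list)
    for entry in dict.fromkeys(str(p) for p in paths):   # de-duplicate, keep order
        for dist in metadata.distributions(path=[entry]):
            name = dist.metadata.get("Name")
            if name:
                seen[name.lower()].append((dist.version, entry))
    return {name: hits for name, hits in seen.items() if len(hits) > 1}


def write_dist_info(root: Path, name: str, version: str) -> None:
    """Create a minimal fake *.dist-info directory (demo helper only)."""
    info = root / f"{name}-{version}.dist-info"
    info.mkdir(parents=True)
    (info / "METADATA").write_text(
        f"Metadata-Version: 2.1\nName: {name}\nVersion: {version}\n"
    )


with tempfile.TemporaryDirectory() as tmp:
    runtime_dir = Path(tmp) / "runtime"        # stands in for /var/runtime
    site_dir = Path(tmp) / "site-packages"     # stands in for the pip target
    write_dist_info(runtime_dir, "idna", "3.1")
    write_dist_info(site_dir, "idna", "2.10")
    write_dist_info(site_dir, "requests", "2.25.1")

    shadowed = find_shadowed([runtime_dir, site_dir])
    versions = [version for version, _ in shadowed["idna"]]

print(sorted(shadowed), versions)
```

Inside a handler, calling find_shadowed(sys.path) and logging the result would show which pinned packages also exist in an earlier directory.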

SteggyLeggy commented Mar 30, 2023

What happens is that Lambda Runtime sets the /var/runtime directory in front of /var/lang/lib/python3.7 and populates the pkg_resources.WorkingSet with the distributions installed there (mostly boto3 + deps).

Where does this happen? I mean what component is actually doing this?

Just thinking of making my own Docker base image to use for Lambdas, but I'd like to be sure that doing so will actually fix this issue.

@ilias-at-adarma

Where does this happen? I mean what component is actually doing this?

The Lambda runtime script runs from /var/runtime, the same directory that contains boto3. From there it creates the Lambda listener, at some point imports the lambda handler function, and finally runs your Lambda code when an event is received. Because the script lives in that directory, Python gives priority to imports from it. If Python doesn't find the module in the local directory, it then moves on to the paths specified in $PYTHONPATH.
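The general Python rule referred to here, that the directory of the script being run becomes the first sys.path entry, can be checked with a small experiment. This is a sketch of standard interpreter behaviour only: bootstrap_demo.py is a hypothetical stand-in for the runtime script, and whether the Lambda bootstrap relies on exactly this mechanism is an assumption.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the runtime's entry script.
    script = Path(tmp) / "bootstrap_demo.py"
    script.write_text("import sys\nprint(sys.path[0])\n")

    # PYTHONSAFEPATH (Python 3.11+) switches this behaviour off, so drop it for the demo.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
    completed = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True, text=True, check=True, env=env,
    )

    # Resolve both sides so symlinked temp directories compare equal.
    first_entry = Path(completed.stdout.strip()).resolve()
    script_dir = Path(tmp).resolve()

print("sys.path[0] is the script directory:", first_entry == script_dir)
```

Anything importable from that directory is therefore found before PYTHONPATH or site-packages entries, unless sys.path is modified afterwards.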

SteggyLeggy commented Mar 30, 2023

Oh I see, so this behaviour is more a by-product of how it is run, and not something done on purpose.

Looking at https://github.com/aws/aws-lambda-python-runtime-interface-client I cannot see any documented dependency on boto3 or botocore. Of course, that doesn't rule out that it simply relies on those libraries already being available, I guess.

Maybe making our own base images that don't have boto3 etc. installed alongside the Lambda runtime interface client will help in this situation.

Looking at the documentation here, it doesn't suggest that boto3 etc are required dependencies either.

https://docs.aws.amazon.com/lambda/latest/dg/images-create.html#images-create-from-alt

@aws-haddad

Thanks @SteggyLeggy, based on your suggestion I followed "Using an AWS base image for custom runtimes" from here: https://docs.aws.amazon.com/lambda/latest/dg/images-create.html#runtimes-images-custom which worked great.

@jtuliani

We have published an updated image for Python 3.11 which addresses this issue.

Previously, the Lambda base container images for Python included the /var/runtime directory before the /var/lang/lib/python3.x directory in the search path. This meant that packages in /var/runtime were loaded in preference to packages pip installed into /var/lang/lib/python3.x. Since the AWS SDK for Python (boto3/botocore) was installed into /var/runtime, this made it harder for customers to upgrade the SDK version.

With the Python 3.11 runtime, the AWS SDK and its dependencies are now pre-installed into the /var/lang/lib/python3.11 directory, and the search path has been modified so this directory has precedence over /var/runtime. Customers can override the SDK by pip installing a newer version. This change also enables pip to verify and track that the pre-installed SDK and its dependencies are compatible with any customer-installed packages.

@ryancausey

Will this fix also be backported to the older lambda python base images that are still supported? I think it should be backported to all the image versions listed here: https://docs.aws.amazon.com/lambda/latest/dg/python-image.html#python-image-base

@jtuliani

@ryancausey We don't currently plan to back-port this change to the Lambda images for earlier Python versions. It's a (potentially) breaking change and we don't want to break existing customer configurations.

jimmyorpheus commented Jan 28, 2025

Since I'm using public.ecr.aws/lambda/python:3.7 as my base image and currently cannot update to public.ecr.aws/lambda/python:3.11, I came up with the following workaround to deal with the described issue:

Instead of using the regular pip site-packages directory (/var/lang/lib/python3.7/site-packages) I set another target (/opt/python/lib/python3.7/site-packages) when installing my requirements:

RUN python -m pip install -r requirements.txt --target=/opt/python/lib/python3.7/site-packages

As @mpszumowski pointed out, this folder precedes /var/runtime in sys.path when the handler executes, meaning that my custom package versions - as intended - override the preinstalled package versions in /var/runtime.

This seems to work fine. But since I don't know why /opt/python/lib/python3.7/site-packages is included in sys.path, I might be doing something wrong by using this directory for my package installations.
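A workaround like this can be sanity-checked by asking Python where a module would be loaded from before relying on it. The following is a rough sketch using the standard-library importlib.util.find_spec; the helper names module_origin and is_under are hypothetical, and json is used only because it exists in every environment (in the image you would pass idna and the --target directory instead).

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


def is_under(origin: str, directory: Path) -> bool:
    """True if the file at origin lives somewhere below directory."""
    return directory.resolve() in Path(origin).resolve().parents


json_origin = module_origin("json")
missing_origin = module_origin("no_such_module_for_demo")

print(json_origin)
print(missing_origin)
```

Logging module_origin("idna") from the handler, and checking it with is_under against the chosen target directory, would show whether the custom install or the copy in /var/runtime is the one actually picked up.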
