
Python runtime dependencies override user-provided libraries #8

Closed
@mpszumowski

Description


Image used:
amazon/aws-lambda-python:3.7

Dockerfile:

FROM amazon/aws-lambda-python:3.7

RUN yum -y install gcc

COPY ./requirements.txt ./requirements.txt
RUN pip3 install -r requirements.txt
COPY ./lambda_function.py ./lambda_function.py

RUN python -c 'import sys; print(sys.path)'
RUN pip3 freeze | grep idna

CMD [ "lambda_function.lambda_handler" ]

requirements.txt

snowflake-connector-python==2.3.10
snowflake-sqlalchemy==1.2.4

lambda_function.py

import pkg_resources
import sys

from snowflake.sqlalchemy import URL
from sqlalchemy import create_engine


def lambda_handler(event, context):

    print(sys.path)
    print(pkg_resources.working_set.by_key['idna'])
    engine = create_engine(
        URL(account='account', user='user',
            database='"DATABASE"', warehouse='warehouse')
    )
    return {'success': 200}

What I would expect:
The dependencies are installed correctly, and lambda_handler imports them and executes properly on Lambda.

What actually happens:
Log from Lambda:

[ERROR] ContextualVersionConflict: (idna 3.1 (/var/runtime), Requirement.parse('idna<3,>=2.5'), {'requests', 'snowflake-connector-python'})
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 17, in lambda_handler
    database='"DATABASE"', warehouse='warehouse')
  File "/var/lang/lib/python3.7/site-packages/sqlalchemy/engine/__init__.py", line 520, in create_engine
    return strategy.create(*args, **kwargs)
  File "/var/lang/lib/python3.7/site-packages/sqlalchemy/engine/strategies.py", line 61, in create
    entrypoint = u._get_entrypoint()
  File "/var/lang/lib/python3.7/site-packages/sqlalchemy/engine/url.py", line 172, in _get_entrypoint
    cls = registry.load(name)
  File "/var/lang/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 275, in load
    return impl.load()
  File "/var/lang/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2461, in load
    self.require(*args, **kwargs)
  File "/var/lang/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2484, in require
    items = working_set.resolve(reqs, env, installer, extras=self.extras)
  File "/var/lang/lib/python3.7/site-packages/pkg_resources/__init__.py", line 792, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)

At runtime, snowflake-connector-python resolves a different version of its idna dependency than the one pip installed during the Docker build. It then fails because idna 3.1 does not satisfy its requirement idna<3,>=2.5.
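The failing check is an ordinary range test on the installed version. Below is a minimal sketch, assuming plain dotted numeric versions such as 2.10 and 3.1. Real code should rely on `packaging.version.Version` or `pkg_resources.parse_version`, which handle the full PEP 440 syntax. `parse_simple_version` and `satisfies` are hypothetical helpers written only for illustration:

```python
def parse_simple_version(version):
    """Turn '2.10' into (2, 10) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, lower, upper):
    """Return True if lower <= version < upper, e.g. the 'idna<3,>=2.5' range.

    Simplification (hypothetical helper): only purely numeric dotted
    versions are supported; pre-releases, local tags etc. are not.
    """
    v = parse_simple_version(version)
    return parse_simple_version(lower) <= v < parse_simple_version(upper)


# idna pinned by pip in the image vs. idna found in /var/runtime
print(satisfies("2.10", "2.5", "3"))  # True
print(satisfies("3.1", "2.5", "3"))   # False
```

Comparing tuples matters here. Compared as strings, "2.10" < "2.5", so a naive string comparison would wrongly reject the version pip installed.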

The logging I added to the Dockerfile and to lambda_function suggests why a different version is picked up.

Dockerfile: RUN python -c 'import sys; print(sys.path)'

---> Running in 5c925a9e311a
['', '/var/lang/lib/python37.zip', '/var/lang/lib/python3.7', '/var/lang/lib/python3.7/lib-dynload', '/var/lang/lib/python3.7/site-packages']

lambda_function.lambda_handler: print(sys.path)

['/var/task', '/opt/python/lib/python3.7/site-packages', '/opt/python', '/var/runtime', '/var/lang/lib/python37.zip', '/var/lang/lib/python3.7', '/var/lang/lib/python3.7/lib-dynload', '/var/lang/lib/python3.7/site-packages', '/opt/python/lib/python3.7/site-packages', '/opt/python']

Dockerfile: RUN pip3 freeze | grep idna
idna==2.10

lambda_function.lambda_handler: print(pkg_resources.working_set.by_key['idna'])
idna 3.1

What seems to happen is that the Lambda Runtime puts the /var/runtime directory in front of /var/lang/lib/python3.7 on sys.path and populates the pkg_resources.WorkingSet with the distributions installed there (mostly boto3 and its dependencies). This working set is carried over to the handler, which then runs against these "overridden" libraries. Judging by sys.path at the moment the handler executes, I presume it has been modified manually so that the user-provided libraries in /var/task, rather than the runtime path, come first. And why is /opt/python/lib/python3.7/site-packages second, given that it is not the directory pip installs into?
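The default pkg_resources working set is built by walking sys.path in order and keeping only the first distribution found for each project. The handler's sys.path shown above, with /var/runtime ahead of /var/lang/lib/python3.7/site-packages, is therefore enough to explain idna 3.1. Below is a minimal, stdlib-only sketch for listing every installed copy of a project in path order. `find_dist_info` is a hypothetical helper. As a simplifying assumption, it only recognises metadata directories named `<name>-<version>.dist-info` or `.egg-info`:

```python
import os
import re
import sys

# Matches metadata directories such as 'idna-2.10.dist-info' or
# 'idna-2.10-py3.7.egg-info' (simplified: other layouts are ignored).
_META_DIR = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)(-py[\d.]+)?\.(dist|egg)-info$"
)


def _normalize(name):
    # PEP 503-style normalisation so 'Foo_Bar' and 'foo-bar' compare equal
    return re.sub(r"[-_.]+", "-", name).lower()


def find_dist_info(project, path_entries=None):
    """List (path_entry, version) for every copy of `project`, in path order.

    The first item is the copy a first-match resolver, such as the default
    pkg_resources working set, would pick.
    """
    if path_entries is None:
        path_entries = sys.path
    wanted = _normalize(project)
    found = []
    seen = set()
    for entry in path_entries:
        # Skip repeated entries, '' (the current directory) and anything
        # that is not a directory, such as python37.zip.
        if entry in seen or not entry or not os.path.isdir(entry):
            continue
        seen.add(entry)
        for name in sorted(os.listdir(entry)):
            match = _META_DIR.match(name)
            if match and _normalize(match.group("name")) == wanted:
                found.append((entry, match.group("version")))
    return found


if __name__ == "__main__":
    for entry, version in find_dist_info("idna"):
        print(version, entry)
```

When it is called from inside the handler (that is, with the same sys.path), the first line printed should correspond to what `pkg_resources.working_set.by_key['idna']` reports.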

The outcome is really confusing. With Docker, I expect to be able to control my runtime and (at least) my dependencies. I definitely expect the Lambda Runtime to be transparent and its dependencies not to affect my workload, especially when the resulting failure is this opaque and undocumented.

I was able to work around the problem with the following hack:

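import pkg_resources
import sys

from snowflake.sqlalchemy import URL
from sqlalchemy import create_engine


def lambda_handler(event, context):

    # Site-packages directory that pip installed requirements.txt into
    entry = '/var/lang/lib/python3.7/site-packages'
    # Put it in front of /var/task, /opt/python and /var/runtime
    sys.path = [entry] + sys.path
    # Re-register its distributions in the existing working set, replacing
    # same-named ones registered earlier from /var/runtime (e.g. idna 3.1)
    for dist in pkg_resources.find_distributions(entry, True):
        pkg_resources.working_set.add(dist, entry, False, replace=True)

    # [...]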

This may be dangerous if the manually loaded libraries in turn conflict with downstream code in the runtime. I think, however, that something of this kind could be implemented in the runtime itself so that the handler uses only the environment's libraries.
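A related caveat is that Lambda reuses the execution environment between warm invocations, so module state such as sys.path persists. As a result, `sys.path = [entry] + sys.path` inside the handler adds another copy of `entry` on every call. Below is a minimal sketch of an idempotent alternative for the sys.path half of the hack. `promote_path_entry` is a hypothetical helper, and it does not touch the pkg_resources working set:

```python
import sys


def promote_path_entry(entry, path=None):
    """Move `entry` to the front of `path` (default: sys.path), in place.

    Any other copies of `entry` are removed, so calling this on every
    warm invocation leaves the list unchanged after the first call.
    """
    if path is None:
        path = sys.path
    # Slice assignment mutates the existing list object instead of
    # rebinding the name, so code that kept a reference to the original
    # list also sees the change.
    path[:] = [entry] + [p for p in path if p != entry]
    return path


# Example on a copy of a path list, not the real sys.path
path = ["/var/task", "/var/runtime", "/var/lang/lib/python3.7/site-packages"]
promote_path_entry("/var/lang/lib/python3.7/site-packages", path)
promote_path_entry("/var/lang/lib/python3.7/site-packages", path)
print(path)
# ['/var/lang/lib/python3.7/site-packages', '/var/task', '/var/runtime']
```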
