Support container-based (e.g. Docker-based) local development workflows #3356


Closed · nottrobin opened this issue Dec 5, 2018 · 14 comments
Labels: Category: Docker (Issue affects docker builds.) · rotten · Type: Discussion (This issue is open for discussion.)

@nottrobin commented Dec 5, 2018

I touched on this topic in another issue a while ago, but I don't think I articulated it very well, so I'll try to lay out our actual problem here:

We do local development on our projects (e.g. www.ubuntu.com) effectively like this (Django example, illustrative only):

docker run \
    --volume `pwd`:/srv  \
    --workdir /srv  \
    --publish 8000:8000  \
    python:3  \
    bash -c "pip install -r requirements.txt && ./manage.py runserver"

(In practice, we have a bash script at ./run in each project, which spins up the appropriate Docker containers for you, with some extra settings)
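A minimal sketch of such a wrapper, purely for illustration (the image, port and command are examples, not our real script):

#!/bin/bash
# Illustrative ./run wrapper - the real scripts add option parsing and extra settings
set -euo pipefail

docker run \
    --rm --interactive --tty \
    --volume "$(pwd)":/srv \
    --workdir /srv \
    --publish 8000:8000 \
    python:3 \
    bash -c "pip install -r requirements.txt && ./manage.py runserver 0.0.0.0:8000"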

The point is that rather than using filesystem and $PATH tricks to mock an encapsulated environment for our Python dependencies (as Python virtual environments do), we use containers - in this case, Docker containers. Python virtualenvs can often have unexpected effects on the wider system (nothing stops Python packages writing to, or changing permissions in, your home directory - and many do) and can break because of quirks in the wider system (like specific C libraries not being installed or set up correctly). Containers offer significantly more reliable encapsulation.

This gives us confidence that the project will run the same for each of our devs, on both Linux and macOS, with very little knowledge of the project or help from other devs. The single dependency our devs need installed is Docker. We've been doing it this way for about 4 years and it works very well.

I love what Pipenv is doing - the new Pipfile format is nice, but the lockfiles are the killer feature. The problem is that, as far as I understand the current implementation, Pipenv can't be used for development without also using the virtual environments themselves, because it generates the lockfile hashes from the virtual environment.

The problem is that with our use of containers for encapsulation, we run everything as root inside the container - installing everything at the system level. This prevents Pipenv being able to generate a lockfile, so we can't use Pipenv with this workflow.

(We could, of course, configure our containers to have a whole encapsulated user environment and still use Python virtualenvs inside the container, but it's a whole load of extra weight and complexity to our containers that we don't need.)

Is there any way to use Pipenv with our workflow? Or perhaps, that Pipenv might move towards supporting our workflow? E.g. by decoupling the Pipfile and Pipfile.lock format support from the virtual environment management.

@frostming (Contributor) commented Dec 6, 2018

Doesn't pipenv install --deploy --system work in that case?

@vovimayhem commented Dec 6, 2018

@frostming I tried that back in June, but it didn't work as I wanted - see #2292

@frostming (Contributor) commented Dec 6, 2018

@vovimayhem The recommended way is to generate a proper Pipfile and lock it outside the container, copy the two files into the Docker container, and then run pipenv install --deploy --system. A Docker container is considered more of a deployment machine than a development machine.
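A minimal sketch of that recommended flow (the base image and final command are illustrative):

# Pipfile and Pipfile.lock were generated outside the container, e.g. on the host
FROM python:3
RUN pip install pipenv
WORKDIR /srv
COPY Pipfile Pipfile.lock ./
# Install exactly the locked versions into the container's system Python
RUN pipenv install --deploy --system
COPY . .
CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]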

Of course, if you think the feature is necessary, you can submit a PEEP to the peeps directory via a PR.

@nottrobin (Author) commented Dec 6, 2018

@frostming that workflow definitely doesn't work for me, as the whole point of the container-based encapsulation is that the container can provide the exact required version of Python, Pip and Pipenv for all developers.

A Docker container is considered more of a deployment machine than a development machine.

"Considered" by whom? Docker is used for local development by tons of people. In fact, Docker was used exclusively for local development for about the first 2 years of its life while people were extremely sceptical about using it for production.

Container-based encapsulation for local development is simply vastly superior to Python virtual environments. If there's a container tool other than Docker that you think is "considered as a development machine", then let me know. But even if we used lxd, we'd still want to install dependencies at the system level when we're inside a container.

Not to mention that there are massive benefits to using the same tooling in development as in production ("dev-prod parity"). If you're using Docker for production (e.g. Kubernetes), it's pretty convenient to use it in local development, get the increased encapsulation benefits, and increase the chance of uncovering production issues.

@techalchemy (Member)

Technically it's still a security issue not to use a virtual environment in Docker containers - I'm not really sure what benefits you get by avoiding virtual environments in Docker, but any package that could modify things inside a virtualenv is a lot more likely to wreak havoc and behave unpredictably outside one.

In fact, there is a lot that should make you want to use virtualenvs inside Docker containers rather than using the system Python install (or any system-wide install) directly, which is definitely going to be a broader security risk and significantly more unpredictable than the alternatives. This is how I personally use it, and if you need additional resources it's possible that @tiran can point you at some.

FYI, setting $PATH isn't really a bad thing or a trick; I'm not sure why you're acting like this is a negative.

Also, since you're using Docker, C dependencies aren't an issue because you can just declare them to be installed, exactly as you would without a virtualenv.

Ultimately, dependency resolution needs to happen in an isolated environment. You can give it access to the system site-packages with --site-packages, but you need a virtualenv to avoid cross-contaminating resolution and cross-contaminating your system during the resolution process.
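For illustration, assuming pipenv's global --site-packages option (which creates the project virtualenv with access to the system site-packages):

# The virtualenv can see system-wide packages, but resolution still happens in isolation
pipenv --site-packages install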

You have total, fine-grained control over the environment. You should probably accept that it may well be possible to use a virtualenv in the container.

@nottrobin (Author) commented Dec 6, 2018

I’m not really sure what benefits you get by avoiding virtual environments in docker

The benefit is that the whole system is significantly simpler. The container's sole job is to encapsulate and run dependencies for that specific project. That is much easier to both understand and work on if everything inside the container is system-level. Otherwise you have your own user running a Docker container, which is in turn running a different user account, which is in turn creating an encapsulated virtual Python environment.

which is definitely going to be ... significantly more unpredictable than the alternatives

I suspect this perspective is partly a result of Stockholm syndrome: you've only ever worked with virtual environments, and so you assume they're the simplest thing. In fact, I would argue that they are empirically more complex than not using virtual environments - the number of steps you have to understand to properly comprehend what's going on when working inside a virtual environment is clearly greater.

All this is kind of beside the point. I don't need to justify our workflow, it is the one we use at pretty significant scale, it works very well, and it is a pattern practised throughout the industry. We're certainly not going to change it because you don't happen to think it's a good one.

The only important question is: will the Pipenv maintainers try to help me use it within our workflow, or will they double down on their reputation for being dismissive of workflows that aren't the ones practised by the core devs? I admit, I was sceptical when I filed this issue.

@bittner (Contributor) commented Dec 7, 2018

The benefit is that the whole system is significantly simpler.

Hasn't anyone using a virtual environment in a Docker image had the problem of having to activate the venv to run the Python application? I consider this a major source of developer confusion.

Also, of course, needing no virtualenv means you only install the software you need to run your application. Why have unnecessary overhead? The deployed system should really be as simple as possible, both to make it easier to understand and for security.

On the other hand, I think virtual environments have their place. It's faster to develop with them than to develop using Docker (all the time). A developer should have the freedom to use the tools of her/his choice for development -- until it goes towards production (which may be from commit 1, via your CI/CD pipeline). It shouldn't be an either-or story for development. For deployment, however, we should use the simplest possible form, to make it easy to understand the application and the potential side-effects that show up as unwanted features and bugs.

@techalchemy (Member)

So pipenv completely obviates the need to try and find your virtualenv in Docker, since you can just use pipenv run. The workflow stays the same, but you get isolation and safety from touching system dependencies. At the end of the day, the concern is that installing and upgrading system packages will break your system.

Your Docker image can easily contain:

# Base image is illustrative; any Debian/Ubuntu image with apt works here
FROM ubuntu:18.04
RUN apt-get update && apt-get -y -q install python python-pip
RUN pip install pipenv
# Create and switch to an unprivileged user instead of running as root
RUN useradd useracct
ADD . /app
WORKDIR /app
RUN chown -R useracct:useracct /app
USER useracct
RUN pipenv install --dev 2>&1
RUN pipenv run <some command or tests>
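Building and exercising that image would then look something like this (the tag and test command are placeholders):

docker build --tag myapp-dev .
docker run --rm myapp-dev pipenv run python -m pytest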

@nottrobin if you are concerned about being dismissed, throwing around accusations of ‘Stockholm syndrome’ and accusing us of only having ever used virtual environments is a pretty good effort to that end. Linking to a reddit thread that started a massive flame war is probably the best effort. It seems like you are not making a good faith effort here, and I don’t really want to engage with you in that case. Please read the code of conduct and come back if you can be productive. Until then, I have to conclude you are attempting to incite some kind of incident. Sorry. Not interested. Have a nice day.

@nottrobin (Author) commented Dec 8, 2018

@techalchemy I am not trying to start anything, I filed the issue in good faith and tried to lay out my points as clearly as I possibly could, trying very hard to avoid prejudice or unreasonable demands. All I want is to know what the best path looks like to using Pipenv with our workflow.

I feel that what I was immediately met with was a suggestion that the workflow we use is fundamentally flawed and should be changed. I do accept that I was probably somewhat sensitive to that possibility, because of the history - both in community opinion and, before that, in my own previous experience - so I may have got more defensive than I should have.

I really do like the mission of Pipenv. requirements.txt obviously has many problems, and when I first read about and looked into the Pipenv project I was really excited - so many great ideas from the latest packaging tools. It was only through trying to use it, and then trying to ask for help with using it, that my opinion soured.

I want our projects to follow standards, and PyPA have enshrined Pipenv as the new standard packaging tool, so I would absolutely love nothing more than to be able to make use of it, and to help improve it.

But I have no idea what approach to trying to do that could possibly work here. It seems I need to find a way to convince you that there's some value in the workflow we use before you will consider it worth trying to help me with it. I've tried to do that from about 5 different angles.

I assure you that if you move a little beyond a pure Python world, containers are in use for encapsulation everywhere - the success of Docker and Kubernetes is just one example. The benefits for local development are really huge, and I truly believe this is the inexorable direction in which all projects will be developed in the future. Python, like everything else, will need to fit in with this movement.

It just seems to me a clear, self-evident fact that when using a container purely to encapsulate the dependencies for developing a project, the extra weight of setting up a user account inside that container (and then running a virtual environment inside that user account) is not only completely unnecessary but also makes things significantly more difficult (as is apparently clear to the only independent voice in this thread, @bittner, who, I promise you, I've never had any prior interaction with).

To illustrate that point, and move from the general to the specific for a second: your example above, where you set up a user account in your Dockerfile, doesn't work well in our local development flow, because we share the project directory from the host system into the development container (with --volume $(pwd):/srv). This means that any files touched by the running application must have permissions for the user account inside the container. When we run our container, we do so with --user $(id -u):$(id -g), which works well, but won't work if a specific user inside the container needs to be running Pipenv. This is just one example of how things are much simpler to work with if the container itself has all the project's dependencies installed at the system level.
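Concretely, our invocation looks something like this (our-project-image stands in for a project image with its dependencies already installed at the system level):

docker run \
    --rm \
    --volume $(pwd):/srv \
    --workdir /srv \
    --publish 8000:8000 \
    --user $(id -u):$(id -g) \
    our-project-image \
    ./manage.py runserver 0.0.0.0:8000

Any files the server writes into /srv are then owned by the host user, with no user account needed inside the container.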

But you really don't have to buy into everything I'm saying to help me out. All you have to do is accept that there might be some legitimate cases where one might want to actually work on projects with Python dependencies installed at the system level. All I need is any way to use Pipfile and Pipfile.lock without a virtual environment.

@bittner (Contributor) commented Dec 8, 2018

pipenv run foo sounds like a good idea (thanks for the hint!), but it's slow. That is very unfortunate, and in production I want to avoid it. This is not a theoretical claim.

We're helping customers move their projects onto modern cloud platforms. The Docker/Kubernetes/OpenShift stack, while convenient, introduces additional overhead. We need to cut down build time and startup time (and reduce stability and security risks) by making the right technical decisions. An additional virtualization layer plainly doesn't add a performance gain; the same goes for stability and security, as I noted earlier. I know these are small pieces, but small pieces add up to bigger ones.

I really want everyone in the Python world to be constructive. And the constructive conclusion of this issue is that we need pipenv (or pip directly, see pypa/pip#2488 (comment)) to install requirements from a Pipfile with or without a virtual environment.

pipenv really shouldn't force us to use a virtual environment if we have valid technical reasons not to take that route. Thinking about it, maybe pip install --pipfile is actually more obvious as a solution. But that depends on whether Pipfile really becomes an official Python standard. (I don't think there is a PEP for that, is there one?)

@techalchemy (Member)

Pipenv doesn't force you to use virtual environments unless you are generating a lockfile. pipenv install has a few options that combine with --system to install directly into the container's system Python.
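For example, installing exactly what an existing Pipfile.lock specifies into the system Python:

# --system: skip the virtualenv; --deploy: fail if the lockfile is out of date;
# --ignore-pipfile: install from Pipfile.lock rather than re-resolving the Pipfile
pipenv install --system --deploy --ignore-pipfile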

OP is taking issue with the fact that it's not currently possible to regenerate a lockfile outside of an isolated Python environment and without performing path manipulations and such. That's a technical problem with dependency resolution in Python which has nothing to do with respecting or dismissing a workflow. If you want to create a new lockfile where we resolve your dependencies, we need isolation, not whatever your Linux distro package maintainers decided to include or unbundle. I'd welcome a PR that alleviates this need and still resolves dependencies consistently, but I'm pretty sure it's not possible.

Anyhow, that's the primary issue here: we aren't dismissive of any workflow, just of people who try to manipulate us into spending our free time and effort engaged in internet fights. This is a technical constraint. If you don't like it, please have your company donate some developer time to the project and I will review the PR that fixes it.

@nottrobin (Author) commented Dec 8, 2018

it's not currently possible to regenerate a lockfile outside of an isolated Python environment and without performing path manipulations and such. That's a technical problem with dependency resolution in Python which has nothing to do with respecting or dismissing a workflow.

There is nothing about Python core that means a lockfile couldn't be generated from system dependencies - the dependencies are in exactly the same format in the system as they are in a virtual environment.

My current understanding is that Pipenv deliberately enforces generating lockfiles only from virtual environments because whoever implemented that functionality believed it was a good way to ensure the exact installed versions and hashes in the lockfile are repeatable every time: if you start from a clean virtualenv and pip install from the same Pipfile at the same point in history, you get the exact same lockfile.

This is the discussion I'd really like to get into.

Where a development flow relies on starting from the same container image every time, we can achieve the same reliability when generating the lockfile from system dependencies. So perhaps that requirement could be loosened, letting the developer choose whether it's valid to generate their lockfile from system deps? Or perhaps it's not actually very important that the lockfile be so reproducible - perhaps all that matters is that it's an official snapshot of the dependencies the project was tested with.
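To illustrate the reproducibility point: because the image itself is a clean, fixed starting state, the lock step can already run in a throwaway container today (image illustrative; pipenv still builds its own internal virtualenv to resolve in, which is exactly the requirement I'd like to see loosened):

docker run \
    --rm \
    --volume $(pwd):/srv \
    --workdir /srv \
    python:3 \
    bash -c "pip install pipenv && pipenv lock"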

If you don’t like it, please have your company donate some developer time to the project and I will review the PR that fixes it.

This is my company (Canonical) donating developer time, in that I've spent a fair bit of time having this discussion both on company time and on my own time.

The last time I tried to look into this problem (also on time "donated by my company") I earnestly tried to delve into the code to fix the errors I was getting when generating the lockfile from the system dependencies (with the hope of filing a PR), but as I looked into it, it became clear that this was a fundamental architectural decision of Pipenv - changing it would be a fundamental change and a huge PR. I don't think I can submit that sort of PR without first coming to an agreement about the change with the project maintainers.

During both personal and professional time, I work on the things that it makes sense to me to work on to best solve the problems I'm having, and I submit PRs to projects either explicitly to fix my problems or just to help out because I like the project, again both on company and personal time. For instance, I plan to spend some time on Monday seeing if I can file a PR to fix python-poetry/poetry#712 (which will, I suppose, be in a sense "donated by my company"). Or maybe I'll do it now.

@bittner (Contributor) commented Dec 8, 2018

Alright, then that topic seems to have been discussed. It's now official that a PR is welcome.

please have your company donate some developer time to the project

This has become an overused (and somewhat arrogant) argument. It is obvious from this discussion that no-one is asking "the PyPA" to implement a change. We're all part of the Python community and spend either our private or our work time (or both, as I do) here. That deserves mutual respect.

@oz123 added the Category: Docker, Type: Discussion and rotten labels on Jan 23, 2022
@matteius (Member)

I think we can address any additional concerns here with new issue reports or other existing items in the backlog, but a lot has changed since the last comments in 2018 -- we continue to make improvements to pipenv, and there are a number of tickets relating to the --system flag. I want to make pipenv work better for people using the system flag in general, and we are getting closer to Hacktoberfest.
