Dockerize CI (1/N) #1225
Conversation
My suggestion from the previous discussion #1162 (comment) is to avoid a Dockerfile and a docker build step. All the deps can already be installed as part of the setup / run. This gives us the flexibility to choose whichever Docker image we want (manylinux for Release, Ubuntu 22.04 for CI).
Thanks @powderluv. I had a discussion with my team around these requirements (cc: @sanjoy, @asaadaldien) and we think a good definition of "what is the minimum we need to get this going" would help us. As noted in the description, this PR doesn't touch the "unification with release builds" work; that would require more work along these lines:
In our opinion, these are orthogonal feature requests and need not be tied to the scope of this PR (1/N). This PR simply preserves the existing CI and adds a docker wrapper to run things reproducibly both locally and in CI, which should suffice to enable the docker reproducers we need, and addresses @silvasean's request to make it load-bearing for CI.
While eliminating the Dockerfile may sound okay for CI / release builds, it does impact local flows and fast-turnaround requirements, because each local invocation becomes a fresh install at runtime instead of benefiting from Docker's layered caching. Maybe I'm missing details; could you elaborate on why you want to avoid a Dockerfile in the first place?
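To illustrate the caching point, here's a minimal sketch of the kind of layered Dockerfile I have in mind (the package list and file names are illustrative, not from this PR):

```Dockerfile
# Illustrative sketch only -- not the actual CI Dockerfile.
FROM ubuntu:22.04

# System deps change rarely, so they live in an early, cacheable layer.
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential cmake ninja-build git python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Copy only the requirements file first: the pip layer is rebuilt only
# when requirements change, not on every source edit.
COPY requirements.txt /tmp/requirements.txt
RUN python3 -m pip install --no-cache-dir -r /tmp/requirements.txt
```

With a `docker build` step, a source-only change reuses all of the layers above; with install-at-runtime, every fresh container repeats those installs unless caches are explicitly mounted in.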
@powderluv I think running docker in GHA isn't making things worse; in fact it makes them better, because it makes it easy for downstream users and contributors to reproduce the same build artifacts as GHA without running GHA.
@powderluv Can you please elaborate? I'm not sure I follow how we would choose between multiple Docker images without adding Dockerfiles.
Force-pushed from c56bc95 to cebca45.
Yeah, release has to use manylinux.
This is not required. We just call the CI commands in the script today.
I want to avoid us taking on a heavyweight docker release process (like IREE's). What are we building that we don't already have? Can we avoid the build stage and just use Docker images? Pip already caches your packages locally, and in your Docker container, when you are doing local runs.
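For reference, one way to reuse pip's cache inside a container without a `docker build` step is to bind-mount the host cache. A sketch, where the image, paths, and `run_ci_tests.sh` are illustrative placeholders:

```bash
# Sketch: reuse the host's pip cache across container runs.
# Image, paths, and run_ci_tests.sh are placeholders, not from this PR.
docker run --rm -it \
  -v "$HOME/.cache/pip:/root/.cache/pip" \
  -v "$PWD:/src" -w /src \
  python:3.11 \
  bash -c "pip install -r requirements.txt && ./run_ci_tests.sh"
```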
Oh, I hope I didn't say running docker makes it worse. I am saying it will help us all avoid yet another new flow if we did #1162 (comment), so a dev doesn't have to do:
The last two can be one entry point.
We already have a way for release builds, so it would look something like this (whatever we want the env vars to be):
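Something along these lines, where the env var names are placeholders invented for illustration:

```bash
# Hypothetical invocation -- env var names are placeholders, not final.
TM_DOCKER_IMAGE="quay.io/pypa/manylinux2014_x86_64" \
TM_PYTHON_VERSIONS="3.9 3.10" \
TM_TORCH_FROM_SOURCE=0 \
  ./build_tools/python_deploy/build_linux_packages.sh
```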
Probably easier to just explain it in code: #1234. It provides a single entry point for end users to replicate CI/Release. It allows the same workflow to run any Docker image you want -- building all CI and Release builds in one go. You can also pick the Python versions, and whether you want PyTorch in source or binary form. It only needs the actual CMake commands filled in. @sjain-stanford I didn't want to hijack your PR, so please take over any / all parts of #1234.
Thanks for explaining, @powderluv. To test my understanding, we want CI to run:
Correct me if I'm wrong, but we wouldn't need to keep (1) and (2) split anymore, as the docker flow would eliminate the need for (1). CI is transparent to developers, and they only need to care about:
I am hesitant to combine the developer flow with the release build flow because they address different requirements:
The docker flow supports both native CMake flows (OOT, in-tree). It just runs them in a container rather than natively, which is more robust and guarantees a clean CI run if the local run works, so devs don't need to run CI separately. So with this PR we'd be reducing the number of dev workflows from 3 to 2:
Having learned from the libtorch changes, I won't change current developer behavior. So it becomes:
The second and third use the same scripts. We can try to bridge the gap between 1 and 2 if 2 is slower (with caching etc.).
I'm surely missing some details here. Could you please elaborate on what aspect of libtorch requires us to use a "separate" lightweight developer flow that doesn't run inside docker? Is it the PyTorch binary support? Would it help if we supported installing PyTorch from source inside docker?
Here, the first command would launch a container interactively, for the next three commands to be run from inside it. So users get the chance to make incremental fixes and re-test without launching another container.
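A sketch of that shape (the image name and build commands are illustrative):

```bash
# 1) Launch the CI container once, interactively (image illustrative).
docker run -it --rm -v "$PWD:/src" -w /src ubuntu:22.04 bash

# 2)-4) Then iterate inside the same container without relaunching it:
cmake -GNinja -Bbuild .
ninja -Cbuild
ninja -Cbuild check-all   # re-run after each incremental fix
```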
I think I better understand why you'd want to keep the native developer workflow alive in conjunction with the docker workflow - e.g. for macOS (which doesn't natively support Docker in GHA). However, I'm missing the point on how this is adding "one other workflow users need to worry about". Currently, any CMake change users make would require changing (1) the local CMake configuration for native builds, (2) the GHA CMake config for CI, and (3) the release script's setup.py-based CMake config. With this change we still have the same number of flows, right? The only thing this PR changes is (2): instead of blindly updating the raw GHA workflows (which are difficult to validate locally and require pushing), users can now update the dockerized build script and validate it locally before pushing.

Regarding unifying this docker workflow with the release docker workflow: that seems like a good incremental next step, but it isn't required for this PR to be considered.
I'm also happy to close this one if @powderluv you'd like to continue bringing #1234 to completion (including the CMake config, ccache, pip cache, running unit and integration tests, etc.). Anything that helps us validate the CMake CI locally seems like a net improvement, and we'd benefit from it. It doesn't have to be this way, though: we (Cruise) are mainly concerned with the bazel builds, so I can send out a separate PR to dockerize just the bazel GHA build and extend it to include unit and integration tests, which should cover all the bases for us.
Force-pushed from cebca45 to ce26326.
OK, will take over to complete #1234. Will get to it later this week.
Thank you @powderluv. This is very much appreciated. Looking forward to it.
This is the first step towards dockerizing CI.
Fixes #1162
PR includes:
PR does not include (can be follow-on):
- build_tools/python_deploy/build_linux_packages.sh
Local repro steps:
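As a sketch of the shape of those steps (the wrapper script name below is a placeholder, not the actual file this PR adds):

```bash
# Placeholder names -- see the PR diff for the actual script and flags.
cd torch-mlir
./build_tools/docker/run_ci_locally.sh   # builds and runs tests inside the CI container
```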
This should provide much-needed relief for issues with reproducing CI failures locally, and avoid environment discrepancies.