Skip to content

Enable AWSSDK on Linux by statically linking OpenSSL and cURL #421

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 15 commits into from

Conversation

ejguan
Copy link
Contributor

@ejguan ejguan commented May 18, 2022

This PR adds all supports to enable awssdk for linux releases

  • The linux release workflow starts to use pytorch/manylinux-cpu docker to align with compiler version with PyTorch Core
  • Add shell script to install OpenSSL and cURL on docker image (This step can be improved if TorchData wants to own a new docker image with these two libraries pre-built)
  • Update torchdata/csrc/CMakeLists.txt to respect the static OpenSSL and cURL
  • Correct thrid_party/CMakeLists.txt without adding redundant dependencies for _torchdata
  • Add auditwheel show to validate the binaries are manylinux_2_17 (alias of manylinux2014)
  • Update Readme for release
  • Make our testing CI running with AWSSDK enabled by default to save more time/money to run CI tests

See the successful workflow for night release: https://github.com/pytorch/data/actions/runs/2391843245
See the manylinux version: https://github.com/pytorch/data/runs/6612861049?check_suite_focus=true#step:9:67

manylinux_2_17_x86_64 is the alias of manylinux_2014

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 18, 2022
@ejguan ejguan force-pushed the py_manylinux branch 24 times, most recently from ef8968f to 0453f12 Compare May 23, 2022 17:08
@ejguan ejguan changed the title Change to manylinux1 based on last release Statically linking OpenSSL and CURL for Linux Release May 23, 2022
@ejguan ejguan force-pushed the py_manylinux branch 3 times, most recently from ae88807 to 21abff4 Compare May 24, 2022 19:14
@ejguan ejguan changed the title Statically linking OpenSSL and CURL for Linux Release Enable AWSSDK on Linux by statically linking OpenSSL and cURL May 24, 2022
@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ejguan ejguan marked this pull request as ready for review May 26, 2022 17:28
Copy link
Contributor

@NivekT NivekT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for making all these changes!

Can you elaborate on "Make our testing CI running with AWSSDK enabled by default to save more time/money to run CI tests"? How does that work?

@@ -65,7 +62,7 @@ jobs:
run: |
python setup.py install
env:
BUILD_S3: ${{ matrix.with-s3 }}
BUILD_S3: 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any situation where we do not enable BUILD_S3?

It looks like only when the setup is Windows with conda, why is that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For binary releases, it's just because there are too many Error that Idk how to solve with AWS enabled for conda on windows.

For CI testing, you can treat it as wheels, we could enable testing with awssdk.

Generally, the reason I think it makes sense to always enable AWSSDK in our testing CI is the >95% tests are the same with or without AWSSDK enabled.

@ejguan
Copy link
Contributor Author

ejguan commented May 26, 2022

Make our testing CI running with AWSSDK enabled by default to save more time/money to run CI tests"?

By simply removing one test case of BUILD_S3=0.

@ejguan ejguan mentioned this pull request May 31, 2022
ejguan added a commit to ejguan/data that referenced this pull request May 31, 2022
…h#421)

Summary:
This PR adds all supports to enable awssdk for linux releases
- The linux release workflow starts to use `pytorch/manylinux-cpu` docker to align with compiler version with PyTorch Core
- Add shell script to install OpenSSL and cURL on docker image (This step can be improved if TorchData wants to own a new docker image with these two libraries pre-built)
- Update `torchdata/csrc/CMakeLists.txt` to respect the static OpenSSL and cURL
- Correct `thrid_party/CMakeLists.txt` without adding redundant dependencies for `_torchdata`
- Add `auditwheel show` to validate the binaries are `manylinux_2_17` (alias of `manylinux2014`)
- Update Readme for release
- Make our testing CI running with AWSSDK enabled by default to save more time/money to run CI tests
- Remove redundant `numpy` dependency introduced by `tf_record`. And, fix tests in this file

See the workflow for night release: https://github.com/pytorch/data/actions/runs/2391843245
See the `manylinux` version: https://github.com/pytorch/data/runs/6612861049?check_suite_focus=true#step:9:67

`manylinux_2_17_x86_64` is the alias of `manylinux_2014`

Pull Request resolved: pytorch#421

Reviewed By: NivekT

Differential Revision: D36641092

Pulled By: ejguan

fbshipit-source-id: 349bfd896ee0db01eea849580984f4000ca2bc3f
ejguan added a commit that referenced this pull request May 31, 2022
Summary:
This PR adds all supports to enable awssdk for linux releases
- The linux release workflow starts to use `pytorch/manylinux-cpu` docker to align with compiler version with PyTorch Core
- Add shell script to install OpenSSL and cURL on docker image (This step can be improved if TorchData wants to own a new docker image with these two libraries pre-built)
- Update `torchdata/csrc/CMakeLists.txt` to respect the static OpenSSL and cURL
- Correct `thrid_party/CMakeLists.txt` without adding redundant dependencies for `_torchdata`
- Add `auditwheel show` to validate the binaries are `manylinux_2_17` (alias of `manylinux2014`)
- Update Readme for release
- Make our testing CI running with AWSSDK enabled by default to save more time/money to run CI tests
- Remove redundant `numpy` dependency introduced by `tf_record`. And, fix tests in this file

See the workflow for night release: https://github.com/pytorch/data/actions/runs/2391843245
See the `manylinux` version: https://github.com/pytorch/data/runs/6612861049?check_suite_focus=true#step:9:67

`manylinux_2_17_x86_64` is the alias of `manylinux_2014`

Pull Request resolved: #421

Reviewed By: NivekT

Differential Revision: D36641092

Pulled By: ejguan

fbshipit-source-id: 349bfd896ee0db01eea849580984f4000ca2bc3f
bushshrub pushed a commit to bushshrub/data that referenced this pull request Jun 2, 2022
…h#421)

Summary:
This PR adds all supports to enable awssdk for linux releases
- The linux release workflow starts to use `pytorch/manylinux-cpu` docker to align with compiler version with PyTorch Core
- Add shell script to install OpenSSL and cURL on docker image (This step can be improved if TorchData wants to own a new docker image with these two libraries pre-built)
- Update `torchdata/csrc/CMakeLists.txt` to respect the static OpenSSL and cURL
- Correct `thrid_party/CMakeLists.txt` without adding redundant dependencies for `_torchdata`
- Add `auditwheel show` to validate the binaries are `manylinux_2_17` (alias of `manylinux2014`)
- Update Readme for release
- Make our testing CI running with AWSSDK enabled by default to save more time/money to run CI tests
- Remove redundant `numpy` dependency introduced by `tf_record`. And, fix tests in this file

See the workflow for night release: https://github.com/pytorch/data/actions/runs/2391843245
See the `manylinux` version: https://github.com/pytorch/data/runs/6612861049?check_suite_focus=true#step:9:67

`manylinux_2_17_x86_64` is the alias of `manylinux_2014`

Pull Request resolved: pytorch#421

Reviewed By: NivekT

Differential Revision: D36641092

Pulled By: ejguan

fbshipit-source-id: 349bfd896ee0db01eea849580984f4000ca2bc3f
@andrewkho andrewkho deleted the py_manylinux branch September 26, 2024 04:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants