
Add utilities for running release tests #56


Merged
merged 30 commits into from
Jan 6, 2022


@angerson (Contributor) commented Dec 2, 2021

Now that tf-nightly is working, I'm working on getting the tests working just the same as they are in our internal CI. I know for sure that many test dependencies are missing, and first I'm getting jobs online so that I can see just how much is absent.
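A minimal sketch of one way to take stock of missing test dependencies before running any jobs: ask Python which top-level modules it can locate in the current environment. The module names in the example are hypothetical placeholders, not TensorFlow's actual test requirements.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on sys.path.

    For top-level names, find_spec() locates a module without importing it,
    so this is a cheap way to see what a test environment lacks.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list; a real check would read the container's requirements.
    wanted = ["numpy", "portpicker", "scipy"]
    for name in missing_modules(wanted):
        print(f"missing: {name}")
```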

@angerson angerson added the sig build dockerfiles Relating to the TF SIG Build Dockerfiles label Dec 2, 2021
@angerson angerson self-assigned this Dec 2, 2021
@angerson angerson added the build and push to gcr.io for staging Create a staging release on gcr.io label Dec 2, 2021
@github-actions github-actions bot removed the build and push to gcr.io for staging Create a staging release on gcr.io label Dec 2, 2021
@github-actions bot commented Dec 2, 2021

I pushed these containers:

  • gcr.io/tensorflow-sigs/build:56-python3.9
  • gcr.io/tensorflow-sigs/build:56-python3.8
  • gcr.io/tensorflow-sigs/build:56-python3.7

Re-apply the "build and push to gcr.io for staging" label to rebuild and push again. This comment will only be posted once.

@bhack (Contributor) commented Dec 2, 2021

I think this is useful and partially related to our old threads/PR:
https://discuss.tensorflow.org/t/run-python-tests-without-compiling-tf/1724
tensorflow/tensorflow#50163

Do you think we need to enable these tests against installed nightly wheels, because we still cannot rely on the cache to build TF in a GitHub Action and run the standard tests?

# drops all the bazel dependencies for each py_test; this makes all the tests
# use the wheel's TensorFlow installation instead of the one made available
# through bazel. This must be done in a different root directory, //bazel_pip/...,
# because "import tensorflow" run from the root directory would instead import
@bhack (Contributor) commented Dec 2, 2021


Do you think this would work in the same root directory?
https://github.com/tensorflow/tensorflow/pull/50163/files#diff-364cf915db9af8a15c691cddfdb7a5808e927715ada650b01911e74d1e2ba944R187

It's a bit old, so I don't remember whether it worked or not.
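The review excerpt above is cut off, but the same-root-directory question touches a general Python behavior: by default, the directory of the script being run (or the current directory for `python -c` and `python -m`) goes at the front of `sys.path`, ahead of site-packages, so a package directory there shadows an installed package with the same name. A toy sketch of that effect, using a hypothetical package name `fake_tensorflow` and temporary directories standing in for a source checkout and a wheel install; it is not how Bazel or TensorFlow's test setup actually works.

```python
import importlib
import os
import sys
import tempfile

PKG = "fake_tensorflow"  # hypothetical stand-in for the real `tensorflow` package


def _make_package(base: str, label: str) -> None:
    """Create base/fake_tensorflow/__init__.py that records where it lives."""
    pkg_dir = os.path.join(base, PKG)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(f"ORIGIN = {label!r}\n")


def which_copy_is_imported(run_from_source_root: bool) -> str:
    """Return which of two same-named packages `import` actually picks up."""
    with tempfile.TemporaryDirectory() as tmp:
        site_packages = os.path.join(tmp, "site-packages")  # simulated wheel install
        source_root = os.path.join(tmp, "source_root")      # simulated repo checkout
        _make_package(site_packages, "installed wheel")
        _make_package(source_root, "source tree")

        # Mimic Python's ordering: the script/current directory comes first.
        search_path = [site_packages]
        if run_from_source_root:
            search_path.insert(0, source_root)

        saved_path = list(sys.path)
        sys.path[:0] = search_path
        sys.modules.pop(PKG, None)  # force a fresh lookup
        importlib.invalidate_caches()
        try:
            return importlib.import_module(PKG).ORIGIN
        finally:
            sys.path[:] = saved_path
            sys.modules.pop(PKG, None)


if __name__ == "__main__":
    print(which_copy_is_imported(run_from_source_root=False))
    print(which_copy_is_imported(run_from_source_root=True))
```

In this toy setup, only the search-path order decides which copy wins; the package contents are irrelevant.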

Contributor


In the end, until cache reproducibility is verified, it would be really useful to find a way to run the in-source Python tests against pip-installed wheels. Do you think there is a quick workaround to support this mode as well?
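When running in-source Python tests against a pip-installed wheel, one cheap safeguard is to check at the start of the run that the imported package really comes from the installation and not from the checkout. A minimal sketch; the function name and the paths in the example are hypothetical.

```python
import os


def imported_from_source_tree(module_file: str, source_root: str) -> bool:
    """Return True if module_file lives under source_root.

    os.path.commonpath compares whole path components, so a sibling such as
    /src/tf2 is not mistaken for being inside /src/tf (a plain startswith()
    check would get that wrong).
    """
    module_path = os.path.realpath(module_file)
    root = os.path.realpath(source_root)
    return os.path.commonpath([module_path, root]) == root


if __name__ == "__main__":
    print(imported_from_source_tree("/src/tf/tensorflow/__init__.py", "/src/tf"))
```

In a harness, one might `import tensorflow as tf` and then assert `not imported_from_source_tree(tf.__file__, checkout_dir)`, where `checkout_dir` is the repository root.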

@angerson angerson added the build and push to gcr.io for staging Create a staging release on gcr.io label Dec 3, 2021
@github-actions github-actions bot removed the build and push to gcr.io for staging Create a staging release on gcr.io label Dec 3, 2021
@angerson angerson added the build and push to gcr.io for staging Create a staging release on gcr.io label Dec 3, 2021
@github-actions github-actions bot removed the build and push to gcr.io for staging Create a staging release on gcr.io label Dec 3, 2021
@angerson angerson added the build and push to gcr.io for staging Create a staging release on gcr.io label Dec 3, 2021
@angerson (Contributor, Author) commented Dec 3, 2021

@bhack FYI, I'm going to remove the exec-log option when this is merged because of bazelbuild/bazel#12510, which wasn't fixed until bazel 5 or 6 (bazelbuild/bazel@e58dd7e).

@bhack (Contributor) commented Dec 3, 2021

It's not a problem for me, since I can add this flag manually. But since you are using this in CI, we would lose the CI execution log that we could use to debug why we get so many cache misses on the nightly build with the same Docker image and commit.

Since the bug seems to affect only bazel test, can we remove the option just for that command?
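Keeping the execution log for builds and dropping it only for `bazel test` could be handled in a CI wrapper rather than in a bazelrc, since options on `build` lines in a bazelrc are also inherited by `test`. A minimal sketch, assuming the flag in question is Bazel's `--execution_log_json_file` (the discussion does not name it) and using a hypothetical log path and helper name.

```python
from typing import List, Sequence

# Assumption: the "exec-log option" is Bazel's --execution_log_json_file flag;
# the output path is a hypothetical placeholder.
EXEC_LOG_FLAG = "--execution_log_json_file=/tmp/bazel_exec_log.json"


def bazel_argv(command: str, args: Sequence[str]) -> List[str]:
    """Build a bazel command line that writes an execution log for `build` only."""
    argv = ["bazel", command]
    if command == "build":
        # Per the discussion, the bug appears to affect only `bazel test`,
        # so `test` invocations run without the execution log.
        argv.append(EXEC_LOG_FLAG)
    argv.extend(args)
    return argv


if __name__ == "__main__":
    print(bazel_argv("build", ["//tensorflow/tools/pip_package:build_pip_package"]))
    print(bazel_argv("test", ["//tensorflow/python/..."]))
```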

@angerson (Contributor, Author) commented Dec 3, 2021

Well... even if we did store the logs, it's not a high priority right now (until Q1 at least) to debug the cache misses, and I wouldn't use them. There is a lot of work I am doing first to replicate the rest of our internal CI, which doesn't use the cache very much.

@bhack (Contributor) commented Dec 3, 2021

OK, please comment out these lines with a reference to the bug instead of removing the whole block. We will review it in Q1.

Just a side note about priorities: since the roadmap is not public, we can only guess what the team is or isn't working on.

@bhack (Contributor) commented Dec 31, 2021

I don't know what the final scope of the log squasher is, but with Bazel's --test_output and --test_summary options you could probably get the test failures directly.
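Bazel spells these options `--test_output` and `--test_summary`: `--test_output=errors` prints logs only for failed tests, and `--test_summary=terse` lists only tests that did not pass. A minimal sketch of a wrapper that uses them; the helper names are hypothetical, and the summary-line format assumed by `failed_targets` is a simplification of Bazel's output.

```python
import shlex
from typing import Iterable, List, Sequence


def bazel_test_argv(targets: Sequence[str]) -> List[str]:
    """Build a `bazel test` command that surfaces failures directly."""
    return [
        "bazel", "test",
        "--test_output=errors",  # print logs only for failed tests
        "--test_summary=terse",  # summary lists only tests that did not pass
        "--",                    # everything after this is a target pattern
        *targets,
    ]


def failed_targets(summary_lines: Iterable[str]) -> List[str]:
    """Pick out targets reported as FAILED or TIMEOUT in a test summary.

    Assumes summary lines of the form "//pkg:target   FAILED in 3.1s".
    """
    bad = []
    for line in summary_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("//") and fields[1] in ("FAILED", "TIMEOUT"):
            bad.append(fields[0])
    return bad


if __name__ == "__main__":
    print(shlex.join(bazel_test_argv(["//tensorflow/python/..."])))
```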

@angerson angerson marked this pull request as ready for review January 6, 2022 21:47
@angerson angerson requested a review from perfinion as a code owner January 6, 2022 21:47
@angerson (Contributor, Author) commented Jan 6, 2022

I've made significant progress on this and am going to merge it. There is still some documentation work to do, and I expect to make more changes to some of the helper tools. Those are still on the way, but since these containers now generally work for my experimental internal CI jobs, I'll commit this change and continue the work in new PRs.

@angerson angerson merged commit 6089e13 into master Jan 6, 2022
@angerson angerson deleted the future branch January 6, 2022 21:57
sampathweb added a commit that referenced this pull request Jan 10, 2022
nitins17 added a commit that referenced this pull request Feb 16, 2022
Set up symlink for devtoolset-8

Combine Docker GCR presubmits and also push main to gcr

Commit missed files

Log in to GCR

Fix conditional, hopefully

Clarify

Add Python 3.10 support (#58)

Adds Python 3.10 support to the containers. Python 3.10 changes some library behavior and, for now, needs an alternative installation method to work.

Upgrade grpcio for fast build and clean up setup

Add utilities for running release tests (#56)

This adds the dependencies and notably bazelrc config options to run TensorFlow's Nightly and Release tests, which I've been working on replicating on internal CI. I still have documentation and migration work to do, but the major portion of the support work is here.

add gdb to the system packages

change to gcc 8.3.1 from centos7 for devtoolset8

fix libstdc++ symlink in devtoolset-8 environment

Undo ignoring other xml files

Update README

Deduplicate repeated messages

Squash long runfiles paths

Lock nvidia driver to 460

libtensorflow work

Fix libtensorflow script and start prelim check

Update Test Requirements to have same versions as  tf_sig_build_dockerfiles/devel.requirements.txt (#65)

* Add additional gitignore files

* Update requirements with same versions

Keep versions consistent with  tf_sig_build_dockerfiles/devel.requirements.txt

Cleanup

Fix Build issue from `python_include` (#67)

* Remove Python 3.10 pip special handling

* Link usr/include to usr/local/include

* Update location of python include

* Update setup.python.sh

Assorted changes -- see details

- Remove installation of nvidia-profiler, which depends on libcuda1,
which ultimately installs an nvidia driver package, which we don't want
because we're running in docker, in which the drivers are mounted. I
hope nvidia-profiler isn't necessary for anything important; otherwise
we'll need to synchronize driver versions between the containers and VM
images.
- Add less, colordiff and a newer version of clang-format
- Add code_check_changed_files, which is intended to replace the
"incremental" parts of ci_sanity. Still a work in progress because we
need to decide on valuable configurations (clang-format and pylint
cannot be run the same way as we have them configured internally and
currently have a lot of findings)
- Add code_check_full, which is intended to replace the "across entire
code base" parts of ci_sanity. I rewrote many of the clunkier tests.
Still a work in progress because we must verify that the changed tests
will still fail.
- Fix bad "bazel test " expansion for libtensorflow
- Fix bad chmod for libtensorflow repacker

Change libtensorflow config values to fix target selection

Fix a typo in venv installation

(Thanks to reedwm)

Remove extra lines

(Thanks again to reedwm)

Clarify ctrl-s warning

Correctly remove extra test filters

Make it possible to run isolated pip tests

More work on code checks

Fix a typo

Clean up code check full

Remove clang-format

Cleanup changed_files and move one to full

Add a missing test

Clean up and fix code_check_full

Update docs and create experimental RBE configs

Update docs and create experimental RBE configs

Update dependencies to 2.9.0.dev

Update Go API installation guide for TensorFlow 2.8.0 (#74)

Clarify usage of nightly commit

Fix mistaken 'test' command

Update docs and create experimental RBE configs

Update docs and create experimental RBE configs

Update dependencies to 2.9.0.dev

Update Go API installation guide for TensorFlow 2.8.0 (#74)

Clarify usage of nightly commit

Fix mistaken 'test' command

change to devtoolset-9 and gcc 9.3.1 for manylinux2014
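The "Assorted changes" commit in this message describes code_check_changed_files, which checks only the files touched by a change. The actual script is not shown here; the following is a minimal, hypothetical sketch of the file-selection step, assuming the base branch is `origin/master` and that only Python files are of interest.

```python
import subprocess
from typing import Iterable, List, Sequence


def changed_files(base_ref: str = "origin/master") -> List[str]:
    """List files changed on this branch since it diverged from base_ref.

    --diff-filter=d excludes deleted files, which cannot be linted.
    base_ref is a hypothetical default.
    """
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=d", f"{base_ref}...HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def select_for_check(paths: Iterable[str], suffixes: Sequence[str] = (".py",)) -> List[str]:
    """Keep only the paths a given checker (e.g. pylint) should look at."""
    return sorted(p for p in paths if p.endswith(tuple(suffixes)))


if __name__ == "__main__":
    for path in select_for_check(changed_files()):
        print(path)
```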
nitins17 added a commit that referenced this pull request Mar 2, 2022
nitins17 added a commit that referenced this pull request Mar 15, 2022
nitins17 added a commit that referenced this pull request Mar 15, 2022
Set up symlink for devtoolset-8

Combine Docker GCR presubmits and also push main to gcr

Commit missed files

Log in to GCR

Fix conditional, hopefully

Clarify

Add Python 3.10 support (#58)

Adds Python 3.10 support to the containers. Python 3.10 changes some library behavior and, for now, needs an alternative installation method to work.

Upgrade grpcio for fast build and clean up setup

Add utilities for running release tests (#56)

This adds the dependencies and notably bazelrc config options to run TensorFlow's Nightly and Release tests, which I've been working on replicating on internal CI. I still have documentation and migration work to do, but the major portion of the support work is here.

add gdb to the system packages

change to gcc 8.3.1 from centos7 for devtoolset8

fix libstdc++ symlink in devtoolset-8 environment

Undo ignoring other xml files

Update README

Deduplicate repeated messages

Squash long runfiles paths

Lock nvidia driver to 460

libtensorflow work

Fix libtensorflow script and start prelim check

Update Test Requirements to have same versions as  tf_sig_build_dockerfiles/devel.requirements.txt (#65)

* Add additional gitignore files

* Update requirements with same versions

Keep versions consistent with  tf_sig_build_dockerfiles/devel.requirements.txt

Cleanup

Fix Build issue from `python_include` (#67)

* Remove Python 3.10 pip special handling

* Link usr/include to usr/local/include

* Update location of python include

* Update setup.python.sh

Assorted changes -- see details

- Remove installation of nvidia-profiler, which depends on libcuda1,
which ultimately installs an nvidia driver package, which we don't want
because we're running in docker, in which the drivers are mounted. I
hope nvidia-profiler isn't necessary for anything important; otherwise
we'll need to synchronize driver versions between the containers and VM
images.
- Add less, colordiff and a newer version of clang-format
- Add code_check_changed_files, which is intended to replace the
"incremental" parts of ci_sanity. Still a work in progress because we
need to decide on valuable configurations (clang-format and pylint
cannot be run the same way as we have them configured internally and
currently have a lot of findings)
- Add code_check_full, which is intended to replace the "across entire
code base" parts of ci_sanity. I rewrote many of the clunkier tests.
Still a work in progress because we must verify that the changed tests
will still fail.
- Fix bad "bazel test " expansion for libtensorflow
- Fix bad chmod for libtensorflow repacker

Change libtensorflow config values to fix target selection

Fix a typo in venv installation

(Thanks to reedwm)

Remove extra lines

(Thanks again to reedwm)

Clarify ctrl-s warning

Correctly remove extra test filters

Make it possible to run isolated pip tests

More work on code checks

Fix a typo

Clean up code check full

Remove clang-format

Cleanup changed_files and move one to full

Add a missing test

Clean up and fix code_check_full

Update docs and create experimental RBE configs

Update docs and create experimental RBE configs

Update dependencies to 2.9.0.dev

Update Go API installation guide for TensorFlow 2.8.0 (#74)

Clarify usage of nightly commit

Fix mistaken 'test' command

Update docs and create experimental RBE configs

Update docs and create experimental RBE configs

Update dependencies to 2.9.0.dev

Update Go API installation guide for TensorFlow 2.8.0 (#74)

Clarify usage of nightly commit

Fix mistaken 'test' command

change to devtoolset-9 and gcc 9.3.1 for manylinux2014

change cachebuster value for ml2014 remote cache

change to new libstdcxx abi for devtoolset-9

change cachebuster value to use the new libstdcxx abi

link against nonshared44 in devtoolset-9

update the cachebuster value

change CACHEBUSTER value for gpu builds

remove redundant commands during build environment setup

change cachebuster variable name for gpu builds

store manylinux2014 cache in a different location

amend comment for accuracy
angerson pushed a commit that referenced this pull request Mar 21, 2022
Labels: "build and push to gcr.io for staging" (Create a staging release on gcr.io), "sig build dockerfiles" (Relating to the TF SIG Build Dockerfiles)

3 participants