Add utilities for running release tests #56
I pushed these containers:
I think this is useful and partially related to our old threads/PR. Do you think we need to run these tests against installed nightly wheels because we still cannot rely on the cache to build TF in a GitHub Action and run the standard tests?
# drops all the bazel dependencies for each py_test; this makes all the tests
# use the wheel's TensorFlow installation instead of the one made available
# through bazel. This must be done in a different root directory, //bazel_pip/...,
# because "import tensorflow" run from the root directory would instead import
Do you think that this will work in the same root dir?
https://github.com/tensorflow/tensorflow/pull/50163/files#diff-364cf915db9af8a15c691cddfdb7a5808e927715ada650b01911e74d1e2ba944R187
It is a little bit old, so I don't remember whether it worked or not.
In the end, until the cache reproducibility is verified, it would be really useful if we could find a way to run the in-source Python tests against pip-installed wheels. Do you think there is a quick workaround to support this mode as well?
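To illustrate the concern in the code comment quoted above, here is a minimal sketch with hypothetical names; it is not the project's configuration. Python resolves `import` against `sys.path` in order, and the script's directory or the current directory normally comes first, so a source checkout that contains a `tensorflow/` directory would shadow the installed wheel. The stand-in package `fakepkg_demo` and the directory names `source_root`, `site_packages`, and `bazel_pip` are assumptions made up for the demonstration.

```python
import importlib
import importlib.util
import os
import sys
import tempfile

PKG = "fakepkg_demo"  # hypothetical stand-in for "tensorflow"


def make_package(parent):
    """Create an empty importable package named PKG under `parent`."""
    pkg_dir = os.path.join(parent, PKG)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("")
    return parent


def resolved_from(search_path):
    """Return the name of the directory that `import PKG` would load from."""
    saved = sys.path[:]
    sys.path[:] = search_path
    try:
        importlib.invalidate_caches()
        spec = importlib.util.find_spec(PKG)
        # origin is <parent>/<PKG>/__init__.py; report the parent's name.
        return os.path.basename(os.path.dirname(os.path.dirname(spec.origin)))
    finally:
        sys.path[:] = saved


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # A source checkout and an installed wheel both provide PKG.
        source_root = make_package(os.path.join(tmp, "source_root"))
        site_packages = make_package(os.path.join(tmp, "site_packages"))
        # A separate root that does not contain PKG at all.
        other_root = os.path.join(tmp, "bazel_pip")
        os.makedirs(other_root)

        # Running from the source root: the checkout shadows the wheel.
        from_source_root = resolved_from([source_root, site_packages])
        # Running from a different root: the wheel is found.
        from_other_root = resolved_from([other_root, site_packages])
        return from_source_root, from_other_root


if __name__ == "__main__":
    print(demo())
```

Under these assumptions, the first lookup resolves to the checkout and the second to the installed copy, which is the behavior the comment cites as the reason for a separate root directory.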
@bhack FYI, I'm going to remove the exec-log option when this is merged because of bazelbuild/bazel#12510, which wasn't fixed until bazel 5 or 6 (bazelbuild/bazel@e58dd7e).
It is not a problem for me, as I could add this flag manually, but since you are using this in CI we could lose the CI execution log needed to debug why we get all these cache misses with the nightly build with the same Docker image and commit. As it seems that the bug is only with
Well... even if we did store the logs, it's not a high priority right now (until Q1 at least) to debug the cache misses, and I wouldn't use them. There is a lot of work I am doing first to replicate the rest of our internal CI, which doesn't use the cache very much.
OK, please comment out these lines with a reference to the bug, without removing the whole block. We will review it in Q1. Just a side note about the priorities:
I don't know what the final scope of the log squasher is, but probably with Bazel
I've made significant progress on this, and am going to merge it; there is still some work that needs to be done on documentation, and I expect I'll be making more changes to some of the helper tools. Those are still on the way, but since these containers are generally working now for my experimental internal CI jobs, I'll commit the change and continue work on new PRs.
This reverts commit 6089e13.
Set up symlink for devtoolset-8
Combine Docker GCR presubmits and also push main to gcr
Commit missed files
Log in to GCR
Fix conditional, hopefully
Clarify
Add Python 3.10 support (#58)
Adds Python 3.10 support to the containers. Python 3.10 changes some library behavior and, for now, needs an alternative installation method to work.
Upgrade gcrpio for fast build and cleanup setup
Add utilities for running release tests (#56)
This adds the dependencies and notably bazelrc config options to run TensorFlow's Nightly and Release tests, which I've been working on replicating on internal CI. I still have documentation and migration work to do, but the major portion of the support work is here.
add gdb to the system packages
change to gcc 8.3.1 from centos7 for devtoolset8
fix libstdc++ symlink in devtoolset-8 environment
Undo ignoring other xml files
Update README
Deduplicate repeated messages
Squash long runfiles paths
Lock nvidia driver to 460
libtensorflow work
Fix libtensorflow script and start prelim check
Update Test Requirements to have same versions as tf_sig_build_dockerfiles/devel.requirements.txt (#65)
* Add additional gitignore files
* Update requirements with same versions
Keep versions consistent with tf_sig_build_dockerfiles/devel.requirements.txt
Cleanup
Fix Build issue from `python_include` (#67)
* Remove Python 3.10 pip special handling
* Link usr/include to usr/local/include
* Update location of python include
* Update setup.python.sh
Assorted changes -- see details
- Remove installation of nvidia-profiler, which depends on libcuda1, which ultimately installs an nvidia driver package, which we don't want because we're running in docker, in which the drivers are mounted. I hope nvidia-profiler isn't necessary for anything important; otherwise we'll need to synchronize driver versions between the containers and VM images.
- Add less, colordiff and a newer version of clang-format
- Add code_check_changed_files, which is intended to replace the "incremental" parts of ci_sanity. Still a work in progress because we need to decide on valuable configurations (clang-format and pylint cannot be run the same way as we have them configured internally and currently have a lot of findings)
- Add code_check_full, which is intended to replace the "across entire code base" parts of ci_sanity. I rewrote many of the clunkier tests. Still a work in progress because we must verify that the changed tests will still fail.
- Fix bad "bazel test " expansion for libtensorflow
- Fix bad chmod for libtensoflow repacker
Change libtensorflow config values to fix target selection
Fix a typo in venv installation (Thanks to reedwm)
Remove extra lines (Thanks again to reedwm)
Clarify ctrl-s warning
Correctly remove extra test filters
Make it possible to run isolated pip tests
More work on code checks
Fix a typo
Clean up code check full
Remove clang-format
Cleanup changed_files and move one to full
Add a missing test
Clean up and fix code_check_full
Update docs and create experimental RBE configs
Update docs and create experimental RBE configs
Update dependencies to 2.9.0.dev
Update Go API installation guide for TensorFlow 2.8.0 (#74)
Clarify usage of nightly commit
Fix mistaken 'test' command
Update docs and create experimental RBE configs
Update docs and create experimental RBE configs
Update dependencies to 2.9.0.dev
Update Go API installation guide for TensorFlow 2.8.0 (#74)
Clarify usage of nightly commit
Fix mistaken 'test' command
change to devtoolset-9 and gcc 9.3.1 for manylinux2014
change cachebuster value for ml2014 remote cache
change to new libstdcxx abi for devtoolset-9
change cachebuster value to use the new libstdcxx abi
link against nonshared44 in devtoolset-9
update the cachebuster value
change CACHEBUSTER value for gpu builds
remove redudant commands during build environment setup
change cachbuster variable name for gpu builds
store manylinux2014 cache in a different location
amend comment for accuracy
Now that tf-nightly is working, I'm working on getting the tests working just the same as they are in our internal CI. I know for sure that many test dependencies are missing, and first I'm getting jobs online so that I can see just how much is absent.