Add utilities for running release tests #56

angerson · 2021-12-02T21:11:14Z

Now that tf-nightly is working, I'm working on getting the tests to run the same way they do in our internal CI. I know for sure that many test dependencies are missing, so first I'm getting jobs online to see just how much is absent.

github-actions · 2021-12-02T21:17:05Z

I pushed these containers:

gcr.io/tensorflow-sigs/build:56-python3.9
gcr.io/tensorflow-sigs/build:56-python3.8
gcr.io/tensorflow-sigs/build:56-python3.7
Re-apply the "build and push to gcr.io for staging" label to
rebuild and push again. This comment will only be posted once.

bhack · 2021-12-02T22:18:58Z

I think this is useful and partially related to our old threads/PR:
https://discuss.tensorflow.org/t/run-python-tests-without-compiling-tf/1724
tensorflow/tensorflow#50163

Do you think we need to enable these tests on installed nightly wheels, since we still cannot rely on the cache to build TF in a GitHub Action and run the standard tests?

bhack · 2021-12-02T22:20:22Z

tf_sig_build_dockerfiles/devel.usertools/cpu.bazelrc

+# drops all the bazel dependencies for each py_test; this makes all the tests
+# use the wheel's TensorFlow installation instead of the one made available
+# through bazel. This must be done in a different root directory, //bazel_pip/...,
+# because "import tensorflow" run from the root directory would instead import


Do you think this would work in the same root directory?
https://github.com/tensorflow/tensorflow/pull/50163/files#diff-364cf915db9af8a15c691cddfdb7a5808e927715ada650b01911e74d1e2ba944R187

It is a little old, so I don't remember whether it worked.

In the end, until cache reproducibility is verified, it would be really useful to find a way to run the in-source Python tests against pip-installed wheels. Do you think there is a quick workaround to support this mode as well?
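The bazelrc comment quoted in this review is about import order. When Python is started from a directory (for example with `python -c` or an interactive shell), that directory comes first on `sys.path`, so a `tensorflow/` package in a source checkout is found before the pip-installed wheel. The following is a minimal, standard-library-only sketch that demonstrates this shadowing with a throwaway package. The helper names `import_origin` and `shadowing_demo` are hypothetical and not part of this PR, and Bazel's `py_test` builds `sys.path` from runfiles, so this only illustrates the general mechanism.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def import_origin(module_name):
    """Return the file that `import module_name` would load, or None.

    If the module is already in sys.modules, find_spec reports the module
    that was already imported rather than searching sys.path again.
    """
    importlib.invalidate_caches()  # notice directories created at runtime
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


def shadowing_demo(module_name="tensorflow"):
    """Put a fake package directory first on sys.path and report which file
    the import system would pick for `module_name`."""
    with tempfile.TemporaryDirectory() as fake_repo_root:
        pkg_dir = os.path.join(fake_repo_root, module_name)
        os.makedirs(pkg_dir)
        # An empty __init__.py makes this a regular package, which wins over
        # any installed distribution of the same name further down sys.path.
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
        sys.path.insert(0, fake_repo_root)  # like running from the repo root
        try:
            origin = import_origin(module_name)
        finally:
            sys.path.remove(fake_repo_root)
            sys.path_importer_cache.pop(fake_repo_root, None)
    return origin, fake_repo_root


if __name__ == "__main__":
    origin, root = shadowing_demo()
    print(f"'import tensorflow' would load: {origin}")
    print(f"shadowed by the fake source tree: {origin.startswith(root)}")
```

This is why a wheel-based test run needs either a different root directory or some other way to keep the source tree off the front of `sys.path`.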

angerson · 2021-12-03T19:48:52Z

@bhack FYI, I'm going to remove the exec-log option when this is merged because of bazelbuild/bazel#12510, which wasn't fixed until Bazel 5 or 6 (bazelbuild/bazel@e58dd7e).

bhack · 2021-12-03T20:24:04Z

It is not a problem for me, as I could add this flag manually, but since you are using this in CI we could lose the CI execution log we need to debug why we get all these cache misses in the nightly build with the same Docker image and commit.

Since the bug seems to affect only `bazel test`, can we remove the option just for that command?

angerson · 2021-12-03T20:32:27Z

Well... even if we did store the logs, it's not a high priority right now (until Q1 at least) to debug the cache misses, and I wouldn't use them. There is a lot of work I am doing first to replicate the rest of our internal CI, which doesn't use the cache very much.

bhack · 2021-12-03T20:44:23Z

OK, please comment out these lines with a reference to the bug instead of removing the whole block. We will review it in Q1.

Just a side note about priorities:
since the roadmap is not public, we can only guess what the team is or isn't working on, and we cannot do better than guessing.

bhack · 2021-12-31T08:12:31Z

I don't know what the final scope of the log squasher is, but with Bazel's `--test_output` and `--test_summary` options you could probably get the test failures directly.
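bhack's suggestion can be sketched as a small command builder. `--test_output=errors` prints the combined output of failing tests only, and `--test_summary=detailed` adds information about individual failed test cases to the final summary; both are standard Bazel options. The helper name `bazel_test_command`, its defaults and the example target patterns are hypothetical and not part of this PR; check the mode lists against the Bazel version in the containers.

```python
import shlex

# Modes accepted by Bazel's --test_output option.
TEST_OUTPUT_MODES = ("summary", "errors", "all", "streamed")
# Modes accepted by --test_summary (newer Bazel releases also accept "testcase").
TEST_SUMMARY_MODES = ("short", "terse", "detailed", "none")


def bazel_test_command(targets, test_output="errors", test_summary="detailed", configs=()):
    """Build an argv list for `bazel test` that surfaces failing test logs."""
    if test_output not in TEST_OUTPUT_MODES:
        raise ValueError(f"unsupported --test_output mode: {test_output!r}")
    if test_summary not in TEST_SUMMARY_MODES:
        raise ValueError(f"unsupported --test_summary mode: {test_summary!r}")
    if not targets:
        raise ValueError("at least one target pattern is required")
    cmd = ["bazel", "test"]
    cmd += [f"--config={name}" for name in configs]
    cmd += [f"--test_output={test_output}", f"--test_summary={test_summary}"]
    # "--" ends option parsing, so negative patterns such as -//foo/...
    # are read as target exclusions instead of flags.
    cmd.append("--")
    cmd += list(targets)
    return cmd


if __name__ == "__main__":
    argv = bazel_test_command(["//tensorflow/python/...", "-//tensorflow/python/tools/..."])
    print(shlex.join(argv))
```

Returning an argv list rather than a shell string keeps the command safe to pass to `subprocess.run` without quoting issues.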

angerson · 2022-01-06T21:54:06Z

I've made significant progress on this, and am going to merge it; there is still some work that needs to be done on documentation, and I expect I'll be making more changes to some of the helper tools. Those are still on the way, but since these containers are generally working now for my experimental internal CI jobs, I'll commit the change and continue work on new PRs.


- Set up symlink for devtoolset-8
- Combine Docker GCR presubmits and also push main to gcr
- Commit missed files
- Log in to GCR
- Fix conditional, hopefully
- Clarify
- Add Python 3.10 support (#58)
  Adds Python 3.10 support to the containers. Python 3.10 changes some library behavior and, for now, needs an alternative installation method to work.
- Upgrade gcrpio for fast build and cleanup setup
- Add utilities for running release tests (#56)
  This adds the dependencies and notably bazelrc config options to run TensorFlow's Nightly and Release tests, which I've been working on replicating on internal CI. I still have documentation and migration work to do, but the major portion of the support work is here.
- add gdb to the system packages
- change to gcc 8.3.1 from centos7 for devtoolset8
- fix libstdc++ symlink in devtoolset-8 environment
- Undo ignoring other xml files
- Update README
- Deduplicate repeated messages
- Squash long runfiles paths
- Lock nvidia driver to 460
- libtensorflow work
- Fix libtensorflow script and start prelim check
- Update Test Requirements to have same versions as tf_sig_build_dockerfiles/devel.requirements.txt (#65)
  * Add additional gitignore files
  * Update requirements with same versions
    Keep versions consistent with tf_sig_build_dockerfiles/devel.requirements.txt
- Cleanup
- Fix Build issue from `python_include` (#67)
  * Remove Python 3.10 pip special handling
  * Link usr/include to usr/local/include
  * Update location of python include
  * Update setup.python.sh
- Assorted changes -- see details
  - Remove installation of nvidia-profiler, which depends on libcuda1, which ultimately installs an nvidia driver package, which we don't want because we're running in docker, in which the drivers are mounted. I hope nvidia-profiler isn't necessary for anything important; otherwise we'll need to synchronize driver versions between the containers and VM images.
  - Add less, colordiff and a newer version of clang-format
  - Add code_check_changed_files, which is intended to replace the "incremental" parts of ci_sanity. Still a work in progress because we need to decide on valuable configurations (clang-format and pylint cannot be run the same way as we have them configured internally and currently have a lot of findings)
  - Add code_check_full, which is intended to replace the "across entire code base" parts of ci_sanity. I rewrote many of the clunkier tests. Still a work in progress because we must verify that the changed tests will still fail.
  - Fix bad "bazel test " expansion for libtensorflow
  - Fix bad chmod for libtensorflow repacker
- Change libtensorflow config values to fix target selection
- Fix a typo in venv installation (Thanks to reedwm)
- Remove extra lines (Thanks again to reedwm)
- Clarify ctrl-s warning
- Correctly remove extra test filters
- Make it possible to run isolated pip tests
- More work on code checks
- Fix a typo
- Clean up code check full
- Remove clang-format
- Cleanup changed_files and move one to full
- Add a missing test
- Clean up and fix code_check_full
- Update docs and create experimental RBE configs
- Update dependencies to 2.9.0.dev
- Update Go API installation guide for TensorFlow 2.8.0 (#74)
- Clarify usage of nightly commit
- Fix mistaken 'test' command
- change to devtoolset-9 and gcc 9.3.1 for manylinux2014
- change cachebuster value for ml2014 remote cache
- change to new libstdcxx abi for devtoolset-9
- change cachebuster value to use the new libstdcxx abi
- link against nonshared44 in devtoolset-9
- update the cachebuster value
- change CACHEBUSTER value for gpu builds
- remove redundant commands during build environment setup
- change cachebuster variable name for gpu builds
- store manylinux2014 cache in a different location
- amend comment for accuracy

angerson added the "sig build dockerfiles" label (Relating to the TF SIG Build Dockerfiles) Dec 2, 2021

angerson self-assigned this Dec 2, 2021

angerson added the "build and push to gcr.io for staging" label (Create a staging release on gcr.io) Dec 2, 2021

github-actions bot removed the "build and push to gcr.io for staging" label Dec 2, 2021

bhack reviewed Dec 2, 2021


angerson added the "build and push to gcr.io for staging" label Dec 3, 2021

github-actions bot removed the "build and push to gcr.io for staging" label Dec 3, 2021

angerson added the "build and push to gcr.io for staging" label Dec 3, 2021

github-actions bot removed the "build and push to gcr.io for staging" label Dec 3, 2021

angerson added the "build and push to gcr.io for staging" label Dec 3, 2021

angerson force-pushed the future branch from a90458c to 59a991a December 3, 2021 00:24

angerson force-pushed the future branch from 38d151f to 227a762 December 3, 2021 20:52

angerson added 12 commits December 10, 2021 14:02

wip

4ff35f7

Prevent failure if directory does not exist

cee34ed

Add comments to venv test setup

3f5a3cc

Fix mistaken dependency on Python 3.9

46bbdaa

Ignore log fails and fix bazel issue

d3603ee

Discard extra content if larger than 10MiB

7dc0d0a

Add test reqs and fix log squasher

b8f60ea

Cleanup

419a339

Squash different and don't install cuda-11-2, which seems to fix it

c659d8e

Add CUDA libraries metapackages and don't overshorten logs

5013db3

Fix broken log squasher

f25ff7a

Add missing nopip filters

f193459

angerson added kokoro:force-run and removed kokoro:force-run labels Dec 21, 2021

kokoro-team removed the kokoro:force-run label Dec 21, 2021

angerson mentioned this pull request Dec 21, 2021

gh pr checks retains stale checks until a new commit is pushed cli/cli#4946 (closed)

Meaningless commit to reset the check list

9bf2091

angerson added the kokoro:force-run label Dec 21, 2021

kokoro-team removed the kokoro:force-run label Dec 21, 2021

angerson removed the docker:cpu_py38 label Dec 28, 2021

angerson added 6 commits December 29, 2021 10:58

Fix log squasher

3f9b2b2

More deduplication for test logs

799629f

Fix deduplication

5a3271e

fix typo

f13e577

Add extglob

a28f691

Add messy note about log source

39dda1d

angerson added 2 commits January 6, 2022 10:17

much shorter?

c152bd7

Greatly simplify the log squasher

23af240

angerson marked this pull request as ready for review January 6, 2022 21:47

angerson requested a review from perfinion as a code owner January 6, 2022 21:47

Add another note

66037a2

angerson merged commit 6089e13 into master Jan 6, 2022

angerson deleted the future branch January 6, 2022 21:57

sampathweb added a commit that referenced this pull request Jan 10, 2022

Revert "Add utilities for running release tests (#56)"

ca33720

This reverts commit 6089e13.

sampathweb mentioned this pull request Jan 10, 2022

Revert "Add utilities for running release tests" #62 (closed)
