Upgrade for TensorFlow 2.4.1 #212

saudet · 2021-02-10T01:27:10Z

This draft PR includes initial changes required just to get the builds going with TF 2.4.1. Some tests are crashing, probably because we still need to update the handling of TF_String as per the breaking change below, but there's probably a bit more to be done.

The byte layout for string tensors across the C-API has been updated to match TF Core/C++; i.e., a contiguous array of tensorflow::tstring/TF_TStrings.

https://github.com/tensorflow/tensorflow/releases/tag/v2.4.0

It's quite a big release, and a lot of ops have been added, so even after getting everything fixed up, there's probably more to be done before we can get this merged. @karllessard Please take a look and feel free to push any changes to my fork and/or merge this into a branch of the main repository.

On the flip side, most toolchain workarounds have been moved from configure.py to .bazelrc (so we can clean up build.sh a little), we finally have support for CUDA 11 and cuDNN 8, MKL-DNN 1.x now works on all platforms (Linux, Mac, and Windows) without huge external binaries, and I was able to find what the issue was with CUDA on CentOS. Apparently, it doesn't like its new version of glibc, and we can prevent it from getting installed by calling yum this way:

          GLIBC="glibc glibc-common glibc-devel glibc-headers"
          yum --disablerepo updates -y install $GLIBC
          yum -x "$GLIBC" -y update ...

https://github.com/saudet/tensorflow-java/blob/upgrade-tensorflow-241/.github/workflows/ci.yml#L53-L58
@karllessard Feel free to merge this bit right away for TF 2.3.1!

karllessard · 2021-02-11T03:10:09Z

Thanks a lot @saudet , let's prioritize this merge and leave 2.3.1 broken for now. I'll pull your branch and see what I can do to fix the string issues + classify the new ops

karllessard · 2021-02-11T03:54:31Z

Finally, while I'm building your branch, let's give it a try on 2.3.1... 8b36d7e

karllessard · 2021-02-22T02:38:43Z

...w-core/tensorflow-core-api/src/gen/java/org/tensorflow/internal/c_api/global/tensorflow.java

+
+// #include "tensorflow/core/platform/ctstring_internal.h"
+
+// Initialize a new tstring.  This must be called before using any function


Any clue why the documentation of these new string endpoints get all messed up by being aggregated together?

The TF_TString type isn't defined in ctstring.h, so we need to include ctstring_internal.h as well, which contains the definitions of those functions too. JavaCPP simply skips the redundant declarations that follow. We can hack something up, but this is something they should fix upstream, and eventually will I assume, so probably not something worth spending time on here.

karllessard · 2021-02-22T02:55:48Z

tensorflow-core/tensorflow-core-api/src/bazel/op_generator/op_generator.cc

+    if ("size" == output.var().name()) {
+        // reserved name
+        continue;
+    }


We should not skip the getter of an output that have a reserved name but just rename it in the API def, though I'm not sure why size would cause any problem here. Can you give more context to this change?

There are new ops with outputs named "size". Do whatever you want with them, I just did that to get them to compile.

Ok, I'll take care of it

saudet · 2021-02-22T11:09:00Z

Everything builds and all tests pass, so I believe this is ready to merge, unless we should refine all changes to the ops here as well? Also, the GPU builds on Linux are timing out:
https://github.com/saudet/tensorflow-java/actions/runs/585339990
For the past few days, the normal CPU builds have been taking 4 hours instead of 3, and that would explain why the GPU builds don't finish in 6 hours. Either something's changed with the VMs, but more likely this will pass.

karllessard · 2021-02-22T13:24:09Z

I'm actually working on classifying the new ops, which should be completed before merging. I'll make a PR to your branch probably today.

I'm almost done, I just hit a backward-compatibility issue with some of them which I'll duplicate for now but that's something I'd like to bring as a topic for our next session to decide how to handle this properly next time.

google-cla · 2021-02-22T14:46:01Z

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

karllessard · 2021-02-22T14:48:51Z

Just pushed the changes, let's see how it goes with the build: https://github.com/tensorflow/java/actions/runs/589500056

karllessard · 2021-02-22T16:03:37Z

@googlebot I consent

Upgrade for TensorFlow 2.4.1

c98e457

Update handling of string tensors using TF_TString

a5ea689

karllessard reviewed Feb 22, 2021

View reviewed changes

saudet marked this pull request as ready for review February 22, 2021 11:09

Classify TF2.4.1 ops

53f1f7f

karllessard merged commit 3f0369a into tensorflow:master Feb 23, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Upgrade for TensorFlow 2.4.1 #212

Upgrade for TensorFlow 2.4.1 #212

saudet commented Feb 10, 2021

Uh oh!

karllessard commented Feb 11, 2021 •

edited

Loading

Uh oh!

karllessard commented Feb 11, 2021

Uh oh!

karllessard Feb 22, 2021

Uh oh!

saudet Feb 22, 2021

Uh oh!

karllessard Feb 22, 2021

Uh oh!

saudet Feb 22, 2021

Uh oh!

karllessard Feb 22, 2021

Uh oh!

saudet commented Feb 22, 2021

Uh oh!

karllessard commented Feb 22, 2021

Uh oh!

google-cla bot commented Feb 22, 2021

Uh oh!

karllessard commented Feb 22, 2021

Uh oh!

karllessard commented Feb 22, 2021

Uh oh!


		// #include "tensorflow/core/platform/ctstring_internal.h"

		// Initialize a new tstring. This must be called before using any function

Upgrade for TensorFlow 2.4.1 #212

Upgrade for TensorFlow 2.4.1 #212

Conversation

saudet commented Feb 10, 2021

Uh oh!

karllessard commented Feb 11, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

karllessard commented Feb 11, 2021

Uh oh!

karllessard Feb 22, 2021

Choose a reason for hiding this comment

Uh oh!

saudet Feb 22, 2021

Choose a reason for hiding this comment

Uh oh!

karllessard Feb 22, 2021

Choose a reason for hiding this comment

Uh oh!

saudet Feb 22, 2021

Choose a reason for hiding this comment

Uh oh!

karllessard Feb 22, 2021

Choose a reason for hiding this comment

Uh oh!

saudet commented Feb 22, 2021

Uh oh!

karllessard commented Feb 22, 2021

Uh oh!

google-cla bot commented Feb 22, 2021

Uh oh!

karllessard commented Feb 22, 2021

Uh oh!

karllessard commented Feb 22, 2021

Uh oh!

karllessard commented Feb 11, 2021 •

edited

Loading