Tracker for TensorFlow-DirectML #55226
@penpornk How can pluggable devices implement the
Looping in @wangpengmit for more info on variables.
Resource and ref vars are very different, so my first thought is that
I have a working prototype in a fork that I put in the kernels_experimental header. I can submit a PR if that's OK with everyone. Also, my understanding is that ref vars are deprecated in TF2 and are replaced by resources at the Python API level. Is that correct? Even though they are deprecated, some popular benchmarks like AI-Benchmark use a frozen TF1 model when running on TF2, which yields bad results for pluggable devices since they don't currently support ref vars.
Yes, ref vars are deprecated at the Python level. Existing TF1 models may still be using them, so TF2 internals still support them.
@PatriceVignola Oh, that's great! Please submit the PR and we can continue the discussion there (e.g., whether this needs to be an RFC, etc.). Thank you very much! :)
New PR: #55544
New PR: #55557
New PR: #55558
New PR: #55579
@penpornk We noticed that support for Since some models depend heavily on variant tensors which contain TensorList objects, we'd like to propose two new APIs that address this issue:

```c
TF_CAPI_EXPORT extern void TF_AddNVariant(
    TF_OpKernelContext* ctx,
    void (*binaryAddFunc)(TF_OpKernelContext* ctx, const TF_Tensor* a,
                          const TF_Tensor* b, TF_Tensor* out),
    TF_Status* status);

TF_CAPI_EXPORT extern void TF_ZerosLikeVariant(
    TF_OpKernelContext* ctx,
    void (*zerosLikeFunc)(TF_OpKernelContext* ctx, const TF_Tensor* input,
                          TF_Tensor* out),
    TF_Status* status);
```

Like the Is this something that the TensorFlow team would like to see an RFC or PR for? For one of the RNN models that we track (Pixel-RNN), we see up to a 5x performance improvement by not having to do those operations on the CPU.
Yes, we currently don't have a generic way to support
Thank you for the suggestion! The team thinks this sounds reasonable. If you already have prototype code for this, would you mind opening a PR? We can take it from there. (If there are some points that need more discussion, we can start an RFC.)
Sure! I created a PR here: #55645
@PatriceVignola Thank you! Added to the list. Let's just reuse this one. :)
Would it be possible to ensure that 2.10 has #54330 in it?
@penpornk Thank you for monitoring the other PR! Another PR that is important for us, and which we believe fixes a significant bug in the pluggable device implementation, is this one: #56707. The Pluggable Device RFC says that pluggable devices should be able to name themselves "GPU" and override the built-in CUDA GPU, but in practice many duplicate-registration errors are thrown because the previous registrations for the CUDA GPU device are not removed. This forces users to use the
@PatriceVignola Happy to report that #55558 went into TF 2.10.
Unfortunately, we need more time to carefully consider the possible side effects of this one. I have replied on the PR. I would like to introduce @rishikasinha-tf. In the future, please cc both her and me on PRs that get stuck. I'll also try to go through the PRs tracked in the top post soon. Apologies again for the delay! :(
@NeilGirdhar Yes, #54330 is in v2.10. We just cut the branch on Wednesday, so anything merged before that is in the release.
cc: @PatriceVignola @wchao1115
This issue tracks pending PRs, issues, and possible cherry-picks necessary for TensorFlow-DirectML for each TF release. Please post a comment with new things to track and I will update this post to reflect the changes.
New PRs:
- GPU type #56707
- TF_GetInputTensorFromVariable #55677

PRs that need more investigation.

PRs that made it into TF nightly (post r2.10 branch cut):
- TF_OpKernelConstruction_GetNodeDef #52157

PRs that made it into TF 2.10:
- TF_AssignVariable #55678

PRs that made it into TF 2.9:
- candidate_input_indices constant in TF_ForwardInputOrAllocateOutput #54139

Closed PR: