TensorFlow eager-mode VS PyTorch eager-mode #49229


Closed
innat opened this issue May 17, 2021 · 9 comments
Assignees
Labels
comp:eager Eager related issues TF 2.4 for issues related to TF 2.4 type:performance Performance Issue

Comments

@innat

innat commented May 17, 2021

Config:

OS: Windows 10
TensorFlow 2.4.1
Torch 1.7.1

Query

We know that eager mode is slower than graph mode in TF 2.x. But how much slower can it be compared to PyTorch's eager mode?

A question was asked on SO about this, where the OP used a deep reinforcement learning code example with a custom training loop to compare. In that example, the PyTorch code takes approximately ~3 minutes to complete, while the TF code with the same training pipeline takes approximately ~2 hours to complete, and even ends up with comparatively lower accuracy.

It probably also brings in other issues, like memory leaks during custom training loops. When I run the PyTorch code, CPU usage is 100% and the GPU's 3D queue (RTX 2070) is at approximately 20%. But when I run the TF code, CPU usage is ~50%, physical RAM grows over time (a possible memory leak), VRAM usage gets very high, and the GPU's 3D queue is not used at all. I'm not sure what the root cause is.
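As a side note, one framework-agnostic way to check whether a custom training loop is really leaking host memory over time is to snapshot allocations with the standard-library `tracemalloc` module. This is only a minimal sketch with a hypothetical `toy_step` standing in for one training iteration, not the actual RL code from the SO post:

```python
# Check whether a custom loop accumulates host memory over iterations.
# `toy_step` is a hypothetical stand-in for one training step.
import gc
import tracemalloc

def toy_step(state):
    # Stand-in for one training iteration mutating some loop state.
    state["acc"] = state.get("acc", 0.0) + 1.0
    return state

tracemalloc.start()
state = {}
snapshots = []
for i in range(3000):
    state = toy_step(state)
    if i % 1000 == 0:
        gc.collect()  # rule out garbage that merely hasn't been collected yet
        current, _peak = tracemalloc.get_traced_memory()
        snapshots.append(current)
tracemalloc.stop()

# If `snapshots` keeps climbing run after run, something in the loop
# is accumulating Python-side references.
print(snapshots)
```

Note this only measures Python-heap allocations; GPU memory growth would need the framework's own tooling.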

The only significant difference appears after optimizing the TF code and compiling it for graph execution, as demonstrated in the accepted answer. The answer is fine, but it reads more like a way to optimize TF code.
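For reference, the kind of optimization the accepted answer demonstrates is wrapping the custom training step in `tf.function` so the identical step runs as a compiled graph instead of op-by-op. A minimal sketch with a hypothetical toy model (not the RL code from the SO post):

```python
# Same custom training step, run eagerly and as a compiled graph.
# The model, optimizer, and data here are hypothetical placeholders.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.MeanSquaredError()

def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# Graph-compiled variant of the identical step:
graph_step = tf.function(train_step)

x = tf.random.normal((256, 32))
y = tf.random.normal((256, 1))
eager_loss = train_step(x, y)  # executes op-by-op (eager)
graph_loss = graph_step(x, y)  # traced once, then replayed as a graph
```

The point of the question stands, though: this changes the execution model rather than explaining why eager mode itself is so much slower.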

I'm wondering: say we need to run TF code in eager mode; in that case, what is the root cause of this performance and execution gap between TF and PyTorch? Is it expected behavior? The OP shared a plug-and-play code example; please find it here.
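To put a rough number on the eager-vs-graph gap on any given machine, one can time the same op both ways with the standard-library `timeit`. This is a toy workload (a single matmul), so absolute numbers will vary with hardware and op size, and for such small ops graph mode may not even win:

```python
# Rough per-call timing of the same op in eager mode vs. via tf.function.
import timeit
import tensorflow as tf

@tf.function
def graph_matmul(a, b):
    return tf.matmul(a, b)

a = tf.random.normal((256, 256))
b = tf.random.normal((256, 256))

# Warm up both paths so tracing and device init stay out of the timed region.
tf.matmul(a, b)
graph_matmul(a, b)

eager_t = timeit.timeit(lambda: tf.matmul(a, b), number=100)
graph_t = timeit.timeit(lambda: graph_matmul(a, b), number=100)
print(f"eager: {eager_t:.4f}s  graph: {graph_t:.4f}s")
```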

@innat innat added the type:performance Performance Issue label May 17, 2021
@saikumarchalla saikumarchalla added TF 2.4 for issues related to TF 2.4 comp:eager Eager related issues labels May 18, 2021
@rmothukuru rmothukuru added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label May 18, 2021
@innat
Author

innat commented Jun 2, 2021

@rmothukuru please let me know if I need to add any more information. The code (mentioned above) is plug-and-play; you don't need to bother with extra libraries or anything complicated.

@innat
Author

innat commented Nov 25, 2021

@rmothukuru
any update?

@innat
Author

innat commented Mar 7, 2022

@rmothukuru
Could you please give some feedback?

@innat
Author

innat commented Jun 12, 2022

It's been a year with still no response. To speed things up, I've prepared a gist to quickly execute the program. Please find the gist here. Note that when I reported the anomaly, it was TF 2.4; by now it has moved on to TF 2.8 or TF 2.9, but the behavior is still the same. Thanks.

TL;DR: with the same code, PyTorch takes a minute to finish whereas TF takes hours; executing the TensorFlow code in graph mode improves its execution time (details). Hence the title: TF eager mode vs PyTorch eager mode.

@mohantym
Contributor

Hi @innat!

I see a huge difference between the eager-mode timings of TensorFlow and PyTorch now, in 2.10 and 2.11.
Could you give us an update from your side?

Thank you!

@mohantym mohantym added stat:awaiting response Status - Awaiting response from author and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Jan 26, 2023
@innat
Author

innat commented Jan 28, 2023

@mohantym Could you please provide quantitative results for the execution times you found? I still observe the issue.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Jan 28, 2023
@innat
Author

innat commented Jan 29, 2023

Also, if you think TF 2.10 or 2.11 fixes the issue that was reported in TF 2.4 (and persisted through 2.9), please point me to the relevant PR that fixes the unknown root cause of such a dramatic performance drop.

@innat
Author

innat commented Aug 4, 2023

PyTorch is not only faster but also more efficient than TensorFlow in an eager mode setup. Cool!

@innat innat closed this as completed Aug 4, 2023
@google-ml-butler

Are you satisfied with the resolution of your issue?

4 participants