TensorFlow eager-mode VS PyTorch eager-mode #49229
Comments
@rmothukuru please let me know if I need to add any more information. The code (mentioned above) is plug-and-play; you don't need to bother with unnecessary libraries or anything else, basically.
@rmothukuru
It's been a year and still no response. To speed things up, I've prepared a gist file to quickly execute the program. Please find the gist here. Note that when I reported the anomaly it was TF 2.4; by now it has moved to TF 2.8 or TF 2.9, but the behavior is still the same. Thanks. TL;DR: same code, same slow behavior.
@mohantym Could you please provide the quantitative execution-time results that you found? I still observe the issue.
Also, if you think TF 2.10 or 2.11 fixes the issue that was reported with TF 2.4 (and persists through 2.9), please point me to the relevant PR that fixes the unknown issue causing such a dramatic performance drop.
PyTorch is not only faster but also more efficient than TensorFlow in an eager-mode setup. Cool!
Config: TF 2.4 (at the time of reporting), GPU: RTX 2070
Query
We know that eager mode is slow compared to graph mode in TF 2.x. But how much slower can it be than PyTorch's eager mode? A question about this was asked on Stack Overflow, where the OP used a deep reinforcement learning code example with a custom training loop for the comparison. In that example, the PyTorch code takes roughly ~3 minutes to complete, while the TF code with the same training pipeline takes roughly ~2 hours to complete, and even ends up with comparatively lower accuracy.
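For anyone who wants to sanity-check the gap without the full DRL example, a minimal timing sketch such as the one below (this is not the OP's code; the toy model, batch size, and step count are arbitrary assumptions) runs one identical dense-network training step eagerly in both frameworks:

```python
# Minimal, illustrative timing sketch -- NOT the OP's DRL example.
# The toy model, batch size, and iteration count are arbitrary choices.
import time

import numpy as np
import tensorflow as tf
import torch

N_STEPS, BATCH, DIM = 200, 64, 128
x_np = np.random.rand(BATCH, DIM).astype("float32")
y_np = np.random.rand(BATCH, 1).astype("float32")

# --- TensorFlow eager training step ---
tf_model = tf.keras.Sequential([tf.keras.layers.Dense(64, activation="relu"),
                                tf.keras.layers.Dense(1)])
tf_opt = tf.keras.optimizers.Adam()
x_tf, y_tf = tf.constant(x_np), tf.constant(y_np)

def tf_step():
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(tf_model(x_tf, training=True) - y_tf))
    grads = tape.gradient(loss, tf_model.trainable_variables)
    tf_opt.apply_gradients(zip(grads, tf_model.trainable_variables))

# --- PyTorch eager training step ---
pt_model = torch.nn.Sequential(torch.nn.Linear(DIM, 64), torch.nn.ReLU(),
                               torch.nn.Linear(64, 1))
pt_opt = torch.optim.Adam(pt_model.parameters())
x_pt, y_pt = torch.from_numpy(x_np), torch.from_numpy(y_np)

def pt_step():
    pt_opt.zero_grad()
    loss = torch.mean((pt_model(x_pt) - y_pt) ** 2)
    loss.backward()
    pt_opt.step()

for name, step in [("TF eager", tf_step), ("PyTorch eager", pt_step)]:
    step()  # warm-up: variable creation / lazy initialization
    t0 = time.perf_counter()
    for _ in range(N_STEPS):
        step()
    print(f"{name}: {time.perf_counter() - t0:.2f}s for {N_STEPS} steps")
```

The toy problem is far smaller than the OP's DRL example, so it only illustrates per-step eager overhead, not the full two-hour gap.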
Eager-mode TF probably also brings other issues, such as memory leaks during custom training loops. When I run the PyTorch code, CPU usage hits 100% and the 3D engine of the GPU (RTX 2070) runs at roughly 20%. But when I run the TF code, CPU usage stays around ~50%, physical RAM keeps growing over time (a possible memory leak), VRAM usage gets very high, and the 3D engine of the GPU is not used at all. I'm not sure what the root cause is.
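To make the "physical RAM keeps growing" observation quantitative, one option is to log the process RSS every few steps. A rough sketch, assuming the third-party psutil package is available and using a dummy workload as a stand-in for the real training step:

```python
# Sketch for tracking host-memory growth across training iterations.
# Requires the third-party `psutil` package; `train_step` below is only a
# placeholder for the real eager training step.
import os

import psutil
import tensorflow as tf

process = psutil.Process(os.getpid())
v = tf.Variable(tf.zeros([1000, 1000]))

def train_step():
    # Placeholder workload -- substitute the actual training step here.
    v.assign_add(tf.random.normal([1000, 1000]))

for i in range(1, 1001):
    train_step()
    if i % 100 == 0:
        rss_mb = process.memory_info().rss / 1e6
        print(f"step {i:4d}: RSS = {rss_mb:.1f} MB")
```

A steadily increasing RSS across otherwise identical steps would support the memory-leak suspicion.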
The only significant difference appears after optimizing the TF code and compiling it with graph execution, as demonstrated in the accepted answer. That answer is fine, but it reads more like a guide to optimizing TF code.
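For reference, the "compile it with graph execution" fix from the accepted answer essentially amounts to tracing the custom training step with tf.function. A minimal sketch with a toy model (not the OP's pipeline) that times the same step both ways:

```python
# Sketch: the same custom training step run eagerly vs. traced into a graph
# with tf.function. Toy model and shapes, not the OP's code.
import time

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(64, activation="relu"),
                             tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adam()
x, y = tf.random.normal([64, 128]), tf.random.normal([64, 1])

def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(model(x, training=True) - y))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

graph_step = tf.function(train_step)  # identical step, traced once into a graph

for name, step in [("eager", train_step), ("graph (tf.function)", graph_step)]:
    step(x, y)  # warm-up; for the wrapped version this includes tracing
    t0 = time.perf_counter()
    for _ in range(200):
        step(x, y)
    print(f"{name}: {time.perf_counter() - t0:.2f}s for 200 steps")
```

The first call to the wrapped step pays the tracing cost, which is why it is excluded from the timed loop.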
What I'm wondering is: if we need to run the TF code in eager mode, what is the root cause of this performance and execution-time gap between TF and PyTorch? Is it expected behavior? The OP shared the plug-and-play code examples; please find them here.