[Static Runtime] Use composite op for TE fusion #74126
Conversation
When we perform fusion without the composite op, `TensorExprDynamicGroup`, the output tensor buffers end up not being reused. So, until we figure out a way to do that with the `TensorExprGroup` op, it seems strictly better to use the composite op, even though it involves going to the JIT. Differential Revision: [D34831280](https://our.internmc.facebook.com/intern/diff/D34831280/)
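For context, here is a minimal sketch (not part of this PR) of how TE fusion groups can be observed from Python: it enables the NNC/TensorExpr fuser on CPU, warms up a scripted function so the profiling executor specializes and fuses it, and then prints the optimized graph, where the fused subgraph should appear as a `prim::TensorExprGroup` node. The composite `TensorExprDynamicGroup` op discussed in this PR is used on the Static Runtime fusion path, so this sketch only illustrates the plain, non-composite grouping.

```python
import torch

# Allow the TensorExpr (NNC) fuser to run on CPU so elementwise chains get fused.
torch._C._jit_override_can_fuse_on_cpu(True)
torch._C._jit_set_texpr_fuser_enabled(True)

@torch.jit.script
def f(a, b):
    # A small elementwise chain that the TE fuser can collapse into one kernel.
    return (a * b + b).relu() * 2.0

a = torch.randn(8, 16)
b = torch.randn(8, 16)

# Warm-up runs so the profiling executor collects shapes and applies fusion.
for _ in range(5):
    f(a, b)

# The optimized graph should now contain a prim::TensorExprGroup node
# (the non-composite fusion group discussed in this PR).
print(f.graph_for(a, b))
```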
💊 CI failures summary and remediations: as of commit f907fdb, there are no failures so far (more details on the Dr. CI page). This comment was automatically generated by Dr. CI; please report bugs/suggestions to the (internal) Dr. CI Users group.
Summary:
Pull Request resolved: #74126

When we perform fusion without the composite op, `TensorExprDynamicGroup`, the output tensor buffers end up not being reused. So, until we figure out a way to do that with the `TensorExprGroup` op, it seems strictly better to use the composite op, even though it involves going to the JIT.

ghstack-source-id: 151191941

Test Plan: Tested locally with `ptvsc2_predictor_bench` on the Video model. Performance analysis with `caffe2/caffe2/fb/predictor/bench:limb` on the Video model locally showed an improvement of ~1% with this change.

Reviewed By: mikeiovine

Differential Revision: D34831280

fbshipit-source-id: e523878364b519ccd51b78d52d9f6c9d3e8def17
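As an informal local probe of the buffer-reuse claim (separate from the internal benchmarks in the test plan), one could compare output data pointers across repeated Static Runtime invocations. This sketch assumes the internal binding `torch._C._jit_to_static_module`, as used by PyTorch's `test_static_runtime.py`; it is not a public API, and pointer equality is only suggestive, not proof of reuse by the memory planner.

```python
import torch

class EltwiseModule(torch.nn.Module):
    def forward(self, a, b):
        return (a + b).relu() * 2.0

scripted = torch.jit.freeze(torch.jit.script(EltwiseModule().eval()))

# Assumption: torch._C._jit_to_static_module is the internal binding used by
# test_static_runtime.py; it is not a public API and may change.
static_module = torch._C._jit_to_static_module(scripted._c)

a = torch.randn(8, 16)
b = torch.randn(8, 16)

ptrs = []
for _ in range(3):
    out = static_module(a, b)
    ptrs.append(out.data_ptr())
    del out  # release the output so its buffer is eligible for reuse

# Identical pointers across iterations are consistent with (but do not prove)
# the memory planner handing back the same output buffer.
print(ptrs)
```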