JIT fuser can't handle scalar ops #9940


Closed
apaszke opened this issue Jul 27, 2018 · 2 comments
Labels: oncall: jit (Add this issue/PR to JIT oncall triage queue)

Comments

apaszke (Contributor) commented Jul 27, 2018

#8919 changes the semantics of the most basic binary ops, and we need to update the JIT code to handle that.

Previously, the fuser depended on shape propagation to insert numerous expands, making the sizes of inputs to pointwise ops match (this is a hard requirement of our codegen). However, this is no longer a sound transformation, because the type inference rules depend on input ranks and the "scalar-auto-wrapping status" of input tensors.
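
To illustrate why blindly inserting expands is no longer sound, here is a small example under current PyTorch type-promotion rules (the 2018 details may have differed slightly): a Python scalar or a 0-dim "wrapped scalar" tensor of the same category does not bump the result dtype, but the same value expanded into a dimensioned tensor does.

```python
import torch

x = torch.ones(3, dtype=torch.int32)
s = torch.tensor(1, dtype=torch.int64)  # 0-dim "wrapped scalar" tensor

# A Python scalar or a 0-dim tensor of the same category leaves the result
# dtype unchanged:
print((x + 1).dtype)            # torch.int32
print((x + s).dtype)            # torch.int32

# If the fuser first expands the 0-dim operand into a dimensioned tensor, the
# int64 operand now wins promotion and the result dtype changes:
print((x + s.expand(3)).dtype)  # torch.int64
```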

Since all this is fundamentally only a requirement of the fuser, it makes sense to handle it there. We will need to special-case the pointwise operators between regular tensors and (possibly CPU, possibly differently typed) tensors representing scalars. There are a few ways to approach this:

  1. Allow the codegen to take scalars as plain kernel arguments. This even gives us some nice perf benefits we didn't have before.
  2. If TensorIterator actually makes it easy to write CUDA kernels, and handles type promotion and expands for us, we might want to use it in our codegen and relax some constraints. I'm not very familiar with this functionality, so I don't know how feasible this is.
  3. (inferior to the previous options) Insert type_as and expand nodes to make the inputs amenable to fusion (see the sketch after this list).
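
To make option 3 concrete, here is a minimal sketch in eager PyTorch (not the actual fuser pass) of what the inserted type_as and expand nodes would compute for a pointwise op between a regular tensor and a 0-dim tensor standing in for a scalar:

```python
import torch

x = torch.randn(4, dtype=torch.float32)
s = torch.tensor(2, dtype=torch.int64)  # 0-dim tensor standing in for a scalar

# Option 3 at the graph level: normalize the scalar operand so it looks like a
# regular pointwise operand before fusion, by matching x's dtype (type_as) and
# shape (expand_as).
s_fusable = s.type_as(x).expand_as(x)

y = x * s_fusable        # now a plain same-shape, same-dtype pointwise op
print(y.dtype, y.shape)  # torch.float32 torch.Size([4])
```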
apaszke added the oncall: jit label on Jul 27, 2018
zou3519 (Contributor) commented Jul 27, 2018

Possibly related, if it's still applicable: #8560

apaszke (Contributor, Author) commented Jul 27, 2018

It's not strongly related. The issue you linked is a run-time problem, while this one is about the fact that we simply won't allow scalar ops to appear in fused kernels now.

zou3519 added a commit to zou3519/pytorch that referenced this issue Aug 17, 2018
This is on the way to resolving pytorch#9940.

This PR modifies graph fuser to fuse operations that have constant
scalar arguments. These constant scalar arguments are directly inlined
into the kernel body.

The context for this is that LSTM backward (in particular, sigmoid
backward) has many add(x, 1.) operations. This PR should be sufficient for
LSTM backward to get fused by the graph fuser.
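
For context, a minimal sketch (assuming torch.jit.script on a recent PyTorch) of the kind of pointwise chain this targets; the gradient of sigmoid is y * (1 - y), which lowers to a chain of pointwise ops that includes an operation with the constant scalar 1:

```python
import torch

@torch.jit.script
def sigmoid_backward(grad_out, y):
    # y is sigmoid(x); d sigmoid(x) / dx = y * (1 - y), which includes an
    # add/sub with the constant scalar 1.
    return grad_out * y * (1 - y)

grad = torch.randn(8)
y = torch.sigmoid(torch.randn(8))
print(sigmoid_backward(grad, y))
# sigmoid_backward.graph shows the chain of pointwise ops; with this PR the
# whole chain, constant scalar included, becomes eligible for the graph fuser.
```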
facebook-github-bot pushed a commit that referenced this issue Aug 17, 2018
Summary:
This is on the way to resolving #9940.

Fixes #10501

This PR modifies graph fuser to fuse operations that have constant
scalar arguments. These constant scalar arguments are directly inlined
into the kernel body.

The context for this is that LSTM backward (in particular, sigmoid
backward) has many add(x, 1.) operations. This PR should be sufficient for
LSTM backward to get fused by the graph fuser.

cc apaszke zdevito
Pull Request resolved: #10511

Differential Revision: D9378896

Pulled By: zou3519

fbshipit-source-id: 6a7a2987f5b6e8edaaf4b599cd200df33361650f