[ONNX] Import LSTM #3713
Conversation
Can we have this reviewed? Or at least start a first round of review, because I suspect you might have a lot to say :)
@mciprian13 This looks really great! Thanks for contributing this, along with the script for generating the tests and reference data. Apologies for the delay in review, there's a lot to chew on here 🙂
I'd like to make sure I'm reading the state variables part correctly. The initial hidden state is a Placeholder `Y_h_ph`. Then each unrolled LSTM passes its hidden state to the next as an intermediate, i.e. not through `Y_h_ph`. Then the final LSTM will save its final hidden state back out to `Y_h_ph`.

Do you have any specific reason for reusing `Y_h_ph` for both the initial and final state? It works fine, but I would consider/suggest separating the two. If you're looking for best performance, depending on the architecture you may gain a lot from having the initial hidden state as a Constant, as it can be preloaded/held in fast memory closer to compute resources. So if they're separate PHs, you could more easily convert the initial `Y_h_ph` to a Constant when you're deploying a final model (otherwise the final save back to `Y_h_ph` becomes a problem once it's converted to a Constant). Actually, you'd probably want to just leave out saving back to `Y_h_ph` when you deploy anyway...
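The separation being suggested can be sketched numerically. Below is a hypothetical NumPy mock-up (not the Glow API; names like `y_h_init` and `y_h_final` are illustrative) of an unrolled LSTM where the initial hidden state is a separate, read-only input from the output that receives the final state, which is what would let the initial state be folded into a Constant at deployment:

```python
import numpy as np

def lstm_cell(x, h, c, W, U, b):
    """One standard LSTM step; gates stacked as [i, f, g, o] along axis 0."""
    z = W @ x + U @ h + b
    H = h.shape[0]
    i = 1.0 / (1.0 + np.exp(-z[0:H]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2*H]))      # forget gate
    g = np.tanh(z[2*H:3*H])                  # cell candidate
    o = 1.0 / (1.0 + np.exp(-z[3*H:4*H]))    # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
X, H = 3, 4                                  # input size, hidden size
W = rng.standard_normal((4 * H, X))
U = rng.standard_normal((4 * H, H))
b = np.zeros(4 * H)

xs = [rng.standard_normal(X) for _ in range(5)]  # unroll length 5

# Two *separate* "placeholders": nothing ever writes back into
# y_h_init, so at deployment it could become a Constant.
y_h_init = np.zeros(H)                       # initial hidden state (read-only)
c = np.zeros(H)

h = y_h_init
for x in xs:                                 # intermediate states flow cell-to-cell
    h, c = lstm_cell(x, h, c, W, U, b)

y_h_final = h                                # final state saved to a different output
print(y_h_final.shape)                       # (4,)
```

The point of the sketch is only the dataflow: the initial state is consumed once, the intermediates never touch either placeholder, and only the last cell writes the final state out.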
Also, I'm wondering if some of the logic here in the `ONNXModelLoader` could be moved into `Graph.cpp`. Or we could even introduce an `LSTMNode` and move some of this logic into `Lower.cpp`. This would allow reuse of this logic across other frontends, e.g. the Caffe2 proto loader, the PyTorch loader, etc. WDYT?
Your understanding of the state placeholder usage is correct: Y_h and Y_c are used both as input and output (input for the first LSTM cell, output for the last LSTM cell; the intermediate states do NOT pass through these placeholders). I see two proposals in your comment:

Therefore I propose to do the following:

Note: We actually designed an in-house model based on LSTM which ran on one text character at a time (the model inference was run for each character) but needed to maintain its state up to the point where a sentence ended (period), when the state needed to be reset (set to zero). Having a bundle (AOT) generated by Glow which exposed the LSTM state as placeholders made the state manipulation easy (moreover, with the in/out state placeholders superimposed, the LSTM state was tracked automatically).
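The character-at-a-time use case described in the note can be sketched in plain Python. The toy `step` function below merely stands in for one compiled inference (the real update is the LSTM itself); the carry-and-reset logic around it is the part that exposed state placeholders make easy. All names and the update rule are illustrative:

```python
# Toy stand-in for one compiled LSTM inference: consumes one character
# plus the carried-in hidden state, produces the next hidden state.
def step(char: str, state: float) -> float:
    return 0.5 * state + ord(char) / 1000.0  # hypothetical update rule

state = 0.0                        # initial state (the Y_h placeholder)
states = []
for ch in "Hi. Ok.":
    state = step(ch, state)        # one inference per character
    states.append(state)
    if ch == ".":                  # sentence over: reset the state
        state = 0.0

print(len(states))                 # 7 inferences, one per character
```

Because the state lives in a placeholder the caller owns, the reset is just zeroing a buffer between inferences; no recompilation or graph change is needed.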
To clarify, does the user running multiple inferences mean that the unroll length was too short and they want to continue with execution? Or do you have some other use case here where you want to continue with execution, as you're mentioning? Sure, I think that makes sense. I don't think it's a big deal to simply swap the Tensors backing `Y_h_inp` and `Y_h_out` in the PlaceholderBindings, but I don't feel strongly here.

Anyway, your proposal sounds good. As for the two flavors of `createONNXLSTM()` -- it could probably just be a boolean flag to the same function.
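The tensor-swap alternative mentioned above can be mocked up with a plain dict standing in for the PlaceholderBindings. The names `Y_h_inp`/`Y_h_out` follow the comment; the dict and `run_inference` are purely illustrative, not the Glow API:

```python
import numpy as np

# A dict standing in for PlaceholderBindings: placeholder name -> tensor.
bindings = {
    "Y_h_inp": np.zeros(4),            # initial hidden state for this run
    "Y_h_out": np.empty(4),            # receives the final hidden state
}

def run_inference(bindings):
    """Stand-in for one compiled run: writes the final state to Y_h_out."""
    bindings["Y_h_out"] = bindings["Y_h_inp"] + 1.0  # dummy computation

for _ in range(3):                     # several back-to-back inferences
    run_inference(bindings)
    # Swap the tensors backing the two placeholders so the state
    # produced by this run feeds the next one.
    bindings["Y_h_inp"], bindings["Y_h_out"] = (
        bindings["Y_h_out"], bindings["Y_h_inp"])

print(bindings["Y_h_inp"])             # state carried across 3 runs: [3. 3. 3. 3.]
```

With separate input/output placeholders this swap is the caller's only bookkeeping; with the superimposed single placeholder, the carry-over happens implicitly instead.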
What I meant by multiple inferences was this case:
After the discussions, I therefore created a separate function in Graph.cpp named `createONNXLSTM` which:
Please have another look and tell me if all is good (and btw, what is going on with the CI?)
Can you run the CI tests again and have this landed? The error about the "unrelated histories" does not make sense, since I pushed the newest commits to the same branch.
@mciprian13 LGTM!
Not exactly sure what's up with CI -- when I look at your branch it's 104 commits behind Glow master. Can you rebase on the latest Glow master and update the branch? I think that might solve the issue. I tried restarting it but it consistently shows that same error.
Indeed, a simple merge with master fixed this problem.
@jfix71 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: **Summary** Add RNN and GRU modules in the ONNX Importer. This PR is similar to #3713. I added both RNN and GRU in the same PR because they are very similar. **Documentation** None **Test Plan** Add ONNX models (and Python generator scripts) with PyTorch numerical references. Pull Request resolved: #3847 Differential Revision: D18987803 Pulled By: jfix71 fbshipit-source-id: 09f760aa57cd416bec91f8b67c83f315cb5acfff
Summary: **Summary** Add logic to import LSTM layer from ONNX model. **Documentation** None **Test Plan** Testcases to compare the numerical results of the LSTM with a Python (pytorch) reference implementation. Pull Request resolved: pytorch#3713 Differential Revision: D18782616 Pulled By: jfix71 fbshipit-source-id: 339606694ba9684179be979be5da980faa7e8097