
[ONNX] Import LSTM #3713


Closed · wants to merge 5 commits

Conversation

mciprian13 (Contributor)

Summary
Add logic to import the LSTM layer from an ONNX model.

Documentation
None

Test Plan
Test cases comparing the numerical results of the LSTM against a Python (PyTorch) reference implementation.
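A minimal sketch (not the PR's actual generator script; names, shapes and seeds are illustrative) of how such PyTorch reference data can be produced:

```python
import torch

seq_len, batch, input_size, hidden_size = 4, 2, 3, 5
torch.manual_seed(0)

lstm = torch.nn.LSTM(input_size, hidden_size, num_layers=1)
x = torch.randn(seq_len, batch, input_size)
h0 = torch.zeros(1, batch, hidden_size)
c0 = torch.zeros(1, batch, hidden_size)

# y: per-step hidden states; (hn, cn): final hidden/cell states -- i.e. the
# values an ONNX LSTM exposes as Y, Y_h and Y_c.
y, (hn, cn) = lstm(x, (h0, c0))

# The importer's outputs would then be compared numerically against y/hn/cn.
# Note: PyTorch packs the gate weights as [i, f, g, o] while ONNX uses
# [i, o, f, c], so weights must be reordered when building the ONNX model.
```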

@mciprian13 (Contributor, Author) commented

Can we have this reviewed? Or at least start a first round of review because I suspect you might have a lot to say :)
Thanks!

@jfix71 (Contributor) left a comment

@mciprian13 This looks really great! Thanks for contributing this, along with the script for generating the tests and reference data. Apologies for the delay in review, there's a lot to chew on here 🙂

I'd like to make sure I'm reading the state-variables part correctly. The initial hidden state is a Placeholder, Y_h_ph. Each unrolled LSTM cell then passes its hidden state to the next as an intermediate, i.e. not through Y_h_ph. Finally, the last LSTM cell saves its final hidden state back out to Y_h_ph.

Do you have any specific reason for reusing Y_h_ph for both the initial and the final state? It works fine, but I would consider/suggest separating the two. If you're looking for best performance, then depending on the architecture you may gain a lot from having the initial hidden state as a Constant, as it can be preloaded/held in fast memory closer to the compute resources. If the two were separate placeholders, you could more easily convert the initial Y_h_ph to a Constant when deploying a final model; otherwise, the final save back to Y_h_ph becomes a problem once Y_h_ph has been converted to a Constant. Actually, you'd probably want to just leave out the save back to Y_h_ph when you deploy anyway.

Also, I'm wondering if some of the logic here in the ONNXModelLoader could be moved into Graph.cpp. Or we could even introduce an LSTMNode and move some of this logic into Lower.cpp. This would allow reuse of this logic across other frontends, e.g. the Caffe2 proto loader, the PyTorch loader, etc. WDYT?
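For reference, a minimal sketch (not from this PR; names are illustrative) of how the ONNX LSTM operator itself declares these state tensors: initial_h/initial_c are inputs and Y_h/Y_c are outputs of the op, so an importer is free to back the initial and final state with the same placeholder or with separate ones.

```python
# Sketch of the ONNX-side view of the LSTM state tensors.
from onnx import helper

lstm_node = helper.make_node(
    "LSTM",
    # Per the ONNX spec the inputs are: X, W, R, B, sequence_lens,
    # initial_h, initial_c ("" marks the optional sequence_lens as absent).
    inputs=["X", "W", "R", "B", "", "initial_h", "initial_c"],
    # Outputs: Y (all time steps), Y_h (last hidden), Y_c (last cell state).
    outputs=["Y", "Y_h", "Y_c"],
    hidden_size=5,
)
```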

@mciprian13 (Contributor, Author) commented Nov 26, 2019

Your understanding of the state placeholder usage is correct: Y_h and Y_c are used both as input and output (input for the first LSTM cell, output for the last LSTM cell); the intermediate states do NOT pass through these placeholders.

I see there are two proposals in your comment:

  1. Separate the state placeholders between input and output:
  • This means taking the Y_h and Y_c state placeholders and creating four placeholders, say Y_h_inp, Y_h_out, Y_c_inp and Y_c_out.
  • From a usability (ease-of-use) point of view, I think this is not the right approach: if the user wants the LSTM state to be tracked across multiple inferences, they have to manually intervene between subsequent inferences by copying data from Y_h_out to Y_h_inp and from Y_c_out to Y_c_inp (see the sketch after this list). If the input/output placeholders coincide, this happens automatically.
  • Although the point above about keeping the input/output placeholders separate or shared is not that strong, I am pretty convinced that I would not want to convert the input state to a constant, and I would not drop the output state. From a flexibility point of view, I would leave both the input AND output states accessible as placeholders because:
    • Using a constant for the input state limits the possibility of initializing the same model with different initial states (for example, one might want to save the LSTM state after 100 inferences and initialize another instance of the LSTM with this state).
    • I would NOT drop the output states because they are useful for saving the state of the LSTM (so it can be resumed/restored later, for the same LSTM instance or another one).
  2. Agreed, I can put the code somewhere more reusable, but I wouldn't go so far as to assume it would be used everywhere, because this is concerned mostly with the ONNX definition of the LSTM, which is pretty particular.
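To illustrate the usability argument in item 1, a hypothetical sketch (the bindings dictionary and function names below are invented for illustration; this is not Glow's actual API):

```python
# Hypothetical pseudocode contrasting separate vs. shared state placeholders.
# 'bindings' maps placeholder names to the tensors fed to / read from the model.

def run_with_separate_states(run_inference, bindings, inputs):
    """Separate Y_h_inp/Y_h_out: the caller must copy state between steps."""
    for x in inputs:
        bindings["X"] = x
        run_inference(bindings)
        # Manual intervention: feed the produced state back as the next input.
        bindings["Y_h_inp"] = bindings["Y_h_out"]
        bindings["Y_c_inp"] = bindings["Y_c_out"]

def run_with_shared_states(run_inference, bindings, inputs):
    """Shared Y_h/Y_c: the state lives in one buffer, tracked automatically."""
    for x in inputs:
        bindings["X"] = x
        run_inference(bindings)  # reads and rewrites Y_h/Y_c in place
```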

Therefore, I propose to do the following:

  1. I will move the function into Graph.cpp and name it createONNXLSTM() (the name suggests that the definition of the LSTM is particular to the ONNX standard; a numpy sketch of that definition follows below). I don't think I will create an LSTM node; the lowering is done implicitly in the creation function itself.
  2. I will create two flavors of the function, with same vs. separate input/output state placeholders.

What do you say?
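For context on why the ONNX LSTM definition is "pretty particular", here is a minimal numpy sketch of a single forward step per the ONNX operator spec (default activations, peephole connections omitted, one direction; the function name and shapes are illustrative). This is the computation a createONNXLSTM-style builder has to express with graph nodes:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def onnx_lstm_step(x, h_prev, c_prev, W, R, Wb, Rb):
    """One ONNX LSTM step (single direction, no peepholes).

    x: [batch, input_size], h_prev/c_prev: [batch, hidden_size],
    W: [4*hidden_size, input_size], R: [4*hidden_size, hidden_size],
    Wb/Rb: [4*hidden_size]. ONNX packs the gates in i, o, f, c order
    (unlike e.g. PyTorch's i, f, g, o), which is one of its particularities.
    """
    gates = x @ W.T + h_prev @ R.T + Wb + Rb
    i, o, f, g = np.split(gates, 4, axis=1)  # iofc packing
    i, o, f = sigmoid(i), sigmoid(o), sigmoid(f)
    c = f * c_prev + i * np.tanh(g)          # new cell state
    h = o * np.tanh(c)                       # new hidden state
    return h, c
```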

Note: We actually designed a model in-house, based on an LSTM, which ran on one text character at a time (the model inference was run for each character) but needed to maintain the state up to the point where a sentence was over (a period), when the state needed to be reset (set to zero). Having a bundle (AOT) generated by Glow which exposed the LSTM state as placeholders made the state manipulation easy (moreover, with the in/out state placeholders overlapped, the LSTM state was tracked automatically).

@jfix71 (Contributor) left a comment

To clarify: does the user running multiple inferences mean that the unroll length was too short and they want to continue with execution? Or do you have some other use case where you want to continue with execution, as you're mentioning? Sure, I think that makes sense. I don't think it's a big deal to simply swap the Tensors backing Y_h_inp and Y_h_out in the PlaceholderBindings, but I don't feel strongly here.

Anyway, your proposal sounds good. For the two flavors of createONNXLSTM() -- it could probably just be a boolean flag to the same function.

(Commit pushed to the branch: "…tSlice function since the optimization was deferred to the optimizer.")
@mciprian13 (Contributor, Author) commented Nov 27, 2019

What I meant by multiple inferences was this case:

  • The model is defined with only one LSTM cell (for example, an NLP model that processes just one character), while the application needs to unroll the LSTM, propagating the state, in order to process a whole sentence.
  • This case was used with the AOT bundle: having the LSTM state exposed as described allowed me to propagate the state between multiple calls of the bundle entry function while processing a sentence, and to reset the state (set it to 0) when the sentence was over (sketched below).
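A hypothetical sketch of that per-character driver loop (all names below, including encode_char and the bundle entry point, are invented for illustration; this is not the real bundle API):

```python
import numpy as np

def process_text(bundle_entry, state, text, hidden_size):
    """One inference per character; the LSTM state is carried in the
    overlapped Y_h/Y_c placeholders and reset when a sentence ends."""
    for ch in text:
        state["X"] = encode_char(ch)  # hypothetical one-character encoder
        bundle_entry(state)           # reads and updates Y_h/Y_c in place
        if ch == ".":                 # sentence over: reset the LSTM state
            state["Y_h"] = np.zeros((1, hidden_size), dtype=np.float32)
            state["Y_c"] = np.zeros((1, hidden_size), dtype=np.float32)
```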

Following the discussion, I created a separate function in Graph.cpp named createONNXLSTM() which:

  • is generic enough to handle both separated and overlapped in/out states;
  • no longer contains the "manual" optimizations, which currently are (or will be) done by the optimizer.

The invocation in the importer now has the following behavior with respect to the state:

  • if the input states are explicitly provided, then those variables are used (they can be constants, placeholders, whatever);
  • if the input states are not explicitly provided, then the placeholders used for the output states are also used as inputs.

Please have another look and tell me if all is good (and, by the way, what is going on with the CI?).
Thanks again!

@mciprian13 (Contributor, Author) commented

Can you run the CI tests again and have this landed? The error about the "unrelated histories" does not make sense, since I pushed the newest commits to the same branch.

@jfix71 (Contributor) left a comment

@mciprian13 LGTM!

Not exactly sure what's up with CI -- when I look at your branch, it's 104 commits behind Glow master. Can you rebase on the latest Glow master and update the branch? I think that might solve the issue. I tried restarting CI, but it consistently shows the same error.

@mciprian13 (Contributor, Author) commented Dec 3, 2019

Indeed, a simple merge with master fixed the problem.
Let's have this landed, then!
I'm also "calling dibs" on adding support for the ONNX GRU, since it is very similar to this one (actually slightly simpler).
Thanks!

@facebook-github-bot left a comment

@jfix71 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mciprian13 deleted the Add_LSTM_ONNX_Import branch December 4, 2019 09:34
facebook-github-bot pushed a commit that referenced this pull request Dec 13, 2019
Summary:
**Summary**
Add RNN and GRU modules in the ONNX Importer.
This PR is similar to #3713.
I added both RNN and GRU in the same PR because they are very similar.

**Documentation**
None

**Test Plan**
Add ONNX models (and Python generator scripts) with PyTorch numerical references.
Pull Request resolved: #3847

Differential Revision: D18987803

Pulled By: jfix71

fbshipit-source-id: 09f760aa57cd416bec91f8b67c83f315cb5acfff