
[ONNX] Import LSTM #3713


Closed · wants to merge 5 commits

Conversation

mciprian13 (Contributor)

Summary
Add logic to import the LSTM layer from an ONNX model.

Documentation
None

Test Plan
Test cases comparing the numerical results of the LSTM against a Python (PyTorch) reference implementation.
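A minimal sketch (not the PR's actual generator script; names, shapes and seeds are illustrative) of how such PyTorch reference data can be produced:

```python
import torch

seq_len, batch, input_size, hidden_size = 4, 2, 3, 5
torch.manual_seed(0)

lstm = torch.nn.LSTM(input_size, hidden_size, num_layers=1)
x = torch.randn(seq_len, batch, input_size)
h0 = torch.zeros(1, batch, hidden_size)
c0 = torch.zeros(1, batch, hidden_size)

# y: per-step hidden states; (hn, cn): final hidden/cell states -- i.e. the
# values an ONNX LSTM exposes as Y, Y_h and Y_c.
y, (hn, cn) = lstm(x, (h0, c0))

# The importer's outputs would then be compared numerically against y/hn/cn.
# Note: PyTorch packs the gate weights as [i, f, g, o] while ONNX uses
# [i, o, f, c], so weights must be reordered when building the ONNX model.
```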

@mciprian13 (Contributor, Author) commented

Can we have this reviewed? Or at least start a first round of review because I suspect you might have a lot to say :)
Thanks!

@jfix71 (Contributor) left a comment

@mciprian13 This looks really great! Thanks for contributing this, along with the script for generating the tests and reference data. Apologies for the delay in review, there's a lot to chew on here 🙂

I'd like to make sure I'm reading the state-variables part correctly. The initial hidden state is a Placeholder, Y_h_ph. Each unrolled LSTM cell then passes its hidden state to the next as an intermediate, i.e. not through Y_h_ph. Finally, the last LSTM cell saves its final hidden state back out to Y_h_ph.

Do you have any specific reason for reusing Y_h_ph for both the initial and the final state? It works fine, but I would consider/suggest separating the two. If you're looking for best performance, then depending on the architecture you may gain a lot from having the initial hidden state as a Constant, as it can be preloaded/held in fast memory closer to the compute resources. If the two were separate placeholders, you could more easily convert the initial Y_h_ph to a Constant when deploying a final model; otherwise, the final save back to Y_h_ph becomes a problem once Y_h_ph has been converted to a Constant. Actually, you'd probably want to just leave out the save back to Y_h_ph when you deploy anyway.

Also, I'm wondering if some of the logic here in the ONNXModelLoader could be moved into Graph.cpp. Or we could even introduce an LSTMNode and move some of this logic into Lower.cpp. This would allow reuse of this logic across other frontends, e.g. the Caffe2 proto loader, the PyTorch loader, etc. WDYT?
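For reference, a minimal sketch (not from this PR; names are illustrative) of how the ONNX LSTM operator itself declares these state tensors: initial_h/initial_c are inputs and Y_h/Y_c are outputs of the op, so an importer is free to back the initial and final state with the same placeholder or with separate ones.

```python
# Sketch of the ONNX-side view of the LSTM state tensors.
from onnx import helper

lstm_node = helper.make_node(
    "LSTM",
    # Per the ONNX spec the inputs are: X, W, R, B, sequence_lens,
    # initial_h, initial_c ("" marks the optional sequence_lens as absent).
    inputs=["X", "W", "R", "B", "", "initial_h", "initial_c"],
    # Outputs: Y (all time steps), Y_h (last hidden), Y_c (last cell state).
    outputs=["Y", "Y_h", "Y_c"],
    hidden_size=5,
)
```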

@mciprian13 (Contributor, Author) commented Nov 26, 2019

Your understanding of the state placeholder usage is correct: Y_h and Y_c are used both as input and output (input for the first LSTM cell, output for the last LSTM cell); the intermediate states do NOT pass through these placeholders.

I see there are two proposals in your comment:

  1. Separate the state placeholders between input and output:
  • This means taking the Y_h and Y_c state placeholders and creating four placeholders, say Y_h_inp, Y_h_out, Y_c_inp and Y_c_out.
  • From a usability (ease-of-use) point of view, I think this is not the right approach: if the user wants the LSTM state to be tracked across multiple inferences, they have to manually intervene between subsequent inferences by copying data from Y_h_out to Y_h_inp and from Y_c_out to Y_c_inp (see the sketch after this list). If the input/output placeholders coincide, this happens automatically.
  • Although the point above about keeping the input/output placeholders separate or shared is not that strong, I am pretty convinced that I would not want to convert the input state to a constant, and I would not drop the output state. From a flexibility point of view, I would leave both the input AND output states accessible as placeholders because:
    • Using a constant for the input state limits the possibility of initializing the same model with different initial states (for example, one might want to save the LSTM state after 100 inferences and initialize another instance of the LSTM with this state).
    • I would NOT drop the output states because they are useful for saving the state of the LSTM (so it can be resumed/restored later, for the same LSTM instance or another one).
  2. Agreed, I can put the code somewhere more reusable, but I wouldn't go so far as to assume it would be used everywhere, because this is concerned mostly with the ONNX definition of the LSTM, which is pretty particular.
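To illustrate the usability argument in item 1, a hypothetical sketch (the bindings dictionary and function names below are invented for illustration; this is not Glow's actual API):

```python
# Hypothetical pseudocode contrasting separate vs. shared state placeholders.
# 'bindings' maps placeholder names to the tensors fed to / read from the model.

def run_with_separate_states(run_inference, bindings, inputs):
    """Separate Y_h_inp/Y_h_out: the caller must copy state between steps."""
    for x in inputs:
        bindings["X"] = x
        run_inference(bindings)
        # Manual intervention: feed the produced state back as the next input.
        bindings["Y_h_inp"] = bindings["Y_h_out"]
        bindings["Y_c_inp"] = bindings["Y_c_out"]

def run_with_shared_states(run_inference, bindings, inputs):
    """Shared Y_h/Y_c: the state lives in one buffer, tracked automatically."""
    for x in inputs:
        bindings["X"] = x
        run_inference(bindings)  # reads and rewrites Y_h/Y_c in place
```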

Therefore, I propose to do the following:

  1. I will move the function into Graph.cpp and name it createONNXLSTM() (the name suggests that the definition of the LSTM is particular to the ONNX standard; a numpy sketch of that definition follows below). I don't think I will create an LSTM node; the lowering is done implicitly in the creation function itself.
  2. I will create two flavors of the function, with same vs. separate input/output state placeholders.

What do you say?
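For context on why the ONNX LSTM definition is "pretty particular", here is a minimal numpy sketch of a single forward step per the ONNX operator spec (default activations, peephole connections omitted, one direction; the function name and shapes are illustrative). This is the computation a createONNXLSTM-style builder has to express with graph nodes:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def onnx_lstm_step(x, h_prev, c_prev, W, R, Wb, Rb):
    """One ONNX LSTM step (single direction, no peepholes).

    x: [batch, input_size], h_prev/c_prev: [batch, hidden_size],
    W: [4*hidden_size, input_size], R: [4*hidden_size, hidden_size],
    Wb/Rb: [4*hidden_size]. ONNX packs the gates in i, o, f, c order
    (unlike e.g. PyTorch's i, f, g, o), which is one of its particularities.
    """
    gates = x @ W.T + h_prev @ R.T + Wb + Rb
    i, o, f, g = np.split(gates, 4, axis=1)  # iofc packing
    i, o, f = sigmoid(i), sigmoid(o), sigmoid(f)
    c = f * c_prev + i * np.tanh(g)          # new cell state
    h = o * np.tanh(c)                       # new hidden state
    return h, c
```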

Note: We actually designed a model in-house, based on an LSTM, which ran on one text character at a time (the model inference was run for each character) but needed to maintain the state up to the point where a sentence was over (a period), when the state needed to be reset (set to zero). Having a bundle (AOT) generated by Glow which exposed the LSTM state as placeholders made the state manipulation easy (moreover, with the in/out state placeholders overlapped, the LSTM state was tracked automatically).

@jfix71 (Contributor) left a comment

To clarify: does the user running multiple inferences mean that the unroll length was too short and they want to continue with execution? Or do you have some other use case where you want to continue with execution, as you're mentioning? Sure, I think that makes sense. I don't think it's a big deal to simply swap the Tensors backing Y_h_inp and Y_h_out in the PlaceholderBindings, but I don't feel strongly here.

Anyway, your proposal sounds good. For the two flavors of createONNXLSTM() -- it could probably just be a boolean flag to the same function.

(Commit pushed to the branch: "…tSlice function since the optimization was deferred to the optimizer.")
@mciprian13 (Contributor, Author) commented Nov 27, 2019

What I meant by multiple inferences was this case:

  • The model is defined with only one LSTM cell (for example, an NLP model that processes just one character), while the application needs to unroll the LSTM, propagating the state, in order to process a whole sentence.
  • This case was used with the AOT bundle: having the LSTM state exposed as described allowed me to propagate the state between multiple calls of the bundle entry function while processing a sentence, and to reset the state (set it to 0) when the sentence was over (sketched below).
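A hypothetical sketch of that per-character driver loop (all names below, including encode_char and the bundle entry point, are invented for illustration; this is not the real bundle API):

```python
import numpy as np

def process_text(bundle_entry, state, text, hidden_size):
    """One inference per character; the LSTM state is carried in the
    overlapped Y_h/Y_c placeholders and reset when a sentence ends."""
    for ch in text:
        state["X"] = encode_char(ch)  # hypothetical one-character encoder
        bundle_entry(state)           # reads and updates Y_h/Y_c in place
        if ch == ".":                 # sentence over: reset the LSTM state
            state["Y_h"] = np.zeros((1, hidden_size), dtype=np.float32)
            state["Y_c"] = np.zeros((1, hidden_size), dtype=np.float32)
```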

Following the discussion, I created a separate function in Graph.cpp named createONNXLSTM() which:

  • is generic enough to handle both separated and overlapped in/out states;
  • no longer contains the "manual" optimizations, which currently are (or will be) done by the optimizer.

The invocation in the importer now has the following behavior with respect to the state:

  • if the input states are explicitly provided, then those variables are used (they can be constants, placeholders, whatever);
  • if the input states are not explicitly provided, then the placeholders used for the output states are also used as inputs.

Please have another look and tell me if all is good (and, by the way, what is going on with the CI?).
Thanks again!

@mciprian13 (Contributor, Author) commented

Can you run the CI tests again and have this landed? The error about the "unrelated histories" does not make sense, since I pushed the newest commits to the same branch.

@jfix71 (Contributor) left a comment

@mciprian13 LGTM!

Not exactly sure what's up with CI -- when I look at your branch, it's 104 commits behind Glow master. Can you rebase on the latest Glow master and update the branch? I think that might solve the issue. I tried restarting CI, but it consistently shows the same error.

@mciprian13 (Contributor, Author) commented Dec 3, 2019

Indeed, a simple merge with master fixed the problem.
Let's have this landed, then!
I'm also "calling dibs" on adding support for the ONNX GRU, since it is very similar to this one (actually slightly simpler).
Thanks!

@facebook-github-bot left a comment

@jfix71 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mciprian13 deleted the Add_LSTM_ONNX_Import branch December 4, 2019 09:34
facebook-github-bot pushed a commit that referenced this pull request Dec 13, 2019
Summary:
**Summary**
Add RNN and GRU modules in the ONNX Importer.
This PR is similar to #3713.
I added both RNN and GRU in the same PR because they are very similar.

**Documentation**
None

**Test Plan**
Add ONNX models (and Python generator scripts) with PyTorch numerical references.
Pull Request resolved: #3847

Differential Revision: D18987803

Pulled By: jfix71

fbshipit-source-id: 09f760aa57cd416bec91f8b67c83f315cb5acfff