
Control flows #2738

Open
mattiasmar opened this issue Apr 17, 2019 · 14 comments

Comments

@mattiasmar

Hi,
When can Glow be expected to support control flows? I would like to see LSTM implemented in Glow.
Thanks!

@nadavrot
Contributor

Hi @mattiasmar! We support LSTMs by unrolling the network. For example, the char-rnn and English-to-German examples use LSTM structures. At some point we may extend Glow to support more control-flow features, but this is not something that anyone is actively working on.

@mattiasmar
Author

For the purpose of extending Glow to support LSTM structures (not unrolled), could you provide some pointers? Design thoughts? Links to work in progress? Core elements that would need to be implemented?
Thanks!

@nadavrot
Contributor

To support general control flow we would need to extend Glow's internal data structures: rename Function to BasicBlock, add a new top-level function, and allow BasicBlocks to be terminated with some "branch" instruction that jumps to a different basic block.

This is going to be a major change, and I am not sure how to implement things like control-flow-aware automatic differentiation (that we have today for the DAG).

I think that the best path forward right now would be to rely on the high-level framework (for example, PyTorch) to implement the control flow and call into Glow for the DAG sections of the compute. Moving forward we can gradually implement full support for control flow, based on the semantics of PyTorch.

@mattiasmar
Author

So how would you approach a model like GNMT? Unrolling the network wouldn't be very processing-efficient. How could we best benefit from Glow without making a major change to the code base (while keeping the option of some minor changes)?

@jfix71
Contributor

jfix71 commented Apr 17, 2019

We have some initial support for Predication, which would allow us to skip unnecessary computation in an unrolled network. That would probably be the shortest path to a highly performant NMT. Note that this requires the backend to also support predication.

@ajayanto

ajayanto commented Sep 9, 2019

Our backend supports LSTM as-is. So if I add code for loading the LSTM operator in Glow, could we implement something similar to https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/core/providers/cpu/rnn/deep_cpu_lstm.cc in BoundInterpreterFunction for profiling?

@jfix71
Contributor

jfix71 commented Sep 9, 2019

@ajayanto We already have basic LSTM support, just not as a single node (it's currently implemented directly via its component pieces rather than as a Node that's lowered; see Function::createLSTM()). We could add an LSTM Node, lower it using essentially the logic from Function::createLSTM(), and your backend could then prevent lowering for it.

This could be useful for you if you are going to run an unrolled RNN that uses LSTMs, or some other network with an LSTM that does not use control flow. But just to be clear, we still do not have support for control flow needed for general RNNs, e.g. ONNX's Scan or C2's RecurrentNetwork, which we would need to unroll.

@ajayanto

ajayanto commented Sep 9, 2019

@jfix71 As a workaround, instead of lowering the LSTM into multiple nodes, if we add an LSTM node (as-is) in Glow and implement its functionality (similar to the ONNX Runtime implementation) in BoundInterpreterFunction::fwdLSTMInst, would it be possible to run inference and generate a profile using the interpreter backend?

@jfix71
Contributor

jfix71 commented Sep 10, 2019

@ajayanto You don't need to provide an implementation for fwdLSTMInst in order to get a profile. We will want to lower the LSTMNode to its subnodes for the Interpreter backend, with very similar logic to that found currently inside Function::createLSTM().

If your backend does not want to lower the LSTM then it does not need to -- it would return false for LSTMNode inside YourBackend::shouldLower(). It would then see the LSTMNode passed to it inside YourBackend::compile().

In order to do this you'd need to add an LSTMNode, roughly following the instructions in docs/NewOperators.md. You'd follow the instructions for the case where the Node does not need low-level IR because it will be lowered (see the third bullet there). The only major difference from the steps there is that a Function::createLSTM() already exists; it would instead return a newly created LSTMNode, and the logic currently inside Function::createLSTM() would move to its own case in Lower.cpp.

@hgarg5072

@jfix71 The current implementation of LSTM in Function::createLSTM() doesn't have an option to provide weights as input. Is this function training-specific only? What would I have to do in order to have an inference node?
Also, couldn't we have LSTM as a node instead of just a function?

@jfix71
Contributor

jfix71 commented Oct 18, 2019

@hgarg5072 For historical reasons we didn't have an LSTMNode due to lowering/quantization-related issues, but those issues have been resolved, so it could be added now. And yes we should also add an API for creating an LSTM with already-trained weights regardless of whether we add an LSTMNode -- there's no real reason we don't have one yet. Are you interested in working on either of these issues?

@ksaurabh-cadence
Contributor

@jfix71 @nadavrot Just checking if there is any progress on supporting loop/control flow? It's been a while :-) Any approaches which have been considered and rejected?

@jfix71
Contributor

jfix71 commented Sep 30, 2020

Hi @ksaurabh-cadence -- I don't think much has changed here, actually. I still think the easiest path forward for now would be to allow e.g. PyTorch (or whatever is driving runtime execution, e.g. a C wrapper with a main for executing an AOT bundle) to call into Glow for high-performance execution of each iteration of the model, assuming the control flow is mostly there to wrap around e.g. an LSTM until an end token is seen. Alternatively, we could improve support for predication and then unroll the model to some max length. We could still do the large refactor Nadav mentioned, but I am not clear on the cost-benefit there: it would require a decent amount of work, and I think we could likely get most of the benefit from the first approach.

@ksaurabh-cadence
Contributor

Hi @jfix71 Is there an example I can look at for the first approach you suggest, using a C wrapper? If not, is there an example that could be added?

What kind of improvement for predication do you have in mind?


6 participants