
Conversation

@fmassa (Contributor) commented Feb 27, 2016

Here is an attempt to avoid using the assignment operator in Identity.
I tried to handle all previous cases, but maybe I missed something.
Here is a summary:

  • recursively calls :set on tensors
  • copies non-tensor, non-table objects (Lua numbers, strings and functions)
  • reuses previously created tensors/tables
  • adds tests for Identity

If we want to go down the path of avoiding Lua assignments everywhere (as in several Containers, for example), then maybe we should factor the identity function out and use it everywhere instead of =. In that case it might be better to rename it to assign or something similar and put it in nn.utils.
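For illustration, a minimal sketch of what such a factored-out function could look like (recursiveSet is a hypothetical name, not this PR's actual code):

```lua
-- Hypothetical sketch of a factored-out "assign": share storage via :set
-- instead of rebinding the Lua reference. Not the PR's actual code.
local function recursiveSet(out, input)
   if torch.isTensor(input) then
      -- reuse the previous tensor if possible, otherwise create one
      out = torch.isTensor(out) and out or input.new()
      out:set(input)                 -- share storage; the reference to out stays stable
   elseif type(input) == 'table' then
      out = type(out) == 'table' and out or {}
      for k, v in pairs(input) do
         out[k] = recursiveSet(out[k], v)
      end
      -- clean up entries no longer present in input
      for k in pairs(out) do
         if input[k] == nil then out[k] = nil end
      end
   else
      out = input                    -- numbers, strings and functions are plain-copied
   end
   return out
end
```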

@jfsantos @soumith, as you have worked on this in the past.
@dominikgrewe @koraykv, to check that I didn't break anything in nngraph and in your internal tests.
@szagoruyko, for the removal of clearState.

-- clean up behind if the current input is
-- smaller than the previous one
for k in pairs(out) do
   if not input[k] then
      out[k] = nil  -- body implied by the discussion below
   end
end

davidsaxton (Contributor) commented on the diff:

This should be "if input[k] == nil", since we do not want to remove "false" entries from out.
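Concretely, `not x` is true for both nil and false in Lua, so the two checks differ exactly on stored booleans:

```lua
-- `not input[k]` matches both absent keys and stored false values;
-- `input[k] == nil` only matches absence.
local input = { a = false }
print(not input.a)      --> true   (a false entry would be wrongly removed)
print(input.a == nil)   --> false  (the entry is correctly kept)
```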

fmassa (Contributor, Author) replied:

Indeed, I should fix it! Thanks for pointing this out!

@fmassa (Contributor, Author) commented Mar 1, 2016

@davidsaxton I've fixed the false case, and added the corresponding tests for it.

@fmassa (Contributor, Author) commented Mar 4, 2016

After giving it a second thought, I think we might not want to use :set for the Containers.

I think what is necessary is to ensure that all leaf modules (non-container modules) have their own self.output and self.gradInput tensors. @szagoruyko has a PR that complements this one for Dropout, AddConstant, MulConstant and Copy.

I think that in this setup clearState will still work properly (even after we remove some hacks here and there), and it will also allow for simpler parsing of the nn network as a graph (say, for memory optimization). Indeed, in this setup every leaf node (corresponding to a non-container) will have its own tensor, and we can run over the network looking for unique output/gradInput tensors, without redundant new tensors coming from containers.

To summarize, I propose to keep all the containers as they are, and only change the leaf modules to avoid Lua assignment.
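For instance, a rough sketch of such a graph pass (collectLeafOutputs is a hypothetical helper, assuming every leaf owns its self.output):

```lua
-- Hypothetical sketch: walk a network and collect the unique output
-- tensors of its leaf (non-container) modules.
local function collectLeafOutputs(module, seen)
   seen = seen or {}
   if module.modules then                    -- containers expose a .modules list
      for _, m in ipairs(module.modules) do
         collectLeafOutputs(m, seen)
      end
   elseif torch.isTensor(module.output) then
      seen[torch.pointer(module.output)] = module.output
   end
   return seen
end
```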

@apaszke (Contributor) commented Mar 10, 2016

@fmassa Awesome! I've been fighting with that Identity behaviour for a while. It will finally be consistent with other modules.

@apaszke (Contributor) commented Mar 10, 2016

@fmassa The only downside of not using :set for Containers is that it leaves them inconsistent. For example, nn.Concat owns its output but nn.Sequential doesn't, so you still have to write additional logic around them.
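A quick check illustrating this (based on how stock nn.Sequential and nn.Concat build their outputs):

```lua
require 'nn'

-- Sequential just aliases the last child's output ...
local seq = nn.Sequential():add(nn.Linear(2, 3))
seq:forward(torch.randn(2))
print(seq.output == seq.modules[1].output)   --> true  (shared reference)

-- ... while Concat resizes and fills its own output tensor.
local cat = nn.Concat(1):add(nn.Linear(2, 3)):add(nn.Linear(2, 3))
cat:forward(torch.randn(2))
print(cat.output == cat.modules[1].output)   --> false (owns its output)
```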

@fmassa (Contributor, Author) commented Mar 10, 2016

@apaszke That was my initial thought too. But I'm not sure we need Sequential (and ParallelTable and ConcatTable, among others) to have their own output tensors.

If you look at the network as a computation graph, where each node performs some operation (and owns its output tensor) and an edge corresponds to taking the output of node A and passing it as input to node B, then a Sequential simply links nodes (it doesn't actually perform any computation), so it shouldn't own its output tensor (which would mean a spurious tensor floating around in the graph).

On the other hand, you can see a Concat as a ConcatTable + JoinTable, and since a JoinTable does perform an operation (and thus should own its output tensor), the same should apply to Concat (i.e., it should also own its output tensor).
In the same spirit, you can see a Parallel as a SplitTable + ParallelTable + JoinTable, so it should also own its output.
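Roughly, in code (a sketch; concatAsGraph is a hypothetical helper, not an exact equivalence of every option Concat supports):

```lua
-- Sketch: nn.Concat(dim) behaves like a ConcatTable feeding a JoinTable.
local function concatAsGraph(dim, branches)
   local ct = nn.ConcatTable()
   for _, b in ipairs(branches) do
      ct:add(b)
   end
   return nn.Sequential():add(ct):add(nn.JoinTable(dim)) -- JoinTable owns the output
end

-- e.g. concatAsGraph(1, { nn.Linear(2, 3), nn.Linear(2, 3) })
-- mirrors nn.Concat(1):add(nn.Linear(2, 3)):add(nn.Linear(2, 3))
```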

@apaszke (Contributor) commented Mar 11, 2016

@fmassa Well, Identity doesn't perform any operation either 😜 You could see a Sequential as a container ending with an Identity, so in a way it might also perform an operation that owns a tensor.

But I understand the point of view that some of the containers only serve computation-graph construction, while others perform "true" operations as well. However, in this setting Identity can be seen as one of the construction modules too: it represents a single graph edge (one that might not be connected yet, e.g. in nngraph). I'm not sure it's a good idea to change it, then.

@fmassa (Contributor, Author) commented Mar 16, 2016

@apaszke after thinking about what you said, I think that we might not want to change the behaviour of Identity as proposed by this PR.

What we could possibly enforce is that all modules that perform some operation own their output/gradInput, and that those tensor references do not change over time (except if the computation graph itself changes).
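That invariant could be checked with something like this (hypothetical sketch):

```lua
-- Hypothetical invariant check: a computing module should keep the same
-- self.output reference across forward calls (only the storage may change).
local function hasStableOutput(module, input)
   module:forward(input)
   local ref = module.output
   module:forward(input)
   return module.output == ref     -- same Lua reference both times
end
```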

Indeed, nn.Identity is a "graph-constructor" module that performs no operation other than linking two (or more) nodes, so it's probably better to leave it as it is.

With that in mind, we can easily build clean graphs from nn networks if the rule above is enforced.

If you all agree, I can close this PR.

@apaszke (Contributor) commented Mar 17, 2016

@fmassa OK, I'm fine with going that way. However, it would be worth documenting it somewhere. When I wrote code for sharing/clearing outputs/gradInputs, it was a real pain (this was before :clearState was introduced).

Wow, that package looks very cool! It's awesome that it shares internal buffers.

@fmassa (Contributor, Author) commented Aug 8, 2016

I no longer think this PR is a good idea. Closing it.

@fmassa closed this Aug 8, 2016