functorch doesn't work in debug mode #465

Closed

zou3519 opened this issue Feb 7, 2022 · 11 comments

@zou3519
Contributor

zou3519 commented Feb 7, 2022

It's that autograd debug assert that we run into often:

import torch
from functorch import make_fx
from functorch.compile import nnc_jit


def f(x, y):
    return torch.broadcast_tensors(x, y)


inp1 = torch.rand(())
inp2 = torch.rand(3)

print(f(inp1, inp2))  # without NNC compilation, everything works fine

print(make_fx(f)(inp1, inp2))  # fails
print(nnc_jit(f)(inp1, inp2))  # fails too
# RuntimeError: self__storage_saved.value().is_alias_of(result.storage())
# INTERNAL ASSERT FAILED at "autograd/generated/VariableType_3.cpp":3899,
# please report a bug to PyTorch.
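
For context, the assert checks that the op's output still aliases the input's storage, i.e. that things that should be views really are views. A minimal sketch of the aliasing that holds in a plain eager run, with no functorch in the picture (the data_ptr comparison is just an illustration):

import torch

a = torch.rand(())
b = torch.rand(3)

# broadcast_tensors returns expanded views, so each output shares storage
# with its corresponding input; that aliasing is what the debug-mode assert
# verifies after the op returns.
out_a, out_b = torch.broadcast_tensors(a, b)
print(out_a.data_ptr() == a.data_ptr())  # True: out_a is a view of a
print(out_b.data_ptr() == b.data_ptr())  # True: out_b is a view of b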

cc @albanD @soulitzer: what's the chance we can add an option to turn these asserts off? They've been more harmful (e.g. they prevent us from debugging in debug mode) than useful for us.

@albanD
Contributor

albanD commented Feb 8, 2022

Why do they fail though?
Things that should be views should also be views in the context of functorch.

@zou3519
Contributor Author

zou3519 commented Feb 9, 2022

@albanD It fails in AOTAutograd. AOTAutograd uses __torch_dispatch__, and there is no existing easy way for __torch_dispatch__ to set the alias relationship on its outputs. Because that mechanism doesn't exist yet, most people who use __torch_dispatch__ with wrapper tensors in debug mode will run into this assert, right?
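
To make the failure mode concrete, here is a rough sketch of the kind of wrapper subclass involved; WrapperTensor is hypothetical and only for illustration, not the actual AOTAutograd class, and tree_map is a private pytree helper. Every output coming out of __torch_dispatch__ is a fresh wrapper, and nothing lets it declare that it aliases the input wrapper's storage, which is what the debug assert wants to see.

import torch
from torch.utils._pytree import tree_map


class WrapperTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, elem):
        # The wrapper itself carries no real data; the data lives in elem.
        return torch.Tensor._make_wrapper_subclass(
            cls, elem.size(), dtype=elem.dtype, device=elem.device)

    def __init__(self, elem):
        self.elem = elem

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        def unwrap(t):
            return t.elem if isinstance(t, WrapperTensor) else t

        def wrap(t):
            return WrapperTensor(t) if isinstance(t, torch.Tensor) else t

        out = func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs or {}))
        # Even for view ops, the result is a brand-new wrapper; nothing here
        # records that it should alias the input wrapper's storage.
        return tree_map(wrap, out)


x = WrapperTensor(torch.rand(3))
y = x.view(3)  # works in a release build; a debug build can trip the assert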

@albanD
Contributor

albanD commented Feb 9, 2022

I don't think we run any debug build anywhere in CI? EDIT: after checking, there is actually a CI debug build that runs all the tests.
I also stopped building debug locally, as 32GB of RAM are not enough to make one unless you set a super low MAX_JOBS count.
So I don't know tbh.

@gmagogsfm

Hit this issue when using functorch. I regularly develop in DEBUG mode because I often need to step through compiler code, so it would be great if we could fix this.

@soulitzer
Contributor

Maybe we could temporarily update the self.has_storage() check to self.has_storage() && self.storage().data() != nullptr, if we can assume that is the case for all tensors created by make_wrapper_subclass. Then, once we figure out a way to set the alias relationship, revert this change.

@zou3519
Contributor Author

zou3519 commented Feb 17, 2022

@soulitzer another way to go about it is to check self for the Python dispatch key -- if it has the Python dispatch key then we temporarily don't do the check. Is there an issue on the pytorch side for fixing the alias relationship for Tensor Subclasses?
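
For reference, a rough way to probe that condition from Python, using private introspection helpers; torch._C._dispatch_keys and the DispatchKey enum usage are assumptions here (private APIs), and the real change would live in the generated C++ autograd code, not in Python:

import torch

def has_python_key(t):
    # Assumption: _dispatch_keys returns the tensor's DispatchKeySet and
    # DispatchKeySet.has accepts a torch._C.DispatchKey value.
    return torch._C._dispatch_keys(t).has(torch._C.DispatchKey.Python)

print(has_python_key(torch.rand(3)))  # False for a plain dense tensor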

@soulitzer
Contributor

Is there an issue on the pytorch side for fixing the alias relationship for Tensor Subclasses?

This one seems related: pytorch/pytorch#65339

@gmagogsfm

@zhxchen17

@albanD
Contributor

albanD commented Feb 17, 2022

Quick points following some offline discussion with Jeffrey:

  • We don't want to disable these checks in general; they have a good reason for existing.
  • We need to improve our view story for subclasses, as discussed in the issue linked above. There are quite a few possible approaches that need to be explored. We should dive into them for sure.
  • If there are other DEBUG tests that you want to run for functorch, then I think it is OK to have a way for you to disable these checks. But they should run by default on DEBUG builds unless specified otherwise.

ezyang added a commit that referenced this issue Mar 2, 2022
Instead of saying that a PythonTensor has a regular (e.g., CPU) tensor
and an FX proxy, a PythonTensor *is a* regular CPU tensor, that also
carries an FX proxy (that updates as we go along).

This should fix #465 and
it also fixed some expected failures in the test suite.

Signed-off-by: Edward Z. Yang <[email protected]>
@ezyang ezyang closed this as completed in e7444f9 Mar 3, 2022
@Chillee Chillee reopened this Mar 3, 2022
@Chillee
Contributor

Chillee commented Mar 3, 2022

Ed's PR resolves this for AOTAutograd, but not for vmap/grad more generally from my understanding.

@zou3519
Contributor Author

zou3519 commented Mar 3, 2022

vmap/grad don't hit the asserts, I think. But BatchedTensor and GradTensor do not have storage, which leads to some other fun things...
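
A quick sketch of the "no storage" point, assuming (per the comment above) that asking a BatchedTensor for its storage raises inside vmap:

import torch
from functorch import vmap

def f(x):
    # Inside vmap, x is a BatchedTensor wrapper; the wrapper itself has no
    # storage, so accessing it is expected to raise.
    try:
        x.storage()
    except RuntimeError as e:
        print("BatchedTensor has no accessible storage:", e)
    return x.sum()

vmap(f)(torch.rand(3, 2))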

@zou3519 zou3519 closed this as completed Jun 23, 2022
zou3519 pushed a commit to zou3519/pytorch that referenced this issue Jul 20, 2022
…h/functorch#554)

* Don't unnecessarily wrap the elem in PythonTensor

Instead of saying that a PythonTensor has a regular (e.g., CPU) tensor
and an FX proxy, a PythonTensor *is a* regular CPU tensor, that also
carries an FX proxy (that updates as we go along).

This should fix pytorch/functorch#465 and
it also fixed some expected failures in the test suite.

This kills the meta variant logic entirely; maybe some other time we'll
try to bring it back.

Signed-off-by: Edward Z. Yang <[email protected]>
bigfootjon pushed a commit to pytorch/pytorch that referenced this issue Jul 21, 2022