functorch doesn't work in debug mode #465

Closed

zou3519 opened this issue Feb 7, 2022 · 11 comments

@zou3519
Contributor

zou3519 commented Feb 7, 2022

It's that autograd debug assert that we run into often:

import torch
from functorch import make_fx
from functorch.compile import nnc_jit


def f(x, y):
    return torch.broadcast_tensors(x, y)


inp1 = torch.rand(())
inp2 = torch.rand(3)

print(f(inp1, inp2))  # without NNC compilation, everything works fine

print(make_fx(f)(inp1, inp2))  # fails
print(nnc_jit(f)(inp1, inp2))  # fails too
# RuntimeError: self__storage_saved.value().is_alias_of(result.storage())
# INTERNAL ASSERT FAILED at "autograd/generated/VariableType_3.cpp":3899,
# please report a bug to PyTorch.
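
For context, the assert checks that the op's output still aliases the input's storage, i.e. that things that should be views really are views. A minimal sketch of the aliasing that holds in a plain eager run, with no functorch in the picture (the data_ptr comparison is just an illustration):

import torch

a = torch.rand(())
b = torch.rand(3)

# broadcast_tensors returns expanded views, so each output shares storage
# with its corresponding input; that aliasing is what the debug-mode assert
# verifies after the op returns.
out_a, out_b = torch.broadcast_tensors(a, b)
print(out_a.data_ptr() == a.data_ptr())  # True: out_a is a view of a
print(out_b.data_ptr() == b.data_ptr())  # True: out_b is a view of b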

cc @albanD @soulitzer: what's the chance we can add an option to turn these asserts off? They've been more harmful (e.g. they prevent us from debugging in debug mode) than useful for us.

@albanD
Contributor

albanD commented Feb 8, 2022

Why do they fail though?
Things that should be views should also be views in the context of functorch.

@zou3519
Contributor Author

zou3519 commented Feb 9, 2022

@albanD It fails in AOTAutograd. AOTAutograd uses __torch_dispatch__, and there is no existing easy way for __torch_dispatch__ to set the alias relationship on its outputs. Because that mechanism doesn't exist yet, most people who use __torch_dispatch__ with wrapper tensors in debug mode will run into this assert, right?
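
To make the failure mode concrete, here is a rough sketch of the kind of wrapper subclass involved; WrapperTensor is hypothetical and only for illustration, not the actual AOTAutograd class, and tree_map is a private pytree helper. Every output coming out of __torch_dispatch__ is a fresh wrapper, and nothing lets it declare that it aliases the input wrapper's storage, which is what the debug assert wants to see.

import torch
from torch.utils._pytree import tree_map


class WrapperTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, elem):
        # The wrapper itself carries no real data; the data lives in elem.
        return torch.Tensor._make_wrapper_subclass(
            cls, elem.size(), dtype=elem.dtype, device=elem.device)

    def __init__(self, elem):
        self.elem = elem

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        def unwrap(t):
            return t.elem if isinstance(t, WrapperTensor) else t

        def wrap(t):
            return WrapperTensor(t) if isinstance(t, torch.Tensor) else t

        out = func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs or {}))
        # Even for view ops, the result is a brand-new wrapper; nothing here
        # records that it should alias the input wrapper's storage.
        return tree_map(wrap, out)


x = WrapperTensor(torch.rand(3))
y = x.view(3)  # works in a release build; a debug build can trip the assert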

@albanD
Contributor

albanD commented Feb 9, 2022

I don't think we run any debug build anywhere in CI? EDIT: after checking, there is actually a CI debug build that runs all the tests.
I also stopped building debug locally, as 32GB of RAM are not enough to make one unless you set a super low MAX_JOBS count.
So I don't know tbh.

@gmagogsfm

Hit this issue when using functorch. I regularly develop in DEBUG mode because I often need to step through compiler code, so it would be great if we could fix this.

@soulitzer
Contributor

Maybe we could temporarily update the self.has_storage() check to self.has_storage() && self.storage().data() != nullptr, if we can assume that is the case for all tensors created by make_wrapper_subclass. Then, once we figure out a way to set the alias relationship, revert this change.

@zou3519
Contributor Author

zou3519 commented Feb 17, 2022

@soulitzer another way to go about it is to check self for the Python dispatch key -- if it has the Python dispatch key then we temporarily don't do the check. Is there an issue on the pytorch side for fixing the alias relationship for Tensor Subclasses?
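
For reference, a rough way to probe that condition from Python, using private introspection helpers; torch._C._dispatch_keys and the DispatchKey enum usage are assumptions here (private APIs), and the real change would live in the generated C++ autograd code, not in Python:

import torch

def has_python_key(t):
    # Assumption: _dispatch_keys returns the tensor's DispatchKeySet and
    # DispatchKeySet.has accepts a torch._C.DispatchKey value.
    return torch._C._dispatch_keys(t).has(torch._C.DispatchKey.Python)

print(has_python_key(torch.rand(3)))  # False for a plain dense tensor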

@soulitzer
Contributor

Is there an issue on the pytorch side for fixing the alias relationship for Tensor Subclasses?

This one seems related: pytorch/pytorch#65339

@gmagogsfm

@zhxchen17

@albanD
Contributor

albanD commented Feb 17, 2022

Quick points following some offline discussion with Jeffrey:

  • We don't want to disable these checks in general; they have a good reason for existing.
  • We need to improve our view story for subclasses, as discussed in the issue linked above. There are quite a few possible approaches that need to be explored. We should dive into them for sure.
  • If there are other DEBUG tests that you want to run for functorch, then I think it is OK to have a way for you to disable these checks. But they should run by default on DEBUG builds unless specified otherwise.

ezyang added a commit that referenced this issue Mar 2, 2022
Instead of saying that a PythonTensor has a regular (e.g., CPU) tensor
and an FX proxy, a PythonTensor *is a* regular CPU tensor, that also
carries an FX proxy (that updates as we go along).

This should fix #465 and
it also fixed some expected failures in the test suite.

Signed-off-by: Edward Z. Yang <[email protected]>
@ezyang ezyang closed this as completed in e7444f9 Mar 3, 2022
@Chillee Chillee reopened this Mar 3, 2022
@Chillee
Contributor

Chillee commented Mar 3, 2022

Ed's PR resolves this for AOTAutograd, but not for vmap/grad more generally from my understanding.

@zou3519
Contributor Author

zou3519 commented Mar 3, 2022

vmap/grad don't hit the asserts, I think. But BatchedTensor and GradTensor do not have storage, which leads to some other fun things...
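
A quick sketch of the "no storage" point, assuming (per the comment above) that asking a BatchedTensor for its storage raises inside vmap:

import torch
from functorch import vmap

def f(x):
    # Inside vmap, x is a BatchedTensor wrapper; the wrapper itself has no
    # storage, so accessing it is expected to raise.
    try:
        x.storage()
    except RuntimeError as e:
        print("BatchedTensor has no accessible storage:", e)
    return x.sum()

vmap(f)(torch.rand(3, 2))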

@zou3519 zou3519 closed this as completed Jun 23, 2022
zou3519 pushed a commit to zou3519/pytorch that referenced this issue Jul 20, 2022
…h/functorch#554)

* Don't unnecessarily wrap the elem in PythonTensor

Instead of saying that a PythonTensor has a regular (e.g., CPU) tensor
and an FX proxy, a PythonTensor *is a* regular CPU tensor, that also
carries an FX proxy (that updates as we go along).

This should fix pytorch/functorch#465 and
it also fixed some expected failures in the test suite.

This kills the meta variant logic entirely; maybe some other time we'll
try to bring it back.

Signed-off-by: Edward Z. Yang <[email protected]>
bigfootjon pushed a commit to pytorch/pytorch that referenced this issue Jul 21, 2022