Debug TorchScript error from moco #94

Closed
anijain2305 opened this issue Mar 23, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@anijain2305
Contributor

Repro - python torchbench.py --training --devices=cuda --accuracy-ts --only=moco

This one has a DistributedDataParallel module, so it might be something we can table for now.

The error is pretty long; the important section is as follows:

	First diverging operator:
	Node diff:
		- %mod : __torch__.torch.nn.parallel.distributed.DistributedDataParallel = prim::GetAttr[name="mod"](%self.1)
		+ %mod : __torch__.torch.nn.parallel.distributed.___torch_mangle_596.DistributedDataParallel = prim::GetAttr[name="mod"](%self.1)
		?                                                ++++++++++++++++++++
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
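
In the node diff above, the only visible change is the `___torch_mangle_596.` qualifier on the module type. One way to check whether the rest of a long graph diff is also just name mangling is to strip that qualifier from both dumps before comparing them. The following is a minimal sketch using only the Python standard library. The helper names `strip_mangling` and `diff_ignoring_mangling` are hypothetical, and the regex is an assumption based solely on the mangled name shown in this log. It does not address the separate "Tensor-valued Constant nodes differed" message.

```python
import difflib
import re

# Assumed shape of the mangling qualifier, taken from the log above
# (e.g. "___torch_mangle_596."). Other forms are not handled.
MANGLE_RE = re.compile(r"___torch_mangle_\d+\.")


def strip_mangling(text: str) -> str:
    """Remove name-mangling qualifiers from a graph dump (hypothetical helper)."""
    return MANGLE_RE.sub("", text)


def diff_ignoring_mangling(first: str, second: str) -> list:
    """Return the ndiff lines that still differ once mangling is stripped."""
    a = strip_mangling(first).splitlines()
    b = strip_mangling(second).splitlines()
    # ndiff prefixes unchanged lines with two spaces; keep everything else.
    return [line for line in difflib.ndiff(a, b) if not line.startswith("  ")]


# The two node strings from the "Node diff" section of the error.
node_a = (
    "%mod : __torch__.torch.nn.parallel.distributed.DistributedDataParallel"
    ' = prim::GetAttr[name="mod"](%self.1)'
)
node_b = (
    "%mod : __torch__.torch.nn.parallel.distributed."
    "___torch_mangle_596.DistributedDataParallel"
    ' = prim::GetAttr[name="mod"](%self.1)'
)

remaining = diff_ignoring_mangling(node_a, node_b)
if remaining:
    print("\n".join(remaining))
else:
    print("only name mangling differs")
```

If the full graph dumps from both invocations still differ after this normalization, the remaining lines are the ones worth looking at.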

@anijain2305
Contributor Author

anijain2305 commented Mar 23, 2022

cc @eellison

@eellison
Contributor

I don't think traced graphs are guaranteed to be consistent in general... Not sure this is a TorchScript error.

@chekangliang chekangliang added the bug Something isn't working label May 14, 2022