run_tests.sh fails at test_view_copy_xla #3795

@miladm

Description

🐛 Bug

I started to observe the following error when calling ./run_tests.sh locally. Has anyone else seen it? This is happening on the latest master of PyTorch and PyTorch/XLA. The stack trace goes through autograd, the engine, and expand, which makes me wonder whether it is related to a recent upstream change for dynamic shape support.

This issue is not showing up in CI tests. Here are the flags I had enabled when running the tests:

export COMPILE_PARALLEL=1 DEBUG=0 XLA_IR_DEBUG=1 XLA_HLO_DEBUG=1 TF_CPP_LOG_THREAD_ID=1

Error:

test_view_copy_xla (__main__.TestViewOpsXLA) ... *** Received signal 11 ***

...

*** Begin stack trace ***
    tensorflow::CurrentStackTrace[abi:cxx11]()


    torch::lazy::GetPythonFrames()

    torch::lazy::GetMetaDataIfDebugging()
    torch::lazy::Node::Node(torch::lazy::OpKind, c10::ArrayRef<torch::lazy::Value>, std::vector<torch::lazy::Shape, std::allocator<torch::lazy::Shape> >&&, unsigned long)
    torch_xla::XlaNode::XlaNode(torch::lazy::OpKind, c10::ArrayRef<torch::lazy::Value>, std::vector<torch::lazy::Shape, std::allocator<torch::lazy::Shape> >&&, xla::Shape, unsigned long, torch::lazy::hash_t)
    torch_xla::XlaNode::XlaNode(torch::lazy::OpKind, c10::ArrayRef<torch::lazy::Value>, xla::Shape, unsigned long, torch::lazy::hash_t)
    torch_xla::XlaNode::XlaNode(torch::lazy::OpKind, c10::ArrayRef<torch::lazy::Value>, std::function<xla::Shape ()> const&, unsigned long, torch::lazy::hash_t)
    torch_xla::Expand::Expand(torch::lazy::Value const&, std::vector<long, std::allocator<long> >)
    torch_xla::XLATensor::expand(c10::intrusive_ptr<torch_xla::XLATensor, c10::detail::intrusive_target_default_null_type<torch_xla::XLATensor> > const&, std::vector<long, std::allocator<long> >)
    torch_xla::XLANativeFunctions::expand(at::Tensor const&, c10::ArrayRef<long>, bool)

    at::_ops::expand::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, bool)

    at::_ops::expand::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, bool)

    at::_ops::expand::call(at::Tensor const&, c10::ArrayRef<long>, bool)
    torch::autograd::generated::SumBackward0::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&)

    torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&)
    torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&)
    torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool)
    torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool)


    clone
*** End stack trace ***

To Reproduce

Run ./run_tests.sh locally on the latest PyTorch and PyTorch/XLA `master`. A narrower sketch of what the failing test likely exercises follows below.
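
Based on the trace above, the crash appears to happen when SumBackward0 redispatches expand() to the XLA backend during backward. Here is a hypothetical minimal repro along those lines (an assumption on my part, not the actual test body of test_view_copy_xla):

    # Hypothetical minimal repro, assuming the crash is the expand() that
    # SumBackward0 redispatches to XLA during backward (per the trace above).
    import torch
    import torch_xla.core.xla_model as xm

    device = xm.xla_device()
    x = torch.randn(4, 4, device=device, requires_grad=True)
    y = x.view(16)        # view op, as exercised by test_view_copy_xla
    y.sum().backward()    # SumBackward0 expands the grad back to x's shape
    print(x.grad)

If this sketch reproduces the signal 11, it would point at XlaNode construction under XLA_IR_DEBUG/XLA_HLO_DEBUG (GetPythonFrames in the trace) rather than at the test itself.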

CC @Krovatkin @JackCaoG @yeounoh
