While running TestExpandSymInt (ref PR), I ran into the following error. The error suggests that is_symbolic() returns an unexpected value when the toSymbolicIntNode() method is called, which points at the upstream API call needing investigation (see the sketch after the log below). @Gamrix wdyt?
(base) $ source xlaCppTest.sh ExpandSymInt
Note: Google Test filter = AtenXlaTensorTest.TestExpandSymInt
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from AtenXlaTensorTest
[ RUN ] AtenXlaTensorTest.TestExpandSymInt
2022-07-04 06:27:20.555229: I 178569 tensorflow/core/tpu/tpu_initializer_helper.cc:253] Libtpu path is: libtpu.so
2022-07-04 06:27:20.555808: I 178569 tensorflow/compiler/xla/xla_client/xrt_local_service.cc:55] libtpu status: OK
2022-07-04 06:27:20.555859: I 178569 tensorflow/compiler/xla/xla_client/xrt_local_service.cc:41] Peer localservice 1 {localhost:40934}
2022-07-04 06:27:20.556054: I 178569 tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-04 06:27:20.573048: W 178569 tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2022-07-04 06:27:20.573096: W 178569 tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-07-04 06:27:20.573133: I 178569 tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (bc9148e3e599): /proc/driver/nvidia/version does not exist
2022-07-04 06:27:20.620601: I 178569 tensorflow/compiler/xla/service/service.cc:174] XLA service 0x17ad290 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-07-04 06:27:20.620662: I 178569 tensorflow/compiler/xla/service/service.cc:182] StreamExecutor device (0): Host, Default Version
2022-07-04 06:27:20.672979: I 178569 tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:272] Initialize GrpcChannelCache for job localservice -> {0 -> localhost:40934}
2022-07-04 06:27:20.674373: I 178569 tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:438] Started server with target: grpc://localhost:40934
2022-07-04 06:27:20.766387: I 179241 tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
2022-07-04 06:27:20.776891: I 178865 tensorflow/compiler/jit/xla_device.cc:429] XLA_GPU and XLA_CPU devices are deprecated and will be removed in subsequent releases. Instead, use either @tf.function(jit_compile=True) for must-compile semantics, or run with TF_XLA_FLAGS=--tf_xla_auto_jit=2 for auto-clustering best-effort compilation.
unknown file: Failure
C++ exception with description "Expected is_symbolic() to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
Exception raised from toSymbolicIntNode at /workspace/pytorch/c10/core/SymInt.cpp:9 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x7d (0x7f048e8bd01d in /workspace/pytorch/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0xdd (0x7f048e8bb84d in /workspace/pytorch/torch/lib/libc10.so)
frame #2: <unknown function> + 0x1dd41 (0x7f048e8a5d41 in /workspace/pytorch/torch/lib/libc10.so)
frame #3: torch_xla::SymIntElements::SetSymIntNodeElements(c10::SymInt&) + 0x1b (0x7f048e6f9f8b in /workspace/pytorch/xla/build/lib.linux-x86_64-3.7/libptxla.so)
frame #4: torch_xla::SymIntElements::SymIntElements(c10::SymIntArrayRef&) + 0x9c (0x7f048e29399c in /workspace/pytorch/xla/build/lib.linux-x86_64-3.7/libptxla.so)
frame #5: torch_xla::XLANativeFunctions::expand_symint(at::Tensor const&, c10::SymIntArrayRef, bool) + 0x5a (0x7f048e24aaca in /workspace/pytorch/xla/build/lib.linux-x86_64-3.7/libptxla.so)
frame #6: <unknown function> + 0x24da98 (0x7f048e31ca98 in /workspace/pytorch/xla/build/lib.linux-x86_64-3.7/libptxla.so)
frame #7: at::_ops::expand_SymInt::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::SymIntArrayRef, bool) + 0x80 (0x7f0463f6edf0 in /workspace/pytorch/torch/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0x3b4be4a (0x7f04660e2e4a in /workspace/pytorch/torch/lib/libtorch_cpu.so)
frame #9: at::_ops::expand_SymInt::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::SymIntArrayRef, bool) + 0x80 (0x7f0463f6edf0 in /workspace/pytorch/torch/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x324d68b (0x7f04657e468b in /workspace/pytorch/torch/lib/libtorch_cpu.so)
frame #11: at::_ops::expand_SymInt::call(at::Tensor const&, c10::SymIntArrayRef, bool) + 0x156 (0x7f0463f6eac6 in /workspace/pytorch/torch/lib/libtorch_cpu.so)
frame #12: /workspace/pytorch/xla/test/cpp/build/test_ptxla() [0x6bb281]
frame #13: torch_xla::cpp_test::ForEachDevice(absl::lts_20211102::Span<torch_xla::DeviceType const>, std::function<void (c10::Device const&)> const&) + 0x140 (0x592110 in /workspace/pytorch/xla/test/cpp/build/test_ptxla)
frame #14: torch_xla::cpp_test::AtenXlaTensorTest_TestExpandSymInt_Test::TestBody() + 0xf7 (0x607507 in /workspace/pytorch/xla/test/cpp/build/test_ptxla)
frame #15: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x7e (0x7bee5e in /workspace/pytorch/xla/test/cpp/build/test_ptxla)
frame #16: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x7b (0x7a37db in /workspace/pytorch/xla/test/cpp/build/test_ptxla)
frame #17: testing::Test::Run() + 0xd9 (0x77d3e9 in /workspace/pytorch/xla/test/cpp/build/test_ptxla)
frame #18: testing::TestInfo::Run() + 0x10d (0x77e19d in /workspace/pytorch/xla/test/cpp/build/test_ptxla)
frame #19: testing::TestSuite::Run() + 0x110 (0x77ea00 in /workspace/pytorch/xla/test/cpp/build/test_ptxla)
frame #20: testing::internal::UnitTestImpl::RunAllTests() + 0x473 (0x78f6c3 in /workspace/pytorch/xla/test/cpp/build/test_ptxla)
frame #21: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 0x7e (0x7c2a6e in /workspace/pytorch/xla/test/cpp/build/test_ptxla)
frame #22: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 0x7b (0x7a616b in /workspace/pytorch/xla/test/cpp/build/test_ptxla)
frame #23: testing::UnitTest::Run() + 0xd4 (0x78f204 in /workspace/pytorch/xla/test/cpp/build/test_ptxla)
frame #24: main + 0x1c (0x58fc6c in /workspace/pytorch/xla/test/cpp/build/test_ptxla)
frame #25: __libc_start_main + 0xe7 (0x7f0461c0ac87 in /lib/x86_64-linux-gnu/libc.so.6)
frame #26: _start + 0x2a (0x58fb8a in /workspace/pytorch/xla/test/cpp/build/test_ptxla)
" thrown in the test body.
[ FAILED ] AtenXlaTensorTest.TestExpandSymInt (245 ms)
[----------] 1 test from AtenXlaTensorTest (245 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (245 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] AtenXlaTensorTest.TestExpandSymInt
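The stack trace shows the failing call coming from torch_xla::SymIntElements::SetSymIntNodeElements, which reaches c10::SymInt::toSymbolicIntNode() for a SymInt whose is_symbolic() is false. For reference, here is a minimal sketch (not the actual torch_xla code) of the kind of guard that would avoid tripping that check; it assumes only the is_symbolic() / toSymbolicIntNode() API named in the error message, and the concrete-int branch is left as a placeholder:

```cpp
#include <vector>
#include <c10/core/SymInt.h>

// Sketch only: walk the requested sizes and call toSymbolicIntNode() only on
// entries that actually carry a symbolic node, so concrete sizes never hit the
// "Expected is_symbolic() to be true" check seen in the log above.
void CollectSymbolicNodes(std::vector<c10::SymInt>& sizes) {
  for (c10::SymInt& size : sizes) {
    if (size.is_symbolic()) {
      auto node = size.toSymbolicIntNode();  // safe: guarded by is_symbolic()
      // ... hand `node` to the XLA-side dynamic-shape machinery ...
      (void)node;
    } else {
      // Concrete dimension: there is no SymbolicIntNode to extract; it should
      // be recorded as a static size instead (accessor for the plain integer
      // value omitted here).
    }
  }
}
```

Whether the fix belongs here or in the upstream SymInt API (i.e. why the test reaches this path with non-symbolic SymInts at all) is the open question.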
FWIW, TestExpand runs successfully as expected.
CC @Krovatkin