Closed
@lantiga just confirmed that this issue was introduced by the batching PR (8752589) and that it happens deterministically. I have not yet been able to make this crash happen with less load and using RLTest.
gdb backtrace showing the crash in RAI_ModelRunTF -> TF_DeleteSession:
Thread 2 "redis-server" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f3864bf2700 (LWP 4669)]
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007f385813f2c3 in tensorflow::OpSegment::ShouldOwnKernel(tensorflow::FunctionLibraryRuntime*, std::string const&) ()
from /home/filipe/redislabs/RedisAI/./bin/linux-x64-debug/install-cpu/backends/redisai_tensorflow/lib/libtensorflow_framework.so.1
#2 0x00007f385f764ab6 in std::_Function_handler<void (tensorflow::OpKernel*), tensorflow::DirectSession::CreateExecutors(tensorflow::CallableOptions const&, std::unique_ptr<tensorflow::DirectSession::ExecutorsAndKeys, std::default_delete<tensorflow::DirectSession::ExecutorsAndKeys> >*, std::unique_ptr<tensorflow::DirectSession::FunctionInfo, std::default_delete<tensorflow::DirectSession::FunctionInfo> >*, tensorflow::DirectSession::RunStateArgs*)::{lambda(tensorflow::OpKernel*)#2}>::_M_invoke(std::_Any_data const&, tensorflow::OpKernel*) ()
from /home/filipe/redislabs/RedisAI/./bin/linux-x64-debug/install-cpu/backends/redisai_tensorflow/lib/libtensorflow.so.1
#3 0x00007f3858382d67 in tensorflow::(anonymous namespace)::ExecutorImpl::~ExecutorImpl() ()
from /home/filipe/redislabs/RedisAI/./bin/linux-x64-debug/install-cpu/backends/redisai_tensorflow/lib/libtensorflow_framework.so.1
#4 0x00007f385f777ebf in std::_Sp_counted_deleter<tensorflow::DirectSession::ExecutorsAndKeys*, std::default_delete<tensorflow::DirectSession::ExecutorsAndKeys>, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /home/filipe/redislabs/RedisAI/./bin/linux-x64-debug/install-cpu/backends/redisai_tensorflow/lib/libtensorflow.so.1
#5 0x00007f38598fef39 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() ()
from /home/filipe/redislabs/RedisAI/./bin/linux-x64-debug/install-cpu/backends/redisai_tensorflow/lib/libtensorflow.so.1
#6 0x00007f385f77162e in tensorflow::DirectSession::~DirectSession() () from /home/filipe/redislabs/RedisAI/./bin/linux-x64-debug/install-cpu/backends/redisai_tensorflow/lib/libtensorflow.so.1
#7 0x00007f385f771db1 in tensorflow::DirectSession::~DirectSession() () from /home/filipe/redislabs/RedisAI/./bin/linux-x64-debug/install-cpu/backends/redisai_tensorflow/lib/libtensorflow.so.1
#8 0x00007f385990b846 in TF_DeleteSession () from /home/filipe/redislabs/RedisAI/./bin/linux-x64-debug/install-cpu/backends/redisai_tensorflow/lib/libtensorflow.so.1
#9 0x00007f38621e2c30 in ?? ()
#10 0x00007f3864bf0570 in ?? ()
#11 0x00007f3865c29000 in ?? ()
#12 0x0000000000000000 in ?? ()
Backtrace from another crash, this time reported in the redis-server output:
EIP:
/home/filipe/redislabs/RedisAI/./bin/linux-x64-debug/install-cpu/backends/redisai_tensorflow/lib/libtensorflow_framework.so.1(_ZN5nsync13nsync_mu_lockEPNS_11nsync_mu_s_E+0x17)[0x7fa4c1ff7db7]
Backtrace:
redis-server *:6379(logStackTrace+0x5a)[0x563616c68f8a]
redis-server *:6379(sigsegvHandler+0xb1)[0x563616c69741]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7fa4cf80c890]
/home/filipe/redislabs/RedisAI/./bin/linux-x64-debug/install-cpu/backends/redisai_tensorflow/lib/libtensorflow_framework.so.1(_ZN5nsync13nsync_mu_lockEPNS_11nsync_mu_s_E+0x17)[0x7fa4c1ff7db7]
/home/filipe/redislabs/RedisAI/./bin/linux-x64-debug/install-cpu/backends/redisai_tensorflow/lib/libtensorflow.so.1(TF_GraphOperationByName+0x1c)[0x7fa4c52b2f9c]
./bin/linux-x64-debug/install-cpu/backends/redisai_tensorflow/redisai_tensorflow.so(RAI_ModelRunTF+0x63e)[0x7fa4cb5e63e4]
./bin/linux-x64-debug/install-cpu/redisai.so(RAI_ModelRun+0x89)[0x7fa4ce011cc1]
./bin/linux-x64-debug/install-cpu/redisai.so(RedisAI_RunSession+0x127)[0x7fa4ce006d3c]
./bin/linux-x64-debug/install-cpu/redisai.so(RedisAI_Run_ThreadMain+0x370)[0x7fa4ce009840]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7fa4cf8016db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7fa4cf52a88f]