Description
- What versions are you using?
process.platform = 'linux'
process.version = 'v14.16.0'
process.arch = 'x64'
require('oracledb').versionString = '5.1.0'
require('oracledb').oracleClientVersionString = '18.3.0.0.0'
- Is it an error or a hang or a crash?
Crash. It's caused by a memory bug and is random but occurs with high frequency.
When running a Jest test scenario (details below) it will generally occur between the 2nd and 5th test suite that's run.
There is a full setup / teardown between each test suite, including for the db driver. This means each test suite the driver is getting shut own and stood back up. So my guess is that there's a missing coordination piece that is occasionally causing out of order memory actions.
- What error(s) or behavior you are seeing?
Either a segmentation fault (segfault
), a null pointer exception, or sometimes munmap_chunk(): invalid pointer
. These have been caught via various tools I've tried with it. It tends to occur on the "startup" of a new test suite. Perhaps there's an async teardown process for the library that's overlapping somehow with the rebuild?
Surprisingly, llnode
wasn't able to backtrace any Javascript, which I assume means that the oracledb compiled code is directly registering some async process that's causing this.
SIGSEGV example 1
PID 13245 received SIGSEGV for address: 0xf040c0147
/home/ferris/dev/PROJECT/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x3137)[0x7fefa827b137]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7fefaa884980]
/home/ferris/dev/PROJECT/node_modules/oracledb/build/Release/oracledb-5.1.0-linux-x64.node(dpiGen__setRefCount+0x17)[0x7fefa15b6d67]
/home/ferris/dev/PROJECT/node_modules/oracledb/build/Release/oracledb-5.1.0-linux-x64.node(dpiStmt__close+0x119)[0x7fefa15d2099]
/home/ferris/dev/PROJECT/node_modules/oracledb/build/Release/oracledb-5.1.0-linux-x64.node(dpiStmt_close+0xce)[0x7fefa15d3d5e]
/home/ferris/dev/PROJECT/node_modules/oracledb/build/Release/oracledb-5.1.0-linux-x64.node(+0x27cee)[0x7fefa15a4cee]
/home/ferris/dev/PROJECT/node_modules/oracledb/build/Release/oracledb-5.1.0-linux-x64.node(+0x187b5)[0x7fefa15957b5]
node[0x1376e04]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7fefaa8796db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7fefaa5a271f]
Segmentation fault (core dumped)
SIGSEGV example 2
Process 13528 stopped
* thread #11, name = 'node', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
frame #0: 0x00007ffd5cef29ac libclntsh.so`ttcacs + 3116
libclntsh.so`ttcacs:
-> 0x7ffd5cef29ac <+3116>: movl (%rdx), %ebx
0x7ffd5cef29ae <+3118>: jmp 0x7ffd5cef2945 ; <+3013>
0x7ffd5cef29b0 <+3120>: imull %ecx, %eax
0x7ffd5cef29b3 <+3123>: addq 0x20(%r13), %rax
(llnode) v8 bt
* thread #11: tid = 13579, 0x00007ffd5cef29ac libclntsh.so`ttcacs + 3116, name = 'node', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
* frame #0: 0x00007ffd5cef29ac libclntsh.so`ttcacs + 3116
frame #1: 0x00007ffd5cef3642 libclntsh.so`ttcdrv + 2146
frame #2: 0x00007ffd5cec84e9 libclntsh.so`nioqwa + 73
frame #3: 0x00007ffd5ceb17b9 libclntsh.so`upirtrc + 1465
frame #4: 0x00007ffd5cebeb96 libclntsh.so`kpurcsc + 102
frame #5: 0x00007ffd5ceb4ac5 libclntsh.so`kpuexec + 3045
frame #6: 0x00007ffd5ceb0f09 libclntsh.so`OCIStmtExecute + 41
frame #7: 0x00007ffd5ddbda2e oracledb-5.1.0-linux-x64.node`dpiOci__stmtExecute + 78
frame #8: 0x00007ffd5ddc6503 oracledb-5.1.0-linux-x64.node`dpiStmt__execute + 451
frame #9: 0x00007ffd5ddc91b4 oracledb-5.1.0-linux-x64.node`dpiStmt_execute + 148
frame #10: 0x00007ffd5dd8f0ac oracledb-5.1.0-linux-x64.node`njsConnection_executeAsync + 108
frame #11: 0x00007ffd5dd8a7b5 oracledb-5.1.0-linux-x64.node`njsBaton_executeAsync + 21
frame #12: 0x0000000001376e04 node`worker(arg=<unavailable>) at threadpool.c:122
frame #13: 0x00007ffff70786db libpthread.so.0`start_thread + 219
frame #14: 0x00007ffff6da171f libc.so.6`__GI___clone at clone.S:95
(llnode) i 0x00007ffd5dd8a7b5
invalid command 'target modules 0x00007ffd5dd8a7b5'.
(llnode) frame info
frame #0: 0x00007ffd5cef29ac libclntsh.so`ttcacs + 3116
(llnode) frame select 12
frame #12: 0x0000000001376e04 node`worker(arg=<unavailable>) at threadpool.c:122
(llnode) frame info
frame #12: 0x0000000001376e04 node`worker(arg=<unavailable>) at threadpool.c:122
(llnode) frame select 11
frame #11: 0x00007ffd5dd8a7b5 oracledb-5.1.0-linux-x64.node`njsBaton_executeAsync + 21
oracledb-5.1.0-linux-x64.node`njsBaton_executeAsync:
0x7ffd5dd8a7b5 <+21>: testb %al, %al
0x7ffd5dd8a7b7 <+23>: jne 0x7ffd5dd8a7bd ; <+29>
0x7ffd5dd8a7b9 <+25>: movb $0x1, 0x11(%rbx)
0x7ffd5dd8a7bd <+29>: addq $0x8, %rsp
- Include a runnable Node.js script that shows the problem.
I can attempt to cobble one of these up if needed, but this is a quite large project and the issue seems to be with the life cycling of the oracle library between test suite runs in Jest.
I'm forgoing this to start with as it would take a number of hours to do.
- Additional info
I'm relatively familiar with debugging compiled code. So happy to help if I can with that! I'm relatively new to Oracledb though, so apologies in advance if I'm there are things I'm not 100% familiar with it.
We are using Knex.js as a driver on top of oracledb
for Node. But based on the evidence, it appears this is not a Knex issue (it's not in any stack trace or error).