Skip to content

Memory bug and crash when running in Jest tests #1369

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
code-ape opened this issue May 7, 2021 · 7 comments
Closed

Memory bug and crash when running in Jest tests #1369

code-ape opened this issue May 7, 2021 · 7 comments
Labels

Comments

@code-ape
Copy link

code-ape commented May 7, 2021

  1. What versions are you using?
process.platform = 'linux'
process.version = 'v14.16.0'
process.arch = 'x64'
require('oracledb').versionString = '5.1.0'
require('oracledb').oracleClientVersionString = '18.3.0.0.0'
  1. Is it an error or a hang or a crash?

Crash. It's caused by a memory bug and is random but occurs with high frequency.

When running a Jest test scenario (details below) it will generally occur between the 2nd and 5th test suite that's run.
There is a full setup / teardown between each test suite, including for the db driver. This means each test suite the driver is getting shut own and stood back up. So my guess is that there's a missing coordination piece that is occasionally causing out of order memory actions.

  1. What error(s) or behavior you are seeing?

Either a segmentation fault (segfault), a null pointer exception, or sometimes munmap_chunk(): invalid pointer. These have been caught via various tools I've tried with it. It tends to occur on the "startup" of a new test suite. Perhaps there's an async teardown process for the library that's overlapping somehow with the rebuild?

Surprisingly, llnode wasn't able to backtrace any Javascript, which I assume means that the oracledb compiled code is directly registering some async process that's causing this.

SIGSEGV example 1
PID 13245 received SIGSEGV for address: 0xf040c0147
/home/ferris/dev/PROJECT/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x3137)[0x7fefa827b137]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7fefaa884980]
/home/ferris/dev/PROJECT/node_modules/oracledb/build/Release/oracledb-5.1.0-linux-x64.node(dpiGen__setRefCount+0x17)[0x7fefa15b6d67]
/home/ferris/dev/PROJECT/node_modules/oracledb/build/Release/oracledb-5.1.0-linux-x64.node(dpiStmt__close+0x119)[0x7fefa15d2099]
/home/ferris/dev/PROJECT/node_modules/oracledb/build/Release/oracledb-5.1.0-linux-x64.node(dpiStmt_close+0xce)[0x7fefa15d3d5e]
/home/ferris/dev/PROJECT/node_modules/oracledb/build/Release/oracledb-5.1.0-linux-x64.node(+0x27cee)[0x7fefa15a4cee]
/home/ferris/dev/PROJECT/node_modules/oracledb/build/Release/oracledb-5.1.0-linux-x64.node(+0x187b5)[0x7fefa15957b5]
node[0x1376e04]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7fefaa8796db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7fefaa5a271f]
Segmentation fault (core dumped)
SIGSEGV example 2
Process 13528 stopped
* thread #11, name = 'node', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
    frame #0: 0x00007ffd5cef29ac libclntsh.so`ttcacs + 3116
libclntsh.so`ttcacs:
->  0x7ffd5cef29ac <+3116>: movl   (%rdx), %ebx
    0x7ffd5cef29ae <+3118>: jmp    0x7ffd5cef2945            ; <+3013>
    0x7ffd5cef29b0 <+3120>: imull  %ecx, %eax
    0x7ffd5cef29b3 <+3123>: addq   0x20(%r13), %rax
(llnode) v8 bt
 * thread #11: tid = 13579, 0x00007ffd5cef29ac libclntsh.so`ttcacs + 3116, name = 'node', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
  * frame #0: 0x00007ffd5cef29ac libclntsh.so`ttcacs + 3116
    frame #1: 0x00007ffd5cef3642 libclntsh.so`ttcdrv + 2146
    frame #2: 0x00007ffd5cec84e9 libclntsh.so`nioqwa + 73
    frame #3: 0x00007ffd5ceb17b9 libclntsh.so`upirtrc + 1465
    frame #4: 0x00007ffd5cebeb96 libclntsh.so`kpurcsc + 102
    frame #5: 0x00007ffd5ceb4ac5 libclntsh.so`kpuexec + 3045
    frame #6: 0x00007ffd5ceb0f09 libclntsh.so`OCIStmtExecute + 41
    frame #7: 0x00007ffd5ddbda2e oracledb-5.1.0-linux-x64.node`dpiOci__stmtExecute + 78
    frame #8: 0x00007ffd5ddc6503 oracledb-5.1.0-linux-x64.node`dpiStmt__execute + 451
    frame #9: 0x00007ffd5ddc91b4 oracledb-5.1.0-linux-x64.node`dpiStmt_execute + 148
    frame #10: 0x00007ffd5dd8f0ac oracledb-5.1.0-linux-x64.node`njsConnection_executeAsync + 108
    frame #11: 0x00007ffd5dd8a7b5 oracledb-5.1.0-linux-x64.node`njsBaton_executeAsync + 21
    frame #12: 0x0000000001376e04 node`worker(arg=<unavailable>) at threadpool.c:122
    frame #13: 0x00007ffff70786db libpthread.so.0`start_thread + 219
    frame #14: 0x00007ffff6da171f libc.so.6`__GI___clone at clone.S:95
(llnode) i 0x00007ffd5dd8a7b5
invalid command 'target modules 0x00007ffd5dd8a7b5'.
(llnode) frame info
frame #0: 0x00007ffd5cef29ac libclntsh.so`ttcacs + 3116
(llnode) frame select 12
frame #12: 0x0000000001376e04 node`worker(arg=<unavailable>) at threadpool.c:122
(llnode) frame info
frame #12: 0x0000000001376e04 node`worker(arg=<unavailable>) at threadpool.c:122
(llnode) frame select 11
frame #11: 0x00007ffd5dd8a7b5 oracledb-5.1.0-linux-x64.node`njsBaton_executeAsync + 21
oracledb-5.1.0-linux-x64.node`njsBaton_executeAsync:
    0x7ffd5dd8a7b5 <+21>: testb  %al, %al
    0x7ffd5dd8a7b7 <+23>: jne    0x7ffd5dd8a7bd            ; <+29>
    0x7ffd5dd8a7b9 <+25>: movb   $0x1, 0x11(%rbx)
    0x7ffd5dd8a7bd <+29>: addq   $0x8, %rsp
  1. Include a runnable Node.js script that shows the problem.

I can attempt to cobble one of these up if needed, but this is a quite large project and the issue seems to be with the life cycling of the oracle library between test suite runs in Jest.
I'm forgoing this to start with as it would take a number of hours to do.

  1. Additional info

I'm relatively familiar with debugging compiled code. So happy to help if I can with that! I'm relatively new to Oracledb though, so apologies in advance if I'm there are things I'm not 100% familiar with it.

We are using Knex.js as a driver on top of oracledb for Node. But based on the evidence, it appears this is not a Knex issue (it's not in any stack trace or error).

@code-ape code-ape added the bug label May 7, 2021
@code-ape
Copy link
Author

code-ape commented May 7, 2021

To test the async coordination issue I've added a 1 second await timeout between between each setup and tear down. After running the test suite 3 time (over 100 suites run total) I've had no memory issues. This suggests that it is indeed an asynchronous piece of code that's not properly awaited during shutdown.

@cjbj
Copy link
Member

cjbj commented May 8, 2021

I'm not familiar with Jest's process/thread behavior. It would be interesting to know if the current 5.2.0-dev (compile from GitHub) differs because:

@code-ape
Copy link
Author

code-ape commented May 8, 2021

Huzzah @cjbj! Running with 5.2.0-dev (master branch commit 8c1bb87) seems to have done the trick 🙌 if you don't mind to leave this open until around Wednesday that should give me enough time to put it through some proper stress tests and make sure it's truly resolved. Thanks so much for your time and help!

@cjbj
Copy link
Member

cjbj commented May 8, 2021

@code-ape no problems. Let us know of any issues (or improvements!) you find with 5.2.0-dev

@cjbj
Copy link
Member

cjbj commented May 16, 2021

@code-ape what was the result of your testing?

@code-ape
Copy link
Author

@cjbj apologies for the delayed following up. Yes, it is confirmed! We are able to run our full, 100+ cycle test suite using 5.2.0-dev (master branch commit 8c1bb87) without any memory issues or crashes 😄 I'll keep an eye out for when 5.2 is published.

@cjbj
Copy link
Member

cjbj commented May 30, 2021

@code-ape That's great. We've frozen 5.2 and plan to complete release testing this week.

I'll close this now.

@cjbj cjbj closed this as completed May 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants