test-run doesn't kill hung tests in the single process mode #106

locker · 2018-08-02T10:00:03Z

Reproduced by Travis CI:

vinyl/constraint.test.lua                                       [ pass ]
vinyl/ddl.test.lua                                              
No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
The build has been terminated

locker · 2018-08-02T10:04:40Z

It would also be great if test-run printed the test output (both for tap and diff tests) after killing it on timeout. I filed a separate issue for this: #107.

locker · 2018-08-02T13:34:53Z

A note: test-run does kill a hung test if called without options (e.g. ./test-run box/cfg.test.lua), but does not if called in the single process mode (i.e. ./test-run -j -1 box/cfg.test.lua).
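The behaviour expected here (kill a hung test after a timeout, then still report whatever it printed, as requested in #107) can be sketched as follows. This is only an illustration under stated assumptions, not test-run's actual code: the function name `run_with_timeout` and the status strings are hypothetical, and the test is simply modelled as a child process.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout):
    """Run cmd as a child process and kill it if it runs longer than timeout.

    Hypothetical helper, not part of test-run. Returns (status, output),
    where status is 'pass', 'fail' or 'timeout'.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    try:
        output, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child is hung: kill it and collect the output it produced so far.
        proc.kill()
        output, _ = proc.communicate()
        return 'timeout', output
    return ('pass' if proc.returncode == 0 else 'fail'), output


if __name__ == '__main__':
    hung = [sys.executable, '-c', 'import time; time.sleep(60)']
    status, _ = run_with_timeout(hung, timeout=1)
    print(status)
```

The key point is that the watchdog lives outside the process that runs the test, which is presumably why a mode that runs everything in a single process cannot interrupt a hung test in the same way.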

Totktonada · 2018-08-03T01:34:46Z

Maybe it is worth just using -j 1 in Travis CI instead of -j -1. I don't know how this quasi-parallel mode can increase test flakiness.

@sergw Maybe you want to participate in this decision.

Open question: can we estimate the increase in test flakiness after the switch, given the current level of flakiness?

sergw · 2018-08-03T11:55:35Z

It would be good to change the mode to -j 1:

* one step away from the legacy consistent test-run mode
* behaviour and output are the same as in parallel mode
* easier to contribute and fix bugs (one place to change)

> can we estimate level of tests flakiness

I guess it will be the same, because, as far as I can see, flakiness depends on the tests, not on test-run.

Totktonada · 2018-08-03T12:21:50Z

I propose changing the tarantool CI (.travis.mk and rpm/tarantool.spec files) without any changes to test-run itself, and then estimating the effect. Then, maybe, remove the single process mode, because it is going to be unmaintained.

The -j -1 option was used for the legacy consistent mode. Switching to -j 1 reduces the number of jobs to one while using the same part of the code as the parallel mode. The code in parallel mode kills hung tests. Part of tarantool/test-run#106

Totktonada · 2018-08-15T14:42:42Z

@sergw Can this be considered worked around and closed as won't fix?

This update contains changes from 0.6.5 release (cited below) and usage of yaml.safe_load() instead of yaml.load() in tarantool-python tests (doesn't affect test-run behaviour).

The reason why it is updated here is just to keep things in sync and, second, to eliminate usage of yaml.load() w/o an explicit loader everywhere where it is possible. The latter is because it was banned in recent versions of pyyaml in Gentoo Linux; see [1].

There was also related change 38400e9 ('Update pyyaml version') where yaml.load() was replaced with yaml.safe_load() within test-run itself.

[1]: https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=79ba924d94cb0cf8559565178414c2a1d687b90c

# tarantool-python 0.6.5

## Breaking changes

This release should not break existing code.

## New features

* Added MeshConnection that allows to switch between nodes from a user provided list if a current node is down using round-robin strategy (#106).
* Added connection_timeout parameter to Connection (#115).

## Bugfixes

* Fixed auto-reconnection in Connection.
* Eliminated deprecation warnings on Python 3 (#114).
* Added TCP_NODELAY back (it was removed in 0.6.4) (#127).

https://github.com/tarantool/tarantool-python/releases/tag/0.6.5

Found ASAN error:

[001] + ok 206 - =================================================================
[001] +==6889==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x604000000031 at pc 0x0000005a72e7 bp 0x7ffe47c30c80 sp 0x7ffe47c30c78
[001] +WRITE of size 1 at 0x604000000031 thread T0
[001] + #0 0x5a72e6 in mp_store_u8 /tarantool/src/lib/msgpuck/msgpuck.h:258:1
[001] + #1 0x5a72e6 in mp_encode_uint /tarantool/src/lib/msgpuck/msgpuck.h:1768
[001] + #2 0x4fa657 in test_mp_print /tarantool/src/lib/msgpuck/test/msgpuck.c:957:16
[001] + #3 0x509024 in main /tarantool/src/lib/msgpuck/test/msgpuck.c:1331:2
[001] + #4 0x7f3658fd909a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
[001] + #5 0x41f339 in _start (/tnt/test/unit/msgpack.test+0x41f339)
[001] +
[001] +0x604000000031 is located 0 bytes to the right of 33-byte region [0x604000000010,0x604000000031)
[001] +allocated by thread T0 here:
[001] + #0 0x4cace3 in malloc (/tnt/test/unit/msgpack.test+0x4cace3)
[001] + #1 0x4fa5db in test_mp_print /tarantool/src/lib/msgpuck/test/msgpuck.c:945:18
[001] + #2 0x509024 in main /tarantool/src/lib/msgpuck/test/msgpuck.c:1331:2
[001] + #3 0x7f3658fd909a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
[001] +
[001] +SUMMARY: AddressSanitizer: heap-buffer-overflow /tarantool/src/lib/msgpuck/msgpuck.h:258:1 in mp_store_u8
[001] +Shadow bytes around the buggy address:
[001] + 0x0c087fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[001] + 0x0c087fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[001] + 0x0c087fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[001] + 0x0c087fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[001] + 0x0c087fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[001] +=>0x0c087fff8000: fa fa 00 00 00 00[01]fa fa fa fa fa fa fa fa fa
[001] + 0x0c087fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[001] + 0x0c087fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[001] + 0x0c087fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[001] + 0x0c087fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[001] + 0x0c087fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[001] +Shadow byte legend (one shadow byte represents 8 application bytes):
[001] + Addressable: 00
[001] + Partially addressable: 01 02 03 04 05 06 07
[001] + Heap left redzone: fa
[001] + Freed heap region: fd
[001] + Stack left redzone: f1
[001] + Stack mid redzone: f2
[001] + Stack right redzone: f3
[001] + Stack after return: f5
[001] + Stack use after scope: f8
[001] + Global redzone: f9
[001] + Global init order: f6
[001] + Poisoned by user: f7
[001] + Container overflow: fc
[001] + Array cookie: ac
[001] + Intra object redzone: bb
[001] + ASan internal: fe
[001] + Left alloca redzone: ca

Investigated the buffer size that was allocated was 33 bytes, but it needed 34. The fix was to increase this buffer for another mp_encode_array(1).

Part of tarantool/tarantool#4360

Reviewed-by: Vladislav Shpilevoy <[email protected]>

test: obuf test refactoring

Added slab_arena_destroy for graceful resources release, removed global seed value, removed unused value from enum.

Merge pull request #136 from tbeu/patch-1

Update README.rst

test: move unit/ to test/

This virtually reverts commit 436218defd4c284134f59975d4642405bdf2d918 ('move unit tests to unit'), that was made in the scope of #106.

Despite the fact that testing of the connector uses `unittest` framework, it is functional (and integration) testing by its nature: most of the test cases verify that public API of the connector properly works with tarantool.

In seems meaningful to locate such kind of test cases in the `test/` directory, not `unit/`, disregarding of used framework.

Follows up #106.

Add timeout for starting tarantool server

Checking that tarantool server is started by finding pattern 'entering the event loop|will retry binding|hot standby mode' in the xlog. If server is hanging it could be killed after test timeout. Was added start-server-timeout. Now the pattern is searching until this timeout. If there is no pattern functions wait_until_started returns False (else True) and TarantoolServer.start() returns same. Default value of start-server-timeout is 90 sec.

Fixes: #276

RELEASE-NOTES: synced curl 7.76.0 release

Use rawset() when exporting functions to _G

test: fix directory detection in lua-Harness suite

A test <314-regex.t> uses `arg[0]:find'314'` to determine the name of the directory where rx_* files are located. This leads to the test failure, when lua-Harness suite runs in a directory containing "314" in its name, because the found path doesn't contain the required files. This patch fixes directory name detection.

Follows up tarantool/tarantool#5844

Reviewed-by: Igor Munkin <[email protected]>
Reviewed-by: Sergey Ostanevich <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>

Add the chdir option for make

Flag --chdir for make command (with help) has been added. It's add possibility to specify a source directory of the rock when make.

Merge pull request #2435 from facebook/dev

v1.4.8 hotfix
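
The "Add timeout for starting tarantool server" message above describes searching the server log for a start pattern until start-server-timeout expires. A rough sketch of that polling loop is given below. It is not the actual test-run code: the `read_log` callable and the `poll_interval` parameter are hypothetical, and only the pattern, the True/False result and the 90 second default come from the commit message.

```python
import re
import time

# Pattern quoted in the commit message.
STARTED_PATTERN = re.compile(
    r'entering the event loop|will retry binding|hot standby mode')


def wait_until_started(read_log, timeout=90.0, poll_interval=0.1):
    """Poll the log until the start pattern appears or the timeout expires.

    read_log is a hypothetical callable that returns the current log text.
    Returns True if the pattern was found, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if STARTED_PATTERN.search(read_log()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

For example, `wait_until_started(lambda: open(log_path).read())` would poll a log file at an assumed path `log_path`; a caller such as a server start routine can then propagate the False result instead of blocking until the whole test times out.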

locker added the bug Something isn't working label Aug 2, 2018

Totktonada changed the title ~~test-run doesn't kill hung tests~~ test-run doesn't kill hung tests in the single process mode Aug 3, 2018

Totktonada assigned sergw Aug 6, 2018

sergw closed this as completed Aug 20, 2018

Totktonada mentioned this issue Apr 29, 2019

Update tarantool-python submodule #165

Merged
