Test replication/replica_rejoin fails sporadically #3895

Closed
kyukhin opened this issue Dec 19, 2018 · 1 comment

kyukhin commented Dec 19, 2018

Tarantool version: 2.1

OS version: Linux

Bug description:
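The test fails sporadically: `test_run:wait_vclock('default', vclock)` raises `./test_run.lua:79: attempt to compare nil with number`. Please fix it.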

[092] replication/replica_rejoin.test.lua             vinyl           [ fail ]                                                                                                                            
[092]                                                                                                                                                                                                     
[092] Test failed! Result content mismatch:                                                                                                                                                               
[092] --- replication/replica_rejoin.result     Fri Dec 14 08:19:07 2018                                                                                                                                  
[092] +++ replica_rejoin.reject Wed Dec 19 12:54:26 2018                                                                                                                                                  
[092] @@ -297,6 +297,7 @@                                                                                                                                                                                 
[092]  ...                                                                                                                                                                                                
[092]  _ = test_run:wait_vclock('default', vclock)                                                                                                                                                        
[092]  ---                                                                                                                                                                                                
[092] +- error: './test_run.lua:79: attempt to compare nil with number'                                                                                                                                   
[092]  ...                                                                                                                                                                                                
[092]  -- Restart the master and force garbage collection.                                                                                                                                                
[092]  test_run:cmd("switch default")                                                                                                                                                                     
[092]                                                                                                                                                                                                     
[092] Last 15 lines of Tarantool Log file [Instance "master"][/export/tarantool/src/test/var/092_replication/master.log]:                                                                                 
[092] 2018-12-19 12:54:24.182 [29113] main/122/applier/localhost:49099 I> can't join/subscribe                                                                                                            
[092] 2018-12-19 12:54:24.182 [29113] main/122/applier/localhost:49099 xrow.c:960 E> ER_CFG: Incorrect value for option 'replication': duplicate connection with the same replica UUID                    
[092] 2018-12-19 12:54:24.182 [29113] main/122/applier/localhost:49099 I> will retry every 0.10 second                                                                                                    
[092] 2018-12-19 12:54:25.729 [29113] main/122/applier/localhost:49099 I> can't read row                                                                                                                  
[092] 2018-12-19 12:54:25.729 [29113] main/122/applier/localhost:49099 coio.cc:379 !> SystemError unexpected EOF when reading from socket, called on fd 28, aka [::1]:60150, peer of [::1]:49099: Broken pipe
[092] 2018-12-19 12:54:25.729 [29113] main/122/applier/localhost:49099 I> will retry every 0.10 second                                                                                                    
[092] 2018-12-19 12:54:26.086 [29113] main/413/main I> initial data sent.                                                                                                                                 
[092] 2018-12-19 12:54:26.087 [29113] relay/[::1]:58346/101/main I> recover from `/export/tarantool/src/test/var/092_replication/master/00000000000000000037.xlog'                                        
[092] 2018-12-19 12:54:26.087 [29113] main/413/main I> final data sent.                                                                                                                                   
[092] 2018-12-19 12:54:26.135 [29113] main/122/applier/localhost:49099 I> can't join/subscribe                                                                                                            
[092] 2018-12-19 12:54:26.135 [29113] main/122/applier/localhost:49099 xrow.c:960 E> ER_LOADING: Instance bootstrap hasn't finished yet                                                                   
[092] 2018-12-19 12:54:26.135 [29113] main/122/applier/localhost:49099 I> will retry every 0.10 second                                                                                                    
[092] 2018-12-19 12:54:26.146 [29113] relay_0x55cf4e3f6520/101/main I> recover from `/export/tarantool/src/test/var/092_replication/master/00000000000000000037.xlog'                                     
[092] 2018-12-19 12:54:26.285 [29113] main/417/console/unix/: I> set 'replication' configuration option to []                                                                                             
[092] 2018-12-19 12:54:26.299 [29113] relay/[::1]:58346/101/main C> exiting the relay loop   

logs.tgz.zip

Steps to reproduce:

/export/tarantool/src/test$ ./test-run.py --builddir ../../bld --long --force -j 48
@kyukhin kyukhin added bug Something isn't working flaky test labels Dec 19, 2018
@kyukhin kyukhin added this to the QA milestone Dec 19, 2018
sergw pushed a commit to tarantool/test-run that referenced this issue Dec 19, 2018
Got this error when testing parallel mode.

Fixes: tarantool/tarantool#3895
sergw pushed a commit to tarantool/test-run that referenced this issue Dec 23, 2018
Got this error when testing parallel mode.

Fixes: tarantool/tarantool#3895
sergw pushed a commit that referenced this issue Dec 23, 2018
- After enabling replication in parallel mode, the following error sometimes occurred:
  `attempt to compare nil with number`.

Fixes: #3895
Totktonada added a commit that referenced this issue Jan 21, 2019
* Fixed wait_vclock() LSN problem with nil handling (#3895).
* Enabled HangWatcher under --long.
* Enumerate result file lines.
* Show result file for a hang test once at the end.
Totktonada added a commit that referenced this issue Jan 22, 2019
* Fixed wait_vclock() LSN problem with nil handling (#3895).
* Enabled HangWatcher under --long.
* Show result file for a hang test once at the end.
* Show diff against a result file for a hung test.
Totktonada added a commit that referenced this issue Jan 24, 2019
* Fixed wait_vclock() LSN problem with nil handling (#3895).
* Enabled HangWatcher under --long.
* Show result file for a hang test once at the end.
* Show diff against a result file for a hung test.
Totktonada added a commit that referenced this issue Jan 24, 2019
* Fixed wait_vclock() LSN problem with nil handling (#3895).
* Enabled HangWatcher under --long.
* Show result file for a hang test once at the end.
* Show diff against a result file for a hung test.

(cherry picked from commit 0fc536c)
@Totktonada

The test-run commit has now landed in the tarantool repository (2.1 and 1.10 branches), so the problem should be fixed.
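
For readers who hit the same message elsewhere: the error text suggests that an LSN looked up inside `wait_vclock()` was `nil` when it was compared with a number. One way this can happen, assumed here rather than confirmed by the thread, is that the polled vclock does not yet contain a component for some replica id. Below is a minimal sketch of a nil-safe comparison. It is not the actual test-run patch; `vclock_reached` is a hypothetical helper, and a vclock is modelled as a plain table mapping replica id to LSN.

```lua
-- Hypothetical helper, NOT the actual test_run.lua code.
-- A vclock is modelled as a table: replica id -> LSN.
-- Returns true once every component of `target` is reached in `current`.
local function vclock_reached(current, target)
    for replica_id, target_lsn in pairs(target) do
        -- A replica id missing from `current` yields nil. Comparing nil
        -- with a number raises "attempt to compare nil with number",
        -- so a missing component is treated as LSN 0 (an assumption).
        local lsn = current[replica_id] or 0
        if lsn < target_lsn then
            return false
        end
    end
    return true
end

local target = {[1] = 10, [2] = 5}
print(vclock_reached({[1] = 10}, target))          -- false
print(vclock_reached({[1] = 12, [2] = 5}, target)) -- true
```

Treating a missing component as 0 makes the caller keep waiting instead of raising an error, which fits a polling loop. Whether 0 is the right default for a given test is a judgment call.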
