Skip to content

Timer trigger deadlock in DefaultStateMachineExecutor #315

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
davidtepel opened this issue Feb 12, 2017 · 16 comments
Closed

Timer trigger deadlock in DefaultStateMachineExecutor #315

davidtepel opened this issue Feb 12, 2017 · 16 comments
Labels
type/bug Is a bug report
Milestone

Comments

@davidtepel
Copy link

Hello,

We need to run multiple state machines simultaniously, yet when we do some of them get BLOCKED. Looking in the thread dump we see that it happens in the DefaultStateMachineExecutor class method: scheduleEventQueueProcessing.

Please see attached thread dump.
see thread number: 12 & 19.

Thanks,
David
test.txt

@jvalkeal
Copy link
Contributor

Looks like you have some tests to reproduce this, would you mind giving me those tests to speed things up. If there's something you can't show publicly then email would be good either to my gmail or jvalkealahti(at)pivotal.io.

@davidtepel
Copy link
Author

We made some progress - found out that timers that we have configured to the state macihne somehowe cause the issue.
When we start the state machine again with the start method, it seems that old timers still exist in the queue and cause the deadlock.

We can send the test but we have to clean the code and it is not an easy job.

@jvalkeal
Copy link
Contributor

Maybe we can start with high level machine configuration which hopefully is enough to craft a test case without removing your internal code. Think I just need a config or machine statechart and then what your test do to end up with deadlock.

@davidtepel
Copy link
Author

Just sent you the class which defines the state machine by mail. hope it helps.

@davidtepel
Copy link
Author

Thought it might be helpfull - here is the thread dump of the 2 deadlock thread:


Name: Thread number: 6
State: BLOCKED on org.springframework.statemachine.support.DefaultStateMachineExecutor@3ba19a9c owned by: pool-2-thread-1
Total blocked: 3 Total waited: 0

Stack trace:
org.springframework.statemachine.support.DefaultStateMachineExecutor.scheduleEventQueueProcessing(DefaultStateMachineExecutor.java:257)
org.springframework.statemachine.support.DefaultStateMachineExecutor.execute(DefaultStateMachineExecutor.java:142)
org.springframework.statemachine.support.DefaultStateMachineExecutor.doStart(DefaultStateMachineExecutor.java:153)
org.springframework.statemachine.support.LifecycleObjectSupport.start(LifecycleObjectSupport.java:120)
org.springframework.statemachine.support.AbstractStateMachine.doStart(AbstractStateMachine.java:390)
org.springframework.statemachine.support.LifecycleObjectSupport.start(LifecycleObjectSupport.java:120)
com.att.sei.upt.bl.manager.flow.TermCallStateMachine.start(TermCallStateMachine.java:99)
com.att.sei.upt.bl.TermCallStateMachineTest.runTest(TermCallStateMachineTest.java:157)
com.att.sei.upt.bl.TermCallStateMachineTest.access$000(TermCallStateMachineTest.java:48)
com.att.sei.upt.bl.TermCallStateMachineTest$1.run(TermCallStateMachineTest.java:126)
java.lang.Thread.run(Thread.java:745)

Name: pool-2-thread-1
State: WAITING on java.util.concurrent.locks.ReentrantLock$NonfairSync@13cedca9 owned by: Thread number: 6
Total blocked: 32 Total waited: 40

Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
org.springframework.statemachine.support.LifecycleObjectSupport.isRunning(LifecycleObjectSupport.java:106)
org.springframework.statemachine.support.DefaultStateMachineExecutor.processTriggerQueue(DefaultStateMachineExecutor.java:324)
org.springframework.statemachine.support.DefaultStateMachineExecutor.access$100(DefaultStateMachineExecutor.java:61)
org.springframework.statemachine.support.DefaultStateMachineExecutor$1.run(DefaultStateMachineExecutor.java:277)
org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50)
org.springframework.statemachine.support.DefaultStateMachineExecutor.scheduleEventQueueProcessing(DefaultStateMachineExecutor.java:290)

  • locked org.springframework.statemachine.support.DefaultStateMachineExecutor@3ba19a9c
    org.springframework.statemachine.support.DefaultStateMachineExecutor.access$500(DefaultStateMachineExecutor.java:61)
    org.springframework.statemachine.support.DefaultStateMachineExecutor$2.triggered(DefaultStateMachineExecutor.java:452)
    org.springframework.statemachine.trigger.CompositeTriggerListener.triggered(CompositeTriggerListener.java:34)
    org.springframework.statemachine.trigger.TimerTrigger.notifyTriggered(TimerTrigger.java:115)
    org.springframework.statemachine.trigger.TimerTrigger.access$000(TimerTrigger.java:33)
    org.springframework.statemachine.trigger.TimerTrigger$1.run(TimerTrigger.java:109)
    org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
    org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:81)
    java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    java.util.concurrent.FutureTask.run(FutureTask.java:266)
    java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    java.lang.Thread.run(Thread.java:745)

@jvalkeal
Copy link
Contributor

Thanks, I think I have an idea what happens. I'm just trying to come up with a failing test.

@davidtepel
Copy link
Author

In order to reproduce this issue I run 30 simultaneously threads of the state machine, at least 10 iterations.
It is timing issue, so you need many threads running to get this deadlock.

@jvalkeal jvalkeal changed the title DefaultStateMachineExecutor is BLOCKED when state machine is starting Timer trigger deadlock in DefaultStateMachineExecutor Feb 13, 2017
@jvalkeal jvalkeal added the type/bug Is a bug report label Feb 13, 2017
@jvalkeal jvalkeal added this to the 1.2.2.RELEASE milestone Feb 13, 2017
jvalkeal added a commit to jvalkeal/spring-statemachine that referenced this issue Feb 13, 2017
- Add new tests for DefaultStateMachineExecutor to find
  possible causes for a deadlock.
- Handle lifecycle scenario little differently
  if timer is fired during a executor starting phase
  order to not getting into a deadlock.
- Relates to spring-projects#315
@jvalkeal
Copy link
Contributor

I did push one fix what I'd like you to try. SDG-B12X is building so 1.2.2.BUILD-SNAPSHOT should be there in 10 minutes or so.

@davidtepel
Copy link
Author

Will test this fix. Yet, when release version will be ready?

@jvalkeal
Copy link
Contributor

If fix is good for you I can release 1.2.2 asap as there's also one other fix in that branch what I'd like to get out.

@davidtepel
Copy link
Author

Your fix looks fine. Lets get the release version.

@davidtepel
Copy link
Author

Please let us know as soon as the release version is ready.
Thanks,
David

@jvalkeal
Copy link
Contributor

@davidtepel it's done and in our repos. sync to maven central takes few hours.

@davidtepel
Copy link
Author

Could you please supply me a link to the release version?

@davidtepel
Copy link
Author

Thanks!

jvalkeal added a commit that referenced this issue Mar 9, 2017
- Add new tests for DefaultStateMachineExecutor to find
  possible causes for a deadlock.
- Handle lifecycle scenario little differently
  if timer is fired during a executor starting phase
  order to not getting into a deadlock.
- Backport #315
- Relates to #307
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug Is a bug report
Projects
None yet
Development

No branches or pull requests

2 participants