Description
On some of our FIX nodes we noticed that Heartbeats were sometimes not sent out when they were due. After analysis and taking some stack dumps, we found that QFJ was trying to connect to a FIX session at that time, and the connection attempt was not refused but timed out. This blocks the thread (running on a `SingleThreadedExecutor`) for up to 2 seconds.
quickfixj/quickfixj-core/src/main/java/quickfix/mina/initiator/IoSessionInitiator.java
Lines 230 to 232 in b40c9a0

```java
private void pollConnectFuture() {
    try {
        connectFuture.awaitUninterruptibly(CONNECT_POLL_TIMEOUT);
        // (remainder of the method not shown in the excerpt)
```
There already was a JIRA issue for this (https://www.quickfixj.org/jira/browse/QFJ-291), which introduced the 2-second timeout. Before that, the attempt probably blocked until the network timeout triggered.
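The same `SingleThreadedExecutor` is also responsible for calling the `Session.next()` method (which, among other things, generates Heartbeats) for all Sessions. So while that thread is stuck on a connection attempt, no Session gets its `Session.next()` call and due Heartbeats are delayed.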
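To illustrate the effect, here is a minimal sketch using plain `java.util.concurrent` rather than QFJ's actual classes. It assumes, as described above, that the timer driving `Session.next()` and the connect poll share one single-threaded executor; the 50 ms tick and the 500 ms block are hypothetical stand-ins for the heartbeat timer and `CONNECT_POLL_TIMEOUT`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class HeadOfLineBlockingDemo {

    /**
     * Runs a periodic stand-in for the Session.next() timer and one blocking
     * stand-in for pollConnectFuture() on the SAME single-threaded executor,
     * and returns the largest gap observed between two timer ticks (ms).
     */
    static long maxHeartbeatGapMillis(long connectBlockMillis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicLong lastTick = new AtomicLong(System.nanoTime());
        AtomicLong maxGapNanos = new AtomicLong();

        // Hypothetical 50 ms tick; each tick records the time since the previous one.
        executor.scheduleAtFixedRate(() -> {
            long now = System.nanoTime();
            maxGapNanos.accumulateAndGet(now - lastTick.getAndSet(now), Math::max);
        }, 0, 50, TimeUnit.MILLISECONDS);

        // A connect attempt that times out: it occupies the only thread meanwhile.
        executor.schedule(() -> sleepQuietly(connectBlockMillis), 100, TimeUnit.MILLISECONDS);

        sleepQuietly(100 + connectBlockMillis + 300);
        executor.shutdownNow();
        return TimeUnit.NANOSECONDS.toMillis(maxGapNanos.get());
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("largest gap between ticks: " + maxHeartbeatGapMillis(500) + " ms");
    }
}
```

The gap between ticks grows from roughly 50 ms to roughly the length of the blocking call, which mirrors the missed Heartbeats we observed.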
As a first step, we will introduce a separate Executor with a few threads to handle the Initiator connection attempts. This will:
- allow Sessions to be established concurrently, even if one or two of them time out while connecting.
- stop connection attempts from blocking the `Session.next()` calls for other Sessions.
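A rough sketch of that split, again with plain `java.util.concurrent` stand-ins rather than QFJ code: the session timer keeps its single thread, and connect attempts go to a separate fixed-size pool. The class name, pool size, tick period and block duration are illustrative assumptions; the plan above only says "a few threads".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class SeparateConnectExecutorSketch {

    /**
     * Ticks a Session.next() stand-in on a single-threaded timer while `attempts`
     * slow connect stand-ins run on a separate pool of `connectThreads` threads.
     * Returns {largest gap between ticks in ms, wall time for all connects in ms}.
     */
    static long[] run(int connectThreads, int attempts, long blockMillis) {
        ScheduledExecutorService sessionTimer = Executors.newSingleThreadScheduledExecutor();
        ExecutorService connectExecutor = Executors.newFixedThreadPool(connectThreads);
        AtomicLong lastTick = new AtomicLong(System.nanoTime());
        AtomicLong maxGapNanos = new AtomicLong();

        // Hypothetical 50 ms tick, as in a heartbeat timer.
        sessionTimer.scheduleAtFixedRate(() -> {
            long now = System.nanoTime();
            maxGapNanos.accumulateAndGet(now - lastTick.getAndSet(now), Math::max);
        }, 0, 50, TimeUnit.MILLISECONDS);

        long start = System.nanoTime();
        List<Future<?>> pending = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            // Each attempt blocks like a timed-out connect, but only on a pool thread.
            pending.add(connectExecutor.submit(() -> {
                Thread.sleep(blockMillis);
                return null;
            }));
        }
        try {
            for (Future<?> f : pending) {
                f.get(); // wait until every attempt has finished
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            connectExecutor.shutdownNow();
            sessionTimer.shutdownNow();
        }
        long connectMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return new long[] {TimeUnit.NANOSECONDS.toMillis(maxGapNanos.get()), connectMillis};
    }

    public static void main(String[] args) {
        long[] r = run(3, 3, 500);
        System.out.println("largest tick gap: " + r[0] + " ms, all connects done after " + r[1] + " ms");
    }
}
```

With three pool threads, three attempts that each block for 500 ms should finish after roughly 500 ms in total, while the tick gap stays near its 50 ms period. The pool size bounds how many simultaneously hanging connects can run in parallel before further attempts queue up.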
(Later we might also give the `SingleThreadedExecutor` more threads. But we need to make sure that `Session.next()` for a given Session is always called by the same thread from the pool, to prevent possible concurrency/visibility issues.)
The latter point is also tracked in https://www.quickfixj.org/jira/browse/QFJ-555.
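One possible way to add threads while keeping each Session on a single thread is to stripe Sessions across several single-threaded executors by hashing a session key. This is a hypothetical sketch, not how QFJ-555 is or will be resolved; `SessionPinnedExecutor` and the plain string session keys are illustrative names.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical striped executor: each session is pinned to one single-threaded stripe. */
public class SessionPinnedExecutor {

    private final ExecutorService[] stripes;

    SessionPinnedExecutor(int threads) {
        stripes = new ExecutorService[threads];
        for (int i = 0; i < threads; i++) {
            stripes[i] = Executors.newSingleThreadExecutor();
        }
    }

    /** The same sessionId always hashes to the same stripe, hence the same thread. */
    void execute(String sessionId, Runnable task) {
        stripes[Math.floorMod(sessionId.hashCode(), stripes.length)].execute(task);
    }

    void shutdownAndWait() {
        for (ExecutorService stripe : stripes) {
            stripe.shutdown();
        }
        try {
            for (ExecutorService stripe : stripes) {
                stripe.awaitTermination(5, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Runs `rounds` Session.next() stand-ins per session and records which threads ran them. */
    static Map<String, Set<String>> threadsPerSession(int threads, String[] sessionIds, int rounds) {
        SessionPinnedExecutor executor = new SessionPinnedExecutor(threads);
        Map<String, Set<String>> seen = new ConcurrentHashMap<>();
        for (int round = 0; round < rounds; round++) {
            for (String id : sessionIds) {
                executor.execute(id, () -> seen
                        .computeIfAbsent(id, k -> ConcurrentHashMap.newKeySet())
                        .add(Thread.currentThread().getName()));
            }
        }
        executor.shutdownAndWait();
        return seen;
    }

    public static void main(String[] args) {
        String[] ids = {"sessionA", "sessionB", "sessionC", "sessionD"};
        threadsPerSession(2, ids, 100).forEach((id, names) -> System.out.println(id + " ran on " + names));
    }
}
```

Because each stripe is single-threaded and runs its tasks in submission order, all calls for one Session execute sequentially on the same thread. The trade-off is that a slow task still delays the other Sessions that hash to the same stripe.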
Will open a PR for this shortly. Just opening this issue so I don't forget about it.