Connection timeout on one Session will block timer-related tasks on other Sessions (cf. QFJ-291) #254

@chrjohn

On some of our FIX nodes we noticed that Heartbeats were sometimes not sent when they were due. After taking and analysing some stack dumps, we found that QFJ was trying to connect to a FIX session at those times, and that the connection attempt was not refused but timed out. This blocks the thread (running on a SingleThreadedExecutor) for up to 2 seconds.

private void pollConnectFuture() {
    try {
        connectFuture.awaitUninterruptibly(CONNECT_POLL_TIMEOUT);

There was already a JIRA issue for this (https://www.quickfixj.org/jira/browse/QFJ-291), which introduced the 2 second timeout. Before that change, the call probably waited until the network timeout triggered.

The same SingleThreadedExecutor is also responsible for calling the Session.next() method for all Sessions, which among other things generates Heartbeats.
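To illustrate the effect described above, here is a minimal, self-contained sketch. It is not QuickFIX/J code: the class name, the method names and the sleep that stands in for the timed-out connection attempt are all hypothetical. It only shows that a blocking task on a single-threaded scheduler delays every other task that is due on the same thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SingleThreadBlockingDemo {

    /**
     * Schedules a blocking "connect" task and a "heartbeat" task, both due
     * immediately, on one single-threaded scheduler. Returns how many
     * milliseconds passed before the heartbeat task actually ran.
     */
    static long heartbeatDelayMillis(long connectBlockMillis) throws Exception {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            final long start = System.nanoTime();

            // Hypothetical stand-in for a connection attempt that is neither
            // accepted nor refused and therefore blocks until its poll timeout.
            Runnable connect = () -> sleepQuietly(connectBlockMillis);

            // Hypothetical stand-in for Session.next() on another, healthy session.
            Callable<Long> heartbeat =
                    () -> TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

            timer.schedule(connect, 0, TimeUnit.MILLISECONDS);
            // The heartbeat is due right away but has to wait for the only thread.
            return timer.schedule(heartbeat, 0, TimeUnit.MILLISECONDS).get();
        } finally {
            timer.shutdownNow();
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("heartbeat ran after ms: " + heartbeatDelayMillis(2000));
    }
}
```

With a 2 second block, the heartbeat task in this sketch cannot start until the connect task has returned, which matches what the stack dumps suggested.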

As a first step we will introduce a separate Executor with a few threads that will handle the Initiator connection attempts. This will

  • allow Sessions to be established concurrently, even if one or two of them time out while trying to connect.
  • no longer block calls to Session.next() for other Sessions.
    (Later we might also give the SingleThreadedExecutor more threads. But we need to make sure that Session.next() is always called by the same thread from the pool, to prevent possible concurrency/visibility issues.)
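The first step could look roughly like the sketch below. This is only an illustration of the idea, not the planned patch: the class name, the pool size of 3 and the sleep standing in for the blocking connect are assumptions. The timer thread merely hands the connection attempt to a separate pool and is then free to run other timer tasks.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SeparateConnectExecutorSketch {

    /**
     * Same measurement idea as a single-threaded timer with a blocking task,
     * except that the blocking work is handed to a separate pool.
     * Returns how many milliseconds passed before the heartbeat task ran.
     */
    static long heartbeatDelayMillis(long connectBlockMillis) throws Exception {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Assumed pool size; "a few threads" is all the issue text specifies.
        ExecutorService connectPool = Executors.newFixedThreadPool(3);
        try {
            final long start = System.nanoTime();

            // The timer task only submits the (hypothetical) blocking connect
            // and returns immediately.
            Runnable triggerConnect =
                    () -> connectPool.execute(() -> sleepQuietly(connectBlockMillis));

            // Hypothetical stand-in for Session.next() on another session.
            Callable<Long> heartbeat =
                    () -> TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

            timer.schedule(triggerConnect, 0, TimeUnit.MILLISECONDS);
            return timer.schedule(heartbeat, 0, TimeUnit.MILLISECONDS).get();
        } finally {
            timer.shutdownNow();
            connectPool.shutdownNow();
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("heartbeat ran after ms: " + heartbeatDelayMillis(2000));
    }
}
```

In this sketch the heartbeat task no longer waits for the connect timeout, and up to three connection attempts can be in flight at once. A real change would also have to decide how the connect result is reported back to the timer thread, which is left out here.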

The latter point is also tracked in https://www.quickfixj.org/jira/browse/QFJ-555.
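One possible way to add threads while still keeping a given Session on one thread is to use several single-threaded executors and pick one by hashing a session key. The sketch below assumes that "the same thread" is meant per Session. The class name, the string session key and the hashing scheme are hypothetical and are not taken from QFJ-555.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PinnedSessionExecutorSketch {

    // One single-threaded executor per worker, so each worker is one fixed thread.
    private final ExecutorService[] workers;

    PinnedSessionExecutorSketch(int threads) {
        workers = new ExecutorService[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = Executors.newSingleThreadExecutor();
        }
    }

    /**
     * Runs the task on the worker that owns the given session key and
     * returns the name of the thread that executed it.
     */
    Future<String> submit(String sessionKey, Runnable task) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(sessionKey.hashCode(), workers.length);
        return workers[index].submit(() -> {
            task.run();
            return Thread.currentThread().getName();
        });
    }

    void shutdown() {
        for (ExecutorService worker : workers) {
            worker.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        PinnedSessionExecutorSketch pool = new PinnedSessionExecutorSketch(4);
        String first = pool.submit("FIX.4.4:A->B", () -> { }).get();
        String second = pool.submit("FIX.4.4:A->B", () -> { }).get();
        System.out.println("same thread for same session: " + first.equals(second));
        pool.shutdown();
    }
}
```

Because every task for one session key lands on the same single-threaded executor, calls for that session are serialized on one thread, while different sessions can be spread over several threads. A slow session still delays the other sessions that hash to the same worker.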

Will open a PR for this shortly. Just opening this issue so I don't forget about it.
