
RedisMessageListenerContainer has race conditions [DATAREDIS-389] #964


Closed
spring-projects-issues opened this issue Apr 1, 2015 · 9 comments

Comments

@spring-projects-issues

Adrian Riley opened DATAREDIS-389 and commented

The lazyListen() method sets the listening flag to true, then starts a Thread to run the SubscriptionTask. Calls to addMessageListener() then use the same JedisConnection to subscribe to further topics. But there is no guard on the SubscriptionTask startup process: a call to addMessageListener() may arrive before that startup is complete, so the subscription may be lost entirely, or both threads may write to the same output stream at the same time, sending Redis a corrupted command.

There is a similar issue when the task is shut down after all topics have been unsubscribed: a new subscribe command can be sent on the connection before it is returned to the pool. If that connection is then used for non-subscription commands, an error occurs.
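The first race can be reduced to a check-then-act flaw. The following is a minimal, self-contained model of that flaw, not the actual Spring Data Redis internals; all names (RacyContainer, connectionReady) are illustrative. The latch stands in for a slow connection setup:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

// Illustrative model: lazyListen() flips `listening` to true before the
// background task has finished its setup, so a concurrent
// addMessageListener() call hits a half-initialized connection and the
// subscription is silently dropped.
class RacyContainer {
    volatile boolean listening = false;       // set too early by lazyListen()
    volatile boolean connectionReady = false; // set by the background task
    final List<String> subscribed = new CopyOnWriteArrayList<>();

    void lazyListen(CountDownLatch setupDone) {
        listening = true; // flag flipped before the task completes its startup
        new Thread(() -> {
            try {
                setupDone.await(); // stands in for slow connection setup
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            connectionReady = true;
        }).start();
    }

    void addMessageListener(String topic) {
        if (listening && !connectionReady) {
            return; // models the lost subscription: command never reaches Redis
        }
        subscribed.add(topic);
    }
}
```

A registration that arrives in the window between `listening = true` and `connectionReady = true` vanishes without any error, which is exactly why the bug is so hard to notice.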

Attached is a TestNG test which demonstrates the problem intermittently. You can modify the two constants to control the multi-threaded execution; for me it usually fails every few runs. You may also hit redis/jedis#933


Affects: 1.5 GA (Fowler)

Reference URL: http://stackoverflow.com/questions/29353615/spring-data-redis-redismessagelistenercontainer-seems-to-have-race-conditions

Attachments:

4 votes, 6 watchers

Adrian Riley commented

If you want to be sure, run the test in a debugger and set a breakpoint in SubscriptionTask.run after the connection has been acquired and before the call to eventuallyPerformSubscription(). You will find that all subsequent calls to addMessageListener() appear successful, but in fact no subscribe commands are sent to Redis (use redis-cli monitor to see the commands).


Markus Heiden commented

The current version (2.3.1) still has the same, or a similar, race condition. It causes connection.getSubscription() in at least RedisMessageListenerContainer.SubscriptionTask#subscribeChannel() to return null when the subscription thread is too slow. It would be nice if at least the else-case of if (sub != null) got some kind of logging, so the problem becomes obvious instead of the channel silently not being registered.
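The suggested logging might look like the sketch below. Everything here is an illustrative stub (SubscribeChannelSketch, the warnings list standing in for a logger), not the real subscribeChannel() implementation; it only shows where the silent drop happens and how to surface it:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative stub: when the subscription thread has not finished its setup,
// getSubscription() returns null and the channel registration is silently
// dropped. Logging the else-branch makes the race visible.
class SubscribeChannelSketch {
    interface Subscription { void subscribe(String... channels); }

    Subscription subscription;              // null until the task is ready
    final List<String> warnings = new ArrayList<>(); // stands in for a logger

    void subscribeChannel(String... channels) {
        Subscription sub = subscription;    // stands in for connection.getSubscription()
        if (sub != null) {
            sub.subscribe(channels);
        } else {
            // the suggested improvement: make the dropped registration visible
            warnings.add("No active subscription; channels "
                    + Arrays.toString(channels) + " were NOT registered");
        }
    }
}
```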


Markus Heiden commented

Suggestion: eventuallyPerformSubscription() should be guarded by the localMonitor, either somewhere internally or completely externally.
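A sketch of what such a guard could look like, assuming a container-wide monitor object (here called `monitor`, standing in for the localMonitor mentioned above; the rest of the class is illustrative): listener registrations wait on the monitor until the initial subscription setup has completed, so they can never interleave with it.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative guard: one monitor serializes the subscription setup and
// later addMessageListener() calls, closing the race window.
class GuardedContainer {
    private final Object monitor = new Object();
    private boolean subscriptionInitialized = false;
    final List<String> topics = new ArrayList<>();

    void eventuallyPerformSubscription() {
        synchronized (monitor) {
            // ... open the connection and send the initial SUBSCRIBE here ...
            subscriptionInitialized = true;
            monitor.notifyAll(); // wake any registrations that arrived early
        }
    }

    void addMessageListener(String topic) {
        synchronized (monitor) {
            while (!subscriptionInitialized) {
                try {
                    monitor.wait(); // block until the initial subscribe completed
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
            topics.add(topic);
        }
    }
}
```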


Markus Heiden commented

A working but suboptimal workaround is to use a separate RedisMessageListenerContainer for each listener registration.

Another is to wait roughly a second (depending on the speed of your network and Redis) after the initial listener registration. After that, the race condition no longer occurs.


Mark Paluch commented

RedisMessageListenerContainer has a fairly extensive synchronization mechanism that allows calls to make progress when used concurrently. Markus Heiden, do you want to submit a pull request to improve the synchronization? Logging races isn't really addressing the issue. The alternative is to revamp RedisMessageListenerContainer with a simplified design that blocks start() and addMessageListener() until the container is in a proper state, avoiding the races. The redesign is something we need to discuss in our team.


Markus Heiden commented

I do not completely understand why RedisMessageListenerContainer is that complex, so I don't think I can provide a fix or re-implementation. I created the PR because we spent many hours debugging the problem. We spent that much time because we did not consider Spring to be the problem; this hit us during a migration to GCP, so we assumed the changed environment was at fault.

The intention of the PR is to improve the logging so that others are not hit by the race condition without even noticing it. If they are hit, the PR provides a hint to the problem and possible solutions. Either way, IMO logging is better than silently ignoring the problem.


Mark Paluch commented

RedisMessageListenerContainer tries to minimize the duration in which it blocks a calling thread. It aims for good runtime behavior, but that aspiration leads to a lot of complexity. The container also ensures a certain degree of resiliency, and it caters for the blocking vs. non-blocking behavior of drivers (Jedis blocks when calling subscribe/psubscribe while Lettuce is non-blocking), which eventually adds up to the state where it currently is. We briefly discussed this and agreed in the team to rewrite RedisMessageListenerContainer using a simplified synchronization approach.
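The core idea of such a simplified approach, sketched minimally (ConfirmedStartContainer and the CompletableFuture usage are illustrative assumptions, not the actual rewrite): start() blocks the caller until the subscription is confirmed, so nothing can race against a half-ready connection.

```java
import java.util.concurrent.CompletableFuture;

// Illustrative sketch: trade a short blocking wait in start() for the
// elimination of the "guessed" registration state.
class ConfirmedStartContainer {
    final CompletableFuture<Void> confirmed = new CompletableFuture<>();

    void start() {
        // the subscription still runs on its own thread, as before
        new Thread(this::subscriptionLoop).start();
        // but start() now blocks until the confirmation arrived, so no
        // listener can be registered against a half-ready connection
        confirmed.join();
    }

    private void subscriptionLoop() {
        // ... connect and send SUBSCRIBE here; once the server confirms:
        confirmed.complete(null);
    }
}
```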

@spring-projects-issues spring-projects-issues added type: bug A general bug in: jedis Jedis driver labels Dec 30, 2020
@mp911de mp911de changed the title RedisMessageListenerContainer seems to have race conditions [DATAREDIS-389] RedisMessageListenerContainer has race conditions [DATAREDIS-389] Feb 9, 2022
christophstrobl pushed a commit that referenced this issue Feb 17, 2022
RedisMessageListenerContainer is now reimplemented using non-blocking synchronization guards and state management to simplify its maintenance. Additionally, listener registration and subscription setup through the start() method now wait until the listener subscription is confirmed by the Redis server. The synchronization removes potential race conditions that could occur through concurrent access to blocking Redis connectors, where the registration state was guessed rather than awaited.

Resolves: #964
Original Pull Request: #2256
christophstrobl pushed a commit that referenced this issue Feb 17, 2022
@christophstrobl christophstrobl added this to the 2.7 M3 (2021.2.0) milestone Feb 17, 2022
@jebeaudet

I just hit that bug using Spring Session, which uses the same container for a pattern topic plus a channel topic. Some funky race condition leads to a silently unsubscribed pattern topic and corrupted streams on shutdown.

I saw that it was rewritten on the 2.7 branch, but are you open to fixing some race conditions on the 2.5 and 2.6 branches?

Thanks

@christophstrobl

@jebeaudet we are. Please open a new issue (or a PR if you have the time).

jebeaudet added a commit to jebeaudet/spring-data-redis that referenced this issue Apr 11, 2022
RedisMessageListenerContainer relies on 2 threads for subscription when both pattern and channel topics are present. With Jedis, since the subscription thread blocks while listening for messages, an additional thread is used to subscribe to patterns while the subscription thread subscribes to channels and blocks.

There were some race conditions between those two threads that could corrupt the Jedis stream, since operations are not synchronized in JedisSubscription. A lock on the JedisSubscription instance has been added to enforce that operations on the Jedis stream cannot be affected by a concurrent thread.

Additionally, there was no error handling or retry mechanism on the pattern subscription thread. Multiple conditions could trigger unexpected behavior here; exceptions were not handled and were logged to stderr with no notice. Also, if the connection was not subscribed after 3 tries, the thread would exit silently with no log. Defensive measures have been added to retry Redis connection failures, and the subscription will now retry indefinitely unless canceled on shutdown or on main subscription thread errors.

Fixes spring-projects#964 for versions before spring-projects#2256 was introduced.
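The locking fix described in that commit boils down to serializing writers of the shared stream. A minimal sketch, assuming illustrative names (LockedSubscription, with a list standing in for the Jedis output stream); the real change synchronizes inside JedisSubscription:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: the channel-subscribing thread and the
// pattern-subscribing thread both write to one stream; synchronizing on the
// subscription instance guarantees their commands cannot interleave.
class LockedSubscription {
    final List<String> stream = new ArrayList<>(); // stands in for the Jedis output stream

    void subscribe(String... channels) {
        synchronized (this) { // one writer at a time; no corrupted commands
            for (String c : channels) stream.add("SUBSCRIBE " + c);
        }
    }

    void pSubscribe(String... patterns) {
        synchronized (this) {
            for (String p : patterns) stream.add("PSUBSCRIBE " + p);
        }
    }
}
```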