RedisMessageListenerContainer has race conditions [DATAREDIS-389] #964
Comments
Adrian Riley commented If you want to be sure, run the test in a debugger and set a breakpoint in SubscriptionTask.run after the connection has been acquired and before the call to eventuallyPerformSubscription(). You will find that all subsequent calls to addMessageListener appear to succeed, but in fact no subscribe commands are sent to Redis (use redis-cli monitor to see the commands).
Markus Heiden commented The current version (2.3.1) has still the same/a similar race condition. This race condition causes the |
Markus Heiden commented Suggestion: |
Markus Heiden commented A working but suboptimal workaround is to use a separate RedisMessageListenerContainer for each listener registration. Another is to wait about a second (depending on the speed of your network and Redis) after the initial listener registration. After that, the race condition no longer occurs.
Mark Paluch commented
|
Markus Heiden commented I do not completely understand why the RedisMessageListenerContainer is that complex, so I don't think that I can provide a fix or re-implementation. I created the PR because we spent many hours debugging the problem. We spent that much time because we did not consider Spring to be the problem. This problem hit us during the migration to GCP, so we assumed the changed environment was the cause. The intention of the PR is to improve the logging so that others are not hit by the race condition without even noticing it. If they are hit, the PR provides a hint to the problem and possible solutions. Anyway, IMO logging is better than silently ignoring the problem.
Mark Paluch commented
|
RedisMessageListenerContainer is now reimplemented using non-blocking synchronization guards and state management to simplify its maintenance. Additionally, listener registration and subscription setup through the start() method now wait until the listener subscription is confirmed by the Redis server. The synchronization removes potential race conditions that could arise from concurrent access to blocking Redis connectors, in which the registration state was guessed rather than awaited. Resolves: #964 Original Pull Request: #2256
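To illustrate the idea of awaiting confirmation instead of guessing the registration state, here is a minimal sketch using only the JDK. It is not the Spring Data Redis implementation: the class `AwaitingSubscriber`, its `State` enum, and the methods `start` and `onSubscribed` are hypothetical names chosen for this example, and the "subscribe command" is just a `Runnable` supplied by the caller.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; names and structure are assumptions, not the library's code.
final class AwaitingSubscriber {

    enum State { IDLE, SUBSCRIBING, SUBSCRIBED }

    private final AtomicReference<State> state = new AtomicReference<>(State.IDLE);
    private final CompletableFuture<Void> confirmed = new CompletableFuture<>();

    /** Sends the subscribe command once, then waits until the server confirms it. */
    boolean start(Runnable sendSubscribe, long timeoutMillis) {
        // compareAndSet acts as a non-blocking guard: only one caller moves IDLE -> SUBSCRIBING.
        if (state.compareAndSet(State.IDLE, State.SUBSCRIBING)) {
            sendSubscribe.run();
        }
        try {
            // Await the confirmation rather than assuming the subscription is active.
            confirmed.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return state.get() == State.SUBSCRIBED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException | TimeoutException e) {
            return false;
        }
    }

    /** Invoked (for example from a reader thread) when the server acknowledges the subscription. */
    void onSubscribed() {
        state.set(State.SUBSCRIBED);
        confirmed.complete(null);
    }
}
```

In this sketch a caller of `start` returns `true` only after `onSubscribed` has run, so any listener registration that follows can rely on the subscription actually being active.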
I just hit that bug using Spring Session, which uses the same container for a pattern topic and a channel topic. A funky race condition leads to a silently unsubscribed pattern topic and corrupted streams on shutdown. I saw that it was rewritten on the 2.7 branch, but are you open to fixing some race conditions on the 2.5 and 2.6 branches? Thanks
@jebeaudet we are. Please open a new issue (or a PR if you have the time).
RedisMessageListenerContainer relies on 2 threads for subscription when both pattern and channel topics are present. With Jedis, since the subscription thread blocks while listening for messages, an additional thread is used to subscribe to patterns while the subscription thread subscribes to channels and blocks. There were some race conditions between those two threads that could corrupt the Jedis stream, since operations are not synchronized in JedisSubscription. A lock on the JedisSubscription instance has been added to ensure that operations on the Jedis stream cannot be affected by a concurrent thread. Additionally, there was no error handling or retry mechanism on the pattern subscription thread. Multiple conditions could trigger unexpected behavior here: exceptions were not handled and were only logged to stderr without notice. Also, if the connection was not subscribed after 3 tries, the thread would exit silently with no log. Defensive measures have been added to retry Redis connection failures, and the subscription will now retry indefinitely unless it is canceled on shutdown or by an error on the main subscription thread. Fixes spring-projects#964 for versions before spring-projects#2256 was introduced.
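The two measures described in this commit message (a lock around writes to the shared stream, and a retry loop that only stops on cancellation) can be sketched with plain JDK classes. This is an illustration under assumptions, not the actual patch: `GuardedSubscription`, `RetryingSubscriber`, and `subscribeWithRetry` are hypothetical names, and a `StringBuilder` stands in for the Jedis output stream.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for a subscription whose writes must not interleave.
final class GuardedSubscription {
    private final Object lock = new Object();
    private final StringBuilder wire = new StringBuilder(); // stands in for the shared output stream

    void send(String command, String argument) {
        synchronized (lock) { // one whole command is written at a time
            wire.append(command).append(' ').append(argument).append('\n');
        }
    }

    String wire() {
        synchronized (lock) {
            return wire.toString();
        }
    }
}

// Hypothetical retry helper: keeps trying until success or cancellation.
final class RetryingSubscriber {
    private static final Logger LOG = Logger.getLogger("RetryingSubscriber");

    /** Returns the number of attempts needed, or -1 if cancelled or interrupted. */
    static int subscribeWithRetry(BooleanSupplier attempt, AtomicBoolean cancelled, long backoffMillis) {
        int attempts = 0;
        while (!cancelled.get()) {
            attempts++;
            try {
                if (attempt.getAsBoolean()) {
                    return attempts;
                }
            } catch (RuntimeException e) {
                // Log failures explicitly instead of letting the thread die silently.
                LOG.log(Level.WARNING, "Subscription attempt " + attempts + " failed", e);
            }
            try {
                Thread.sleep(backoffMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        return -1;
    }
}
```

The design choice worth noting is that the loop has no fixed attempt limit; the only exits are success, cancellation through the flag, or interruption, which mirrors the "retry indefinitely, unless canceled" behavior described above.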
Adrian Riley opened DATAREDIS-389 and commented
The lazyListen() method sets the listening flag to true, then starts a Thread to run the SubscriptionTask. Calls to addMessageListener() then use the same JedisConnection to subscribe to further topics. But there is no guard on the SubscriptionTask startup process: a call to addMessageListener() may arrive before that process is complete, so the subscription may be lost entirely, or both threads may write to the same output stream at the same time, so that Redis is sent a corrupted command.
There is a similar issue when the task is shut down after all topics have been unsubscribed. It is possible for a new subscribe command to be sent on the connection before it is returned to the pool. If that connection is then used for non-subscription commands, an error occurs.
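One possible shape for the missing startup guard is to queue topics that arrive before the subscription task is ready and flush them once it is. The sketch below is only an illustration of that idea using the JDK; `StartupGuard`, `addTopic`, and `markReady` are hypothetical names, and a list of strings stands in for the commands written to Redis.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical guard; not taken from RedisMessageListenerContainer.
final class StartupGuard {
    private final Object monitor = new Object();
    private final List<String> pending = new ArrayList<>();
    private final List<String> sent = new ArrayList<>(); // stands in for commands written to Redis
    private boolean ready;

    /** May be called from any thread; never writes before the task is ready. */
    void addTopic(String topic) {
        synchronized (monitor) {
            if (ready) {
                sent.add("SUBSCRIBE " + topic);
            } else {
                pending.add(topic); // remembered, not lost
            }
        }
    }

    /** Called once by the subscription task after it has acquired the connection. */
    void markReady() {
        synchronized (monitor) {
            ready = true;
            for (String topic : pending) {
                sent.add("SUBSCRIBE " + topic);
            }
            pending.clear();
        }
    }

    /** Snapshot of the commands sent so far. */
    List<String> sent() {
        synchronized (monitor) {
            return new ArrayList<>(sent);
        }
    }
}
```

Because the readiness check and the write happen under the same monitor, a registration that races with startup is either queued or sent in full, never dropped and never interleaved with the task's own writes.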
Attached is a TestNG test which shows the problem, sometimes. You can modify the two constants to control the multi-threaded execution, but for me it usually fails every few times I run it. You may also hit the issue redis/jedis#933
Affects: 1.5 GA (Fowler)
Reference URL: http://stackoverflow.com/questions/29353615/spring-data-redis-redismessagelistenercontainer-seems-to-have-race-conditions
Attachments:
4 votes, 6 watchers