Queue process termination and restart causes consumer to stop receiving messages #13369
Describe the bug
We are noticing some odd behavior when our queue processes crash. The cause of the crashes is hard to identify; we assume that some Windows security software is interacting poorly with RabbitMQ. When the queues crash, the consumers attached to those queues stop receiving messages, and they still receive nothing after the queues automatically restart. The channels used by these consumers never encounter a ChannelShutdown event, so our software does not detect that anything has gone wrong with the RabbitMQ process.
Snippet of queue crash stack:
Snippet of queue restart:
The link below seems to point to a related issue, and the guidance there was to check the security software. We have done this to the best of our ability by disabling the security software that was exposed to us, but we don't fundamentally control the host, so it's possible that other security software is interacting with RabbitMQ. Any guidance on:
Reproduction steps
1. Have a consumer with a channel that is bound to Queue A
Expected behavior
Additional context
Versions:
…is the line you are looking for. A filesystem operation has failed with …

Queue replicas and channels are completely independent of each other, although channels can monitor queues. Channels do retry some operations with a delay when a queue does not have an elected leader (in the case of quorum queues and streams), so for non-transient data, use one of those two queue types. For transient queues (specifically exclusive ones, since non-exclusive non-durable queues are going away later in the 4.x series), there is already #12949, which is not as trivial to address as it may sound.
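The recommendation to use quorum queues or streams for non-transient data can be sketched in Python. This is a minimal sketch, assuming the pika client (not mentioned in this thread) or any client whose channel exposes a pika-style `queue_declare(queue=..., durable=..., exclusive=..., auto_delete=..., arguments=...)`. The helper names `replicated_queue_arguments` and `declare_replicated_queue` are hypothetical. The sketch selects the queue type through the `x-queue-type` declaration argument.

```python
from typing import Any, Dict

# Queue types recommended in the reply for non-transient data: channels retry
# some operations with a delay while these queues elect a new leader.
REPLICATED_QUEUE_TYPES = ("quorum", "stream")


def replicated_queue_arguments(queue_type: str = "quorum") -> Dict[str, Any]:
    """Return the optional queue.declare arguments that select a replicated queue type."""
    if queue_type not in REPLICATED_QUEUE_TYPES:
        raise ValueError(f"expected one of {REPLICATED_QUEUE_TYPES}, got {queue_type!r}")
    return {"x-queue-type": queue_type}


def declare_replicated_queue(channel: Any, name: str, queue_type: str = "quorum") -> Any:
    """Declare `name` as a quorum queue or stream on a pika-style channel (hypothetical helper).

    Quorum queues and streams must be durable and cannot be exclusive, so those
    flags are fixed here instead of being left to the caller.
    """
    return channel.queue_declare(
        queue=name,
        durable=True,
        exclusive=False,
        auto_delete=False,
        arguments=replicated_queue_arguments(queue_type),
    )
```

With pika, the channel could come from `pika.BlockingConnection(pika.ConnectionParameters("localhost")).channel()`. A queue's type is fixed when it is declared, so an existing classic queue cannot simply be redeclared with `x-queue-type: quorum`. RabbitMQ rejects the inequivalent arguments, so such a queue would have to be deleted and recreated, or a new queue name used.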
Hi guys,

This issue appears to be related to what we previously encountered here. We still experience this problem occasionally, but only on Windows. It seems to be caused by an unknown antivirus product blocking our process, leading to consumer crashes with no recovery options. Does anyone know of a reference page that lists similar cases and recommended antivirus exclusions to mitigate this issue? Such a resource could also help when troubleshooting similar problems. Thanks in advance!
Thanks to @tomerrt, we have learned that a filesystem API behavior that can result in difficult-to-explain `eacces` errors is not just Windows-specific: it is specific to a certain set of Windows versions, and the newest versions should not be affected.

While our current workaround helps, it must cover all file deletions, which means a fair number of places even in the CQ (classic queue) storage layer alone. Adopting a newer Windows version can be a viable option that would help those running RabbitMQ 3.13.x or any 4.x version before 4.1.2, where the first slew of workarounds shipped.
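RabbitMQ's actual workaround lives in its Erlang storage code and is not reproduced here. As a generic illustration of the retry-on-`eacces` pattern for file deletions, here is a minimal Python sketch. `delete_with_retry`, its parameters, and the backoff values are hypothetical. The sketch assumes the failure is transient (for example, another process briefly holding the file open), which may not hold for every security product.

```python
import errno
import os
import time
from typing import Callable


def delete_with_retry(path: str, attempts: int = 5, delay: float = 0.05,
                      remove: Callable[[str], None] = os.remove) -> None:
    """Delete `path`, retrying with exponential backoff when the OS reports EACCES.

    A missing file counts as already deleted. Any other error, or EACCES on the
    final attempt, is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            remove(path)
            return
        except FileNotFoundError:
            # Already gone: treat as success so repeated deletes are harmless.
            return
        except PermissionError as exc:
            # On Windows, deleting a file that another process holds open
            # typically surfaces as PermissionError with errno EACCES.
            if exc.errno != errno.EACCES or attempt == attempts:
                raise
            time.sleep(delay * 2 ** (attempt - 1))
```

The `remove` parameter exists only so the retry logic can be exercised without a real locked file. As the reply notes, a wrapper like this has to be applied at every deletion site to be effective.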