Node fails to start with error access to vhost '/' refused for user 'XXXX': vhost '/' is down #10052

ravisingh-adeptia · 2023-12-05T12:13:29Z


I have deployed RabbitMQ as a StatefulSet on a Kubernetes cluster, and I frequently get the error access to vhost '/' refused for user 'XXXX': vhost '/' is down.

RabbitMQ version: 3.9.7 on Erlang 24.1.1 [jit]

Error stack:
[info] <0.508.0> Making sure data directory '/bitnami/rabbitmq/mnesia/adeptia-rabbitmq/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L' for vhost '/' exists
Starting message stores for vhost '/'
2023-08-26 01:45:04.966313+00:00 [info] <0.513.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_transient": using rabbit_msg_store_ets_index to provide index
2023-08-26 01:45:05.149240+00:00 [erro] <0.508.0> Failed to start message store of type msg_store_transient for vhost '/': {{{badmatch,
{error,
{"/bitnami/rabbitmq/mnesia/adeptia-rabbitmq/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/msg_store_transient",
eexist}}},
[{rabbit_msg_store,
init,
1,
[{file,
"rabbit_msg_store.erl"},
{line,
724}]},
{gen_server2,
init_it,
6,
[{file,
"gen_server2.erl"},
{line,
565}]},
{proc_lib,
init_p_do_apply,
3,
[{file,
"proc_lib.erl"},
{line,
226}]}]},
{child,
undefined,
msg_store_transient,
{rabbit_msg_store,
start_link,
[msg_store_transient,
"/bitnami/rabbitmq/mnesia/adeptia-rabbitmq/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L",
undefined,
{#Fun<rabbit_variable_queue.0.124157698>,
ok}]},
transient,
600000,
worker,
[rabbit_msg_store]}}
2023-08-26 01:45:05.149443+00:00 [erro] <0.513.0> crasher:
initial call: rabbit_msg_store:init/1
pid: <0.513.0>
registered_name: []
exception exit: {{badmatch,
{error,
{"/bitnami/rabbitmq/mnesia/adeptia-rabbitmq/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/msg_store_transient",
eexist}}},
[{rabbit_msg_store,init,1,
[{file,"rabbit_msg_store.erl"},{line,724}]},
{gen_server2,init_it,6,
[{file,"gen_server2.erl"},{line,565}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,226}]}]}
in function gen_server2:init_it/6 (gen_server2.erl, line 608)
ancestors: [<0.507.0>,<0.506.0>,rabbit_vhost_sup_sup,rabbit_sup,
<0.223.0>]
message_queue_len: 0
messages: []
links: [<0.507.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 2586
stack_size: 29
reductions: 3071
neighbours:

2023-08-26 01:45:05.232984+00:00 [erro] <0.508.0> Unable to recover vhost <<"/">> data. Reason {error,
{{{badmatch,
{error,
{"/bitnami/rabbitmq/mnesia/adeptia-rabbitmq/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/msg_store_transient",
eexist}}},
[{rabbit_msg_store,init,1,
[{file,
"rabbit_msg_store.erl"},
{line,724}]},
{gen_server2,init_it,6,
[{file,"gen_server2.erl"},
{line,565}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},
{line,226}]}]},
{child,undefined,
msg_store_transient,
{rabbit_msg_store,start_link,
[msg_store_transient,
"/bitnami/rabbitmq/mnesia/adeptia-rabbitmq/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L",
undefined,
{#Fun<rabbit_variable_queue.0.124157698>,
ok}]},
transient,600000,worker,
[rabbit_msg_store]}}}
Stacktrace [{rabbit_variable_queue,do_start_msg_store,4,
[{file,"rabbit_variable_queue.erl"},
{line,503}]},
{rabbit_variable_queue,start_msg_store,3,
[{file,"rabbit_variable_queue.erl"},
{line,488}]},
{rabbit_variable_queue,start,2,
[{file,"rabbit_variable_queue.erl"},
{line,479}]},
{rabbit_priority_queue,start,2,
[{file,"rabbit_priority_queue.erl"},
{line,85}]},
{rabbit_classic_queue,recover,2,
[{file,"rabbit_classic_queue.erl"},
{line,124}]},
{rabbit_queue_type,'-recover/2-fun-2-',4,
[{file,"rabbit_queue_type.erl"},{line,387}]},
{maps,fold_1,3,[{file,"maps.erl"},{line,410}]},
{rabbit_vhost,recover,1,[{file,"rabbit_vhost.erl"},{line,60}]}]
2023-08-26 01:45:05.233769+00:00 [erro] <0.506.0> supervisor: {<0.506.0>,rabbit_vhost_sup_wrapper}
errorContext: start_error
reason: {error,
{{{badmatch,
{error,
{"/bitnami/rabbitmq/mnesia/adeptia-rabbitmq/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/msg_store_transient",
eexist}}},
[{rabbit_msg_store,init,1,
[{file,"rabbit_msg_store.erl"},{line,724}]},
{gen_server2,init_it,6,
[{file,"gen_server2.erl"},{line,565}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,226}]}]},
{child,undefined,msg_store_transient,
{rabbit_msg_store,start_link,
[msg_store_transient,
"/bitnami/rabbitmq/mnesia/adeptia-rabbitmq/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L",
undefined,
{#Fun<rabbit_variable_queue.0.124157698>,ok}]},
transient,600000,worker,
[rabbit_msg_store]}}}
offender: [{pid,undefined},
{id,rabbit_vhost_process},
{mfargs,{rabbit_vhost_process,start_link,[<<"/">>]}},
{restart_type,permanent},
{significant,false},
{shutdown,300000},
{child_type,worker}]

2023-08-26 01:45:05.233645+00:00 [erro] <0.508.0> crasher:
initial call: rabbit_vhost_process:init/1
pid: <0.508.0>
registered_name: []
exception exit: {error,
{{{badmatch,
{error,
{"/bitnami/rabbitmq/mnesia/adeptia-rabbitmq/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/msg_store_transient",
eexist}}},
[{rabbit_msg_store,init,1,
[{file,"rabbit_msg_store.erl"},{line,724}]},
{gen_server2,init_it,6,
[{file,"gen_server2.erl"},{line,565}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,226}]}]},
{child,undefined,msg_store_transient,
{rabbit_msg_store,start_link,
[msg_store_transient,
"/bitnami/rabbitmq/mnesia/adeptia-rabbitmq/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L",
undefined,
{#Fun<rabbit_variable_queue.0.124157698>,
ok}]},
transient,600000,worker,
[rabbit_msg_store]}}}
in function gen_server2:init_it/6 (gen_server2.erl, line 600)
ancestors: [<0.506.0>,rabbit_vhost_sup_sup,rabbit_sup,<0.223.0>]
message_queue_len: 0
messages: []
links: [<0.506.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 28690
stack_size: 29
reductions: 82553
neighbours:

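2023-08-26 01:45:05.235013+00:00 [warn] <0.223.0> Unable to initialize vhost data store for vhost '/'. The vhost will be stopped for this node. Reason: {shutdown, [rabbit_msg_store]}}}}}
2023-08-26 01:45:05.536729+00:00 [info] <0.527.0> Server startup complete; 8 plugins started.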
* rabbitmq_prometheus
* rabbitmq_peer_discovery_k8s
* rabbitmq_peer_discovery_common
* rabbitmq_auth_backend_ldap
* rabbitmq_delayed_message_exchange
* rabbitmq_management
* rabbitmq_web_dispatch
* rabbitmq_management_agent
2023-08-26 01:45:31.242987+00:00 [info] <0.713.0> accepting AMQP connection <0.713.0> (10.0.4.250:33358 -> 10.0.4.12:5671)
2023-08-26 01:45:31.245519+00:00 [info] <0.713.0> Connection <0.713.0> (10.0.4.250:33358 -> 10.0.4.12:5671) has a client-provided name: rabbitConnectionFactory#7bcc0392:22
2023-08-26 01:45:31.246976+00:00 [erro] <0.713.0> Error on AMQP connection <0.713.0> (10.0.4.250:33358 -> 10.0.4.12:5671 - rabbitConnectionFactory#7bcc0392:22, vhost: 'none', user: 'adeptia', state: opening), channel 0:
{handshake_error,opening,
{amqp_error,internal_error,
"access to vhost '/' refused for user 'adeptia': vhost '/' is down",
'connection.open'}}
2023-08-26 01:45:31.247628+00:00 [info] <0.713.0> closing AMQP connection <0.713.0> (10.0.4.250:33358 -> 10.0.4.12:5671 - rabbitConnectionFactory#7bcc0392:22, vhost: 'none', user: 'adeptia')
2023-08-26 01:45:31.254313+00:00 [info] <0.724.0> accepting AMQP connection <0.724.0> (10.0.4.250:33360 -> 10.0.4.12:5671)

Restarting the pod/container resolves the issue.
However, on the production cluster the user doesn't have privileges to restart the pod/container.
I would also like to understand why I am getting this error.

Answered by michaelklishin

Jul 10, 2025


michaelklishin · 2023-12-05T12:55:33Z

Maintainer

RabbitMQ 3.9 has reached EOL. You will not get any further support from the core team unless you upgrade to the latest supported version.

Unable to initialize vhost data store for vhost '/'

is the log line you are looking for. Something prevents a message store from starting on first boot. That something is an environment-specific problem: maybe the storage device is not ready on pod startup, or something like that.
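A quick way to check whether a node hit this condition is to search its log for that line. The sketch below is an illustration, not a RabbitMQ tool: the function name `vhost_init_failures` is hypothetical, and it simply prints each vhost named in an "Unable to initialize vhost data store" line, reading a file or, if no file is given, standard input.

```shell
# Hypothetical helper: print every vhost named in an
# "Unable to initialize vhost data store for vhost '...'" log line, once each.
# Reads the file given as $1, or standard input if no argument is passed.
vhost_init_failures() {
  sed -n "s/.*Unable to initialize vhost data store for vhost '\([^']*\)'.*/\1/p" ${1:+"$1"} |
    LC_ALL=C sort -u
}
```

For example, `kubectl logs <pod-name> | vhost_init_failures` would print `/` for a log like the one in the question (`<pod-name>` is a placeholder).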

2 replies

ravisingh-adeptia Dec 6, 2023
Author

@michaelklishin Which version should I upgrade RabbitMQ to?

michaelklishin Dec 6, 2023
Maintainer

The latest there is.

See the Upgrading RabbitMQ and Blue-Green Deployment guides before you try to jump straight from 3.9 to 3.12.10.
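As a rough illustration of what a rolling (non-blue-green) path from 3.9 might look like, here is a sketch that only prints one step per minor release series. It assumes, conservatively, that each rolling upgrade moves one minor series at a time and that all feature flags are enabled between steps with `rabbitmqctl enable_feature_flag all` (check that your `rabbitmqctl` version accepts `all`). The function name `upgrade_plan` is hypothetical, and the Upgrading RabbitMQ guide remains the authority on which paths are actually supported.

```shell
# Hypothetical helper: print a step-by-step rolling upgrade plan between two
# versions of the same major release, ASSUMING one minor series per step.
# It only prints text; it does not run anything.
upgrade_plan() {
  local from=$1 to=$2
  local from_major=${from%%.*} to_major=${to%%.*}
  local from_minor=${from#*.} to_minor=${to#*.}
  from_minor=${from_minor%%.*}   # e.g. "9.7"   -> "9"
  to_minor=${to_minor%%.*}       # e.g. "12.10" -> "12"
  if [ "$from_major" != "$to_major" ]; then
    echo "upgrade_plan: major version changes are not handled" >&2
    return 1
  fi
  local m=$((from_minor + 1))
  while [ "$m" -le "$to_minor" ]; do
    echo "upgrade all nodes to the latest $to_major.$m.x, then run: rabbitmqctl enable_feature_flag all"
    m=$((m + 1))
  done
}
```

Under these assumptions, `upgrade_plan 3.9.7 3.12.10` prints three steps (3.10.x, 3.11.x, 3.12.x); a blue-green deployment avoids the intermediate hops altogether.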

michaelklishin · 2025-07-10T12:32:55Z

Maintainer

I recall discussing this with another core team member, and our conclusion was the following: if the mounted volume is not yet ready for writes by the time the node boots, the node will fail to seed the data and the virtual host will then fail to start.

This is pretty clearly hinted at by one of the function names in the stack trace: rabbit_variable_queue:do_start_msg_store/4.
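Such hints are easier to spot when the frames are pulled out of the long crash report. Erlang stack frames have the form `{Module, Function, Arity, Location}`, so a rough sketch can strip whitespace (the log splits some frames across lines), match that shape, and print each frame as `module:function/arity`. The function name `stack_frames` is hypothetical, and the pattern only handles plain lowercase or single-quoted atoms.

```shell
# Hypothetical helper: list Erlang stack frames found in a crash report as
# module:function/arity, in order of appearance. All whitespace, including the
# line breaks that split frames in the log, is removed before matching.
# Reads the file given as $1, or standard input if no argument is passed.
stack_frames() {
  cat ${1:+"$1"} |
    tr -d '[:space:]' |
    grep -oE "[{][a-z][a-z0-9_]*,('[^']*'|[a-z][a-z0-9_]*),[0-9]+," |
    sed -E 's|^[{]([^,]+),(.*),([0-9]+),$|\1:\2/\3|'
}
```

Run on the "Stacktrace" section of the log above, it lists `rabbit_variable_queue:do_start_msg_store/4` alongside the other frames, including the `rabbit_msg_store:init/1` frame where the badmatch occurred.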

We have never seen this behavior outside of Kubernetes, and RabbitMQ nodes do not do anything creative when it comes to initializing the schema data store or the classic queue (CQ) message store. So there is nothing to "fix once and for all" in RabbitMQ.

A while ago we considered adding optional delays before and after the node boots, for very different reasons. The former (a delay before boot) might help here. Alternatively, you can inject a startup pause using a Kubernetes-specific method, e.g. an init container that verifies that the volume is ready (writable).
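A sketch of the check such an init container might run follows. The function names, the default of 30 attempts 2 seconds apart, and the probe directory name are all assumptions. Because the failure in the log above is reported for the msg_store_transient directory, the probe exercises directory creation and removal as well as a file write, not just a file write.

```shell
# Hypothetical helper: succeed only if DIR accepts a new subdirectory,
# a file written inside it, and removal of both.
volume_ready() {
  local probe="$1/.volume-probe.$$"
  mkdir "$probe" 2>/dev/null || return 1
  if ! { printf 'ok\n' > "$probe/file"; } 2>/dev/null; then
    rmdir "$probe" 2>/dev/null
    return 1
  fi
  rm -f "$probe/file" && rmdir "$probe"
}

# Hypothetical helper: retry volume_ready up to ATTEMPTS times (default 30),
# sleeping DELAY seconds (default 2) between tries; non-zero if never ready.
wait_for_volume() {
  local dir=$1 attempts=${2:-30} delay=${3:-2} i=1
  while [ "$i" -le "$attempts" ]; do
    if volume_ready "$dir"; then
      echo "volume ready: $dir"
      return 0
    fi
    echo "attempt $i/$attempts: $dir not ready" >&2
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "volume still not ready after $attempts attempts: $dir" >&2
  return 1
}
```

An init container that mounts the same volume could source this file and run, for example, `wait_for_volume /bitnami/rabbitmq/mnesia` (the data directory seen in the log above). It should run as the same user as the RabbitMQ container; otherwise a successful check does not prove the broker itself can write there.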

To our knowledge, this behavior has never been reported by users of our Kubernetes Cluster Operator, most likely because the Operator introduces a startup delay to work around a widely known, unfortunate CoreDNS caching behavior/default.

So, a similar delay will likely help with volumes not being ready early enough.


bdoublet91 · 2025-07-14T22:34:41Z


Hi, thanks for your reply. I hadn't checked the discussion topic.
For now, I am testing a fix that checks that the RabbitMQ data directory is writable before starting RabbitMQ in the container:

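#!/bin/bash
# Wait until the RabbitMQ data directory is writable, then start the server.

RABBITMQ_DATA_DIR="/var/lib/rabbitmq"
TEST_FILE="$RABBITMQ_DATA_DIR/.writetest"
MAX_ATTEMPTS=30
RETRY_DELAY=2   # seconds

echo "Checking if RabbitMQ volume is writable..."
SUCCESS="false"
# Try writing a temp file, retry if not writable
for i in $(seq 1 "$MAX_ATTEMPTS"); do
  if touch "$TEST_FILE" 2>/dev/null; then
    echo "Writable: $RABBITMQ_DATA_DIR"
    rm -f "$TEST_FILE"
    SUCCESS="true"
    break
  else
    echo "Attempt $i: Volume not writable yet, retrying in ${RETRY_DELAY}s..."
    sleep "$RETRY_DELAY"
  fi
done

if [[ "${SUCCESS}" == "false" ]]; then
  echo "ERROR: Volume $RABBITMQ_DATA_DIR is not writable after timeout." >&2
  exit 1
fi

# exec replaces this shell, so signals sent to the container (e.g. SIGTERM
# on stop) reach the entrypoint instead of this wrapper script.
exec /usr/local/bin/docker-entrypoint.sh rabbitmq-server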

I have also changed the mount directory from /var/lib/rabbitmq/mnesia to /var/lib/rabbitmq and bound it to my NFS share on the Docker host.
I also changed the update workflow to stop the old container before starting the new one, since concurrent access could perhaps lead to corrupted data:

    deploy:
      update_config:
        order: stop-first

I will try these fixes this week and give you feedback.
Thanks

