Mnesia and schema locks

Mnesia and schema locks

Loïc Hoguin-3
Hello,

We are trying to debug an issue where we observe a lot of contention
when a RabbitMQ node goes down. It has a number of symptoms and we are in
the middle of figuring things out.

One particular symptom occurs on the node that restarts: it gets stuck
and there are two Mnesia locks:

[{{schema,rabbit_durable_route},read,{tid,879886,<6502.2299.18>}},
  {{schema,rabbit_exchange},read,{tid,879887,<6502.2302.18>}}]

The locks are only cleared when the other node in the cluster stops
being so busy deleting data from a number of tables (another symptom)
and things go back to normal.

Part of the problem is that while this is going on, the restarting node
cannot be used, so I would like to understand what conditions can result
in these locks being held for so long. Any tips appreciated!

Thanks in advance,

--
Loïc Hoguin
https://ninenines.eu

Re: Mnesia and schema locks

Dan Gudmundsson-2
Well, you will need to figure out what <6502.2299.18> and <6502.2302.18> are doing,
but they are probably waiting for other locks which are held by the busy processes
you wrote about. You will have to look at that yourself; debugging mnesia is just
following the breadcrumbs around the system.

mnesia_locker:get_held_locks() and mnesia_locker:get_lock_queue() may also help.  
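
For example, something along these lines should print both from any shell in the
cluster (just a sketch; these are undocumented debug functions, so the exact shape
of what they return may vary between OTP releases):

    %% Print held locks and the lock queue on every running Mnesia node.
    Nodes = mnesia:system_info(running_db_nodes),
    [io:format("~p~n  held:  ~p~n  queue: ~p~n",
               [N,
                rpc:call(N, mnesia_locker, get_held_locks, []),
                rpc:call(N, mnesia_locker, get_lock_queue, [])])
     || N <- Nodes].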

Using observer to attach to the different nodes is probably easiest; then you can
get a stacktrace of each process. Normally when I do this I don't have a live system;
if I want to debug post mortem I use mnesia_lib:dist_coredump() to collect each
mnesia node's state and analyse them. Though with many nodes it will take some time
to debug or to figure out why it appears to be hanging.
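
As a sketch of the stacktrace part: once you know which node one of the lock-owning
pids lives on (the <6502.2299.18> in the lock output is a remote pid as seen from
the stuck node; on its home node the same process shows up as <0.2299.18>), you can
inspect it there:

    %% Run on the node that owns the process from the lock entry.
    Pid = list_to_pid("<0.2299.18>"),
    erlang:process_info(Pid, [registered_name, current_function,
                              current_stacktrace, message_queue_len]).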



Re: Mnesia and schema locks

Loïc Hoguin-3
Thanks, that helped a lot.

What we ended up doing was calling mnesia:set_debug_level(debug) and
subscribing to system events and schema table events using
mnesia:subscribe/1. This gave us both the transaction/lock that kept
getting restarted and the transaction/lock that was causing the
restarts. We then inspected things in Observer and got a very clear
view of what was going on.
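
Roughly, the calls were along these lines (a sketch, run from a shell on the
stuck node):

    %% Turn on Mnesia debug events, then subscribe the shell process to
    %% system events and to events on the schema table.
    mnesia:set_debug_level(debug),
    {ok, _} = mnesia:subscribe(system),
    {ok, _} = mnesia:subscribe({table, schema, simple}),
    %% Events arrive as {mnesia_system_event, _} and {mnesia_table_event, _}
    %% messages; flush() shows them in the shell.
    flush().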

By the way, is there a search function for finding a process in Observer?
That would be useful to find the ones we are looking for. :-)

Cheers,


--
Loïc Hoguin
https://ninenines.eu

Re: Mnesia and schema locks

Dan Gudmundsson-2


On Tue, Feb 20, 2018 at 2:53 PM Loïc Hoguin <[hidden email]> wrote:
> Thanks, that helped a lot.
>
> What we ended up doing was calling mnesia:set_debug_level(debug) and
> subscribing to system events and schema table events using
> mnesia:subscribe/1. This gave us both the transaction/lock that kept
> getting restarted and the transaction/lock that was causing the
> restarts. We then inspected things in Observer and got a very clear
> view of what was going on.

Great.

> By the way, is there a search function for finding a process in Observer?
> That would be useful to find the ones we are looking for. :-)

Not yet, but it sounds useful. You can sort the columns to ease the scrolling,
but no, I have not received a PR on that yet. :-)