Mnesia: strategy for auto-recovery from netsplit

Daniel Dormont
Hi Erlangers,

I'm running ejabberd with a two-node cluster in my production environment.
Today that system encountered a netsplit. It was properly recorded and
logged. But I need to work on some way to automate a solution for this. I'm
aware that the problem can't be solved in general, but there are two
mitigating factors in my case:

1 - Almost all of my tables are RAM-only.
2 - None of the data are truly critical for me. That is, loss of some
portion of the data isn't critical because my application can recover.

So in this case, I just picked a node, restarted ejabberd on it, and all is
well. But what I'd like to do is write some actual Erlang code that can
subscribe to the Mnesia partitioned network event and do something about
it. What are my options there?

thanks,
Dan
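A minimal sketch of the subscription asked about above, assuming Mnesia is already running on the node. A process that calls mnesia:subscribe(system) receives {mnesia_system_event, Event} messages, and Mnesia reports a partition as {inconsistent_database, Context, Node}, where Context is running_partitioned_network or starting_partitioned_network. This event is raised when the two nodes see each other again after the split, not at the moment the link drops. The module name netsplit_watch and the OnSplit callback are hypothetical. The subscription belongs to the calling process and is dropped when Mnesia stops, so in practice the watcher would have to be restarted along with Mnesia (for example by a supervisor).

```erlang
%% netsplit_watch (hypothetical module): subscribes to Mnesia system
%% events and calls OnSplit(RemoteNode) when a partition is reported.
-module(netsplit_watch).
-export([start/1, classify/1]).

%% Spawn the watcher; OnSplit is a fun taking the remote node name.
start(OnSplit) when is_function(OnSplit, 1) ->
    spawn(fun() -> init(OnSplit) end).

init(OnSplit) ->
    %% Events go to the process that subscribes, so subscribe from here.
    %% This crashes with a badmatch if Mnesia is not running.
    {ok, _Node} = mnesia:subscribe(system),
    loop(OnSplit).

loop(OnSplit) ->
    receive
        {mnesia_system_event, Event} ->
            case classify(Event) of
                {partitioned, Node} -> OnSplit(Node);
                ignore -> ok
            end,
            loop(OnSplit);
        _Other ->
            %% Drop unrelated messages so they do not pile up.
            loop(OnSplit)
    end.

%% Pure helper: map a Mnesia system event to an action.
classify({inconsistent_database, running_partitioned_network, Node}) ->
    {partitioned, Node};
classify({inconsistent_database, starting_partitioned_network, Node}) ->
    {partitioned, Node};
classify(_Event) ->
    ignore.
```

For example, on each cluster node: netsplit_watch:start(fun(Node) -> error_logger:warning_msg("Mnesia partitioned from ~p~n", [Node]) end).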

Vance Shipley
You seem to know what to do: restart one of the nodes, or at least restart
Mnesia. To restart a node programmatically, you can use init:restart/0.
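A sketch of how that could be automated for a two-node cluster. Both nodes see the partition, so they need a rule that makes exactly one of them restart. The tie-break used here (the node whose name sorts higher restarts itself) is a hypothetical policy; any deterministic rule that both nodes evaluate the same way would do. The module name netsplit_restart is also hypothetical. init:restart/0 takes down and reboots all applications inside the running emulator (the OS process keeps running) and returns ok immediately.

```erlang
%% netsplit_restart (hypothetical module): after a partition, restart
%% exactly one node of a two-node cluster.
-module(netsplit_restart).
-export([should_restart/2, handle_split/1]).

%% Hypothetical tie-break: the node whose name sorts higher restarts.
%% Each node calls this with the arguments swapped, so exactly one of
%% them gets true (node names are atoms, compared by their text).
should_restart(LocalNode, RemoteNode) ->
    LocalNode > RemoteNode.

%% Call this with the remote node reported in the partition event.
handle_split(RemoteNode) ->
    case should_restart(node(), RemoteNode) of
        true ->
            error_logger:warning_msg("Netsplit with ~p; restarting ~p~n",
                                     [RemoteNode, node()]),
            %% Reboots all applications inside this emulator;
            %% returns ok at once and the restart happens asynchronously.
            init:restart();
        false ->
            ok
    end.
```

The remote node name carried by Mnesia's {inconsistent_database, Context, Node} event is what would be passed to handle_split/1.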
On May 3, 2013 9:02 PM, "Daniel Dormont" <dan> wrote:

Michael Truog
In reply to Daniel Dormont
The only solution seems to be to use https://github.com/uwiger/unsplit, where you resolve any conflicts yourself. Someone may already have an integration with ejabberd available, but deciding which side of the netsplit to keep is likely to be error-prone, difficult, and sometimes impossible (depending on the data stored). I think it is simpler to just hook up ejabberd to PostgreSQL or MySQL instead of using Mnesia. Mnesia would still be used internally, but I don't think the internal Mnesia tables that don't move to PostgreSQL or MySQL are distributed (that would be good to check).
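Whether a table is actually distributed can be checked from a running node. The sketch below, a hypothetical helper module named mnesia_replicas, lists every table whose replicas (the ram_copies, disc_copies and disc_only_copies node lists returned by mnesia:table_info/2) span more than one node. It skips local_content tables, whose definitions are replicated but whose contents stay separate on each node. It assumes Mnesia is started on the node where it is called.

```erlang
%% mnesia_replicas (hypothetical module): find Mnesia tables whose
%% contents are replicated on more than one node.
-module(mnesia_replicas).
-export([distributed_tables/0, replica_nodes/3]).

%% Returns [{Table, Nodes}] for every shared table; Mnesia must be running.
distributed_tables() ->
    Tables = mnesia:system_info(tables) -- [schema],
    [{Tab, Nodes} || Tab <- Tables,
                     %% local_content tables keep separate data per node.
                     not mnesia:table_info(Tab, local_content),
                     Nodes <- [nodes_for(Tab)],
                     length(Nodes) > 1].

nodes_for(Tab) ->
    replica_nodes(mnesia:table_info(Tab, ram_copies),
                  mnesia:table_info(Tab, disc_copies),
                  mnesia:table_info(Tab, disc_only_copies)).

%% Pure helper: merge the per-storage-type node lists,
%% sorted and without duplicates.
replica_nodes(Ram, Disc, DiscOnly) ->
    lists:usort(Ram ++ Disc ++ DiscOnly).
```

Running mnesia_replicas:distributed_tables() from a shell on one of the ejabberd nodes would show which tables are exposed to a netsplit.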

On 05/03/2013 08:32 AM, Daniel Dormont wrote:

Daniel Dormont
It is distributed. For example, the mappings between Jabber IDs of various
kinds (user, chatroom, etc.) and process IDs are kept in distributed Mnesia
tables; in fact, this is the core of how clustered ejabberd works. So I
really do need to do something here.

A brief past experiment suggested that ejabberd did not take kindly to a
Mnesia restart on a live node, so I think I will have to restart the whole
node.

A related question while I'm thinking of it: are there any modules out
there (or error logger configuration options) that can hook into the error
logger and handle certain log messages differently, for example by sending
them by email?
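One way to do this with the stock error logger is a gen_event handler installed with error_logger:add_report_handler/2; messages logged with error_logger:error_msg/2 then reach handle_event/2 as {error, GroupLeader, {Pid, Format, Args}}. In this sketch the module name alert_handler, the Notify fun (which would send the email) and the assumption that Mnesia's partition report contains the text "inconsistent_database" are all hypothetical. A crash inside handle_event/2 removes the handler, so Notify should neither fail nor block for long. On OTP 21 and later the error logger is built on top of logger, but add_report_handler/2 is still available.

```erlang
%% alert_handler (hypothetical module): an error_logger report handler
%% that passes matching error messages to a Notify fun, e.g. to send mail.
-module(alert_handler).
-behaviour(gen_event).
-export([install/1, matches/2]).
-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

%% The substring to look for is an assumption about Mnesia's log text.
-define(PATTERN, "inconsistent_database").

install(Notify) when is_function(Notify, 1) ->
    error_logger:add_report_handler(?MODULE, Notify).

init(Notify) ->
    {ok, Notify}.

%% error_logger:error_msg/2 and error_logger:format/2 arrive like this.
handle_event({error, _GroupLeader, {_Pid, Format, Args}}, Notify) ->
    Text = format_text(Format, Args),
    case matches(Text, ?PATTERN) of
        true -> Notify(Text);
        false -> ok
    end,
    {ok, Notify};
handle_event(_Event, Notify) ->
    {ok, Notify}.

handle_call(_Request, Notify) ->
    {ok, ok, Notify}.

handle_info(_Info, Notify) ->
    {ok, Notify}.

terminate(_Reason, _Notify) ->
    ok.

code_change(_OldVsn, Notify, _Extra) ->
    {ok, Notify}.

%% Render the message; a malformed format yields "" instead of a crash.
format_text(Format, Args) ->
    try lists:flatten(io_lib:format(Format, Args))
    catch _:_ -> ""
    end.

%% Pure helper: true if Pattern occurs anywhere in Text.
matches(Text, Pattern) ->
    string:str(Text, Pattern) > 0.
```

Installing it would look like alert_handler:install(fun(Text) -> io:format("ALERT: ~s~n", [Text]) end), with the io:format call replaced by whatever actually sends the email.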

Dan


On Fri, May 3, 2013 at 12:02 PM, Michael Truog <mjtruog> wrote:
