Erlang mnesia node getting isolated from cluster

Erlang mnesia node getting isolated from cluster

Saurav Prakash
I have an Erlang (release 17.3) mnesia cluster of 3 nodes running in a single datacenter with disk+RAM based tables. Once in a while, one node at random, say A, shows the other 2 nodes as stopped (stopped_db_nodes). The other 2 nodes, say B and C, likewise show A in stopped_db_nodes. This effectively leaves the cluster partitioned although no network split actually happens. The call to erlang:nodes() on all 3 nodes returns the whole cluster. I don't even see mnesia system events for a partition, maybe because the Erlang node never went down.
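
One way to capture this symptom on each node is to compare what mnesia thinks is running against what Erlang distribution reports as connected. The sketch below assumes mnesia is already started on the node where it runs; the module and function names (`mnesia_view`, `stopped/2`, `suspicious/2`, `report/0`) are made up for illustration. It derives the stopped nodes as `db_nodes` minus `running_db_nodes`, which is an assumption about how that list is computed rather than a confirmed detail.

```erlang
-module(mnesia_view).
-export([report/0, stopped/2, suspicious/2]).

%% Schema nodes that mnesia does not currently consider running.
stopped(DbNodes, Running) ->
    DbNodes -- Running.

%% Nodes that mnesia treats as stopped although Erlang distribution
%% still has a connection to them: the situation described above.
suspicious(Stopped, Connected) ->
    [N || N <- Stopped, lists:member(N, Connected)].

%% Collect both views on the local node. Requires mnesia to be running.
report() ->
    DbNodes = mnesia:system_info(db_nodes),
    Running = mnesia:system_info(running_db_nodes),
    Stopped = stopped(DbNodes, Running),
    [{db_nodes, DbNodes},
     {running_db_nodes, Running},
     {stopped_db_nodes, Stopped},
     {connected_but_stopped, suspicious(Stopped, nodes())}].
```

Calling `mnesia_view:report()` on A, B and C at the same moment and comparing the results should show whether the two views really disagree on every node.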

Is there a bug somewhere in mnesia that causes false network partitions? What would be the right way to remedy this? We are thinking about turning majority on in the cluster.
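
For reference, turning majority on for existing tables could look roughly like the sketch below. It uses `mnesia:change_table_majority/2`; the module name `majority_sketch` and its helper functions are hypothetical, and skipping the `schema` table is an assumption. Note that majority only changes what happens while nodes are considered down (non-dirty updates abort unless a majority of the table's replicas is available); it does not by itself stop nodes from marking each other as stopped.

```erlang
-module(majority_sketch).
-export([enable_all/0, user_tables/1]).

%% Leave out the schema table; only user tables are changed here.
user_tables(Tables) ->
    [T || T <- Tables, T =/= schema].

%% Try to enable majority on every user table and return the result
%% per table: {atomic, ok} on success, {aborted, Reason} otherwise.
enable_all() ->
    Tabs = user_tables(mnesia:system_info(tables)),
    [{T, mnesia:change_table_majority(T, true)} || T <- Tabs].
```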


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Erlang mnesia node getting isolated from cluster

Dan Gudmundsson-3

If they report each other as down, then node A has lost its connection to the other nodes some time ago.
Or at least the process links between mnesia on A and B, and on A and C, were broken.

By default, Erlang distribution reconnects to a node as soon as you send a message to it, so the Erlang network between A and B will be
re-established. Mnesia, however, should detect that the network was partitioned and should not reconnect.
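
To check whether the distribution connection really dropped and came back, a small process that logs node up/down notifications may help. This is only a sketch: `net_kernel:monitor_nodes/1` and `error_logger:info_msg/2` are standard calls, while the module name `dist_watch` and its functions are invented for this example.

```erlang
-module(dist_watch).
-export([start/0, describe/1]).

%% Spawn a process that subscribes to nodeup/nodedown messages
%% and logs each one as it arrives.
start() ->
    spawn(fun() ->
                  net_kernel:monitor_nodes(true),
                  loop()
          end).

%% Turn a raw distribution message into a short description.
describe({nodeup, Node})   -> {connected, Node};
describe({nodedown, Node}) -> {disconnected, Node};
describe(_Other)           -> ignored.

loop() ->
    receive
        Msg ->
            case describe(Msg) of
                ignored -> ok;
                Event -> error_logger:info_msg("distribution event: ~p~n", [Event])
            end,
            loop()
    end.
```

A `disconnected` entry followed shortly by a `connected` entry for the same node would be consistent with a brief connection loss that distribution repaired on its own.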

So my guess is that the Erlang connection was down for a short period of time, or that during startup the nodes
never had (or could not get) a connection when mnesia started on A, and afterwards node A connected to B and C, where mnesia was already started.
But that should also have generated a partitioned_network event.
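
To make sure such an event is not missed, a process can subscribe to mnesia system events and log them. The sketch below assumes mnesia is running when `start/0` is called; the module name `mnesia_watch` is hypothetical. As far as I know, the partition case arrives as an `inconsistent_database` event whose context is `running_partitioned_network` or `starting_partitioned_network`.

```erlang
-module(mnesia_watch).
-export([start/0, classify/1]).

%% Spawn a process that subscribes to mnesia system events.
start() ->
    spawn(fun() ->
                  {ok, _Node} = mnesia:subscribe(system),
                  loop()
          end).

%% Pick out the events relevant to partition diagnosis.
classify({mnesia_system_event, {inconsistent_database, Context, Node}}) ->
    {partition, Context, Node};
classify({mnesia_system_event, {mnesia_down, Node}}) ->
    {down, Node};
classify({mnesia_system_event, {mnesia_up, Node}}) ->
    {up, Node};
classify(_Other) ->
    other.

loop() ->
    receive
        Msg ->
            case classify(Msg) of
                other -> ok;
                Event -> error_logger:warning_msg("mnesia event: ~p~n", [Event])
            end,
            loop()
    end.
```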

/Dan

On Fri, Feb 2, 2018 at 6:38 AM Saurav Prakash <[hidden email]> wrote:
I have an Erlang (release 17.3) mnesia cluster of 3 nodes running in a single datacenter with disk+RAM based tables. Once in a while, one node at random, say A, shows the other 2 nodes as stopped (stopped_db_nodes). The other 2 nodes, say B and C, likewise show A in stopped_db_nodes. This effectively leaves the cluster partitioned although no network split actually happens. The call to erlang:nodes() on all 3 nodes returns the whole cluster. I don't even see mnesia system events for a partition, maybe because the Erlang node never went down.

Is there a bug somewhere in mnesia that causes false network partitions? What would be the right way to remedy this? We are thinking about turning majority on in the cluster.
