Mnesia: inconsistent views without netsplit?

Mnesia: inconsistent views without netsplit?

Daniel Dormont
Hi all,

I have a three-node Mnesia cluster (hosting a somewhat outdated
version of ejabberd, but I'm not sure that matters). I have a table
that is stored as ram_copies on all three nodes, yet this table has
differing numbers of records across the three.

The table info from one of them is pasted below. Running the same
query on one of my other nodes, I get more or less the same result,
but the "size" is very different: 553 vs 867. And indeed, there are
individual records that turn up in mnesia:read/2 or
mnesia:dirty_read/2 on one node and not the other.
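
For anyone who wants to reproduce the comparison, something along
these lines works from one node's shell (a rough, untested sketch
using only standard mnesia and rpc calls; the table name is the one
in the dump below, and it assumes all nodes are currently reachable):

%% Which nodes hold a ram copy of the table:
Tab = muc_online_room.
Nodes = mnesia:table_info(Tab, ram_copies).
%% Record count per replica:
[{N, rpc:call(N, mnesia, table_info, [Tab, size])} || N <- Nodes].
%% Keys present locally but missing on another replica:
LocalKeys = mnesia:dirty_all_keys(Tab).
[{N, [K || K <- LocalKeys,
           rpc:call(N, mnesia, dirty_read, [Tab, K]) =:= []]}
 || N <- Nodes, N =/= node()].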

Yet, nothing in my log indicates that there was ever a netsplit or
disconnection. So I have two questions:

1) What might cause this? and
2) Is there any way, especially given that I know which records are
affected, to force some kind of replication on this table without
completely restarting one of the nodes?

thanks,
Dan Dormont


[{access_mode,read_write},
 {active_replicas,['[hidden email]',
                   '[hidden email]',
                   '[hidden email]']},
 {all_nodes,['[hidden email]',
             '[hidden email]',
             '[hidden email]']},
 {arity,3},
 {attributes,[name_host,pid]},
 {checkpoints,[]},
 {commit_work,[]},
 {cookie,{{1341,344810,207763},'ejabberd@10.86.211.63'}},
 {cstruct,{cstruct,muc_online_room,set,
                   ['[hidden email]',
                    '[hidden email]',
                    '[hidden email]'],
                   [],[],0,read_write,false,[],[],false,muc_online_room,
                   [name_host,pid],
                   [],[],[],{...},...}},
 {disc_copies,[]},
 {disc_only_copies,[]},
 {frag_properties,[]},
 {index,[]},
 {load_by_force,false},
 {load_node,'[hidden email]'},
 {load_order,0},
 {load_reason,{active_remote,'[hidden email]'}},
 {local_content,false},
 {majority,false},
 {master_nodes,[]},
 {memory,73643},
 {ram_copies,['[hidden email]',
              '[hidden email]',
              '[hidden email]']},
 {record_name,muc_online_room},
 {record_validation,{muc_online_room,3,set}},
 {type,set},
 {size,867},
 {snmp,[]},
 {storage_properties,...},
 {...}|...]

Re: Mnesia: inconsistent views without netsplit?

Dan Gudmundsson
1) No clue. But I would be interested if you have an idea of what has gone wrong.

2) mnesia:del_table_copy(...) followed by mnesia:add_table_copy(...) should re-copy the table from the other nodes.
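
Roughly something like this, run in a shell on the node whose copy
looks stale (untested sketch; the table name is taken from your dump
and the timeout is arbitrary):

%% Drop the local replica, then add it back so it is re-copied from
%% the remaining active replicas. While the local copy is absent,
%% reads on this node are served from the other replicas.
{atomic, ok} = mnesia:del_table_copy(muc_online_room, node()).
{atomic, ok} = mnesia:add_table_copy(muc_online_room, node(), ram_copies).
%% Block until the fresh copy has finished loading:
ok = mnesia:wait_for_tables([muc_online_room], 30000).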

Re: Mnesia: inconsistent views without netsplit?

Daniel Dormont
On Wed, Apr 27, 2016 at 3:03 AM, Dan Gudmundsson <[hidden email]> wrote:
> 1) No clue. But I would be interested if you have an idea of what has gone
> wrong.
>
> 2) mnesia:del_table_copy(...) followed by mnesia:add_table_copy(...) should
> re-copy the table from the other nodes.

Thanks. I'll give that a try.

Along the same lines, I was wondering: is there a setting I can use to
adjust the sensitivity of the system's detection of node disconnects,
either generically or specifically within Mnesia? My production
environment appears to have occasional momentary network hiccups (it's
Amazon EC2 instances spanning zones within a region, for anyone
curious). I'd like to make it less likely for those hiccups to cause
Mnesia to enter an inconsistent state, even if it means real failures
take a little longer to detect.

thanks,
Dan


Re: Mnesia: inconsistent views without netsplit?

Garret Smith
On Wed, Apr 27, 2016 at 12:32 PM, Daniel Dormont
<[hidden email]> wrote:

> Along the same lines I was wondering: is there a setting I can use to
> adjust the sensitivity of the system's detection of node disconnects,
> either generically or specifically within Mnesia? My production
> environment appears to have occasional momentary network hiccups (it's
> Amazon EC2 instances spanning zones within a region, for anyone
> curious). I'd like to make it less likely for those hiccups to cause
> Mnesia to enter an inconsistent state, even if it means real failures
> take a little longer to detect.

If the "hiccups" are high latency, you can look at adjusting
net_ticktime, documented here.
http://erlang.org/doc/man/kernel_app.html
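
For example (a sketch; the 120-second value is only illustrative, the
default is 60):

%% sys.config: raise net_ticktime so brief network hiccups are less
%% likely to be reported as nodedown, at the cost of taking longer to
%% detect real failures:
[{kernel, [{net_ticktime, 120}]}].

%% Or on the erl command line:    erl -kernel net_ticktime 120
%% Or at runtime on a live node:  net_kernel:set_net_ticktime(120).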
