gen_statem - state_timeout sometimes reaching callback module as [info, {timeout, Ref, Name}

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

gen_statem - state_timeout sometimes reaching callback module as [info, {timeout, Ref, Name}

Peter Morgan
Hello -

We are _sometimes_ seeing cases where a state_timeout in a gen_statem results in the timeout reaching the callback module as info {timeout, Ref, Name} with OTP 21.2.

The crash looks like:

=ERROR REPORT==== 8-Apr-2019::23:37:19.035346 === <0.969.0> gen_statem:error_info/5:1895
** State machine {kafire_fetcher,<<“abc">>,<<“def">>,0} terminating
** Last event = {info,{timeout,#Ref<0.2399112782.889192450.118024>,fetch}}

We crash because we are not expecting info messages - interestingly the following messages are in the queue for the crashed process:


    message_queue_len: 279
    messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
                  {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
                  {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
                  {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
                  {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
                  {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
                  {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
                  {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
                  {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
                  {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
                  {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
                  {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
                  {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
                  {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
                  {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
                  {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
                  {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},

Followed by lots more cancel_timer messages (200ish!). Our gen_statem does use {state_timeout, 500, fetch} so we are expecting a “fetch” timeout to happen, and we use repeat_state_and_data to requeue the state timeout (and also transition to other states). Is it possible that timeouts in gen_statem can be delivered as an info message?

Thanks
Peter.


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: gen_statem - state_timeout sometimes reaching callback module as [info, {timeout, Ref, Name}

Raimo Niskanen-11
On Wed, Apr 10, 2019 at 01:41:06PM +0100, Peter Morgan wrote:

> Hello -
>
> We are _sometimes_ seeing cases where a state_timeout in a gen_statem results in the timeout reaching the callback module as info {timeout, Ref, Name} with OTP 21.2.
>
> The crash looks like:
>
> =ERROR REPORT==== 8-Apr-2019::23:37:19.035346 === <0.969.0> gen_statem:error_info/5:1895
> ** State machine {kafire_fetcher,<<“abc">>,<<“def">>,0} terminating
> ** Last event = {info,{timeout,#Ref<0.2399112782.889192450.118024>,fetch}}
>
> We crash because we are not expecting info messages - interestingly the following messages are in the queue for the crashed process:
>
>
>     message_queue_len: 279
>     messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
>                   {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
>                   {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},

Is that from a process_info() call?

>
> Followed by lots more cancel_timer messages (200ish!). Our gen_statem does use {state_timeout, 500, fetch} so we are expecting a “fetch” timeout to happen, and we use repeat_state_and_data to requeue the state timeout (and also transition to other states). Is it possible that timeouts in gen_statem can be delivered as an info message?

It is not supposed to happen.  This must be a bug.

The gen_statem engine keeps track of the running state_timeout timer
and should never present it as an info event.

When a timer is stopped (or restarted) it is done with an asynchronous
cancel (in OTP 21) so the cancel_timer messages comes from that.
They should be matched against map entries in the internal gen_statem
engine state when they arrive so them being in the inbox may be ok.

However, how come you have so many?  Are you restarting the state_timeout
in a very tight loop of repeat_state:s?  Can you show the essential parts
of the code that causes state_timeout:s?

Also - OTP-21.2; is that your exact OTP version?  What is your stdlib version?

I will look at the relevant parts of the code with the new knowledge that
a state_timeout timer can be lost, probably in combination with
repeat_state and state_timeout restart!

On master since Jan 21 so in OTP-22.0-rc1 among other changes the timer handling
has been rewritten to use a synchronous cancel, which simplified the code
significantly.  I do not know if that would be worth trying?

/ Raimo Niskanen


>
> Thanks
> Peter.
>

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: gen_statem - state_timeout sometimes reaching callback module as [info, {timeout, Ref, Name}

Raimo Niskanen-11
On Fri, Apr 12, 2019 at 10:38:42AM +0200, Raimo Niskanen wrote:

> On Wed, Apr 10, 2019 at 01:41:06PM +0100, Peter Morgan wrote:
> > Hello -
> >
> > We are _sometimes_ seeing cases where a state_timeout in a gen_statem results in the timeout reaching the callback module as info {timeout, Ref, Name} with OTP 21.2.
> >
> > The crash looks like:
> >
> > =ERROR REPORT==== 8-Apr-2019::23:37:19.035346 === <0.969.0> gen_statem:error_info/5:1895
> > ** State machine {kafire_fetcher,<<“abc">>,<<“def">>,0} terminating
> > ** Last event = {info,{timeout,#Ref<0.2399112782.889192450.118024>,fetch}}
> >
> > We crash because we are not expecting info messages - interestingly the following messages are in the queue for the crashed process:
> >
> >
> >     message_queue_len: 279
> >     messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
> >                   {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
> >                   {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
>
> Is that from a process_info() call?
>
> >
> > Followed by lots more cancel_timer messages (200ish!). Our gen_statem does use {state_timeout, 500, fetch} so we are expecting a “fetch” timeout to happen, and we use repeat_state_and_data to requeue the state timeout (and also transition to other states). Is it possible that timeouts in gen_statem can be delivered as an info message?
>
> It is not supposed to happen.  This must be a bug.
>
> The gen_statem engine keeps track of the running state_timeout timer
> and should never present it as an info event.
>
> When a timer is stopped (or restarted) it is done with an asynchronous
> cancel (in OTP 21) so the cancel_timer messages comes from that.
> They should be matched against map entries in the internal gen_statem
> engine state when they arrive so them being in the inbox may be ok.
>
> However, how come you have so many?  Are you restarting the state_timeout
> in a very tight loop of repeat_state:s?  Can you show the essential parts
> of the code that causes state_timeout:s?
>
> Also - OTP-21.2; is that your exact OTP version?  What is your stdlib version?
>
> I will look at the relevant parts of the code with the new knowledge that
> a state_timeout timer can be lost, probably in combination with
> repeat_state and state_timeout restart!


Sorry, the knowledge did not help; I did not find anything suspicous.

More information will be needed...

>
> On master since Jan 21 so in OTP-22.0-rc1 among other changes the timer handling
> has been rewritten to use a synchronous cancel, which simplified the code
> significantly.  I do not know if that would be worth trying?
>
> / Raimo Niskanen
>
>
> >
> > Thanks
> > Peter.
> >

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: gen_statem - state_timeout sometimes reaching callback module as [info, {timeout, Ref, Name}

Peter Morgan
Hi Raimo,

Thanks for looking at this, and the background on the implementation which is very useful. I am in the process of putting together a simple test case, but having trouble reproducing it. It appears to happen very infrequently - possibly something that is load related? The original output was from the Crash Report which I’ve attached - where you can see the info timeout was received by our gen_statem behaviour callback. Our OTP is 21.2 (without any 21.2.X patches):

{stdlib,"ERTS  CXC 138 10","3.7”}
{kernel,"ERTS  CXC 138 10","6.2”}


Your question - “why so many?” - it could be a failure of a downstream service that is ultimately causing this, and we’re not backing off in this case - something that I’m checking too. However, I still thought it was interesting that the info timeout reaches the behaviour module.




I’ll try and get a simple test case for you.
Regards,
Peter.



> On 12 Apr 2019, at 13:38, Raimo Niskanen <[hidden email]> wrote:
>
> On Fri, Apr 12, 2019 at 10:38:42AM +0200, Raimo Niskanen wrote:
>> On Wed, Apr 10, 2019 at 01:41:06PM +0100, Peter Morgan wrote:
>>> Hello -
>>>
>>> We are _sometimes_ seeing cases where a state_timeout in a gen_statem results in the timeout reaching the callback module as info {timeout, Ref, Name} with OTP 21.2.
>>>
>>> The crash looks like:
>>>
>>> =ERROR REPORT==== 8-Apr-2019::23:37:19.035346 === <0.969.0> gen_statem:error_info/5:1895
>>> ** State machine {kafire_fetcher,<<“abc">>,<<“def">>,0} terminating
>>> ** Last event = {info,{timeout,#Ref<0.2399112782.889192450.118024>,fetch}}
>>>
>>> We crash because we are not expecting info messages - interestingly the following messages are in the queue for the crashed process:
>>>
>>>
>>>    message_queue_len: 279
>>>    messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
>>>                  {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
>>>                  {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
>>>                  {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
>>>                  {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
>>>                  {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
>>>                  {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
>>>                  {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
>>>                  {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
>>>                  {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
>>>                  {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
>>>                  {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
>>>                  {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
>>>                  {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
>>>                  {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
>>>                  {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
>>>                  {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
>>
>> Is that from a process_info() call?
>>
>>>
>>> Followed by lots more cancel_timer messages (200ish!). Our gen_statem does use {state_timeout, 500, fetch} so we are expecting a “fetch” timeout to happen, and we use repeat_state_and_data to requeue the state timeout (and also transition to other states). Is it possible that timeouts in gen_statem can be delivered as an info message?
>>
>> It is not supposed to happen.  This must be a bug.
>>
>> The gen_statem engine keeps track of the running state_timeout timer
>> and should never present it as an info event.
>>
>> When a timer is stopped (or restarted) it is done with an asynchronous
>> cancel (in OTP 21) so the cancel_timer messages comes from that.
>> They should be matched against map entries in the internal gen_statem
>> engine state when they arrive so them being in the inbox may be ok.
>>
>> However, how come you have so many?  Are you restarting the state_timeout
>> in a very tight loop of repeat_state:s?  Can you show the essential parts
>> of the code that causes state_timeout:s?
>>
>> Also - OTP-21.2; is that your exact OTP version?  What is your stdlib version?
>>
>> I will look at the relevant parts of the code with the new knowledge that
>> a state_timeout timer can be lost, probably in combination with
>> repeat_state and state_timeout restart!
>
>
> Sorry, the knowledge did not help; I did not find anything suspicous.
>
> More information will be needed...
>
>>
>> On master since Jan 21 so in OTP-22.0-rc1 among other changes the timer handling
>> has been rewritten to use a synchronous cancel, which simplified the code
>> significantly.  I do not know if that would be worth trying?
>>
>> / Raimo Niskanen
>>
>>
>>>
>>> Thanks
>>> Peter.
>>>
>
> --
>
> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> _______________________________________________
> erlang-questions mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-questions

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

info.timeout.txt (20K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: gen_statem - state_timeout sometimes reaching callback module as [info, {timeout, Ref, Name}

Raimo Niskanen-11
After some discussions with the VM guys this seems to be a bug in
gen_statem coming from a misunderstanding of which guarantees that
asynchronous timer cancels gives and does not give.

The {timeout, Ref, Msg} message may actually come _after_ the
{cancel_timer, Ref, false} message.  Heavy load seems to increase the
probability from almost zero to very small.

So the handling of these messages is broken for gen_statem on OTP-21, but
this should be fixed on master (soon to be OTP-22) since gen_statem there
does not use asynchronous timer cancels.

I will look into how big a diff it will be to rewrite gen_statem in OTP-21.latest
to use synchronous timer cancel...

Thank you for finding a tricky bug!
/ Raimo



On Fri, Apr 12, 2019 at 04:13:27PM +0100, Peter Morgan wrote:

> Hi Raimo,
>
> Thanks for looking at this, and the background on the implementation which is very useful. I am in the process of putting together a simple test case, but having trouble reproducing it. It appears to happen very infrequently - possibly something that is load related? The original output was from the Crash Report which I’ve attached - where you can see the info timeout was received by our gen_statem behaviour callback. Our OTP is 21.2 (without any 21.2.X patches):
>
> {stdlib,"ERTS  CXC 138 10","3.7”}
> {kernel,"ERTS  CXC 138 10","6.2”}
>
>
> Your question - “why so many?” - it could be a failure of a downstream service that is ultimately causing this, and we’re not backing off in this case - something that I’m checking too. However, I still thought it was interesting that the info timeout reaches the behaviour module.
>

> =CRASH REPORT==== 8-Apr-2019::23:37:19.038905 === <0.969.0> proc_lib:crash_report/4:508
>   crasher:
>     initial call: kafire_fetcher:init/1
>     pid: <0.969.0>
>     registered_name: []
>     exception error: no function clause matching
>                      kafire_fetcher:handle_event(info,
>                                                  {timeout,
>                                                   #Ref<0.2399112782.889192450.118024>,
>                                                   fetch},
>                                                  subsequent_fetches,
>                                                  our_state) (src/kafire_fetcher.erl, line 445)
>       in function  gen_statem:call_state_function/5 (gen_statem.erl, line 1662)
>       in call from gen_statem:loop_event_state_function/6 (gen_statem.erl, line 1023)
>     ancestors: [<0.931.0>,<0.332.0>,kafire_cluster_sup,kafire_sup,<0.227.0>]
>     message_queue_len: 279
>     messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
>                   {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
>                   {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118061>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118062>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118063>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118064>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118068>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118069>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118070>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118071>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118074>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118079>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118081>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118083>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118087>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118088>,0},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118089>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118093>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118094>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118095>,3},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118096>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118097>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118098>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118099>,3},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118103>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118104>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118105>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118106>,3},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118107>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118108>,1},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118109>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118113>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118114>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118115>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118116>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118117>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118118>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118119>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118120>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118121>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118122>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118123>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118124>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118125>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118126>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118127>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118128>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118129>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118130>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118131>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118132>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118133>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118134>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118135>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118136>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118137>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118138>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118139>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118140>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118141>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118145>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118146>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118147>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118148>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118149>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118150>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118151>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118152>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118153>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118154>,5},
>                   {timeout,#Ref<0.2399112782.889192450.118159>,fetch},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118159>,false},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118165>,0},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118169>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118170>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118173>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118174>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118175>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118176>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118177>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118178>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118179>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118180>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118181>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118182>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118183>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118184>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118185>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118189>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118190>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118191>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118192>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118196>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118197>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118198>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118199>,5},
>                   {timeout,#Ref<0.2399112782.889192450.118203>,fetch},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118203>,false},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118206>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118207>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118208>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118212>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118213>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118214>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118215>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118221>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118225>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118226>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118227>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118231>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118235>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118236>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118237>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118238>,2},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118242>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118243>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118244>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118248>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118249>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118250>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118254>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118255>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118256>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118260>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118264>,0},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118265>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118266>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118270>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118271>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118272>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118273>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118274>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118278>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118279>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118280>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118281>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118282>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118283>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118287>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118288>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118289>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118290>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118291>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118295>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118296>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118297>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118298>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118299>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118303>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118304>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118305>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118306>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118307>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118308>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118309>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118313>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118314>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118315>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118316>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118317>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118318>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118319>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118323>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118324>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118325>,5},
>                   {timeout,#Ref<0.2399112782.889192450.118326>,fetch},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118326>,false},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118327>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118328>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118329>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118330>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118331>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118332>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118333>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118337>,3},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118338>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118339>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118340>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118344>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118345>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118346>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118347>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118351>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118352>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118353>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118357>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118358>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118359>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118360>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118364>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118365>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118366>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118367>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118371>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118372>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118373>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118377>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118378>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118379>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118383>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118384>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118385>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118389>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118390>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118391>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118392>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118393>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118394>,0},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118395>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118399>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118400>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118401>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118402>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118403>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118404>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118405>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118409>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118410>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118411>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118412>,1},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118413>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118414>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118415>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118419>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118420>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118421>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118422>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118423>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118424>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118425>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118429>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118430>,0},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118431>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118432>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118433>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118434>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118435>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118439>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118440>,1},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118441>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118442>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118443>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118444>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118445>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118449>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118450>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118451>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118452>,4},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118453>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118454>,5},
>                   {cancel_timer,#Ref<0.2399112782.889192450.118455>,5},
>                   {timeout,#Ref<0.2399112782.889192450.118456>,fetch},
>                   {'EXIT',<0.10080.8>,normal},
>                   {'EXIT',<0.10079.8>,normal},
>                   {'EXIT',<0.10081.8>,normal},
>                   {'EXIT',<0.10082.8>,normal},
>                   {'EXIT',<0.10083.8>,normal},
>                   {'EXIT',<0.10084.8>,normal},
>                   {'EXIT',<0.10085.8>,normal},
>                   {'EXIT',<0.10086.8>,normal},
>                   {'EXIT',<0.10087.8>,normal},
>                   {'EXIT',<0.10088.8>,normal},
>                   {'EXIT',<0.10089.8>,normal},
>                   {'EXIT',<0.10071.8>,normal}]
>     links: [<0.931.0>]
>     dictionary: []
>     trap_exit: true
>     status: running
>     heap_size: 17731
>     stack_size: 27
>     reductions: 28849413204
>   neighbours:

>
>
> I’ll try and get a simple test case for you.
> Regards,
> Peter.
>
>
>
> > On 12 Apr 2019, at 13:38, Raimo Niskanen <[hidden email]> wrote:
> >
> > On Fri, Apr 12, 2019 at 10:38:42AM +0200, Raimo Niskanen wrote:
> >> On Wed, Apr 10, 2019 at 01:41:06PM +0100, Peter Morgan wrote:
> >>> Hello -
> >>>
> >>> We are _sometimes_ seeing cases where a state_timeout in a gen_statem results in the timeout reaching the callback module as info {timeout, Ref, Name} with OTP 21.2.
> >>>
> >>> The crash looks like:
> >>>
> >>> =ERROR REPORT==== 8-Apr-2019::23:37:19.035346 === <0.969.0> gen_statem:error_info/5:1895
> >>> ** State machine {kafire_fetcher,<<“abc">>,<<“def">>,0} terminating
> >>> ** Last event = {info,{timeout,#Ref<0.2399112782.889192450.118024>,fetch}}
> >>>
> >>> We crash because we are not expecting info messages - interestingly the following messages are in the queue for the crashed process:
> >>>
> >>>
> >>>    message_queue_len: 279
> >>>    messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
> >>>                  {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
> >>>                  {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
> >>
> >> Is that from a process_info() call?
> >>
> >>>
> >>> Followed by lots more cancel_timer messages (200ish!). Our gen_statem does use {state_timeout, 500, fetch} so we are expecting a “fetch” timeout to happen, and we use repeat_state_and_data to requeue the state timeout (and also transition to other states). Is it possible that timeouts in gen_statem can be delivered as an info message?
> >>
> >> It is not supposed to happen.  This must be a bug.
> >>
> >> The gen_statem engine keeps track of the running state_timeout timer
> >> and should never present it as an info event.
> >>
> >> When a timer is stopped (or restarted) it is done with an asynchronous
> >> cancel (in OTP 21) so the cancel_timer messages comes from that.
> >> They should be matched against map entries in the internal gen_statem
> >> engine state when they arrive so them being in the inbox may be ok.
> >>
> >> However, how come you have so many?  Are you restarting the state_timeout
> >> in a very tight loop of repeat_state:s?  Can you show the essential parts
> >> of the code that causes state_timeout:s?
> >>
> >> Also - OTP-21.2; is that your exact OTP version?  What is your stdlib version?
> >>
> >> I will look at the relevant parts of the code with the new knowledge that
> >> a state_timeout timer can be lost, probably in combination with
> >> repeat_state and state_timeout restart!
> >
> >
> > Sorry, the knowledge did not help; I did not find anything suspicous.
> >
> > More information will be needed...
> >
> >>
> >> On master since Jan 21 so in OTP-22.0-rc1 among other changes the timer handling
> >> has been rewritten to use a synchronous cancel, which simplified the code
> >> significantly.  I do not know if that would be worth trying?
> >>
> >> / Raimo Niskanen
> >>
> >>
> >>>
> >>> Thanks
> >>> Peter.
> >>>
> >
> > --
> >
> > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > _______________________________________________
> > erlang-questions mailing list
> > [hidden email]
> > http://erlang.org/mailman/listinfo/erlang-questions
>


--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: gen_statem - state_timeout sometimes reaching callback module as [info, {timeout, Ref, Name}

Raimo Niskanen-11
On Wed, Apr 17, 2019 at 11:54:42AM +0200, Raimo Niskanen wrote:

> After some discussions with the VM guys this seems to be a bug in
> gen_statem coming from a misunderstanding of which guarantees that
> asynchronous timer cancels gives and does not give.
>
> The {timeout, Ref, Msg} message may actually come _after_ the
> {cancel_timer, Ref, false} message.  Heavy load seems to increase the
> probability from almost zero to very small.
>
> So the handling of these messages is broken for gen_statem on OTP-21, but
> this should be fixed on master (soon to be OTP-22) since gen_statem there
> does not use asynchronous timer cancels.
>
> I will look into how big a diff it will be to rewrite gen_statem in OTP-21.latest
> to use synchronous timer cancel...

The diff became a fat one, so I created a pull request:

    https://github.com/erlang/otp/pull/2211

If you can try it out, that would be awsome!

Best Regards
/ Raimo



>
> Thank you for finding a tricky bug!
> / Raimo
>
>
>
> On Fri, Apr 12, 2019 at 04:13:27PM +0100, Peter Morgan wrote:
> > Hi Raimo,
> >
> > Thanks for looking at this, and the background on the implementation which is very useful. I am in the process of putting together a simple test case, but having trouble reproducing it. It appears to happen very infrequently - possibly something that is load related? The original output was from the Crash Report which I’ve attached - where you can see the info timeout was received by our gen_statem behaviour callback. Our OTP is 21.2 (without any 21.2.X patches):
> >
> > {stdlib,"ERTS  CXC 138 10","3.7”}
> > {kernel,"ERTS  CXC 138 10","6.2”}
> >
> >
> > Your question - “why so many?” - it could be a failure of a downstream service that is ultimately causing this, and we’re not backing off in this case - something that I’m checking too. However, I still thought it was interesting that the info timeout reaches the behaviour module.
> >
>
> > =CRASH REPORT==== 8-Apr-2019::23:37:19.038905 === <0.969.0> proc_lib:crash_report/4:508
> >   crasher:
> >     initial call: kafire_fetcher:init/1
> >     pid: <0.969.0>
> >     registered_name: []
> >     exception error: no function clause matching
> >                      kafire_fetcher:handle_event(info,
> >                                                  {timeout,
> >                                                   #Ref<0.2399112782.889192450.118024>,
> >                                                   fetch},
> >                                                  subsequent_fetches,
> >                                                  our_state) (src/kafire_fetcher.erl, line 445)
> >       in function  gen_statem:call_state_function/5 (gen_statem.erl, line 1662)
> >       in call from gen_statem:loop_event_state_function/6 (gen_statem.erl, line 1023)
> >     ancestors: [<0.931.0>,<0.332.0>,kafire_cluster_sup,kafire_sup,<0.227.0>]
> >     message_queue_len: 279
> >     messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
> >                   {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
> >                   {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118061>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118062>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118063>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118064>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118068>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118069>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118070>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118071>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118074>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118079>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118081>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118083>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118087>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118088>,0},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118089>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118093>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118094>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118095>,3},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118096>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118097>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118098>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118099>,3},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118103>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118104>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118105>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118106>,3},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118107>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118108>,1},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118109>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118113>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118114>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118115>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118116>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118117>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118118>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118119>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118120>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118121>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118122>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118123>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118124>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118125>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118126>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118127>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118128>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118129>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118130>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118131>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118132>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118133>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118134>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118135>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118136>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118137>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118138>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118139>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118140>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118141>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118145>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118146>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118147>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118148>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118149>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118150>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118151>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118152>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118153>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118154>,5},
> >                   {timeout,#Ref<0.2399112782.889192450.118159>,fetch},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118159>,false},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118165>,0},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118169>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118170>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118173>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118174>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118175>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118176>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118177>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118178>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118179>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118180>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118181>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118182>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118183>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118184>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118185>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118189>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118190>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118191>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118192>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118196>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118197>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118198>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118199>,5},
> >                   {timeout,#Ref<0.2399112782.889192450.118203>,fetch},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118203>,false},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118206>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118207>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118208>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118212>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118213>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118214>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118215>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118221>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118225>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118226>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118227>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118231>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118235>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118236>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118237>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118238>,2},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118242>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118243>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118244>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118248>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118249>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118250>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118254>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118255>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118256>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118260>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118264>,0},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118265>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118266>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118270>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118271>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118272>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118273>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118274>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118278>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118279>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118280>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118281>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118282>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118283>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118287>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118288>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118289>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118290>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118291>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118295>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118296>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118297>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118298>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118299>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118303>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118304>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118305>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118306>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118307>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118308>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118309>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118313>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118314>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118315>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118316>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118317>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118318>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118319>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118323>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118324>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118325>,5},
> >                   {timeout,#Ref<0.2399112782.889192450.118326>,fetch},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118326>,false},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118327>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118328>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118329>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118330>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118331>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118332>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118333>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118337>,3},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118338>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118339>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118340>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118344>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118345>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118346>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118347>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118351>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118352>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118353>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118357>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118358>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118359>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118360>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118364>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118365>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118366>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118367>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118371>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118372>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118373>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118377>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118378>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118379>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118383>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118384>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118385>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118389>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118390>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118391>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118392>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118393>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118394>,0},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118395>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118399>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118400>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118401>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118402>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118403>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118404>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118405>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118409>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118410>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118411>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118412>,1},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118413>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118414>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118415>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118419>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118420>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118421>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118422>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118423>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118424>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118425>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118429>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118430>,0},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118431>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118432>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118433>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118434>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118435>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118439>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118440>,1},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118441>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118442>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118443>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118444>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118445>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118449>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118450>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118451>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118452>,4},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118453>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118454>,5},
> >                   {cancel_timer,#Ref<0.2399112782.889192450.118455>,5},
> >                   {timeout,#Ref<0.2399112782.889192450.118456>,fetch},
> >                   {'EXIT',<0.10080.8>,normal},
> >                   {'EXIT',<0.10079.8>,normal},
> >                   {'EXIT',<0.10081.8>,normal},
> >                   {'EXIT',<0.10082.8>,normal},
> >                   {'EXIT',<0.10083.8>,normal},
> >                   {'EXIT',<0.10084.8>,normal},
> >                   {'EXIT',<0.10085.8>,normal},
> >                   {'EXIT',<0.10086.8>,normal},
> >                   {'EXIT',<0.10087.8>,normal},
> >                   {'EXIT',<0.10088.8>,normal},
> >                   {'EXIT',<0.10089.8>,normal},
> >                   {'EXIT',<0.10071.8>,normal}]
> >     links: [<0.931.0>]
> >     dictionary: []
> >     trap_exit: true
> >     status: running
> >     heap_size: 17731
> >     stack_size: 27
> >     reductions: 28849413204
> >   neighbours:
>
> >
> >
> > I’ll try and get a simple test case for you.
> > Regards,
> > Peter.
> >
> >
> >
> > > On 12 Apr 2019, at 13:38, Raimo Niskanen <[hidden email]> wrote:
> > >
> > > On Fri, Apr 12, 2019 at 10:38:42AM +0200, Raimo Niskanen wrote:
> > >> On Wed, Apr 10, 2019 at 01:41:06PM +0100, Peter Morgan wrote:
> > >>> Hello -
> > >>>
> > >>> We are _sometimes_ seeing cases where a state_timeout in a gen_statem results in the timeout reaching the callback module as info {timeout, Ref, Name} with OTP 21.2.
> > >>>
> > >>> The crash looks like:
> > >>>
> > >>> =ERROR REPORT==== 8-Apr-2019::23:37:19.035346 === <0.969.0> gen_statem:error_info/5:1895
> > >>> ** State machine {kafire_fetcher,<<“abc">>,<<“def">>,0} terminating
> > >>> ** Last event = {info,{timeout,#Ref<0.2399112782.889192450.118024>,fetch}}
> > >>>
> > >>> We crash because we are not expecting info messages - interestingly the following messages are in the queue for the crashed process:
> > >>>
> > >>>
> > >>>    message_queue_len: 279
> > >>>    messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
> > >>>                  {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
> > >>>                  {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
> > >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
> > >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
> > >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
> > >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
> > >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
> > >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
> > >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
> > >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
> > >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
> > >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
> > >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
> > >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
> > >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
> > >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
> > >>
> > >> Is that from a process_info() call?
> > >>
> > >>>
> > >>> Followed by lots more cancel_timer messages (200ish!). Our gen_statem does use {state_timeout, 500, fetch} so we are expecting a “fetch” timeout to happen, and we use repeat_state_and_data to requeue the state timeout (and also transition to other states). Is it possible that timeouts in gen_statem can be delivered as an info message?
> > >>
> > >> It is not supposed to happen.  This must be a bug.
> > >>
> > >> The gen_statem engine keeps track of the running state_timeout timer
> > >> and should never present it as an info event.
> > >>
> > >> When a timer is stopped (or restarted) it is done with an asynchronous
> > >> cancel (in OTP 21) so the cancel_timer messages comes from that.
> > >> They should be matched against map entries in the internal gen_statem
> > >> engine state when they arrive so them being in the inbox may be ok.
> > >>
> > >> However, how come you have so many?  Are you restarting the state_timeout
> > >> in a very tight loop of repeat_state:s?  Can you show the essential parts
> > >> of the code that causes state_timeout:s?
> > >>
> > >> Also - OTP-21.2; is that your exact OTP version?  What is your stdlib version?
> > >>
> > >> I will look at the relevant parts of the code with the new knowledge that
> > >> a state_timeout timer can be lost, probably in combination with
> > >> repeat_state and state_timeout restart!
> > >
> > >
> > > Sorry, the knowledge did not help; I did not find anything suspicous.
> > >
> > > More information will be needed...
> > >
> > >>
> > >> On master since Jan 21 so in OTP-22.0-rc1 among other changes the timer handling
> > >> has been rewritten to use a synchronous cancel, which simplified the code
> > >> significantly.  I do not know if that would be worth trying?
> > >>
> > >> / Raimo Niskanen
> > >>
> > >>
> > >>>
> > >>> Thanks
> > >>> Peter.
> > >>>
> > >
> > > --
> > >
> > > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > > _______________________________________________
> > > erlang-questions mailing list
> > > [hidden email]
> > > http://erlang.org/mailman/listinfo/erlang-questions
> >
>
>
> --
>
> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> _______________________________________________
> erlang-questions mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-questions

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: gen_statem - state_timeout sometimes reaching callback module as [info, {timeout, Ref, Name}

Raimo Niskanen-11
On Wed, Apr 24, 2019 at 06:46:27PM +0100, Peter Morgan wrote:
> Hi Raimo,
>
> I am having great difficultly reproducing in a simple test case. Despite best efforts of hundreds of thousands of processes creating timers and retry_state I’ve not managed to reproduce our production issue - with a vanilla OTP21 without your patch. Will let you know if I do.
>

Can you try my patch on OTP-21 - pull/2211 to see if it works without
errors in your original setup?

/ Raimo



> Thanks again for your help on this issue.
>
> Regards,
> Peter.
>
> > On 17 Apr 2019, at 15:36, Raimo Niskanen <[hidden email]> wrote:
> >
> > On Wed, Apr 17, 2019 at 11:54:42AM +0200, Raimo Niskanen wrote:
> >> After some discussions with the VM guys this seems to be a bug in
> >> gen_statem coming from a misunderstanding of which guarantees that
> >> asynchronous timer cancels gives and does not give.
> >>
> >> The {timeout, Ref, Msg} message may actually come _after_ the
> >> {cancel_timer, Ref, false} message.  Heavy load seems to increase the
> >> probability from almost zero to very small.
> >>
> >> So the handling of these messages is broken for gen_statem on OTP-21, but
> >> this should be fixed on master (soon to be OTP-22) since gen_statem there
> >> does not use asynchronous timer cancels.
> >>
> >> I will look into how big a diff it will be to rewrite gen_statem in OTP-21.latest
> >> to use synchronous timer cancel...
> >
> > The diff became a fat one, so I created a pull request:
> >
> >    https://github.com/erlang/otp/pull/2211 <https://github.com/erlang/otp/pull/2211>
> >
> > If you can try it out, that would be awsome!
> >
> > Best Regards
> > / Raimo
> >
> >
> >
> >>
> >> Thank you for finding a tricky bug!
> >> / Raimo
> >>
> >>
> >>
> >> On Fri, Apr 12, 2019 at 04:13:27PM +0100, Peter Morgan wrote:
> >>> Hi Raimo,
> >>>
> >>> Thanks for looking at this, and the background on the implementation which is very useful. I am in the process of putting together a simple test case, but having trouble reproducing it. It appears to happen very infrequently - possibly something that is load related? The original output was from the Crash Report which I’ve attached - where you can see the info timeout was received by our gen_statem behaviour callback. Our OTP is 21.2 (without any 21.2.X patches):
> >>>
> >>> {stdlib,"ERTS  CXC 138 10","3.7”}
> >>> {kernel,"ERTS  CXC 138 10","6.2”}
> >>>
> >>>
> >>> Your question - “why so many?” - it could be a failure of a downstream service that is ultimately causing this, and we’re not backing off in this case - something that I’m checking too. However, I still thought it was interesting that the info timeout reaches the behaviour module.
> >>>
> >>
> >>> =CRASH REPORT==== 8-Apr-2019::23:37:19.038905 === <0.969.0> proc_lib:crash_report/4:508
> >>>  crasher:
> >>>    initial call: kafire_fetcher:init/1
> >>>    pid: <0.969.0>
> >>>    registered_name: []
> >>>    exception error: no function clause matching
> >>>                     kafire_fetcher:handle_event(info,
> >>>                                                 {timeout,
> >>>                                                  #Ref<0.2399112782.889192450.118024>,
> >>>                                                  fetch},
> >>>                                                 subsequent_fetches,
> >>>                                                 our_state) (src/kafire_fetcher.erl, line 445)
> >>>      in function  gen_statem:call_state_function/5 (gen_statem.erl, line 1662)
> >>>      in call from gen_statem:loop_event_state_function/6 (gen_statem.erl, line 1023)
> >>>    ancestors: [<0.931.0>,<0.332.0>,kafire_cluster_sup,kafire_sup,<0.227.0>]
> >>>    message_queue_len: 279
> >>>    messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
> >>>                  {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
> >>>                  {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118061>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118062>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118063>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118064>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118068>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118069>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118070>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118071>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118074>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118079>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118081>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118083>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118087>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118088>,0},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118089>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118093>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118094>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118095>,3},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118096>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118097>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118098>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118099>,3},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118103>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118104>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118105>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118106>,3},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118107>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118108>,1},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118109>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118113>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118114>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118115>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118116>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118117>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118118>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118119>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118120>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118121>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118122>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118123>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118124>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118125>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118126>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118127>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118128>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118129>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118130>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118131>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118132>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118133>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118134>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118135>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118136>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118137>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118138>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118139>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118140>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118141>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118145>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118146>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118147>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118148>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118149>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118150>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118151>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118152>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118153>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118154>,5},
> >>>                  {timeout,#Ref<0.2399112782.889192450.118159>,fetch},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118159>,false},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118165>,0},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118169>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118170>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118173>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118174>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118175>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118176>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118177>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118178>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118179>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118180>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118181>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118182>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118183>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118184>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118185>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118189>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118190>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118191>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118192>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118196>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118197>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118198>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118199>,5},
> >>>                  {timeout,#Ref<0.2399112782.889192450.118203>,fetch},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118203>,false},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118206>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118207>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118208>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118212>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118213>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118214>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118215>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118221>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118225>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118226>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118227>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118231>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118235>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118236>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118237>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118238>,2},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118242>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118243>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118244>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118248>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118249>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118250>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118254>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118255>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118256>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118260>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118264>,0},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118265>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118266>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118270>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118271>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118272>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118273>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118274>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118278>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118279>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118280>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118281>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118282>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118283>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118287>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118288>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118289>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118290>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118291>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118295>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118296>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118297>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118298>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118299>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118303>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118304>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118305>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118306>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118307>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118308>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118309>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118313>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118314>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118315>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118316>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118317>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118318>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118319>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118323>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118324>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118325>,5},
> >>>                  {timeout,#Ref<0.2399112782.889192450.118326>,fetch},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118326>,false},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118327>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118328>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118329>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118330>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118331>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118332>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118333>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118337>,3},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118338>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118339>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118340>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118344>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118345>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118346>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118347>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118351>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118352>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118353>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118357>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118358>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118359>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118360>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118364>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118365>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118366>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118367>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118371>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118372>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118373>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118377>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118378>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118379>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118383>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118384>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118385>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118389>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118390>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118391>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118392>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118393>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118394>,0},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118395>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118399>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118400>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118401>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118402>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118403>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118404>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118405>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118409>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118410>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118411>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118412>,1},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118413>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118414>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118415>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118419>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118420>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118421>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118422>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118423>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118424>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118425>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118429>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118430>,0},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118431>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118432>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118433>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118434>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118435>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118439>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118440>,1},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118441>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118442>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118443>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118444>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118445>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118449>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118450>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118451>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118452>,4},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118453>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118454>,5},
> >>>                  {cancel_timer,#Ref<0.2399112782.889192450.118455>,5},
> >>>                  {timeout,#Ref<0.2399112782.889192450.118456>,fetch},
> >>>                  {'EXIT',<0.10080.8>,normal},
> >>>                  {'EXIT',<0.10079.8>,normal},
> >>>                  {'EXIT',<0.10081.8>,normal},
> >>>                  {'EXIT',<0.10082.8>,normal},
> >>>                  {'EXIT',<0.10083.8>,normal},
> >>>                  {'EXIT',<0.10084.8>,normal},
> >>>                  {'EXIT',<0.10085.8>,normal},
> >>>                  {'EXIT',<0.10086.8>,normal},
> >>>                  {'EXIT',<0.10087.8>,normal},
> >>>                  {'EXIT',<0.10088.8>,normal},
> >>>                  {'EXIT',<0.10089.8>,normal},
> >>>                  {'EXIT',<0.10071.8>,normal}]
> >>>    links: [<0.931.0>]
> >>>    dictionary: []
> >>>    trap_exit: true
> >>>    status: running
> >>>    heap_size: 17731
> >>>    stack_size: 27
> >>>    reductions: 28849413204
> >>>  neighbours:
> >>
> >>>
> >>>
> >>> I’ll try and get a simple test case for you.
> >>> Regards,
> >>> Peter.
> >>>
> >>>
> >>>
> >>>> On 12 Apr 2019, at 13:38, Raimo Niskanen <[hidden email]> wrote:
> >>>>
> >>>> On Fri, Apr 12, 2019 at 10:38:42AM +0200, Raimo Niskanen wrote:
> >>>>> On Wed, Apr 10, 2019 at 01:41:06PM +0100, Peter Morgan wrote:
> >>>>>> Hello -
> >>>>>>
> >>>>>> We are _sometimes_ seeing cases where a state_timeout in a gen_statem results in the timeout reaching the callback module as info {timeout, Ref, Name} with OTP 21.2.
> >>>>>>
> >>>>>> The crash looks like:
> >>>>>>
> >>>>>> =ERROR REPORT==== 8-Apr-2019::23:37:19.035346 === <0.969.0> gen_statem:error_info/5:1895
> >>>>>> ** State machine {kafire_fetcher,<<“abc">>,<<“def">>,0} terminating
> >>>>>> ** Last event = {info,{timeout,#Ref<0.2399112782.889192450.118024>,fetch}}
> >>>>>>
> >>>>>> We crash because we are not expecting info messages - interestingly the following messages are in the queue for the crashed process:
> >>>>>>
> >>>>>>
> >>>>>>   message_queue_len: 279
> >>>>>>   messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
> >>>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
> >>>>>
> >>>>> Is that from a process_info() call?
> >>>>>
> >>>>>>
> >>>>>> Followed by lots more cancel_timer messages (200ish!). Our gen_statem does use {state_timeout, 500, fetch} so we are expecting a “fetch” timeout to happen, and we use repeat_state_and_data to requeue the state timeout (and also transition to other states). Is it possible that timeouts in gen_statem can be delivered as an info message?
> >>>>>
> >>>>> It is not supposed to happen.  This must be a bug.
> >>>>>
> >>>>> The gen_statem engine keeps track of the running state_timeout timer
> >>>>> and should never present it as an info event.
> >>>>>
> >>>>> When a timer is stopped (or restarted) it is done with an asynchronous
> >>>>> cancel (in OTP 21) so the cancel_timer messages comes from that.
> >>>>> They should be matched against map entries in the internal gen_statem
> >>>>> engine state when they arrive so them being in the inbox may be ok.
> >>>>>
> >>>>> However, how come you have so many?  Are you restarting the state_timeout
> >>>>> in a very tight loop of repeat_state:s?  Can you show the essential parts
> >>>>> of the code that causes state_timeout:s?
> >>>>>
> >>>>> Also - OTP-21.2; is that your exact OTP version?  What is your stdlib version?
> >>>>>
> >>>>> I will look at the relevant parts of the code with the new knowledge that
> >>>>> a state_timeout timer can be lost, probably in combination with
> >>>>> repeat_state and state_timeout restart!
> >>>>
> >>>>
> >>>> Sorry, the knowledge did not help; I did not find anything suspicous.
> >>>>
> >>>> More information will be needed...
> >>>>
> >>>>>
> >>>>> On master since Jan 21 so in OTP-22.0-rc1 among other changes the timer handling
> >>>>> has been rewritten to use a synchronous cancel, which simplified the code
> >>>>> significantly.  I do not know if that would be worth trying?
> >>>>>
> >>>>> / Raimo Niskanen
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> Thanks
> >>>>>> Peter.
> >>>>>>
> >>>>
> >>>> --
> >>>>
> >>>> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> >>>> _______________________________________________
> >>>> erlang-questions mailing list
> >>>> [hidden email]
> >>>> http://erlang.org/mailman/listinfo/erlang-questions
> >>>
> >>
> >>
> >> --
> >>
> >> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> >> _______________________________________________
> >> erlang-questions mailing list
> >> [hidden email]
> >> http://erlang.org/mailman/listinfo/erlang-questions
> >
> > --
> >
> > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > _______________________________________________
> > erlang-questions mailing list
> > [hidden email] <mailto:[hidden email]>
> > http://erlang.org/mailman/listinfo/erlang-questions <http://erlang.org/mailman/listinfo/erlang-questions>

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: gen_statem - state_timeout sometimes reaching callback module as [info, {timeout, Ref, Name}

Peter Morgan
Hi Raimo,

I’ve tried your patch and it hasn’t caused any errors. We are still having problems reproducing the issue, but it at least doesn’t look to have caused any side effects.

Regards,
Peter.

On 25 Apr 2019, at 08:28, Raimo Niskanen <[hidden email]> wrote:

On Wed, Apr 24, 2019 at 06:46:27PM +0100, Peter Morgan wrote:
Hi Raimo,

I am having great difficultly reproducing in a simple test case. Despite best efforts of hundreds of thousands of processes creating timers and retry_state I’ve not managed to reproduce our production issue - with a vanilla OTP21 without your patch. Will let you know if I do.


Can you try my patch on OTP-21 - pull/2211 to see if it works without
errors in your original setup?

/ Raimo



Thanks again for your help on this issue.

Regards,
Peter.

On 17 Apr 2019, at 15:36, Raimo Niskanen <[hidden email]> wrote:

On Wed, Apr 17, 2019 at 11:54:42AM +0200, Raimo Niskanen wrote:
After some discussions with the VM guys this seems to be a bug in
gen_statem coming from a misunderstanding of which guarantees that
asynchronous timer cancels gives and does not give.

The {timeout, Ref, Msg} message may actually come _after_ the
{cancel_timer, Ref, false} message.  Heavy load seems to increase the
probability from almost zero to very small.

So the handling of these messages is broken for gen_statem on OTP-21, but
this should be fixed on master (soon to be OTP-22) since gen_statem there
does not use asynchronous timer cancels.

I will look into how big a diff it will be to rewrite gen_statem in OTP-21.latest
to use synchronous timer cancel...

The diff became a fat one, so I created a pull request:

  https://github.com/erlang/otp/pull/2211 <https://github.com/erlang/otp/pull/2211>

If you can try it out, that would be awsome!

Best Regards
/ Raimo




Thank you for finding a tricky bug!
/ Raimo



On Fri, Apr 12, 2019 at 04:13:27PM +0100, Peter Morgan wrote:
Hi Raimo,

Thanks for looking at this, and the background on the implementation which is very useful. I am in the process of putting together a simple test case, but having trouble reproducing it. It appears to happen very infrequently - possibly something that is load related? The original output was from the Crash Report which I’ve attached - where you can see the info timeout was received by our gen_statem behaviour callback. Our OTP is 21.2 (without any 21.2.X patches):

{stdlib,"ERTS  CXC 138 10","3.7”}
{kernel,"ERTS  CXC 138 10","6.2”}


Your question - “why so many?” - it could be a failure of a downstream service that is ultimately causing this, and we’re not backing off in this case - something that I’m checking too. However, I still thought it was interesting that the info timeout reaches the behaviour module.


=CRASH REPORT==== 8-Apr-2019::23:37:19.038905 === <0.969.0> proc_lib:crash_report/4:508
crasher:
  initial call: kafire_fetcher:init/1
  pid: <0.969.0>
  registered_name: []
  exception error: no function clause matching 
                   kafire_fetcher:handle_event(info,
                                               {timeout,
                                                #Ref<0.2399112782.889192450.118024>,
                                                fetch},
                                               subsequent_fetches,
                                               our_state) (src/kafire_fetcher.erl, line 445)
    in function  gen_statem:call_state_function/5 (gen_statem.erl, line 1662)
    in call from gen_statem:loop_event_state_function/6 (gen_statem.erl, line 1023)
  ancestors: [<0.931.0>,<0.332.0>,kafire_cluster_sup,kafire_sup,<0.227.0>]
  message_queue_len: 279
  messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
                {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
                {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118061>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118062>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118063>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118064>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118068>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118069>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118070>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118071>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118074>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118079>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118081>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118083>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118087>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118088>,0},
                {cancel_timer,#Ref<0.2399112782.889192450.118089>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118093>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118094>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118095>,3},
                {cancel_timer,#Ref<0.2399112782.889192450.118096>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118097>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118098>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118099>,3},
                {cancel_timer,#Ref<0.2399112782.889192450.118103>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118104>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118105>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118106>,3},
                {cancel_timer,#Ref<0.2399112782.889192450.118107>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118108>,1},
                {cancel_timer,#Ref<0.2399112782.889192450.118109>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118113>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118114>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118115>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118116>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118117>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118118>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118119>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118120>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118121>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118122>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118123>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118124>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118125>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118126>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118127>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118128>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118129>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118130>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118131>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118132>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118133>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118134>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118135>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118136>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118137>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118138>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118139>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118140>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118141>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118145>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118146>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118147>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118148>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118149>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118150>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118151>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118152>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118153>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118154>,5},
                {timeout,#Ref<0.2399112782.889192450.118159>,fetch},
                {cancel_timer,#Ref<0.2399112782.889192450.118159>,false},
                {cancel_timer,#Ref<0.2399112782.889192450.118165>,0},
                {cancel_timer,#Ref<0.2399112782.889192450.118169>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118170>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118173>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118174>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118175>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118176>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118177>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118178>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118179>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118180>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118181>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118182>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118183>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118184>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118185>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118189>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118190>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118191>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118192>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118196>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118197>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118198>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118199>,5},
                {timeout,#Ref<0.2399112782.889192450.118203>,fetch},
                {cancel_timer,#Ref<0.2399112782.889192450.118203>,false},
                {cancel_timer,#Ref<0.2399112782.889192450.118206>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118207>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118208>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118212>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118213>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118214>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118215>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118221>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118225>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118226>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118227>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118231>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118235>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118236>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118237>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118238>,2},
                {cancel_timer,#Ref<0.2399112782.889192450.118242>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118243>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118244>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118248>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118249>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118250>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118254>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118255>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118256>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118260>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118264>,0},
                {cancel_timer,#Ref<0.2399112782.889192450.118265>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118266>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118270>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118271>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118272>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118273>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118274>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118278>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118279>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118280>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118281>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118282>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118283>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118287>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118288>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118289>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118290>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118291>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118295>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118296>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118297>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118298>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118299>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118303>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118304>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118305>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118306>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118307>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118308>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118309>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118313>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118314>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118315>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118316>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118317>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118318>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118319>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118323>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118324>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118325>,5},
                {timeout,#Ref<0.2399112782.889192450.118326>,fetch},
                {cancel_timer,#Ref<0.2399112782.889192450.118326>,false},
                {cancel_timer,#Ref<0.2399112782.889192450.118327>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118328>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118329>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118330>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118331>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118332>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118333>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118337>,3},
                {cancel_timer,#Ref<0.2399112782.889192450.118338>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118339>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118340>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118344>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118345>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118346>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118347>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118351>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118352>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118353>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118357>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118358>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118359>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118360>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118364>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118365>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118366>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118367>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118371>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118372>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118373>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118377>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118378>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118379>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118383>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118384>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118385>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118389>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118390>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118391>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118392>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118393>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118394>,0},
                {cancel_timer,#Ref<0.2399112782.889192450.118395>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118399>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118400>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118401>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118402>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118403>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118404>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118405>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118409>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118410>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118411>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118412>,1},
                {cancel_timer,#Ref<0.2399112782.889192450.118413>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118414>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118415>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118419>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118420>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118421>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118422>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118423>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118424>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118425>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118429>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118430>,0},
                {cancel_timer,#Ref<0.2399112782.889192450.118431>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118432>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118433>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118434>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118435>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118439>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118440>,1},
                {cancel_timer,#Ref<0.2399112782.889192450.118441>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118442>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118443>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118444>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118445>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118449>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118450>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118451>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118452>,4},
                {cancel_timer,#Ref<0.2399112782.889192450.118453>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118454>,5},
                {cancel_timer,#Ref<0.2399112782.889192450.118455>,5},
                {timeout,#Ref<0.2399112782.889192450.118456>,fetch},
                {'EXIT',<0.10080.8>,normal},
                {'EXIT',<0.10079.8>,normal},
                {'EXIT',<0.10081.8>,normal},
                {'EXIT',<0.10082.8>,normal},
                {'EXIT',<0.10083.8>,normal},
                {'EXIT',<0.10084.8>,normal},
                {'EXIT',<0.10085.8>,normal},
                {'EXIT',<0.10086.8>,normal},
                {'EXIT',<0.10087.8>,normal},
                {'EXIT',<0.10088.8>,normal},
                {'EXIT',<0.10089.8>,normal},
                {'EXIT',<0.10071.8>,normal}]
  links: [<0.931.0>]
  dictionary: []
  trap_exit: true
  status: running
  heap_size: 17731
  stack_size: 27
  reductions: 28849413204
neighbours:



I’ll try and get a simple test case for you.
Regards,
Peter.



On 12 Apr 2019, at 13:38, Raimo Niskanen <[hidden email]> wrote:

On Fri, Apr 12, 2019 at 10:38:42AM +0200, Raimo Niskanen wrote:
On Wed, Apr 10, 2019 at 01:41:06PM +0100, Peter Morgan wrote:
Hello -

We are _sometimes_ seeing cases where a state_timeout in a gen_statem results in the timeout reaching the callback module as info {timeout, Ref, Name} with OTP 21.2.

The crash looks like:

=ERROR REPORT==== 8-Apr-2019::23:37:19.035346 === <0.969.0> gen_statem:error_info/5:1895
** State machine {kafire_fetcher,<<“abc">>,<<“def">>,0} terminating
** Last event = {info,{timeout,#Ref<0.2399112782.889192450.118024>,fetch}}

We crash because we are not expecting info messages - interestingly the following messages are in the queue for the crashed process:


 message_queue_len: 279
 messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
               {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
               {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},

Is that from a process_info() call?


Followed by lots more cancel_timer messages (200ish!). Our gen_statem does use {state_timeout, 500, fetch} so we are expecting a “fetch” timeout to happen, and we use repeat_state_and_data to requeue the state timeout (and also transition to other states). Is it possible that timeouts in gen_statem can be delivered as an info message?

It is not supposed to happen.  This must be a bug.

The gen_statem engine keeps track of the running state_timeout timer
and should never present it as an info event.

When a timer is stopped (or restarted) it is done with an asynchronous
cancel (in OTP 21) so the cancel_timer messages comes from that.
They should be matched against map entries in the internal gen_statem
engine state when they arrive so them being in the inbox may be ok.

However, how come you have so many?  Are you restarting the state_timeout
in a very tight loop of repeat_state:s?  Can you show the essential parts
of the code that causes state_timeout:s?

Also - OTP-21.2; is that your exact OTP version?  What is your stdlib version?

I will look at the relevant parts of the code with the new knowledge that
a state_timeout timer can be lost, probably in combination with
repeat_state and state_timeout restart!


Sorry, the knowledge did not help; I did not find anything suspicous.

More information will be needed...


On master since Jan 21 so in OTP-22.0-rc1 among other changes the timer handling
has been rewritten to use a synchronous cancel, which simplified the code
significantly.  I do not know if that would be worth trying?

/ Raimo Niskanen



Thanks
Peter.


-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions



-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email] <[hidden email]>
http://erlang.org/mailman/listinfo/erlang-questions <http://erlang.org/mailman/listinfo/erlang-questions>

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: gen_statem - state_timeout sometimes reaching callback module as [info, {timeout, Ref, Name}

Raimo Niskanen-11
On Mon, Apr 29, 2019 at 11:49:18AM +0100, Peter Morgan wrote:
> Hi Raimo,
>
> I’ve tried your patch and it hasn’t caused any errors. We are still having problems reproducing the issue, but it at least doesn’t look to have caused any side effects.

Well, that is at least reassuring :-)

So, it is so hard to reproduce that you can not be sure that a patch (any
patch) has fixed it...?

I will put it in our daily builds too, but we do not have many machines
running daily for OTP-21, and this is a backported change from OTP-22, so
it is hard to be sure that the change is exactly the same as in OTP-22
where it runs without problems.

Since it passes your regression test, if it does not cause any regression
errors for our few OTP-21 machines, I will probably place it in queue for next
OTP-21 maintenance patch...

If you get to any further conclusions on whether the patch fixes your
problem or breaks anything, please let me know ASAP.

Thank you for your help in investigating this!

/ Raimo



>
> Regards,
> Peter.
>
> > On 25 Apr 2019, at 08:28, Raimo Niskanen <[hidden email]> wrote:
> >
> > On Wed, Apr 24, 2019 at 06:46:27PM +0100, Peter Morgan wrote:
> >> Hi Raimo,
> >>
> >> I am having great difficultly reproducing in a simple test case. Despite best efforts of hundreds of thousands of processes creating timers and retry_state I’ve not managed to reproduce our production issue - with a vanilla OTP21 without your patch. Will let you know if I do.
> >>
> >
> > Can you try my patch on OTP-21 - pull/2211 to see if it works without
> > errors in your original setup?
> >
> > / Raimo
> >
> >
> >
> >> Thanks again for your help on this issue.
> >>
> >> Regards,
> >> Peter.
> >>
> >>> On 17 Apr 2019, at 15:36, Raimo Niskanen <[hidden email]> wrote:
> >>>
> >>> On Wed, Apr 17, 2019 at 11:54:42AM +0200, Raimo Niskanen wrote:
> >>>> After some discussions with the VM guys this seems to be a bug in
> >>>> gen_statem coming from a misunderstanding of which guarantees that
> >>>> asynchronous timer cancels gives and does not give.
> >>>>
> >>>> The {timeout, Ref, Msg} message may actually come _after_ the
> >>>> {cancel_timer, Ref, false} message.  Heavy load seems to increase the
> >>>> probability from almost zero to very small.
> >>>>
> >>>> So the handling of these messages is broken for gen_statem on OTP-21, but
> >>>> this should be fixed on master (soon to be OTP-22) since gen_statem there
> >>>> does not use asynchronous timer cancels.
> >>>>
> >>>> I will look into how big a diff it will be to rewrite gen_statem in OTP-21.latest
> >>>> to use synchronous timer cancel...
> >>>
> >>> The diff became a fat one, so I created a pull request:
> >>>
> >>>   https://github.com/erlang/otp/pull/2211 <https://github.com/erlang/otp/pull/2211> <https://github.com/erlang/otp/pull/2211 <https://github.com/erlang/otp/pull/2211>>
> >>>
> >>> If you can try it out, that would be awsome!
> >>>
> >>> Best Regards
> >>> / Raimo
> >>>
> >>>
> >>>
> >>>>
> >>>> Thank you for finding a tricky bug!
> >>>> / Raimo
> >>>>
> >>>>
> >>>>
> >>>> On Fri, Apr 12, 2019 at 04:13:27PM +0100, Peter Morgan wrote:
> >>>>> Hi Raimo,
> >>>>>
> >>>>> Thanks for looking at this, and the background on the implementation which is very useful. I am in the process of putting together a simple test case, but having trouble reproducing it. It appears to happen very infrequently - possibly something that is load related? The original output was from the Crash Report which I’ve attached - where you can see the info timeout was received by our gen_statem behaviour callback. Our OTP is 21.2 (without any 21.2.X patches):
> >>>>>
> >>>>> {stdlib,"ERTS  CXC 138 10","3.7”}
> >>>>> {kernel,"ERTS  CXC 138 10","6.2”}
> >>>>>
> >>>>>
> >>>>> Your question - “why so many?” - it could be a failure of a downstream service that is ultimately causing this, and we’re not backing off in this case - something that I’m checking too. However, I still thought it was interesting that the info timeout reaches the behaviour module.
> >>>>>
> >>>>
> >>>>> =CRASH REPORT==== 8-Apr-2019::23:37:19.038905 === <0.969.0> proc_lib:crash_report/4:508
> >>>>> crasher:
> >>>>>   initial call: kafire_fetcher:init/1
> >>>>>   pid: <0.969.0>
> >>>>>   registered_name: []
> >>>>>   exception error: no function clause matching
> >>>>>                    kafire_fetcher:handle_event(info,
> >>>>>                                                {timeout,
> >>>>>                                                 #Ref<0.2399112782.889192450.118024>,
> >>>>>                                                 fetch},
> >>>>>                                                subsequent_fetches,
> >>>>>                                                our_state) (src/kafire_fetcher.erl, line 445)
> >>>>>     in function  gen_statem:call_state_function/5 (gen_statem.erl, line 1662)
> >>>>>     in call from gen_statem:loop_event_state_function/6 (gen_statem.erl, line 1023)
> >>>>>   ancestors: [<0.931.0>,<0.332.0>,kafire_cluster_sup,kafire_sup,<0.227.0>]
> >>>>>   message_queue_len: 279
> >>>>>   messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118061>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118062>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118063>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118064>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118068>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118069>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118070>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118071>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118074>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118079>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118081>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118083>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118087>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118088>,0},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118089>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118093>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118094>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118095>,3},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118096>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118097>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118098>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118099>,3},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118103>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118104>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118105>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118106>,3},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118107>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118108>,1},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118109>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118113>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118114>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118115>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118116>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118117>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118118>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118119>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118120>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118121>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118122>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118123>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118124>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118125>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118126>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118127>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118128>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118129>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118130>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118131>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118132>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118133>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118134>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118135>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118136>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118137>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118138>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118139>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118140>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118141>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118145>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118146>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118147>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118148>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118149>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118150>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118151>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118152>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118153>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118154>,5},
> >>>>>                 {timeout,#Ref<0.2399112782.889192450.118159>,fetch},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118159>,false},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118165>,0},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118169>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118170>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118173>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118174>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118175>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118176>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118177>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118178>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118179>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118180>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118181>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118182>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118183>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118184>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118185>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118189>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118190>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118191>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118192>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118196>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118197>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118198>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118199>,5},
> >>>>>                 {timeout,#Ref<0.2399112782.889192450.118203>,fetch},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118203>,false},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118206>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118207>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118208>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118212>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118213>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118214>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118215>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118221>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118225>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118226>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118227>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118231>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118235>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118236>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118237>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118238>,2},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118242>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118243>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118244>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118248>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118249>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118250>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118254>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118255>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118256>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118260>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118264>,0},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118265>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118266>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118270>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118271>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118272>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118273>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118274>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118278>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118279>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118280>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118281>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118282>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118283>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118287>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118288>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118289>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118290>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118291>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118295>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118296>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118297>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118298>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118299>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118303>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118304>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118305>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118306>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118307>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118308>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118309>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118313>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118314>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118315>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118316>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118317>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118318>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118319>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118323>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118324>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118325>,5},
> >>>>>                 {timeout,#Ref<0.2399112782.889192450.118326>,fetch},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118326>,false},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118327>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118328>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118329>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118330>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118331>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118332>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118333>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118337>,3},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118338>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118339>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118340>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118344>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118345>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118346>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118347>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118351>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118352>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118353>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118357>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118358>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118359>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118360>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118364>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118365>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118366>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118367>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118371>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118372>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118373>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118377>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118378>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118379>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118383>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118384>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118385>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118389>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118390>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118391>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118392>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118393>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118394>,0},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118395>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118399>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118400>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118401>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118402>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118403>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118404>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118405>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118409>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118410>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118411>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118412>,1},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118413>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118414>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118415>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118419>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118420>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118421>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118422>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118423>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118424>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118425>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118429>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118430>,0},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118431>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118432>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118433>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118434>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118435>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118439>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118440>,1},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118441>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118442>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118443>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118444>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118445>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118449>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118450>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118451>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118452>,4},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118453>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118454>,5},
> >>>>>                 {cancel_timer,#Ref<0.2399112782.889192450.118455>,5},
> >>>>>                 {timeout,#Ref<0.2399112782.889192450.118456>,fetch},
> >>>>>                 {'EXIT',<0.10080.8>,normal},
> >>>>>                 {'EXIT',<0.10079.8>,normal},
> >>>>>                 {'EXIT',<0.10081.8>,normal},
> >>>>>                 {'EXIT',<0.10082.8>,normal},
> >>>>>                 {'EXIT',<0.10083.8>,normal},
> >>>>>                 {'EXIT',<0.10084.8>,normal},
> >>>>>                 {'EXIT',<0.10085.8>,normal},
> >>>>>                 {'EXIT',<0.10086.8>,normal},
> >>>>>                 {'EXIT',<0.10087.8>,normal},
> >>>>>                 {'EXIT',<0.10088.8>,normal},
> >>>>>                 {'EXIT',<0.10089.8>,normal},
> >>>>>                 {'EXIT',<0.10071.8>,normal}]
> >>>>>   links: [<0.931.0>]
> >>>>>   dictionary: []
> >>>>>   trap_exit: true
> >>>>>   status: running
> >>>>>   heap_size: 17731
> >>>>>   stack_size: 27
> >>>>>   reductions: 28849413204
> >>>>> neighbours:
> >>>>
> >>>>>
> >>>>>
> >>>>> I’ll try and get a simple test case for you.
> >>>>> Regards,
> >>>>> Peter.
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On 12 Apr 2019, at 13:38, Raimo Niskanen <[hidden email]> wrote:
> >>>>>>
> >>>>>> On Fri, Apr 12, 2019 at 10:38:42AM +0200, Raimo Niskanen wrote:
> >>>>>>> On Wed, Apr 10, 2019 at 01:41:06PM +0100, Peter Morgan wrote:
> >>>>>>>> Hello -
> >>>>>>>>
> >>>>>>>> We are _sometimes_ seeing cases where a state_timeout in a gen_statem results in the timeout reaching the callback module as info {timeout, Ref, Name} with OTP 21.2.
> >>>>>>>>
> >>>>>>>> The crash looks like:
> >>>>>>>>
> >>>>>>>> =ERROR REPORT==== 8-Apr-2019::23:37:19.035346 === <0.969.0> gen_statem:error_info/5:1895
> >>>>>>>> ** State machine {kafire_fetcher,<<“abc">>,<<“def">>,0} terminating
> >>>>>>>> ** Last event = {info,{timeout,#Ref<0.2399112782.889192450.118024>,fetch}}
> >>>>>>>>
> >>>>>>>> We crash because we are not expecting info messages - interestingly the following messages are in the queue for the crashed process:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>  message_queue_len: 279
> >>>>>>>>  messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
> >>>>>>>>                {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
> >>>>>>>
> >>>>>>> Is that from a process_info() call?
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Followed by lots more cancel_timer messages (200ish!). Our gen_statem does use {state_timeout, 500, fetch} so we are expecting a “fetch” timeout to happen, and we use repeat_state_and_data to requeue the state timeout (and also transition to other states). Is it possible that timeouts in gen_statem can be delivered as an info message?
> >>>>>>>
> >>>>>>> It is not supposed to happen.  This must be a bug.
> >>>>>>>
> >>>>>>> The gen_statem engine keeps track of the running state_timeout timer
> >>>>>>> and should never present it as an info event.
> >>>>>>>
> >>>>>>> When a timer is stopped (or restarted) it is done with an asynchronous
> >>>>>>> cancel (in OTP 21) so the cancel_timer messages comes from that.
> >>>>>>> They should be matched against map entries in the internal gen_statem
> >>>>>>> engine state when they arrive so them being in the inbox may be ok.
> >>>>>>>
> >>>>>>> However, how come you have so many?  Are you restarting the state_timeout
> >>>>>>> in a very tight loop of repeat_state:s?  Can you show the essential parts
> >>>>>>> of the code that causes state_timeout:s?
> >>>>>>>
> >>>>>>> Also - OTP-21.2; is that your exact OTP version?  What is your stdlib version?
> >>>>>>>
> >>>>>>> I will look at the relevant parts of the code with the new knowledge that
> >>>>>>> a state_timeout timer can be lost, probably in combination with
> >>>>>>> repeat_state and state_timeout restart!
> >>>>>>
> >>>>>>
> >>>>>> Sorry, the knowledge did not help; I did not find anything suspicous.
> >>>>>>
> >>>>>> More information will be needed...
> >>>>>>
> >>>>>>>
> >>>>>>> On master since Jan 21 so in OTP-22.0-rc1 among other changes the timer handling
> >>>>>>> has been rewritten to use a synchronous cancel, which simplified the code
> >>>>>>> significantly.  I do not know if that would be worth trying?
> >>>>>>>
> >>>>>>> / Raimo Niskanen
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>> Peter.
> >>>>>>>>
> >>>>>>
> >>>>>> --
> >>>>>>
> >>>>>> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> >>>>>> _______________________________________________
> >>>>>> erlang-questions mailing list
> >>>>>> [hidden email]
> >>>>>> http://erlang.org/mailman/listinfo/erlang-questions
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>>
> >>>> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> >>>> _______________________________________________
> >>>> erlang-questions mailing list
> >>>> [hidden email]
> >>>> http://erlang.org/mailman/listinfo/erlang-questions
> >>>
> >>> --
> >>>
> >>> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> >>> _______________________________________________
> >>> erlang-questions mailing list
> >>> [hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>
> >>> http://erlang.org/mailman/listinfo/erlang-questions <http://erlang.org/mailman/listinfo/erlang-questions> <http://erlang.org/mailman/listinfo/erlang-questions <http://erlang.org/mailman/listinfo/erlang-questions>>
> >
> > --
> >
> > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > _______________________________________________
> > erlang-questions mailing list
> > [hidden email] <mailto:[hidden email]>
> > http://erlang.org/mailman/listinfo/erlang-questions <http://erlang.org/mailman/listinfo/erlang-questions>

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: gen_statem - state_timeout sometimes reaching callback module as [info, {timeout, Ref, Name}

Peter Morgan
Hi Raimo,

On 29 Apr 2019, at 15:53, Raimo Niskanen <[hidden email]> wrote:

On Mon, Apr 29, 2019 at 11:49:18AM +0100, Peter Morgan wrote:
Hi Raimo,

I’ve tried your patch and it hasn’t caused any errors. We are still having problems reproducing the issue, but it at least doesn’t look to have caused any side effects.

Well, that is at least reassuring :-)

So, it is so hard to reproduce that you can not be sure that a patch (any
patch) has fixed it…?

Correct. We’ve managed to get it to happen _once_ pre patch, and suspect that it is some combination of load and/or environment (AWS). Sadly, it is proving elusive to reproduce.



I will put it in our daily builds too, but we do not have many machines
running daily for OTP-21, and this is a backported change from OTP-22, so
it is hard to be sure that the change is exactly the same as in OTP-22
where it runs without problems.

Since it passes your regression test, if it does not cause any regression
errors for our few OTP-21 machines, I will probably place it in queue for next
OTP-21 maintenance patch…

That would be superb if we can get it into the queue for a maintenance patch.


If you get to any further conclusions on whether the patch fixes your
problem or breaks anything, please let me know ASAP.

:) Will do.


Thanks again,
Peter.



Thank you for your help in investigating this!

/ Raimo




Regards,
Peter.

On 25 Apr 2019, at 08:28, Raimo Niskanen <[hidden email]> wrote:

On Wed, Apr 24, 2019 at 06:46:27PM +0100, Peter Morgan wrote:
Hi Raimo,

I am having great difficultly reproducing in a simple test case. Despite best efforts of hundreds of thousands of processes creating timers and retry_state I’ve not managed to reproduce our production issue - with a vanilla OTP21 without your patch. Will let you know if I do.


Can you try my patch on OTP-21 - pull/2211 to see if it works without
errors in your original setup?

/ Raimo



Thanks again for your help on this issue.

Regards,
Peter.

On 17 Apr 2019, at 15:36, Raimo Niskanen <[hidden email]> wrote:

On Wed, Apr 17, 2019 at 11:54:42AM +0200, Raimo Niskanen wrote:
After some discussions with the VM guys this seems to be a bug in
gen_statem coming from a misunderstanding of which guarantees that
asynchronous timer cancels gives and does not give.

The {timeout, Ref, Msg} message may actually come _after_ the
{cancel_timer, Ref, false} message.  Heavy load seems to increase the
probability from almost zero to very small.

So the handling of these messages is broken for gen_statem on OTP-21, but
this should be fixed on master (soon to be OTP-22) since gen_statem there
does not use asynchronous timer cancels.

I will look into how big a diff it will be to rewrite gen_statem in OTP-21.latest
to use synchronous timer cancel...

The diff became a fat one, so I created a pull request:

 https://github.com/erlang/otp/pull/2211 <https://github.com/erlang/otp/pull/2211> <https://github.com/erlang/otp/pull/2211 <https://github.com/erlang/otp/pull/2211>>

If you can try it out, that would be awsome!

Best Regards
/ Raimo




Thank you for finding a tricky bug!
/ Raimo



On Fri, Apr 12, 2019 at 04:13:27PM +0100, Peter Morgan wrote:
Hi Raimo,

Thanks for looking at this, and the background on the implementation which is very useful. I am in the process of putting together a simple test case, but having trouble reproducing it. It appears to happen very infrequently - possibly something that is load related? The original output was from the Crash Report which I’ve attached - where you can see the info timeout was received by our gen_statem behaviour callback. Our OTP is 21.2 (without any 21.2.X patches):

{stdlib,"ERTS  CXC 138 10","3.7”}
{kernel,"ERTS  CXC 138 10","6.2”}


Your question - “why so many?” - it could be a failure of a downstream service that is ultimately causing this, and we’re not backing off in this case - something that I’m checking too. However, I still thought it was interesting that the info timeout reaches the behaviour module.


=CRASH REPORT==== 8-Apr-2019::23:37:19.038905 === <0.969.0> proc_lib:crash_report/4:508
crasher:
 initial call: kafire_fetcher:init/1
 pid: <0.969.0>
 registered_name: []
 exception error: no function clause matching 
                  kafire_fetcher:handle_event(info,
                                              {timeout,
                                               #Ref<0.2399112782.889192450.118024>,
                                               fetch},
                                              subsequent_fetches,
                                              our_state) (src/kafire_fetcher.erl, line 445)
   in function  gen_statem:call_state_function/5 (gen_statem.erl, line 1662)
   in call from gen_statem:loop_event_state_function/6 (gen_statem.erl, line 1023)
 ancestors: [<0.931.0>,<0.332.0>,kafire_cluster_sup,kafire_sup,<0.227.0>]
 message_queue_len: 279
 messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
               {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
               {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118061>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118062>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118063>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118064>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118068>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118069>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118070>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118071>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118074>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118079>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118081>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118083>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118087>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118088>,0},
               {cancel_timer,#Ref<0.2399112782.889192450.118089>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118093>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118094>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118095>,3},
               {cancel_timer,#Ref<0.2399112782.889192450.118096>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118097>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118098>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118099>,3},
               {cancel_timer,#Ref<0.2399112782.889192450.118103>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118104>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118105>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118106>,3},
               {cancel_timer,#Ref<0.2399112782.889192450.118107>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118108>,1},
               {cancel_timer,#Ref<0.2399112782.889192450.118109>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118113>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118114>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118115>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118116>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118117>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118118>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118119>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118120>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118121>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118122>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118123>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118124>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118125>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118126>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118127>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118128>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118129>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118130>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118131>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118132>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118133>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118134>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118135>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118136>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118137>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118138>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118139>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118140>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118141>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118145>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118146>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118147>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118148>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118149>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118150>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118151>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118152>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118153>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118154>,5},
               {timeout,#Ref<0.2399112782.889192450.118159>,fetch},
               {cancel_timer,#Ref<0.2399112782.889192450.118159>,false},
               {cancel_timer,#Ref<0.2399112782.889192450.118165>,0},
               {cancel_timer,#Ref<0.2399112782.889192450.118169>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118170>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118173>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118174>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118175>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118176>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118177>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118178>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118179>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118180>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118181>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118182>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118183>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118184>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118185>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118189>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118190>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118191>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118192>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118196>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118197>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118198>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118199>,5},
               {timeout,#Ref<0.2399112782.889192450.118203>,fetch},
               {cancel_timer,#Ref<0.2399112782.889192450.118203>,false},
               {cancel_timer,#Ref<0.2399112782.889192450.118206>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118207>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118208>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118212>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118213>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118214>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118215>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118221>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118225>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118226>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118227>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118231>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118235>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118236>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118237>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118238>,2},
               {cancel_timer,#Ref<0.2399112782.889192450.118242>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118243>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118244>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118248>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118249>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118250>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118254>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118255>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118256>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118260>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118264>,0},
               {cancel_timer,#Ref<0.2399112782.889192450.118265>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118266>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118270>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118271>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118272>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118273>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118274>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118278>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118279>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118280>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118281>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118282>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118283>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118287>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118288>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118289>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118290>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118291>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118295>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118296>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118297>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118298>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118299>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118303>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118304>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118305>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118306>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118307>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118308>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118309>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118313>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118314>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118315>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118316>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118317>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118318>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118319>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118323>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118324>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118325>,5},
               {timeout,#Ref<0.2399112782.889192450.118326>,fetch},
               {cancel_timer,#Ref<0.2399112782.889192450.118326>,false},
               {cancel_timer,#Ref<0.2399112782.889192450.118327>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118328>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118329>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118330>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118331>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118332>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118333>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118337>,3},
               {cancel_timer,#Ref<0.2399112782.889192450.118338>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118339>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118340>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118344>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118345>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118346>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118347>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118351>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118352>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118353>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118357>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118358>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118359>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118360>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118364>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118365>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118366>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118367>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118371>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118372>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118373>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118377>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118378>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118379>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118383>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118384>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118385>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118389>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118390>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118391>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118392>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118393>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118394>,0},
               {cancel_timer,#Ref<0.2399112782.889192450.118395>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118399>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118400>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118401>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118402>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118403>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118404>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118405>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118409>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118410>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118411>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118412>,1},
               {cancel_timer,#Ref<0.2399112782.889192450.118413>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118414>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118415>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118419>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118420>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118421>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118422>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118423>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118424>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118425>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118429>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118430>,0},
               {cancel_timer,#Ref<0.2399112782.889192450.118431>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118432>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118433>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118434>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118435>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118439>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118440>,1},
               {cancel_timer,#Ref<0.2399112782.889192450.118441>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118442>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118443>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118444>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118445>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118449>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118450>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118451>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118452>,4},
               {cancel_timer,#Ref<0.2399112782.889192450.118453>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118454>,5},
               {cancel_timer,#Ref<0.2399112782.889192450.118455>,5},
               {timeout,#Ref<0.2399112782.889192450.118456>,fetch},
               {'EXIT',<0.10080.8>,normal},
               {'EXIT',<0.10079.8>,normal},
               {'EXIT',<0.10081.8>,normal},
               {'EXIT',<0.10082.8>,normal},
               {'EXIT',<0.10083.8>,normal},
               {'EXIT',<0.10084.8>,normal},
               {'EXIT',<0.10085.8>,normal},
               {'EXIT',<0.10086.8>,normal},
               {'EXIT',<0.10087.8>,normal},
               {'EXIT',<0.10088.8>,normal},
               {'EXIT',<0.10089.8>,normal},
               {'EXIT',<0.10071.8>,normal}]
 links: [<0.931.0>]
 dictionary: []
 trap_exit: true
 status: running
 heap_size: 17731
 stack_size: 27
 reductions: 28849413204
neighbours:



I’ll try and get a simple test case for you.
Regards,
Peter.



On 12 Apr 2019, at 13:38, Raimo Niskanen <[hidden email]> wrote:

On Fri, Apr 12, 2019 at 10:38:42AM +0200, Raimo Niskanen wrote:
On Wed, Apr 10, 2019 at 01:41:06PM +0100, Peter Morgan wrote:
Hello -

We are _sometimes_ seeing cases where a state_timeout in a gen_statem results in the timeout reaching the callback module as info {timeout, Ref, Name} with OTP 21.2.

The crash looks like:

=ERROR REPORT==== 8-Apr-2019::23:37:19.035346 === <0.969.0> gen_statem:error_info/5:1895
** State machine {kafire_fetcher,<<“abc">>,<<“def">>,0} terminating
** Last event = {info,{timeout,#Ref<0.2399112782.889192450.118024>,fetch}}

We crash because we are not expecting info messages - interestingly the following messages are in the queue for the crashed process:


message_queue_len: 279
messages: [{timeout,#Ref<0.2399112782.903610369.70160>,fetch},
              {cancel_timer,#Ref<0.2399112782.903610369.70160>,false},
              {cancel_timer,#Ref<0.2399112782.903610369.70168>,4},
              {cancel_timer,#Ref<0.2399112782.889192450.118038>,5},
              {cancel_timer,#Ref<0.2399112782.889192450.118039>,5},
              {cancel_timer,#Ref<0.2399112782.889192450.118043>,5},
              {cancel_timer,#Ref<0.2399112782.889192450.118044>,4},
              {cancel_timer,#Ref<0.2399112782.889192450.118045>,5},
              {cancel_timer,#Ref<0.2399112782.889192450.118049>,5},
              {cancel_timer,#Ref<0.2399112782.889192450.118050>,5},
              {cancel_timer,#Ref<0.2399112782.889192450.118051>,4},
              {cancel_timer,#Ref<0.2399112782.889192450.118052>,5},
              {cancel_timer,#Ref<0.2399112782.889192450.118053>,5},
              {cancel_timer,#Ref<0.2399112782.889192450.118054>,5},
              {cancel_timer,#Ref<0.2399112782.889192450.118055>,4},
              {cancel_timer,#Ref<0.2399112782.889192450.118056>,5},
              {cancel_timer,#Ref<0.2399112782.889192450.118060>,5},

Is that from a process_info() call?


Followed by lots more cancel_timer messages (200ish!). Our gen_statem does use {state_timeout, 500, fetch} so we are expecting a “fetch” timeout to happen, and we use repeat_state_and_data to requeue the state timeout (and also transition to other states). Is it possible that timeouts in gen_statem can be delivered as an info message?

It is not supposed to happen.  This must be a bug.

The gen_statem engine keeps track of the running state_timeout timer
and should never present it as an info event.

When a timer is stopped (or restarted) it is done with an asynchronous
cancel (in OTP 21) so the cancel_timer messages comes from that.
They should be matched against map entries in the internal gen_statem
engine state when they arrive so them being in the inbox may be ok.

However, how come you have so many?  Are you restarting the state_timeout
in a very tight loop of repeat_state:s?  Can you show the essential parts
of the code that causes state_timeout:s?

Also - OTP-21.2; is that your exact OTP version?  What is your stdlib version?

I will look at the relevant parts of the code with the new knowledge that
a state_timeout timer can be lost, probably in combination with
repeat_state and state_timeout restart!


Sorry, the knowledge did not help; I did not find anything suspicous.

More information will be needed...


On master since Jan 21 so in OTP-22.0-rc1 among other changes the timer handling
has been rewritten to use a synchronous cancel, which simplified the code
significantly.  I do not know if that would be worth trying?

/ Raimo Niskanen



Thanks
Peter.


-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions



-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email] <[hidden email]> <[hidden email] <[hidden email]>>
http://erlang.org/mailman/listinfo/erlang-questions <http://erlang.org/mailman/listinfo/erlang-questions> <http://erlang.org/mailman/listinfo/erlang-questions <http://erlang.org/mailman/listinfo/erlang-questions>>

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email] <[hidden email]>
http://erlang.org/mailman/listinfo/erlang-questions <http://erlang.org/mailman/listinfo/erlang-questions>

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions