missing line numbers when gen_statem crashes

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

missing line numbers when gen_statem crashes

Torben Hoffmann
Hi,

I have a crashing gen_statem, but to the best of my knowledge I cannot see from the stock error messages which line my gen_statem dies on.

I have searched high and low in my console output, but this is the only thing that I could find:

=CRASH REPORT==== 13-Jun-2018::11:17:28 ===
  crasher:
    initial call: eee_sched_shard:init/1
    pid: <0.3149.0>
    registered_name: []
    exception error: {badrecord,trans_opts}
      in function  gen_statem:loop_event_actions_list/10 (gen_statem.erl, line 1210)
    ancestors: [<0.3148.0>,<0.3114.0>,<0.3107.0>,eee_sched_streams_sup,
                  eee_sched_task_sup,eee_sched_sup,<0.2867.0>]
    message_queue_len: 0
    messages: []
    links: [<0.3148.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 6772
    stack_size: 27
    reductions: 58869
  neighbours:

I'm about to turn on debug mode, but I don't think that that should be the answer to this kind of problem. Line numbers should come out by default.

Cheers,
Torben
--

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: missing line numbers when gen_statem crashes

Raimo Niskanen-2
On Wed, Jun 13, 2018 at 11:51:31AM +0200, Torben Hoffmann wrote:

> Hi,
>
> I have a crashing gen_statem, but to the best of my knowledge I cannot see
> from the stock error messages which line my gen_statem dies on.
>
> I have searched high and low in my console output, but this is the only
> thing that I could find:
>
> =CRASH REPORT==== 13-Jun-2018::11:17:28 ===
>   crasher:
>     initial call: eee_sched_shard:init/1
>     pid: <0.3149.0>
>     registered_name: []
>     exception error: {badrecord,trans_opts}
>       in function  gen_statem:loop_event_actions_list/10 (gen_statem.erl,
> line 1210)

This is an internal crash in gen_statem - I think i recognize this
badrecord at line 1210.  Your callback module returns a bad action in the
actions list, and when gen_statem terminates there is a bug that causes it
to crash instead, hiding the informative reason.

It might be so that you return an action that is not allowed from a state
enter call, or simply a not recognized action.

Since the error is "cosmetic" or rather "not in the function", I did not
patch this in the 20.3 track.  The bad diagnostics error was introduced in
20.3 and is corrected in 21.0-rc1 and forwards.


>     ancestors: [<0.3148.0>,<0.3114.0>,<0.3107.0>,eee_sched_streams_sup,
>                   eee_sched_task_sup,eee_sched_sup,<0.2867.0>]
>     message_queue_len: 0
>     messages: []
>     links: [<0.3148.0>]
>     dictionary: []
>     trap_exit: false
>     status: running
>     heap_size: 6772
>     stack_size: 27
>     reductions: 58869
>   neighbours:
>
> I'm about to turn on debug mode, but I don't think that that should be the
> answer to this kind of problem. Line numbers should come out by default.

In this case no line number can be presented since the callback module
returned a bad term, and it is when processing the return value that the
error is discovered.  In this case the bad action term should be presented
in the exception error (which it is from 21.0-rc1).


>
> Cheers,
> Torben
> --
> https://www.linkedin.com/in/torbenhoffmann/


--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: missing line numbers when gen_statem crashes

Torben Hoffmann
Hi Raimo,

That was indeed the case. Was missing a From in a reply tuple.

I wonder what can be done to make all of this more approachable.
My memory of working with the gen_fsm is a much more pleasant one when you did errors.
I can appreciate that stuffing things into the callback return tuples makes things harder, but it goes against one of the best things about OTP: it has always been super easy to find the place in the code where you have messed up.
With gen_statem I'm not getting the same kind of tranquility as with gen_server and gen_statem.

I'm wondering if the replies and next_event's should be made explicit calls in the code.
That way you'd get a line number in your face if you did it wrong.
And it would simplify the callback return structures quite a bit.

So {reply, From, Msg} would have to be done with gen_statem:reply/1,2 exclusively and {next_event, Type, EventContent} would need new functions gen_statem:next_event/1,2.

If the gen_statem:next_event/1,2 was a thing one could do this without removing the complex callback return values.
Now that I know what pain error in the callback return values cause when debugging I would switch to using them exclusively, if the next_event/1,2 functions were available. I value ease of debugging that much.

Cheers,
Torben

On Wed, Jun 13, 2018 at 3:09 PM Raimo Niskanen <[hidden email]> wrote:
On Wed, Jun 13, 2018 at 11:51:31AM +0200, Torben Hoffmann wrote:
> Hi,
>
> I have a crashing gen_statem, but to the best of my knowledge I cannot see
> from the stock error messages which line my gen_statem dies on.
>
> I have searched high and low in my console output, but this is the only
> thing that I could find:
>
> =CRASH REPORT==== 13-Jun-2018::11:17:28 ===
>   crasher:
>     initial call: eee_sched_shard:init/1
>     pid: <0.3149.0>
>     registered_name: []
>     exception error: {badrecord,trans_opts}
>       in function  gen_statem:loop_event_actions_list/10 (gen_statem.erl,
> line 1210)

This is an internal crash in gen_statem - I think i recognize this
badrecord at line 1210.  Your callback module returns a bad action in the
actions list, and when gen_statem terminates there is a bug that causes it
to crash instead, hiding the informative reason.

It might be so that you return an action that is not allowed from a state
enter call, or simply a not recognized action.

Since the error is "cosmetic" or rather "not in the function", I did not
patch this in the 20.3 track.  The bad diagnostics error was introduced in
20.3 and is corrected in 21.0-rc1 and forwards.


>     ancestors: [<0.3148.0>,<0.3114.0>,<0.3107.0>,eee_sched_streams_sup,
>                   eee_sched_task_sup,eee_sched_sup,<0.2867.0>]
>     message_queue_len: 0
>     messages: []
>     links: [<0.3148.0>]
>     dictionary: []
>     trap_exit: false
>     status: running
>     heap_size: 6772
>     stack_size: 27
>     reductions: 58869
>   neighbours:
>
> I'm about to turn on debug mode, but I don't think that that should be the
> answer to this kind of problem. Line numbers should come out by default.

In this case no line number can be presented since the callback module
returned a bad term, and it is when processing the return value that the
error is discovered.  In this case the bad action term should be presented
in the exception error (which it is from 21.0-rc1).


>
> Cheers,
> Torben
> --
> https://www.linkedin.com/in/torbenhoffmann/


--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
--

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: missing line numbers when gen_statem crashes

Raimo Niskanen-2
On Wed, Jun 13, 2018 at 04:03:17PM +0200, Torben Hoffmann wrote:

> Hi Raimo,
>
> That was indeed the case. Was missing a From in a reply tuple.
>
> I wonder what can be done to make all of this more approachable.
> My memory of working with the gen_fsm is a much more pleasant one when you
> did errors.
> I can appreciate that stuffing things into the callback return tuples makes
> things harder, but it goes against one of the best things about OTP: it has
> always been super easy to find the place in the code where you have messed
> up.
> With gen_statem I'm not getting the same kind of tranquility as with
> gen_server and gen_statem.

This particular case was a bug hiding the intended error message.

In OTP-21.0 you will get a regular error report (that contains
among other things the current state, the event in play and the reason:

    {bad_action_from_state_function,Action}

For a bad return tuple you would get:

    {bad_return_from_state_function,Result}


For what is not allowed from a state enter call you get:

    {bad_state_enter_action_from_state_function,Action}
and
    {bad_state_enter_return_from_state_function,Result}

There are also these:
    {bad_return_from_init,Result}
    {bad_return_from_callback_mode,CallbackMode}
    {bad_reply_action_from_state_function,R}

I hope they will be more helpful and enough to pinpoint the cause.
The current state and event is not exactly the same as the callback module
line number, but should be about as useful.


>
> I'm wondering if the replies and next_event's should be made explicit calls
> in the code.
> That way you'd get a line number in your face if you did it wrong.
> And it would simplify the callback return structures quite a bit.
>
> So {reply, From, Msg} would have to be done with gen_statem:reply/1,2
> exclusively and {next_event, Type, EventContent} would need new functions
> gen_statem:next_event/1,2.

That does not play well with sys:trace/2 debugging, since the debug state
is in the behaviour engine context and not reachable when called from
the callback module.

That kind of model is very non-functional but could be done by passing
values through the process dictionary from the callback module context to
the behaviour engine context.

I do not like that at all.


>
> If the gen_statem:next_event/1,2 was a thing one could do this without
> removing the complex callback return values.
> Now that I know what pain error in the callback return values cause when

You keep talking about the pain increased by a bug corrected in 21.0.
Please re-evaluate the 21.0 version to see how painful it hopefully not is.

/ Raimo



> debugging I would switch to using them exclusively, if the next_event/1,2
> functions were available. I value ease of debugging that much.
>
> Cheers,
> Torben
>
> On Wed, Jun 13, 2018 at 3:09 PM Raimo Niskanen <
> [hidden email]> wrote:
>
> > On Wed, Jun 13, 2018 at 11:51:31AM +0200, Torben Hoffmann wrote:
> > > Hi,
> > >
> > > I have a crashing gen_statem, but to the best of my knowledge I cannot
> > see
> > > from the stock error messages which line my gen_statem dies on.
> > >
> > > I have searched high and low in my console output, but this is the only
> > > thing that I could find:
> > >
> > > =CRASH REPORT==== 13-Jun-2018::11:17:28 ===
> > >   crasher:
> > >     initial call: eee_sched_shard:init/1
> > >     pid: <0.3149.0>
> > >     registered_name: []
> > >     exception error: {badrecord,trans_opts}
> > >       in function  gen_statem:loop_event_actions_list/10 (gen_statem.erl,
> > > line 1210)
> >
> > This is an internal crash in gen_statem - I think i recognize this
> > badrecord at line 1210.  Your callback module returns a bad action in the
> > actions list, and when gen_statem terminates there is a bug that causes it
> > to crash instead, hiding the informative reason.
> >
> > It might be so that you return an action that is not allowed from a state
> > enter call, or simply a not recognized action.
> >
> > Since the error is "cosmetic" or rather "not in the function", I did not
> > patch this in the 20.3 track.  The bad diagnostics error was introduced in
> > 20.3 and is corrected in 21.0-rc1 and forwards.
> >
> >
> > >     ancestors: [<0.3148.0>,<0.3114.0>,<0.3107.0>,eee_sched_streams_sup,
> > >                   eee_sched_task_sup,eee_sched_sup,<0.2867.0>]
> > >     message_queue_len: 0
> > >     messages: []
> > >     links: [<0.3148.0>]
> > >     dictionary: []
> > >     trap_exit: false
> > >     status: running
> > >     heap_size: 6772
> > >     stack_size: 27
> > >     reductions: 58869
> > >   neighbours:
> > >
> > > I'm about to turn on debug mode, but I don't think that that should be
> > the
> > > answer to this kind of problem. Line numbers should come out by default.
> >
> > In this case no line number can be presented since the callback module
> > returned a bad term, and it is when processing the return value that the
> > error is discovered.  In this case the bad action term should be presented
> > in the exception error (which it is from 21.0-rc1).
> >
> >
> > >
> > > Cheers,
> > > Torben
> > > --
> > > https://www.linkedin.com/in/torbenhoffmann/
> >
> >
> > --
> >
> > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > _______________________________________________
> > erlang-questions mailing list
> > [hidden email]
> > http://erlang.org/mailman/listinfo/erlang-questions
> >
> --
> https://www.linkedin.com/in/torbenhoffmann/

> _______________________________________________
> erlang-questions mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-questions


--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions