Broken gen:call/3?

Sean Hinde
Hi,

This behaviour seems broken to me:

1. One process is linked to another (for supervision reasons), and a  
gen_*:call/2 synchronous request is made from one to the other.

2. The called process crashes while handling the call.

3. gen:call consumes *both* its own monitor 'DOWN' message *and* the
'EXIT' message arising from the link.

Result: the calling process doesn't get the 'EXIT' message, and hence
doesn't know about the crash. It then does not function well as a
supervisor...
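Here is a minimal Erlang sketch of the scenario above. It does not call gen_server at all. The hypothetical `model_call/2` only imitates the behaviour described in step 3: it monitors the server, sends the request, and on 'DOWN' also swallows one pending 'EXIT' from the same pid. The module name, the `{call, From, Request}` message format and the 100 ms grace period are illustrative assumptions. The real gen.erl code may differ between OTP releases.

```erlang
-module(swallow_demo).
-export([run/0]).

%% Run the scenario in a fresh process so that trap_exit does not leak
%% into whichever process calls run/0.
run() ->
    Parent = self(),
    Ref = make_ref(),
    spawn(fun() -> Parent ! {Ref, caller()} end),
    receive {Ref, Result} -> Result end.

%% The "supervisor": traps exits, links to the server, then calls it.
caller() ->
    process_flag(trap_exit, true),
    Server = spawn_link(fun crashing_server/0),
    CallResult =
        try model_call(Server, hello)
        catch exit:Reason -> {caught_exit, Reason}
        end,
    %% Did the link's 'EXIT' message survive the call?
    Leftover =
        receive
            {'EXIT', Server, R} -> {exit_message, R}
        after 200 -> no_exit_message
        end,
    {CallResult, Leftover}.

%% A toy server that dies while handling the first request it gets.
crashing_server() ->
    receive
        {call, _From, _Request} -> exit(crashed_while_handling_call)
    end.

%% Hypothetical model of step 3, not the real gen.erl code: monitor the
%% server, send the request, and on 'DOWN' also swallow one 'EXIT' from
%% the same pid before exiting with the monitor's reason.
model_call(Server, Request) ->
    Mref = erlang:monitor(process, Server),
    Server ! {call, {self(), Mref}, Request},
    receive
        {Mref, Reply} ->
            erlang:demonitor(Mref, [flush]),
            Reply;
        {'DOWN', Mref, process, Server, Reason} ->
            receive
                {'EXIT', Server, _} -> ok
            after 100 -> ok  % arbitrary grace period for the link signal
            end,
            exit(Reason)
    end.
```

`swallow_demo:run()` should return `{{caught_exit, crashed_while_handling_call}, no_exit_message}`. The caller learns about the crash only through the exit raised by the call, and nothing is left for its supervision loop.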

Sean


Raimo Niskanen
Aaah, well, yes... This is an old flaw.

Once upon a time there were only links for supervising other
processes. The only way to know whether a server died during
a library call (e.g. inside gen_server:call after sending
the request, while waiting for the response) was that an
'EXIT' message was received instead. The library
code for gen_server:call would then have to trap exit messages
and set a link to the server.

But that cannot be done by library code, since there can
be only one link between any pair of processes. Trapping exit
messages temporarily might be possible, but there is an
uncontrollable time window after the receive and before exit
trapping is disabled again. So the library code cannot be sure
that it does not accidentally convert a link exit signal into
an 'EXIT' message.

So it was then designed so that _if_ the calling process
had activated exit message trapping _and_ set a link to the
server, gen_server:call could receive the 'EXIT'
message and return an error code as the result of the server call.
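Here is a rough sketch of that link-only arrangement. It assumes a local server and a hypothetical `{call, {From, Tag}, Request}` message format. The library function neither links nor traps exits itself (it cannot, for the reasons given above). It merely accepts an 'EXIT' from the server as an alternative to the reply. Such a message can only arrive if the caller has set up both the link and exit trapping.

```erlang
-module(link_call_demo).
-export([run/0]).

%% Run in a fresh process so trap_exit does not leak into the caller of run/0.
run() ->
    Parent = self(),
    Ref = make_ref(),
    spawn(fun() -> Parent ! {Ref, caller()} end),
    receive {Ref, Result} -> Result end.

%% The caller, not the library, sets up both preconditions.
caller() ->
    process_flag(trap_exit, true),
    Server = spawn_link(fun server/0),
    First = linked_call(Server, hello, 1000),
    Second = linked_call(Server, crash, 1000),
    {First, Second}.

%% Toy server: echoes requests, but dies on the request 'crash'.
server() ->
    receive
        {call, _From, crash} -> exit(crashed);
        {call, {From, Tag}, Req} -> From ! {Tag, {echo, Req}}, server()
    end.

%% Hypothetical link-era call: it never links or traps exits itself, it
%% only treats an 'EXIT' from the server as an alternative to the reply.
linked_call(Server, Request, Timeout) ->
    Tag = make_ref(),
    Server ! {call, {self(), Tag}, Request},
    receive
        {Tag, Reply} -> {ok, Reply};
        {'EXIT', Server, Reason} -> {error, Reason}
    after Timeout -> exit(timeout)
    end.
```

`link_call_demo:run()` should return `{{ok, {echo, hello}}, {error, crashed}}`.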

Later, when monitors were introduced, we could not change
the behaviour of gen_server:call to not consume 'EXIT'
messages at all (which would be the right(TM) way in
the presence of monitors). The result would be passing
undesired 'EXIT' messages on to old calling applications.

So, there we are today. The calling process should check
the result from gen_server:call and also receive 'EXIT' messages,
or set a monitor of its own.
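Here is a sketch of the two suggestions. It uses the real `gen_server` API with a hypothetical callback module whose `crash` request stops the server without replying. The caller catches the exit raised by `gen_server:call/2`; the pattern assumes the exit reason has the form `{Reason, Location}`. It also relies on a monitor of its own, whose reference differs from the one the call uses internally. Finally, it checks for the link's 'EXIT' without assuming whether that message is still in the mailbox.

```erlang
-module(own_monitor_demo).
-behaviour(gen_server).
-export([run/0]).
-export([init/1, handle_call/3, handle_cast/2, terminate/2]).

%% Run in a fresh process so trap_exit does not leak into the caller of run/0.
run() ->
    Parent = self(),
    Ref = make_ref(),
    spawn(fun() -> Parent ! {Ref, caller()} end),
    receive {Ref, Result} -> Result end.

caller() ->
    process_flag(trap_exit, true),
    {ok, Server} = gen_server:start_link(?MODULE, [], []),
    %% Our own monitor; its reference differs from the one gen_server:call
    %% uses internally, so the call should not consume this 'DOWN'.
    MyRef = erlang:monitor(process, Server),
    %% 1. Check the result of the call: a dead server surfaces as an exit.
    CallResult =
        try gen_server:call(Server, crash)
        catch exit:{Reason, _CallInfo} -> {call_failed, Reason}
        end,
    %% 2. Rely on our own monitor rather than on the link's 'EXIT'.
    Down =
        receive
            {'DOWN', MyRef, process, Server, DownReason} -> {down, DownReason}
        after 1000 -> no_down
        end,
    %% 3. The link's 'EXIT' may or may not still be here, depending on the
    %%    OTP release; handle it either way instead of assuming.
    Exit =
        receive
            {'EXIT', Server, ExitReason} -> {exit_message, ExitReason}
        after 100 -> no_exit_message
        end,
    {CallResult, Down, Exit}.

%% Minimal callbacks; the hypothetical request 'crash' stops the server
%% without sending a reply.
init([]) -> {ok, no_state}.

handle_call(crash, _From, State) -> {stop, crashed, State};
handle_call(Request, _From, State) -> {reply, {echo, Request}, State}.

handle_cast(_Msg, State) -> {noreply, State}.

terminate(_Reason, _State) -> ok.
```

The first two elements of the result should be `{call_failed, crashed}` and `{down, crashed}`. The third depends on whether the OTP release in use consumes the 'EXIT' message. The server also logs an error report when it stops with a non-normal reason.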

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB


Sean Hinde
Indeed!

I wonder how much code out there is currently broken
because the author did not realise this happens, versus code
that would be broken if it were changed.

My guess, based on the assumption that people would expect to have to
handle 'EXIT' messages if they have chosen to link, is that this
behaviour hides many more latent bugs than would be introduced if it
were changed.

Sean

On 31 Oct 2005, at 14:18, Raimo Niskanen wrote:

Raimo Niskanen
In the best of worlds you would be right, but since this strange
behaviour has been tested and in production for many, many years,
you just _might_ not be right. And a behaviour change would
expose new bugs.

Therefore we assume that a behaviour change would make our major
paying customers, who are in the maintenance phase of their
products, avoid taking a new OTP release. That would force us to maintain
one release more than necessary, stealing resources from
new development...

Sean Hinde
Interesting. The nature of this bug is such that it will only be
exposed in unusual error cases, particularly ones that are difficult
to create during testing.

I am sure there are many "fully tested" systems which do not  
correctly account for this bug.

There is currently not even a realistic way to examine the exit
reason to determine whether the called process has crashed (vs. the
caller crashing). It would actually be very difficult to write code
that correctly works around this bug while still relying on links.

Perhaps the maintenance mode customers would be pleased if you made a  
change which correctly uncovered bugs in their systems?

Sean

On 1 Nov 2005, at 08:25, Raimo Niskanen wrote:
