On selective receive (Re: eep: multiple patterns)

Ulf Wiger-2
I would really like to discourage people from avoiding
selective receive because it's "expensive". It can be
expensive on very large message queues, but this is
a pretty rare error condition, and fairly easily observable.

(I know of projects that have banned use of selective
receive for this reason, but without having thought much
about what to use instead, and when.)

You can use erlang:system_monitor/2 to quickly detect
if a process is growing memory in a strange way.
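
A minimal sketch of the erlang:system_monitor/2 suggestion, assuming the growing message queue shows up as heap growth (which holds when messages are stored on the process heap). The module name heap_watch, the word limit and the log format are illustrative and not from this thread. A node has only one system monitor at a time, so starting this one replaces any that is already installed.

```erlang
-module(heap_watch).
-export([start/1]).

%% Start a process that is notified whenever a garbage collection leaves
%% some process with a heap larger than MaxWords words.
start(MaxWords) when is_integer(MaxWords), MaxWords > 0 ->
    spawn(fun() ->
        erlang:system_monitor(self(), [{large_heap, MaxWords}]),
        loop()
    end).

loop() ->
    receive
        {monitor, Pid, large_heap, Info} ->
            %% Look at the queue length to see whether the mailbox is the cause.
            QLen = case erlang:process_info(Pid, message_queue_len) of
                       {message_queue_len, N} -> N;
                       undefined -> dead
                   end,
            io:format("large heap in ~p: queue length ~p, gc info ~p~n",
                      [Pid, QLen, Info]),
            loop();
        stop ->
            ok
    end.
```

Calling heap_watch:start(1000000) from the shell is enough to try it; sending stop to the returned pid ends the monitor process, which also clears the system monitor settings.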

An old legacy Ericsson system implemented selective receive
in a way that the message queue could hold at most 6 messages.
Any more than that was obviously an error.

I think it might be useful to be able to specify such a limit as
a spawn option, perhaps together with maximum heap size.
Exceeding the limit could perhaps lead to the process being
killed (which might seem backwards in the case of the message
queue, but at least gives a visible indication), or that the sender
process would be suspended (which could potentially lead to the
whole system stopping.)
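
No such spawn option is shown in this thread, but the "kill the process when the queue is too long" variant can be approximated from the outside. The following is a rough sketch under that assumption: a watchdog polls the worker's queue length and kills it when the limit is exceeded. The module name queue_guard, the exit reason and the 100 ms poll interval are made up for illustration, and polling can of course miss short bursts.

```erlang
-module(queue_guard).
-export([spawn_guarded/2]).

%% Spawn Fun in a new process, plus a watchdog that kills that process
%% when its message queue is found to be longer than MaxLen.
spawn_guarded(Fun, MaxLen)
  when is_function(Fun, 0), is_integer(MaxLen), MaxLen >= 0 ->
    Worker = spawn(Fun),
    spawn(fun() ->
        Ref = erlang:monitor(process, Worker),
        watch(Worker, Ref, MaxLen)
    end),
    Worker.

watch(Worker, Ref, MaxLen) ->
    receive
        {'DOWN', Ref, process, Worker, _Reason} ->
            ok                                  % worker is gone, stop watching
    after 100 ->                                % illustrative poll interval
        case erlang:process_info(Worker, message_queue_len) of
            {message_queue_len, Len} when Len > MaxLen ->
                %% Visible indication: the worker dies with a telling reason.
                exit(Worker, {message_queue_overflow, Len});
            _ ->
                watch(Worker, Ref, MaxLen)
        end
    end.
```

Anything linked to or monitoring the worker then sees {message_queue_overflow, Len} as the exit reason, unless the worker traps exits.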

BR,
Ulf W

2008/5/31 Christopher Atkins <[hidden email]>:

> Hello, I tried (poorly--I'm a complete novice) to implement a benchmark from
> your earlier statement.  I didn't do the same thing (load up the message
> mailbox before consuming them), but what I did write led to a perplexing (to
> me) discovery.  If I uncomment the line in [loop1/0] below, performance for
> that loop degrades by an order of magnitude.  Why is that?
>
> -module(test_receive).
> -compile(export_all).
>
> start() ->
>         statistics(runtime),
>         statistics(wall_clock),
>         PidLoop1 = spawn(?MODULE, loop1,[]),
>         sender(PidLoop1, 10000000),
>         {_, Loop1Time1} = statistics(runtime),
>         {_, Loop1Time2} = statistics(wall_clock),
>         io:format("Sent ~p messages in ~p /~p~n", [100000, Loop1Time1,
> Loop1Time2]),
>         statistics(runtime),
>         statistics(wall_clock),
>         PidLoop2 = spawn(?MODULE, loop2,[]),
>         sender(PidLoop2, 10000000),
>         {_, Loop2Time1} = statistics(runtime),
>         {_, Loop2Time2} = statistics(wall_clock),
>         io:format("Sent ~p messages in ~p /~p~n", [100000, Loop2Time1,
> Loop2Time2]).
>
> sender(_, 0) -> void;
> sender(Pid, N) ->
>         if
>           N rem 2 =:= 2 ->
>                 Pid ! test2;
>           true ->
>                 Pid ! test1
>         end,
>         sender(Pid, N - 1).
>
> proc1(F) ->
>         receive
>                 start -> spawn_link(F)
>         end.
>
> loop1() ->
>         receive
>                 %%test1 -> loop1();
>                 test2 -> loop1()
>         end.
>
> loop2() ->
>         receive
>                 _ -> loop2()
>         end.
>
>
> ------------------------------------------------------------------------------------------------------------------------------
> Message: 2
> Date: Fri, 30 May 2008 18:07:18 +0200
> From: "Per Melin" <[hidden email]>
> Subject: Re: [erlang-questions] eep: multiple patterns
> To: "Sean Hinde" <[hidden email]>
> Cc: Erlang Questions <[hidden email]>
> Message-ID:
>        <[hidden email]>
> Content-Type: text/plain; charset=ISO-8859-1
>
> 2008/5/30 Per Melin <[hidden email]>:
>> If I send 100k 'foo' messages and then 100k 'bar' messages to a
>> process, and then do a catch-all receive until there are no messages
>> left, that takes 0.03 seconds.
>>
>> If I do a selective receive of only the 'bar' messages, it takes 90
>> seconds.
>
> I found my old test code:
>
> -module(selective).
>
> -export([a/2, c/2]).
>
> a(Atom, N) ->
>    spawn(fun() -> b(Atom, N) end).
>
> b(Atom, N) ->
>    spam_me(foo, N),
>    spam_me(bar, N),
>    R = timer:tc(?MODULE, c, [Atom, N]),
>    io:format("TC: ~p~n", [R]).
>
> c(Atom, N) ->
>    receive
>        Atom -> c(Atom, N - 1)
>    after 0 ->
>        N
>    end.
>
> spam_me(Msg, Copies) ->
>    lists:foreach(fun(_) -> self() ! Msg end, lists:duplicate(Copies, 0)).
>
> ---
>
> 2> selective:a(bar, 100000).
> <0.38.0>
> TC: {124130689,0}
> 3> selective:a(foo, 100000).
> <0.40.0>
> TC: {23176,0}
>
>
_______________________________________________
erlang-questions mailing list
[hidden email]
http://www.erlang.org/mailman/listinfo/erlang-questions

Re: On selective receive (Re: eep: multiple patterns)

Ulf Wiger-2
Actually, I assume that in just about all cases where you
have a process that needs selective receive semantics,
it's probably perfectly ok to set a low limit on the maximum
length of the message queue. A buffering process could
be placed in front of it, which might also normally do
dispatch. It would not use selective receive, and so wouldn't
suffer much from a large message queue.
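
A rough sketch of such a buffering process, assuming a simple credit protocol that is not from this thread: the buffer forwards each message as {buffered, BufferPid, Msg} and the worker answers {ack, WorkerPid} when it has handled one. The module name front_buffer and both message shapes are hypothetical.

```erlang
-module(front_buffer).
-export([start/2]).

%% Start a buffer in front of Worker. At most Window forwarded messages
%% are unacknowledged at any time; the rest wait in a queue held here.
start(Worker, Window)
  when is_pid(Worker), is_integer(Window), Window > 0 ->
    spawn(fun() -> loop(Worker, Window, queue:new()) end).

loop(Worker, Credit, Pending) ->
    receive
        %% Every message matches one of these two clauses, so the oldest
        %% message is always taken and the mailbox is never scanned.
        {ack, Worker} ->
            forward(Worker, Credit + 1, Pending);
        Msg ->
            forward(Worker, Credit, queue:in(Msg, Pending))
    end.

%% Hand over buffered messages while there is credit left.
forward(Worker, Credit, Pending) when Credit > 0 ->
    case queue:out(Pending) of
        {{value, Msg}, Rest} ->
            Worker ! {buffered, self(), Msg},
            forward(Worker, Credit - 1, Rest);
        {empty, _} ->
            loop(Worker, Credit, Pending)
    end;
forward(Worker, 0, Pending) ->
    loop(Worker, 0, Pending).
```

With a window of, say, 6, the worker's own mailbox stays short enough for selective receive to be cheap, while the backlog lives in the buffer's queue structure.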

BR,
Ulf W

2008/5/31 Ulf Wiger <[hidden email]>:

> An old legacy Ericsson system implemented selective receive
> in a way that the message queue could hold at most 6 messages.
> Any more than that was obviously an error.
>
> I think it might be useful to be able to specify such a limit as
> a spawn option, perhaps together with maximum heap size.
> Exceeding the limit could perhaps lead to the process being
> killed (which might seem backwards in the case of the message
> queue, but at least gives a visible indication), or that the sender
> process would be suspended (which could potentially lead to the
> whole system stopping.)
>
> BR,
> Ulf W

Re: On selective receive (Re: eep: multiple patterns)

Per Melin-3
2008/5/31 Ulf Wiger <[hidden email]>:
> Actually, I assume that in just about all cases where you
> have a process that needs selective receive semantics,
> it's probably perfectly ok to set a low limit on the maximum
> length of the message queue. A buffering process could
> be placed in front of it, which might also normally do
> dispatch. It would not use selective receive, and so wouldn't
> suffer much from a large message queue.

The last time selective receive broke things for me was actually not
in my own code, but in Mnesia.

When Mnesia loads a distributed table from another node it subscribes
to table events before it starts to copy the table, and then ignores
those table event messages while it's (selectively) receiving the
table contents. Depending on the size of the table and the rate at
which the table is updated on the other node, this can make your
message queue grow until you run out of memory.

This is not a case where a long queue obviously is an error. Except
perhaps in the design.



Re: On selective receive (Re: eep: multiple patterns)

Jay Nelson-2
In reply to this post by Ulf Wiger-2
 > I would really like to discourage people from avoiding
 > selective receive because it's "expensive".

I would second that.  Selective receive is similar to thinking
single threaded in a multi-threaded environment (the approach
that erlang in general supports).  Isolate a group of related
messages using the selective part and then you don't have to
worry about all the other interleave interruptions that may occur.

But as Ulf said, we aren't aware of any books on how to structure your
messaging architecture which take you stepwise up from a
simple architecture to a complicated selective receive.  I do
caution a beginner to start simple and build up; understand how
the message queue works by creating test scenarios that produce
specific results.

One of the early admonishments one hears is to always have a
catch all clause in your receive statements, which of course
eliminates selective receive and causes your code to process
messages in the order they arrived.  To get around this, you
can split the receive into separate functions, and then call one
function to handle one logical message stream and another
function to handle a different logical message stream.

The thing to watch out for in the split receive case is the missing
message:

receive
    {a, How} -> do_stuff();
    {a, When, Why} -> do_stuff()
after 500 -> timeout
end.

receive
   {b, How} -> do_stuff();
   {b, When, Why} -> do_stuff()
after 500 -> timeout
end.

Now you can handle the two streams independently, maybe
giving more time to 'a' stream items than 'b' stream items.
But suppose you accidentally send a message with {c, X}
and it only happens once an hour.  You will gradually get a
queue which fills up with {c, X} messages, but you won't notice
the slowdown for a few days.

Whenever you have disjoint receive statements, you need to
take care that there is a technique for emptying unexpected
messages.  Even though your queue should never get long,
a new programmer on the staff may send a new message to
your process without you knowing and it will take a while
to discover the cause.
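
One possible technique for emptying unexpected messages, sketched for the 'a' / 'b' streams above. The module name drain and the idea of printing each dropped message are illustrative assumptions. It should only be called at a point where no other message tags are legitimately waiting in the queue.

```erlang
-module(drain).
-export([drain_unexpected/0]).

%% Remove every queued message that is not a tuple tagged 'a' or 'b'.
%% Returns the number of messages dropped.
drain_unexpected() ->
    drain_unexpected(0).

drain_unexpected(Dropped) ->
    receive
        Msg when not is_tuple(Msg) ->
            drop(Msg, Dropped);
        {} = Msg ->
            %% element/2 would fail in a guard on the empty tuple
            drop(Msg, Dropped);
        Msg when element(1, Msg) =/= a, element(1, Msg) =/= b ->
            drop(Msg, Dropped)
    after 0 ->
        %% Nothing unexpected is left; 'a' and 'b' messages stay queued
        Dropped
    end.

drop(Msg, Dropped) ->
    io:format("dropping unexpected message: ~p~n", [Msg]),
    drain_unexpected(Dropped + 1).
```

Logging what was dropped matters as much as dropping it, since the log is what tells you that someone has started sending {c, X}.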

jay


Re: On selective receive (Re: eep: multiple patterns)

Jay Nelson-2
In reply to this post by Ulf Wiger-2
I wrote:

 > >Whenever you have disjoint receive statements, you need to
 > >take care that there is a technique for emptying unexpected
 > >messages.

Edwin Fine accidentally replied only to me directly:

 > Is this a good place to use the catch-all, or is there a better
 > technique? I ask this as a newcomer to Erlang.

(This posting also gives an alternative example to Valentin's
priority problem suggestion)

Consider a case where you are doing a scatter / gather algorithm
to spread processing across nodes or across different processing
algorithms.  To make it concrete, suppose we have a database
with 5 different tables and we need to collect information from each
table to assemble into a single view to the user.

The standard approach is to use the DB capability to join the tables.
This introduces a single point access problem since the database
server is doing all the work while the initiating process waits.

Instead we put each of the tables in a different DB, flat file or ets table.
Then we create a process for each one that provides caching and
an access interface using messages.  They may end up on the same
machine or on 5 different machines, but we will get parallelism on
the I/O and possibly on the cache and assembly processing (if there
are multiple cores or multiple machines in the case of cache and
assembly).

What does the code look like?  [Assume getQueries(UserId) generates
a list of queries that are related to the database information we would
like to display and that the length of this list matches the number of
DB processes we have. ]

doUserQuery(UserId) ->
    Queries = getQueries(UserId),
    DbPids = getDbPids(),   %% assumed helper: the pids of the DB processes
    QueryRef = make_ref(),
    [Pid ! {getData, QueryRef, UserId, Query}
     || {Pid, Query} <- lists:zip(DbPids, Queries)],
    Responses = collectResponses(QueryRef),
    display_db_info(Responses),
    erlang:send_after(1000, self(), {cleanup, QueryRef}).


This is a pretty hokey approach -- you would want something better
than a 1 second delay to tell you whether to eliminate old messages
from the queue, but it is a concrete example to describe why you
would want to use selective receive and what to do to make sure it
doesn't cause you a problem.

collectResponses(QueryRef) ->
    collectResponses(QueryRef, []).

collectResponses(QueryRef, Responses) ->
    receive
       {responseData, QueryRef, _UserId, Results} ->
            %% Prepend, so Responses stays a proper list (newest first)
            collectResponses(QueryRef, [Results | Responses])
    after 100 -> Responses
    end.

Again, my hokey example keeps collecting results until no new one
shows up for 100 milliseconds.
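
A less hokey variant might stop as soon as the expected number of replies is in, and otherwise give up at an overall deadline. This is only a sketch: the module name gather, the return tags complete / partial and the use of erlang:monotonic_time/1 (a newer API than this thread) are assumptions, while the responseData message shape is the one used above.

```erlang
-module(gather).
-export([collect/3]).

%% Collect up to Expected replies tagged with QueryRef, giving up
%% TimeoutMs milliseconds after the call started.
collect(QueryRef, Expected, TimeoutMs) ->
    Deadline = erlang:monotonic_time(millisecond) + TimeoutMs,
    collect(QueryRef, Expected, Deadline, []).

collect(_QueryRef, 0, _Deadline, Acc) ->
    {complete, lists:reverse(Acc)};
collect(QueryRef, Missing, Deadline, Acc) ->
    %% Time left until the overall deadline, never negative
    Wait = max(0, Deadline - erlang:monotonic_time(millisecond)),
    receive
        {responseData, QueryRef, _UserId, Results} ->
            collect(QueryRef, Missing - 1, Deadline, [Results | Acc])
    after Wait ->
        {partial, lists:reverse(Acc)}
    end.
```

Because QueryRef is already bound, the receive still skips replies that belong to other requests, so late answers continue to need some cleanup.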

What we have so far is a single request message sent to 5 processes
and a function which implements selective receive to collect only the
messages that are in response to the initial request from a variety of
responders (hopefully all, but not if some are slow to respond).

What happens if we have a slow responding database, but it does
actually produce results after 1/2 second?  It was too slow to be
collected but it puts messages on the queue anyway.  If we have no
mechanism to clear them, they will build up and cause things to
gradually slow down.

So at some higher level we need the following code:

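main() ->
    %% A single receive tries every clause against the oldest message
    %% first, so clause order alone gives no priority; the zero timeout does.
    receive
        %% Throw away late arriving results from a previous request
        {cleanup, QueryRef} -> dumpOldQueryResults(QueryRef)
    after 0 ->
        %% No cleanup pending, so wait for whichever message comes next
        receive
            {cleanup, OldRef} -> dumpOldQueryResults(OldRef);
            {userRequest, UserId} -> doUserQuery(UserId)
        end
    end,
    main().

dumpOldQueryResults(QueryRef) ->
     receive
         {responseData, QueryRef, _UserId, _Results} ->
              dumpOldQueryResults(QueryRef)
     after 0 -> ok
     end.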

In the main function, we give priority to cleaning up old messages.
This will keep the queue short; however, it ensures a full queue
scan for every user request.  As long as the queue is short, that
won't hurt us.  Dumping old messages just cycles as fast as it can,
accepting messages that have our unique token and ignoring the
rest of the data in the message.  If there are no clean up messages
remaining, we then accept a new user request (which will necessarily
cause the message queue to grow for a short period) and display the
results.

What did we see?  Selective receive used in 3 different ways:

1) To collect the results of a request (a two-way session conversation)
2) To handle self notifications for maintenance + user requests
3) To handle old messages from an expired session

It turns out the {cleanup, QueryRef} message is not necessary in
the above example and we can just consume all {responseData, ...}
messages inside main(), but it depends on how new requests are
placed on the queue and whether timing allows two requests to
be interleaved in the results set (you don't want to remove all the
responseData for a pending request that has not had time to collect
results yet).  Structuring it as above makes the different uses
of selective receive more explicit.

The problem remaining in the code above is that there is no
"catch all" clause.  Do we worry about that?  It depends on how the
system evolves.  If you interface to a known protocol and you have
covered all the messages supported via selective receive, then
you could do without a catch all.  If your system is evolving or there
are other processes or programmers who might inject new message
types, you need a catch all in the main/0 function (although you have
to be careful not to consume something that should stay on the queue).

I have not tried this code, nor have I typed it into an erl prompt, so I
can't guarantee it even compiles.  Mostly it should give you ideas
about ways to use selective receive.

What if we didn't have selective receive?  I see two choices:

1) Start a thread and open a new socket to the databases for each
user request.  Maintain the conversations as independent channels.

2) Create a hash table of messages received related to each request.
This requires managing the conversation correlations yourself.

Both of these approaches are much more code than selective receive
requires and the complexity of concepts does not increase, so selective
receive is a better approach and a useful feature of erlang.

Is there a better way to manage the conversations rather than the whole
cleanup back channel messaging?

If you can spawn a new process for each request, the responses will
go to privately owned message queues.  When enough responses have
arrived, or enough time has passed, the newly spawned request process
returns its results and terminates.  Any messages stuck on the queue are
eliminated.  Any future messages are silently discarded since there is
no process to receive them.  If the backend DB process were monitoring
the request process, it could even interrupt its response to discard the
results rather than waiting for processing to complete and pass them
on to a non-existent process.
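
The process-per-request idea could look roughly like the sketch below. It assumes the DB processes reply to a pid carried in the request, so the messages here are {getData, QueryRef, ReplyTo, Query} and {responseData, QueryRef, Results}, which differ slightly from the earlier example. The module name per_request and the return values are also illustrative.

```erlang
-module(per_request).
-export([query/3]).

%% Send Query to every pid in DbPids from a throwaway process and return
%% the replies that arrive before a gap of TimeoutMs milliseconds.
query(DbPids, Query, TimeoutMs) ->
    Caller = self(),
    Tag = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() ->
        QueryRef = make_ref(),
        [Db ! {getData, QueryRef, self(), Query} || Db <- DbPids],
        Replies = gather(QueryRef, length(DbPids), TimeoutMs, []),
        %% Late replies die with this process; no cleanup is needed.
        Caller ! {Tag, Replies}
    end),
    receive
        {Tag, Replies} ->
            erlang:demonitor(MRef, [flush]),
            {ok, Replies};
        {'DOWN', MRef, process, Pid, Reason} ->
            {error, Reason}
    end.

gather(_QueryRef, 0, _TimeoutMs, Acc) ->
    lists:reverse(Acc);
gather(QueryRef, Missing, TimeoutMs, Acc) ->
    receive
        {responseData, QueryRef, Results} ->
            gather(QueryRef, Missing - 1, TimeoutMs, [Results | Acc])
    after TimeoutMs ->
        lists:reverse(Acc)
    end.
```

The caller's own mailbox only ever sees one message per request, tagged with a fresh reference, so its selective receive stays cheap.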

With erlang, there are many architectural choices when you consider
the uses of messaging and selective receive.

jay


Re: On selective receive (Re: eep: multiple patterns)

Edwin Fine
Jay,

Thanks for a very detailed and informative response. Although it obviously depends on circumstances, I feel that, given Erlang's extremely fast process creation time and small process size, I would first consider your last option, namely, to create an individual process per request, and use an ETS table to coordinate responses. If there are very many responses to be collected for each request, I would intuitively imagine in my "Erlang newbie fog" that using an ETS table with its constant-time performance and no-garbage-collection characteristics would be better on average than using selective receive, which I understand has to do a linear scan and move unprocessed messages to another area. Of course, intuition often does not stand up to the reality of performance measurements, so it would be interesting to see a benchmark of the various architectural options you have described, perhaps as a function of response time vs. request rate.

Regards,
Edwin Fine

On Sun, Jun 1, 2008 at 8:09 PM, Jay Nelson <[hidden email]> wrote:
If you can spawn a new process for each request, the responses will
go to privately owned message queues.  When enough responses, or
enough time has passed, the newly spawned request process returns
its results and terminates.  Any messages stuck on the queue are
eliminated.  Any future messages are silently discarded since there is
no process to receive them.  If the backend DB process were monitoring
the request process, it could even interrupt its response to discard the
results rather than waiting for processing to complete and pass them
on to a non-existent process.


Re: On selective receive (Re: eep: multiple patterns)

Jay Nelson-2

On Jun 1, 2008, at 8:35 PM, Edwin Fine wrote:

> Jay,
>
> Thanks for a very detailed and informative response. Although it  
> obviously depends on circumstances, I feel that, given Erlang's  
> extremely fast process creation time and small process size, I  
> would first consider your last option, namely, to create an  
> individual process per request, and use an ETS table to coordinate  
> responses. If there are very many responses to be collected for  
> each request, I would intuitively imagine in my "Erlang newbie fog"  
> that using an ETS table with its constant-time performance and no-
> garbage-collection characteristics would be better on average than  
> using selective receive, which I understand has to do a linear scan  
> and move unprocessed messages to another area. Of course, intuition  
> often does not stand up to the reality of performance measurements,  
> so it would be interesting to see a benchmark of the various  
> architectural options you have described, perhaps as a function of  
> response time vs. request rate.

If you spawn a separate process for each, there is no need for an ets  
table.  Just have the process send the results back "en masse".

Dying PID's final message:

     Caller ! {responseData, QueryRef, AllTheDataAssembledAsNeeded}

The caller's main loop can just:

     receive
        {responseData, QueryRef, Results} ->
               do_something(Results)
     end.


If you need to pass it back to another process, just arrange:

QueryRef = {make_ref(), UltimatePidToSendResults}

Then the receive pattern above can become:

    {responseData, {Ref, EndPid}, Results} ->
         EndPid ! {response, self(), Results}


In erlang you find that you lose code where in other languages you
must add code.  You don't check for errors, just code like it will
succeed.  Don't reconstruct structured data as a hash table or tree
when the message can be tagged and returned to you with the correct
classification as you knew it to start with.

jay


Re: On selective receive (Re: eep: multiple patterns)

Mats Cronqvist-4
In reply to this post by Ulf Wiger-2
Ulf Wiger wrote:
> I would really like to discourage people from avoiding
> selective receive because it's "expensive". It can be
> expensive on very large message queues, but this is
> a pretty rare error condition, and fairly easily observable.
>  

  i think the "issue" of how the emu deals with huge in-queues is pretty
uninteresting.
  in my personal experience, every single time this has come up the real
problem has turned out to be lack of proper flow control (typically
using {active,true} sockets). having 100k messages in an in-queue is not
a realistic use case.
  the fact that this is not, afaik, particularly well documented is of
course a problem.
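
  for reference, a minimal sketch of the usual flow control idiom with {active, once}, so that the socket delivers one message at a time and the in-queue cannot fill up faster than the process consumes it. the module name once_reader and the "collect everything until the peer closes" behaviour are just illustrative assumptions; the caller must be the controlling process of the socket.

```erlang
-module(once_reader).
-export([read_all/1]).

%% Read from a connected gen_tcp socket until the peer closes it,
%% asking for one packet at a time. Returns {ok, Binary} or {error, Reason}.
read_all(Socket) ->
    read_all(Socket, []).

read_all(Socket, Acc) ->
    %% Re-arm the socket: deliver exactly one more message, then go passive.
    case inet:setopts(Socket, [{active, once}]) of
        ok ->
            receive
                {tcp, Socket, Data} ->
                    read_all(Socket, [Data | Acc]);
                {tcp_closed, Socket} ->
                    {ok, iolist_to_binary(lists:reverse(Acc))};
                {tcp_error, Socket, Reason} ->
                    {error, Reason}
            end;
        {error, Reason} ->
            {error, Reason}
    end.
```

  with {active, true} the sender decides how fast the mailbox grows; with {active, once} the receiver does, and TCP back-pressure takes care of the rest.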

  mats

Re: On selective receive (Re: eep: multiple patterns)

Chandru-4
2008/6/2 Mats Cronqvist <[hidden email]>:
Ulf Wiger wrote:
> I would really like to discourage people from avoiding
> selective receive because it's "expensive". It can be
> expensive on very large message queues, but this is
> a pretty rare error condition, and fairly easily observable.
>

 i think the "issue" of how the emu deals with huge in-queues is pretty
uninteresting.
 in my personal experience, every single time this has come up the real
problem has turned out to be lack of proper flow control (typically
using {active,true} sockets). having 100k messages in an in-queue is not
a realistic use case.
 the fact that this is not, afaik, particularly well documented is of
course a problem.

This is true - but if one has no prior experience of this situation, it is hard to understand why a system is behaving sluggishly. What would be nice is an option, as Ulf suggested earlier, for bounded message queues (kill the process if the message queue length exceeds a certain value). That way, flow control problems would be more readily visible to users. In real life situations, when a process gets into this state, the only way to fix it is to kill that process, as it will probably never catch up. This has been discussed before: http://www.erlang.org/pipermail/erlang-questions/2006-January/018364.html

It also fits in well with the "Let it crash" philosophy.

cheers
Chandru



Re: On selective receive (Re: eep: multiple patterns)

Mats Cronqvist-4
Chandru wrote:

> 2008/6/2 Mats Cronqvist <[hidden email]
> <mailto:[hidden email]>>:
>
>     Ulf Wiger wrote:
>     > I would really like to discourage people from avoiding
>     > selective receive because it's "expensive". It can be
>     > expensive on very large message queues, but this is
>     > a pretty rare error condition, and fairly easily observable.
>     >
>
>      i think the "issue" of how the emu deals with huge in-queues is
>     pretty
>     uninteresting.
>      in my personal experience, every single time this has come up the
>     real
>     problem has turned out to be lack of proper flow control (typically
>     using {active,true} sockets). having 100k messages in an in-queue
>     is not
>     a realistic use case.
>      the fact that this is not, afaik, particularly well documented is of
>     course a problem.
>
>
> This is true - but if one has no prior experience of this situation,
> it is hard to understand why a system is behaving sluggishly. What
> will be nice is having an option, as Ulf suggested earlier, to have
> bounded message queues (kill the process if the message queue length
> exceeds a certain value). That way, flow control problems will be more
> readily visible to users.

  true enough.

  mats

Re: On selective receive (Re: eep: multiple patterns)

Gleb Peregud
On Tue, Jun 3, 2008 at 8:28 AM, Mats Cronqvist <[hidden email]> wrote:

> Chandru wrote:
>> 2008/6/2 Mats Cronqvist <[hidden email]
>> <mailto:[hidden email]>>:
>>
>>     Ulf Wiger wrote:
>>     > I would really like to discourage people from avoiding
>>     > selective receive because it's "expensive". It can be
>>     > expensive on very large message queues, but this is
>>     > a pretty rare error condition, and fairly easily observable.
>>     >
>>
>>      i think the "issue" of how the emu deals with huge in-queues is
>>     pretty
>>     uninteresting.
>>      in my personal experience, every single time this has come up the
>>     real
>>     problem has turned out to be lack of proper flow control (typically
>>     using {active,true} sockets). having 100k messages in an in-queue
>>     is not
>>     a realistic use case.
>>      the fact that this is not, afaik, particularly well documented is of
>>     course a problem.
>>
>>
>> This is true - but if one has no prior experience of this situation,
>> it is hard to understand why a system is behaving sluggishly. What
>> will be nice is having an option, as Ulf suggested earlier, to have
>> bounded message queues (kill the process if the message queue length
>> exceeds a certain value). That way, flow control problems will be more
>> readily visible to users.
>
>  true enough.
>
>  mats
>

> What
> will be nice is having an option, as Ulf suggested earlier, to have
> bounded message queues (kill the process if the message queue length
> exceeds a certain value).

+1

P.S. Sorry Mats for sending this only to You previously
--
Gleb Peregud
http://gleber.pl/

Every minute is to be grasped.
Time waits for nobody.
-- Inscription on a Zen Gong

Re: On selective receive (Re: eep: multiple patterns)

Sean Hinde
In reply to this post by Chandru-4

On 2 Jun 2008, at 14:31, Chandru wrote:

> 2008/6/2 Mats Cronqvist <[hidden email]>:
> Ulf Wiger wrote:
> > I would really like to discourage people from avoiding
> > selective receive because it's "expensive". It can be
> > expensive on very large message queues, but this is
> > a pretty rare error condition, and fairly easily observable.
> >
>
>  i think the "issue" of how the emu deals with huge in-queues is  
> pretty
> uninteresting.
>  in my personal experience, every single time this has come up the  
> real
> problem has turned out to be lack of proper flow control (typically
> using {active,true} sockets). having 100k messages in an in-queue is  
> not
> a realistic use case.
>  the fact that this is not, afaik, particularly well documented is of
> course a problem.
>
> This is true - but if one has no prior experience of this situation,  
> it is hard to understand why a system is behaving sluggishly. What  
> will be nice is having an option, as Ulf suggested earlier, to have  
> bounded message queues (kill the process if the message queue length  
> exceeds a certain value). That way, flow control problems will be  
> more readily visible to users. In real life situations, when a  
> process gets into this state, the only way to fix it is to kill that  
> process as it will probably never catch up. This has been discussed  
> before: http://www.erlang.org/pipermail/erlang-questions/2006-January/018364.html
>
> It also fits in well with the "Let it crash" philosophy.

I respectfully disagree. It is nigh on impossible to predict where  
there might be some error that leads to a large queue, and this would  
lead to "defensive programming" where every process has a short max  
length. This would result in random crashes and loss of data for those  
uncommon situations in a generally well designed system where there  
might be a legitimate short term peak in queue lengths.

We already have a mechanism to restart if a queue grows too large  
(actually 2 - process_info monitoring, and out of memory !)

cheers,
Sean


Re: On selective receive (Re: eep: multiple patterns)

Christian S-2
On Tue, Jun 3, 2008 at 12:11 PM, Sean Hinde <[hidden email]> wrote:
> I respectfully disagree. It is nigh on impossible to predict where
> there might be some error that leads to a large queue, and this would
> lead to "defensive programming" where every process has a short max
> length. This would result in random crashes and loss of data for those
> uncommon situations in a generally well designed system where there
> might be a legitimate short term peak in queue lengths.
>
> We already have a mechanism to restart if a queue grows too large
> (actually 2 - process_info monitoring, and out of memory !)

Maybe more-queued-than-a-set-threshold could be made into a traceable event?


What happened to the thread about creating a dtrace provider for erlang?

Re: On selective receive (Re: eep: multiple patterns)

Sean Hinde

On 3 Jun 2008, at 11:38, Christian S wrote:

> On Tue, Jun 3, 2008 at 12:11 PM, Sean Hinde <[hidden email]>  
> wrote:
>> I respectfully disagree. It is nigh on impossible to predict where
>> there might be some error that leads to a large queue, and this would
>> lead to "defensive programming" where every process has a short max
>> length. This would result in random crashes and loss of data for  
>> those
>> uncommon situations in a generally well designed system where there
>> might be a legitimate short term peak in queue lengths.
>>
>> We already have a mechanism to restart if a queue grows too large
>> (actually 2 - process_info monitoring, and out of memory !)
>
> Maybe more-queued-than-a-set-threshold could be made into a  
> traceable event?

Could be nice yes.

I still think it would also be much better if the system didn't slow to  
a crawl if queues grow large - this is the effect that almost  
guarantees the need for a restart. To quote Chandru "In real life  
situations, when a process gets into this state, the only way to fix  
it is to kill that process as it will *probably never catch  
up*" (emphasis mine).

Both slowdown effects (GC copying and selective receive repeated  
searching) seem quite amenable to smart optimisations.

> What happened to the thread about creating a dtrace provider for  
> erlang?

I was left with the impression someone went away to start implementing  
stuff ..

Cheers,
Sean


Re: On selective receive (Re: eep: multiple patterns)

Chandru-4
In reply to this post by Sean Hinde
2008/6/3 Sean Hinde <[hidden email]>:

On 2 Jun 2008, at 14:31, Chandru wrote:

2008/6/2 Mats Cronqvist <[hidden email]>:
Ulf Wiger wrote:
> I would really like to discourage people from avoiding
> selective receive because it's "expensive". It can be
> expensive on very large message queues, but this is
> a pretty rare error condition, and fairly easily observable.
>

 i think the "issue" of how the emu deals with huge in-queues is pretty
uninteresting.
 in my personal experience, every single time this has come up the real
problem has turned out to be lack of proper flow control (typically
using {active,true} sockets). having 100k messages in an in-queue is not
a realistic use case.
 the fact that this is not, afaik, particularly well documented is of
course a problem.

This is true - but if one has no prior experience of this situation, it is hard to understand why a system is behaving sluggishly. What will be nice is having an option, as Ulf suggested earlier, to have bounded message queues (kill the process if the message queue length exceeds a certain value). That way, flow control problems will be more readily visible to users. In real life situations, when a process gets into this state, the only way to fix it is to kill that process as it will probably never catch up. This has been discussed before: http://www.erlang.org/pipermail/erlang-questions/2006-January/018364.html

It also fits in well with the "Let it crash" philosophy.

I respectfully disagree. It is nigh on impossible to predict where there might be some error that leads to a large queue, and this would lead to "defensive programming" where every process has a short max length. This would result in random crashes and loss of data for those uncommon situations in a generally well designed system where there might be a legitimate short term peak in queue lengths.

We already have a mechanism to restart if a queue grows too large (actually 2 - process_info monitoring, and out of memory !)


I agree it is nearly impossible to predict this - but what options does a programmer have without this bounded queue facility?

  1. Introduce message queue monitoring for every process which is potentially long lived, which imho is extra boiler plate code which reduces readability of core functionality. Also there will be different ways of doing it depending on how your process is structured (gen_fsm, gen_server, gen_event, pure erlang...). If all that one does upon detecting this condition is clear the message queue by discarding messages, or terminate the process, wouldn't it be good to have this built-in?

  2. have another process which monitors the entire system - which is not very scalable when you have hundreds of thousands of processes.

  3. Wait for the system to crash in live and then figure out what happened.

cheers
Chandru



Re: On selective receive (Re: eep: multiple patterns)

Chandru-4
2008/6/3 Vlad Dumitrescu <[hidden email]>:
Hi,

2008/6/3 Chandru <[hidden email]>:
2008/6/3 Sean Hinde <[hidden email]>:
On 2 Jun 2008, at 14:31, Chandru wrote:

I respectfully disagree. It is nigh on impossible to predict where there might be some error that leads to a large queue, and this would lead to "defensive programming" where every process has a short max length. This would result in random crashes and loss of data for those uncommon situations in a generally well designed system where there might be a legitimate short term peak in queue lengths.

We already have a mechanism to restart if a queue grows too large (actually 2 - process_info monitoring, and out of memory !)


I agree it is nearly impossible to predict this - but what options does a programmer have without this bounded queue facility?

  1. Introduce message queue monitoring for every process which is potentially long lived, which imho is extra boiler plate code which reduces readability of core functionality. Also there will be different ways of doing it depending on how your process is structured (gen_fsm, gen_server, gen_event, pure erlang...). If all that one does upon detecting this condition is clear the message queue by discarding messages, or terminate the process, wouldn't it be good to have this built-in?

  2. have another process which monitors the entire system - which is not very scalable when you have hundreds of thousands of processes.

  3. Wait for the system to crash in live and then figure out what happened.


An alternative to the crash-when-queue-is-full solution could be that a user-defined function gets called first. This function should use application-specific knowledge to clean up the queue with the least amount of disturbance for the application. The process should crash only if this cleanup is still not enough or if the function sets a new higher threshold for the queue length. The function could also turn on debug mode so that the reason for the many messages is found as it happens.

I agree. This would be useful too.
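
A rough sketch of how such a hook might look if done in user code. Everything here is hypothetical (the module `queue_guard`, the function `check/2`, the `{new_limit, N}` return convention), and it assumes one reading of the suggestion: the handler runs first, may raise the limit, and the process exits only if the queue is still over the limit in force afterwards.

```erlang
-module(queue_guard).
-export([check/2]).

%% check(Limit, Handler) -> LimitInForce | no_return()
%% Hypothetical hook: call it at the top of a receive loop.
%% Handler is a user-supplied fun(QueueLen) that may trim the queue
%% and may return {new_limit, N} to raise the threshold.
check(Limit, Handler) when is_integer(Limit), is_function(Handler, 1) ->
    case queue_len() of
        Len when Len > Limit ->
            NewLimit = case Handler(Len) of
                           {new_limit, N} when is_integer(N), N > Limit -> N;
                           _ -> Limit
                       end,
            %% Crash only if the cleanup was not enough.
            case queue_len() of
                Still when Still > NewLimit ->
                    exit({message_queue_too_long, Still});
                _ ->
                    NewLimit
            end;
        _ ->
            Limit
    end.

queue_len() ->
    {message_queue_len, Len} = process_info(self(), message_queue_len),
    Len.
```

The return value is the limit to pass to the next call, so a handler that raised the threshold keeps it raised.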

cheers
Chandru
 


Re: On selective receive (Re: eep: multiple patterns)

Sean Hinde
In reply to this post by Chandru-4

On 3 Jun 2008, at 12:30, Chandru wrote:

>
> We already have a mechanism to restart if a queue grows too large  
> (actually 2 - process_info monitoring, and out of memory !)
>
>
> I agree it is nearly impossible to predict this - but what options  
> does a programmer have without this bounded queue facility?

Well, I guess, mostly you need to have a design that doesn't lead to  
massive queue build up under sustained overload :-). This might mean  
input load regulation, or tweaking the process structure (the logger  
process problem).
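
As one example of input load regulation, a socket owner can take data in `{active, once}` mode instead of `{active, true}`, so the peer is throttled by TCP rather than by an ever-growing message queue. This is only a minimal sketch: the module name `active_once` and the `Handle` callback are assumptions for the example, while `inet:setopts/2` and the `{tcp, ...}` message shapes are the standard gen_tcp ones.

```erlang
-module(active_once).
-export([serve/2]).

%% Socket: a connected gen_tcp socket owned by the calling process.
%% Handle: hypothetical user callback, fun(Data), run for each chunk.
serve(Socket, Handle) when is_function(Handle, 1) ->
    %% Ask for exactly one message; the socket then goes passive again,
    %% so at most one data message is ever waiting in our queue.
    ok = inet:setopts(Socket, [{active, once}]),
    receive
        {tcp, Socket, Data} ->
            Handle(Data),
            serve(Socket, Handle);
        {tcp_closed, Socket} ->
            ok;
        {tcp_error, Socket, Reason} ->
            {error, Reason}
    end.
```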

The system is unlikely to be performing to spec during this whole  
period of queue build up followed by cyclic restart - it doesn't  
really matter if the system restarts because it runs out of memory or  
cyclic restarts one process inside. It is still an outage for  
customers of the system.

All you need to know is that it has crashed and why, so you can fix  
the bug. The erl_crash dump will tell you about the huge message queue.

>   1. Introduce message queue monitoring for every process which is  
> potentially long lived, which imho is extra boiler plate code which  
> reduces readability of core functionality. Also there will be  
> different ways of doing it depending on how your process is  
> structured (gen_fsm, gen_server, gen_event, pure erlang...). If all  
> that one does upon detecting this condition is clear the message  
> queue by discarding messages, or terminate the process, wouldn't it  
> be good to have this built-in?

Another option - fix the system so that it doesn't get into that state.

>   2. have another process which monitors the entire system - which  
> is not very scalable when you have hundreds of thousands of processes.
>
>   3. Wait for the system to crash in live and then figure out what  
> happened.

Exactly. It is a bad bug that leads to such queue build up. Crashing  
is fine in this case, and probably preferable to lingering onwards  
silently failing to provide service.

Cheers,
Sean

Re: On selective receive (Re: eep: multiple patterns)

Chandru-4

2008/6/3 Sean Hinde <[hidden email]>:

On 3 Jun 2008, at 12:30, Chandru wrote:


We already have a mechanism to restart if a queue grows too large (actually 2 - process_info monitoring, and out of memory !)


I agree it is nearly impossible to predict this - but what options does a programmer have without this bounded queue facility?

Well, I guess, mostly you need to have a design that doesn't lead to massive queue build up under sustained overload :-). This might mean input load regulation, or tweaking the process structure (the logger process problem).

Of course :-) But as you say, sometimes it is hard to predict, so the design probably didn't cater for it.
 
The system is unlikely to be performing to spec during this whole period of queue build up followed by cyclic restart - it doesn't really matter if the system restarts because it runs out of memory or cyclic restarts one process inside. It is still an outage for customers of the system.

All you need to know is that it has crashed and why, so you can fix the bug. The erl_crash dump will tell you about the huge message queue.

I have seen erlang nodes die a few times without producing an erl_crash.dump. Sometimes it is because Ops got impatient and brutally killed all erlang related processes. Even if you did allow the system to run out of memory, for a system with a lot of memory, it will take a long time. All the while, the system will not be responding as it should be.

1. Introduce message queue monitoring for every process which is potentially long lived, which imho is extra boiler plate code which reduces readability of core functionality. Also there will be different ways of doing it depending on how your process is structured (gen_fsm, gen_server, gen_event, pure erlang...). If all that one does upon detecting this condition is clear the message queue by discarding messages, or terminate the process, wouldn't it be good to have this built-in?

Another option - fix the system so that it doesn't get into that state.

I'm all for fixing the system - all I'm asking for is facilities to detect this with less pain.


 2. have another process which monitors the entire system - which is not very scalable when you have hundreds of thousands of processes.
 3. Wait for the system to crash in live and then figure out what happened.

Exactly. It is a bad bug that leads to such queue build up. Crashing is fine in this case, and probably preferable to lingering onwards silently failing to provide service.

Exactly my point. I guess we both agree that it should crash. The disagreement seems to be about *when and how* it should crash. I would prefer that the process in question crash because, in all probability, its callers have timed out and are not expecting a response anyway.

cheers
Chandru


Re: On selective receive (Re: eep: multiple patterns)

Sean Hinde

On 3 Jun 2008, at 13:49, Chandru wrote:

>
> 2008/6/3 Sean Hinde <[hidden email]>:
>
>
> Well, I guess, mostly you need to have a design that doesn't lead to  
> massive queue build up under sustained overload :-). This might mean  
> input load regulation, or tweaking the process structure (the logger  
> process problem).
>
> Of course :-) But as you say, sometimes it is hard to predict, so  
> the design probably didn't cater for it.

All telecom systems are soak tested at X times overload on all  
external interfaces before going into service right :-)

Although not all web systems perhaps !!

> Another option - fix the system so that it doesn't get into that  
> state.
>
> I'm all for fixing the system - all I'm asking for is facilities to  
> detect this with less pain.

If it is just detection you are after then have a process that calls  
process_info to get the queue length of all processes in the system  
once per minute and raise an alarm if any are above a threshold. That  
is not much overhead at all, and can be done without introducing new  
features.
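
A minimal sketch of such a watcher, under some assumptions: the module name `queue_watch`, the threshold and the interval are arbitrary, and "raise an alarm" is reduced to a warning through `error_logger` (a real system might use SASL's `alarm_handler` instead).

```erlang
-module(queue_watch).
-export([start/2, long_queues/1]).

%% start(Threshold, IntervalMs): hypothetical watcher process that
%% scans every process on the node each IntervalMs milliseconds.
start(Threshold, IntervalMs) ->
    spawn(fun() -> loop(Threshold, IntervalMs) end).

loop(Threshold, IntervalMs) ->
    lists:foreach(
      fun({Pid, Len}) ->
              error_logger:warning_msg("~p has ~p queued messages~n",
                                       [Pid, Len])
      end,
      long_queues(Threshold)),
    receive
        stop -> ok
    after IntervalMs ->
        loop(Threshold, IntervalMs)
    end.

%% Returns [{Pid, QueueLen}] for processes over the threshold.
%% process_info/2 returns undefined for a process that has died in
%% the meantime; the generator pattern silently skips those.
long_queues(Threshold) ->
    [{Pid, Len} || Pid <- erlang:processes(),
                   {message_queue_len, Len}
                       <- [process_info(Pid, message_queue_len)],
                   Len > Threshold].
```

Something like `queue_watch:start(1000, 60000)` would match the once-per-minute check described above.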

> Exactly my point. I guess we both agree that it should crash. The  
> disagreement seems to be about *when and how* it should crash. I  
> would prefer that the process in question crash because, in all  
> probability, its callers have timed out and are not expecting a  
> response anyway.

Either way in all likelihood the same fault will manifest itself again  
within a few seconds. I can't help but imagine the proposed feature  
misused in all sorts of quite disgusting ways. Shudder!

Sean


Re: On selective receive (Re: eep: multiple patterns)

Ulf Wiger-2
2008/6/3 Sean Hinde <[hidden email]>:
>
> If it is just detection you are after then have a process that calls
> process_info to get the queue length of all processes in the system
> once per minute and raise an alarm if any are above a threshold. That
> is not much overhead at all, and can be done without introducing new
> features.

Just for fun, I made a few additions to plain_fsm, to play around with this.
The idea is that since you have a hook there anyway, you might
parameterize that hook so that it can check certain limits upon
receive.

The example program fsm_example.erl had a state
with an extended_receive and a timeout clause.
I added an option to tell plain_fsm to react if the
message queue grew past 3 messages:

spawn_link() ->
    plain_fsm:spawn_link(?MODULE,
                         fun() ->
                                 process_flag(trap_exit, true),
                                 queue_limit(),
                                 idle(mystate)
                         end).

queue_limit() ->
    %% React once the message queue grows past 3 messages.
    plain_fsm:store_options(
      [{watch, [{queue, 3, fun(S) ->
                                   io:format("msg queue too long!~n"),
                                   flush(),
                                   S
                           end}]}]).

Testing the code in the shell:

1> P = fsm_example:spawn_link().
<0.33.0>
timeout in idle
timeout in idle
2> [P ! hi || _ <- lists:seq(1,10)].
[hi,hi,hi,hi,hi,hi,hi,hi,hi,hi]
timeout in idle
msg queue too long!
timeout in idle
timeout in idle

In the current version, you can insert checks for message queue length
and heap size, and run_queue, as a quick and dirty way to detect CPU
overload. I haven't checked it into Jungerl - not convinced yet that
it's a good idea. If anyone wants to play with it, I can send you the code.
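
The `flush()` called in the watch fun is not shown in this post. One plausible definition, given here purely as an assumption (the module name `flush_sketch` and the returned count are made up), drops every message currently in the queue using a zero timeout:

```erlang
-module(flush_sketch).
-export([flush/0]).

%% Discard everything currently in the caller's message queue and
%% return how many messages were dropped.
flush() ->
    flush(0).

flush(N) ->
    receive
        _Any -> flush(N + 1)
    after 0 ->
        %% 'after 0' fires as soon as the queue is empty.
        N
    end.
```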

Anyway, you're absolutely right in that this kind of check can be made
fairly easily without introducing new 'features'.

BR,
Ulf W