Interesting benchmark performance (was RE: Pitiful benchmark performance)



Sean Hinde-2


> This was interesting.
>
> Erlang strikes me as more and more human.  ;)
> I seem to function in the same way: when I have lots to do, I
> become more efficient; when I have little to do, I become very
> inefficient.

I too have this same scheduling quirk, although I would suggest the
mechanism is not quite the same as the one in the Erlang runtime.

> Seriously, Erlang *has been* highly optimized for the very type
> of systems where there are lots of things going on at once. This
> reflects on the I/O, for example, where Erlang tries to handle up
> to hundreds of active I/O ports fairly, rather than handling only
> one extremely well.

It says to me rather that Erlang is optimised for applications that are
sufficiently complex that at least 2000 reductions are required after
each external input or timeout. That seems to have little to do with
general load, and for many applications Erlang would otherwise be superb
at (a web server, a Mnesia-based online transaction processing server)
there is a significant penalty.

A small change to the runtime could mean that I don't have to run a busy
loop in the background to get better performance out of my remote Mnesia
servers.

- Sean







Thomas Lindgren-2

> It says to me rather that Erlang is optimised for applications which are
> sufficiently complex that at least 2000 reductions are required following
> each external input or timeout. It seems to have little to do with just
> general stuff going on, and that for many applications Erlang would be
> otherwise superb at (web server, Mnesia based Online Transaction Processing
> server) there is a significant penalty.

Maybe the time has come to adaptively adjust the number of reductions
before yielding (ie, rescheduling)?

Here is a portable, straightforward approach: instead of compiling in
a constant number of reductions, the number of remaining reductions
should be loaded from the process structure when checking for
yielding.

The interesting part, then, is deciding how many reductions you get
when you're scheduled. A simple approach is to permit the system to
set the reductions-per-yield at runtime (per process or for the entire
node), by using a BIF. But this must be supplemented by some way to
measure activity, so that the decision can be made systematically.
(Alternatively, one could take the approach that reductions-per-yield
is set _only_ inside the runtime, to avoid messing around with BIFs.)

A second, orthogonal, topic to consider is how well a "reduction"
corresponds to a time tick. A reduction can vary quite a bit in the
amount of time it requires, because of BIFs: today, there are ways to
bump reductions when a long-running BIF begins.

Another approach to yielding might be to measure the available time
slice in hardware cycles, rather than procedure calls. All desktop
processors have cycle counters, for example, so it is viable for a
wide range of systems. Unfortunately, the counters are often somewhat
messy to work with.

                        Thomas
--
Thomas Lindgren thomas+junk
Alteon WebSystems




Ulf Wiger-4

Another option could be to allow "high-priority" ports, that are
polled e.g. just before checking the normal priority process
queue -- or just after, to reduce the starvation risk.

Not that I'm against a time- or cycle-based scheduling model.

/Uffe

On Mon, 18 Jun 2001, Thomas Lindgren wrote:

>Maybe the time has come to adaptively adjust the number of
>reductions before yielding (ie, rescheduling)?
>
>[...]

--
Ulf Wiger                                    tfn: +46  8 719 81 95
Senior System Architect                      mob: +46 70 519 81 95
Strategic Product & System Management    ATM Multiservice Networks
Data Backbone & Optical Services Division      Ericsson Telecom AB




Scott Lystig Fritchie-3
>>>>> "uw" == Ulf Wiger <etxuwig> writes:

uw> Another option could be to allow "high-priority" ports, that are
uw> polled e.g. just before checking the normal priority process queue
uw> -- or just after, to reduce the starvation risk.

There was a conversation not long ago over in comp.lang.functional
about managing global state.  One person noted that retrieving some
"global" state held by process S from process C can be a high-latency
operation: it may be a while before S is scheduled to run, and when S
sends its reply, it may be a while before C is scheduled to run again.

A stray thought this weekend suggested a scheduling mechanism similar
to doors, which are found in Spring, Solaris, and (I think) Linux.
Doors have *very* low latency because the normal process scheduler is
short-circuited: the door server is scheduled to run immediately after
the client executes door_call(), and the client is rescheduled
immediately after the server executes door_return().  From a
wall-clock point of view, a door call looks an awful lot like a local
procedure call: there is process context switching during the call &
return, but they happen without interference from the regular
scheduling algorithm(s).

Doors are a synchronous communication mechanism.  Erlang's message
passing is asynchronous.  If a lower-latency communication scheme were
added to Erlang, how would you design it to get door-like round-trip
times?

1. The server process has an attribute that would cause it to preempt
the message sending process immediately.  When the server process
yields or blocks, the message sending process is the next one
scheduled so that it can get its reply right away.

There are a couple of problems with this, I guess.  One, there's no
guarantee that the server will reply to the client right away.  (But
if you want door-like performance, the server must have an answer
right away.  Caveat emptor.)  Two, every message sent to the server
would be handled this way.  This may or may not be bad.

2. Use a new syntactic thingie to explicitly denote door-like
behavior.  For example, use "Pid !! Message".

It's an icky idea, but I thought I ought to mention it.  It may even
have merit, somehow.  {shrug}

3. Give the server process a second "address" for receiving messages.
Messages received via the real pid make no change in scheduling.
Messages received via a second pid (pseudo-pid?) would cause door-like
scheduling to occur.

This would be a weird thing for Erlang: messages sent to two different
pids actually get delivered to the same mailbox for the same process.
However, it doesn't solve the problem of how to get door-like
scheduling behavior when sending a reply back to the client ... unless
the client has a pseudo-pid, too.

4. None of the above.

-Scott
---
Scott Lystig Fritchie
Professional Governing: Is It Faked?