Representation of a map in bytecode?


Representation of a map in bytecode?

Hugh Perkins
How is a map represented in bytecode?  To what extent is it clearly
identifiable as a map, directly from the bytecode?

Kind of contemplating to what extent it could be interesting/useful/fun
to make a fork of Erlang that automatically parallelizes maps across
available processor cores.

This should probably be done dynamically, at runtime.  I guess two options are:

- write a wrapper function around map that communicates with some kind
of scheduler process (presumably not the built-in scheduler, at least
initially), to split the map across multiple cores if some are
available

- extend the VM (and perhaps the bytecode) so that the VM can detect
maps and automatically split them across multiple available processor
cores where appropriate

Just kind of contemplating out of curiosity.  No idea how feasible or
useful it is, but it interests me, especially with Tilera-64s and so
on on the horizon.

Re: Representation of a map in bytecode?

Dustin Sallings-2

On Sep 14, 2007, at 21:02, Hugh Perkins wrote:

> How is a map represented in bytecode?  To what extent is it clearly
> identifiable as a map, directly from the bytecode?
>
> Kindof contemplating to what extent it could be interesting/useful/fun
> to make a fork of Erlang that automatically parallelizes maps across
> available processor cores.

        Do you mean lists:map/{2,3}?  It'd be trivial to write your own map
functions to use in place of lists:map.  If you wanted to have it replaced
wholesale, then you'd need to patch the lists module, of course.
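
For illustration, a minimal hand-rolled parallel map along those lines might
look like this (a sketch only: one worker process per element, no error
handling or timeouts, and pmap is just an illustrative name):

        pmap(F, List) ->
            Parent = self(),
            %% start one worker per element; each sends back its result
            %% tagged with a unique reference
            Refs = [begin
                        Ref = make_ref(),
                        spawn(fun() -> Parent ! {Ref, F(X)} end),
                        Ref
                    end || X <- List],
            %% collect the results in list order, not completion order
            [receive {Ref, Result} -> Result end || Ref <- Refs].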

--
Dustin Sallings



Re: Representation of a map in bytecode?

Kenneth Lundin
In reply to this post by Hugh Perkins
Hi,

On 9/15/07, Hugh Perkins <[hidden email]> wrote:
> How is a map represented in bytecode?  To what extent is it clearly
> identifiable as a map, directly from the bytecode?

There is nothing special that distinguishes a map in the byte code.
A map is typically implemented with a call to lists:map(Fun, List) or with
a list comprehension like this: [mapping(X) || X <- List].
Although it would be possible to find the pattern for a map in the byte
code, I think it is a bad idea to try to parallelize this by some automatic
magic. One of the strengths of Erlang is that it is very easy and natural
to parallelize things explicitly by the use of Erlang processes. If a
system is already parallelized into e.g. one Erlang process per connection
in a web server, or one Erlang process per ongoing call in a telecom
system, there is very little reason to try to parallelize each map when
there are thousands of call or connection processes executing
simultaneously, each of which will potentially perform a map operation as
part of its work.

What I am really trying to say is that there are different approaches to
parallelism.

1) Trying to optimize a specific algorithm by use of parallel computation,
assuming that this is the only ongoing thing on the computer.
This can be a heavy map operation for example, but even then it must be
considered whether the actual mapping work for each element is big enough
to motivate a split into many parallel tasks.

2) A software system that is explicitly designed to be parallel from the
beginning (like the one process per connection or one process per call
above). If the system design already allows thousands of parallel tasks,
it is far from obvious that the internal execution of each such task
would also benefit from automatic parallelization. I actually believe
that a system can be over-parallelized and that this will degrade
performance.

I don't think automatic parallelization of a map operation is interesting
when it is so easy to write and encapsulate a parallel map in a function
and call that function when parallelization is wanted. This function can
of course be included in one of the standard libraries if the problem is
common enough.
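
For illustration, such an encapsulated parallel map could also split the work
into a fixed number of chunks rather than one process per element, so the
per-process overhead is paid only a few times (a sketch only; chunked_pmap
and chunk are illustrative names, not existing library functions):

        chunked_pmap(F, List, NChunks) ->
            Parent = self(),
            Refs = [begin
                        Ref = make_ref(),
                        %% each worker maps sequentially over its own chunk
                        spawn(fun() -> Parent ! {Ref, lists:map(F, Chunk)} end),
                        Ref
                    end || Chunk <- chunk(List, NChunks)],
            %% reassemble the partial results in their original order
            lists:append([receive {Ref, Res} -> Res end || Ref <- Refs]).

        %% split List into at most N chunks of roughly equal size
        chunk(List, N) ->
            Size = max(1, (length(List) + N - 1) div N),
            split_every(List, Size).

        split_every([], _Size) ->
            [];
        split_every(List, Size) when length(List) =< Size ->
            [List];
        split_every(List, Size) ->
            {Chunk, Rest} = lists:split(Size, List),
            [Chunk | split_every(Rest, Size)].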

/Kenneth (Erlang/OTP team at Ericsson)


Re: Representation of a map in bytecode?

Rynö Jouni (FMI)
On Mon, 2007-09-17 at 12:33 +0200, Kenneth Lundin wrote:

> If the system design already allows thousands of parallel tasks, it is
> far from obvious that the internal execution of each such task would
> also benefit from automatic parallelization. I actually believe that a
> system can be over-parallelized and that this will degrade performance.
>
There used to be a beautiful processor architecture called the Transputer,
and a language called Occam to program it. Making things go parallel was
even easier and more natural than in Erlang.

And there were many papers published showing that going from extremely
parallel systems to more serialized ones was the way to optimise
performance. "Extremely parallel" here means systems where you have many
times more processes than real executing hardware. It all depends on the
process-switching time and communication overhead.

regards
        Jouni
--

  Jouni Rynö                            mailto://[hidden email]/
                                        http://space.fmi.fi/~ryno/
  Finnish Meteorological Institute      http://www.fmi.fi/
  Space Research                        http://space.fmi.fi/
  P.O.BOX 503                           Tel      (+358)-9-19294656
  FIN-00101 Helsinki                    FAX      (+358)-9-19294603
  Finland                               priv-GSM (+358)-50-5302903
 
  "It's just zeros and ones, it cannot be hard"


Re: Representation of a map in bytecode?

Robert Virding
There is another problem with automatic parallelisation and Erlang. As Erlang DOES have side-effects (message passing, basically), the order in which things are evaluated is significant. It is therefore not safe to automatically parallelise code. It is up to the programmer/designer to decide how the application should be made parallel.
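
For illustration (this snippet is not from the original post), even a map
over a fun that just sends messages shows the problem: the order in which
the messages below arrive at the calling process is exactly the order in
which map happens to apply the fun, so evaluating the elements in parallel,
or in a different order, changes observable behaviour.

        Collector = self(),
        %% the fun has a side effect: it sends each element to Collector
        lists:map(fun(X) -> Collector ! X, X end, [1, 2, 3]).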

This is, of course, not always easy, especially for those who come from a sequential world where you usually work out how to make things sequential. Yes, I have been there. :-)

Robert


Re: Representation of a map in bytecode?

Ulf Wiger-2
2007/9/17, Robert Virding <[hidden email]>:
> There is another problem with automatic parallelisation and Erlang. As
> Erlang DOES have side-effects, message passing basically, the order in which
> things are evaluated is significant. It is therefore not safe to
> automatically parallelise code. It is up to the programmer/designer to
> decide how the application should be made parallel.

In the Barklund spec, which you helped write, it states clearly that the
only order which is specified is that of expressions in a body. In an
example, (X=8) + (X=9) is used to illustrate how the evaluation order can
cause different run-time behavior(*), whereas the compiler accepts it, since
all possible orders are valid at compile-time. (Chapter 6.5)

(*) In this case, if X is unbound, then the exception will look different
depending on the evaluation order: {badmatch,8} or {badmatch,9}.

The new Erlang Reference Manual doesn't, as far as I can tell, address
the issue of evaluation order within patterns. Personally, I think the
description in the Specification is just fine.

BR,
Ulf W

Re: Representation of a map in bytecode?

Hugh Perkins
In reply to this post by Robert Virding
On 9/18/07, Robert Virding <[hidden email]> wrote:
> There is another problem with automatic parallelisation and Erlang. As
> Erlang DOES have side-effects, message passing basically, the order in which
> things are evaluated is significant.

Hmmm, interesting.  Good point.

> In the Barklund spec, which you helped write, it states clearly that the
> only order which is specified is that of expressions in a body. In an
> example, (X=8) + (X=9) is used to illustrate how the evaluation order can
> cause different run-time behavior(*), whereas the compiler accepts it,
> since all possible orders are valid at compile-time. (Chapter 6.5)

Hmmm, interesting too.  So, is lists:map guaranteed to always apply the
function in a certain order?

> Erlang DOES have side-effects, message passing basically

Apart from io:format, are there any other side-effects possible in Erlang?

Could one use a kind of MapReduce implementation where a map is
evaluated as follows:
- the incoming list to map is split into n sub-lists, where n is the
number of available cores (e.g. on a 64-core machine, if 17 cores are
free at the moment of execution of the map, n = 17)
- the map is run in n child processes, with one sub-list given to each process
- any outgoing messages are cached temporarily
- once all the child processes are finished, the messages are sent to
their targets, in order (this is the "Reduce" part of the MapReduce
implementation)

Presumably, one could do the same thing for any io:format calls too, actually?

Of course, another approach would be to use Haskell-style monads to
ensure purity, but Haskell has a number of other issues, such as the
lack of a VM.

(By the way, just to clarify, in answer to objections such as:

> If a system is already parallelized into e.g. one Erlang process per
> connection in a web server, one Erlang process per ongoing call in a
> telecom system etc., there is very little reason to try to parallelize
> each map if there are thousands of call or connection processes
> executing simultaneously, each of which will potentially perform a map
> operation as part of its work.

I'm not really good at communication :-) so it's not really obvious in
my proposition, but when I say "number of available cores", I don't
mean the total number of cores on the system, I mean the number of
available cores *at the time of execution of the map*.)

For example, example 1: imagine our program looks like this:

dosomething(Max) -> lists:map(fun blah/1, lists:seq(1, Max)).

Imagine we're running on a 64-core machine.

At the time of execution of the map, there is a single process running
(this one) and 63 other cores free, so there are a total of 64 cores
available to the map.

So the list will be split into chunks of about Max / 64 elements.

Now, example 2: imagine our program is a webserver (your example), and
the page handler runs a map.

Example 2a: let's say there are 98 requests currently being handled, so
at least 98 processes running.  All 64 cores are in use.  When we
reach the map statement, we don't bother to split it into chunks;
there's no point, as you say.

Example 2b: let's say there is 1 request currently being handled, plus
our new request.  The request currently being handled is using 54
cores (let's say), so there are 10 cores free.

Now, when we reach our map statement, it makes sense to split the map
across the 10 available cores.

Of course, the chunking will never be perfect, but it's really easy to
do, and transparent to the programmer (in the absence of side-effects,
but see above for the MapReduce approach, or maybe use monads).  Isn't it
generally going to be fairly close to optimal compared to static
approaches, such as SPJ's NDP, or hard-coding by the programmer?
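
For illustration, a very rough sketch of the "cores available at the time of
execution" check described above might query the runtime system like this
(free_cores is an illustrative name, and the run-queue length is only a
crude proxy for how busy the schedulers actually are):

        free_cores() ->
            Schedulers = erlang:system_info(schedulers),
            %% total length of all run queues, used here as a rough load estimate
            RunQueue = statistics(run_queue),
            max(1, Schedulers - RunQueue).

A wrapper could then split the list into free_cores() chunks (as in the
chunked parallel map sketched earlier in the thread) when more than one core
looks free, and fall back to a plain lists:map/2 otherwise.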

Re: Representation of a map in bytecode?

David King
> Hmmm, interesting too. So, is the order of execution of lists:map
> guaranteed to proceed always in a certain order?


http://erlang.org/doc/man/lists.html says:

> The evaluation order [of lists:map/1] is implementation dependent.


Re: Representation of a map in bytecode?

Hugh Perkins
On 9/18/07, David King <[hidden email]> wrote:
> > The evaluation order [of lists:map/1] is implementation dependent.
>

OK, so, in theory, we don't care about side-effects in a parallelized map?

Still, it's probably better to enhance test repeatability by ensuring
that the side-effects generated are predictable on a per-implementation
basis?

Ermmm..... random question: if lists:map is implementation dependent,
I guess this means that, in standard Erlang, the test results on one
architecture are not portable to other architectures?  Is this a good
thing?

Re: Representation of a map in bytecode?

Bjorn Gustavsson
"Hugh Perkins" <[hidden email]> writes:

> Ermmm..... random question, if lists:map is implementation dependent,
> I guess this means that, in standard erlang, the test results on one
> architecture are not portable to other architectures?  Is this a good
> thing?

The result IS the same on all architectures.

There once was a difference between the BEAM and the JAM (Joe's Abstract
Machine). The JAM has since been retired (it was last seen in the R5B
release).

/Bjorn

--
Björn Gustavsson, Erlang/OTP, Ericsson AB

Re: Representation of a map in bytecode?

Richard A. O'Keefe-2
In reply to this post by Hugh Perkins
To repeat a point that has already been made:

(1) Providing a parallel map that people can call IF THEY WANT TO is both a
    good idea and easy to do in Erlang.  A low level implementation of pmap
    might be a good idea.

(2) Turning ALL maps into parallel ones is a dumb idea.  Think about a tree.
    There are two ways to measure how big it is: SIZE (total number of
    nodes) and DEPTH (length of the longest path).  Now any computation can
    be seen as a tree of dependencies (this is calculated in terms of that).
    In a serial system, the time is proportional to the SIZE.  In a system
    with more processors than you could ever want, the time is proportional
    to the DEPTH.  It is only worth parallelising a map when the SIZE of the
    computation significantly exceeds the DEPTH, that is, when the amount of
    work per element is "high" compared with the cost of simply getting to
    all the elements.  Given the current implementation of lists,

        map(fun(X) -> X+1 end, L)

    is going to be O(length(L)) even given infinitely many processors.

Suppose we had tuple maps, specified as

        tmap(F, T) -> list_to_tuple(map(F, tuple_to_list(T))).

but implemented directly.  It is straightforward to implement this with
depth proportional to log(size(T)), so automatically parallelising tuple
maps would very nearly be sensible.  (And it's what things like
parallelising C and Fortran compilers do.)  Even then, for cheap enough F
or small enough T, it might not be worthwhile.
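
For illustration, a divide-and-conquer tuple map along those lines might look
like the sketch below (ptmap is an illustrative name; it spawns a process for
the right half at every split and does no error handling).  The recursive
splitting gives the applications of F a depth proportional to log(size(T)),
although rebuilding the result list is still linear.

        ptmap(F, T) ->
            list_to_tuple(ptmap(F, T, 1, tuple_size(T))).

        ptmap(_F, _T, Lo, Hi) when Lo > Hi ->
            [];
        ptmap(F, T, Lo, Lo) ->
            [F(element(Lo, T))];
        ptmap(F, T, Lo, Hi) ->
            Mid = (Lo + Hi) div 2,
            Parent = self(),
            Ref = make_ref(),
            %% the right half is computed in a new process ...
            spawn(fun() -> Parent ! {Ref, ptmap(F, T, Mid + 1, Hi)} end),
            %% ... while this process keeps recursing on the left half
            Left = ptmap(F, T, Lo, Mid),
            receive {Ref, Right} -> Left ++ Right end.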

I am *not* saying that automatic parallelisation is a bad idea, only that
*blind* automatic parallelisation is a bad idea.  Doing it right requires
reasonably thorough whole-module analysis (at least!), not simply pattern
matching byte code.

Something like the approach that Clean used to take might work pretty
well.  Clean is a pure lazy functional language like Haskell, but with
lightweight arrays and usually getting better performance.  It was
called Concurrent Clean because it really did have a multiprocessor
concurrent implementation.  Clean used programmer supplied annotations
to indicate where parallelising would be a good idea.

Another approach is the "parallel skeletons" one, where there are
preprogrammed patterns of concurrency and you just plug your details
in.  Not unlike Erlang behaviours, come to think of it.  And not
unlike Joe's parallel map that you can call when it's appropriate.



Re: Representation of a map in bytecode?

Richard A. O'Keefe-2
In reply to this post by Hugh Perkins

On 18 Sep 2007, at 7:47 pm, Hugh Perkins wrote:

> On 9/18/07, David King <[hidden email]> wrote:
>>> The evaluation order [of lists:map/1] is implementation dependent.
>>
>
> OK, so, in theory, we don't care about side-effects in a parallelized
> map?

Whyever would you say THAT?  No, the conclusion is that if you care
about side effects, you don't use lists:map/1, you use your own code
and *document* what order things must be done.

And if you map a function with side effects down a list, you had
*better* care about the order, even if only to prove that in this
case the order does not matter.

> Ermmm..... random question, if lists:map is implementation dependent,
> I guess this means that, in standard erlang, the test results on one
> architecture are not portable to other architectures?  Is this a good
> thing?

Common Lisp and Scheme say exactly the same thing: the order in which
user functions are applied in the supplied list processing functions is
not defined.  Common Lisp has a fairly large section saying what such
a user-supplied function must not do.

Here are two possible implementations of map:

        map(F, [H|T]) ->
            H1 = F(H),
            T1 = map(F, T),
            [H1|T1];
        map(F, []) when is_function(F, 1) ->
            [].

is just the existing lists:map/2 written in a funny way, while

        map(F, [H|T]) ->
            T1 = map(F, T),
            H1 = F(H),
            [H1|T1];
        map(F, []) when is_function(F, 1) ->
            [].

applies F to the elements in the opposite order.  For pure functions F
(that is, for the application map/2 was designed for) you get the same
result either way.  If now you write

        map(F, [H|T]) ->
            [F(H) | map(F, T)];
        map(F, []) when is_function(F, 1) ->
            [].

the version you get depends on what the compiler does with [E1 | E2],
which was defined in one Erlang specification to be left to right, but
is the kind of thing people can change their minds about.

Anyone who cares about the order of operations really should be using
the sequencing operator (,) to make it absolutely explicit and
unmistakable.

I would be a bit wary about list comprehensions too...  Do you know what
the compiler does with them?  Are you sure it will still be doing the
same thing in 10 years?  (I can think of a fairly obvious strategy
which, for equally obvious plausible reasons, might evaluate parts of a
comprehension left to right and other parts right to left.)


Re: Representation of a map in bytecode?

Joe Armstrong-2
In reply to this post by Richard A. O'Keefe-2
On 9/19/07, ok <[hidden email]> wrote:
> To repeat a point that has already been made:
> (1) Providing a parallel map that people can call IF THEY WANT TO is both a
>     good idea and easy to do in Erlang.  A low level implementation of pmap
>     might be a good idea.

Absolutely - there are only two ways you can exploit parallelism "on the
chip": instruction-level parallelism (i.e. the chip pipelines several
instructions for higher performance, fetching one instruction as it
evaluates another) and thread-level parallelism (multicores) - there are
no other ways of exploiting parallelism in the hardware.

For the programmer this means that only thread-level parallelism is
available (in the form of multicores). Setting up a thread-level parallel
computation costs something - if this cost is greater than the cost of
doing the computation in place, then nothing is won. Only the programmer
knows (or should know) if a given map is worth parallelising. Automatic
attempts to parallelise sequential code have always been dismal failures
(despite massive research budgets)
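
For illustration, one way for the programmer to find that out is simply to
measure both versions on representative data; a minimal sketch (compare/2 is
an illustrative name, and pmap/2 stands for any parallel map such as the
ones sketched earlier in the thread, assumed to be defined in the same
module):

        compare(F, List) ->
            %% timer:tc/3 returns {Microseconds, Result}
            {SeqUs, _} = timer:tc(lists, map, [F, List]),
            {ParUs, _} = timer:tc(?MODULE, pmap, [F, List]),
            io:format("sequential: ~p us, parallel: ~p us~n", [SeqUs, ParUs]).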

/Joe

Re: Representation of a map in bytecode?

Hugh Perkins
Wow, Joe has replied in my thread.  That's cool :-)

I guess really we're never going to know for sure until we've got some
64-core machines to play with.

I kind of tend to use real-world analogies to solve programming
problems.  Maybe this is why I always hated quantum physics, where this
isn't possible.

In the real-world, a manager will assign tasks to people, then follow
up how the tasks are going.  Any tasks that are taking a while will be
shared among several people.  People who finish their tasks sooner
will be given new tasks.

The manager doesn't need to know how the tasks are actually done, or
have any prior knowledge of how long a task will take.

So, switching back to threading, I kind of see it like you have a
supervisor process which watches the processes around it.  When there
are cores free, it looks at which processes are (a) taking a while,
and (b) running some kind of parallelisable task (map, fold of an
associative function, etc.), and shares that task out among the
available cores.

In a later version, one could imagine that the supervisor process could
memorize (cache) which bits of the program tended to need parallelizing
out, so that in the future it proactively parallelizes them right off
the bat.

"but everything ends up being choked by a serial bottleneck".  Well,
in the real-world tasks can be shared among thousands of people
successfully, in various ways.

"what about books, theyre always written by one person right?"  In the
case of books, we take a "mud sticks" strategy in the real world.
Many people write books, and a few of them turn out to be really good.

Parallelizing is only hard when you seek to run at close to 100% of
single-core efficiency.  If one is willing to allow for a lot of
wastage (maybe each thread is only 10% as efficient as in a
single-threaded application), it's maybe easy to share among thousands
of cores?