lists:pmap ?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

lists:pmap ?

Ola Andersson A (AL/EAB)
Reading the discussion about list:mapfind reminded me of a function I rediscovered recently.
It's rpc:pmap/3 that was originally defined for use in a distributed environment, spreading out the processing over several nodes. It actually works on a single multicore node as well even though it wasn't designed for that purpose.
With the limited tests I have done it seems to significantly outperform lists:map/2 and also scale reasonably well. The almost negligible cost of spawning erlang processes is still amazing.
How about adding a lists:pmap/2 function, designed for multicore, in the lists module?
/OLA.

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: lists:pmap ?

zxq9-2
On 2017年08月22日 火曜日 08:42:01 Ola Andersson A wrote:
> Reading the discussion about list:mapfind reminded me of a function I rediscovered recently.
> It's rpc:pmap/3 that was originally defined for use in a distributed environment, spreading out the processing over several nodes. It actually works on a single multicore node as well even though it wasn't designed for that purpose.
> With the limited tests I have done it seems to significantly outperform lists:map/2 and also scale reasonably well. The almost negligible cost of spawning erlang processes is still amazing.
> How about adding a lists:pmap/2 function, designed for multicore, in the lists module?


I've implemented something along these lines several times in an ad hoc manner for pure map functions in various projects, especially client-side code (but the function being mapped really needs to be pure!). I imagine that's fairly common (in client code not doing this is sometimes crazy).

It WOULD be pretty awesome to have this in the standard library, and I can easily imagine a version of that which would be written against other collection-type data structures as well...

But I also imagine that the arbitrariness of the input would suddenly make an implementation become non-trivial to go about in a safe way (or not at least without splashing the documentation liberally with warnings that pmap might blow your node up, that mapped functions really need to be pure, etc.). Do we limit worker spawns to a finite total size based on the VM's condition and keep up with it as return values are reported back? "The VM's condition" is a special ball of madness right there. Etc.

This could be picked through, and probably quite effectively if some time is put to it. I just want to point out that the implementation will either be super naive (but still useful in cases where people know what they are dealing with), or super involved for what is conceptually a very simple idea -- but probably not very much happy middle exists between those two extremes.

-Craig
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: lists:pmap ?

Joe Armstrong-2
Interesting-  I teach a class in concurrent and parallel programming
where I show a simple pmap subject to the condition that all processes
terminate
within a reasonable time and with no errors.

The more fun cases do things like limiting the number of parallel processes
and versions which don't bother about the order of the replies in the
result list. We might want to compute F(X) for all elements X in a
list L, but given
L = [X1,X2,X3]   might not want to compute [F(X1),F(X2),F(X3)] any ordering
might be acceptable (say [F(X2),F(X1),F(X3)].

If we have N cores we might want to limit the number of parallel processes
to say 2N since we can't actually do more that N things in parallel -
or we might not.

You might want to look at this project

http://skel.weebly.com/about-skel.html

Which has a number of algorithmic skeletons for parallelizing erlang programs

Cheers

/Joe


On Tue, Aug 22, 2017 at 10:58 AM, zxq9 <[hidden email]> wrote:

> On 2017年08月22日 火曜日 08:42:01 Ola Andersson A wrote:
>> Reading the discussion about list:mapfind reminded me of a function I rediscovered recently.
>> It's rpc:pmap/3 that was originally defined for use in a distributed environment, spreading out the processing over several nodes. It actually works on a single multicore node as well even though it wasn't designed for that purpose.
>> With the limited tests I have done it seems to significantly outperform lists:map/2 and also scale reasonably well. The almost negligible cost of spawning erlang processes is still amazing.
>> How about adding a lists:pmap/2 function, designed for multicore, in the lists module?
>
>
> I've implemented something along these lines several times in an ad hoc manner for pure map functions in various projects, especially client-side code (but the function being mapped really needs to be pure!). I imagine that's fairly common (in client code not doing this is sometimes crazy).
>
> It WOULD be pretty awesome to have this in the standard library, and I can easily imagine a version of that which would be written against other collection-type data structures as well...
>
> But I also imagine that the arbitrariness of the input would suddenly make an implementation become non-trivial to go about in a safe way (or not at least without splashing the documentation liberally with warnings that pmap might blow your node up, that mapped functions really need to be pure, etc.). Do we limit worker spawns to a finite total size based on the VM's condition and keep up with it as return values are reported back? "The VM's condition" is a special ball of madness right there. Etc.
>
> This could be picked through, and probably quite effectively if some time is put to it. I just want to point out that the implementation will either be super naive (but still useful in cases where people know what they are dealing with), or super involved for what is conceptually a very simple idea -- but probably not very much happy middle exists between those two extremes.
>
> -Craig
> _______________________________________________
> erlang-questions mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-questions
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: lists:pmap ?

zxq9-2
On 2017年08月22日 火曜日 18:34:44 you wrote:
> You might want to look at this project
>
> http://skel.weebly.com/about-skel.html
>
> Which has a number of algorithmic skeletons for parallelizing erlang programs

Hi, Joe!

Thank you for pointing out Skel!

I'll have more time to see how hard it would be to package it (or maybe something derived from it) soon. I imagine it might be an easy way to scratch the pmap-flavored itch Ola was talking about.

-Craig
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: lists:pmap ?

Michał Muskała
Elixir recently introduced Task.async_stream/3,5 functions [1]. The primary feature is a pmap over a collection with a bounded concurrency. It's surprisingly useful and covers a lot of use cases in practice.
An additional nice thing is that it accepts a potentially lazy collection and returns a lazy collection as well, so it's possible to consume only a part of it.

On 23 Aug 2017, 13:07 +0200, zxq9 <[hidden email]>, wrote:
On 2017年08月22日 火曜日 18:34:44 you wrote:
You might want to look at this project

http://skel.weebly.com/about-skel.html

Which has a number of algorithmic skeletons for parallelizing erlang programs

Hi, Joe!

Thank you for pointing out Skel!

I'll have more time to see how hard it would be to package it (or maybe something derived from it) soon. I imagine it might be an easy way to scratch the pmap-flavored itch Ola was talking about.

-Craig
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: lists:pmap ?

Ulf Wiger-2
In reply to this post by Ola Andersson A (AL/EAB)
I discussed the problems with a general pmap in a presentation back in 2009.


The purpose wasn't really to argue against a general pmap, but rather to use it as a simple example to illustrate the problems with treating erlang processes as 'fibers' rather than independent actors.

BR,
Ulf W

2017-08-22 10:42 GMT+02:00 Ola Andersson A <[hidden email]>:
Reading the discussion about list:mapfind reminded me of a function I rediscovered recently.
It's rpc:pmap/3 that was originally defined for use in a distributed environment, spreading out the processing over several nodes. It actually works on a single multicore node as well even though it wasn't designed for that purpose.
With the limited tests I have done it seems to significantly outperform lists:map/2 and also scale reasonably well. The almost negligible cost of spawning erlang processes is still amazing.
How about adding a lists:pmap/2 function, designed for multicore, in the lists module?
/OLA.

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: lists:pmap ?

Robert Virding
There are many issues you need to address if you are going to add a "standard" pmap to your library. Some have already been mentioned like how much parallelism and are you going to preserve the list order, which I think is critical.

Another one is what to do when you get an error. There are some obvious alternatives:

- If you get an error crash all the processes and return an error pretty much in the same way map does
- Always return the full list but indicate which elements worked and which didn't, or maybe just return a default "error" value

But you could always do something for your specific case.

Robert


On 30 August 2017 at 13:31, Ulf Wiger <[hidden email]> wrote:
I discussed the problems with a general pmap in a presentation back in 2009.


The purpose wasn't really to argue against a general pmap, but rather to use it as a simple example to illustrate the problems with treating erlang processes as 'fibers' rather than independent actors.

BR,
Ulf W

2017-08-22 10:42 GMT+02:00 Ola Andersson A <[hidden email]>:
Reading the discussion about list:mapfind reminded me of a function I rediscovered recently.
It's rpc:pmap/3 that was originally defined for use in a distributed environment, spreading out the processing over several nodes. It actually works on a single multicore node as well even though it wasn't designed for that purpose.
With the limited tests I have done it seems to significantly outperform lists:map/2 and also scale reasonably well. The almost negligible cost of spawning erlang processes is still amazing.
How about adding a lists:pmap/2 function, designed for multicore, in the lists module?
/OLA.

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions



_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions