Process pool map/3 implementation

Parnell Springmeyer-2
For a work project I have a large list (thousands of items) to process.
At first I built a "pmap" implementation as per Joe's book, until I
found the plists module (which is awesome, btw).

There is one glaring issue with the strategy of subdividing the list and
spawning x processes for the n items of a sublist: if one item in the
sublist takes longer than all the others, it blocks the entire resource
allotment until it is done.

In most cases, the plists/pmap implementation works just fine because
the items in the list probably don't take more than a few milliseconds
to map the fun over. However, it does become an issue when that is not
the case.
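
To make the stall concrete, here is a minimal sketch of the batched
strategy described above. It is an illustration only, not the plists
implementation or the code from Joe's book; the module name
batched_pmap_sketch and the helpers spawn_worker/3 and take/2 are made
up for this example.

```erlang
-module(batched_pmap_sketch).
-export([map/3]).

%% map(F, List, N): process List in batches of N items. Each batch gets
%% one process per item, and the next batch starts only after every
%% process in the current batch has replied.
map(F, List, N) when is_function(F, 1), is_integer(N), N > 0 ->
    map(F, List, N, []).

map(_F, [], _N, Acc) ->
    lists:append(lists:reverse(Acc));
map(F, List, N, Acc) ->
    {Batch, Rest} = take(N, List),
    Parent = self(),
    Refs = [spawn_worker(Parent, F, X) || X <- Batch],
    %% Barrier: one slow item keeps the other slots of the batch idle here.
    Results = [receive {Ref, R} -> R end || Ref <- Refs],
    map(F, Rest, N, [Results | Acc]).

%% Start one linked worker and return the reference that tags its reply.
spawn_worker(Parent, F, X) ->
    Ref = make_ref(),
    spawn_link(fun() -> Parent ! {Ref, F(X)} end),
    Ref.

%% Split off up to N leading elements without failing on short lists.
take(N, List) when length(List) =< N -> {List, []};
take(N, List) -> lists:split(N, List).
```

With a batch size of 5, a single item that takes ten seconds holds back
the next batch for ten seconds, even if the other four workers finished
in milliseconds.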

So I figured the next best strategy would be to implement a process
pool: slow-running processes can continue their work while finished
processes die and new processes are spawned into the pool, ready for
work, so none of the resources sit idle.

Right now my module isn't nearly as feature-complete as the plists
module; it is only a drop-in replacement for map. Please send your
criticisms and comments to me at this address.

You may find the code on BitBucket: https://bitbucket.org/ixmatus/ppool

--
Parnell "ixmatus" Springmeyer (http://ixmat.us)
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Process pool map/3 implementation

Dmitrii Dimandt-2

There's also RabbitMQ's worker_pool, http://www.lshift.net/blog/2010/03/29/on-the-limits-of-concurrency-worker-pools-in-erlang, and my feeble attempt at somewhat extending it: https://github.com/dmitriid/worker_pool

===================================
Dmitrii Dimandt
[hidden email]

------------------------------------------------------------
Erlang in Russian
http://erlanger.ru/

TurkeyTPS
------------------------------------------------------------

Re: Process pool map/3 implementation

Parnell Springmeyer-2
Thanks for pointing that out - I tried looking for stuff that was
already out there but couldn't find anything. Suppose I should have
asked first!!

--
Parnell "ixmatus" Springmeyer (http://ixmat.us)

Re: Process pool map/3 implementation

Robert Virding-2
In reply to this post by Parnell Springmeyer-2
One quick question: what was wrong with the straightforward solution of just spawning one process for each element in the list? Did this break or do you actually need more control?

Robert

Re: Process pool map/3 implementation

dmercer
I was curious about that, too.  Hoping you'll get a response...

Re: Process pool map/3 implementation

Parnell Springmeyer-2
Because the list has about 3000 items in it, and about 20-50 HTTP requests are made for each item. I needed a way of parallelizing the operations (instead of stepping through the list one by one), but in a controlled fashion, using a round-robin strategy (a worker pool).

--
Parnell "ixmatus" Springmeyer (http://ixmat.us)

Re: Process pool map/3 implementation

Robert Virding-2
But a pmap SHOULD start processes for all the elements in the list in parallel. It is after all a 'P'map. In which case all the processes will be running and processing in parallel as you want. The only reason I can see for using a worker pool is if you actually want to LIMIT the number of processes running at the same time.

IMAO in Erlang there are only two reasons for using worker/process pools:

- you want/need to limit the number of "things" running in parallel
- you actually do want to reuse a process for another computation, because there is something in the application which mandates reusing processes.

Otherwise it is just extra work: process creation/termination is so fast that there is no real gain in keeping them around to reuse.
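
For reference, the straightforward version under discussion (one process per list element) can be sketched as below. This is a minimal illustration, not the code from Joe's book or from plists; the module name pmap_sketch and the helper spawn_worker/3 are made up for the example.

```erlang
-module(pmap_sketch).
-export([pmap/2]).

%% pmap(F, List): spawn one process per element, then collect the
%% results in the same order as the input list.
pmap(F, List) when is_function(F, 1), is_list(List) ->
    Parent = self(),
    Refs = [spawn_worker(Parent, F, X) || X <- List],
    %% Selective receive on each Ref keeps the results in input order,
    %% no matter which worker finishes first.
    [receive {Ref, Result} -> Result end || Ref <- Refs].

%% Start one linked worker and return the reference that tags its reply.
%% Because the worker is linked, a crash in F also takes down the caller
%% unless the caller traps exits.
spawn_worker(Parent, F, X) ->
    Ref = make_ref(),
    spawn_link(fun() -> Parent ! {Ref, F(X)} end),
    Ref.
```

Nothing here bounds the number of live processes: a 3000-element list starts 3000 workers at once.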

Robert

Re: Process pool map/3 implementation

Parnell Springmeyer-2
Robert,

In my ppool implementation I re-create processes; I don't recycle them. I call it a "pool" to contrast it with breaking off "hunks" (sublists) and then spawning x processes for the items in the sublist. With that approach, if one process takes longer than the others, the freed resources sit idle until that one is done, and only then does the subdivision start again.

The "pool" implementation lets me delegate in a round-robin fashion: I create new processes as other processes finish, keeping at most x processes working until the job is done.

pmap in my use case would be "bad". Very bad. 3000 items in a list, with about 1400 of those items making more than 20 HTTP requests each (the rest doing about 3 to 4 requests), would completely tank the machine and would also be irresponsible crawling. But I *do* want to do the work in parallel, just not at that scale. So, using a process pool strategy, I limit the number of concurrent crawl workers to about 5 or 6, which works well on the machine.
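
A minimal sketch of that idea, for anyone who wants to see the shape without opening the repository. This is not the actual ppool code; the module name ppool_sketch, the argument order of map/3 and the internal loop/6 are assumptions made for the example. It starts a fresh process whenever a slot frees up, so at most Max workers are alive at any time.

```erlang
-module(ppool_sketch).
-export([map/3]).

%% map(F, List, Max): apply F to every element of List while keeping at
%% most Max worker processes alive. Results are returned in list order.
map(F, List, Max) when is_function(F, 1), is_integer(Max), Max > 0 ->
    Indexed = lists:zip(lists:seq(1, length(List)), List),
    loop(F, make_ref(), Indexed, Max, 0, []).

%% Nothing left to start and nothing running: sort by index and return.
loop(_F, _Tag, [], _Max, 0, Acc) ->
    [R || {_I, R} <- lists:keysort(1, Acc)];
%% A slot is free and work remains: spawn a new worker for the next item.
loop(F, Tag, [{I, X} | Rest], Max, Running, Acc) when Running < Max ->
    Parent = self(),
    spawn_link(fun() -> Parent ! {Tag, I, F(X)} end),
    loop(F, Tag, Rest, Max, Running + 1, Acc);
%% Pool is full, or only running workers remain: wait for one to finish.
loop(F, Tag, Todo, Max, Running, Acc) ->
    receive
        {Tag, I, R} ->
            loop(F, Tag, Todo, Max, Running - 1, [{I, R} | Acc])
    end.
```

A call such as ppool_sketch:map(fun crawl/1, Items, 5), where crawl/1 stands in for whatever does the HTTP work, keeps five crawlers busy; a slow item occupies only its own slot while the other four keep pulling new items.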

--
Parnell "ixmatus" Springmeyer (http://ixmat.us)
