|
For a work project I have a large list (thousands of items) to process. At first I built a "pmap" implementation as per Joe's book, until I found the plists module (which is awesome, btw).

There is one glaring issue with the list -> subdivide -> spawn x processes for n sublist items strategy: if an item in the sublist takes longer than all the other items, it blocks the entire resource allotment until it is done.

In most cases the plists/pmap implementation works just fine, because the items in the list probably don't take more than a few milliseconds to map the fun over. However, it does become an issue when that is not the case.

So I figured the next best strategy would be to implement a process pool, since it would allow slow-running processes to continue their work while finished processes die and new processes are spawned into the pool ready for work, so none of the resources sit idle.

Right now my module isn't nearly as feature-complete as the plists module; it is only a drop-in replacement for map. Please submit your criticisms and comments to me at this address.

You may find the code on BitBucket: https://bitbucket.org/ixmatus/ppool

--
Parnell "ixmatus" Springmeyer (http://ixmat.us)
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
|
There's also RabbitMQ's worker_pool, http://www.lshift.net/blog/2010/03/29/on-the-limits-of-concurrency-worker-pools-in-erlang, and my feeble attempt at somewhat extending it: https://github.com/dmitriid/worker_pool
===================================
Dmitrii Dimandt
[hidden email]
------------------------------------------------------------
Erlang in Russian
http://erlanger.ru/

TurkeyTPS
------------------------------------------------------------
LinkedIn: http://www.linkedin.com/in/dmitriid
GitHub: https://github.com/dmitriid
|
Thanks for pointing that out - I tried looking for stuff that was already out there but couldn't find anything. Suppose I should have asked first!!

--
Parnell "ixmatus" Springmeyer (http://ixmat.us)
|
In reply to this post by Parnell Springmeyer-2
One quick question: what was wrong with the straightforward solution of just spawning one process for each element in the list? Did this break or do you actually need more control?
Robert
|
I was curious about that, too. Hoping you'll get a response...
> -----Original Message-----
> From: [hidden email] On Behalf Of Robert Virding
> Sent: Wednesday, July 20, 2011 8:42 PM
> To: Parnell Springmeyer
> Cc: erlang-questions
> Subject: Re: [erlang-questions] Process pool map/3 implementation
>
> One quick question: what was wrong with the straightforward solution of
> just spawning one process for each element in the list? Did this break
> or do you actually need more control?
>
> Robert
|
Because the list has about 3000 items in it, and about 20-50 HTTP requests are made for each item. I needed a way of parallelizing the operations (instead of stepping through the list one by one), but in a controlled fashion, using a round-robin strategy (worker pool).

On Fri, Jul 22, 2011 at 6:10 AM, David Mercer <[hidden email]> wrote:
> I was curious about that, too. Hoping you'll get a response...

--
Parnell "ixmatus" Springmeyer (http://ixmat.us)
|
But a pmap SHOULD start processes for all the elements in the list in parallel. It is, after all, a 'P'map. In that case all the processes will be running and processing in parallel, as you want. The only reason I can see for using a worker pool is if you actually want to LIMIT the number of processes running at the same time.

IMAO, in Erlang there are only two reasons for using worker/process pools:

- you want/need to limit the number of "things" running in parallel
- you actually do want to reuse a process for another computation; there is something in the application which mandates reusing processes.

Otherwise it is just extra work: process creation/termination is so fast that there is no real gain in keeping them around to reuse.

Robert
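For reference, the one-process-per-element approach described above can be sketched in a few lines. This is only an illustrative sketch, not code from plists or from Joe's book; the module name pmap_sketch and the function pmap/2 are names made up for this example.

```erlang
-module(pmap_sketch).
-export([pmap/2]).

%% Hypothetical helper: spawn one process per list element and
%% collect the results in the original list order.
pmap(Fun, List) ->
    Parent = self(),
    Refs = [begin
                %% A fresh reference tags each worker's reply.
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, Fun(X)} end),
                Ref
            end || X <- List],
    %% Selective receive on each Ref, in order, restores list order
    %% no matter which worker finishes first.
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

With this sketch, pmap_sketch:pmap(fun(X) -> X * 2 end, [1, 2, 3]) returns [2, 4, 6]. Note that nothing bounds the number of processes: a list of 3000 items starts 3000 workers at once, and because the workers are linked, a crash in any of them also takes down the caller unless it traps exits.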
|
Robert,
In my ppool implementation I re-create processes; I don't recycle them. I call it a "pool" to contrast it with breaking off "hunks" (sublists) and then spawning x processes for the items in the sublist. With that approach, if one process takes longer than the others, the remaining resources sit idle until that one is done, and only then does the subdivision start again.
The "pool" implementation lets me delegate in a round-robin fashion. I don't recycle; I create new processes as other processes finish, keeping at most x processes working until the job is done.
pmap in my use case would be "bad". Very bad. 3000 items in a list, with about 1400 of those items making more than 20 HTTP requests each (the rest doing about 3 to 4 requests), would completely tank the machine and would also be irresponsible crawling. But I *do* want to do the work in parallel, just not at that scale. So, using a process pool strategy, I limit the number of concurrent crawl workers to about 5 or 6, which the machine handles well.
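A minimal sketch of the bounded strategy described above, for anyone following along. This is not the code in the ppool repository; the module name ppool_sketch and the argument order of map/3 are assumptions made for the example. It keeps at most Max workers alive and spawns a replacement as soon as any worker reports back.

```erlang
-module(ppool_sketch).
-export([map/3]).

%% Hypothetical map/3: apply Fun to every item of List with at most
%% Max worker processes running at the same time.
map(Fun, List, Max) when is_function(Fun, 1), is_integer(Max), Max > 0 ->
    %% Tag each item with its position so order can be restored later.
    Indexed = lists:zip(lists:seq(1, length(List)), List),
    Results = loop(Fun, Indexed, Max, 0, make_ref(), []),
    %% Workers finish in any order; sort by index to get list order back.
    [R || {_Index, R} <- lists:keysort(1, Results)].

%% Nothing left to start and nothing running: all results are in.
loop(_Fun, [], _Max, 0, _Ref, Acc) ->
    Acc;
%% A slot is free and work remains: start another worker.
loop(Fun, [{Index, Item} | Rest], Max, Running, Ref, Acc) when Running < Max ->
    Parent = self(),
    spawn_link(fun() -> Parent ! {Ref, Index, Fun(Item)} end),
    loop(Fun, Rest, Max, Running + 1, Ref, Acc);
%% Pool is full, or only running workers remain: wait for one result.
loop(Fun, Pending, Max, Running, Ref, Acc) ->
    receive
        {Ref, Index, Result} ->
            loop(Fun, Pending, Max, Running - 1, Ref, [{Index, Result} | Acc])
    end.
```

A call such as ppool_sketch:map(Fun, Items, 5) would then keep five workers busy until the list is exhausted; a slow item only occupies its own slot while the other four keep draining the list. The workers are linked, so in this sketch a crash in Fun also kills the caller unless it traps exits.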
On Fri, Jul 22, 2011 at 5:10 PM, Robert Virding <[hidden email]> wrote:
--
Parnell "ixmatus" Springmeyer (http://ixmat.us)
